Organization of clustering and correlation analytics #13

bentsherman · 2017-12-19T14:41:53Z

As I'm looking at how to implement mixture model clustering, I'm beginning to see a multi-stage pipeline with options at several points:

*.emx ---> clustering [---> ???] ---> correlation ---> *.cmx

clustering:
- none
- k-means
- GMM

correlation:
- Pearson
- Spearman
- ...
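
To illustrate treating the correlation step as a swappable option, here is a minimal Python sketch. It is not KINC code, and the names `pearson`, `spearman`, `ranks`, and `CORRELATIONS` are hypothetical. Spearman is computed as the Pearson correlation of average ranks, which is its usual definition when ties are present.

```python
import math
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Registry of interchangeable correlation options, looked up by name.
CORRELATIONS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "pearson": pearson,
    "spearman": spearman,
}

# Usage: a monotonic but non-linear relationship.
x, y = [1, 2, 3, 4, 5], [1, 4, 9, 16, 25]
print(CORRELATIONS["spearman"](x, y))  # 1.0
print(CORRELATIONS["pearson"](x, y))   # about 0.981
```

Adding another method would only mean registering another function with the same two-argument signature.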

So I'm trying to figure out how to best implement this pipeline for the long-term. It looks like KINCv1 can combine clustering with any correlation method, with minimal duplication. Perhaps we will need to create a new data type for the "augmented" expression matrix? It would parallel the PairWiseClusterList from KINCv1. Then the clustering and correlation analytics could be kept separate and the user could simply use the pipeline illustrated above.
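
One way to picture the clustering stage and the proposed "augmented" data type is a map from each gene pair to a list of per-cluster sample masks. The Python sketch below assumes that representation. `cluster_none`, `cluster_kmeans`, and `build_clusters` are hypothetical names. The k-means is plain Lloyd's iteration with a naive first-k-samples initialization, and a GMM would presumably slot in as another function with the same signature.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Mask = List[bool]       # one flag per sample: True if the sample is in the cluster
Pair = Tuple[int, int]  # (gene i, gene j) with i < j


def cluster_none(x: Sequence[float], y: Sequence[float]) -> List[Mask]:
    """'none' option: all samples form a single cluster."""
    return [[True] * len(x)]


def cluster_kmeans(x: Sequence[float], y: Sequence[float],
                   k: int = 2, iters: int = 50) -> List[Mask]:
    """Lloyd's k-means on the 2-D points (x[s], y[s]) of one gene pair."""
    points = list(zip(x, y))
    centroids = points[:k]  # naive deterministic init: the first k samples

    def nearest(p):
        # index of the centroid with the smallest squared Euclidean distance
        return min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                   + (p[1] - centroids[c][1]) ** 2)

    labels = [nearest(p) for p in points]
    for _ in range(iters):
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append((sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        if new == centroids:
            break  # converged: centroids no longer move
        centroids = new
        labels = [nearest(p) for p in points]
    return [[lab == c for lab in labels] for c in range(k)]


def build_clusters(emx: Sequence[Sequence[float]],
                   method: Callable[[Sequence[float], Sequence[float]], List[Mask]] = cluster_kmeans
                   ) -> Dict[Pair, List[Mask]]:
    """Run one clustering option over every gene pair of a genes x samples matrix."""
    n = len(emx)
    return {(i, j): method(emx[i], emx[j])
            for i in range(n) for j in range(i + 1, n)}
```

Because every clustering option returns the same mask format, a downstream correlation step would not need to know which method produced the clusters.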

4ctrl-alt-del · 2017-12-19T18:38:02Z

Great minds think alike :) For all the reasons you just stated, I am making a new data object that will hold all cluster data. Specifically, it will hold how many clusters, if any, each gene pair has, and for each cluster it will hold a sample mask describing the cluster. So in the future, once we get mixture models into the new KINC, correlation analytics will take two data object inputs, an emx and a ccm (cluster data object), and produce a cmx. Your mixture model analytic will take an emx as input and output a ccm.

Workflow:

*.emx --(mixture model analytic)--> *.ccm
*.emx --(correlation analytic)--> *.cmx
*.ccm --^

Once I release KINC version 3.1, it will have the new cluster matrix data type. So what you need to study is the expression matrix and cluster matrix data types, because your analytic will be using those as input/output.

Sincerely,
Joshua Burns
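
The emx + ccm -> cmx step described above could look roughly like the following Python sketch. It assumes the emx is a genes x samples list of rows, the ccm is a dict keyed by gene-pair indices holding one boolean sample mask per cluster, and a pair with no clusters is correlated over all samples. `correlation_analytic` and `min_samples` are hypothetical and are not the KINC 3.1 API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Mask = List[bool]
Pair = Tuple[int, int]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_analytic(emx: Sequence[Sequence[float]],
                         ccm: Dict[Pair, List[Mask]],
                         min_samples: int = 3) -> Dict[Pair, List[float]]:
    """Hypothetical emx + ccm -> cmx step: one coefficient per (gene pair, cluster)."""
    n_genes, n_samples = len(emx), len(emx[0])
    cmx: Dict[Pair, List[float]] = {}
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            # assumption: a pair with no clusters is correlated over all samples
            masks = ccm.get((i, j)) or [[True] * n_samples]
            coeffs = []
            for mask in masks:
                xs = [v for v, keep in zip(emx[i], mask) if keep]
                ys = [v for v, keep in zip(emx[j], mask) if keep]
                # a cluster with too few samples yields no meaningful coefficient
                coeffs.append(pearson(xs, ys) if len(xs) >= min_samples else float("nan"))
            cmx[(i, j)] = coeffs
    return cmx


# Usage: two clusters with opposite trends for the same gene pair.
emx = [[1, 2, 3, 10, 20, 30],
       [1, 2, 3, 30, 20, 10]]
ccm = {(0, 1): [[True] * 3 + [False] * 3, [False] * 3 + [True] * 3]}
print(correlation_analytic(emx, ccm))  # {(0, 1): [1.0, -1.0]}
```

Keeping clustering and correlation in separate analytics means any correlation method can consume the same ccm without duplicating the clustering logic.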

bentsherman · 2017-12-20T18:52:50Z

Ah, so that's how the *.ccm fits into all of this. Excellent!

feltus · 2017-12-21T05:54:47Z

This is some excellent communication! See their grand plan, Ben? It's awesome. We gotta get past the default so we can use the ecosystem.

bentsherman closed this as completed Dec 20, 2017
