Output representative expression profiles of the clusters #16

apcamargo · 2018-11-13T03:40:16Z

Hi Basel,

In many cases, it's very useful to have a prototypical expression profile of each cluster for downstream analysis (for instance, measuring its correlation with an external variable). In WGCNA, module eigengenes are usually used for this purpose.

It would be useful if Clust could output some kind of representative expression profile for each cluster. It could be the eigengene, the median expression in each sample, the trimmed mean, etc.
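
As a rough illustration of the summaries listed above, the sketch below computes a per-sample mean, median and trimmed mean for one cluster. It assumes the cluster is a NumPy array with genes as rows and samples as columns; `representative_profiles`, `trimmed_mean` and the 10% trimming proportion are hypothetical choices for illustration, not part of Clust.

```python
import numpy as np


def trimmed_mean(values, proportion=0.1, axis=0):
    """Mean after dropping `proportion` of the sorted values from each end along `axis`."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    values = np.sort(np.asarray(values, dtype=float), axis=axis)
    n = values.shape[axis]
    cut = int(proportion * n)  # number of values removed from each tail
    kept = np.take(values, np.arange(cut, n - cut), axis=axis)
    return kept.mean(axis=axis)


def representative_profiles(cluster, proportion=0.1):
    """Summarise a genes x samples matrix into one value per sample (hypothetical helper)."""
    cluster = np.asarray(cluster, dtype=float)
    return {
        "mean": cluster.mean(axis=0),
        "median": np.median(cluster, axis=0),
        "trimmed_mean": trimmed_mean(cluster, proportion, axis=0),
    }
```

Each entry of the returned dictionary has one value per sample, so any of them can be correlated directly with an external per-sample variable.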

What do you think?

BaselAbujamous · 2018-11-14T14:58:52Z

Hi Antoni,

I agree. I usually use the mean expression. All the data needed to compute this is already available in the output, but it would be nice to provide it ready-made in a separate TSV file. I will consider this for future versions and will keep this issue open until then.
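
A minimal sketch of what such a TSV could look like, assuming one row per cluster and one column per sample. The layout and the `write_profiles_tsv` name are hypothetical and do not describe Clust's actual output format.

```python
import csv
import io


def write_profiles_tsv(profiles, sample_names, handle):
    """Write one row per cluster: the cluster ID followed by one value per sample."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Cluster"] + list(sample_names))
    for cluster_id, profile in profiles.items():
        if len(profile) != len(sample_names):
            raise ValueError(f"profile for {cluster_id} has the wrong length")
        writer.writerow([cluster_id] + [f"{value:.4f}" for value in profile])


# Example with made-up profiles for two clusters and two samples.
buffer = io.StringIO()
write_profiles_tsv({"C0": [1.0, 2.5], "C1": [0.25, -1.0]}, ["S1", "S2"], buffer)
print(buffer.getvalue())
```

Writing to any file-like object keeps the function easy to test; passing an open file handle instead of `io.StringIO` writes the table to disk.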

P.S. Thanks a lot for your recent edits (fixing the requirements in the README and changing the transparency of the plots). I have merged them, and they will be part of the next version of the pip-installed package. I liked the idea. Thanks for the contribution.

Basel

apcamargo · 2018-11-15T19:15:10Z

Do you think the mean expression in each condition is a good option? I imagine there won't be many outlier values (as a gene's expression needs to be at least similar to the cluster profile), but I feel the mean isn't robust enough.

Using trimmed means or medians seems better to me (I might be mistaken). I don't know if the eigengene is robust to outliers, but I think we can investigate it.
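
A toy illustration of the robustness argument, using made-up values: a single outlying gene pulls the mean of a sample upwards, while the median and the trimmed mean barely move.

```python
import numpy as np

# Expression of five genes in one sample; the last gene is an outlier (made-up values).
values = np.array([2.0, 2.1, 1.9, 2.0, 9.0])

mean = values.mean()
median = np.median(values)
# 20% trimmed mean: drop the lowest and the highest value before averaging.
trimmed = np.sort(values)[1:-1].mean()

print(f"mean={mean:.2f} median={median:.2f} trimmed={trimmed:.2f}")
```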

(You're welcome! I really appreciate the effort you put into Clust and I'm willing to help you from a user perspective.)

BaselAbujamous · 2018-11-16T11:49:18Z

Maybe the trimmed mean makes sense. As the algorithm aims at taking out any outliers anyway, the trimmed mean and the ordinary mean should be similar. To be on the safe side, I would use the trimmed mean approach, as you suggested.

Your help, whether through ideas or direct edits, is much appreciated indeed.

apcamargo · 2018-11-17T00:23:44Z

I did a quick experiment here. I got the values of the C1, C2 and C3 clusters from the D1 dataset and computed representative profiles using four methods: eigengene, mean, trimmed mean and median. I then calculated the sum of the absolute differences between the representative profiles and the true values.
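
The comparison metric described above can be sketched as follows. What exactly the "true values" are is not spelled out here, so the sketch assumes a reference profile with one value per sample. Because an eigengene is on an arbitrary (unit-norm) scale, both profiles are z-scored before comparison by default; that normalization step is an assumption, not necessarily what was done in the experiment.

```python
import numpy as np


def zscore(profile):
    """Centre and scale a profile so profiles on different scales can be compared."""
    profile = np.asarray(profile, dtype=float)
    std = profile.std()
    centred = profile - profile.mean()
    return centred / std if std > 0 else centred


def sum_abs_diff(profile, reference, standardize=True):
    """Sum of absolute differences between a representative profile and a reference."""
    profile = np.asarray(profile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if standardize:
        profile, reference = zscore(profile), zscore(reference)
    return float(np.abs(profile - reference).sum())


# Made-up reference and candidate profiles; a lower score means a closer match.
reference = np.array([0.0, 1.0, 2.0, 1.0])
candidates = {
    "mean": np.array([0.1, 1.1, 2.0, 0.9]),
    "median": np.array([0.0, 1.0, 2.1, 1.0]),
}
scores = {name: sum_abs_diff(p, reference) for name, p in candidates.items()}
for name in sorted(scores, key=scores.get):
    print(name, round(scores[name], 3))
```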

It seems that taking the median was the best strategy (median > trimmed mean > mean > eigengene). I may be computing the eigengene wrong, though.

We could test if that remains true with the D2 and D3 datasets.

clust_test_representative_profiles.pdf

What do you mean by "take out outliers"? Does Clust explicitly remove outliers, or do you mean that a gene with an outlier value in a given sample simply wouldn't be clustered during the k-means step?

apcamargo · 2018-11-17T02:41:13Z

It seems I was computing the eigengenes wrong after all. It looks like eigengenes are by far the best way to build a representative expression profile for the clusters.

eigengene >>>> median > trimmed mean > mean

clust_test_representative_profiles_v2.pdf

apcamargo · 2018-11-19T01:56:14Z

I did a Python implementation of the eigengene computation (and some plots comparing it to the medians, trimmed means and means).
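
For reference, a module eigengene in the WGCNA sense is the first principal component of the cluster's standardized expression matrix. The sketch below is a minimal NumPy version assuming a genes x samples array; it is not the implementation from the PDF or from Clust. The sign of a singular vector is arbitrary, so the result can come out inverted relative to the actual pattern.

```python
import numpy as np


def eigengene(cluster):
    """First principal component (over samples) of a genes x samples matrix.

    Each gene is standardized across samples before the SVD, so the eigengene
    describes the dominant shared pattern rather than absolute expression levels.
    """
    x = np.asarray(cluster, dtype=float)
    std = x.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid dividing by zero for constant genes
    x = (x - x.mean(axis=1, keepdims=True)) / std
    # Rows of vt are right singular vectors, i.e. patterns across samples.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[0]  # unit-norm vector; its sign is arbitrary


# Simulated cluster: 20 genes sharing one pattern plus a little noise.
rng = np.random.default_rng(0)
pattern = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
cluster = pattern + rng.normal(scale=0.1, size=(20, 5))
eg = eigengene(cluster)
```

Since the sign is arbitrary, compare `eg` with the underlying pattern using the absolute value of the correlation, or align the sign first.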

Python_eigengenes.pdf

BaselAbujamous · 2018-11-19T12:50:27Z

This is some great effort, Antônio! Thanks a lot!

I can see your point, and I believe I will incorporate that in the next version of Clust! I may also test it on some other datasets (maybe with higher dimensions).

Your input is much appreciated and will definitely make using Clust a better experience for users!

Thanks again!
Basel

apcamargo · 2018-11-19T21:24:28Z

You're welcome!

I tested the eigengene on one of my datasets and it performed better again.

There's one important thing that I didn't mention in those PDFs. The eigengene may be computed with the opposite sign relative to the true expression pattern of the coexpression module (that's why I put a minus sign in front of the SVD function). I sent an email to one of WGCNA's developers, and he told me that their function "automatically adjusts the sign so that the resulting module eigengene has a positive correlation with the mean gene expression values of the module".

Here's their code:

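        {
          # Flip the i-th eigengene if it is anti-correlated with the module's average expression
          if (verbose>4) printFlush(paste(spaces,
                          " .. aligning module eigengene with average expression."))
          corAve = cor(averExpr[,i], PrinComps[,i], use = "p");
          # An undefined correlation (e.g. a constant vector) is treated as 0, so no flip happens
          if (!is.finite(corAve)) corAve = 0;
          if (corAve<0) PrinComps[,i] = -PrinComps[,i]
        }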

This should be really easy to implement in Python for Clust. If you want to, I can work on a PR.
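
A Python translation of the sign adjustment in the WGCNA excerpt above might look like this. It is a sketch assuming NumPy arrays; `align_eigengene_sign` is a hypothetical name, and `average_expression` is assumed to play the role of `averExpr` (the per-sample average of the module's standardized expression). Unlike `use = "p"` in the R call, missing values are not handled here.

```python
import numpy as np


def align_eigengene_sign(eigengene, average_expression):
    """Flip the eigengene if it is negatively correlated with the average expression.

    Mirrors the R excerpt: a non-finite correlation (e.g. when one vector is
    constant) is treated as 0, which leaves the sign unchanged.
    """
    eigengene = np.asarray(eigengene, dtype=float)
    average_expression = np.asarray(average_expression, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        cor = np.corrcoef(average_expression, eigengene)[0, 1]
    if not np.isfinite(cor):
        cor = 0.0
    return -eigengene if cor < 0 else eigengene
```

Returning a new array rather than flipping in place keeps the caller's eigengene untouched, which makes the function safe to apply to each cluster in a loop.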

BaselAbujamous · 2018-11-26T12:55:07Z

Sorry for the late response. I would be very happy for you to work on a PR! Thanks.

taylorreiter · 2019-05-21T14:11:26Z

Hello! Will this be implemented in clust soon? I see #20, and am wondering if it is possible to have this functionality integrated.

BaselAbujamous · 2020-07-04T21:50:54Z

Sorry for the very late response here. I have just merged the edits by @apcamargo that add this capability. Thanks a lot, @apcamargo.

apcamargo · 2020-07-04T22:09:07Z

Thanks @BaselAbujamous!

apcamargo changed the title ~~Output a representative expression profile of the clusters~~ Output representative expression profiles of the clusters Nov 13, 2018

BaselAbujamous added the enhancement label Nov 14, 2018

BaselAbujamous closed this as completed Jul 4, 2020
