Output representative expression profiles of the clusters #16

Closed
apcamargo opened this issue Nov 13, 2018 · 12 comments

Comments

@apcamargo
Contributor

Hi Basel,

In many cases, it's very useful to have a prototypical expression profile for each cluster for downstream analysis (measuring its correlation with an external variable, for instance). In WGCNA, the module eigengenes are usually used for this purpose.

It would be useful if Clust could output some kind of representation of the expression profile of each cluster. It could be the eigengene, median expression for each sample, trimmed mean etc.

What do you think?

@apcamargo apcamargo changed the title Output a representative expression profile of the clusters Output representative expression profiles of the clusters Nov 13, 2018
@BaselAbujamous
Owner

Hi Antônio,

I agree. I usually use mean expression. All the data needed to compute this is already in the output, but it would be nice to provide it ready-made in a separate TSV file. I will consider this for future versions and keep this issue open until then.

P.S. Thanks a lot for your recent edits (fixing requirements in README and changing transparency of the plots). I have merged them and they will be part of the next version of the pip-installed package. I liked the idea. Thanks for the contribution.

Basel
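
For illustration, the separate TSV file mentioned above could be produced along these lines. This is only a sketch: the function name `write_profiles_tsv`, the column layout, and the toy values are assumptions made for the example, not Clust's actual output format.

```python
import csv
import io

import numpy as np


def write_profiles_tsv(handle, clusters, sample_names):
    """Write one row per cluster holding its mean expression in each sample.

    `clusters` maps a cluster name to a genes x samples matrix.
    The layout (a "Cluster" column followed by one column per sample)
    is a hypothetical choice for this sketch.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Cluster", *sample_names])
    for name, matrix in clusters.items():
        profile = np.asarray(matrix, dtype=float).mean(axis=0)
        writer.writerow([name, *(f"{value:.3f}" for value in profile)])


# Toy data: two made-up clusters measured in three samples.
clusters = {
    "C1": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    "C2": [[0.5, 0.5, 0.5]],
}
buffer = io.StringIO()
write_profiles_tsv(buffer, clusters, ["S1", "S2", "S3"])
tsv_text = buffer.getvalue()
```

Swapping `mean` for another per-sample summary only changes the line that computes `profile`.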

@apcamargo
Contributor Author

Do you think the mean expression in each condition is a good option? I imagine that there won't be many outlier values (as the expression needs to be at least similar to the cluster profile), but I feel that the average isn't robust enough.

Using trimmed means or medians seems better to me (I might be mistaken). I don't know if the eigengene is robust to outliers, but I think we can investigate it.

(You're welcome! I really appreciate the effort you put into Clust and I'm willing to help you from a user perspective.)

@BaselAbujamous
Owner

Maybe trimmed means make sense. Since the algorithm aims to take out outliers anyway, the trimmed mean and the ordinary mean should be similar. To be on the safe side, I would use the trimmed mean approach as you suggested.

Your help is much appreciated, whether through ideas or through direct edits.
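
To make the difference between the two averages concrete, here is a minimal sketch of a trimmed mean next to the ordinary mean. The helper `trimmed_mean`, the trimming proportion, and the six expression values are all made up for illustration; they are not taken from Clust.

```python
import numpy as np


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = np.sort(np.asarray(values, dtype=float))
    k = int(proportion * ordered.size)  # number of values cut from each tail
    return ordered[k:ordered.size - k].mean()


# Hypothetical expression of six genes from one cluster in a single sample.
# The last value is an artificial outlier.
sample = np.array([1.0, 1.1, 0.9, 1.2, 1.0, 5.0])

plain = sample.mean()                            # pulled upwards by the outlier
trimmed = trimmed_mean(sample, proportion=0.2)   # drops 0.9 and 5.0
```

With no outlier in the cluster, the two values end up close to each other, which is the point made above.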

@apcamargo
Contributor Author

apcamargo commented Nov 17, 2018

I did a quick experiment here. I got the values of the C1, C2 and C3 clusters from the D1 dataset and computed representative profiles using four methods: eigengene, mean, trimmed mean and median. I then calculated the sum of the absolute differences between the representative profiles and the true values.

It seems that taking the median was the best strategy (median > trimmed mean > mean > eigengene). I may be computing the eigengene wrong, though.

We could test if that remains true with the D2 and D3 datasets.

clust_test_representative_profiles.pdf
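
The comparison described above can be sketched roughly as follows. This is not the analysis from the attached PDF: the data are synthetic (not the D1 clusters), the helper names are hypothetical, and the eigengene is left out because its scale depends on how the data were standardized, which isn't specified here.

```python
import numpy as np


def trimmed_mean_profile(cluster, proportion=0.1):
    """Per-sample trimmed mean of a genes x samples matrix."""
    ordered = np.sort(cluster, axis=0)  # sort the genes within each sample
    k = int(proportion * cluster.shape[0])
    return ordered[k:cluster.shape[0] - k].mean(axis=0)


def total_abs_difference(cluster, profile):
    """Sum of |gene value - profile value| over all genes and samples."""
    return float(np.abs(cluster - profile).sum())


# Synthetic cluster: 30 genes following one pattern across 4 samples,
# plus noise and a single artificial outlier.
rng = np.random.default_rng(0)
true_profile = np.array([-1.0, 0.0, 1.5, 0.5])
cluster = true_profile + rng.normal(scale=0.2, size=(30, 4))
cluster[0, 2] += 3.0

profiles = {
    "mean": cluster.mean(axis=0),
    "trimmed mean": trimmed_mean_profile(cluster),
    "median": np.median(cluster, axis=0),
}
scores = {name: total_abs_difference(cluster, p) for name, p in profiles.items()}

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
```

Lower scores mean the profile sits closer to the individual genes under this particular measure.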


What do you mean by "take out outliers"? Does Clust explicitly remove outliers, or do you mean that a gene with an outlier value in a given sample simply wouldn't be clustered during the k-means step?

@apcamargo
Contributor Author

apcamargo commented Nov 17, 2018

It seems I was computing the eigengenes wrong after all. It looks like eigengenes are by far the best way to build a representative expression profile for the clusters.

eigengene >>>> median > trimmed mean > mean

clust_test_representative_profiles_v2.pdf

@apcamargo
Contributor Author

I wrote a Python implementation of the eigengene computation (with some plots comparing it to the medians, trimmed means and means).

Python_eigengenes.pdf
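
For readers without access to the PDF, an eigengene computation in Python might look something like the sketch below. It is not the code from the attachment. It assumes the common WGCNA-style definition (standardize each gene across samples, then take the first right-singular vector of the genes x samples matrix), and the function name and toy data are hypothetical.

```python
import numpy as np


def eigengene(cluster):
    """Return the first right-singular vector of the standardized cluster
    matrix (genes x samples) and the fraction of variance it explains."""
    x = np.asarray(cluster, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes stay at zero instead of dividing by zero
    z = centered / sd
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values[0] ** 2 / (singular_values ** 2).sum()
    return vt[0], explained


# Toy cluster: three genes that are scaled and shifted copies of one pattern.
pattern = np.array([0.0, 1.0, 3.0, 2.0, 1.0])
cluster = np.outer([1.0, 2.0, 0.5], pattern) + np.array([[5.0], [0.0], [2.0]])

profile, explained = eigengene(cluster)
```

The returned vector has unit length, and its sign is arbitrary: SVD may return either the pattern or its mirror image.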

@BaselAbujamous
Owner

This is some great effort, Antônio! Thanks a lot!

I can see your point, and I believe I will incorporate that in the next version of Clust! I may test it on some other datasets as well (with higher dimensions, maybe).

Your input is much appreciated and will definitely make using Clust a better experience for users!

Thanks again!
Basel

@apcamargo
Contributor Author

apcamargo commented Nov 19, 2018

You're welcome!

I tested the eigengene on one of my datasets and it performed better again.

There's one important thing that I left out of those PDFs. The eigengene may come out with its sign inverted relative to the true expression pattern of the coexpression module (that's why I put a minus sign in front of the SVD function). I sent an email to one of WGCNA's developers, and he told me that their function "automatically adjusts the sign so that the resulting module eigengene has a positive correlation with the mean gene expression values of the module".

Here's their code:

        {
          if (verbose>4) printFlush(paste(spaces,
                          " .. aligning module eigengene with average expression."))
          corAve = cor(averExpr[,i], PrinComps[,i], use = "p");
          if (!is.finite(corAve)) corAve = 0;
          if (corAve<0) PrinComps[,i] = -PrinComps[,i]
        }

This should be really easy to implement in Python for Clust. If you want to, I can work on a PR.
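
A rough Python equivalent of the R snippet above could look like this. It is a sketch, not the code that ended up in Clust: `align_sign` is a hypothetical name, the cluster is assumed to be a genes x samples matrix without missing values (the R code's `use = "p"` handling of missing data is not reproduced), and the numbers are made up.

```python
import numpy as np


def align_sign(eigengene, cluster):
    """Flip the eigengene if it correlates negatively with the
    average expression profile of the cluster (genes x samples)."""
    eigengene = np.asarray(eigengene, dtype=float)
    average = np.asarray(cluster, dtype=float).mean(axis=0)
    # A constant vector gives an undefined correlation (NaN);
    # silence the warning and treat that case as "no flip".
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(average, eigengene)[0, 1]
    if not np.isfinite(corr):
        corr = 0.0
    return -eigengene if corr < 0 else eigengene


# Toy example: expression rises across samples, but the eigengene falls.
cluster = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 3.0, 4.0, 6.0]])
raw = np.array([0.6, 0.2, -0.2, -0.6])

aligned = align_sign(raw, cluster)
```

Applying the function a second time leaves the result unchanged, since the aligned vector already correlates positively with the average.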

@BaselAbujamous
Owner

Sorry for being late in responding. I am totally happy with it if you would like to work on a PR! Thanks.

@taylorreiter

Hello! Will this be implemented in clust soon? I see #20, and am wondering if it is possible to have this functionality integrated.

@BaselAbujamous
Owner

Sorry for the very late response here. I have just merged the edits by @apcamargo that add this capability. Thanks a lot, @apcamargo.

@apcamargo
Contributor Author

Thanks @BaselAbujamous!
