Extended vs. non-extended labels in pySCENIC #12

Closed
wyattmcdonnell opened this issue May 8, 2018 · 3 comments

Comments

@wyattmcdonnell

Hi there!

So excited to see all of this incredible work ported into Python! The performance in our hands has moved things from about a day-long computation to somewhere around half an hour—so thanks again!

One thing that I've noticed is that there are (+) and (−) labels for the regulons, but it's unclear which of these (if either?) is the extended version of the regulon. In the R version of SCENIC there are clear labels for _extended, but I'm not sure how to distinguish between these in the current iteration. Could you point me in the right direction?

Best wishes,
Wyatt

@bramvds
Contributor

bramvds commented May 8, 2018

Dear Wyatt,

Thank you for the feedback!

Regarding your question about the (+) and (-) suffixes for the regulons: this is experimental work. The (+) indicates a positive correlation between the expression levels of a TF and its target genes across cells (from which we infer that the regulon is a transcriptional activator), while the (-) indicates a negative correlation, i.e. a transcriptionally repressing relationship between the TF and its targets.
This is work in progress. The (+) labelled regulons are the ones you would get from running the original R version.
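
As a rough illustration of the suffix convention described above, regulon names can be split by their trailing label. This is only a sketch: it assumes the names end in an ASCII "(+)" or "(-)", and the TF names and the helper `split_by_sign` are made up for the example, not part of pySCENIC.

```python
def split_by_sign(regulon_names):
    """Split regulon names into (+) and (-) groups based on their suffix.

    Hypothetical helper for illustration; assumes ASCII "(+)" / "(-)" suffixes.
    """
    activating = [name for name in regulon_names if name.endswith("(+)")]
    repressing = [name for name in regulon_names if name.endswith("(-)")]
    return activating, repressing


# Made-up regulon names, only to show the convention.
names = ["SOX2(+)", "SOX2(-)", "PAX6(+)", "REST(-)"]
activating, repressing = split_by_sign(names)

print(activating)  # names carrying the (+) suffix
print(repressing)  # names carrying the (-) suffix
```

Keeping only the `activating` group would correspond to the regulons that, per the comment above, match what the original R version produces.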

If you want to filter on direct or indirect TF annotations, you can do so by first filtering the dataframe of enriched motifs [i.e. df = prune2df()] and only then deriving regulons from the filtered df using df2regulons. Focus on the column "Annotation", which should contain "gene is directly annotated" for direct annotations. More fine-grained control is possible with the columns "MotifSimilarityQValue" (should be 0 if no motif similarity is needed to find a matching TF annotation for the enriched motif in the species under investigation) and "OrthologousIdentity" (a value between 0.0 and 1.0 which signifies the orthologous identity of the DBD of the TF proteins involved when SCENIC needs to cross species boundaries to find an appropriate annotation for the enriched motifs).
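
The filtering logic described above might look roughly like the sketch below. It works on plain dictionaries rather than the real dataframe so that it runs on its own. The three column names and the annotation string come from the comment above; the helper `keep_row`, its threshold parameters, the toy rows, and the placeholder annotation text are assumptions for illustration. The real `prune2df` output may lay out its columns differently (for instance with nested column levels), so inspect `df.columns` before adapting this.

```python
DIRECT = "gene is directly annotated"


def keep_row(row, max_motif_qvalue=None, min_orthologous_identity=None):
    """Return True if a motif-enrichment row passes the annotation filters.

    Hypothetical helper: the thresholds are optional and off by default.
    """
    if DIRECT not in row["Annotation"]:
        return False
    if max_motif_qvalue is not None and row["MotifSimilarityQValue"] > max_motif_qvalue:
        return False
    if (min_orthologous_identity is not None
            and row["OrthologousIdentity"] < min_orthologous_identity):
        return False
    return True


# Made-up rows; "TF_A" etc. and the values are placeholders, not real results.
rows = [
    {"TF": "TF_A", "Annotation": DIRECT,
     "MotifSimilarityQValue": 0.0, "OrthologousIdentity": 1.0},
    {"TF": "TF_B", "Annotation": "placeholder for an indirect annotation",
     "MotifSimilarityQValue": 0.0, "OrthologousIdentity": 0.8},
    {"TF": "TF_C", "Annotation": DIRECT,
     "MotifSimilarityQValue": 0.02, "OrthologousIdentity": 1.0},
]

direct_only = [r["TF"] for r in rows if keep_row(r)]
strict = [r["TF"] for r in rows if keep_row(r, max_motif_qvalue=0.0)]

print(direct_only)
print(strict)
```

With a pandas dataframe the same conditions would normally be written as a boolean mask over the corresponding columns, and the filtered dataframe then passed on to df2regulons as described above.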

Hope this helps,
Bram

@dschrein

i just wanted to second the compliment here - the performance improvement over the R version is staggering: we went from one week to under a day. :)

the support here has also been prompt and excellent. thank you!

@bramvds
Contributor

bramvds commented May 14, 2018

You're more than welcome. Thanks for the feedback, much appreciated.

@bramvds bramvds closed this as completed May 14, 2018