Annotation format requirement #5

Closed
kvshams opened this issue Jan 17, 2023 · 3 comments

@kvshams

kvshams commented Jan 17, 2023

What annotation format is required? Is it possible to use the gene sets directly from a pathway database, for instance the C2 JSON bundle from the Broad Institute pathway database (MSigDB)?
Thanks,
Shams

@wallet-maker
Member

Hi Shams,

My apologies for the late response. Yes, you can use an entire pathway database such as the C2 bundle from MSigDB. The important thing is that you format the gene set annotation dictionary correctly.

The dictionary has to include every cell type from your adata cell type annotations as a key. Since most databases do not state which cell types their gene sets are specific to, you will have to either 1) annotate the cell types yourself or 2) set all gene sets as global. Both approaches should be fine; you can check empirically which one works better for you.

gene_set_dictionary = {
    'celltype_1': {'gene_set_1': ['gene_a', 'gene_b', 'gene_c'], 'gene_set_2': ['gene_c', 'gene_a', 'gene_e', 'gene_f']},
    'celltype_2': {'gene_set_1': ['gene_a', 'gene_b', 'gene_c'], 'gene_set_3': ['gene_a', 'gene_e', 'gene_f', 'gene_d']},
    'celltype_3': {},
    'global': {'gene_set_4': ['gene_m', 'gene_n']}
}
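
Option 2 (all gene sets global) could be sketched roughly as follows. This is only an illustration, not code from the Spectra tutorial: the inline JSON is a made-up stand-in for an MSigDB bundle, the "geneSymbols" key is an assumption about the file layout that should be checked against the downloaded file, and build_global_dictionary is a hypothetical helper name.

```python
import json

# Made-up stand-in for an MSigDB JSON bundle. The "geneSymbols" key is an
# assumption about the file layout; verify it against the file you downloaded.
msigdb_json = """
{
  "PATHWAY_A": {"geneSymbols": ["gene_a", "gene_b", "gene_c"]},
  "PATHWAY_B": {"geneSymbols": ["gene_m", "gene_n"]}
}
"""


def build_global_dictionary(gene_sets, cell_types):
    """Hypothetical helper: put every gene set under 'global' and give
    each cell type an empty entry so all cell types appear as keys."""
    dictionary = {cell_type: {} for cell_type in cell_types}
    dictionary["global"] = {
        name: list(record["geneSymbols"]) for name, record in gene_sets.items()
    }
    return dictionary


# In practice these names would come from the adata cell type annotations.
cell_types = ["celltype_1", "celltype_2", "celltype_3"]
gene_set_dictionary = build_global_dictionary(json.loads(msigdb_json), cell_types)
print(gene_set_dictionary)
```

For a real bundle, the inline string would be replaced by the parsed contents of the downloaded file, for example via json.load on an open file handle.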

Having said that, we believe the best results are obtained by restricting the input to coherent, interpretable gene sets of similar size and with limited redundancy (please see the Supplementary Methods of the manuscript for further detail: https://doi.org/10.1101/2022.12.20.521311 ). We also offer a package for selecting gene sets for Spectra, https://github.com/wallet-maker/cytopus , which we will update with an extended set of annotations (including cancer cell and stroma cell gene sets) in the near future.
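
As a rough illustration of what "similar size and limited redundancy" could mean in practice, the sketch below drops gene sets outside a size range and gene sets that overlap strongly with one already kept. The thresholds (min_size, max_size, max_overlap), the use of Jaccard overlap, and the function names are all assumptions made for this example; they are not taken from the manuscript or from cytopus.

```python
def jaccard(a, b):
    """Jaccard overlap between two gene lists (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_gene_sets(gene_sets, min_size=3, max_size=200, max_overlap=0.5):
    """Hypothetical filter: keep gene sets within a size range and skip any
    set whose Jaccard overlap with an already kept set exceeds max_overlap.
    All thresholds are illustrative placeholders."""
    kept = {}
    for name, genes in gene_sets.items():
        if not (min_size <= len(set(genes)) <= max_size):
            continue  # too small or too large
        if any(jaccard(genes, other) > max_overlap for other in kept.values()):
            continue  # redundant with a set that was already kept
        kept[name] = list(genes)
    return kept


example = {
    "set_1": ["gene_a", "gene_b", "gene_c"],
    "set_2": ["gene_a", "gene_b", "gene_c", "gene_d"],  # overlaps strongly with set_1
    "set_3": ["gene_x"],  # below min_size
    "set_4": ["gene_m", "gene_n", "gene_o"],
}
filtered = filter_gene_sets(example)
print(sorted(filtered))
```

Because the filter is greedy, the result depends on iteration order: the first of two redundant sets is the one that survives.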

Let me know if that helps

@kvshams
Author

kvshams commented Jan 22, 2023

Thanks for the reply. Is there an example code snippet for formatting the JSON file from MSigDB?
Thanks,
Shams

@wallet-maker
Member

Hi Shams,
we do not provide a code snippet, but the tutorial explains how to configure the dictionary. The easiest way would be to run this with use_celltype=False in the est_spectra function. We now provide an example in the tutorial.

https://github.com/dpeerlab/spectra/blob/main/notebooks/example_notebook.ipynb

Thank you,
Thomas
