Cellular Modeling Support #1245
Comments
That's a really cool paper! In some ways it's similar to a CNN. They basically take a fully connected network, then remove all the connections except the ones they expect to be important based on domain knowledge. This should be simple to implement with TensorGraph. For each leaf node in the hierarchy, use a Gather layer to collect the inputs for the genes it includes. Other than that, it's just Dense, Concat, and BatchNorm layers.
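To make the structure described above concrete, here is a minimal sketch in plain NumPy rather than TensorGraph. Everything in it is an assumption for illustration: the two-leaf toy hierarchy, the gene indices, the `hidden` size, and the helper names (`dense_params`, `dense`, `batch_norm`, `forward`) are all hypothetical and are not taken from the paper or from DeepChem. It only shows the forward pass: gather each leaf term's genes, apply a dense layer and batch normalization, then concatenate the children as input to the parent term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy hierarchy: 6 genes, two leaf terms, one root term.
# Gene 4 belongs to no term, so it has no connection into the network.
leaf_genes = {"term_a": [0, 1, 2], "term_b": [2, 3, 5]}
children = {"root": ["term_a", "term_b"]}
hidden = 4  # neurons per term (arbitrary choice for this sketch)


def dense_params(n_in, n_out):
    """Random weights and zero biases for one dense layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


def dense(x, params):
    """Dense layer with a tanh nonlinearity."""
    w, b = params
    return np.tanh(x @ w + b)


def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch (no learned scale or shift)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


# One dense layer per term: leaves see only their own genes,
# the root sees the concatenated outputs of its children.
params = {name: dense_params(len(idx), hidden) for name, idx in leaf_genes.items()}
params["root"] = dense_params(hidden * len(children["root"]), hidden)


def forward(x):
    """Forward pass over the toy hierarchy; returns every term's output."""
    outputs = {}
    for name, idx in leaf_genes.items():
        gathered = x[:, idx]  # "gather": select this term's genes
        outputs[name] = batch_norm(dense(gathered, params[name]))
    concat = np.concatenate([outputs[c] for c in children["root"]], axis=1)
    outputs["root"] = batch_norm(dense(concat, params["root"]))
    return outputs


# Batch of 8 samples with binary gene features (e.g. gene present or deleted).
x = rng.integers(0, 2, size=(8, 6)).astype(float)
out = forward(x)
```

Because gene 4 is not listed under any term, changing it cannot affect any output, which is the "remove all the connections except the expected ones" idea in miniature. A real version would need a deeper hierarchy, training, and whatever output heads the model calls for.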
@peastman Would it be feasible to build a framework for these types of simulations? I imagine people would want to use different cell types or make other tweaks.
Sure. We just need to decide how the structure should be specified. Here are the things that need to be specified:
We could possibly automate some of that, but not all. For example, we can build in the GO hierarchy, but the list of nodes needs to be filtered in ways that are data and problem specific (see the "preparation of ontologies" section).
This sounds like a good first list. I don't think a similar API really exists yet, so starting with a reasonable design without much automation, then adjusting as we get community feedback on ontology specification, should work well.
@peastman Is this one on your TODO list already? Feel free to remove the contribution labels if so.
There are still other things higher up on my list, so let's leave it in case someone else gets to it before I do.
I'm starting to work on this, so I've removed the labels.
I'm looking for datasets to test this on. We don't currently have any genomic datasets, do we? Any suggestions for things we should try? I'm assuming we don't want to start adding genomic data to molnet. That seems outside its scope, and there are already public repositories with huge amounts of data.
I don't think we have any genomic datasets at present. Could we duplicate some of the results from the original paper? I think it would make sense to add genomic data to molnet if there isn't already an easily accessible public repository for the data. If there is already a public repository, we could add a utility function that provides easy access to the data.
The most important one is probably the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). It has data from around 100,000 experiments, mostly gene expression but also genetic variation, TF binding, methylation, etc. The data isn't in any consistent format though. It's whatever files the experimenters uploaded for each one. For sequence data there are a lot of public databases, but GenBank (https://www.ncbi.nlm.nih.gov/genbank/) is the really big one. That's going to be harder to use with this model, though, because it isn't organized by gene.
Better GEO and GenBank support would be great. Perhaps we could add download/processing functions that surface the specific datasets we need from these repositories for applications.
There's been a lot of interesting progress recently in cellular modeling. In particular, I'm thinking of this paper that creates a deep learned cell simulator:
https://www.nature.com/articles/nmeth.4627
The code for the simulator is open source as well:
https://github.com/idekerlab/DCell
I wonder if there's a way to support this form of modeling work through DeepChem. I suspect this would be a very nice complement to deep microscopy support.