Cellular Modeling Support #1245

rbharath · 2018-05-03T17:55:52Z

There's been a lot of interesting progress recently in cellular modeling. In particular, I'm thinking of this paper that creates a deep learned cell simulator:

https://www.nature.com/articles/nmeth.4627

The code for the simulator is open source as well:

https://github.com/idekerlab/DCell

I wonder if there's a way to support this form of modeling work through DeepChem. I suspect this would be a very nice complement to deep microscopy support.

peastman · 2018-05-04T23:04:30Z

That's a really cool paper! In some ways it's similar to a CNN. They basically take a fully connected network, then remove all the connections except the ones they expect to be important based on domain knowledge.

This should be simple to implement with TensorGraph. For each leaf node in the hierarchy, use a Gather layer to collect the inputs for the genes it includes. Other than that, it's just Dense, Concat, and BatchNorm layers.
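To make the wiring concrete, here is a minimal NumPy sketch of the structure described above. It is not the DeepChem or TensorGraph implementation. The toy ontology, the node names, the random weights, and the helpers `dense` and `batch_norm` are all assumptions made up for illustration. Each leaf node gathers its own genes from the input, and the parent node works on the concatenated outputs of its children.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out):
    """Fully connected layer with random (untrained) weights and tanh."""
    w = rng.normal(scale=0.1, size=(x.shape[1], n_out))
    return np.tanh(x @ w)

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch (no learned scale or shift)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Hypothetical toy ontology: 6 genes, two leaf nodes, one root.
leaf_genes = {"leaf_a": [0, 1, 2], "leaf_b": [2, 3, 4, 5]}
children = {"root": ["leaf_a", "leaf_b"]}
n_outputs = {"leaf_a": 4, "leaf_b": 4, "root": 2}

def forward(x):
    outputs = {}
    # Leaf nodes: gather only the columns for the genes they include.
    for name, idx in leaf_genes.items():
        gathered = x[:, idx]
        outputs[name] = batch_norm(dense(gathered, n_outputs[name]))
    # Higher-level nodes: concatenate the child outputs, then Dense + BatchNorm.
    # This toy has a single level, so iteration order does not matter here;
    # a deeper hierarchy would need children processed before parents.
    for name, kids in children.items():
        merged = np.concatenate([outputs[k] for k in kids], axis=1)
        outputs[name] = batch_norm(dense(merged, n_outputs[name]))
    return outputs

# Batch of 8 samples, each a binary vector marking which genes are perturbed.
x = rng.integers(0, 2, size=(8, 6)).astype(float)
out = forward(x)
```

Compared with a fully connected network over all six genes, `leaf_a` never sees genes 3 to 5 at all, which is the "remove all the connections except the ones we expect to matter" idea.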

rbharath · 2018-05-07T05:35:16Z

@peastman Would it be feasible to build a framework for these types of simulations? I imagine people would want to use different cell types or make other tweaks.

peastman · 2018-05-07T15:45:12Z

Sure. We just need to decide how the structure should be specified. Here are the things that need to be specified:

The full list of genes.
The set of genes in each leaf node.
The child nodes of each higher level node.
The number of outputs for each node.

We could possibly automate some of that, but not all. For example, we can build in the GO hierarchy, but the list of nodes needs to be filtered in ways that are data and problem specific (see the "preparation of ontologies" section).
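One possible way to hold the four items listed above is a small container with a consistency check. This is only a sketch under assumptions: the class name `OntologySpec`, its field names, and the gene and node names in the example are hypothetical and are not an API proposed in this thread or present in DeepChem.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class OntologySpec:
    genes: List[str]                  # the full list of genes
    leaf_genes: Dict[str, List[str]]  # genes in each leaf node
    children: Dict[str, List[str]]    # child nodes of each higher-level node
    n_outputs: Dict[str, int]         # number of outputs for each node

    def validate(self) -> None:
        """Raise ValueError if the pieces of the specification disagree."""
        known = set(self.genes)
        for node, members in self.leaf_genes.items():
            missing = set(members) - known
            if missing:
                raise ValueError(f"{node}: unknown genes {sorted(missing)}")
        nodes = set(self.leaf_genes) | set(self.children)
        for node, kids in self.children.items():
            unknown = set(kids) - nodes
            if unknown:
                raise ValueError(f"{node}: unknown children {sorted(unknown)}")
        unsized = nodes - set(self.n_outputs)
        if unsized:
            raise ValueError(f"no output size for {sorted(unsized)}")

    def gene_indices(self, node: str) -> List[int]:
        """Column indices a leaf node would gather from the input."""
        index = {gene: i for i, gene in enumerate(self.genes)}
        return [index[gene] for gene in self.leaf_genes[node]]

# Hypothetical example with made-up gene and node names.
spec = OntologySpec(
    genes=["g0", "g1", "g2", "g3", "g4", "g5"],
    leaf_genes={"leaf_a": ["g0", "g1", "g2"], "leaf_b": ["g2", "g3", "g4", "g5"]},
    children={"root": ["leaf_a", "leaf_b"]},
    n_outputs={"leaf_a": 4, "leaf_b": 4, "root": 2},
)
spec.validate()
```

Keeping the specification as plain data leaves the filtering of ontology terms to the user, who can prune the dictionaries before building the model.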

rbharath · 2018-05-08T03:25:56Z

This sounds like a good first list. I don't think a similar API exists yet, so starting with a reasonable design without much automation, then adjusting as we get community feedback on ontology specification, should work well.

rbharath · 2018-06-03T19:45:24Z

@peastman Is this one on your TODO list already? Feel free to remove the contribution labels if so.

peastman · 2018-06-03T20:29:01Z

There are still other things higher up on my list, so let's leave it in case someone else gets to it before I do.

peastman · 2018-07-11T20:33:21Z

I'm starting to work on this, so I've removed the labels.

peastman · 2018-07-16T20:11:49Z

I'm looking for datasets to test this on. We don't currently have any genomic datasets, do we? Any suggestions for things we should try?

I'm assuming we don't want to start adding genomic data to molnet. That seems outside its scope, and there are already public repositories with huge amounts of data.

rbharath · 2018-07-19T02:00:17Z

I don't think we have any genomic datasets at present. Could we duplicate some of the results from the original paper?

I think it would make sense to add genomic data to molnet if there's not an easily accessible public repository for the data already. In case there's already a public repository, we could add a utility function that provides easy access to the data.

peastman · 2018-07-19T16:13:09Z

The most important one is probably the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). It has data from around 100,000 experiments, mostly gene expression but also genetic variation, TF binding, methylation, etc. The data isn't in any consistent format though. It's whatever files the experimenters uploaded for each one.

For sequence data there are a lot of public databases, but GenBank (https://www.ncbi.nlm.nih.gov/genbank/) is the really big one. That's going to be harder to use with this model, though, because it isn't organized by gene.

rbharath · 2018-07-20T04:58:32Z

Better GEO and GenBank support would be great. Perhaps we could add download/processing functions that surface the specific datasets we need from these repositories.

rbharath added Contribution Welcome Good Intermediate Contribution labels Jun 3, 2018

peastman removed Contribution Welcome Good Intermediate Contribution labels Jul 11, 2018

peastman mentioned this issue Jul 13, 2018 in OntologyModel #1311 (Merged)
