This notebook downloads all models uploaded to the Cell Collective and saves them locally as SBML and booleannet files. It also collects the molecular species of all models in a table (a pandas DataFrame), together with the associated models, links, and database annotations.

In [1]:
import cellcollective #https://github.com/colomoto/colomoto-jupyter
import biolqm #https://github.com/GINsim/GINsim-python
import requests
import json
from urllib.request import urlretrieve
import glob
import pandas as pd

In [2]:
#a simple function creating a permanent local file from the retrieved model file
def download_local(url, path, model_id, suffix='sbml'):
    filename = path+str(model_id)+'.'+suffix
    filename, _ = urlretrieve(url, filename=filename)
    return filename

Output folders

In [3]:
sbmls_path='sbmls/'
boolean_models_path='boolean_models/'

#create the output folders if they don't exist
import os
os.makedirs(sbmls_path, exist_ok=True)
os.makedirs(boolean_models_path, exist_ok=True)


Getting the model ids from the Cell Collective API

In [4]:
headers = {
  "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

url = "https://research.cellcollective.org/api/model"

r = requests.get(url, headers=headers)
data = r.json()
model_name_dict={}
for entry in data['data']:
    #only entries that carry a 'model' record have an id and a name
    if 'model' in entry:
        model_name_dict[entry['model']['id']]=entry['model']['name']

In [5]:
#print(model_name_dict)
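
As an offline illustration of the id-to-name mapping built above, here is a minimal sketch that runs on a small hand-written payload. The payload and the helper name `extract_model_names` are assumptions for illustration only: the payload mirrors just the two fields the notebook reads (`id` and `name` under `model`), and real API responses contain many more keys.

```python
# Hypothetical payload mirroring only the fields used in this notebook.
sample_response = {
    "data": [
        {"model": {"id": 2309, "name": "EGFR & ErbB Signaling"}},
        {"model": {"id": 5128, "name": "Lac Operon"}},
        {"other": "an entry without a 'model' key is skipped"},
    ]
}

def extract_model_names(payload):
    """Return {model id: model name} for every entry that has a 'model' key."""
    return {
        entry["model"]["id"]: entry["model"]["name"]
        for entry in payload.get("data", [])
        if "model" in entry
    }

names = extract_model_names(sample_response)
print(names)
```

Using `payload.get("data", [])` means an unexpected response without a `data` key yields an empty dictionary instead of a `KeyError`.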

This is a manually curated list of model ids that are skipped during downloading and in all subsequent analysis.

In [6]:
exception_list=[126843,3511,118235,15088,36604]

The download script checks the sbml folder for models that were already downloaded and skips them (it still downloads newly uploaded models). Set this to True to ignore the contents of the sbml folder and download every model again.

In [7]:
download_all_again=False
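
The skip logic described above can be sketched as a small standalone function. This is only an illustration under stated assumptions: the name `ids_to_download` and its parameters are hypothetical, and the function covers only the decision of which ids need a fresh download (the notebook itself still reads species from models that are already on disk).

```python
import os

def ids_to_download(model_ids, existing_files, exception_list, download_all_again=False):
    """Return the model ids that still need to be downloaded.

    existing_files: paths such as 'sbmls/2309.sbml' already present locally.
    exception_list: ids that are never downloaded.
    """
    skipped = set(exception_list)
    already = set()
    if not download_all_again:
        for path in existing_files:
            # '2309.sbml' -> '2309'; files with non-numeric names are ignored
            stem = os.path.splitext(os.path.basename(path))[0]
            if stem.isdigit():
                already.add(int(stem))
    return [m for m in model_ids if m not in skipped and m not in already]

pending = ids_to_download(
    model_ids=[2309, 5128, 126843],
    existing_files=["sbmls/2309.sbml"],
    exception_list=[126843],
)
print(pending)  # [5128]
```

With `download_all_again=True` the same call returns `[2309, 5128]`: the local folder is ignored, but the exception list still applies.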

In [8]:

species_dict={}
df=pd.DataFrame()

for model_id in model_name_dict:
    print('Checking',model_id)
    if model_id in exception_list:
        print('Model id is in exception list')
        continue
    downloaded_model_paths=glob.glob(sbmls_path+'*.sbml')
    if not download_all_again:
        #os.path.basename handles the path separator of the current platform
        downloaded_models=[int(os.path.basename(i).split('.')[0]) for i in downloaded_model_paths]
    else:
        downloaded_models=[]

    if model_id not in downloaded_models:
        url='https://research.cellcollective.org/api/model/%d/export/version/1?type=SBML'%model_id
        try:
            sbml = cellcollective.load(url)
        except Exception as e:
            print(model_id, str(e))
            continue
        model_name=sbml.dom.getElementsByTagName('model')[0].getAttribute('name')
        print(model_name)

        #The file is downloaded again locally because the colomoto biolqm.load does not work with the temporary download initiated by the cellcollective script.
        filename = download_local(url,sbmls_path,model_id)
        sbml.localfile=filename
        #save to boolean net
        lqm = cellcollective.to_biolqm(sbml)
        biolqm.save(lqm, "%s%d_%s.booleannet"%(boolean_models_path,model_id,model_name.replace(' ','_')), "booleannet")

    sbml = cellcollective.load(sbmls_path+str(model_id)+'.sbml')
    for s in sbml.species:
        if sbml.species_uniprotkb(s) is not None:
            uniprot=sbml.species_uniprotkb(s).data
        else:
            uniprot=None
            
        row={'species':s.strip(),
             'model_id':model_id,
             'model_name':model_name_dict[model_id],
             'uniprot_info':uniprot,
             'ncbi_gene_info':sbml.species_ncbi_gene(s),
             'link_to_model':'https://research.cellcollective.org/?dashboard=true#module/%d:1/'%model_id}
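        #DataFrame.append was removed in pandas 2.0, so the row is concatenated instead
        df=pd.concat([df,pd.DataFrame([row])],ignore_index=True)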
        if s in species_dict:
            species_dict[s].append(model_id)
        else:
            species_dict[s]=[model_id]
    

Checking 2309


Downloading https://research.cellcollective.org/api/model/2309/export/version/1?type=SBML

EGFR & ErbB Signaling
Checking 5128


Downloading https://research.cellcollective.org/api/model/5128/export/version/1?type=SBML

Lac Operon
Checking 10248


Downloading https://research.cellcollective.org/api/model/10248/export/version/1?type=SBML

Bacteriophages in Cheese Production - Single Vat 2 (Inv 4)
Checking 141066


Downloading https://research.cellcollective.org/api/model/141066/export/version/1?type=SBML

141066 HTTP Error 500: Internal Server Error
Checking 2314


Downloading https://research.cellcollective.org/api/model/2314/export/version/1?type=SBML

IL-6 Signalling
Checking 16659


Downloading https://research.cellcollective.org/api/model/16659/export/version/1?type=SBML

Modeling Light Reactions and Dark Reactions in Photosynthesis
Checking 1557


Downloading https://research.cellcollective.org/api/model/1557/export/version/1?type=SBML

Signal Transduction in Fibroblasts
Checking 6678


Downloading https://research.cellcollective.org/api/model/6678/export/version/1?type=SBML

CD4+ T cell Differentiation
Checking 2329


Downloading https://research.cellcollective.org/api/model/2329/export/version/1?type=SBML

Apoptosis Network
Checking 17433


Downloading https://research.cellcollective.org/api/model/17433/export/version/1?type=SBML

17433 HTTP Error 500: Internal Server Error
Checking 8227


Downloading https://research.cellcollective.org/api/model/8227/export/version/1?type=SBML

T-LGL Survival Network 2011 Reduced Network
Checking 2084


Downloading https://research.cellcollective.org/api/model/2084/export/version/1?type=SBML

Death Receptor Signaling
Checking 2341


Downloading https://research.cellcollective.org/api/model/2341/export/version/1?type=SBML

Body Segmentation in Drosophila 2013
Checking 153639


Downloading https://research.cellcollective.org/api/model/153639/export/version/1?type=SBML

Computational Modeling Lesson Structure
Checking 36647


Downloading https://research.cellcollective.org/api/model/36647/export/version/1?type=SBML

Cell Cycle Regulation - Investigation 1
Checking 121641


Downloading https://research.cellcollective.org/api/model/121641/export/version/1?type=SBML

Introduction to Food Web Dynamics_Incubator20
Checking 36652


Downloading https://research.cellcollective.org/api/model/36652/export/version/1?type=SBML

Cell Cycle Tumorigenesis - Investigation 2
Checking 1582


Downloading https://research.cellcollective.org/api/model/1582/export/version/1?type=SBML

Signaling in Macrophage Activation
Checking 7984


Downloading https://research.cellcollective.org/api/model/7984/export/version/1?type=SBML

MAPK Cancer Cell Fate Network
Checking 29742


Downloading https://research.cellcollective.org/api/model/29742/export/version/1?type=SBML

29742 HTTP Error 404: Not Found
Checking 17416


Downloading https://research.cellcollective.org/api/model/17416/export/version/1?type=SBML

17416 HTTP Error 404: Not Found
Checking 121654


Downloading https://research.cellcollective.org/api/model/121654/export/version/1?type=SBML

Introduction to Food Web Dynamics - 2020 Summer Science AC
Checking 153919


Downloading https://research.cellcollective.org/api/model/153919/export/version/1?type=SBML

153919 HTTP Error 500: Internal Server Error
Checking 4932


Downloading https://research.cellcollective.org/api/model/4932/export/version/1?type=SBML

Stomatal Opening Model
Checking 1607


Downloading https://research.cellcollective.org/api/model/1607/export/version/1?type=SBML

Mammalian Cell Cycle
Checking 4942


Downloading https://research.cellcollective.org/api/model/4942/export/version/1?type=SBML

Pro-inflammatory Tumor Microenvironment in Acute Lymphoblastic Leukemia
Checking 55633


Downloading https://research.cellcollective.org/api/model/55633/export/version/1?type=SBML

Cell Collective Training Module: Factors Influencing Exam Scores
Checking 126290


Downloading https://research.cellcollective.org/api/model/126290/export/version/1?type=SBML

126290 HTTP Error 500: Internal Server Error
Checking 2901


Downloading https://research.cellcollective.org/api/model/2901/export/version/1?type=SBML

T cell differentiation
Checking 2135


Downloading https://research.cellcollective.org/api/model/2135/export/version/1?type=SBML

Yeast Apoptosis
Checking 11863


Downloading https://research.cellcollective.org/api/model/11863/export/version/1?type=SBML

Senescence Associated Secretory Phenotype
Checking 2136


Downloading https://research.cellcollective.org/api/model/2136/export/version/1?type=SBML

Cardiac development
Checking 2394


Downloading https://research.cellcollective.org/api/model/2394/export/version/1?type=SBML

B cell differentiation
Checking 2396


Downloading https://research.cellcollective.org/api/model/2396/export/version/1?type=SBML

Mammalian Cell Cycle 2006
Checking 4705


Downloading https://research.cellcollective.org/api/model/4705/export/version/1?type=SBML

Septation Initiation Network
Checking 4706


Downloading https://research.cellcollective.org/api/model/4706/export/version/1?type=SBML

Predicting Variabilities in Cardiac Gene
Checking 5731


Downloading https://research.cellcollective.org/api/model/5731/export/version/1?type=SBML

Metabolic Interactions in the Gut Microbiome
Checking 2404


Downloading https://research.cellcollective.org/api/model/2404/export/version/1?type=SBML

Budding Yeast Cell Cycle
Checking 2407


Downloading https://research.cellcollective.org/api/model/2407/export/version/1?type=SBML

T-LGL Survival Network 2011
Checking 2663


Downloading https://research.cellcollective.org/api/model/2663/export/version/1?type=SBML

Wg Pathway of Drosophila Signalling Pathways
Checking 121704


Downloading https://research.cellcollective.org/api/model/121704/export/version/1?type=SBML

Introduction to Food Web Dynamics_Incubator20_Ready_Published
Checking 2667


Downloading https://research.cellcollective.org/api/model/2667/export/version/1?type=SBML

VEGF Pathway of Drosophila Signaling Pathway
Checking 2668


Downloading https://research.cellcollective.org/api/model/2668/export/version/1?type=SBML

Toll Pathway of Drosophila Signaling Pathway
Checking 2669


Downloading https://research.cellcollective.org/api/model/2669/export/version/1?type=SBML

Processing of Spz Network from the Drosophila Signaling Pathway
Checking 8558


Downloading https://research.cellcollective.org/api/model/8558/export/version/1?type=SBML

8558 HTTP Error 500: Internal Server Error
Checking 8048


Downloading https://research.cellcollective.org/api/model/8048/export/version/1?type=SBML

Treatment of Castration-Resistant Prostate Cancer
Checking 2161


Downloading https://research.cellcollective.org/api/model/2161/export/version/1?type=SBML

Guard Cell Abscisic Acid Signaling
Checking 2423


Downloading https://research.cellcollective.org/api/model/2423/export/version/1?type=SBML

Budding Yeast Cell Cycle 2009
Checking 2681


Downloading https://research.cellcollective.org/api/model/2681/export/version/1?type=SBML

Cell Cycle Transcription by Coupled CDK and Network Oscillators
Checking 126843
Model id is in exception list
Checking 2171


Downloading https://research.cellcollective.org/api/model/2171/export/version/1?type=SBML

T Cell Receptor Signaling
Checking 2172


Downloading https://research.cellcollective.org/api/model/2172/export/version/1?type=SBML

Cholesterol Regulatory Pathway
Checking 29564


Downloading https://research.cellcollective.org/api/model/29564/export/version/1?type=SBML

Regulation of Cellular Respiration: ETC and Fermentation


Py4JJavaError: An error occurred while calling o154.call.
: java.lang.RuntimeException: Invalid ID: 2ATP
	at org.colomoto.biolqm.NodeInfo.setNodeID(NodeInfo.java:49)
	at org.colomoto.biolqm.modifier.sanitize.SanitizeModifier.ensureUniqueID(SanitizeModifier.java:103)
	at org.colomoto.biolqm.modifier.sanitize.SanitizeModifier.sanitizeIDs(SanitizeModifier.java:79)
	at org.colomoto.biolqm.modifier.sanitize.SanitizeModifier.performTask(SanitizeModifier.java:32)
	at org.colomoto.biolqm.modifier.sanitize.SanitizeModifier.performTask(SanitizeModifier.java:15)
	at org.colomoto.common.task.AbstractTask.call(AbstractTask.java:48)
	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:834)
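
The run above stopped at model 29564 because the error raised during the bioLQM conversion (Invalid ID: 2ATP) was not caught; only the loading step is wrapped in a try/except. One possible way to keep the loop going is to record such failures and move on. The sketch below is an assumption, not part of the original notebook: `run_all` and `fake_process` are hypothetical names, and `fake_process` is a stand-in for the download and conversion step so that the example runs without network access or Java.

```python
def run_all(model_ids, process_model):
    """Call process_model(model_id) for each id and collect failures instead of stopping."""
    succeeded, failed = [], {}
    for model_id in model_ids:
        try:
            process_model(model_id)
        except Exception as exc:  # broad on purpose: any per-model error is recorded
            failed[model_id] = str(exc)
            continue
        succeeded.append(model_id)
    return succeeded, failed

def fake_process(model_id):
    # Stand-in for the real download + conversion; mimics the failure seen above.
    if model_id == 29564:
        raise RuntimeError("Invalid ID: 2ATP")

ok, failed = run_all([2172, 29564, 2171], fake_process)
print(ok)      # [2172, 2171]
print(failed)  # {29564: 'Invalid ID: 2ATP'}
```

Ids that end up in `failed` could then be reviewed by hand and, if appropriate, added to the exception list.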


In [None]:
#a dictionary mapping molecular species to model_ids of models containing them
print(species_dict)
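
A dictionary of this shape makes it easy to find species that occur in more than one model. The following is a minimal sketch on made-up data: the species names are placeholders rather than names taken from the actual models, and `shared_species` is a hypothetical helper.

```python
def shared_species(species_to_models, min_models=2):
    """Keep only species that appear in at least min_models distinct models."""
    return {
        species: sorted(set(ids))
        for species, ids in species_to_models.items()
        if len(set(ids)) >= min_models
    }

# Placeholder species names mapped to model ids.
example = {
    "species_a": [2309, 7984],
    "species_b": [5128],
    "species_c": [7984, 2329, 7984],  # duplicate ids count once
}

shared = shared_species(example)
print(shared)  # {'species_a': [2309, 7984], 'species_c': [2329, 7984]}
```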

There exist several Python interfaces to programmatically query information from these databases:
- using NCBI Gene ID: https://github.com/biocommons/eutils
- using UniProt ID: https://github.com/jdrudolph/uniprot

In [None]:
df = df.reindex(['species','model_id','model_name','link_to_model','uniprot_info','ncbi_gene_info'], axis=1)

In [None]:
df = df.sort_values('species')
df = df.reset_index(drop=True)
df.to_excel('cell_collective_species_data.xlsx')

In [None]:
df