
Ingest ~2,000 high-quality models into Terarium #363

Closed
liunelson opened this issue Jan 4, 2024 · 7 comments
liunelson commented Jan 4, 2024

The goal is to pre-populate Terarium with a significant number of "high quality" models from an existing repository such as BioModels.

The models that we want to ingest are the ~2432 models returned by the BioModels search interface with only the filter "model format = SBML". We should use the REST API to download the SBML file of each model.

Each SBML model file (extension .xml or .sbml) should be run through the following script (requires the MIRA package) to convert it from SBML format to PetriNet AMR JSON format:

import glob
import json
import os

import tqdm

from mira.metamodel.ops import simplify_rate_laws
from mira.modeling import Model
from mira.modeling.amr.petrinet import AMRPetriNetModel
from mira.sources.sbml import template_model_from_sbml_file

PATH = "data/biomodels_sbml"
fnames = glob.glob(os.path.join(PATH, "*.*ml"))  # matches both .xml and .sbml

fnames_succ = []
fnames_fail = []
for fname in tqdm.tqdm(fnames):
    try:
        # SBML -> MIRA template model -> simplified rate laws -> PetriNet AMR
        model_tm = template_model_from_sbml_file(fname)
        model_tm = simplify_rate_laws(model_tm)
        model_pn = AMRPetriNetModel(Model(model_tm))
        model_pn_json = model_pn.to_json()

        # Write the AMR JSON next to the source file, swapping the extension
        with open(os.path.splitext(fname)[0] + ".json", "w") as f:
            json.dump(model_pn_json, f, indent=4)

        fnames_succ.append(fname)

    except Exception:
        fnames_fail.append(fname)

print(f"{len(fnames_succ)} successes and {len(fnames_fail)} failures")

I've tested ~200 models and ~60% can be successfully converted into a PetriNet AMR JSON.

We'll need @j2whiting 's help to subsequently populate the "Model Card" associated with each model.

@liunelson

@bigglesandginger @YohannParis
Does the above make sense to you?

@bigglesandginger

@liunelson Have you used the API? When I try curl -XGET https://www.ebi.ac.uk/biomodels/search\?query\=homo+sapiens\&format\=SBML I get a web page, not XML or whatever else one might expect from an API.

@liunelson

You are right about the API. It seems to return the search page itself, as opposed to a nice JSON listing the model IDs.

https://www.ebi.ac.uk/biomodels/search?query=%3A%20AND%20modelformat%3A%22SBML%22&domain=biomodels&offset=0&numResults=10

With the model IDs, you can then use this endpoint to get the SBML file of a model:
https://www.ebi.ac.uk/biomodels/search/download?models=MODEL0913095435

Does this make sense?
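The two endpoints above can be wrapped in small URL builders before wiring up the actual downloads. A minimal sketch; the function names are my own, and the format=json parameter on the search endpoint is an assumption based on the BioModels REST docs that should be verified:

```python
import urllib.parse

SEARCH_URL = "https://www.ebi.ac.uk/biomodels/search"
DOWNLOAD_URL = "https://www.ebi.ac.uk/biomodels/search/download"


def search_url(query: str, offset: int = 0, num_results: int = 100) -> str:
    """Build a BioModels search URL for paging through model IDs.
    format=json is an assumption from the REST docs; check before relying on it."""
    params = {
        "query": query,
        "offset": offset,
        "numResults": num_results,
        "format": "json",
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def download_url(model_id: str) -> str:
    """Build the SBML download URL for one model ID (the endpoint shown above)."""
    return DOWNLOAD_URL + "?" + urllib.parse.urlencode({"models": model_id})
```

Feeding the IDs returned by paged search_url calls into download_url would give one download target per model.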


j2whiting commented Jan 20, 2024

I can write a script to crawl these and pull models & metadata if you'd like. I should have time on Monday/Tues and I think it should only take a couple hours tops.

Just let me know the schema you need for the output.

@YohannParis YohannParis assigned j2whiting and unassigned liunelson Jan 22, 2024

j2whiting commented Jan 23, 2024

This was a little annoying since I had to render the JavaScript instead of just building a crawler with simple requests and HTML parsing, but it is done.

I managed to pull all 2435 model href tags, the URL for the source publication and the download link for the model file.

>>> import json
>>> with open('model_data.json', 'r') as f:
...     data = json.load(f)
...
>>> next(iter(data.items()))
('/biomodels/BIOMD0000000573', {'publication_link': 'http://identifiers.org/pubmed/24997239', 'model_files': ['https://www.ebi.ac.uk/biomodels/services/download/get-files/MODEL1503180001/3/BIOMD0000000573_url.xml', 'https://www.ebi.ac.uk/biomodels/services/download/get-files/MODEL1503180001/3/BIOMD0000000573_urn.xml']})

How do we add these to Terarium?

Update:

URLs, models and reference links can be found in the JSON here: https://drive.google.com/file/d/1Upv84-fWmSqBvTxSzRpJqSEQ3OQ61GTc/view?usp=share_link
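Assuming the JSON keeps the shape shown in the snippet above (href key mapping to publication_link and model_files), a small helper can flatten it into per-model download targets. The helper name and the choice of taking the first listed file are my own:

```python
def model_downloads(data: dict) -> list:
    """Flatten the scraped model_data.json mapping into (model_id, file_url)
    pairs, taking the first listed file for each model."""
    pairs = []
    for href, info in data.items():
        # e.g. "/biomodels/BIOMD0000000573" -> "BIOMD0000000573"
        model_id = href.rsplit("/", 1)[-1]
        files = info.get("model_files", [])
        if files:
            pairs.append((model_id, files[0]))
    return pairs
```

Each (model_id, file_url) pair could then be handed to a plain HTTP GET to fetch the SBML file.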

  • Test adding new models to Terarium on Jan 24th

@YohannParis YohannParis changed the title Ingest a number (10-1000s) of high-quality models into Terarium Ingest a number 245 of high-quality models into Terarium Jan 23, 2024
@j2whiting

@liunelson to convert these to AMR and upload to Terarium

@liunelson

I've converted, ad hoc, ~2k models from SBML to AMR JSON that Julian has scraped from the BioModels repository: see here.

I've also used the Open Access Button API to find the download link of the associated paper PDF:
model_data_oa.json

I was only able to download PDFs from 10.8% of the open-access URLs, but I didn't spend any time figuring out why the other ~60% of open-access URL downloads failed. For some, a GET on the OA URL simply returns 404. However, model BIOMD0000000598, for example, has this link which allowed me to download a PDF in a browser, but I don't know why it returns 403 with a plain GET.
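For reference, the Open Access Button lookups reduce to a single query URL per paper. A minimal sketch; the endpoint path is from my reading of the OA Button docs and should be double-checked, and the browser-like header is only a guess at why some servers return 403 to scripted GETs, not a confirmed fix:

```python
import urllib.parse

OAB_FIND = "https://api.openaccessbutton.org/find"


def oab_find_url(identifier: str) -> str:
    """Build an Open Access Button 'find' query for a DOI or article URL.
    Endpoint path is per my reading of the OA Button docs; verify before use."""
    return OAB_FIND + "?" + urllib.parse.urlencode({"id": identifier})


# Some publishers reject requests without a browser-like User-Agent, which may
# explain a share of the 403s; this header is an assumption, not a verified fix.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
```

Retrying the failing OA URLs with BROWSER_HEADERS attached would be a cheap first experiment before digging deeper into the 403s.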

Charles, can you have Terarium ingest all these models with their paper (if available)?

Number of models:        2435
  with SBML:             99.8%
  converted to AMR:      55.6%
  with PDF link:         99.4%
  with OA PDF link:      72.3%
  with downloaded PDF:   10.8%

@YohannParis YohannParis changed the title Ingest a number 245 of high-quality models into Terarium Ingest a number 2245 of high-quality models into Terarium Jan 29, 2024
@liunelson liunelson changed the title Ingest a number 2245 of high-quality models into Terarium Ingest ~2,000 high-quality models into Terarium Jan 29, 2024