
Can some clever caching of schemas speed things up? #95

Open
edeutsch opened this issue Aug 15, 2023 · 0 comments

@edeutsch

Not crucial or urgent, and I'm not certain this is a problem, but here are my musings:

  • It seems likely that every time the validator is run (including at import), the TRAPI YAML and the Biolink YAML are fetched and parsed anew
  • Parsing YAML in Python is strikingly slow; parsing the same model from JSON is roughly 100x faster
  • Parsing the Biolink YAML and the TRAPI YAML each takes roughly 0.5 seconds, while parsing the equivalent JSON takes roughly 0.005 seconds
  • Plus, are we also downloading these files from GitHub each time?
  • When someone clicks on a parent PK in the ARAX GUI, ARAX ends up validating a whole batch of documents in parallel, and each process probably pays the download-and-parse cost for the YAMLs, likely over a second apiece

I wonder whether some clever caching could shave at least a second off each validation.
Maybe not huge, but when someone is waiting on the result, avoiding a 1-second cost 10 times over may be a noticeable benefit.

Not trivial, though. Where do you store the caches? How do you make the cached JSON storage safe across threads and concurrent processes?
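For the concurrent-process half of that question, one standard-library approach is to write the converted JSON to a temporary file in the same directory and then `os.replace()` it into place. `os.replace` is atomic on both POSIX and Windows, so a concurrent reader sees either the old cache file or the complete new one, never a half-written file. A minimal sketch (the function name and cache path are hypothetical):

```python
import json
import os
import tempfile


def write_cache_atomically(cache_path: str, parsed: dict) -> None:
    # Write to a temp file in the same directory as the target so the
    # final rename stays on one filesystem (cross-device renames are not
    # atomic), then swap it into place with os.replace().
    cache_dir = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(parsed, f)
        os.replace(tmp_path, cache_path)  # atomic swap into place
    except BaseException:
        # Clean up the partial temp file if anything went wrong
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers then just `json.load()` the cache path; worst case, two processes race to regenerate the same cache and the last writer wins, which is harmless since both wrote equivalent content.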

Maybe a job for ARAX, not for the validator? I don't know.
