
Can some clever caching of schemas speed things up? #95

Open
edeutsch opened this issue Aug 15, 2023 · 0 comments

@edeutsch

Not crucial or urgent, and I'm not certain this is a problem, but here are my musings:

  • It seems likely that every time the validator is run (including at import), the TRAPI YAML and the Biolink YAML are fetched and parsed anew
  • Parsing YAML in Python is strikingly slow; parsing the same model from JSON is roughly 100x faster
  • Parsing the Biolink YAML and the TRAPI YAML each takes roughly 0.5 seconds, while parsing the equivalent JSON takes roughly 0.005 seconds
  • Plus, are we also downloading these files from GitHub each time?
  • When someone clicks on a parent PK in the ARAX GUI, ARAX ends up validating a whole batch of documents in parallel, and each process probably pays the download-and-parse cost for the YAMLs, likely over a second apiece

I wonder whether some clever caching could shave at least a second off each validation.
Maybe not huge, but when someone is waiting on the result, avoiding a 1-second cost 10 times over may be a noticeable benefit.

Not trivial, though. Where do you store the caches? How do you make the cached JSON storage safe across threads and concurrent processes?
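For the concurrent-process half of that question, one standard-library approach is to write the converted JSON to a temporary file in the same directory and then `os.replace()` it into place. `os.replace` is atomic on both POSIX and Windows, so a concurrent reader sees either the old cache file or the complete new one, never a half-written file. A minimal sketch (the function name and cache path are hypothetical):

```python
import json
import os
import tempfile


def write_cache_atomically(cache_path: str, parsed: dict) -> None:
    # Write to a temp file in the same directory as the target so the
    # final rename stays on one filesystem (cross-device renames are not
    # atomic), then swap it into place with os.replace().
    cache_dir = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(parsed, f)
        os.replace(tmp_path, cache_path)  # atomic swap into place
    except BaseException:
        # Clean up the partial temp file if anything went wrong
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers then just `json.load()` the cache path; worst case, two processes race to regenerate the same cache and the last writer wins, which is harmless since both wrote equivalent content.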

Maybe a job for ARAX, not for the validator? I don't know.
