Avert cross species contamination in VEP cache dump #1575

Merged
merged 1 commit into from Jan 16, 2024

Conversation


@nakib103 nakib103 commented Dec 12, 2023

ENSVAR-6087

Problem

In the VEP cache we generally include pubmed and var_synonyms data from the database. For the past couple of releases, this data has been intermittently missing. Note that these two are the only fields that are queried from the database, dumped into files, and then read back from those dump files when creating the cache.

Cause

After adding some debug output (the contents of the dump files read in each job), the cause of the missing data was found. We generally keep the data loaded from the dump files in memory for a species between jobs. But it seems that these objects are somehow persisting across jobs for different species, so one species ends up with data from another species.
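The failure mode can be sketched as follows. This is a minimal illustration in Python, not the actual VEP code: the loader, the file names, and the JSON dump format are all hypothetical, and the only assumption carried over from the description is that a loaded object outlives the job that created it.

```python
import json
import os
import tempfile

# Hypothetical stand-in for state held by a long-lived worker process.
# The cached object is deliberately NOT keyed by species.
_loaded = None


def load_synonyms_cached(dump_path):
    """Return the parsed dump, reusing whatever was loaded first."""
    global _loaded
    if _loaded is None:
        with open(dump_path) as fh:
            _loaded = json.load(fh)
    return _loaded


def write_dump(directory, species, data):
    """Write a per-species dump file and return its path."""
    path = os.path.join(directory, f"{species}_var_synonyms.json")
    with open(path, "w") as fh:
        json.dump(data, fh)
    return path


with tempfile.TemporaryDirectory() as tmp:
    human = write_dump(tmp, "homo_sapiens", {"rs1": ["syn_h"]})
    mouse = write_dump(tmp, "mus_musculus", {"rs9": ["syn_m"]})

    first = load_synonyms_cached(human)
    # The second job asks for mouse data but silently gets the human object.
    second = load_synonyms_cached(mouse)
```

In this sketch the second job never reads its own dump file, so its lookups miss, which would look like erratically missing data depending on which species a worker happened to process first.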

Solution

Simply do not keep the objects in memory; load the data from the dump files each time instead.
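A minimal sketch of this approach, again in Python with hypothetical function names, file names, and a JSON dump format (none of these come from the VEP code base): every call parses the dump file for the requested species and nothing is retained between calls.

```python
import json
import os
import tempfile


def load_dump(dump_path):
    """Parse the dump file on every call; nothing is kept between calls."""
    with open(dump_path) as fh:
        return json.load(fh)


def run_job(dump_dir, species):
    """Hypothetical job entry point: always loads this species' own dump."""
    path = os.path.join(dump_dir, f"{species}_pubmed.json")
    return load_dump(path)


dumps = {
    "homo_sapiens": {"rs1": [111]},
    "mus_musculus": {"rs9": [999]},
}

with tempfile.TemporaryDirectory() as tmp:
    for species, data in dumps.items():
        with open(os.path.join(tmp, f"{species}_pubmed.json"), "w") as fh:
            json.dump(data, fh)

    # Jobs for different species run back to back in the same process.
    results = {species: run_job(tmp, species) for species in dumps}
```

The trade-off is an extra file read per job in exchange for removing any shared state that could leak between species.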

Test

Tested with the fix and compared the results across different species. The results can be viewed in the JIRA ticket above.

@nakib103 nakib103 marked this pull request as draft December 12, 2023 14:42
@nakib103 nakib103 marked this pull request as ready for review December 13, 2023 15:32
@jamie-m-a jamie-m-a self-requested a review December 14, 2023 12:41
@jamie-m-a jamie-m-a self-assigned this Dec 14, 2023

@jamie-m-a jamie-m-a left a comment

Agree with the solution: it removes the option to use an existing object and forces creation of a new object each time these subs are called.

@jamie-m-a jamie-m-a merged commit c358c53 into Ensembl:postreleasefix/112 Jan 16, 2024
1 check passed