ENSCORESW-3147: correctly capture all required fields from file #384

magaliruffier · 2019-05-14T09:49:57Z

Requirements

Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
Review the contributing guidelines for this repository; remember in particular:
- do not modify code without testing for regression
- provide simple unit tests to test the changes
- if you change the schema you must patch the test databases as well, see Updating the schema
- the PR must not fail unit testing

Description

Using one or more sentences, describe in detail the proposed changes.
The RFAM parser does not capture all required fields when run as part of the eHive xref pipeline. This is because the file is read line by line from disk, while the information for a single entry is split across multiple lines.
The proposed change groups the lines in logical blocks representing a single RFAM entry.

Use case

Describe the problem. Please provide an example representing the motivation behind the need for having these changes in place.
Since the move to the eHive-based xref pipeline, all RFAM xrefs are missing descriptions and their label is identical to the accession when it should not be.
The proposed fix ensures both label and description are stored in the database and can be displayed on the browser.
This was reported by a user who compared the same gene across two zebrafish assemblies, see
http://apr2018.archive.ensembl.org/Danio_rerio/Transcript/Summary?db=core;g=ENSDARG00000082665;r=25:8247670-8247787;t=ENSDART00000116947
versus
http://dec2017.archive.ensembl.org/Danio_rerio/Transcript/Summary?db=core;g=ENSDARG00000082665;r=25:8247670-8247787;t=ENSDART00000116947

Benefits

If applicable, describe the advantages the changes will have.
Useful information from RFAM is correctly stored and displayed to our users.

Possible Drawbacks

If applicable, describe any possible undesirable consequence of the changes.
The proposed fix is not the prettiest and results in storing multiple lines in memory. However, the RFAM file is small so this should not have a major impact on memory requirements.

Testing

Have you added/modified unit tests to test the changes?
NA, the current xref pipeline does not have unit tests.
The pipeline was run with and without the fix though to ensure the data is captured correctly.

If so, do the tests pass/fail?
NA

Have you run the entire test suite and no regression was detected?
NA

magaliruffier · 2019-05-14T10:16:40Z

Travis failure is on sliceVariation.t and strainSlice.t, both seem to be related to changes in the variation schema
This will require patching the test databases but should not be related to changes in this PR

coveralls · 2019-05-14T10:25:34Z

Coverage remained the same at 81.455% when pulling 7080718 on bugfix/rfam_parser into b687ee4 on master.

mkszuba

Could you possibly rebase this against master? It doesn't look like there will be any conflicts involved and it should make Travis builds start passing.

As for the code itself, I reckon this will do until we have brought the xref-sprint RFAMParser into master.

magaliruffier · 2019-05-14T11:12:06Z

You're right, sorry about that. I branched off master but was not on the latest version.
It looks like there might be some mismatching commits (did we remove some commits from master), so I will create a new branch

ENSCORESW-3147: correctly capture all required fields from file

7080718

mkszuba self-requested a review May 14, 2019 11:02

mkszuba suggested changes May 14, 2019

View reviewed changes

magaliruffier closed this May 14, 2019

magaliruffier deleted the bugfix/rfam_parser branch June 10, 2020 11:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENSCORESW-3147: correctly capture all required fields from file #384

ENSCORESW-3147: correctly capture all required fields from file #384

magaliruffier commented May 14, 2019

magaliruffier commented May 14, 2019

coveralls commented May 14, 2019

mkszuba left a comment

magaliruffier commented May 14, 2019

ENSCORESW-3147: correctly capture all required fields from file #384

ENSCORESW-3147: correctly capture all required fields from file #384

Conversation

magaliruffier commented May 14, 2019

Requirements

Description

Use case

Benefits

Possible Drawbacks

Testing

magaliruffier commented May 14, 2019

coveralls commented May 14, 2019

mkszuba left a comment

Choose a reason for hiding this comment

magaliruffier commented May 14, 2019