Backport changes to the xref-parser test framework from ensembl-xref #405

mkszuba · 2019-08-08T14:39:24Z

Description

Backport changes to the xref unit-test framework from ensembl-xref to the main repository.

Use case

See ENSCORESW-3234. Although the first iteration of the test framework developed by @nerdstrike for the purpose of actually letting us unit-test xref parsers was committed to ensembl, it has since been modified in ensembl-xref.

Benefits

We will continue to be able to run xref-parser unit tests.

Possible Drawbacks

We will use even more Travis CPU time.

Testing

Have you added/modified unit tests to test the changes?

Yes, the two unit-test files already present in misc-scripts/xref_mapping have been updated accordingly.

If so, do the tests pass/fail?

They pass.

Have you run the entire test suite and no regression was detected?

Yes, everything passes. For the record, at least on my system running the main test suite automatically picks up tests from the xref directory.

The xref schema does not play nicely with DBIC so we use manually corrected modules instead of generating them on the fly.

Works fine on MySQL 5.6 (i.e. mysql-ens-core-prod-1 as of a moment ago). On MariaDB 10.2 deploying the test-DB schema fails with Specified key was too long; max key length is 1000 bytes [for Statement "CREATE TABLE xref ... which is likely related to the use of Unicode in strings and will be debugged later.

Just the bare minimum of changes required for all tests to pass: - source.status is deprecated and has already been removed from the DBIC schema (even though it remains in SQL schema files, for backward compatibility); - source.priority_description must not be longer than 40 bytes; - search for the data file with respect to the location of the test script, not the current directory.

We haven't got any test databases yet but since some of the parsers require them to be present, let's get ready for them.

That way the test suite works without adding both .../misc-scripts/xref_mapping and .../misc-scripts/xref_mapping/t to PERL5LIB.

coveralls · 2019-08-08T14:57:53Z

Coverage remained the same at 81.543% when pulling 842e39b on backport/test_framework into fa052f4 on feature/xref_sprint.

nerdstrike

I see that linting won over being able to regenerate the schema from a modified DB. That's fine I, but it might increase future "fun".

Are you certain you want this in the core repo? I had expected the xref pipeline to live apart from the core ORM, since it does not use core much at all. The only point of contact is in the upload at the end, and I don't think it uses the API for that. It also adds additional testing dependencies to the most basic component of Ensembl. In the past that would have been a no-no.

mkszuba · 2019-08-12T10:40:34Z

I see that linting won over being able to regenerate the schema from a modified DB. That's fine I, but it might increase future "fun".

Could we actually regenerate the schema? IMHO it would be good to be able to do so but I vaguely remember that you had some difficulties making that work due to, ahem, certain peculiarities of the xref schema.

Are you certain you want this in the core repo? I had expected the xref pipeline to live apart from the core ORM, since it does not use core much at all.

We do want to ultimately move it out, yes. That said, I think it will be safer to do this in stages: achieve working state in the core repo, archive ensembl-xref, then create a new repository for this code and update the pipeline in ensembl-production accordingly.

Defaults to 0 because as of early August 2019, the xref schema is not compatible with the default Unicode encoding of MySQL and friends (utf8mb3) - there is an index in the table 'xref' which exceeds the maximum MyISAM key length of 1000 bytes in the event of string characters being longer than 2 bytes each. One could in principle work around this by using ucs2 encoding, then again seeing as the xref schema does not officially support Unicode anyway (the file table.sql explicitly sets collation to Latin1 on all tables), why bother.

Turns out merely setting mysql_enable_utf8 to 0 does not help if the database server defaults to utf8, one needs to adjust the default collation for the database. In theory we could alternatively adjust the character set, that said it is somewhat safer to adjust collation because that way we can make sure to use latin1_swedish_ci (which is the collation declared for all tables in the xref schema file) instead of relying on latin1_swedish_ci being the default Latin1 collation. Note: EBI MySQL servers (MySQL 5.6 as of yesterday) seem to force the character set and collation to latin1 and latin1_swedish_ci, respectively, regardless of whether mysql_enable_utf8 is enabled or not, - but let us not unnecessarily confuse the Perl client. Resolves ENSCORESW-3236.

tgrego

Having the testing framework in the xref_sprint branch is very welcome, even if the DBIxClass stuff can't be generated... that was unfortunately expected unless a redesign of the xref database happens.
I'm ok with this, and now to bring the parsers back to shape.

Marek Szuba added 9 commits August 8, 2019 13:32

xref test suite: rename test_data to test-data

fb1c1c5

xref test suite: rename rgd_parse.t to rgd.t

9215b64

xref test suite: remove dbicconf.conf

814f116

The xref schema does not play nicely with DBIC so we use manually corrected modules instead of generating them on the fly.

xref test suite: cosmetic/whitespace changes to testdb.conf.example

2c326d0

xref test suite: backport schema modules

3e37ceb

xref test suite: add MultiTestDB.conf.default

560458e

We haven't got any test databases yet but since some of the parsers require them to be present, let's get ready for them.

xref test suite: move the test-DB schema one directory up

842e39b

That way the test suite works without adding both .../misc-scripts/xref_mapping and .../misc-scripts/xref_mapping/t to PERL5LIB.

mkszuba changed the title ~~Backport/test framework~~ Backport changes to the xref-parser test framework from ensembl-xref Aug 8, 2019

mkszuba requested review from nerdstrike, s-mm and tgrego August 9, 2019 12:34

nerdstrike reviewed Aug 12, 2019

View reviewed changes

Marek Szuba added 2 commits August 12, 2019 14:01

tgrego approved these changes Aug 21, 2019

View reviewed changes

s-mm approved these changes Aug 27, 2019

View reviewed changes

mkszuba merged commit 948a067 into feature/xref_sprint Aug 27, 2019

mkszuba deleted the backport/test_framework branch August 27, 2019 13:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Backport changes to the xref-parser test framework from ensembl-xref #405

Backport changes to the xref-parser test framework from ensembl-xref #405

mkszuba commented Aug 8, 2019

coveralls commented Aug 8, 2019

nerdstrike left a comment

mkszuba commented Aug 12, 2019 •

edited

Loading

tgrego left a comment

Backport changes to the xref-parser test framework from ensembl-xref #405

Backport changes to the xref-parser test framework from ensembl-xref #405

Conversation

mkszuba commented Aug 8, 2019

Description

Use case

Benefits

Possible Drawbacks

Testing

coveralls commented Aug 8, 2019

nerdstrike left a comment

Choose a reason for hiding this comment

mkszuba commented Aug 12, 2019 • edited Loading

tgrego left a comment

Choose a reason for hiding this comment

mkszuba commented Aug 12, 2019 •

edited

Loading