Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport changes to the xref-parser test framework from ensembl-xref #405

Merged
merged 11 commits into from
Aug 27, 2019

Conversation

mkszuba
Copy link
Contributor

@mkszuba mkszuba commented Aug 8, 2019

Description

Backport changes to the xref unit-test framework from ensembl-xref to the main repository.

Use case

See ENSCORESW-3234. Although the first iteration of the test framework developed by @nerdstrike for the purpose of actually letting us unit-test xref parsers was committed to ensembl, it has since been modified in ensembl-xref.

Benefits

We will continue to be able to run xref-parser unit tests.

Possible Drawbacks

We will use even more Travis CPU time.

Testing

Have you added/modified unit tests to test the changes?

Yes, the two unit-test files already present in misc-scripts/xref_mapping have been updated accordingly.

If so, do the tests pass/fail?

They pass.

Have you run the entire test suite and no regression was detected?

Yes, everything passes. For the record, at least on my system running the main test suite automatically picks up tests from the xref directory.

Marek Szuba added 9 commits August 8, 2019 13:32
The xref schema does not play nicely with DBIC so we use manually
corrected modules instead of generating them on the fly.
Works fine on MySQL 5.6 (i.e. mysql-ens-core-prod-1 as of a moment ago).
On MariaDB 10.2 deploying the test-DB schema fails with

Specified key was too long; max key length is 1000 bytes [for Statement "CREATE TABLE xref ...

which is likely related to the use of Unicode in strings and will be
debugged later.
Just the bare minimum of changes required for all tests to pass:
 - source.status is deprecated and has already been removed from the
   DBIC schema (even though it remains in SQL schema files, for backward
   compatibility);
 - source.priority_description must not be longer than 40 bytes;
 - search for the data file with respect to the location of the test
   script, not the current directory.
We haven't got any test databases yet but since some of the parsers
require them to be present, let's get ready for them.
That way the test suite works without adding both
.../misc-scripts/xref_mapping and .../misc-scripts/xref_mapping/t to
PERL5LIB.
@mkszuba mkszuba changed the title Backport/test framework Backport changes to the xref-parser test framework from ensembl-xref Aug 8, 2019
@coveralls
Copy link

Coverage Status

Coverage remained the same at 81.543% when pulling 842e39b on backport/test_framework into fa052f4 on feature/xref_sprint.

Copy link
Contributor

@nerdstrike nerdstrike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that linting won over being able to regenerate the schema from a modified DB. That's fine I, but it might increase future "fun".

Are you certain you want this in the core repo? I had expected the xref pipeline to live apart from the core ORM, since it does not use core much at all. The only point of contact is in the upload at the end, and I don't think it uses the API for that. It also adds additional testing dependencies to the most basic component of Ensembl. In the past that would have been a no-no.

@mkszuba
Copy link
Contributor Author

mkszuba commented Aug 12, 2019

I see that linting won over being able to regenerate the schema from a modified DB. That's fine I, but it might increase future "fun".

Could we actually regenerate the schema? IMHO it would be good to be able to do so but I vaguely remember that you had some difficulties making that work due to, ahem, certain peculiarities of the xref schema.

Are you certain you want this in the core repo? I had expected the xref pipeline to live apart from the core ORM, since it does not use core much at all.

We do want to ultimately move it out, yes. That said, I think it will be safer to do this in stages: achieve working state in the core repo, archive ensembl-xref, then create a new repository for this code and update the pipeline in ensembl-production accordingly.

Marek Szuba added 2 commits August 12, 2019 14:01
Defaults to 0 because as of early August 2019, the xref schema is not
compatible with the default Unicode encoding of MySQL and friends
(utf8mb3) - there is an index in the table 'xref' which exceeds the maximum
MyISAM key length of 1000 bytes in the event of string characters being
longer than 2 bytes each. One could in principle work around this by
using ucs2 encoding, then again seeing as the xref schema does not
officially support Unicode anyway (the file table.sql explicitly sets
collation to Latin1 on all tables), why bother.
Turns out merely setting mysql_enable_utf8 to 0 does not help if the
database server defaults to utf8, one needs to adjust the default
collation for the database. In theory we could alternatively adjust the
character set, that said it is somewhat safer to adjust collation
because that way we can make sure to use latin1_swedish_ci (which is the
collation declared for all tables in the xref schema file) instead of
relying on latin1_swedish_ci being the default Latin1 collation.

Note: EBI MySQL servers (MySQL 5.6 as of yesterday) seem to force the
character set and collation to latin1 and latin1_swedish_ci,
respectively, regardless of whether mysql_enable_utf8 is enabled or not,
- but let us not unnecessarily confuse the Perl client.

Resolves ENSCORESW-3236.
Copy link
Contributor

@tgrego tgrego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the testing framework in the xref_sprint branch is very welcome, even if the DBIxClass stuff can't be generated... that was unfortunately expected unless a redesign of the xref database happens.
I'm ok with this, and now to bring the parsers back to shape.

@mkszuba mkszuba merged commit 948a067 into feature/xref_sprint Aug 27, 2019
@mkszuba mkszuba deleted the backport/test_framework branch August 27, 2019 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants