-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport changes to the xref-parser test framework from ensembl-xref #405
Conversation
The xref schema does not play nicely with DBIC so we use manually corrected modules instead of generating them on the fly.
Works fine on MySQL 5.6 (i.e. mysql-ens-core-prod-1 as of a moment ago). On MariaDB 10.2 deploying the test-DB schema fails with Specified key was too long; max key length is 1000 bytes [for Statement "CREATE TABLE xref ... which is likely related to the use of Unicode in strings and will be debugged later.
Just the bare minimum of changes required for all tests to pass: - source.status is deprecated and has already been removed from the DBIC schema (even though it remains in SQL schema files, for backward compatibility); - source.priority_description must not be longer than 40 bytes; - search for the data file with respect to the location of the test script, not the current directory.
We haven't got any test databases yet but since some of the parsers require them to be present, let's get ready for them.
That way the test suite works without adding both .../misc-scripts/xref_mapping and .../misc-scripts/xref_mapping/t to PERL5LIB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that linting won over being able to regenerate the schema from a modified DB. That's fine I, but it might increase future "fun".
Are you certain you want this in the core repo? I had expected the xref pipeline to live apart from the core ORM, since it does not use core much at all. The only point of contact is in the upload at the end, and I don't think it uses the API for that. It also adds additional testing dependencies to the most basic component of Ensembl. In the past that would have been a no-no.
Could we actually regenerate the schema? IMHO it would be good to be able to do so but I vaguely remember that you had some difficulties making that work due to, ahem, certain peculiarities of the xref schema.
We do want to ultimately move it out, yes. That said, I think it will be safer to do this in stages: achieve working state in the core repo, archive ensembl-xref, then create a new repository for this code and update the pipeline in ensembl-production accordingly. |
Defaults to 0 because as of early August 2019, the xref schema is not compatible with the default Unicode encoding of MySQL and friends (utf8mb3) - there is an index in the table 'xref' which exceeds the maximum MyISAM key length of 1000 bytes in the event of string characters being longer than 2 bytes each. One could in principle work around this by using ucs2 encoding, then again seeing as the xref schema does not officially support Unicode anyway (the file table.sql explicitly sets collation to Latin1 on all tables), why bother.
Turns out merely setting mysql_enable_utf8 to 0 does not help if the database server defaults to utf8, one needs to adjust the default collation for the database. In theory we could alternatively adjust the character set, that said it is somewhat safer to adjust collation because that way we can make sure to use latin1_swedish_ci (which is the collation declared for all tables in the xref schema file) instead of relying on latin1_swedish_ci being the default Latin1 collation. Note: EBI MySQL servers (MySQL 5.6 as of yesterday) seem to force the character set and collation to latin1 and latin1_swedish_ci, respectively, regardless of whether mysql_enable_utf8 is enabled or not, - but let us not unnecessarily confuse the Perl client. Resolves ENSCORESW-3236.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the testing framework in the xref_sprint branch is very welcome, even if the DBIxClass stuff can't be generated... that was unfortunately expected unless a redesign of the xref database happens.
I'm ok with this, and now to bring the parsers back to shape.
Description
Backport changes to the xref unit-test framework from ensembl-xref to the main repository.
Use case
See ENSCORESW-3234. Although the first iteration of the test framework developed by @nerdstrike for the purpose of actually letting us unit-test xref parsers was committed to ensembl, it has since been modified in ensembl-xref.
Benefits
We will continue to be able to run xref-parser unit tests.
Possible Drawbacks
We will use even more Travis CPU time.
Testing
Have you added/modified unit tests to test the changes?
Yes, the two unit-test files already present in misc-scripts/xref_mapping have been updated accordingly.
If so, do the tests pass/fail?
They pass.
Have you run the entire test suite and no regression was detected?
Yes, everything passes. For the record, at least on my system running the main test suite automatically picks up tests from the xref directory.