Add datacheck CactusMetadataConsistency #560
Merged
Description of the problem
Issues have arisen in recent releases with Cactus genomic alignment metadata consistency: in one case a retired GenomeDB was present in the HAL mapping, which links a Compara database to its configured Cactus alignments, and in another case a species in the HAL mapping was not present in the corresponding species tree. Tickets have been created proposing to develop a datacheck for each of these issues (ENSCOMPARASW-5088 and ENSCOMPARASW-6834, respectively). The datacheck added by this PR tests both cases.
Scope of the pull request
This PR adds a Compara datacheck (CactusMetadataConsistency) which tests that, for each Cactus MLSS in a Compara database, every genome_db_id in the HAL mapping is present in the database, is current, and can be found in the species tree.
WARNING: like #557, this PR moves the CompareVariationRows datacheck index entry so that it is placed alphabetically.
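The three conditions tested for each genome_db_id in a HAL mapping can be sketched roughly as below. This is an illustrative Python sketch only, not the code added by this PR: the function name check_hal_mapping and its three inputs (the IDs in the HAL mapping, a lookup of whether each GenomeDB in the database is current, and the set of IDs in the species tree) are hypothetical and assume that data has already been fetched from the Compara database.

```python
from typing import Dict, Iterable, List, Set


def check_hal_mapping(
    hal_genome_db_ids: Iterable[int],
    genome_db_is_current: Dict[int, bool],
    species_tree_genome_db_ids: Set[int],
) -> List[str]:
    """Return one message per consistency problem (hypothetical helper).

    hal_genome_db_ids: genome_db_ids listed in one MLSS's HAL mapping.
    genome_db_is_current: genome_db_id -> True if current, False if retired,
        for every GenomeDB in the database.
    species_tree_genome_db_ids: genome_db_ids found in the MLSS's species tree.
    """
    problems: List[str] = []
    for gdb_id in sorted(set(hal_genome_db_ids)):
        # 1. The GenomeDB must exist in the database at all.
        if gdb_id not in genome_db_is_current:
            problems.append(f"genome_db_id {gdb_id} is not in the database")
            continue  # nothing further to check for an unknown ID
        # 2. The GenomeDB must be current, i.e. not retired.
        if not genome_db_is_current[gdb_id]:
            problems.append(f"genome_db_id {gdb_id} is retired")
        # 3. The GenomeDB must appear in the species tree.
        if gdb_id not in species_tree_genome_db_ids:
            problems.append(f"genome_db_id {gdb_id} is missing from the species tree")
    return problems
```

An empty list would correspond to a passing check for that MLSS; each returned message names the genome_db_id that needs attention.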
Testing
The datacheck was tested successfully on 3 example Metazoa Compara databases. For further information on testing, please see ENSCOMPARASW-6834.