Merged
190 commits
f454529
Rename EPO_LOW_COVERAGE->EPO_EXTENDED in compara datacheck modules
CristiGuijarro Mar 20, 2020
e2d9c87
Don't need to mention our internal identifier "compara_prev" as we ar…
muffato Mar 9, 2020
c1a4db1
Use types like in skip_tests
muffato Mar 20, 2020
d87852b
We have decided to use type rather than class in the DCs
muffato Mar 20, 2020
2c69577
More straightforward filter
muffato Mar 20, 2020
a03070b
Updated logic to use current version of the metadata. For example, te…
Mar 20, 2020
5cde061
Merge pull request #215 from Ensembl/biomart_dc_fix
thomasmaurel Mar 20, 2020
1ac9745
Fix for sort - can't rely on an alphabetic sort now that we've hit re…
Mar 20, 2020
dc84d1c
datacheck for xref and prediction_transcript tables
Mar 20, 2020
cb327b6
datacheck added for display_xref_id
Mar 23, 2020
df2a0fb
removed perdb flag for Displayref to run per species
Mar 23, 2020
1ce97a8
Merge pull request #217 from Ensembl/bugfix/schema_patch_sort
james-monkeyshines Mar 23, 2020
6a797e9
Add division condition to metadata query for retrieving the name of t…
Mar 25, 2020
9b3f9f6
Update metadata test db to include newly-referenced table.
Mar 26, 2020
7c7dfc0
More robust sorting method for schema patches.
Mar 26, 2020
6e4f767
Merge pull request #219 from Ensembl/bugfix/schema_patch_sort
thomasmaurel Mar 26, 2020
eafb923
No "Use of uninitialized value" when there are no such homologies in …
muffato Mar 26, 2020
60b63f9
No need to check unreleased entries
muffato Mar 26, 2020
e9d0125
Need tabs, not spaces, in table source.
Mar 27, 2020
6067f70
Merge pull request #220 from Ensembl/bugfix/shared_species
james-monkeyshines Mar 27, 2020
d6ea3a6
Merge pull request #213 from CristiGuijarro/feature/epo_rename
james-monkeyshines Mar 27, 2020
93eff7d
Merge pull request #214 from muffato/release/101
james-monkeyshines Mar 27, 2020
e1e9fd6
required changes done based on PR comments
Mar 27, 2020
68c1ae7
Revert "Also check when the assembly length is too long"
james-monkeyshines Apr 1, 2020
45192b2
Merge pull request #221 from Ensembl/revert-193-assembly_longer
james-monkeyshines Apr 1, 2020
519fb0d
Catch run-time errors and present them as test failures, so that a) p…
Apr 1, 2020
91e963c
Minor fix to grab all diagnostics, was inadvertently missing ones wit…
Apr 1, 2020
20f8b34
Removed code that threw an error if DNA db was missing - now reported…
Apr 1, 2020
bf2eab5
Merge pull request #222 from Ensembl/bugfix/catch_errors
james-monkeyshines Apr 1, 2020
3d94082
Re-jig parsing, was not handling collection databases properly - scri…
Apr 6, 2020
ae69e3c
SQL was using incorrect key, this is a transcript-level test, not gen…
Apr 6, 2020
88ea8ef
Since mart-related datachecks are not applicable to collection dbs, c…
Apr 6, 2020
3839c9f
Test not appropriate for GRCh37, so skip it.
Apr 6, 2020
0fc5e8f
Adding exceptions for consistency between provider and biomart meta_k…
Apr 6, 2020
1235690
Add explicit test for existence of taxonomy db. Handle strains in tax…
Apr 6, 2020
1587f3b
Cannot rely on the uniqueness of stable_ids in the SQL, doesn't hold …
Apr 6, 2020
53138a6
Prevent spurious failures for genes which span the origin on circular…
Apr 6, 2020
36d3f74
Make messages/parameter names applicable to all core-like dbs.
Apr 6, 2020
e9b3243
Rather than throwing an error on a failure to parse foreign key relat…
Apr 6, 2020
565c997
Rather than throwing an error if the data_files_path parameter is mis…
Apr 6, 2020
a63c169
Remove 'fail' method, replace functionality by accumulating violation…
Apr 7, 2020
b32d9f6
Prevent Perl from throwing an error if a core db cannot be found, by …
Apr 7, 2020
951414f
Prevent Perl from throwing an error if a core db cannot be found, by …
Apr 7, 2020
7fcb6f7
Prevent Perl from throwing an error if a core db cannot be found, by …
Apr 7, 2020
eeee89d
Cannot retrieve division for non-core schemas, and linking via the dn…
Apr 7, 2020
b1bda63
Prevent Perl from throwing an error if a core db cannot be found, by …
Apr 7, 2020
59ffc4d
Don't need to have a 'fail' method; execution will never get this far…
Apr 7, 2020
314353e
Don't need to have a 'fail' method; execution will never get this far…
Apr 7, 2020
e66a0e7
The module and datacheck name need to be the same as the file name. A…
Apr 8, 2020
ee90fb4
Existence of gene names is advisory, but non-numeric EntrezGenes are …
Apr 8, 2020
943a3eb
Removed start anchor on non-printing character regex, and used that f…
Apr 8, 2020
184722c
Merge pull request #216 from Ensembl/datachecks_for_Xref
james-monkeyshines Apr 8, 2020
023f570
added datachecks: GeneDescription-XrefCigarLines-XrefVersion
Mar 23, 2020
05b030c
Datacheck HGNCType added
Mar 23, 2020
bfeca3b
foreach loop changed to while
Mar 23, 2020
406596a
Added new datacheck HGNCNumeric
Mar 23, 2020
62faa30
Corrected interchanged pass/fail functions
Mar 23, 2020
9e04881
Merge pull request #223 from Ensembl/bugfix/assorted_fixes
vinay-ebi Apr 8, 2020
34f14f3
IdentityXrefCigarLines merged into XrefFormat.pm
Mar 30, 2020
a9d08e2
XrefVersion Removed
Mar 30, 2020
8608798
threshold stuff from HC removed and test made simple by checking two…
Mar 30, 2020
41104a8
index file updated
Mar 30, 2020
1f8bac0
LIKE BINARY added to make the UniProt selection case-sensitive
Mar 31, 2020
22093bc
perdb mode removed for HGNCTypes
Apr 2, 2020
15cbf34
perdb mode removed for HGNCTypes
Apr 2, 2020
dd81851
perdb mode removed for HGNCTypes
Apr 2, 2020
d79449b
Added new datacheck HGNCMultipleGene
Apr 2, 2020
825a736
Merge DescriptionNewlines into GeneDescription, both doing similar te…
Apr 8, 2020
80c3682
Flipping logic of check, to make it more intuitive. Updated set of ta…
Apr 8, 2020
5bc1ba3
Adding filter for species, and removing 'per_db' metadata flag - the …
Apr 8, 2020
a0b4111
Doing this test with SQL leads to a really complex query - can achiev…
Apr 8, 2020
79a8775
Merge pull request #218 from Ensembl/healthchecks_to_datachecks
james-monkeyshines Apr 8, 2020
e8dba99
Merge pull request #224 from Ensembl/bugfix/compara
james-monkeyshines Apr 8, 2020
c195464
Prevent Perl from throwing an error if a core db cannot be found, by …
Apr 7, 2020
0846881
Test whether we have a core database before proceeding, to prevent th…
Apr 8, 2020
2d10815
Adding comment to describe unintuitive behaviour.
Apr 9, 2020
1d79716
Merge pull request #225 from Ensembl/bugfix/variation
james-monkeyshines Apr 9, 2020
1b538ff
Switch to using new mitochondrial attribute (no easy way to do it via…
Apr 9, 2020
0255153
Add new test, if it looks like a mitochondrial chromosome, it should …
Apr 9, 2020
15a7af3
Another fix for parsing collection dbs, was picking up the result at …
Apr 9, 2020
5341539
Merge pull request #226 from Ensembl/bugfix/mt_chr_updates
james-monkeyshines Apr 9, 2020
26925db
This foreign key is not always honoured in the master database
muffato Apr 3, 2020
d4eb820
build the string with the correct count
muffato Apr 3, 2020
1e6f5c1
There are syntenies too
muffato Apr 3, 2020
7f3aefc
Trick to make the test pass
muffato Apr 3, 2020
6d0f27a
Not all species-sets have a name
muffato Apr 3, 2020
568e88c
not needed
muffato Apr 3, 2020
e33c537
the rule only applies to current MLSSs
muffato Apr 3, 2020
c7b3451
This test is redundant with the one below (which states that the firs…
muffato Apr 3, 2020
63be9a7
Include composite names too
muffato Apr 3, 2020
1bd74f9
The convention only applies to MLSSs that have been released and are …
muffato Apr 3, 2020
b34c543
The species_set_tag table is not used
muffato Apr 3, 2020
8ccc3b6
bugfix: GROUP_CONCAT is limited to 1024 characters
muffato Apr 3, 2020
475a783
Expect all species to have some overlap since we now do bidirectional…
muffato Apr 4, 2020
386ada6
Instead of comparing to the species-set size, simply make sure there …
muffato Apr 4, 2020
7a57aca
Gracefully handle missing tags
muffato Apr 4, 2020
7b70a49
improvement: query the division name just once
muffato Apr 9, 2020
2f3d367
Plants also have CAFE trees
muffato Apr 9, 2020
3119fb6
Plants only have protein-trees, so can't expect two rows.
muffato Apr 9, 2020
ece49f5
Don't raise an exception if data are missing, and let ok() report the…
muffato Apr 9, 2020
da5c5d3
Improved the descriptions
muffato Apr 9, 2020
ed73bd6
This can be a regular JOIN
muffato Apr 9, 2020
213af37
No HighConfidence data in Fungi
muffato Apr 9, 2020
55dd00a
Make the timestamp in the 'output' result parameter reflect the finis…
Apr 12, 2020
0c8b27f
Re-do generation of output_dir name, automated submissions could end …
Apr 13, 2020
d3d15fb
Use is_rows_zero like in CheckLastZCoverage to directly report the "b…
muffato Apr 14, 2020
83715a8
Reuse the $dbc variable
muffato Apr 14, 2020
d2342f0
Merge pull request #228 from Ensembl/bugfix/end_timestamp
luca-drf Apr 14, 2020
73a3041
When running the XrefPrefixes on vertebrates post xrefs, we discovere…
Apr 16, 2020
b697fc2
Remove NULL values from datacheck
dglemos Apr 16, 2020
36d4813
Merge pull request #229 from Ensembl/xref_prefix_fix
vinay-ebi Apr 17, 2020
b2b89be
Check for empty strings in set columns
helensch Apr 21, 2020
f02b373
Merge pull request #230 from dglemos/variation/duplicated_null
james-monkeyshines Apr 21, 2020
b7b640f
Revert "When running the XrefPrefixes on vertebrates post xrefs, we d…
james-monkeyshines Apr 21, 2020
0a3c405
Skip test when no check specified for species
helensch Apr 21, 2020
efb862b
Skip test if no individual records
helensch Apr 21, 2020
d573f3d
Merge pull request #232 from Ensembl/revert-229-xref_prefix_fix
james-monkeyshines Apr 21, 2020
f491649
Update species name for dog
helensch Apr 21, 2020
9680622
Merge pull request #231 from helensch/feature/empty-sets
james-monkeyshines Apr 21, 2020
29cd6df
No further checks after a species checked
helensch Apr 21, 2020
f5c4132
Update species name for dog
helensch Apr 21, 2020
246dbce
Merge pull request #233 from helensch/fix/indiv-type
james-monkeyshines Apr 22, 2020
aad0d90
Skip the test if there were no scores in the previous database
muffato Apr 22, 2020
d1d00e0
More rules to avoid untimely complaints
muffato Apr 22, 2020
6a49492
This one can be part of the compara group
muffato Apr 3, 2020
38a945d
Created the compara_master group
muffato Apr 3, 2020
2d660f5
New set of Compara groups
muffato Apr 9, 2020
7881b53
These two can fail if the species-set has not increased and should no…
muffato Apr 9, 2020
2d4f247
This one does not pass for Fungi, Protists and Metazoa
muffato Apr 9, 2020
b970528
Skip data_file existence check if not relevant
Apr 22, 2020
312ea87
Typos
muffato Apr 24, 2020
9d743d5
Merge pull request #234 from Ensembl/bug_fix/skip_data_files
james-monkeyshines Apr 28, 2020
bf129dd
Make the ForeignKeysMultiDB datacheck tractable, by skipping motif_fe…
Apr 28, 2020
ca06198
Merge pull request #235 from Ensembl/bugfix/fk_multi_db
james-monkeyshines Apr 29, 2020
3a7a461
Update Test to accept Viruses
marcoooo Apr 30, 2020
8c063fb
Merge pull request #236 from Ensembl/hotfixes/viruses-division
marcoooo Apr 30, 2020
8adcb47
Update VersionedGenes.pm
marcoooo May 1, 2020
8c1788b
Update VersionedGenes.pm
marcoooo May 1, 2020
5276c24
Update VersionedGenes.pm
marcoooo May 1, 2020
37ea003
Force datachecks that rely on multiple databases to always be run, be…
May 4, 2020
bbb296c
pipeline name added in emailnotify subject
May 5, 2020
8625484
Merge pull request #239 from Ensembl/email_notify
vinay-ebi May 5, 2020
ff866f4
Merge pull request #227 from muffato/release/101
james-monkeyshines May 5, 2020
6d5ade1
Merge pull request #238 from Ensembl/hotfix/gene-servionned-viruses
marcoooo May 5, 2020
f8325e0
Skip test when 0 or 1 phenotype_feature records
helensch May 5, 2020
439e863
Merge pull request #240 from helensch/fix/phefeat-region
james-monkeyshines May 5, 2020
1aff58a
Skip check for EnsemblViruses
helensch May 6, 2020
d47b5c0
Fix typo
helensch May 6, 2020
3ba134a
Merge pull request #241 from helensch/fix/mult-seq-region
marcoooo May 6, 2020
5d5862f
Remove duplicated doi from check
dglemos May 7, 2020
e8cb662
Merge pull request #242 from dglemos/publication/doi
james-monkeyshines May 7, 2020
205a92a
Adding test for the presence of a motif_feature file. Skip datacheck …
May 12, 2020
6fb957a
Merge pull request #244 from Ensembl/bugfix/motif_feature_file
vinay-ebi May 12, 2020
031a317
Add 'tables' list to Denormalized check, to prevent unnecessarily re-…
May 12, 2020
dd54820
Changing MultipleSeqRegions to be an advisory datacheck - there is no…
May 12, 2020
8c86572
On reflection, changed logic of previous commit. If we only have a si…
May 12, 2020
63e643e
Merge pull request #245 from Ensembl/feature/force_multi
vinay-ebi May 12, 2020
c6da44d
Merge pull request #246 from Ensembl/bugfix/variation_datachecks
james-monkeyshines May 13, 2020
8101ccb
Don't need to set adaptor for ontology or production dbs, the registr…
May 13, 2020
316c55e
Determine old dbnames for ontology and compara dbs, by regexing the r…
May 13, 2020
74b403f
Removing skip block - it wouldn't work any more, because DbCheck will…
May 13, 2020
8023b42
plants species-tree is now populated
JAlvarezJarreta May 15, 2020
4953a56
Merge pull request #248 from JAlvarezJarreta/release/101
james-monkeyshines May 15, 2020
f96c296
Merge pull request #247 from Ensembl/bugfix/old_ontology_compara
james-monkeyshines May 18, 2020
d8fe8a9
New test for GO xrefs to only be present on transcripts.
May 18, 2020
77f5e8f
Making SQL for GO Evidence datacheck species-specific.
May 18, 2020
7d5987c
Merge pull request #249 from Ensembl/bugfix/go_xrefs
james-monkeyshines May 18, 2020
a744613
Make GO xref comparison species-specific, for collection dbs.
May 22, 2020
53cf5b4
Collate diagnostics, so that they can be displayed after the test, ra…
May 22, 2020
e82bd4d
Do not try to parse TAP output if diagnostics are before the test res…
May 22, 2020
9f9e559
Test is deprecated - it is in the nature of gene name projection to h…
May 22, 2020
57b8602
MLSS tag-related FK checks
Mar 9, 2020
93fbed1
MLSS tag-related checks
Mar 9, 2020
10b4a2c
New module for compara-specific methods, to prevent redundant code ac…
Mar 9, 2020
965c13c
For non-vert compara dbs, check whether GO and InterPro terms are loa…
Mar 19, 2020
382080a
Adding checks for appropriate MLSS tags for GERP and alignments. Incl…
May 5, 2020
85481a5
New foreign key check, to ensure that all of the seq_regions in the '…
May 5, 2020
b9fa65d
Pair of datachecks for checking consistency with core databases.
May 12, 2020
42398ed
Update EPO alignment name
james-monkeyshines May 28, 2020
fb1a702
Merge pull request #250 from Ensembl/bugfix/xref_fixes
james-monkeyshines May 28, 2020
cd12838
Merge pull request #251 from Ensembl/compara_datachecks
james-monkeyshines May 29, 2020
3482a93
Adding group for compara ancestral databases.
May 29, 2020
be7018f
Cannot expect ancestral database to have all of the required meta_key…
Jun 2, 2020
e8d90ce
Need to handle ancestral db name the same as compara dbs.
Jun 2, 2020
0e1aa18
Parse ancestral db names in order to pass the correct parameters to t…
Jun 2, 2020
07e8ea8
Merge pull request #253 from Ensembl/feature/ancestral_group
james-monkeyshines Jun 3, 2020
49cb630
"ancestral_sequences" is what the Compara API uses to link GenomeDBs …
muffato Jun 10, 2020
b7feaac
Merge pull request #258 from Ensembl/feature/ancestral_sequences_name
james-monkeyshines Jun 11, 2020
79d7ddf
Merge branch 'master' into release/101
james-monkeyshines Jun 12, 2020
43cd4cb
Update index.json after resolving conflicts
Jun 12, 2020
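Commits 1ac9745 and 7c7dfc0 above replace an alphabetic sort of schema patches. The sketch below is not the code from those commits; it only illustrates the general failure mode, under assumed patch file names and an assumed `patch_key` helper: once a number gains a digit, string order stops matching numeric order.

```perl
use strict;
use warnings;

# Hypothetical patch file names; the naming scheme is an assumption for illustration.
my @patches = qw(
    patch_99_100_a.sql
    patch_100_101_a.sql
    patch_98_99_b.sql
    patch_98_99_a.sql
);

# Plain string comparison puts "100" before "98" because '1' lt '9'.
my @alphabetic = sort @patches;

# Hypothetical helper: split a name into its numeric fields and suffix letter.
sub patch_key {
    my ($name) = @_;
    my ($from, $to, $suffix) = $name =~ /^patch_(\d+)_(\d+)_(\w+)\.sql$/
        or die "Unexpected patch name: $name";
    return ($from, $to, $suffix);
}

# Compare the numeric fields as numbers, then fall back to the suffix letter.
my @numeric = sort {
    my @ka = patch_key($a);
    my @kb = patch_key($b);
    $ka[0] <=> $kb[0] || $ka[1] <=> $kb[1] || $ka[2] cmp $kb[2];
} @patches;

print "alphabetic: @alphabetic\n";
print "numeric:    @numeric\n";
```

With these names the alphabetic order starts with the newest patch, while the numeric order applies patches oldest first.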
6 changes: 5 additions & 1 deletion lib/Bio/EnsEMBL/DataCheck/BaseCheck.pm
@@ -131,7 +131,11 @@ sub skip_datacheck {
sub run_datacheck {
# Method can be overridden by a subclass, if required.
my $self = shift;
-$self->tests(@_);
+eval { $self->tests(@_) };
+if ($@) {
+  fail("Datacheck ran without errors");
+  diag($@);
+}
}

sub skip_tests {
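The change to `run_datacheck` turns a run-time error into a reported test failure. A minimal, self-contained sketch of that pattern follows. It assumes a hypothetical `run_guarded` helper and a plain results array rather than `Test::More`, so that the script itself exits cleanly; it is not the DataCheck implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the DataCheck API): run a block of tests
# and convert any exception into a recorded failure instead of aborting.
sub run_guarded {
    my ($tests) = @_;
    my @results;    # each entry: [passed?, description, diagnostic]

    # "eval { ...; 1 }" returns true only if the block finished without dying.
    my $ok = eval { $tests->(\@results); 1 };
    if (!$ok) {
        my $error = $@ || 'unknown error';
        chomp $error;
        push @results, [0, 'Datacheck ran without errors', $error];
    }
    return \@results;
}

my $results = run_guarded(sub {
    my ($log) = @_;
    push @$log, [1, 'first test passed', ''];
    die "Could not find core database\n";    # simulated run-time error
});

printf "%s - %s%s\n", ($_->[0] ? 'ok' : 'not ok'), $_->[1],
    ($_->[2] ? " # $_->[2]" : '') for @$results;
```

The sketch checks the return value of `eval` rather than `$@` alone, a common defensive idiom; the diff above tests `$@` directly.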
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/AlignmentCoordinates.pm
@@ -31,7 +31,7 @@ use constant {
NAME => 'AlignmentCoordinates',
DESCRIPTION => 'Alignment coordinates are within the length of their dnafrag',
DATACHECK_TYPE => 'critical',
-GROUPS => ['compara', 'compara_pairwise_alignments', 'compara_multiple_alignments'],
+GROUPS => ['compara', 'compara_genome_alignments'],
DB_TYPES => ['compara'],
TABLES => ['dnafrag', 'genomic_align']
};
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/AnalysisFormat.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'AnalysisFormat',
DESCRIPTION => 'Analysis logic name and date are formatted correctly',
-GROUPS => ['core', 'brc4_core', 'corelike'],
+GROUPS => ['ancestral', 'brc4_core', 'core', 'corelike'],
DB_TYPES => ['cdna', 'core', 'otherfeatures', 'rnaseq'],
TABLES => ['analysis'],
PER_DB => 1
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/AssemblyExceptions.pm
@@ -31,7 +31,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'AssemblyExceptions',
DESCRIPTION => 'Assembly exceptions are correctly configured',
-GROUPS => ['assembly', 'core', 'brc4_core'],
+GROUPS => ['ancestral', 'assembly', 'brc4_core', 'core'],
DB_TYPES => ['core'],
TABLES => ['analysis', 'assembly_exception', 'dna_align_feature',
'external_db', 'seq_region',],
12 changes: 4 additions & 8 deletions lib/Bio/EnsEMBL/DataCheck/Checks/AssemblySeqregion.pm
@@ -31,7 +31,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'AssemblySeqregion',
DESCRIPTION => 'Assembly and seq_region tables are consistent',
-GROUPS => ['assembly', 'core', 'brc4_core'],
+GROUPS => ['ancestral', 'assembly', 'brc4_core', 'core'],
DB_TYPES => ['core'],
TABLES => ['assembly', 'coord_system', 'seq_region'],
PER_DB => 1,
@@ -85,19 +85,15 @@ sub tests {
is_rows_zero($self->dba, $sql_5, $desc_5);

my $desc_6 = 'assembly and seq_region lengths consistent';
-my $diag_6 = 'seq_region length != largest asm_end value';
+my $diag_6 = 'seq_region length < largest asm_end value';
my $sql_6 = q/
-SELECT
-sr.name AS seq_region_name,
-cs.name AS coord_system_name,
-sr.length AS seq_length,
-MAX(a.asm_end) AS max_asm_end
+SELECT sr.name AS seq_region_name, sr.length, cs.name AS coord_system_name
FROM
seq_region sr INNER JOIN
coord_system cs ON sr.coord_system_id = cs.coord_system_id INNER JOIN
assembly a ON a.asm_seq_region_id = sr.seq_region_id
GROUP BY a.asm_seq_region_id
-HAVING sr.length != MAX(a.asm_end)
+HAVING sr.length < MAX(a.asm_end)
/;
is_rows_zero($self->dba, $sql_6, $desc_6, $diag_6);
}
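The practical difference between the two `HAVING` clauses can be seen on a small in-memory example. This is only a sketch: the hashes below are hypothetical stand-ins for the `seq_region` and `assembly` tables, and the grouping is done in Perl rather than SQL.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical in-memory stand-ins for the seq_region and assembly tables.
my %seq_region_length = (chr1 => 1000, chr2 => 500, chr3 => 800);
my @assembly = (
    { asm_seq_region => 'chr1', asm_end => 1000 },    # exact fit
    { asm_seq_region => 'chr2', asm_end => 650 },     # mapping overruns the region
    { asm_seq_region => 'chr3', asm_end => 400 },     # region longer than its mappings
    { asm_seq_region => 'chr3', asm_end => 700 },
);

# Equivalent of GROUP BY asm_seq_region_id with MAX(asm_end).
my %max_asm_end;
for my $row (@assembly) {
    my $name = $row->{asm_seq_region};
    $max_asm_end{$name} = max($max_asm_end{$name} // 0, $row->{asm_end});
}

# HAVING sr.length < MAX(a.asm_end): only overruns are reported.
my @too_short = grep { $seq_region_length{$_} < $max_asm_end{$_} } sort keys %max_asm_end;

# HAVING sr.length != MAX(a.asm_end): the stricter form also reports chr3.
my @mismatch = grep { $seq_region_length{$_} != $max_asm_end{$_} } sort keys %max_asm_end;

print "length <  max asm_end: @too_short\n";
print "length != max asm_end: @mismatch\n";
```

Under these assumed data, the `<` form flags only the region whose assembly mapping runs past its end, whereas the `!=` form also flags a region that is simply longer than its last mapping.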
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/BlankEnums.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'BlankEnums',
DESCRIPTION => 'Enum columns do not have empty string values',
-GROUPS => ['compara', 'core', 'brc4_core', 'corelike', 'funcgen', 'schema', 'variation'],
+GROUPS => ['ancestral', 'brc4_core', 'compara', 'compara_gene_trees', 'compara_master', 'compara_syntenies', 'core', 'corelike', 'funcgen', 'schema', 'variation'],
DB_TYPES => ['cdna', 'compara', 'core', 'funcgen', 'otherfeatures', 'rnaseq', 'variation'],
PER_DB => 1
};
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/BlankNulls.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'BlankNulls',
DESCRIPTION => 'Nullable columns do not have empty string values',
-GROUPS => ['compara', 'core', 'brc4_core', 'corelike', 'funcgen', 'schema', 'variation'],
+GROUPS => ['ancestral', 'brc4_core', 'compara', 'compara_gene_trees', 'compara_genome_alignments', 'compara_master', 'compara_syntenies', 'core', 'corelike', 'funcgen', 'schema', 'variation'],
DB_TYPES => ['cdna', 'compara', 'core', 'funcgen', 'otherfeatures', 'rnaseq', 'variation'],
PER_DB => 1
};
63 changes: 63 additions & 0 deletions lib/Bio/EnsEMBL/DataCheck/Checks/BlankSets.pm
@@ -0,0 +1,63 @@
=head1 LICENSE

Copyright [2018-2020] EMBL-European Bioinformatics Institute

Licensed under the Apache License, Version 2.0 (the 'License');
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an 'AS IS' BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

=cut

package Bio::EnsEMBL::DataCheck::Checks::BlankSets;

use warnings;
use strict;

use Moose;
use Test::More;
use Bio::EnsEMBL::DataCheck::Test::DataCheck;

extends 'Bio::EnsEMBL::DataCheck::DbCheck';

use constant {
NAME => 'BlankSets',
DESCRIPTION => 'Set columns do not have empty string values (unless default)',
GROUPS => ['variation'],
DB_TYPES => ['variation']
};

sub tests {
my ($self) = @_;

my $set_sql = q/
SELECT TABLE_NAME, COLUMN_NAME FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_SCHEMA = database() AND
DATA_TYPE = 'set' AND
COLUMN_DEFAULT <> ''
/;

my $sets = $self->dba->dbc->sql_helper->execute(-SQL => $set_sql);

foreach my $set (@$sets) {
my ($table, $column) = @$set;

my $desc = "SET column $table.$column has no empty string values";
my $sql = qq/
SELECT COUNT(*) FROM $table
WHERE $column = ''
/;
is_rows_zero($self->dba, $sql, $desc);
}
}

1;
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/CheckCAFETable.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckCAFETable',
DESCRIPTION => 'Each row should show a one-to-many relationship',
-GROUPS => ['compara', 'compara_protein_trees'],
+GROUPS => ['compara', 'compara_gene_trees'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['cafe_species_gene']
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/CheckComparaStableIDs.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckComparaStableIDs',
DESCRIPTION => 'gene trees in gene_tree_root and family all have stable_ids generated',
-GROUPS => ['compara', 'compara_families', 'compara_protein_trees'],
+GROUPS => ['compara', 'compara_gene_trees'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['family', 'gene_tree_root']
8 changes: 6 additions & 2 deletions lib/Bio/EnsEMBL/DataCheck/Checks/CheckConservationScore.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckConservationScore',
DESCRIPTION => 'The MLSS for GERP_CONSERVATION_SCORE should have conservation score entries',
-GROUPS => ['compara', 'compara_pairwise_alignments'],
+GROUPS => ['compara', 'compara_genome_alignments'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['conservation_score', 'genomic_align_block', 'method_link', 'method_link_species_set', 'method_link_species_set_tag']
@@ -64,7 +64,11 @@ sub tests {
AND method_link_species_set_id = $mlss_id
/;
my $desc_1 = "There is an msa_mlss_id tag for $mlss_name";
-my $msa_mlss_id = $helper->execute_single_result( -SQL => $sql_1 );
+my $msa_mlss_id = $helper->execute_single_result( -SQL => $sql_1, -NO_ERROR => 1 );
+ok($msa_mlss_id, $desc_1);
+
+# Can't test this mlss without an msa_mlss_id
+next unless $msa_mlss_id;

my $sql_2 = qq/
SELECT COUNT(*)
lib/Bio/EnsEMBL/DataCheck/Checks/CheckConservationScorePerBlock.pm
@@ -31,7 +31,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckConservationScorePerBlock',
DESCRIPTION => 'Multiple alignments with >3 species and >3 sequences must have a conservation score',
-GROUPS => ['compara', 'compara_multiple_alignments'],
+GROUPS => ['compara', 'compara_genome_alignments'],
DB_TYPES => ['compara'],
TABLES => ['conservation_score', 'dnafrag', 'genome_db', 'genomic_align', 'genomic_align_block', 'method_link', 'method_link_species_set', 'method_link_species_set_tag']
};
@@ -46,8 +46,7 @@ sub tests {
FROM method_link_species_set mlss
JOIN method_link USING(method_link_id)
LEFT JOIN method_link_species_set_tag mlsst ON (mlss.method_link_species_set_id = mlsst.method_link_species_set_id AND tag = "msa_mlss_id" AND value != "")
-WHERE (type = "GERP_CONSERVATION_SCORE"
-OR class LIKE "ConservationScore%")
+WHERE type = "GERP_CONSERVATION_SCORE"
AND tag IS NULL;
/;

@@ -59,8 +58,7 @@ sub tests {
FROM method_link_species_set
LEFT JOIN method_link USING(method_link_id)
LEFT JOIN method_link_species_set_tag USING(method_link_species_set_id)
-WHERE (type = "GERP_CONSERVATION_SCORE"
-OR class LIKE "ConservationScore%")
+WHERE type = "GERP_CONSERVATION_SCORE"
AND tag = "msa_mlss_id";
/;

lib/Bio/EnsEMBL/DataCheck/Checks/CheckConstrainedElementTable.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckConstrainedElementTable',
DESCRIPTION => 'Each row should show a one-to-many relationship',
-GROUPS => ['compara', 'compara_multiple_alignments'],
+GROUPS => ['compara', 'compara_genome_alignments'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['constrained_elements']
lib/Bio/EnsEMBL/DataCheck/Checks/CheckDuplicatedTaxaNames.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckDuplicatedTaxaNames',
DESCRIPTION => 'Check that the ncbi_taxa_name contains only unique rows',
-GROUPS => ['compara'],
+GROUPS => ['compara', 'compara_gene_trees', 'compara_genome_alignments', 'compara_master', 'compara_syntenies'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['ncbi_taxa_name']
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/CheckEmptyLeavesTrees.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckEmptyLeavesTrees',
DESCRIPTION => 'Check that none of the gene tree leaves have children',
-GROUPS => ['compara', 'compara_protein_trees'],
+GROUPS => ['compara', 'compara_gene_trees'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['gene_tree_node']
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/CheckFlatProteinTrees.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckFlatProteinTrees',
DESCRIPTION => 'Check protein tree integrity ensuring number of leaves with parent node at root < 3',
-GROUPS => ['compara', 'compara_protein_trees'],
+GROUPS => ['compara', 'compara_gene_trees'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['gene_tree_node', 'gene_tree_root']
12 changes: 8 additions & 4 deletions lib/Bio/EnsEMBL/DataCheck/Checks/CheckGOCScoreStats.pm
@@ -29,15 +29,15 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckGOCScoreStats',
DESCRIPTION => 'The number of rows for GOC have not dropped from previous release',
-GROUPS => ['compara', 'compara_protein_trees'],
-DATACHECK_TYPE => 'critical',
+GROUPS => ['compara', 'compara_gene_trees'],
+DATACHECK_TYPE => 'advisory',
DB_TYPES => ['compara'],
TABLES => ['homology']
};

sub tests {
my ($self) = @_;
-my $prev_dba = $self->registry->get_DBAdaptor('compara_prev', 'compara') || $self->get_old_dba;
+my $prev_dba = $self->get_old_dba;

my $curr_helper = $self->dba->dbc->sql_helper;
my $prev_helper = $prev_dba->dbc->sql_helper;
@@ -54,7 +54,11 @@ sub tests {

foreach my $type ( keys %$prev_results ) {
my $desc = "There are the same number of goc_score populated rows between releases for $type";
-cmp_ok( $curr_results->{$type}, ">=", $prev_results->{$type}, $desc );
+cmp_ok( $curr_results->{$type} // 0, ">=", $prev_results->{$type}, $desc );
}
+
+unless (%$prev_results) {
+  plan skip_all => "No MLSSs to test in this database";
+}
}
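The `// 0` in the `cmp_ok` call guards against a type that exists in the previous release but has no rows in the current one. A small sketch of that behaviour, using hypothetical counts and a plain comparison in place of `cmp_ok` (the defined-or operator needs Perl 5.10 or later):

```perl
use strict;
use warnings;

# Hypothetical per-type row counts for the previous and current releases.
my %prev_results = (ortholog_one2one => 120, ortholog_one2many => 45);
my %curr_results = (ortholog_one2one => 130);    # one type has no rows at all

my @report;
foreach my $type (sort keys %prev_results) {
    # Without "// 0" a missing key yields undef, and the numeric comparison
    # emits "Use of uninitialized value" under "use warnings".
    my $curr = $curr_results{$type} // 0;
    my $ok   = $curr >= $prev_results{$type} ? 1 : 0;
    push @report, [$type, $curr, $ok];
}

printf "%-20s current=%d %s\n", $_->[0], $_->[1], ($_->[2] ? 'ok' : 'not ok') for @report;
```

The missing type is then reported as an ordinary failed comparison against zero instead of producing a warning.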

2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/CheckGeneGainLossData.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckGeneGainLossData',
DESCRIPTION => 'ncRNA and protein trees must have gene Gain/Loss trees',
-GROUPS => ['compara', 'compara_protein_trees'],
+GROUPS => ['compara', 'compara_gene_trees'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['CAFE_gene_family', 'gene_tree_root']
lib/Bio/EnsEMBL/DataCheck/Checks/CheckGenomicAlignGenomeDBs.pm
@@ -31,7 +31,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckGenomicAlignGenomeDBs',
DESCRIPTION => 'Check all genome_dbs for each method_link_species_set is present in genomic_aligns',
-GROUPS => ['compara', 'compara_multiple_alignments', 'compara_pairwise_alignments'],
+GROUPS => ['compara', 'compara_genome_alignments'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['dnafrag', 'genome_db', 'genomic_align', 'genomic_align_block', 'method_link_species_set', 'species_set']
@@ -40,7 +40,7 @@ use constant {
sub skip_tests {
my ($self) = @_;
my $mlss_adap = $self->dba->get_MethodLinkSpeciesSetAdaptor;
-my @methods = qw (PECAN EPO EPO_LOW_COVERAGE LASTZ_NET LASTZ_PATCH);
+my @methods = qw (PECAN EPO EPO_EXTENDED LASTZ_NET LASTZ_PATCH);
my $db_name = $self->dba->dbc->dbname;

my @mlsses;
@@ -60,7 +60,7 @@ sub tests {
my $helper = $dba->dbc->sql_helper;
my $mlss_adap = $dba->get_MethodLinkSpeciesSetAdaptor;
my $gdb_adap = $dba->get_GenomeDBAdaptor;
-my @mlss_types = qw ( PECAN EPO EPO_LOW_COVERAGE LASTZ_NET LASTZ_PATCH);
+my @mlss_types = qw ( PECAN EPO EPO_EXTENDED LASTZ_NET LASTZ_PATCH);
my $ancestral = $gdb_adap->fetch_all_by_name('ancestral_sequences');
my @mlsses;

8 changes: 3 additions & 5 deletions lib/Bio/EnsEMBL/DataCheck/Checks/CheckGenomicAlignMTs.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckGenomicAlignMTs',
DESCRIPTION => 'The multiple alignments should include all the MT sequences',
-GROUPS => ['compara', 'compara_multiple_alignments'],
+GROUPS => ['compara', 'compara_genome_alignments'],
DATACHECK_TYPE => 'advisory',
DB_TYPES => ['compara'],
TABLES => ['dnafrag', 'genome_db', 'genomic_align', 'method_link', 'method_link_species_set', 'species_set']
@@ -39,7 +39,7 @@ use constant {
sub skip_tests {
my ($self) = @_;
my $mlss_adap = $self->dba->get_MethodLinkSpeciesSetAdaptor;
-my @methods = qw( EPO EPO_LOW_COVERAGE PECAN );
+my @methods = qw( EPO EPO_EXTENDED PECAN );
my $db_name = $self->dba->dbc->dbname;

my @mlsses;
@@ -71,9 +71,7 @@ sub tests {
JOIN genome_db USING(genome_db_id)
JOIN dnafrag USING(genome_db_id)
WHERE cellular_component = 'MT'
-AND (class LIKE 'GenomicAlignTree%'
-OR class LIKE 'GenomicAlign%multiple%')
-AND (type NOT LIKE 'CACTUS_HAL%')
+AND type IN ("EPO", "EPO_EXTENDED", "PECAN")
/;

my $entries_array = $helper->execute(
lib/Bio/EnsEMBL/DataCheck/Checks/CheckGenomicAlignTreeTable.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckGenomicAlignTreeTable',
DESCRIPTION => 'Check the consistency and validity of genomic_align_tree',
-GROUPS => ['compara', 'compara_multiple_alignments'],
+GROUPS => ['compara', 'compara_genome_alignments'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['genomic_align_tree', 'method_link_species_set']
@@ -39,7 +39,7 @@ use constant {
sub skip_tests {
my ($self) = @_;
my $mlss_adap = $self->dba->get_MethodLinkSpeciesSetAdaptor;
-my @methods = qw( EPO EPO_LOW_COVERAGE );
+my @methods = qw( EPO EPO_EXTENDED );
my $db_name = $self->dba->dbc->dbname;

my @mlsses;
@@ -57,7 +57,7 @@ sub skip_tests {
sub tests {
my ($self) = @_;
my $mlss_adap = $self->dba->get_MethodLinkSpeciesSetAdaptor;
-my @methods = qw( EPO EPO_LOW_COVERAGE );
+my @methods = qw( EPO EPO_EXTENDED );
my $db_name = $self->dba->dbc->dbname;
my $dbc = $self->dba->dbc;
my $helper = $dbc->sql_helper;
2 changes: 1 addition & 1 deletion lib/Bio/EnsEMBL/DataCheck/Checks/CheckHomology.pm
@@ -30,7 +30,7 @@ extends 'Bio::EnsEMBL::DataCheck::DbCheck';
use constant {
NAME => 'CheckHomology',
DESCRIPTION => 'Check homology_id are all one-to-many for homology_members',
-GROUPS => ['compara', 'compara_protein_trees'],
+GROUPS => ['compara', 'compara_gene_trees'],
DATACHECK_TYPE => 'critical',
DB_TYPES => ['compara'],
TABLES => ['homology', 'homology_member']