Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for HG38 exception and better logging. #4563

Merged
merged 8 commits into from
Apr 6, 2018

Conversation

jonn-smith
Copy link
Collaborator

Added more logging functionality to better inform users.

Fixed NullPointerException when onTraversalStart throws.
Now logs version number information for data sources.
Now checks minimum data sources version and requires that the user
upgrade if the version is too old.

Fixes #4521 in a roundabout way - data sources need to be re-released for this version to run at all, but these new data sources contain a fix for the HG38 issue.

@codecov-io
Copy link

codecov-io commented Mar 23, 2018

Codecov Report

Merging #4563 into master will increase coverage by 0.075%.
The diff coverage is 43.802%.

@@               Coverage Diff               @@
##              master     #4563       +/-   ##
===============================================
+ Coverage     79.857%   79.932%   +0.075%     
- Complexity     17054     17347      +293     
===============================================
  Files           1067      1067               
  Lines          62031     63140     +1109     
  Branches       10039     10311      +272     
===============================================
+ Hits           49536     50469      +933     
- Misses          8582      8696      +114     
- Partials        3913      3975       +62
Impacted Files Coverage Δ Complexity Δ
...broadinstitute/hellbender/utils/test/BaseTest.java 64.444% <0%> (-0.481%) 33 <0> (ø)
...r/dataSources/cosmic/CosmicFuncotationFactory.java 76.238% <0%> (-4.815%) 21 <0> (ø)
...ncotator/mafOutput/MafOutputRendererConstants.java 98.98% <100%> (ø) 1 <0> (ø) ⬇️
...tools/funcotator/DataSourceFuncotationFactory.java 90% <100%> (ø) 17 <1> (ø) ⬇️
...titute/hellbender/tools/funcotator/Funcotator.java 76.351% <100%> (ø) 30 <2> (ø) ⬇️
.../tools/funcotator/dataSources/DataSourceUtils.java 56.432% <29.73%> (-11.616%) 33 <7> (+7)
...cotator/dataSources/vcf/VcfFuncotationFactory.java 85.333% <58.333%> (-4.219%) 23 <2> (+1)
...dataSources/gencode/GencodeFuncotationFactory.java 81.997% <75%> (+1.209%) 146 <1> (+4) ⬆️
...ataSources/xsv/SimpleKeyXsvFuncotationFactory.java 87.097% <77.778%> (-0.403%) 27 <2> (+1)
...ataSources/xsv/LocatableXsvFuncotationFactory.java 84.694% <88.889%> (+0.823%) 28 <3> (+2) ⬆️
... and 20 more

Copy link
Contributor

@LeeTL1220 LeeTL1220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes and questions

@@ -397,10 +397,13 @@ public Object onTraversalSuccess() {
public void closeTool() {

for(final DataSourceFuncotationFactory factory : dataSourceFactories) {
factory.close();
if ( factory != null ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better as Closeable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can change DataSourceFuncotationFactory to be Closeable.

}

@VisibleForTesting
static boolean validateVersionInformation(final int major, final int minor, final int year, final int month, final int day) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for final on primitives for method parameters.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to leave it as it has semantic meaning as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that the codebase does not do this elsewhere.

logger.info("Data sources alternate source: " + alternateSource);
}
}
catch (final Exception ex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please log information about the exception. Generally, you should not be catching Exception, but rather a subclass of Exception.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

I want to catch any exceptions that happen there and continue so that it doesn't actually fail. This is an optional check for logging purposes anyway.


boolean dataSourcesPathIsAcceptable = true;

final Path readmePath = dataSourcesPath.resolve(IOUtils.getPath(README_FILE_NAME));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is documentation about the format of the README file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add in an explanation of what is required in the README for these purposes.

arguments.addVCF(new File(inputFile));
arguments.addReference(new File("/Users/jonn/Development/references/Homo_sapiens_assembly19.fasta"));

arguments.addArgument(FuncotatorArgumentDefinitions.DATA_SOURCES_PATH_LONG_NAME, "/Users/jonn/Development/funcotator_dataSources.v1.1.20180208");
arguments.addArgument(FuncotatorArgumentDefinitions.DATA_SOURCES_PATH_LONG_NAME, "/Users/jonn/Development/funcotator_dataSources_latestjnkjjk");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pathname can't possibly be correct. Must be something that would be on anyone's dev machine.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is intermediate and planned to be phased out.

It is checked in so that I can run my local test data and not have to worry about rewriting tests every time I need to debug something.

I added the flag doDebugTests to control for this and a unit test to ensure that doDebugTests is always set to false for the tests to pass.

I want to leave these in for now so I can continue to test locally with different settings easily.

@@ -275,10 +277,12 @@ public void spotCheck2() throws IOException {

final ArgumentsBuilder arguments = new ArgumentsBuilder();

arguments.addArgument("verbosity", "INFO");

arguments.addVCF(new File(inputFile));
arguments.addReference(new File("/Users/jonn/Development/references/Homo_sapiens_assembly19.fasta"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See next comment.

@@ -235,6 +235,8 @@ public void spotCheck() throws IOException {

final ArgumentsBuilder arguments = new ArgumentsBuilder();

arguments.addArgument("verbosity", "INFO");

arguments.addVCF(new File("/Users/jonn/Development/M2_01115161-TA1-filtered.vcf"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove all references to /Users/jonn/Development/ etc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is intermediate and planned to be phased out.

It is checked in so that I can run my local test data and not have to worry about rewriting tests every time I need to debug something.

I added the flag doDebugTests to control for this and a unit test to ensure that doDebugTests is always set to false for the tests to pass.

I want to leave these in for now so I can continue to test locally with different settings easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why can't you put these tests into the source tree and then remove if/when no longer needed?
You are asking to put these into master by issuing this PR and we cannot cut a release with these tests -- and master needs to always be release-ready.

There should be an alternative way of doing this.

final Path p = IOUtils.getPath(pathString);
if ( !isValidDirectory(p) ) {
logger.warn("WARNING: Given path is not a valid directory: " + p.toUri().toString());
logger.error("ERROR: Given data source path is not a valid directory: " + p.toUri().toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that you have a lot of error messages that do not throw exceptions. Is this correct behavior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an error to have an invalid data source path.

You're right - I'll throw an exception here.

String line = reader.readLine();
while ((line != null) && ((version == null) || (source == null) || (alternateSource == null))) {

if (version == null && line.startsWith(README_VERSION_LINE_START)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have a README and a config file? Can't we merge those?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LeeTL1220 I don't understand... there is no config file for the data sources directory - only a README file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... I thought this code was being run for each datasource. Not the datasources as a whole...

version = versionMajor + "." + versionMinor + "." + versionYear + "" + versionMonth + "" + versionDay;
}
else {
logger.warn("README file has improperly formatted version string: " + line);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README files typically have minimal to zero formatting. This would indicate that we have a stricter format. Is there a good reason not to merge with the config file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LeeTL1220 There is no config file for a data sources directory - only a README file.

I guess we can call it a config file, but then no one is likely to read it and it provides information to anyone who would download / use the data sources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave w/ README. Again, I thought this was being run for each datasource. Not once for all datasources.

@LeeTL1220
Copy link
Contributor

@jonn-smith Back to you.

@jonn-smith jonn-smith removed the request for review from lbergelson March 27, 2018 15:01
@jonn-smith
Copy link
Collaborator Author

OK, @LeeTL1220 - First round comments addressed.

Copy link
Contributor

@LeeTL1220 LeeTL1220 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonn-smith I do not see how we can leave in hardocded paths in master branch. Let's discuss alternatives.

String line = reader.readLine();
while ((line != null) && ((version == null) || (source == null) || (alternateSource == null))) {

if (version == null && line.startsWith(README_VERSION_LINE_START)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... I thought this code was being run for each datasource. Not the datasources as a whole...

version = versionMajor + "." + versionMinor + "." + versionYear + "" + versionMonth + "" + versionDay;
}
else {
logger.warn("README file has improperly formatted version string: " + line);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave w/ README. Again, I thought this was being run for each datasource. Not once for all datasources.

@@ -235,6 +235,8 @@ public void spotCheck() throws IOException {

final ArgumentsBuilder arguments = new ArgumentsBuilder();

arguments.addArgument("verbosity", "INFO");

arguments.addVCF(new File("/Users/jonn/Development/M2_01115161-TA1-filtered.vcf"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why can't you put these tests into the source tree and then remove if/when no longer needed?
You are asking to put these into master by issuing this PR and we cannot cut a release with these tests -- and master needs to always be release-ready.

There should be an alternative way of doing this.

@jonn-smith
Copy link
Collaborator Author

@LeeTL1220 that's totally fair. There are a few problems with putting in the tests, though.

I don't know that I can publicly release the data that are being tested because of their provenance (pretty sure I can't).

The second reason is that the whole reference sequences are not checked into git lfs. Without it I can't annotate on a set of variants from around the whole genome, and the whole reference is pretty big.

The last thing I can think of is a data source problem - I don't have all the data sources checked into git lfs (for similar reasons, especially now with dbSNP in there).

Also, these have already made their way into master. So if this is really a problem, we should get them out of there ASAP.

@LeeTL1220
Copy link
Contributor

@jonn-smith Let's discuss offline. Yes, let's get these out of master.

@jonn-smith jonn-smith force-pushed the jts_funcotator_hg38_error_4521 branch from 1da4c11 to 0134757 Compare March 28, 2018 20:41
Fixed NullPointerException when onTraversalStart throws.
Now logs version number information for data sources.
Now checks minimum data sources version and requires that the user
upgrade if the version is too old.
@jonn-smith jonn-smith force-pushed the jts_funcotator_hg38_error_4521 branch from 0134757 to 5d85137 Compare March 29, 2018 14:57
@LeeTL1220 LeeTL1220 assigned jonn-smith and unassigned LeeTL1220 Mar 30, 2018
Now dbSNP uses the `All` datasource, not the `common` data source.

Refactored largeDataValidationTest to use an environment variable to
point to the test data files.
funcotatorValidation tests no longer run by default.
Need to enable these tests in jenkins.
Had to change the expected data because of new mappings from data source
fields to MAF output fields.
@jonn-smith jonn-smith merged commit 2db999e into master Apr 6, 2018
@jonn-smith jonn-smith deleted the jts_funcotator_hg38_error_4521 branch April 6, 2018 21:08
cwhelan pushed a commit to cwhelan/gatk-linked-reads that referenced this pull request May 25, 2018
Added more logging functionality to better inform users.

Fixed NullPointerException when onTraversalStart throws.

Now logs version number information for data sources.

Now checks minimum data sources version and requires that the user
upgrade if the version is too old.

Fixed a bug that caused __UNKNOWN__ annotations.

Refactored the spot checks into largeDataValidationTest.

Now dbSNP uses the `All` datasource, not the `common` data source.

Refactored largeDataValidationTest to use an environment variable to
point to the test data files.

Moved funcotatorValidation tests into their own group; funcotatorValidation tests no longer run by default.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Funcotator hg38 data source not working; configuration files contains reference to files not present
4 participants