Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gene info parser fix #6754

Merged
merged 3 commits into from Nov 27, 2019
Merged

Gene info parser fix #6754

merged 3 commits into from Nov 27, 2019

Conversation

SRodenburg
Copy link

I ran into some problems while building a hg38 seed database:

Sometimes the file Homo_sapiens.gene_info contains missing values in the map_location and/or chromosome columns. This file is downloaded from NCBI and required when updating genes and gene aliases.
Additionally, sometimes a particular symbol from the GenCode annotation file is not in the info file, causing a NullPointerException.

Changes:

  • When the chromosome can't be parsed from the map_location column, try the chromosome column. If that also fails, skip gene.
  • prevent NullPointerException when the gene symbol can't be found, and instead showing a warning, and skipping the gene.

@SRodenburg SRodenburg changed the base branch from master to rc October 29, 2019 08:47
int referenceGenomeId = DaoReferenceGenome.getReferenceGenomeByBuildName(genomeBuild).getReferenceGenomeId();
String desc = parts[8];
String type = parts[9];
String mainSymbol = parts[10]; // use 10 instead of 2 since column 2 may have duplication
Set<String> aliases = new HashSet<String>();

// try to get chr from other column if needed
if (chr.equals("-")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combine two ifs into one, so the second condition will not be evaluated if first one is false.

       if (chr.equals("-") && (!parts[6].equals("-")) {
              chr = parts[6];
       } else {
              // skip line if still unable to parse chr
             continue;
      }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unittests revealed a problem with your proposed implementation; the code will go straight to continue when chr does not equal "-", which is obviously not what we want.

A "synonymous" implementation would be:

if (chr.equals("-") && parts[6].equals("-")) {
    continue
} 
if (chr.equals("-")) {
    chr = parts[6];
}

However, this requires 2 logical tests while my original implemetation for most of the lines requires 1, thus would be better performance-wise.

@SRodenburg SRodenburg changed the base branch from rc to master October 30, 2019 09:34
@SRodenburg SRodenburg requested a review from khzhu November 4, 2019 12:33
@SRodenburg
Copy link
Author

@inodb can you merge?

@inodb inodb merged commit 8a953f7 into cBioPortal:master Nov 27, 2019
@inodb inodb added the bug label Nov 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants