New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support multiple reference genomes 2nd try #5891
support multiple reference genomes 2nd try #5891
Conversation
90b1416
to
354c316
Compare
Hi Kelsey. I'm going to start reviewing this. It looks good, but it is pretty big. Could you give a few suggestions on what would be the best classes / configuration to start looking at to understand the important changes? (Advice on how to navigate through the PR and understand the important stuff early in the process) Thanks. |
I see several files with name "pom.xml.new-version" checked in. Aren't these artifacts of the maven build script? I think these can be dropped from the PR. |
I see this image file added, but no references to it. I wonder... is there another PR for frontend changes connected to this work? |
@sheridancbio , oops, should not be included, will remove it. Thanks! |
@sheridancbio , yes, they should be removed as well. I used "git add -A" due to a large number of files changed, that picked some unwanted files as well. Sorry! |
354c316
to
bc9fb5a
Compare
@sheridancbio , extra files removed and push the change back. |
bc9fb5a
to
5465e10
Compare
currently there are three service module unit tests failing in travis-ci: |
@sheridancbio , will take a look. The PR passed all unit testing when I was testing. thanks! |
5465e10
to
7ac6ff4
Compare
@sheridancbio , pushed the change back, should fix those failed unit testing. Thanks! |
hi @sheridancbio , please find updated schema diagram. I will update the code once I made changes. Thanks! |
7ac6ff4
to
145e590
Compare
Hi @sheridancbio , pushed the change back and is ready for the review. Thanks! |
145e590
to
36d5b5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far so good. This is only a partial review, covering the core dao persistence classes. I will continue a pass through the entire code before making change requests.
I make lots of comments while doing code review ... not every comment needs to be addressed - sometimes I am just making notes so that I understand the intended changes, and my own thought process as I come to understand the new features. But I have raised some issues that we probably should talk about soon too.
business/src/main/resources/org/mskcc/cbio/portal/persistence/GenePanelMapperLegacy.xml
Show resolved
Hide resolved
business/src/main/resources/org/mskcc/cbio/portal/persistence/StudyMapperLegacy.xml
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/dao/DaoCancerStudy.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/dao/DaoReferenceGenomeGene.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/dao/DaoReferenceGenomeGene.java
Show resolved
Hide resolved
core/src/main/java/org/mskcc/cbio/portal/dao/DaoReferenceGenomeGene.java
Outdated
Show resolved
Hide resolved
@@ -56,7 +56,7 @@ | |||
*/ | |||
public static DataSource getDataSource() { | |||
if (dataSource == null) { | |||
dataSource = new JdbcDataSource(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be careful here. There is code which wires beans appropriately during spring startup for determination of the data source based on portal.properties and application-context files. We use a shared data source within tomcat when running multiple instances of cBioPortal within a single tomcat application. It seems like this code change may hardwire a connection pool through the apache BasicDataSource library onto each running tomcat web application.
We will do some testing of this. Maybe @n1zea144 can comment on this code change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good points, if you need to run multiple portal instances in a single tomcat container. Will revert the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sheridancbio , only with one change made to JdbcDataSource constructor, to replace hard code mysql driver class name with value read from portal.properties file as the following:
from
this.setDriverClassName("com.mysql.jdbc.Driver");
to
String mysqlDriverClassName = dbProperties.getDbDriverClassName();
this.setDriverClassName(mysqlDriverClassName);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder to self : let's test this code in tomcat
@sheridancbio , thank you so much for the feedback. I will go through each of your comments/suggestions over the weekend. |
@@ -120,7 +121,7 @@ public void createAlterationEnrichments() throws Exception { | |||
AlterationEnrichment alterationEnrichment2 = result.get(1); | |||
Assert.assertEquals((Integer) 3, alterationEnrichment2.getEntrezGeneId()); | |||
Assert.assertEquals("HUGO3", alterationEnrichment2.getHugoGeneSymbol()); | |||
Assert.assertEquals("CYTOBAND3", alterationEnrichment2.getCytoband()); | |||
Assert.assertEquals(null, alterationEnrichment2.getCytoband()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @inodb , that is right. for alterations we will have to fill out cytobands in the frontend, since alterations have no reference genome information associated with them. We will have to rely on the reference genome information of the study to backfill cytobands in the frontend.
Hi @ao508 @sheridancbio @pieterlukasse @inodb , |
3948ce9
to
d2fec9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made some additional (minor) requests. let me know if you have any questions.
@@ -82,10 +82,10 @@ public void importData() throws IOException, DaoException { | |||
|
|||
String geneAId = parts[0]; | |||
|
|||
CanonicalGene geneA = daoGene.getNonAmbiguousGene(geneAId, null); | |||
CanonicalGene geneA = daoGene.getNonAmbiguousGene(geneAId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khzhu Just wanted to clarify a bit more - the call daoGene.getNonAmbiguousGene(geneAId, null);
--> daoGene.getNonAmbiguousGene(geneAId)
isn't consistent with other calls that were modified in this pr, such as the one below at L88. Why was the second parameter ignored above but set to true
in the call below?
|
||
#Set defaults for genome version and species | ||
self.__species = 'human' | ||
self.__ncbi_build = '37' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should 37
be changed to GRCh37
to be consistent with how we refer to ncbi builds in the java code? see here
@@ -396,12 +396,21 @@ CREATE TABLE `reference_genome` ( | |||
UNIQUE INDEX `BUILD_NAME_UNIQUE` (`BUILD_NAME` ASC) | |||
); | |||
|
|||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unmerged file conflict
INSERT INTO `reference_genome` | ||
VALUES (1, 'human', 'hg19', 'GRCh37', NULL, 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips', '2009-02-01'); | ||
INSERT INTO `reference_genome` | ||
VALUES (2, 'human', 'hg38', 'GRCh38', NULL, 'http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips', '2013-12-01'); | ||
INSERT INTO `reference_genome` | ||
VALUES (3, 'mouse', 'mm10', 'GRCm38', NULL, 'http://hgdownload.cse.ucsc.edu//goldenPath/mm10/bigZips', '2012-01-01'); | ||
>>>>>>> resolving code review comments/suggestions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another conflict
@@ -620,7 +620,7 @@ The extended MAF format recognized by the portal has: | |||
1. **Hugo_Symbol (Required)**: A [HUGO](http://www.genenames.org/) gene symbol. | |||
2. **Entrez_Gene_Id (Optional, but recommended)**: A [Entrez Gene](http://www.ncbi.nlm.nih.gov/gene) identifier. | |||
3. **Center (Optional)**: The sequencing center. | |||
4. **NCBI_Build (Optional)<sup>1</sup>**: Must be "GRCh37" for human, and "GRCm38" for mouse. | |||
4. **NCBI_Build (Required)<sup>1</sup>**: The Genome Reference Consortium Build is used by a variant calling software. It must be "GRCh37" or "GRCh38" for a human, and "GRCm38" for a mouse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khzhu we seem to refer to the NCBI builds as GRCh*
but the python scripts just refer (or expect) the actual numbers. These should match. I think the python scripts should be changed to follow GRCh*
for consistency
docs/Using-the-dataset-validator.md
Outdated
-ucsc UCSC_BUILD_NAME, --ucsc_build_name UCSC_BUILD_NAME | ||
UCSC reference genome assembly name (default: assumed hg19) | ||
-ncbi NCBI_BUILD_NUMBER, --ncbi_build_number NCBI_BUILD_NUMBER | ||
NCBI reference genome build number (default: assumed 37 for UCSC reference genome build hg19) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use GRCh*
species information (default: assumed human) | ||
-ucsc UCSC_BUILD_NAME, --ucsc_build_name UCSC_BUILD_NAME | ||
UCSC reference genome assembly name (default: assumed hg19) | ||
-ncbi NCBI_BUILD_NUMBER, --ncbi_build_number NCBI_BUILD_NUMBER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here
@@ -62,7 +62,7 @@ | |||
AlterationEnrichment alterationEnrichment = new AlterationEnrichment(); | |||
alterationEnrichment.setEntrezGeneId(gene.getEntrezGeneId()); | |||
alterationEnrichment.setHugoGeneSymbol(gene.getHugoGeneSymbol()); | |||
alterationEnrichment.setCytoband(gene.getCytoband()); | |||
//alterationEnrichment.setCytoband(gene.getCytoband()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove this if it's not going to be used or is there a plan to replace it with the reference_genome_gene.CYTOBAND? If so then can you add a TODO here?
@ao508 , replace those 37 with GRCh37 in validator documents. Also, removed the comment. For alterations, cytobands will be filled by the frontend react module. |
d2fec9d
to
eade653
Compare
use MySql reserved keywords with a quote adjust column sizes in mutation_event table to be compatible with UTF8 add new options (species and reference genome build) to PYTHON importer/validate scripts check reference_genome_id in seg file and NCBI_Build (if filled) in MAF file see if agree with reference genome in cancer study meta file add a new API endpoint to fetch reference genome genes by entrez gene ids and hugo gene symbols
- structural variant seed data - db version to 2.11.0
use MySql reserved keywords with a quote adjust column sizes in mutation_event table to be compatible with UTF8 add new options (species and reference genome build) to PYTHON importer/validate scripts check reference_genome_id in seg file and NCBI_Build (if filled) in MAF file see if agree with reference genome in cancer study meta file add a new API endpoint to fetch reference genome genes by entrez gene ids and hugo gene symbols
eade653
to
a46a54a
Compare
What? Why?
Fix
#5652
Changes proposed in this pull request:
Checks
Any screenshots or GIFs?
If this is a new visual feature please add a before/after screenshot or gif
here with e.g. GifGrabber.