Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script that allows for update to existing gene panel #7695

Merged
merged 1 commit into from
Aug 3, 2020

Conversation

jtquach1
Copy link
Member

@jtquach1 jtquach1 commented Jul 16, 2020

Fix #7286

Describe changes proposed in this pull request:

  • Update functionality
  • Add or remove genes from gene panel depending on list of genes in incoming vs. current panel

Any screenshots or GIFs?

Before:
sql_before

After:
sql_after

@jtquach1 jtquach1 self-assigned this Jul 16, 2020
@jtquach1
Copy link
Member Author

jtquach1 commented Jul 16, 2020

These are some sketches I made regarding edge cases in comparing incoming with original genes.
updateGenePanel_cases.pdf

So far, I've tested intersection case 1 on pages 1 and 2, where A = imported (test panel to overwrite with) and B = original (test panel from database).
image
image

The first image corresponds to the Before and After in the PR's top comment. For the second image, I tested this using the TESTPANEL2 text file against TESTPANEL2 (unchanged).

@jtquach1
Copy link
Member Author

jtquach1 commented Jul 16, 2020

I've just tested cases 1-7, and the only changes I made were for case 1, where having an empty gene list was disallowed for importing. I changed that so case 1 allows the user to delete all genes from the original gene panel when importing an empty gene panel.
case_testing.pdf

@jtquach1 jtquach1 changed the title First draft of updating existing gene panel script Add script that allows for update to existing gene panel Jul 17, 2020
}
}
}
/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hunch that Windows is changing the newline characters in this file, hence the weird diff. I'll look around for solutions.

Copy link
Member Author

@jtquach1 jtquach1 Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so too, as I get this message from doing commits:

jtquach@DESKTOP-IST3H2B:~/cbioportal_windows10$ git add core/src/main/java/org/mskcc/cbio/portal/scripts/UpdateGenePanel.java
warning: LF will be replaced by CRLF in core/src/main/java/org/mskcc/cbio/portal/scripts/UpdateGenePanel.java.
The file will have its original line endings in your working directory

core/src/test/resources/seed_mini.sql Outdated Show resolved Hide resolved
Copy link
Member

@Luke-Sikina Luke-Sikina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I left a few tiny changes, but for the most part, I think this is ready to ship.

@jtquach1 jtquach1 force-pushed the update_existing_gene_panel branch 5 times, most recently from c395620 to 075eb15 Compare July 24, 2020 18:41
Copy link
Contributor

@n1zea144 n1zea144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup to existing code. Some comments for consideration.

GenePanel genePanel = DaoGenePanel.getGenePanelByStableId(stableId);

if (genePanel == null) {
ProgressMonitor.logWarning("Gene panel " + stableId + " does not exist in the database! Exiting.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if there was any thought to combining the update functionality into the Import script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Luke, the import and update functionality should be separate from each other

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n1zea144 Leaving them as separate scripts makes it a bit harder to make the mistake of updating an existing gene panel when you instead meant to upload a new version of a gene panel for new samples. I think this is an easy enough mistake- the user copies the gene panel file, vims it, updates the genes, but forgets to change the name of the panel before running the script. By keeping the two scripts separate, we make it so a user that ignores warning messages and just presses enter a bunch cannot end up making this mistake.

@jtquach1 jtquach1 force-pushed the update_existing_gene_panel branch 3 times, most recently from 524a21e to d2b14f7 Compare July 28, 2020 17:49
@jtquach1 jtquach1 requested a review from n1zea144 July 28, 2020 20:07
pstmt.setLong(2, canonicalGene.getEntrezGeneId());
pstmt.executeUpdate();
}
for (CanonicalGene canonicalGene : toRemove) {
Copy link
Contributor

@n1zea144 n1zea144 Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like its possible to remove all genes from the panel. Should we check this and if so, should we prevent this from happening ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In UpdateGenePanel.java, the user prompt in importData() verifies the changes that will be made to the gene panel, like so:
image
Other than that, I asked Luke if importing an empty gene panel should allow for the removal of all existing genes, and he stated that behavior sounds reasonable. I suppose if the user wanted to add back the genes, they could pull up an older version of their input file with the gene names.
I'll ping Luke about whether after all genes are removed, that the gene panel should be removed from gene_panel_list as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, so removal of all genes means stop the update script and don't make the deletions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not meaningful to have a panel in the database, which samples refer to in sample_profile, that contain no genes. I'm also not sure what the affect is on the web page. One approach is to remove the the panel and subsequent records in sample_profile (via cascade), but thats a bit messy and could happen by mistake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with @Luke-Sikina via Slack. Best approach is not to prevent an update which results in empty panel in database.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, does that mean that within UpdateGenePanel, checking if the incoming set of genes is empty and aborting the script is not needed?

Copy link
Member Author

@jtquach1 jtquach1 Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added the check on lines 77-80 on UpdateGenePanel, after the gene panel null check:

        if (genePanel == null) {
            ProgressMonitor.logWarning("Gene panel " + stableId + " does not exist in the database! Exiting.");
            return;
        }

        if (canonicalGenes == null || canonicalGenes.isEmpty()) {
            ProgressMonitor.logWarning("Incoming gene panel is empty, which would result in the removal of all genes from gene panel " + stableId + ". Exiting.");
            return;
        }

(edited since I realized to use ProgressMonitor.logWarning instead of System.err.println)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, checking/aborting when the incoming gene list from the panel file is empty should suffice, thanks for the adjustment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reviews!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just updated the markdown file again to reflect that trying to import an empty gene panel will abort the script.

@jtquach1 jtquach1 force-pushed the update_existing_gene_panel branch 2 times, most recently from 0468141 to 83c7b91 Compare July 30, 2020 16:57
@jtquach1 jtquach1 requested a review from n1zea144 July 30, 2020 18:35
pstmt.setLong(2, canonicalGene.getEntrezGeneId());
pstmt.executeUpdate();
}
for (CanonicalGene canonicalGene : toRemove) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, checking/aborting when the incoming gene list from the panel file is empty should suffice, thanks for the adjustment.

Added UpdateGenePanel, based off ImportGenePanel
Refactored and moved identical methods into new util file
Updating allows user to add/remove genes from existing gene panel in database
Allow empty gene panel to be inputted, but abort script if so
Prompt user with Y/n to notify that this is an update and not import script
Preview genes to be added/removed before finalizing Y/n prompt
Unit tests for utility methods
Updated ImportGenePanel calls
Formatted code for readability
Added javadocs
Added null check in extractPropertyValue
Added 12 testpanel genes and their aliases for testing
Added Perl script and updated markdown documentation
Edited markdown documentation on how to use the update script

Co-authored-by: Luke Sikina <lucas.sikina@gmail.com>
@Luke-Sikina Luke-Sikina merged commit ec73d82 into cBioPortal:master Aug 3, 2020
@inodb inodb added the feature label Aug 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

There is no script for updating an existing gene panel.
4 participants