Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC79: Incremental Upload of Data Entries #48

Open
wants to merge 128 commits into
base: main
Choose a base branch
from
Open

RFC79: Incremental Upload of Data Entries #48

wants to merge 128 commits into from

Conversation

forus
Copy link
Contributor

@forus forus commented Jun 19, 2024

See also RFC79

The solution involves extending the current metaImport.py script and java data loader commands with additional flags to support the incremental upload of entries. This approach allows users to add patients, samples, and molecular data without having to reupload the entire study.

Read more in docs cBioPortal/cbioportal#10816

To review in logical parts and see previous discussions about the implementation, you might want to check the chain of closed PRs this PR consists of:

forus added 30 commits April 14, 2024 13:02
To make the dataset look like real data in the database
Apperently, the flag does not change anything.
But we add it anyway as the tests for "incremental" data upload.
adding to the all case list and case list specified with command arguments is supported
From case lists that is not _all case list and not specified with --add-to-case-lists option
We changed them to work for the demo.
Mutation numbers did not change on demo.
Not it was easy to be confused where sample and clinical_sample (attributes),
patient and clinical_patient (attributes) related code
This flag for command to upload molecular profile data
forus added 10 commits June 19, 2024 14:33
by removing them for the whole genetic profile at once.

one sql statment instead of N
the confusion that triggered the change: The use of super. indicates that the subclass also declares one with the same name, but you are trying to not set that somehow?
(1/7) RFC 79: Implement incremental upload of tab delimited data.
(2/7) RFC79: Make study_es_0_inc data pass validation
(3/7) RFC79: Implement incremental upload for timeline data
(4/7) RFC79: Implement incremental upload of CNA DISCRETE long data
(5/7) RFC79: Implement incremental upload for gene panel matrix
(6/7) RFC79: Implement incremental upload of structural variants data
(7/7) RFC79: Implement incremental upload of CNA segmented data
RFC79: Implement feedback from previous 7 PRs
Comment on lines 69 to 70
int listListRow = addSampleListList(sampleList.getCancerStudyId(), listId, sampleList.getSampleList(), con);
rows = (listListRow != -1) ? (rows + listListRow) : rows;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when will addSampleListList return -1 ? It looks like this scenario was removed from there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done e795449

this.mutationFile = mutationFile;
this.geneticProfileId = geneticProfileId;
this.swissprotIsAccession = false;
this.genePanel = genePanel;
this.genePanelId = (genePanel == null) ? null : GeneticProfileUtil.getGenePanelId(genePanel);;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double ;;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
line = buffer.readLine().trim();
}
DaoSampleProfile.upsertSampleProfiles(internalSampleIds, geneticProfileId, genePanelId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think upsertSampleProfileRecords is a better name. Or upsertSampleToProfileRecords , or upsertSampleVsProfileRecords. or upsertSampleToProfileMapping . "Sampleprofiles" is confusing as it sounds like it is referring to the complete sample profile (i.e. the mutation data itself).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upsertSampleToProfileMapping is perhaps the best name...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like upsertSampleToProfileMapping Renamed here f120f5d

}
} else if(
geneticProfile.getGeneticAlterationType() == GeneticAlterationType.COPY_NUMBER_ALTERATION
&& DISCRETE_LONG.name().equals(geneticProfile.getDatatype())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why geneticProfile.getDatatype() -> geneticProfile.getOtherMetaDataField("datatype") ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DISCRETE_LONG ends up in the database as DISCRETE at the end.

In the case when the genetic profile has been loaded to the database previously, we reuse the platform and we want to access how the datatype has been defined in the file with geneticProfile.getOtherMetaDataField("datatype") instead of how it is stored in the database geneticProfile.getDatatype()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class is not doing anything. Let's remove it completely, perhaps in a follow-up PR, together with all other code matching "ImportSif"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is separate issue for that #49

}

private Map<Integer, String> mapWithFileOrderedSampleList(String[] values) {
return ArrayUtil.zip(fileOrderedSampleList.toArray(new Integer[0]), values);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe Integer[]::new

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 602cc24

Comment on lines +28 to +29
String caseListIds = properties.getProperty("case_list_ids");
List<String> sampleIds = caseListIds == null ? List.of()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think case_list_ids is mandatory, so we should throw error if null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's done in CaseListValidator. The reader is not concerned with what fields are compulsory in the current design.

Comment on lines 9 to 10
* is the line has some data
* e.g. blank line and comments do not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments seem incomplete

Copy link
Contributor Author

@forus forus Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made complete here 28dfa05

Comment on lines +63 to +66
/**
* Updates case list sample ids from clinical sample and case list files
*/
public void run() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this UpdateCaseListsSampleIds class and specially updateCaseLists deserve more extended comments, as it is not immediately obvious what it is supposed to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are comments for updateCaseLists 28dfa05

Please let me know which other pieces of code require more explanation in comments

Copy link
Member

@pieterlukasse pieterlukasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @forus . Please take a look at my final comments.

Review note: I've skipped over the test files.

@forus
Copy link
Contributor Author

forus commented Jun 21, 2024

@pieterlukasse Thanks! Please have another look.

@forus forus requested a review from pieterlukasse June 21, 2024 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants