-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC79: Incremental Upload of Data Entries #48
base: main
Are you sure you want to change the base?
Conversation
To make the dataset look like real data in the database
Apperently, the flag does not change anything. But we add it anyway as the tests for "incremental" data upload.
adding to the all case list and case list specified with command arguments is supported
From case lists that is not _all case list and not specified with --add-to-case-lists option
with a reference to the ticket
We changed them to work for the demo. Mutation numbers did not change on demo.
Not it was easy to be confused where sample and clinical_sample (attributes), patient and clinical_patient (attributes) related code
This flag for command to upload molecular profile data
by removing them for the whole genetic profile at once. one sql statment instead of N
the confusion that triggered the change: The use of super. indicates that the subclass also declares one with the same name, but you are trying to not set that somehow?
(1/7) RFC 79: Implement incremental upload of tab delimited data.
(2/7) RFC79: Make study_es_0_inc data pass validation
(3/7) RFC79: Implement incremental upload for timeline data
(4/7) RFC79: Implement incremental upload of CNA DISCRETE long data
(5/7) RFC79: Implement incremental upload for gene panel matrix
(6/7) RFC79: Implement incremental upload of structural variants data
(7/7) RFC79: Implement incremental upload of CNA segmented data
RFC79: Implement feedback from previous 7 PRs
int listListRow = addSampleListList(sampleList.getCancerStudyId(), listId, sampleList.getSampleList(), con); | ||
rows = (listListRow != -1) ? (rows + listListRow) : rows; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when will addSampleListList
return -1 ? It looks like this scenario was removed from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done e795449
this.mutationFile = mutationFile; | ||
this.geneticProfileId = geneticProfileId; | ||
this.swissprotIsAccession = false; | ||
this.genePanel = genePanel; | ||
this.genePanelId = (genePanel == null) ? null : GeneticProfileUtil.getGenePanelId(genePanel);; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double ;;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
line = buffer.readLine().trim(); | ||
} | ||
DaoSampleProfile.upsertSampleProfiles(internalSampleIds, geneticProfileId, genePanelId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think upsertSampleProfileRecords is a better name. Or upsertSampleToProfileRecords , or upsertSampleVsProfileRecords. or upsertSampleToProfileMapping . "Sampleprofiles" is confusing as it sounds like it is referring to the complete sample profile (i.e. the mutation data itself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upsertSampleToProfileMapping is perhaps the best name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like upsertSampleToProfileMapping
Renamed here f120f5d
} | ||
} else if( | ||
geneticProfile.getGeneticAlterationType() == GeneticAlterationType.COPY_NUMBER_ALTERATION | ||
&& DISCRETE_LONG.name().equals(geneticProfile.getDatatype()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why geneticProfile.getDatatype()
-> geneticProfile.getOtherMetaDataField("datatype")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DISCRETE_LONG
ends up in the database as DISCRETE
at the end.
In the case when the genetic profile has been loaded to the database previously, we reuse the platform and we want to access how the datatype has been defined in the file with geneticProfile.getOtherMetaDataField("datatype")
instead of how it is stored in the database geneticProfile.getDatatype()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this class is not doing anything. Let's remove it completely, perhaps in a follow-up PR, together with all other code matching "ImportSif"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is separate issue for that #49
} | ||
|
||
private Map<Integer, String> mapWithFileOrderedSampleList(String[] values) { | ||
return ArrayUtil.zip(fileOrderedSampleList.toArray(new Integer[0]), values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe Integer[]::new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 602cc24
String caseListIds = properties.getProperty("case_list_ids"); | ||
List<String> sampleIds = caseListIds == null ? List.of() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think case_list_ids is mandatory, so we should throw error if null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done in CaseListValidator. The reader is not concerned with what fields are compulsory in the current design.
* is the line has some data | ||
* e.g. blank line and comments do not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments seem incomplete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made complete here 28dfa05
/** | ||
* Updates case list sample ids from clinical sample and case list files | ||
*/ | ||
public void run() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this UpdateCaseListsSampleIds
class and specially updateCaseLists
deserve more extended comments, as it is not immediately obvious what it is supposed to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are comments for updateCaseLists
28dfa05
Please let me know which other pieces of code require more explanation in comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @forus . Please take a look at my final comments.
Review note: I've skipped over the test files.
method in DaoSampleProfile
@pieterlukasse Thanks! Please have another look. |
See also RFC79
The solution involves extending the current
metaImport.py
script and java data loader commands with additional flags to support the incremental upload of entries. This approach allows users to add patients, samples, and molecular data without having to reupload the entire study.Read more in docs cBioPortal/cbioportal#10816
To review in logical parts and see previous discussions about the implementation, you might want to check the chain of closed PRs this PR consists of: