RFC79: Incremental Upload of Data Entries #48
Open: forus wants to merge 128 commits into main from rfc79
+5,432
−1,391
Changes from 122 commits
Commits
5dfe298
Add clinical_attribute_meta records to the seed mini
forus 531b10a
Implement sample attribute rewriting flag
forus 248a08c
Add --overwrite-existing for the rest of test cases
forus 2bc7271
Test that mutations stay after updating the sample attributes
forus 31e3194
Add overwrite-existing support for mutations data
forus bd023a9
Fix --overwrite-existing flag description for importer of profile data
forus c49bbf3
Add loader command to update case list with sample ids
forus 1f5695d
Add option to remove sample ids from the remaining case lists
forus 77cd6a8
Make removing sample ids from not mentioned case lists a default beha…
forus bd8c4b2
Make update case list command to read case lists files
forus 5fc633b
Fix test clinical data headers
forus f7132c9
Test incremental patient upload
forus f45e1e8
Add flag to reload patient clinical attributes
forus 8cc95a0
Add TODO comment to remove MIXED_ATTRIBUTES data type
forus fa32b7f
WIP adopt py script to incremental upload
forus f044c3b
Fix java.sql.SQLException: Generated keys not requested
forus 48fca03
Clean alteration_driver_annotation during mutations inc. upload
forus 1302a8e
Fix validator and importer py scripts for inc. upload
forus 659f352
Add test/demo data for incremental loading of study_es_0 study
forus b5952e3
Rename and move incremental tests to incementalTest folder
forus 753119b
Update TODO comment how to deal with multiple sample files
forus 5725d42
Move study_es_0_inc to the new test data folder
forus 299466a
Fix removing patient attributes on samples inc. upload
forus c0c28e2
Change study_es_0_inc to contain more diverse data
forus c6eddbb
Specify that data_directory for incremental data
forus 595d24f
Disambiguate clinical data constants names
forus c8b4c73
Remove not necessary TODO comments
forus efd34d8
Remove MSK copyright mistakenly copy-pasted
forus 3b39e0d
Fix comment of UpdateCaseListsSampleIds.run() method
forus fc785f6
Make --overwrite-existing flag description more generic
forus e782951
Add TODO comments for possible reuse of the code
forus b53c8c4
Update case lists for multiple clinical sample files
forus 99550b5
Extract and reuse common logic to read and validate case lists
forus 1829842
Fix TestIntegrationTest
forus e785a53
Revert RESOURCE_DEFINITION_DICTIONARY initialisation to empty set
forus e09e1e2
Minor improvements. Apply PR feedback
forus 7b527b6
Make tests fail the build. Conduct exit status of tests correctly
forus f5e8217
Write Validation complete only in case of successful validation
forus 8d3aaed
Add python tests for incremental/full data import
forus 1b6ba41
Add unit test for incremental data validation
forus d252001
Test rough order of importer commands. Remove sorting in the script t…
forus c27b8f1
Extract smaller functions from the big one in py script
forus 2e80b73
Merge pull request #32 from se4bio/inc-data-upload-poc
forus b2c1c21
Refactor tab delim. data importer
forus a7aab3a
Implement incremental upload of mRNA data
forus bd2d8c1
Add RPPA test
forus 8b68331
Add normal sample to the test data to test skipping
forus b18aab1
Add rows with more columns than in header to skip
forus ea688c3
Skip rows that don't have enough sample columns
forus cdae501
Test for invalid entrez id
forus cf458a4
Extract common code from inc. tab. delim. tests
forus 9ea1ada
Implement incremental upload of cna data via tab. delim. loader
forus 03f9660
Blanken values for genes not mentioned in the file
forus 93cc6ff
Remove unused code
forus 842bcd3
Throw unsupported operation exception for GENESET_SCORE incremental u…
forus 22b688a
Add generic assay data incremental upload test
forus d11a353
Fix integration tests
forus 7dfb1bd
Make tab. delimiter data uploader transactional
forus 71cdf70
Check for illegal state in tab delim. data update
forus 2d31dac
Wire incremental tab delim. data upload to cli commands
forus 4997542
Expand README with section on how to run incremental upload
forus 911ae28
Address TODOs in tab delim. importer
forus c7343f9
Add more data types to incremental data upload folder
forus 2ed0bd8
Remove obsolete TODO comment
forus 76b52a9
Reuse genetic_profile record if it exists in db already
forus fa16076
Test incremental upload of tab delim. data types from umbrella script
forus e5ccc3e
Move counting lines if file inside generic assay patient level data u…
forus 472f47e
Give error that generic assay patient level data is not supported
forus c54e303
Clean sample_cna_event despite whether it has alteration_driver_annot…
forus 18dbdd3
Fix cbioportalImport script execution
forus c702a8b
Remove not needed spring context initialisation
forus 0ff7031
Make error message more informative when gene panel is not found
forus 54cc04e
Add more genes to the mini seed to load study_es_0
forus a022aab
Make study_es_0_inc data pass validation
forus 90cc928
Document in README how to load study_es_0 study
forus fb75d7c
Implement incremental upload for timeline data
forus 3331223
Implement incremental upload of CNA DISCRETE long data
forus d7e1918
Add data type sanity check for tsv uploaded
forus ee183e6
Move storing/dedup logic of genetic alteration values to importer
forus 697631f
Move all inc. upload logic for tab delim. data types to GeneticAltera…
forus 65c8b11
Add CNA DISCRETE LONG to study_es0_inc test dataset
forus 0bf6bf2
Remove unused code
forus cc80e56
Make validation to pass for CNA long and study_es_0_inc data
forus 4070e68
Implement incremental upload for gene panel matrix
forus e8bbb34
Make validation of study_es_0_inc data to pass
forus feed06c
Implement incremental upload of structural variants data
forus bea4987
Implement incremental upload of CNA segmented data
forus 0cdda9d
Make it explicit that timeline uploader support bulk mode only
forus d7e8ff3
Fix number of columns in SV tsv data file
forus ec849e2
Update paragraph on inc. upload in README
forus deb65cb
Rename validation method to better describe its purpose
forus 8692ead
Fix cleaning alteration_driver_annotation table for specific sample
forus be9082c
DRY tab separated value string parsing
forus 4e8a7c2
Reuse FileUtil.isInfoLine(String line) throughout the code
forus b93e741
Extract ensuring header and row match to tsv utility class
forus 9089e77
Simplify delete sql. Rely on cascade delete instead.
forus 16f6295
Generalise overwrite-existing flag description to make it more accurate
forus 79c4041
Rename updateMode to isIncrementalUpdateMode flag
forus 111f58e
Improve description of overwrite-existing flag for gene panel profile…
forus c4d5ecc
Implement more optimal way to update sample profile
forus 13eb147
Optimize code by always using batch upsert for sample profile
forus 95c32f8
Recognise that SEG importer always use bulkLoad
forus e3ec5d6
Organise bulk mode flushing for SEG importer
forus fc84a41
Ignore case for bulkLoad load mode option as everywhere in the code
forus 4eac259
add comma to README
pieterlukasse d0428f8
improve order comments for INCREMENTAL_UPLOAD_SUPPORTED_META_TYPES
pieterlukasse bf2d539
Add join by GENETIC_PROFILE_ID column for sample_cna_event and altera…
forus 37dcc20
Check for inconsistency in sample ids and values while reading geneti…
forus 6562716
Make method name to initialise transaction clearer
forus b0a448e
Remove TODOs that were done
forus f3d76c7
Rename isInfoLine util. method to isDataLine
forus f544847
Simplify code by using inheritance instead of composition
forus ab51a4b
Optimize removing genetic alterations
forus 96acec5
Access inherited variables with this. instead of super.
forus 52714d6
Merge pull request #45 from cBioPortal/inc-tab-delimited-uploader
forus 074372f
Merge pull request #44 from cBioPortal/make_study_es_0_to_load
forus a5ac232
Merge pull request #43 from cBioPortal/inc-timeline-uploader
forus a47eb49
Merge pull request #42 from cBioPortal/inc-cna-discrete-long
forus d28d04d
Merge pull request #41 from cBioPortal/inc-gene-panel-matrix
forus 8c74dbb
Merge pull request #40 from cBioPortal/inc-sv
forus d15c579
Merge pull request #39 from cBioPortal/inc-seg
forus d081f8f
Merge pull request #47 from cBioPortal/rfc79-feedback
forus e795449
Remove unused code from DaoSampleList.addSampleList()
forus df8f7af
Remove extra semicolons at the end of java statements
forus f120f5d
Rename upsertSampleProfiles to upsertSampleToProfileMapping
forus 602cc24
Use java 8 way to convert typed list to array in GeneticAlterationInc…
forus 28dfa05
Improve doc comments for TsvUtil.isDataLine(String line)
forus 2dd1e62
Rename method to updateCaseLists and document it better
forus
This may not matter, but running test_scripts.sh like this invokes it in a subshell rather than in the current (shell) process.
If the script sets environment variables that need to be exported or read out, they would be lost this way.
Indeed. If we lose any relevant env. variable, tests will fail. The `source` command is bash-specific, btw.
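A minimal sketch of the difference discussed in this thread, not taken from the PR itself. It assumes a throwaway script standing in for test_scripts.sh; the variable name DEMO_VAR and the temporary file are hypothetical. The script is first run as a separate process and then read into the current shell with the POSIX dot command, for which bash's `source` is an alternative spelling.

```shell
#!/bin/sh
# Hypothetical stand-in for test_scripts.sh: it only sets and exports a variable.
tmp_script="$(mktemp)"
printf 'DEMO_VAR=set_by_script\nexport DEMO_VAR\n' > "$tmp_script"

# Start from a clean state so the comparison below is meaningful.
unset DEMO_VAR

# 1. Run the script as a separate process: the variable is set in the
#    child process and is gone once that process exits.
sh "$tmp_script"
after_exec="${DEMO_VAR:-<unset>}"

# 2. Read the script into the current shell with the POSIX dot command;
#    the variable is now set in this shell.
. "$tmp_script"
after_dot="${DEMO_VAR:-<unset>}"

echo "after executing: $after_exec"
echo "after sourcing:  $after_dot"

rm -f "$tmp_script"
```

Whether this matters for the build depends on whether later steps read variables set by the script; if only the exit status is needed, running it as a separate process is sufficient.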