-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added functionality to merge clinical trial metadata objects #114
Changes from 2 commits
8e8d7f4
cd9d9ab
69ad3f0
4e191ed
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -878,3 +878,38 @@ def merge_artifact( | |
|
||
# return new object | ||
return new_ct | ||
|
||
|
||
def merge_clinical_trial_metadata(patch: dict, target: dict) -> dict: | ||
""" | ||
merges two clinical trial metadata objects together | ||
|
||
Args: | ||
patch: the metadata object to add | ||
target: the existing metadata object | ||
|
||
Returns: | ||
arg1: the merged metadata object | ||
""" | ||
|
||
# first we assert the trial details are the same | ||
key_details = ["lead_organization_study_id"] | ||
for d in key_details: | ||
if d in patch and d in target: | ||
if patch[d] != target[d]: | ||
raise RuntimeError("unable to merge trials with different \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if there will be two trials in different lead organizations but with the same There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this ID is globally unique |
||
lead_organization_study_id") | ||
|
||
# merge the copy with the original. | ||
validator = load_and_validate_schema( | ||
"clinical_trial.json", return_validator=True) | ||
schema = validator.schema | ||
merger = Merger(schema) | ||
|
||
merged = merger.merge(target, patch) | ||
|
||
# validate this | ||
validator.validate(merged) | ||
|
||
# now return it | ||
return merged |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,7 +12,8 @@ | |
from pprint import pprint | ||
from jsonmerge import Merger | ||
|
||
from cidc_schemas.prism import prismify, merge_artifact | ||
from cidc_schemas.prism import prismify, merge_artifact, \ | ||
merge_clinical_trial_metadata | ||
from cidc_schemas.json_validation import load_and_validate_schema | ||
from cidc_schemas.template import Template | ||
from cidc_schemas.template_writer import RowType | ||
|
@@ -342,3 +343,60 @@ def test_snippet_wes(): | |
ds = ct | grep(url) | ||
assert 'matched_values' in ds | ||
assert len(ds['matched_values']) > 0 | ||
|
||
|
||
def test_merge_ct_meta(): | ||
""" | ||
tests merging of two clinical trial metadata | ||
objects. Currently this test only supports | ||
WES but other tests should be added in the | ||
future | ||
""" | ||
|
||
# create two clinical trials | ||
ct1 = copy.deepcopy(CLINICAL_TRIAL) | ||
ct2 = copy.deepcopy(CLINICAL_TRIAL) | ||
|
||
# first we assert the merge is only happening on the same trial | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we should also test the non-schema-conformant There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added this as well. pre-merge validation will stop the merge before it happens. |
||
ct1["lead_organization_study_id"] = "not_the_same" | ||
with pytest.raises(RuntimeError): | ||
merge_clinical_trial_metadata(ct1, ct2) | ||
|
||
# revert the data to same key trial id but | ||
# include data in 1 that is missing in the other | ||
# at the trial level and assert the merge | ||
# does not clobber any | ||
ct1["lead_organization_study_id"] = ct2["lead_organization_study_id"] | ||
ct1['trial_name'] = 'name ABC' | ||
ct2['nci_id'] = 'xyz1234' | ||
|
||
ct_merge = merge_clinical_trial_metadata(ct1, ct2) | ||
assert ct_merge['trial_name'] == 'name ABC' | ||
assert ct_merge['nci_id'] == 'xyz1234' | ||
|
||
# assert the patch over-writes the original value | ||
# when value is present in both objects | ||
ct1['trial_name'] = 'name ABC' | ||
ct2['trial_name'] = 'CBA eman' | ||
|
||
ct_merge = merge_clinical_trial_metadata(ct1, ct2) | ||
assert ct_merge['trial_name'] == 'name ABC' | ||
|
||
# now change the participant ids | ||
# this should cause the merge to have two | ||
# participants. | ||
ct1['participants'][0]['cimac_participant_id'] = 'different_id' | ||
|
||
ct_merge = merge_clinical_trial_metadata(ct1, ct2) | ||
assert len(ct_merge['participants']) == 2 | ||
|
||
# now lets have the same participant but adding multiple samples. | ||
ct1["lead_organization_study_id"] = ct2["lead_organization_study_id"] | ||
ct1['participants'][0]['cimac_participant_id'] = \ | ||
ct2['participants'][0]['cimac_participant_id'] | ||
ct1['participants'][0]['samples'][0]['cimac_sample_id'] = 'new_id_1' | ||
ct1['participants'][0]['samples'][1]['cimac_sample_id'] = 'new_id_2' | ||
|
||
ct_merge = merge_clinical_trial_metadata(ct1, ct2) | ||
assert len(ct_merge['participants']) == 1 | ||
assert len(ct_merge['participants'][0]['samples']) == 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
d
isn't inpatch
or isn't intarget
, then the function continues on happily and tries to merge -- I guess this missing ID would be caught by the validator below, but it's a little hard to follow that in the code. I would either add a comment to that effect, condense this condition into a one-liner with line 899 a laif patch.get(d) != target.get(d)
, or else raise an error related to missing an expected key.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I've added explicit pre-merge validation to each object. This will report any errors in validation before merging to reduce confusion.