New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update validator for generic assay binary and categorical data #7973
Conversation
669316a
to
087b6e2
Compare
0cbfb83
to
09e7045
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good. Thank you @dippindots! We can merge the PR.
# (1) values defined in ALLOWED_VALUES | ||
# (2) NA cell value is allowed; means value was not tested on a sample | ||
|
||
ALLOWED_VALUES = ['yes', 'no', 'true', 'false'] + GenericAssayWiseFileValidator.NULL_VALUES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small question. Are the binary values fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in the RFC, we planned to have a controlled vocabulary for binary type: https://docs.google.com/document/d/1-6O16_j5b5LeHA5SnChnlEKQTYhcwNh4AEwCxB8FwC8/edit?disco=AAAAHCuNjcs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. We can extend the list if we have to introduce other binary values.
docs/File-Formats.md
Outdated
|
||
### Generic Assay meta file | ||
The generic assay metadata file should contain the following fields: | ||
``` | ||
cancer_study_identifier: Same value as specified in meta file of the study | ||
genetic_alteration_type: GENERIC_ASSAY | ||
generic_assay_type: <GENERIC_ASSAY_TYPE>, e.g., "TREATMENT_RESPONSE" or "MUTATIONAL_SIGNATURE" | ||
datatype: LIMIT-VALUE | ||
datatype: value from LIMIT-VALUE / CATEGORICAL / BINARY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be consistent with how we define datatype in other profiles, can we update the line to datatype: LIMIT-VALUE, CATEGORICAL or BINARY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failed tests seems to be not related to the changes in this PR. Can we merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we can merge!
Fix #7932
Only need to update validator for different data type, importer should use the same one.
GenericAssayContinuousValidator
binary
andcategorical
dataFile-Formats.md
Ref: RFC can be found here: #7920