Update validator for generic assay binary and categorical data #7973

Merged
merged 1 commit into from Oct 26, 2020

Member

@dippindots dippindots commented Oct 15, 2020

Fix #7932
Only the validator needs to be updated for the different datatypes; the importer can keep using the same one.

  • Rename previous validator to GenericAssayContinuousValidator
  • Update validator for generic assay binary and categorical data
  • Add unit tests for all three datatypes
  • Update File-Formats.md

Ref: RFC can be found here: #7920
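
For readers skimming the thread, the idea of one validator per datatype could be sketched as a small dispatch table. This is only an illustration under stated assumptions, not the code from this PR: the function names, the `NULL_VALUES` contents, and the specific LIMIT-VALUE and CATEGORICAL rules are hypothetical; only the three datatype names and the binary vocabulary come from the PR itself.

```python
# Hypothetical sketch: one cell-value check per generic assay datatype.
# NULL_VALUES and the LIMIT-VALUE / CATEGORICAL rules are assumptions.
NULL_VALUES = ["NA"]


def is_valid_continuous(value: str) -> bool:
    """LIMIT-VALUE: a number, optionally prefixed with '>' or '<' (assumed rule)."""
    if value in NULL_VALUES:
        return True
    stripped = value[1:] if value[:1] in (">", "<") else value
    try:
        float(stripped)
    except ValueError:
        return False
    return True


def is_valid_binary(value: str) -> bool:
    """BINARY: value must come from the controlled vocabulary quoted in this PR."""
    return value in ["yes", "no", "true", "false"] + NULL_VALUES


def is_valid_categorical(value: str) -> bool:
    """CATEGORICAL: any non-blank text (assumed rule)."""
    return value.strip() != ""


# Map the datatype declared in the meta file to the matching check.
CHECKS_BY_DATATYPE = {
    "LIMIT-VALUE": is_valid_continuous,
    "BINARY": is_valid_binary,
    "CATEGORICAL": is_valid_categorical,
}


def check_cell(datatype: str, value: str) -> bool:
    """Validate one data cell according to the declared datatype."""
    try:
        check = CHECKS_BY_DATATYPE[datatype]
    except KeyError:
        raise ValueError(f"Unsupported datatype: {datatype!r}") from None
    return check(value)
```

Keeping the dispatch in the validator is consistent with the note above that the importer does not need a separate code path per datatype.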

@dippindots dippindots self-assigned this Oct 15, 2020
@dippindots dippindots force-pushed the fix-7932 branch 8 times, most recently from 669316a to 087b6e2 Compare October 16, 2020 15:33
@dippindots dippindots changed the title Fix 7932 Update validator for generic assay binary and categorical data Oct 16, 2020
@dippindots dippindots force-pushed the fix-7932 branch 4 times, most recently from 0cbfb83 to 09e7045 Compare October 19, 2020 20:36
@dippindots dippindots marked this pull request as ready for review October 19, 2020 20:37
Contributor

@rmadupuri rmadupuri left a comment

The changes look good. Thank you @dippindots! We can merge the PR.

# (1) values defined in ALLOWED_VALUES
# (2) NA cell value is allowed; means value was not tested on a sample

ALLOWED_VALUES = ['yes', 'no', 'true', 'false'] + GenericAssayWiseFileValidator.NULL_VALUES

Contributor

Just a small question. Are the binary values fixed?

Member Author

Yes, in the RFC, we planned to have a controlled vocabulary for binary type: https://docs.google.com/document/d/1-6O16_j5b5LeHA5SnChnlEKQTYhcwNh4AEwCxB8FwC8/edit?disco=AAAAHCuNjcs.

Contributor

Right. We can extend the list if we have to introduce other binary values.
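
A minimal sketch of how a fixed vocabulary like this can stay extensible, assuming an exact, case-sensitive match and a `NULL_VALUES` list containing only `NA`. The class names and the extra `present`/`absent` values are hypothetical and are not part of this PR.

```python
class BinaryValueCheck:
    """Controlled vocabulary for binary cells (sketch, not the PR's class)."""

    NULL_VALUES = ["NA"]  # assumption: the real list may contain more entries
    ALLOWED_VALUES = ["yes", "no", "true", "false"] + NULL_VALUES

    @classmethod
    def is_valid(cls, value: str) -> bool:
        # Exact match against the vocabulary; case handling is an assumption.
        return value in cls.ALLOWED_VALUES

    @classmethod
    def invalid_cells(cls, row: list) -> list:
        """Return the cells in a data row that fall outside the vocabulary."""
        return [cell for cell in row if not cls.is_valid(cell)]


class ExtendedBinaryValueCheck(BinaryValueCheck):
    # Hypothetical extension: new binary values are added in one place.
    ALLOWED_VALUES = BinaryValueCheck.ALLOWED_VALUES + ["present", "absent"]
```

Because the check reads `cls.ALLOWED_VALUES`, extending the vocabulary only requires changing the list, not the validation logic.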


### Generic Assay meta file
The generic assay metadata file should contain the following fields:
```
cancer_study_identifier: Same value as specified in meta file of the study
genetic_alteration_type: GENERIC_ASSAY
generic_assay_type: <GENERIC_ASSAY_TYPE>, e.g., "TREATMENT_RESPONSE" or "MUTATIONAL_SIGNATURE"
- datatype: LIMIT-VALUE
+ datatype: value from LIMIT-VALUE / CATEGORICAL / BINARY
```

Contributor

Just to be consistent with how we define datatype in other profiles, can we update the line to datatype: LIMIT-VALUE, CATEGORICAL or BINARY?

Member Author

Sure, updated!

Contributor

Thanks!

Contributor

The failed tests seem to be unrelated to the changes in this PR. Can we merge?

Member Author

Yes, I think we can merge!

@inodb inodb merged commit b67d26b into cBioPortal:master Oct 26, 2020
@inodb inodb added the cl-enhancement label (Enhancement section changelog: enhancement to an existing feature) Oct 27, 2020
Labels
backend, cl-enhancement, validator
Development

Successfully merging this pull request may close these issues.

(Generic Assay) Backend support for Categorical data and Binary data
3 participants