Improve validation for cases where there is only one donor_id value in obs #820

Open
brianraymor opened this issue Mar 22, 2024 · 1 comment
Assignees
Labels
discovery schema CELLxGENE Discover dataset schema

Comments

@brianraymor
Contributor

  • development_stage_ontology_term_id
  • organism_ontology_term_id
  • self_reported_ethnicity_ontology_term_id
  • sex_ontology_term_id

must include the following requirement:

If there is only one obs['donor_id'] value, all observations MUST have the same value.
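
As a rough illustration of the proposed requirement, the single-donor rule could be checked along these lines. This is only a sketch: `check_single_donor` and `DONOR_FIELDS` are hypothetical names, not part of the existing validator, and `obs` is assumed to be any column-oriented table that can be indexed by column name and iterated (a plain dict of lists here; a pandas DataFrame should behave the same way).

```python
from typing import Iterable, Mapping

# The four fields listed in this issue (assumed to be columns of obs).
DONOR_FIELDS = (
    "development_stage_ontology_term_id",
    "organism_ontology_term_id",
    "self_reported_ethnicity_ontology_term_id",
    "sex_ontology_term_id",
)


def check_single_donor(obs: Mapping[str, Iterable]) -> list[str]:
    """Return one error message per field that varies in a one-donor table."""
    if len(set(obs["donor_id"])) != 1:
        # The rule as written only applies when there is exactly one donor_id value.
        return []
    errors = []
    for field in DONOR_FIELDS:
        values = set(obs[field])
        if len(values) > 1:
            found = sorted(map(str, values))
            errors.append(f"{field}: one donor_id but {len(values)} values: {found}")
    return errors
```

An empty list means the table passes; anything else names the offending field and the conflicting values.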

Guidance from @jahilton:

That is a check in our curation QA process, and it is possible to automate, so it is certainly something to look into implementing.
But I would push it off of 5.1.0, simply because we would first want to audit the corpus to ensure we capture any edge cases.

@brianraymor added the labels schema (CELLxGENE Discover dataset schema) and 5.2 (Next minor CELLxGENE schema version after 5.1) on Mar 22, 2024
@brianraymor self-assigned this on Mar 22, 2024
@jahilton
Collaborator

There's no reason this logic should be restricted to Datasets with only one donor_id value. So...
Improve validation for donor metadata (instead of "cases where there is only one donor_id value in obs")
Check that each donor_id only has 1 value for organism_ontology_term_id, sex_ontology_term_id, self_reported_ethnicity_ontology_term_id

Can't include disease (common to have healthy vs disease samples from the same individual) or development_stage (longitudinal studies may have samples from the same individual at different ages)
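
The generalized per-donor check described above might look roughly like the following. It is a sketch under assumptions, not the validator's implementation: `find_inconsistent_donors` and `STABLE_DONOR_FIELDS` are hypothetical names, and `obs` is assumed to be a column-oriented table indexable by column name (a dict of lists here; a pandas DataFrame should also work, since the code only indexes and iterates columns).

```python
from collections import defaultdict
from typing import Iterable, Mapping

# Fields expected to be constant per donor. disease and development_stage are
# left out on purpose, for the reasons given in the comment above.
STABLE_DONOR_FIELDS = (
    "organism_ontology_term_id",
    "sex_ontology_term_id",
    "self_reported_ethnicity_ontology_term_id",
)


def find_inconsistent_donors(obs: Mapping[str, Iterable]) -> dict:
    """Map each offending field to {donor_id: set of conflicting values}."""
    problems = {}
    for field in STABLE_DONOR_FIELDS:
        seen = defaultdict(set)
        # Collect the distinct values observed for each donor_id.
        for donor, value in zip(obs["donor_id"], obs[field]):
            seen[donor].add(value)
        conflicts = {d: vals for d, vals in seen.items() if len(vals) > 1}
        if conflicts:
            problems[field] = conflicts
    return problems
```

Because the grouping is by donor_id rather than by Dataset, the same function could in principle be run on the concatenated obs of several Datasets, provided donor_id values mean the same individual across them.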

*Ideally this check occurs within a Collection

@brianraymor added the label N.N.N (Placeholder for 2024 issues - will reassign to minor release) and removed the label 5.2 (Next minor CELLxGENE schema version after 5.1) on Mar 27, 2024
@brianraymor added the label discovery and removed the label N.N.N (Placeholder for 2024 issues - will reassign to minor release) on May 1, 2024