cellxgene-schema CLI must add validation for obs['modality'] and update validation for X #869

Open
brianraymor opened this issue Apr 30, 2024 · 0 comments
Labels: curation, software, dp (Data Platform Team work)

brianraymor commented Apr 30, 2024

Design

See X and modality in the schema.

X (Matrix Layers)

Columns: assay_ontology_term_id or modality; "raw" required?; "raw" location; "normalized" required?; "normalized" location

modality is "transcriptomics" and assay_ontology_term_id is NOT "EFO:0010961" for Visium Spatial Gene Expression

"raw" required? REQUIRED. If UMI-based assay (e.g. 10x v3, Slide-seqV2), values MUST be de-duplicated molecule counts.

If non-UMI-based assay (e.g. Smart-seq2), values MUST be one of read counts (e.g. FeatureCounts) or estimated fragments (e.g. output of RSEM).

Each observation MUST contain at least one non-zero value. All non-zero values MUST be positive integers stored as numpy.float32.

"raw" location: AnnData.raw.X unless no "normalized" is provided, then AnnData.X
"normalized" required? STRONGLY RECOMMENDED
"normalized" location: AnnData.X

modality is "transcriptomics" and assay_ontology_term_id is "EFO:0010961" for Visium Spatial Gene Expression

"raw" required? REQUIRED. Values MUST be de-duplicated molecule counts. All non-zero values MUST be positive integers stored as numpy.float32.

If uns['spatial']['is_single'] is False then each observation MUST contain at least one non-zero value.

If uns['spatial']['is_single'] is True then the unfiltered feature-barcode matrix (raw_feature_bc_matrix) MUST be used. See Space Ranger Feature-Barcode Matrices. This matrix MUST contain 4992 rows. If the obs['in_tissue'] value is 1, then the observation MUST contain at least one non-zero value. If any obs['in_tissue'] values are 0, then at least one observation corresponding to an obs['in_tissue'] with a value of 0 MUST contain a non-zero value.

"raw" location: AnnData.raw.X unless no "normalized" is provided, then AnnData.X
"normalized" required? STRONGLY RECOMMENDED
"normalized" location: AnnData.X

modality is "epigenomics"

"raw" required? NOT REQUIRED
"raw" location: (blank)
"normalized" required? REQUIRED
"normalized" location: AnnData.X
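
The "raw" rules above could be checked roughly as follows. This is only a sketch, not the cellxgene-schema implementation: the function names and the visium_single and in_tissue parameters are hypothetical, and it assumes the raw matrix has already been loaded as a dense NumPy array (real datasets are usually sparse, which would need different handling).

```python
import numpy as np

# Row count required when uns['spatial']['is_single'] is True (from the design above).
VISIUM_ROW_COUNT = 4992


def check_raw_values(raw: np.ndarray) -> list:
    """Check storage dtype and that non-zero values are positive integers."""
    errors = []
    if raw.dtype != np.float32:
        errors.append(f"raw matrix dtype is {raw.dtype}, expected float32")
    nonzero = raw[raw != 0]
    # Counts are stored as floats, so integrality is tested on the values.
    if np.any(nonzero < 0) or np.any(nonzero % 1 != 0):
        errors.append("non-zero values must be positive integers")
    return errors


def check_observations(raw: np.ndarray, visium_single: bool = False, in_tissue=None) -> list:
    """Check the per-observation non-zero rules.

    visium_single (hypothetical flag): True for Visium data whose
    uns['spatial']['is_single'] is True; in_tissue then holds obs['in_tissue'].
    """
    errors = []
    has_counts = np.count_nonzero(raw, axis=1) > 0
    if not visium_single:
        # Non-Visium assays, and Visium with is_single False.
        if not has_counts.all():
            errors.append("each observation must contain at least one non-zero value")
        return errors
    if in_tissue is None:
        raise ValueError("in_tissue is required when visium_single is True")
    if raw.shape[0] != VISIUM_ROW_COUNT:
        errors.append(f"matrix has {raw.shape[0]} rows, expected {VISIUM_ROW_COUNT}")
        return errors
    in_tissue = np.asarray(in_tissue)
    if not has_counts[in_tissue == 1].all():
        errors.append("observations with in_tissue == 1 must contain a non-zero value")
    off_tissue = in_tissue == 0
    if off_tissue.any() and not has_counts[off_tissue].any():
        errors.append("at least one observation with in_tissue == 0 must contain a non-zero value")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every violation in one pass; whether the CLI should behave that way is a design choice this issue does not settle.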

modality

Key: modality
Annotator: Curator MUST annotate.
Value: categorical with str categories. This MUST be "epigenomics" or "transcriptomics".

This MUST be the correct type for the corresponding assay:

For Assay MUST Use
10x multiome [EFO:0030059] "epigenomics" or "transcriptomics"
10x scATAC-seq [EFO:0030007] "epigenomics"
10x transcription profiling [EFO:0030080] and its descendants "transcriptomics"
BD Rhapsody Targeted mRNA [EFO:0700004] "transcriptomics"
BD Rhapsody Whole Transcriptome Analysis [EFO:0700003] "transcriptomics"
CEL-seq2 [EFO:0010010] and its descendants "transcriptomics"
DroNc-seq [EFO:0008720] "transcriptomics"
Drop-seq [EFO:0008722] "transcriptomics"
GEXSCOPE technology [EFO:0700011] "transcriptomics"
inDrop [EFO:0008780] "transcriptomics"
MARS-seq [EFO:0008796] "transcriptomics"
mCT-seq [EFO:0030060] "transcriptomics"
MERFISH [EFO:0008992] "transcriptomics"
methylation profiling by high throughput sequencing [EFO:0002761] and its descendants "epigenomics"
microwell-seq [EFO:0030002] "transcriptomics"
Patch-seq [EFO:0008853] "transcriptomics"
ScaleBio single cell RNA sequencing [EFO:0022490] "transcriptomics"
scATAC-seq [EFO:0010891] "epigenomics"
sci-RNA-seq [EFO:0010550] and its descendants "transcriptomics"
Seq-Well [EFO:0008919] and its descendants "transcriptomics"
Smart-like [EFO:0010184] and its descendants "transcriptomics"
spatial transcriptomics [EFO:0008994] and its descendants "transcriptomics"
SPLiT-seq [EFO:0009919] "transcriptomics"
STRT-seq [EFO:0008953] "transcriptomics"
TruDrop [EFO:0700010] "transcriptomics"

If the assay does not appear in this table, the most appropriate value MUST be selected and the curation team informed during submission so that the assay can be added to the table.
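
One way the assay table could drive validation is a plain lookup, sketched below. The function names and the ancestors argument are hypothetical: resolving "and its descendants" needs the EFO ontology, so this sketch assumes the caller supplies the ancestor term IDs of the assay. Letting exact rows take precedence over descendant rows is also an assumption, not something the design states.

```python
from typing import Iterable, Optional, Set

T, E = "transcriptomics", "epigenomics"

# Rows that apply to the listed term only.
EXACT = {
    "EFO:0030059": {E, T},  # 10x multiome
    "EFO:0030007": {E},     # 10x scATAC-seq
    "EFO:0700004": {T},     # BD Rhapsody Targeted mRNA
    "EFO:0700003": {T},     # BD Rhapsody Whole Transcriptome Analysis
    "EFO:0008720": {T},     # DroNc-seq
    "EFO:0008722": {T},     # Drop-seq
    "EFO:0700011": {T},     # GEXSCOPE technology
    "EFO:0008780": {T},     # inDrop
    "EFO:0008796": {T},     # MARS-seq
    "EFO:0030060": {T},     # mCT-seq
    "EFO:0008992": {T},     # MERFISH
    "EFO:0030002": {T},     # microwell-seq
    "EFO:0008853": {T},     # Patch-seq
    "EFO:0022490": {T},     # ScaleBio single cell RNA sequencing
    "EFO:0010891": {E},     # scATAC-seq
    "EFO:0009919": {T},     # SPLiT-seq
    "EFO:0008953": {T},     # STRT-seq
    "EFO:0700010": {T},     # TruDrop
}

# Rows marked "and its descendants" in the table.
WITH_DESCENDANTS = {
    "EFO:0030080": {T},  # 10x transcription profiling
    "EFO:0010010": {T},  # CEL-seq2
    "EFO:0002761": {E},  # methylation profiling by high throughput sequencing
    "EFO:0010550": {T},  # sci-RNA-seq
    "EFO:0008919": {T},  # Seq-Well
    "EFO:0010184": {T},  # Smart-like
    "EFO:0008994": {T},  # spatial transcriptomics
}


def allowed_modalities(term_id: str, ancestors: Iterable[str] = ()) -> Optional[Set[str]]:
    """Return the allowed modalities, or None if the assay is not in the table."""
    if term_id in EXACT:
        return EXACT[term_id]
    for candidate in (term_id, *ancestors):
        if candidate in WITH_DESCENDANTS:
            return WITH_DESCENDANTS[candidate]
    return None


def check_modality(term_id: str, modality: str, ancestors: Iterable[str] = ()) -> Optional[str]:
    """Return an error message, or None if the modality is acceptable."""
    if modality not in (E, T):
        return f"modality must be '{E}' or '{T}', got '{modality}'"
    allowed = allowed_modalities(term_id, ancestors)
    if allowed is None:
        return None  # assay not in the table: the curator selects the value
    if modality not in allowed:
        return f"{term_id} requires modality in {sorted(allowed)}, got '{modality}'"
    return None
```

For example, check_modality("EFO:0030007", "transcriptomics") returns an error message, while an assay absent from the table passes as long as the value is one of the two permitted strings.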

brianraymor added the curation, software, dp (Data Platform Team work), 5.1 (Next minor CELLxGENE schema version after 5.0), and 5.2 (Next minor CELLxGENE schema version after 5.1) labels and removed the 5.1 (Next minor CELLxGENE schema version after 5.0) label on Apr 30, 2024
brianraymor changed the title from "cellxgene-schema CLI must add validation for modality and update validation for X" to "cellxgene-schema CLI must add validation for obs['modality'] and update validation for X" on Jun 6, 2024
brianraymor removed the 5.2 (Next minor CELLxGENE schema version after 5.1) label on Aug 13, 2024