From e2080b398276388cd34d3e28896bc8b372896b9e Mon Sep 17 00:00:00 2001
From: evanbiederstedt
Date: Sun, 30 Nov 2025 19:21:32 -0500
Subject: [PATCH 1/5] add clarification

---
 cap-anndata-schema.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cap-anndata-schema.md b/cap-anndata-schema.md
index d74ab6c..53c6d36 100644
--- a/cap-anndata-schema.md
+++ b/cap-anndata-schema.md
@@ -396,7 +396,7 @@ NOTE: A dataset may have multiple sets of cell annotations each with a coorespo
 NOTE: Certain keywords have been reserved for annotating cells:
 - The term `'doublets'` is reserved for encoding cells defined as doublets based on some computational analysis. By “doublets”, we refer to the sequencing artifact within droplet-based protocols whereby two or more cells are tagged with the same barcode.
-- The term `'junk'` is reserved for encoding cells that failed sequencing for some reason, e.g. few genes detected, high fraction of mitochondrial read. Researchers have found such a generic term useful.
+- The term `'junk'` is reserved for encoding cells that failed sequencing (and QC filtering) for some reason, e.g. few genes detected, high fraction of mitochondrial read. Researchers have found such a generic term useful.
 - The term `'unknown'` is specifically reserved for cells which the author did not know how to annotate with a biological entity. It is a generic term meaning “I do not know”.
 **Format:** The column name is the string `[cellannotation_setname]` and the values are the strings of `cell_label`. Refer to the fields `cellannotation_setname` and `cell_label` in the JSON Schema.

From 293d97bcd15f501c039070bc5f77c07702e329d0 Mon Sep 17 00:00:00 2001
From: evanbiederstedt
Date: Sun, 30 Nov 2025 22:46:28 -0500
Subject: [PATCH 2/5] refinement on full name

---
 cap-anndata-schema.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/cap-anndata-schema.md b/cap-anndata-schema.md
index 53c6d36..e465b21 100644
--- a/cap-anndata-schema.md
+++ b/cap-anndata-schema.md
@@ -441,6 +441,8 @@ For example, if the user specified the cell annotation as `broad_cells1`, then t
 NOTE: The `[cellannotation_setname]--cell_fullname` field is intended for cases where a cell annotation does not exist in the corresponding ontology. This field should contain a suggested name for a new ontology entity. In the more common case where an ontology term already exists for this cell annotation, this field must be identical to `[cellannotation_setname]--cell_ontology_term`.
+NOTE: In the case of cell types first characterized by single-cell RNA sequencing (scRNAseq) with no corresponding term in the ontology, we *STRONGLY* encourage users to use gene expression as nomenclature, e.g. "Dendritic Cells AXL+ SIGLEC6+".
+

From 0dfaa38a97452f6acb7452a925c216af0afc4c09 Mon Sep 17 00:00:00 2001
From: evanbiederstedt
Date: Mon, 1 Dec 2025 02:50:19 -0500
Subject: [PATCH 3/5] refine gene names, abbrev note

---
 cap-anndata-schema.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cap-anndata-schema.md b/cap-anndata-schema.md
index e465b21..a529006 100644
--- a/cap-anndata-schema.md
+++ b/cap-anndata-schema.md
@@ -416,7 +416,7 @@ NOTE: Certain keywords have been reserved for annotating cells:
 column
 value
- Any free-text term which the author uses to annotate cells, the preferred cell label name used by the author.
+ Any free-text term which the author uses to annotate cells, the preferred cell label name used by the author. Abbreviations are acceptable.
@@ -616,7 +616,7 @@ NOTE: If the `[cellannotation_setname]--cell_ontology_exists` field is `False`,
 source
 example
- 'This cell was annotated with [blank] given the canonical markers in the field [X], [Y], [Z]. We noticed [X] and [Y] running differential expression.'
+ 'This cell was annotated with [blank] given the canonical markers in the field [X], [Y], [Z]. We noticed [X] and [Y] running differential expression using Seurat v5.'
@@ -686,7 +686,7 @@ NOTE: If the `[cellannotation_setname]--cell_ontology_exists` field is `False`,
 example
- 'TP53, KRAS, BRCA1'
+ 'AXL, SIGLEC1, SIGLEC6'

From 8a1c2ae2d0b987170b96427fb6c9b6f7864508d0 Mon Sep 17 00:00:00 2001
From: evanbiederstedt
Date: Mon, 1 Dec 2025 02:53:10 -0500
Subject: [PATCH 4/5] more clear synonyms

---
 cap-anndata-schema.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cap-anndata-schema.md b/cap-anndata-schema.md
index a529006..e132336 100644
--- a/cap-anndata-schema.md
+++ b/cap-anndata-schema.md
@@ -758,7 +758,7 @@ NOTE: If the `[cellannotation_setname]--cell_ontology_exists` field is `False`,
 example
- 'neuroglial cell, glial cell, neuroglia' or 'amacrine cell' or 'FMB cell'
+ 'neuroglial cell, glial cell, neuroglia' or 'effector B cells, plasma B-cells, plasmacyte'

From 57425c509dd5de9ab379f38289d39e04f9269d84 Mon Sep 17 00:00:00 2001
From: evanbiederstedt
Date: Mon, 1 Dec 2025 14:30:11 -0500
Subject: [PATCH 5/5] remove note about researchers finding term useful

---
 cap-anndata-schema.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cap-anndata-schema.md b/cap-anndata-schema.md
index e132336..28f951b 100644
--- a/cap-anndata-schema.md
+++ b/cap-anndata-schema.md
@@ -396,7 +396,7 @@ NOTE: A dataset may have multiple sets of cell annotations each with a coorespo
 NOTE: Certain keywords have been reserved for annotating cells:
 - The term `'doublets'` is reserved for encoding cells defined as doublets based on some computational analysis. By “doublets”, we refer to the sequencing artifact within droplet-based protocols whereby two or more cells are tagged with the same barcode.
-- The term `'junk'` is reserved for encoding cells that failed sequencing (and QC filtering) for some reason, e.g. few genes detected, high fraction of mitochondrial read. Researchers have found such a generic term useful.
+- The term `'junk'` is reserved for encoding cells that failed sequencing (and QC filtering) for some reason, e.g. few genes detected, high fraction of mitochondrial read.
 - The term `'unknown'` is specifically reserved for cells which the author did not know how to annotate with a biological entity. It is a generic term meaning “I do not know”.
 **Format:** The column name is the string `[cellannotation_setname]` and the values are the strings of `cell_label`. Refer to the fields `cellannotation_setname` and `cell_label` in the JSON Schema.
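
Reviewer note: the rules touched by this series (the reserved labels `'doublets'`, `'junk'`, `'unknown'`, and the requirement that `--cell_fullname` match `--cell_ontology_term` when an ontology term exists) could be spot-checked with something like the sketch below. It is only an illustration, not part of the schema or of any CAP tooling: plain Python dicts stand in for the AnnData `obs` columns, the helper names `count_reserved` and `fullname_mismatches` are hypothetical, the sample rows are invented, and the empty string used where no ontology term exists is a placeholder assumption.

```python
# Illustrative sketch only: plain dicts of lists stand in for AnnData `obs` columns.
RESERVED_LABELS = {"doublets", "junk", "unknown"}


def count_reserved(labels):
    """Count how many cells carry each reserved label."""
    counts = {name: 0 for name in sorted(RESERVED_LABELS)}
    for label in labels:
        if label in RESERVED_LABELS:
            counts[label] += 1
    return counts


def fullname_mismatches(obs, setname):
    """Return row indices where an ontology term exists but
    `--cell_fullname` differs from `--cell_ontology_term`."""
    exists = obs[f"{setname}--cell_ontology_exists"]
    fullname = obs[f"{setname}--cell_fullname"]
    term = obs[f"{setname}--cell_ontology_term"]
    return [
        i
        for i, (e, f, t) in enumerate(zip(exists, fullname, term))
        if e and f != t
    ]


# Invented example rows for the annotation set `broad_cells1`.
obs = {
    "broad_cells1": ["glial cell", "AS DC", "plasma"],
    "broad_cells1--cell_ontology_exists": [True, False, True],
    "broad_cells1--cell_fullname": [
        "glial cell",
        "Dendritic Cells AXL+ SIGLEC6+",  # suggested name, no ontology term yet
        "plasma B-cells",                 # differs from the term below
    ],
    # "" is a placeholder assumption for "no term exists".
    "broad_cells1--cell_ontology_term": ["glial cell", "", "plasma cell"],
}

print(count_reserved(["junk", "glial cell", "doublets", "junk"]))
print(fullname_mismatches(obs, "broad_cells1"))
```

Rows where `--cell_ontology_exists` is `False` are skipped on purpose, since that is the case where `--cell_fullname` carries a newly suggested name.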