April 25: per review with @dsadgat and a recommendation by @jahilton, the schema will be updated to match the current CXG implementation. It will specify cat.categories instead of unique().
Context
While curating a recent Collection, I ran into some ambiguity when wrangling column color information from the obs dataframe into uns[{column}_colors]. The readme notes that:
The color code at the Nth position in the ndarray corresponds to the Nth category of anndata.obs.{column}.unique()
anndata.obs.{column}.unique() will return an unsorted list of category names for {column} in the order they are encountered in the obs dataframe. It looks like this line in the single_cell_data_portal repo assigns colors based on the adata.obs[column].cat.categories attribute, which returns the category names in categorical order (sorted, when the categories were inferred from the data).
Once I realized this, it was easy to map colors in the proper order after sorting or using adata.obs[column].cat.categories, but I think it would be helpful to have some further clarity explaining that the uns[{column}_colors] array follows the order of cat.categories (sorted, by default) rather than the order returned by unique().
Design (@brianraymor)
{column}_colors where {column} MUST be the name of a category data type column in obs that is annotated by the data submitter or curator. The following columns that are annotated by CELLxGENE Discover MUST NOT be specified as {column}:
assay
cell_type
development_stage
disease
organism
self_reported_ethnicity
sex
tissue
Instead, annotate {column}_ontology_term_id_colors for these columns, such as assay_ontology_term_id_colors.
Annotator
Curator MAY annotate.
Value
numpy.ndarray. This MUST be a 1-D array of shape (c,), where c is greater than or equal to the number of categories in {column} as calculated by:
anndata.obs.{column}.cat.categories.size
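The count above comes from cat.categories, which can differ from unique() in both order and size. The sketch below assumes pandas and uses a plain categorical pandas Series as a stand-in for anndata.obs[column] (AnnData stores obs as a pandas DataFrame); the term IDs and the extra unused category are illustrative only.

```python
import pandas as pd

# Stand-in for anndata.obs[column]. Categories inferred from the data are sorted.
col = pd.Series(["CL:0000115", "CL:0000057", "CL:0000115"], dtype="category")

# unique(): values in order of first appearance; unused categories are omitted.
appearance_order = [str(v) for v in col.unique()]
# cat.categories: categorical order, including any unused categories.
category_order = list(col.cat.categories)

# Add a hypothetical category that no cell uses.
col_with_unused = col.cat.add_categories(["CL:0000236"])

print(appearance_order)                     # ['CL:0000115', 'CL:0000057']
print(category_order)                       # ['CL:0000057', 'CL:0000115']
print(len(col_with_unused.unique()))        # 2
print(col_with_unused.cat.categories.size)  # 3
```

Because unused categories still count toward cat.categories.size, c can exceed the number of distinct values that actually appear in obs.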
The color code at the Nth position in the ndarray corresponds to the Nth category of anndata.obs.{column}.cat.categories.
For example, if cell_type_ontology_term_id includes two categories:
anndata.obs.cell_type_ontology_term_id.cat.categories.values
array(['CL:0000057', 'CL:0000115'], dtype=object)
then cell_type_ontology_term_id_colors MUST contain two or more colors, such as:
['aqua' 'blueviolet']
where 'aqua' is the color assigned to 'CL:0000057' and 'blueviolet' is the color assigned to 'CL:0000115'.
All elements in the ndarray MUST use the same color model, limited to:
str. MUST be a case-insensitive CSS4 color name with no spaces, such as "aliceblue"
str. MUST start with "#" immediately followed by six case-insensitive hexadecimal characters, as in "#08c0ff"
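A minimal sketch of building and checking {column}_colors under these rules, assuming pandas and NumPy. build_colors and check_colors are hypothetical helpers, not part of anndata or CELLxGENE tooling. A plain dict stands in for anndata.uns, and the CSS4 name set is a small illustrative subset (a fuller list is available, for example, as the keys of matplotlib.colors.CSS4_COLORS).

```python
import re

import numpy as np
import pandas as pd

# "#" followed by exactly six hex digits, case-insensitive.
HEX_RE = re.compile(r"#[0-9a-f]{6}", re.IGNORECASE)


def build_colors(obs: pd.DataFrame, column: str, color_map: dict) -> np.ndarray:
    """Hypothetical helper: the Nth entry is the color for the Nth category
    of obs[column].cat.categories (not obs[column].unique())."""
    return np.array([color_map[c] for c in obs[column].cat.categories])


def check_colors(obs: pd.DataFrame, column: str, colors, css4_names: set) -> list:
    """Hypothetical checker: returns a list of rule violations (empty if none)."""
    if not isinstance(colors, np.ndarray) or colors.ndim != 1:
        return ["must be a 1-D numpy.ndarray"]
    problems = []
    n_categories = obs[column].cat.categories.size
    if colors.size < n_categories:
        problems.append(f"{colors.size} colors for {n_categories} categories")
    values = [str(v) for v in colors]
    all_hex = all(HEX_RE.fullmatch(v) for v in values)
    # CSS4 names are matched case-insensitively; names with spaces never match.
    all_css4 = all(v.lower() in css4_names for v in values)
    if not (all_hex or all_css4):
        problems.append("all colors must be CSS4 names, or all must be #RRGGBB hex codes")
    return problems


# Illustrative stand-ins for adata.obs and adata.uns.
obs = pd.DataFrame(
    {"cell_type_ontology_term_id": pd.Series(["CL:0000115", "CL:0000057"], dtype="category")}
)
uns = {}
css4_subset = {"aliceblue", "aqua", "blueviolet"}  # illustrative subset only

column = "cell_type_ontology_term_id"
uns[f"{column}_colors"] = build_colors(
    obs, column, {"CL:0000057": "aqua", "CL:0000115": "blueviolet"}
)
print(uns[f"{column}_colors"])  # ['aqua' 'blueviolet']
print(check_colors(obs, column, uns[f"{column}_colors"], css4_subset))  # []
print(check_colors(obs, column, np.array(["aqua", "#08c0ff"]), css4_subset))
# ['all colors must be CSS4 names, or all must be #RRGGBB hex codes']
```

check_colors collects violations instead of raising, so every problem can be reported at once. It does not flag extra colors beyond the category count, because the rule only requires c to be at least cat.categories.size.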