Update uns[{column}_colors] to match ordering in implementation #854

brian-mott · 2024-04-22T19:01:10Z

Context

While curating a recent Collection, I ran into some ambiguity about how to wrangle column color information from the obs dataframe into uns[{column}_colors]. The readme notes that:

The color code at the Nth position in the ndarray corresponds to the Nth category of anndata.obs.{column}.unique()

anndata.obs.{column}.unique() will return an unsorted list of category names for {column} as they are encountered in the obs dataframe.

It looks like this line in the single_cell_data_portal repo assigns colors based on the adata.obs[column].cat.categories attribute, which returns a sorted list of the category names.

Once I realized this, it was easy to map colors in the proper order, either by sorting or by using adata.obs[column].cat.categories, but I think it would be helpful to clarify that the uns[{column}_colors] array depends on the sorted order of category names.
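To illustrate the difference described above, here is a minimal sketch using pandas alone. The `cluster` column and its values are hypothetical, and the sorted result relies on pandas' default behavior when it infers categories from the data; a column whose categories were set explicitly could be ordered differently.

```python
import pandas as pd

# Hypothetical obs column: values first appear as "T cell", "B cell", "NK cell".
obs = pd.DataFrame(
    {"cluster": pd.Categorical(["T cell", "B cell", "T cell", "NK cell"])}
)

# unique() follows the order of first appearance in the column.
appearance_order = list(obs["cluster"].unique())

# cat.categories follows the categorical's own category order, which pandas
# sorts by default when the categories are inferred from the data.
category_order = list(obs["cluster"].cat.categories)

print(appearance_order)  # ['T cell', 'B cell', 'NK cell']
print(category_order)    # ['B cell', 'NK cell', 'T cell']
```

Colors assigned by position against one ordering would therefore land on the wrong categories if read back against the other.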

Design (@brianraymor)

{column}_colors

Key

{column}_colors where {column} MUST be the name of a category data type column in obs that
is annotated by the data submitter or curator. The following columns that are annotated by CELLxGENE
Discover MUST NOT be specified as {column}:

assay
cell_type
development_stage
disease
organism
self_reported_ethnicity
sex
tissue

Instead annotate {column}_ontology_term_id_colors for these columns such as assay_ontology_term_id.

Annotator

Curator MAY annotate.

Value

numpy.ndarray. This MUST be a 1-D array of shape (c,), where c is greater than or equal to the
number of categories in the {column} as calculated by:

anndata.obs.{column}.cat.categories.size

The color code at the Nth position in the ndarray corresponds to the Nth category of anndata.obs.{column}.cat.categories.

For example, if cell_type_ontology_term_id includes two categories:

anndata.obs.cell_type_ontology_term_id.cat.categories.values

array(['CL:0000057', 'CL:0000115'], dtype='object')

then cell_type_ontology_term_id_colors MUST contain two or more colors such as:

['aqua' 'blueviolet']

where 'aqua' is the color assigned to 'CL:0000057' and 'blueviolet' is the color assigned to
'CL:0000115'.
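The example above could be produced along these lines. This is a sketch under stated assumptions, not the CELLxGENE implementation: `color_by_category` is a hypothetical curator-supplied mapping, and a plain dict stands in for `anndata.uns` so the snippet needs only numpy and pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical obs dataframe; values deliberately appear out of sorted order.
obs = pd.DataFrame(
    {
        "cell_type_ontology_term_id": pd.Categorical(
            ["CL:0000115", "CL:0000057", "CL:0000115"]
        )
    }
)

# Hypothetical mapping chosen by the curator.
color_by_category = {"CL:0000057": "aqua", "CL:0000115": "blueviolet"}

# Build the colors array in cat.categories order, not unique() order.
categories = obs["cell_type_ontology_term_id"].cat.categories
colors = np.array([color_by_category[category] for category in categories])

# A plain dict stands in for anndata.uns here (assumption for illustration).
uns = {"cell_type_ontology_term_id_colors": colors}

# The array must be 1-D with at least one color per category.
assert colors.ndim == 1 and colors.size >= categories.size
```

Driving the array from a category-to-color mapping keeps the result correct regardless of the order in which values happen to appear in obs.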

All elements in the ndarray MUST use the same color model, limited to:

| Color Model | Element Format |
| --- | --- |
| Named Colors | `str`. MUST be a case-insensitive CSS4 color name with no spaces such as `"aliceblue"` |
| Hex Triplet | `str`. MUST start with `"#"` immediately followed by six case-insensitive hexadecimal characters as in `"#08c0ff"` |
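A rough check of the single-color-model rule might look like the following. It is only a sketch: the helper names are hypothetical, and `CSS4_NAMES_SUBSET` is a tiny illustrative subset, so a real validator would need the full CSS4 color name list.

```python
import re

# "#" followed by exactly six hexadecimal characters, in either case.
HEX_TRIPLET = re.compile(r"#[0-9a-fA-F]{6}")

# Illustrative subset only (assumption); not the full CSS4 name list.
CSS4_NAMES_SUBSET = {"aliceblue", "aqua", "blueviolet"}


def color_model(color, named=CSS4_NAMES_SUBSET):
    """Return "hex", "named", or None if the element matches neither model."""
    if HEX_TRIPLET.fullmatch(color):
        return "hex"
    if color.lower() in named:
        return "named"
    return None


def uses_single_color_model(colors, named=CSS4_NAMES_SUBSET):
    """True if every element is valid and all elements share one color model."""
    models = {color_model(color, named) for color in colors}
    return len(models) == 1 and None not in models


print(uses_single_color_model(["aqua", "BlueViolet"]))  # True
print(uses_single_color_model(["aqua", "#08c0ff"]))     # False
```

Mixed arrays fail because the set of detected models has more than one member, and malformed elements fail because they map to `None`.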


brianraymor · 2024-04-22T19:57:01Z

Hmm. The objective is that the implementation match the schema requirements; otherwise, it should not pass QA.

See an earlier comment thread where the difference between unique and cat.categories was under discussion and the same line of code was referenced.

CC: @niknak33 @atarashansky

brianraymor · 2024-04-25T21:13:58Z

April 25 - per review with @dsadgat and recommendation by @jahilton, the schema will be updated to match the current CXG implementation. It will specify cat.categories instead of unique().

brianraymor added the schema CELLxGENE Discover dataset schema label Apr 22, 2024

brianraymor added the 5.1 Next minor CELLxGENE schema version after 5.0 label Apr 25, 2024

brianraymor changed the title ~~Clarity for curation of uns[{column}_colors]~~ Update uns[{column}_colors] to match ordering in implementation May 1, 2024

brianraymor mentioned this issue May 29, 2024

updated curation instructions for column_colors #946

Merged

brianraymor closed this as completed in #946 May 29, 2024
