
Update uns[{column}_colors] to match ordering in implementation #854

Closed
brian-mott opened this issue Apr 22, 2024 · 2 comments · Fixed by #946
Labels
5.1 Next minor CELLxGENE schema version after 5.0 schema CELLxGENE Discover dataset schema

Comments

@brian-mott
Collaborator

brian-mott commented Apr 22, 2024

Context

While curating a recent Collection, I ran into some ambiguity when wrangling column color information from the obs dataframe into uns[{column}_colors]. The readme notes that:

The color code at the Nth position in the ndarray corresponds to the Nth category of anndata.obs.{column}.unique()

anndata.obs.{column}.unique() will return an unsorted list of category names for {column} as they are encountered in the obs dataframe.

It looks like this line in the single_cell_data_portal repo assigns colors based on the adata.obs[column].cat.categories attribute, which returns a sorted list of the category names.

Once I realized this, it was easy to map colors in the proper order, either by sorting or by using adata.obs[column].cat.categories, but I think it would be helpful to clarify that the uns[{column}_colors] array depends on the sorted order of category names.
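
The difference between the two orderings can be reproduced with a small pandas example. This is a minimal sketch, not taken from the portal code: the `cluster` column and its values are made up for illustration, and it assumes the column was converted with the default `astype("category")`, which sorts the inferred categories.

```python
import pandas as pd

# Hypothetical obs column; values first appear in the order B, A, C.
obs = pd.DataFrame({"cluster": ["B", "A", "B", "C", "A"]})
obs["cluster"] = obs["cluster"].astype("category")

# unique() follows the order of first appearance in the column.
appearance_order = list(obs["cluster"].unique())

# cat.categories follows the stored category order, which pandas
# sorts when the categories are inferred by astype("category").
category_order = list(obs["cluster"].cat.categories)

print(appearance_order)  # ['B', 'A', 'C']
print(category_order)    # ['A', 'B', 'C']
```

Note that if the categories were set explicitly (for example with `pd.Categorical(..., categories=[...])`), `cat.categories` keeps that explicit order rather than a sorted one, so reading the order from `cat.categories` directly is safer than sorting by hand.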

Design (@brianraymor)

{column}_colors

Key: {column}_colors, where {column} MUST be the name of a category data type column in obs that
is annotated by the data submitter or curator. The following columns that are annotated by CELLxGENE
Discover MUST NOT be specified as {column}:

  • assay
  • cell_type
  • development_stage
  • disease
  • organism
  • self_reported_ethnicity
  • sex
  • tissue

Instead, annotate {column}_ontology_term_id_colors for these columns, such as assay_ontology_term_id_colors for assay.

Annotator: Curator MAY annotate.
Value: numpy.ndarray. This MUST be a 1-D array of shape (c,), where c is greater than or equal to the
number of categories in the {column} as calculated by:

anndata.obs.{column}.cat.categories.size

The color code at the Nth position in the ndarray corresponds to the Nth category of anndata.obs.{column}.cat.categories.

For example, if cell_type_ontology_term_id includes two categories:

anndata.obs.cell_type_ontology_term_id.cat.categories.values

array(['CL:0000057', 'CL:0000115'], dtype='object')

then cell_type_ontology_term_id_colors MUST contain two or more colors such as:

['aqua' 'blueviolet']

where 'aqua' is the color assigned to 'CL:0000057' and 'blueviolet' is the color assigned to
'CL:0000115'.
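
One way a curator might build such an array is to start from a per-category color mapping and order it by `cat.categories`. The following is a minimal sketch under stated assumptions: the `obs` dataframe and the `color_by_category` mapping are hypothetical, and a plain dict named `uns` stands in for `anndata.uns`.

```python
import numpy as np
import pandas as pd

# Hypothetical obs column using the two categories from the example.
column = "cell_type_ontology_term_id"
obs = pd.DataFrame({column: ["CL:0000115", "CL:0000057", "CL:0000115"]})
obs[column] = obs[column].astype("category")

# Hypothetical curator-supplied mapping from category to color.
color_by_category = {"CL:0000115": "blueviolet", "CL:0000057": "aqua"}

# Order the colors by cat.categories, not by unique().
categories = obs[column].cat.categories
colors = np.array([color_by_category[c] for c in categories])

# The array must be 1-D with at least as many entries as categories.
assert colors.ndim == 1 and colors.size >= categories.size

# A plain dict stands in for anndata.uns in this sketch.
uns = {f"{column}_colors": colors}
print(uns[f"{column}_colors"])  # ['aqua' 'blueviolet']
```

Building the array from a mapping keeps each color tied to its category name, so the result does not depend on the row order of obs.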

All elements in the ndarray MUST use the same color model, limited to:

Color Model | Element Format
Named Colors | str. MUST be a case-insensitive CSS4 color name with no spaces such as "aliceblue"
Hex Triplet | str. MUST start with "#" immediately followed by six case-insensitive hexadecimal characters as in "#08c0ff"
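
The single-color-model rule could be checked along these lines. This is only a sketch and not the validator's actual logic: the `color_model` helper is hypothetical, and `CSS4_SUBSET` is a tiny illustrative subset of the CSS4 names, so a real check would need the complete list.

```python
import re

# "#" followed by exactly six hexadecimal characters, any case.
HEX_TRIPLET = re.compile(r"#[0-9a-fA-F]{6}")

# Illustrative subset only; a real check needs the full CSS4 name list.
CSS4_SUBSET = {"aliceblue", "aqua", "blueviolet"}


def color_model(colors, named=CSS4_SUBSET):
    """Return 'hex', 'named', or None if elements are invalid or mixed."""
    if len(colors) == 0:
        return None
    if all(isinstance(c, str) and HEX_TRIPLET.fullmatch(c) for c in colors):
        return "hex"
    if all(isinstance(c, str) and c.lower() in named for c in colors):
        return "named"
    return None


print(color_model(["aqua", "BlueViolet"]))   # named
print(color_model(["#08c0ff", "#FFFFFF"]))   # hex
print(color_model(["aqua", "#08c0ff"]))      # None (mixed models)
```

For the full set of names, matplotlib exposes a CSS4 mapping as `matplotlib.colors.CSS4_COLORS`, which could be passed as `named` if matplotlib is available.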

@brianraymor brianraymor added the schema CELLxGENE Discover dataset schema label Apr 22, 2024
@brianraymor
Contributor

Hmm. The objective is that the implementation match the schema requirements; otherwise, it should not pass QA.

See an earlier comment thread where the difference between unique() and cat.categories was discussed and the same line of code was referenced.

CC: @niknak33 @atarashansky

@brianraymor
Contributor

April 25 - per review with @dsadgat and recommendation by @jahilton, the schema will be updated to match the current CXG implementation. It will specify cat.categories instead of unique().

@brianraymor brianraymor added the 5.1 Next minor CELLxGENE schema version after 5.0 label Apr 25, 2024
@brianraymor brianraymor changed the title Clarity for curation of uns[{column}_colors] Update uns[{column}_colors] to match ordering in implementation May 1, 2024