do or do not (there is no try) _colors #216
Comments
Caught another issue. colors_bug.mov |
[since revised] See #single-cell-data-wrangling. |
@signechambers1 to document the full set of Explorer requirements by Q2 which is required for inclusion in schema 4.0. |
@brianraymor here are the H5AD criteria for _colors functionality for explorer:
Thank you @atarashansky for the help. @brianraymor let us know if you have suggestions or questions. |
CC: @jahilton @jychien - can you please review based on your experience in debugging _colors?
Here's the relevant text, which was also linked in the top-level summary comment for this issue as a reference:
Presentation Hints
The metadata fields below are optional. They aren't needed for integration, and cellxgene can display the data fine without them, but if they are included cellxgene will do something with them. This allows submitters to fine-tune how their datasets are presented, which is a common request.
|
There is definitely a length requirement where you need to have as many colors as you have categories (comment above)...
Currently, this case is ignored & passes validation seemingly without any downstream issues. We do have a block of qa code that checks for this as 'best practice' so I don't have any concern enforcing it in the validator, but just wanted to note the change in behavior. (clean-up of existing datasets would be incredibly straightforward during a migration) |
|
Thank you all! Added the length requirement to my original comment. @atarashansky can you determine if multiple color formats will work, whether an array or list is expected by explorer, and if colors work for categorical and categorical integer fields? |
If possible, please write requirements in text and not in code fragments. I called this out earlier. Isn't this a missing constraint? Submitters can include a field called "{field}_colors" for any other categorical integer metadata field. |
Updated requirements in plain text with outstanding questions:
|
Categorical integer fields are supported so long as they are pandas Categoricals.
I think colors actually can be defined for categorical fields that do not exist - those colors will just be skipped and not added to the cxg group metadata.
If there are more colors, it will not error. The extra colors will be ignored.
Ah, thank you for correcting me @brianraymor. Here are the actual color requirements. As you indicated, this is validated and will raise an exception so we're good on this front:
Either numpy array or python lists work. |
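The behavior summarized in the comments above (colors for missing columns are skipped, extra colors are ignored, too few colors is an error) can be sketched roughly as follows. This is a minimal illustration and not the CXG conversion code: `resolve_category_colors` and `categories_by_column` are hypothetical names, and a plain dict stands in for the categorical columns in obs.

```python
def resolve_category_colors(uns, categories_by_column):
    """Build a category -> color mapping for each {column}_colors key in uns.

    uns: dict of dataset metadata, e.g. {"cell_type_colors": ["aqua", "blueviolet"]}.
    categories_by_column: hypothetical stand-in for obs that maps a column
        name to its ordered list of categories.
    """
    suffix = "_colors"
    resolved = {}
    for key, colors in uns.items():
        if not key.endswith(suffix):
            continue
        column = key[: -len(suffix)]
        if column not in categories_by_column:
            # Colors defined for a column that does not exist are skipped.
            continue
        categories = categories_by_column[column]
        colors = list(colors)  # a Python list or a numpy array both work here
        if len(colors) < len(categories):
            raise ValueError(
                f"{key} has {len(colors)} colors for {len(categories)} categories"
            )
        # zip stops at the shorter input, so extra colors are ignored.
        resolved[column] = dict(zip(categories, colors))
    return resolved
```

With `uns = {"cell_type_colors": ["aqua", "blueviolet", "coral"], "missing_colors": ["red"]}` and a single `cell_type` column holding two categories, the sketch maps the first two colors, drops `"coral"`, and skips `missing_colors` entirely.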
@atarashansky @signechambers1 - should this reference be updated to - A CSS4 color name, as supported by matplotlib https://matplotlib.org/stable/gallery/color/named_colors.html or did Explorer pin |
@atarashansky - should the shape of the _colors ndarray be derived from:
@bkmartinjr kindly educated me on the differences, because I was too lazy to RTFM this evening:
I don't have a horse in this race, but want to document only one approach where the schema matches the implementation. Recommendation? |
I don't even see matplotlib as a dependency in explorer. |
By Explorer, I meant the CXG conversion code for Explorer. There must be some method for validating CSS4 color names?
I'm unfamiliar with a color model named RFB. From my perspective, it's simply a different way of specifying RGB as illustrated by CSS 4 examples: The color type provides multiple ways to syntactically specify a given color. For example, the following declarations all specify the sRGB color “lime”:
Could we eliminate /* RGB range 0-255 */ as a supported case? I was reviewing matplotlib which does not seem to support the RGB range 0-255 style, preferring: RGB or RGBA (red, green, blue, alpha) tuple of float values in a closed interval [0, 1]. |
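For the question of how color strings might be validated, here is a minimal sketch that accepts either a CSS4 color name or a "#" followed by six hexadecimal characters, and rejects arrays that mix the two. It is not the Explorer or CXG conversion code. `CSS4_NAMES_SUBSET` is a deliberately tiny, hypothetical stand-in for the full list of CSS4 named colors, and `color_model` and `validate_colors` are illustrative names.

```python
import re

# "#" immediately followed by six case-insensitive hexadecimal characters.
HEX_PATTERN = re.compile(r"#[0-9a-fA-F]{6}")

# Hypothetical, incomplete subset of CSS4 color names, for illustration only.
CSS4_NAMES_SUBSET = {"aliceblue", "aqua", "blueviolet", "lime"}


def color_model(color):
    """Return "hex" or "css4" for a single color string, or raise."""
    if not isinstance(color, str):
        raise TypeError(f"color must be a str, got {type(color).__name__}")
    if HEX_PATTERN.fullmatch(color):
        return "hex"
    # Names are compared case-insensitively; names with spaces never match.
    if color.lower() in CSS4_NAMES_SUBSET:
        return "css4"
    raise ValueError(f"unsupported color: {color!r}")


def validate_colors(colors):
    """Check that every color is valid and that all share one color model."""
    models = {color_model(color) for color in colors}
    if len(models) > 1:
        raise ValueError("all colors must use the same color model")
    return models.pop() if models else None
```

Under these assumptions, `validate_colors(["aqua", "BlueViolet"])` returns `"css4"`, while `validate_colors(["aqua", "#08c0ff"])` raises because the two elements use different models.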
RE
I'm reviewing a live example that defines 48 colors for 53 categoricals. Is the actual requirement - "There must be at least one color?" |
Can you link me to this example? If there are 48 colors for 53 categoricals, then that should cause a bug in explorer. |
Looks like we pin Matplotlib to 3.6.3: Here, it looks like we always cast to 0-255: I agree we should change it to match matplotlib specifications/preferences. |
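If the float style in the closed interval [0, 1] were kept as an input, one option would be to normalize it to a hex string before validation. The helper below is only a sketch of that idea using the standard library. `rgb_floats_to_hex` is a hypothetical name and is not part of matplotlib or the conversion code.

```python
def rgb_floats_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a "#rrggbb" string."""
    if len(rgb) != 3:
        raise ValueError("expected exactly three channels (r, g, b)")
    channels = []
    for value in rgb:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"channel {value!r} is outside the interval [0, 1]")
        # Scale each channel to the 0-255 range used by hex notation.
        channels.append(round(value * 255))
    return "#{:02x}{:02x}{:02x}".format(*channels)
```

For example, `rgb_floats_to_hex((0.0, 1.0, 0.0))` gives `"#00ff00"`, and a 0-255 style input such as `(0, 255, 0)` is rejected rather than silently reinterpreted.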
The only requirement is that the len of _colors must be greater than or equal to the number of categories (
Why
Correct. |
I'm not sure how to interpret this statement. Are you indicating that you believe unique to be equivalent to cat.categories? If so, please see this earlier comment. |
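The difference between unique() and cat.categories can be seen in a small pandas example. The column values below are made up for illustration. unique() reports the observed values in order of first appearance, while cat.categories reports every declared category in its declared order, including categories that never occur.

```python
import pandas as pd

# A categorical column with one declared but unused category ("NK cell").
values = pd.Series(
    pd.Categorical(
        ["T cell", "B cell", "T cell"],
        categories=["B cell", "NK cell", "T cell"],
    )
)

observed = list(values.unique())        # observed values, order of first appearance
declared = list(values.cat.categories)  # all declared categories, declared order

print(observed)  # ['T cell', 'B cell']
print(declared)  # ['B cell', 'NK cell', 'T cell']
```

Both the count (2 versus 3) and the order differ here, so a color array indexed by one would not line up with the other.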
It will be converted to an
|
Of the three choices: Confirming the consensus from the #single-cell-data-wrangling:
Option 3 was the preference. My assumption is that Explorer continues to display the ontology labels but uses the related term ids for colors? This needs to be documented in the schema. Are we going to warn or fail if a submitter specifies |
Gotcha, wasn't aware of this.
You interpreted correctly, and TIL. Then the requirements should be
At conversion time, Explorer will be updated to assign the custom colors to the corresponding label column's categories. |
RE
This must fail then because |
Right, yes, agreed! |
Inquiring minds. What is the scenario for allowing the length of CC: @jahilton |
@brianraymor That would make sense if they just wanted to apply a custom discrete colormap to all categories. |
Design
uns (Dataset Metadata)
uns is an ordered dictionary with a str key. Curators MAY annotate the following keys and values in uns:
{category}_colors
category data type column in obs that is annotated by the data submitter or curator. The following columns that are annotated by CELLxGENE Discover MUST NOT be specified as {category}:
numpy.ndarray. This MUST be a 1-D array of shape (, c), where c is greater than or equal to the number of unique categories in the {category} column as calculated by:
len(anndata.obs.{category}.unique())
The color code at the Nth position in the ndarray corresponds to the Nth category of anndata.obs.{category}.unique().
For example, if cell_type_ontology_term_id includes two unique categories:
anndata.obs.cell_type_ontology_term_id.unique()
['CL:0000057', 'CL:0000115']
Categories (2, object): ['CL:0000057', 'CL:0000115']
then cell_type_ontology_term_id_colors MUST contain two or more colors such as:
['aqua' 'blueviolet']
where 'aqua' is the color assigned to 'CL:0000057' and 'blueviolet' is the color assigned to 'CL:0000115'.
All elements in the ndarray MUST use the same color model, limited to:
str. MUST be a case-insensitive CSS4 color name with no spaces such as "aliceblue"
str. MUST start with "#" immediately followed by six case-insensitive hexadecimal characters as in "#08c0ff"
Context
_colors was not included in schema 2.0.0 because it was perceived as an under-documented feature of cellxgene desktop and was not present in most submissions in the portal at the time. Also see category colors and RFC: User-defined colors.
There was no validation performed by cellxgene-schema CLI 1.1.0 or in the subsequent cellxgene conversion, so support was brittle. The current CXG conversion code limits validation to color format:
convert_anndata_category_colors_to_cxg_category_colors
Recently, there was a CXG conversion failure during data ingestion reported in #single-cell-data-wrangling due to the following causes:
_colors should either be blocked+fail or documented+validated in the future.
There are more testing details documented in the Category Colors section of Prepare Data:
To test that you've done this properly, check that for your given category the number of colors match the number of category values and that the second command below results in a mapping from categories to colors.