Defining colors per-point programmatically #1255

willwhitney · 2018-12-07T17:15:40Z

Altair's color encoding system is very simple to use, but that simplicity means it doesn't cover a number of long-tail use cases. To make it more flexible while preserving the simplicity of the API, I think it would be useful to be able to pass a DataFrame with a column that holds literal color values:

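import numpy as np
import pandas as pd
import altair as alt

data = pd.DataFrame({'x': np.random.uniform(0, 1, size=1000), 'y': np.random.uniform(0, 1, size=1000)})
# Literal hex colors: the leading red digit tracks x ('#000000' through '#900000')
data['color'] = '#' + np.floor(data['x'] * 10).astype(int).astype(str) + '00000'

# Proposed: use the values in the 'color' column directly as point colors
alt.Chart(data).mark_point(opacity=0.3).encode(
    x='x',
    y='y',
    color=alt.Color('color')
)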

This would allow the user to construct any arbitrary mapping between their data and colors, without burdening Altair/Vega with having to represent these mappings. As far as I can tell this is not currently possible.

As an example of the utility of such an API, here's my current problem: I'm looking to make a chart where the color of each point is based on the values of two different (numerical) columns. For a simple case, consider samples from a Gaussian where each point's green channel is determined by its x position and its red channel by its y position.

As far as I could tell there isn't a way to do this at the moment. The API I'm proposing would let me construct the right color for each point in Python, then just render them with Altair.

Please correct me if there's already a way to do any of this!


jakevdp · 2018-12-07T17:47:38Z

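You can do this by setting scale=None on the color encoding, which tells Vega-Lite to use the field values directly as colors instead of passing them through a scale. For example: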

import pandas as pd
import altair as alt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [5, 2, 4, 3, 7, 4],
    'color': ['#FF0000', '#00FF00', '#0000FF', '#FF00FF', '#FFFF00', '#00FFFF']
})

alt.Chart(data).mark_point(filled=True, size=300).encode(
    x='x',
    y='y',
    color=alt.Color('color', scale=None)
)

willwhitney · 2018-12-07T18:02:24Z

Right on. Is this documented anywhere, or should I add it?

jakevdp · 2018-12-07T18:15:19Z

I'm not sure it is... it would be a great addition!

willwhitney · 2018-12-07T18:22:18Z

This is awesome, thanks for the help!

rafa-guedes · 2024-03-12T22:51:37Z

When we use this approach the legend is suppressed. Is it possible to define colors this way and set a legend based on the corresponding values from another column? For example, slightly modifying the snippet from @jakevdp, is it possible to build the legend from the values in the "label" column?

import pandas as pd
import altair as alt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [5, 2, 4, 3, 7, 4],
    'color': ['red', 'green', 'blue', 'green', 'blue', 'red'],
    'label': ["REDS", "GREENS", "BLUES", "GREENS", "BLUES", "REDS"]
})

alt.Chart(data).mark_point(filled=True, size=300).encode(
    x='x',
    y='y',
    color=alt.Color('color', scale=None)
)

joelostblom · 2024-03-12T23:27:04Z

One workaround could be to set the range to the unique list of colors in that dataframe column. I believe this should assign the colors to the correct points as long as the range is sorted the same way as the color domain (which is alphabetical by default), but double check using the tooltip for your data.

import pandas as pd
import altair as alt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [5, 2, 4, 3, 7, 4],
    'color': ['red', 'green', 'blue', 'green', 'blue', 'red'],
})

alt.Chart(data).mark_point(filled=True, size=300).encode(
    x='x',
    y='y',
    color=alt.Color('color').scale(range=sorted(data['color'].unique())),
    tooltip='color'
)

rafa-guedes · 2024-03-12T23:47:50Z

Thanks @joelostblom, but I'm afraid this still does not solve my issue, as I need the label column to define the labels in the legend. My example was not very helpful because the values in the color and label columns look similar and interchangeable, but in the real case I'm trying to implement those columns are different and I need the unique values in label to be used.

joelostblom · 2024-03-13T01:36:48Z

Ah I see. Fortunately that works similarly. You can encode your label column instead while keeping the range the same:

import pandas as pd
import altair as alt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [5, 2, 4, 3, 7, 4],
    'color': ['red', 'green', 'blue', 'green', 'blue', 'red'],
    'label': ["REDS", "GREENS", "BLUES", "GREENS", "BLUES", "REDS"]
})

alt.Chart(data).mark_point(filled=True, size=300).encode(
    x='x',
    y='y',
    color=alt.Color('label').scale(range=sorted(data['color'].unique())),
    tooltip='color'
)

rafa-guedes · 2024-03-13T03:03:59Z

That works great @joelostblom thank you :)

willwhitney closed this as completed Dec 7, 2018
