Defining colors per-point programmatically #1255

Closed
willwhitney opened this issue Dec 7, 2018 · 9 comments

@willwhitney

Altair's color encoding system is very simple to use, but due to its simplicity it fails to cover a variety of long-tail specific use cases. In order to make it more flexible, while preserving the simplicity of the API, I think it would be useful to be able to pass a DataFrame with a column which is a literal color value:

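import numpy as np
import pandas as pd
import altair as alt

data = pd.DataFrame({'x': np.random.uniform(0, 1, size=1000), 'y': np.random.uniform(0, 1, size=1000)})
# Build a literal hex color per row from x, e.g. '#300000' for x in [0.3, 0.4)
data['color'] = '#' + np.floor(data['x'] * 10).astype(int).astype(str) + '00000'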

scale_x = alt.Scale(range=['black', 'red'])
alt.Chart(data).mark_point(opacity=0.3).encode(
    x='x',
    y='y',
    color=alt.Color('color')
)

This would allow the user to construct any arbitrary mapping between their data and colors, without burdening Altair/Vega with having to represent these mappings. As far as I can tell this is not currently possible.

As an example of the utility of such an API, here's my current problem: I'm looking to make a chart where the color of each point is based on the values of two different (numerical) columns. For a simple case, here are samples from a Gaussian where each point is colored with its green channel determined by its x position and its red channel determined by its y position.

[Image: 2d_color_big]

As far as I could tell there isn't a way to do this at the moment. The API I'm proposing would let me construct the right color for each point in Python, then just render them with Altair.

Please correct me if there's already a way to do any of this!

@jakevdp
Collaborator

jakevdp commented Dec 7, 2018

You can do this by setting scale=None, which disables the color scale so that the values in the column are used directly as colors. For example:

import pandas as pd
import altair as alt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [5, 2, 4, 3, 7, 4],
    'color': ['#FF0000', '#00FF00', '#0000FF', '#FF00FF', '#FFFF00', '#00FFFF']
})

alt.Chart(data).mark_point(filled=True, size=300).encode(
    x='x',
    y='y',
    color=alt.Color('color', scale=None)
)

@willwhitney
Author

Right on. Is this documented anywhere, or should I add it?

@jakevdp
Collaborator

jakevdp commented Dec 7, 2018

I'm not sure it is... it would be a great addition!

@willwhitney
Author

This is awesome, thanks for the help!

@rafa-guedes

When we use this approach, the legend is suppressed. Is it possible to define colors this way and still show a legend based on the corresponding values from another column? For example, slightly modifying the snippet from @jakevdp, is it possible to set a legend using the values from the "label" column?

import pandas as pd
import altair as alt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [5, 2, 4, 3, 7, 4],
    'color': ['red', 'green', 'blue', 'green', 'blue', 'red'],
    'label': ["REDS", "GREENS", "BLUES", "GREENS", "BLUES", "REDS"]
})

alt.Chart(data).mark_point(filled=True, size=300).encode(
    x='x',
    y='y',
    color=alt.Color('color', scale=None)
)

@joelostblom
Contributor

One workaround could be to set the range to the list of unique colors in that dataframe column. I believe this should assign the colors to the correct points as long as the list is sorted in the same order as the color assignment (which is alphabetical by default), but double-check using the tooltip for your data.

import pandas as pd
import altair as alt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [5, 2, 4, 3, 7, 4],
    'color': ['red', 'green', 'blue', 'green', 'blue', 'red'],
})

alt.Chart(data).mark_point(filled=True, size=300).encode(
    x='x',
    y='y',
    color=alt.Color('color').scale(range=sorted(data['color'].unique())),
    tooltip='color'
)

@rafa-guedes

Thanks @joelostblom, but I'm afraid this still does not solve my issue, as I need the label column to define the labels in the legend. My example was not very helpful because the values in the color and label columns look similar and interchangeable. In the real case I'm trying to implement, those columns are different, and I need the unique values in label to be used.

@joelostblom
Contributor

Ah, I see. Fortunately, that works similarly. You can replace the encoding with your label column while keeping the range the same:

import pandas as pd
import altair as alt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [5, 2, 4, 3, 7, 4],
    'color': ['red', 'green', 'blue', 'green', 'blue', 'red'],
    'label': ["REDS", "GREENS", "BLUES", "GREENS", "BLUES", "REDS"]
})

alt.Chart(data).mark_point(filled=True, size=300).encode(
    x='x',
    y='y',
    color=alt.Color('label').scale(range=sorted(data['color'].unique())),
    tooltip='color'
)

@rafa-guedes

That works great @joelostblom thank you :)
