Plot Categorical Data #17

Closed
soerenetler opened this issue Jan 17, 2023 · 2 comments · Fixed by #20

Comments

@soerenetler

Hello everyone,
thank you for this great project. The tool works great and helps a lot in visualizing visitor numbers. Is it also possible to plot categorical data? If I just use a column with string values this does not work (because no maximum value is defined).
This would be really helpful to plot e.g. days with low, medium and high counts of visitors.
I know it is possible to work around this using hacks with the color scale, but another way would be much easier.

Thank you for all your work,
Sören

@brunorosilva
Owner

Hi, thanks for the feedback.

I've been on a personal break, and I'll work on this feature this weekend. It makes sense for this to exist. In the meantime, a simple workaround is to map your continuous variable to ordered numeric categories.

Examples

Setup

import numpy as np
import pandas as pd
from plotly_calplot import calplot

dummy_start_date = "2022-01-01"
dummy_end_date = "2023-10-03"
# One row per day in the range, each with a random integer count in [0, 30).
dummy_dates = pd.date_range(dummy_start_date, dummy_end_date)
dummy_df = pd.DataFrame(
    {
        "ds": dummy_dates,
        "value": np.random.randint(0, 30, len(dummy_dates)),
    }
)

Categorizing linearly (creates no emphasis)

def continuous_to_categoric(v):
    if v < 10:
        return 0
    elif v < 20:
        return 1
    return 2

dummy_df["cat_value"] = dummy_df["value"].apply(continuous_to_categoric)
fig1 = calplot(dummy_df, x="ds", y="cat_value", dark_theme=False, years_title=True)

image

Categorizing with lower bound (emphasis on higher values)

def continuous_to_categoric(v):
    if v < 10:
        return 0
    elif v < 20:
        return 1
    return 5

dummy_df["cat_value"] = dummy_df["value"].apply(continuous_to_categoric)
fig1 = calplot(dummy_df, x="ds", y="cat_value", dark_theme=False, years_title=True)

image

Categorizing with upper bound (emphasis on lower values)

def continuous_to_categoric(v):
    if v < 5:
        return 5
    elif v < 20:
        return 1
    return 0

dummy_df["cat_value"] = dummy_df["value"].apply(continuous_to_categoric)
fig1 = calplot(dummy_df, x="ds", y="cat_value", dark_theme=False, years_title=True, colorscale="RdPu")

image
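
The hand-written `if`/`elif` mappings above can also be expressed with pandas' binning helper. This is only a minimal sketch, assuming the same cut points (10 and 20) as the first example. The column name `label` and the labels `low`, `medium` and `high` are illustrative and not part of plotly_calplot:

```python
import numpy as np
import pandas as pd

# Same shape of dummy data as in the setup above (seeded so runs are repeatable).
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-01", "2023-10-03")
dummy_df = pd.DataFrame({"ds": dates, "value": rng.integers(0, 30, len(dates))})

# right=False gives the bins [0, 10), [10, 20), [20, inf),
# which matches the `v < 10` / `v < 20` tests in the first example.
dummy_df["label"] = pd.cut(
    dummy_df["value"],
    bins=[0, 10, 20, np.inf],
    labels=["low", "medium", "high"],
    right=False,
)

# The ordered categorical exposes integer codes (low=0, medium=1, high=2),
# i.e. a numeric column that can be passed as `y`.
dummy_df["cat_value"] = dummy_df["label"].cat.codes
```

If the column already holds strings, `pd.Categorical(values, categories=[...], ordered=True).codes` should yield the same kind of integer codes without the binning step.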

@soerenetler
Author

soerenetler commented Feb 10, 2023

Thank you very much for the detailed code examples. They help to create the plot, but the colorscale is then completely off and needs to be created separately. Still, it is a good workaround for now.
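
One way to build such a colorscale separately is a stepped scale in Plotly's list-of-pairs format, with one flat band per category. The sketch below is a hedged illustration rather than anything provided by plotly_calplot: the helper name `discrete_colorscale` and the colours are hypothetical, and it assumes the category codes are consecutive integers 0..n-1 that all occur in the data, so that the colour range spans exactly that interval.

```python
def discrete_colorscale(colors):
    """Build a stepped Plotly-style colorscale: one flat band per colour."""
    n = len(colors)
    if n == 0:
        raise ValueError("need at least one colour")
    scale = []
    for i, color in enumerate(colors):
        # Each colour occupies the band [i/n, (i+1)/n] with no gradient inside it.
        scale.append([i / n, color])
        scale.append([(i + 1) / n, color])
    return scale


# Hypothetical colours for the codes 0 (low), 1 (medium), 2 (high).
three_step = discrete_colorscale(["#fee0d2", "#fc9272", "#de2d26"])
```

Plotly's own heatmap traces accept a list like `three_step` as `colorscale`. Whether `calplot`'s `colorscale` argument passes such a list through unchanged, rather than only a named scale such as `"RdPu"`, is an assumption that would need checking.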

@brunorosilva brunorosilva mentioned this issue Mar 27, 2023
@brunorosilva brunorosilva linked a pull request Mar 27, 2023 that will close this issue