Skip to content

fix: Make dx histogram behavior consistent with px#1002

Merged
jnumainville merged 18 commits intodeephaven:mainfrom
jnumainville:668_hist
Mar 20, 2025
Merged

fix: Make dx histogram behavior consistent with px#1002
jnumainville merged 18 commits intodeephaven:mainfrom
jnumainville:668_hist

Conversation

@jnumainville
Copy link
Copy Markdown
Collaborator

Fixes: #668

The following examples are now basically consistent.
During testing I discovered #1001 so the legend titles are not set in a couple of these, but that is beyond the scope of this ticket as that is a general plot by issue, not a histogram specific one.

Note a few intentional differences in the args. It's not expected that dx histograms and px histograms aggregate in the same exact way. Plotly infers the equivalent of range_bins hence the extra argument. Additionally we have barmode="group" as our default whereas in Plotly it is barmode="relative". I believe this was an intentional choice on our end because it's hard to compare the different traces with barmode="relative" although I don't have a record of it.

import plotly.express as px
import deephaven.plot.express as dx

df = px.data.tips()
fig = px.histogram(df, x="size", y="total_bill")
fig2 = px.histogram(df, x="size", y="total_bill", histfunc="count", nbins=6)
fig3 = px.histogram(df, x="size", y="total_bill", orientation="h", nbins=6)
fig4 = px.histogram(df, x="size", y="total_bill", orientation="h", histfunc="count", nbins=6)
fig5 = px.histogram(df, x=["tip", "total_bill"], y="size", nbins=6)
fig6 = px.histogram(df, x=["tip", "total_bill"], y="size", orientation="h", nbins=6)
fig7 = px.histogram(df, x="size", y=["tip", "total_bill"], nbins=6)
fig8 = px.histogram(df, x="size", y=["tip", "total_bill"], orientation="h", nbins=6)
fig9 = px.histogram(df, x=["tip", "total_bill"], y="size", histfunc="count", nbins=6)
fig10 = px.histogram(df, x=["tip", "total_bill"], y="size", orientation="h", histfunc="count")
fig11 = px.histogram(df, x="size", y=["tip", "total_bill"], histfunc="count", nbins=6)
fig12 = px.histogram(df, x="size", y=["tip", "total_bill"], orientation="h", histfunc="count", nbins=6)
fig13 = px.histogram(df, x="size", y=["tip", "total_bill"], histnorm="probability", barnorm="percent")
fig14 = px.histogram(df, x="size")
fig15 = px.histogram(df, y="size")

fig1d = dx.histogram(df, x="size", y="total_bill", nbins=6, range_bins=[0, 7])
fig2d = dx.histogram(df, x="size", y="total_bill", histfunc="count", nbins=6, range_bins=[0, 7])
fig3d = dx.histogram(df, x="size", y="total_bill", orientation="h", nbins=6, range_bins=[0, 60])
fig4d = dx.histogram(df, x="size", y="total_bill", orientation="h", histfunc="count", nbins=6, range_bins=[0, 60])
fig5d = dx.histogram(df, x=["tip", "total_bill"], y="size", nbins=6, range_bins=[0, 60], barmode="relative")
fig6d = dx.histogram(df, x=["tip", "total_bill"], y="size", orientation="h", nbins=6, range_bins=[0, 7], barmode="relative")
fig7d = dx.histogram(df, x="size", y=["tip", "total_bill"], nbins=6, range_bins=[0, 7], barmode="relative")
fig8d = dx.histogram(df, x="size", y=["tip", "total_bill"], orientation="h", nbins=6, range_bins=[0, 60], barmode="relative")
fig9d = dx.histogram(df, x=["tip", "total_bill"], y="size", histfunc="count", nbins=6, range_bins=[0, 60], barmode="relative")
fig10d = dx.histogram(df, x=["tip", "total_bill"], y="size", orientation="h", histfunc="count", nbins=6, range_bins=[0, 7], barmode="relative")
fig11d = dx.histogram(df, x="size", y=["tip", "total_bill"], histfunc="count", nbins=6, range_bins=[0, 7], barmode="relative")
fig12d = dx.histogram(df, x="size", y=["tip", "total_bill"], orientation="h", histfunc="count", nbins=6, range_bins=[0, 60], barmode="relative")
fig13d = dx.histogram(df, x="size", y=["tip", "total_bill"], histnorm="probability", barnorm="percent", range_bins=[0,7], nbins=6, barmode="relative")
fig14d = dx.histogram(df, x="size", nbins=6)
fig15d = dx.histogram(df, y="size", nbins=6)

@jnumainville jnumainville requested review from a team and mattrunyon and removed request for a team November 8, 2024 16:24
margaretkennedy
margaretkennedy previously approved these changes Nov 8, 2024
Comment thread plugins/plotly-express/docs/histogram.md Outdated
Comment thread plugins/plotly-express/src/deephaven/plot/express/deephaven_figure/generate.py Outdated
Comment on lines +185 to +188
elif self.histnorm == "probability":
hist_agg_label = f"fraction of sum of {hist_agg_label}"
elif self.histnorm == "percent":
hist_agg_label = f"percent of sum of {hist_agg_label}"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these X of sum of Y? Looks like this is the case where histfunc is not count or sum. So is there a histfunc that should be required for these cases?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how plotly labels it https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_core.py#L239
More specifically though, hist_agg_label would contain histfunc at this point.

Comment on lines +14 to +15
pivot_vars: The vars with new column names if a list was passed in
list_var: The var that was passed in as a list
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by what these do. Is pivot_vars column renames? Is list_var supposed to be something like x or y?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pivot_vars is kind of a rename - it's for when you pass in a list so the tables need to be stacked with resulting value and variable columns, although they may or may not be named exactly that if those columns already exist in the table. It's not a great name for that dict. I think I'll go ahead and refactor it as part of this.
list_var would be x or y yes, whichever parameter was a list originally

Comment on lines +49 to +58
if self.args.get(self.agg_var):
self.agg_col: str = (
pivot_vars["value"]
if pivot_vars and list_var and list_var == self.agg_var
else args[self.agg_var]
)
else:
# if bar_var is not set, the value column is the same as the axis column
# because both the axis bins and value are computed from the same inputs
self.agg_col = self.bin_col
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the comment in the else say agg_var? The if only checks agg_var, not bar_var

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jnumainville jnumainville merged commit 08dcbce into deephaven:main Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

x and y behavior in dx.histogram is not fully consistent with px.histogram

3 participants