# Hist

In [8]:
import hvplot.polars  # noqa
import polars as pl

```hist``` is often a good way to start looking at continuous data to get a sense of the distribution.
Similar methods include ```kde``` (also available as ```density```).
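
To make concrete what a histogram summarizes, here is a minimal NumPy sketch of the binning step. The weights are made up for illustration (they are not taken from ```autompg_clean```), and the choice of 5 bins is arbitrary; this is not meant to describe how hvplot computes its bins internally.

```python
import numpy as np

# Made-up vehicle weights (assumption, not taken from autompg_clean).
weights = np.array([1800, 2100, 2200, 2500, 2600, 2650, 3000, 3400, 3900, 4700])

# Split the value range into 5 equal-width bins and count the values in each.
counts, edges = np.histogram(weights, bins=5)

# Print each bin as "left edge to right edge: count".
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:7.1f} to {right:7.1f}: {n}")
```

Each bar of a histogram corresponds to one of these counts, drawn over the interval between two neighbouring edges.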

In [9]:
from bokeh.sampledata.autompg import autompg_clean

df = pl.from_pandas(autompg_clean)
df.sample(n=5)

mpg,cyl,displ,hp,weight,accel,yr,origin,name,mfr
f64,i64,f64,i64,i64,f64,i64,str,str,str
20.0,6,156.0,122,2807,13.5,73,"Asia","toyota mark ii…","toyota"
18.0,3,70.0,90,2124,13.5,73,"Asia","maxda rx3","mazda"
23.0,8,350.0,125,3900,17.4,79,"North America","cadillac eldor…","cadillac"
19.9,8,260.0,110,3365,15.5,78,"North America","oldsmobile cut…","oldsmobile"
22.0,6,146.0,97,2815,14.5,77,"Asia","datsun 810","datsun"


In [10]:
df.hvplot.hist("weight")

When ```by``` is set, the histograms for each group are overlaid on a single plot by default. To draw each group in its own subplot instead, pass ```subplots=True```.

In [11]:
df.hvplot.hist("weight", by="origin", subplots=True, width=250)
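
Numerically, grouping with ```by``` amounts to computing one histogram per group. The sketch below does this with NumPy on made-up weights keyed by origin. Using the same bin edges for every group is an assumption made here so the groups are directly comparable; it is not a claim about hvplot's internals.

```python
import numpy as np

# Made-up weights keyed by origin (assumption, for illustration only).
groups = {
    "Asia": np.array([1900, 2100, 2300, 2800]),
    "Europe": np.array([2000, 2200, 2950, 3100]),
    "North America": np.array([2600, 3450, 3950, 4400]),
}

# Compute bin edges over the full value range so every group shares them.
all_values = np.concatenate(list(groups.values()))
edges = np.histogram_bin_edges(all_values, bins=5)

# One array of counts per group, all using the shared edges.
per_group = {
    name: np.histogram(values, bins=edges)[0] for name, values in groups.items()
}

for name, counts in per_group.items():
    print(name, counts.tolist())
```

Overlaying draws these per-group counts on one set of axes, while subplots give each group its own axes.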

You can also plot histograms of datetime data:

In [13]:
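# I could not work out how to plot a histogram of time-series data with Polars,
# so this example uses pandas.
import hvplot.pandas  # noqa: registers the .hvplot accessor on pandas objects
import pandas as pd
from bokeh.sampledata.commits import data as commits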

commits = commits.reset_index().sort_values("datetime")
commits.head(3)

,datetime,day,time
4915,2012-12-29 11:57:50-06:00,Sat,11:57:50
4914,2013-01-02 17:46:43-06:00,Wed,17:46:43
4913,2013-01-03 16:28:49-06:00,Thu,16:28:49


In [14]:
# `commits` is the pandas DataFrame prepared in the previous cell.
commits.hvplot.hist(
    "datetime",
    bin_range=(pd.Timestamp('2012-11-30'), pd.Timestamp('2017-05-01')),
    bins=54,
)

If you want to plot the distribution of a categorical column, you can compute the counts with the Polars method ```value_counts``` and plot them using ```.hvplot.bar``` or, for horizontal bars, ```.hvplot.barh```.

In [17]:
df["mfr"].value_counts().sort("count").hvplot.barh("mfr", "count", height=500)