# Pandas tip #5: Binning your data
Depending on the type of data, you often want to look at its distribution. This can be done with a histogram or, better, with a normalized histogram. As many know, Pandas has the .hist() method, which makes plotting a histogram easy. Under the hood it calls Matplotlib, and it accepts a bins argument for the number of bins (or the bin edges) and a density argument to normalize the plot.

For more control, we can bin the data another way, using the pd.cut() function. It assigns each value to its bin, and the result can be stored in a new column. Using .groupby() on that column, we can then create a new aggregated DataFrame that holds the histogram counts. A nice thing about the .groupby() approach is that each bin is shown as an interval in the index. Normalization (the total area of the bars should equal one) has to be done manually, but it is easy: divide each count by the total number of binned values and by the bin width.
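Written as a formula, with $n_i$ the count in bin $i$, $N = \sum_i n_i$ the total number of binned values and $\Delta$ the (constant) bin width, the normalized height of bin $i$ and the resulting total area are:

```latex
\hat{f}_i = \frac{n_i}{N\,\Delta},
\qquad
\sum_i \hat{f}_i\,\Delta = \frac{1}{N}\sum_i n_i = 1
```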

Let's generate some random data, drawn from a normal distribution with mean 4 and standard deviation 1:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng()
df = pd.DataFrame({
    'value': rng.normal(loc=4, scale=1, size=10_000),
})
```

```python
# Plot the normalized histogram (an estimate of the PDF)
bins = np.linspace(0, 8, 25)  # 25 edges -> 24 bins of width 1/3
_ = df.value.hist(bins=bins, density=True, figsize=(12, 5))
```
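The bar heights drawn with density=True can also be computed without plotting. Matplotlib documents density=True as counts / (sum(counts) * np.diff(bins)), which is what np.histogram(..., density=True) returns. A minimal sketch, assuming the same kind of synthetic data (the seed 42 is an arbitrary choice for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
values = rng.normal(loc=4, scale=1, size=10_000)
bins = np.linspace(0, 8, 25)

# density=True divides each count by (total count in range * bin width)
density, edges = np.histogram(values, bins=bins, density=True)

# The area under the normalized histogram equals one
area = (density * np.diff(edges)).sum()
print(f"{len(density)} bins, total area = {area:.6f}")
```

Having the numbers as an array is handy when you want to compare or post-process the histogram instead of only looking at it.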

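```python
# Assign each value to its bin; pd.cut returns a categorical column of intervals
df['bin'] = pd.cut(df.value, bins=bins)

# Count the values per bin; observed=False keeps empty bins in the result
hist = (df
    .groupby('bin', observed=False)[['value']]
    .count()
)

hist.value.plot.bar(figsize=(12, 5), fontsize=16)
```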
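As an alternative to .groupby() (a swapped-in technique, not the one used in this tip), the categorical Series returned by pd.cut can be counted directly with value_counts(sort=False). For a categorical Series this keeps the bins in their natural order and includes empty bins. A minimal sketch on freshly generated data, with an arbitrary seed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
df = pd.DataFrame({'value': rng.normal(loc=4, scale=1, size=10_000)})
bins = np.linspace(0, 8, 25)

# pd.cut maps each value to an Interval; values outside (0, 8] become NaN
binned = pd.cut(df['value'], bins=bins)

# Count per bin in bin order instead of sorting by frequency
counts = binned.value_counts(sort=False)

# Same counts via the groupby route used above
gb = df.assign(bin=binned).groupby('bin', observed=False)['value'].count()

print(counts.head())
print(type(counts.index[0]).__name__)  # the index holds pandas Interval objects
```

value_counts is shorter for a single count per bin, while .groupby() is the better fit when you want several aggregates (mean, sum, ...) per bin at once.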

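```python
# Normalize so that the total area of the bars equals one
bin_width = bins[1] - bins[0]  # all bins have equal width (np.linspace)
n_binned = hist['value'].sum()  # values outside the bins are NaN after pd.cut and not counted
hist['norm'] = hist['value'] / n_binned / bin_width

hist.norm.plot.bar(figsize=(12, 5), fontsize=16, color='orange')
```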
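To check the manual normalization, a minimal sketch comparing the pd.cut/groupby result with np.histogram(..., density=True) on the same bin edges, assuming the same kind of synthetic data and an arbitrary seed. Note that pd.cut uses right-closed intervals (a, b] while NumPy uses half-open bins [a, b) (with the last bin closed); this only matters for values lying exactly on an edge, which practically never happens with continuous data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
df = pd.DataFrame({'value': rng.normal(loc=4, scale=1, size=10_000)})
bins = np.linspace(0, 8, 25)

# Manual route: bin, count, normalize by binned total and bin width
df['bin'] = pd.cut(df['value'], bins=bins)
hist = df.groupby('bin', observed=False)[['value']].count()
bin_width = bins[1] - bins[0]
hist['norm'] = hist['value'] / hist['value'].sum() / bin_width

# Reference: NumPy's density-normalized histogram on the same edges
density, _ = np.histogram(df['value'], bins=bins, density=True)

matches = np.allclose(hist['norm'].to_numpy(), density)
area = (hist['norm'] * bin_width).sum()
print(matches, area)
```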

If you have any questions, comments, or requests, feel free to [contact me on LinkedIn](https://linkedin.com/in/dennisbakhuis).