## cut()
- Buckets continuous values into intervals, converting numerical data into categories.
- The bin edges are fixed: you either pass them explicitly or ask for a number of equal-width bins.


In [1]:
import pandas as pd



In [6]:
# Sample data
df = pd.DataFrame({"Age": [5, 10, 40, 60, 40, 90, 30, 25, 12, 11, 45, 88, 14]})
print(df)

    Age
0     5
1    10
2    40
3    60
4    40
5    90
6    30
7    25
8    12
9    11
10   45
11   88
12   14


In [8]:
# step1: Bins with default labels
# Define bins
age_bins = [0, 18, 35, 50, 65, 100]

# Categorize ages
ages_cut = pd.cut(df["Age"], age_bins)

print(ages_cut)

0       (0, 18]
1       (0, 18]
2      (35, 50]
3      (50, 65]
4      (35, 50]
5     (65, 100]
6      (18, 35]
7      (18, 35]
8       (0, 18]
9       (0, 18]
10     (35, 50]
11    (65, 100]
12      (0, 18]
Name: Age, dtype: category
Categories (5, interval[int64, right]): [(0, 18] < (18, 35] < (35, 50] < (50, 65] < (65, 100]]


### `ages_cut` is a pandas Series with `category` dtype; each value is the interval that the corresponding age falls into.
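The interval notation in the output matters at the bin edges. The sketch below uses a small hypothetical series of edge-case ages (not the sample data above) to show that intervals are closed on the right by default, that `right=False` closes them on the left instead, and that values falling outside every bin become `NaN`.

```python
import pandas as pd

# Hypothetical edge-case ages: two sit exactly on bin edges, one is out of range
ages = pd.Series([0, 18, 35, 120], name="Age")
age_bins = [0, 18, 35, 50, 65, 100]

# Default (right=True): bins look like (0, 18], (18, 35], ...
default_cut = pd.cut(ages, bins=age_bins)

# right=False: bins look like [0, 18), [18, 35), ...
left_closed = pd.cut(ages, bins=age_bins, right=False)

print(default_cut)   # 0 and 120 fall outside every right-closed bin
print(left_closed)   # only 120 falls outside every left-closed bin
```

If the lowest edge itself should be included while keeping right-closed bins, `pd.cut` also accepts `include_lowest=True`.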

In [9]:
# step2: Bins with custom labels to make the result more readable
labels = ["Child", "Youth", "Adult", "Middle Age", "Senior"]
df["AgeGroup"] = pd.cut(df["Age"], bins=age_bins, labels=labels)

print(df)

    Age    AgeGroup
0     5       Child
1    10       Child
2    40       Adult
3    60  Middle Age
4    40       Adult
5    90      Senior
6    30       Youth
7    25       Youth
8    12       Child
9    11       Child
10   45       Adult
11   88      Senior
12   14       Child
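Because the bin edges are fixed, the number of rows per group depends entirely on the data. A minimal sketch that counts the rows in each `AgeGroup`, rebuilding the same sample data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"Age": [5, 10, 40, 60, 40, 90, 30, 25, 12, 11, 45, 88, 14]})
age_bins = [0, 18, 35, 50, 65, 100]
labels = ["Child", "Youth", "Adult", "Middle Age", "Senior"]
df["AgeGroup"] = pd.cut(df["Age"], bins=age_bins, labels=labels)

# Count rows per label; sort_index() orders the result by category, not by count
group_counts = df["AgeGroup"].value_counts().sort_index()
print(group_counts)
```

The groups come out uneven (five children but only one middle-aged person), which is the behavior `qcut()` is designed to avoid.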


## qcut()
- Buckets continuous values into intervals, converting numerical data into categories.
- Computes the bin edges from the quantiles of the data, so each bin holds roughly the same number of observations.
- Use this when you want bins with an equal number of observations rather than fixed ranges.


In [16]:
import numpy as np

df = pd.DataFrame({"Age": np.random.randint(1, 100, size=20)})
print("df:\n", df)

df:
     Age
0    48
1     3
2    13
3    57
4    99
5    24
6    70
7    48
8    42
9    36
10   95
11   11
12   21
13   74
14   30
15   62
16   93
17   74
18   80
19   99


In [17]:
# step1: Split into 4 quantile-based bins (quartiles)
qcut_bins = pd.qcut(df["Age"], q=4)
print("qcut_bins:\n", qcut_bins)

qcut_bins:
 0      (28.5, 52.5]
1     (2.999, 28.5]
2     (2.999, 28.5]
3      (52.5, 75.5]
4      (75.5, 99.0]
5     (2.999, 28.5]
6      (52.5, 75.5]
7      (28.5, 52.5]
8      (28.5, 52.5]
9      (28.5, 52.5]
10     (75.5, 99.0]
11    (2.999, 28.5]
12    (2.999, 28.5]
13     (52.5, 75.5]
14     (28.5, 52.5]
15     (52.5, 75.5]
16     (75.5, 99.0]
17     (52.5, 75.5]
18     (75.5, 99.0]
19     (75.5, 99.0]
Name: Age, dtype: category
Categories (4, interval[float64, right]): [(2.999, 28.5] < (28.5, 52.5] < (52.5, 75.5] < (75.5, 99.0]]
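The edges 28.5, 52.5, and 75.5 are the quartiles of the data. As a rough check, the sketch below computes those quantiles directly and passes them to `pd.cut()`; the ages are hard-coded from the printed sample above, because the original data is random and unseeded. The lowest edge is displayed as 2.999 rather than 3 so that the minimum value is included in the first right-closed bin.

```python
import pandas as pd

# Values copied from the printed sample output (the original data is random and unseeded)
ages = pd.Series([48, 3, 13, 57, 99, 24, 70, 48, 42, 36,
                  95, 11, 21, 74, 30, 62, 93, 74, 80, 99], name="Age")

# Quartile edges: minimum, 25th, 50th, and 75th percentiles, maximum
edges = ages.quantile([0, 0.25, 0.5, 0.75, 1.0]).to_numpy()
print(edges)

# Cutting at those edges puts every age in the same bin number as qcut does
manual = pd.cut(ages, bins=edges, include_lowest=True)
auto = pd.qcut(ages, q=4)
print((manual.cat.codes == auto.cat.codes).all())
```

In other words, `qcut()` behaves like `cut()` with edges taken from the data instead of from the user.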


In [18]:
# step2: add qcut_bins as a column so each age appears next to its bin
df["qcut"] = qcut_bins
print("df:\n", df)

df:
     Age           qcut
0    48   (28.5, 52.5]
1     3  (2.999, 28.5]
2    13  (2.999, 28.5]
3    57   (52.5, 75.5]
4    99   (75.5, 99.0]
5    24  (2.999, 28.5]
6    70   (52.5, 75.5]
7    48   (28.5, 52.5]
8    42   (28.5, 52.5]
9    36   (28.5, 52.5]
10   95   (75.5, 99.0]
11   11  (2.999, 28.5]
12   21  (2.999, 28.5]
13   74   (52.5, 75.5]
14   30   (28.5, 52.5]
15   62   (52.5, 75.5]
16   93   (75.5, 99.0]
17   74   (52.5, 75.5]
18   80   (75.5, 99.0]
19   99   (75.5, 99.0]


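#### Each bin holds an equal number of observations (n/q = 20/4 = 5 per bin here). With tied values, or when n is not divisible by q, the counts are only approximately equal.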
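The bin counts can be checked with `value_counts()`. The sketch below does that for the 20 ages printed above (hard-coded, since the original data is random), and then tries a small hypothetical series with many ties, where several quantile edges coincide and `qcut()` cannot build four distinct bins.

```python
import pandas as pd

# Values copied from the printed sample output
ages = pd.Series([48, 3, 13, 57, 99, 24, 70, 48, 42, 36,
                  95, 11, 21, 74, 30, 62, 93, 74, 80, 99], name="Age")

# Rows per quartile bin, in bin order
counts = pd.qcut(ages, q=4).value_counts().sort_index()
print(counts)

# Hypothetical heavily tied data: several quantile edges are identical
tied = pd.Series([1, 1, 1, 1, 2, 3])
try:
    pd.qcut(tied, q=4)
except ValueError as err:
    print("qcut failed:", err)

# duplicates="drop" discards the repeated edges, leaving fewer, uneven bins
tied_counts = pd.qcut(tied, q=4, duplicates="drop").value_counts().sort_index()
print(tied_counts)
```

Passing `duplicates="drop"` trades the requested number of bins for a result that does not raise, so it is worth checking how many bins actually come back.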

## Key differences
| Feature          | `pd.cut()`                                                              | `pd.qcut()`                                                        |
| ---------------- | ----------------------------------------------------------------------- | ------------------------------------------------------------------ |
| Binning method   | Fixed intervals (user-defined edges, or a number of equal-width bins)   | Based on quantiles (data distribution)                             |
| Bin sizes        | Fixed width (equal width when `bins` is an integer); counts per bin vary | Equal frequency; bin widths vary                                   |
| When to use      | When you want specific intervals (e.g., Age: 0–18, 19–35…)              | When you want each bin to have roughly the same number of samples  |
| Example use case | Categorizing income into fixed ranges (0–20k, 20k–50k)                  | Splitting exam scores into quartiles (top 25%, bottom 25%)         |
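The contrast is easiest to see on skewed data. A minimal sketch, using a hypothetical series with one large outlier, that compares four equal-width bins from `pd.cut()` with four quantile bins from `pd.qcut()`:

```python
import pandas as pd

# Hypothetical skewed data: nine small values and one outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# bins=4 asks cut() for four equal-width bins spanning min..max
cut_counts = pd.cut(values, bins=4).value_counts().sort_index()

# q=4 asks qcut() for four bins with roughly equal counts
qcut_counts = pd.qcut(values, q=4).value_counts().sort_index()

print("cut:\n", cut_counts)
print("qcut:\n", qcut_counts)
```

With equal-width bins, nearly all values land in the first bin and the middle bins stay empty, while the quantile bins spread the rows out at the cost of very different bin widths.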