# Binning data using `pd.cut`

**Binning** groups the values of a continuous variable into a smaller number of defined ranges (bins). An example is turning numeric test scores into letter grades.


<table>
    <tr>
        <td><img src="images/badbars.png" width=400></td>
        <td>------></td>
        <td><img src="images/goodbars.png" width=400></td>
    </tr>
</table>
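Conceptually, binning means finding the range each value falls into. The sketch below does this in plain Python with `bisect`. It only illustrates the idea and is not how pandas implements `pd.cut`. The helper `assign_bin` is hypothetical. It uses the same right-closed bins and letter-grade labels as the rest of this notebook.

```python
from bisect import bisect_left

def assign_bin(value, edges, labels):
    """Hypothetical helper: return the label of the right-closed bin
    (edges[i-1], edges[i]] that contains value, or None if value is out of range.

    This mimics pd.cut's default right=True behaviour (pd.cut returns NaN
    where this returns None).
    """
    if value <= edges[0] or value > edges[-1]:
        return None
    # bisect_left gives the index of the first edge >= value, i.e. the bin's right edge
    i = bisect_left(edges, value)
    return labels[i - 1]

edges = [0, 60, 70, 80, 90, 100]
labels = ["F", "D", "C", "B", "A"]
scores = [90, 56, 72, 88, 98, 67]
print([assign_bin(s, edges, labels) for s in scores])
# ['B', 'F', 'C', 'B', 'A', 'D']
```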

### Import Dependencies

```python
import pandas as pd
```

### Create DataFrame

```python
raw_data = {
    'Class': ['Oct', 'Oct', 'Jan', 'Jan', 'Oct', 'Jan'],
    'Name': ["Cyndy", "Logan", "Laci", "Elmer", "Crystle", "Emmie"],
    'Test Score': [90, 56, 72, 88, 98, 67]}
df = pd.DataFrame(raw_data)
df
```

|   | Class | Name | Test Score |
|---|-------|------|------------|
| 0 | Oct | Cyndy | 90 |
| 1 | Oct | Logan | 56 |
| 2 | Jan | Laci | 72 |
| 3 | Jan | Elmer | 88 |
| 4 | Oct | Crystle | 98 |
| 5 | Jan | Emmie | 67 |


### Create the bins that will hold the data

```python
# Six bin edges define five bins: (0, 60], (60, 70], (70, 80], (80, 90], (90, 100]
bins = [0, 60, 70, 80, 90, 100]

# Create the names for the five bins
group_names = ["F", "D", "C", "B", "A"]
```
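To check how the edges translate into bins, you can call `pd.cut` without labels. It then returns each value's `Interval` instead of a name. This is a small sketch on three of the scores. Because `n` edges make `n - 1` bins, the label list must be exactly one element shorter than the edge list.

```python
import pandas as pd

bins = [0, 60, 70, 80, 90, 100]
group_names = ["F", "D", "C", "B", "A"]

# n edges define n - 1 bins, so pd.cut needs exactly one label per bin
assert len(group_names) == len(bins) - 1

# Without labels, pd.cut returns Interval categories, closed on the right by default
intervals = pd.cut(pd.Series([56, 90, 98]), bins)
print(intervals.cat.categories)
print(intervals.iloc[1])  # the interval that 90 falls into
```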

### Cut data into bins

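`pd.cut` takes two required arguments and one commonly used optional argument:

1. The Series (or array) to be cut
2. The bin edges, as a list of numbers in increasing order
3. (Optional) `labels`: a list of names/values to assign to the bins, one per bin

By default each bin is closed on the right. A score of exactly 90 therefore falls in the (80, 90] bin and is labeled "B".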
[Link to `pd.cut` documentation](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.cut.html)
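The sketch below compares three settings. It uses the documented `right` and `include_lowest` parameters of `pd.cut` to control which edge of a bin is inclusive. The sample scores are made up to hit the edges. Raising the top edge to 101 with `right=False` is an assumption about the grading scale: it keeps a perfect 100 inside the last bin.

```python
import pandas as pd

# Hypothetical scores chosen to sit exactly on bin edges
scores = pd.Series([0, 56, 60, 90, 100])
labels = ["F", "D", "C", "B", "A"]

# Default right=True: bins are (0, 60], (60, 70], ..., so 90 is a "B" and 0 is NaN
default = pd.cut(scores, [0, 60, 70, 80, 90, 100], labels=labels)

# include_lowest=True also includes the first edge, so 0 becomes an "F"
with_zero = pd.cut(scores, [0, 60, 70, 80, 90, 100], labels=labels, include_lowest=True)

# right=False gives left-closed bins [0, 60), [60, 70), ..., [90, 101);
# the top edge of 101 is an assumption so that 100 still gets an "A"
left_closed = pd.cut(scores, [0, 60, 70, 80, 90, 101], labels=labels, right=False)

print(pd.DataFrame({"score": scores, "default": default,
                    "include_lowest": with_zero, "right=False": left_closed}))
```

With `right=False`, 60 moves up to "D" and 90 moves up to "A". The default settings keep both in the lower bin.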

```python
df["Test Score Summary"] = pd.cut(df["Test Score"], bins, labels=group_names)
df
```

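|   | Class | Name | Test Score | Test Score Summary |
|---|-------|------|------------|--------------------|
| 0 | Oct | Cyndy | 90 | B |
| 1 | Oct | Logan | 56 | F |
| 2 | Jan | Laci | 72 | C |
| 3 | Jan | Elmer | 88 | B |
| 4 | Oct | Crystle | 98 | A |
| 5 | Jan | Emmie | 67 | D |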
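The new column is an ordered categorical, so the grade labels keep their F < D < C < B < A order. The sketch below uses that order in two ways. It counts the students in each bin without re-sorting by frequency. It also filters by comparing against a label. The DataFrame is rebuilt here so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Cyndy", "Logan", "Laci", "Elmer", "Crystle", "Emmie"],
    "Test Score": [90, 56, 72, 88, 98, 67],
})
df["Test Score Summary"] = pd.cut(df["Test Score"], [0, 60, 70, 80, 90, 100],
                                  labels=["F", "D", "C", "B", "A"])

print(df["Test Score Summary"].dtype)  # category

# sort=False lists bins in category order instead of by frequency
counts = df["Test Score Summary"].value_counts(sort=False)
print(counts)

# Because the categories are ordered, labels can be compared: "B or better"
b_or_better = df.loc[df["Test Score Summary"] >= "B", "Name"]
print(b_or_better.tolist())
```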


### Group the data by bin

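```python
# observed=False keeps every bin in the result, even one that no score falls into
group_df = df.groupby("Test Score Summary", observed=False)
# Average only the numeric column: in pandas 2.x, calling .mean() on the whole
# group raises a TypeError because "Class" and "Name" hold strings
group_df[["Test Score"]].mean()
```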

| Test Score Summary | Test Score |
|--------------------|------------|
| F | 56 |
| D | 67 |
| C | 72 |
| B | 89 |
| A | 98 |
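A single mean per bin hides how many scores went into it. The sketch below reports several statistics per bin with `agg`. It then cross-tabulates class against grade by grouping on two keys. The DataFrame is rebuilt here so the block is self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": ["Oct", "Oct", "Jan", "Jan", "Oct", "Jan"],
    "Test Score": [90, 56, 72, 88, 98, 67],
})
df["Test Score Summary"] = pd.cut(df["Test Score"], [0, 60, 70, 80, 90, 100],
                                  labels=["F", "D", "C", "B", "A"])

# Several statistics per bin; observed=False keeps any empty bins in the output
summary = (df.groupby("Test Score Summary", observed=False)["Test Score"]
             .agg(["count", "mean", "min", "max"]))
print(summary)

# Rows are classes, columns are grade bins, values are student counts
by_class = (df.groupby(["Class", "Test Score Summary"], observed=False)
              .size()
              .unstack(fill_value=0))
print(by_class)
```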
