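You have a numerical feature and want to break it up into discrete bins.

Depending on how we want to break up the data, there are two techniques we can
use. First, we can binarize the feature according to a single threshold:
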
In [11]:
import numpy as np
from sklearn.preprocessing import Binarizer


In [12]:
# Create feature
age = np.array([[6],
                [12],
                [20],
                [36],
                [65]])


In [14]:
# Create binarizer
binarizer = Binarizer(threshold=18)

In [15]:
# Transform feature
binarizer.fit_transform(age)

array([[0],
       [0],
       [1],
       [1],
       [1]])

Second, we can break up numerical features according to multiple thresholds:

In [16]:
# Bin feature
np.digitize(age, bins=[20,30,64])

array([[0],
       [0],
       [1],
       [2],
       [3]], dtype=int64)

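Note that, by default, each value passed to the bins parameter is the inclusive
left edge of a bin: a value v falls in bin i when bins[i-1] <= v < bins[i]. The
first bin therefore holds only the two values smaller than 20, and the element
with the value of 20 falls in the next bin. We can switch this behavior, so that
each threshold is instead the inclusive right edge of a bin, by setting the
parameter right to True: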

In [17]:
# Bin feature
np.digitize(age, bins=[20,30,64], right=True)

array([[0],
       [0],
       [0],
       [2],
       [3]], dtype=int64)
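
To make the edge behavior concrete, the following minimal sketch compares the two settings on the same ages and picks out the values whose bin changes. The variable names (ages, left_closed, right_closed, changed) are introduced here for illustration only, and the feature is flattened to one dimension to keep the comparison short.

```python
import numpy as np

# Same ages as above, flattened to one dimension for brevity
ages = np.array([6, 12, 20, 36, 65])
bins = [20, 30, 64]

# Default (right=False): bin i covers bins[i-1] <= x < bins[i]
left_closed = np.digitize(ages, bins=bins)

# right=True: bin i covers bins[i-1] < x <= bins[i]
right_closed = np.digitize(ages, bins=bins, right=True)

# Only values that sit exactly on a bin edge are assigned differently
changed = ages[left_closed != right_closed]
print(changed)
```

Only the age of 20 changes bins, because it is the only value that coincides with one of the thresholds.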

Discretization can be a fruitful strategy when we have reason to believe that a
numerical feature should behave more like a categorical feature. For example,
we might believe there is very little difference in the spending habits of 19- and
20-year-olds, but a significant difference between 20- and 21-year-olds (21 being
the age at which young adults in the United States can legally consume alcohol).
In that case, it could be useful to split the individuals in our data into those
who can drink alcohol and those who cannot. Similarly, in other cases it might
be useful to discretize our data into three or more bins.
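
As a rough sketch of the drinking-age example, we could bin ages at 21 and then turn the bin indices into readable categories. The sample ages and the label strings below are made up for illustration; they are not part of the original data.

```python
import numpy as np

# Hypothetical ages around the threshold of interest
ages = np.array([19, 20, 21, 22])

# With the default right=False, 21 is the inclusive left edge of bin 1
bin_index = np.digitize(ages, bins=[21])

# Hypothetical category names, indexed by bin number
labels = np.array(["under_21", "21_and_over"])
age_group = labels[bin_index]
print(age_group)
```

Indexing an array of labels with the bin indices is one simple way to get from bin numbers to named categories; the same pattern extends to three or more bins by adding thresholds and labels.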

In the solution, we saw two methods of discretization: scikit-learn's Binarizer
for two bins and NumPy's digitize for three or more bins. However, we can also
use digitize to binarize features, much like Binarizer, by specifying only a
single threshold:

In [18]:
# Bin feature
np.digitize(age, bins=[18])

array([[0],
       [0],
       [1],
       [1],
       [1]], dtype=int64)
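
One caveat is worth a quick check: the two approaches agree on the ages above, but they can differ for a value that falls exactly on the threshold. Binarizer maps only values strictly greater than the threshold to 1, while digitize with its default right=False treats the threshold as the inclusive left edge of the upper bin. The sketch below uses a hypothetical feature (age_with_edge) that contains an 18 to show the difference, and shows that right=True appears to reproduce Binarizer's behavior.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical feature that includes a value exactly at the threshold
age_with_edge = np.array([[6], [18], [20]])

# Binarizer: values strictly greater than 18 become 1
binarized = Binarizer(threshold=18).fit_transform(age_with_edge)

# digitize (default): 18 <= x falls in bin 1, so 18 itself becomes 1
digitized_default = np.digitize(age_with_edge, bins=[18])

# digitize with right=True: 18 < x falls in bin 1, matching Binarizer
digitized_right = np.digitize(age_with_edge, bins=[18], right=True)

print(binarized.ravel(), digitized_default.ravel(), digitized_right.ravel())
```

If the data can contain values equal to the threshold, it is worth deciding which side they belong on before choosing between the two calls.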