# Variable conversion

In this activity you will learn to convert variables from one type to another.

## Numeric to categorical

Consider the wine dataset we used earlier:

In [None]:
import sklearn.datasets as datasets
import pandas as pd
import numpy as np

dataset = datasets.load_wine()
X = pd.DataFrame(data=dataset['data'], columns=dataset['feature_names'])

print(X.head())

Let's first bin the variable 'flavanoids' into 5 bins using pandas:

In [None]:
flavanoids = pd.cut(X['flavanoids'], 5)
print(flavanoids.value_counts())

Notice that the bins all have equal width, but the observations are spread unevenly across them.
We can use a different function, based on quantiles, to obtain bins that each contain roughly the same number of observations:

In [None]:
flavanoids = pd.qcut(X['flavanoids'], 5)
print(flavanoids.value_counts())
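To make the difference between the two functions concrete, here is a minimal sketch on a small made-up series. The values and the names `values`, `equal_width` and `equal_freq` are illustrative assumptions and are not part of the wine dataset.

```python
import pandas as pd

# Hypothetical skewed sample: most values are small, one is very large
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100])

# pd.cut splits the range of the data into intervals of equal width
equal_width = pd.cut(values, 4)

# pd.qcut places the bin edges at quantiles, so each bin
# receives a similar number of observations
equal_freq = pd.qcut(values, 4)

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```

With skewed data like this, equal-width binning puts almost every observation in the first bin, while quantile-based binning spreads them evenly at the cost of bins with very different widths.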

## Categorical to numeric

Let's create a colour variable:

In [None]:
colours = ['blue', 'red', 'green', 'yellow']
colour_array = np.random.choice(colours, 100, p=[0.5, 0.1, 0.1, 0.3])
print(colour_array)

We can easily obtain dummies by using the following code:

In [None]:
dummy_colours = pd.get_dummies(colour_array, prefix='color', drop_first=True)
dummy_colours.head()

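Notice that blue is not included. This is due to the ```drop_first``` parameter, which drops the first category in sorted order (here blue) and makes it the reference level: a row in which all remaining dummies are zero represents blue.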
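The effect of dropping the first category can be checked by comparing the columns with and without it. This is a small sketch on a fixed sample, so the result is reproducible. The names `sample`, `full` and `reduced` are illustrative assumptions.

```python
import pandas as pd

# Small fixed sample (illustrative values, not the random array above)
sample = pd.Series(['blue', 'red', 'green', 'yellow', 'blue'])

# One dummy column per category
full = pd.get_dummies(sample, prefix='color')

# Same encoding, but the first category in sorted order is dropped
reduced = pd.get_dummies(sample, prefix='color', drop_first=True)

print(list(full.columns))
print(list(reduced.columns))

# The first row is 'blue', so all of its remaining dummies are zero
print(reduced.iloc[0])
```

Dropping one level avoids perfectly collinear columns, which matters for models such as linear regression that include an intercept.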

We can also use scikit-learn:

In [None]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# We can use a label encoder to transform categories into numbers
enc = LabelEncoder()
colour_label = enc.fit_transform(colour_array)
print(colour_label)

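You will notice that every colour now has its own integer value. The integers are assigned in sorted order of the category names, so they do not reflect any meaningful ordering of the colours.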
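The mapping between colours and integers can be inspected through the fitted encoder. The sketch below uses a small fixed sample (the names `sample`, `codes`, `onehot` and `encoded` are illustrative assumptions). It also shows one possible way to use the `OneHotEncoder` class imported above to produce a one-hot encoding.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Small fixed sample (illustrative values)
sample = np.array(['blue', 'red', 'green', 'yellow', 'blue'])

enc = LabelEncoder()
codes = enc.fit_transform(sample)

# classes_ lists the categories in the order of their integer codes
print(enc.classes_)
print(codes)

# inverse_transform maps the integers back to the original labels
print(enc.inverse_transform(codes))

# OneHotEncoder expects a 2D input (rows x features), hence the reshape.
# By default it returns a sparse matrix, which toarray() makes dense.
onehot = OneHotEncoder()
encoded = onehot.fit_transform(sample.reshape(-1, 1)).toarray()
print(onehot.categories_)
print(encoded)
```

Unlike the integer codes, the one-hot columns do not suggest that one colour is "larger" than another, which is usually preferable for nominal variables.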