## Dummy Encoding (Binary Encoded Data)
Dummy encoding transforms each categorical feature into new columns, one per category label, holding a 1 (True) or 0 (False) to indicate whether that label was present in the original row.<br>
We use `pd.get_dummies` to create binary encoded data.
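
Before turning to the real dataset, the idea can be sketched on a tiny frame. The `toy` DataFrame and its `color` and `value` columns are hypothetical and exist only for illustration. `dtype=int` is passed so the result holds 0/1 integers, since recent pandas versions return True/False by default.

```python
import pandas as pd

# Hypothetical toy data, not taken from the brain dataset
toy = pd.DataFrame({"color": ["red", "green", "red"], "value": [10, 20, 30]})

# One new column per label in "color"; the original "color" column is dropped.
# dtype=int requests 0/1 integers instead of booleans.
toy_encoded = pd.get_dummies(toy, columns=["color"], dtype=int)
print(toy_encoded)
```

Each row ends up with a 1 in exactly one of `color_green` or `color_red`, while the numeric `value` column is carried over unchanged.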

In [1]:
import warnings
warnings.simplefilter("ignore")

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## Dataset: brain_categorical.csv

Source: R.J. Gladstone (1905). "A Study of the Relations of the Brain to
the Size of the Head", Biometrika, Vol. 4, pp. 105-123

Description: Brain weight (grams) and head size (cubic cm) for 237
adults classified by gender and age group.

Variables/Columns
GENDER: Gender  Male or Female
AGE: Age Range  20-46 or 46+
SIZE: Head size (cm^3)  21-24
WEIGHT: Brain weight (grams)  29-32

In [2]:
# Read the csv file
brain = pd.read_csv("Resources/brain_categorical.csv")
brain.head()

|   | gender | age | size | weight |
|---|--------|-----|------|--------|
| 0 | Male | 20-46 | 4512 | 1530 |
| 1 | Male | 20-46 | 3738 | 1297 |
| 2 | Male | 20-46 | 4261 | 1335 |
| 3 | Male | 20-46 | 3777 | 1282 |
| 4 | Male | 20-46 | 4177 | 1590 |


In [4]:
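# Features: two categorical columns (gender, age) and one numeric column (size)
X = brain[["gender", "age", "size"]]
# Target as a 2-D column vector; the double brackets already yield shape (n, 1)
y = brain[["weight"]].values.reshape(-1, 1)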

print(X.shape, y.shape)

(237, 3) (237, 1)


In [9]:
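# Encode only the gender column; age and size pass through unchanged
data = X.copy()

# pandas 2.0+ returns True/False by default; pass dtype=int for 0/1 columns
data_binary_encoded = pd.get_dummies(data, columns=["gender"])
data_binary_encoded.head()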

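|   | age | size | gender_Female | gender_Male |
|---|-----|------|---------------|-------------|
| 0 | 20-46 | 4512 | 0 | 1 |
| 1 | 20-46 | 3738 | 0 | 1 |
| 2 | 20-46 | 4261 | 0 | 1 |
| 3 | 20-46 | 3777 | 0 | 1 |
| 4 | 20-46 | 4177 | 0 | 1 |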
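
For a two-level feature such as gender, the two indicator columns carry the same information, because `gender_Female` is always `1 - gender_Male`. One optional variation, not used in this notebook, is the `drop_first=True` argument of `pd.get_dummies`, which keeps one column fewer per feature. The sketch below uses a small hypothetical frame (`sample`) rather than rows from the dataset.

```python
import pandas as pd

# Hypothetical stand-in rows; sizes are illustrative, not rows from brain_categorical.csv
sample = pd.DataFrame({
    "gender": ["Male", "Female", "Male"],
    "size": [4000, 3500, 3900],
})

# drop_first=True drops the first label in sorted order ("Female"),
# leaving a single gender_Male indicator column
reduced = pd.get_dummies(sample, columns=["gender"], drop_first=True, dtype=int)
print(reduced)
```

Whether the redundant column should be dropped depends on the model that consumes the data, so treat this as an option to consider rather than a required step.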


In [11]:
# Encode multiple columns: with no `columns` argument, get_dummies encodes
# every object, string, or category column (here gender and age)
data = X.copy()

data_binary_encoded = pd.get_dummies(data)
data_binary_encoded.head()

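|   | size | gender_Female | gender_Male | age_20-46 | age_46+ |
|---|------|---------------|-------------|-----------|---------|
| 0 | 4512 | 0 | 1 | 1 | 0 |
| 1 | 3738 | 0 | 1 | 1 | 0 |
| 2 | 4261 | 0 | 1 | 1 | 0 |
| 3 | 3777 | 0 | 1 | 1 | 0 |
| 4 | 4177 | 0 | 1 | 1 | 0 |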
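
The output above keeps `size` as a single numeric column while `gender` and `age` are expanded. A minimal sketch of that behaviour, assuming a small hypothetical frame (`sample`) with the same column names as `X`:

```python
import pandas as pd

# Hypothetical stand-in rows; sizes are illustrative, not rows from brain_categorical.csv
sample = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female"],
    "age": ["20-46", "20-46", "46+", "46+"],
    "size": [4000, 3500, 3900, 3400],
})

# Without a `columns` argument, only the text columns are encoded;
# the numeric "size" column passes through untouched
encoded = pd.get_dummies(sample, dtype=int)
print(list(encoded.columns))
```

The untouched numeric column comes first, followed by the indicator columns for each encoded feature, with labels in sorted order within each feature.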
