In [2]:
import pandas as pd

### Pandas Get Dummies
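Pandas Get Dummies (`pd.get_dummies`) will turn your categorical variables into many dummy indicator variables. This means you'll go from a Series of labels (['Bob', 'Fred', 'Katie']) to one indicator column per label, where each row holds a 1 in the column matching its label and 0s everywhere else (so 'Fred' becomes [0, 1, 0]).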
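Here is a minimal sketch of that idea using the three labels mentioned above. The variable names `labels` and `indicators` are just for illustration, and `dtype=int` is passed because recent pandas versions (2.0 and later, as far as I know) return True/False instead of 1/0 by default.

```python
import pandas as pd

# A Series of labels, as in the description above
labels = pd.Series(['Bob', 'Fred', 'Katie'])

# One indicator column per distinct label; dtype=int forces 1/0 output
indicators = pd.get_dummies(labels, dtype=int)
print(indicators)
```

Each row of `indicators` has a single 1, sitting in the column named after that row's original label.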

Let's run through 3 examples:
1. Creating Dummy Indicator columns
2. Creating Dummy Indicator columns with prefix
3. Creating Dummy Indicator columns and dropping the first variable

First, let's create a DataFrame

In [5]:
df = pd.DataFrame([('Foreign Cinema', 289.0),
                   ('Liho Liho', 224.0),
                   ('500 Club', 80.5),
                   ('Foreign Cinema', 25.30)],
                  columns=('name', 'Amount'))

df

Unnamed: 0,name,Amount
0,Foreign Cinema,289.0
1,Liho Liho,224.0
2,500 Club,80.5
3,Foreign Cinema,25.3


### 1. Creating Dummy Indicator columns

To create dummy columns, I need to tell pandas which DataFrame I want to use, and which columns I want to create dummies on. Here I want to create dummies on the 'name' column.

Notice how there are 3 new columns, one for every distinct value within our old 'name' column. Each new column holds 1s and 0s showing whether that row's original 'name' matched the column's value.

In [17]:
pd.get_dummies(df, columns=['name'])

Unnamed: 0,Amount,name_500 Club,name_Foreign Cinema,name_Liho Liho
0,289.0,0,1,0
1,224.0,0,0,1
2,80.5,1,0,0
3,25.3,0,1,0
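If you run the call above on a recent pandas version (2.0 or later, to the best of my knowledge), the dummy columns may show True/False rather than 1/0. A small sketch of how to get the integer output shown here, assuming the same DataFrame as above and using the `dtype` parameter:

```python
import pandas as pd

df = pd.DataFrame([('Foreign Cinema', 289.0),
                   ('Liho Liho', 224.0),
                   ('500 Club', 80.5),
                   ('Foreign Cinema', 25.30)],
                  columns=('name', 'Amount'))

# dtype=int asks for 1/0 indicators instead of booleans
dummies = pd.get_dummies(df, columns=['name'], dtype=int)
print(dummies)
```

The 'name' column is replaced by the dummy columns, while 'Amount' is left untouched.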


### 2. Creating Dummy Indicator columns with prefix

See how all of my new columns above start with "name_"? Well, I don't like it. I want to switch the prefix to something else. You can do this by specifying the *prefix* parameter.

In [18]:
pd.get_dummies(df, columns=['name'], prefix="dummy")

Unnamed: 0,Amount,dummy_500 Club,dummy_Foreign Cinema,dummy_Liho Liho
0,289.0,0,1,0
1,224.0,0,0,1
2,80.5,1,0,0
3,25.3,0,1,0


You know what else I don't like? The _ that sits between my prefix and the column names. I'll switch it to a * by specifying the *prefix_sep* parameter.

In [19]:
pd.get_dummies(df, columns=['name'], prefix="dummy", prefix_sep="*")

Unnamed: 0,Amount,dummy*500 Club,dummy*Foreign Cinema,dummy*Liho Liho
0,289.0,0,1,0
1,224.0,0,0,1
2,80.5,1,0,0
3,25.3,0,1,0
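When you create dummies on more than one column, *prefix* can also take a dict that maps each column to its own prefix. The sketch below assumes a hypothetical extra 'day' column (not part of the original data, added only to illustrate) and made-up prefixes 'rest' and 'd'.

```python
import pandas as pd

df = pd.DataFrame([('Foreign Cinema', 289.0),
                   ('Liho Liho', 224.0),
                   ('500 Club', 80.5),
                   ('Foreign Cinema', 25.30)],
                  columns=('name', 'Amount'))

# Hypothetical second categorical column, for illustration only
df2 = df.assign(day=['Fri', 'Sat', 'Fri', 'Sun'])

# One prefix per source column, supplied as a dict
multi = pd.get_dummies(df2,
                       columns=['name', 'day'],
                       prefix={'name': 'rest', 'day': 'd'},
                       dtype=int)
print(multi.columns.tolist())
```

This keeps the dummy columns from different source columns easy to tell apart.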


### 3. Creating Dummy Indicator columns and dropping the first variable
Notice above how every row has exactly one "1" across the new dummy columns? This is because every value is accounted for with a True (1) indicator. However, what if a row was all 0s? That is also a way to identify one of your values. *drop_first* drops the column for the first value (here "500 Club", which sorts first) and identifies that value by all the other columns being 0.

Notice how the "500 Club" column has been removed, and the row where "500 Club" used to be now has 0s in both "Foreign Cinema" and "Liho Liho".

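It's a bit confusing. One common use case is linear regression, where keeping every dummy column alongside an intercept makes the columns perfectly collinear, so one of them gets dropped. If you come up with another valid use case...@ me on Twitter.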

In [16]:
pd.get_dummies(df, columns=['name'], drop_first=True)

Unnamed: 0,Amount,name_Foreign Cinema,name_Liho Liho
0,289.0,1,0
1,224.0,0,1
2,80.5,0,0
3,25.3,1,0
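To see that no information is lost, here is a rough sketch that finds the dropped value again by looking for rows where every remaining dummy column is 0. The names `dummy_cols` and `is_baseline` are my own, and `dtype=int` is passed only so the output is 1/0 on newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame([('Foreign Cinema', 289.0),
                   ('Liho Liho', 224.0),
                   ('500 Club', 80.5),
                   ('Foreign Cinema', 25.30)],
                  columns=('name', 'Amount'))

dropped = pd.get_dummies(df, columns=['name'], drop_first=True, dtype=int)

# Pick out the dummy columns that survived drop_first
dummy_cols = [c for c in dropped.columns if c.startswith('name_')]

# Rows where all remaining dummies are 0 belong to the dropped value
is_baseline = dropped[dummy_cols].sum(axis=1) == 0
print(df.loc[is_baseline, 'name'])
```

The only all-zero row lines up with the original "500 Club" row, which is the value whose column was dropped.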
