### Def
```python
pandas.get_dummies(
    data,
    prefix=None,
    prefix_sep='_',
    dummy_na=False,
    columns=None,
    sparse=False,
    drop_first=False,
    dtype=None,
)
```

#### Parameters:
- ___data___ : array-like, Series, or DataFrame
Data of which to get dummy indicators. 
- ___columns___ : list-like, default None
Column names in the DataFrame to be encoded. If columns is None then all the columns with object or category dtype will be converted.
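- ___prefix___ : str, list of str, or dict of str, default None
String to prepend to the generated column names. For a DataFrame it defaults to the original column names; otherwise pass a list whose length equals the number of encoded columns, or a dict mapping column names to prefixes.
- ___prefix_sep___ : str, default '_'
Separator placed between the prefix and the category value. Can also be a list or dict, as with `prefix`.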
- ___drop_first___ : bool, default False
Whether to get k-1 dummies out of k categorical levels by removing the first level.
Only k-1 columns are needed to represent k categories: a row with zeros in every remaining dummy column belongs to the dropped level.
- ___dummy_na___ : bool, default False
Add a column to indicate NaNs; if False, NaNs are ignored (their rows get 0 in every dummy column).
- ___sparse___ : bool, default False
Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).
- ___dtype___ : dtype, default np.uint8 (bool from pandas 2.0 onward)
Data type for the new columns. Only a single dtype is allowed.
#### Returns:
DataFrame
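As a sketch of how `prefix`, `prefix_sep`, `columns`, and `dtype` interact, the example below encodes a small made-up DataFrame (the column names `color`, `size`, and `price` are illustrative only). Passing `dtype=int` pins the dummies to 0/1 integers, since the default dtype differs between pandas versions.

```python
import pandas as pd

# A small made-up frame: two string columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "L", "M"],
    "price": [10, 20, 30],
})

# prefix as a dict gives each encoded column its own prefix;
# prefix_sep swaps the default '_' separator for '='.
# dtype=int forces 0/1 integers regardless of the pandas version's default.
encoded = pd.get_dummies(
    df,
    columns=["color", "size"],
    prefix={"color": "c", "size": "s"},
    prefix_sep="=",
    dtype=int,
)
print(encoded)
```

The columns that are not encoded (`price`) come first, followed by one dummy column per category, with categories in sorted order within each encoded column.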

### When to use
Use it to convert a __categorical__ variable into dummy/indicator variables, one 0/1 column per category.

```python
import pandas as pd
import numpy as np
```

1. Array-like

```python
arr = ['a', 'b', np.nan]
pd.get_dummies(arr, dummy_na=True)
```

|   | a | b | nan |
|---|---|---|-----|
| 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 0 | 0 | 1 |
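To make the effect of `dummy_na` visible, the sketch below encodes the same list with and without it. `dtype=int` is added only so the output is 0/1 integers on any pandas version.

```python
import numpy as np
import pandas as pd

arr = ["a", "b", np.nan]

# Default dummy_na=False: NaN gets no column of its own,
# so its row is all zeros.
without_na = pd.get_dummies(arr, dtype=int)
print(without_na)

# dummy_na=True appends an extra column that flags the missing value.
with_na = pd.get_dummies(arr, dummy_na=True, dtype=int)
print(with_na)
```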


2. Series

```python
pd.get_dummies(pd.Series(list('abcaa')))
```

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 0 |
| 2 | 0 | 0 | 1 |
| 3 | 1 | 0 | 0 |
| 4 | 1 | 0 | 0 |
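A sketch of the k-1 idea behind `drop_first`, applied to the same Series: the first level (`a`) is dropped, so a row of zeros in the remaining columns means "a". The prefix `letter` is an arbitrary illustrative name, and `dtype=int` is added for version-independent 0/1 output.

```python
import pandas as pd

s = pd.Series(list("abcaa"))

# On a Series, prefix is a single string; prefix_sep defaults to '_'.
# drop_first=True removes the first level ('a'), leaving k-1 = 2 columns.
encoded = pd.get_dummies(s, prefix="letter", drop_first=True, dtype=int)
print(encoded)
```

Rows 0, 3, and 4 (the `a` values) sum to zero across the remaining columns, which is why no information is lost.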


3. DataFrame

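```python
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': [np.nan, 'a', 'c'], 'C': [1, 2, 3]})
cols = df.columns
# Listing every column (including the numeric C) forces all of them to be encoded;
# the NaN in B gets no dummy column because dummy_na defaults to False.
pd.get_dummies(df, columns=cols, prefix=cols)
```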

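|   | A_a | A_b | B_a | B_c | C_1 | C_2 | C_3 |
|---|-----|-----|-----|-----|-----|-----|-----|
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |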
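For comparison, a sketch of the same DataFrame with `columns` left at its default of None: only the object columns `A` and `B` are encoded, while the integer column `C` passes through unchanged. `dtype=int` is again added only for version-independent output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": [np.nan, "a", "c"], "C": [1, 2, 3]})

# columns=None: pandas picks the object/category columns (A and B) itself;
# C is numeric, so it is kept as-is and placed before the dummy columns.
encoded = pd.get_dummies(df, dtype=int)
print(encoded)
```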


```python
train = pd.read_csv('http://bit.ly/kaggletrain')
```

```python
train.head()
```

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |


```python
train.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
```
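The `info()` output lists five object columns. With `columns=None`, `get_dummies` would encode all of them, including `Name` and `Ticket`, producing roughly one column per distinct value. A sketch of restricting the encoding to the low-cardinality columns `Sex` and `Embarked` follows; to stay runnable offline it uses a hypothetical stand-in frame `sample` whose values are copied from the `head()` rows above, rather than the downloaded `train`.

```python
import pandas as pd

# Hypothetical stand-in for the first five rows of `train` shown above,
# keeping only a few columns (values copied from the head() output).
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "S", "S"],
})

# Encode only Sex and Embarked. drop_first=True keeps k-1 dummies per column,
# so Sex collapses to a single Sex_male column.
encoded = pd.get_dummies(sample, columns=["Sex", "Embarked"], drop_first=True, dtype=int)
print(encoded)
```

On the full data the same call would be `pd.get_dummies(train, columns=['Sex', 'Embarked'], drop_first=True)`; the two missing `Embarked` values would get zeros in every `Embarked_*` column unless `dummy_na=True` is passed.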
