Data wrangling is a broad term used, often informally, to describe the process of
transforming raw data into a clean and organized format ready for use. For us, data
wrangling is only one step in preprocessing our data, but it is an important step.

In [2]:
import pandas as pd

In [4]:
url = 'https://tinyurl.com/titanic-csv'

In [19]:
data = pd.read_csv(url)
data.head()

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29.0,female,1,1
1,"Allison, Miss Helen Loraine",1st,2.0,female,0,1
2,"Allison, Mr Hudson Joshua Creighton",1st,30.0,male,0,0
3,"Allison, Mrs Hudson JC (Bessie Waldo Daniels)",1st,25.0,female,0,1
4,"Allison, Master Hudson Trevor",1st,0.92,male,1,0


### 3.1 Creating a Data Frame

#### Problem
You want to create a new data frame.

In [10]:
dataframe = pd.DataFrame()

dataframe['Name'] = ['Jack', 'Steve']
dataframe['Age'] = [22, 23]
dataframe['Driver'] = [True, False]
dataframe


Unnamed: 0,Name,Age,Driver
0,Jack,22,True
1,Steve,23,False


In [15]:
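#Create a row
new = pd.Series(['Molly', 40, True], index = ['Name', 'Age', 'Driver'])
#DataFrame.append was removed in pandas 2.0, so concatenate a one-row frame instead
pd.concat([dataframe, pd.DataFrame([new])], ignore_index = True)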

Unnamed: 0,Name,Age,Driver
0,Jack,22,True
1,Steve,23,False
2,Molly,40,True
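
The column-by-column approach above can also be written in one step. The following is a minimal sketch, assuming the same two people as in the cell above; the variable name `people` is only illustrative. Passing a dictionary of columns to the constructor lets pandas infer a dtype per column, so `Driver` ends up as a real boolean column rather than strings.

```python
import pandas as pd

# Build the frame in one step from a dictionary that maps
# column names to lists of values (one list entry per row).
people = pd.DataFrame({
    'Name': ['Jack', 'Steve'],
    'Age': [22, 23],
    'Driver': [True, False],
})

# Inspect the inferred column types.
print(people.dtypes)
```

Checking `dtypes` right after construction is a cheap way to catch values that were entered as strings by mistake.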


### 3.2 Describing the data

#### Problem
You want to view some characteristics of a DataFrame.


In [20]:
#Show two rows
data.head(2)

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29.0,female,1,1
1,"Allison, Miss Helen Loraine",1st,2.0,female,0,1


In [21]:
#Show dimensions
data.shape

(1313, 6)

In [22]:
#Show statistics
data.describe()

Unnamed: 0,Age,Survived,SexCode
count,756.0,1313.0,1313.0
mean,30.397989,0.342727,0.351866
std,14.259049,0.474802,0.477734
min,0.17,0.0,0.0
25%,21.0,0.0,0.0
50%,28.0,0.0,0.0
75%,39.0,1.0,1.0
max,71.0,1.0,1.0


### 3.3 Navigating DataFrames

#### Problem
You need to select individual data or slices of a DataFrame.

#### Solution 
Use loc or iloc to select one or more rows or values.

In [24]:
#Select first row
data.iloc[0]

Name        Allen, Miss Elisabeth Walton
PClass                               1st
Age                                   29
Sex                               female
Survived                               1
SexCode                                1
Name: 0, dtype: object

In [25]:
#Select three rows
data.iloc[1:4]

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
1,"Allison, Miss Helen Loraine",1st,2.0,female,0,1
2,"Allison, Mr Hudson Joshua Creighton",1st,30.0,male,0,0
3,"Allison, Mrs Hudson JC (Bessie Waldo Daniels)",1st,25.0,female,0,1


In [26]:
#Select the first four rows
data.iloc[:4]

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29.0,female,1,1
1,"Allison, Miss Helen Loraine",1st,2.0,female,0,1
2,"Allison, Mr Hudson Joshua Creighton",1st,30.0,male,0,0
3,"Allison, Mrs Hudson JC (Bessie Waldo Daniels)",1st,25.0,female,0,1


In [27]:
dataframe = data.set_index(data['Name'])

In [29]:
dataframe.loc['Allen, Miss Elisabeth Walton']

Name        Allen, Miss Elisabeth Walton
PClass                               1st
Age                                   29
Sex                               female
Survived                               1
SexCode                                1
Name: Allen, Miss Elisabeth Walton, dtype: object

DataFrame indexes can be set to be unique alphanumeric strings or customer
numbers. To select individual rows and slices of rows, pandas provides two methods:

• loc is useful when the index of the DataFrame is a label (e.g., a string).

• iloc works by looking for the position in the DataFrame. For example, iloc[0]
will return the first row regardless of whether the index is an integer or a label.
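
The difference between the two can be sketched on a small frame. This is an illustrative example only: the three-row `passengers` frame and its shortened index labels are made up to stand in for the Titanic data. One detail worth noting is that position-based slices exclude the end position, while label-based slices include the end label.

```python
import pandas as pd

# Hypothetical three-row frame with a string index (not the real dataset).
passengers = pd.DataFrame(
    {'Age': [29.0, 2.0, 30.0], 'Sex': ['female', 'female', 'male']},
    index=['Allen', 'Allison H', 'Allison J'],
)

# iloc selects by position: row 0 is the first row, whatever its label.
first_by_position = passengers.iloc[0]

# loc selects by label: the row whose index value is 'Allen'.
first_by_label = passengers.loc['Allen']

# iloc slices stop before the end position; loc slices include the end label.
by_position = passengers.iloc[0:2]
by_label = passengers.loc['Allen':'Allison H']
```

Here both slices return the same two rows, even though one is written with positions and the other with labels.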

### 3.4 Selecting Rows Based on Conditionals

#### Problem
You want to select DataFrame rows based on some condition.


In [33]:
data[data['Sex']=='female'].head(2)

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29.0,female,1,1
1,"Allison, Miss Helen Loraine",1st,2.0,female,0,1


data['Sex'] == 'female' is our conditional statement; by wrapping it in data[] we are
telling pandas to “select all the rows in the DataFrame where the value of
data['Sex'] is 'female'.”

In [36]:
#Filter rows on two conditions
data[(data['Sex'] == 'female') & (data['Age'] >= 34)]

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
6,"Andrews, Miss Kornelia Theodosia",1st,63.0,female,1,1
8,"Appleton, Mrs Edward Dale (Charlotte Lamson)",1st,58.0,female,1,1
15,"Baxter, Mrs James (Helene DeLaudeniere Chaput)",1st,50.0,female,1,1
19,"Beckwith, Mrs Richard Leonard (Sallie Monypeny)",1st,47.0,female,1,1
28,"Bonnell, Miss Elizabeth",1st,58.0,female,1,1
...,...,...,...,...,...,...
937,"Klasen, Mrs Hulda Kristina",3rd,36.0,female,0,1
942,"Laitinen, Miss Kritina Sofia",3rd,37.0,female,0,1
965,"Lindblom, Miss Augusta Charlotta",3rd,45.0,female,0,1
1264,"Turkula, Mrs Hedvig",3rd,63.0,female,1,1
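
Multiple conditions are combined with the element-wise operators `&` (and), `|` (or), and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `==` and `>=`. The sketch below uses a made-up four-row frame, not the Titanic data, and names each boolean mask before combining them, which can make longer filters easier to read.

```python
import pandas as pd

# Hypothetical frame used only to illustrate boolean masks.
people = pd.DataFrame({
    'Sex': ['female', 'male', 'female', 'male'],
    'Age': [63.0, 30.0, 2.0, 47.0],
})

# Each comparison yields a boolean Series, one value per row.
is_female = people['Sex'] == 'female'
is_34_plus = people['Age'] >= 34

both = people[is_female & is_34_plus]         # rows meeting both conditions
either = people[is_female | is_34_plus]       # rows meeting at least one
neither = people[~(is_female | is_34_plus)]   # rows meeting none
```

Python's `and`, `or`, and `not` keywords do not work here, since they expect a single truth value rather than a Series of them.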


### 3.5 Replacing Values

#### Problem
You need to replace values in a DataFrame.

#### Solution
pandas’ replace is an easy way to find and replace values. For example, we can
replace any instance of "female" in the Sex column with "Women":

In [38]:
data['Sex'].replace('female', 'Women').head(2)

0    Women
1    Women
Name: Sex, dtype: object

In [39]:
# Replace 'female' and 'male' with 'women' and 'man'
data['Sex'].replace(['female', 'male'], ['women', 'man']).head()

0    women
1    women
2      man
3    women
4      man
Name: Sex, dtype: object

In [40]:
#Replace values, show two rows
data.replace(1, 'one').head(2)

Unnamed: 0,Name,PClass,Age,Sex,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29,female,one,one
1,"Allison, Miss Helen Loraine",1st,2,female,0,one
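
Two further forms of replace can be useful: a nested dictionary that limits the replacement to one column, and `regex=True` for pattern-based replacement. The following is a small sketch on a made-up `classes` frame (an assumption for illustration, not part of the dataset above).

```python
import pandas as pd

# Hypothetical frame standing in for the PClass and Survived columns.
classes = pd.DataFrame({'PClass': ['1st', '2nd', '3rd'], 'Survived': [1, 0, 1]})

# Nested dictionary: in column 'PClass' only, map old values to new ones.
spelled_out = classes.replace({'PClass': {'1st': 'First', '2nd': 'Second'}})

# Regular expression: strip the ordinal suffix from string values.
stripped = classes.replace(r'(st|nd|rd)$', '', regex=True)
```

Like the calls above, both return a new DataFrame and leave `classes` unchanged.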


### 3.6 Renaming Columns

#### Problem
You want to rename a column in a pandas DataFrame.

#### Solution
Rename columns using the rename method:

In [43]:
data.rename(columns = {'PClass':'Passenger Class'}).head(2)

Unnamed: 0,Name,Passenger Class,Age,Sex,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29.0,female,1,1
1,"Allison, Miss Helen Loraine",1st,2.0,female,0,1


In [44]:
# Rename columns, show two rows
data.rename(columns = {'PClass': 'Passenger Class', 'Sex':'Gender'}).head(2)

Unnamed: 0,Name,Passenger Class,Age,Gender,Survived,SexCode
0,"Allen, Miss Elisabeth Walton",1st,29.0,female,1,1
1,"Allison, Miss Helen Loraine",1st,2.0,female,0,1
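
Two behaviors of rename are easy to miss: it returns a new DataFrame rather than changing the original, and by default it silently ignores keys that match no column. The sketch below illustrates both on a made-up one-row frame; the misspelled key `'Pclass'` is deliberate.

```python
import pandas as pd

# Hypothetical one-row frame used only to illustrate rename.
frame = pd.DataFrame({'PClass': ['1st'], 'Sex': ['female']})

# rename returns a new frame; assign the result to keep it.
renamed = frame.rename(columns={'PClass': 'Passenger Class'})

# A key that matches no column is ignored by default.
unchanged = frame.rename(columns={'Pclass': 'Passenger Class'})

# With errors='raise', the same typo produces a KeyError instead.
missing_key_error = None
try:
    frame.rename(columns={'Pclass': 'Passenger Class'}, errors='raise')
except KeyError as exc:
    missing_key_error = exc
```

Passing `errors='raise'` is a reasonable default when the column names come from an external file and a typo would otherwise go unnoticed.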


### 3.7 Finding the minimum, maximum, sum, average and count

#### Problem
You want to find the min, max, sum, average or count of a numeric column.

#### Solution
Pandas 