# List comprehensions

### Preliminary stuff

Acknowledgement: 
    https://chrisalbon.com/python/data_wrangling/pandas_list_comprehension/

In [1]:
# The following modules are used for plotting and generating data
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
%matplotlib inline

In [2]:
# import the dataset
df = pd.read_csv('data/titanic.csv')

In [3]:
# Examine
df.head(3)

Unnamed: 0,Survived,Pclass,Sex,Age,Fare,Embarked
0,0,3,male,22.0,7.25,Southampton
1,1,1,female,38.0,71.2833,Cherbourg
2,1,3,female,26.0,7.925,Southampton


#### A little bit of trivia
 
* RMS Titanic sank in the early morning of 15 April 1912 in the North Atlantic Ocean, four days into the ship's maiden voyage from Southampton to New York City.  
* 84 years later, in 1996, James Cameron began filming the movie Titanic (released in 1997), starring Leonardo DiCaprio and Kate Winslet.
* Winslet's character, Rose, is 17 years old when the ship sinks, and 101 years old when she throws that blue diamond necklace into the ocean.

### Simple list comprehension

In [4]:
# Create a simple for-loop:
list1 = []
for letter in 'Southampton':
    list1.append(letter)
list1

['S', 'o', 'u', 't', 'h', 'a', 'm', 'p', 't', 'o', 'n']

In [5]:
# Now do all of that in a single line of code:
list2 = [letter for letter in 'Cherbourg']
list2

['C', 'h', 'e', 'r', 'b', 'o', 'u', 'r', 'g']
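
The same pattern also lets you transform or filter items as you go. Here is a minimal sketch, assuming the general form `[expression for item in iterable if condition]`. The names `word`, `upper_letters` and `vowels` are illustrative only and are not part of the original notebook.

```python
# Illustrative names only; not from the original notebook
word = 'Cherbourg'

# Transform each item: upper-case every letter
upper_letters = [letter.upper() for letter in word]

# Filter items: keep only the letters that are vowels
vowels = [letter for letter in word if letter.lower() in 'aeiou']

print(upper_letters)
print(vowels)
```

The `if` clause at the end is optional; leave it off when you want to keep every item.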

In [6]:
# Now you try it.


### Now let's do this with the pandas dataframe.

First, as a loop.

In [15]:
# Create an empty list to hold the results
myheartwillgoon = []

# Loop over every value in the Age column
for row in df['Age']:
    # Add 84 to the age and append it to myheartwillgoon
    myheartwillgoon.append(row + 84)

# Make this a column in our dataframe
df['age_1996'] = myheartwillgoon

# View the dataframe
df.head(3)

Unnamed: 0,Survived,Pclass,Sex,Age,Fare,Embarked,age_1996,age_2018
0,0,3,male,22.0,7.25,Southampton,106.0,128.0
1,1,1,female,38.0,71.2833,Cherbourg,122.0,144.0
2,1,3,female,26.0,7.925,Southampton,110.0,132.0


Now let's do it with a list comprehension.

In [8]:
# Add 106 to row, for each row in df['Age']
df['age_2018'] = [row+106 for row in df['Age']]

# View the dataframe
df.head(3)

Unnamed: 0,Survived,Pclass,Sex,Age,Fare,Embarked,age_1996,age_2018
0,0,3,male,22.0,7.25,Southampton,106.0,128.0
1,1,1,female,38.0,71.2833,Cherbourg,122.0,144.0
2,1,3,female,26.0,7.925,Southampton,110.0,132.0
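
For simple arithmetic like this, pandas can also operate on the whole column at once, without an explicit loop. Here is a minimal sketch, assuming a small hypothetical dataframe `toy` that stands in for the Titanic data (its three ages are copied from the rows shown above). It checks that the for-loop, the list comprehension, and the vectorized form all give the same answer.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (ages from the three rows shown above)
toy = pd.DataFrame({'Age': [22.0, 38.0, 26.0]})

# For-loop version
loop_result = []
for age in toy['Age']:
    loop_result.append(age + 84)

# List comprehension version
comp_result = [age + 84 for age in toy['Age']]

# Vectorized version: pandas adds 84 to every element of the column
vec_result = (toy['Age'] + 84).tolist()

# True when all three approaches agree
print(loop_result == comp_result == vec_result)
```

List comprehensions are still handy when the per-row logic is more than simple arithmetic, for example when it needs a condition or a function call.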


In [14]:
# Now you try!





![dicaprio](source_material/titanic.jpg)