To execute a cell, click the >| Run button above or hold Shift and press Enter.  Unlike a Python script, a Jupyter notebook *caches* (remembers) the variables created by cells you have already run, so you can accidentally use a variable that was set earlier, even if the cell that set it has since been changed or deleted.  To *'reset'* your variables, run the line of code below, type *'y'* and hit Return.


In [48]:
reset

Once deleted, variables cannot be recovered. Proceed (y/[n])? y


# Import packages
Python has many, **many** packages available to download and use for your project.  Some come pre-installed, but many of them need to be downloaded.  Python has a handy package manager, pip, that makes downloading a package easy.  For the material below you will need the pandas package (which allows you to build tables and then manipulate the table data).  To install it, use one of the commands below in your terminal:

* *pip3 install pandas* 

*OR*
* *pip install pandas*

Some packages (such as numpy) come bundled with Python distributions like Anaconda, and pip installs numpy automatically alongside pandas, but there are a huge number of other helpful packages online.  scikit-learn (http://scikit-learn.org/, imported as sklearn) is an open-source package with a large collection of incredibly helpful tools.  It is the typical tool of choice for Machine Learning, and we will use it for that purpose.

## Import necessary packages and appropriate shortcuts
You can either import a whole package or you can import specific functions/objects from a package

In [49]:
import pandas as pd  ## This package helps deal with tables (which python users call 'dataframes')
import numpy as np ## Numpy helps with scientific computing (like a scientific calculator)

from numpy import arange  ## You can import specific functions from within a package and call it using its name later  
from datetime import datetime  ## Datetime is a package that helps deal with dates and times!
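
As a small sketch of the difference between the two import styles, the example below uses the standard library's math module (chosen here purely for illustration, so that it runs without installing anything).  The variable names are made up for this example.

```python
import math             # import the whole module; names are reached via the 'math.' prefix
from math import sqrt   # import one specific function; it is called by its own name

whole_module_result = math.sqrt(16)
specific_import_result = sqrt(16)

print(whole_module_result)     # 4.0
print(specific_import_result)  # 4.0
```

Both calls run the same function; the only difference is how you refer to it in your code.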

# Some Basics

## Data types
There are several different datatypes in Python:

| Datatype| Explanation|  Example|
| :--- |:--| :--- |
| string | Strings are sequences of character data. The string type in Python is called str. | str('This is a string!') |
| integer | In Python 3, there is effectively no limit to how long an integer value can be. Of course, it is constrained by the amount of memory your system has, as are all things, but beyond that an integer can be as long as you need it to be.| int(405) |
| floating point | The float type in Python designates a floating-point number. float values are specified with a decimal point. Optionally, the character e or E followed by a positive or negative integer may be appended to specify scientific notation. |  float(0.342341) |
| datetime | datetime makes dates and times much easier to work with, particularly when performing operations between them (e.g. calculating the number of days between now and Christmas) |  datetime(2019,12,25,0,0,0,1) |
|None | This datatype can be considered as simply 'nothing'.  It is a placeholder that marks a Null entry. | None|
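
If you are unsure which datatype a value has, the built-in type() function will tell you.  The sketch below checks one example value for each row of the table; the list name `examples` is just an illustration.

```python
from datetime import datetime

# One example value per datatype from the table above
examples = ['This is a string!', 405, 0.342341, datetime(2019, 12, 25), None]

for value in examples:
    # type(value).__name__ gives the short name of the value's datatype
    print(repr(value), '->', type(value).__name__)

type_names = [type(value).__name__ for value in examples]
```

The names printed are str, int, float, datetime and NoneType (the type of None).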

## Introduction to variables
A variable is a fancy name for 'saving' something you can use later.  You can assign a variable using the equals sign.

In [50]:
x = 3
print(x * 4)

12


## Mathematical functions
There are lots of built-in mathematical operators and functions, and many more you can install.  Below are the most common operators.

In [51]:
print(5 > 0) # Greater than
print(5 >= 0) # Greater than or equal to
print(1 < 2) # Less than 
print(1 <= 2) # Less than or equal to

print(5 == 5) # Equal to

print(3 + 5) # Addition
print(9 - 2) # Subtraction
print(3 * 7) # Multiplication
print(20 / 4) # Division 

print(2**3) # Exponentiation (to the power of)

print(10 % 3) # Modulo (remainder after division)

True
True
True
True
True
8
7
21
5.0
8
1


Note that 'equal to' requires a double == sign, because a single = sign is already used to assign a value to a variable.

## Introduction to Lists
A list is an ordered collection of items.  A list is set up with square brackets.

In [52]:
empty_list = []
test_list1 = [1,2,3,4,5]
test_list2 = ['test', 1, 0.13, datetime.today()]

In [53]:
print(test_list2)

['test', 1, 0.13, datetime.datetime(2018, 9, 24, 21, 54, 35, 376602)]


You can add to a list using the *append* method

In [54]:
messy_list = ['a','b','c','b','c','a','b']

In [55]:
messy_list.append(4)
print(messy_list)

['a', 'b', 'c', 'b', 'c', 'a', 'b', 4]


In [56]:
example_list = [1,2,3,4]
print(example_list)

[1, 2, 3, 4]


In [57]:
example_list.append('new entry')
print(example_list)

[1, 2, 3, 4, 'new entry']


You can refer to a position (or 'index') within a list using square brackets and an integer. 

**Warning:**  Unlike humans, Python counts from zero!  This applies throughout Python, so you must count from zero when referencing a position in a list or in an iterative process (see 'for' loops).

In [58]:
print(example_list[2])

3


In [59]:
print(example_list[4])

new entry


There are several different methods to remove elements:  *pop*, *remove*, *del*.  They all have slightly different properties.
https://stackoverflow.com/questions/11520492/difference-between-del-remove-and-pop-on-lists
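
As a brief sketch of how the three differ, the example below applies each one in turn to a made-up list called `letters`.

```python
letters = ['a', 'b', 'c', 'b', 'd']

popped = letters.pop(0)   # removes the item at index 0 and returns it
print(popped, letters)    # a ['b', 'c', 'b', 'd']

letters.remove('b')       # removes the first item equal to 'b' (by value, not by index)
print(letters)            # ['c', 'b', 'd']

del letters[1]            # deletes the item at index 1 without returning it
print(letters)            # ['c', 'd']
```

In short: use pop when you want the removed item back, remove when you know the value but not its position, and del when you know the position and do not need the value.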

## Introduction to Sets
A set is an unordered collection of items.  A set cannot contain duplicates: adding an item that is already present has no effect.

In [60]:
initial_set = {1,2,3,1}

In [61]:
print(initial_set)

{1, 2, 3}


In [62]:
messy_list = ['a','b','c','b','c','a','b']

In [63]:
print(messy_list)

['a', 'b', 'c', 'b', 'c', 'a', 'b']


In [64]:
distinct_items_in_list = set(messy_list)

In [65]:
print(distinct_items_in_list)

{'b', 'a', 'c'}
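
Because a set is unordered, the order in which its items are printed can differ from the order of the original list.  The sketch below (with illustrative variable names) shows two common uses of a set, counting distinct items and checking membership, and uses sorted() when a predictable order is needed.

```python
messy_list = ['a', 'b', 'c', 'b', 'c', 'a', 'b']
distinct_items = set(messy_list)

print(len(messy_list), len(distinct_items))  # 7 3

print('a' in distinct_items)   # True
print('z' in distinct_items)   # False

# sorted() returns a list, which does have a guaranteed order
sorted_items = sorted(distinct_items)
print(sorted_items)            # ['a', 'b', 'c']
```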


## Introduction to Dictionaries
A dictionary is a data map.  Every key of a dictionary has a corresponding value.  A dictionary is set up with curly brackets.  You can access the value by putting the key in square brackets to the right of the dictionary.  You can have a dictionary within a dictionary, in which case you will need multiple keys.

In [66]:
test_dictionary1 = {'key': 'value',
                    'entry1': 5,
                    'entry2': 10,
                    'entry3': 'test'}

In [67]:
test_dictionary2 = {'first_key': [1,2,3,4,5], 
                    'another_key': {'inner_dictionary': [1,2,3]} }

Below are some examples of accessing a value with the appropriate key (or keys).

In [68]:
print(test_dictionary1['entry1'])

5


In [69]:
print(test_dictionary2['another_key']['inner_dictionary'])

[1, 2, 3]
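
The same square-bracket notation can also add or update an entry.  The sketch below uses a made-up dictionary called `inventory`, and also shows the .get() method, which returns None (or a default you choose) instead of raising an error when a key is missing.

```python
inventory = {'entry1': 5, 'entry2': 10}

inventory['entry3'] = 'test'   # add a new key and value
inventory['entry1'] = 6        # overwrite the value of an existing key
print(inventory)               # {'entry1': 6, 'entry2': 10, 'entry3': 'test'}

# inventory['entry4'] would raise a KeyError, but .get() does not
missing_value = inventory.get('entry4')
fallback_value = inventory.get('entry4', 0)
print(missing_value, fallback_value)   # None 0
```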


## Introduction to For loops
A for loop repeats a block of code once for each item in a collection.  Typically you either loop through all elements in a list or loop through a *generator*, which produces its items one at a time.

In [70]:
count = 0
for i in [1,2,3,4,5,6,7,8,9,10]: 
    count = count + 1
print(count)

10
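
Typing out every number quickly becomes impractical.  As a sketch of the alternatives, the example below repeats the same count with range(), which produces its numbers on demand, and then loops through a generator expression (the names `squares` and `total` are illustrative).

```python
count = 0
for i in range(1, 11):   # the numbers 1 to 10; the end value 11 is excluded
    count = count + 1
print(count)             # 10

# A generator expression creates each square only when the loop asks for it
squares = (n * n for n in range(5))
total = 0
for square in squares:
    total = total + square
print(total)             # 0 + 1 + 4 + 9 + 16 = 30
```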



## Introduction to If statements

An 'if' statement checks a condition.  'elif' is the next condition to be checked if the previous condition is not met.  'else' is what happens if none of the 'if' or 'elif' conditions are met.

In [71]:
a = 12

if a>15:
    print(a, "is greater than 15")
elif a>10:
    print(a, "is greater than 10")
else:
    print(a, "is not greater than 10")

12 is greater than 10


You can use several conditions within an 'if' condition

In [72]:
b = 30

if b>10 and b%10==0:
    print(b, "is greater than 10 AND is divisible by 10")

if b<10 or b%3==0:
    print(b, "is less than 10 OR is divisible by 3")


30 is greater than 10 AND is divisible by 10
30 is less than 10 OR is divisible by 3


## Introduction to Try and Except statements

'try' and 'except' statements are great if you are unsure whether something is possible or not.  The code in the 'try' block is attempted first.  If it raises an error, the code in the 'except' block is run instead of the program stopping.

In [73]:
a = 4
b = 'test'

try:
    print(a + b)
except TypeError:  # adding an integer to a string raises a TypeError
    print("The operation that you attempted is not possible")

The operation that you attempted is not possible
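
It is usually safer to name the kind of error you expect, so that unrelated mistakes are not silently hidden.  The sketch below catches only a TypeError and keeps the error message in a variable; the names `result` and `error` are illustrative.

```python
a = 4
b = 'test'

try:
    result = a + b
except TypeError as error:
    # 'error' holds Python's own description of what went wrong
    result = None
    print("Could not add the values:", error)

print(result)   # None
```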


## Custom functions
You can define your own custom functions for your tasks.  Everything in the brackets is accepted as an input.  These inputs can then be used to create new values.  You can then return one (or more) values at the end of your function.  

In [74]:
def plus_one(input_variable):
    new_value = input_variable + 1
    return new_value

In [75]:
plus_one(4)

5

In [76]:
def multiply_together(weird_value, strange_value):
    return weird_value * strange_value

In [77]:
multiply_together(3,5)

15

Functions always 'forget' the information in the intermediate steps, which is helpful if you use the function more than once (which is often the case).  It does however mean that you cannot access these variables outside of this function.

For example:


In [78]:
def funky_function(input1, input2, input3):
    new_value = input1 + input2 + input3
    to_be_forgotten = input1 * input2 * input3
    return new_value

saved = funky_function(1,2,3)

In [79]:
print(saved)

6


You should get an error with the next line...

In [80]:
print(to_be_forgotten)

NameError: name 'to_be_forgotten' is not defined
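
If you do need an intermediate value outside the function, return it.  As a sketch, the hypothetical function below returns two values at once, which can then be unpacked into two variables.

```python
def sum_and_product(input1, input2, input3):
    total = input1 + input2 + input3
    product = input1 * input2 * input3
    return total, product   # several values are returned together as a tuple

saved_total, saved_product = sum_and_product(2, 3, 4)
print(saved_total, saved_product)   # 9 24
```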

# Pandas Dataframes

A dataframe is a concept borrowed from the programming language 'R'.  Essentially a dataframe is just a table.  Dataframes are initially quite awkward to set up but they then make working with data incredibly easy and versatile.


## Creating a dataframe
This can be done manually, but most commonly (for large datasets) it is done using one of the following built-in 'importing' methods:

* pd.read_csv(filepath)
* pd.read_clipboard()
* pd.read_excel(filepath)

In [108]:
## Dataframe manipulation

## df = pd.read_csv("INPUT FILE PATH HERE")
df = pd.read_csv('titanic_data.csv')

You can also create a dataframe manually with a dictionary and lists.  If you wish to create a dataframe manually please look at the information on this site:  http://pbpython.com/pandas-list-dict.html.  Otherwise, there is an example below.

In [109]:
passenger_hobbies = pd.DataFrame({   
                                    'Name' : ['Braund, Mr. Owen Harris','Allen, Mr. William Henry','Heikkinen, Miss. Laina'],
                                    'Hobby': ['Lacrosse','Jogging','Backgammon']
                                 })
passenger_hobbies

Unnamed: 0,Name,Hobby
0,"Braund, Mr. Owen Harris",Lacrosse
1,"Allen, Mr. William Henry",Jogging
2,"Heikkinen, Miss. Laina",Backgammon


## Dataframe 'head'

You can quickly look at just the top of a dataframe with the .head() method.  This is helpful if you want to look at the structure of the table without printing thousands of rows.

In [110]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


## Filtering Dataframes

You can look at a column using square brackets and convert it to a list using the .tolist() method.

In [111]:
df['Survived'].tolist()

[0,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 0,
 1,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,


You can view multiple columns by putting a list inside the square brackets

In [112]:
df[['Pclass','SibSp']]

Unnamed: 0,Pclass,SibSp
0,3,1
1,1,1
2,3,0
3,1,1
4,3,0
5,3,0
6,1,0
7,3,3
8,3,0
9,2,1


You can filter a table by specifying a condition for a column within square brackets.

In [113]:
df[df['Survived']==1]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.00,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.00,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.00,1,0,113803,53.1000,C123,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.00,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.00,1,0,237736,30.0708,,C
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.00,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.00,0,0,113783,26.5500,C103,S
15,16,1,2,"Hewlett, Mrs. (Mary D Kingcome)",female,55.00,0,0,248706,16.0000,,S
17,18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13.0000,,S
19,20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.2250,,C


*Note:* When you 'filter' the table with more than one requirement, each requirement must be wrapped in round brackets, because '&' and '|' are evaluated before comparisons such as '=='.

Below is an example of an 'and' requirement.  Note how the syntax differs from the 'and' and 'or' keywords mentioned in the 'if' statement section: pandas uses '&' for 'and' and '|' for 'or'.

From here on, we will shorten the output using the .head() method to save space.

In [114]:
df[(df['Survived']==1) & (df['Pclass']>2)].head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7,G6,S
19,20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C
22,23,1,3,"McGowan, Miss. Anna ""Annie""",female,15.0,0,0,330923,8.0292,,Q
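
To compare '&' and '|' side by side, the sketch below applies both to a tiny made-up table.  `small_df` is a hypothetical stand-in that only borrows two column names from the Titanic data.

```python
import pandas as pd

# Hypothetical mini-table reusing two Titanic column names
small_df = pd.DataFrame({
    'Survived': [0, 1, 1, 0],
    'Pclass':   [3, 1, 3, 2],
})

# '&' means AND: a row is kept only if both conditions hold
survived_and_third = small_df[(small_df['Survived'] == 1) & (small_df['Pclass'] > 2)]

# '|' means OR: a row is kept if at least one condition holds
survived_or_third = small_df[(small_df['Survived'] == 1) | (small_df['Pclass'] > 2)]

print(survived_and_third.index.tolist())   # [2]
print(survived_or_third.index.tolist())    # [0, 1, 2]
```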


There are some pandas-specific methods that you can use, such as .isin() (is the value in a given list?) and .isna() (is the value missing?).

In [115]:
persons_of_interest = [0,1,2,3]
df[df['Parch'].isin(persons_of_interest)]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C


In [116]:
df[df['Cabin'].isna()].head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S


You can specify that a condition is 'not' met by including a tilde "~" at the start of your condition.

In [117]:
df[~df['Cabin'].isna()].head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.55,C103,S


You can also specify that a string contains a substring using the .str.contains() method.

In [118]:
df[df['Name'].str.contains('Annie')]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
22,23,1,3,"McGowan, Miss. Anna ""Annie""",female,15.0,0,0,330923,8.0292,,Q
211,212,1,2,"Cameron, Miss. Clear Annie",female,35.0,0,0,F.C.C. 13528,21.0,,S
300,301,1,3,"Kelly, Miss. Anna Katherine ""Annie Kate""",female,,0,0,9234,7.75,,Q
357,358,0,2,"Funk, Miss. Annie Clemmer",female,38.0,0,0,237671,13.0,,S
368,369,1,3,"Jermyn, Miss. Annie",female,,0,0,14313,7.75,,Q
415,416,0,3,"Meek, Mrs. Thomas (Annie Louise Rowley)",female,,0,0,343095,8.05,,S
720,721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6.0,0,1,248727,33.0,,S
801,802,1,2,"Collyer, Mrs. Harvey (Charlotte Annie Tate)",female,31.0,1,1,C.A. 31921,26.25,,S


## Creating new columns 

You can assign new columns by referencing them with the square bracket notation and using the '=' sign to assign a list to it.  The list needs one entry per row of the dataframe.

In [119]:
passenger_hobbies

Unnamed: 0,Name,Hobby
0,"Braund, Mr. Owen Harris",Lacrosse
1,"Allen, Mr. William Henry",Jogging
2,"Heikkinen, Miss. Laina",Backgammon


In [120]:
passenger_hobbies['Outdoor_hobby'] = [1,1,0]

In [121]:
passenger_hobbies

Unnamed: 0,Name,Hobby,Outdoor_hobby
0,"Braund, Mr. Owen Harris",Lacrosse,1
1,"Allen, Mr. William Henry",Jogging,1
2,"Heikkinen, Miss. Laina",Backgammon,0


## Lambda function

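You can use the .apply() method to run a function on every value in a column.  A lambda is a short, unnamed function written in place.  Below, the lambda splits each name at the comma and keeps the first part (the surname).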

In [122]:
df['Name'].head()

0                              Braund, Mr. Owen Harris
1    Cumings, Mrs. John Bradley (Florence Briggs Th...
2                               Heikkinen, Miss. Laina
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                             Allen, Mr. William Henry
Name: Name, dtype: object

In [123]:
df['Name'].apply(lambda x: x.split(',')[0])

0               Braund
1              Cumings
2            Heikkinen
3             Futrelle
4                Allen
5                Moran
6             McCarthy
7              Palsson
8              Johnson
9               Nasser
10           Sandstrom
11             Bonnell
12         Saundercock
13           Andersson
14             Vestrom
15             Hewlett
16                Rice
17            Williams
18       Vander Planke
19          Masselmani
20              Fynney
21             Beesley
22             McGowan
23              Sloper
24             Palsson
25             Asplund
26                Emir
27             Fortune
28             O'Dwyer
29            Todoroff
            ...       
861              Giles
862              Swift
863               Sage
864               Gill
865            Bystrom
866       Duran y More
867           Roebling
868      van Melkebeke
869            Johnson
870             Balkic
871           Beckwith
872           Carlsson
873    Vand

In [124]:
df['Surname'] = df['Name'].apply(lambda x: x.split(',')[0])
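
A lambda is only a shorthand.  As a sketch, the example below shows that a named function passed to .apply() gives the same result; `names` and `get_surname` are illustrative, and the two names are taken from the table above.

```python
import pandas as pd

names = pd.Series(['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'])

def get_surname(full_name):
    # Everything before the first comma is the surname
    return full_name.split(',')[0]

surnames_lambda = names.apply(lambda x: x.split(',')[0])
surnames_named = names.apply(get_surname)

print(surnames_named.tolist())   # ['Braund', 'Heikkinen']
```

A named function is often easier to read and test once the logic grows beyond a single expression.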

## Concatenating tables

You can concatenate tables in many ways, but the most common form is when two tables have the same column structure.

In [125]:
passenger_hobbies

Unnamed: 0,Name,Hobby,Outdoor_hobby
0,"Braund, Mr. Owen Harris",Lacrosse,1
1,"Allen, Mr. William Henry",Jogging,1
2,"Heikkinen, Miss. Laina",Backgammon,0


In [126]:
passenger_hobbies2 = pd.DataFrame({
                                    'Name': ['Giles, Mr. Frederick Edward','Johnson, Master. Harold Theodor'],
                                    'Hobby': ['Water Polo','Horse Riding'],
                                    'Outdoor_hobby': [0, 1]
                                  })

In [127]:
concatenated_passenger_hobbies = pd.concat([passenger_hobbies, passenger_hobbies2])
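
By default, pd.concat keeps each table's original row labels, so the combined table can contain repeated index values.  The sketch below, using two made-up tables, shows the effect and how the ignore_index=True argument renumbers the rows.

```python
import pandas as pd

# Two hypothetical tables with the same column structure
first = pd.DataFrame({'Name': ['A', 'B', 'C'],
                      'Hobby': ['Lacrosse', 'Jogging', 'Backgammon']})
second = pd.DataFrame({'Name': ['D', 'E'],
                       'Hobby': ['Water Polo', 'Horse Riding']})

kept_index = pd.concat([first, second])
fresh_index = pd.concat([first, second], ignore_index=True)

print(kept_index.index.tolist())    # [0, 1, 2, 0, 1]
print(fresh_index.index.tolist())   # [0, 1, 2, 3, 4]
```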

## Merging tables

Similar to a SQL JOIN you can carry out 'merges' in pandas.  For information on how joins work, check out the following link: https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/

For information on pandas merges, check out the following link:

Below are examples of the two most common joins: the left join and the inner join.

In [128]:
pd.merge(df, concatenated_passenger_hobbies, how='left', on='Name')

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Surname,Hobby,Outdoor_hobby
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,Braund,Lacrosse,1.0
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,Cumings,,
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,Heikkinen,Backgammon,0.0
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,Futrelle,,
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,Allen,Jogging,1.0
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q,Moran,,
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S,McCarthy,,
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S,Palsson,,
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S,Johnson,,
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C,Nasser,,


In [129]:
pd.merge(df, concatenated_passenger_hobbies, how='inner', on='Name')

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Surname,Hobby,Outdoor_hobby
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,Braund,Lacrosse,1
1,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,Heikkinen,Backgammon,0
2,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,Allen,Jogging,1
3,862,0,2,"Giles, Mr. Frederick Edward",male,21.0,1,0,28134,11.5,,S,Giles,Water Polo,0
4,870,1,3,"Johnson, Master. Harold Theodor",male,4.0,1,1,347742,11.1333,,S,Johnson,Horse Riding,1
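
The difference between the two joins is easier to see on a tiny example.  The sketch below uses two made-up tables (`passengers` and `hobbies` are hypothetical): the left join keeps every row of the first table and fills the gaps with missing values, while the inner join keeps only the rows found in both tables.

```python
import pandas as pd

passengers = pd.DataFrame({'Name': ['Braund', 'Cumings', 'Heikkinen'],
                           'Pclass': [3, 1, 3]})
hobbies = pd.DataFrame({'Name': ['Braund', 'Heikkinen'],
                        'Hobby': ['Lacrosse', 'Backgammon']})

left_joined = pd.merge(passengers, hobbies, how='left', on='Name')
inner_joined = pd.merge(passengers, hobbies, how='inner', on='Name')

print(len(left_joined))    # 3
print(len(inner_joined))   # 2

# 'Cumings' has no hobby recorded, so the left join leaves it missing
print(left_joined['Hobby'].isna().tolist())   # [False, True, False]
```

The missing values introduced by a left join are also why a whole-number column such as Outdoor_hobby can appear as 1.0 and 0.0 afterwards.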
