# Topic 2 - Python and Jupyter Notebooks

## Concept 3 - Variables 
A variable is a name that refers to a value. It can point to any Python object we like, and it can be reassigned later. Try out the below!

In [45]:
my_first_variable = 'Hello World!'

print(my_first_variable)

Hello World!

Note that once a cell has been run, any variables, libraries, functions etc. are stored in the Jupyter kernel and can be reused in other cells.

When an expression is assigned to a variable, Python evaluates the expression first and stores the result. Try out the below!

In [9]:
my_second_variable = 2 + 3

print(my_second_variable)

5


Variables take on the type of whatever is saved in them. If you need a refresher on types (there may be some in Python that you haven't seen before), check out https://docs.python.org/3/library/stdtypes.html

In [10]:
print(type(my_first_variable))

print(type(my_second_variable))

<class 'str'>
<class 'int'>


Variables can be updated to take new contents simply by setting them equal to a new value. The kernel has my_first_variable stored from the cells above.

In [11]:
print(my_first_variable)

my_first_variable = 'Bye Guys!'

print(my_first_variable)

Hello World!
Bye Guys!
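Because a variable is just a name bound to a value, reassigning it can change its type as well as its contents. A small sketch of this (the name `counter` is made up for illustration and does not appear elsewhere in this notebook):

```python
# 'counter' is a hypothetical name used only for this illustration
counter = 10
print(type(counter))

# Reassign using the variable's own current value
counter = counter + 1
print(counter)

# Reassigning to a string changes the type the name refers to
counter = 'eleven'
print(type(counter))
```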


## Concept 4 - Operators 

An operator is a symbol that tells Python to perform an operation on one or more objects. Some classic arithmetic operators are + - / *. In Python we can also use + to concatenate strings.

In [14]:
print(2+5)
print("2" + "5")

#Subtle difference between these two, even though they look the same

print(type(2+5))
print(type("2" + "5"))

7
25
<class 'int'>
<class 'str'>
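Since + means different things for integers and strings, Python refuses to guess when the two are mixed. The sketch below shows the error and one way to resolve it with explicit conversion (the variable names are made up for illustration):

```python
# Mixing a string and an integer with + raises a TypeError
try:
    result = "2" + 5
except TypeError as error:
    result = None
    print("TypeError:", error)

# Convert explicitly to choose the behaviour we want
as_number = int("2") + 5   # arithmetic addition
as_text = "2" + str(5)     # string concatenation

print(as_number)
print(as_text)
```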


Comparison operators allow us to make tests that are useful for conditional logic (if a condition is True, perform an action). Each test returns a Boolean value, True or False. Some examples of comparison operators are > (greater than), < (less than), == (equal to) and != (not equal to).

In [16]:
print(2 == 2)

print(1 > 2)

print(2 < 1)

print(2 != 1)

True
False
False
True


For if statements, a colon is used after the condition, and the code to run when the condition is True is indented on the following lines.

In [17]:
if 2>1:
    print("that's obvious")

that's obvious
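An if statement can be extended with elif and else branches to handle the other cases. A minimal sketch, using a made-up `temperature` value and thresholds chosen purely for illustration:

```python
# 'temperature' and the thresholds are made up for illustration
temperature = 15

if temperature > 25:
    message = "warm"
elif temperature > 10:
    # Only checked when the first condition is False
    message = "mild"
else:
    # Runs when none of the conditions above are True
    message = "cold"

print(message)
```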


## Concept 5 - Functional Programming 

Python treats functions as objects in their own right, and a large portion of the work we'll be doing will make use of functions that are part of libraries. So far we have used 'print' and 'type', which are built-in functions in Python. A function is 'called' by adding brackets after it. There is a subtle difference between the two pieces of code below.

In [1]:
#Example 1
print

<function print>

In [6]:
#Example 2
print()




In Example 1 we have not called the function; the cell simply returned the function object itself. In Example 2 we have called the function but given it no arguments, so it printed an empty line. Arguments are the inputs to the function and are supplied within the brackets that call the function. For example, the function 'type' returns the type of its argument. If we put the integer 5 in, we get...

In [3]:
type(5)

int

If a function requires multiple arguments, we supply them within the brackets separated by commas. To show you an example let's build our own function that takes two arguments.

In [4]:
#Define the function with the def keyword, stating the name of the function and the arguments it takes.
#It's possible to have a function with no arguments.

#This function takes two arguments, first_no & second_no

def multiply(first_no, second_no):
    
    #perform any operations with an indentation
    output = first_no * second_no
    
    # return what you want using the return command
    return output

multiply(5,4)

20
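Arguments can be passed by position, as above, or by name. A function can also give an argument a default value. The sketch below is a variation on the example above; the name `multiply_with_default` and the default of 2 are assumptions made for illustration:

```python
# Hypothetical variation on multiply: second_no defaults to 2
def multiply_with_default(first_no, second_no=2):
    return first_no * second_no

# Uses the default value for second_no
doubled = multiply_with_default(5)

# Positional arguments are matched in order
by_position = multiply_with_default(5, 4)

# Keyword arguments are matched by name, so the order does not matter
by_keyword = multiply_with_default(second_no=4, first_no=5)

print(doubled, by_position, by_keyword)
```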

## Concept 6 - Lists 

Lists are a classic part of Python: a changeable (mutable) collection of objects with an order, accessible via an index. Square brackets [] are how we tell Python to build a list.

In [35]:
my_first_list = [1, 2, 3]

To retrieve an element from a list, an index location can be given using another pair of square brackets. 
N.B. indices start at 0 in Python!

In [36]:
#Element #1
my_first_list[0]

1

Lists can contain any objects we want; they don't need to be a consistent type. In the example below we have a string, an integer and even a function.

In [37]:
my_second_list = ['a', 2, print]

In [38]:
my_second_list[2]

<function print>

In [39]:
my_second_list[2]('Hello World!')

Hello World!


Slicing is a way of retrieving multiple elements in a list by making use of the colon :

In [30]:
my_third_list = ['zero','one','two','three','four','five','six','seven','eight']

#A colon before the index returns everything up to, but not including, that index. In this case index locations 0, 1, 2, 3 (remember indices start at 0 in Python)
my_third_list[:4]

['zero', 'one', 'two', 'three']

In [31]:
#colon after the index returns the index and everything after. In this case index location 4,5,6,7,8
my_third_list[4:]

['four', 'five', 'six', 'seven', 'eight']
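A slice can also take both a start and a stop, count from the end with negative indices, or skip elements with a step. A short sketch, where the list `numbers` is a copy of the one above under a different, made-up name:

```python
# Same contents as my_third_list, repeated here so the example runs on its own
numbers = ['zero', 'one', 'two', 'three', 'four',
           'five', 'six', 'seven', 'eight']

# start:stop returns index 2 up to, but not including, index 5
middle = numbers[2:5]

# A negative start counts back from the end of the list
last_two = numbers[-2:]

# A third value sets the step, here every second element
every_other = numbers[::2]

print(middle)
print(last_two)
print(every_other)
```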

For a concise and efficient operation on every element of a list, the Pythonic approach is the list comprehension. List comprehensions take the following shape.

[OPERATION for ELEMENT in LIST]

E.g.

In [40]:
[w*2 for w in my_first_list]

[2, 4, 6]

This gives us each element in the list multiplied by 2. This can be really useful for speedy operations on a whole list.
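To see what a comprehension is doing, it can help to compare it with the equivalent for loop. The sketch below also shows the optional `if` clause for filtering; the names `doubled_loop`, `doubled_comp` and `doubled_odd` are made up for illustration:

```python
my_first_list = [1, 2, 3]

# The long way: build the result with an explicit loop
doubled_loop = []
for w in my_first_list:
    doubled_loop.append(w * 2)

# The same result as a list comprehension
doubled_comp = [w * 2 for w in my_first_list]

# An optional condition keeps only some elements, here the odd ones
doubled_odd = [w * 2 for w in my_first_list if w % 2 == 1]

print(doubled_loop)
print(doubled_comp)
print(doubled_odd)
```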

## Topic 2 Activity

Write a function that takes two arguments, tests whether they are the same type and returns them as a list if they are.

In [43]:
#Solution
def argument_checker(arg1, arg2):
    if type(arg1) == type(arg2):
        output = [arg1, arg2]
        return output
    
argument_checker('hi', 'there')

['hi', 'there']

In [44]:
#Solution
#The types differ, so the function returns None and the cell shows no output
argument_checker('hi', 6)
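The second call shows no output because a function that reaches its end without hitting a return statement gives back None. One way to confirm this is to store the result and inspect it; the function is repeated here so the sketch runs on its own, and the name `result` is made up for illustration:

```python
def argument_checker(arg1, arg2):
    if type(arg1) == type(arg2):
        output = [arg1, arg2]
        return output
    # No return statement is reached when the types differ,
    # so Python returns None implicitly

result = argument_checker('hi', 6)

print(result)
print(result is None)
```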

# Topic 3 - Pandas, Your New Favourite Python Library

## Concept 1 - Attaching Libraries 

A very helpful aspect of working with Python is its support for libraries. These are packages of code written by other people that save a lot of time for us as end users. Arguably the most important library for any data scientist in Python is Pandas. Pandas is a library that gives support for DataFrames, which are the main structure by which we can store and use our data.

In [55]:
#import statement
import pandas

## Concept 2 - DataFrame Construction

We can make a DataFrame in many ways. An intuitive way is using the following pattern.

Call pandas.DataFrame()

Give it a dictionary argument in the form:

{'Column_name':[things,in,the,column], 'Next_column':[things,in,the,column]}

In [57]:
df = pandas.DataFrame({'Column 1':[1,2,3], 'Column 2':[4,5,6]})

df

| | Column 1 | Column 2 |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |


Note that pandas has automatically given us row indices from 0 to 2 for each of our rows.
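Once a DataFrame exists, a few attributes are handy for checking what was built. The sketch below rebuilds the same frame and inspects its shape, columns and index; it assumes pandas is installed, and the name `column_one` is made up for illustration:

```python
import pandas

df = pandas.DataFrame({'Column 1': [1, 2, 3], 'Column 2': [4, 5, 6]})

# shape is a (rows, columns) tuple
print(df.shape)

# The column names and the automatically generated row index
print(list(df.columns))
print(list(df.index))

# Square brackets with a column name return that column
column_one = df['Column 1']
print(column_one.tolist())
```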

## Concept 3 - DataFrame Importing

Reading data from a CSV file is simple using pandas. We simply call the pandas function read_csv, with the CSV location as an argument. The following code saves a dataframe with ticket information about the passengers of the Titanic to a variable named ticket_info, using the PassengerId column as the row index. You can use Jupyter's interface to scroll through the data. Neat huh?

Data Dictionary:

| Variable | Definition | Key |
|---|---|---|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

In [133]:
ticket_info = pandas.read_csv('~/Desktop/ticket_info.csv', index_col='PassengerId')
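If the CSV file is not to hand, the same call can be tried on a few lines of text, since read_csv also accepts file-like objects. This is only a sketch: the three rows in `csv_text` are a small illustrative sample typed inline, not the real ticket_info.csv file, and the name `sample` is made up.

```python
import io
import pandas

# Illustrative CSV text standing in for a file on disk
csv_text = """PassengerId,Fare,Pclass
1,7.25,3
2,71.2833,1
3,7.925,3
"""

# index_col tells pandas which column to use as the row index
sample = pandas.read_csv(io.StringIO(csv_text), index_col='PassengerId')

print(sample)
```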

## Concept 4 - DataFrame Indexing & Slicing

DataFrames provide methods and indexers for performing operations on them. For example, returning a particular subset of the rows in our ticket_info dataframe by position uses the .iloc indexer, which takes square brackets rather than round ones.

In [134]:
ticket_info.iloc[2]

Cabin                    NaN
Embarked                   S
Fare                   7.925
Pclass                     3
Ticket      STON/O2. 3101282
Name: 3, dtype: object

This returns the 3rd row of the dataframe, with all variables. Remember indices in Python start at 0. We can slice the same way we did with lists, using the colon. As with lists, the row at the end position is not included.

In [135]:
ticket_info.iloc[2:50]

| PassengerId | Cabin | Embarked | Fare | Pclass | Ticket |
|---|---|---|---|---|---|
| 3 | NaN | S | 7.925 | 3 | STON/O2. 3101282 |
| 4 | C123 | S | 53.1 | 1 | 113803 |
| 5 | NaN | S | 8.05 | 3 | 373450 |
| 6 | NaN | Q | 8.4583 | 3 | 330877 |
| 7 | E46 | S | 51.8625 | 1 | 17463 |
| 8 | NaN | S | 21.075 | 3 | 349909 |
| 9 | NaN | S | 11.1333 | 3 | 347742 |
| 10 | NaN | C | 30.0708 | 2 | 237736 |
| 11 | G6 | S | 16.7 | 3 | PP 9549 |
| 12 | C103 | S | 26.55 | 1 | 113783 |
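Note that .iloc selects by position, which is why position 2 gave us PassengerId 3. pandas also has a .loc indexer that selects by index label instead. A small sketch of the difference, using a made-up three-row frame called `sample` rather than the real ticket data:

```python
import pandas

# Made-up frame whose index labels start at 1, like PassengerId
sample = pandas.DataFrame(
    {'Fare': [7.25, 71.2833, 7.925], 'Pclass': [3, 1, 3]},
    index=pandas.Index([1, 2, 3], name='PassengerId'),
)

# Position 2 is the third row...
by_position = sample.iloc[2]

# ...which is the row labelled 3
by_label = sample.loc[3]
print(by_position.equals(by_label))

# Positional slices exclude the stop position, as with lists
first_two = sample.iloc[0:2]

# .loc can also take a row label and a column name together
single_fare = sample.loc[3, 'Fare']
print(single_fare)
```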


## Concept 5 - DataFrame Merging

Sometimes we have data in different frames that needs to be combined in some way. This problem occurs often in the real world and a solution is provided by pandas in the form of merges. Using the same technique as found in Concept 2, we can make three dataframes. All we need is a common column shared between the dataframes that we can use as a ‘key’ to join them together. In this instance we can use the name of the show.

In [102]:
netflix1 = pandas.DataFrame({'Show Name':
                             ['House of Cards','Orange is the New Black','Making a Murderer'], 
                             'Category':
                             ['Drama','Drama','Documentary']
                            })

netflix2 = pandas.DataFrame({'Show Name':
                             ['House of Cards','Orange is the New Black','Making a Murderer'],
                             'Season Count':
                             [6,6,2]
                            })

netflix3 = pandas.DataFrame({'Show Name':
                             ['House of Cards','Orange is the New Black','Making a Murderer'], 
                             'Original Release':
                             ['2013-02-01','2013-07-11','2015-12-18']
                            })


In [103]:
netflix1

| | Show Name | Category |
|---|---|---|
| 0 | House of Cards | Drama |
| 1 | Orange is the New Black | Drama |
| 2 | Making a Murderer | Documentary |


In [104]:
netflix2

| | Show Name | Season Count |
|---|---|---|
| 0 | House of Cards | 6 |
| 1 | Orange is the New Black | 6 |
| 2 | Making a Murderer | 2 |


In [105]:
netflix3

| | Show Name | Original Release |
|---|---|---|
| 0 | House of Cards | 2013-02-01 |
| 1 | Orange is the New Black | 2013-07-11 |
| 2 | Making a Murderer | 2015-12-18 |


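Using the pandas 'merge' method we can combine these. With no further arguments, merge joins on the columns the two frames have in common, here 'Show Name'.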

In [142]:
netflix1.merge(netflix2)

| | Show Name | Category | Season Count |
|---|---|---|---|
| 0 | House of Cards | Drama | 6 |
| 1 | Orange is the New Black | Drama | 6 |
| 2 | Making a Murderer | Documentary | 2 |


It's also possible to 'chain' together several merges, like so:

In [143]:
netflix1.merge(netflix2).merge(netflix3)

| | Show Name | Category | Season Count | Original Release |
|---|---|---|---|---|
| 0 | House of Cards | Drama | 6 | 2013-02-01 |
| 1 | Orange is the New Black | Drama | 6 | 2013-07-11 |
| 2 | Making a Murderer | Documentary | 2 | 2015-12-18 |
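In the examples above every show appears in every frame. When that is not the case, the `on` and `how` arguments of merge control which key is used and which rows are kept. A sketch with two made-up frames, `shows` and `seasons`, that each miss one show:

```python
import pandas

# Made-up frames: each one is missing a show the other has
shows = pandas.DataFrame({'Show Name': ['House of Cards', 'Making a Murderer'],
                          'Category': ['Drama', 'Documentary']})
seasons = pandas.DataFrame({'Show Name': ['House of Cards', 'Orange is the New Black'],
                            'Season Count': [6, 6]})

# The default how='inner' keeps only keys found in both frames
inner = shows.merge(seasons, on='Show Name')

# how='outer' keeps every key and fills the gaps with NaN
outer = shows.merge(seasons, on='Show Name', how='outer')

print(inner)
print(outer)
```

Naming the key with `on` also makes the intent explicit when the frames happen to share more than one column name.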


## Topic 3 Activity

Find out how many rows are in the combined ticket and passenger info data. What was the common factor (key) you used to merge the dataframes?

Get rows 460-470 of the data. Were any of these rows more expensive tickets than the ticket held by passenger id 485?

In [8]:
#Solution

ticket_info = pandas.read_csv('~/Desktop/ticket_info.csv')

passenger_info = pandas.read_csv('~/Desktop/passenger_info.csv')

#Common factor is PassengerId

total_info = ticket_info.merge(passenger_info)

total_info.iloc[460:470]

| | PassengerId | Cabin | Embarked | Fare | Pclass | Ticket | Age | Name | Parch | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 460 | 461 | E12 | S | 26.55 | 1 | 19952 | 48.0 | Anderson, Mr. Harry | 0 | male | 0 | 1.0 |
| 461 | 462 | NaN | S | 8.05 | 3 | 364506 | 34.0 | Morley, Mr. William | 0 | male | 0 | 0.0 |
| 462 | 463 | E63 | S | 38.5 | 1 | 111320 | 47.0 | Gee, Mr. Arthur H | 0 | male | 0 | 0.0 |
| 463 | 464 | NaN | S | 13.0 | 2 | 234360 | 48.0 | Milling, Mr. Jacob Christian | 0 | male | 0 | 0.0 |
| 464 | 465 | NaN | S | 8.05 | 3 | A/S 2816 | NaN | Maisner, Mr. Simon | 0 | male | 0 | 0.0 |
| 465 | 466 | NaN | S | 7.05 | 3 | SOTON/O.Q. 3101306 | 38.0 | Goncalves, Mr. Manuel Estanslas | 0 | male | 0 | 0.0 |
| 466 | 467 | NaN | S | 0.0 | 2 | 239853 | NaN | Campbell, Mr. William | 0 | male | 0 | 0.0 |
| 467 | 468 | NaN | S | 26.55 | 1 | 113792 | 56.0 | Smart, Mr. John Montgomery | 0 | male | 0 | 0.0 |
| 468 | 469 | NaN | Q | 7.725 | 3 | 36209 | NaN | Scanlan, Mr. James | 0 | male | 0 | 0.0 |
| 469 | 470 | NaN | C | 19.2583 | 3 | 2666 | 0.75 | Baclini, Miss. Helene Barbara | 1 | female | 2 | 1.0 |


In [7]:
#Solution

total_info.iloc[484]

PassengerId                        485
Cabin                              B49
Embarked                             C
Fare                           91.0792
Pclass                               1
Ticket                           11967
Age                                 25
Name           Bishop, Mr. Dickinson H
Parch                                0
Sex                               male
SibSp                                1
Survived                             1
Name: 484, dtype: object
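The row count and the fare comparison can also be done in code rather than by eye. The sketch below shows one possible approach on a made-up four-row frame called `total_sample`; it is not the real merged data, so the numbers it prints are not the answers to the activity.

```python
import pandas

# Made-up stand-in for the merged frame
total_sample = pandas.DataFrame({'PassengerId': [1, 2, 3, 4],
                                 'Fare': [7.25, 71.2833, 7.925, 53.1]})

# Number of rows; total_sample.shape[0] gives the same answer
row_count = len(total_sample)
print(row_count)

# Look up one passenger's fare by PassengerId rather than by position
is_reference = total_sample['PassengerId'] == 4
reference_fare = total_sample.loc[is_reference, 'Fare'].iloc[0]

# Take a slice of rows, then keep those with a higher fare
subset = total_sample.iloc[0:3]
more_expensive = subset[subset['Fare'] > reference_fare]
print(more_expensive)
```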