# Tutorial 1

In this tutorial, we'll cover the basics of Python. After this tutorial you should be able to:

1. Understand features of different data types
2. Understand and implement conditional logic
3. Write basic functions
4. Use the basics of NumPy and pandas

## Jupyter notebooks

Welcome to your first Jupyter notebook. Jupyter is an open-source web app that allows you to create documents with live text, code and visualizations. Notebooks let us embed code and execute it directly in the document, effectively giving us a Python terminal. Try it below.

In [2]:
2+2

4

In [4]:
a = "Hello "
b = "World"

a+b

'Hello World'

## Python indentation

The first thing you'll notice about Python is that indentation matters.

In [61]:
a = 5
    b = 4

IndentationError: unexpected indent (<ipython-input-61-635a605b55f9>, line 2)

You'll see this more prominently when we move on to conditional logic, functions and classes.

## Data types 

The first building blocks in Python we'll cover are data types. While you'll be familiar with most of the standard types, Python has some interesting data types you may not yet have come across. Let's cover the standard data types first.

### Integers

In [1]:
#Assignment

a = 5
type(a)

int

In [2]:
#Addition

2+2

4

In [3]:
#Subtraction

10-8

2

In [10]:
#Division

4/2

2.0

In [11]:
#Multiplication

2*3

6

In [12]:
# Modulo

7%4

3

In [13]:
# Powers

2**3

8

In [14]:
#Order of operation is observed

2 + 3 * 2

8

### Strings

Strings can be denoted with either single or double quotes

In [15]:
"hello"

'hello'

In [16]:
'hello'

'hello'

However, when your string contains an apostrophe, make sure you enclose it in double quotes. Otherwise Python treats the apostrophe as the end of the string.

In [5]:
'Hello, I'm John'

SyntaxError: invalid syntax (<ipython-input-5-310882b62b11>, line 1)

In [7]:
"Hello, I'm John"

"Hello, I'm John"

While simply typing a string and running the cell displays it in your notebook, the correct way to print a string is to use the print function. To see why, look at what happens when we try to output multiple strings: only the last one is displayed.

In [8]:
"Hello"
"How are you"
"I'm fine, thank you"

"I'm fine, thank you"

In [11]:
print("Hello")
print("How are you")
print("I'm fine, thank you")

Hello
How are you
I'm fine, thank you


Another nice feature of strings is that we can index them to get specific characters. Note that Python starts indexing at 0.

In [13]:
a = "Hello, World"
print(a[0])
print(a[7])

H
W


We can also make use of <code>:</code> to perform slicing, which extracts every character from a given index position onwards, or every character before a given index position.

In [14]:
print(a[1:])
print(a[:7])

ello, World
Hello, 
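As a minimal sketch of how slice bounds behave (the variable names below are illustrative and not part of the tutorial): a slice <code>a[start:stop]</code> includes the character at <code>start</code> and excludes the one at <code>stop</code>, and an optional third value sets the step.

```python
a = "Hello, World"

first_word = a[0:5]     # characters at index 0, 1, 2, 3 and 4
second_word = a[7:12]   # characters at index 7 up to, but excluding, 12
every_second = a[::2]   # every second character, starting at index 0
reversed_a = a[::-1]    # a step of -1 walks through the string backwards

print(first_word)
print(second_word)
print(every_second)
print(reversed_a)
```

Because the stop index is excluded, <code>a[:7] + a[7:]</code> always gives back the original string.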


Python also lets us use negative indexing, which is particularly useful with long strings. Negative indexing starts at -1, which refers to the last character.

In [15]:
print(a[-1])

d


Importantly, strings are *immutable* - this means once we create a string, we cannot alter its content.

In [16]:
a[1] = 'z'

TypeError: 'str' object does not support item assignment

What strings do allow us to do, however, is concatenate different strings using addition. Note that the addition operator does not add a space between the two strings.

In [18]:
print(a + ",I've been concatenated")
print(a + ", I've been concatenated")

Hello, World,I've been concatenated
Hello, World, I've been concatenated


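One of the most useful features of strings is the ability to use the <code>%</code> operator to insert values into a string. Each <code>%s</code> placeholder is replaced, in order, by the corresponding value in the tuple.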

In [19]:
print("Hello, my name is %s, and I am from %s" %('John','Cape Town'))

Hello, my name is John, and I am from Cape Town
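The same substitution can also be written with the <code>str.format</code> method or, from Python 3.6 onwards, with f-strings. The sketch below is only an illustration of the three equivalent spellings; the variable names are assumptions made for the example.

```python
name = 'John'
city = 'Cape Town'

# The % operator, as used in the cell above
with_percent = "Hello, my name is %s, and I am from %s" % (name, city)

# str.format fills each {} placeholder in order
with_format = "Hello, my name is {}, and I am from {}".format(name, city)

# An f-string (note the leading f) evaluates the names inside {} directly
with_fstring = f"Hello, my name is {name}, and I am from {city}"

print(with_percent)
print(with_format)
print(with_fstring)
```

All three produce the same string, so which one to use is largely a matter of readability.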


All data types, or objects in Python, have certain methods available to them. Methods will become clearer when we move to the object-oriented part of the course, but for now, think of them as functions that are available to specific data types. In this case, we'll look at some of the methods available to strings. Also note how we call these methods: instead of what you may be used to, something like <code>method(input)</code>, in Python we use <code>input.method()</code>. Again, the reason for this notation will make more sense when we move to OOP.

In [44]:
# Convert to upper case

hello_world = "Hello, World"
print(hello_world)

hello_world.upper()

Hello, World

'HELLO, WORLD'

In [41]:
# Split a sentence into a list of words separated by spaces

my_sentence = "Hello how are you?"
my_sentence.split()

['Hello', 'how', 'are', 'you?']

### Booleans

In [21]:
a = True
b = False
type(a)

bool

In [84]:
10 > 5

True

In [87]:
len('abc') > len('abcd')

False

### Lists

Lists represent an ordered sequence or collection of items. Importantly, lists are mutable, which means we can edit them. We create lists using [ ]. Lists can store different data types and can store nested lists as well.

In [23]:
myList = [1, 2, 3, 4]
myList[0]

1

In [24]:
#Mutability

myList[0] = 5
myList

[5, 2, 3, 4]

In [25]:
#Importantly, lists can store different data types

myNewList = [1, 2, "hello"]
print(myNewList)
print(type(myNewList[0]))
print(type(myNewList[2]))

[1, 2, 'hello']
<class 'int'>
<class 'str'>


In [26]:
# Nested lists

my_list1 = [1, 2, 3]
my_list2 = [4, 5, 6]
my_list3 = [7, 8, 9]

my_list = [my_list1, my_list2, my_list3]

print(my_list)

print(my_list[0])

print(my_list[0][1])

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[1, 2, 3]
2
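Because lists are mutable, they can also grow and shrink in place. The following is a small sketch, using list methods from the standard library and variable names chosen only for this example, of the most common ways to change a list.

```python
my_numbers = [1, 2, 3]

my_numbers.append(4)          # add a single item to the end
my_numbers.extend([5, 6])     # add several items to the end
last_item = my_numbers.pop()  # remove the last item and return it
my_numbers[0] = 10            # overwrite the item at index 0

print(my_numbers)
print(last_item)
print(len(my_numbers))        # number of items currently in the list
```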


### Tuples

Similar to lists, tuples represent another way to store a collection of items. Unlike lists, however, tuples are *immutable*, meaning they cannot be changed. We'll typically store data that is fixed (with no need to be changed) as tuples.

In [27]:
myTuple = (1, 2, 3, 4)
myTuple[0]

1

In [28]:
#Immutability

myTuple[0] = 5 

TypeError: 'tuple' object does not support item assignment

In [29]:
# We can however create a new, longer tuple by concatenating tuples

to_append = (5, 6)
myTuple = myTuple + to_append
myTuple

(1, 2, 3, 4, 5, 6)
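Concatenation does not contradict immutability: the <code>+</code> operator builds a brand new tuple and the variable name is simply pointed at it. A minimal sketch of this, with an extra name (<code>alias</code>) introduced purely for illustration:

```python
original = (1, 2, 3, 4)
alias = original              # a second name for the same tuple

original = original + (5, 6)  # builds a new tuple and rebinds the name

print(original)  # the new, longer tuple
print(alias)     # the first tuple is unchanged
```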

### Sets

Sets also store a collection of items. Unlike lists and tuples however, sets are *unordered* and only store *unique* elements

In [31]:
my_set = {1,2,3}
my_set
my_set[0]

TypeError: 'set' object does not support indexing

In [33]:
# Sets store unique elements, so duplicates are dropped when we convert a list to a set

my_list = [1,1,1,1,2,2,2,3,3,3]
my_new_set = set(my_list)

In [82]:
# Since sets are unordered, they cannot be indexed

my_new_set[0]

TypeError: 'set' object does not support indexing
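Although sets cannot be indexed, they are well suited to removing duplicates and to membership checks. The sketch below illustrates this; the variable names are assumptions made for the example.

```python
my_list = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
unique_values = set(my_list)     # duplicates are discarded

contains_two = 2 in unique_values    # membership test
contains_five = 5 in unique_values

unique_values.add(3)   # 3 is already present, so nothing changes
unique_values.add(4)   # 4 is new, so the set grows by one element

# sorted() returns a list, which can be indexed again
sorted_values = sorted(unique_values)

print(contains_two, contains_five)
print(sorted_values)
```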

### Dictionaries

Unlike the previous data structures, dictionaries are less of a sequence and more of a mapping. Dictionaries store a collection of items by *key* rather than by index number or relative position, so when we reference an element in a dictionary we use its _key_ rather than its _index position_. As a result, dictionaries cannot be indexed by position, although since Python 3.7 they do preserve the order in which keys were inserted.

In [52]:
# We create a dictionary using {}
# We then assign key and value pairs, separated by :

my_dictionary = {'key1':'value1','key2':'value2'}

# To look up a value, we index the dictionary by its key

print(my_dictionary['key1'])

value1


In [54]:
# Dictionaries are useful, particularly when we have multiple dictionaries that share the same keys

capitals = {'Japan':'Tokyo', 'USA': 'Washington', 'Argentina': 'Buenos Aires'}

print(capitals["Argentina"])

languages = {'Japan':'Japanese', 'USA': 'English', 'Argentina': 'Spanish'}

print(languages["Argentina"])

Buenos Aires
Spanish


In [57]:
# We can add new key-value pairs easily

capitals['South Africa'] = ('Cape Town', 'Pretoria', 'Bloemfontein')
print(capitals)

{'Japan': 'Tokyo', 'USA': 'Washington', 'Argentina': 'Buenos Aires', 'South Africa': ('Cape Town', 'Pretoria', 'Bloemfontein')}
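Indexing a dictionary with a key that does not exist raises a <code>KeyError</code>. As a hedged sketch of how to guard against that, the built-in <code>in</code> operator and the <code>get</code> method can be used as follows (the key <code>'France'</code> and the variable names are chosen only for illustration):

```python
capitals = {'Japan': 'Tokyo', 'USA': 'Washington', 'Argentina': 'Buenos Aires'}

has_japan = 'Japan' in capitals                # check whether a key exists
missing = capitals.get('France')               # returns None instead of raising KeyError
fallback = capitals.get('France', 'unknown')   # returns the given default instead

countries = list(capitals.keys())      # all keys
cities = list(capitals.values())       # all values

print(has_japan, missing, fallback)
print(countries)
print(cities)
```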


## Conditional logic

As with every programming language, understanding conditional logic and how to implement it is one of the first steps in mastering the language. Let's briefly cover how to implement the various types.


### If, elif and else

The if, elif and else statements allow for the conditional execution of code depending on one or more conditional expressions being met.

In [1]:
person = 'John'

if person == 'Trevor':
    print('Welcome Trevor!')
elif person =='John':
    print('Welcome John!')
else:
    print("Welcome, what's your name?")

Welcome John!


In [2]:
person = 'Sally'

if person == 'Trevor':
    print('Welcome Trevor!')
elif person =='John':
    print('Welcome John!')
else:
    print("Welcome, what's your name?")

Welcome, what's your name?


In [3]:
# Note that the body of each branch must be indented - Python throws an IndentationError if it is not

person = 'Sally'

if person == 'Trevor': # Note the :
    print('Welcome Trevor!') # Removing this indentation would throw an IndentationError
elif person =='John':
    print('Welcome John!')
else:
    print("Welcome, what's your name?")

Welcome, what's your name?
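One detail worth making explicit is that the conditions are checked from top to bottom and only the first branch whose condition is true is executed. The following is a minimal sketch of this; the temperature thresholds and variable names are assumptions made for the example.

```python
temperature = 18

if temperature < 10:
    description = 'cold'
elif temperature < 25:
    # Reached only because the first condition was False
    description = 'mild'
elif temperature < 40:
    # Also true for 18, but skipped because an earlier branch already ran
    description = 'hot'
else:
    description = 'extreme'

print(description)
```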


### For loops

The for loop is used to repeat a collection of statements once for each item in a sequence, such as a list.

In [5]:
numbers_list = list(range(1,11)) # Note the notation - range(1,11) includes numbers >= 1 and < 11

for num in numbers_list:
    if num % 2 == 0:
        print(num)

2
4
6
8
10


In [22]:
#Using a for loop to sum or keep track of a counter

sum_of_list = 0 

for num in numbers_list:
    sum_of_list += num

print(sum_of_list)

55
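The same accumulator pattern can be used to build up a list rather than a number. The sketch below, with variable names chosen for illustration, collects the even numbers instead of printing them, and checks the running total against the built-in <code>sum</code> function.

```python
numbers_list = list(range(1, 11))

total = 0
even_numbers = []

for num in numbers_list:
    total += num                  # keep a running total
    if num % 2 == 0:
        even_numbers.append(num)  # collect the item instead of printing it

print(total)
print(even_numbers)
print(total == sum(numbers_list))  # the built-in sum gives the same total
```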


In [7]:
# Tuple unpacking with for loops

my_tuple_list = [(1,2), (3,4), (5,6), (7,8)]

for tup in my_tuple_list:
    print(tup)

(1, 2)
(3, 4)
(5, 6)
(7, 8)


In [8]:
# Using unpacking we can choose which element we want to print

for (tup1, tup2) in my_tuple_list:
    print(tup2)

2
4
6
8


In [36]:
# Iterating over a dictionary directly gives us its keys

capitals = {'Japan':'Tokyo', 'USA': 'Washington', 'Argentina': 'Buenos Aires'}

for cap in capitals:
    print(cap)

Japan
USA
Argentina


In [38]:
# Iterating over a dictionary only yields its keys, so we use the .items() method, which returns an iterable view of (key, value) pairs

print(capitals.items())

dict_items([('Japan', 'Tokyo'), ('USA', 'Washington'), ('Argentina', 'Buenos Aires')])


In [39]:
for (cap_key, cap_val) in capitals.items():
    print(cap_val)

Tokyo
Washington
Buenos Aires
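Unpacking gives access to the key and the value at the same time. As a small sketch (the sentence template and variable names are illustrative assumptions), both can be combined in the loop body:

```python
capitals = {'Japan': 'Tokyo', 'USA': 'Washington', 'Argentina': 'Buenos Aires'}

sentences = []

for country, capital in capitals.items():
    # Each item is a (key, value) tuple, unpacked into two names
    sentences.append("%s is the capital of %s" % (capital, country))

for sentence in sentences:
    print(sentence)
```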


### While loops

The while loop is a construct that repeats statements while a condition remains true. It is used to repeat a
collection of statements a variable number of times

In [41]:
sum_of_series = 0

while sum_of_series < 10:
    print('sum_of_series is currently: ', sum_of_series)
    sum_of_series+=1    

sum_of_series is currently:  0
sum_of_series is currently:  1
sum_of_series is currently:  2
sum_of_series is currently:  3
sum_of_series is currently:  4
sum_of_series is currently:  5
sum_of_series is currently:  6
sum_of_series is currently:  7
sum_of_series is currently:  8
sum_of_series is currently:  9


In [42]:
# We can also attach an else block to a while loop; it runs once the condition becomes false (but not if the loop is exited with break)

sum_of_series = 0

while sum_of_series < 10:
    print('sum_of_series is currently: ', sum_of_series)
    sum_of_series+=1    

else:
    print('Finished!')

sum_of_series is currently:  0
sum_of_series is currently:  1
sum_of_series is currently:  2
sum_of_series is currently:  3
sum_of_series is currently:  4
sum_of_series is currently:  5
sum_of_series is currently:  6
sum_of_series is currently:  7
sum_of_series is currently:  8
sum_of_series is currently:  9
Finished!


Another important feature of loops is the ability to incorporate the <code>break</code>, <code>continue</code> and <code>pass</code> statements. These work in both <code>for</code> and <code>while</code> loops.

* break: exit the closest enclosing loop
* continue: skip the rest of the current iteration and go back to the top of the closest enclosing loop
* pass: do nothing


Typically, we'll use one of these statements with an <code>if</code> statement to execute the action conditionally.

In [11]:
sum_of_series = 0

while sum_of_series < 10:
    print('sum_of_series is currently: ', sum_of_series)
    sum_of_series+=1   
    if sum_of_series == 5:
        print('Reached 5')
        break
    else:
        print('Not yet 5')

sum_of_series is currently:  0
Not yet 5
sum_of_series is currently:  1
Not yet 5
sum_of_series is currently:  2
Not yet 5
sum_of_series is currently:  3
Not yet 5
sum_of_series is currently:  4
Reached 5



## Functions

A function is a way to group together code that can be called and run more than once, which can accept parameters/arguments that can be used as inputs. A function's primary purpose is to prevent us from writing repetitive code. 

Functions all have the following structure

In [15]:
def name_of_func(arg1,arg2):
    pass

In [17]:
def bark(name):
    
    print('Woof! I am ' + name)

# bark requires a name, so calling it without an argument throws a TypeError
bark()

TypeError: bark() missing 1 required positional argument: 'name'

Importantly, when we want a function to give back a value that we can store in a variable, rather than just perform an action such as printing, we have to use the <code>return</code> statement. Without it, the function returns <code>None</code>.

In [22]:
def square_me(num):
    
    result = num**2

res1 = square_me(2)
print(res1)

# Note what the return does here

def square_me(num):
    
    result = num**2
    return result

res2 = square_me(2)
print(res2)

None
4
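Returning a value is what allows functions to be combined. As a minimal sketch, the hypothetical helper <code>sum_of_squares</code> below (not part of the tutorial) reuses <code>square_me</code> inside a loop and returns the accumulated result.

```python
def square_me(num):
    """Return the square of num."""
    return num ** 2


def sum_of_squares(numbers):
    """Return the sum of the squares of all items in numbers."""
    total = 0
    for num in numbers:
        total += square_me(num)  # use the returned value directly
    return total


result = sum_of_squares([1, 2, 3])
print(result)
```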


## Using libraries - Numpy and Pandas

Like all other languages, Python has a range of useful libraries which provide functionality far beyond base Python. Two of base Python's major limitations are that it cannot natively perform element-wise operations over one-dimensional data structures like lists, and that it has no built-in way to store and handle multidimensional data objects like dataframes.

Two of the most used Python packages address this: Numpy and Pandas

### Numpy

NumPy is a package (or library) which provides large multidimensional array objects and tools for working with them. To see why NumPy is so important, try the code snippet below.

In [23]:
first_list = [20, 40, 60]
first_list/5    

TypeError: unsupported operand type(s) for /: 'list' and 'int'

Base Python lists do not support element-wise arithmetic such as dividing every item by a number. NumPy allows us to do this.

In [25]:
import numpy as np

# Note that when we import a package we have the option to rename it
# This is particularly useful since we'll often use the following syntax to call functions - package_name.function()
# We can therefore shorten the names of packages we use often
# Also, one of the benefits of the Anaconda distribution of Python is that it comes with many pre-installed packages
# Later in the course we'll look at how to install packages

np_first_list = np.array(first_list) # we convert our list to a numpy array

np_first_list/5

type(np_first_list)

numpy.ndarray

A major limitation of NumPy is that an array can only store elements of a single data type. Nonetheless, NumPy is the standard package for scientific computing in Python.
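The sketch below illustrates both points: arithmetic on an array is applied to every element, and mixing integers with a float produces an array in which every element is a float. The variable names are assumptions made for the example.

```python
import numpy as np

values = np.array([20, 40, 60])

divided = values / 5   # each element is divided by 5
doubled = values * 2   # each element is multiplied by 2

# An array holds one data type, so the integers are converted to floats
mixed = np.array([1, 2.5, 3])

print(divided)
print(doubled)
print(mixed.dtype)
```

For comparison, <code>[20, 40, 60] * 2</code> on a plain list repeats the list rather than doubling its items.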

### Pandas

The next library we'll look at is Pandas. Pandas gives us two data structures: the 1 dimensional *series* and the 2 dimensional *dataframe*. Using Pandas, we can now work with tabular data. You can think of the Pandas dataframe as the Python equivalent of the R dataframe you learnt about earlier in the year.

One feature that is particularly useful is the ability to create dataframes from dictionaries

In [27]:
import pandas as pd

countries = {'Japan':('Tokyo','Japanese'), 
             'USA': ('Washington', 'English'),
             'Argentina': ('Buenos Aires', 'Spanish')}

countries_df = pd.DataFrame(countries)

countries_df

|  | Japan | USA | Argentina |
| --- | --- | --- | --- |
| 0 | Tokyo | Washington | Buenos Aires |
| 1 | Japanese | English | Spanish |
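In the dataframe above each dictionary key became a column. If one row per country is preferred, a possible approach (a sketch, with the column names <code>Capital</code> and <code>Language</code> chosen here for illustration) is <code>pd.DataFrame.from_dict</code> with <code>orient='index'</code>:

```python
import pandas as pd

countries = {'Japan': ('Tokyo', 'Japanese'),
             'USA': ('Washington', 'English'),
             'Argentina': ('Buenos Aires', 'Spanish')}

# orient='index' turns each key into a row label instead of a column name
countries_by_row = pd.DataFrame.from_dict(countries,
                                          orient='index',
                                          columns=['Capital', 'Language'])

print(countries_by_row)
print(countries_by_row.loc['Japan', 'Capital'])  # look up by row label and column name
```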


Since we can now work with tabular data, we can easily read in external data and save it as a pandas dataframe.

In [28]:
FNAME = "http://www.stat.ucla.edu/projects/datasets/twins.dat"

df = pd.read_csv(FNAME)

# Inspection

df.shape # Dimensions

(183, 16)

In [80]:
df.head() # First 5 lines

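|  | DLHRWAGE | DEDUC1 | AGE | AGESQ | HRWAGEH | WHITEH | MALEH | EDUCH | HRWAGEL | WHITEL | MALEL | EDUCL | DEDUC2 | DTEN | DMARRIED | DUNCOV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.2593466 | 0 | 33.251198 | 1105.642156 | 11.25 | 1 | 0 | 16 | 8.68 | 1 | 0 | 16 | 0 | 1.333 | 0 | 0 |
| 1 | . | -1 | 54.053388 | 2921.768764 | . | 1 | 0 | 9 | 7.85 | 1 | 0 | 10 | 1 | 8.0 | 1 | 0 |
| 2 | 0.721318058 | 7 | 43.570157 | 1898.358618 | 18 | 1 | 0 | 19 | 8.75 | 1 | 0 | 12 | 4 | 3.0 | -1 | 0 |
| 3 | 0.011581964 | 0 | 30.96783 | 959.006511 | 16.5 | 1 | 1 | 12 | 16.31 | 1 | 1 | 12 | 0 | -2.0 | 0 | 1 |
| 4 | -0.560984677 | 0 | 34.633812 | 1199.500965 | 9.6154 | 1 | 1 | 14 | 16.85 | 1 | 1 | 14 | 1 | 2.917 | 0 | -1 |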


In [81]:
df.tail(3) # Last 3 lines

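|  | DLHRWAGE | DEDUC1 | AGE | AGESQ | HRWAGEH | WHITEH | MALEH | EDUCH | HRWAGEL | WHITEL | MALEL | EDUCL | DEDUC2 | DTEN | DMARRIED | DUNCOV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 180 | 0.850332764 | 2 | 40.312115 | 1625.066615 | 5.5 | 1 | 0 | 14 | 2.35 | 1 | 0 | 12 | 2 | 2 | 0 | 0 |
| 181 | -0.729514825 | -2 | 28.413415 | 807.322179 | 7.5 | 1 | 0 | 14 | 15.55555556 | 1 | 0 | 16 | -2 | -2 | 0 | 0 |
| 182 | -0.500775288 | 0 | 28.413415 | 807.322179 | 9.0 | 1 | 0 | 16 | 14.85 | 1 | 0 | 16 | -2 | 3 | 0 | 0 |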


In [82]:
df.columns # Column names, returned as a pandas Index

Index(['DLHRWAGE', 'DEDUC1', 'AGE', 'AGESQ', 'HRWAGEH', 'WHITEH', 'MALEH',
       'EDUCH', 'HRWAGEL', 'WHITEL', 'MALEL', 'EDUCL', 'DEDUC2', 'DTEN',
       'DMARRIED', 'DUNCOV'],
      dtype='object')

In [83]:
df.dtypes # List of all variables with type

DLHRWAGE     object
DEDUC1        int64
AGE         float64
AGESQ       float64
HRWAGEH      object
WHITEH        int64
MALEH         int64
EDUCH         int64
HRWAGEL      object
WHITEL        int64
MALEL         int64
EDUCL         int64
DEDUC2        int64
DTEN         object
DMARRIED      int64
DUNCOV        int64
dtype: object
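Several wage columns are reported as <code>object</code> rather than a numeric type. Judging from the second row of <code>df.head()</code>, this appears to be because missing values are coded as <code>.</code> in the file. One possible way to handle this, sketched here on a three-row, three-column extract of the rows shown above rather than on the full file, is the <code>na_values</code> argument of <code>pd.read_csv</code>:

```python
import io
import pandas as pd

# A small in-memory extract of the first three rows shown by df.head()
sample = io.StringIO(
    "DLHRWAGE,AGE,HRWAGEH\n"
    "0.2593466,33.251198,11.25\n"
    ".,54.053388,.\n"
    "0.721318058,43.570157,18\n"
)

raw = pd.read_csv(sample)   # "." is kept as text, so the column is not numeric
print(raw.dtypes)

sample.seek(0)              # rewind the buffer so it can be read again
clean = pd.read_csv(sample, na_values='.')   # treat "." as a missing value
print(clean.dtypes)
print(clean['DLHRWAGE'].isna().sum())        # number of missing wage differences
```

With the missing values recognised, the affected columns are read as floats and would be included in numeric summaries such as <code>describe()</code>.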

In [84]:
df.describe() # Summary statistics

|  | DEDUC1 | AGE | AGESQ | WHITEH | MALEH | EDUCH | WHITEL | MALEL | EDUCL | DEDUC2 | DMARRIED | DUNCOV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 183.0 | 183.0 | 183.0 | 183.0 | 183.0 | 183.0 | 183.0 | 183.0 | 183.0 | 183.0 | 183.0 | 183.0 |
| mean | -0.010929 | 38.694015 | 1666.242009 | 0.934426 | 0.459016 | 13.95082 | 0.939891 | 0.459016 | 13.961749 | -0.021858 | -0.016393 | 0.021858 |
| std | 1.881048 | 13.036251 | 1181.710178 | 0.248215 | 0.499685 | 2.332943 | 0.238341 | 0.499685 | 2.204805 | 1.886785 | 0.507909 | 0.4912 |
| min | -7.0 | 18.781656 | 352.750617 | 0.0 | 0.0 | 8.0 | 0.0 | 0.0 | 8.0 | -6.0 | -1.0 | -1.0 |
| 25% | -1.0 | 28.977413 | 839.696325 | 1.0 | 0.0 | 12.0 | 1.0 | 0.0 | 12.0 | -1.0 | 0.0 | 0.0 |
| 50% | 0.0 | 36.041068 | 1298.958565 | 1.0 | 0.0 | 13.0 | 1.0 | 0.0 | 14.0 | 0.0 | 0.0 | 0.0 |
| 75% | 0.0 | 44.77755 | 2005.034018 | 1.0 | 1.0 | 16.0 | 1.0 | 1.0 | 16.0 | 0.0 | 0.0 | 0.0 |
| max | 7.0 | 79.123888 | 6260.589612 | 1.0 | 1.0 | 20.0 | 1.0 | 1.0 | 20.0 | 7.0 | 1.0 | 1.0 |


In [86]:
df.loc[6] # Seventh row

DLHRWAGE    1.523260216
DEDUC1               -2
AGE             34.9788
AGESQ           1223.52
HRWAGEH              35
WHITEH                1
MALEH                 0
EDUCH                13
HRWAGEL            7.63
WHITEL                1
MALEL                 0
EDUCL                15
DEDUC2               -2
DTEN                  3
DMARRIED              1
DUNCOV                0
Name: 6, dtype: object