# Section 7: Python practice notebook

This section is an introduction to Python. The .html file has more details; this notebook is a place to run the code from the html file and do the exercises.

To run a code chunk, press Shift + Enter. For a list of useful keyboard shortcuts in Jupyter Notebook, see this [link](https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330).

## Objects
First, we create a list and print it out

In [None]:
myList = [1, 2, 'foo']
print("original list:", myList)

Note that indexing in Python starts at 0.

In [None]:
print("First item:", myList[0])
print("Second item:", myList[1])

You can also update parts of the list

In [None]:
myList[1] = 2.5
myList

However, you cannot do that with tuples, which are immutable. The assignment below raises a TypeError.

In [None]:
myTuple = (1, 2, 'foo')
print("original tuple:", myTuple)

# try to update the tuple
myTuple[1] = 2.5

In [None]:
# the tuple remains the same 
print("new tuple:", myTuple)
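
As a minimal sketch of what the failed assignment looks like when the error is caught, and of the usual workaround of building a new tuple (the names caught and newTuple are introduced here for illustration only):

```python
myTuple = (1, 2, 'foo')

# Item assignment on a tuple raises a TypeError; catch it to inspect the message
try:
    myTuple[1] = 2.5
    caught = None
except TypeError as err:
    caught = str(err)
print("error message:", caught)

# To "change" a tuple, build a new one from pieces of the old one
newTuple = myTuple[:1] + (2.5,) + myTuple[2:]
print("new tuple:", newTuple)
print("original tuple is unchanged:", myTuple)
```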

## Variables

In [None]:
a = 'foobar'
print("original a:", a)
print("different than R:", a * 4)
print("length of a:", len(a))

In [None]:
a = 3
print("original a:", a)
print("what we expect, but different than above:", a * 4)
# produces a TypeError because an int has no length
print("length of a:", len(a)) 
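
The two cells above bind the same name a first to a string and then to an integer, and both * and len() behave differently for each type. A small sketch of guarding a call to len(); the helper name safe_len is hypothetical:

```python
def safe_len(obj):
    """Return len(obj), or None if the object has no length (e.g. an int)."""
    try:
        return len(obj)
    except TypeError:
        return None

for a in ['foobar', 3]:
    # the same name can refer to objects of different types
    print(type(a).__name__, "| a * 4 =", a * 4, "| length:", safe_len(a))
```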

## Modules, files, packages, import

In [None]:
# delete a so that the name is no longer defined
del a

In [None]:
# mytest.py is a python script in the sections/07 folder
import mytest

In [None]:
mytest.hello()

In [None]:
mytest.a

In [None]:
# this will not work
hello()

In [None]:
# or this 
a

In [None]:
from mytest import *

# now they do
hello()
a

You can also import Python packages. This is similar to loading an R library, except that you can choose which functions to import.

In [None]:
from math import cos
print("Can compute cos now:", cos(0))
print("Can't compute sin:", sin(0))

In [None]:
# import the whole package and we can 
import math 
print("Can compute cos:", math.cos(0), "and sin:", math.sin(0))

In [None]:
# importing numpy as np 
import numpy as np

# this will not work because we imported as np
numpy.arctan(1)

In [None]:
# this is how to use numpy
np.arctan(1)
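
To summarize the import styles used in this section, here is a sketch that relies only on the standard library math module, so it runs without mytest.py or numpy. The alias m is chosen here purely for illustration.

```python
import math                 # access names as math.<name>
import math as m            # same module under a shorter alias
from math import cos        # bring a single name into the current namespace

print(math.cos(0), m.cos(0), cos(0))

# All three names refer to the same function object
print(math.cos is m.cos is cos)

# A name that was not imported directly is not defined on its own
try:
    tan(0)
    name_error = None
except NameError as err:
    name_error = str(err)
print("NameError:", name_error)
```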

To see the documentation for a function, append ? to its name

In [None]:
np.ndim?

## Decoding error messages
days.py is another Python script included in the sections/07 folder. Running it shows what error messages look like in Python.

In [None]:
import days
days.print_friday_message()
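
Since the contents of days.py are not shown here, the following is a self-contained sketch of how to read a Python error message. The functions outer and inner and the misspelled key are made up for this illustration. The traceback lists the calls from outermost to innermost, and its last line gives the exception type and message, which is usually the best place to start reading.

```python
import traceback

def inner(lookup):
    # deliberate bug: the key is misspelled, so this raises a KeyError
    return lookup['firday']

def outer():
    return inner({'friday': 'TGIF'})

try:
    outer()
except KeyError:
    tb = traceback.format_exc()
    print(tb)

# The final line names the exception type and the offending value
last_line = tb.strip().splitlines()[-1]
print("last line:", last_line)
```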

## Data structures

### Numbers

In [31]:
print(2 * 3)
print(2 / 3)

6
0.6666666666666666


In [32]:
x = 1.1
type(x)

float

In [33]:
print("multiplication:", x * 2)
print("exponentiation:", x ** 2)

multiplication: 2.2
exponentiation: 1.2100000000000002


In [34]:
(type(1), type(1.1), type(1 + 2j))

(int, float, complex)

In [35]:
# trying various functions from math package we imported earlier
(math.cos(0), math.cos(math.pi), math.cos(x))

(1.0, -1.0, 0.4535961214255773)

In the empty chunk below try typing math. and then a tab to see all of the functions that come with the math package. 

### Exercises 
Compute the following in the empty chunks below
- $\left(\lceil \frac{3}{4} \times 4 \rceil\right)^3$

- $\sqrt{-1}$

## Objects 

In [36]:
x = 3.0
type(x)

float

Try typing x. followed by a tab to see the methods that are available for this float

### Tuples

In [37]:
x = 1; y = 'foo'
# define the tuple
xy = (x, y)
type(xy)

tuple

In [38]:
# another way to do it
xy = x, y
type(xy)

tuple

In [41]:
print("full tuple:", xy)
print("indexing tuple:", xy[1])

full tuple: (1, 'foo')
indexing tuple: foo


In [42]:
# recall tuples are immutable 
xy[1] = 3

TypeError: 'tuple' object does not support item assignment

In [43]:
a, b = x, y
print(a)
print(b)

1
foo
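
Tuple unpacking, as in a, b = x, y above, is also how Python functions hand back several values at once. A brief sketch using the built-in divmod and a hypothetical helper min_max:

```python
def min_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)

# unpack the returned tuple into two names
lo, hi = min_max([3, 1, 4, 1, 5])
print(lo, hi)

# divmod returns (quotient, remainder)
q, r = divmod(17, 5)
print(q, r)

# a starred name collects the remaining items into a list
first, *rest = (1, 'foo', 2.5)
print(first, rest)
```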


### Exercises
- Store x = 5 and y = 6.  Swap their values in a single line of code.  (How would you do this in R?)

- What happens when you multiply a tuple by a number? How is this different from similar syntax in R?

- What's nice about using immutable objects?

## Lists

In [50]:
dice = [1, 2, 3, 4, 5, 6]
print("original list:", dice)

# extend 
dice.extend([7, 8])
print("extended list:", dice)

# insert 
dice.insert(3, 100)
print("inserted list:", dice)

original list: [1, 2, 3, 4, 5, 6]
extended list: [1, 2, 3, 4, 5, 6, 7, 8]
inserted list: [1, 2, 3, 100, 4, 5, 6, 7, 8]


Indexing a list

In [51]:
dice = [1, 2, 3, 4, 5, 6]
print("first entry", dice[0])
print("second entry", dice[1])

first entry 1
second entry 2


Index 6 does not exist: the six entries have indices 0 through 5

In [52]:
dice[6]

IndexError: list index out of range

Slicing with start:stop:step

In [53]:
dice = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
dice[1::2]

[2, 4, 6, 8, 10]

In [54]:
dice[1:4:2]

[2, 4]

In [55]:
dice[1::2] = dice[::2]
dice

[1, 1, 3, 3, 5, 5, 7, 7, 9, 9]
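
The slice syntax is start:stop:step, where start is included, stop is excluded, and omitted parts default to the ends of the list. A sketch that spells out which positions each slice above selects, using range to list them:

```python
dice = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(dice)

# positions selected by dice[1::2] and dice[1:4:2]
odd_positions = list(range(1, n, 2))
print(odd_positions)
print(list(range(1, 4, 2)))

# a slice is equivalent to picking those positions one by one
print(dice[1::2] == [dice[i] for i in odd_positions])

# slice assignment overwrites those positions in place;
# with a step other than 1, both sides must have the same length
dice[1::2] = dice[::2]
print(dice)
```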

### Exercises
- What do you get if you multiply a list of numbers by a number? 

- What does the following tell you about copying and memory use in Python?

In [56]:
a = [1, 3, 5]
b = a
print("address of a:", id(a))
print("address of b:", id(b))

# update a
a[1] = 5
print("address of updated a:", id(a))

address of a: 140520920032200
address of b: 140520920032200
address of updated a: 140520920032200


## Dictionaries

In [57]:
students = {"Jarrod Millman": ['A', 'B+', 'A-'],
            "Thomas Kluyver": ['A-', 'A-'],
            "Stefan van der Wait": 'and now for something completely different.'
           }
students

{'Jarrod Millman': ['A', 'B+', 'A-'],
 'Stefan van der Wait': 'and now for something completely different.',
 'Thomas Kluyver': ['A-', 'A-']}

In [58]:
students.keys()

dict_keys(['Jarrod Millman', 'Thomas Kluyver', 'Stefan van der Wait'])

In [59]:
students.values()

dict_values([['A', 'B+', 'A-'], ['A-', 'A-'], 'and now for something completely different.'])

In [60]:
students["Jarrod Millman"]

['A', 'B+', 'A-']

In [61]:
students["Jarrod Millman"][1]

'B+'
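
A few other common dictionary operations, sketched on a smaller dictionary with made-up entries (the names and grades here are illustrative and not taken from the data above):

```python
grades = {"Ann": ['A', 'B+'], "Bob": ['A-']}

# loop over key/value pairs
for name, marks in grades.items():
    print(name, "has", len(marks), "grade(s)")

# add a new key and update an existing value
grades["Cal"] = ['B']
grades["Bob"].append('A')

# membership tests look at keys; .get() avoids a KeyError for missing keys
print("Ann" in grades, "Dee" in grades)
missing = grades.get("Dee", [])
print(missing)
```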

Try typing students. followed by a tab to see what methods are available for dictionaries

## Control flow

In [62]:
x = 2 
if(x >= 4):
    print("x is big")
    if(x == 4):
        print("x is small")
else: 
    print("x is small")

x is small


In [63]:
if(x >= 4):
    print("x is big")
    if(x == 4):
        print("x is small")
    else: 
        print("x is small")
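
The two cells differ only in the indentation of else, which determines which if it belongs to. A sketch wrapping both layouts in functions so they can be compared on the same inputs (outer_else, inner_else and the messages are invented here):

```python
def outer_else(x):
    """else lines up with the outer if, so it runs whenever x < 4."""
    if x >= 4:
        if x == 4:
            return "x is exactly 4"
        return "x is big"
    else:
        return "x is small"

def inner_else(x):
    """else lines up with the inner if, so nothing happens when x < 4."""
    if x >= 4:
        if x == 4:
            return "x is exactly 4"
        else:
            return "x is bigger than 4"
    return None

for x in [2, 4, 6]:
    print(x, "|", outer_else(x), "|", inner_else(x))
```

For x = 2 the first layout reports a result while the second returns None, which mirrors the second cell printing nothing.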

### For loops and list comprehension

In [64]:
for x in [1, 2, 3, 4]:
    print(x)

1
2
3
4


In [65]:
for x in [1, 2, 3, 4]:
    y = x * 2
    print(y, end = " ")

2 4 6 8 

In [67]:
for x in range(30):
    print(x)
    y = x

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29


In [70]:
print(y)

29


In [72]:
y = [x for x in range(4)]
y

[0, 1, 2, 3]

List comprehension with a condition

In [74]:
vals = [-4, 3, -1, 2.5, 7]
[x for x in vals if x > 0] 

[3, 2.5, 7]
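
A list comprehension with a condition is shorthand for a loop that appends to a list. The sketch below shows the equivalent loop, plus a comprehension that transforms the values it keeps:

```python
vals = [-4, 3, -1, 2.5, 7]

# the comprehension [x for x in vals if x > 0] written as an explicit loop
positives = []
for x in vals:
    if x > 0:
        positives.append(x)
print(positives)

# the expression before "for" can transform each kept value
squares_of_positives = [x ** 2 for x in vals if x > 0]
print(squares_of_positives)
```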

### Exercises
- See what [1, 2, 3] + 3 returns. Try to explain what happened and why. 

- Use list comprehension to perform element-wise addition of a scalar to a list of scalars

## Functions

In [77]:
def add(x, y = 1, absol = False):
    if absol: 
        return(abs(x + y))
    else: 
        return(x + y)

In [78]:
add(3)

4

In [79]:
add(3, 5)

8

In [80]:
add(3, absol = True, y = 5)

8

In [81]:
add(y = -5, x = 3)

-2

In [82]:
add(y = -5, 3)

SyntaxError: positional argument follows keyword argument (<ipython-input-82-2f4d96b8a2a8>, line 1)
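
The rule illustrated above is that positional arguments must come before keyword arguments in a call. A sketch with a hypothetical function power that also shows keyword-only parameters (those after the bare *), which callers must pass by name:

```python
def power(base, exponent=2, *, absol=False):
    """Raise base to exponent; optionally return the absolute value."""
    result = base ** exponent
    return abs(result) if absol else result

print(power(3))                    # positional, default exponent
print(power(3, 3))                 # both positional
print(power(exponent=3, base=-2))  # keywords may appear in any order
print(power(-2, 3, absol=True))    # absol must be passed by name

# passing absol positionally is rejected when the function is called
try:
    power(-2, 3, True)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```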

### Exercise 
- Define a function that will take the square root of a number and will (if requested by the user) set the square root of a negative number to 0. 

## Math and statistics: NumPy and SciPy

In [83]:
z = [0, 1, 2]

In [84]:
y = np.array(z)
y

array([0, 1, 2])

In [85]:
y.dtype

dtype('int64')

In [87]:
x = np.array([[1, 2], [3, 4]], dtype = np.float64)
# element-wise multiplication
x * x

array([[ 1.,  4.],
       [ 9., 16.]])

In [88]:
# matrix multiplication
x.dot(x)

array([[ 7., 10.],
       [15., 22.]])

In [89]:
# transpose 
x.T

array([[1., 3.],
       [2., 4.]])

In [90]:
# SVD 
np.linalg.svd(x)

(array([[-0.40455358, -0.9145143 ],
        [-0.9145143 ,  0.40455358]]),
 array([5.4649857 , 0.36596619]),
 array([[-0.57604844, -0.81741556],
        [ 0.81741556, -0.57604844]]))

In [92]:
e = np.linalg.eig(x)
e[0]

array([-0.37228132,  5.37228132])

In [93]:
e[1][:, 0]

array([-0.82456484,  0.56576746])
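
One way to sanity-check these decompositions is to confirm the identities they should satisfy: U diag(s) Vt reproduces x, and x v = lambda v holds for each eigenpair. A sketch, using np.allclose to allow for floating-point rounding:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)

# SVD: x = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(x)
svd_ok = np.allclose(U @ np.diag(s) @ Vt, x)
print("SVD reconstructs x:", svd_ok)

# eigendecomposition: column i of vecs pairs with vals[i]
vals, vecs = np.linalg.eig(x)
eig_ok = all(np.allclose(x @ vecs[:, i], vals[i] * vecs[:, i])
             for i in range(len(vals)))
print("x v = lambda v holds:", eig_ok)
```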

Creating a sequence in numpy

In [94]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

Randomly sample from normal distribution

In [95]:
np.random.seed(0)
x = np.random.normal(size = 10)

In [96]:
pos = x > 0 
pos

array([ True,  True,  True,  True,  True, False,  True, False, False,
        True])

In [97]:
y = x[pos]
y

array([1.76405235, 0.40015721, 0.97873798, 2.2408932 , 1.86755799,
       0.95008842, 0.4105985 ])

In [98]:
x[[1, 3, 4]]

array([0.40015721, 2.2408932 , 1.86755799])

In [99]:
x[pos] = 0

In [100]:
np.cos(x)

array([1.        , 1.        , 1.        , 1.        , 1.        ,
       0.55928119, 1.        , 0.98856735, 0.99467766, 1.        ])
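
Boolean masks can be combined with & (and), | (or) and ~ (not), and np.where offers an alternative to in-place assignment that returns a new array. A sketch on a small hand-written array rather than the random draws above:

```python
import numpy as np

v = np.array([-1.5, 0.2, 3.0, -0.1, 2.2])

# parentheses are required because & binds more tightly than comparisons
in_range = (v > 0) & (v < 3)
print(in_range)
print(v[in_range])

# np.where builds a new array and leaves v untouched
clipped = np.where(v > 0, 0.0, v)
print(clipped)
print(v)
```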

Some scipy routines

In [102]:
import scipy.stats as st
print(st.norm.cdf(1.96, 0, 1))
print(st.norm.cdf(1.96, 0.5, 2))
print(st.norm(0.5, 2).cdf(1.96))

0.9750021048517795
0.7673049076991025
0.7673049076991025
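
The second and third calls agree because norm.cdf(x, loc, scale) and norm(loc, scale).cdf(x) describe the same normal distribution with mean 0.5 and standard deviation 2, and both equal the standard normal CDF at (x - loc) / scale. As a cross-check that does not need SciPy, the standard library's statistics.NormalDist (Python 3.8+) can be used:

```python
from statistics import NormalDist

std = NormalDist(0, 1)
shifted = NormalDist(0.5, 2)    # mean 0.5, standard deviation 2

p_std = std.cdf(1.96)
p_shifted = shifted.cdf(1.96)
print(p_std, p_shifted)

# standardizing gives the same probability
z = (1.96 - 0.5) / 2
print(abs(std.cdf(z) - p_shifted) < 1e-12)

# inv_cdf is the quantile function, the inverse of cdf
print(round(std.inv_cdf(p_std), 6))
```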


### Exercises
- See what happens if you try to create a numpy array with a mix of numbers and character strings. 

- Try to add a vector to a matrix; how does this compare to R?

## Pandas

In [104]:
import pandas as pd 
dat = pd.read_csv('gapminder.csv')
dat.head()

|   | country | year | pop | continent | lifeExp | gdpPercap |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1952 | 8425333.0 | Asia | 28.801 | 779.445314 |
| 1 | Afghanistan | 1957 | 9240934.0 | Asia | 30.332 | 820.85303 |
| 2 | Afghanistan | 1962 | 10267083.0 | Asia | 31.997 | 853.10071 |
| 3 | Afghanistan | 1967 | 11537966.0 | Asia | 34.02 | 836.197138 |
| 4 | Afghanistan | 1972 | 13079460.0 | Asia | 36.088 | 739.981106 |


In [105]:
dat.columns

Index(['country', 'year', 'pop', 'continent', 'lifeExp', 'gdpPercap'], dtype='object')

In [106]:
dat['year']

0       1952
1       1957
2       1962
3       1967
4       1972
        ... 
1699    1987
1700    1992
1701    1997
1702    2002
1703    2007
Name: year, Length: 1704, dtype: int64

In [107]:
dat.year

0       1952
1       1957
2       1962
3       1967
4       1972
        ... 
1699    1987
1700    1992
1701    1997
1702    2002
1703    2007
Name: year, Length: 1704, dtype: int64

In [109]:
dat[0:5]

|   | country | year | pop | continent | lifeExp | gdpPercap |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1952 | 8425333.0 | Asia | 28.801 | 779.445314 |
| 1 | Afghanistan | 1957 | 9240934.0 | Asia | 30.332 | 820.85303 |
| 2 | Afghanistan | 1962 | 10267083.0 | Asia | 31.997 | 853.10071 |
| 3 | Afghanistan | 1967 | 11537966.0 | Asia | 34.02 | 836.197138 |
| 4 | Afghanistan | 1972 | 13079460.0 | Asia | 36.088 | 739.981106 |


In [110]:
dat.sort_values(['year', 'country'])

|   | country | year | pop | continent | lifeExp | gdpPercap |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1952 | 8425333.0 | Asia | 28.801 | 779.445314 |
| 12 | Albania | 1952 | 1282697.0 | Europe | 55.230 | 1601.056136 |
| 24 | Algeria | 1952 | 9279525.0 | Africa | 43.077 | 2449.008185 |
| 36 | Angola | 1952 | 4232095.0 | Africa | 30.015 | 3520.610273 |
| 48 | Argentina | 1952 | 17876956.0 | Americas | 62.485 | 5911.315053 |
| ... | ... | ... | ... | ... | ... | ... |
| 1655 | Vietnam | 2007 | 85262356.0 | Asia | 74.249 | 2441.576404 |
| 1667 | West Bank and Gaza | 2007 | 4018332.0 | Asia | 73.422 | 3025.349798 |
| 1679 | Yemen Rep. | 2007 | 22211743.0 | Asia | 62.698 | 2280.769906 |
| 1691 | Zambia | 2007 | 11746035.0 | Africa | 42.384 | 1271.211593 |


In [111]:
dat.loc[0:5, ['year', 'country']]

|   | year | country |
|---|---|---|
| 0 | 1952 | Afghanistan |
| 1 | 1957 | Afghanistan |
| 2 | 1962 | Afghanistan |
| 3 | 1967 | Afghanistan |
| 4 | 1972 | Afghanistan |
| 5 | 1977 | Afghanistan |


In [112]:
dat[dat.year == 1952]

|   | country | year | pop | continent | lifeExp | gdpPercap |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1952 | 8425333.0 | Asia | 28.801 | 779.445314 |
| 12 | Albania | 1952 | 1282697.0 | Europe | 55.230 | 1601.056136 |
| 24 | Algeria | 1952 | 9279525.0 | Africa | 43.077 | 2449.008185 |
| 36 | Angola | 1952 | 4232095.0 | Africa | 30.015 | 3520.610273 |
| 48 | Argentina | 1952 | 17876956.0 | Americas | 62.485 | 5911.315053 |
| ... | ... | ... | ... | ... | ... | ... |
| 1644 | Vietnam | 1952 | 26246839.0 | Asia | 40.412 | 605.066492 |
| 1656 | West Bank and Gaza | 1952 | 1030585.0 | Asia | 43.160 | 1515.592329 |
| 1668 | Yemen Rep. | 1952 | 4963829.0 | Asia | 32.548 | 781.717576 |
| 1680 | Zambia | 1952 | 2672000.0 | Africa | 42.038 | 1147.388831 |


In [113]:
ndat = dat[['pop','lifeExp','gdpPercap']]
ndat.apply(lambda col: col.max() - col.min())

pop          1.318623e+09
lifeExp      5.900400e+01
gdpPercap    1.132820e+05
dtype: float64

In [117]:
dat2007 = dat[dat.year == 2007].copy() 
dat2007.groupby('continent', as_index=False).mean(numeric_only=True)

|   | continent | year | pop | lifeExp | gdpPercap |
|---|---|---|---|---|---|
| 0 | Africa | 2007 | 17875760.0 | 54.806038 | 3089.032605 |
| 1 | Americas | 2007 | 35954850.0 | 73.60812 | 11003.031625 |
| 2 | Asia | 2007 | 115513800.0 | 70.728485 | 12473.02687 |
| 3 | Europe | 2007 | 19536620.0 | 77.6486 | 25054.481636 |
| 4 | Oceania | 2007 | 12274970.0 | 80.7195 | 29810.188275 |


In [116]:
def stdize(vals):
    return((vals - vals.mean()) / vals.std())

dat2007['lifeExpZ'] = dat2007.groupby('continent')['lifeExp'].transform(stdize)
dat2007

|   | country | year | pop | continent | lifeExp | gdpPercap | lifeExpZ |
|---|---|---|---|---|---|---|---|
| 11 | Afghanistan | 2007 | 31889923.0 | Asia | 43.828 | 974.580338 | -3.377877 |
| 23 | Albania | 2007 | 3600523.0 | Europe | 76.423 | 5937.029526 | -0.411301 |
| 35 | Algeria | 2007 | 33333216.0 | Africa | 72.301 | 6223.367465 | 1.816567 |
| 47 | Angola | 2007 | 12420476.0 | Africa | 42.731 | 4797.231267 | -1.253796 |
| 59 | Argentina | 2007 | 40301927.0 | Americas | 75.320 | 12779.379640 | 0.385476 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1655 | Vietnam | 2007 | 85262356.0 | Asia | 74.249 | 2441.576404 | 0.442069 |
| 1667 | West Bank and Gaza | 2007 | 4018332.0 | Asia | 73.422 | 3025.349798 | 0.338223 |
| 1679 | Yemen Rep. | 2007 | 22211743.0 | Asia | 62.698 | 2280.769906 | -1.008383 |
| 1691 | Zambia | 2007 | 11746035.0 | Africa | 42.384 | 1271.211593 | -1.289827 |
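
To see what transform does, here is a sketch on a tiny made-up DataFrame (toy and its values are invented for illustration). Unlike an aggregation such as mean(), which returns one row per group, transform returns a result aligned with the original rows, so it can be stored as a new column:

```python
import pandas as pd

toy = pd.DataFrame({'group': ['a', 'a', 'b', 'b', 'b'],
                    'value': [1.0, 3.0, 10.0, 20.0, 30.0]})

def stdize(vals):
    return (vals - vals.mean()) / vals.std()

# aggregation: one row per group
group_means = toy.groupby('group')['value'].mean()
print(group_means)

# transform: one row per original row, standardized within its own group
toy['valueZ'] = toy.groupby('group')['value'].transform(stdize)
print(toy)
```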


### Exercise 
- Use *pd.merge()* to merge the continent means for life expectancy for 2007 back into the original dat2007 DataFrame.

## Classes

In [122]:
class Rectangle(object):
    dim = 2  # class variable
    counter = 0
    def __init__(self, height, width):
        self.height = height  # instance variable
        self.width = width    # instance variable
        self.set_diagonal()
        Rectangle.counter += 1
    def __repr__(self):
        return("{0} by {1} rectangle".format(self.height, self.width))        
    def area(self, verbose = False):
        if verbose:
            print('Computing the area... ')
        return(self.height*self.width)
    def set_diagonal(self):
        self.diagonal = pow(self.height**2 + self.width**2, 0.5)

x = Rectangle(10, 5)
x

10 by 5 rectangle

In [123]:
print(x.dim)
x.dim = 'foo'
print(x.dim) # hmmm

2
foo


In [124]:
x.area()

50

In [125]:
Rectangle.area(x)

50

In [126]:
y = Rectangle(4, 8)
print(y.counter)
print(x.counter)

2
2
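
The surprising result of x.dim = 'foo' comes from attribute lookup: assigning through an instance creates an instance attribute that shadows the class variable, while the class variable itself is unchanged. A sketch with a stripped-down class (Shape is a stand-in invented here, not the Rectangle above):

```python
class Shape:
    dim = 2        # class variable, shared by all instances
    counter = 0

    def __init__(self):
        Shape.counter += 1   # update the shared class variable

s1 = Shape()
s2 = Shape()

s1.dim = 'foo'               # creates an instance attribute on s1 only
print(s1.dim, s2.dim, Shape.dim)

# instance attributes live in the instance's __dict__
print(s1.__dict__, s2.__dict__)

# counter was incremented through the class, so every instance sees 2
print(s1.counter, s2.counter)
```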

## Strings

In [127]:
import string
print(string.digits)
print(string.digits[1])
print(string.digits[-1])

0123456789
1
9


### Slicing

In [128]:
string.digits[1:5]

'1234'

In [129]:
string.digits[1:5:2]

'13'

In [130]:
string.digits[1::2]

'13579'

In [131]:
string.digits[:5:-1]

'9876'

In [132]:
string.digits[1:5:-1]

''

In [133]:
string.digits[-3:-7:-1]

'7654'
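
With a negative step the defaults flip: an omitted start means the last element, and an omitted stop means running through the first element. The slice.indices method reports the concrete start, stop and step that Python uses for a sequence of a given length, which helps explain the results above:

```python
import string

digits = string.digits          # '0123456789'
n = len(digits)
slices = [slice(None, 5, -1), slice(1, 5, -1), slice(-3, -7, -1)]

resolved = []
for s in slices:
    # indices(n) fills in defaults and converts negative positions
    start, stop, step = s.indices(n)
    resolved.append((start, stop, step))
    picked = list(range(start, stop, step))
    print((start, stop, step), "-> positions", picked, "->", repr(digits[s]))
```

The middle case is empty because counting down from position 1 can never reach a stop of 5.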

### Subsequence testing

In [142]:
string1 = "my string"

Look at string1. followed by a tab to see what methods are available. 

In [135]:
string1.upper()

'MY STRING'

In [136]:
string1 + "is your string"

'my stringis your string'

In [137]:
"*" * 10

'**********'

In [138]:
string1[3:]

'string'

In [139]:
string1[3:4]

's'

In [140]:
string1[4::2]

'tig'

In [143]:
# does not work
string1[3:5] = 'ts'

TypeError: 'str' object does not support item assignment

In [146]:
# one option
string1[:3] + 'ts' + string1[5:]

'my tsring'

In [147]:
print(string1 > "ab")
print(string1 > "zz")

True
False
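
String comparisons with > and < are lexicographic: Python compares the first characters by their Unicode code points (ord) and moves on to the next character only on a tie. A sketch:

```python
string1 = "my string"

# 'm' (109) is greater than 'a' (97) but less than 'z' (122)
print(ord('m'), ord('a'), ord('z'))
gt_ab = string1 > "ab"
gt_zz = string1 > "zz"
print(gt_ab, gt_zz)

# uppercase letters have smaller code points than lowercase ones
print("Zebra" < "apple")

# sorted() uses the same ordering
words_sorted = sorted(["pear", "Zebra", "apple"])
print(words_sorted)
```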

See the attributes of the string by typing string1._ followed by a tab.

### Exercises
Using the string x = 'The ant wants what all ants want.', solve the following string manipulation problems:
- Convert the string to all lower case letters (don't change x)

- Count the number of occurrences of the substring ant.

- Create a list of the words occurring in x. Make sure to remove punctuation and convert all words to lowercase.

- Using only string methods on x, create the following string: The chicken wants what all chickens want.

- Using indexing and the + operator, create the following string: The tna wants what all ants want.

- Do the same thing except using a string method instead.

Some other string exercises 
- What can you do with the in and not in operators? What R operator is this like and how is it different?
- What code could you run to find out whether Python explicitly counts the characters each time it evaluates len(x)?
- Compare the time for computing the length of a (long) string in Python and R. What can you infer about what is happening behind the scenes?