# Introduction to Python

In [136]:
import pandas as pd
import numpy as np

## Chapter 1. Python Basics

## 1. Hello Python!

### Python

* Guido van Rossum
* General Purpose: build anything
* Open Source! Free!
* Python Packages, also for Data Science
    * Many applications and fields
* Version 3.x - https://www.python.org/downloads/

### Python Script

* Text Files - .py
* List of Python Commands
* Similar to typing in IPython Shell

### №1 The Python Interface

* Experiment in the IPython Shell; type `5 / 8`, for example
* Add another line of code to the Python script on the top-right (not in the Shell): `print(7 + 10)`

In [1]:
print(5 / 8)
print(7 + 10)

0.625
17


### №2 When to use Python?

Python is a pretty versatile language. For which applications can you use Python?

* You want to do some quick calculations
* For your new business, you want to develop a database-driven website
* Your boss asks you to clean and analyze the results of the latest satisfaction survey
* *All of the above*

### №3 Any comments?

* Above the `print(7 + 10)`, add the comment `# Addition`

In [2]:
print(5 / 8)
# Addition
print(7 + 10)

0.625
17


### №4 Python as a calculator

* Suppose you have $100, which you can invest with a 10% return each year. After one year, it's 100×1.1=110 dollars, and after two years it's 100×1.1×1.1=121. Add code on the right to calculate how much money you end up with after 7 years

In [3]:
print(5 + 5)
print(5 - 5)

print(3 * 5)
print(10 / 2)
print(18 % 7)
print(4 ** 2)

print(100 * 1.1 ** 7)

10
0
15
5.0
4
16
194.87171000000012


## 2. Variables and Types

### Variable

* Specific, case-sensitive name
* Call up value through variable name
* 1.79 m - 68.7 kg

In [4]:
height = 1.79
weight = 68.7
height

1.79

### Calculate BMI

$$ BMI = \frac{weight}{height^2} $$

In [5]:
68.7 / 1.79 ** 2

21.44127836209856

In [6]:
weight / height ** 2

21.44127836209856

In [7]:
bmi = weight / height ** 2
bmi

21.44127836209856

### Reproducibility

**my_script.py**

```python
height = 1.79
weight = 68.7
bmi = weight / height ** 2
print(bmi)
```

```
Output:
21.44127836209856
```

### Python Types

In [8]:
type(bmi)

float

In [9]:
day_of_week = 5
type(day_of_week)

int

### Python Types (2)


In [10]:
x = 'body mass index'
y = 'this works too'
type(y)

str

In [11]:
z = True
type(z)

bool

### Python Types (3)

In [12]:
2 + 3

5

In [13]:
'ab' + 'cd'

'abcd'

### №5 Variable Assignment

* Create a variable `savings` with the value 100
* Check out this variable by typing `print(savings)` in the script

In [14]:
savings = 100
print(savings)

100


### №6 Calculations with variables

* Create a variable `growth_multiplier`, equal to `1.1`
* Create a variable, `result`, equal to the amount of money you saved after `7` years
* Print out the value of `result`

In [15]:
savings = 100
growth_multiplier = 1.1

result = savings * growth_multiplier ** 7
print(result)

194.87171000000012


### №7 Other variable types

* Create a new string, `desc`, with the value `'compound interest'`
* Create a new boolean, `profitable`, with the value `True`

In [16]:
desc = 'compound interest'
profitable = True

### №8 Guess the type

We already went ahead and created three variables: `a`, `b` and `c`. You can use the IPython shell on the right to discover their type. Which of the following options is correct?

* `a` is of type `int`, `b` is of type `str`, `c` is of type `bool`
* `a` is of type `float`, `b` is of type `bool`, `c` is of type `str`
* *`a` is of type `float`, `b` is of type `str`, `c` is of type `bool`*
* `a` is of type `int`, `b` is of type `bool`, `c` is of type `str`

In [17]:
a = 194.87171000000012
b = 'True'
c = False

### №9 Operations with other types

* Calculate the product of `savings` and `growth_multiplier`. Store the result in `year1`
* What do you think the resulting type will be? Find out by printing out the type of `year1`
* Calculate the sum of `desc` and `desc` and store the result in a new variable `doubledesc`
* Print out `doubledesc`. Did you expect this?

In [18]:
savings = 100
growth_multiplier = 1.1
desc = 'compound interest'

year1 = savings * growth_multiplier
print(type(year1))

doubledesc = desc + desc
print(doubledesc)

<class 'float'>
compound interestcompound interest


### №10 Type conversion

* Fix the code on the right such that the printout runs without errors; use the function `str()` to convert the variables to strings
* Convert the variable `pi_string` to a float and store this float as a new variable, `pi_float`

In [19]:
savings = 100
result = 100 * 1.10 ** 7

print('I started with $' + str(savings) + ' and now have $' + str(result) + '. Awesome!')

pi_string = "3.1415926"

pi_float = float(pi_string)

I started with $100 and now have $194.87171000000012. Awesome!
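
As an alternative to chaining `str()` calls with `+`, the same message can be built with an f-string (Python 3.6 and later). This is a minimal sketch that reuses `savings` and `result` from the cell above; the `:.2f` format, which rounds to two decimals, is an illustrative choice and not part of the exercise.

```python
savings = 100
result = 100 * 1.10 ** 7

# An f-string converts the values to text inside the braces,
# so no explicit str() call is needed.
message = f'I started with ${savings} and now have ${result:.2f}. Awesome!'
print(message)
```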


### №11 Can Python handle everything?

Which one of these will throw an error?

* `'I can add integers, like ' + str(5) + ' to strings.'`
* `'I said ' + ('Hey ' * 2) + 'Hey!'`
* *`'The correct answer to this multiple choice exercise is answer number ' + 2`*
* `True + False`
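
The options can be checked directly. The sketch below evaluates three of them and catches the error instead of letting it stop the script; the `results` dictionary and the shortened string are only there for illustration.

```python
results = {}

# bool is a subclass of int, so True + False is ordinary integer addition.
results['bool_sum'] = True + False

# Multiplying a string by an integer repeats the string.
results['repeat'] = 'I said ' + ('Hey ' * 2) + 'Hey!'

# Adding an int to a str is not defined, so Python raises a TypeError.
try:
    results['str_plus_int'] = 'answer number ' + 2
except TypeError as err:
    results['str_plus_int'] = 'TypeError: ' + str(err)

for name, value in results.items():
    print(name, '->', value)
```

Wrapping the integer in `str()`, as in the first option, is the usual fix.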

## Chapter 2. Python Lists

## 3. Python Lists

### Python Data Types

* `float` - real numbers
* `int` - integer numbers
* `str` - string, text
* `bool` - True, False
* Each variable represents a single value

In [20]:
height = 1.73
tall = True

### Problem

* Data Science: many data points
* Height of entire family
* Inconvenient

In [21]:
height1 = 1.73
height2 = 1.68
height3 = 1.71
height4 = 1.89

### Python List [a, b, c]

* Name a collection of values
* Contain any type
* Contain different types

In [22]:
[1.73, 1.68, 1.71, 1.89]

[1.73, 1.68, 1.71, 1.89]

In [23]:
fam = [1.73, 1.68, 1.71, 1.89]
fam

[1.73, 1.68, 1.71, 1.89]

In [24]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [25]:
fam2 = [['liz', 1.73],
        ['emma', 1.68],
        ['mom', 1.71],
        ['dad', 1.89]]
fam2 

[['liz', 1.73], ['emma', 1.68], ['mom', 1.71], ['dad', 1.89]]

### List type

* Specific functionality
* Specific behavior

In [26]:
type(fam)

list

In [27]:
type(fam2) 

list

### №12 Create list with different types

* Finish the line of code that creates the `areas` list. Build the list so that the list first contains the name of each room as a string and then its area. In other words, add the strings `'hallway'`, `'kitchen'` and `'bedroom'` at the appropriate locations
* Print `areas` again; is the printout more informative this time?

In [28]:
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

areas = ['hallway', hall, 'kitchen', kit, 'living room', liv, 'bedroom', bed, 'bathroom', bath]
print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5]


### №13 Select the valid list

Can you tell which ones of the following lines of Python code are valid ways to build a list?

A. `[1, 3, 4, 2]` B. `[[1, 2, 3], [4, 5, 7]]` C. `[1 + 2, 'a' * 5, 3]`

* *A, B and C*
* B
* B and C
* C

### №14 List of lists

* Finish the list of lists so that it also contains the bedroom and bathroom data. Make sure you enter these in order!
* Print out `house`; does this way of structuring your data make more sense?
* Print out the type of `house`. Are you still dealing with a list?

In [29]:
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

house = [['hallway', hall],
         ['kitchen', kit],
         ['living room', liv],
         ['bedroom', bed],
         ['bathroom', bath]]

print(house)
print(type(house))

[['hallway', 11.25], ['kitchen', 18.0], ['living room', 20.0], ['bedroom', 10.75], ['bathroom', 9.5]]
<class 'list'>


## 4. Subsetting Lists

### Subsetting lists

```python
In [1]: fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
In [2]: fam
Out[2]: ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
   index:  0      1      2      3     4      5     6      7
                     'zero-based indexing'
```

In [30]:
fam[3]

1.68

In [31]:
fam[6]

'dad'

### Subsetting lists

```python
In [1]: fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
In [2]: fam
Out[2]: ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
   index:  0      1      2      3     4      5     6      7
          -8     -7     -6     -5    -4     -3    -2     -1
```

In [32]:
fam[-1]

1.89

In [33]:
fam[-2] == fam[6]

True

### List slicing

```python
[ start : end ]
inclusive exclusive
```

In [34]:
fam[3:5]

[1.68, 'mom']

In [35]:
fam[1:4]

[1.73, 'emma', 1.68]

In [36]:
fam[:4]

['liz', 1.73, 'emma', 1.68]

In [37]:
fam[5:]

[1.71, 'dad', 1.89]
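
A short sketch of what "start inclusive, end exclusive" implies in practice, using the same `fam` list. The variable names `start`, `end` and `piece` are illustrative.

```python
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

start, end = 3, 5
piece = fam[start:end]

# The element at index `end` is excluded, so the slice has end - start items.
print(piece, len(piece) == end - start)

# Splitting at one index gives two parts that rebuild the original list.
print(fam[:4] + fam[4:] == fam)

# Negative indices work in slices too: the last two elements.
print(fam[-2:])
```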

### №15 Subset and conquer

* Print out the second element from the `areas` list (it has the value `11.25`)
* Subset and print out the last element of `areas`, being `9.50`. Using a negative index makes sense here!
* Select the number representing the area of the living room (`20.0`) and print it out

In [38]:
areas = ['hallway', 11.25,
         'kitchen', 18.0,
         'living room', 20.0,
         'bedroom', 10.75,
         'bathroom', 9.50]

print(areas[1])
print(areas[-1])
print(areas[5])

11.25
9.5
20.0


### №16 Subset and calculate

* Using a combination of list subsetting and variable assignment, create a new variable, `eat_sleep_area`, that contains the sum of the area of the kitchen and the area of the bedroom
* Print the new variable `eat_sleep_area`

In [39]:
areas = ['hallway', 11.25,
         'kitchen', 18.0,
         'living room', 20.0,
         'bedroom', 10.75,
         'bathroom', 9.50]

eat_sleep_area = areas[3] + areas[-3]
print(eat_sleep_area)

28.75


### №17 Slicing and dicing

* Use slicing to create a list, `downstairs`, that contains the first 6 elements of `areas`
* Do a similar thing to create a new variable, `upstairs`, that contains the last 4 elements of `areas`
* Print both `downstairs` and `upstairs` using `print()`

In [40]:
areas = ['hallway', 11.25,
         'kitchen', 18.0,
         'living room', 20.0,
         'bedroom', 10.75,
         'bathroom', 9.50]

downstairs = areas[0:6]
print(downstairs)

upstairs = areas[6:]
print(upstairs)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0]
['bedroom', 10.75, 'bathroom', 9.5]


### №18 Slicing and dicing (2)

* Create `downstairs` again, as the first `6` elements of `areas`. This time, simplify the slicing by omitting the `begin` index
* Create `upstairs` again, as the last `4` elements of `areas`. This time, simplify the slicing by omitting the `end` index

In [41]:
areas = ['hallway', 11.25,
         'kitchen', 18.0,
         'living room', 20.0,
         'bedroom', 10.75,
         'bathroom', 9.50]

downstairs = areas[:6]

upstairs = areas[6:]

### №19 Subsetting lists of lists

What will `house[-1][1]` return? 

* A float: the kitchen area
* A string: `'kitchen'`
* *A float: the bathroom area*
* A string: `'bathroom'`

In [42]:
house[-1][1]

9.5

## 5. Manipulating Lists

### List Manipulation
* Change list elements
* Add list elements
* Remove list elements

### Changing list elements

In [43]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [44]:
fam[7] = 1.86
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86]

In [45]:
fam[0:2] = ['lisa', 1.74]
fam

['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86]

### Adding and removing elements

In [46]:
fam + ['me', 1.79]

['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86, 'me', 1.79]

In [47]:
fam_ext = fam + ['me', 1.79]

In [48]:
del(fam[2])
fam

['lisa', 1.74, 1.68, 'mom', 1.71, 'dad', 1.86]

In [49]:
del(fam[2])
fam

['lisa', 1.74, 'mom', 1.71, 'dad', 1.86]

### Behind the scenes (1)

In [50]:
x = ['a', 'b', 'c']

y = x
y[1] = 'z'

y

['a', 'z', 'c']

In [51]:
x

['a', 'z', 'c']

### Behind the scenes (2)

In [52]:
x = ['a', 'b', 'c']

y = list(x)
y = x[:]
y[1] = 'z'

x

['a', 'b', 'c']

In [53]:
y

['a', 'z', 'c']
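
The difference between the two cases can be made visible with the `is` operator, which tests whether two names refer to the same object. This is a minimal sketch; `alias`, `clone`, `nested` and `shallow` are illustrative names. The last part shows a caveat: `list()` and `[:]` copy only the outer list.

```python
x = ['a', 'b', 'c']

alias = x          # a second name for the same list object
clone = list(x)    # a new list holding the same elements

print(alias is x)  # True
print(clone is x)  # False
print(clone == x)  # True

# list() and [:] make shallow copies: nested lists are still shared.
nested = [['liz', 1.73], ['emma', 1.68]]
shallow = nested[:]
shallow[0][1] = 1.74
print(nested[0])
```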

### №20 Replace list elements

* Update the bathroom area to be `10.50` square meters instead of `9.50`
* Make the areas list more trendy! Change `'living room'` to `'chill zone'`

In [54]:
areas = ['hallway', 11.25,
         'kitchen', 18.0,
         'living room', 20.0,
         'bedroom', 10.75,
         'bathroom', 9.50]

areas[-1] = 10.50
areas[4] = 'chill zone'

### №21 Extend a list

* Use the + operator to paste the list `['poolhouse', 24.5]` to the end of the `areas` list. Store the resulting list as `areas_1`
* Further extend `areas_1` by adding data on your garage. Add the string `'garage'` and float `15.45`. Name the resulting list `areas_2`

In [55]:
areas = ['hallway', 11.25,
         'kitchen', 18.0,
         'chill zone', 20.0,
         'bedroom', 10.75,
         'bathroom', 10.50]

areas_1 = areas + ['poolhouse', 24.5]
areas_2 = areas_1 + ['garage', 15.45]

### №22 Delete list elements

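Suppose `areas` is the extended list from the previous exercise, ending in `'poolhouse', 24.5, 'garage', 15.45`, and the poolhouse has to go. Which of the code chunks removes the `'poolhouse'` string and its area?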

* `del(areas[10]); del(areas[11])`
* `del(areas[10:11])`
* *`del(areas[-4:-2])`*
* `del(areas[-3]); del(areas[-4])`
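
The sketch below tries two of the options on copies of the list. It assumes the list is the 14-element list ending with the poolhouse and garage entries; `base`, `option_3` and `option_4` are illustrative names.

```python
base = ['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0,
        'bedroom', 10.75, 'bathroom', 10.50,
        'poolhouse', 24.5, 'garage', 15.45]

# Option 3: one slice removes indices 10 and 11 together.
option_3 = list(base)
del option_3[-4:-2]
print(option_3[-4:])

# Option 4: the first del shifts the remaining elements,
# so the second del hits the bathroom area instead of 'poolhouse'.
option_4 = list(base)
del option_4[-3]
del option_4[-4]
print(option_4[-4:])
```

Deleting one element at a time changes the indices of everything after it, which is why the single slice is the safer choice.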

### №23 Inner workings of lists

* Change the second command, that creates the variable `areas_copy`, such that `areas_copy` is an explicit copy of `areas`. After your edit, changes made to `areas_copy` shouldn't affect `areas`

In [56]:
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

areas_copy = list(areas)
areas_copy[0] = 5.0

print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5]


## Chapter 3. Functions and Packages

## 6. Functions

### Functions

* Nothing new!
* `type()`
* Piece of reusable code
* Solves particular task
* Call function instead of writing code yourself

### Example

In [57]:
fam = [1.73, 1.68, 1.71, 1.89]
fam

[1.73, 1.68, 1.71, 1.89]

In [58]:
max(fam)

1.89

In [59]:
tallest = max(fam)
tallest

1.89

### round()

In [60]:
round(1.68, 1)

1.7

In [61]:
round(1.68)

2

In [62]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.
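
A few calls that exercise what the help text describes. The last line illustrates a detail the help text does not spell out: Python 3 rounds exact ties to the nearest even integer.

```python
# With ndigits omitted the result is an int.
print(round(1.68), type(round(1.68)))

# With ndigits given the result keeps the type of the input (float here).
print(round(1.68, 1), type(round(1.68, 1)))

# A negative ndigits rounds to the left of the decimal point.
print(round(1234.5678, -2))

# Ties go to the nearest even integer, so both calls below give 2.
print(round(1.5), round(2.5))
```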



### Find functions

* How to know?
* Standard task -> probably function exists!
* The internet is your friend

### №24 Familiar functions

* Use `print()` in combination with `type()` to print out the type of `var1`.
* Use `len()` to get the length of the list `var1`. Wrap it in a `print()` call to directly print it out
* Use `int()` to convert `var2` to an integer. Store the output as `out2`

In [63]:
var1 = [1, 2, 3, 4]
var2 = True

print(type(var1))
print(len(var1))

out2 = int(var2)

<class 'list'>
4


### №25 Help!

Use the Shell on the right to open up the documentation on `complex()`. Which of the following statements is true?

* `complex()` takes exactly two arguments: `real` and `[, imag]`
* `complex()` takes two arguments: `real` and `imag`. Both these arguments are required
* *`complex()` takes two arguments: `real` and `imag`. `real` is a required argument, `imag` is an optional argument*
* `complex()` takes two arguments: `real` and `imag`. If you don't specify `imag`, it is set to 1 by Python

In [64]:
help(complex())

Help on complex object:

class complex(object)
 |  complex(real=0, imag=0)
 |  
 |  Create a complex number from a real part and an optional imaginary part.
 |  
 |  This is equivalent to (real + imag*1j) where imag defaults to 0.
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __divmod__(self, value, /)
 |      Return divmod(self, value).
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __float__(self, /)
 |      float(self)
 |  
 |  __floordiv__(self, value, /)
 |      Return self//value.
 |  
 |  __format__(...)
 |      complex.__format__() -> str
 |      
 |      Convert to a string according to format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getnewargs__(...)
 |  
 |  __gt__(self, value, /)
 |      Return self>v

### №26 Multiple arguments

* Use `+` to merge the contents of `first` and `second` into a new list: `full`
* Call `sorted()` on `full` and specify the `reverse` argument to be `True`. Save the sorted list as `full_sorted`
* Finish off by printing out `full_sorted`

In [65]:
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

full = first + second

full_sorted = sorted(full, reverse = True)
print(full_sorted)

[20.0, 18.0, 11.25, 10.75, 9.5]


## 7. Methods

### Built-in Functions

* Maximum of list: `max()`
* Length of list or string: `len()`
* Get index in list: ?
* Reversing a list: ?

### Back 2 Basics

* Methods: functions that belong to objects

In [66]:
sister = 'liz' 
sister.replace('l', 'L')

'Liz'

In [67]:
sister.capitalize()

'Liz'

In [68]:
height = 1.73
height.conjugate()

1.73

### `list` methods

In [69]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [70]:
fam.index('mom')

4

In [71]:
fam.count(1.73)

1

### Methods

* Everything = object
* Objects have methods associated, depending on type

In [72]:
sister.replace('z', 'sa')

'lisa'

In [73]:
# AttributeError
fam.replace('mom', 'mommy') 

AttributeError: 'list' object has no attribute 'replace'

In [90]:
sister.index('z')

2

In [91]:
fam.index('mom') 

4

### Methods (2)

In [92]:
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [93]:
fam.append('me')
fam 

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me']

In [94]:
fam.append(1.79)
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me', 1.79]

### Summary

* Functions
```python
In [11]: type(fam)
Out[11]: list
```
* Methods: call functions on objects
```python
In [12]: fam.index('dad')
Out[12]: 6
```

### №27 String Methods

* Use the `upper()` method on `place` and store the result in `place_up`. Use the syntax for calling methods that you learned in the previous video
* Print out `place` and `place_up`. Did both change?
* Print out the number of o's in the variable `place` by calling `count()` on `place` and passing the letter `'o'` as an input to the method. We're talking about the variable `place`, not the word `'place'`!

In [95]:
place = 'poolhouse'

place_up = place.upper()

print(place); print(place_up)
print(place.count('o'))

poolhouse
POOLHOUSE
3


### №28 List Methods

* Use the `index()` method to get the index of the element in `areas` that is equal to `20.0`. Print out this index
* Call `count()` on `areas` to find out how many times `9.50` appears in the list. Again, simply print out this number

In [96]:
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

print(areas.index(20.0))
print(areas.count(9.50))

2
1


### №29 List Methods (2)

* Use `append()` twice to add the size of the poolhouse and the garage again: `24.5` and `15.45`, respectively. Make sure to add them in this order
* Print out `areas`
* Use the `reverse()` method to reverse the order of the elements in `areas`
* Print out `areas` once more

In [97]:
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

areas.append(24.5)
areas.append(15.45)

print(areas)

areas.reverse()
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5, 24.5, 15.45]
[15.45, 24.5, 9.5, 10.75, 20.0, 18.0, 11.25]
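
Methods such as `reverse()` and `append()` change the list they are called on and return `None`, whereas the built-in function `sorted()` returns a new list. A minimal sketch of the difference; `returned` and `ordered` are illustrative names.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# reverse() changes the list in place and returns None.
returned = areas.reverse()
print(returned)
print(areas)

# sorted() leaves its argument alone and returns a new list.
ordered = sorted(areas)
print(ordered)
print(areas)
```

A common slip is writing `areas = areas.reverse()`, which replaces the list with `None`.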


## 8. Packages

### Motivation

* Functions and methods are powerful
* All code in Python distribution?
    * Huge code base: messy
    * Lots of code you won’t use
    * Maintenance problem

### Packages

* Directory of Python Scripts
* Each script = module
* Specify functions, methods, types
* Thousands of packages available
    * Numpy
    * Matplotlib
    * Scikit-learn

### Install package

* http://pip.readthedocs.org/en/stable/installing/
* Download `get-pip.py`
* Terminal:
    * `python3 get-pip.py`
    * `pip3 install numpy`

### Import package

In [98]:
# NameError
import numpy
array([1, 2, 3])

NameError: name 'array' is not defined

In [99]:
numpy.array([1, 2, 3])

array([1, 2, 3])

In [100]:
import numpy as np
np.array([1, 2, 3])

array([1, 2, 3])

In [101]:
from numpy import array
array([1, 2, 3])

array([1, 2, 3])

### from numpy import array

**my_script.py**

```python
from numpy import array

fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

...

fam_ext = fam + ['me', 1.79]

...

print(str(len(fam_ext)) + ' elements in fam_ext')

...

np_fam = array(fam_ext)
```

### import numpy

**my_script.py**

```python
import numpy

fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

...

fam_ext = fam + ['me', 1.79]

...

print(str(len(fam_ext)) + ' elements in fam_ext')

...

np_fam = numpy.array(fam_ext)
```

### №30 Import package

* Import the `math` package. Now you can access the constant `pi` with `math.pi`
* Calculate the circumference of the circle and store it in `C`
* Calculate the area of the circle and store it in `A`

In [102]:
r = 0.43

import math

C = math.pi * r * 2
A = math.pi * r ** 2

print('Circumference: ' + str(C))
print('Area: ' + str(A))

Circumference: 2.701769682087222
Area: 0.5808804816487527


### №31 Selective import

* Perform a selective import from the `math` package where you only import the `radians` function
* Calculate the distance travelled by the Moon over 12 degrees of its orbit. Assign the result to `dist`. You can calculate this as `r * phi`, where `r` is the radius and `phi` is the angle in radians. To convert an angle in degrees to an angle in radians, use the `radians()` function, which you just imported
* Print out `dist`

In [103]:
r = 192500

from math import radians

dist = r * radians(12)
print(dist)

40317.10572106901


### №32 Different ways of importing

Suppose you want to use the function `inv()`, which is in the `linalg` subpackage of the `scipy` package. You want to be able to use this function as follows:
```python
my_inv([[1,2], [3,4]])
```
Which `import` statement will you need in order to run the above code without an error?

* `import scipy`
* `import scipy.linalg`
* `from scipy.linalg import my_inv`
* *`from scipy.linalg import inv as my_inv`*
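
The same aliasing pattern can be tried without installing anything by using the standard-library `math` module in place of `scipy`. This is only a sketch; `to_rad` is an illustrative alias.

```python
import math
from math import radians
from math import radians as to_rad   # to_rad is an illustrative alias

# All three names refer to the same function object.
print(math.radians is radians, radians is to_rad)

# Only the names actually imported exist in this namespace.
try:
    sqrt(16)
except NameError as err:
    print('NameError:', err)
```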

## Chapter 4. NumPy

## 9. NumPy

### Lists Recap

* Powerful
* Collection of values
* Hold different types
* Change, add, remove
* Need for Data Science
    * Mathematical operations over collections
    * Speed

### Illustration

In [104]:
height = [1.73, 1.68, 1.71, 1.89, 1.79]
height

[1.73, 1.68, 1.71, 1.89, 1.79]

In [105]:
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
weight

[65.4, 59.2, 63.6, 88.4, 68.7]

In [106]:
# TypeError
weight / height ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

### Solution: NumPy

* Numeric Python
* Alternative to Python List: NumPy Array
* Calculations over entire arrays
* Easy and Fast
* Installation
    * In the terminal: `pip3 install numpy`

### NumPy

In [107]:
import numpy as np

np_height = np.array(height)
np_height

array([1.73, 1.68, 1.71, 1.89, 1.79])

In [108]:
np_weight = np.array(weight)
np_weight

array([65.4, 59.2, 63.6, 88.4, 68.7])

In [109]:
bmi = np_weight / np_height ** 2
bmi

array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])
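
For comparison, the same BMI values can be computed from the plain lists by pairing the elements up one at a time. The sketch below assumes NumPy is installed; `bmi_list` and `bmi_array` are illustrative names.

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Plain lists: pair the values up and compute one BMI at a time.
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# NumPy arrays: the same formula is applied to every element at once.
bmi_array = np.array(weight) / np.array(height) ** 2

print(np.allclose(bmi_list, bmi_array))
```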

### Comparison

In [110]:
# TypeError
height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
weight / height ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [111]:
np_height = np.array(height)
np_weight = np.array(weight)
np_weight / np_height ** 2

array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])

### NumPy: remarks

In [112]:
np.array([1.0, 'is', True]) 

array(['1.0', 'is', 'True'], dtype='<U32')

In [113]:
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])
python_list + python_list

[1, 2, 3, 1, 2, 3]

In [114]:
numpy_array + numpy_array

array([2, 4, 6])

### NumPy Subsetting

In [115]:
bmi

array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])

In [116]:
bmi[1]

20.97505668934241

In [117]:
bmi > 23

array([False, False, False,  True, False])

In [118]:
bmi[bmi > 23]

array([24.7473475])
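
A boolean array can also be counted or combined with other conditions before it is used as an index. A small sketch, assuming NumPy is installed and reusing the `bmi` values printed above; `mask` and `in_range` are illustrative names.

```python
import numpy as np

bmi = np.array([21.85171573, 20.97505669, 21.75028214, 24.7473475, 21.44127836])

mask = bmi > 23             # one True/False per element
print(mask.sum())           # True counts as 1, so this counts the matches

# Combine conditions with & (and) or | (or); the parentheses are required.
in_range = (bmi > 21) & (bmi < 22)
print(bmi[in_range])
```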

### №33 Your First NumPy Array

* Import the `numpy` package as `np`, so that you can refer to `numpy` with `np`
* Use `np.array()` to create a `numpy` array from `baseball`. Name this array `np_baseball`
* Print out the type of `np_baseball` to check that you got it right

In [119]:
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

import numpy as np

np_baseball = np.array(baseball)
print(type(np_baseball))

<class 'numpy.ndarray'>


### №34 Baseball players' height

* Create a `numpy` array from `height_in`. Name this new array `np_height_in`
* Print `np_height_in`
* Multiply `np_height_in` with `0.0254` to convert all height measurements from inches to meters. Store the new values in a new array, `np_height_m`
* Print out `np_height_m` and check if the output makes sense

In [120]:
import numpy as np

height_in = [74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 73, 75, 78, 79]

np_height_in = np.array(height_in)
print(np_height_in)

np_height_m = np_height_in * 0.0254
print(np_height_m)

[74 74 72 72 73 69 69 71 76 71 73 73 74 74 69 70 73 75 78 79]
[1.8796 1.8796 1.8288 1.8288 1.8542 1.7526 1.7526 1.8034 1.9304 1.8034
 1.8542 1.8542 1.8796 1.8796 1.7526 1.778  1.8542 1.905  1.9812 2.0066]


### №35 Baseball players' BMI

* Create a numpy array from the `weight_lb` list with the correct units. Multiply by `0.453592` to go from pounds to kilograms. Store the resulting numpy array as `np_weight_kg`
* Use `np_height_m` and `np_weight_kg` to calculate the BMI of each player. Use the following equation:
$$ BMI = \frac{weight(kg)}{height(m)^2} $$
* Save the resulting numpy array as `bmi`
* Print out `bmi`

In [121]:
import numpy as np

height_in = [74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 73, 75, 78, 79]
weight_lb = [180, 215, 210, 210, 188, 176, 209, 200, 231, 180,
             188, 180, 185, 160, 180, 185, 189, 185, 219, 230]

np_height_m = np.array(height_in) * 0.0254
np_weight_kg = np.array(weight_lb) * 0.453592

bmi = np_weight_kg / np_height_m**2
print(bmi)

[23.11037639 27.60406069 28.48080465 28.48080465 24.80333518 25.99036864
 30.86356276 27.89402921 28.11789135 25.10462629 24.80333518 23.7478741
 23.75233129 20.54255679 26.58105883 26.54444207 24.93526781 23.12315842
 25.30771077 25.91025019]


### №36 NumPy Side Effects

Have a look at this line of code:
```python
np.array([True, 1, 2]) + np.array([3, 4, False])
```
Can you tell which code chunk builds the exact same Python object?

* `np.array([True, 1, 2, 3, 4, False])`
* *`np.array([4, 3, 0]) + np.array([0, 2, 2])`*
* `np.array([1, 1, 2]) + np.array([3, 4, -1])`
* `np.array([0, 1, 2, 3, 4, 5])`
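
The answer can be confirmed in a few lines. The sketch assumes NumPy is installed; `original` and `candidate` are illustrative names.

```python
import numpy as np

original = np.array([True, 1, 2]) + np.array([3, 4, False])

# Mixing booleans and integers gives an integer array: True -> 1, False -> 0.
print(original)

candidate = np.array([4, 3, 0]) + np.array([0, 2, 2])
print(np.array_equal(original, candidate))
```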

### №37 Subsetting NumPy Arrays

* Subset `np_weight_lb` by printing out the element at index 5
* Print out a sub-array of `np_height_in` that contains the elements at index 15 up to and including index 18

In [122]:
import numpy as np

np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)

print(np_weight_lb[5])
print(np_height_in[15:19])

176
[70 73 75 78]


## 10. 2D NumPy Arrays

### Type of NumPy Arrays

In [123]:
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])

type(np_height)

numpy.ndarray

In [124]:
type(np_weight) 

numpy.ndarray

### 2D NumPy Arrays

In [125]:
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])
np_2d

array([[ 1.73,  1.68,  1.71,  1.89,  1.79],
       [65.4 , 59.2 , 63.6 , 88.4 , 68.7 ]])

In [126]:
np_2d.shape

(2, 5)

In [127]:
np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
          [65.4, 59.2, 63.6, 88.4, '68.7']])

array([['1.73', '1.68', '1.71', '1.89', '1.79'],
       ['65.4', '59.2', '63.6', '88.4', '68.7']], dtype='<U32')

### Subsetting

In [128]:
np_2d[0]

array([1.73, 1.68, 1.71, 1.89, 1.79])

In [129]:
np_2d[0][2]

1.71

In [130]:
np_2d[0,2]

1.71

In [131]:
np_2d[:,1:3]

array([[ 1.68,  1.71],
       [59.2 , 63.6 ]])

In [132]:
np_2d[1,:]

array([65.4, 59.2, 63.6, 88.4, 68.7])
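
Chained indexing (`[a][b]`) and comma indexing (`[a, b]`) agree for single elements but not for slices. A short sketch of the difference, assuming NumPy is installed:

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# For a single element, chained and comma indexing agree.
print(np_2d[0][2] == np_2d[0, 2])

# With slices they differ: np_2d[:] is the whole array again,
# so the second [1:3] still slices rows, not columns.
print(np_2d[:, 1:3].shape)
print(np_2d[:][1:3].shape)
```

The comma form is the one to use when selecting columns.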

### №38 Your First 2D NumPy Array

* Use `np.array()` to create a 2D numpy array from `baseball`. Name it `np_baseball`
* Print out the type of `np_baseball`
* Print out the `shape` attribute of `np_baseball`. Use `np_baseball.shape`

In [133]:
import numpy as np

baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

np_baseball = np.array(baseball)
print(type(np_baseball))
print(np_baseball.shape)

<class 'numpy.ndarray'>
(4, 2)


### №39 Baseball data in 2D form

* Use `np.array()` to create a 2D numpy array from `baseball`. Name it `np_baseball`
* Print out the `shape` attribute of `np_baseball`

In [134]:
import numpy as np

np_baseball = np.array(baseball)
print(np_baseball.shape)

(4, 2)


### №40 Subsetting 2D NumPy Arrays

* Print out the 50th row of `np_baseball`
* Make a new variable, `np_weight_lb`, containing the entire second column of `np_baseball`
* Select the height (first column) of the 124th baseball player in `np_baseball` and print it out

In [137]:
import numpy as np

baseball = pd.read_csv('Introduction_to_Python/baseball.csv').values.tolist()

np_baseball = np.array(baseball)
print(np_baseball[49, :])

np_weight_lb = np_baseball[:, 1]
print(np_baseball[123][0])

[ 70.   195.    30.69]
75.0


### №41 2D Arithmetic

* You managed to get hold of the changes in height, weight and age of all baseball players. It is available as a 2D numpy array, `updated`. Add `np_baseball` and `updated` and print out the result
* You want to convert the units of height and weight to metric (meters and kilograms respectively). As a first step, create a `numpy` array with three values: `0.0254`, `0.453592` and `1`. Name this array `conversion`
* Multiply `np_baseball` with `conversion` and print out the result

In [138]:
import numpy as np

updated = pd.read_csv('Introduction_to_Python/updated.csv').values

np_baseball = np.array(baseball)
print(np_baseball + updated, '\n')

conversion = np.array([.0254, .453592, 1])
print(np_baseball * conversion)

[[ 75.2303559  168.83775102  23.99      ]
 [ 75.02614252 231.09732309  35.69      ]
 [ 73.1544228  215.08167641  31.78      ]
 ...
 [ 76.09349925 209.23890778  26.19      ]
 [ 75.82285669 172.21799965  32.01      ]
 [ 73.99484223 203.14402711  28.92      ]] 

[[ 1.8796  81.64656 22.99   ]
 [ 1.8796  97.52228 34.69   ]
 [ 1.8288  95.25432 30.78   ]
 ...
 [ 1.905   92.98636 25.19   ]
 [ 1.905   86.18248 31.01   ]
 [ 1.8542  88.45044 27.92   ]]
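
Multiplying an array with three columns by a 3-element array works because NumPy matches the short array against every row. The sketch below illustrates this on two rows copied from the `np_baseball` printout; `sample`, `metric` and `by_column` are illustrative names, and NumPy is assumed to be installed.

```python
import numpy as np

# Two rows taken from the np_baseball printout: height (in), weight (lb), age.
sample = np.array([[74.0, 180.0, 22.99],
                   [74.0, 215.0, 34.69]])
conversion = np.array([0.0254, 0.453592, 1])

# The 3-element array is matched against the 3 columns of each row.
metric = sample * conversion
print(metric)

# Equivalent column-by-column computation.
by_column = np.column_stack((sample[:, 0] * 0.0254,
                             sample[:, 1] * 0.453592,
                             sample[:, 2] * 1))
print(np.allclose(metric, by_column))
```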


## 11. NumPy: Basic Statistics

### Data analysis

* Get to know your data
* Little data -> simply look at it
* Big data -> ?

### City-wide survey

In [139]:
import numpy as np
np_baseball

array([[ 74.  , 180.  ,  22.99],
       [ 74.  , 215.  ,  34.69],
       [ 72.  , 210.  ,  30.78],
       ...,
       [ 75.  , 205.  ,  25.19],
       [ 75.  , 190.  ,  31.01],
       [ 73.  , 195.  ,  27.92]])

### NumPy

* `sum()`, `sort()`, ...
* Enforce single data type: speed!

In [140]:
np.mean(np_baseball[:,0])

73.6896551724138

In [141]:
np.median(np_baseball[:,0])

74.0

In [142]:
np.corrcoef(np_baseball[:,0], np_baseball[:,1])

array([[1.        , 0.53153932],
       [0.53153932, 1.        ]])

In [143]:
np.std(np_baseball[:,0])

2.312791881046546
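
By default `np.std()` computes the population standard deviation (it divides by `n`). Passing `ddof=1` gives the sample standard deviation (dividing by `n - 1`). A small sketch on made-up values, assuming NumPy is installed:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative data

population_std = np.std(values)          # divides by n (default ddof=0)
sample_std = np.std(values, ddof=1)      # divides by n - 1

print(population_std, sample_std)
```

With several hundred rows, as in the baseball data, the two values differ only slightly.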

### Generate data

In [144]:
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)

np_city = np.column_stack((height, weight))
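
Because the values are random, each run of the cell above produces different data. Fixing the seed first makes the result repeatable. The sketch below wraps the same calls in a hypothetical helper, `simulate_city`, and uses the legacy `np.random.seed()` function to match the `np.random.normal()` calls above; the seed value is arbitrary.

```python
import numpy as np

def simulate_city(seed, n=5000):
    """Hypothetical helper: draw n heights and weights as in the cell above."""
    np.random.seed(seed)   # fix the state of the global random generator
    height = np.round(np.random.normal(1.75, 0.20, n), 2)
    weight = np.round(np.random.normal(60.32, 15, n), 2)
    # column_stack pairs the two 1D arrays into an (n, 2) array.
    return np.column_stack((height, weight))

first = simulate_city(seed=123)
second = simulate_city(seed=123)

print(first.shape)
print(np.array_equal(first, second))
```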

### №42 Average versus median

* Create a numpy array `np_height_in` that is equal to the first column of `np_baseball`
* Print out the mean of `np_height_in`
* Print out the median of `np_height_in`

In [145]:
import numpy as np

np_height_in = np_baseball[:, 0]
print(np.mean(np_height_in))
print(np.median(np_height_in))

73.6896551724138
74.0


### №43 Explore the baseball data

* The code to print out the mean height is already included. Complete the code for the median height. Replace `None` with the correct code
* Use `np.std()` on the first column of `np_baseball` to calculate `stddev`. Replace `None` with the correct code
* Do big players tend to be heavier? Use `np.corrcoef()` to store the correlation between the first and second column of `np_baseball` in `corr`. Replace `None` with the correct code

In [146]:
import numpy as np

avg = np.mean(np_baseball[:,0])
print('Average: ' + str(avg))

med = np.median(np_baseball[:, 0])
print('Median: ' + str(med))

stddev = np.std(np_baseball[:, 0])
print('Standard Deviation: ' + str(stddev))

corr = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])
print('Correlation: ' + str(corr))

Average: 73.6896551724138
Median: 74.0
Standard Deviation: 2.312791881046546
Correlation: [[1.         0.53153932]
 [0.53153932 1.        ]]


### №44 Blend it all together

* Convert `heights` and `positions`, which are regular lists, to numpy arrays. Call them `np_heights` and `np_positions`
* Extract all the heights of the goalkeepers. You can use a little trick here: use `np_positions == 'GK'` as an index for `np_heights`. Assign the result to `gk_heights`
* Extract all the heights of all the other players. This time use `np_positions != 'GK'` as an index for `np_heights`. Assign the result to `other_heights`
* Print out the median height of the goalkeepers using `np.median()`. Replace `None` with the correct code
* Do the same for the other players. Print out their median height. Replace `None` with the correct code

In [147]:
import numpy as np

positions = list(pd.read_csv('Introduction_to_Python/positions_heights.csv')['positions'])
heights = list(pd.read_csv('Introduction_to_Python/positions_heights.csv')['heights'])

np_positions = np.array(positions)
np_heights = np.array(heights)

gk_heights = np_heights[np_positions == 'GK']
print('Median height of goalkeepers: ' + str(np.median(gk_heights)))

other_heights = np_heights[np_positions != 'GK']
print('Median height of other players: ' + str(np.median(other_heights)))

Median height of goalkeepers: 188.0
Median height of other players: 181.0
