# Introduction to Python

## Chapter 1: Python Basics

### Hello Python!

#### Python
* General purpose: build anything
* Open source! Free!
* Python packages, also for data science
    * Many applications and fields
* Version 3.x

#### IPython Shell
* Interactive Python
* Part of the broader Jupyter ecosystem

#### Python Script
* Text files - .py
* List of Python commands
* Similar to typing in the IPython Shell
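
A script runs from top to bottom, much like typing the commands one by one in the shell. There is one difference: the shell displays the value of a bare expression, and a script does not. A minimal sketch, where the file name `hello.py` is only an example:

```python
# hello.py (example file name); run it from a terminal with: python3 hello.py
ratio = 5 / 8
total = 7 + 10

ratio          # evaluated but NOT displayed when run as a script
print(ratio)   # prints 0.625
print(total)   # prints 17
```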

In [1]:
print(5 / 8)

0.625


In [2]:
print(7 + 10) # You can add comments too

17


### Variables and Types

#### Variable
* Specific, case-sensitive name
* Call up value through variable name
* Makes your code reproducible

In [4]:
height = 1.79
weight = 68.7

bmi = weight / height ** 2
bmi

21.44127836209856

#### Python Types

In [5]:
type(bmi)

float

In [6]:
x = 'body mass index'
type(x)

str

In [7]:
z = True
type(z)

bool
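
Whole numbers have their own type, `int`. Each of these types also has a function of the same name that converts a value to that type. Type also determines behavior: the same operator can do different things for different types. A short sketch:

```python
count = 5
print(type(count))        # <class 'int'> (IPython shows just: int)

# Converting between types with int(), float(), str() and bool()
print(float(count))       # 5.0
print(int(1.79))          # 1: int() truncates toward zero, it does not round
print(str(21.44))         # 21.44, now stored as a str
print(bool(0), bool(3))   # False True

# Different types, different behavior
print(2 + 3)              # 5: numbers are added
print('ab' + 'cd')        # abcd: strings are concatenated
```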

## Chapter 2: Python Lists

### Python Lists

#### Python Data Types
* float - real numbers
* int - integer numbers
* str - string, text
* bool - True, False

#### Problem
* Data Science: many data points
* Height of entire family

#### Python List
* `[a, b, c]`

In [8]:
fam = [1.73, 1.68, 1.71, 1.89]
fam

[1.73, 1.68, 1.71, 1.89]

* Name a collection of values
* Contain any type
* Contain different types

In [9]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [11]:
fam2 = [['liz', 1.74],
        ['emma', 1.68],
        ['mom', 1.71],
        ['dad', 1.89]]
fam2

[['liz', 1.74], ['emma', 1.68], ['mom', 1.71], ['dad', 1.89]]

In [12]:
type(fam2)

list

### Subsetting Lists

#### Subsetting lists

In [13]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam[3]

1.68

In [14]:
fam[6]

'dad'
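
Indexing starts at 0, so `fam[3]` is the fourth element. Negative indices count from the end of the list, which is handy when you don't know its length. For example:

```python
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

print(fam[0])    # liz: the first element has index 0
print(fam[-1])   # 1.89: the last element
print(fam[-2])   # dad: the same element as fam[6]
print(len(fam))  # 8, so valid indices run from 0 to 7 (or -8 to -1)
```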

#### List slicing

In [15]:
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [16]:
fam[3:5]

[1.68, 'mom']

In [17]:
fam[:4]

['liz', 1.73, 'emma', 1.68]

In [21]:
fam[3:]

[1.68, 'mom', 1.71, 'dad', 1.89]
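
The slice `fam[start:end]` includes the element at `start` and excludes the one at `end`, so it holds `end - start` elements. Leaving out `start` or `end` means "from the beginning" or "up to the end". For a list of lists such as `fam2`, square brackets can be chained:

```python
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

part = fam[3:5]          # indices 3 and 4; index 5 is excluded
print(part)              # [1.68, 'mom']
print(len(part))         # 2, which is 5 - 3

fam2 = [['liz', 1.74], ['emma', 1.68], ['mom', 1.71], ['dad', 1.89]]
print(fam2[2])           # ['mom', 1.71]: the third sublist
print(fam2[2][0])        # mom: the first element of that sublist
```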

### Manipulating Lists

#### List Manipulation
* Change list elements
* Add list elements
* Remove list elements

#### Changing list elements

In [22]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [23]:
fam[7] = 1.86
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86]

In [24]:
fam[0:2] = ['lisa', 1.74]
fam

['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86]

#### Adding and removing elements

In [25]:
fam + ['me', 1.79]

['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86, 'me', 1.79]

In [27]:
fam_ext = fam + ['me', 1.79]
fam_ext

['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86, 'me', 1.79]
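
Note that `fam + ['me', 1.79]` builds a new list and leaves `fam` untouched. The result is kept only if you assign it to a name, as with `fam_ext`. A quick check:

```python
fam = ['lisa', 1.74, 'emma', 1.68, 'mom', 1.71, 'dad', 1.86]

fam + ['me', 1.79]          # creates a new list that is immediately discarded
print(len(fam))             # 8: fam itself is unchanged

fam_ext = fam + ['me', 1.79]
print(len(fam_ext))         # 10
print(fam_ext[-2:])         # ['me', 1.79]
```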

In [28]:
del fam[2]
fam

['lisa', 1.74, 1.68, 'mom', 1.71, 'dad', 1.86]

In [29]:
del fam[2]
fam

['lisa', 1.74, 'mom', 1.71, 'dad', 1.86]

#### Behind the scenes (1)
* When you run `x = ["a", "b", "c"]`, Python stores the list somewhere in your computer's memory. It then stores the "address" of that list in `x`, meaning where the list lives in memory.
* So `x` does not actually contain the list elements. Instead, it contains a reference to the list.
* This distinction becomes important when you start copying lists.

In [93]:
x = ["a", "b", "c"]

y = x
y[1] = "z"

y

['a', 'z', 'c']

In [94]:
x

['a', 'z', 'c']

* Changing the value of the 2nd element in `y` also changes the corresponding element in `x`.
* That is because `y = x` copies the reference to the list, not the values themselves.
* Both `x` and `y` point to the same list in memory, so updating an element through either name changes that one list.
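
You can check that `x` and `y` refer to the very same list with the `is` operator. It compares identity rather than contents. The built-in `id()` returns an object's identity. A small sketch:

```python
x = ["a", "b", "c"]
y = x                  # copies the reference, not the list

print(y is x)          # True: both names refer to the same list object
print(id(y) == id(x))  # True: same identity
y[1] = "z"
print(x)               # ['a', 'z', 'c']: the change is visible through x
```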

#### Behind the scenes (2)
* Plain assignment with `=` is not enough if you want `y` to point to a new list in memory with the same values. Instead, you need to create a copy, for example with `list(x)` or the full slice `x[:]`.

In [95]:
x = ["a", "b", "c"]

y = list(x)  # or, equivalently: y = x[:]

y[1] = "z"
y

['a', 'z', 'c']

In [96]:
x

['a', 'b', 'c']
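
Both `list(x)` and `x[:]` create a new outer list, but the copy is shallow. If the elements are themselves lists, as in `fam2`, the copy still shares those inner lists with the original. The standard library function `copy.deepcopy()` copies the inner lists as well. A small sketch:

```python
import copy

fam2 = [['liz', 1.74], ['emma', 1.68]]

shallow = list(fam2)          # new outer list, same inner lists
shallow[0][1] = 1.73          # modifies the inner list shared with fam2
print(fam2[0])                # ['liz', 1.73]: fam2 changed too
print(shallow is fam2)        # False: the outer lists are different objects

deep = copy.deepcopy(fam2)    # copies the inner lists as well
deep[1][1] = 1.70
print(fam2[1])                # ['emma', 1.68]: fam2 is unaffected
```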

## Chapter 3: Functions and Packages

### Functions

#### Functions
* Nothing new: you've already used one, `type()`
* Piece of reusable code
* Solves particular task
* Call function instead of writing code yourself

#### max()

In [34]:
fam = [1.73, 1.68, 1.71, 1.89]
fam

[1.73, 1.68, 1.71, 1.89]

In [35]:
max(fam)

1.89

In [36]:
tallest = max(fam)
tallest

1.89

#### round()

In [37]:
round(1.68,1)

1.7

In [38]:
round(1.68)

2

In [39]:
help(round)

Help on built-in function round in module builtins:

round(...)
    round(number[, ndigits]) -> number
    
    Round a number to a given precision in decimal digits (default 0 digits).
    This returns an int when called with one argument, otherwise the
    same type as the number. ndigits may be negative.
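
Two details from that help text are worth trying out. First, `ndigits` may be negative, which rounds to tens, hundreds, and so on. Second, when a value lies exactly halfway between two integers, `round()` picks the even one, so it does not always round up:

```python
print(round(1.68, 1))          # 1.7
print(round(1.68))             # 2: with one argument the result is an int
print(round(1234.5, -2))       # 1200.0: negative ndigits rounds to hundreds
print(round(2.5), round(3.5))  # 2 4: exact halfway cases go to the even integer
```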



#### Find functions
* How to know?
* Standard task -> probably a function exists!
* The internet is your friend

### Methods

#### Built-in Functions
* Maximum of a list: max()
* Length of list or string: len()
* Get index in list: ?
* Reversing a list: ?

#### Back 2 Basics
* Methods: Functions that belong to objects

#### list methods

In [42]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [43]:
fam.index('mom')

4

In [44]:
fam.count(1.73)

1
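
This also answers the open questions from the built-in functions list. The `index()` method gets the index of an element, and the `reverse()` method reverses a list. `reverse()` changes the list in place, so the sketch below works on a copy to leave `fam` intact:

```python
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

fam_reversed = fam[:]       # copy first, because reverse() modifies the list it is called on
fam_reversed.reverse()      # reverses in place and returns None
print(fam_reversed)         # [1.89, 'dad', 1.71, 'mom', 1.68, 'emma', 1.73, 'liz']
print(fam.index('mom'))     # 4: the original is unchanged
```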

#### str methods

In [45]:
sister = 'liz'
sister

'liz'

In [46]:
sister.capitalize()

'Liz'

In [47]:
sister.replace('z', 'sa')

'lisa'

#### Methods
* Everything = object
* Objects have methods associated, depending on type

In [48]:
sister.index('z')

2

In [49]:
fam.index('mom')

4

#### Methods (2)
* Some methods can change the objects they are called on

In [50]:
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]

In [51]:
fam.append('me')
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me']

In [52]:
fam.append(1.79)
fam

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me', 1.79]
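
String methods such as `replace()` cannot change the string they are called on, because strings are immutable, so they return a new string instead. List methods like `append()`, by contrast, modify the list itself and return `None`. A short comparison:

```python
sister = 'liz'
new_name = sister.replace('z', 'sa')
print(sister, new_name)     # liz lisa: the original string is unchanged

fam = ['liz', 1.73]
result = fam.append('me')   # append() changes fam in place
print(fam)                  # ['liz', 1.73, 'me']
print(result)               # None, so don't write fam = fam.append(...)
```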

#### Summary
* Functions: pass an object as an argument

In [53]:
type(fam)

list

* Methods: call a function that belongs to an object, using `object.method()` syntax

In [54]:
fam.index('dad')

6

### Packages

#### Motivation 
* Functions and methods are powerful
* All code in Python distribution?
    * Huge code base: messy
    * Lots of code you won't use
    * Maintenance problem
    
#### Packages
* Directory of Python Scripts
* Each script = module
* Specify functions, methods, types
* Thousands of packages available
    * Numpy
    * Matplotlib
    * Scikit-learn
    
#### Install package
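* Use pip, Python's package installer
* Terminal:
    * `python3 get-pip.py` (only needed if pip is missing, and you must download `get-pip.py` first; most Python 3 installations already include pip)
    * `pip3 install numpy`
* Use `python3` and `pip3` rather than `python` and `pip`. On some systems the unsuffixed commands still refer to Python 2, and you want the package installed for Python 3.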

#### Import package

In [56]:
import numpy as np

np.array([1, 2, 3])

array([1, 2, 3])

* You can also import specific functions from NumPy:

In [58]:
from numpy import array

array([1, 2, 3])

array([1, 2, 3])

* Importing `numpy as np` is still preferred. In a large piece of code, a bare `array()` call doesn't make clear that the function comes from NumPy, whereas `np.array()` does.
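
The three import styles work the same way for any package or module. This sketch uses the standard library's `math` module instead, so nothing needs to be installed:

```python
import math                # full name: math.sqrt
import math as m           # alias: m.sqrt
from math import sqrt      # bare name: sqrt

print(math.sqrt(16), m.sqrt(16), sqrt(16))  # 4.0 4.0 4.0
print(math.sqrt is m.sqrt is sqrt)          # True: all three names refer to the same function
```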

## Chapter 4: NumPy

### NumPy

#### Lists Recap
* Powerful
* Collection of values
* Hold different types
* Change, add, remove
* Need for Data Science
    * Mathematical operations over collections
    * Speed
    
#### Illustration

In [59]:
height = [1.73, 1.68, 1.89, 1.79]
height

[1.73, 1.68, 1.89, 1.79]

In [60]:
weight = [65.4, 59.2, 88.4, 68.7]
weight

[65.4, 59.2, 88.4, 68.7]

In [61]:
weight / height ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
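
The `**` operator is not defined between a list and an integer. With plain lists, you would have to compute the BMI one element at a time, for example with a `for` loop over `zip()`. That works, but it is verbose and slow for large data, which is the gap NumPy fills:

```python
height = [1.73, 1.68, 1.89, 1.79]
weight = [65.4, 59.2, 88.4, 68.7]

bmi = []
for h, w in zip(height, weight):   # pair each height with its weight
    bmi.append(w / h ** 2)

print(bmi)  # approximately [21.85, 20.98, 24.75, 21.44]
```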

#### Solution: Numpy
* Numeric Python
* Alternative to Python List: Numpy Array
* Calculations over entire arrays
* Easy and Fast
* Installation
    * In the terminal: `pip3 install numpy`
    
#### Numpy

In [62]:
import numpy as np

np_height = np.array(height)
np_height

array([1.73, 1.68, 1.89, 1.79])

In [63]:
np_weight = np.array(weight)
np_weight

array([65.4, 59.2, 88.4, 68.7])

In [65]:
bmi = np_weight / np_height ** 2
bmi

array([21.85171573, 20.97505669, 24.7473475 , 21.44127836])

#### Numpy: remarks

In [66]:
np.array([1.0, "is", True])

array(['1.0', 'is', 'True'], dtype='<U32')

* NumPy arrays contain only one type: mixed input is coerced to a common type, which here is string

In [67]:
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

In [68]:
python_list + python_list

[1, 2, 3, 1, 2, 3]

In [69]:
numpy_array + numpy_array

array([2, 4, 6])
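
The difference goes beyond `+`. Arithmetic operators act element-wise on arrays, while on lists `*` repeats the list and `+` concatenates. For instance:

```python
import numpy as np

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

print(python_list * 2)       # [1, 2, 3, 1, 2, 3]: list repetition
print(numpy_array * 2)       # [2 4 6]: each element doubled
print(numpy_array ** 2)      # [1 4 9]: element-wise power
print(numpy_array + 10)      # [11 12 13]: the scalar is added to every element
```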

#### Numpy Subsetting

In [70]:
bmi

array([21.85171573, 20.97505669, 24.7473475 , 21.44127836])

In [71]:
bmi[1]

20.97505668934241

In [72]:
bmi > 23

array([False, False,  True, False])

In [73]:
bmi[bmi > 23]

array([24.7473475])
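
Boolean arrays can also be combined. Python's `and` does not work element-wise on arrays, so use the `&` operator, with parentheses around each comparison, or `np.logical_and()`:

```python
import numpy as np

bmi = np.array([21.85171573, 20.97505669, 24.7473475, 21.44127836])

in_range = (bmi > 21) & (bmi < 22)              # element-wise AND
print(in_range)                                 # [ True False False  True]
print(bmi[in_range])                            # the two values between 21 and 22
print(bmi[np.logical_and(bmi > 21, bmi < 22)])  # same selection
```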

### 2D Numpy Arrays

#### Type of Numpy Arrays

In [74]:
type(np_height)

numpy.ndarray

In [76]:
type(np_weight)

numpy.ndarray

In [77]:
np_2d = np.array([[1.73, 1.68, 1.89, 1.79],
                 [65.4, 59.2, 88.4, 68.7]])
np_2d

array([[ 1.73,  1.68,  1.89,  1.79],
       [65.4 , 59.2 , 88.4 , 68.7 ]])

In [80]:
np_2d.shape # 2 rows, 4 columns

(2, 4)

* If you change one float to a string, all of the array elements are coerced to strings:

In [81]:
np.array([[1.73, 1.68, 1.89, 1.79],
          [65.4, 59.2, 88.4, "68.7"]])

array([['1.73', '1.68', '1.89', '1.79'],
       ['65.4', '59.2', '88.4', '68.7']], dtype='<U32')

#### Subsetting

In [82]:
np_2d[0]

array([1.73, 1.68, 1.89, 1.79])

In [83]:
np_2d[0][2]

1.89

In [84]:
np_2d[0,2]

1.89

In [85]:
np_2d[:, 1:3]

array([[ 1.68,  1.89],
       [59.2 , 88.4 ]])

In [86]:
np_2d[1,:]

array([65.4, 59.2, 88.4, 68.7])
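
`np_2d[0][2]` and `np_2d[0, 2]` give the same result, but only the comma form can select rows and columns in one step. Chained brackets apply each selection to the rows in turn, as this sketch shows:

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.89, 1.79],
                  [65.4, 59.2, 88.4, 68.7]])

print(np_2d[:, 1:3].shape)   # (2, 2): all rows, columns 1 and 2
print(np_2d[:][1:3].shape)   # (1, 4): np_2d[:] is the whole array, then [1:3] selects rows
print(np_2d[1, -1])          # 68.7: negative indices work per axis, as with lists
```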

### Numpy: Basic Statistics

#### Data analysis
* Get to know your data
* Little data --> simply look at it
* Big data --> ?

#### Generate data
* Arguments for `np.random.normal()`
    * `loc`: distribution mean
    * `scale`: distribution standard deviation
    * `size`: number of samples

In [88]:
height = np.round(np.random.normal(1.75, 0.2, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))
np_city

array([[ 2.03, 70.24],
       [ 2.1 , 77.02],
       [ 1.72, 65.07],
       ...,
       [ 1.55, 66.77],
       [ 1.37, 78.72],
       [ 1.83, 58.87]])
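
Because the values are random, your `np_city` will differ from the one shown above. To get the same numbers on every run, seed the random generator. The sketch below uses NumPy's `default_rng()`, available in NumPy 1.17 and later. The seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)                      # seeded generator: same numbers on every run
height = np.round(rng.normal(1.75, 0.20, 5000), 2)   # loc, scale, size as with np.random.normal()
weight = np.round(rng.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))

print(np_city.shape)                                 # (5000, 2): one row per person
```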

#### Numpy

In [89]:
np.mean(np_city[:,0])

1.749138

In [90]:
np.median(np_city[:,0])

1.75

In [91]:
np.corrcoef(np_city[:,0], np_city[:,1])

array([[1.        , 0.00164688],
       [0.00164688, 1.        ]])

In [92]:
np.std(np_city[:,0])

0.20206354682623978
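
The outputs above depend on the random data. On a small fixed array, you can check the results by hand. The `axis` argument computes one statistic per row (`axis=1`) or per column (`axis=0`; for `np_city` that would give the mean height and mean weight) in a single call:

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.89, 1.79],
                  [65.4, 59.2, 88.4, 68.7]])

print(np.mean(np_2d[0]))       # about 1.7725, i.e. (1.73 + 1.68 + 1.89 + 1.79) / 4
print(np.median(np_2d[0]))     # about 1.76: average of the two middle values, 1.73 and 1.79
print(np.mean(np_2d, axis=1))  # one mean per row: about [1.7725, 70.425]
```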