# BME 4931/6938 Module 2 Programming Tutorial 
## Overview
This tutorial is split into 3 informative sections, which I think cover the most important parts to remember from this week's lecture! There is a lot of information in the lecture videos and this tutorial can serve as a nice reference point for the most important parts of the videos. 

1. Python Basics
    - A Fancy Calculator
    - Variables
    - Functions
    - Loops and Iterables
    - Conditionals
    - Typing in Python
    - Object Oriented Programming
    - Libraries and More


2. Numpy
    - Arrays
    - Indexing
    - Copying


3. Pandas
    - Series
    - DataFrame
    - Reading CSVs

### Requirements
All you need to run this tutorial is a way to open and run Jupyter Notebook files on your computer! Whether that be locally with an Anaconda or local Python installation or remotely with Google Colab, if you're looking at this file you're already ready to go!

## 1. Python Basics 

### A Fancy Calculator
The first step in any programmer's journey in Python often starts with math. Python, as with many other programming languages, can be used as a simple calculator. For example, the code cell below will output the sum of 9 and 10 - give it a run.

In [1]:
9 + 10

19

Addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), exponentiation (`**` - note that in python this is *not* `^`), modulus (`%`), and floor division (`//`) all exist in python and can be used to perform some rather complex math!

In [2]:
2 + 7**2 - 3.45 / 2 + 565//3 * 7 % 2

49.275

Even though python follows traditional Order of Operations, that... is quite the confusing bit of math.  I find it rather hard to follow. Python lets us use parentheses to not only change the order of operation but also to help *other people* understand your code. The cell below is equivalent to the one above, but much easier to follow.

In [3]:
2 + (7**2) - (3.45/2) + ((565//3) * 7) % 2

49.275

This process of making code more understandable, or readable, is *critical* to programming success. Code is written once, but read many times. Make it easier on those reading your code (including you in the future). In this same vein, the pound symbol (`#`) is used in python to mark anything after as a **comment**. Comments are completely ignored by python but seen by those reading your code. They are *incredibly* useful in tagging and writing your code with information about what your code is doing and why. The rest of the code in this notebook will contain comments that describe the code - and you should put them in your code too!
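
As a small illustration of comments in action, here is a sketch that repeats the calculation from above with comments added. The name `result` is introduced only for this illustration (naming values is covered in the next section).

```python
# A comment on its own line describes the code that follows.
# Python evaluates ** first, then *, /, // and % (left to right),
# and finally + and -.
result = 2 + (7**2) - (3.45/2) + ((565//3) * 7) % 2  # Comments can also follow code on the same line

print(result)
```

Python skips everything after each `#`, so the commented version gives exactly the same answer as the uncommented one.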

### Variables

Having a calculator is, admittedly, very mundane and provides little motivation for using any programming language (even Python). **Variables** are where programming begins to differentiate itself from calculators. Variables are, at their core, just names for values. There are very few restrictions on what kind of values you can "name" with variables - integers, decimal numbers, text: you can even create your own kind of values that you define yourself to later assign to variables (more on that later).

In Python, you use the `=` operator (known as the **assignment operator**) to name a variable. The syntax is `name = value`. The code cell below assigns the value "Joseph" to the variable `first_name`.

In [4]:
first_name = "Joseph"

A few things to note:

1. Notice the `_` in the variable name. In order to keep things consistent and not confuse readers and Python itself, Python imposes some restrictions on variable names: they cannot contain spaces, start with a number, or contain "special" characters like ^, @, or ). Generally speaking, if you name your variables with informative words separated by `_`, you won't run into this.

2. Notice the quotation marks around Joseph. Since there are so many different "things" that a variable can be, different things have their own way of being assigned to variables. If you try to remove the quotes around my name, you will receive an error message! Since variable names are references without quotations, text has to be referenced with quotations or python won't know which one you mean.

Variables are not just for show - our "calculator" math still works with them, and we can even assign new variables based on the output of our old ones! 

In [5]:
systolic_blood_pressure = 120
diastolic_blood_pressure = 80

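# Using the simplified formula adopted in this tutorial for mean blood pressure (MBP):
#   MBP = DBP + 1/3 SBP
# (Clinically, mean arterial pressure is usually estimated as DBP + 1/3 (SBP - DBP).)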
mean_blood_pressure = diastolic_blood_pressure + (1/3) * systolic_blood_pressure

That's weird... our code cell didn't output anything at all. Don't worry, nothing broke, we just didn't tell Python to display the result anywhere! Python will quietly do what we ask and won't bother us with any output unless we request it. We can request it by using the `print` function.

In [6]:
print(mean_blood_pressure)

120.0


There we go! We printed our result.

In summary, variables provide a great way of naming our objects and can be used as "stand-ins" for values.

### Functions

In the previous section, we calculated mean blood pressure from a formula. But what if our systolic and diastolic blood pressure changed? You may be tempted to re-assign the variables to the new values and re-calculate the equation - something like the below cell:

In [7]:
systolic_blood_pressure = 140
diastolic_blood_pressure = 70

# Using this tutorial's simplified formula for mean blood pressure (MBP):
#   MBP = DBP + 1/3 SBP
mean_blood_pressure = diastolic_blood_pressure + (1/3) * systolic_blood_pressure
print(mean_blood_pressure)

116.66666666666666


But what if we need to use another set of new values, and then another? It gets very tedious very quickly to copy-paste our formula over and over and over. Rather than doing that, we can create what are known as **functions** - reusable sets of instructions that Python performs on given inputs. (Note: functions that belong to an object are called *methods*, and you'll sometimes hear the two terms used interchangeably.) The syntax for defining functions is:
- `def`: tells python we are defining a function
- `function_name`: gives a name to the function
- `(input_name(s))`: local variable names for the input(s) to the function, wrapped in parentheses. Functions can have no inputs, but the parentheses are not optional.
- `:`: tells python that we are done with the "naming" part and that an *indented block* of code comes next explaining what to do.
- Function code, indented 4 spaces: instructions for python to perform with the input(s).
- Optional `return` statement: the "value" the function should give back when the calculations are done - useful for assigning variables!
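
The pieces listed above can be put together in a minimal sketch. The function names `say_hello` and `double` are hypothetical and exist only for this illustration.

```python
# A function with no inputs still needs the (empty) parentheses,
# both when it is defined and when it is called.
def say_hello():
    return "Hello!"


# A function with one input; "value" only exists inside the function.
def double(value):
    doubled = value * 2
    return doubled  # The value handed back to whoever called the function


greeting = say_hello()  # The returned value can be assigned to a variable
print(greeting)
print(double(21))
```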

See the example below, where I define the `calculate_mean_blood_pressure` function!

In [8]:
# The triple-quoted string right after the def line is a docstring: a multi-line description of the function
def calculate_mean_blood_pressure(systolic, diastolic):
    '''Calculate the mean blood pressure from systolic and diastolic blood pressure using this tutorial's simplified formula:
    MBP = DBP + 1/3 SBP
    '''
    
    mean_blood_pressure = diastolic + (1/3) * systolic
    print(f"Mean blood pressure is: {mean_blood_pressure}" )
    return mean_blood_pressure

Now, when we want to display a mean blood pressure given systolic/diastolic measurements, we can use our function instead to save ourselves the calculation lines!

**Important:** Functions only change variables within their scope - `calculate_mean_blood_pressure` does *not* change the value of `mean_blood_pressure` outside of the function. The below code block leaves the previously defined value of 116.666 unchanged!

In [9]:
calculate_mean_blood_pressure(120, 90)  # Doesn't change the value of mean_blood_pressure!
print(mean_blood_pressure)

Mean blood pressure is: 130.0
116.66666666666666


Instead we need to overwrite with an assignment operator, which will use the value from our return statement!

In [10]:
mean_blood_pressure = calculate_mean_blood_pressure(120, 90)  # Assignment does
print(mean_blood_pressure)

Mean blood pressure is: 130.0
130.0


Functions are key building blocks for all of Python, not just machine learning or artificial intelligence! Almost all code you write will in some sense be functions. 

### Loops and Iterables
Imagine we had 1000 patients, each with their own systolic and diastolic blood pressures, and we needed to calculate the mean BP for each patient. We *could* paste our function 1000 times and give new values each time but that would take *forever*. Instead we can make use of **loops**, which allow us to perform the same instructions once for each value in a collection.

#### The `for` loop
Perhaps the most basic of all loops in Python, the `for` loop allows us to perform actions with a named variable being assigned a value "for" each value in a list. Take a look at the example below!

In [11]:
systolic_bps = [120, 130, 140, 110, 135, 112, 122]  # This is how to define a list in python
diastolic_bps = [80,  70,  75,  66,  70,  62,  61]  # Spacing doesn't matter - space things to make them easier to read/pair

mean_bps = []  # Haven't calculated them yet

# range(number) outputs all whole numbers from 0 up to, but not including, number.
# len(object) returns the number of items in an object 
for i in range(len(systolic_bps)):  # i is short for index
    systolic_bp = systolic_bps[i]
    diastolic_bp = diastolic_bps[i]
    mean_bps.append(calculate_mean_blood_pressure(systolic_bp, diastolic_bp))
print(mean_bps)

Mean blood pressure is: 120.0
Mean blood pressure is: 113.33333333333333
Mean blood pressure is: 121.66666666666666
Mean blood pressure is: 102.66666666666666
Mean blood pressure is: 115.0
Mean blood pressure is: 99.33333333333333
Mean blood pressure is: 101.66666666666666
[120.0, 113.33333333333333, 121.66666666666666, 102.66666666666666, 115.0, 99.33333333333333, 101.66666666666666]
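
To see what `range` and `len` hand to the loop, here is a small sketch. The names `number_of_patients` and `indices` are made up for this illustration.

```python
systolic_bps = [120, 130, 140, 110, 135, 112, 122]

number_of_patients = len(systolic_bps)
# Wrapping range in list() lets us see every value it produces
indices = list(range(number_of_patients))

print(number_of_patients)
print(indices)
```

Note that the indices stop one short of the length: a list with 7 items has indices 0 through 6.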


`for` loops in Python select values for the loop variable by **iterating** across objects like lists. This process involves stepping through one item at a time and running the code in the loop for each item.

**Python tip:** For loops are *very* powerful tools in Python, and with great power comes great responsibility. Many beginning coders like to use loops that start `for i in range(len(variable_name))`. This is almost never the best way to do things: variables that have a length are also iterable!
```py
for i in range(len(concentrations)):
    concentration = concentrations[i]
    # Math here
```
Can be much more succinctly written as:
```py
for concentration in concentrations:
    # Math here
```

The `for` loop cell above (`In [11]`) can use built-in Python functions like `zip` (which pairs up the items of two lists) to be rewritten much more succinctly as follows:

```py
mean_bps = []
for systolic, diastolic in zip(systolic_bps, diastolic_bps):
    mean_bps.append(calculate_mean_blood_pressure(systolic, diastolic))
```
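
If it helps to see what `zip` actually produces, here is a short sketch using shortened versions of the blood pressure lists. The name `pairs` is just for illustration.

```python
systolic_bps = [120, 130, 140]
diastolic_bps = [80, 70, 75]

# zip pairs the first items together, then the second items, and so on
pairs = list(zip(systolic_bps, diastolic_bps))
print(pairs)

# Each pair can be unpacked into two loop variables at once
for systolic, diastolic in zip(systolic_bps, diastolic_bps):
    print(f"{systolic}/{diastolic}")
```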

#### The `while` loop
These loops keep running as long as a certain condition is true, and stop once it becomes false. An example is
```py
x = 0
while x < 100:
    x = x + 1
    print(x)
```
which will first check to see if x is less than 100, and if it is will add 1 and print the current value. What would you expect the output of that loop to be?

#### Bonus: List comprehension
Oftentimes we want to create lists defined by a common pattern or function call, as we did with `mean_bps` before. Python has a built-in way of performing this through **list comprehension**. By defining a small `for` loop in brackets, Python will efficiently generate a list for us. Check out the code below!

In [12]:
squares = [x**2 for x in range(20)]  # Don't forget that ** is exponentiation
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
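
As a sketch of how the earlier `mean_bps` loop might look as a list comprehension, consider the cell below. The helper `mean_bp` is hypothetical: it uses the same formula as this tutorial's `calculate_mean_blood_pressure`, just without the `print` call.

```python
def mean_bp(systolic, diastolic):
    # Same formula as this tutorial's calculate_mean_blood_pressure
    return diastolic + (1/3) * systolic


systolic_bps = [120, 130, 140, 110, 135, 112, 122]
diastolic_bps = [80, 70, 75, 66, 70, 62, 61]

# One line replaces the empty list, the for loop, and the append call
mean_bps = [mean_bp(s, d) for s, d in zip(systolic_bps, diastolic_bps)]
print(mean_bps)
```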


### Conditionals
Oftentimes in coding we want to do something if certain conditions are met - in our BP example, let's say we want to print a warning if the systolic BP is higher than the recommended 120. The way this is done is through **conditional** statements. The syntax is similar to loops (recall that after a `:` python always expects an indented code block on the next line).

In [13]:
for systolic_bp in systolic_bps:  # I hope you didn't already forget loops!
    if systolic_bp > 120:
        print(f"{systolic_bp} is higher than the recommended 120!")  # More on what this "f" means in the next section!
    # The "elif," short for else-if, lets us select more conditions from the options that don't meet the first condition
    elif systolic_bp > 110:      
        print(f"{systolic_bp} is 120 or lower, good job!")
    # "else" covers every case not already covered
    else: 
        print(f"Your systolic BP of {systolic_bp} is dangerously low!")

120 is 120 or lower, good job!
130 is higher than the recommended 120!
140 is higher than the recommended 120!
Your systolic BP of 110 is dangerously low!
135 is higher than the recommended 120!
112 is 120 or lower, good job!
122 is higher than the recommended 120!


### Typing in Python
Python *types* are the different kinds of data that Python can store - we have been using a few of them throughout this tutorial. The most common types you will find are shown in the cell below. Each type has its own built-in operations, and the same operation can behave differently (or not work at all) depending on the type. For example, numbers can be subtracted from one another, but lists and dictionaries cannot! To check a variable's type you can use the built-in function `type`.

In [14]:
# Numerical types
print(type(1))
print(type(1.0))
print(type(complex(1, 2)))

# Sequence types
print(type([1, 2, "buckle my shoe"]))
print(type(range(5)))
print(type((1, 2)))
print(type("I am a string!"))
# There are also bytes, bytearray, and memoryview, which are beyond the scope of the tutorial

# Mapping type
print(type({"a": 1}))

<class 'int'>
<class 'float'>
<class 'complex'>
<class 'list'>
<class 'range'>
<class 'tuple'>
<class 'str'>
<class 'dict'>


A few typing tips! 
- Integer types have NO values after a decimal point. Converting a decimal number with `int()` truncates it rather than rounding it, so accidentally converting to an integer will lose all decimal information.
- f-strings can be used to include values of variables in your strings! Prepend your quotation marks with f, and reference your variables in curly braces (take a look throughout the code in the tutorial for some examples!)
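
Both tips can be seen in a short sketch. The variable names here are made up for illustration.

```python
measurement = 2.9

truncated = int(measurement)    # Drops everything after the decimal point
negative_truncated = int(-2.9)  # Truncation moves toward zero, so this is -2
rounded = round(measurement)    # round() goes to the nearest whole number instead

print(truncated, negative_truncated, rounded)

# f-string: the f prefix lets us place variables inside curly braces
message = f"{measurement} truncates to {truncated} but rounds to {rounded}"
print(message)
```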

### Object Oriented Programming
 Object Oriented Programming (OOP) is one of the more nuanced aspects of modern coding. Briefly, we can define our own **classes**, or "blueprints" for a desired variable. We can then create **objects** from these blueprints that act as instances that follow our blueprints! Almost everything we use in Python is an object, especially in machine learning. Many models that we will use in this course will be objects that contain their own functions and variables defined within them. These functions and variables can be accessed with the `.` syntax. A great example is the built-in list class that contains the method `append()` to add a value to the list. 

In [15]:
one_to_five = [1, 2, 3, 4]
one_to_five.append(5)  # Using the . to access the function "append" for the class "list"
print(one_to_five)

[1, 2, 3, 4, 5]
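
To make the "blueprint" idea concrete, here is a minimal sketch of defining our own class. The `Patient` class and its `pulse_pressure` method are hypothetical names invented for this illustration, not something from the lecture.

```python
class Patient:
    """A hypothetical blueprint for storing one patient's blood pressure."""

    def __init__(self, name, systolic, diastolic):
        # Variables stored on the object are reached with the . syntax
        self.name = name
        self.systolic = systolic
        self.diastolic = diastolic

    def pulse_pressure(self):
        # Pulse pressure is the difference between systolic and diastolic
        return self.systolic - self.diastolic


patient = Patient("Patient 1", 120, 80)  # Create an object from the blueprint
print(patient.name)                      # Access a variable with .
print(patient.pulse_pressure())          # Call a method with .
```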


### Libraries and More
Finally, we can talk a bit about libraries - other Python functions and objects nicely packaged and able to be read into our own Python files seamlessly. Coding everything ourselves would take far too long and it's best to not re-invent the wheel! Python lets us bring in new libraries to the namespace using the `import` statement.

Many libraries are published in repositories for us to use; we just need to install them! Python comes packaged with its own package manager called `pip`, which we can use to install packages *from the command line*. To use pip in a Jupyter notebook, we need to prepend our command with an exclamation mark (`!`), which tells the notebook to execute it in a terminal.

In [16]:
!pip install numpy



There are a few ways to import libraries once they're installed. The simplest is `import`.

In [17]:
import math
print(math.exp(1))

2.718281828459045


If we only want a short list of functions or classes from a package, we can use `from ... import ...` to load just those into our namespace.

In [18]:
from random import randint, randrange
print(randint(1, 10))

4


If we want to use a different name for a package, we can use `import ... as ...` 

In [19]:
import numpy as np
print(np.exp(1))

2.718281828459045


If we want to load all methods/classes from a library into our namespace we can use the `from ... import *` syntax. **This is NOT recommended, and can very easily cause confusion or overlapping names from libraries that implement two different functions with the same name.** Notice how numpy and math both implement their own `exp` function.

In [20]:
from json import *
loads('{"a" : 1}')  # Method loads (short for load string) from json library

{'a': 1}
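
To see how overlapping names can bite, here is a small sketch. It uses explicit imports of `sqrt` from the standard `math` and `cmath` libraries (my choice of example), since the clash works the same way as with `*`. The name `real_only_sqrt` is made up for this illustration.

```python
from math import sqrt
real_only_sqrt = sqrt    # Keep a handle on math's version for comparison

from cmath import sqrt   # Same name, so this replaces math's sqrt in our namespace

value = sqrt(-1)         # Works only because cmath's version is now the one in use
print(value)

try:
    real_only_sqrt(-1)   # math's version cannot handle negative numbers
except ValueError as error:
    print(f"math's sqrt failed: {error}")
```

Writing `math.sqrt` or `cmath.sqrt` after a plain `import` avoids the ambiguity entirely, which is one reason the `*` form is discouraged.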

That concludes the Python basics section of the tutorial! Now we'll move into some specifics for two libraries that we should be familiar with - numpy and pandas! Let's install them and load them in with the `as` syntax that you'll be seeing throughout your coding career in Python:

In [21]:
# Note that multiple packages can be installed at once by separating them with spaces
!pip install numpy pandas 
import numpy as np
import pandas as pd



## 2. Numpy
Numpy (commonly seen as `np` in code) is a Python library that provides multi-dimensional array support and allows us to use matrix operations on these arrays. It uses C in the backend, which means numpy is often faster for array operations than pure Python.

### Numpy Arrays
The basic numpy functionality centers around the *array*, which can be created using numpy's built-in `array` function and assigned to a variable. Additionally, most numpy functions also return numpy arrays.

In [22]:
array_from_list = np.array([1, 2, 3])
array_from_range = np.arange(15)
print(array_from_list)
print(array_from_range)

[1 2 3]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]


Arrays have lots of "metadata" associated with them: the number of dimensions, the shape of the array, the type of the objects inside the array, and so on. A numpy array can only contain one type of data, though that type can be "object" to create arrays with a mix of integers/floats/strings/etc.

In [23]:
print(array_from_list.ndim)
print(array_from_range.shape)
# The reshape method changes the dimensions of an array while keeping the order of elements the same
print(array_from_range.reshape(3, 5).dtype)  

1
(15,)
int32


#### Array operations
As with integers and floats, many math operations also work on our arrays! Most operations on an array happen to each element of the array or act on the entire array at once.

In [24]:
example_array = np.arange(12).reshape(3, 4)
print(example_array)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [25]:
example_array + 2

array([[ 2,  3,  4,  5],
       [ 6,  7,  8,  9],
       [10, 11, 12, 13]])

In [26]:
example_array * 4

array([[ 0,  4,  8, 12],
       [16, 20, 24, 28],
       [32, 36, 40, 44]])

In [27]:
example_array.sum()

66
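
Operations like `sum` can also act along one dimension at a time. The sketch below uses the `axis` argument, which goes a little beyond the cells above, and the names `column_sums` and `row_sums` are made up for illustration.

```python
import numpy as np

example_array = np.arange(12).reshape(3, 4)

total = example_array.sum()              # Every element at once
column_sums = example_array.sum(axis=0)  # Collapse the rows: one sum per column
row_sums = example_array.sum(axis=1)     # Collapse the columns: one sum per row

print(total)
print(column_sums)
print(row_sums)
```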

### Indexing
We can access data inside of our arrays by specifying the index of the element(s) we want to access. We can also use slicing syntax (see below) to select multiple consecutive indices, and in a multidimensional array we can select indices for each dimension separated by commas!

In [28]:
example_array

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [29]:
example_array[2]            # 3rd row in the array

array([ 8,  9, 10, 11])

In [30]:
example_array.flatten()[4]  # Fifth element in the array (flatten makes the array 1 dimensional)

4

In [31]:
example_array[1, 3]          # Second row, fourth column

7

In [32]:
example_array[0, 1:4]       # First row, 2nd to 4th entries (the syntax includes start and excludes end)

array([1, 2, 3])
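
A few more indexing patterns are sketched below using the same 3 by 4 array. The variable names are hypothetical and only serve this illustration.

```python
import numpy as np

example_array = np.arange(12).reshape(3, 4)

second_column = example_array[:, 1]       # A bare : means "everything" along that dimension
top_left_block = example_array[0:2, 0:2]  # First two rows, first two columns
last_row = example_array[-1]              # Negative indices count from the end

print(second_column)
print(top_left_block)
print(last_row)
```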

### Copying
Some unexpected results may happen when using numpy arrays in your code. Take a look at the code below:

In [33]:
array_a = np.array([1, 2, 3])
array_b = array_a
array_a[1] = 3  # Change the item at index 1 (the second item) to 3
print(array_b)

[1 3 3]


Array B changed along with Array A! This is because the assignment `array_b = array_a` does not allocate new memory for `array_b` - it just tells Python that both names point to the same object! To avoid this, we need to use the `copy()` method. There are different types of copy, but as a rule of thumb, numpy's built-in `copy()` is usually sufficient. If you are dealing with arrays that contain complex objects inside of them, the `copy` library's `deepcopy()` function is a good choice.

In [34]:
array_a = np.array([1, 2, 3])
array_b = array_a.copy()
array_a[1] = 3 
print(array_b)

[1 2 3]
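
If you are ever unsure whether two names refer to the same array, one way to check is sketched below. The names `same_object` and `independent` are made up for illustration; `is` is plain Python and `np.shares_memory` is a numpy helper.

```python
import numpy as np

array_a = np.array([1, 2, 3])
same_object = array_a         # Just another name for the same array
independent = array_a.copy()  # A new array with its own memory

print(same_object is array_a)                  # Are these the very same object?
print(independent is array_a)
print(np.shares_memory(array_a, independent))  # Do they overlap in memory?

array_a[1] = 3  # Only the names pointing at the original array see this change
print(same_object)
print(independent)
```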


We will continue to work with numpy throughout the weeks, and this section is meant as a basic primer for the library. With few exceptions, each week's tutorial will contain some useful numpy tips for that week's lecture (or just for general knowledge). A great way to see more information is by using Google or ChatGPT for tips, or using the built-in `help` function in Python for information about specific numpy functions!

In [35]:
help(np.exp)

Help on ufunc:

exp = <ufunc 'exp'>
    exp(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])
    
    Calculate the exponential of all elements in the input array.
    
    Parameters
    ----------
    x : array_like
        Input values.
    out : ndarray, None, or tuple of ndarray and None, optional
        A location into which the result is stored. If provided, it must have
        a shape that the inputs broadcast to. If not provided or None,
        a freshly-allocated array is returned. A tuple (possible only as a
        keyword argument) must have length equal to the number of outputs.
    where : array_like, optional
        This condition is broadcast over the input. At locations where the
        condition is True, the `out` array will be set to the ufunc result.
        Elsewhere, the `out` array will retain its original value.
        Note that if an uninitialized `out` array is created via the default
        ``

## 3. Pandas
Pandas is a library built on numpy that allows for more complex data "wrangling" and easier to understand information. Pandas is traditionally imported as `pd` to save space. Pandas extends the numpy array to allow for a data-structure that labels the columns (called a DataFrame).

### Series
A Series is the smallest building block in Pandas - it acts as a labelled array. We can create one with `pd.Series`.

In [36]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a    2.137891
b    0.124201
c   -0.615638
d   -0.528526
e   -1.349534
dtype: float64

Note that each element in the array of 5 random numbers has its own lettered label. We can now access these numbers using the label instead of just the index!

In [37]:
s['a']

2.1378906642839333

In [38]:
s[['b', 'e']]  # Note that we pass a list of labels so that pandas knows we want several entries rather than one label

b    0.124201
e   -1.349534
dtype: float64

### DataFrame
The "bread and butter" of pandas is the DataFrame: a 2-Dimensional array, where each row/column is its own series. I find it easiest to think of these as "spreadsheets." There are a few ways to initialize them, but the one that is most commonly used is to initialize from a numpy array and provide column names and row names as below.

In [39]:
blood_pressure_data = np.array([systolic_bps, 
                                diastolic_bps, 
                                mean_bps]).transpose()
blood_pressure_data = pd.DataFrame(blood_pressure_data, 
                                   index=[f'Patient {num+1}' for num in range(7)],  # Don't forget our list comprehension
                                   columns=["Systolic", "Diastolic", "Mean"])
blood_pressure_data

| | Systolic | Diastolic | Mean |
|---|---|---|---|
| Patient 1 | 120.0 | 80.0 | 120.0 |
| Patient 2 | 130.0 | 70.0 | 113.333333 |
| Patient 3 | 140.0 | 75.0 | 121.666667 |
| Patient 4 | 110.0 | 66.0 | 102.666667 |
| Patient 5 | 135.0 | 70.0 | 115.0 |
| Patient 6 | 112.0 | 62.0 | 99.333333 |
| Patient 7 | 122.0 | 61.0 | 101.666667 |


You can index DataFrames by row, column, or even both!

In [40]:
blood_pressure_data['Systolic']

Patient 1    120.0
Patient 2    130.0
Patient 3    140.0
Patient 4    110.0
Patient 5    135.0
Patient 6    112.0
Patient 7    122.0
Name: Systolic, dtype: float64

In [41]:
blood_pressure_data.loc['Patient 1']  # To select a row by its label, use the .loc indexer

Systolic     120.0
Diastolic     80.0
Mean         120.0
Name: Patient 1, dtype: float64

In [42]:
blood_pressure_data['Systolic']['Patient 1']  # Column first, then row 

120.0
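
Row and column labels can also be given to `.loc` together in one step. The sketch below rebuilds a shortened, hypothetical version of the table so that it runs on its own; `single_value` and `subset` are names made up for illustration.

```python
import pandas as pd

# A shortened stand-in for the blood pressure table above
blood_pressure_data = pd.DataFrame(
    {"Systolic": [120, 130, 140], "Diastolic": [80, 70, 75]},
    index=["Patient 1", "Patient 2", "Patient 3"],
)

# .loc takes the row label first, then the column label
single_value = blood_pressure_data.loc["Patient 1", "Systolic"]
print(single_value)

# Lists of labels select several rows and/or columns at once
subset = blood_pressure_data.loc[["Patient 1", "Patient 3"], ["Diastolic"]]
print(subset)
```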

### Reading CSVs
Pandas provides a method for us to easily read in "spreadsheet" data saved as Comma Separated Values (csv) files. You can read these CSVs by using the `read_csv()` method and passing a file location. An example syntax can be found below, and we will become much more familiar with this process as the course continues. 

```py
dataframe_from_csv = pd.read_csv("~/Desktop/example_csv.csv", header=0)  # Read first line of the csv as column names
```
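
If you don't have a CSV file handy, the same function can be tried on text held in memory. This is only a sketch: the CSV contents are made up, and `io.StringIO` (from Python's standard library) is used to make a string behave like a file.

```python
import io

import pandas as pd

# Made-up CSV contents: the first line holds the column names
csv_text = "Patient,Systolic,Diastolic\nPatient 1,120,80\nPatient 2,130,70\n"

# index_col=0 uses the first column (the patient names) as the row labels
dataframe_from_csv = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)
print(dataframe_from_csv)
```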

Similarly to numpy, we will be using a lot of pandas throughout the course, and useful pandas methods will be presented in the tutorial for the week in which they first show up (so you don't forget them!).