# 2.2 - Containers, functions, arrays

### Learning goals for today
1. Inspect and understand Python data types (10 minutes)
2. Create, index, and manipulate built-in containers like lists and tuples (15 minutes)
3. Use and define functions in Python, and import specialized Python libraries (15 minutes)
4. Use NumPy to create and compute with arrays and matrices (15 minutes)
5. Use NumPy and Matplotlib to solve problems in biology (20 minutes)

---
### How to use this notebook during class
- Follow along as we go
- Use your **Cards** to indicate where you're at:
    - A **🟩Green card** means you are caught up with Max and **ready to help your classmates**
    - A **🟥Red card** means you are stuck and need help
- <span style='color:red;'>EXERCISE</span> — work on this problem by yourself, or try with a partner if you get stuck
---
### Resources and more practice:
- https://swcarpentry.github.io/python-novice-gapminder/03-types-conversion.html
- https://swcarpentry.github.io/python-novice-gapminder/instructor/04-built-in.html
- https://swcarpentry.github.io/python-novice-gapminder/instructor/06-libraries.html

## 1) Data types
Common built-in scalar types:
- `int` (integers), `float` (decimal numbers), `bool` (True/False), and `str` (text).

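Python variables are *dynamically* typed. This means that when you define a variable, Python infers the data type from the value you assign, and the same variable can later be reassigned to a value of a different type. This is convenient, but it can be dangerous because type mix-ups only show up when the code runs.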

Use `type(obj)` to check the type.
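As a minimal sketch of what this looks like in practice (the variable names and values below are made up for illustration):

```python
# Illustrative variables; the names and values are assumptions, not class data.
n_cells = 42            # int
od600 = 0.35            # float
is_alive = True         # bool
species = "E. coli"     # str

print(type(n_cells))    # <class 'int'>
print(type(od600))      # <class 'float'>
print(type(is_alive))   # <class 'bool'>
print(type(species))    # <class 'str'>

# Dynamic typing: the same name can be rebound to a value of another type.
n_cells = "42"          # now a str, not an int
print(type(n_cells))    # <class 'str'>
print(n_cells * 2)      # prints 4242 (string repetition, not multiplication)
```

The last line is an example of the "dangerous" side of dynamic typing: the code runs without an error, but it does not do arithmetic.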


### Booleans and comparisons
Comparison operators: `==`, `!=`, `<`, `<=`, `>`, `>=`.
Logical operators: `and`, `or`, `not`.
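
A short sketch of comparisons and logical operators, using made-up measurement values:

```python
# Hypothetical measurements, chosen only for illustration.
temperature = 37.0   # degrees Celsius
ph = 7.2

# Comparisons evaluate to a bool.
print(temperature == 37.0)   # True
print(temperature != 37.0)   # False
print(ph < 7.0)              # False

# Logical operators combine bools.
in_range = (temperature >= 35.0) and (temperature <= 39.0)
print(in_range)                  # True
print(ph < 6.5 or ph > 7.5)      # False
print(not in_range)              # False
```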


## 2) Containers

A **container** is a data structure that stores a collection of objects.

There are a few examples of containers, including:
- **string**: A container full of characters like "hello"
- **list**: A container full of whatever, separated by commas, like: [1, 2, 3.5, 50]
- **tuple**: A lot like a list, but it is _immutable_ (it cannot be changed after it is created)

What can you do with a container? Here are two things:
1. Index to retrieve or change values
2. Math!

Let's make a list and see what both of these look like:
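
Here is one possible sketch, reusing the example list from above. It assumes that "math" on containers refers to operators like `+` and `*`; the tuple `point` is a made-up example.

```python
values = [1, 2, 3.5, 50]

# 1. Indexing: positions start at 0, and negative indices count from the end.
print(values[0])      # 1
print(values[-1])     # 50
print(values[1:3])    # [2, 3.5]  (a slice: index 1 up to, but not including, 3)

# Lists are mutable, so indexing can also change a value.
values[0] = 10
print(values)         # [10, 2, 3.5, 50]

# 2. Math: + joins containers and * repeats them.
print(values + [7, 8])   # [10, 2, 3.5, 50, 7, 8]
print([0, 1] * 3)        # [0, 1, 0, 1, 0, 1]
print(len(values))       # 4

# Tuples can be indexed the same way, but they cannot be changed.
point = (3, 4)
try:
    point[0] = 10
except TypeError as err:
    print("Tuples are immutable:", err)
```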

### <span style='color:red;'>EXERCISE 1</span>: LIST MATH (5 min)
Make a list of numbers and then write some code to compute the average of the values and print it

## 3) Functions & libraries
When you install Python, it includes a **built-in** set of **functions** and **libraries**
- A **function** is a bit of code that takes an input, does something with it, and returns an output
- A **library** is a collection of code that can dramatically extend the functionality of Python

You call a function by writing its name followed by parentheses `()` that contain the **argument(s)**

You already know (at least) one Python function from last time: print()

### Finding help for a function
1. Use the `help()` function
2. Google it!
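
For instance, a quick sketch using the built-in `round()` function (chosen here only as an example):

```python
# help() prints the documentation for a function.
help(round)

# After reading it, we know round() takes a number and an optional number of digits.
rounded = round(3.14159, 2)
print(rounded)   # 3.14
```

In a Jupyter notebook you can also type `round?` in a cell to see the same documentation.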

### Making your own function

You can define your own functions! Let's see how that works...
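
A minimal sketch of defining and calling a function. The function `gc_content` is a hypothetical example chosen for this sketch, not necessarily the one written in class.

```python
# Hypothetical example function: fraction of G and C bases in a DNA sequence.
def gc_content(sequence):
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)

# Call the function by passing an argument in parentheses.
result = gc_content("ATGC")
print(result)   # 0.5
```

The `def` keyword starts the definition, the indented lines are the body, and `return` sends the output back to the caller.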

OK now let's use our new function

OK cool our function works, but let's imagine someone else using it, or you using it weeks from now...


Good functions help the user understand how to use them, and prevent users from misusing them. Let's improve our function with some **documentation** and some **input validation**
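
One way this could look, sketched with a hypothetical `gc_content` function (an assumption for illustration). The docstring is the **documentation**, and the `raise` statements are the **input validation**.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence.

    Parameters
    ----------
    sequence : str
        DNA sequence made of the letters A, C, G, and T (any case).

    Returns
    -------
    float
        Fraction of bases that are G or C, between 0 and 1.
    """
    # Input validation: fail early with a clear message.
    if not isinstance(sequence, str):
        raise TypeError("sequence must be a string")
    if len(sequence) == 0:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    if set(sequence) - set("ACGT"):
        raise ValueError("sequence may only contain A, C, G, and T")

    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)

print(gc_content("atgc"))   # 0.5
help(gc_content)            # shows the docstring written above
```

Calling `gc_content(1234)` now raises a `TypeError` with a readable message instead of failing somewhere inside the function.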

## Libraries
Python doesn't have that many **built-in** **functions** - they are listed here:
https://docs.python.org/3/library/functions.html

If this were all we were limited to, Python would not be very useful for the sciences

Python is expandable because of **libraries**
To use a **library** you need to first **import** it

### Importing NumPy and Matplotlib
Let's import two community libraries:
- **numpy** is for numerical computing
- **Matplotlib** is for plotting

These libraries must be installed before you can use them. If you are using the full Anaconda distribution, you should already have **numpy** and **matplotlib**. If not, you may need to install them

We'll import numpy and the plotting module of matplotlib, called pyplot, with a shortcut variable to refer to them...
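
The conventional aliases are `np` and `plt`. A minimal sketch:

```python
import numpy as np                 # np is the conventional alias for numpy
import matplotlib.pyplot as plt    # plt is the conventional alias for pyplot

# Functions from a library are accessed with the alias and a dot.
print(np.__version__)   # the installed NumPy version
print(np.sqrt(16))      # 4.0
```

If the import fails with `ModuleNotFoundError`, the library is not installed in the current environment.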

## 4) NumPy and Matplotlib

### Using arrays in Numpy
One of the big advantages of NumPy is that it provides a data type called an array, which can be 1D (like a vector), 2D (like a matrix), or n-dimensional

Let's start by making a 1D array...
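
A minimal sketch of creating a 1D array. The name `counts` matches the exercise below, but the values are made up for illustration.

```python
import numpy as np

# Build an array from a list; the values are invented example data.
counts = np.array([12, 15, 19, 22, 18, 30])

print(counts)          # [12 15 19 22 18 30]
print(type(counts))    # <class 'numpy.ndarray'>
print(counts.shape)    # (6,)  one dimension with 6 elements
print(counts.dtype)    # the type shared by every element (an integer type here)
```

Unlike a list, every element of an array has the same data type.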

Indexing into NumPy arrays works much like indexing into other containers, such as lists
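
A short sketch, again with a made-up `counts` array:

```python
import numpy as np

counts = np.array([12, 15, 19, 22, 18, 30])   # invented example data

print(counts[0])      # 12
print(counts[-1])     # 30
print(counts[1:4])    # [15 19 22]

# Arrays are mutable, like lists.
counts[0] = 10

# Unlike lists, arrays can also be indexed with a boolean condition.
print(counts[counts > 18])   # [19 22 30]
```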

NumPy gives us access to tools for computations on our arrays
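
A few examples, sketched with a made-up `counts` array:

```python
import numpy as np

counts = np.array([12, 15, 19, 22, 18, 30])   # invented example data

# Arithmetic is applied to every element (compare with list * 2, which repeats the list).
print(counts * 2)     # [24 30 38 44 36 60]
print(counts + 1)     # [13 16 20 23 19 31]

# Summary functions reduce the array to a single number.
total = np.sum(counts)
print(total)              # 116
print(np.mean(counts))    # the average of the values
print(np.max(counts))     # 30

# Many math functions also work element by element.
print(np.sqrt(counts))
```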

## <span style='color:red;'>EXERCISE 2</span>: LEARN NUMPY FUNCTIONS ON YOUR OWN (5 min)
Use the wonders of the internet, or your intuition, to figure out how to compute the **median** and **variance** of _counts_ using numpy functions

## Matrices (ndarrays)
Numpy also allows you to create matrices / n-dimensional arrays.
Let's create a matrix to see how they work...
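
A minimal sketch, assuming a small matrix with 3 rows and 4 columns (the shape and values are chosen for illustration):

```python
import numpy as np

# A list of lists becomes a 2D array: each inner list is one row.
matrix = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

print(matrix.shape)    # (3, 4)  3 rows, 4 columns
print(matrix.ndim)     # 2

# Index with [row, column].
print(matrix[0, 1])    # 2
print(matrix[1, :])    # [5 6 7 8]  the whole second row
print(matrix[:, 0])    # [1 5 9]    the whole first column
```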

## <span style='color:red;'>EXERCISE 3</span>: LEARN NUMPY FUNCTIONS ON YOUR OWN PT 2 (5 min)
Look up the numpy.mean() documentation to figure out how to compute the mean...
- across all rows (should be an array of length 3)
- across all columns (should be an array of length 4) 
- across the whole matrix (should be a single, scalar number)

https://numpy.org/doc/2.3/reference/generated/numpy.mean.html

## Random numbers in numpy
NumPy has functions for generating pseudo-random numbers. Let's try one by drawing random numbers from a uniform distribution
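
A sketch using NumPy's `Generator` interface (`np.random.default_rng`). This is an assumption about which interface to use; the older `np.random.uniform` function also exists. The seed and range are arbitrary choices.

```python
import numpy as np

# Create a random number generator; a fixed seed makes the draws repeatable.
rng = np.random.default_rng(seed=0)

# Draw 1000 numbers uniformly from the interval [0, 1).
samples = rng.uniform(low=0.0, high=1.0, size=1000)

print(samples[:5])                   # the first five draws
print(samples.min(), samples.max())  # both fall between 0 and 1
```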

Then we'll use **pyplot** to plot a histogram. Note that there are lots of ways to create figure and axes objects with pyplot, but I will teach you what I think is the most explicit (and best) way...
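
A sketch of the explicit approach, assuming it refers to creating the figure and axes with `plt.subplots()` and then calling methods on the axes object. The sample data, bin count, and labels are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
samples = rng.uniform(low=0.0, high=1.0, size=1000)

# Create the figure and axes explicitly, then draw on the axes.
fig, ax = plt.subplots(figsize=(5, 3))
bin_counts, bin_edges, patches = ax.hist(samples, bins=20)

ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.set_title("1000 draws from a uniform distribution")
# In a notebook the figure appears automatically; in a script, call plt.show().
```

`ax.hist` also returns the count in each bin and the bin edges, which is handy if you want the numbers behind the plot.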

## <span style='color:red;'>EXERCISE 4</span>: LEARN NUMPY FUNCTIONS ON YOUR OWN PT 3 (5 min)
Look up the documentation for numpy.random distributions to see what random distributions numpy can draw from. Pick a distribution and some parameters, and then plot a histogram of 1000 draws from that distribution

https://numpy.org/doc/2.1/reference/random/generator.html#distributions

## BONUS: Modeling bacterial growth

You are growing E. coli in a flask, and you measure optical density of the bacterial suspension using a spectrophotometer every 30 minutes for 5 hours.
The optical density is directly related to the number of bacteria in the suspension.

```python
import numpy as np

time_points = np.array([0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5])  # hours
optical_density = np.array([30, 45, 92, 198, 404, 838, 1705, 3120, 3780, 4100, 4400])
```

Let's plot the data using **pyplot**
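
One possible sketch of the plot (marker style and axis labels are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

time_points = np.array([0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5])  # hours
optical_density = np.array([30, 45, 92, 198, 404, 838, 1705, 3120, 3780, 4100, 4400])

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(time_points, optical_density, marker="o")
ax.set_xlabel("Time (hours)")
ax.set_ylabel("Optical density")
ax.set_title("E. coli growth curve")
# In a notebook the figure appears automatically; in a script, call plt.show().
```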


Part of this growth curve is the **exponential phase**, where the number of bacteria doubles at regular intervals. This interval is called the doubling time

**N(t) = N_0 * 2^(t/T_d)**
- N(t) is the number of bacteria as a function of time, t
- N_0 is the initial number of bacteria
- T_d is the doubling time


Let's transform our optical density by log2. Taking log2 of both sides of the growth equation gives

**log2(N(t)) = log2(N_0) + t/T_d**

which is linear, of the form y = mx + b, where x = t, the **slope m = 1/T_d**, and the intercept b = log2(N_0)
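
A sketch of the transform using `np.log2`, with the data from above repeated so the block runs on its own:

```python
import numpy as np
import matplotlib.pyplot as plt

time_points = np.array([0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5])  # hours
optical_density = np.array([30, 45, 92, 198, 404, 838, 1705, 3120, 3780, 4100, 4400])

# log2 is applied to every element of the array.
log2_od = np.log2(optical_density)

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(time_points, log2_od, marker="o")
ax.set_xlabel("Time (hours)")
ax.set_ylabel("log2(optical density)")
```

On this scale, the exponential phase should show up as the roughly straight part of the curve.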


Now we're going to fit a line to our log-transformed data to estimate the doubling time. First note that the exponential phase is between about 0.5 and 3 hours...
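
A sketch of one way to do the fit, assuming a straight-line least-squares fit with `np.polyfit` (other fitting tools would also work). The 0.5 to 3 hour window comes from the text above; the variable names are illustrative.

```python
import numpy as np

time_points = np.array([0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5])  # hours
optical_density = np.array([30, 45, 92, 198, 404, 838, 1705, 3120, 3780, 4100, 4400])

# Keep only the points in the exponential phase (0.5 to 3 hours).
in_exp_phase = (time_points >= 0.5) & (time_points <= 3)
t_exp = time_points[in_exp_phase]
log2_od_exp = np.log2(optical_density[in_exp_phase])

# Fit a degree-1 polynomial (a line); polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(t_exp, log2_od_exp, deg=1)

# slope = 1/T_d, so the doubling time is the reciprocal of the slope.
doubling_time_hours = 1 / slope
print(f"Slope: {slope:.2f} doublings per hour")
print(f"Doubling time: {doubling_time_hours * 60:.0f} minutes")
```

Because the intercept is log2(N_0), `2 ** intercept` gives the fitted starting optical density at t = 0.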