# Scientific Python Bootcamp Day 3

Prepared and presented by John Russell (johnrussell@g.harvard.edu) and Ian Hunt-Isaak (ianhuntisaak@g.harvard.edu)

### Getting Started
http://tinyurl.com/9ddaf9gy

### Topics for Today

- Review of jupyter lab
- Random numbers and random walks 
- Introduction to the scientific python ecosystem
- Local installation of jupyter lab

### Jupyter Lab Review

**The Essentials**
- `Shift + Enter` executes a cell
- `Shift + Tab` shows the documentation of a function
- `Tab` will attempt to auto-complete the word you are typing

**Cell Operations**
- There are two modes in a jupyter notebook: *Editing* mode is where you are editing text in a cell. *Command* mode is when you are outside of a cell. `Esc` while in a cell switches to command mode. `Enter` will select a cell and enter editing mode there if you are in command mode.
- `Esc + a` makes a new cell *above* your current position 
- `Esc + b` makes a new cell *below* your current position
- `Esc + m` converts a cell into a markdown cell
- `Esc + y` converts a cell into a code cell.
- `Esc + d + d` deletes a cell
- `Esc + i + i` interrupts the execution of a cell
- If you really get into it, you can make custom keyboard shortcuts in `Settings > Advanced Settings > Keyboard Shortcuts`

There are also jupyter lab extensions that can really improve the experience of using jupyter lab. A very good list of these extensions is called [Awesome Jupyter](https://github.com/markusschanta/awesome-jupyter).

In [1]:
import numpy as np
import matplotlib.pyplot as plt

In [1]:
#this cell changes some matplotlib defaults to make plots nicer 
import matplotlib as mpl
mpl.rc("font", family='serif')
mpl.rc("figure", figsize=(9,6))
%config InlineBackend.figure_format = 'retina'

## Random numbers and random walks

Random number generation is a surprisingly tricky thing to do on a computer, which we generally think of as highly non-random. Strictly speaking, we will be talking about *pseudo*random number generation, since a deterministic algorithm cannot generate truly random numbers. However, it is important enough that a lot of work has gone into doing it well, and many of the best implementations live in the `numpy.random` module.

*Note*: The `numpy.random` module was changed significantly in summer of 2019 so what you'll see today is the modern usage. For compatibility reasons, numpy still supports the old way and you may well come across older code which will look slightly different.
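
To make the difference concrete, here is a small sketch of the two styles side by side. The seed value 42 is an arbitrary choice for illustration. The two interfaces use different underlying algorithms by default, so the same seed should not be expected to give the same numbers.

```python
import numpy as np

# Legacy interface: functions on the module share one hidden global state
np.random.seed(42)
legacy_samples = np.random.normal(loc=0.0, scale=1.0, size=3)

# Modern interface: an explicit Generator object carries its own state
rng = np.random.default_rng(42)
modern_samples = rng.normal(loc=0.0, scale=1.0, size=3)

print(legacy_samples)
print(modern_samples)
```

If you see `np.random.seed` or `np.random.normal` in older code, it is doing the same job as the `Generator` methods used in the rest of this notebook.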

The core of the random module is a `Generator` object. The easiest and most likely best way to initialize one is

In [None]:
rng = np.random.default_rng() #rng stands for Random Number Generator

The `Generator` object can then generate numbers from a vast array of different distributions. You can learn about these in a statistics class but I'll show a few examples.
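
As a rough sketch of what such sampling can look like, the cell below draws from a normal and a gamma distribution and picks random items from a list. The seed, the sample sizes, the gamma parameters, and the weights passed to `p` are all arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed only so the example is repeatable

# 1000 samples from a normal distribution with mean 0 and standard deviation 1
normal_samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# 1000 samples from a gamma distribution (shape and scale picked arbitrarily)
gamma_samples = rng.gamma(shape=2.0, scale=1.5, size=1000)

# choose uniformly from a list, then choose 5 times with unequal probabilities
pets = ['cat', 'dog', 'fish', 'rabbit']
random_pet = rng.choice(pets)
weighted_pets = rng.choice(pets, size=5, p=[0.1, 0.2, 0.3, 0.4])

print(normal_samples.mean(), normal_samples.std())
print(gamma_samples.mean())
print(random_pet, weighted_pets)
```

Typing `rng.` and pressing `Tab` in a cell lists every distribution the `Generator` offers.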

In [None]:
#sample normal distribution

In [None]:
#show every distribution, sample gamma

In [None]:
#Randomly choose from pets
pets = ['cat', 'dog', 'fish', 'rabbit']

#### Random Walks

Random walks are a *very* powerful and widely used model in basically every area of science. One of the great things about random walks is that they are very easy to simulate and often analytically tractable, though the math is well beyond the scope of this bootcamp. What is a random walk?

Here is the idea: a walker starts at some point and at each time point takes a "random step." There are many ways to define a random step, but let's focus on the simplest case in 1 dimension.
- A walker starts at 0 on the number line.
- The walker flips a coin.
- If the coin comes up heads, take a step to the right (+1)
- If the coin comes up tails take a step to the left (-1)
- Repeat this process for many time steps i.e. coin flips.

In [None]:
steps = rng.choice([-1,1], size=1000)

In [None]:
position = np.cumsum(steps)

In [None]:
plt.plot(position)
plt.title("A Random Walk")
plt.xlabel('Time')
plt.ylabel('Position')
plt.show()

#### Compiling statistics

Often the idea with simulating random walkers is that we simulate many of them and then calculate statistics as a function of time. Said slightly differently, we often average over the walkers rather than averaging over time.
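
One possible way to set this up, sketched here with time along the first axis and walkers along the second (the variable names `n_steps` and `n_walkers` are just illustrative):

```python
import numpy as np

rng = np.random.default_rng()

n_steps, n_walkers = 1000, 500

# each column is one walker, each row is one time step
steps = rng.choice([-1, 1], size=(n_steps, n_walkers))

# cumulative sum along the time axis turns steps into positions
positions = np.cumsum(steps, axis=0)

print(positions.shape)  # (1000, 500)
```

Passing `axis=0` matters here: without it, `np.cumsum` flattens the array and sums over all walkers at once.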

In [None]:
# generate 1000 time steps for 500 walkers


In [None]:
#convert steps to positions


In [None]:
# plt.plot(positions[:,:10])
# plt.title("10 Random Walks")
# plt.xlabel('Time')
# plt.ylabel('Position')
# plt.show()

In [None]:
# positions.shape

Remember that we have 500 walkers and 1000 time steps so the first dimension in this array is time and the second dimension is the walkers.
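
With that layout, averaging over walkers means reducing along `axis=1`. The sketch below rebuilds the walks so it runs on its own, then compares the standard deviation at the final time with $\sqrt{t}$, which is what the unbiased walk is expected to follow. The numbers will differ from run to run.

```python
import numpy as np

rng = np.random.default_rng()

# 1000 time steps (rows) for 500 walkers (columns)
positions = np.cumsum(rng.choice([-1, 1], size=(1000, 500)), axis=0)

# reduce over the walker axis to get one value per time step
mean = positions.mean(axis=1)
std = positions.std(axis=1)

# number of steps taken at each row: 1, 2, ..., 1000
t = np.arange(1, positions.shape[0] + 1)

print(mean.shape, std.shape)
print(std[-1], np.sqrt(t[-1]))
```

Plotting `std` and `np.sqrt(t)` against `t` on the same axes gives a visual version of the same comparison.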

In [None]:
# mean = positions.mean(1)
# std = positions.std(1)

In [None]:
# plt.plot(std, label="Std. Dev.")
# plt.plot(mean, label='Mean')
# # fit standard deviation
# plt.legend()
# plt.show()

#### Other questions we can ask

With this set of random walker trajectories we can ask other questions beyond just calculating simple statistics. For instance, roughly what fraction of walkers only walk in the positive part of the number line?

*Note* With only 500 walkers we don't really have enough to estimate complex quantities like this. Generally you might simulate as many as $10^9$ walkers, but things do start to get slower at that point.
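
A sketch of one way to answer this with boolean arrays rather than loops. It regenerates the walks so it is self-contained, and it treats "positive only" as strictly greater than zero at every time step, which is an assumption about what the question means.

```python
import numpy as np

rng = np.random.default_rng()

# 1000 time steps (rows) for 500 walkers (columns)
positions = np.cumsum(rng.choice([-1, 1], size=(1000, 500)), axis=0)

# one boolean per walker: True if the position is > 0 at every time step
always_positive = np.all(positions > 0, axis=0)

# count them and convert to a fraction of all walkers
n_positive = np.count_nonzero(always_positive)
fraction = n_positive / positions.shape[1]

# boolean indexing on the walker axis selects just those trajectories
positive_trajectories = positions[:, always_positive]

print(n_positive, fraction, positive_trajectories.shape)
```

With this few walkers the count is small and can occasionally be zero, so rerun the cell or increase the number of walkers before trusting the fraction.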

In [None]:
#identify positive only trajectories

In [None]:
#count them

In [None]:
#select the trajectories

In [None]:
#plot 

### Exercise 2

*Note* Since we are generating random numbers your individual results may be different. 


#### Part a.

Simulate 100 random walkers each taking 1000 steps as above, but rather than a "coin flip" to determine the step, have these walkers take a step to the right (+1) with probability 0.65 and a step to the left (-1) with probability 0.35. This is often called a biased random walk. Make a plot showing the trajectories of the walkers which ended up farthest from the origin and closest to the origin.

*Hint* Read the documentation of `rng.choice`

#### Part b.

- Compute the mean and standard deviation of these walkers as a function of time. 
- Plot the mean and standard deviation as a function of time on the same axes. 
- Plot $\sqrt{t}$ on the same axes, as above. Does it still seem to describe the standard deviation as a function of time?
- **Optional** Can you come up with a function that describes the mean as a function of time? Plot this function as well. *Hint* How do you think the average depends on the probability of going right? Does your formula give the correct result from the demo above when $p=0.5$?

So the standard deviation is about the same: it grows like $\sqrt{t}$. The average position grows linearly in time, proportional to the difference between the probability of going right and the probability of going left. It also follows from this formula that if $p_R = p_L = 0.5$ the average position is constant at 0.
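
For reference, here is a short derivation of both statements, assuming the steps $s_i = \pm 1$ are independent with $p_R = p$ and $p_L = 1 - p$. For a single step,

$$ \langle s_i \rangle = p_R - p_L = 2p - 1, \qquad \mathrm{Var}(s_i) = 1 - (2p-1)^2 = 4p(1-p).$$

Means and variances of independent steps add, so after $t$ steps the position $x(t)$ satisfies

$$ \langle x(t) \rangle = (2p-1)\,t, \qquad \sigma(t) = 2\sqrt{p(1-p)\,t}.$$

With $p = 0.65$ this gives $\langle x(t) \rangle = 0.3\,t$ and $\sigma(t) \approx 0.95\sqrt{t}$, and with $p = 0.5$ it reduces to $\langle x(t) \rangle = 0$ and $\sigma(t) = \sqrt{t}$.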

#### Part c. 

Rather than just walking up and down the number line, let's see what happens when the walk happens in two dimensions. Simulate 100 walkers each taking 1000 steps in the XY plane. Generate a 2D step by taking 2 independent samples from a standard normal distribution (mean=0, standard deviation=1). Plot 10 walks *in the XY plane*.

#### Part d.

Find the walkers which end up the farthest from the origin and the closest. Plot these two trajectories in the XY plane.

*Hint* Given a point $(x,y)$ how do you compute the distance from the origin? Can you use numpy to compute the distance for all the walkers at all the time points without any loops?
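
If you get stuck, here is one possible sketch of the loop-free distance calculation. The array layout (time, walker, coordinate) is just one choice among several, and the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
n_steps, n_walkers = 1000, 100

# steps has shape (time, walker, coordinate); the last axis holds (x, y)
steps = rng.normal(loc=0.0, scale=1.0, size=(n_steps, n_walkers, 2))
positions = np.cumsum(steps, axis=0)

# distance from the origin for every walker at every time, with no loops
distance = np.sqrt(positions[..., 0]**2 + positions[..., 1]**2)

# indices of the walkers that finish farthest from and closest to the origin
farthest = np.argmax(distance[-1])
closest = np.argmin(distance[-1])

print(distance.shape, farthest, closest)
```

The trajectory of a single walker is then `positions[:, farthest, 0]` against `positions[:, farthest, 1]`.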

**Optional** Try to make this cool plot from the bootcamp flyer. Plot all the walkers in black and use the keyword `alpha=0.5` in your call to `plt.plot`. Then plot the closest and farthest walkers in red and orange respectively.

<img src = "../day2/figures/2d_walk.png" width=400px>

#### Part e. 

Plot the trajectories of any walkers who remain in the first quadrant for their entire trajectory (i.e. $x(t)>0$ and $y(t)>0$ for all times $t$). This happens with probability ~0.05\%, so you will probably want to simulate more walkers (~$10^5$) in order to find some who meet this criterion.

### Introduction to Scipy

As we have seen over the past few days, numpy is a highly performant array library that also contains some functions for simple math. How do we do more interesting or specialized things? People have written all sorts of libraries that use numpy arrays to do fancy things. The first layer of added complexity here is called Scipy.

[Scipy Docs](https://docs.scipy.org/doc/scipy/reference/)

#### Solving differential equations

Generally the problem can be written as 

$$ \dfrac{dy}{dt} = f(t,y).$$

In words, we're given the derivative of a function and we want to find the function itself. One other key point is that we need to be given an *initial value* $y(0)$. Also note that $y$ can be a vector, in which case $f(t,y)$ returns the derivative of each component.
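
As a warm-up before anything complicated, here is a minimal sketch using `scipy.integrate.solve_ivp` on a problem with a known answer: $dy/dt = -y$ with $y(0) = 1$, whose exact solution is $e^{-t}$. The function name `decay` and the time range are arbitrary choices for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

def decay(t, y):
    # right-hand side f(t, y) of dy/dt = -y
    return -y

# integrate from t=0 to t=5 starting at y(0)=1, reporting at 50 evenly spaced times
sol = solve_ivp(decay, t_span=(0, 5), y0=[1.0], t_eval=np.linspace(0, 5, 50))

# sol.t holds the times, sol.y has one row per component of y
max_error = np.max(np.abs(sol.y[0] - np.exp(-sol.t)))
print(sol.success, max_error)
```

Note that `y0` is a list even for a single equation, and `sol.y` is two dimensional for the same reason: the solver always treats $y$ as a vector.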

One very cool differential equation is the [Lorenz system](https://en.wikipedia.org/wiki/Lorenz_system), which is represented by the following equations:


$$ \frac{dx}{dt} = \sigma(y-x)$$
$$ \frac{dy}{dt} = x(\rho-z) - y$$
$$ \frac{dz}{dt} = xy- \beta z.$$


"scipy solve differential equation"

In [5]:
from scipy.integrate import solve_ivp

In [8]:
def lorenz(t, r, rho, sigma, beta):
    x,y,z = r
    dxdt = sigma*(y-x)
    dydt = x*(rho-z) - y
    dzdt = x*y-beta*z
    return np.array([dxdt, dydt, dzdt])

lorenz_init = np.array([1,1,1])

In [28]:
#use ρ = 28, σ = 10, and β = 8/3
lorenz_sol = None

In [16]:
t_eval = np.linspace(0, 100, 10000)
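
Putting the pieces together, one possible way to fill in the solve step is sketched below. It repeats the definitions so it runs on its own, and it assumes a SciPy version recent enough to support the `args` keyword of `solve_ivp` (1.4 or later). The extra parameters are passed in the order of the function signature: `rho`, `sigma`, `beta`.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, r, rho, sigma, beta):
    # r is the state vector (x, y, z); return its time derivative
    x, y, z = r
    return np.array([sigma*(y - x), x*(rho - z) - y, x*y - beta*z])

lorenz_init = np.array([1.0, 1.0, 1.0])
t_eval = np.linspace(0, 100, 10000)

# rho = 28, sigma = 10, beta = 8/3, matching the order in lorenz's signature
lorenz_sol = solve_ivp(lorenz, (0, 100), lorenz_init,
                       t_eval=t_eval, args=(28, 10, 8/3))

# one row per variable, one column per entry of t_eval
x, y, z = lorenz_sol.y
print(lorenz_sol.success, lorenz_sol.y.shape)
```

From here `plt.plot(x, y)` and `plt.plot(y, z)` give the two projections asked for in the next cells.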

In [26]:
#plot x y

In [27]:
#plot y z

In [29]:
# This is beyond the scope of this bootcamp but just to show you that it's possible
# I google "matplotlib 3d plot example" every single time I do this
# from mpl_toolkits.mplot3d import Axes3D
# fig = plt.figure()
# ax = fig.add_subplot(111, projection='3d')
# fig.suptitle(r"Lorenz Attractor", fontsize=20)
# ax.plot(r[0], r[1], r[2], linewidth=0.5,c='xkcd:purple')
# ax.set_xlabel('X')
# ax.set_ylabel('Y')
# ax.set_zlabel('Z')
# plt.show()