### Numpy basics

1\. Find the row, column and overall means for the following matrix:

```python
m = np.arange(12).reshape((3,4))
```

In [None]:
import numpy as np

m = np.arange(12).reshape((3,4))
print(m)

row_means = [np.average(m[i,:]) for i in range(m.shape[0])]
col_means = [np.average(m[:,i]) for i in range(m.shape[1])]
mean = np.average(m)

print("Row means:", row_means)
print("col_means:", col_means)
print("tot_mean:", mean)

2\. Find the outer product of the following two vectors:

```python
u = np.array([1,3,5,7])
v = np.array([2,4,6,8])
```

Do this in the following ways:

   * Using the function outer in numpy
   * Using a nested for loop or list comprehension
   * Using numpy broadcasting operations


In [None]:
u = np.array([1,3,5,7])
v = np.array([2,4,6,8])

print("Outer product from numpy:", np.outer(u,v), sep='\n')

outer = [ [u[i]*v[j] for j in range(v.shape[0])] for i in range(u.shape[0]) ]
print("Outer product by list compr:", np.array(outer), sep='\n')

print("Outer product with broadcast:", u.reshape(-1,1)*v.reshape(1,-1), sep='\n')

3\. Create a 10 by 6 matrix of random uniform numbers. Set all rows with any entry less than 0.1 to be zero

Hint: Use the following numpy functions - np.random.random, np.any as well as Boolean indexing and the axis argument.

In [None]:

a = np.random.rand(10, 6)
print("Original matrix:", a, sep='\n')

mask = ( a < 0.1 )
print("Rows to change:", a[np.any(mask, axis=1)], sep='\n')

a[np.any(mask, axis=1)] = np.zeros(6)
print("Modified matrix:", a, sep='\n')

4\. Use np.linspace to create an array of 100 numbers between 0 and 2π (inclusive).

  * Extract every 10th element using slice notation
  * Reverse the array using slice notation
  * Extract elements where the absolute difference between the sine and cosine functions evaluated at that element is less than 0.1
  * Make a plot showing the sin and cos functions and indicate where they are close

In [None]:
a = np.linspace(0, 2*np.pi, 100)
print("Original:", a, sep='\n')
tenth = a[::10]
print("Every 10:", tenth, sep='\n')
reverse = a[::-1]
print("Reverse:", reverse, sep='\n')
mask = (abs(np.sin(a)-np.cos(a)) < 0.1)
print("Less than 0.1:", a[mask], sep='\n' )

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(a, np.sin(a))
ax.plot(a, np.cos(a))
ax.scatter(a[mask], np.sin(a[mask]))
ax.scatter(a[mask], np.cos(a[mask]))

5\. Create a matrix that shows the 10 by 10 multiplication table.

 * Find the trace of the matrix
 * Extract the anti-diagonal (this should be ```array([10, 18, 24, 28, 30, 30, 28, 24, 18, 10])```)
 * Extract the diagonal offset by 1 upwards (this should be ```array([ 2,  6, 12, 20, 30, 42, 56, 72, 90])```)

In [None]:
mult = np.array([x*y for x in range(1,11) for y in range(1,11)]).reshape(10,10)
print("Table:", mult, sep='\n')
print("Trace:", np.trace(mult))
print("anto-diagonal:", np.diagonal(mult[::-1,:]))
print("diagonal offset:", np.diagonal(mult, offset=1))

6\. Use broadcasting to create a grid of distances

Route 66 crosses the following cities in the US: Chicago, Springfield, Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff, Los Angeles.
The corresponding positions in miles are: 0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448.

  * Construct a 2D grid of distances among each city along Route 66
  * Convert the distances to km (those savages...)

In [None]:
pos = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448]).reshape(1,-1)
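# broadcasting (1,10) against (10,1) gives the (10,10) grid; abs makes the distances non-negative
dist = np.abs(pos - pos.T)
print("Distances in miles :(", dist, sep='\n')
# 1 mile = 1.609344 km
print("Distances in km :)", np.around(dist*1.609344), sep='\n')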

7\. Prime numbers sieve: compute the prime numbers in the 0-N (N=99 to start with) range with a sieve (mask).
  * Construct a shape (100,) boolean array, the mask
  * Identify the multiples of each number starting from 2 and set the corresponding mask elements accordingly
  * Apply the mask to obtain an array of ordered prime numbers
  * Check the performance (timeit); how does it scale with N?
  * Implement the optimization suggested in the [sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)

In [None]:
import timeit
def prime(N):
    numbers = np.arange(N)
    mask = np.empty(numbers.shape[0], dtype=bool)
    for i in range(mask.shape[0]):
        if np.any(i % numbers[2:i] == 0):
            mask[i] = False
        else:
            mask[i] = True
    #print("Prime numbers from 0 to", numbers.shape[0], ':')
    #print(numbers[mask])
    mask[0], mask[1]= False, False
    return numbers[mask]

def erat(N):
    numbers = np.arange(2,N)
    mask = np.ones(numbers.shape[0], dtype=bool)
    i=2
    while(np.any(mask[2*i-2::i])):
        mask[2*i-2::i] = False
        filtered = numbers[mask]
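        # move to the next surviving number after i (scalar index avoids converting a 2D array to int)
        i = int(filtered[np.flatnonzero(filtered == i)[0] + 1])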
    return numbers[mask]
print(prime(100))
print(erat(100))

Ns = np.arange(10, 1001, 10)  # integer sizes 10, 20, ..., 1000 (linspace would yield floats)
y_p = [ timeit.timeit('prime('+str(i)+')', globals=globals(), number=10)/10 for i in Ns]
y_e = [timeit.timeit('erat('+str(i)+')', globals=globals(), number=10)/10 for i in Ns]

fig, ax = plt.subplots()
ax.plot(Ns, y_p, label="non optimized")
ax.plot(Ns, y_e, label="optimized")
ax.legend()
%timeit prime(1000)
%timeit erat(1000)
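
The optimization described on the linked Wikipedia page is to start crossing out at i² (smaller multiples were already removed by smaller primes) and to stop once i² exceeds N. A minimal sketch of that idea; the function name `sieve` is hypothetical and not part of the exercise:

```python
import math
import numpy as np

def sieve(N):
    """Return the primes strictly below N (illustrative helper)."""
    mask = np.ones(N, dtype=bool)
    mask[:2] = False                      # 0 and 1 are not prime
    # only candidates up to sqrt(N) can have unmarked multiples left
    for i in range(2, math.isqrt(N) + 1):
        if mask[i]:                       # i survived, so it is prime
            mask[i*i::i] = False          # cross out i*i, i*i + i, ...
    return np.flatnonzero(mask)

print(sieve(100))
```

Because the mask is indexed directly by the number itself, no lookup of "the next prime" is needed inside the loop.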


**N.B. the following exercises are meant to be solved only if you are familiar with the numpy random library. If not, you can skip them (they are postponed to one of the next exercise sessions)**


8\. Diffusion using random walk

Consider a simple random walk process: at each step in time, a walker jumps right or left (+1 or -1) with equal probability. The goal is to find the typical distance from the origin of a random walker after a given amount of time. 
To do that, let's simulate many walkers and create a 2D array with each walker as a row and the actual time evolution as columns

  * Take 1000 walkers and let them walk for 200 steps
  * Use randint to create a 2D array of size walkers x steps with values -1 or 1
  * Build the actual walking distances for each walker (i.e. another 2D array "summing on each row")
  * Take the square of that 2D array (elementwise)
  * Compute the mean of the squared distances at each step (i.e. the mean along the columns)
  * Plot the average distances (sqrt(distance\*\*2)) as a function of time (step)
  
Did you get what you expected?
Mostly: I expected the mean position to stay around zero and the mean distance to grow with the number of steps, since taking the absolute value (or the square) of the position discards its sign, but I did not expect this type of evolution.

In [None]:
n_walkers, n_steps = 1000, 200
walkers = 2*np.random.randint(0, 2, (n_walkers, n_steps))-1
print("Walkers:", walkers, sep='\n')
# position of each walker after every step: cumulative sum along the time axis
distances = np.cumsum(walkers, axis=1)
print("Distances:", distances, distances.shape, sep='\n')
dist_square = distances**2
print("Squared:", dist_square, sep='\n')
mean_step = np.mean(dist_square, axis=0)
print("mean squared dist per step:", mean_step, sep='\n')

steps = np.arange(1, n_steps+1)
fig, ax = plt.subplots()
# sqrt of the mean squared distance, as requested in the exercise
ax.plot(steps, np.sqrt(mean_step), label="rms distance")
ax.plot(steps, np.mean(np.abs(distances), axis=0), label="mean distance")
ax.plot(steps, np.mean(distances, axis=0), label="mean position")
ax.legend()
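
For independent ±1 steps the mean squared displacement after t steps equals t, so the root-mean-square distance should follow √t. A minimal sketch that checks this numerically; the seed and the use of `default_rng` with `choice` (instead of `randint`) are assumptions made here for reproducibility, not part of the exercise:

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, only for reproducibility
n_walkers, n_steps = 1000, 200

# each entry is a single jump of -1 or +1
steps = rng.choice([-1, 1], size=(n_walkers, n_steps))
# position of every walker after each step
positions = np.cumsum(steps, axis=1)
# root-mean-square distance over walkers, one value per time step
rms = np.sqrt(np.mean(positions**2, axis=0))

t = np.arange(1, n_steps + 1)
expected = np.sqrt(t)                     # theoretical rms for unit steps

max_rel = np.max(np.abs(rms - expected) / expected)
print(f"largest relative deviation from sqrt(t): {max_rel:.3f}")
```

With 1000 walkers the deviation is a statistical fluctuation; it should shrink as the number of walkers grows.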

9\. Analyze a data file 
  * Download the population of hares, lynxes and carrots at the beginning of the last century.
    ```python
    ! wget https://www.dropbox.com/s/3vigxoqayo389uc/populations.txt
    ```

  * Check the content by looking within the file
  * Load the data (use an appropriate numpy method) into a 2D array
  * Create arrays out of the columns, the arrays being (in order): *year*, *hares*, *lynxes*, *carrots* 
  * Plot the 3 populations over the years
  * Compute the main statistical properties of the dataset (mean, std, correlations, etc.)
  * Which species has the highest population each year?

Do you feel there is some evident correlation here? [Studies](https://www.enr.gov.nt.ca/en/services/lynx/lynx-snowshoe-hare-cycle) tend to believe so. There is a clear correlation between hares and lynxes.

In [None]:
! wget https://www.dropbox.com/s/3vigxoqayo389uc/populations.txt

In [None]:
data = np.genfromtxt("populations.txt", delimiter='\t')
print(data)
year, hares, lynxes, carrots = data.T  # transposing yields one array per column

fig, ax = plt.subplots()
ax.plot(year, hares, label="hares")
ax.plot(year, lynxes, label="lynxes")
ax.plot(year, carrots, label="carrots")
ax.legend()

import pandas as pd
species = ["hares", "lynxes", "carrots"]
print("Mean and std:", pd.DataFrame(data[:,1:], columns=species).describe(), sep='\n')
print("Covariance matrix:", np.cov(data[:,1:], rowvar=False), sep='\n')
print("Correlation matrix:", np.corrcoef(data[:,1:], rowvar=False), sep='\n')

# index of the largest population in each row, mapped to the species name
top = np.array(species)[np.argmax(data[:,1:], axis=1)]
print("species with highest population each year:", np.column_stack((year.astype(int), top)), sep='\n')