### Numpy basics

1\. Find the row, column and overall means for the following matrix:

```python
m = np.arange(12).reshape((3,4))
```

In [None]:
import numpy as np

In [None]:
m = np.arange(12).reshape((3,4))
print("Array: \n", m)
print("Array dimensions (rank):", m.ndim)
print("Shape of the array:", m.shape)
print("Size of the first dimension (axis):", len(m))
print("Overall mean", m.mean())
print("Row wise mean", m.mean(1))
print("Column wise mean", m.mean(0))
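
The `axis` argument names the dimension that is collapsed: `axis=1` averages across the columns of each row, `axis=0` averages down the rows of each column. The short sketch below restates this with keyword arguments and, as an extra illustration that is not part of the exercise, uses `keepdims=True` so that the row means can be broadcast back against `m`. The names `row_means`, `col_means` and `centered` are illustrative.

```python
import numpy as np

m = np.arange(12).reshape((3, 4))

row_means = m.mean(axis=1)   # one value per row    -> shape (3,)
col_means = m.mean(axis=0)   # one value per column -> shape (4,)

# keepdims=True keeps the collapsed axis with length 1, giving shape (3, 1),
# so the row means broadcast against the (3, 4) matrix
centered = m - m.mean(axis=1, keepdims=True)

print(row_means, col_means, m.mean())
print(centered)
```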

2\. Find the outer product of the following two vectors:

```python
u = np.array([1,3,5,7])
v = np.array([2,4,6,8])
```

Do this in the following ways:

   * Using the function outer in numpy
   * Using a nested for loop or list comprehension
   * Using numpy broadcasting operations


In [None]:
u = np.array([1,3,5,7])
v = np.array([2,4,6,8])

a = np.outer(u, v)

b = np.array([i*j for i in u for j in v]).reshape(4,4)

c = u[:, np.newaxis]*v
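
A quick way to confirm that the three approaches agree is to compare the results element by element. The sketch below recomputes the three arrays so that it runs on its own; the names `outer_np`, `outer_loop` and `outer_bc` are illustrative and not part of the exercise.

```python
import numpy as np

u = np.array([1, 3, 5, 7])
v = np.array([2, 4, 6, 8])

outer_np = np.outer(u, v)                                # library function
outer_loop = np.array([[i * j for j in v] for i in u])   # nested comprehension
outer_bc = u[:, np.newaxis] * v                          # (4, 1) * (4,) -> (4, 4)

same = np.array_equal(outer_np, outer_loop) and np.array_equal(outer_np, outer_bc)
print(same)
```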

3\. Create a 10 by 6 matrix of random uniform numbers. Set all rows with any entry less than 0.1 to be zero

Hint: Use the following numpy functions - np.random.random, np.any as well as Boolean indexing and the axis argument.

In [None]:
np.random.seed(1170818)
d = np.random.random((10, 6))
print(d)
# Boolean mask of the rows that contain at least one entry below 0.1
rows = np.any(d < 0.1, axis=1)
d[rows] = 0.

print(d)
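
To see what the row mask looks like, it can help to apply the same idea to a tiny array whose entries are easy to read. This is only a sketch: the values in `demo` are made up for illustration, and the names `demo` and `row_mask` are not part of the exercise.

```python
import numpy as np

# Hand-made values (hypothetical), chosen so that rows 0 and 2 contain an entry < 0.1
demo = np.array([[0.50, 0.05, 0.90],
                 [0.30, 0.40, 0.70],
                 [0.08, 0.95, 0.20]])

row_mask = np.any(demo < 0.1, axis=1)   # one Boolean per row
demo[row_mask] = 0.0                    # Boolean indexing zeroes the flagged rows in place

print(row_mask)
print(demo)
```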

4\. Use np.linspace to create an array of 100 numbers between 0 and 2π (inclusive).

  * Extract every 10th element using slice notation
  * Reverse the array using slice notation
  * Extract elements where the absolute difference between the sine and cosine functions evaluated at that element is less than 0.1
  * Make a plot showing the sin and cos functions and indicate where they are close

In [None]:
import matplotlib.pyplot as plt

%matplotlib inline 

xs = np.linspace(0, 2*np.pi, 100)
#1
e = xs[9::10]
#2
f = xs[::-1]
#3
index = np.abs(np.sin(xs)-np.cos(xs)) < 0.1
g = xs[index]
#4
plt.scatter(g, np.sin(g))
plt.plot(xs, np.sin(xs), xs, np.cos(xs));
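
As a sanity check on the selected points: sin(x) = cos(x) exactly where tan(x) = 1, that is at x = π/4 and x = 5π/4 within [0, 2π]. The sketch below, with illustrative names `close`, `crossings` and `nearest`, measures how far each selected grid point lies from its nearest analytic crossing.

```python
import numpy as np

xs = np.linspace(0, 2 * np.pi, 100)
close = xs[np.abs(np.sin(xs) - np.cos(xs)) < 0.1]

# Analytic crossings of sin and cos in [0, 2*pi]
crossings = np.array([np.pi / 4, 5 * np.pi / 4])

# Distance from every selected point to its nearest crossing (broadcast to a 2D grid)
nearest = np.abs(close[:, None] - crossings[None, :]).min(axis=1)

print(close)
print(nearest.max())
```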

5\. Create a matrix that shows the 10 by 10 multiplication table.

 * Find the trace of the matrix
 * Extract the anti-diagonal (this should be ```array([10, 18, 24, 28, 30, 30, 28, 24, 18, 10])```)
 * Extract the diagonal offset by 1 upwards (this should be ```array([ 2,  6, 12, 20, 30, 42, 56, 72, 90])```)

In [None]:
ns = np.arange(1, 11)
mult = ns[:,None]*ns[None,:]
print(mult)
print(mult.trace())
print(np.flipud(mult).diagonal())
print(mult.diagonal(offset=1))
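
The same diagonals can also be picked out with integer (fancy) indexing, which makes explicit which elements are selected. This is an alternative sketch rather than the intended solution; `idx`, `anti` and `upper` are illustrative names.

```python
import numpy as np

ns = np.arange(1, 11)
mult = ns[:, None] * ns[None, :]

idx = np.arange(10)
anti = mult[idx, idx[::-1]]            # elements (0, 9), (1, 8), ..., (9, 0)
upper = mult[idx[:-1], idx[:-1] + 1]   # elements (0, 1), (1, 2), ..., (8, 9)

print(mult.trace())
print(anti)
print(upper)
```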

6\. Use broadcasting to create a grid of distances

Route 66 crosses the following cities in the US: Chicago, Springfield, Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff, Los Angeles
The corresponding positions in miles are: 0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448

  * Construct a 2D grid of distances among each city along Route 66
  * Convert the distances to km (those savages...)

In [None]:
cities = ["Chicago", "Springfield", "Saint-Louis", "Tulsa", "Oklahoma City", "Amarillo",
          "Santa Fe", "Albuquerque", "Flagstaff", "Los Angeles"]
city = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448])
dist = np.abs(city[:, None] - city[None,:])

to_km = 1.60934
dist_km = dist*to_km
print("The distance between:\n")
for i in range(10):
    for k in range(i + 1, 10):  # dist_km is symmetric, so visit each pair only once
        print("{} and {} is {}km".format(cities[i], cities[k], round(dist_km[i, k], 2)))
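
The pairs above the main diagonal can also be enumerated without nested loops by using `np.triu_indices`. The sketch below rebuilds the distance grid so that it is self-contained; the names `miles`, `rows` and `cols` are illustrative.

```python
import numpy as np

cities = ["Chicago", "Springfield", "Saint-Louis", "Tulsa", "Oklahoma City", "Amarillo",
          "Santa Fe", "Albuquerque", "Flagstaff", "Los Angeles"]
miles = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448])

# (10, 1) minus (1, 10) broadcasts to the full (10, 10) grid of pairwise distances
dist_km = np.abs(miles[:, None] - miles[None, :]) * 1.60934

# Index pairs strictly above the diagonal (k=1), i.e. each city pair exactly once
rows, cols = np.triu_indices(len(cities), k=1)
for i, k in zip(rows, cols):
    print(f"{cities[i]} and {cities[k]}: {dist_km[i, k]:.2f} km")
```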

7\. Prime numbers sieve: compute the prime numbers in the 0-N (N=99 to start with) range with a sieve (mask).
  * Construct a shape (100,) boolean array, the mask
  * Identify the multiples of each number starting from 2 and set accordingly the corresponding mask element
  * Apply the mask to obtain an array of ordered prime numbers
  * Check the performance (timeit); how does it scale with N?
  * Implement the optimization suggested in the [sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)

In [None]:
def prime(N):
    def is_prime(n):
        if n % 2 == 0 and n > 2: 
            return False
        return all(n % i for i in range(3, int(np.sqrt(n)) + 1, 2))
    p = np.arange(2, N)
    tf = np.vectorize(is_prime)
    pbools = tf(p)
    primes = np.extract(pbools, p)
    return primes

print(list(prime(100)))
%timeit prime(100)
%timeit prime(1000)
%timeit prime(10000)
%timeit prime(100000)
%timeit prime(1000000) #runtime grows polynomially with N (at most ~N**1.5 for trial division up to sqrt(n)), not exponentially

In [None]:
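#Sieve of Eratosthenes
def sieve(n):
    # mask[k] stays True until k is found to be a multiple of a smaller prime
    ps, mask = [], [True]*(n + 1)
    for p in range(2, n+1):
        if mask[p]:
            ps.append(p)
            # multiples of p below p*p were already crossed out by smaller primes
            for i in range(p * p, n + 1, p):
                mask[i] = False
    return ps
print(sieve(100))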
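
The exercise asks for a Boolean mask, so here is a minimal sketch of the same sieve written with a NumPy array and slice assignment. It assumes `n >= 0`; the function name `sieve_mask` is illustrative. It applies the two usual optimizations: the outer loop stops at the integer square root of `n`, and crossing out starts at `p*p`.

```python
import math
import numpy as np

def sieve_mask(n):
    """Return the primes in the range 0..n (inclusive) using a Boolean mask."""
    mask = np.ones(n + 1, dtype=bool)   # shape (n + 1,), e.g. (100,) for n = 99
    mask[:2] = False                    # 0 and 1 are not prime
    for p in range(2, math.isqrt(n) + 1):
        if mask[p]:
            mask[p * p::p] = False      # cross out p*p, p*p + p, ... in one slice
    return np.flatnonzero(mask)         # indices that are still True, in order

print(sieve_mask(99))
```

Timing it with `%timeit sieve_mask(N)` for increasing `N` can then be compared with the trial-division version above.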

8\. Diffusion using random walk

Consider a simple random walk process: at each step in time, a walker jumps right or left (+1 or -1) with equal probability. The goal is to find the typical distance from the origin of a random walker after a given amount of time. 
To do that, let's simulate many walkers and create a 2D array with each walker as a row and the time evolution along the columns

  * Take 1000 walkers and let them walk for 200 steps
  * Use randint to create a 2D array of size walkers x steps with values -1 or 1
  * Build the actual walking distances for each walker (i.e. another 2D array "summing on each row")
  * Take the square of that 2D array (elementwise)
  * Compute the mean of the squared distances at each step (i.e. the mean along the columns)
  * Plot the average distances (sqrt(distance\*\*2)) as a function of time (step)
  
Did you get what you expected?

In [None]:
import numpy.random as npr
import matplotlib.pyplot as plt
steps, num_walkers = 200, 1000
time_steps = np.arange(steps)
# one row per time step, one column per walker; each entry is a -1 or +1 jump
walkers = 2*npr.randint(0, 2, size=(steps, num_walkers)) - 1
# the cumulative sum over time gives each walker's position after every step
path = np.cumsum(walkers, axis=0)
# root-mean-square distance over all walkers at each time step
avg_dist = np.sqrt((path**2).mean(axis=1))

plt.figure(figsize=(10,5))
plt.scatter(time_steps, avg_dist)
plt.xlabel("Time steps", fontsize=15)
plt.ylabel("Average distance", fontsize=15)
plt.grid(True)
plt.show()

#as expected, the RMS distance sqrt(<distance^2>) grows like sqrt(number of steps)
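
The square-root law follows from the independence of the jumps: the position after t steps is the sum of t jumps of ±1, the cross terms average to zero, and each squared jump equals 1, so the mean squared distance is t. The sketch below checks this numerically using the walkers x steps layout from the exercise text. The seed is an arbitrary choice, and the names `jumps`, `positions`, `rms`, `expected` and `rel_err` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (arbitrary) for reproducibility
num_walkers, steps = 1000, 200

# Each row is one walker, each column one time step; entries are -1 or +1
jumps = 2 * rng.integers(0, 2, size=(num_walkers, steps)) - 1
positions = np.cumsum(jumps, axis=1)               # position after each step

rms = np.sqrt((positions ** 2).mean(axis=0))       # average over walkers
expected = np.sqrt(np.arange(1, steps + 1))        # sqrt(t) for t = 1..steps

rel_err = np.abs(rms - expected) / expected
print(rel_err.max())
```

With a finite number of walkers the agreement is only statistical, so the relative error is small but not zero.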

9\. Analyze a data file 
  * Download the population of hares, lynxes and carrots at the beginning of the last century.
    ```python
    ! wget https://www.dropbox.com/s/3vigxoqayo389uc/populations.txt
    ```

  * Check the content by looking within the file
  * Load the data (use an appropriate numpy method) into a 2D array
  * Create arrays out of the columns, the arrays being (in order): *year*, *hares*, *lynxes*, *carrots* 
  * Plot the 3 populations over the years
  * Compute the main statistical properties of the dataset (mean, std, correlations, etc.)
  * Which species has the highest population each year?

Do you feel there is some evident correlation here? [Studies](https://www.enr.gov.nt.ca/en/services/lynx/lynx-snowshoe-hare-cycle) tend to believe so.

In [None]:
! wget https://www.dropbox.com/s/3vigxoqayo389uc/populations.txt

In [None]:
# the context manager closes the file once its content has been printed
with open("populations.txt", "r") as f:
    print(f.read())

In [None]:
import matplotlib.pyplot as plt
data = np.loadtxt("populations.txt")
year, hares, lynxes, carrots = data[:,0], data[:,1], data[:,2], data[:,3]
high, low = [], []

plt.figure(figsize=(12,6))
plt.scatter(year, carrots, label="Carrots")
plt.scatter(year, hares, label="Hares")
plt.scatter(year, lynxes, label="Lynxes")
carrots_mean = plt.hlines(carrots.mean(), year[0], year[20], color="navy", label="Carrots mean")
hares_mean = plt.hlines(hares.mean(), year[0], year[20], color="darkred", label="Hares mean")
lynxes_mean = plt.hlines(lynxes.mean(), year[0], year[20], color="gold", label="Lynxes mean")
plt.xlabel("Year", fontsize=15)
plt.legend(fontsize=12)
plt.grid(True)
plt.show()

sigma = np.array([hares.std(), lynxes.std(), carrots.std()])
HL = np.corrcoef(hares, lynxes)
LC = np.corrcoef(lynxes, carrots) #lynxes and carrots are negatively correlated (corr ~ -0.68)
HC = np.corrcoef(hares, carrots)

species = np.array(["H", "L", "C"])
populations = data[:, 1:]  # drop the year column before comparing the species
# argmax/argmin along axis 1 give, for each year, the column of the largest/smallest population
high = species[populations.argmax(axis=1)].tolist()
low = species[populations.argmin(axis=1)].tolist()
print("Highest population:\n", high)
print("Lowest population:\n", low)
#as shown, low lynx populations almost always coincide with peaks in the carrot population