### Numpy basics

In [1]:
import numpy as np

1\. Find the row, column and overall means for the following matrix:

```python
m = np.arange(12).reshape((3,4))
```

In [2]:
m = np.arange(12).reshape((3,4))
print("Matrix m:\n", m)

print("\nRows mean: ", np.mean(m,1))
print("Cols mean: ", np.mean(m,0))
print("Overall mean: ", np.mean(m))

Matrix m:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Rows mean:  [1.5 5.5 9.5]
Cols mean:  [4. 5. 6. 7.]
Overall mean:  5.5


2\. Find the outer product of the following two vectors:

```python
u = np.array([1,3,5,7])
v = np.array([2,4,6,8])
```

Do this in the following ways:

   * Using the function outer in numpy
   * Using a nested for loop or list comprehension
   * Using numpy broadcasting operations


In [3]:
u = np.array([1,3,5,7])
v = np.array([2,4,6,8])


# Numpy outer function
print("Numpy outer function:\n", np.outer(u,v))

# List comprehension
result = np.array([(a*b) for a in u for b in v]).reshape((4,4))
print("\nList comprehension:\n", result)

# Numpy broadcasting operations: a (4,1) column times a (4,) row broadcasts to (4,4)
result = u[:, np.newaxis] * v
print("\nNumpy broadcasting operations:\n", result)

Numpy outer function:
 [[ 2  4  6  8]
 [ 6 12 18 24]
 [10 20 30 40]
 [14 28 42 56]]

List comprehension:
 [[ 2  4  6  8]
 [ 6 12 18 24]
 [10 20 30 40]
 [14 28 42 56]]

Numpy broadcasting operations:
 [[ 2  4  6  8]
 [ 6 12 18 24]
 [10 20 30 40]
 [14 28 42 56]]


3\. Create a 10 by 6 matrix of random uniform numbers. Set all rows with any entry less than 0.1 to be zero

Hint: Use the following numpy functions - np.random.random, np.any as well as Boolean indexing and the axis argument.

In [4]:
import numpy.random as npr
npr.seed(1204533)

rand_matrix = npr.rand(10,6)
# Rows that contain at least one entry below 0.1
mask = np.any(rand_matrix < 0.1, axis=1)
# Boolean indexing sets every selected row to zero at once
rand_matrix[mask] = 0
print(rand_matrix)

[[0.         0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.        ]
 [0.45336231 0.41291907 0.48506972 0.68604465 0.31249518 0.43869008]
 [0.         0.         0.         0.         0.         0.        ]
 [0.63934031 0.46643865 0.78613795 0.2818773  0.50940951 0.10686072]
 [0.59507219 0.40197898 0.54533185 0.72474485 0.45924215 0.67561666]
 [0.         0.         0.         0.         0.         0.        ]
 [0.64030938 0.61135255 0.13236534 0.95456001 0.5360818  0.99285951]
 [0.         0.         0.         0.         0.         0.        ]
 [0.31079899 0.17123684 0.57528515 0.35887859 0.24951071 0.13629365]]


4\. Use np.linspace to create an array of 100 numbers between 0 and 2π (inclusive).

  * Extract every 10th element using slice notation
  * Reverse the array using slice notation
  * Extract elements where the absolute difference between the sine and cosine functions evaluated at that element is less than 0.1
  * Make a plot showing the sin and cos functions and indicate where they are close

In [5]:
import math
import matplotlib.pyplot as plt

x = np.linspace( 0, 2*math.pi, 100 )
print('array:\n', x)
extra1 = x[::10]
print('\nextracted from initial:\n', extra1)
rev = x[::-1]
print('\nReversed:\n', rev)

extra2 = x[ np.where( abs( np.sin(x) - np.cos(x) )<0.1 ) ]
print('\nElements where the absolute difference between the sine and cosine functions evaluated at that element is less than 0.1\n',extra2)

y1 = np.sin(x) 
y2 = np.cos(x) 
plt.plot(x, y1,'k-') 
plt.plot(x, y2,'g-')

idx = np.argwhere(abs(y1-y2)<0.1).flatten()
plt.plot(x[idx], y1[idx], 'b+')
plt.plot(x[idx], y2[idx], 'r_')
plt.show()

array:
 [0.         0.06346652 0.12693304 0.19039955 0.25386607 0.31733259
 0.38079911 0.44426563 0.50773215 0.57119866 0.63466518 0.6981317
 0.76159822 0.82506474 0.88853126 0.95199777 1.01546429 1.07893081
 1.14239733 1.20586385 1.26933037 1.33279688 1.3962634  1.45972992
 1.52319644 1.58666296 1.65012947 1.71359599 1.77706251 1.84052903
 1.90399555 1.96746207 2.03092858 2.0943951  2.15786162 2.22132814
 2.28479466 2.34826118 2.41172769 2.47519421 2.53866073 2.60212725
 2.66559377 2.72906028 2.7925268  2.85599332 2.91945984 2.98292636
 3.04639288 3.10985939 3.17332591 3.23679243 3.30025895 3.36372547
 3.42719199 3.4906585  3.55412502 3.61759154 3.68105806 3.74452458
 3.8079911  3.87145761 3.93492413 3.99839065 4.06185717 4.12532369
 4.1887902  4.25225672 4.31572324 4.37918976 4.44265628 4.5061228
 4.56958931 4.63305583 4.69652235 4.75998887 4.82345539 4.88692191
 4.95038842 5.01385494 5.07732146 5.14078798 5.2042545  5.26772102
 5.33118753 5.39465405 5.45812057 5.52158709 5.58505361 

<Figure size 640x480 with 1 Axes>

5\. Create a matrix that shows the 10 by 10 multiplication table.

 * Find the trace of the matrix
 * Extract the anti-diagonal (this should be ```array([10, 18, 24, 28, 30, 30, 28, 24, 18, 10])```)
 * Extract the diagonal offset by 1 upwards (this should be ```array([ 2,  6, 12, 20, 30, 42, 56, 72, 90])```)

In [6]:
matrix = np.array([i*j for i in range(1,11) for j in range(1,11)]).reshape(10,10)

trace = matrix.trace()
print("Trace:", trace)

anti_diag = np.diag(matrix[:,::-1])
print("\nAnti diagonal", anti_diag)

# k=1 selects the diagonal one step above the main one (9 elements, no wrap-around)
diag_off = np.diag(matrix, k=1)
print("\nDiagonal with offset:", diag_off)

Trace: 385

Anti diagonal [10 18 24 28 30 30 28 24 18 10]

Diagonal with offset: [ 2  6 12 20 30 42 56 72 90]


6\. Use broadcasting to create a grid of distances

Route 66 crosses the following cities in the US: Chicago, Springfield, Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff, Los Angeles.

The corresponding positions in miles are: 0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448.

  * Construct a 2D grid of the distances between every pair of cities along Route 66
  * Convert the grid to km (those savages...)

In [7]:
pos = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448])
grid_2d = pos - pos.reshape(10,1)
c_miles2km = 1.60934
print("Grid:\n", grid_2d)
print("\nMiles to km:\n", grid_2d*c_miles2km)

Grid:
 [[    0   198   303   736   871  1175  1475  1544  1913  2448]
 [ -198     0   105   538   673   977  1277  1346  1715  2250]
 [ -303  -105     0   433   568   872  1172  1241  1610  2145]
 [ -736  -538  -433     0   135   439   739   808  1177  1712]
 [ -871  -673  -568  -135     0   304   604   673  1042  1577]
 [-1175  -977  -872  -439  -304     0   300   369   738  1273]
 [-1475 -1277 -1172  -739  -604  -300     0    69   438   973]
 [-1544 -1346 -1241  -808  -673  -369   -69     0   369   904]
 [-1913 -1715 -1610 -1177 -1042  -738  -438  -369     0   535]
 [-2448 -2250 -2145 -1712 -1577 -1273  -973  -904  -535     0]]

Miles to km:
 [[    0.        318.64932   487.63002  1184.47424  1401.73514  1890.9745
   2373.7765   2484.82096  3078.66742  3939.66432]
 [ -318.64932     0.        168.9807    865.82492  1083.08582  1572.32518
   2055.12718  2166.17164  2760.0181   3621.015  ]
 [ -487.63002  -168.9807      0.        696.84422   914.10512  1403.34448
   1886.14648  1997.1909
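
The grid above holds signed differences, so the entries below the diagonal are negative. If unsigned distances are wanted, a minimal sketch is to take the absolute value of the broadcast difference. The names `cities`, `MILES_TO_KM`, `dist_miles` and `dist_km` are introduced here for illustration only; the city list and the conversion factor are the ones given in the exercise and its solution.

```python
import numpy as np

cities = ["Chicago", "Springfield", "Saint-Louis", "Tulsa", "Oklahoma City",
          "Amarillo", "Santa Fe", "Albuquerque", "Flagstaff", "Los Angeles"]
pos = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544, 1913, 2448])
MILES_TO_KM = 1.60934  # same conversion factor as in the cell above

# A (10, 1) column minus a (1, 10) row broadcasts to a (10, 10) grid;
# np.abs makes the grid symmetric with a zero diagonal.
dist_miles = np.abs(pos[:, np.newaxis] - pos[np.newaxis, :])
dist_km = dist_miles * MILES_TO_KM

# Look up one pair of cities by name
i, j = cities.index("Tulsa"), cities.index("Flagstaff")
print(f"Tulsa to Flagstaff: {dist_miles[i, j]} miles, {dist_km[i, j]:.1f} km")
```

Indexing with `np.newaxis` does the same job as `pos.reshape(10,1)` but does not hard-code the number of cities.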

7\. Prime numbers sieve: compute the prime numbers in the 0-N range (N=99 to start with) using a sieve (mask).
  * Construct a boolean array of shape (100,), the mask
  * Identify the multiples of each number starting from 2 and set the corresponding mask elements accordingly
  * Apply the mask to obtain an ordered array of prime numbers
  * Check the performance (timeit); how does it scale with N?
  * Implement the optimization suggested in the [sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)

In [16]:
N = 100

def prime(N):
    """Return the primes in [0, N) by masking the multiples of every i >= 2."""
    mask = np.ones((N,), dtype=bool)
    mask[:2] = False                      # 0 and 1 are not prime
    num = np.arange(N)
    for i in range(2, N//2 + 1):
        mask[(num > i) & (num % i == 0)] = False
    return num[mask]

def sieve_Er(N):
    """Sieve of Eratosthenes: sieve only with primes i <= sqrt(N), starting at i**2."""
    mask = np.ones((N,), dtype=bool)
    mask[:2] = False                      # 0 and 1 are not prime
    for i in range(2, int(N**0.5) + 1):
        if mask[i]:
            mask[i*i::i] = False          # strike out i**2, i**2 + i, ...
    return np.flatnonzero(mask)

print(prime(N).tolist())
print(sieve_Er(N).tolist())

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
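
The scaling question in the exercise can be explored with the standard `timeit` module. The sketch below is one possible approach, not a reference solution: it restates both sieve variants under the illustrative names `naive_primes` and `eratosthenes_primes` so that the block runs on its own, and the values of `n` and the repeat count are arbitrary choices. The timings depend on the machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

def naive_primes(n):
    """Mask the multiples of every i in [2, n//2]: about n/2 passes over an n-element array."""
    num = np.arange(n)
    mask = num >= 2                       # 0 and 1 are not prime
    for i in range(2, n // 2 + 1):
        mask[(num > i) & (num % i == 0)] = False
    return num[mask]

def eratosthenes_primes(n):
    """Sieve only with primes i <= sqrt(n), striking out from i*i in steps of i."""
    mask = np.ones(n, dtype=bool)
    mask[:2] = False                      # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if mask[i]:
            mask[i * i::i] = False
    return np.flatnonzero(mask)

repeats = 3  # arbitrary; more repeats give steadier averages
for n in (100, 1_000, 10_000):
    t_naive = timeit.timeit(lambda: naive_primes(n), number=repeats) / repeats
    t_sieve = timeit.timeit(lambda: eratosthenes_primes(n), number=repeats) / repeats
    print(f"N={n:>6}: naive {t_naive:.2e} s, Eratosthenes {t_sieve:.2e} s")
```

The naive version does roughly N/2 full passes over an N-element array, so its cost should grow about quadratically with N, while the optimized sieve touches far fewer elements. In a notebook, the `%timeit` magic is a convenient alternative to calling `timeit.timeit` directly.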


8\. Diffusion using random walk

Consider a simple random walk process: at each step in time, a walker jumps right or left (+1 or -1) with equal probability. The goal is to find the typical distance from the origin of a random walker after a given amount of time. 
To do that, let's simulate many walkers and create a 2D array with each walker as a row and the time evolution along the columns

  * Take 1000 walkers and let them walk for 200 steps
  * Use randint to create a 2D array of size walkers x steps with values -1 or 1
  * Build the actual walking distances for each walker (i.e. another 2D array "summing on each row")
  * Take the square of that 2D array (elementwise)
  * Compute the mean of the squared distances at each step (i.e. the mean along the columns)
  * Plot the average distances (sqrt(distance\*\*2)) as a function of time (step)
  
Did you get what you expected?
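
The steps listed above could be sketched as follows. This is one possible solution under stated assumptions, not the only way to do it: the seed value and the variable names (`steps`, `positions`, `mean_sq`, `rms`) are arbitrary choices, and the `sqrt(t)` curve is added only as the reference expected for simple diffusion.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)                 # arbitrary seed, only for reproducibility
n_walkers, n_steps = 1000, 200

# randint(0, 2) draws 0 or 1; rescale to -1 or +1 (one row per walker)
steps = 2 * np.random.randint(0, 2, size=(n_walkers, n_steps)) - 1

# Position of each walker after each step: cumulative sum along each row
positions = np.cumsum(steps, axis=1)

# Mean squared distance at each time step, averaged over all walkers
mean_sq = np.mean(positions**2, axis=0)
rms = np.sqrt(mean_sq)

t = np.arange(1, n_steps + 1)
plt.plot(t, rms, label="simulated RMS distance")
plt.plot(t, np.sqrt(t), "--", label="sqrt(t)")
plt.xlabel("step")
plt.ylabel("typical distance from origin")
plt.legend()
plt.show()
```

For independent ±1 steps the mean squared distance after t steps equals t, so the simulated curve should follow `sqrt(t)` up to statistical noise.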

9\. Analyze a data file 
  * Download the population of hares, lynxes and carrots at the beginning of the last century.
    ```python
    ! wget https://www.dropbox.com/s/3vigxoqayo389uc/populations.txt
    ```

  * Check the content by looking within the file
  * Load the data (use an appropriate numpy method) into a 2D array
  * Create arrays out of the columns, the arrays being (in order): *year*, *hares*, *lynxes*, *carrots* 
  * Plot the 3 populations over the years
  * Compute the main statistical properties of the dataset (mean, std, correlations, etc.)
  * Which species has the highest population each year?

Do you see any evident correlation here? [Studies](https://www.enr.gov.nt.ca/en/services/lynx/lynx-snowshoe-hare-cycle) suggest there is one.
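
A possible outline for the loading and statistics steps is sketched below. It assumes that `populations.txt` is a whitespace-separated table with four numeric columns (year, hares, lynxes, carrots) and a header line starting with `#`, which `np.loadtxt` skips by default; check the file first, as the exercise suggests. The function names `load_populations` and `summarize` are hypothetical helpers introduced for this sketch, and the plotting step is left out.

```python
import os
import numpy as np

def load_populations(source="populations.txt"):
    """Load the table and split it into its four columns (assumed column order)."""
    data = np.loadtxt(source)             # lines starting with '#' are skipped
    year, hares, lynxes, carrots = data.T
    return year, hares, lynxes, carrots

def summarize(year, hares, lynxes, carrots):
    """Return basic statistics and the most abundant species for each year."""
    populations = np.vstack([hares, lynxes, carrots])    # shape (3, n_years)
    names = np.array(["hares", "lynxes", "carrots"])
    return {
        "mean": populations.mean(axis=1),
        "std": populations.std(axis=1),
        "corr": np.corrcoef(populations),                 # 3x3 correlation matrix
        "dominant": names[np.argmax(populations, axis=0)],
    }

# Only runs once the file has been downloaded into the working directory
if os.path.exists("populations.txt"):
    columns = load_populations()
    for key, value in summarize(*columns).items():
        print(f"{key}:\n{value}\n")
```

The off-diagonal entries of `corr` give the pairwise correlations between the three populations, which is one way to approach the closing question.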