# Exercises

In [1]:
import numpy as np
import os
import time
import urllib.request

## Q1: Fun with arrays

**A.**  Create the array: 
```
[[1,  6, 11],
 [2,  7, 12],
 [3,  8, 13],
 [4,  9, 14],
 [5, 10, 15]]
```
without explicitly typing it in.

In [2]:
array_1 = np.arange(1,16).reshape(3,5).T

array_1

array([[ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14],
       [ 5, 10, 15]])

Now create a new array containing only its 2nd and 4th rows.

In [3]:
array_2 = array_1[[1, 3]]

array_2

array([[ 2,  7, 12],
       [ 4,  9, 14]])

**B.** Create a 2d array with `1` on the border and `0` on the inside, e.g., like:
```
1 1 1 1 1
1 0 0 0 1
1 0 0 0 1
1 1 1 1 1
```

Do this using array slice notation so that it works for an arbitrary-sized array.

In [4]:
array_3 = np.ones((5, 4))
array_3[1:-1, 1:-1] = 0

array_3

array([[1., 1., 1., 1.],
       [1., 0., 0., 1.],
       [1., 0., 0., 1.],
       [1., 0., 0., 1.],
       [1., 1., 1., 1.]])
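
To illustrate the "arbitrary-sized" part of the exercise, the same slice assignment can be wrapped in a small helper. This is only a sketch: the name `border_array` and the choice of an integer dtype are assumptions for illustration, not part of the original solution.

```python
import numpy as np

def border_array(rows, cols):
    """Return a rows x cols array with 1 on the border and 0 inside (hypothetical helper)."""
    arr = np.ones((rows, cols), dtype=int)
    # The interior slice is empty when rows or cols is smaller than 3,
    # so small arrays are simply left as all ones.
    arr[1:-1, 1:-1] = 0
    return arr

print(border_array(4, 5))
```

Because the slice `1:-1` is defined relative to both ends of each axis, no shape-specific indices are needed.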

## Q2: Histograms

Here we will read in columns of numbers from a file and create a histogram using NumPy routines.  Make sure you have the data file
"`sample.txt`" in the same directory as this notebook. You can download it from  https://raw.githubusercontent.com/sbu-python-summer/python-tutorial/master/day-3/sample.txt (and you can use Python to download the file!)

  * Use `np.loadtxt()` to read this file in.  

  * Next, use `np.histogram()` to create a histogram array.  It returns both the counts and an array of bin edges.
  
  * Finally, loop over the bins and print out the bin center (averaging the left and right edges of the bin) and the count for that bin.

In [6]:
filename = "misc/sample.txt"
url = "https://raw.githubusercontent.com/sbu-python-summer/python-tutorial/master/day-3/sample.txt"

if not os.path.exists(filename):
    urllib.request.urlretrieve(url, filename)

data = np.loadtxt(filename)
counts, bin_edges = np.histogram(data, bins=10)

for i in range(len(counts)):
    bin_center = (bin_edges[i] + bin_edges[i + 1]) / 2
    print(f"Bin center: {bin_center}, Counts: {counts[i]}")

Bin center: -24.109006493430737, Counts: 6
Bin center: -11.150163704648554, Counts: 23
Bin center: 1.8086790841336278, Counts: 52
Bin center: 14.767521872915811, Counts: 37
Bin center: 27.726364661697996, Counts: 16
Bin center: 40.68520745048018, Counts: 14
Bin center: 53.64405023926236, Counts: 13
Bin center: 66.60289302804455, Counts: 13
Bin center: 79.56173581682673, Counts: 13
Bin center: 92.5205786056089, Counts: 13
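
The bin centers can also be computed without an explicit index loop by averaging two shifted slices of the edges array. The sketch below uses randomly generated stand-in data (the name `sample` and its distribution are assumptions), not the contents of `sample.txt`.

```python
import numpy as np

# Stand-in data for illustration only; replace with np.loadtxt(...) output.
rng = np.random.default_rng(0)
sample = rng.normal(size=200)

counts, bin_edges = np.histogram(sample, bins=10)

# bin_edges has one more entry than counts; averaging neighbouring
# edges gives one center per bin.
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

for center, count in zip(bin_centers, counts):
    print(f"Bin center: {center:.3f}, Counts: {count}")
```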


## Q3: Are you faster than numpy?

NumPy of course has a standard deviation function, `np.std()`, but here we'll write our own that works on a 1-d array (vector).  The standard
deviation is a measure of the "width" of the distribution of numbers
in the vector.

Given an array, $a$, and an average $\bar{a}$, the standard deviation
is:
$$
\sigma = \left [ \frac{1}{N} \sum_{i=1}^N (a_i - \bar{a})^2 \right ]^{1/2}
$$

Write a function to calculate the standard deviation for an input array, `a`:

  * First compute the average of the elements in `a` to define $\bar{a}$
  * Next compute the sum over the squares of $a - \bar{a}$
  * Then divide the sum by the number of elements in the array
  * Finally take the square root (you can use `np.sqrt()`)
  
Test your function on a random array, and compare to the built-in `np.std()`. Check the runtime as well.

In [5]:
import time

def new_std_dev(vec):
    mu_vec = np.mean(vec)
    sum_squared_diff_vec = np.sum((vec - mu_vec) ** 2)
    variance_vec = sum_squared_diff_vec / len(vec)
    std_dev_vec = np.sqrt(variance_vec)
    
    return std_dev_vec

np.random.seed(0)
a = np.random.rand(50)

# Custom standard deviation function
start_time = time.time()
std_dev = new_std_dev(a)
delta_time = time.time() - start_time

print(f"Custom std dev: {std_dev}")
print(f"Custom function: {delta_time} sec")

# Numpy standard deviation function
start_time = time.time()
std_dev = np.std(a)
delta_time = time.time() - start_time

print(f"Numpy std dev: {std_dev}")
print(f"Numpy function: {delta_time} sec")

Custom std dev: 0.27226582292177587
Custom function: 0.0001537799835205078 sec
Numpy std dev: 0.27226582292177587
Numpy function: 0.00013113021850585938 sec
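
A single `time.time()` measurement on a 50-element array is dominated by call overhead and timer noise, so the two timings above are hard to compare. One possible alternative, sketched here, is to average several runs with the standard-library `timeit` module on a larger array. The array size and the number of runs are arbitrary choices for illustration.

```python
import timeit
import numpy as np

def new_std_dev(vec):
    """Population standard deviation, same steps as the function above."""
    mu_vec = np.mean(vec)
    return np.sqrt(np.sum((vec - mu_vec) ** 2) / len(vec))

rng = np.random.default_rng(0)
big = rng.random(100_000)  # assumed size, chosen only to reduce timer noise

n_runs = 20  # assumed repeat count
t_custom = timeit.timeit(lambda: new_std_dev(big), number=n_runs) / n_runs
t_numpy = timeit.timeit(lambda: np.std(big), number=n_runs) / n_runs

print(f"Results agree: {np.isclose(new_std_dev(big), np.std(big))}")
print(f"Custom function: {t_custom:.2e} sec per call")
print(f"NumPy function:  {t_numpy:.2e} sec per call")
```

The measured times depend on the machine and the NumPy version, so no particular ordering of the two should be assumed.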


## Q5: Einstein summation

`np.einsum` is a powerful (but often painful) NumPy function:
- https://numpy.org/doc/stable/reference/generated/numpy.einsum.html
- https://stackoverflow.com/questions/26089893/understanding-numpys-einsum

Take two vectors `A` and `B`. Write the einsum equivalents of the inner, outer, sum, and mul functions.

In [6]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

inner_product = np.einsum("i,i->", a, b)
outer_product = np.einsum("i,j->ij", a, b)
sum_a = np.einsum("i->", a)
sum_b = np.einsum("i->", b)
mul = np.einsum("i,i->i", a, b)

print(f"Inner product: {inner_product}")
print(f"Outer product:\n {outer_product}")
print(f"Sum of elements in A: {sum_a}")
print(f"Sum of elements in B: {sum_b}")
print(f"Multiplication: {mul}")

Inner product: 32
Outer product:
 [[ 4  5  6]
 [ 8 10 12]
 [12 15 18]]
Sum of elements in A: 6
Sum of elements in B: 15
Multiplication: [ 4 10 18]
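
As a sanity check, each einsum expression can be compared against the corresponding NumPy function. This sketch assumes "mul" means element-wise multiplication (`np.multiply`), which matches the output shown above.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Each entry pairs an einsum result with the built-in it is meant to mirror.
checks = {
    "inner": (np.einsum("i,i->", a, b), np.inner(a, b)),
    "outer": (np.einsum("i,j->ij", a, b), np.outer(a, b)),
    "sum": (np.einsum("i->", a), np.sum(a)),
    "mul": (np.einsum("i,i->i", a, b), np.multiply(a, b)),
}

for name, (via_einsum, via_numpy) in checks.items():
    print(f"{name}: match = {np.array_equal(via_einsum, via_numpy)}")
```

In the subscript strings, an index that appears in the inputs but not after `->` is summed over, while an index kept after `->` survives as an output axis.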
