# More Numpy Practice

## Group Names and Roles

- Partner 1 (Role)
- Partner 2 (Role)
- Partner 3 (Role)


Go ahead and run the code block below to load in the packages we'll need for today: 

In [None]:
# run this block
import numpy as np
from matplotlib import pyplot as plt

## Exercise 1

The function `np.percentile` computes a specified percentile of an array. For example, `np.percentile(a, 80)` will return the 80th percentile of `a`. 

```python
a = np.random.rand(1000)
np.percentile(a, 80)
```
```
0.8066467817256507
```

### Part A. 

Write a function called `stats_summary` that prints out a readable summary of the data: 

```python
a = np.random.randn(1000)
stats_summary(a)
```
```
The 20th percentile is -0.85
The mean is -0.03
The median is -0.04
The 80th percentile is 0.85
The standard deviation is 1.01
```

Recall that the median is the 50th percentile. 

**Hint:** `np.round(x, 2)`. 
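As a small illustration of the hint, the sketch below rounds a computed value to two decimal places and places it in a printed sentence. The array `x` and the variable names `avg` and `msg` are made up for this example and are not part of the exercise.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])    # hypothetical toy data
avg = np.round(np.mean(x), 2)    # the mean is 7/3 = 2.333..., rounded to two decimals
msg = "The mean is " + str(avg)  # build a readable sentence from the rounded value
print(msg)
```

The same pattern should carry over to the other summaries, with the relevant `numpy` function in place of `np.mean`.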

In [None]:
# your solution


In [None]:
# try it out
a = np.random.randn(1000) # 1000 normal variables
stats_summary(a)

### Part B.

Modify your function so that it can compute the same summaries when the input `a` contains `np.nan` values. This should require only minimal modifications. 

***Note***: *You will receive a **warning** when you run the block that constructs the new data `b`. `numpy` is warning you that `nan` values have been produced. This happens because `a` includes some negative values, and taking the logarithm of a negative value produces `nan` in `numpy`. You should read this warning and then move on -- everything is going as it should.*
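One possible route, offered as a sketch rather than the intended solution, is to use the `nan`-aware counterparts that `numpy` provides for common summary functions. The toy array `c` below is an assumption made for illustration; it shows how a single `nan` affects `np.mean` but not `np.nanmean`, `np.nanpercentile`, or `np.nanstd`.

```python
import numpy as np

c = np.array([1.0, np.nan, 3.0])      # hypothetical toy data with one missing value

plain_mean = np.mean(c)               # nan: the missing value propagates
nan_mean = np.nanmean(c)              # mean of the non-nan entries [1.0, 3.0]
nan_median = np.nanpercentile(c, 50)  # 50th percentile of the non-nan entries
nan_std = np.nanstd(c)                # standard deviation of the non-nan entries

print(plain_mean, nan_mean, nan_median, nan_std)
```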

In [None]:
# modified function here


In [None]:
# new test data: will generate a RuntimeWarning
b = np.log(a)

In [None]:
# test new version of stats_summary here
# should obtain numbers, no NaN values. 
stats_summary(b)

## Exercise 2

The famous [Basel problem](https://en.wikipedia.org/wiki/Basel_problem) asks for a closed form of the infinite series 

$$\sum_{n = 1}^{\infty} \frac{1}{n^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots$$

In 1735, Leonhard Euler, one of the all-time mathematical greats, showed that this sum is exactly equal to $\pi^2/6$, although it wasn't until a few years later that his result would be made fully rigorous. 

Define a variable `n_max = 50`. Then, create the following two 1-dimensional `numpy` arrays, each of length `n_max`: 

- `asymptote`: each entry is equal to $\pi^2/6$. **Hint**: `np.ones()`, `np.pi`.  
- `partial_sums`: entry `k` of `partial_sums` should have value equal to 
$$\sum_{n = 1}^{k+1}\frac{1}{n^2} = \frac{1}{1^2} + \frac{1}{2^2} + \cdots + \frac{1}{(k+1)^2}$$
- **Hint:** `np.cumsum()`. 

Create each of these arrays **in a single line of code, with each line fewer than 80 characters long.**
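To see how `np.cumsum()` produces partial sums, here is a minimal sketch on a different series, the geometric series $\sum_{n \geq 1} 1/2^n$, whose sum is 1. The names `n`, `terms`, and `running` are chosen for this illustration only.

```python
import numpy as np

n = np.arange(1, 6)         # the integers 1, 2, 3, 4, 5
terms = 1 / 2**n            # 1/2, 1/4, 1/8, 1/16, 1/32
running = np.cumsum(terms)  # entry k is terms[0] + ... + terms[k]
print(running)
```

Each entry of `running` is a partial sum of the series, and the entries approach 1 as more terms are included.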

In [None]:
                                                                                # 80 characters

In [None]:
# your solution


Once you've created your arrays, run the code block below to visualize the convergence of the partial sums to the theoretical asymptote. 

<figure class="image" style="width:50%">
  <img src="https://raw.githubusercontent.com/PhilChodrow/PIC16A/master/discussion/numpy_practice_II-example-1.png" alt="">
  <figcaption><i>Expected output.</i></figcaption>
</figure>

In [None]:
# run this once you've created the two needed arrays
plt.plot(asymptote, label = r"$\pi^2/6$")
plt.scatter(np.arange(1, n_max+1), 
            partial_sums, 
            label = r"$\sum_{n = 1}^k \frac{1}{n^2}$", 
            edgecolors = "black",
            facecolors = 'none')
plt.gca().set(xlabel = "k")
plt.legend()

If you've made it this far, great! If there are fewer than 10 minutes remaining in Discussion, feel free to submit your assignment. Otherwise, continue on to Exercise 3. 

---

## Exercise 3

The *law of large numbers* asserts that, if $X_1,\ldots,X_k$ is a sequence of independent and identically distributed random variables with mean $\mu$, then, for sufficiently large $k$, 
$$\frac{1}{k}\sum_{n = 1}^kX_n = \frac{1}{k}\left(X_1 + X_2 + \cdots + X_k\right) \approx \mu\;.$$

Here is some random data:

In [None]:
# run this block

k = 1000
a = np.random.randn(k) + np.random.randn()

Create a visualization, similar to the one in Exercise 2, that illustrates the law of large numbers for this data set. You'll need to first compute the required arrays. It's not necessary to worry about making attractive labels for the legend, but if you'd like to, you can use the raw strings `r"$\frac{1}{k}\sum_{n = 1}^k X_n$"` and `r"$\mu$"` for the series labels. 


Save the mean in an **array called `m`**, which is simply `k` copies of the mean of the data. Then, compute the running means in an **array called `means`**, so that entry `j` of `means` is the mean of the first `j+1` entries of `a`. Check that both of these arrays have length `k`. 
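The idea of a running mean can be sketched on a tiny, non-random array. This is only one way to do it, assuming the approach of dividing cumulative sums by the number of entries seen so far; the names `x`, `counts`, `running_means`, and `flat` are hypothetical and chosen for this example.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])     # hypothetical toy data
counts = np.arange(1, len(x) + 1)      # number of entries seen so far: 1, 2, 3, 4
running_means = np.cumsum(x) / counts  # mean of the first 1, 2, 3, 4 entries
flat = np.mean(x) * np.ones(len(x))    # four copies of the overall mean

print(running_means)
print(flat)
```

Both arrays have the same length as `x`, and the final entry of `running_means` equals the overall mean.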

<figure class="image" style="width:50%">
  <img src="https://raw.githubusercontent.com/PhilChodrow/PIC16A/master/discussion/numpy_practice_II-example-2.png" alt="">
  <figcaption><i>An example of what your output might look like. Yours will be slightly different, since we are working with random numbers.</i></figcaption>
</figure>

In [None]:
# create the arrays here


In [None]:
# run this once you've created the two needed arrays
plt.plot(m, label = r"$\mu$")
plt.scatter(np.arange(1, k+1), 
            means, 
            label = r"$\frac{1}{k}\sum_{n = 1}^k X_n$", 
            edgecolors = "black",
            facecolors = 'none')
plt.gca().set(xlabel = "k")
plt.legend()