In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("JHW4.ipynb")

<h2> Jupyter Homework Week 4 --- Math 032, Summer 2024 </h2>

<img src="images/empirical-rule.png" style="float: right; width: 30%">



<h2> Simulating means and variances </h2>

<h4> Part 1: Introduction </h4>

In this notebook, we'll study the normal distribution and learn how to sample from it. We will empirically verify the [68-95-99.7 rule](https://en.wikipedia.org/wiki/68–95–99.7_rule), which states that, if we sample a large number of times from a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution), we should expect about $68$% of samples to be within *one* standard deviation of the mean, $95$% to be within *two* standard deviations, and $99.7$% to be within *three* standard deviations. Then, we will sample from a standard normal distribution (mean $0$ and variance $1$) to estimate the means and variances of two other distributions that are derived from the normal distribution.
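
As a preview of what "empirically verify" means here, the following is a minimal sketch rather than part of the graded work. It draws many samples from a normal distribution and counts the fraction that land within one, two, and three standard deviations of the mean. The mean `mu`, standard deviation `sigma`, sample size, and seed are illustrative assumptions and do not come from the assignment.

```python
import numpy as np

# Fixed seed (an arbitrary choice) so the run is reproducible.
rng = np.random.default_rng(0)

# Illustrative parameters; the rule holds for any mean and standard deviation.
mu, sigma = 5.0, 2.0
draws = rng.normal(loc=mu, scale=sigma, size=200_000)

# Fraction of draws within k standard deviations of the mean, for k = 1, 2, 3.
fractions = {k: np.mean(np.abs(draws - mu) <= k * sigma) for k in (1, 2, 3)}

for k, frac in fractions.items():
    print(f"within {k} standard deviation(s): {frac:.4f}")
```

The printed fractions should land near $0.68$, $0.95$, and $0.997$, with small deviations caused by sampling noise.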

<h4> Part 2: Generating data and Calculating Quantiles </h4>

To begin, let's use SciPy's `stats` package to find the percentiles (or quantiles) of the standard normal distribution and compare them to the table in our textbook (Table B.1 on page 432).

In [None]:
# import the requisite package
import numpy as np
from scipy.stats import norm

Here is an example of how to use `norm` to compute quantiles:

In [None]:
# Define the mean and standard deviation
mean = 0
std_dev = 1

# Create a normal distribution object
normal_dist = norm(loc=mean, scale=std_dev)

# Calculate quantiles
quantile_95 = normal_dist.ppf(0.95)
quantile_80 = normal_dist.ppf(0.80)

print("95th percentile (quantile):", quantile_95)
print("80th percentile (quantile):", quantile_80)

This means that, on the graph above, $95$% of the area under the curve lies to the left of $x \approx 1.64485$, and $80$% lies to the left of $x \approx 0.8416$. We can also generate samples using the following syntax:

In [None]:
sample_size = 10000
samples = normal_dist.rvs(size=sample_size)
print("The first five samples are ",samples[:5])

Next, we can estimate the mean and variance of the distribution by computing the sample mean and the sample variance of our draws.

In [None]:
print("The average of our samples is",samples.mean())
print("The variance of our samples is",samples.var())

Notice how close these are to the true values $\mu = 0$ and $\sigma^2 = 1$!
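
To see why the estimates are so close, it can help to watch them improve as the sample size grows. The sketch below is a hedged illustration: the sample sizes and the seeds passed through `random_state` are arbitrary choices. It also shows that NumPy's `var()` divides by $n$ by default (`ddof=0`), while `ddof=1` divides by $n-1$; for large samples the two are nearly identical.

```python
from scipy.stats import norm

standard_normal = norm(loc=0, scale=1)

# Map each sample size to (sample mean, variance with ddof=0, variance with ddof=1).
results = {}
for n in (100, 10_000, 1_000_000):
    # random_state fixes the seed (an arbitrary choice) for reproducibility.
    draws = standard_normal.rvs(size=n, random_state=n)
    results[n] = (draws.mean(), draws.var(), draws.var(ddof=1))
    print(n, results[n])
```

The estimates for the largest sample size should be much closer to $\mu = 0$ and $\sigma^2 = 1$ than those for the smallest one.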

Let's suppose $Z$ is a *standard normal* random variable, so that $Z$ has pdf

$$f(x) = \frac{1}{\sqrt{2 \pi}} e^{-x^2/2}, \qquad -\infty < x < \infty .$$

If we want to calculate exact values of the cumulative distribution function (cdf), we can use the `cdf` method. Here are two examples. Note that we have *already defined* `normal_dist` above.

In [None]:
# Calculate the probability P(Z <= 1).
print(normal_dist.cdf(1))
# Calculate the probability P(-4 <= Z <= 2).
prob = normal_dist.cdf(2) - normal_dist.cdf(-4)
print(prob)
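
The cell above calls `cdf` on the frozen object `normal_dist`. As a hedged aside, the same value can also be obtained by calling `norm.cdf` directly, since `loc=0` and `scale=1` are the defaults in `scipy.stats.norm`. The sketch below compares the two call styles and checks that `ppf` undoes `cdf`. The variable names and the test point `x = 1.25` are illustrative assumptions.

```python
from scipy.stats import norm

frozen = norm(loc=0, scale=1)

# Three equivalent ways to compute P(Z <= 1) for a standard normal Z.
p_frozen = frozen.cdf(1)
p_direct = norm.cdf(1, loc=0, scale=1)
p_default = norm.cdf(1)  # loc=0 and scale=1 are the default parameters

print(p_frozen, p_direct, p_default)

# ppf (the quantile function) is the inverse of cdf.
x = 1.25
round_trip = norm.ppf(norm.cdf(x))
print(round_trip)
```

The frozen form is convenient when the same parameters are reused many times; the direct form is shorter for one-off calculations.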

### Part 3: Questions

#### Question 1:  

Find the threshold $\alpha$ for which $P(Z \le \alpha) = 0.75$ using `normal_dist.ppf`.

<!-- BEGIN QUESTION -->



In [None]:
# Put your answer to question 1 here

alpha = ...
print(alpha)

In [None]:
grader.check("q1")

<!-- END QUESTION -->

#### Question 2:

A standard rule of thumb described above is the [68-95-99.7](https://en.wikipedia.org/wiki/68–95–99.7_rule) rule, which refers to the probabilities $P(-1 \le Z \le 1)$, $P(-2 \le Z \le 2)$, and $P(-3 \le Z \le 3)$ for a random variable $Z \sim N(0, 1)$.

Calculate these probabilities using the function `norm.cdf`. You can see an example above. Your answers should agree with the true values up to floating-point error, because no sampling is involved.

<!-- BEGIN QUESTION -->



In [None]:
# Put your answer to question 2 here

prob_within_one = ...
prob_within_two = ...
prob_within_three = ...

print(prob_within_one,prob_within_two,prob_within_three)

In [None]:
grader.check("q2")

<!-- END QUESTION -->

#### Question 3:

Many important distributions arise by *applying a function* to normally distributed data. Two of these are the [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution), which is the distribution of a sum $X_1^2+X_2^2+\cdots+X_n^2$ of squares of independent standard normal random variables $X_i$, and the [folded normal distribution](https://en.wikipedia.org/wiki/Folded_normal_distribution), a special case of which is the absolute value $|Z|$ of a standard normal random variable.

Suppose $Z$ is standard normal and let $X = Z^2$, $Y=|Z|$. **Estimate the mean and variance** of $X$ and $Y$ *by sampling* from a standard normal distribution. You can use `np.random.normal()` to create samples of $Z$.
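
The general pattern is to draw samples of $Z$, apply the function elementwise, and then take the sample mean and variance of the result. The sketch below illustrates this with a *different* transformation, $W = 3 + 2Z$, chosen only as an example; it is not one of the two distributions in this question. The seed, the sample size, and the name `W` are assumptions, and `rng.normal` is used here as a seeded stand-in for `np.random.normal()`.

```python
import numpy as np

# Fixed seed (an arbitrary choice) so the run is reproducible.
rng = np.random.default_rng(1)

num_samples = 100_000
Z = rng.normal(loc=0.0, scale=1.0, size=num_samples)

# Apply the transformation elementwise to every sample of Z.
W = 3 + 2 * Z

w_mean, w_var = W.mean(), W.var()
print("Estimated mean and variance of W:", w_mean, w_var)
```

For this linear example the exact values are $E[W] = 3$ and $\mathrm{Var}(W) = 4$, so the estimates can be checked directly.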

<!-- BEGIN QUESTION -->



In [None]:
# Put your answer to question 3 here

def estimate_mean_and_variance_X():
    # sample chi_squared and return the sample mean and variance
    # Take a large number of samples (e.g. 100000)
    num_samples = ...
    # Sample From a Standard Normal Distribution
    Z = ...
    # Create samples of X = Z^2 (chi-squared) from the samples Z
    X = ...
    return X.mean(),X.var()

def estimate_mean_and_variance_Y():
    # sample from folded normal and return the sample mean and variance
    # Take a large number of samples (e.g. 100000)
    num_samples = ...
    # Sample From a Standard Normal Distribution
    Z = ...
    # Create samples of folded normal
    Y = ...
    return Y.mean(),Y.var()


print("The chi squared distribution has mean and variance: ",estimate_mean_and_variance_X())
print("The folded normal distribution has mean and variance: ",estimate_mean_and_variance_Y())

In [None]:
grader.check("q3")

<!-- END QUESTION -->



## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)