In [None]:
# SETUP

import numpy as np
from datascience import *
from prob140 import *

# These lines do some fancy plotting magic
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

# These lines hide FutureWarning messages
import warnings
warnings.simplefilter('ignore', FutureWarning)

# To save you a lot of time
from scipy import stats
from client.api.assignment import load_assignment
autograder = load_assignment('main.ok')

In [2]:
from sympy import *
init_printing()

# Homework 9 #
In several problems you are given a Markdown cell and a code cell. Use the Markdown cell to show your math and the code cell to get a decimal answer or a plot. You don't have to use the code cell if you can see what the decimal answer must be without having to evaluate a Python expression.

A few reminders:
- Before submitting to Gradescope, check that your code is fully visible in the generated PDF and does not go off the page.
- Please do not delete cells or text in cells that contain questions as this could cause errors when you submit.

&zwnj;

#QUESTION

### 1. Working with Densities ###
For some positive constant $c$, let $X$ have density given by

$$
f(x) = 
\begin{cases}
0 & \text{if } x < 1 \\
cx^{-5} & \text{if } x \ge 1 
\end{cases}
$$
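
Whatever value of $c$ you find, a quick numerical check is that the resulting density integrates to 1. The sketch below shows the idea on a different density, $2x^{-3}$ for $x \ge 1$, which is an assumption chosen only for illustration. The name `example_density` is hypothetical; `scipy.integrate.quad` does the numerical integration.

```python
import numpy as np
from scipy import integrate

def example_density(x):
    # Illustrative density (not the one in this question): 2 * x^(-3) for x >= 1
    return 2 * x ** (-3.0)

# quad returns (integral estimate, error estimate). The density is 0 below 1,
# so integrating from 1 to infinity covers all the probability.
total, abs_error = integrate.quad(example_density, 1, np.inf)
print(total)
```

The same pattern works for any candidate density: if `total` is not close to 1, the constant is wrong.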

&zwnj;

### a) ###
Find $c$.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### b) ###
Find $P(X > 2)$.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### c) ###
Find the cdf of $X$ and plot it over the interval $(-1, 6)$.
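
To plot a piecewise function over an array of $x$ values, the function has to accept the whole array at once. One way to do that, sketched here on the cdf of the exponential distribution with rate 1 (an illustrative stand-in, not the cdf asked for in this part), is `np.where`. The names `example_cdf`, `x_values` and `F_values` are hypothetical.

```python
import numpy as np

def example_cdf(x):
    # cdf of the exponential (rate 1) distribution, used only as an illustration:
    # 0 for x < 0 and 1 - e^(-x) for x >= 0
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, 1 - np.exp(-x))

x_values = np.arange(-1, 6, 0.01)
F_values = example_cdf(x_values)   # one cdf value per entry of x_values
```

Keep in mind that `np.where` evaluates both branches for every entry of the array before choosing between them, so each branch has to be computable even at points where it is not used.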

*Provide your answer and reasoning in this Markdown cell.*

In [None]:

def cdf(x):
    ...
    
x = np.arange(..., ..., 0.01)
plt.plot(x, cdf(x), color='darkblue', lw=2)
plt.xlabel('$x$')
plt.ylabel('$F(x)$');

&zwnj;

### d) ###
Find $E(X)$.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### e) ###
Find $SD(X)$.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

#QUESTION

### 2. What's Normal? ###
In any part of this question that involves a sample size, you can assume the sample size is big enough for the Central Limit Theorem approximation to be good. But pay attention to what is being approximated by the CLT.
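
A simulation can help you keep the two distributions apart: the distribution of the values in one large sample, and the distribution of the sample mean across many samples. The sketch below assumes a hypothetical skewed population (exponential with mean 1), which is not the population in this question. It compares the skewness of one sample with the skewness of many sample means; the seed and variable names are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(140)   # arbitrary seed

# Hypothetical skewed population: exponential with mean 1
sample_size = 1000
one_sample = rng.exponential(scale=1, size=sample_size)

# Means of many independent samples of the same size
repetitions = 2000
sample_means = rng.exponential(scale=1, size=(repetitions, sample_size)).mean(axis=1)

# Skewness is 0 for a symmetric distribution such as the normal
skew_of_sample = stats.skew(one_sample)
skew_of_means = stats.skew(sample_means)
print(skew_of_sample, skew_of_means)
```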

&zwnj;

### a) ###
In a simple random sample of 1000 faculty taken among all universities in a country, the number of papers published by the sampled faculty in the past year had a mean of 1.1 and an SD of 1.8. Does the Central Limit Theorem say that the distribution of the number of papers published by the sampled faculty in the past year is roughly normal? If not, what do you think is the shape of that distribution? Explain based on the information given in the problem.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### b) ###
Continuing part (a), construct an approximate 90% confidence interval for the mean number of papers published by faculty at all universities in the country in the past year. If this is not possible, explain why not.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

#QUESTION

### 3. Widths of Confidence Intervals ###
In any part of this question that involves a sample size, you can assume the sample size is big enough for the Central Limit Theorem approximation to be good.
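
Several parts below need the number of SDs that corresponds to a given confidence level under the normal curve. A minimal sketch of how to get it from `stats.norm.ppf` follows; the helper name `z_multiplier` is hypothetical.

```python
from scipy import stats

def z_multiplier(level):
    # Number of SDs on either side of the center that capture a central
    # area of `level` under the standard normal curve
    return stats.norm.ppf(1 - (1 - level) / 2)

print(z_multiplier(0.95), z_multiplier(0.99))
```

The argument of `ppf` is the area to the left of the right endpoint, which is why half of the leftover area `1 - level` is subtracted from 1.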

&zwnj;

### a) ###
A survey organization has used the methods of our class to construct an approximate 95% confidence interval for the mean annual income of households in a county. The interval runs from \$66,000 to \$70,000. If possible, find an approximate 99% confidence interval for the mean annual income of households in the county. If this is not possible, explain why not.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### b) ### 
Draw the graph of the function $p(1-p)$ for $0 \le p \le 1$. Label the horizontal axis $p$ (use the code in 1c above as a guide). Find the location and value of the maximum. You don't have to derive the maximum analytically, though it's easy.

In [None]:

p = np.arange(0, 1.01, 0.01)
...

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### c) ###
A survey organization is going to take a simple random sample of $n$ voters from among all the voters in a state, to construct a 99% confidence interval for the proportion of voters who favor a proposition. Find an $n$ such that the total width of the confidence interval (left end to right end) will be no more than 0.06.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

#QUESTION

### 4. A Mixture ###
This is a problem from Pitman's text. Keep in mind that because you know the survival function, mean, and variance of the exponential distribution, you can write down (in terms of the constant $c > 0$) the values of the three integrals $\int_0^\infty e^{-cx}dx $, $\int_0^\infty xe^{-cx}dx $ and $\int_0^\infty x^2e^{-cx}dx $ without integrating. **Use that knowledge; no credit on this problem if you use SymPy or work out the integrals by calculus. We want you to learn how to use probabilistic methods to simplify calculation.**

Transistors produced by one machine have a lifetime that is exponentially distributed with mean 100 hours. Those produced by a second machine have an exponentially distributed lifetime with mean 200 hours. A package of 12 transistors contains 4 produced by the first machine and 8 produced by the second. Let $X$ be the lifetime of a transistor picked at random from the package. 
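
If you want to sanity-check your answers to the parts below, you can simulate the two-step experiment: first pick the machine, then draw the lifetime. This is only a rough empirical check and not a substitute for the probabilistic reasoning the problem asks for. The seed, the number of draws, and all variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(140)   # arbitrary seed
num_draws = 200_000

# Step 1: pick the machine; 4 of the 12 transistors come from the first machine
from_first = rng.random(num_draws) < 4 / 12

# Step 2: draw the lifetime from that machine's exponential distribution
# (NumPy's `scale` parameter is the mean of the exponential)
means = np.where(from_first, 100, 200)
lifetimes = rng.exponential(scale=means)

empirical_tail = np.mean(lifetimes >= 200)   # estimates P(X >= 200)
empirical_mean = np.mean(lifetimes)          # estimates E(X)
empirical_var = np.var(lifetimes)            # estimates Var(X)
print(empirical_tail, empirical_mean, empirical_var)
```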

&zwnj;

### a) ###
Use one line of code to find $P(X \ge 200)$. For a number $c$, the expression `np.exp(c)` evaluates to $e^c$.

In [6]:
...

&zwnj;

### b) ###
Use iterated expectations (Section 11.2) and one line of code to find $E(X)$. 

In [None]:
...

&zwnj;

### c) ###
For $x>0$, find $P(X \in dx)$ and hence find the density of $X$.

[To find $P(X \in dx)$, use the approach you used in (a).]

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### d) ###
Use your answers to (b) and (c) to find $Var(X)$.

[No, $Var(X)$ is not just the average of the two variances. You should try to think about why.]

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

#QUESTION

### 5. Rayleigh and Cauchy ###

&zwnj;

### a) ###
Let $T$ have the exponential distribution with rate $\lambda = 1/2$. Let $R = \sqrt{T}$. Find the density of $R$. This is called the *Rayleigh* density.
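
Once you have derived the density, one way to check it is to simulate $R$ and compare with SciPy's built-in Rayleigh distribution. The sketch assumes that `stats.rayleigh` with its default scale of 1 is the distribution in this part; the seed, sample size, and comparison points are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(140)   # arbitrary seed

# Exponential with rate 1/2 has mean 2, and NumPy's `scale` is the mean
t = rng.exponential(scale=2, size=100_000)
r = np.sqrt(t)

# Compare empirical proportions with SciPy's Rayleigh cdf at a few points
points = np.array([0.5, 1.0, 2.0, 3.0])
empirical_cdf = np.array([np.mean(r <= p) for p in points])
rayleigh_cdf = stats.rayleigh.cdf(points)
print(np.round(empirical_cdf, 3), np.round(rayleigh_cdf, 3))
```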

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### b) ###
For $R$ as in part (a), find $E(R)$.

[Use the density you found in (a), and notice that you can write down the indefinite integral of $xe^{-cx^2}$ by differentiating $e^{-cx^2}$ and examining the result.]

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### c) ###
This is from Pitman's text, page 310, where you'll find a useful diagram. But note that I'm using $\Theta$ where he uses $\Phi$, to avoid confusion with the standard normal cdf.

Suppose that a particle is fired from the origin in the $(x, y)$-plane in a straight line in a direction at a random angle $\Theta$ to the $x$-axis. Let $Y$ be the $y$-coordinate of the point where the particle hits the line $x = 1$. Show that if $\Theta$ has the uniform distribution on $(-\pi/2, \pi/2)$, then the density of $Y$ is

$$
f_Y(y) ~ = ~ \frac{1}{\pi(1 + y^2)}, ~~~ - \infty < y < \infty
$$

This is called the *Cauchy* density.
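
A simulation of the particle can serve as a check on the claim, though it is not a proof. The sketch below draws the random angle, computes where the particle's path meets the line $x = 1$, and compares empirical proportions with `stats.cauchy.cdf`. The seed, sample size, and comparison points are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(140)   # arbitrary seed

# Random angle, uniform on (-pi/2, pi/2)
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=100_000)

# The path at angle theta meets the line x = 1 at height tan(theta)
y = np.tan(theta)

# Compare empirical proportions with SciPy's standard Cauchy cdf
points = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
empirical_cdf = np.array([np.mean(y <= p) for p in points])
cauchy_cdf = stats.cauchy.cdf(points)
print(np.round(empirical_cdf, 3), np.round(cauchy_cdf, 3))
```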

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### d) ###
Plot the Cauchy density over the interval $(-5, 5)$ and overlay the standard normal curve. Use `'darkblue'` for the color of the one with heavier tails, and `'gold'` for the color of the other one.

[Use `stats.cauchy.pdf` for the Cauchy density function and `stats.norm.pdf` for the standard normal density function.]

In [None]:

y = np.arange(-5, 5.01, 0.01)
plt.plot(y, ..., color='darkblue', lw=2)
plt.plot(y, ..., color='gold', lw=2)
plt.ylim(0, 0.45);

&zwnj;

### e) ###
For $Y$ with the Cauchy density, use calculus to show that $E(\lvert Y \rvert ) = \infty$. Thus $E(Y)$ is undefined even though the density of $Y$ is symmetric about 0.

The Cauchy curve is called the [Witch of Agnesi](https://en.wikipedia.org/wiki/Witch_of_Agnesi). Skim the History and Applications sections of the Wikipedia article.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### f) ###
Explain what is being plotted by the code below, and discuss what you see in the graph in relation to the Weak Law of Large Numbers. Run the cell several times before you answer. You should also vary $N$.

[`stats.cauchy.rvs(size = N)` returns an array of $N$ i.i.d. Cauchy random numbers.]
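
For contrast, it may help to run the same construction with i.i.d. standard normal draws, which have a finite mean. The sketch below mirrors the code in this part with `stats.norm.rvs` in place of `stats.cauchy.rvs`; the variable names `x_normal` and `running_average` are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

N = 10000
n = np.arange(1, N + 1)

# Same construction as in this part, but with i.i.d. standard normal draws
x_normal = stats.norm.rvs(size=N)
running_average = np.cumsum(x_normal) / n

plt.plot(n, running_average, color='gold', lw=2)
plt.plot([0, N], [0, 0], color='k', lw=2);
```

Run both versions several times and compare how the two graphs behave as $n$ grows.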

In [27]:
N = 10000
n = np.arange(1, N+1)
x = stats.cauchy.rvs(size = N)
y = np.cumsum(x)/n
plt.plot(n, y, color='darkblue', lw=2)
plt.plot([0, N], [0, 0], color='k', lw=2);

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

#QUESTION

### 6. Transforming the Standard Normal ###
Let $Z$ have the standard normal density.

&zwnj;

### a) ###
Find the density of $\lvert Z \rvert$. Don't try to use a formula; just think about $P(\lvert Z \rvert \in dz)$ for positive $z$.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### b) ###
Plot the density of $\lvert Z \rvert$ over the interval $(0, 4)$. Base your code on the examples of plots made in the previous questions.

In Data 8 you saw distributions of this shape when you simulated statistics such as "the absolute difference between the proportion of heads and 1/2".

In [None]:
...
...

&zwnj;

### c) ###
Use the density in (a) to find $E(\lvert Z \rvert )$. Question 5b will be useful.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### d) ###
Let $\mu$ and $\sigma$ be constants with $\sigma > 0$. Find the density of $\sigma Z + \mu$. Thus far we have just assumed what this density is; now you can derive its formula by using the change of variable method for densities.

*Provide your answer and reasoning in this Markdown cell.*

&zwnj;

### e) ###
Find the density of $1/Z$. Why should you not worry about $Z = 0$?

Then plot the density over an interval that is large enough so that you can see almost all the probability. You'll have to experiment a bit to find a good interval.

*Provide your answer and reasoning in this Markdown cell.*

In [None]:
_ = autograder.grade('q1')

&zwnj;

## LaTeX error fix

If you get a LaTeX error when compiling for Gradescope, the function `cell_by_cell` will identify which cell is causing the problem. Feel free to test it by running the cell below.

Please don't delete this cell, and **make sure to save your notebook**.

In [None]:
import gsExport
gsExport.cell_by_cell()

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [autograder.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]

In [None]:
import gsExport
gsExport.generateSubmission()