# Week 7 - Distributions of Sampling Statistics

This is a Jupyter notebook to explore the material in (Ross, 2017, Chp. 7). 



In [5]:
%matplotlib inline
# From now on we'll start each notebook with the library imports
# and special commands, to keep these things in one place (which
# is good practice). The line above is a Jupyter magic command that
# tells matplotlib to plot inline (between cells).
# Next we import the libraries and give them short names.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from collections import Counter
from collections import defaultdict

## Exercise A

Complete question 3 from problems for (Ross, 2017, Sec. 7.3) -- the text is repeated below for convenience:

> 3. Consider a population whose probabilities are given by
>
> $$p(1) = p(2) = p(3) = \frac{1}{3}$$
>
>    (a) Determine E[X].
>
>    (b) Determine SD(X).
>
>    (c) Let $\bar{X}$ denote the sample mean of a sample of size 2 from this
population. Determine the possible values of $\bar{X}$ along with their
probabilities.
>
>    (d) Use the result of part (c) to compute $E[\bar{X}]$ and $SD(\bar{X})$.
>
>    (e) Are your answers consistent?


*complete your answers in Markdown*

#### Question 3

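The population takes the values 1, 2 and 3, each with probability $\frac{1}{3}$:

$p(1)$ = $p(2)$ = $p(3)$ = $\frac{1}{3}$

##### (a) Determine E[X].

$E[X]$ = $\sum_x x \, p(x)$

$E[X]$ = $(1 * \frac{1}{3})$ + $(2 * \frac{1}{3})$ + $(3 * \frac{1}{3})$

$E[X]$ = 2

##### (b) Determine SD(X).

$Var(X)$ = $E[X^2]$ - $\mu^2$

$E[X^2]$ = $(1^2 * \frac{1}{3})$ + $(2^2 * \frac{1}{3})$ + $(3^2 * \frac{1}{3})$ = $\frac{14}{3}$

$Var(X)$ = $\frac{14}{3}$ - $2^2$ = $\frac{2}{3}$

$SD(X)$ = $\sqrt{2/3}$

$SD(X)$ = 0.82 (2 d.p.)

##### (c) Let $\bar{X}$ denote the sample mean of a sample of size 2 from this population. Determine the possible values of $\bar{X}$ along with their probabilities.

$\bar{X}$ = $\frac{X_1 + X_2}{2}$

Assuming the two values are drawn independently, each of the 9 ordered pairs $(X_1, X_2)$ has probability $\frac{1}{9}$:

$P\{\bar{X} = 1\}$ = $P\{(1,1)\}$ = $\frac{1}{9}$

$P\{\bar{X} = 1.5\}$ = $P\{(1,2) \text{ or } (2,1)\}$ = $\frac{2}{9}$

$P\{\bar{X} = 2\}$ = $P\{(1,3), (2,2) \text{ or } (3,1)\}$ = $\frac{3}{9}$

$P\{\bar{X} = 2.5\}$ = $P\{(2,3) \text{ or } (3,2)\}$ = $\frac{2}{9}$

$P\{\bar{X} = 3\}$ = $P\{(3,3)\}$ = $\frac{1}{9}$

##### (d) Use the result of part (c) to compute $E[\bar{X}]$ and $SD(\bar{X})$.

$E[\bar{X}]$ = $1(\frac{1}{9})$ + $1.5(\frac{2}{9})$ + $2(\frac{3}{9})$ + $2.5(\frac{2}{9})$ + $3(\frac{1}{9})$ = $\frac{18}{9}$ = 2

$Var(\bar{X})$ = $(1-2)^2\frac{1}{9}$ + $(1.5-2)^2\frac{2}{9}$ + $(2-2)^2\frac{3}{9}$ + $(2.5-2)^2\frac{2}{9}$ + $(3-2)^2\frac{1}{9}$ = $\frac{1}{3}$

$SD(\bar{X})$ = $\sqrt{1/3}$ = 0.58 (2 d.p.)

##### (e) Are your answers consistent?

Yes. The theory says $E[\bar{X}]$ = $\mu$ and $SD(\bar{X})$ = $\frac{\sigma}{\sqrt{n}}$. From (a) and (b), $\mu$ = 2 and $\frac{\sigma}{\sqrt{n}}$ = $\frac{\sqrt{2/3}}{\sqrt{2}}$ = $\sqrt{1/3}$ = 0.58, which matches the values computed directly from the distribution in (d).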
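
The sampling distribution in part (c) can also be checked by brute force. The sketch below is an illustration only: it assumes the two draws are independent (sampling with replacement), lists all 9 ordered pairs, and uses exact fractions so the results can be compared with $\mu$ = 2 and $\frac{\sigma^2}{n}$ = $\frac{1}{3}$. The variable names are made up for this example.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import sqrt

values = [1, 2, 3]        # population values, each with probability 1/3
p_pair = Fraction(1, 9)   # assumption: the two draws are independent

# Build the distribution of the mean of a sample of size 2
dist = defaultdict(Fraction)
for x1, x2 in product(values, repeat=2):
    dist[Fraction(x1 + x2, 2)] += p_pair

mean = sum(m * p for m, p in dist.items())
var = sum((m - mean) ** 2 * p for m, p in dist.items())

for m in sorted(dist):
    print(f"P(sample mean = {float(m)}) = {dist[m]}")
print(f"E = {mean}, Var = {var}, SD = {sqrt(var):.2f}")
```

The expected value and variance printed on the last line should agree with the hand calculation in part (d).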


## Exercise B

Complete question 5 from problems for (Ross, 2017, Sec. 7.4) -- the text is repeated below for convenience:

> 5. The time it takes to develop a photographic print is a random variable
with mean 17 seconds and standard deviation 0.8 seconds. Approximate
the probability that the total amount of time that it takes to process 100
prints is
>
>    (a) More than 1720 seconds
>
>    (b) Between 1690 and 1710 seconds

*Write up in markdown but you may want to use the code block below to complete your calculations.*

In [6]:
## supporting code for Exercise B

mean = 17   # mean time per print (seconds)
sd = 0.8    # standard deviation per print (seconds)
n = 100     # number of prints

# By the central limit theorem the total time X for n prints is
# approximately normal with mean n*mean and SD sd*sqrt(n)
total_mean = n * mean
total_sd = sd * np.sqrt(n)

print("Question 5")
print('\n')

print(f"Mean = {mean} seconds")
print(f"Standard Deviation = {sd} seconds")
print(f"Sample size = {n} prints")
print(f"Total time: mean = {total_mean} seconds, SD = {total_sd:.1f} seconds")

print('\n')

print("(a) More than 1720 seconds")
z = (1720 - total_mean) / total_sd
print(f"P(X > 1720) = P(Z > {z:.2f}) = {stats.norm.sf(z):.4f}")

print('\n')

print("(b) Between 1690 and 1710 seconds")
z1 = (1690 - total_mean) / total_sd
z2 = (1710 - total_mean) / total_sd
prob = stats.norm.cdf(z2) - stats.norm.cdf(z1)
print(f"P(1690 < X < 1710) = P({z1:.2f} < Z < {z2:.2f}) = {prob:.4f}")


Question 5


Mean = 17 seconds
Standard Deviation = 0.8 seconds
Sample size = 100 prints
Total time: mean = 1700 seconds, SD = 8.0 seconds


(a) More than 1720 seconds
P(X > 1720) = P(Z > 2.50) = 0.0062


(b) Between 1690 and 1710 seconds
P(1690 < X < 1710) = P(-1.25 < Z < 1.25) = 0.7887


## Exercise C

Complete questions 2 and 3 from problems for (Ross, 2017, Sec. 7.5) -- the text is repeated below for convenience:

> 2. Ten percent of all electrical batteries are defective. In a random selection of
8 of these batteries, find the probability that
>
>    (a) There are no defective batteries.
>
>    (b) More than 15 percent of the batteries are defective.
>
>    (c) Between 8 and 12 percent of the batteries are defective.

> 3. Suppose there was a random selection of n = 50 batteries in Prob. 2. Determine approximate probabilities for parts (a), (b), and (c) of that problem.

*complete in markdown but you can use the code block below for any calculations*

In [7]:
## supporting code for exercise C

p = 0.1  # probability that a battery is defective

print("Exercise C - Question 2")
print('\n')

n = 8
print("Let X = number of defective batteries")
print(f"X is binomial with n={n}, p={p}")
print(f"We cannot use the normal approximation since np={n*p:.1f} is less than 5")

print('\n')

print("(a)")
print(f"P(X = 0) = {stats.binom.pmf(0, n, p):.4f}")

print('\n')

print("(b)")
print(f"15% of {n} = {0.15*n:.2f}")
print(f"P(X > 1.2) = P(X >= 2) = 1 - P(X <= 1) = {stats.binom.sf(1, n, p):.4f}")

print('\n')

print("(c)")
print(f"8% of {n} = {0.08*n:.2f} and 12% of {n} = {0.12*n:.2f}")
print("X is a whole number, so P(0.64 < X < 0.96) = 0")

print('\n')

print("Exercise C - Question 3")
print('\n')

n = 50
mu = n*p
sigma = np.sqrt(n*p*(1 - p))
print(f"X is binomial with n={n}, p={p}")
print(f"np={mu:.0f} and n(1-p)={n*(1 - p):.0f}, so X is approximately normal")
print(f"with mean {mu:.0f} and SD {sigma:.4f} (continuity correction applied below)")

print('\n')

print("(a)")
print(f"P(X = 0) = P(X < 0.5) = {stats.norm.cdf(0.5, mu, sigma):.4f}")

print('\n')

print("(b)")
print(f"15% of {n} = {0.15*n:.2f}")
print(f"P(X > 7.5) = {stats.norm.sf(7.5, mu, sigma):.4f}")

print('\n')

print("(c)")
print(f"8% of {n} = {0.08*n:.2f} and 12% of {n} = {0.12*n:.2f}")
print(f"P(4 < X < 6) = P(4.5 < X < 5.5) = {stats.norm.cdf(5.5, mu, sigma) - stats.norm.cdf(4.5, mu, sigma):.4f}")


Exercise C - Question 2


Let X = number of defective batteries
X is binomial with n=8, p=0.1
We cannot use the normal approximation since np=0.8 is less than 5


(a)
P(X = 0) = 0.4305


(b)
15% of 8 = 1.20
P(X > 1.2) = P(X >= 2) = 1 - P(X <= 1) = 0.1869


(c)
8% of 8 = 0.64 and 12% of 8 = 0.96
X is a whole number, so P(0.64 < X < 0.96) = 0


Exercise C - Question 3


X is binomial with n=50, p=0.1
np=5 and n(1-p)=45, so X is approximately normal
with mean 5 and SD 2.1213 (continuity correction applied below)


(a)
P(X = 0) = P(X < 0.5) = 0.0169


(b)
15% of 50 = 7.50
P(X > 7.5) = 0.1193


(c)
8% of 50 = 4.00 and 12% of 50 = 6.00
P(4 < X < 6) = P(4.5 < X < 5.5) = 0.1863
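
Question 3 asks for approximate probabilities when n = 50, so it may help to see how close a normal approximation comes to the exact binomial values. The sketch below is an illustration under stated assumptions: it uses a continuity correction (whether Ross intends one here is not stated in the question), and it relies only on the standard library (`math.comb` and `statistics.NormalDist`) so it runs without the notebook's imports. The helper `binom_pmf` is a made-up name for this example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.1

def binom_pmf(k):
    """Exact binomial probability P(X = k) for the n and p above."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with the same mean and standard deviation
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact_a = binom_pmf(0)                               # no defectives
exact_b = 1 - sum(binom_pmf(k) for k in range(8))    # more than 7.5, i.e. X >= 8
exact_c = binom_pmf(5)                               # strictly between 4 and 6

approx_a = approx.cdf(0.5)
approx_b = 1 - approx.cdf(7.5)
approx_c = approx.cdf(5.5) - approx.cdf(4.5)

for label, exact, appr in [("(a)", exact_a, approx_a),
                           ("(b)", exact_b, approx_b),
                           ("(c)", exact_c, approx_c)]:
    print(f"{label} exact = {exact:.4f}, normal approximation = {appr:.4f}")
```

With np = 5 the approximation sits right at the usual rule-of-thumb boundary, so some disagreement with the exact values is expected, particularly in the tail for part (a).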


## Exercise D

Complete question 1 from problems for (Ross, 2017, Sec. 7.6) -- the text is repeated below for convenience:

> 1. The following data sets come from normal populations whose standard
deviation σ is specified. In each case, determine the value of a statistic
whose distribution is chi-squared, and tell how many degrees of freedom
this distribution has.
>
>    (a) 104, 110, 100, 98, 106; σ = 4
>
>    (b) 1.2, 1.6, 2.0, 1.5, 1.3, 1.8; σ = 0.5
>
>    (c) 12.4, 14.0, 16.0; σ = 2.4

In [8]:
## complete in python

# For a sample of size n from a normal population with standard deviation σ,
# (n-1)S²/σ² has a chi-squared distribution with n-1 degrees of freedom
datasets = [
    ("(a)", [104, 110, 100, 98, 106], 4),
    ("(b)", [1.2, 1.6, 2.0, 1.5, 1.3, 1.8], 0.5),
    ("(c)", [12.4, 14.0, 16.0], 2.4),
]

print("Exercise D - Question 1")

for label, data, sigma in datasets:
    n = len(data)
    s_squared = np.var(data, ddof=1)   # sample variance S² (divides by n-1)
    statistic = (n - 1) * s_squared / sigma**2
    print('\n')
    print(f"{label} {', '.join(str(x) for x in data)}; σ = {sigma}")
    print(f"Mean = {np.mean(data):.3f}")
    print(f"S squared = {s_squared:.3f}")
    print(f"(n-1)S²/σ² = {statistic:.3f} with {n - 1} degrees of freedom")


Exercise D - Question 1


(a) 104, 110, 100, 98, 106; σ = 4
Mean = 103.600
S squared = 22.800
(n-1)S²/σ² = 5.700 with 4 degrees of freedom


(b) 1.2, 1.6, 2.0, 1.5, 1.3, 1.8; σ = 0.5
Mean = 1.567
S squared = 0.091
(n-1)S²/σ² = 1.813 with 5 degrees of freedom


(c) 12.4, 14.0, 16.0; σ = 2.4
Mean = 14.133
S squared = 3.253
(n-1)S²/σ² = 1.130 with 2 degrees of freedom


### This section will be used for the Week 7  Quiz Assessment

https://www.youtube.com/watch?v=JLmD0sJId1M&ab_channel=jbstatistics

https://amsi.org.au/ESA_Senior_Years/SeniorTopic4/4h/4h_2content_4.html

https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter6.pdf

### Question 1

#### (A) What is the expected value and SD of each $X_i$?

$Pr(X_i=1) = Pr(X_i=2) = … = Pr(X_i=6) = \frac{1}{6}$

$E[X_i]$ = $(1 * \frac{1}{6})$ + $(2 * \frac{1}{6})$ + $(3 * \frac{1}{6})$ + $(4 * \frac{1}{6})$ + $(5 * \frac{1}{6})$ + $(6 * \frac{1}{6})$ = 3.500 = 3.50 (3 sig figs)

$SD(X_i)$ = $\sqrt{Var(X_i)}$	

$Var(X_i)$ = $E[X_i^2]$ - $\mu^2$ = $\frac{91}{6}$ - $(3.5^2)$ = $\frac{35}{12}$

$SD(X_i)$ = $\sqrt{\frac{35}{12}}$

$SD(X_i)$ = 1.7078... = 1.71 (3 sf)
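
The arithmetic for a single fair die can be reproduced exactly with fractions. This is a small illustrative check rather than part of the required answer; the variable names are chosen for this example only.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)   # each face is equally likely

mu = sum(x * p for x in faces)                 # E[X_i]
second_moment = sum(x**2 * p for x in faces)   # E[X_i^2]
var = second_moment - mu**2                    # Var(X_i)

print(mu, second_moment, var, round(sqrt(var), 2))
# prints: 7/2 91/6 35/12 1.71
```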


#### (B) What is the expected value and SD of the mean, $\bar{X} = \frac{1}{5}\sum_i X_i$?

##### New workings

$E[\bar{X}]$ = ?

We know that: $E[\bar{X}]$ = $\mu$

Therefore: $E[\bar{X}]$ = 3.50 (3 s.f.) from (A)

$SD(\bar{X})$ = ?

Assuming the five $X_i$ are independent, the variance of their sum is the sum of their variances:

$Var(\bar{X})$ = $Var(\frac{1}{5}\sum_{i=1}^{5} X_i)$ = $\frac{1}{5^2}\sum_{i=1}^{5} Var(X_i)$ = $\frac{5\sigma^2}{25}$ = $\frac{\sigma^2}{5}$

$Var(\bar{X})$ = $\frac{\frac{35}{12}}{5}$ = $\frac{7}{12}$

$SD(\bar{X})$ = $\sqrt{\frac{7}{12}}$ = 0.76376... = 0.764 (3 s.f.)

##### Alternative Working out

For a single value, variance = $\sigma^2$

For the mean of a sample of size $n$, variance = $\frac{\sigma^2}{n}$

Here $n$ = 5 (the number of values averaged, not the 6 faces), so using the original variance, which is $\frac{35}{12}$, we get $\frac{\frac{35}{12}}{5}$ = $\frac{7}{12}$

$SD(\bar{X})$ = $\sqrt{\frac{7}{12}}$ = 0.76376... = 0.764 (3 s.f.)

Both workings agree.

##### Old workings

An earlier attempt listed the distribution of $\bar{X}$ for pairs of values only, which does not apply to a mean of 5 values. Listing the distribution directly would need all $6^5$ = 7776 equally likely outcomes, which is impractical by hand, so the variance rule above is used instead.
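
As a check on $E[\bar{X}]$ and $SD(\bar{X})$ for the mean of 5 values, the full distribution can be enumerated by computer even though it is too large to list by hand. The sketch below assumes the five $X_i$ are independent and each uniform on 1 to 6; it loops over all $6^5$ = 7776 outcomes and uses exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

faces = range(1, 7)
n = 5   # number of values averaged, as in the definition of the mean in (B)

outcomes = list(product(faces, repeat=n))   # 6**5 = 7776 equally likely outcomes
p = Fraction(1, len(outcomes))

means = [Fraction(sum(o), n) for o in outcomes]
e_mean = sum(m * p for m in means)
var_mean = sum((m - e_mean) ** 2 * p for m in means)

print(e_mean, var_mean, round(sqrt(var_mean), 3))
# prints: 7/2 7/12 0.764
```

The variance of $\frac{7}{12}$ equals $\frac{\sigma^2}{n}$ with $\sigma^2$ = $\frac{35}{12}$ and $n$ = 5.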


#### (C) What is the expected value and SD of $Y_3$?

$E[Y_3]$ = $(1 * \frac{1}{6})$ + $(2 * \frac{1}{6})$ + $(3 * \frac{1}{6})$ + $(4 * \frac{1}{6})$ + $(5 * \frac{1}{6})$ + $(6 * \frac{1}{6})$ = 3.500 = 3.50 (3 sig figs)

$SD(Y_3)$ = $\sqrt{Var(Y_3)}$

$Var(Y_3)$ = $E[Y_3^2]$ - $\mu^2$ = $\frac{91}{6}$ - $(3.5^2)$ = $\frac{35}{12}$

$SD(Y_3)$ = $\sqrt{\frac{35}{12}}$

$SD(Y_3)$ = 1.7078... = 1.71 (3 sf)



#### (D) What is the expected value and SD of the mean of the 5 values drawn, $\bar{Y} = \frac{1}{5}\sum_i Y_i$?

Selecting 5 numbers from 6 without replacement

Therefore, there are 6 equally likely outcomes (one for each number left out): 1,2,3,4,5 or 1,2,3,4,6 or 1,2,3,5,6 or 1,2,4,5,6 or 1,3,4,5,6 or 2,3,4,5,6

Each outcome has probability $\frac{1}{6}$ and gives a single value of $\bar{Y}$:

Outcome: 1,2,3,4,5

$\bar{Y}$ = $\frac{1}{5}$ + $\frac{2}{5}$ + ... + $\frac{5}{5}$ = 3

Outcome: 1,2,3,4,6

$\bar{Y}$ = 3.2

Outcome: 1,2,3,5,6

$\bar{Y}$ = 3.4

Outcome: 1,2,4,5,6

$\bar{Y}$ = 3.6

Outcome: 1,3,4,5,6

$\bar{Y}$ = 3.8

Outcome: 2,3,4,5,6

$\bar{Y}$ = 4

Next, using these six equally likely values:

$E[\bar{Y}]$ = $\frac{3 + 3.2 + 3.4 + 3.6 + 3.8 + 4}{6}$ = 3.50 (3 s.f.)

$Var(\bar{Y})$ = $(3-3.5)^2\frac{1}{6}$ + $(3.2-3.5)^2\frac{1}{6}$ + ... + $(4-3.5)^2\frac{1}{6}$ = $\frac{0.7}{6}$ = $\frac{7}{60}$

$SD(\bar{Y})$ = $\sqrt{\frac{7}{60}}$ = 0.34156... = 0.342 (3 s.f.)

Note that the spread of the five numbers inside one outcome (for example $\sqrt{2}$ for 1,2,3,4,5) is not $SD(\bar{Y})$. $SD(\bar{Y})$ measures how much the mean changes from one outcome to another, which is why it is much smaller than $SD(Y_3)$.

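#### (E) What is the expected value and SD of the median of the 5 values drawn, $M = median(\{Y_1, Y_2, Y_3, Y_4, Y_5\})$?

The median of 5 values is the 3rd smallest. Using the 6 equally likely outcomes from (D):

If the number left out is 1, 2 or 3, the 3rd smallest value drawn is 4, so $M$ = 4

If the number left out is 4, 5 or 6, the 3rd smallest value drawn is 3, so $M$ = 3

$P\{M = 3\}$ = $P\{M = 4\}$ = $\frac{1}{2}$

$E[M]$ = $(3 * \frac{1}{2})$ + $(4 * \frac{1}{2})$ = 3.50 (3 s.f.)

$Var(M)$ = $(3-3.5)^2\frac{1}{2}$ + $(4-3.5)^2\frac{1}{2}$ = $\frac{1}{4}$

$SD(M)$ = $\sqrt{\frac{1}{4}}$ = 0.500 (3 s.f.)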
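
Because only 6 samples are possible when 5 numbers are selected from 6, the mean in (D) and the median in (E) can both be checked by listing every sample. The sketch below assumes the draws are made without replacement and that each of the 6 samples is equally likely; `mean_and_sd` is a helper written for this example, not a library function.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt
from statistics import median

def mean_and_sd(values):
    """Mean (exact) and SD (float) of a list of equally likely values."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, sqrt(var)

samples = list(combinations(range(1, 7), 5))          # the 6 possible samples
sample_means = [Fraction(sum(s), 5) for s in samples]
sample_medians = [Fraction(median(s)) for s in samples]

print("mean of Y:  ", mean_and_sd(sample_means))
print("median of Y:", mean_and_sd(sample_medians))
```

Both statistics have expected value $\frac{7}{2}$; the SD of the mean is $\sqrt{\frac{7}{60}}$ (about 0.342) and the SD of the median is 0.5.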

### GAP

### Exercise 2

#### (A) What is the expected value of the cost of a single meal to me?

4 friends each month (5 including me)

£20 a head (£100 in total)

| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| P(X) | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ |

$E[X]$ = ($\frac{1}{6}$ * 0) + ($\frac{1}{6}$ * 0) + ($\frac{1}{6}$ * 100) + ($\frac{1}{6}$ * 0) + ($\frac{1}{6}$ * 0) + ($\frac{1}{6}$ * 20)

$E[X]$ = 20


#### (B) What is the standard deviation of the cost of a single meal to me?

$E[X_i^2]$ = ($\frac{1}{6} * 0^2$) + ($\frac{1}{6} * 0^2$) + ($\frac{1}{6} * 100^2$) + ($\frac{1}{6} * 0^2$) + ($\frac{1}{6} * 0^2$) + ($\frac{1}{6} * 20^2$)

$E[X_i^2]$ = $\frac{5000}{3}$ + $\frac{200}{3}$

$E[X_i^2]$ = $\frac{5200}{3}$

$Var(X_i)$ = $\sigma^2$ = $E[X_i^2]$ - $\mu^2$ = $\frac{5200}{3}$ - $(20)^2$

$Var(X_i)$ = $\frac{4000}{3}$

$SD(X_i)$ = $\sqrt{\frac{4000}{3}}$

$SD(X_i)$ = 36.5 (3 sf)
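
The expected cost and standard deviation of a single meal can be reproduced with exact fractions. In this sketch the list `costs` is copied from the terms of the $E[X]$ sum in (A), in the same order; everything else is illustrative naming.

```python
from fractions import Fraction
from math import sqrt

# Cost to me (in £) for each of the six equally likely outcomes,
# in the order the terms appear in the E[X] sum in part (A)
costs = [0, 0, 100, 0, 0, 20]
p = Fraction(1, len(costs))

mu = sum(c * p for c in costs)                 # E[X]
var = sum(c**2 * p for c in costs) - mu**2     # E[X^2] - mu^2

print(mu, var, round(sqrt(var), 1))
# prints: 20 4000/3 36.5
```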


#### (C) What is the expected value of the average cost of a meal over a year?

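$E[X]$ = 20 was for one meal (one month)

A year is 12 months, so the average cost of a meal over a year is $\bar{X}$ = $\frac{1}{12}\sum_{i=1}^{12} X_i$

The expected value of a sample mean equals the population mean: $E[\bar{X}]$ = $\mu$

Therefore $E[\bar{X}]$ = 20

(The expected total cost for the year is 12 * 20 = 240, but the question asks for the average per meal.)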


#### (D) What is the standard deviation of the average cost of a meal over a year?

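n = 12 now because a year is 12 months

Assuming the monthly costs are independent, $SD(\bar{X})$ = $\frac{\sigma}{\sqrt{n}}$

$SD(\bar{X})$ = $\frac{\sqrt{\frac{4000}{3}}}{\sqrt{12}}$ = $\sqrt{\frac{1000}{9}}$

$SD(\bar{X})$ = 10.54... = 10.5 (3 sf)

(The SD of the total cost for the year is $\sqrt{\frac{4000}{3}}$ * $\sqrt{12}$ = 126 (3 sf), but the question asks about the average.)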


#### (E) If I go out in this way every month for 3 years, with what probability will I pay in total between £600 and £840 for my meals out?

First work out the expected value and SD of the total cost $X$ over 3 years (36 months)

$E[X]$ = 20 * 36 = 720

$SD(X)$ = $\sqrt{\frac{4000}{3}}$ * $\sqrt{36}$

$SD(X)$ = $40\sqrt{30}$ = 219.089... = 219 (3 sf)

By the central limit theorem, the total over 36 months is approximately normal, so we need

$Pr(600 < X < 840)$

Standardisation formula: $Z$ = $\frac{X - \mu}{\sigma}$

$\frac{600 - 720}{40\sqrt{30}}$ = -0.54772 = -0.55 (2 dp)

$\frac{840 - 720}{40\sqrt{30}}$ = 0.54772 = 0.55 (2 dp)

From the Z table, $Pr(Z < -0.55)$ = 0.2912

From the Z table, $Pr(Z < 0.55)$ = 0.7088

Therefore, probability = 0.7088 - 0.2912 = 0.4176
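
The table lookup can be cross-checked without rounding the z values. This sketch uses `statistics.NormalDist` from the standard library and takes the per-meal mean (20) and variance ($\frac{4000}{3}$) from parts (A) and (B); it assumes the 36 monthly costs are independent so that the central limit theorem applies.

```python
from math import sqrt
from statistics import NormalDist

mu, var = 20, 4000 / 3   # per-meal mean and variance from parts (A) and (B)
months = 36

# Normal approximation to the total cost over 36 months
total = NormalDist(mu=months * mu, sigma=sqrt(months * var))
prob = total.cdf(840) - total.cdf(600)

print(round(total.stdev, 1), round(prob, 3))
# prints: 219.1 0.416
```

The small difference from the table-based answer comes from rounding z to 0.55 before using the table.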


#### (F) Approximate the amount C such that the following statement is true: There is a roughly 90% chance that I will spend less than £C over 3 years on my meals out, and a roughly 10% chance that I will spend more than £C.

Using the Z score table, I looked for the corresponding z value which has probability 90%

Z = 1.28 (probability is 0.8997 roughly 0.9 = 90%)

Rearrange formula for standardisation: x = Z * $\sigma$ + $\mu$

x = (1.28 * $40\sqrt{30}$) + 720

x = 1000.4339... = 1000 (3 sf)

£1000
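
The 90% point can also be found with an inverse normal CDF instead of a table. This is a sketch under the same normal approximation as part (E), with mean 720 and standard deviation $40\sqrt{30}$ = $\sqrt{48000}$; `inv_cdf` is the quantile function provided by `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

# Approximate distribution of the 3-year total from part (E)
total = NormalDist(mu=720, sigma=sqrt(48000))

z_90 = NormalDist().inv_cdf(0.90)   # standard normal 90th percentile
c = total.inv_cdf(0.90)             # same as 720 + z_90 * sqrt(48000)

print(round(z_90, 2), round(c))
```

The unrounded z value is slightly above the table value of 1.28, so C comes out within about a pound of the hand calculation.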
