* X = set of population elements
* x = set of sample elements
* N = population size
* n = sample size
* Greek letters refer to population attributes (usually a capital letter)
    * $\mu$ = population mean
    * $\sigma$ = standard deviation of population
* Roman letters refer to sample attributes (usually a lower-case letter)
    * $\bar{x}$ = sample mean
    * s = standard deviation of a sample

# Chapter 7: The Central Limit Theorem
There are two alternative forms of the theorem. Both are concerned with drawing finite samples of size n from a population with a known mean, μ, and a known standard deviation, σ.
* The first alternative says that if we collect samples of size n with a "large enough $n$," calculate each sample's `mean`, and create a histogram of those means, then the resulting histogram will tend to have an approximate normal bell shape.
* The second alternative says that if we again collect samples of size n that are "large enough," calculate the `sum` of each sample and create a histogram, then the resulting histogram will again tend to have a normal bell-shape.
> (the sample size should be at least **30** or the data should come from a normal distribution).
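
The first form can be checked with a small simulation. The sketch below is only an illustration under assumed settings (a uniform population on [1, 5], samples of size 30, 2000 repetitions, a fixed seed); none of these numbers come from the chapter. It uses only the Python standard library.

```python
import random
import statistics

# Assumed setup for illustration: a uniform population on [1, 5],
# which is flat rather than bell-shaped.
rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # size of each sample
num_samples = 2000       # number of samples drawn

# One mean per sample.
sample_means = [
    statistics.fmean(rng.uniform(1, 5) for _ in range(n))
    for _ in range(num_samples)
]

pop_mean = (1 + 5) / 2            # mean of U(1, 5)
pop_sd = (5 - 1) / 12 ** 0.5      # standard deviation of U(1, 5)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print("mean of sample means:", round(mean_of_means, 3))
print("sd of sample means:  ", round(sd_of_means, 3))
print("sigma / sqrt(n):     ", round(pop_sd / n ** 0.5, 3))
```

If the theorem holds for this setup, the first printed value should sit near 3 and the second near the third. A histogram of `sample_means` would show the bell shape directly.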

$X \sim N(5,6)$ is read as:  $X$ is a normally distributed random variable with mean = 5 and standard deviation = 6.

#### Z-Scores
<span style='color:green'>If X is a normally distributed random variable and $X\sim N(\mu, \sigma)$, then the z-score is</span>:
$$z = \frac{x-\mu}{\sigma}$$

In statistics, a `z-score` tells us `how many standard deviations away a value is from the mean`. We use the following formula to calculate a z-score:

$z = \frac{(X-\mu)}{\sigma}$

where:
* X is a single raw data value
* μ is the population mean
* σ is the population standard deviation
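
As a minimal sketch, the formula can be written as a plain function. The name `z_score` is hypothetical, and the values 85, 63 and 5 are the ones used in a cell further down in these notes. Note that `scipy.stats.zscore` does something different: it standardizes an array against that array's own mean and standard deviation, which is why a single-element array yields `nan`.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# A raw value of 85 in a population with mean 63 and standard deviation 5.
z = z_score(85, mu=63, sigma=5)
print(z)
```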

In [1]:
from scipy.stats import zscore
import math

import ipywidgets as widgets
from ipywidgets import interact

In [2]:
zscore(a=[0.4])

array([nan])

In [3]:
import scipy.stats as st

In [4]:
st.norm.ppf(0.4)

-0.2533471031357997

In [5]:
st.norm.cdf(0.4)

0.6554217416103242

In [6]:
1 - st.norm.cdf(0.4)

0.3445782583896758

In [7]:
(85-63)/5

4.4

In [8]:
st.norm.cdf(4.4)

0.9999945874560923

In [9]:
2/5

0.4

In [10]:
1 - st.norm.cdf(0.4)

0.3445782583896758

In [11]:
(85-63)/5

4.4

In [12]:
st.norm.cdf(4.4)

0.9999945874560923

In [13]:
st.norm.ppf(0.9, loc=63, scale=5)

69.407757827723

In [14]:
st.norm.ppf(0.7, loc=63, scale=5)

65.6220025635402

#### <span style="color:orange">EXAMPLE 6.9</span>

In [15]:
z1 = (1.8-2)/.5
print(z1)

-0.3999999999999999


In [16]:
z2 = (2.75 - 2)/0.5
print(z2)

1.5


In [17]:
st.norm.cdf(z1)

0.3445782583896759

In [18]:
st.norm.cdf(z2)

0.9331927987311419

In [19]:
st.norm.cdf(z2) - st.norm.cdf(z1)

0.588614540341466

In [20]:
st.norm.ppf(.25, loc=2, scale=0.5)

1.6627551249019592

$\mu = 36.9 \text{ years}$

$\sigma = 13.9 \text{ years}$



In [21]:
(23 - 36.9)/13.9

-0.9999999999999999

In [22]:
(64.7 -36.9)/13.9

2.0000000000000004

In [23]:
st.norm.cdf((64.7 - 36.9)/13.9) - st.norm.cdf((23 - 36.9)/13.9)

0.8185946141203637

In [24]:
st.norm.cdf((50.8 - 36.9)/13.9)

0.8413447460685429

In [25]:
1 -  st.norm.cdf(-0.9999999)

0.8413447218714694

In [26]:
st.norm.ppf(.8, loc=36.9, scale=13.9)

48.59853514666351

In [27]:
st.norm.ppf(.75, loc=36.9, scale=13.9)

46.275407527725534

In [28]:
st.norm.ppf(.25, loc=36.9, scale=13.9)

27.524592472274463

In [29]:
st.norm.ppf(.75, loc=36.9, scale=13.9) - st.norm.ppf(.25, loc=36.9, scale=13.9)

18.75081505545107

In [30]:
st.norm.ppf(.40, loc=36.9, scale=13.9)

33.37847526641238

In [31]:
st.norm.ppf(.60, loc=36.9, scale=13.9)

40.42152473358762

Diameter

$\mu = 5.85 \text{ cm}$

$\sigma = 0.24 \text{ cm}$

In [32]:
1 - st.norm.cdf((6-5.85)/0.24)

0.2659855290487

In [33]:
st.norm.ppf(.4, loc=5.85, scale=0.24)

5.789196695247408

In [34]:
st.norm.ppf(.6, loc=5.85, scale=0.24)

5.910803304752592

In [35]:
st.norm.ppf(.9, loc=5.85, scale=0.24)

6.1575723757307035

#### <span style="color:orange">Example 7.1</span>

In [36]:
st.norm.cdf((92-90)/(15/math.sqrt(25))) - st.norm.cdf((85-90)/(15/math.sqrt(25)))

0.6997171101802624

## <span style="color:green"> The Central Limit Theorem for Sample Means (Averages) </span>
* Suppose $X$ is a random variable with a distribution that may be known or unknown.
* $\mu_x = \text{ the mean of } X$
* $\sigma_x = \text{ the standard deviation of } X$

If you draw random samples of size $n$, then as $n$ increases, the random variable $\bar{x}$, which consists of sample means, tends to be **normally distributed** and
* $\bar{x} \sim N \bigl( \mu_x, \frac{\sigma_x}{\sqrt{n}} \bigr)$

> The **central limit theorem** for sample means says that if you repeatedly draw samples of a given size (such as repeatedly rolling ten dice) and calculate their means, those means tend to follow a normal distribution (the sampling distribution). As sample sizes increase, the distribution of means more closely follows the normal distribution. The normal distribution has the same mean as the original distribution and a variance that equals the original variance divided by the sample size. Standard deviation is the square root of variance, so the standard deviation of the sampling distribution is the standard deviation of the original distribution divided by the square root of n. The variable n is the number of values that are averaged together, not the number of times the experiment is done.

> To put it more formally, if you draw random samples of size n, the distribution of the random variable $\bar{x}$, which consists of sample means, is called the sampling distribution of the mean. The sampling distribution of the mean approaches a normal distribution as n, the sample size, increases.
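
The "variance divided by the sample size" statement can be verified in one line, assuming the $n$ draws $X_1, \dots, X_n$ are independent and each has variance $\sigma_x^2$ (independence is what allows the variances to add):

$$\operatorname{Var}(\bar{x}) = \operatorname{Var}\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n\,\sigma_x^2}{n^2} = \frac{\sigma_x^2}{n}, \qquad \sigma_{\bar{x}} = \sqrt{\frac{\sigma_x^2}{n}} = \frac{\sigma_x}{\sqrt{n}}$$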

The random variable $\bar{x}$ has a different z-score associated with it from that of the random variable $X$. The mean $\bar{x}$ is the value of $\bar{x}$ in one sample.
* $z=\frac{\bar{x}-\mu_x}{\bigl(\frac{\sigma_x}{\sqrt{n}}\bigr)}$
* $\mu_x$ is the average of both $X \text{ and } \bar{x}$
* $\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}=$ standard deviation of $\bar{x}$ and is called the **standard error of the mean**.
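
The sketch below puts the standard error and the z-score for a sample mean into code, using the Example 7.1 numbers (mean 90, standard deviation 15, n = 25). It relies on `statistics.NormalDist` from the Python standard library instead of `scipy`, purely so that it runs without extra packages; `NormalDist(mu, sigma).cdf(x)` plays the role of `st.norm.cdf(x, loc=mu, scale=sigma)`. The helper name `sampling_dist_of_mean` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def sampling_dist_of_mean(mu, sigma, n):
    """Normal approximation for the sample mean: N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n))

# Example 7.1 numbers: mu = 90, sigma = 15, n = 25.
xbar_dist = sampling_dist_of_mean(90, 15, 25)

standard_error = xbar_dist.stdev               # sigma / sqrt(n)
z_for_92 = (92 - 90) / standard_error          # z-score of a sample mean of 92
p_between = xbar_dist.cdf(92) - xbar_dist.cdf(85)

print("standard error:", standard_error)
print("z for a sample mean of 92:", round(z_for_92, 4))
print("P(85 < x-bar < 92):", round(p_between, 4))
```

The key point is that the denominator of the z-score is the standard error, not the population standard deviation.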

#### <span style="color:orange">Example 7.1</span>
An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size n = 25 are drawn randomly from the population.

a. Find the probability that the **sample mean** is between 85 and 92.
* ![image.png](attachment:4b64106c-2b4e-4501-8831-59031dabfe66.png)

* $\therefore \mu = \mu_x = 90$
* $\therefore \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3$
* Let $\bar{x} = $ the mean of a sample of size 25.
    * $\mu_x=90$
    * $\sigma_x=15$
    * $n=25$
* so, $\bar{x}\sim N\bigl(90,\frac{15}{\sqrt{25}}\bigr)$

In [37]:
s_sigma = 15/math.sqrt(25)

In [38]:
(90, s_sigma)

(90, 3.0)

In [39]:
diff = st.norm.cdf(92, loc=90, scale=s_sigma) - st.norm.cdf(85, loc=90, scale=s_sigma)
diff

0.6997171101802624

$\therefore P(85\lt\bar{x}\lt92) = 0.6997$

#### <span style="color:orange">Example 7.2</span>
> The length of time, in hours, it takes an "over 40" group of people to play one soccer match is normally distributed with a **mean of two hours** and a **standard deviation of 0.5 hours**. A sample of size n = 50 is drawn randomly from the population. Find the probability that the **sample mean** is between 1.8 hours and 2.3 hours.

so Let,

* $X = $ the time, in hours, it takes to play one soccer match.
* $\mu =  \mu_x = 2$ hours
* $\sigma_x = 0.5$ hours
* $n = 50$
* $\bar{X} \sim N\bigl(2, \frac{0.5}{\sqrt{50}}\bigr)$

find,

* $P(1.8\lt \bar{x} \lt 2.3)$ hours

In [40]:
st.norm.cdf(2.3, loc=2, scale=0.5/math.sqrt(50)) - st.norm.cdf(1.8, loc=2, scale=0.5/math.sqrt(50))

0.9976500872609771

Therefore, the probability that the mean time is between 1.8 hours and 2.3 hours is 0.9977

#### <span style="color:orange">Example 7.3</span>

> In a recent study reported Oct. 29, 2012 on the Flurry Blog, the mean age of tablet users is 34 years. Suppose the standard deviation is 15 years. Take a sample of size n = 100.

* $\mu_x = \mu = 34$ years
* $\sigma = 15$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}=\frac{15}{\sqrt{100}}=\frac{15}{10}=1.5$
* $n = 100$

a.

What are the mean and standard deviation for the sample mean ages of tablet users?

mean is 34

In [41]:
# standard deviation of the sample mean (the standard error) is determined by:
15/math.sqrt(100)

1.5

b.

What does the distribution look like?

The central limit theorem states that for large sample sizes (n), the sampling distribution of the mean will be approximately normal.

c.

Find the probability that the sample mean age is more than 30 years (the reported mean age of tablet users in this particular study).

$P(\bar{X}\gt30)$

In [42]:
1 - st.norm.cdf(30, loc=34, scale=1.5)

0.9961696194324102

d.

Find the 95th percentile for the sample mean age (to one decimal place).

In [43]:
# returns the x value at the point described in the parameters
# In the below case this is the point that lies at the 95th percentile
st.norm.ppf(0.95, loc=34, scale=15/math.sqrt(100))

36.46728044042721

#### <span style="color:orange">Example 7.4</span>
The mean number of minutes for app engagement by a tablet user is 8.2 minutes. Suppose the standard deviation is one minute. Take a sample of 60.

a.

What are the mean and standard deviation for the sample mean number of app engagement by a tablet user?

* $\mu_{\bar{x}} = \mu = 8.2$
* $\sigma = 1 \text{ minute}$
* $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{60}}=0.13$
* Sample Size = 60

In [44]:
1/math.sqrt(60)

0.12909944487358055

* $\therefore \mu_{\bar{x}}=\mu=8.2$
* $\therefore \sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}=\frac{1}{\sqrt{60}}=0.13$

b.

What is the standard error of the mean?

**Standard Error of the Mean** = $\sigma_{\bar{x}}=\frac{\sigma_x}{\sqrt{n}}$ = standard deviation of $\bar{x}$
> This allows us to calculate the probability of sample means of a particular distance from the mean, in repeated samples of size 60.

c.

Find the 90th percentile for the sample mean time for app engagement for a tablet user. Interpret this value in a complete sentence.

In [45]:
# Units are in Minutes
st.norm.ppf(0.9, loc=8.2, scale=1/math.sqrt(60))

8.365447595688675

d.
Find the probability that the sample mean is between eight minutes and 8.5 minutes.

$P(8\lt\bar{x}\lt8.5)$

In [46]:
st.norm.cdf((8.5-8.2)/(1/math.sqrt(60))) - st.norm.cdf((8.0-8.2)/(1/math.sqrt(60)))

0.9292639990455852

## <span style="color:green"> The Central Limit Theorem for Sums </span>

**Recall**:
<span style="color:orange">$X \sim N(5,6)$ is read as:  $X$ is a normally distributed random variable with mean = 5 and standard deviation = 6. </span>



**Suppose X is a random variable with a distribution that may be known or unknown (it can be any distribution) and suppose**:

* $\mu_x=\text{ the mean of } X$
* $\sigma_x = \text{ the standard deviation of } X$

If you draw random samples of size n, then as n increases, the random variable ΣX consisting of sums tends to be normally distributed and

* $\sum{X} \sim N\bigl((n)(\mu_x), (\sqrt{n})(\sigma_x)\bigr)$


The random variable $\sum{X}$ has the following z-score associated with it:

a. $\sum{x}$ is one sum.

b. $z=\frac{\sum{x}-(n)(\mu_x)}{(\sqrt{n})(\sigma_x)}$

* $(n)(\mu_x) = \text{the mean of }\sum{X}$
* $(\sqrt{n}) (\sigma_x) = \text{standard deviation of } \sum{X}$

#### <span style="color:orange">Example 7.5</span>
An unknown distribution has a mean of 90 and a standard deviation of 15. A sample of size 80 is drawn randomly from the population.

a.

Find the probability that the sum of the 80 values (or the total of the 80 values) is more than 7,500.

* $\mu_x = 90$
* $\sigma_x = 15$
* sample size = n = 80
* $\sum X \sim N\bigl((80)(90), (\sqrt{80})(15)\bigr)$

In [47]:
print("The mean of the sums is:", 80*90)

The mean of the sums is: 7200


In [48]:
print("The standard deviation of the sums is:", math.sqrt(80)*15)

The standard deviation of the sums is: 134.1640786499874


Find: $P(\sum{X}\gt 7500)$

In [49]:
1-st.norm.cdf(7500, loc=7200, scale=134.16)

0.012671433369059626

![image.png](attachment:783ebb75-3c23-4c57-8090-9955c61e1c7b.png)

In [50]:
st.norm.cdf

<bound method rv_continuous.cdf of <scipy.stats._continuous_distns.norm_gen object at 0x000001FDE4A57A00>>

<div class="alert-success">
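st.norm.cdf takes an x value as input (with loc=mean, scale=standard deviation) and returns the cumulative probability to the left of that value.

st.norm.ppf takes a cumulative probability between 0 and 1 as input (with loc=mean, scale=standard deviation) and returns the corresponding x value.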
</div>
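
The two functions are inverses of each other, which a short round trip makes concrete. This sketch uses `statistics.NormalDist` from the standard library as a stand-in for `st.norm` (an assumption made only to keep the example dependency-free): `cdf` corresponds to `st.norm.cdf` and `inv_cdf` to `st.norm.ppf`. The distribution is the one from Example 7.3, and the value 36.5 is an arbitrary choice.

```python
from statistics import NormalDist

# Sampling distribution from Example 7.3: sample means ~ N(34, 1.5).
dist = NormalDist(mu=34, sigma=1.5)

p = dist.cdf(36.5)            # x value -> cumulative probability
x_back = dist.inv_cdf(p)      # cumulative probability -> x value

percentile_95 = dist.inv_cdf(0.95)

print("cdf(36.5):", round(p, 4))
print("inv_cdf(cdf(36.5)):", round(x_back, 4))
print("95th percentile:", round(percentile_95, 2))
```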

b.

Find the sum that is 1.5 standard deviations above the mean of the sums.

eg) Find $\sum{X}$ where $z=1.5$

$\sum{X}=(n)(\mu_x) + (z)(\sqrt{n})(\sigma_x) = (80)(90) + (1.5)(\sqrt{80})(15)=7401.2$

In [51]:
1.5*134.1640786 + 7200

7401.2461179

#### <span style="color:orange">Example 7.7</span>
The mean number of minutes for app engagement by a tablet user is 8.2 minutes. Suppose the standard deviation is one minute. Take a sample of size 70.

a.

What are the mean and standard deviation for the sums?

* $\mu_x = 8.2$
* $\mu_{\sum{X}} = n\mu_x = 70(8.2) = 574 \text{ minutes}$
* $\sigma_x = 1$
* $\sigma_{\sum{x}} = (\sqrt{n})(\sigma_x)=(\sqrt{70})(1)=8.37$ minutes
* sample size = 70
* $\sum X \sim N\bigl((70)(8.2), (\sqrt{70})(1)\bigr)$

In [52]:
print("The mean of the sums is: ",70*8.2, "minutes")

The mean of the sums is:  574.0 minutes


In [53]:
print("The standard deviation of the sums is: ", math.sqrt(70)*1, "minutes")

The standard deviation of the sums is:  8.366600265340756 minutes


b.

Find the 95th percentile for the sum of the sample. Interpret this value in a complete sentence.

In [54]:
st.norm.ppf(0.95, loc=574, scale=8.3666002653)

587.7618327916318

Therefore, ninety five percent of the sums of app engagement times are at most 587.76 minutes.

c.
Find the probability that the sum of the sample is at least ten hours.

Ten hours = 600 minutes

In [55]:
1 - st.norm.cdf(600, loc=574, scale=8.3666002653)

0.0009430837295741901

## <span style="color:green">Using the Central Limit Theorem</span>
It is important to understand when to use the **central limit theorem**

If you are being asked to find the probability of an **individual** value, do **not** use the CLT. **Use the distribution of its random variable**.

### Examples of the Central Limit Theorem
#### Law of Large Numbers
The larger n gets, the smaller the standard deviation gets. Remember that the standard deviation for $\bar{X}$ is:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Below is an interactive example to show as n gets larger, the output (standard deviation) gets smaller.

In [56]:
import ipywidgets as widgets
from ipywidgets import interact

In [57]:
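sigma_slider = widgets.IntSlider(value=5, min=1, max=1000)
n_slider = widgets.IntSlider(value=5, min=1, max=1000)

def standard_error(sig, n):
    # standard deviation of the sample mean for population sd `sig` and sample size `n`
    return sig/math.sqrt(n)

interact(standard_error,
        sig=sigma_slider,
        n=n_slider);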

interactive(children=(IntSlider(value=5, description='sig', max=1000, min=1), IntSlider(value=5, description='…
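
For readers without a live widget, the same relationship can be printed as a small table. The population standard deviation of 5 and the list of sample sizes below are assumed values chosen for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for sample size n."""
    return sigma / math.sqrt(n)

sigma = 5                        # assumed population standard deviation
sizes = [1, 4, 25, 100, 400]     # assumed sample sizes
errors = [standard_error(sigma, n) for n in sizes]

for n, se in zip(sizes, errors):
    print(f"n = {n:>3}  standard error = {se:.2f}")
```

Each time n is multiplied by four, the standard error is halved, because n enters through its square root.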

#### <span style="color:orange">Example 7.8</span>
A study involving stress is conducted among the students on a college campus. The **stress scores follow a uniform distribution** with the lowest stress score equal to one and the highest equal to five. Using a sample of 75 students, find:

a.

The probability that the **mean stress score** for the 75 students is less than two.

Find $P(\bar{x} < 2)$

* n = 75

This is a `uniform distribution` so:

* $X \sim U(a, b)$ where a = the lowest value of x and b = the highest value of x.
* $\therefore X\sim U(1, 5)$ where a=1 and b=5
* $\mu_x=\frac{a+b}{2}=\frac{1+5}{2}=3$
* $\sigma_x = \sqrt{\frac{(b-a)^2}{12}}=\sqrt{\frac{(5-1)^2}{12}}=1.15$
* $\therefore \bar{X} \sim N\bigl(3,\frac{1.15}{\sqrt{75}}\bigr)$
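
The setup above can be reproduced in a few lines. This is a sketch with a hypothetical helper `uniform_mean_sd`, and it uses `statistics.NormalDist` rather than `st.norm` to stay self-contained. It keeps the standard deviation unrounded, so results can differ slightly from calculations that use 1.15.

```python
from math import sqrt
from statistics import NormalDist

def uniform_mean_sd(a, b):
    """Mean and standard deviation of a uniform distribution on [a, b]."""
    return (a + b) / 2, (b - a) / sqrt(12)

mu, sigma = uniform_mean_sd(1, 5)
n = 75

# Central limit theorem for sample means: x-bar ~ N(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu, sigma / sqrt(n))
p_below_2 = xbar_dist.cdf(2)

print("mu:", mu)
print("sigma:", round(sigma, 4))
print("P(x-bar < 2):", p_below_2)
```

With the unrounded standard deviation, a sample mean of 2 sits 7.5 standard errors below 3, so the probability is effectively zero.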

In [58]:
st.norm.cdf(2, loc=3, scale=1.15/math.sqrt(75))

Near zero: 2 lies roughly 7.5 standard errors below the mean of 3, so the probability is on the order of 1e-14 (recall the smallest score is one).

![image.png](attachment:7226ab21-6c07-4de5-be25-1e142d3915ee.png)

b.

Find the 90th percentile for the mean of 75 stress scores.

Let k= the 90th percentile.

Find k, where $P(\bar{x} \lt k) = 0.90$

k = 3.2

![image.png](attachment:a796ca63-89fb-4287-9ea0-c34acf7ac9e7.png)

In [59]:
st.norm.ppf(.9, loc=3, scale=1.15/math.sqrt(75))

3.170177952509939

The 90th percentile for the mean of 75 scores is about 3.2. This tells us that 90% of all the means of 75 stress scores are at most 3.2, and that 10% are greater than 3.2.

c.

The probability that the total of the 75 stress scores is less than 200.

Find $P(\sum{x} < 200)$

In [60]:
st.norm.cdf(200, loc=75*3, scale=math.sqrt(75)*1.15)

0.00603282286495274

d.
Find the 90th percentile for the total of 75 stress scores.

Let k = the 90th percentile.

Find k where $P(\sum{x}\lt k) = 0.90$

k = 237.8

In [61]:
st.norm.ppf(0.90, loc=75*3, scale=math.sqrt(75)*1.15)

237.76334643824543

#### Example 7.9
* Exponential distribution
* $\mu = 22$ minutes
* n = 80 customers
* Let X = the excess time used by one INDIVIDUAL cell phone customer
* $X \sim \text{Exp}(\frac{1}{22})$
    * $\mu=22$
    * $\sigma=22$
* Let $\bar{X} = $ the mean excess time used by a sample of n=80 customers who exceed their contracted time allowance.
* $\bar{X} \sim N\bigl(22, \frac{22}{\sqrt{80}}\bigr)$ by the central limit theorem for sample means.

a.

Find $P(\bar{x}>20)$ (The probability that the mean excess time used by the 80 customers in the sample is longer than 20 minutes)

In [62]:
1 - st.norm.cdf(20, loc=22, scale=(22/math.sqrt(80)))

0.7919241165068476


#### <span style="color:orange">Example 7.9</span>
Suppose that a market research analyst for a cell phone company conducts a study of their customers who exceed the time allowance included on their basic cell phone contract; the analyst finds that for those people who exceed the time included in their basic contract, the `excess time used` follows an `exponential distribution` with a mean of 22 minutes.

Consider a random sample of 80 customers who exceed the time allowance included in their basic cell phone contract.

Let X = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted time allowance.

$X\sim \text{Exp}(\frac{1}{22})$. From previous chapters, we know that:
* $\mu = 22$
* $\sigma = 22$

Let $\bar{X}$ = the mean excess time used by a sample of n=80 customers who exceed their contracted time allowance.

$\bar{X} \sim N \bigl(22, \frac{22}{\sqrt{80}} \bigr)$ by the central limit theorem for sample means

**Using the CLT to find probability**

a.
Find the probability that the mean excess time used by the 80 customers in the sample is longer than 20 minutes. This is asking us to find $P( \bar{x}  > 20)$. Draw the graph.

In [63]:
1 - st.norm.cdf(20, loc=22, scale=22/math.sqrt(80))

0.7919241165068476

![image.png](attachment:5821bc5b-b306-4bea-865a-1147f8448fb0.png)

b.
Suppose that one customer who exceeds the time limit for his cell phone contract is randomly selected. Find the probability that this individual customer's excess time is longer than 20 minutes. This is asking us to find $P(x > 20)$.

Find $P(x > 20)$. Remember to use the exponential distribution for an `individual`: $X\sim \text{Exp}(\frac{1}{22})$.

$P(x>20)=e^{-(\frac{1}{22})(20)}$ or $e^{-0.04545(20)} = 0.4029$

In [64]:
math.e**(-(1/22)*20)

0.402890321529133

c.
Explain why the probabilities in parts a and b are different.

1. $P(x>20)=0.4029$ but $P(\bar{x}>20)=0.7919$
2. The probabilities are not equal because we use different distributions to calculate the probability for individuals and for means.
3. <span style="color:pink">When asked to find the probability of an individual value, use the stated distribution of its random variable; do not use the CLT. Use the CLT with the normal distribution when you are being asked to find the probability for a mean.</span>
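
The contrast between parts a and b can be put side by side in code. This is a minimal sketch using only the standard library (`statistics.NormalDist` stands in for `st.norm`); the variable names are chosen for this illustration.

```python
from math import exp, sqrt
from statistics import NormalDist

mean_excess = 22   # mean of the exponential distribution, in minutes
n = 80             # sample size

# Individual customer: exponential distribution, P(X > x) = exp(-x / mean).
p_individual = exp(-20 / mean_excess)

# Mean of 80 customers: CLT, x-bar ~ N(22, 22 / sqrt(80)).
# For an exponential distribution the standard deviation equals the mean.
xbar_dist = NormalDist(mean_excess, mean_excess / sqrt(n))
p_sample_mean = 1 - xbar_dist.cdf(20)

print("P(x > 20) for one customer:", round(p_individual, 4))
print("P(x-bar > 20) for 80 customers:", round(p_sample_mean, 4))
```

The threshold of 20 minutes is the same in both lines; only the distribution changes, and the sample mean is far more tightly concentrated around 22 than a single observation is.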

Let $k=\text{ the } 95^{\text{th}}$ percentile.  Find $k$ where $P(\bar{x}<k)=0.95$

$k = 26.0$

In [65]:
st.norm.ppf(.95, loc=22, scale=22/math.sqrt(80))

26.045804975190627

![image.png](attachment:84e08460-913d-4021-b92c-20a6aa496a97.png)

The 95th percentile for the **sample mean excess time used** is about 26.0 minutes for random samples of 80 customers who exceed their contractual allowed time.

Ninety five percent of such samples would have means under 26 minutes; only five percent of such samples would have means above 26 minutes.

#### <span style="color:orange">Example 7.10</span>
In the United States, someone is sexually assaulted every two minutes, on average, according to a number of studies. Suppose the standard deviation is 0.5 minutes and the sample size is 100.

* $\mu_x = \mu = 2$ minutes
* $\sigma = 0.5$ minutes, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{10} = 0.05$
* $n = 100$

a.
Find the median, the first quartile, and the third quartile for the sample mean time of sexual assaults in the United States.

In [66]:
# for a normal sampling distribution the median equals the mean of the sample means
print("The median is:", 2)

The median is: 2


In [67]:
print("The first quartile is:", st.norm.ppf(0.25, loc=2, scale=0.05))

The first quartile is: 1.9662755124901958


In [68]:
print("The third quartile is:", st.norm.ppf(0.75, loc=2, scale=0.05))

The third quartile is: 2.033724487509804


b.
Find the median, the first quartile, and the third quartile for the sum of sample times of sexual assaults in the United States.

$\mu_{\sum{x}} = n(\mu_x)=100(2)=200$

$\sigma_{\sum{x}} = \sqrt{n}(\sigma_x)=\sqrt{100}(0.5)=5$

In [69]:
(100*2, math.sqrt(100)*.5)

(200, 5.0)

In [70]:
# the sums are approximately normal, so the median equals the mean of the sums
print("The median is:", 200)

The median is: 200


In [71]:
print("The 25th percentile is:", st.norm.ppf(0.25, loc=200, scale=5))

The 25th percentile is: 196.6275512490196


In [72]:
print("The 75th percentile is:", st.norm.ppf(0.75, loc=200, scale=5))

The 75th percentile is: 203.3724487509804


c.
Find the probability that a sexual assault occurs on the average between 1.75 and 1.85 minutes.

$P(1.75\lt \bar{x} \lt 1.85)$

In [73]:
st.norm.cdf(1.85, loc=2, scale=0.05) - st.norm.cdf(1.75, loc=2, scale=0.05)

0.001349611380058222

d.
Find the value that is two standard deviations above the sample mean.

Recall that:
> In statistics, a `z-score` tells us `how many standard deviations away a value is from the mean`. We use the following formula to calculate a z-score:

> $z = \frac{(X - \mu)}{\sigma}$

$\therefore z = 2$ and solve for x (the value we are trying to find)

* $2 = \frac{x-2}{0.05}$
* $2(0.05) = x - 2$
* $2(0.05) + 2 = x$
* $x = 2.1$

Therefore the value that is two standard deviations above the sample mean is 2.1 minutes.
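
Solving the z-score formula for x gives x = mean + z × standard deviation, which is easy to wrap in a helper. The function name `value_at_z` is hypothetical; the inputs are the mean and standard error of the sample mean from this example.

```python
def value_at_z(mu, sigma, z):
    """Value that lies z standard deviations from the mean mu."""
    return mu + z * sigma

# Sample mean distribution for this example: mean 2, standard error 0.05.
x = value_at_z(mu=2, sigma=0.05, z=2)
print(round(x, 4))
```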

e.
Find the IQR for the sum of the sample times.

IQR = 75th percentile - 25th percentile

Recall,
> * $(\sqrt{n}) (\sigma_x) = \text{standard deviation of } \sum{X}$
* $\sqrt{n}(\sigma)$
* $\sqrt{100} (0.5)$
* $10*0.5$
* 5

In [74]:
(st.norm.ppf(.75, loc=200, scale=5), st.norm.ppf(.25, loc=200, scale=5))

(203.3724487509804, 196.6275512490196)

In [75]:
st.norm.ppf(.75, loc=200, scale=5) - st.norm.ppf(.25, loc=200, scale=5)

6.744897501960793

#### <span style="color:orange">Example 7.11</span>
A study was done about violence against prostitutes and the symptoms of the post-traumatic stress that they developed. The age range of the prostitutes was 14 to 61. The mean age was 30.9 years with a standard deviation of nine years.



* $14 \leq \text{ age } \leq 61$
* mean age, $\mu = 30.9$
* stdev age, $\sigma = 9$

a.

In a sample of 25 prostitutes, what is the probability that the mean age of the prostitutes is less than 35?

* $\therefore \mu_x = \mu = 30.9$
* $\therefore \sigma = 9$

Central limit theorem for sample means:
* $\bar{X} \sim N \bigl( \mu_x, \frac{\sigma_x}{\sqrt{n}} \bigr) $

In [76]:
print("The standard deviation of the sample mean (standard error) is:", (9)/math.sqrt(25))

The standard deviation of the sample mean (standard error) is: 1.8


In [77]:
print("The probability, for a sample of 25 prostitutes, that the mean age is less than 35 is:", st.norm.cdf(35, loc=30.9, scale=1.8))

The probability, for a sample of 25 prostitutes, that the mean age is less than 35 is: 0.9886300895118156


b.

Is it likely that the mean age of the sample group could be more than 50 years? Interpret the results.

Find, $P(\bar{x}>50)$

In [78]:
print("The probability for b. is:", 1 - st.norm.cdf(50, loc=30.9, scale=1.8))

The probability for b. is: 0.0


> For this sample group, it is almost impossible for the group’s average age to be more than 50. However, it is still possible for an individual in this group to have an age greater than 50.

c.

In a sample of 49 prostitutes, what is the probability that the sum of the ages is no less than 1,600?

$P(\sum{x} \ge 1600)$

> $\sum{X} \sim N\bigl((n)(\mu_x), (\sqrt{n})(\sigma_x)\bigr)$

In [79]:
(49*30.9, math.sqrt(49)*9)

(1514.1, 63.0)

In [80]:
1 - st.norm.cdf(1600, loc=1514.1, scale=63)

0.08636374201938346

d.

Is it likely that the sum of the ages of the 49 prostitutes is at most 1,595? Interpret the results.

In [81]:
st.norm.cdf(1595, loc=1514.1, scale=63)

0.9004512360968469

This means that there is a 90% chance that the sum of the ages for the sample group n = 49 is at most 1595.

e.

Find the 95th percentile for the sample mean age of 65 prostitutes. Interpret the results.
> * $\bar{X} \sim N \bigl( \mu_x, \frac{\sigma_x}{\sqrt{n}} \bigr) $

In [82]:
(30.9, 9/math.sqrt(65))

(30.9, 1.116312611302876)

In [83]:
st.norm.ppf(0.95, loc=30.9, scale=1.11631261)

32.73617084537016

This indicates that 95% of all samples of 65 prostitutes would have a mean age below about 32.7 years.

f.

Find the 90th percentile for the sum of the ages of 65 prostitutes. Interpret the results.

> $\sum{X} \sim N\bigl((n)(\mu_x), (\sqrt{n})(\sigma_x)\bigr)$

In [84]:
(65*30.9, math.sqrt(65)*9)

(2008.5, 72.56031973468694)

In [85]:
st.norm.ppf(0.90, loc=2008.5, scale=72.560319734)

2101.4897913515247

This indicates that 90% of all samples of 65 prostitutes would have a sum of ages less than about 2,101.5 years.