NORMAL DISTRIBUTION

In probability theory, a normal (or Gaussian or Gauss or Laplace–Gauss) distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
$$ f(x) = \frac{1}{σ\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-μ}{σ}\right)^2} $$
where:

+ The parameter μ is the mean or expectation of the distribution

+ The parameter σ is its standard deviation

+ σ^2 is the variance
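As a quick check of the density formula, the sketch below evaluates it by hand and compares the result with `scipy.stats.norm.pdf`. The helper name `normal_pdf` and the values of `mu`, `sigma`, and `x` are illustrative assumptions, not part of the original notes.

```python
import math
from scipy.stats import norm


def normal_pdf(x, mu, sigma):
    """Evaluate the normal density f(x) directly from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coeff * math.exp(exponent)


# Illustrative values (assumed for this sketch)
mu, sigma, x = 55.0, 5.0, 60.0

manual = normal_pdf(x, mu, sigma)
library = norm.pdf(x, loc=mu, scale=sigma)

# The two values should agree up to floating-point rounding
print(manual, library)
```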

A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate.

Properties of the Normal Distribution:

+ The mean, mode and median are all equal

+ The curve is symmetric about its center (the mean, μ)

+ Because the distribution is symmetric, half of the total probability (0.50, or 50%) lies on each side of the mean

+ The total area under the curve is 1

+ If several independent random variables X1, X2, ..., Xn are normally distributed, then their sum S = X1 + X2 + ... + Xn is also normally distributed

+ The mean of the sum will be the sum of all the individual means

E(S) = E(X1) + E(X2) + E(X3) + ... + E(Xn)

+ The variance of the sum will be the sum of all the individual variances

V(S) = V(X1) + V(X2) + V(X3) + ... + V(Xn)
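The two rules for the sum can be checked empirically. The sketch below draws many samples from three independent normal variables and compares the sample mean and variance of their sum with the sums of the individual means and variances. The means, variances, seed, and number of draws are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Hypothetical means and variances of three independent normal variables
means = np.array([1.0, -2.0, 3.0])
variances = np.array([1.0, 4.0, 0.25])

n_draws = 200_000
# Each row holds one draw of (X1, X2, X3); the columns are independent
samples = rng.normal(loc=means, scale=np.sqrt(variances), size=(n_draws, 3))
S = samples.sum(axis=1)

print("mean of S:    ", S.mean(), "expected:", means.sum())
print("variance of S:", S.var(), "expected:", variances.sum())
```

The simulated values only approximate the theoretical ones; the gap shrinks as `n_draws` grows.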


Exercise 1: Let X1, X2, and X3 be independent, normally distributed random variables with the means and variances shown below. Find the mean and variance of Q = X1 - 4X2 + 3X3.


|   | Mean  | Variance  |   
|---|---|---|
|  X1 | 10  | 1  |   
|  X2 | -20  | 2  | 
|  X3 | 30  | 3  |

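For a linear combination of independent variables, each mean is multiplied by its coefficient and each variance by the square of its coefficient:

$$E(Q) = 1 \cdot 10 - 4 \cdot (-20) + 3 \cdot 30 = 180$$

$$V(Q) = 1^2 \cdot 1 + (-4)^2 \cdot 2 + 3^2 \cdot 3 = 60$$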
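The arithmetic of Exercise 1 can be reproduced with NumPy. This is a minimal sketch: the array names `coeffs`, `means`, and `variances` are introduced here for illustration, and the calculation assumes the variables are independent, as the exercise states.

```python
import numpy as np

# Coefficients of Q = X1 - 4*X2 + 3*X3 and the values from the table
coeffs = np.array([1, -4, 3])
means = np.array([10, -20, 30])
variances = np.array([1, 2, 3])

# E(Q) weights each mean by its coefficient
E_Q = coeffs @ means
# V(Q) weights each variance by the squared coefficient (independence assumed)
V_Q = (coeffs ** 2) @ variances

print(E_Q, V_Q)  # 180 60
```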


The Standard Normal Distribution 

The standard normal random variable, Z, is the normal random variable with mean μ = 0 and standard deviation σ = 1 


Exercise 2: Finding Probabilities of the Standard Normal Distribution: $$P(-1 \le Z \le 1)$$


In [12]:
from scipy.stats import norm 
# The standard normal distribution has mean = 0  and sd = 1
mean = 0
sd = 1

# First, find P(Z <= -1)
Z1 = -1
P1 = norm(loc = mean , scale = sd).cdf(Z1)

# Then, find P(Z <= 1)
Z2 = 1
P2 = norm(loc = mean , scale = sd).cdf(Z2)

# P(-1 <= Z <= 1) = P(Z <= 1) - P(Z <= -1)
P = P2 - P1

# Print output
print(P)


0.6826894921370859
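The same subtraction of two cumulative probabilities works for any symmetric range around zero. The sketch below wraps it in a small helper; the name `central_probability` and the choice of k = 1, 2, 3 are assumptions made for illustration.

```python
from scipy.stats import norm


def central_probability(k):
    """Return P(-k <= Z <= k) for a standard normal variable Z."""
    # norm.cdf defaults to loc=0 and scale=1, i.e. the standard normal
    return norm.cdf(k) - norm.cdf(-k)


for k in (1, 2, 3):
    print(f"P(-{k} <= Z <= {k}) = {central_probability(k):.4f}")
```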


The Transformation of Normal Random Variables:

An area under any normal distribution is equivalent to an area under the standard normal.

The transformation of X to Z:

$$ Z = \frac{X - μ(X)}{σ(X)}$$

Exercise 3: Using the normal transformation to find P(50 <= X <= 60) with μ = 55, σ = 5  

$$P(50 \le X \le 60) = P\left(\frac{50 - 55}{5} \le Z \le \frac{60-55}{5}\right) = P(-1 \le Z \le 1) \approx 0.683$$
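Exercise 3 can be confirmed numerically by computing the probability twice: once directly on X and once after transforming the bounds to Z. This is a small sketch; the variable names are chosen here for illustration.

```python
from scipy.stats import norm

mu, sigma = 55, 5
a, b = 50, 60

# Direct computation on X ~ N(mu, sigma^2)
p_direct = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)

# Transform the bounds to Z = (X - mu) / sigma, then use the standard normal
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma
p_standard = norm.cdf(z_b) - norm.cdf(z_a)

# Both routes should give the same probability, about 0.683
print(p_direct, p_standard)
```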

Confidence Interval or Interval Estimate

(https://www.simplypsychology.org/confidence-interval.html)

The confidence interval (CI) is a range of values that is likely to include a population value with a certain degree of confidence. It is usually stated with a confidence level expressed as a percentage (for example, 95%) and with a lower and an upper bound between which the population mean is expected to lie.

What does a 95% confidence interval mean?

The 95% confidence interval is a range of values that you can be 95% confident contains the true mean of the population. Due to natural sampling variability, the sample mean (center of the CI) will vary from sample to sample.

The confidence is in the method, not in a particular CI. If we repeated the sampling method many times, approximately 95% of the intervals constructed would capture the true population mean.
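The "repeat the sampling many times" reading can be simulated. The sketch below draws many samples from a population whose mean is known, builds a 95% interval from each using a known σ, and counts how often the interval captures the true mean. The population parameters, seed, and number of repetitions are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Hypothetical population with known mean and standard deviation
mu_true, sigma, n = 16.0, 3.0, 100
alpha = 0.05
n_repeats = 10_000

# Upper-tail critical value (about 1.96) and the interval half-width
z = norm.ppf(1 - alpha / 2)
half_width = z * sigma / np.sqrt(n)

# One sample mean per repetition
sample_means = rng.normal(mu_true, sigma, size=(n_repeats, n)).mean(axis=1)

# An interval "covers" when the true mean lies between its bounds
covered = (sample_means - half_width <= mu_true) & (mu_true <= sample_means + half_width)
coverage = covered.mean()

print("fraction of intervals containing the true mean:", coverage)
```

The printed fraction should land close to 0.95, though not exactly on it, since the simulation itself is subject to sampling variability.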

As the sample size increases, the interval narrows, because its width is proportional to 1/√n. A larger sample therefore estimates the mean more precisely than a smaller one.
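The effect of sample size on the interval can be made concrete by computing the width of a 95% interval for several values of n. The value of `sigma` and the list of sample sizes below are assumptions chosen for illustration; the width formula is the known-σ case.

```python
import math
from scipy.stats import norm

sigma = 3.0                 # assumed population standard deviation
z = norm.ppf(0.975)         # critical value for a 95% interval
sample_sizes = [25, 100, 400]

# Full width of the interval: 2 * z * sigma / sqrt(n)
widths = [2 * z * sigma / math.sqrt(n) for n in sample_sizes]

for n, w in zip(sample_sizes, widths):
    print(f"n = {n:4d}  width = {w:.3f}")
```

Quadrupling the sample size halves the width, since the width scales with 1/√n.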

Confidence Interval for μ when σ is Known 
$$ \bar{X} - Z\left(\frac{\alpha}{2}\right)\frac{σ}{\sqrt{n}} \le μ \le \bar{X} + Z\left(\frac{\alpha}{2}\right)\frac{σ}{\sqrt{n}} $$
where:

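+ $\bar{X}$ is the sample mean (sample average)

+ σ is the population standard deviation

+ n is the sample size

+ $\alpha$ is the significance level

+ $(1-\alpha)*100\%$ is called the confidence level

+ $Z(\frac{\alpha}{2})$ is the critical value: the point of the standard normal distribution with area $\frac{\alpha}{2}$ in the upper tail (about 1.96 for $\alpha$ = 0.05)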

Exercise 4: A sample of size n = 100 produced a sample mean of 16. Assuming the population standard deviation is σ = 3, compute a 95% confidence interval for the population mean µ.

In [25]:
# Set variables
n = 100 
X_bar = 16
σ = 3
alpha = 1 - 0.95

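# Find the critical value. norm.ppf(alpha/2) is the lower-tail point,
# so Z is negative here (about -1.96)
import scipy.stats
Z = scipy.stats.norm.ppf(alpha/2)

# Find the lower value (adding the negative Z moves below the sample mean)
import math
Lower_Value = X_bar + Z*(σ/(math.sqrt(n)))

# Find the upper value (subtracting the negative Z moves above the sample mean)
Upper_Value = X_bar - Z*(σ/(math.sqrt(n)))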

# Print the result
print("We are 95% confident that the population mean is from {Lower_Value} to {Upper_Value} ".format(Lower_Value=Lower_Value,Upper_Value=Upper_Value))


We are 95% confident that the population mean is from 15.412010804637983 to 16.587989195362017 
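As a cross-check, SciPy can produce the same interval in one call with `norm.interval`, given the sample mean as `loc` and the standard error σ/√n as `scale`. The confidence level is passed positionally in this sketch because the keyword name of that first argument has differed between SciPy versions.

```python
import math
from scipy.stats import norm

n = 100
x_bar = 16
sigma = 3

# Standard error of the mean when sigma is known
standard_error = sigma / math.sqrt(n)

# Central 95% interval of a normal centred at x_bar with that standard error
lower, upper = norm.interval(0.95, loc=x_bar, scale=standard_error)

print(lower, upper)
```

The bounds should match the values printed for Exercise 4 up to floating-point rounding.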


Confidence Interval for μ when σ is Unknown 
$$ \bar{X} - t\left(\frac{\alpha}{2},n-1\right)\frac{\mathrm{sd}}{\sqrt{n}} \le μ \le \bar{X} + t\left(\frac{\alpha}{2},n-1\right)\frac{\mathrm{sd}}{\sqrt{n}} $$
where:

+ $\bar{X}$ is the sample mean (sample average)

+ sd is the sample standard deviation

+ n is the sample size

+ n - 1 is the number of degrees of freedom

+ $\alpha$ is the significance level

+ $(1-\alpha)*100\%$ is called the confidence level

+ t is the critical value of the t distribution with n - 1 degrees of freedom

Exercise 5: A blood analyst wants to estimate the average AFP (alpha-fetoprotein) index of the Vietnamese people. A random blood sample of size 15 yields an average of 10.37 ng/ml and a standard deviation of sd = 3.5 ng/ml. Assuming the AFP values are normally distributed in the population, give a 95% confidence interval for the average AFP value of the Vietnamese population.


In [26]:
# Set variables
n = 15 
X_bar = 10.37
sd = 3.5
alpha = 1 - 0.95

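# Degrees of freedom = n - 1
df = n - 1

# Find the critical value of the t distribution. t.ppf(alpha/2, df) is the
# lower-tail point, so t is negative here
import scipy.stats
t = scipy.stats.t.ppf(alpha/2,df)

# Find the lower value (adding the negative t moves below the sample mean)
import math
Lower_Value = X_bar + t*(sd/(math.sqrt(n)))

# Find the upper value (subtracting the negative t moves above the sample mean)
Upper_Value = X_bar - t*(sd/(math.sqrt(n)))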

# Print the result
print("We are 95% confident that the population mean is from {Lower_Value} to {Upper_Value} ".format(Lower_Value=Lower_Value,Upper_Value=Upper_Value))


We are 95% confident that the population mean is from 8.431764604523753 to 12.308235395476245 
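The interval from Exercise 5 can be cross-checked with `t.interval`, and comparing the t critical value with the normal one shows why the t-based interval is wider for a small sample. This is a sketch with variable names chosen for illustration; the confidence level is passed positionally because the keyword name of that argument has differed between SciPy versions.

```python
import math
from scipy.stats import norm, t

n = 15
x_bar = 10.37
sd = 3.5
df = n - 1

# Estimated standard error of the mean (sigma unknown, so sd is used)
standard_error = sd / math.sqrt(n)

# Central 95% interval from the t distribution with n - 1 degrees of freedom
lower, upper = t.interval(0.95, df, loc=x_bar, scale=standard_error)
print(lower, upper)

# The t critical value exceeds the normal one, so the interval is wider
t_crit = t.ppf(0.975, df)
z_crit = norm.ppf(0.975)
print(t_crit, z_crit)
```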


Large-Sample Confidence Intervals for the Population Proportion, p
$$p' - Z\left(\frac{\alpha}{2}\right)\sqrt{\frac{p'q'}{n}} \le p \le p' + Z\left(\frac{\alpha}{2}\right)\sqrt{\frac{p'q'}{n}}$$
where:

+ p is the population proportion

+ p' is the sample proportion (sometimes denoted $\hat{p}$ and read "p hat")

+ q' = 1 - p'



Exercise 6: Out of 250 patients treated with a particular drug, 206 recovered completely. Find a 95% confidence interval for the overall proportion of patients who can be expected to recover when treated with this drug.

In [31]:
# Set variables
n = 250
p_hat = 206/n
q_hat = 1 - p_hat
alpha = 1 - 0.95

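# Find the critical value. norm.ppf(alpha/2) is the lower-tail point,
# so Z is negative here (about -1.96)
import scipy.stats
Z = scipy.stats.norm.ppf(alpha/2)

# Find the lower value (adding the negative Z moves below p_hat)
import math
Lower_Value = p_hat + Z*math.sqrt((p_hat*q_hat)/n)

# Find the upper value (subtracting the negative Z moves above p_hat)
Upper_Value = p_hat - Z*math.sqrt((p_hat*q_hat)/n)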

# Print the result
print("We are 95% confident that the proportion of patients who can be expected to recover when treated with this drug is from {Lower_Value} to {Upper_Value} ".format(Lower_Value=Lower_Value,Upper_Value=Upper_Value))



We are 95% confident that the proportion of patients who can be expected to recover when treated with this drug is from 0.7767939103923087 to 0.8712060896076912 
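The steps of Exercise 6 can be collected into a reusable function. This is a minimal sketch: the name `proportion_ci` and its parameters are hypothetical, and it uses the upper-tail critical value so that the margin is subtracted for the lower bound and added for the upper bound. It assumes the sample is large enough for the normal approximation to be reasonable.

```python
import math
from scipy.stats import norm


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    # Upper-tail critical value (about 1.96 for 95% confidence)
    z = norm.ppf(1 - alpha / 2)
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


lower, upper = proportion_ci(206, 250)
print(lower, upper)
```

With the data from Exercise 6 the bounds should agree with the printed result up to floating-point rounding.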
