
## 13 - Estimation with t-distribution

### z-Distribution vs t-Distribution

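z-distribution: used when the population standard deviation $\sigma$ is known.

t-distribution: used when the population standard deviation is unknown and has to be estimated by the sample standard deviation $S$. The t-distribution has heavier tails than the z-distribution, so it gives a larger multiplier and a wider interval, reflecting the extra uncertainty from estimating $\sigma$. As the sample size grows, the t-distribution approaches the z-distribution.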

![image](https://github.com/wcj365/python-stats-dataviz/blob/master/images/z-and-t-distribution.gif?raw=1)

### Three Concepts of Statistical Inference
#### 1. Point Estimate

We use sample statistics to estimate population parameters:

Sample Mean $\bar{X}=\dfrac{1}{n}\sum\limits_{i=1}^n X_i$

Sample Variance $S^2=\dfrac{1}{n-1}\sum\limits_{i=1}^n (X_i-\bar{X})^2$

Sample Standard Deviation $S=\sqrt{\dfrac{1}{n-1}\sum\limits_{i=1}^n (X_i-\bar{X})^2}$
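As a quick check of the three formulas above, here is a minimal sketch that computes them directly in plain Python on the beef-consumption sample used later in this notebook, then cross-checks the results against the standard-library `statistics` module (whose `variance` and `stdev` also divide by $n-1$).

```python
import math
import statistics

# Beef-consumption sample from the example below (pounds per year)
data = [118, 115, 125, 110, 112, 130, 117, 112, 115, 120, 113, 118, 119, 122, 123, 126]
n = len(data)

# Sample mean: (1/n) * sum of X_i
x_bar = sum(data) / n

# Sample variance: sum of squared deviations divided by n - 1
s_squared = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Sample standard deviation: square root of the sample variance
s = math.sqrt(s_squared)

# Cross-check against the standard library, which also divides by n - 1
assert math.isclose(x_bar, statistics.mean(data))
assert math.isclose(s_squared, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))

print(f"mean = {x_bar:.4f}, variance = {s_squared:.4f}, std = {s:.4f}")
```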

#### 2. Interval Estimate / Confidence Interval

How confident are we about our point estimate?

Confidence Interval = sample statistic $\pm$ margin of error

Margin of Error = multiplier M * Standard Error, where M is a critical value from the z- or t-distribution

Confidence Interval ($\sigma$ known) = $\bar{x}\space\pm\space z_{\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)$

Confidence Interval ($\sigma$ unknown) = $\bar{x}\space\pm\space t_{\alpha/2,\,n-1}\left(\dfrac{S}{\sqrt{n}}\right)$
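A minimal sketch of the two formulas above as a single function. The name `confidence_interval` and its `sigma` parameter are hypothetical, chosen for illustration: passing `sigma` selects the z multiplier with $\sigma/\sqrt{n}$, while leaving it out uses the t multiplier with $n-1$ degrees of freedom and $S/\sqrt{n}$.

```python
import math
import statistics
import scipy.stats as stats

def confidence_interval(data, confidence=0.95, sigma=None):
    """Return (lower, upper) of a confidence interval for the population mean.

    Hypothetical helper: if sigma (population standard deviation) is given,
    use the z multiplier; otherwise estimate sigma with the sample standard
    deviation S and use the t multiplier with n - 1 degrees of freedom.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    alpha = 1 - confidence
    if sigma is not None:
        multiplier = stats.norm.ppf(1 - alpha / 2)      # z_{alpha/2}
        std_err = sigma / math.sqrt(n)
    else:
        multiplier = stats.t.ppf(1 - alpha / 2, n - 1)  # t_{alpha/2, n-1}
        std_err = statistics.stdev(data) / math.sqrt(n)
    margin = multiplier * std_err
    return x_bar - margin, x_bar + margin

# Usage with hypothetical data
sample = [10, 12, 11, 13, 9]
z_lower, z_upper = confidence_interval(sample, sigma=statistics.stdev(sample))
t_lower, t_upper = confidence_interval(sample)
print(f"z interval: ({z_lower:.2f}, {z_upper:.2f})")
print(f"t interval: ({t_lower:.2f}, {t_upper:.2f})")
```

When `sigma` is set equal to the sample standard deviation, the only difference between the two calls is the multiplier, so the z interval comes out narrower than the t interval.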

Note:

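- How confident can you be in your confidence interval? Never 100%. A 95% confidence level means that if we repeated the sampling many times, about 95% of the intervals constructed this way would contain the true population mean.

- If the population standard deviation is not known, we must use the sample standard deviation to approximate it, and the t-distribution accounts for the extra uncertainty this introduces.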
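To see why using the t-distribution means "less confidence", here is a minimal sketch comparing the 95% z critical value with t critical values for a few degrees of freedom (chosen arbitrarily for illustration) using `scipy.stats`. The t values are always larger, so the intervals are wider, and they shrink toward the z value as the degrees of freedom grow.

```python
import scipy.stats as stats

confidence_level = 0.95
alpha = 1 - confidence_level

# z critical value (sigma known)
z_crit = stats.norm.ppf(1 - alpha / 2)

# t critical values (sigma estimated by S) for illustrative degrees of freedom
t_crits = {df: stats.t.ppf(1 - alpha / 2, df) for df in [2, 5, 15, 30, 100, 1000]}

print(f"z:            {z_crit:.3f}")
for df, t_crit in t_crits.items():
    print(f"t (df={df:>4}): {t_crit:.3f}")
```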


```python
import math
import scipy.stats as stats
import numpy as np
```

### Point and interval estimate example

A random sample of **16** Americans yielded the following data on the number of pounds of beef consumed per year:

**118    115    125    110    112    130    117    112
115    120    113    118    119    122    123    126**

What is the average number of pounds of beef consumed each year per person in the United States?

### Step 1 - Calculate Sample Size and Sample Mean

```python
sample_data = [118, 115, 125, 110, 112, 130, 117, 112, 115, 120, 113, 118, 119, 122, 123, 126]
sample_size = len(sample_data)
print("Sample size =", sample_size)
```

```
Sample size = 16
```


```python
# Sample mean, rounded to 2 decimals; later steps reuse this rounded value
sample_mean = round(np.mean(sample_data), 2)
print("Sample mean =", sample_mean)
```

```
Sample mean = 118.44
```


So our point estimate of the annual per-capita beef consumption in the US is **118.44 pounds**. That is the easy part.

However, a point estimate alone does not tell us how precise it is or how confident we can be in it.

So let's construct an **interval estimate**, which gives a range of plausible values for the population mean at a stated confidence level.

### Step 2 - Calculate the Sample Standard Deviation & Standard Error

Sample Standard Deviation $S=\sqrt{\dfrac{1}{n-1}\sum\limits_{i=1}^n (X_i-\bar{X})^2}$

Standard Error = $\dfrac{S}{\sqrt{n}}$

Note:

Pay attention to the **"n - 1"** in the sample standard deviation formula. For a population, the divisor would be just "n".

NumPy's `np.std` uses a default **Delta Degrees of Freedom (`ddof`)** of 0, which divides by n and applies to population data.

For sample data, make sure to specify **ddof=1** so that it divides by n - 1.
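A short sketch of the `ddof` difference on the beef sample from this example: `np.std` with the default `ddof=0` divides by $n$, `ddof=1` divides by $n-1$, so the two results differ by a factor of $\sqrt{n/(n-1)}$.

```python
import math
import numpy as np

sample_data = [118, 115, 125, 110, 112, 130, 117, 112, 115, 120, 113, 118, 119, 122, 123, 126]
n = len(sample_data)

std_population = np.std(sample_data)        # default ddof=0: divides by n
std_sample = np.std(sample_data, ddof=1)    # ddof=1: divides by n - 1

print(f"ddof=0: {std_population:.4f}")
print(f"ddof=1: {std_sample:.4f}")

# The sample version is larger by a factor of sqrt(n / (n - 1))
assert math.isclose(std_sample, std_population * math.sqrt(n / (n - 1)))
```

For small samples the gap matters; here $n = 16$, so the correction factor is $\sqrt{16/15} \approx 1.03$.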


```python
# ddof=1 divides by n - 1 (sample standard deviation)
sample_std = np.std(sample_data, ddof=1)
sample_std = round(sample_std, 2)  # rounded to 2 decimals
print("Sample Standard Deviation =", sample_std)
```

```
Sample Standard Deviation = 5.66
```


```python
# Calculate the standard error of the mean: S / sqrt(n)
sample_std_err = sample_std / math.sqrt(sample_size)
sample_std_err = round(sample_std_err, 2)
print("Sample Standard Error is", sample_std_err)
```

```
Sample Standard Error is 1.42
```


### Step 3 - Calculate t Critical Value using t-Distribution

At the 95% confidence level, the t critical value is 2.13 according to the Student's t-distribution table:

$\alpha$ = 1 - Confidence Level = 1 - 95% = 0.05

$\dfrac{\alpha}{2}$ = 0.025

n (sample size) = 16

df (degrees of freedom) = n - 1 = 16 - 1 = 15

$t_{0.025,15}=2.13$

![Student T Table](https://github.com/wcj365/python-stats-dataviz/blob/master/images/StudentTTable.png?raw=1)



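```python
# Instead of looking it up in the t-table,
# you can use the t-distribution in scipy.stats.
# ppf is the inverse CDF: it returns the value with 1 - alpha/2 of the area to its left.
confidence_level = 0.95
alpha = 1 - confidence_level
df = sample_size - 1
t_value = stats.t.ppf(1 - alpha / 2, df)
print("t critical value =", round(t_value, 2))
```

```
t critical value = 2.13
```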


### Step 4 - Calculate Margin of Error
Margin of Error = t critical value * Standard Error = $t_{\alpha/2,\,n-1}\left(\dfrac{S}{\sqrt{n}}\right)$

```python
# Margin of error = t critical value * standard error, rounded to 2 decimals
margin_of_error = round(t_value * sample_std_err, 2)
print("Margin of Error =", margin_of_error)
```

```
Margin of Error = 3.03
```


### Step 5 - Calculate Lower and Upper Limit of the Confidence Interval
- Lower Limit = Sample Mean - Margin of Error
- Upper Limit = Sample Mean + Margin of Error

```python
lower_limit = sample_mean - margin_of_error
print("Lower Limit = ", lower_limit)
```

```
Lower Limit =  115.41
```


```python
upper_limit = sample_mean + margin_of_error
print("Upper Limit = ", upper_limit)
```

```
Upper Limit =  121.47
```


### Step 6 - Now You Have the Confidence Interval
Confidence Interval ($\sigma$ unknown) = $\bar{x}\space\pm\space t_{\alpha/2,\,n-1}\left(\dfrac{S}{\sqrt{n}}\right)$

```python
print("The 95% Confidence Interval Estimate of American Annual Beef Consumption = (", lower_limit, ",", upper_limit, ")")
```

```
The 95% Confidence Interval Estimate of American Annual Beef Consumption = ( 115.41 , 121.47 )
```
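The steps above round each intermediate result to two decimals, so the standard error becomes 1.42 instead of about 1.414 and the margin of error is slightly inflated. Here is a minimal sketch that keeps full precision until the final print, cross-checked against SciPy's `scipy.stats.t.interval`:

```python
import math
import numpy as np
import scipy.stats as stats

sample_data = [118, 115, 125, 110, 112, 130, 117, 112, 115, 120, 113, 118, 119, 122, 123, 126]
confidence_level = 0.95

n = len(sample_data)
mean = np.mean(sample_data)
std_err = np.std(sample_data, ddof=1) / math.sqrt(n)        # S / sqrt(n), not rounded
t_crit = stats.t.ppf(1 - (1 - confidence_level) / 2, n - 1)  # df = n - 1

# Manual interval: mean +/- t * SE, rounding only when printing
margin = t_crit * std_err
lower, upper = mean - margin, mean + margin

# Cross-check with SciPy's built-in t interval
lower_sp, upper_sp = stats.t.interval(confidence_level, n - 1, loc=mean, scale=std_err)
assert math.isclose(lower, lower_sp) and math.isclose(upper, upper_sp)

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

For this sample it gives approximately (115.42, 121.45), a little narrower than the hand-rounded (115.41, 121.47). Rounding only for display avoids compounding rounding errors across steps.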


### The End