# EGCI 305: Chapter 5 (Estimation)

Outline
> 1. [Packages](#ch5_packages)

> 2. [Example: bushing's hole](#ch5_ex_bushing)

> 3. [T distribution](#ch5_t)
>    - [Example: T distribution](#ch5_ex_t)
>    - [Example: fat content](#ch5_ex_fat)

> 4. [Chi-square distribution](#ch5_chi)
>    - [Example: breakdown voltage](#ch5_ex_breakdown)

<a name="ch5_packages"></a>

## Packages
> - **numpy** -- to work with array manipulation
> - **matplotlib** -- to work with visualization (backend)
> - **seaborn** -- to work with high-level visualization
> - **scipy.stats** -- to work with statistical distributions and functions
> - **sympy** -- to work with integral calculation

In [None]:
# Import necessary libraries for statistical analysis and visualization
import numpy as np  # For numerical operations
import matplotlib.pyplot as plt  # For plotting
import seaborn as sns  # For statistical data visualization

# Print the versions of the imported libraries
print("Numpy version =", np.__version__)
print("Seaborn version =", sns.__version__)

import scipy  # Import SciPy for scientific computations
print("Scipy version =", scipy.__version__)

# Import statistical functions from SciPy
from scipy import stats
from scipy.stats import norm  # Normal distribution
from scipy.stats import t  # T distribution
from scipy.stats import chi2  # Chi-squared distribution

# Import sympy for symbolic mathematics and integral calculations
import sympy
print("Sympy version =", sympy.__version__)

from sympy import *


<a name="ch5_ex_bushing"></a>

### Example : bushing's hole
> - n = 40
> - xbar = 5.426
> - sd = 0.1
> - confidence = 0.90
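#### Note that
> - X is assumed to have a normal distribution with sd = 0.1; the sample mean 5.426 is used as the estimate of its mean
> - $\bar{X}$ has a normal distribution with sd (standard error) = 0.1 /$\sqrt{40}$, so the interval uses loc = 5.426, scale = 0.1 /$\sqrt{40}$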

In [None]:
# Calculate the critical z-value for a 90% confidence interval
zvalue = norm.ppf(1-0.05)  # alpha = 0.10, so q = 1 - alpha/2 = 0.95 for a two-sided interval
print("Z value = %.2f" % zvalue)

In [None]:
loc = 5.426                 # sample mean (xbar)
scale = 0.1 / np.sqrt(40)   # standard error = sd / sqrt(n)

### Interval function only supports 2-sided interval
interval = norm.interval(0.90, loc, scale)
print("Interval    =", np.round(interval, 3) )
print("Lower bound = %.3f" % interval[0])
print("Upper bound = %.3f" % interval[1])
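
The interval returned by `norm.interval` can be cross-checked against the usual formula $\bar{x} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}$. The following is a minimal sketch using the values from this example; the variable names (`margin`, `manual`) are only illustrative.

```python
import numpy as np
from scipy.stats import norm

# Values taken from the bushing example
n, xbar, sd = 40, 5.426, 0.1
confidence = 0.90
alpha = 1 - confidence

z = norm.ppf(1 - alpha / 2)        # critical value z_{alpha/2}
margin = z * sd / np.sqrt(n)       # margin of error
manual = (xbar - margin, xbar + margin)

# Same interval computed by scipy, for comparison
from_scipy = norm.interval(confidence, loc=xbar, scale=sd / np.sqrt(n))

print("Manual interval =", np.round(manual, 3))
print("Scipy interval  =", np.round(from_scipy, 3))
```

Both lines should print the same pair of bounds, since `norm.interval` places $\alpha/2$ in each tail.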

In [None]:
### If using default loc = 0, scale = 1
interval = norm.interval(0.90)
print("Interval    =", np.round(interval, 3) )
print("Lower bound = %.3f" % interval[0])
print("Upper bound = %.3f" % interval[1])
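
Since `interval` only supports 2-sided intervals, a one-sided bound has to be obtained another way. One possible approach, sketched here under the assumption that a 90% one-sided bound is wanted for the bushing example, is to call `ppf` directly and put the whole $\alpha$ = 0.10 in a single tail.

```python
import numpy as np
from scipy.stats import norm

# Values taken from the bushing example
xbar, sd, n = 5.426, 0.1, 40
scale = sd / np.sqrt(n)            # standard error of the sample mean

# One-sided 90% bounds: all of alpha = 0.10 sits in one tail
upper_bound = norm.ppf(0.90, loc=xbar, scale=scale)   # mu < upper_bound
lower_bound = norm.ppf(0.10, loc=xbar, scale=scale)   # mu > lower_bound

print("One-sided upper bound = %.4f" % upper_bound)
print("One-sided lower bound = %.4f" % lower_bound)
```

Each one-sided bound is closer to the sample mean than the corresponding end of the two-sided 90% interval, because it uses $z_{0.10}$ instead of $z_{0.05}$.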

<a name="ch5_t"></a>

## T Distribution
- **[Manual: scipy.stats.t](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html)**
    > - For t<sub>(df)</sub> --> loc = $\mu$, scale = s /$\sqrt{n}$
    > - Default loc = 0
    > - Default scale = 1
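
To see what `loc` and `scale` do, the sketch below compares a quantile of the shifted and scaled t distribution with the same quantile computed by hand from the standard one. The numbers for `df`, `loc` and `scale` are illustrative choices, not values required by scipy.

```python
import numpy as np
from scipy.stats import t

# Illustrative parameters (df = n - 1 for a sample of size 10)
df = 9
loc = 21.9
scale = 4.134 / np.sqrt(10)
q = 0.975

# Quantile of the shifted and scaled t distribution
shifted = t.ppf(q, df, loc=loc, scale=scale)

# Same quantile built from the standard t distribution (loc = 0, scale = 1)
by_hand = loc + scale * t.ppf(q, df)

print("With loc/scale = %.4f" % shifted)
print("By hand        = %.4f" % by_hand)
```

The two values agree, which is why an interval on the standard t scale can always be converted to the data scale by multiplying by `scale` and adding `loc`.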

<a name="ch5_ex_t"></a>

### Example : T distribution
> t<sub>(df=5)</sub>

**Questions**
> - Q1 : P(T < ?) = 0.95
> - Q2 : P(T < 2.015<sub>(df=5)</sub>) = ?

In [None]:
### Try df = 5 and df = 40
df = 5

Q1_t = t.ppf(0.95, df)
Q1_z = norm.ppf(0.95)
print("Q1 t = %.3f" % Q1_t)
print("Q1 z = %.3f" % Q1_z, "\n")

Q2_t = t.cdf(2.015, df)
Q2_z = norm.cdf(2.015)
print("Q2 t = %.3f" % Q2_t)
print("Q2 z = %.3f" % Q2_z)
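
The comparison between t and z can be repeated over several degrees of freedom in one pass. This sketch assumes the same 0.95 quantile as Q1; the list of `dfs` is an arbitrary choice for illustration.

```python
from scipy.stats import norm, t

z95 = norm.ppf(0.95)               # standard normal quantile
dfs = [5, 40, 1000]                # illustrative degrees of freedom
t95 = [t.ppf(0.95, d) for d in dfs]

for d, tv in zip(dfs, t95):
    print("df = %4d   t = %.3f   t - z = %.3f" % (d, tv, tv - z95))
print("z = %.3f" % z95)
```

The t quantile is always larger than the z quantile and shrinks toward it as df grows, which reflects the heavier tails of the t distribution at small sample sizes.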

<a name="ch5_ex_fat"></a>

### Example : fat content

In [None]:
A = np.array( [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5] )

### By default, numpy calculates population SD (so, set ddof=1 for sample SD)
print("Sample mean   = %.4f" % A.mean())
print("Sample sd     = %.4f" % A.std(ddof=1)) 
print("Population sd = %.4f" % A.std())
print()

### Check Q-Q plot (values vs. ideal normal line)
fig = plt.figure( figsize = (3,2) )
stats.probplot(A, dist = 'norm', plot = plt)
plt.show()

In [None]:
### By default, pandas calculates sample SD

import pandas as pd
pd.DataFrame(A).describe().transpose()
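
The difference between the two SD conventions comes down to the divisor: n for the population SD and n - 1 for the sample SD. The sketch below recomputes both by hand for the fat content data, using numpy only; the names `ss`, `pop_sd` and `sample_sd` are illustrative.

```python
import numpy as np

A = np.array([25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5])

n = A.size
ss = np.sum((A - A.mean()) ** 2)   # sum of squared deviations from the mean

pop_sd = np.sqrt(ss / n)           # divisor n     (numpy default, ddof=0)
sample_sd = np.sqrt(ss / (n - 1))  # divisor n - 1 (ddof=1, pandas default)

print("Population sd = %.4f" % pop_sd)
print("Sample sd     = %.4f" % sample_sd)
```

The sample SD is the one needed for the t interval, because the population SD is unknown and is being estimated from the data.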

In [None]:
alpha = 0.05       # 95% confidence level
q = 1 - alpha / 2  # two-sided: alpha/2 = 0.025 in each tail

tvalue = t.ppf(q, 9)  # df = n - 1 = 9
print("t value = %.3f" % tvalue)

In [None]:
loc = 21.9                    # sample mean
scale = 4.134 / np.sqrt(10)   # standard error = sample sd / sqrt(n)

### Interval function only supports 2-sided interval
interval = t.interval(0.95, 9, loc, scale)
print("Interval    =", np.round(interval, 2) )
print("Lower bound = %.2f" % interval[0])
print("Upper bound = %.2f" % interval[1])
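
Rather than typing the rounded mean and SD, the same interval can be computed directly from the data with $\bar{x} \pm t_{\alpha/2,\,n-1} \cdot s/\sqrt{n}$. This is a minimal sketch for the fat content data; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import t

A = np.array([25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5])

n = A.size
xbar = A.mean()
s = A.std(ddof=1)                  # sample SD (divisor n - 1)
se = s / np.sqrt(n)                # standard error of the mean

tcrit = t.ppf(1 - 0.05 / 2, n - 1) # critical value for 95% confidence
manual = (xbar - tcrit * se, xbar + tcrit * se)

# Same interval computed by scipy, for comparison
from_scipy = t.interval(0.95, n - 1, loc=xbar, scale=se)

print("Manual interval =", np.round(manual, 2))
print("Scipy interval  =", np.round(from_scipy, 2))
```

Working from the array avoids rounding error in `loc` and `scale` and keeps the degrees of freedom tied to the actual sample size.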

In [None]:
### If using default loc = 0, scale = 1
interval = t.interval(0.95, 9)
print("Interval    =", np.round(interval, 2) )
print("Lower bound = %.2f" % interval[0])
print("Upper bound = %.2f" % interval[1])

<a name="ch5_chi"></a>

## Chi-Square Distribution
- **[Manual: scipy.stats.chi2](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html)**
    > - For $\chi$<sup>2</sup><sub>(df)</sub>
    > - Default loc = 0
    > - Default scale = 1

<a name="ch5_ex_breakdown"></a>

### Example : breakdown voltage

In [None]:
B = np.array( [1470, 1510, 1690, 1740, 1900, 2000, 2030, 2100, 2190, 
               2200, 2290, 2380, 2390, 2480, 2500, 2580, 2700] )

print("Sample size     = %d" % B.size)
print("Sample mean     = {:,.1f}".format(B.mean()) )
print("Sample sd       = {:,.1f}".format(B.std(ddof=1)) )
print("Sample variance = {:,.1f}".format(B.var(ddof=1)) )
print()

### Check Q-Q plot (values vs. ideal normal line)
fig = plt.figure( figsize = (3,2) )
stats.probplot(B, dist = 'norm', plot = plt)
plt.show()

#### Note that the subscript of chi-square in the slides is the right-tail (RHS) area
>- $\chi$<sup>2</sup><sub>0.025</sub> means RHS area = 0.025 --> q = 1-0.025 = 0.975
>- $\chi$<sup>2</sup><sub>0.975</sub> means RHS area = 0.975 --> q = 1-0.975 = 0.025

In [None]:
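q_lower = 1-0.025   # chi2_0.025 (RHS area = 0.025): larger value, used for the lower bound of the variance
q_upper = 1-0.975   # chi2_0.975 (RHS area = 0.975): smaller value, used for the upper bound of the variance

chi2_lower = chi2.ppf(q_lower, 16)   # df = n - 1 = 16
chi2_upper = chi2.ppf(q_upper, 16)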
print("chi2 lower = %.3f" % chi2_lower)
print("chi2 upper = %.3f" % chi2_upper)
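
The two critical values are typically used to build a 95% confidence interval for the variance, $\left( (n-1)s^2/\chi^2_{0.025},\ (n-1)s^2/\chi^2_{0.975} \right)$. The sketch below assumes that this is the intended next step for the breakdown voltage data; the names `var_ci` and `sd_ci` are illustrative.

```python
import numpy as np
from scipy.stats import chi2

B = np.array([1470, 1510, 1690, 1740, 1900, 2000, 2030, 2100, 2190,
              2200, 2290, 2380, 2390, 2480, 2500, 2580, 2700])

n = B.size
df = n - 1
s2 = B.var(ddof=1)                 # sample variance

chi2_lower = chi2.ppf(1 - 0.025, df)   # chi2_0.025 (RHS area = 0.025)
chi2_upper = chi2.ppf(1 - 0.975, df)   # chi2_0.975 (RHS area = 0.975)

# Dividing by the larger critical value gives the lower bound
var_ci = (df * s2 / chi2_lower, df * s2 / chi2_upper)
sd_ci = tuple(np.sqrt(var_ci))     # interval for sd: square roots of the bounds

print("Variance interval = ({:,.1f}, {:,.1f})".format(*var_ci))
print("SD interval       = ({:,.1f}, {:,.1f})".format(*sd_ci))
```

Unlike the z and t intervals, this interval is not symmetric around the point estimate, because the chi-square distribution is skewed to the right.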