In [2]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import Image

# Definition

A statistical hypothesis is a statement or proposition about some feature of a population, generally about a parameter. Testing a hypothesis means comparing its predictions with the reality that we observe in a sample.

1. The hypothesis being put forward is usually called the null hypothesis, or $H_0$. It states that any difference between the observed value and the hypothesized value is due only to random variation; in other words, there is no real difference between them.
2. The opposite hypothesis is called the alternative hypothesis, or $H_1$.

## Examples

1. We suspect that the mean weight of the nuts is not the stated 100 grams. To test this, we would state:

$$H_0: \mu = 100$$
$$H_1: \mu \neq 100$$

2. We think that the proportion of people who vote for party "A" is now lower than 0.35 because the party has not done well. To test the hypothesis:

$$H_0:p \geq 0.35 $$
$$H_1:p < 0.35 $$

3. I would be happy to find out that there is not enough evidence to show that my mean mark has dropped below 6.2, as the latest tests seem to suggest. To test the hypothesis:

$$H_0:\mu \ge 6.2 $$
$$H_1:\mu \lt 6.2 $$
<br>



## Test Statistic for Means

### Case 1: Known σ
When the population Standard Deviation (σ) is known or σ is unknown but n > 30:

$$Z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}}$$

- Random variable ~ Normal
- σ known, or σ unknown with n > 30



Example:
From 1500 cows that were fed a high-protein fiber diet for a month, we have a sample of 29 cows with a **mean weight** gain of 7.7 lb. The SD (standard deviation) of the weight gain for all the cows is 7.1 lb. Test the hypothesis that the mean weight gain per cow was more than 5 lb.
- Null Hypothesis: $H_0: \mu \leq 5$ lb
- Alternative Hypothesis: $H_1: \mu > 5$ lb

$$Z = \frac{7.7 - 5}{\frac{7.1}{\sqrt{29}}} \approx 2.048$$

At a 5% significance level ($\alpha$), the critical value from the standard normal table for a right-tailed test is $z = 1.645$. Since $Z > z$, we reject $H_0$ at $\alpha$ = 5%.
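
The arithmetic of the cow example can be checked with a few lines of Python. This is a minimal sketch that assumes a right-tailed test ($H_1: \mu > 5$) and takes 7.1 as the population standard deviation, the value used in the formula; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Values from the cow example; sigma = 7.1 is the value used in the formula
n = 29
x_bar = 7.7
mu_0 = 5
sigma = 7.1
alpha = 0.05

# Test statistic
Z = (x_bar - mu_0) / (sigma / np.sqrt(n))

# Right-tailed critical value and p-value
z_crit = stats.norm.ppf(1 - alpha)
p_value = stats.norm.sf(Z)

print(f"Z = {Z:.3f}, critical z = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H_0" if Z > z_crit else "Do not reject H_0")
```

Using `stats.norm.ppf` replaces the table lookup, and `stats.norm.sf` gives the upper-tail probability directly.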

In [9]:
# Another example: mean height of men over 18.
# alpha = 5%, sigma = 4 (population standard deviation, assumed known)
# H_0: mu = 180 (the mean height of men is 180 cm)
# H_1: mu != 180
data = np.array([167, 167, 168, 168, 168, 169, 171, 172, 173, 175, 175, 175, 177, 182, 195])
n = len(data)  # sample size (15)
sigma = 4
mean = data.mean()
mu = 180
alpha = 0.05

# Test statistic
Z = (mean - mu) / (sigma / np.sqrt(n))
print(f'Z value is: {Z:.4f}')

# Lower critical z for a two-tailed test (the upper one is -z)
z = stats.norm.ppf(alpha / 2)
print(f"Critical z is: {z:.4f}")
print('Because |Z| > |z|, Z falls in the rejection region, so we reject H_0')

Z value is: -6.3259
Critical z is: -1.9600
Because |Z| > |z|, Z falls in the rejection region, so we reject H_0


### Case 2 (T-Test): Unknown σ and Small Sample (n < 30)

$$t = \dfrac{\bar{X} - \mu}{\dfrac{S}{\sqrt{n}}}$$


- σ unknown
- Small sample: n < 30
- $S$ is the sample standard deviation
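
When the raw observations are available, `scipy.stats.ttest_1samp` computes this statistic directly. The sketch below reuses the height sample from the known-σ example but, purely for illustration, treats σ as unknown; that reinterpretation is an assumption and not part of the original exercise.

```python
import numpy as np
from scipy import stats

# Height sample from the known-sigma example, here treated as if sigma were unknown
data = np.array([167, 167, 168, 168, 168, 169, 171, 172, 173, 175, 175, 175, 177, 182, 195])
mu_0 = 180
n = len(data)

# Manual statistic: ddof=1 gives the sample standard deviation S
S = data.std(ddof=1)
t_manual = (data.mean() - mu_0) / (S / np.sqrt(n))

# Same two-sided test via SciPy
result = stats.ttest_1samp(data, popmean=mu_0)

print(f"manual t = {t_manual:.4f}")
print(f"scipy  t = {result.statistic:.4f}, p-value = {result.pvalue:.4f}")
```

The two statistics should agree, which is a quick way to confirm that `ddof=1` is the right choice for $S$.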

**Hypothesis Testing Problem**

Let $X$ be the variable **"profitability of a certain type of investment funds after a strong appreciation of the Euro against the Dollar."**  
It is considered that the mean of this variable is 15.  
An economist claims that this average profitability **has changed**, so a study is carried out under the conditions described above, using a sample of 9 funds whose **sample mean** is $\bar{x}$ = 15.308 and whose **sample variance** is $s^2$ = 0.193.

Task
1. **State the necessary hypotheses** and **test the economist's claim** at a **5% significance level**.
2. Based on the result of part 1, **reason whether the 95% confidence interval** for the population mean **will include the value 15**.

In [None]:
# H_0: mu = 15 (the mean did not change)
# H_1: mu != 15
alpha = 0.05
x_bar = 15.308
mu = 15
variance = 0.193
s = np.sqrt(variance)  # sample standard deviation
n = 9


t = (x_bar - mu) / (s / np.sqrt(n))
print(f"The t score is: {t:5.4f}")

# H_0 is rejected only if t falls outside the interval (-critical_t, critical_t)
critical_t = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(f'The critical t is: {critical_t:.4f}')

# Two-tailed p-value: twice the upper-tail probability of |t|
pvalue_two_tails = 2 * stats.t.sf(np.abs(t), df=n - 1)
print(f'The two-tailed p-value is: {pvalue_two_tails:5.4f}')

print('Because t lies between -critical_t and +critical_t, there is not enough evidence to reject H_0')
# The p-value is the probability of obtaining a result as extreme or more extreme than the observed one, assuming the null hypothesis is true.
# This means that if
#                   p ≪ alpha              Reject H_0: the data would be unusual if H_0 were true
#                   p ≈ alpha              Weak evidence
#                   p > alpha              Do not reject H_0: the data is compatible with H_0
print("And because the p-value > alpha, we do not reject H_0")

The t score is: 2.1033
The critical t is: 2.3060
The two-tailed p-value is: 0.0686
Because t lies between -critical_t and +critical_t, there is not enough evidence to reject H_0
And because the p-value > alpha, we do not reject H_0
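
The second task can be checked numerically. This is a minimal sketch, assuming the usual two-sided t interval $\bar{x} \pm t_{1-\alpha/2,\,n-1}\, s/\sqrt{n}$. A two-sided test at level $\alpha$ fails to reject $H_0: \mu = \mu_0$ exactly when $\mu_0$ lies inside the $(1-\alpha)$ confidence interval, so the interval is expected to contain 15.

```python
import numpy as np
from scipy import stats

# Values from the investment-fund problem
x_bar, mu_0, variance, n, alpha = 15.308, 15, 0.193, 9, 0.05
s = np.sqrt(variance)  # sample standard deviation

# Half-width of the 95% interval
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
margin = t_crit * s / np.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"Contains {mu_0}: {lower <= mu_0 <= upper}")
```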


## Test Statistic for Proportions

When working with proportions:

$$Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$


- Random variable ~ Binomial (or Normal approximation)
- σ unknown, n > 30
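
As an illustration, the party "A" example can be tested with this statistic. The sample below (200 voters, 58 of whom support party "A") is invented for the sketch and does not come from the original notebook; the test is left-tailed because $H_1: p < 0.35$.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (invented for illustration)
n = 200
votes_for_a = 58
p_hat = votes_for_a / n

p_0 = 0.35  # proportion under H_0
alpha = 0.05

# The standard error uses p_0, the value assumed under H_0
Z = (p_hat - p_0) / np.sqrt(p_0 * (1 - p_0) / n)

# Left-tailed critical value and p-value
z_crit = stats.norm.ppf(alpha)
p_value = stats.norm.cdf(Z)

print(f"p_hat = {p_hat:.3f}, Z = {Z:.3f}, critical z = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H_0" if Z < z_crit else "Do not reject H_0")
```

Note that the denominator uses the hypothesized proportion `p_0` rather than `p_hat`, because the statistic is evaluated under the assumption that $H_0$ is true.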


## **Errors in Hypothesis Testing**

### Error Type I ($\alpha$)
Occurs when we **reject the null hypothesis $H_0$** even though it is **true**. It is also called a **false positive**.
$$P(\text{Type I error}) = \alpha \text{ (significance level)}$$
Example: Concluding a new drug works when it actually doesn't.
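
The meaning of $\alpha$ can be illustrated by simulation: if $H_0$ is true and the test is repeated on many samples, roughly a fraction $\alpha$ of them reject it. This is a minimal sketch, assuming a two-tailed Z test on normal data; `mu_0`, `sigma`, `n`, `n_trials` and the seed are arbitrary illustrative choices, not values from the original.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed for reproducibility

mu_0, sigma, n = 100, 10, 25  # illustrative values; H_0 is true by construction
alpha = 0.05
n_trials = 10_000

# Draw n_trials samples of size n from the distribution assumed under H_0
samples = rng.normal(loc=mu_0, scale=sigma, size=(n_trials, n))

# Z statistic of each sample
Z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))

# Two-tailed rejection rule
z_crit = stats.norm.ppf(1 - alpha / 2)
type_i_rate = np.mean(np.abs(Z) > z_crit)

print(f"Empirical Type I error rate: {type_i_rate:.4f} (alpha = {alpha})")
```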



### Error Type II ($\beta$)
Occurs when we **fail to reject $H_0$** even though the **alternative hypothesis $H_1$** is **true**. It is also called a **false negative**.
$$P(\text{Type II error}) = \beta$$
Example: Concluding a new drug doesn't work when it actually does.



### Statistical Power
Power = **1 − $\beta$**. It represents the probability of **correctly rejecting $H_0$** when $H_1$ is true.

High power → less chance of missing a real effect.
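
For a Z test, power can be computed in closed form. The sketch below uses the setup of the cow example ($\mu_0 = 5$, $\sigma = 7.1$, $n = 29$, right-tailed, $\alpha = 0.05$) and assumes a hypothetical true mean of 8 lb, chosen only for illustration.

```python
import numpy as np
from scipy import stats

mu_0, sigma, n, alpha = 5, 7.1, 29, 0.05
mu_1 = 8  # hypothetical true mean (assumption, not from the original)

z_crit = stats.norm.ppf(1 - alpha)

# Under H_1, Z is normal with this mean and unit variance
shift = (mu_1 - mu_0) / (sigma / np.sqrt(n))

power = stats.norm.sf(z_crit - shift)  # P(reject H_0 | mu = mu_1)
beta = 1 - power                       # P(Type II error)

print(f"power = {power:.4f}, beta = {beta:.4f}")
```

Increasing `n` or the distance between `mu_1` and `mu_0` raises `shift` and therefore the power.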

### **Confusion Matrix**
|   Sample-based decision   | $H_0$ is True | $H_0$ is False |
|----------------------------------------|:------------------:|:--------------:|
| **Don't reject $H_0$**  | <font color='green'>__Correct decision__</font> <br>(Probability = $1-\alpha$)     |<font color='red'>__Type II Error__ </font> <br> Don't reject $H_0$ when it is false <br>(Probability = $\beta$) |
| **Reject $H_0$**   | <font color='red'>__Type I Error__</font>  <br>Reject $H_0$ when it is true <br>(Probability = $\alpha$)  | <font color='green'>__Correct decision__</font> <br>(Probability = $1-\beta$)|