In [3]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

### Exercise 1

1. It is assumed that the mean systolic blood pressure is `μ = 120 mm Hg`. In the Honolulu Heart Study, a sample of `n = 100` people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

   - Set up the hypothesis test.
   - Write down all the steps followed for setting up the test.
   - Calculate the test statistic by hand and also code it in Python. It should be 4.76190. What decision can you make based on this calculated value?

### Steps:

<b>1. Define the null hypothesis - This is our assumption about the population.</b>

It is denoted by <b>H0</b> and in this case <b>H0</b>: `μ = 120 mm Hg`

<b>2. Define the alternative hypothesis - This is what we conclude if our assumption is not true. The question asks whether the group is different from the population, not whether it is lower or higher, so the alternative is two-sided.</b>

It is denoted by <b>Ha</b> and in this case <b>Ha</b>: `μ ≠ 120 mm Hg`

<b>3. Decide on a test statistic:</b>

- A sample of `n = 100` people was measured and found to have a sample mean blood pressure of `130.1 mm Hg`.<br>
- The sample standard deviation is `s = 21.21 mm Hg`.<br>
- Degrees of freedom: as `n = 100` --> `df = 99` (n-1)

<b>--> As we don't have the population standard deviation, we can NOT use the Z-statistic. We have to use the T-statistic, even though the sample size is >30.</b>

<b>4. Set the confidence level / significance level:</b>

This defines the rejection region / critical region (how strong the evidence against the null hypothesis has to be). The significance level is denoted by the Greek letter ‘α’. Usual values for the confidence level are (90%, 95%, 99%) -> significance levels = (0.10, 0.05, 0.01). Let’s use a confidence level of 95% -> α = 0.05
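To illustrate how a confidence level translates into a rejection region, the following minimal sketch computes the critical values of the t-distribution with 99 degrees of freedom for the three usual confidence levels, next to the standard normal values for comparison. It assumes a two-sided test (α split evenly between both tails); the names `confidence_levels` and `critical_values` are illustrative only.

```python
import scipy.stats as stats

df = 99  # n - 1 for the sample of n = 100
confidence_levels = (0.90, 0.95, 0.99)

critical_values = {}
for conf in confidence_levels:
    alpha = 1 - conf
    # Assumption: two-sided test, so alpha / 2 lies in each tail
    t_crit = stats.t.ppf(1 - alpha / 2, df=df)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    critical_values[conf] = (t_crit, z_crit)
    print("confidence {:.0%}: alpha = {:.2f}, t critical = +/-{:.3f}, z critical = +/-{:.3f}".format(
        conf, alpha, t_crit, z_crit))
```

With 99 degrees of freedom the t critical values are only slightly larger than the normal ones, which is why the two statistics give similar answers at this sample size.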

<b>5. Calculate the test statistic based on the given information:</b>

![image.png](attachment:image.png)

<b>In our case:</b>

t = ( sample_mean - pop_mean ) / ( sample_std_dev / np.sqrt(n) )

$t = \dfrac {130.1-120} {\dfrac {21.21} {\sqrt{100}}}$

In [16]:
t = (130.1 - 120) / ( 21.21 / np.sqrt(100) )
print("The t score is: {:.2f}".format(t))

The t score is: 4.76
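The same calculation can be wrapped in a small helper so it is reusable for other summary statistics. This is only a sketch; the function name `one_sample_t` and its parameter names are hypothetical and not part of the exercise.

```python
import numpy as np

def one_sample_t(sample_mean, hypothesized_mean, sample_std, n):
    """Return (t statistic, degrees of freedom) from summary statistics."""
    # Standard error of the mean, estimated from the sample standard deviation
    standard_error = sample_std / np.sqrt(n)
    t_stat = (sample_mean - hypothesized_mean) / standard_error
    return t_stat, n - 1

t_stat, dof = one_sample_t(130.1, 120, 21.21, 100)
print("t = {:.5f}, df = {}".format(t_stat, dof))
```

Printing five decimals makes it easy to compare against the value 4.76190 quoted in the exercise.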


<b>6. Where is the `critical value` of the Student's t-distribution with 99 degrees of freedom located? The group is tested for being different in either direction, so α = 0.05 is split between both tails (0.025 each) and we need the 0.975 quantile.</b>

In [17]:
tc = stats.t.ppf(1 - 0.05 / 2, df=99)
print("The tc score is: ±{:.2f}".format(tc))

The tc score is: ±1.98


<b>7. Result:</b>

`t = 4.76` > `tc = 1.98` --> The statistic is <b>NOT within the acceptance region</b> (-1.98/1.98), hence we <b>reject the null hypothesis.</b>
    
<b>Ergo: Yes, the group is significantly different (with respect to systolic blood pressure!) from the regular population.</b>
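As a complement to the critical-value comparison, the same decision can be reached through a p-value. The sketch below assumes a two-sided test at α = 0.05 and recomputes the statistic from the summary values given in the exercise; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

alpha = 0.05
n = 100
t_stat = (130.1 - 120) / (21.21 / np.sqrt(n))

# Assumption: two-sided test, so both tails beyond |t| count as evidence
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
reject_null = p_value < alpha

print("p-value: {:.2e}".format(p_value))
print("Reject H0" if reject_null else "Fail to reject H0")
```

A p-value below α leads to the same conclusion as a test statistic outside the acceptance region.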

### Exercise 2
2. In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file `Data/machine.txt`. Assume that there is sufficient evidence to conduct the t-test. Does the data provide sufficient evidence to show that one machine is better than the other?
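One possible way to set up such a comparison is a two-sample t-test on the packing times. The sketch below is hedged in several ways: the layout of `Data/machine.txt` is not shown here, so the arrays are placeholder values and NOT the exercise data; the helper name `compare_packing_times` is hypothetical; and the use of Welch's test (unequal variances) with a one-sided alternative "the new machine has the smaller mean time" is an assumption, not something the exercise prescribes.

```python
import numpy as np
import scipy.stats as stats

def compare_packing_times(new_times, old_times, alpha=0.05):
    """One-sided Welch t-test for Ha: mean(new_times) < mean(old_times)."""
    new_times = np.asarray(new_times, dtype=float)
    old_times = np.asarray(old_times, dtype=float)
    # equal_var=False selects Welch's test (assumption, see lead-in)
    t_stat, p_two_sided = stats.ttest_ind(new_times, old_times, equal_var=False)
    # Convert the two-sided p-value to a one-sided one for Ha: new < old
    p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2
    return t_stat, p_one_sided, p_one_sided < alpha

# Placeholder values for illustration only, NOT taken from Data/machine.txt
placeholder_new = [41.0, 42.5, 40.8, 43.1, 41.7]
placeholder_old = [43.2, 44.0, 42.9, 45.1, 43.8]

t_stat, p_value, new_is_faster = compare_packing_times(placeholder_new, placeholder_old)
print("t = {:.2f}, one-sided p = {:.4f}, new machine faster: {}".format(
    t_stat, p_value, new_is_faster))
```

With the real data, the two placeholder lists would be replaced by the columns read from the file.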