# Lab | Inferential statistics

- It is assumed that the mean systolic blood pressure is `μ = 120 mm Hg`. In the Honolulu Heart Study, a sample of `n = 100` people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. 

## 1. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

### 1.1 Set up the hypothesis test.

In [7]:
import math

# Step 1) H0: μ = 120 mm Hg (the group's mean equals the population mean)
# Step 2) H1: μ != 120 mm Hg
    
# Step 3) a = 0.05 (significance level)

# Step 4) Calculate the test statistic and corresponding p-value

# [Sample size > 30] -> Z test or t test
# [Unknown pop. variance] -> t test

sample_mean = 130.1
pop_mean = 120
sample_stddev = 21.21
n = 100

t_statistic = (sample_mean - pop_mean)/(sample_stddev/math.sqrt(n))
print("t statistic is: ", t_statistic)

t statistic is:  4.761904761904759
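
Step 4 also asks for the p-value that goes with the statistic. The following is a minimal sketch, not part of the original lab solution: it uses only the standard library and therefore approximates the t distribution with a standard normal one, which is an assumption that is reasonable here because n = 100. If SciPy is available, `2 * scipy.stats.t.sf(abs(t_statistic), df=n - 1)` would give the t-based value instead.

```python
import math
from statistics import NormalDist

sample_mean = 130.1
pop_mean = 120
sample_stddev = 21.21
n = 100

# Same test statistic as in the cell above
t_statistic = (sample_mean - pop_mean) / (sample_stddev / math.sqrt(n))

# Two-sided p-value under a standard normal approximation
# (assumption: n = 100 is large enough for t to be close to normal)
p_value = 2 * (1 - NormalDist().cdf(abs(t_statistic)))

print("approximate two-sided p-value:", p_value)
print("p-value below a = 0.05:", p_value < 0.05)
```

The approximate p-value is far below 0.05, which agrees with comparing the statistic against the critical value.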


In [11]:
# Step 5) Drawing a conclusion.
# Compare our statistic with the critical value from the t table
# Two-sided t test: DF = n - 1 = 99 and a = 0.05 -> 1.984

if abs(t_statistic) > 1.984:
    print('We reject H0 (μ = 120 mm Hg) in favor of H1 (μ != 120 mm Hg)')

We reject H0 (μ = 120 mm Hg) in favor of H1 (μ != 120 mm Hg)


### 1.2 Write down all the steps followed for setting up the test.

1. Specify the Null Hypothesis
2. Specify the Alternative Hypothesis
3. Set the Significance Level (a)
4. Calculate the Test Statistic and Corresponding P-Value
5. Drawing a Conclusion

https://www.nedarc.org/statisticalhelp/advancedstatisticaltopics/hypothesisTesting.html

Step 1: Specify the Null Hypothesis
The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or more groups or factors.  In research studies, a researcher is usually interested in disproving the null hypothesis.

Examples:
There is no difference in intubation rates across ages 0 to 5 years.
The intervention and control groups have the same survival rate (or, the intervention does not improve survival rate).
There is no association between injury type and whether or not the patient received an IV in the prehospital setting.

Step 2: Specify the Alternative Hypothesis
The alternative hypothesis (H1) is the statement that there is an effect or difference.  This is usually the hypothesis the researcher is interested in proving.  The alternative hypothesis can be one-sided (only provides one direction, e.g., lower) or two-sided.  We often use two-sided tests even when our true hypothesis is one-sided because it requires more evidence against the null hypothesis to accept the alternative hypothesis.

Examples:
The intubation success rate differs with the age of the patient being treated (two-sided).
The time to resuscitation from cardiac arrest is lower for the intervention group than for the control (one-sided).
There is an association between injury type and whether or not the patient received an IV in the prehospital setting (two-sided).
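
The remark that a two-sided test requires more evidence can be illustrated with a short sketch. The test statistic `z = 1.8` is a hypothetical value chosen only for illustration, and a standard normal reference distribution is assumed.

```python
from statistics import NormalDist

alpha = 0.05
z = 1.8  # hypothetical test statistic, chosen only for illustration

std_normal = NormalDist()
p_one_sided = 1 - std_normal.cdf(z)             # H1: effect in one direction (greater)
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # H1: effect in either direction

print("one-sided p-value:", round(p_one_sided, 4), "-> reject H0:", p_one_sided <= alpha)
print("two-sided p-value:", round(p_two_sided, 4), "-> reject H0:", p_two_sided <= alpha)
```

For the same statistic, the two-sided p-value is twice the one-sided one, so here only the one-sided test would reject H0 at a = 0.05.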

Step 3: Set the Significance Level (a)
The significance level (denoted by the Greek letter alpha, written here as a) is generally set at 0.05. This means that there is a 5% chance that you will reject your null hypothesis, and so accept your alternative hypothesis, when your null hypothesis is actually true. The smaller the significance level, the greater the burden of proof needed to reject the null hypothesis, or in other words, to support the alternative hypothesis.
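
The meaning of the 5% figure can be checked with a small simulation. This is a sketch under stated assumptions, not part of the original material: data are drawn from a normal distribution for which H0 is true by construction, the standard deviation is treated as known (a z test), and the numbers 120, 21.21 and 100 are borrowed from the lab exercise purely as convenient values.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the sketch is repeatable

alpha = 0.05
mu0, sigma, n = 120, 21.21, 100  # H0 is true by construction; sigma treated as known
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
trials = 5000

false_rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        false_rejections += 1

rate = false_rejections / trials
print("share of samples that wrongly reject H0:", rate)
```

The printed share should land close to 0.05, since every rejection in this setup is a false one.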

Step 4: Calculate the Test Statistic and Corresponding P-Value
In another section we present some basic test statistics to evaluate a hypothesis. Hypothesis testing generally uses a test statistic that compares groups or examines associations between variables.  When describing a single sample without establishing relationships between variables, a confidence interval is commonly used.
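
As an illustration of the confidence interval mentioned above, here is a minimal sketch for the blood pressure sample from the lab. It reuses the table value 1.984 from section 1.1 as the critical value; the variable names are chosen for this example only.

```python
import math

sample_mean = 130.1
sample_stddev = 21.21
n = 100
t_critical = 1.984  # two-sided table value for a = 0.05 used in section 1.1

# Margin of error = critical value times the standard error of the mean
margin = t_critical * sample_stddev / math.sqrt(n)
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
print("120 mm Hg inside the interval:", ci_low <= 120 <= ci_high)
```

The interval runs from roughly 125.9 to 134.3 mm Hg and does not contain 120, which is consistent with a significant two-sided test at a = 0.05.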

The p-value describes the probability of obtaining a sample statistic as extreme as, or more extreme than, the one observed by chance alone if your null hypothesis is true. This p-value is determined based on the result of your test statistic. Your conclusions about the hypothesis are based on your p-value and your significance level.

Example:
P-value = 0.01 This will happen 1 in 100 times by pure chance if your null hypothesis is true. Not likely to happen strictly by chance.
Example:
P-value = 0.75 This will happen 75 in 100 times by pure chance if your null hypothesis is true. Very likely to occur strictly by chance.

Cautions About P-Values
Your sample size directly impacts your p-value. Large sample sizes produce small p-values even when differences between groups are not meaningful. You should always verify the practical relevance of your results. On the other hand, a sample size that is too small can result in a failure to identify a difference when one truly exists.

Plan your sample size ahead of time so that you have enough information from your sample to show a meaningful relationship or difference if one exists. See calculating a sample size for more information.
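
The dependence of the p-value on sample size can be sketched as follows. The difference of 0.5 and the standard deviation of 3.0 are hypothetical values, `two_sided_p` is a helper defined only for this example, and a one-sample z test with a normal approximation is assumed.

```python
import math
from statistics import NormalDist

def two_sided_p(mean_diff, stddev, n):
    """Two-sided p-value for a one-sample z test (normal approximation)."""
    z = mean_diff / (stddev / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

mean_diff = 0.5  # hypothetical difference from the reference value
stddev = 3.0     # hypothetical standard deviation

# Same difference and spread, only the sample size changes
p_values = {n: two_sided_p(mean_diff, stddev, n) for n in (10, 100, 1000)}
for n, p in p_values.items():
    print(f"n = {n:>4}: p = {p:.4f}")
```

The same small difference is far from significant at n = 10 and clearly significant at n = 1000, even though its practical relevance has not changed.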

Example:
Average ages were significantly different between the two groups (16.2 years vs. 16.7 years; p = 0.01; n=1,000). Is this an important difference?  Probably not, but the large sample size has resulted in a small p-value.

Example:
Average ages were not significantly different between the two groups (10.4 years vs. 16.7 years; p = 0.40, n=10). Is this an important difference?  It could be, but because the sample size is small, we can't determine for sure if this is a true difference or just happened due to the natural variability in age within these two groups.

If you do a large number of tests to evaluate a hypothesis (called multiple testing), then you need to control for this in your designation of the significance level or calculation of the p-value.  For example, if three outcomes measure the effectiveness of a drug or other intervention, you will have to adjust for these three analyses.
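
One common way to make this adjustment is the Bonferroni correction, which the text above does not name; it is used here as an assumed example of such a method. The three p-values are hypothetical.

```python
alpha = 0.05
p_values = [0.030, 0.004, 0.200]  # hypothetical p-values for three outcomes

# Bonferroni correction: compare each p-value with alpha / number of tests
adjusted_alpha = alpha / len(p_values)
significant = [p <= adjusted_alpha for p in p_values]

print("per-test significance level:", adjusted_alpha)
print("significant after correction:", significant)
```

With three tests the per-test threshold drops to about 0.0167, so the first outcome (p = 0.030) would no longer count as significant, although it would at an unadjusted a = 0.05.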

Step 5: Drawing a Conclusion

P-value <= significance level (a) => Reject your null hypothesis in favor of your alternative hypothesis.  Your result is statistically significant.

P-value > significance level (a) => Fail to reject your null hypothesis.  Your result is not statistically significant.
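
The two rules above can be written as a small helper. `conclude` is a hypothetical function name used only for this sketch; the p-values 0.002 and 0.20 are the ones from the survival examples in this step.

```python
def conclude(p_value, alpha=0.05):
    """Return the Step 5 decision for a given p-value and significance level."""
    if p_value <= alpha:
        return "Reject H0 in favor of H1 (statistically significant)"
    return "Fail to reject H0 (not statistically significant)"

for p in (0.002, 0.20):
    print(f"p = {p}: {conclude(p)}")
```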

Hypothesis testing is not set up so that you can absolutely prove a null hypothesis.  Therefore, when you do not find evidence against the null hypothesis, you fail to reject the null hypothesis. When you do find strong enough evidence against the null hypothesis, you reject the null hypothesis.  Your conclusions also translate into a statement about your alternative hypothesis.  When presenting the results of a hypothesis test, include the descriptive statistics in your conclusions as well.  Report exact p-values rather than a certain range.  For example, "The intubation rate differed significantly by patient age, with younger patients having a lower rate of successful intubation (p=0.02)."  Here are two more examples with the conclusion stated in several different ways.

Example:

H0: There is no difference in survival between the intervention and control group.

H1: There is a difference in survival between the intervention and control group.

a = 0.05; 20% increase in survival for the intervention group; p-value = 0.002

Conclusion:

Reject the null hypothesis in favor of the alternative hypothesis.
The difference in survival between the intervention and control group was statistically significant.
There was a 20% increase in survival for the intervention group compared to control (p=0.002).

Example:

H0: There is no difference in survival between the intervention and control group.

H1: There is a difference in survival between the intervention and control group.

a = 0.05; 5% increase in survival between the intervention and control group; p-value = 0.20

Conclusion:

Fail to reject the null hypothesis.
The difference in survival between the intervention and control group was not statistically significant.
There was no significant increase in survival for the intervention group compared to control (p=0.20).

### 1.3 Calculate the test statistic by hand and also code it in Python. It should be 4.76190. We will take a look at how to make decisions based on this calculated value.

- Coded in item 1.1; the result is stored in the variable `t_statistic`.

In [12]:
t_statistic

4.761904761904759
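
For the by-hand part, the calculation can be written out as below. The symbols $\bar{x}$ (sample mean), $\mu_0$ (hypothesized mean), $s$ (sample standard deviation) and $n$ (sample size) are notation introduced for this worked example; the numbers are those given in the exercise.

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{130.1 - 120}{21.21 / \sqrt{100}} = \frac{10.1}{2.121} \approx 4.7619
$$

This matches the value printed by the code, up to floating-point rounding.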

## 2. If you finished the previous question, please go through the code for `principal_component_analysis_example` provided in the `files_for_lab` folder.