### Introduction to Hypothesis Testing

In the statistical world, a hypothesis is an assumption about a __population parameter__. Examples of hypotheses (that’s plural for hypothesis) include the following:
- The average adult drinks 1.7 cups of coffee per day.
- Twelve percent of undergraduate students will go directly to graduate school after graduation.
- No more than 2 percent of our products sold to customers are defective.

In each case, we have made a statement about the population that may or may not be true. The purpose of hypothesis testing is to reach a statistical conclusion about whether to reject or not reject such statements.

We start with a null hypothesis $H_0$, which represents the status quo: the population is not affected by whatever intervention was applied. The alternative hypothesis $H_1$ states that there is an effect on the population.


We can visualise the procedure as follows, using the suture example and the figure shown later in this section, where the hypothesised population mean is 6.0 days:
- Collect a sample of size $n$, and calculate the test statistic, which in this case is the sample mean.
- Plot the sample mean on the x-axis of the sampling distribution curve.
- If the sample mean falls within the white (unshaded) region, we do not reject $H_0$. That is, we do not have enough evidence to support $H_1$, the alternative hypothesis, which states that the population mean is not equal to 6.0 days.
- If the sample mean falls in either shaded region, otherwise known as the rejection region, we reject $H_0$. That is, we have enough evidence to support $H_1$, which leads us to believe that the true population mean is not equal to 6.0 days.

Because there are two rejection regions in the figure, we have a two-tailed hypothesis test.
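The steps above can be sketched in code. This is only a minimal sketch under stated assumptions: the function name `two_tailed_decision` and its parameters are hypothetical, the population standard deviation is assumed known, and the standard library's `statistics.NormalDist` is used in place of `scipy.stats.norm` so the snippet runs without extra packages. The numbers are the suture figures used later in this section.

```python
import math
from statistics import NormalDist

def two_tailed_decision(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (lower, upper, reject) for a two-tailed z-test of H0: mu = mu0."""
    standard_error = sigma / math.sqrt(n)
    # Sampling distribution of the sample mean if H0 is true
    sampling_dist = NormalDist(mu=mu0, sigma=standard_error)
    # Boundaries of the white (non-rejection) region
    lower = sampling_dist.inv_cdf(alpha / 2)
    upper = sampling_dist.inv_cdf(1 - alpha / 2)
    # Reject H0 if the sample mean lands in either shaded tail
    reject = sample_mean < lower or sample_mean > upper
    return lower, upper, reject

lower, upper, reject = two_tailed_decision(6.1, mu0=6.0, sigma=0.5, n=30)
print(f"non-rejection region: ({lower:.3f}, {upper:.3f})")
print("reject H0" if reject else "do not reject H0")
```

With these inputs the sample mean of 6.1 days lies inside the non-rejection region, so the sketch reports that $H_0$ is not rejected.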

For example, we may have a null hypothesis as:

$$H_0:\mu=6.0\ \text{days}$$
$$H_1:\mu\ne6.0\ \text{days}$$

where $\mu$ is the mean number of days for a medical suture to dissolve. We invented a new suture and want to test whether there is a difference in the number of days it takes to dissolve. For a sample of the new suture, we found the mean to be 6.1 days. Is the new suture able to dissolve faster than the normal sutures?

Let’s say that I know that the standard deviation of the population, $\sigma$, is 0.5 days, and my sample size to test the hypothesis, n, is 30. We’ll also set $\alpha$ = 0.05, which means I’m willing to accept a 5 percent chance of committing a Type I error. 
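To make the meaning of $\alpha$ concrete, the following simulation estimates how often a two-tailed z-test rejects $H_0$ when $H_0$ is in fact true. It is an illustrative sketch, not part of the original lab: the function name `type_one_error_rate`, the number of trials, and the seed are assumptions, and it uses only the Python standard library.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(mu0=6.0, sigma=0.5, n=30, alpha=0.05,
                        trials=20_000, seed=0):
    """Estimate how often a two-tailed z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 holds (true mean is mu0)
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # a false rejection, i.e. a Type I error
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to 0.05, which is what "a 5 percent chance of committing a Type I error" means in practice.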

When the population standard deviation is unknown, we estimate the standard error with the sample standard deviation divided by the square root of the sample size. In that case, and particularly when the sample size is less than 30, we use the t-distribution instead of the normal distribution.
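For reference, the two test statistics involved can be written out as follows. The symbols $\bar{x}$ (sample mean), $\mu_0$ (hypothesised mean), and $s$ (sample standard deviation) are notation introduced here for illustration; $\sigma$ and $n$ are as defined in the text.

$$z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$$
$$t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}},\quad \text{with } n-1 \text{ degrees of freedom}$$

For the suture example, where $\sigma$ is known, this gives

$$z=\frac{6.1-6.0}{0.5/\sqrt{30}}\approx 1.095$$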

In [3]:
from scipy import stats
import math

# 95% non-rejection region for the sample mean under H0 (mu = 6.0, sigma = 0.5, n = 30)
cf = stats.norm.interval(.95, loc=6, scale=0.5/math.sqrt(30))
print(cf)


(5.821080585628284, 6.178919414371716)


In [10]:
def zscore(xtest, mean, std, size):
    # z = (sample mean - hypothesised mean) / standard error
    return (xtest - mean) / (std / math.sqrt(size))

zx = zscore(6.1, 6.0, 0.5, 30)

# Critical-value approach: compare the z-score with the two-tailed critical value
print(zx, 'With 𝛼 = 0.05, the two-tailed critical value of the z-test is 1.96. Because zx < 1.96, we do not reject the null hypothesis.')
# p-value approach: compare the upper-tail probability with 𝛼/2
print(1 - stats.norm.cdf(zx), 'is the upper-tail probability (one-tailed p-value). Because it is greater than 𝛼/2 = 0.025, we do not reject the null hypothesis.')

1.0954451150103284 With 𝛼 = 0.05, the two-tailed critical value of the z-test is 1.96. Because zx < 1.96, we do not reject the null hypothesis.
0.1366608391461499 is the upper-tail probability (one-tailed p-value). Because it is greater than 𝛼/2 = 0.025, we do not reject the null hypothesis.


!['hypothesis'](lab4-img/hypo_1.png)

Is the new suture better, in terms of being able to dissolve faster, than the old sutures? Why?

A random sample of 100 working mothers spends an average of 11.5 minutes per day talking with their children. Assume the population standard deviation is 2.3 minutes and the hypothesised population mean is 11 minutes.

The null hypothesis is $H_0:\mu=11$. The alternative hypothesis is $H_1:\mu>11$, so this is a one-tailed (right-tailed) test. Test this claim. The calculation is shown below.

#### Your Turn
Explain the calculation below.


In [11]:
def zscore(xtest, mean, std, size):
    # z = (sample mean - hypothesised mean) / standard error
    return (xtest - mean) / (std / math.sqrt(size))

zx = zscore(11.5, 11.0, 2.3, 100)
print(zx)
# Right-tailed test: the p-value is the area to the right of zx
print(1 - stats.norm.cdf(zx), 'is the p-value: the smallest significance level at which the null hypothesis can be rejected')

2.173913043478261
0.014855833143976649 is the p-value: the smallest significance level at which the null hypothesis can be rejected
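The one-tailed and two-tailed calculations in this section differ only in which tail area is taken as the p-value. A small helper can make that choice explicit. This is a hedged sketch rather than the lab's own code: the name `z_test` and the `alternative` argument are hypothetical, a known population standard deviation is assumed, and `statistics.NormalDist` from the standard library stands in for `scipy.stats.norm`.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":      # H1: mu > mu0, right tail
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # H1: mu < mu0, left tail
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # H1: mu != mu0, both tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value

# Working-mothers example: H0: mu = 11, H1: mu > 11
z, p = z_test(11.5, mu0=11.0, sigma=2.3, n=100, alternative="greater")
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

For the working-mothers data this prints `z = 2.174, p-value = 0.0149`. The p-value is below $\alpha$ = 0.05, so $H_0$ would be rejected at that level. Calling the same helper with the suture numbers and `alternative="two-sided"` gives a p-value of about 0.273, which can be compared directly with $\alpha$ rather than $\alpha/2$.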


### Your Turn

1. Formulate a hypothesis statement for the following claim: “The average adult drinks 1.7 cups of coffee per day.” A sample of 35 adults drank an average of 1.95 cups per day. Assume the population standard deviation is 0.5 cups. Using $\alpha$ = 0.10, test your hypothesis. What is your conclusion?

2. Formulate a hypothesis statement for the following claim: “The average age of our customers is less than 40 years old.” A sample of 50 customers had an average age of 38.7 years. Assume the population standard deviation is 12.5 years. Using $\alpha$ = 0.05, test your hypothesis. What is your conclusion?


In [4]:
import math
from scipy import stats

def zscore(xtest, mean, std, size):
    # z = (sample mean - hypothesised mean) / standard error
    return (xtest - mean) / (std / math.sqrt(size))

# Question 1: H0: mu = 1.7, H1: mu != 1.7 (two-tailed), alpha = 0.10
zy = zscore(1.95, 1.7, 0.5, 35)
p1 = 2 * stats.norm.sf(abs(zy))  # two-tailed p-value
print(f"z = {zy:.3f}, p-value = {p1:.4f} < 𝛼 = 0.10, reject the null hypothesis")

# Question 2: H0: mu >= 40, H1: mu < 40 (left-tailed), alpha = 0.05
zy = zscore(38.7, 40, 12.5, 50)
p2 = stats.norm.cdf(zy)  # left-tail p-value
print(f"z = {zy:.3f}, p-value = {p2:.4f} > 𝛼 = 0.05, do not reject the null hypothesis")

z = 2.958, p-value = 0.0031 < 𝛼 = 0.10, reject the null hypothesis
z = -0.735, p-value = 0.2311 > 𝛼 = 0.05, do not reject the null hypothesis
