## Hypothesis Testing and P-Value

- A hypothesis test weighs a null hypothesis, H<sub>0</sub>, against an alternative hypothesis, H<sub>1</sub>. For example, the hypotheses that two groups have the same mean can be written as follows:<br>
H<sub>0</sub>: μ<sub>1</sub> = μ<sub>2</sub><br>
H<sub>1</sub>: μ<sub>1</sub> ≠ μ<sub>2</sub><br>
where μ<sub>1</sub> and μ<sub>2</sub> are the population means of the two groups. A statistical test summarizes the data as a test statistic and a p-value. Use a t-test to compare the means of continuous data and a chi-square test for categorical data. To compare the means of more than two groups, use ANOVA. If the data are not normally distributed, use a non-parametric test instead.<br>
The p-value helps determine the significance of a test result. It is the probability, assuming H<sub>0</sub> is true, of obtaining a result at least as extreme as the one observed. A p-value smaller than the significance level α (usually 0.05) indicates that the observed data are sufficiently inconsistent with the null hypothesis, so we reject H<sub>0</sub> in favor of H<sub>1</sub> at significance level α. Rejecting H<sub>0</sub> does not prove that H<sub>1</sub> is true. A p-value greater than or equal to α means that we fail to reject the null hypothesis, which is not evidence that H<sub>0</sub> is true.
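
To illustrate the definition above, here is a minimal R simulation, assuming normally distributed data. It draws a hypothetical sample (the sample size, true mean, and seed are arbitrary choices for illustration), then estimates the p-value as the share of t statistics simulated under H<sub>0</sub> that are at least as extreme as the observed one. This Monte Carlo estimate should land close to the p-value that `t.test()` reports.

```r
# Hypothetical sample: 20 values from a normal distribution with true mean 0.5
set.seed(1)
n <- 20
observed <- rnorm(n, mean = 0.5, sd = 1)

# Test H0: mu = 0 against H1: mu != 0
res <- t.test(observed, mu = 0)
t_obs <- unname(res$statistic)

# Simulate the one-sample t statistic many times with H0 true (mean really 0)
t_null <- replicate(20000, {
  s <- rnorm(n, mean = 0, sd = 1)
  mean(s) / (sd(s) / sqrt(n))
})

# Share of null t statistics at least as extreme (in either direction) as t_obs
p_sim <- mean(abs(t_null) >= abs(t_obs))

# Decision rule at the usual significance level
alpha <- 0.05
decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"

cat("t.test p-value:", signif(res$p.value, 4), "\n")
cat("simulated p-value:", signif(p_sim, 4), "\n")
cat("decision:", decision, "\n")
```

The simulated value varies slightly with the seed and the number of replicates; the analytic p-value from `t.test()` is what you would report.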

### T-Test

The t-test is one of the most widely used tests in statistics. It checks whether the mean of a sample differs from a hypothesized value (one-sample test) or whether the means of two samples differ from each other (two-sample test). For two samples, the hypotheses are:<br>
H<sub>0</sub>: μ<sub>1</sub> = μ<sub>2</sub><br>
H<sub>1</sub>: μ<sub>1</sub> ≠ μ<sub>2</sub>

#### Types of t-test

##### One-Sample Test
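- A one-sample t-test compares the mean of a single sample with a hypothesized value m. In R, pass the sample to the `t.test()` function and set the hypothesized mean with the `mu` argument (default 0).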
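
For reference, this is the standard textbook one-sample t statistic, written with the sample mean x̄, the sample standard deviation s, the sample size n, and the hypothesized mean m:

$$
t = \frac{\bar{x} - m}{s / \sqrt{n}}, \qquad df = n - 1
$$

For example, the Q2 data below have n = 13, so df = 12, which matches the `df` that `t.test()` reports there.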

```r
# Reproducible random numbers
set.seed(123)

# Three simulated normal samples of 100 values each
var1 <- rnorm(100, mean = 2, sd = 1)
var2 <- rnorm(100, mean = 3, sd = 1)
var3 <- rnorm(100, mean = 3, sd = 2)

data <- data.frame(var1, var2, var3)

# One-sample t-test of H0: mean of var1 = 0.6
t.test(data$var1, mu = 0.6)
```

```
	One Sample t-test

data:  data$var1
t = 16.328, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0.6
95 percent confidence interval:
 1.909283 2.271528
sample estimates:
mean of x 
 2.090406 
```


H<sub>0</sub>: μ = m<br>
H<sub>1</sub>: μ ≠ m<br>
Here m = 0.6. R prints p-values smaller than machine precision (about 2.2 × 10<sup>-16</sup>) as `< 2.2e-16`, so the p-value is far below α = 0.05 and the null hypothesis is rejected: the true mean differs from 0.6 at the 5% significance level. The 95% confidence interval (1.91, 2.27) excludes 0.6, which leads to the same conclusion.

###### Q2

```r
x <- c(5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3)
x

t.test(x)
```

```
	One Sample t-test

data:  x
t = 8.8318, df = 12, p-value = 1.347e-06
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 3.013192 4.986808
sample estimates:
mean of x 
        4 
```


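Here m = 0, the default value of `mu` in `t.test()`. The p-value is 1.347 × 10<sup>-6</sup> < 0.05, so we reject the null hypothesis and conclude that the true mean differs from 0 at the 5% significance level. The 95% confidence interval (3.01, 4.99) excludes 0.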
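
The numbers in this output can be reproduced from the formula t = (x̄ - m) / (s/√n). The sketch below recomputes the t statistic, the two-sided p-value, and the 95% confidence interval x̄ ± t<sub>0.975, n-1</sub> · s/√n by hand and checks them against `t.test()`; the variable names are illustrative only.

```r
# Q2 data and the hypothesized mean under H0
x <- c(5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3)
m <- 0

n  <- length(x)              # sample size (13)
df <- n - 1                  # degrees of freedom
se <- sd(x) / sqrt(n)        # standard error of the mean

t_stat  <- (mean(x) - m) / se                      # one-sample t statistic
p_value <- 2 * pt(-abs(t_stat), df)                # two-sided p-value
ci      <- mean(x) + c(-1, 1) * qt(0.975, df) * se # 95% confidence interval

res <- t.test(x, mu = m)
print(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]))
print(c(t = unname(res$statistic), p = res$p.value,
        lower = res$conf.int[1], upper = res$conf.int[2]))
```

Both printed rows should agree up to floating-point rounding.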

###### Q3
- We collected a random sample of 31 energy bars from several different stores to represent the population of energy bars available to consumers. The labels claim that each bar contains 20 grams of protein. Test whether the mean protein content differs from 20 g.<br>
H<sub>0</sub>: μ = 20<br>
H<sub>1</sub>: μ ≠ 20

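```r
# Protein content (g) of the 31 sampled bars
x <- c(20.70,20.75,22.14,19.72,25.06,27.46,22.91,19.56,18.28,22.44,22.15,25.34,21.10,16.26,19.08,19.85,20.33,18.04,
       17.46,19.88,21.29,21.54,24.12,20.53,21.39,24.75,21.08,19.95,22.12,22.33,25.79)

t.test(x, mu = 20)
```

```
	One Sample t-test

data:  x
t = 3.0668, df = 30, p-value = 0.004553
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
 20.46771 22.33229
sample estimates:
mean of x 
     21.4 
```

The p-value is 0.004553 < 0.05, so we reject H<sub>0</sub>: the mean protein content differs from the labeled 20 g. The sample mean is 21.4 g, and the 95% confidence interval (20.47, 22.33) lies entirely above 20 g.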
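
Rather than copying numbers from the printout, the fields of the object returned by `t.test()` (`p.value`, `estimate`, `conf.int`) can be read directly. The sketch below does this for the energy-bar data and also checks a general property of two-sided tests: the test rejects at level α exactly when the hypothesized mean lies outside the (1 - α) confidence interval. The names `mu0`, `reject_h0`, and `mu0_outside_ci` are illustrative.

```r
# Q3 data: protein content (g) of 31 energy bars
x <- c(20.70,20.75,22.14,19.72,25.06,27.46,22.91,19.56,18.28,22.44,22.15,25.34,21.10,16.26,19.08,19.85,20.33,18.04,
       17.46,19.88,21.29,21.54,24.12,20.53,21.39,24.75,21.08,19.95,22.12,22.33,25.79)
mu0 <- 20                                # labeled protein content

res <- t.test(x, mu = mu0)

# Components of the "htest" object returned by t.test()
p_value  <- res$p.value
estimate <- unname(res$estimate)         # sample mean
ci       <- as.numeric(res$conf.int)     # lower and upper 95% limits

alpha <- 0.05
reject_h0 <- p_value < alpha

# For a two-sided test, p < alpha exactly when mu0 falls outside the (1 - alpha) CI
mu0_outside_ci <- mu0 < ci[1] || mu0 > ci[2]

cat("mean:", round(estimate, 2), " 95% CI:", round(ci, 2), " reject H0:", reject_h0, "\n")
```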


###### Q4
We have the potato yield from 12 different farms. The standard potato yield for the given variety is μ = 20. Test whether the potato yield from these farms is significantly higher than the standard yield.<br>
H<sub>0</sub>: μ = 20<br>
H<sub>1</sub>: μ > 20

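```r
x <- c(21.5,24.5,18.5,17.2,14.5,23.2,22.1,20.5,19.4,18.1,24.1,18.5)

t.test(x, mu = 20, alternative = "greater")
```

```
	One Sample t-test

data:  x
t = 0.20066, df = 11, p-value = 0.4223
alternative hypothesis: true mean is greater than 20
95 percent confidence interval:
 18.60874      Inf
sample estimates:
mean of x 
   20.175 
```

The p-value is 0.4223 > 0.05, so we fail to reject H<sub>0</sub>: the sample mean yield of 20.175 does not provide significant evidence that these farms exceed the standard yield of 20. Because the alternative is one-sided, R reports a one-sided 95% confidence interval, (18.61, ∞).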
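
For the one-sided alternative H<sub>1</sub>: μ > 20, only large positive values of t count as evidence against H<sub>0</sub>, so the p-value is the upper-tail probability of the t distribution. The sketch below recomputes it with `pt(..., lower.tail = FALSE)` and, for comparison, the two-sided p-value for the same statistic; because t is positive here, the two-sided value is twice the one-sided one.

```r
# Q4 data: potato yield from 12 farms; standard yield mu0 = 20
x <- c(21.5,24.5,18.5,17.2,14.5,23.2,22.1,20.5,19.4,18.1,24.1,18.5)
mu0 <- 20

res <- t.test(x, mu = mu0, alternative = "greater")
t_stat <- unname(res$statistic)
df     <- unname(res$parameter)

# H1: mu > mu0, so only large positive t values count as "more extreme"
p_upper <- pt(t_stat, df, lower.tail = FALSE)

# The two-sided p-value for the same statistic, for comparison
p_two_sided <- 2 * pt(-abs(t_stat), df)

cat("one-sided p:", signif(p_upper, 4), " two-sided p:", signif(p_two_sided, 4), "\n")
```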


##### Two-Sample T-Test
An unpaired two-sample t-test compares the means of two independent samples. To run it in R with the two population variances assumed equal (Student's t-test), set `var.equal = TRUE` and `paired = FALSE`.<br>
To test:<br>
H<sub>0</sub>: μ<sub>A</sub> - μ<sub>B</sub> = 0<br>
H<sub>1</sub>: μ<sub>A</sub> - μ<sub>B</sub> ≠ 0
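
Under the equal-variance assumption (`var.equal = TRUE`), the t statistic uses a pooled estimate of the common variance. This is the standard Student's two-sample formula, where x̄<sub>A</sub>, x̄<sub>B</sub>, s<sub>A</sub>, s<sub>B</sub>, n<sub>A</sub>, n<sub>B</sub> are the sample means, standard deviations, and sizes:

$$
s_p^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}, \qquad
t = \frac{\bar{x}_A - \bar{x}_B}{s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}}, \qquad
df = n_A + n_B - 2
$$

With two samples of 100 observations each, df = 100 + 100 - 2 = 198.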

```r
set.seed(123)

var1 <- rnorm(100, mean = 2, sd = 1)
var2 <- rnorm(100, mean = 3, sd = 1)
var3 <- rnorm(100, mean = 3, sd = 2)

data <- data.frame(var1, var2, var3)

t.test(data$var1, data$var2, var.equal = TRUE, paired = FALSE)
```

```
	Two Sample t-test

data:  data$var1 and data$var2
t = -6.0315, df = 198, p-value = 7.843e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.0642808 -0.5398138
sample estimates:
mean of x mean of y 
 2.090406  2.892453 
```


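- The p-value is 7.843 × 10<sup>-9</sup>, which is less than 0.05, so we reject the null hypothesis: the means of `var1` and `var2` differ at the 5% significance level. The 95% confidence interval for the difference in means, (-1.06, -0.54), excludes 0.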
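
The pooled statistic that `t.test()` computes with `var.equal = TRUE` can be reproduced by hand. This sketch regenerates the same simulated samples with `set.seed(123)` (only `var1` and `var2` are needed, and they are drawn first, so they match the data above) and checks the manual t statistic, degrees of freedom, and p-value against `t.test()`.

```r
set.seed(123)
var1 <- rnorm(100, mean = 2, sd = 1)
var2 <- rnorm(100, mean = 3, sd = 1)

n1 <- length(var1)
n2 <- length(var2)
df <- n1 + n2 - 2                          # 198

# Pooled variance: sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * var(var1) + (n2 - 1) * var(var2)) / df
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))      # standard error of the difference in means

t_stat  <- (mean(var1) - mean(var2)) / se
p_value <- 2 * pt(-abs(t_stat), df)

res <- t.test(var1, var2, var.equal = TRUE, paired = FALSE)
cat("manual t:", round(t_stat, 4), " t.test t:", round(unname(res$statistic), 4), "\n")
```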

###### Q1
A group of men and women worked out at a gym three times a week for a year. Then their trainer measured their body fat. The table below shows the data.

|Group|Body Fat Percentages (%)|
|:---:|:---:|
|Men|13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0|
|Women|22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0|

Check whether the underlying populations of men and women at the gym have the same mean body fat.

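```r
# Body fat percentages for each group
men <- c(13.3,6.0,20.0,8.0,14.0,19.0,18.0,25.0,16.0,24.0,15.0,1.0,15.0)
women <- c(22.0,16.0,21.7,21.0,30.0,26.0,12.0,23.2,28.0,23.0)

t.test(men, women, var.equal = TRUE, paired = FALSE)
```

```
	Two Sample t-test

data:  men and women
t = -2.8, df = 21, p-value = 0.01073
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.798339  -1.889353
sample estimates:
mean of x mean of y 
 14.94615  22.29000 
```

The p-value is 0.01073 < 0.05, so we reject H<sub>0</sub>: at the 5% significance level, the mean body fat of men (14.95%) and women (22.29%) at this gym differs. The 95% confidence interval for the difference in means, (-12.80, -1.89), excludes 0.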
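
The examples above assume equal variances. If that assumption is doubtful, R's `t.test()` runs Welch's t-test, which does not pool the variances, when `var.equal = FALSE` (its default). The sketch below compares the two on the gym data; `var_ratio` is only a rough informal indicator, not a formal test of equal variances.

```r
men   <- c(13.3,6.0,20.0,8.0,14.0,19.0,18.0,25.0,16.0,24.0,15.0,1.0,15.0)
women <- c(22.0,16.0,21.7,21.0,30.0,26.0,12.0,23.2,28.0,23.0)

# Ratio of sample variances: a rough indication of whether equal variances is plausible
var_ratio <- var(men) / var(women)

student <- t.test(men, women, var.equal = TRUE)   # pooled variance, df = n1 + n2 - 2
welch   <- t.test(men, women)                     # default var.equal = FALSE (Welch)

# Welch's degrees of freedom (Welch-Satterthwaite approximation) are usually not an integer
print(round(c(var_ratio  = var_ratio,
              student_df = unname(student$parameter),
              welch_df   = unname(welch$parameter),
              student_p  = student$p.value,
              welch_p    = welch$p.value), 4))
```

Welch's degrees of freedom always lie between min(n<sub>1</sub>, n<sub>2</sub>) - 1 and n<sub>1</sub> + n<sub>2</sub> - 2. When the sample variances are similar, the two tests give similar results.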
