## Statistical Tests 

### Test statistics

The test statistic is a number calculated from a statistical test of a hypothesis. It shows how closely your observed data match the distribution expected under the null hypothesis of that statistical test.

The test statistic is used to calculate the p value of your results, helping to decide whether to reject your null hypothesis.

The distribution of data is how often each observation occurs, and can be described by its central tendency and variation around that central tendency. Different statistical tests predict different types of distributions, so it’s important to choose the right statistical test for your hypothesis.

The test statistic summarizes your observed data into a single number using the central tendency, variation, sample size, and number of predictor variables in your statistical model.

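Generally, the test statistic is calculated as the pattern in your data (i.e., the correlation between variables or the difference between groups) divided by the variability in the data (e.g., the standard error of that correlation or difference). A large absolute value means the pattern is large relative to the noise.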
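As a rough illustration of this "pattern divided by variability" idea, the sketch below computes a two-sample t statistic by hand using a pooled variance. The measurements and the helper name `pooled_t_statistic` are hypothetical and chosen only for this example.

```python
import math
import statistics

# Hypothetical measurements for two groups (illustrative values only)
group_a = [4.1, 4.5, 4.3, 4.7, 4.4]
group_b = [5.0, 5.4, 5.1, 5.6, 5.2]


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance: pattern divided by variability."""
    n_a, n_b = len(a), len(b)
    # Pattern: the difference between the two group means
    signal = statistics.mean(a) - statistics.mean(b)
    # Variability: standard error of that difference, from the pooled sample variance
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    noise = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return signal / noise


t_value = pooled_t_statistic(group_a, group_b)
print(f"t = {t_value:.3f}")
```

The sign of the result only reflects which group is listed first; the magnitude is what gets compared against the t distribution.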

### Types of test statistics

![Screenshot%202023-07-20%20063723.png](attachment:Screenshot%202023-07-20%20063723.png)

In practice, you will almost always calculate your test statistic using a statistical program (Python, R, SPSS, Excel, etc.), which will also calculate the p value of the test statistic. However, formulas to calculate these statistics by hand can be found online.

### T test

A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.

t test example: 

You want to know whether the mean petal length of iris flowers differs according to their species. You find two different species of irises growing in a garden and measure 25 petals of each species. You can test the difference between these two groups using a t test with null and alternative hypotheses.  

The null hypothesis (H0) is that the true difference between these group means is zero.  
The alternative hypothesis (Ha) is that the true difference is different from zero.

### When to use a t test

A t test can only be used when comparing the means of two groups (a.k.a. pairwise comparison). If you want to compare more than two groups, or if you want to do multiple pairwise comparisons, use an ANOVA test or a post-hoc test.

The t test is a parametric test of difference, meaning that it makes the same assumptions about your data as other parametric tests. The t test assumes your data:

- are independent
- are (approximately) normally distributed
- have a similar amount of variance within each group being compared (a.k.a. homogeneity of variance)

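If your data do not fit these assumptions, you can try an alternative to the standard t test. For non-normal data, nonparametric options include the Wilcoxon signed-rank test (for paired samples) and the Mann-Whitney U test (for independent samples). For groups with unequal variances, Welch's t test is a common choice.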

Most statistical software (Python, R, SPSS, etc.) includes a t test function. This built-in function will take your raw data and calculate the t value. It will then compare it to the critical value, and calculate a p-value. This way you can quickly see whether your groups are statistically different.
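As a minimal sketch of what such a built-in function looks like in Python, the example below uses `scipy.stats.ttest_ind` on simulated petal lengths. The data are randomly generated stand-ins for the iris example (the means, spread, and seed are assumptions), not real measurements.

```python
import numpy as np
from scipy import stats

# Simulated petal lengths (cm) for two hypothetical iris species, 25 petals each
rng = np.random.default_rng(seed=0)
species_a = rng.normal(loc=4.3, scale=0.5, size=25)
species_b = rng.normal(loc=5.5, scale=0.5, size=25)

# equal_var=True runs Student's t test (assumes homogeneity of variance);
# equal_var=False would run Welch's t test instead
t_stat, p_value = stats.ttest_ind(species_a, species_b, equal_var=True)

# Interpret the p-value at the 5% significance level
alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
if p_value <= alpha:
    print("Reject H0: the group means differ")
else:
    print("Fail to reject H0")
```

For paired measurements (for example, the same plants measured twice), `scipy.stats.ttest_rel` would be the analogous call.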

## ANOVA

ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups.

A one-way ANOVA uses one independent variable, while a two-way ANOVA uses two independent variables.

One-way ANOVA example:  
As a crop researcher, you want to test the effect of three different fertilizer mixtures on crop yield. You can use a one-way ANOVA to find out if there is a difference in crop yields between the three groups.

### How does an ANOVA test work?
ANOVA determines whether the groups created by the levels of the independent variable are statistically different by calculating whether the means of the treatment levels are different from the overall mean of the dependent variable.

If any of the group means is significantly different from the overall mean, then the null hypothesis is rejected.

ANOVA uses the **F test for statistical significance**. This allows for comparison of multiple means at once, because the error is calculated for the whole set of comparisons rather than for each individual two-way comparison (which would happen with a t test).

The F test compares the variance between the group means with the variance within the groups. The F value is the ratio of the two (between-group mean square divided by within-group mean square). If the variance within groups is small relative to the variance between groups, the F value will be large, and the observed differences are less likely to be due to chance alone.
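To make the F ratio concrete, the sketch below computes it by hand for a one-way design and compares the result with `scipy.stats.f_oneway`. The crop yields are hypothetical values invented for the fertilizer example, and `f_statistic` is an illustrative helper, not a library function.

```python
import numpy as np
from scipy import stats

# Hypothetical crop yields for three fertilizer mixtures (illustrative values only)
groups = [
    np.array([20.1, 21.3, 19.8, 20.7, 21.0]),
    np.array([22.4, 23.1, 22.8, 21.9, 23.5]),
    np.array([20.9, 21.5, 22.0, 21.2, 20.6]),
]


def f_statistic(samples):
    """One-way ANOVA F value: between-group mean square / within-group mean square."""
    all_values = np.concatenate(samples)
    grand_mean = all_values.mean()
    k = len(samples)            # number of groups
    n_total = all_values.size   # total number of observations
    # Variation of the group means around the overall mean
    ss_between = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


f_manual = f_statistic(groups)
f_scipy, p_value = stats.f_oneway(*groups)
print(f"manual F = {f_manual:.3f}, scipy F = {f_scipy:.3f}, p = {p_value:.3g}")
```

A significant F value only says that at least one group differs; a post-hoc test is still needed to find out which groups differ from each other.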

### Two-way ANOVA

A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables.  
Use a two-way ANOVA when you want to know how two independent variables, in combination, affect a dependent variable.

### Assumptions of ANOVA
The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:

- Independence of observations: the data were collected using statistically valid sampling methods, and there are no hidden relationships among observations. If your data fail to meet this assumption because you have a confounding variable that you need to control for statistically, use an ANOVA with blocking variables.

- Normally distributed response variable: the values of the dependent variable within each group follow an approximately normal distribution.

- Homogeneity of variance: the variation within each group being compared is similar for every group. If the variances are different among the groups, then ANOVA probably isn’t the right fit for the data.
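One possible way to screen for the normality and equal-variance assumptions is sketched below, using the Shapiro-Wilk test (`scipy.stats.shapiro`) and Levene's test (`scipy.stats.levene`). The choice of these two tests and the yield values are assumptions made for illustration; independence has to be judged from the study design rather than from a test.

```python
from scipy import stats

# Hypothetical crop yields for three fertilizer mixtures (illustrative values only)
groups = {
    "mix_1": [20.1, 21.3, 19.8, 20.7, 21.0],
    "mix_2": [22.4, 23.1, 22.8, 21.9, 23.5],
    "mix_3": [20.9, 21.5, 22.0, 21.2, 20.6],
}

alpha = 0.05

# Normality within each group: a small p-value suggests a departure from normality
normality_p = {}
for name, values in groups.items():
    w_stat, p_normal = stats.shapiro(values)
    normality_p[name] = p_normal
    print(f"{name}: Shapiro-Wilk W = {w_stat:.3f}, p = {p_normal:.3f}")

# Homogeneity of variance across groups: a small p-value suggests unequal variances
levene_stat, p_levene = stats.levene(*groups.values())
print(f"Levene W = {levene_stat:.3f}, p = {p_levene:.3f}")

if p_levene <= alpha:
    print("Variances appear to differ between groups")
else:
    print("No evidence against equal variances")
```

With samples this small, such tests have little power, so plots of the data are a useful complement.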

### Chi-square test

A Pearson’s chi-square test is a statistical test for categorical data. It is used to determine whether your data are significantly different from what you expected.  

Pearson’s chi-square (χ²) tests, often referred to simply as chi-square tests, are among the most common **nonparametric tests**.  

Nonparametric tests are used for data that don’t follow the assumptions of parametric tests, especially the assumption of a normal distribution.

If you want to test a hypothesis about the distribution of a categorical variable, you’ll need to use a chi-square test or another nonparametric test. Categorical variables can be nominal or ordinal and represent groupings such as species or nationalities. Because they can only take a few specific values, they can’t have a normal distribution.

There are two types of Pearson’s chi-square tests:


- **The chi-square goodness of fit test** is used to test whether the frequency distribution of a categorical variable is different from your expectations.

- **The chi-square test of independence** is used to test whether two categorical variables are related to each other.

### How to perform a chi-square test
The exact procedure for performing a Pearson’s chi-square test depends on which test you’re using, but it generally follows these steps:

1. Create a table of the observed and expected frequencies. This can sometimes be the most difficult step because you will need to carefully consider which expected values are most appropriate for your null hypothesis.

2. Calculate the chi-square value from your observed and expected frequencies using the chi-square formula.

3. Find the critical chi-square value in a chi-square critical value table or using statistical software.

4. Compare the chi-square value to the critical value to determine which is larger.

5. Decide whether to reject the null hypothesis. You should reject the null hypothesis if the chi-square value is greater than the critical value.

If you reject the null hypothesis, you can conclude that your data are significantly different from what you expected.
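The steps above can be sketched for a goodness of fit test as follows. The observed and expected counts are hypothetical, and the critical value is taken from `scipy.stats.chi2.ppf` instead of a printed table. The last lines cross-check the hand calculation against `scipy.stats.chisquare`.

```python
from scipy import stats

# Step 1: observed and expected frequencies (hypothetical counts; totals must match)
observed = [48, 35, 17]
expected = [40, 40, 20]

# Step 2: chi-square value, the sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Step 3: critical value for the chosen significance level
alpha = 0.05
dof = len(observed) - 1  # degrees of freedom for a goodness of fit test
critical_value = stats.chi2.ppf(1 - alpha, dof)

# Steps 4 and 5: compare the two values and decide
reject_h0 = chi_sq > critical_value
print(f"chi-square = {chi_sq:.3f}, critical value = {critical_value:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")

# Cross-check with SciPy's built-in goodness of fit test
result = stats.chisquare(observed, f_exp=expected)
print(f"scipy chi-square = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

Comparing the p-value with alpha leads to the same decision as comparing the chi-square value with the critical value.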

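In [1]:
from scipy.stats import chi2_contingency

# Observed counts: a 2 x 3 contingency table for two categorical variables
data = [[207, 282, 241], [234, 242, 232]]

# Chi-square test of independence: returns the statistic, p-value,
# degrees of freedom, and the table of expected counts
stat, p, dof, expected = chi2_contingency(data)

# Interpret the p-value at the 5% significance level
alpha = 0.05
print("p-value is " + str(p))
if p <= alpha:
    print("Dependent (reject H0)")
else:
    print("Independent (fail to reject H0)")

p-value is 0.1031971404730939
Independent (fail to reject H0)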
