# Inferential Statistical Analysis 
### (t-test, z-test, chi-square test, and ANOVA Classification)

**Inferential statistics** is a powerful method used in data analysis to **draw conclusions** and **make predictions** about a **larger population** based on **sample data**. 

1. **Descriptive Statistics** vs. **Inferential Statistics**:
   - **Descriptive statistics** summarize characteristics of a data set, such as the distribution, central tendency (averages), and variability. These statistics describe only the data actually collected, whether from a sample or from an entire population.
   - **Inferential statistics**, on the other hand, allow us to make **inferences** based on a sample. Since collecting data from an entire population is often impractical or expensive, inferential statistics help us estimate population parameters and test hypotheses.

2. **Estimating Population Parameters from Sample Statistics**:
   - Suppose you collect data from a sample (e.g., SAT scores of 11th graders in a school). Inferential statistics allow you to estimate characteristics of the **larger population** from which the sample is drawn.
   - For example, you can estimate the mean SAT score for all 11th graders in the US based on your sample data.

3. **Hypothesis Testing**:
   - Inferential statistics also involve **testing hypotheses**. You formulate a hypothesis (e.g., that SAT scores are related to family income) and use sample data to draw conclusions about the entire population.
   - Proper sampling methods (random and unbiased) are crucial to ensure valid statistical inferences and generalizability.

4. **Sampling Error**:
   - Because a sample is smaller than the population, it may not capture every aspect of the population.
   - Inferential statistics account for this by quantifying the **uncertainty** that sampling error introduces, for example with standard errors and confidence intervals.
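
As a rough illustration of estimating a population mean and expressing the uncertainty from sampling error, the sketch below computes a point estimate and an approximate 95% confidence interval. The scores are made up for illustration, the function name `mean_confidence_interval` is hypothetical, and the interval uses a normal approximation (for a sample this small, a t-based interval would be somewhat wider).

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, lower, upper) using a normal approximation."""
    n = len(sample)
    point_estimate = mean(sample)
    # Standard error: how much the sample mean is expected to vary between samples.
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate, point_estimate - margin, point_estimate + margin


# Made-up SAT scores from a sample of ten 11th graders (illustrative only).
sample_scores = [1010, 1080, 1150, 990, 1200, 1120, 1060, 1090, 1170, 1030]

estimate, lower, upper = mean_confidence_interval(sample_scores)
print(f"Estimated mean: {estimate:.1f}, 95% CI: ({lower:.1f}, {upper:.1f})")
```

The width of the interval is the uncertainty attributed to sampling error: a larger or less variable sample gives a narrower interval.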
  

Refer PPT for 

# Types of Sampling:
- Simple random
- Stratified
- Cluster
- Systematic
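
The four sampling types can be sketched with Python's `random` module. This is a minimal illustration, not a survey-design tool: the population of 100 student IDs, the two strata, the clusters of ten, and all function names are assumptions made up for the example.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has the same chance of being picked."""
    return rng.sample(population, k)


def systematic_sample(population, step, rng):
    """Pick a random start, then take every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, fraction, rng):
    """Sample the same fraction from each stratum (subgroup)."""
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep all of their members."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]


rng = random.Random(0)  # fixed seed so the example is repeatable
population = list(range(100))  # hypothetical student IDs
strata = {"grade_11": population[:60], "grade_12": population[60:]}
clusters = [population[i:i + 10] for i in range(0, 100, 10)]  # e.g. classrooms

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 0.1, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, sys_s, strat, clus, sep="\n")
```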


# Choosing a Test by Sample Size


- For smaller samples (fewer than 30 observations)
  1. t-test
  2. F-test
  3. chi-square test
- For larger samples (30 or more observations)
  1. z-test

# 1. t-test (t-distribution)

* A t-test is a statistical tool used to compare the **means** of **two groups**. 
* It helps assess whether the observed difference between the means is likely due to random chance or an actual underlying difference.

**Common scenarios:**

* **Comparing two groups:** Does a new fertilizer increase plant growth compared to the old one? (Independent samples)
* **One group, pre-post treatment:**  Does a training program improve test scores? (Paired samples)

**Key aspects:**

* **Hypothesis testing:** We set up a **null hypothesis** that the means are **equal** and an **alternative hypothesis** that they are **different**.
* **T-statistic:** This statistic considers the difference in means, accounting for the variability within each group.
* **P-value:** The p-value is the probability of observing a difference at least as large as the one in your sample if the null hypothesis were true. It is not the probability that the null hypothesis is true. Lower p-values (typically below 0.05) suggest the observed difference is unlikely to be due to chance alone, supporting the alternative hypothesis.
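
To make the t-statistic concrete, here is a minimal sketch for two independent groups using only the standard library. It assumes equal population variances (the pooled, Student's form), the growth measurements are made up, and `pooled_t_statistic` is a hypothetical helper name. Instead of a p-value it compares the statistic with a tabulated critical value; a library routine such as `scipy.stats.ttest_ind` would report the p-value directly.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's two-sample t-statistic, assuming equal population variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Made-up plant growth in cm (illustrative only).
old_fertilizer = [20.1, 19.8, 21.0, 20.5, 19.6]
new_fertilizer = [22.3, 21.9, 23.1, 22.0, 22.7]

t_stat, df = pooled_t_statistic(new_fertilizer, old_fertilizer)

# Two-sided 5% critical value for 8 degrees of freedom, taken from a t-table.
T_CRITICAL_DF8 = 2.306
reject_null = abs(t_stat) > T_CRITICAL_DF8
print(f"t = {t_stat:.3f}, df = {df}, reject null at 5%: {reject_null}")
```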

**Simple analogy:**

Imagine a seesaw. The t-test checks if the weights (means) on each side are balanced (no difference). A bigger difference in means (greater distance from the center) and smaller variability within each group (shorter plank lengths) tilt the seesaw more, favoring the alternative hypothesis (weights are imbalanced).


* T-tests have assumptions about the data (e.g., normality, equal variances). 
* Consulting a statistician is recommended for in-depth analysis and interpretation.

[Reference](https://www.statisticshowto.com/probability-and-statistics/t-test/)

* A larger absolute t-score means a larger difference between groups relative to the variability in the data.
* A t-score close to zero means the groups are more similar.

**To compare three or more means, use an ANOVA instead.**

Types of t-test:

- **Independent samples t-test** (also called a between-samples or unpaired-samples t-test): compares the means of two groups.
- **Paired samples t-test** (also called a correlated pairs or dependent samples t-test): compares means from the same group at different times (say, one year apart).
- **One-sample t-test**: tests the mean of a single group against a known mean.
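
For the one-sample case, the statistic is the gap between the sample mean and the known mean, divided by the standard error. The sketch below is illustrative only: the measurements and the known mean of 50 are made up, and `one_sample_t_statistic` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def one_sample_t_statistic(sample, known_mean):
    """t = (sample mean - known mean) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - known_mean) / standard_error, n - 1


# Made-up measurements compared against a known mean of 50 (illustrative only).
measurements = [52, 48, 55, 50, 53, 49, 51, 54]

t_one, df_one = one_sample_t_statistic(measurements, known_mean=50)

# Two-sided 5% critical value for 7 degrees of freedom, taken from a t-table.
T_CRITICAL_DF7 = 2.365
reject_one = abs(t_one) > T_CRITICAL_DF7
print(f"t = {t_one:.3f}, df = {df_one}, reject null at 5%: {reject_one}")
```

With these numbers the statistic stays below the critical value, so the sample does not give enough evidence that the mean differs from 50.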

## When to Choose a Paired T Test / Paired Samples T Test / Dependent Samples T Test
Choose the paired t-test if you have two measurements on the same item, person, or thing. You should also choose it when two different items are measured under the same conditions. For example, in vehicle safety research you might subject cars from different manufacturers to the same series of crash tests.
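
A paired t-test can be viewed as a one-sample t-test on the per-pair differences, tested against zero. The sketch below assumes made-up before/after test scores for six people; `paired_t_statistic` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t-statistic for the mean of the paired differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1


# Made-up test scores for the same six people (illustrative only).
scores_before = [72, 65, 80, 70, 68, 75]
scores_after = [78, 70, 82, 77, 71, 80]

t_paired, df_paired = paired_t_statistic(scores_before, scores_after)

# Two-sided 5% critical value for 5 degrees of freedom, taken from a t-table.
T_CRITICAL_DF5 = 2.571
reject_paired = abs(t_paired) > T_CRITICAL_DF5
print(f"t = {t_paired:.3f}, df = {df_paired}, reject null at 5%: {reject_paired}")
```

Working on differences removes the person-to-person variability, which is why pairing can detect a change that an independent samples test on the same numbers might miss.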

## [Degrees of Freedom](https://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/degrees-of-freedom/)

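- The t-statistic follows a t-distribution, which is bell-shaped and symmetric like the normal distribution.
- The shape depends on the degrees of freedom (df, ν): with fewer degrees of freedom the tails are heavier, and as ν grows the t-distribution approaches the normal distribution.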
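
The effect of the degrees of freedom on the tails can be checked numerically. This sketch evaluates the t-distribution density from its standard formula and compares it with the standard normal density at a point in the tail; the function name `t_pdf` and the chosen df values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # lgamma keeps the gamma-function ratio numerically stable for large df.
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


tail_point = 3.0
normal_density = NormalDist().pdf(tail_point)

# Density in the tail shrinks toward the normal value as df increases.
for df in (2, 10, 30, 200):
    print(f"df={df:>3}: t density at {tail_point} = {t_pdf(tail_point, df):.5f}")
print(f"normal density at {tail_point} = {normal_density:.5f}")
```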




# 2. Z test (Large Sample Test)

[Ref 1](https://www.cuemath.com/data/z-test/)

[Ref 2](https://statisticsbyjim.com/hypothesis-testing/z-test/)

A z-test is a statistical test for data that approximately follow a normal distribution. It can be performed on one sample, two samples, or proportions. It is typically used with large samples when the population variance is known, for example to check whether the means of two large samples differ.
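
A one-sample z-test can be sketched with the standard library's `NormalDist`. The numbers below (a sample mean of 103 from 100 observations, tested against a population mean of 100 with a known standard deviation of 15) are made up, and `one_sample_z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, population_mean, population_sd, n):
    """Return the z-statistic and two-sided p-value for a one-sample z-test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    # Two-sided p-value: probability of a |z| at least this large under the null.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up summary values (illustrative only).
z_stat, p_value = one_sample_z_test(
    sample_mean=103.0, population_mean=100.0, population_sd=15.0, n=100
)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```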




# 3. Chi Square Test

[Reference 1](https://www.scribbr.com/statistics/chi-square-tests/)

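- Used for categorical data, for example to test whether two categorical variables are independent or whether observed counts match expected counts.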
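
As a small illustration for categorical data, the sketch below computes Pearson's chi-square statistic for a test of independence on a contingency table of counts. The 2x2 table is made up, `chi_square_statistic` is a hypothetical name, and no continuity correction is applied (some library routines apply one by default for 2x2 tables, so their result can differ).

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Made-up counts: rows are two groups, columns are two response categories.
observed_counts = [[30, 20], [20, 30]]

chi2_stat, chi2_df = chi_square_statistic(observed_counts)

# 5% critical value for 1 degree of freedom, taken from a chi-square table.
CHI2_CRITICAL_DF1 = 3.841
reject_independence = chi2_stat > CHI2_CRITICAL_DF1
print(f"chi-square = {chi2_stat:.3f}, df = {chi2_df}, reject at 5%: {reject_independence}")
```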


# 4. ANalysis Of VAriance (ANOVA)
### [Reference](https://www.scribbr.com/statistics/one-way-anova/#:~:text=Use%20a%20one-way%20ANOVA%20when%20you%20have%20collected,%28i.e.%20at%20least%20three%20different%20groups%20or%20categories%29.)


ANOVA, which stands for Analysis of Variance, is a statistical test used to analyze the difference between the means of more than two groups.

## One Way ANOVA:

- A one-way ANOVA uses one independent variable, while a two-way ANOVA uses two independent variables.
- Use a one-way ANOVA when you have collected data about one categorical independent variable and one quantitative dependent variable. The independent variable should have at least three levels (i.e. at least three different groups or categories).
- A quantitative variable is a variable that can be measured numerically, such as height, weight, or speed. There are two types of quantitative variables: discrete and continuous.
- ANOVA tells you if the dependent variable changes according to the level of the independent variable. For example:

1. Your independent variable is social media use, and you assign groups to low, medium, and high levels of social media use to find out if there is a difference in hours of sleep per night.
2. Your independent variable is brand of soda, and you collect data on Coke, Pepsi, Sprite, and Fanta to find out if there is a difference in the price per 100ml.
3. Your independent variable is type of fertilizer, and you treat crop fields with mixtures 1, 2 and 3 to find out if there is a difference in crop yield.

**If any of the group means is significantly different from the overall mean, then the null hypothesis is rejected.**
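
The F-statistic behind a one-way ANOVA compares the variability between group means with the variability within groups. Here is a minimal standard-library sketch, assuming made-up crop yields for three fertilizer mixtures; `one_way_anova_f` is a hypothetical helper name. A library routine such as `scipy.stats.f_oneway` would also return the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Variation of the group means around the overall mean.
    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up crop yields for fertilizer mixtures 1, 2 and 3 (illustrative only).
yields = [
    [20, 21, 19, 20],
    [22, 23, 21, 22],
    [25, 24, 26, 25],
]

f_stat, df_between, df_within = one_way_anova_f(yields)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means differ by much more than the spread within groups would explain, which is the evidence used to reject the null hypothesis.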

![image.png](attachment:5e96100d-ef14-4793-babb-5b23afc1910c.png)


## [Example Use Case for 1 Way](https://stattrek.com/anova/completely-randomized/one-way-example)

Also check the formula chart.