In inferential statistics, various probability distributions play a crucial role in analyzing and drawing conclusions from sample data.

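## Normal Distribution (Gaussian Distribution or 'Bell Curve')
It is a continuous probability distribution that is symmetric and bell-shaped. When the mean is 0 and the standard deviation is 1, it is called the standard normal distribution or Z-distribution.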
It is often used to model the distribution of continuous variables such as heights, weights or test scores.

![image.png](attachment:image.png)

##### Applications of the normal distribution:
1. **Statistical Inference**
The normal distribution is used in many statistical tests and procedures, such as hypothesis testing, confidence intervals, and regression analysis.
These methods rely on assumptions of normality for accurate interpretation and inference.

2. **Modeling and Simulation**
The normal distribution is often used to model random variables in various fields, including finance, physics, engineering, and social sciences.
It provides a convenient approximation for many natural phenomena.

3. **Quality Control**
In manufacturing and quality control processes, the normal distribution is used to model product measurements and monitor deviations from desired specifications.

4. **Data Analysis**
The normal distribution is useful for analyzing and interpreting data. Many statistical techniques assume or approximate normality, allowing for the application of powerful tools and methods.
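
As a rough illustration of how the normal distribution is used to compute probabilities, the sketch below relies on `statistics.NormalDist` from the Python standard library. The mean of 170 cm and standard deviation of 10 cm are hypothetical values chosen for the example, not figures from these notes.

```python
from statistics import NormalDist

# Hypothetical example: heights modeled as Normal(mean=170 cm, sd=10 cm).
heights = NormalDist(mu=170, sigma=10)

# Probability that a height falls within 1, 2, and 3 standard deviations of the mean.
within = {
    k: heights.cdf(170 + k * 10) - heights.cdf(170 - k * 10)
    for k in (1, 2, 3)
}
for k, prob in within.items():
    print(f"within {k} sd: {prob:.4f}")  # roughly 0.6827, 0.9545, 0.9973

# Upper-tail area: probability that a height exceeds 185 cm (1.5 sd above the mean).
p_taller = 1 - heights.cdf(185)
print(f"P(height > 185) = {p_taller:.4f}")  # roughly 0.0668
```

The three "within k sd" values are the familiar 68-95-99.7 rule, which holds for any normal distribution regardless of its mean and standard deviation.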

## Binomial Distribution
It models the number of successes in a fixed number of independent Bernoulli trials.
It is characterized by two parameters: **the probability of success in a single trial (p)** and **the number of trials (n)**.
The binomial distribution is used in hypothesis testing and in estimating proportions.
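
A minimal sketch of the binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k), is shown below. The helper name `binomial_pmf` and the values n = 10, p = 0.5 (ten fair coin flips) are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 fair coin flips.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(X = 5) = {pmf[5]:.4f}")          # 252/1024, about 0.2461
print(f"sum of all probabilities = {sum(pmf):.4f}")

# The expected number of successes equals n * p.
mean = sum(k * prob for k, prob in enumerate(pmf))
print(f"mean = {mean:.1f}")
```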

## Poisson Distribution
It is used to model the number of events that occur in a fixed interval of time or space, given the average rate of occurrence.
It is commonly used for count data such as the number of phone calls received per hour or the number of defects in a product.
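
The Poisson probability of seeing exactly k events when the average rate is λ is λ^k e^(-λ) / k!. The sketch below applies that formula to the phone-call example; the rate of 4 calls per hour and the helper name `poisson_pmf` are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), where lam is the average event rate."""
    return lam**k * exp(-lam) / factorial(k)

rate = 4.0  # hypothetical average of 4 calls per hour

p_exactly_2 = poisson_pmf(2, rate)
p_at_most_2 = sum(poisson_pmf(k, rate) for k in range(3))  # k = 0, 1, 2

print(f"P(exactly 2 calls) = {p_exactly_2:.4f}")
print(f"P(at most 2 calls) = {p_at_most_2:.4f}")
```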

## t-distribution
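It is a symmetric, bell-shaped probability distribution with heavier tails than the normal distribution. Its shape is determined by the degrees of freedom, and it approaches the normal distribution as the degrees of freedom increase.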
It is commonly used when working with small sample sizes or when the population standard deviation is unknown.
It is used in hypothesis testing and constructing confidence intervals for population means.

![Alt text](image-10.png)

![Alt text](image-9.png)

##### When to use t-distribution
- The sample size is small (a common rule of thumb is 30 or fewer observations).
- The population standard deviation is unknown.
- The population distribution is approximately normal (unimodal and roughly symmetric, without strong skew).
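
As an illustration of where the t-distribution enters, the sketch below computes a one-sample t statistic, t = (x̄ - μ0) / (s / √n), using only the standard library. The sample values and the hypothesized mean `mu0` are made up for the example.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical small sample and hypothesized population mean.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 4.7, 5.3]
mu0 = 5.0

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)

# The population standard deviation is unknown, so s stands in for it.
t_stat = (x_bar - mu0) / (s / sqrt(n))
df = n - 1  # degrees of freedom of the reference t-distribution

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic would then be compared with a critical value from a t-table (or a statistics library such as SciPy) for the same degrees of freedom; the standard library does not provide the t-distribution itself.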

## Z-Score
Also known as the **standard score**, the Z-score is a statistical measurement that represents the number of standard deviations a particular data point or observation lies from the mean of a distribution.
It is used to standardize values and compare them across different distributions.

![Alt text](image.png)

**A positive Z-score** - indicates that the data point is above the mean.

**A negative Z-score** - indicates that the data point is below the mean.

**A Z-score of 0** - indicates that the data point is exactly at the mean.
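
The standard formula is z = (x - μ) / σ, where μ is the mean and σ is the standard deviation. The sketch below applies it to a small, hypothetical set of test scores that is treated as the whole population (hence `pstdev` rather than `stdev`).

```python
from statistics import mean, pstdev

# Hypothetical test scores, treated as a complete population.
scores = [62, 70, 75, 80, 88]

mu = mean(scores)       # 75
sigma = pstdev(scores)  # population standard deviation

z_scores = [(x - mu) / sigma for x in scores]

for x, z in zip(scores, z_scores):
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"score {x}: z = {z:+.2f} ({position} the mean)")
```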

##### Common use cases of Z-scores:
1. **Identifying Outliers**
Data points whose Z-scores are far from 0 (for example, beyond ±3, a common rule of thumb) may be considered outliers.

2. **Standardizing Data**
Z-scores can be used to standardize data by rescaling it to have a mean of 0 and a standard deviation of 1. This does not change the shape of the distribution, but it allows for easier comparison and analysis of variables measured on different scales.

3. **Hypothesis testing**
Z-tests are used in hypothesis testing when the population standard deviation is known (or the sample is large enough to estimate it reliably). The Z-score of the test statistic is compared to critical values to determine the statistical significance of the test.

4. **Calculating Percentiles**
Z-scores can be used to determine the percentile rank of a data point within a distribution. For normally distributed data, the area under the standard normal curve to the left of the Z-score gives the proportion of values below that point, which can be converted into a percentile.
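
Two of these use cases can be sketched with `statistics.NormalDist`. The function names and the outlier threshold of 3 are assumptions for illustration (3 is a common convention, not a fixed rule), and the percentile conversion assumes the data are approximately normal.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def percentile_from_z(z: float) -> float:
    """Percentile rank: area under the standard normal curve to the left of z."""
    return 100 * std_normal.cdf(z)

def is_outlier(z: float, threshold: float = 3.0) -> bool:
    """Flag a point whose Z-score is more than `threshold` sd from the mean."""
    return abs(z) > threshold

print(f"z = 0.00 -> {percentile_from_z(0.0):.1f}th percentile")   # 50.0
print(f"z = 1.00 -> {percentile_from_z(1.0):.1f}th percentile")   # about 84.1
print(f"z = 1.96 -> {percentile_from_z(1.96):.1f}th percentile")  # about 97.5
print(is_outlier(3.4), is_outlier(-1.2))
```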

## P-Value
The p-value is the probability of observing a result at least as extreme as the one obtained from the sample, assuming the null hypothesis is true.
The null hypothesis is a statement that there is no effect or no difference, for example no difference between two measures.
If the p-value is less than or equal to the significance level (α), the data are considered inconsistent with the null hypothesis, and the null hypothesis is rejected.
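
A minimal sketch of computing a p-value is given below for a two-sided Z-test, where the p-value is the probability, under the null hypothesis, of a test statistic at least as extreme as the observed one. All numbers (hypothesized mean 100, known standard deviation 15, sample of 36 with mean 105, α = 0.05) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setup: H0 says the population mean is 100; sigma is assumed known.
mu0, sigma = 100, 15
n, x_bar = 36, 105
alpha = 0.05

# Test statistic: how many standard errors the sample mean is from mu0.
z = (x_bar - mu0) / (sigma / sqrt(n))  # 5 / 2.5 = 2.0

# Two-sided p-value: probability of a |Z| at least this large if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value <= alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Here the p-value is about 0.0455, which is below 0.05, so the null hypothesis would be rejected at that significance level.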

![Alt text](image-1.png)


## One-tailed and two-tailed tests
In hypothesis testing, there are two types of tests: one-tailed tests and two-tailed tests. These tests differ in how they define the alternative hypothesis and, consequently, how they calculate the p-value and interpret the results.

### One-Tailed Test
In a one-tailed (or one-sided) test, the alternative hypothesis (Ha) specifies the direction of the effect or difference being tested. The null hypothesis (H0) assumes no effect or difference. The one-tailed test is used when we have a specific hypothesis about the direction of the effect.

For example, let's consider a drug study. The null hypothesis (H0) states that the drug has no effect, while the alternative hypothesis (Ha) states that the drug has a positive effect. In this case, a one-tailed test would focus on whether the drug improves the condition, without considering the possibility of it having a negative effect.

When performing a one-tailed test, the p-value is calculated as the probability of observing a test statistic as extreme as, or more extreme than, the one obtained from the sample data, assuming the null hypothesis is true, in the specified direction. The p-value is then compared to the significance level (α) to determine if the result is statistically significant.

### Two-Tailed Test
In a two-tailed (or two-sided) test, the alternative hypothesis (Ha) does not specify the direction of the effect or difference; it only states that there is a difference or effect. The null hypothesis (H0) assumes no effect or difference. The two-tailed test is used when we want to test if there is any difference, regardless of the direction.

![Alt text](image-2.png)


Using the same drug study example, a two-tailed test would assess whether the drug has any effect, positive or negative, compared to the control group.

When conducting a two-tailed test, the p-value is calculated as the probability of observing a test statistic as extreme as, or more extreme than, the one obtained from the sample data, assuming the null hypothesis is true, in either direction. The p-value is then compared to the significance level (α). Equivalently, α is split between the two tails of the distribution, so each tail contains a rejection region of area α/2.

Interpretation:
For both one-tailed and two-tailed tests, the p-value is compared to the significance level (α) to determine statistical significance. If the p-value is less than or equal to α, the result is considered statistically significant, and the null hypothesis is rejected in favor of the alternative hypothesis. If the p-value is greater than α, the result is not considered statistically significant, and we fail to reject the null hypothesis.

It's important to select the appropriate type of test based on the research question and prior knowledge or expectations regarding the direction of the effect. One-tailed tests have more power to detect effects in a specific direction, while two-tailed tests are more conservative and detect any significant difference, regardless of direction.
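
The difference between the two kinds of test can be sketched for a Z statistic as follows. The function name `p_values`, the observed statistic of 1.8, and α = 0.05 are assumptions chosen so that the two tests disagree.

```python
from statistics import NormalDist

def p_values(z: float) -> dict:
    """One-tailed and two-tailed p-values for an observed Z statistic."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)      # Ha: effect is positive
    lower = std_normal.cdf(z)          # Ha: effect is negative
    two_sided = 2 * min(upper, lower)  # Ha: effect differs in either direction
    return {"greater": upper, "less": lower, "two-sided": two_sided}

alpha = 0.05
result = p_values(1.8)  # hypothetical observed statistic

for alternative, p in result.items():
    print(f"{alternative:>9}: p = {p:.4f}, significant: {p <= alpha}")
```

With z = 1.8, the one-tailed test in the positive direction gives p ≈ 0.036 (significant at 0.05), while the two-tailed test gives p ≈ 0.072 (not significant). This shows why a one-tailed test has more power in its chosen direction.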

## Confidence Interval (CI)
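A confidence interval is a range of values, computed from sample data, that is likely to contain an unknown population parameter. The **CI** gives the interval within which the population parameter, such as the mean, is estimated to lie at a chosen confidence level (for example, 95%).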

![Alt text](image-3.png)
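
One common form, assuming the population standard deviation σ is known, is x̄ ± z* · σ / √n, where z* is the critical value for the chosen confidence level. The sketch below implements that form; the function name and input values are hypothetical, and when σ is unknown a t-based interval would be used instead.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar: float, sigma: float, n: int,
                          confidence: float = 0.95) -> tuple:
    """Confidence interval for a population mean, assuming sigma is known."""
    # Critical value leaving (1 - confidence) / 2 in each tail (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical sample: mean 105 from 36 observations, known sigma of 15.
low, high = z_confidence_interval(x_bar=105, sigma=15, n=36)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # about (100.10, 109.90)
```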



## F-distribution

The F-distribution, also known as the **Fisher-Snedecor distribution**, is a continuous probability distribution that arises in statistical inference.
It is a right-skewed distribution and is used in various statistical tests, particularly those involving the comparison of variances or ratios of variances, such as ANOVA.

![Alt text](image-4.png)


The distribution of all possible values of the **F** statistic is called the F-distribution. In the chart below, d1 and d2 represent the numerator and denominator degrees of freedom.

![Alt text](image-5.png)
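
As a small illustration of an F statistic as a ratio of variances, the sketch below compares the sample variances of two hypothetical groups. The data are invented, and the variable names `d1` and `d2` simply mirror the degrees-of-freedom labels used above.

```python
from statistics import variance

# Hypothetical measurements from two groups.
group_a = [21.0, 23.5, 19.8, 25.1, 22.4, 24.0]
group_b = [22.1, 22.8, 21.9, 22.5, 22.3]

var_a = variance(group_a)  # sample variance (n - 1 in the denominator)
var_b = variance(group_b)

# F statistic: ratio of the two sample variances.
f_stat = var_a / var_b
d1 = len(group_a) - 1  # numerator degrees of freedom
d2 = len(group_b) - 1  # denominator degrees of freedom

print(f"F = {f_stat:.2f} with d1 = {d1}, d2 = {d2}")
```

If the two populations had equal variances, this ratio would be expected to be near 1; how far from 1 counts as significant depends on the F-distribution with (d1, d2) degrees of freedom.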


## Chi-Square Distribution
This is a positively skewed distribution that arises in many statistical tests, particularly those involving categorical data.

The shape of a chi-square distribution is determined by the parameter *k*, the degrees of freedom. The graph below shows examples of chi-square distributions with different values of *k*.

![Alt text](image-6.png)

#### Applications of the chi-square distribution
- Chi-square test of independence.
- Chi-square goodness-of-fit test.
- Confidence interval estimation for the variance of a population.

##### Chi-square test of independence
The chi-square test of independence is a statistical test used to determine whether two categorical variables are independent of each other or not.
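
The test statistic sums (observed - expected)² / expected over every cell of a contingency table, where each expected count is (row total × column total) / grand total. The sketch below works through a hypothetical 2×2 table.

```python
# Hypothetical 2x2 contingency table of observed counts
# (rows: two groups, columns: two outcome categories).
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]        # [40, 60]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 50]
grand_total = sum(row_totals)                      # 100

# Expected counts if the two variables were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) * (cols - 1)

print(f"chi-square = {chi2:.3f} with {dof} degree(s) of freedom")
```

The statistic here is about 16.67 with 1 degree of freedom, well above the 5% critical value of 3.841, so independence would be rejected for this made-up table.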

##### Chi-square goodness-of-fit test
The chi-square goodness-of-fit test is used to test whether the observed frequencies of a categorical variable differ significantly from the frequencies expected under a hypothesized distribution.
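
A minimal sketch of this comparison of observed and expected counts follows. It uses a hypothetical experiment in which a die is rolled 60 times and tested against the expectation of a fair die (10 rolls per face).

```python
# Hypothetical counts for faces 1..6 after 60 rolls of a die.
observed = [8, 12, 9, 11, 6, 14]

# Under the hypothesis of a fair die, every face is expected equally often.
total = sum(observed)
expected = [total / len(observed)] * len(observed)  # 10 per face

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # number of categories minus 1

print(f"chi-square = {chi2:.2f} with {dof} degrees of freedom")
```

The statistic is 4.2 with 5 degrees of freedom, below the 5% critical value of 11.070, so these counts would not be evidence against a fair die.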

## ANOVA
**Analysis of Variance (ANOVA)** is a statistical method used to test differences between two or more means.

ANOVA compares the variation between group means with the variation within groups to determine whether at least one mean is significantly different from the others:

![Alt text](image-7.png)
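
The one-way ANOVA F statistic can be sketched from scratch as the ratio of the between-group mean square to the within-group mean square. The three groups below are hypothetical and deliberately simple so that the arithmetic is easy to follow.

```python
from statistics import mean

# Hypothetical measurements for three groups.
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]

all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)  # 7

# Between-group sum of squares: spread of group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)  # 24
# Within-group sum of squares: spread of observations around their group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)          # 6

df_between = len(groups) - 1               # k - 1 = 2
df_within = len(all_values) - len(groups)  # N - k = 6

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means differ by more than the variation inside the groups would suggest. The statistic is compared with the F-distribution for the same pair of degrees of freedom.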
