# Inferential Statistics
## 1. What is Inferential Statistics?
- Inferential statistics is the branch of statistics that lets us draw conclusions or make predictions about a population based on data collected from a sample.
- Unlike descriptive statistics (which only summarizes the data at hand), inferential statistics allows us to generalize from a smaller dataset to a larger group.

![alt text](Inferential-Statistics.jpg)


## 2. Key Ideas
- **Population vs. Sample**
    - Population: entire group we want to study (e.g., all students in a country).
    - Sample: smaller part of the population (e.g., 200 students surveyed).
    - Goal: Use sample data to make statements about the population.

- **Probability**
    - Foundation of inferential statistics.
    - Helps measure uncertainty in predictions.
- **Random Sampling**
    - Gives every member of the population a chance of being selected, which helps make the sample representative of the population.
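
The population/sample idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is synthetic (100,000 made-up exam scores drawn from a normal distribution), and the names `population`, `sample`, `population_mean`, and `sample_mean` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: synthetic exam scores for 100,000 students.
population = [random.gauss(70, 10) for _ in range(100_000)]

# Simple random sample of 200 students, drawn without replacement.
sample = random.sample(population, 200)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # what we can actually compute

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean will not match the population mean exactly, but with random sampling it tends to land close to it; quantifying "how close" is what the rest of inferential statistics is about.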

## 3. Main Techniques in Inferential Statistics

- **Estimation**
    - Point Estimation → estimating a population parameter (e.g., the population mean) with a single value computed from the sample (e.g., the sample mean).
    - Interval Estimation → Confidence Intervals (a range of values, built from the sample, that captures the population parameter at a chosen confidence level such as 95%).
- **Hypothesis Testing**
    - Formulate assumptions (null & alternative hypotheses).
    - Use tests (t-test, chi-square, ANOVA, etc.) to check whether the sample provides enough evidence to reject the null hypothesis.
    - Relies on concepts like p-value and significance level (α).
- **Regression & Correlation**
    - Analyze relationships between variables.
    - Regression → predict one variable based on others.
    - Correlation → measure strength of linear relationship.
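
As a rough illustration of the regression and correlation bullets, the sketch below computes a Pearson correlation coefficient and a least-squares line by hand. The data (`hours`, `scores`) are made up, and the helper names `pearson_r` and `fit_line` are hypothetical, not taken from any library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship, in [-1, 1]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: hours studied vs. exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r = pearson_r(hours, scores)
slope, intercept = fit_line(hours, scores)
predicted_score = intercept + slope * 6  # prediction for 6 hours of study

print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Correlation summarizes how tightly the points follow a line, while regression gives the line itself so that one variable can be predicted from the other.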

# Hypothesis Testing
Hypothesis testing is a statistical method used to decide whether a sample of data provides enough evidence to support a particular claim about the population.

- Types of Hypotheses:
  1. Null Hypothesis (H0):
     - The default assumption (e.g., no difference, no effect).
     - Example: for a coin, H0 says the coin is fair, so heads and tails are equally likely.
  2. Alternative Hypothesis (H1):
     - The claim we want to test (e.g., there is a difference).
     - It is the opposite of H0.

![alt text](Hypotesis.png)

# Significance Level (𝛼)
In hypothesis testing, the significance level, or alpha (α), is a threshold set by the researcher before the experiment. It defines the acceptable risk of a Type I error, that is, of incorrectly rejecting a true null hypothesis.

- Common values: 0.05 (5%), 0.01 (1%)
- This is the maximum risk we are willing to take of rejecting 𝐻0 when it’s true (Type I error).

![alt text](Statistical-Significance.png)
## Typical values of (α)
- 0.05 (5%): most common in practice.
- 0.01 (1%): stricter, used in medical or other critical fields.
- 0.10 (10%): less strict, used in exploratory research.

# Type I and Type II Errors
When we perform a hypothesis test, there are two possible truths (either 𝐻0 is true or false) and two possible decisions (reject or fail to reject 𝐻0).

![alt text](Confusion_matrix.png "Title")
## Type I Error (False Positive)
- Rejecting 𝐻0 when it is actually true.
- "Detecting an effect when there is none."
- Probability of making this error when 𝐻0 is true = α (significance level).
- Example: A medical test says a healthy patient has a disease.

## Type II Error (False Negative)
- Failing to reject 𝐻0 when it is actually false.
- "Missing a real effect."
- Probability of this error when 𝐻0 is false = β (the power of the test is 1 − β).
- Example: A medical test says a sick patient is healthy.

## Outcomes:
1. We reject H0 when in reality it is false. **(Correct decision)**
2. We fail to reject H0 when in reality it is true. **(Correct decision)**
3. We reject H0 when in reality it is true. **(Type I error)**
4. We fail to reject H0 when in reality it is false. **(Type II error)**
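
The four outcomes can be made concrete with a small simulation. The sketch below is an illustration under assumptions that are not in the notes above: it uses a two-sided z-test of H0: mean = 0 with a known standard deviation, a true mean of 0.5 for the "H0 is false" case, and hypothetical names such as `rejects_h0`, `type_i_rate`, and `type_ii_rate`.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is reproducible

ALPHA = 0.05
N = 25          # observations per simulated study
SIGMA = 1.0     # population standard deviation, assumed known
TRIALS = 2000   # number of simulated studies per scenario
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for alpha = 0.05

def rejects_h0(true_mean):
    """Simulate one study and run a two-sided z-test of H0: mean = 0."""
    data = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = mean(data) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRIT

# H0 is true (mean really is 0): every rejection is a Type I error.
type_i_rate = sum(rejects_h0(0.0) for _ in range(TRIALS)) / TRIALS

# H0 is false (mean is 0.5): every failure to reject is a Type II error.
type_ii_rate = sum(not rejects_h0(0.5) for _ in range(TRIALS)) / TRIALS

print(f"Type I error rate  (close to alpha): {type_i_rate:.3f}")
print(f"Type II error rate (beta):           {type_ii_rate:.3f}")
print(f"Estimated power    (1 - beta):       {1 - type_ii_rate:.3f}")
```

The Type I rate should hover near the chosen α, while β depends on the sample size and on how far the truth is from H0; a larger sample or a larger true effect makes β smaller.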

# What is a P-Value?
A p-value is the probability of obtaining results at least as extreme as the observed data, assuming the null hypothesis (𝐻0) is true.

- It measures how compatible the data are with 𝐻0.
## Key Points:
1. Small p-value (≤ α)
    - Evidence against 𝐻0.
    - We reject the null hypothesis.
2. Large p-value (> α)
    - Data are consistent with 𝐻0.
    - We fail to reject 𝐻0.
3. A p-value is not:
    - The probability that 𝐻0 is true.
    - The probability that 𝐻1 is true.
    - A measure of "effect size."
## Example
- Imagine we test whether a coin is fair (𝐻0: 𝑝 = 0.5).
- We flip it 10 times and get 9 heads.
  - If the coin were truly fair, what’s the probability of seeing 9 or more heads?
  - That probability is the (one-sided) p-value. Here it is 11/1024 ≈ 0.011.
  - Because it is so small, such a result would rarely occur by chance with a fair coin, so we doubt 𝐻0.
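
The coin example can be checked with an exact binomial calculation. This is a minimal sketch using only the standard library; the helper name `binomial_tail` is hypothetical, and the two-sided value assumes the usual convention of counting results at least as extreme in either direction (at most 1 head or at least 9 heads).

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# One-sided p-value: probability of 9 or more heads in 10 fair flips.
p_one_sided = binomial_tail(10, 9)

# Two-sided p-value: the fair-coin distribution is symmetric,
# so "at most 1 head" is exactly as likely as "at least 9 heads".
p_two_sided = 2 * p_one_sided

print(f"one-sided p-value: {p_one_sided:.4f}")  # 11/1024
print(f"two-sided p-value: {p_two_sided:.4f}")  # 22/1024
```

With α = 0.05, both versions of the p-value fall below the threshold, so the test rejects 𝐻0 for this data.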

# What is a Confidence Interval?
A confidence interval (CI) is a range of values, computed from sample data, that is designed to capture the true population parameter (such as a mean or proportion) at a stated confidence level.
## It reflects both:
- Estimate (e.g., sample mean)
- Uncertainty (due to sampling variability)

![alt text](confidence-interval.webp)

## Key Idea
- A 95% CI means: if we were to take 100 different random samples and build a 95% CI from each, then about 95 of them would contain the true population mean.
- **Important:** it does not mean “there is a 95% chance the true mean is in this one interval.” Once computed, the interval is fixed; the randomness is in the sampling.
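
The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration under simplifying assumptions that are not in the notes: the population is normal with a known standard deviation, so a z-based interval (sample mean ± z · σ/√n) is used, and all names and numbers (`TRUE_MEAN`, `SIGMA`, `interval_covers_truth`, and so on) are hypothetical.

```python
import random
from statistics import NormalDist, mean

random.seed(2)  # fixed seed so the sketch is reproducible

TRUE_MEAN = 50.0   # the population mean (known here only because we simulate)
SIGMA = 8.0        # population standard deviation, assumed known
N = 40             # sample size
LEVEL = 0.95       # confidence level
REPEATS = 1000     # number of independent samples / intervals

z = NormalDist().inv_cdf(1 - (1 - LEVEL) / 2)  # about 1.96 for 95%
half_width = z * SIGMA / N ** 0.5

def interval_covers_truth():
    """Draw one sample, build its CI, and report whether it contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    centre = mean(sample)
    return centre - half_width <= TRUE_MEAN <= centre + half_width

coverage = sum(interval_covers_truth() for _ in range(REPEATS)) / REPEATS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds across many samples. When the standard deviation is unknown, a t-based interval is the usual choice instead.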