# Inferential Statistics

**Inferential Statistics** uses a random sample of data taken from a population to describe and make inferences about the population. While *Descriptive Statistics* summarizes the data you have, *Inferential Statistics* helps you reach conclusions about data you don't have.

---

## 1. Core Concepts & Formulas

### A. Point Estimation
Using a single value (like the sample mean $\bar{x}$) to estimate a population parameter ($\mu$).
$$\hat{\mu} = \bar{x}$$
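
To make the idea concrete, here is a minimal sketch that draws a random sample from a synthetic population and uses the sample mean as the point estimate. The population (normal, mean 50, standard deviation 10), the sample size of 500, and all variable names are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 values (illustrative, not real data)
population = rng.normal(loc=50.0, scale=10.0, size=100_000)
mu = population.mean()  # the parameter, which is normally unobservable

# Draw a simple random sample and use its mean as the point estimate
sample = rng.choice(population, size=500, replace=False)
mu_hat = sample.mean()  # mu_hat = x_bar

print(f"Population mean (mu):   {mu:.2f}")
print(f"Point estimate (x_bar): {mu_hat:.2f}")
print(f"Estimation error:       {mu_hat - mu:+.2f}")
```

A single number carries no information about its own uncertainty, which is why a point estimate is usually reported together with an interval.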

### B. Confidence Intervals (CI)
A range of values used to estimate a population parameter with a specific level of confidence (usually 95%).
$$CI = \bar{x} \pm (Z^* \times \frac{\sigma}{\sqrt{n}})$$
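* $Z^*$: Critical value from the standard normal distribution (e.g., 1.96 for 95% confidence).
* $\frac{\sigma}{\sqrt{n}}$: Standard Error of the mean, where $\sigma$ is the population standard deviation and $n$ is the sample size.
* When $\sigma$ is unknown, replace it with the sample standard deviation $s$ and replace $Z^*$ with a critical value $t^*$ from the t-distribution with $n - 1$ degrees of freedom.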
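
The formula can be sketched in a few lines of Python. The helper names `z_confidence_interval` and `t_confidence_interval`, the sample values, and the assumed population standard deviation of 5 are all hypothetical. The t-based variant is included as the usual substitute when $\sigma$ is not known.

```python
import numpy as np
from scipy import stats

def z_confidence_interval(sample, sigma, confidence=0.95):
    """CI for the mean when the population standard deviation is known."""
    x = np.asarray(sample, dtype=float)
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * sigma / np.sqrt(x.size)          # Z* times standard error
    return x.mean() - margin, x.mean() + margin

def t_confidence_interval(sample, confidence=0.95):
    """CI for the mean when sigma is unknown and estimated from the sample."""
    x = np.asarray(sample, dtype=float)
    se = x.std(ddof=1) / np.sqrt(x.size)               # estimated standard error
    t_star = stats.t.ppf(1 - (1 - confidence) / 2, df=x.size - 1)
    return x.mean() - t_star * se, x.mean() + t_star * se

# Hypothetical sample of 8 measurements, with sigma assumed to be 5
heights = [172, 168, 175, 180, 165, 170, 178, 174]
z_low, z_high = z_confidence_interval(heights, sigma=5.0)
t_low, t_high = t_confidence_interval(heights)

print(f"95% Z interval: ({z_low:.2f}, {z_high:.2f})")
print(f"95% t interval: ({t_low:.2f}, {t_high:.2f})")
```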

### C. Hypothesis Testing
A formal process for determining if a result is statistically significant.
* **Null Hypothesis ($H_0$):** The assumption of "no effect" or "no difference."
* **Alternative Hypothesis ($H_a$):** The claim you are looking for evidence of (there is an effect or a difference). Rejecting $H_0$ supports $H_a$, but it does not prove it.
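
As a worked illustration, assume a hypothetical claim that the mean delivery time is 30 minutes. The two hypotheses for a two-sided test could then be written as:
$$H_0: \mu = 30 \qquad H_a: \mu \neq 30$$
A one-sided alternative such as $H_a: \mu > 30$ would apply if only delays matter. With sample mean $\bar{x}$, sample standard deviation $s$, and sample size $n$, the one-sample t statistic measures how many standard errors $\bar{x}$ lies from the hypothesized value $\mu_0 = 30$:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$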

---

## 2. The Process: How to Perform Inference
1. **Define the Population:** The group you want to study (e.g., all users of an app).
2. **Collect a Sample:** A representative subset (e.g., 500 random users).
3. **Calculate Statistics:** Find the mean, standard deviation, and standard error of the sample.
4. **Apply a Test:** Use a Z-test, T-test, or ANOVA to find the **P-value**.
5. **Draw a Conclusion:** If P-value < $\alpha$ (usually 0.05), reject the Null Hypothesis; otherwise, fail to reject it. Failing to reject does not show that the Null Hypothesis is true.
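
The five steps above can be sketched end to end on simulated data. Everything here is an assumption for illustration: the population is synthetic (gamma-distributed daily minutes in an app), the benchmark `mu_0 = 18` is hypothetical, and a one-sample t-test stands in for whichever test fits the real problem.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Step 1: define the population (synthetic: daily minutes for 100,000 app users)
population = rng.gamma(shape=4.0, scale=5.0, size=100_000)

# Step 2: collect a random sample of 500 users
sample = rng.choice(population, size=500, replace=False)

# Step 3: calculate the sample statistics
x_bar = sample.mean()
s = sample.std(ddof=1)            # sample standard deviation
se = s / np.sqrt(sample.size)     # standard error of the mean

# Step 4: apply a test of H0: mean == mu_0 (hypothetical benchmark)
mu_0 = 18.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)

# Step 5: draw a conclusion at significance level alpha
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print(f"mean={x_bar:.2f}, sd={s:.2f}, se={se:.3f}")
print(f"t={t_stat:.3f}, p={p_value:.4f} -> {decision}")
```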



---

## 3. Data Science & ML Use Cases
* **Model Comparison:** Testing if Model A is significantly better than Model B using a paired T-test.
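* **Feature Selection:** Using coefficient P-values in a Regression model to judge which features (variables) have a statistically detectable association with the target variable. A small P-value signals evidence of an association, not the size or practical importance of the effect.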
* **A/B Testing:** Determining if a change in a website's UI led to a statistically significant increase in sales.
* **Survey Analysis:** Predicting the behavior of millions of customers based on feedback from a few thousand.
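
Two of these use cases can be sketched briefly. All numbers below are made up for illustration: the per-fold accuracies for the two models and the A/B conversion counts are hypothetical. The A/B comparison uses a pooled two-proportion Z-test, which is one common choice rather than the only option, and treating cross-validation folds as independent pairs is a simplification.

```python
import numpy as np
from scipy import stats

# Model comparison: hypothetical accuracies of two models on the same 8 folds
model_a = np.array([0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85])
model_b = np.array([0.78, 0.77, 0.80, 0.79, 0.80, 0.76, 0.79, 0.81])
t_stat, p_paired = stats.ttest_rel(model_a, model_b)  # paired T-test
print(f"Paired T-test: t={t_stat:.3f}, p={p_paired:.5f}")

# A/B test: hypothetical purchases out of visitors for each UI variant
conv_a, n_a = 120, 2400   # control
conv_b, n_b = 156, 2400   # new UI
p_a, p_b = conv_a / n_a, conv_b / n_b

# Pooled proportion under H0 (both variants share one conversion rate)
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_ab = 2 * stats.norm.sf(abs(z))  # two-sided P-value
print(f"A/B Z-test: z={z:.3f}, p={p_ab:.4f}")
```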

---

## 4. Python Implementation: Hypothesis Testing
This code performs a two-sided **One-Sample T-test** to check whether a sample mean differs significantly from a known or hypothesized population mean.



```python
import numpy as np
from scipy import stats

# 1. Setup Data
# Population Mean (e.g., average delivery time is 30 mins)
mu = 30 
# Sample delivery times (observed)
sample_data = [31, 33, 29, 34, 32, 35, 30, 31, 36, 32]

# 2. Perform T-Test
t_stat, p_value = stats.ttest_1samp(sample_data, mu)

# 3. Output Results
print(f"Sample Mean: {np.mean(sample_data):.2f}")
print(f"T-Statistic: {t_stat:.4f}")
print(f"P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("Conclusion: Reject Null Hypothesis (Significant difference)")
else:
    print("Conclusion: Fail to Reject Null Hypothesis (No significant difference)")
```

Output:

```
Sample Mean: 32.30
T-Statistic: 3.2857
P-Value: 0.0094
Conclusion: Reject Null Hypothesis (Significant difference)
```
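
The reported statistic can be reproduced by hand, which shows what `ttest_1samp` computes. This is a sketch using the same delivery-time sample. The 95% t-based confidence interval at the end is an addition for illustration: an interval that excludes 30 is consistent with rejecting the Null Hypothesis at $\alpha = 0.05$.

```python
import numpy as np
from scipy import stats

mu = 30
sample_data = np.array([31, 33, 29, 34, 32, 35, 30, 31, 36, 32], dtype=float)

n = sample_data.size
x_bar = sample_data.mean()
s = sample_data.std(ddof=1)   # sample standard deviation (n - 1 denominator)
se = s / np.sqrt(n)           # standard error of the mean

# t = (x_bar - mu) / (s / sqrt(n)), with a two-sided P-value on n - 1 df
t_manual = (x_bar - mu) / se
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# 95% confidence interval based on the t-distribution
t_star = stats.t.ppf(0.975, df=n - 1)
ci_low, ci_high = x_bar - t_star * se, x_bar + t_star * se

print(f"t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```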


### Pro Tip for your Notebook:
Inferential statistics assumes your sample is **representative**. If your sample is biased (e.g., surveying only tech-savvy people about internet usage), your inferences can be badly wrong, no matter how good your math is!
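
A small simulation illustrates the point. The population below is entirely synthetic: 30% of people are labeled tech-savvy and assigned higher daily internet hours, and the helper `mean_ci` is a hypothetical name. The biased sample yields a narrow, confident-looking interval that misses the true mean.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Synthetic population: 30% tech-savvy users with higher daily internet hours
n_pop = 100_000
tech_savvy = rng.random(n_pop) < 0.30
hours = np.where(tech_savvy,
                 rng.normal(6.0, 1.5, n_pop),
                 rng.normal(2.5, 1.0, n_pop))
true_mean = hours.mean()

def mean_ci(sample, z_star=1.96):
    """Approximate 95% CI for the mean using the sample standard error."""
    se = sample.std(ddof=1) / np.sqrt(sample.size)
    return sample.mean() - z_star * se, sample.mean() + z_star * se

# Representative sample: drawn at random from everyone
random_sample = rng.choice(hours, size=500, replace=False)
# Biased sample: drawn only from the tech-savvy group
biased_sample = rng.choice(hours[tech_savvy], size=500, replace=False)

rand_low, rand_high = mean_ci(random_sample)
bias_low, bias_high = mean_ci(biased_sample)

print(f"True mean:        {true_mean:.2f}")
print(f"Random sample CI: ({rand_low:.2f}, {rand_high:.2f})")
print(f"Biased sample CI: ({bias_low:.2f}, {bias_high:.2f})")
```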