
### *Mean and Weighted Mean*

##### The **mean** and **weighted mean** are fundamental measures of the central tendency of a dataset. This section also distinguishes the **sample mean** from the **population mean**.



##### **Arithmetic Mean:**

#####The **mean** (or **arithmetic mean**) represents the average value of a dataset. It can be calculated for either a **population** or a **sample**.


##### **Population Mean ($\mu$):**

#####The **population mean** is the average of all values in an entire population. It is a fixed value.

##### **Formula:**
$$
\mu = \frac{\sum_{i=1}^N x_i}{N}
$$

##### **Where:**
- $x_i$: The $i$-th value in the population,
- $N$: Total number of values in the population.

##### **Sample Mean ($\bar{x}$):**

#####The **sample mean** is the average of values in a sample, which is a subset of the population. It is used to estimate the population mean.

##### **Formula:**
$$
\bar{x} = \frac{\sum_{i=1}^n x_i}{n}
$$

##### **Where:**
- $x_i$: The $i$-th value in the sample,
- $n$: Total number of values in the sample.

##### **Example (Population and Sample):**

Given the population $[3, 5, 7, 9]$:
$$
\mu = \frac{3 + 5 + 7 + 9}{4} = \frac{24}{4} = 6
$$

For a sample $[3, 9]$ taken from the population:
$$
\bar{x} = \frac{3 + 9}{2} = \frac{12}{2} = 6
$$
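
The two calculations above can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean

population = [3, 5, 7, 9]
sample = [3, 9]  # a subset drawn from the population

mu = mean(population)   # population mean: uses every value
x_bar = mean(sample)    # sample mean: uses only the sampled values

print(f"Population mean: {mu}")
print(f"Sample mean: {x_bar}")
```

Here the sample mean happens to equal $\mu$. A different sample, such as $[3, 5]$, would give $\bar{x} = 4$, which is why $\bar{x}$ is only an estimate of $\mu$.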


##### **Weighted Mean:**

The **weighted mean** accounts for the importance (or weight) of each value, whether in a population or sample.

##### **Formula:**
$$
\text{Weighted Mean (} \mu_w \text{ or } \bar{x}_w \text{)} = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}
$$

##### **Where:**
- $x_i$: The $i$-th value,
- $w_i$: The weight associated with $x_i$,
- $\sum_{i=1}^n w_i$: The total of all weights.

##### **Example (Weighted Mean):**

Given values $[10, 20, 30]$ with weights $[1, 2, 3]$:
$$
\mu_w = \frac{1 \cdot 10 + 2 \cdot 20 + 3 \cdot 30}{1 + 2 + 3}
= \frac{10 + 40 + 90}{6}
= \frac{140}{6} \approx 23.33
$$
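
As a quick check of the weighted mean formula, the example can be written in plain Python. This is a sketch under the assumption that values and weights are stored in two parallel lists; the names are illustrative.

```python
values = [10, 20, 30]
weights = [1, 2, 3]

# Sum of (weight * value) divided by the sum of the weights
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

print(f"Weighted mean: {weighted_mean:.2f}")
```

With equal weights the expression reduces to the ordinary arithmetic mean.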


##### **Key Distinctions:**

1. **Population Mean ($\mu$):**
   - Considers all values in the population,
   - Fixed and unchanging.

2. **Sample Mean ($\bar{x}$):**
   - Uses only a subset of the population,
   - May vary between samples and is used to estimate $\mu$.

3. **Weighted Mean ($\mu_w$ or $\bar{x}_w$):**
   - Adjusts for the relative importance of values,
   - Useful in scenarios where some data points carry more significance.


### *Median*

##### The **median** is a measure of central tendency that represents the middle value in a dataset when the values are arranged in ascending or descending order. Unlike the mean, the median is largely unaffected by extreme values or outliers.



##### **Key Characteristics:**

1. **Order Matters:**  
   To determine the median, the data must first be arranged in order (ascending or descending).

2. **Robustness:**  
   The median is resistant to extreme values, making it ideal for skewed datasets.

3. **Applicability:**  
   The median can be calculated for both populations and samples.



##### **Calculating the Median:**

1. **For Odd Number of Observations ($n$):**  
   The median is the value at the middle position:
   $$
   \text{Median} = x_{\frac{n+1}{2}}
   $$

2. **For Even Number of Observations ($n$):**  
   The median is the average of the two middle values:
   $$
   \text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2} + 1}}{2}
   $$


##### **Example:**

**Dataset 1 (Odd Number of Values):**  
$[3, 5, 7, 9, 11]$  
- Arrange in ascending order: $[3, 5, 7, 9, 11]$  
- Middle position: 3rd value, which is $7$  
$$
\text{Median} = 7
$$

**Dataset 2 (Even Number of Values):**  
$[2, 4, 6, 8, 10, 12]$  
- Arrange in ascending order: $[2, 4, 6, 8, 10, 12]$  
- Middle values: $6$ and $8$  
$$
\text{Median} = \frac{6 + 8}{2} = 7
$$
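
Both cases can be checked with `statistics.median`, which sorts the data internally and averages the two middle values when the count is even. A minimal sketch; the first list is shuffled on purpose to show that the input order does not matter.

```python
from statistics import median

odd_data = [11, 3, 9, 5, 7]          # Dataset 1, deliberately unsorted
even_data = [2, 4, 6, 8, 10, 12]     # Dataset 2

median_odd = median(odd_data)        # middle value of the sorted data
median_even = median(even_data)      # average of the two middle values

print(f"Median (odd count): {median_odd}")
print(f"Median (even count): {median_even}")
```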



##### **Median vs. Mean:**

1. **Sensitivity to Outliers:**  
   - The **mean** is pulled toward extreme values, while the **median** changes little or not at all.

2. **Use Cases:**  
   - The **median** is preferred for skewed datasets (e.g., income, house prices).
   - The **mean** is used for symmetrical datasets with no outliers.
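
The difference in sensitivity is easy to see numerically. The sketch below uses a hypothetical list of incomes (in thousands) and adds a single extreme value; the data are invented for illustration only.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]       # hypothetical incomes, in thousands
with_outlier = incomes + [1000]      # the same data plus one extreme value

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"Mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"Median: {median_before:.2f} -> {median_after:.2f}")
```

One outlier moves the mean far away from the bulk of the data, while the median shifts only to the midpoint between the two neighboring middle values.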


### *Mode*


##### The **mode** is a measure of central tendency that identifies the most frequently occurring value(s) in a dataset. It is especially useful for categorical or discrete data and can be applied to both populations and samples.



##### **Key Characteristics:**

1. **Frequency-Based:**  
   The mode is the value(s) that appear most often in the dataset.

2. **Applicability:**  
   - For numerical data: Finds the most common value.  
   - For categorical data: Identifies the most frequent category.

3. **Types of Datasets:**  
   - A dataset can have:
     - **No mode:** If all values occur with the same frequency.  
     - **One mode (Unimodal):** If a single value dominates.  
     - **Two modes (Bimodal):** If two values have the highest frequency.  
     - **Multiple modes (Multimodal):** If more than two values share the highest frequency.



##### **Calculating the Mode:**

1. Arrange the data (optional, but helpful for clarity).  
2. Count the frequency of each value.  
3. Identify the value(s) with the highest frequency.



##### **Example 1 (Numerical Data):**  
Dataset: $[2, 4, 4, 6, 6, 6, 8, 10]$  
- Frequency:  
  $2 \rightarrow 1$, $4 \rightarrow 2$, $6 \rightarrow 3$, $8 \rightarrow 1$, $10 \rightarrow 1$  
- **Mode:** $6$ (appears 3 times)



##### **Example 2 (Categorical Data):**  
Dataset: $[\text{Red}, \text{Blue}, \text{Red}, \text{Green}, \text{Blue}, \text{Blue}]$  
- Frequency:  
  $\text{Red} \rightarrow 2$, $\text{Blue} \rightarrow 3$, $\text{Green} \rightarrow 1$  
- **Mode:** $\text{Blue}$ (appears 3 times)
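
The counting steps above map directly onto `statistics.multimode`, which returns a list of every value that shares the highest frequency. A minimal sketch; the `bimodal` list is a hypothetical extra example, not taken from the notes.

```python
from statistics import multimode

numbers = [2, 4, 4, 6, 6, 6, 8, 10]
colors = ["Red", "Blue", "Red", "Green", "Blue", "Blue"]
bimodal = [1, 1, 2, 2, 3]            # hypothetical data with two modes

modes_numbers = multimode(numbers)
modes_colors = multimode(colors)
modes_bimodal = multimode(bimodal)

print(modes_numbers, modes_colors, modes_bimodal)
```

Note that when all values occur equally often, `multimode` returns every value, whereas the convention described above treats such a dataset as having no mode.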



##### **Mode vs. Mean and Median:**

1. **Dataset Type:**  
   - The **mode** is ideal for categorical data.  
   - The **mean** and **median** are better for numerical data.

2. **Skewed Data:**  
   - The **mode** provides a quick insight into the most typical value, especially in skewed distributions.

3. **Multiple Values:**  
   - The mode can represent multiple values, unlike the mean or median, which are unique.



### *Variance and Standard Deviation ($\sigma$) for a Population*

#####The **variance** and **standard deviation** measure the spread of data points in a population. Variance quantifies the average squared deviations from the mean, while standard deviation represents the square root of the variance.



##### **Formulas:**

1. **Population Variance ($\sigma^2$):**
   $$
   \sigma^2 = \frac{\sum_{i=1}^N (x_i - \mu)^2}{N}
   $$

2. **Population Standard Deviation ($\sigma$):**
   $$
   \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^N (x_i - \mu)^2}{N}}
   $$

##### **Where:**
- $x_i$: The $i$-th value in the population,
- $\mu$: Population mean,
- $N$: Total number of values in the population.



##### **Steps to Calculate:**

1. Compute the population mean ($\mu$):
   $$
   \mu = \frac{\sum_{i=1}^N x_i}{N}
   $$
2. Subtract the mean from each data point and square the result:
   $$
   (x_i - \mu)^2
   $$
3. Compute the variance:
   $$
   \sigma^2 = \frac{\sum_{i=1}^N (x_i - \mu)^2}{N}
   $$
4. Take the square root of the variance to find the standard deviation:
   $$
   \sigma = \sqrt{\sigma^2}
   $$



##### **Example:**

Consider the population dataset: $[3, 5, 7, 9]$

1. Calculate the mean:
   $$
   \mu = \frac{3 + 5 + 7 + 9}{4} = 6
   $$
2. Compute squared deviations:
   $$
   (3 - 6)^2 = 9, \quad (5 - 6)^2 = 1, \quad (7 - 6)^2 = 1, \quad (9 - 6)^2 = 9
   $$
3. Calculate variance:
   $$
   \sigma^2 = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5
   $$
4. Calculate standard deviation:
   $$
   \sigma = \sqrt{5} \approx 2.24
   $$
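
The worked example can be verified with the population functions from the standard `statistics` module, which divide by $N$. A minimal sketch with illustrative variable names:

```python
from statistics import pvariance, pstdev

population = [3, 5, 7, 9]

sigma_squared = pvariance(population)   # population variance (divides by N)
sigma = pstdev(population)              # population standard deviation

print(f"Population variance: {sigma_squared}")
print(f"Population standard deviation: {sigma:.2f}")
```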


#### Exercise 3.1

######A company surveyed the monthly expenses (in dollars) of its employees on transportation. The expenses for all 6 employees are:  
$$
[150, 200, 180, 220, 210, 190]
$$  

######Calculate:  
1. The population mean ($\mu$).  
2. The population variance ($\sigma^2$).  
3. The population standard deviation ($\sigma$).  

In [None]:
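expenses = [150, 200, 180, 220, 210, 190]

# Population mean: sum of all values divided by N
mu = sum(expenses) / len(expenses)

print(f'Mean of the data is: {mu:.2f}')

# Population variance: mean squared deviation from mu (divide by N)
variance = sum((x - mu)**2 for x in expenses) / len(expenses)

print(f'Variance of the data is: {variance:.2f}')

# Population standard deviation: square root of the variance
standard_deviation = variance**0.5

print(f'Standard deviation of the data is: {standard_deviation:.2f}')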


Mean of the data is: 191.67
Variance of the data is: 513.89
Standard deviation of the data is: 22.67


### *Sample Standard Deviation ($s$)*

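##### The **sample variance** and **sample standard deviation** measure the spread of data points in a sample. Because the deviations are measured from the sample mean $\bar{x}$ rather than the unknown population mean $\mu$, the sum of squared deviations is divided by $n-1$ (the degrees of freedom) instead of $n$. This adjustment, known as Bessel's correction, makes $s^2$ an unbiased estimator of $\sigma^2$.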



##### **Formulas:**

1. **Sample Variance ($s^2$):**
   $$
   s^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n - 1}
   $$

2. **Sample Standard Deviation ($s$):**
   $$
   s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n - 1}}
   $$

##### **Where:**
- $x_i$: The $i$-th value in the sample,
- $\bar{x}$: Sample mean,
- $n$: Total number of values in the sample.



##### **Steps to Calculate:**

1. Compute the sample mean ($\bar{x}$):
   $$
   \bar{x} = \frac{\sum_{i=1}^n x_i}{n}
   $$
2. Subtract the mean from each data point and square the result:
   $$
   (x_i - \bar{x})^2
   $$
3. Compute the sample variance:
   $$
   s^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n - 1}
   $$
4. Take the square root of the variance to find the standard deviation:
   $$
   s = \sqrt{s^2}
   $$


##### **Example:**

Consider the sample dataset: $[3, 5, 7, 9]$

1. Calculate the sample mean:
   $$
   \bar{x} = \frac{3 + 5 + 7 + 9}{4} = 6
   $$
2. Compute squared deviations:
   $$
   (3 - 6)^2 = 9, \quad (5 - 6)^2 = 1, \quad (7 - 6)^2 = 1, \quad (9 - 6)^2 = 9
   $$
3. Calculate variance:
   $$
   s^2 = \frac{9 + 1 + 1 + 9}{4 - 1} = \frac{20}{3} \approx 6.67
   $$
4. Calculate standard deviation:
   $$
   s = \sqrt{6.67} \approx 2.58
   $$
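
The same numbers can be checked with `statistics.variance` and `statistics.stdev`, which divide by $n - 1$. The sketch also computes the population version on the same data to show the effect of the divisor; variable names are illustrative.

```python
from statistics import pvariance, stdev, variance

sample = [3, 5, 7, 9]

s_squared = variance(sample)        # sample variance (divides by n - 1)
s = stdev(sample)                   # sample standard deviation
divide_by_n = pvariance(sample)     # same data, divided by n instead

print(f"Sample variance: {s_squared:.2f}")
print(f"Sample standard deviation: {s:.2f}")
print(f"Variance with divisor n: {divide_by_n:.2f}")
```

Dividing by $n - 1$ always gives a slightly larger value than dividing by $n$, and the gap shrinks as the sample grows.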


#### Exercise 3.2

#####A researcher collected the heights (in cm) of a random sample of 5 students from a school. The measured heights are:  
$$
[160, 165, 170, 175, 180]
$$  

Calculate:  
1. The sample mean ($\bar{x}$).  
2. The sample variance ($s^2$).  
3. The sample standard deviation ($s$).  


In [None]:
heights = [160, 165, 170, 175, 180]

# Sample mean
mean = sum(heights) / len(heights)

print(f'Mean of the data is: {mean:.2f}')

# Sample variance: divide by n - 1 (Bessel's correction)
variance = sum((x - mean)**2 for x in heights) / (len(heights) - 1)

print(f'Variance of the data is: {variance:.2f}')

# Sample standard deviation: square root of the sample variance
standard_deviation = variance**0.5

print(f'Standard deviation of the data is: {standard_deviation:.2f}')


Mean of the data is: 170.00
Variance of the data is: 62.50
Standard deviation of the data is: 7.91


### *Normal Distribution*

#####The **normal distribution** is a continuous probability distribution that is symmetrical and bell-shaped. It is one of the most important and widely used distributions in statistics, often referred to as the **Gaussian distribution**.


##### **Key Characteristics:**

1. **Symmetry**:  
   The normal distribution is symmetric around its mean. This means that the data on the left side of the mean is a mirror image of the data on the right side.

2. **Mean, Median, and Mode**:  
   In a normal distribution, the mean, median, and mode are all equal and located at the center of the distribution.

3. **Shape**:  
   The distribution forms a bell-shaped curve where the highest point corresponds to the mean. The tails of the curve approach the horizontal axis but never touch it.

4. **Spread**:  
   The spread of the distribution is determined by the standard deviation ($\sigma$). A larger standard deviation results in a wider curve, while a smaller standard deviation results in a narrower curve.


##### **68-95-99.7 Rule**:  
- About **68%** of the data falls within **one standard deviation** of the mean.  
- About **95%** of the data falls within **two standard deviations** of the mean.  
- About **99.7%** of the data falls within **three standard deviations** of the mean.
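
These percentages can be recovered from the standard normal CDF. The sketch below uses `statistics.NormalDist` from the standard library (Python 3.8+) and computes $P(-k \leq Z \leq k)$ for $k = 1, 2, 3$; the names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"Within {k} standard deviation(s): {p:.4f}")
```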


##### **Standard Normal Distribution**:  
When the mean ($\mu$) is 0 and the standard deviation ($\sigma$) is 1, the distribution is called the **standard normal distribution**. It is used to calculate probabilities and Z-scores, which express how far a data point is from the mean in terms of standard deviations.


### *Probability Density Function (PDF) in Normal Distribution*

#####The **Probability Density Function (PDF)** in the **Normal Distribution** describes the probability distribution of a continuous random variable that follows a normal (Gaussian) distribution. The normal distribution is one of the most widely used probability distributions in statistics, known for its characteristic bell-shaped curve.



##### **Formula for the Normal Distribution PDF:**

The PDF of a normal distribution is given by:

$$
f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
$$

Where:
- $x$ is the random variable,
- $\mu$ is the mean (average) of the distribution,
- $\sigma$ is the standard deviation,
- $e$ is Euler's number (approximately 2.718),
- $\pi$ is the mathematical constant Pi (approximately 3.1416).

This formula describes the probability density of a continuous random variable $x$ following a normal distribution. The shape of the curve is symmetric around the mean $\mu$, with the highest point of the curve at $\mu$.



##### **Example:**

#####For a normal distribution with $\mu = 50$ and $\sigma = 10$, the PDF at $x = 60$ would be:

$$
f(60) = \frac{1}{10 \sqrt{2\pi}} e^{-\frac{(60 - 50)^2}{2 \cdot 10^2}} = \frac{1}{10 \sqrt{2\pi}} e^{-0.5} \approx 0.0242
$$

##### This is the probability density at $x = 60$, not a probability. To find an actual probability, integrate the density over a range of values.
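
The formula can be evaluated directly in Python. The sketch below writes the PDF out term by term and compares it with `statistics.NormalDist.pdf` as a cross-check; the helper name `normal_pdf` is an assumption made for this illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x, written out from the formula."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

density = normal_pdf(60, mu=50, sigma=10)
reference = NormalDist(mu=50, sigma=10).pdf(60)   # library value for comparison

print(f"f(60) from the formula: {density:.4f}")
print(f"f(60) from NormalDist:  {reference:.4f}")
```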



### *Cumulative Distribution Function (CDF) in Normal Distribution*

#####The **Cumulative Distribution Function (CDF)** of a normal distribution provides the probability that a random variable $x$ takes a value less than or equal to a given value. The CDF gives the cumulative probability up to a certain point on the distribution.



##### **Formula for the Normal Distribution CDF:**

The CDF for a normal distribution is given by:

$$
F(x) = \frac{1}{2} \left[ 1 + \text{erf}\left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right]
$$

#####Where:
- $F(x)$ is the cumulative probability that $x$ is less than or equal to a specific value,
- $x$ is the random variable,
- $\mu$ is the mean (average) of the distribution,
- $\sigma$ is the standard deviation,
- $\text{erf}$ is the **error function**, which is a special mathematical function used to compute values related to the normal distribution.

The CDF is an increasing function and ranges from 0 to 1. As $x$ moves farther away from the mean, the cumulative probability approaches 1 (for large values of $x$) or 0 (for small values of $x$).



##### **Interpretation of the CDF:**

- **$F(x)$** gives the probability that the random variable $x$ will take a value less than or equal to a specific value.
- The CDF is useful for calculating the probability that a random variable falls within a particular range. For example, to calculate the probability that $x$ lies between two values $a$ and $b$, we can use:

$$
P(a \leq x \leq b) = F(b) - F(a)
$$

- The CDF starts at 0 for very low values of $x$ and approaches 1 as $x$ increases.
- The **error function (erf)** is closely related to the standard normal distribution and is often used in calculations involving the CDF.



##### **Example:**

#####For a normal distribution with $\mu = 50$ and $\sigma = 10$, the CDF at $x = 60$ would be:

$$
F(60) = \frac{1}{2} \left[ 1 + \text{erf}\left( \frac{60 - 50}{10 \sqrt{2}} \right) \right] \approx 0.8413
$$
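
The error-function form of the CDF can be evaluated with `math.erf` from the standard library. This is a minimal sketch; the helper name `normal_cdf` and the interval from 40 to 60 are illustrative choices.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, using the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

cdf_60 = normal_cdf(60, mu=50, sigma=10)

# Probability of a value between 40 and 60: F(60) - F(40)
prob_40_60 = normal_cdf(60, 50, 10) - normal_cdf(40, 50, 10)

print(f"F(60) = {cdf_60:.4f}")
print(f"P(40 <= x <= 60) = {prob_40_60:.4f}")
```

The interval from 40 to 60 is one standard deviation on either side of the mean, so the second value matches the 68% figure of the 68-95-99.7 rule.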



#### Exercise 3.3

#####You are given a normal distribution with the following parameters:
- Mean ($\mu$) = 29.23
- Standard Deviation ($\sigma$) = 1.37

#####Write a Python program to compute the **Cumulative Distribution Function (CDF)** for the value $x = 29.23$.



In [None]:
from scipy.stats import norm

mean = 29.23
standard_deviation = 1.37

# x equals the mean, so the CDF is 0.5: half of the distribution lies below the mean
x = norm.cdf(29.23, loc=mean, scale=standard_deviation)

print(f'Value is equal to: {x}')

Value is equal to: 0.5


#### Exercise 3.4

##### You are given a normal distribution with the following parameters:
- Mean ($\mu$) = 29.23
- Standard Deviation ($\sigma$) = 1.37

##### Write a Python program that uses the **Cumulative Distribution Function (CDF)** to compute the probability that a value falls between $x = 28$ and $x = 30$.


In [None]:
from scipy.stats import norm

mean = 29.23
standard_deviation = 1.37

# P(28 <= X <= 30) = F(30) - F(28)
x = norm.cdf(30, loc=mean, scale=standard_deviation) - norm.cdf(28, loc=mean, scale=standard_deviation)

print(f'Value is equal to: {x}')

Value is equal to: 0.5283135419737843


### *Inverse Cumulative Distribution Function (Inverse CDF)*

#####The **Inverse Cumulative Distribution Function** (also called the **Quantile Function**) is the reverse of the Cumulative Distribution Function (CDF). While the CDF gives the probability that a random variable is less than or equal to a specific value, the inverse CDF provides the value corresponding to a given cumulative probability.

##### Definition:
#####If $F(x)$ is the CDF of a random variable $X$, then the inverse CDF is:

$$
F^{-1}(p) = x \quad \text{such that} \quad F(x) = p
$$

#####Where:
- $F^{-1}(p)$ returns the value $x$ for a given probability $p$,
- $p$ is the probability between 0 and 1.
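
The relationship between the CDF and its inverse can be illustrated with a round trip. The sketch below uses `statistics.NormalDist` from the standard library and reuses $\mu = 50$, $\sigma = 10$ from the earlier examples purely for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=10)

p = dist.cdf(60)                     # probability of a value <= 60
x = dist.inv_cdf(p)                  # value whose cumulative probability is p
median_value = dist.inv_cdf(0.5)     # the 50th percentile is the mean

print(f"F(60) = {p:.4f}")
print(f"Inverse CDF of that probability = {x:.2f}")
print(f"Inverse CDF of 0.5 = {median_value:.2f}")
```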



#### Exercise 3.5

You are given a normal distribution with the following parameters:
- Mean ($\mu$) = 29.23
- Standard Deviation ($\sigma$) = 1.37

##### Write a Python program to calculate the value corresponding to the **95th percentile** of the distribution using the **Inverse Cumulative Distribution Function**, which SciPy exposes as the percent point function (`ppf`).


In [None]:
x = norm.ppf(.95, loc=29.23, scale=1.37)

print(f'Value is equal: {x:.2f}')

Value is equal: 31.48


### *Standardization (Z-Score Transformation)*


#####**Standardization** is the process of transforming data so that it has a mean of 0 and a standard deviation of 1. This is achieved by **scaling** the data relative to its own mean and standard deviation.

#####The **Z-Score** is the most commonly used method for standardization. It represents how many standard deviations a particular value is away from the mean of the dataset.

#### Z-Score Formula:
For a given value $x$, the Z-score is calculated as:

$$
Z = \frac{x - \mu}{\sigma}
$$

#####Where:
- $x$ is the value you want to standardize,
- $\mu$ is the mean of the dataset,
- $\sigma$ is the standard deviation of the dataset.

##### Interpretation:
- A **Z-score of 0** means the value is equal to the mean.
- A **Z-score greater than 0** means the value is above the mean.
- A **Z-score less than 0** means the value is below the mean.

##### Purpose of Standardization:
- **Comparison across different datasets**: It allows comparison of data from different distributions, even if the scales or units are different.
- **Outlier detection**: Helps in identifying outliers; a common rule of thumb flags values with $|Z| > 3$ as potential outliers.
- **Improving machine learning algorithms**: Many machine learning models (such as those using gradient descent) perform better when data is standardized, as the features are on the same scale.

##### Example:
Suppose we have a dataset with a mean ($\mu$) of 50 and a standard deviation ($\sigma$) of 10. If we want to standardize the value $x = 70$, the Z-score would be:

$$
Z = \frac{70 - 50}{10} = 2
$$

This means that the value $x = 70$ is 2 standard deviations above the mean.
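
The sketch below computes the Z-score from the example and then standardizes a small dataset with its own mean and standard deviation, to show that the result has mean 0 and standard deviation 1. The dataset is hypothetical and the population standard deviation (`pstdev`) is an assumed choice.

```python
from statistics import mean, pstdev

# Single value from the example: mu = 50, sigma = 10, x = 70
z_example = (70 - 50) / 10

# Standardizing a whole (hypothetical) dataset
data = [30, 40, 50, 60, 70]
mu = mean(data)
sigma = pstdev(data)
z_scores = [(x - mu) / sigma for x in data]

print(f"Z-score of 70: {z_example}")
print(f"Mean of z-scores: {mean(z_scores):.2f}")
print(f"Standard deviation of z-scores: {pstdev(z_scores):.2f}")
```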


### *Coefficient of Variation (CV)*

#####The **Coefficient of Variation (CV)** is a measure of the relative variability of data in relation to its mean. It is a standardized measure of dispersion that allows for comparing variability between different datasets, even if they have different units or scales.

##### Formula for Coefficient of Variation:

$$
CV = \frac{\sigma}{\mu} \times 100
$$

Where:
- $\sigma$ is the **standard deviation**,
- $\mu$ is the **mean** of the dataset.

##### Interpretation:
- **High CV**: Indicates a high level of variability relative to the mean. For example, a CV of 50% means that the standard deviation is half of the mean value.
- **Low CV**: Indicates lower variability relative to the mean. For example, a CV of 5% means that the standard deviation is only 5% of the mean value.
- **Comparison between datasets**: The Coefficient of Variation is particularly useful when comparing the variability between two datasets with different means. For example, when comparing two different investments, CV helps determine which one is riskier (i.e., has higher variability relative to the mean return).

##### Example:
If the average weight of dogs in a group is 25 kg, and the standard deviation is 5 kg, the Coefficient of Variation can be calculated as:

$$
CV = \frac{5}{25} \times 100 = 20\%
$$

##### This means that the standard deviation of the dogs' weights is 20% of the average weight. Whether that counts as high or low variability depends on the context; the CV is most informative when compared with the CV of another group.
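
A small helper makes the comparison between groups concrete. This is a sketch: the function name is an assumption, and the second group (cats with a mean of 4 kg and a standard deviation of 1 kg) is invented for illustration.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Return the CV as a percentage; it is undefined when the mean is zero."""
    if mean_value == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return std_dev / mean_value * 100

cv_dogs = coefficient_of_variation(5, 25)   # values from the example
cv_cats = coefficient_of_variation(1, 4)    # hypothetical second group

print(f"CV of dogs: {cv_dogs:.1f}%")
print(f"CV of cats: {cv_cats:.1f}%")
```

Although the cats' standard deviation is smaller in absolute terms, their weights vary more relative to their mean.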


### *Central Limit Theorem (CLT)*

#####The **Central Limit Theorem (CLT)** is a fundamental principle in statistics that describes the behavior of the sampling distribution of the sample mean. It states:

##### Definition:
As the sample size ($n$) increases, the distribution of the mean of independent, identically distributed observations approaches a **normal distribution** (Gaussian distribution), regardless of the shape of the population distribution, provided the population has a finite mean ($\mu$) and finite variance ($\sigma^2$).

##### Key Points:
1. **Normality of the Sampling Distribution**:
   - For sufficiently large sample sizes, the distribution of the sample mean ($\bar{X}$) will be approximately normal, even if the population distribution is not normal.
   
2. **Mean of the Sampling Distribution**:
   - The mean of the sampling distribution is equal to the population mean ($\mu$).

3. **Variance and Standard Error**:
   - The standard deviation of the sampling distribution (called the **Standard Error** of the mean) is:
     $$
     \text{Standard Error} = \frac{\sigma}{\sqrt{n}}
     $$
   - This means that as the sample size increases, the spread of the sample mean decreases.

4. **Applicability**:
   - The CLT is particularly useful for inferential statistics, such as constructing confidence intervals and hypothesis testing.

##### Why is the CLT Important?
- The CLT allows us to make inferences about population parameters using sample statistics.
- It justifies the use of normal distribution-based methods (e.g., $z$-tests and $t$-tests) for large samples, even if the underlying population distribution is unknown.

##### Practical Implications:
- If the sample size is **large enough** (commonly $n \geq 30$ is used as a rule of thumb), the sampling distribution of the sample mean is approximately normal.
- For smaller sample sizes, the shape of the population distribution matters more: the normal approximation can be poor unless the population itself is approximately normal.

##### Example:
Suppose the weight of apples in a population has a non-normal distribution with a mean ($\mu$) of 150g and a standard deviation ($\sigma$) of 20g. If we take a random sample of size $n = 40$:
- The sampling distribution of the sample mean will be approximately normal.
- The mean of the sampling distribution will be 150g.
- The standard error of the mean will be:
  $$
  \text{Standard Error} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{40}} \approx 3.16
  $$
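
A short simulation can illustrate these three statements. The sketch below assumes a skewed population modeled as a shifted exponential distribution with mean 150 and standard deviation 20; this particular distribution, the seed, and the number of repeated samples are illustrative assumptions, not part of the example.

```python
import math
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

MU, SIGMA = 150, 20      # population mean and standard deviation from the example
n = 40                   # sample size from the example
num_samples = 5000       # number of repeated samples (arbitrary choice)

def draw_weight():
    # Shifted exponential: skewed, with mean 130 + 20 = 150 and standard deviation 20
    return (MU - SIGMA) + random.expovariate(1 / SIGMA)

# Mean of each repeated sample of size n
sample_means = [sum(draw_weight() for _ in range(n)) / n for _ in range(num_samples)]

theoretical_se = SIGMA / math.sqrt(n)
observed_se = stdev(sample_means)

print(f"Mean of sample means: {mean(sample_means):.2f}")
print(f"Theoretical standard error: {theoretical_se:.2f}")
print(f"Observed standard error: {observed_se:.2f}")
```

The mean of the sample means should land close to 150 and their spread close to $\sigma / \sqrt{n}$, even though individual weights are far from normally distributed.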




### *Confidence Intervals*

#####A **Confidence Interval (CI)** is a range of values, derived from a sample, that is likely to contain the true population parameter (e.g., mean or proportion) with a certain level of confidence.

##### Key Concepts:

1. **Confidence Level**:
   - The confidence level indicates the percentage of times the calculated interval would contain the true population parameter if repeated samples were taken.
   - Common confidence levels are **90%**, **95%**, and **99%**.

2. **Structure of a Confidence Interval**:
   - A confidence interval for the mean is typically expressed as:
     $$
     \text{CI} = \bar{x} \pm Z \cdot \frac{\sigma}{\sqrt{n}}
     $$
     Where:
     - $\bar{x}$: Sample mean,
     - $Z$: Z-score corresponding to the confidence level,
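     - $\sigma$: Standard deviation of the population; if $\sigma$ is unknown, the sample standard deviation $s$ is used instead (with a $t$ critical value in place of $Z$ when the sample is small),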
     - $n$: Sample size.

3. **Interpretation**:
   - A 95% confidence interval means that if we were to draw 100 different samples and calculate the confidence interval for each, approximately 95 of them would contain the true population mean.

##### Example:
Suppose the average weight of a sample of 50 apples is $150$g, with a standard deviation of $10$g. A 95% confidence interval can be calculated as follows:

- Z-score for 95% confidence: $Z = 1.96$
- Margin of error:
  $$
  \text{Margin of Error} = Z \cdot \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{10}{\sqrt{50}} \approx 2.77
  $$
- Confidence Interval:
  $$
  \text{CI} = 150 \pm 2.77 = (147.23, 152.77)
  $$

This means we are 95% confident that the true average weight of all apples lies between 147.23g and 152.77g.
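
The calculation can be sketched in Python using only the standard library. The critical value is taken from `statistics.NormalDist().inv_cdf` rather than hard-coded, and the sample standard deviation of 10 g is treated as $\sigma$, as in the example; variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 150, 10, 50
confidence = 0.95

# Two-sided critical value: leaves (1 - confidence) / 2 in each tail
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"95% Confidence Interval: ({lower:.2f}, {upper:.2f})")
```

Changing `confidence` to 0.99 widens the interval, because a larger critical value is needed to cover more of the sampling distribution.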






### *p-Value*

#####The **p-value** is a statistical measure that helps determine the significance of a result in hypothesis testing. It quantifies the probability of obtaining a result at least as extreme as the observed one, assuming that the null hypothesis ($H_0$) is true.


#####Key Concepts:

1. **Hypothesis Testing**:
   - **Null Hypothesis ($H_0$)**: The default assumption, e.g., "there is no effect" or "two groups are equal."
   - **Alternative Hypothesis ($H_a$)**: The claim being tested, e.g., "there is an effect" or "the groups are different."

2. **Interpretation of p-Value**:
   - A **small p-value** (e.g., $p < 0.05$) suggests that the observed data is unlikely under the null hypothesis, providing evidence to reject $H_0$.
   - A **large p-value** (e.g., $p > 0.05$) indicates insufficient evidence to reject $H_0$.

3. **Common Significance Levels**:
   - The threshold (denoted as $\alpha$) for determining significance is often set at **0.05** (5%).
     - If $p \leq \alpha$, reject $H_0$.
     - If $p > \alpha$, fail to reject $H_0$.


##### Example:
Suppose we are testing whether a coin is fair:
- Null Hypothesis ($H_0$): The coin is fair (50% chance of heads).
- Alternative Hypothesis ($H_a$): The coin is biased (not 50% chance of heads).

If we observe heads 90 times out of 100 flips, we calculate the p-value to assess whether an outcome this extreme is likely under $H_0$. Here the p-value is far below $0.05$ (on the order of $10^{-17}$ for an exact two-sided binomial test), which is strong evidence that the coin is biased.
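
For the coin example, an exact p-value can be computed from the binomial distribution. The sketch below assumes a two-sided test and relies on the symmetry of the fair-coin distribution, so the p-value is twice the upper-tail probability; variable names are illustrative.

```python
from math import comb

n, heads = 100, 90
alpha = 0.05

# P(X >= 90) under H0: count the favorable outcomes out of 2**n equally likely sequences
upper_tail = sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n

# Two-sided p-value: outcomes at least as extreme in either direction
p_value = min(1.0, 2 * upper_tail)

print(f"p-value: {p_value:.3e}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```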


##### p-Value vs. Confidence Interval:
- A **confidence interval** provides a range of plausible values for a population parameter.
- A **p-value** evaluates whether the observed data is consistent with the null hypothesis.

