# Problem Statement

The objective of this analysis is to address the following problem:

> "Generate 100,000 samples of size 10 from the standard normal distribution. For each sample, compute the standard deviation with ddof=1 (sample SD) and with ddof=0 (population SD). Plot histograms of both sets of values on the same axes with transparency. Describe the differences you see. Explain how you expect these differences to change if the sample size is increased."

To solve this, we will:

- Generate **100,000 independent samples**, each of size **n = 10**, from the standard normal distribution $N(0, 1)$.
- For every sample, compute two versions of the standard deviation:
  - **Sample Standard Deviation** using `ddof = 1`.
  - **Population Standard Deviation** using `ddof = 0`.
- Compare the distributions of the two standard deviation estimates by plotting their **histograms on the same axes** with transparency.
- Describe the **visual** and **numerical** differences between the two distributions.
- Explain how and why these differences are expected to change as the **sample size increases**.

The goal is to understand the sampling variability, bias, and distributional behaviour of the standard deviation formulas under repeated sampling.


---

# Libraries Imported

---

# Standard Normal Distribution

The standard normal distribution is a continuous probability distribution that is symmetric about its mean. Values are most concentrated near the centre and become increasingly rare the further they lie from the mean. It is denoted $N(0, 1)$ because it has a mean of $0$ and a standard deviation of $1$.

![Standard Normal Distribution](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Standard_deviation_diagram.svg/640px-Standard_deviation_diagram.svg.png)

The bell-shaped curve of the distribution, also referred to as the Gaussian curve, shows how data points cluster around the mean. Approximately **68.27%** of all values lie within one standard deviation of the mean, which illustrates the high concentration around the centre. The distribution has no skew and very light tails, so values far from the mean are rare. The standard normal distribution also plays an integral role in statistical theory: the Central Limit Theorem explains why suitably standardised averages of many independent random variables with finite variance tend towards a normal distribution.
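
The 68.27% figure can be checked directly from the normal cumulative distribution function. The snippet below is a minimal sketch using only the standard library; the helper name `prob_within` is introduced here for illustration and is not part of the original notebook.

```python
import math


def prob_within(k: float) -> float:
    """P(|Z| <= k) for Z ~ N(0, 1), computed via the error function."""
    return math.erf(k / math.sqrt(2.0))


# Share of values within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(f"P(|Z| <= {k}) = {prob_within(k):.4%}")
```

The value for `k = 1` is the 68.27% quoted above; the values for `k = 2` and `k = 3` show how quickly the tails thin out.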



## Sample Standard Deviation (ddof=1)

In cases where the data represents a subset of the entire population, a sample standard deviation is generally used. In this case, the standard deviation is calculated with `ddof = 1`. The formula in these circumstances divides by $n - 1$ instead of $n$. This adjustment is known as **Bessel’s correction**.

The sample mean $\bar{x} = \frac{\sum x}{n}$ tends to lie closer to the sample observations than the true population mean does, because it is computed from those same observations. As a result, if the formula is not modified (i.e., it divides by $n$), the actual variability is systematically underestimated. Dividing by $n - 1$ compensates by slightly increasing the estimate, which makes the sample variance an unbiased estimator of the population variance. Note that this unbiasedness applies to the variance: after taking the square root, the sample standard deviation is still slightly biased downward, although less so than the version that divides by $n$.
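
One way to see where the $n - 1$ comes from is the short derivation below. It is a sketch that assumes the observations are independent with common mean $\mu$ and variance $\sigma^2$ (symbols introduced here for the population parameters). Expanding the squared deviations around $\mu$ gives:

$$
\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2
$$

Taking expectations, and using $E[(x_i - \mu)^2] = \sigma^2$ and $E[(\bar{x} - \mu)^2] = \sigma^2 / n$:

$$
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n - 1)\sigma^2
$$

Dividing the sum by $n - 1$ therefore gives an expected value of exactly $\sigma^2$, whereas dividing by $n$ gives $\frac{n - 1}{n}\sigma^2$, which is too small.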

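Due to **Bessel’s correction**, the sample standard deviation is always larger than the standard deviation computed using division by $n$ (provided the observations are not all identical). The two differ by a constant factor of $\sqrt{n / (n - 1)}$, which is about $1.054$ for $n = 10$ and shrinks towards $1$ as the sample size grows, so the gap matters most when the sample size is small. The sample standard deviation is calculated using: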

$$
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
$$
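
As a quick check of the formula, the sketch below compares a manual calculation with NumPy's `np.std` on a small made-up sample. The ten values in `x` are purely illustrative and are not data from this analysis.

```python
import numpy as np

# Illustrative values only (not taken from the analysis)
x = np.array([0.3, -1.2, 0.8, 1.5, -0.4, 0.1, -0.9, 2.0, -0.6, 0.4])
n = x.size

# Manual sample SD: sum of squared deviations divided by n - 1
s_manual = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))

# NumPy equivalents for both divisors
s_numpy = np.std(x, ddof=1)    # divides by n - 1
sd0_numpy = np.std(x, ddof=0)  # divides by n

ratio = s_numpy / sd0_numpy
print(f"manual s      = {s_manual:.6f}")
print(f"np.std ddof=1 = {s_numpy:.6f}")
print(f"np.std ddof=0 = {sd0_numpy:.6f}")
print(f"ratio         = {ratio:.6f}  (sqrt(n / (n - 1)) = {np.sqrt(n / (n - 1)):.6f})")
```

The ratio of the two results depends only on `n`, not on the data, which is why the two histograms in this analysis are rescaled copies of each other.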



## Population Standard Deviation (ddof=0)
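
For comparison with the sample formula above, the `ddof = 0` version divides by $n$ instead of $n - 1$. The symbol $\hat{\sigma}_n$ is a notation chosen here for illustration; note that when this formula is applied to a sample, as NumPy does, the deviations are still measured from the sample mean $\bar{x}$ rather than the true population mean.

$$
\hat{\sigma}_n = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}
$$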

## References For This Section
1. Standard Normal Distribution - https://www.probabilitycourse.com/chapter4/4_2_3_normal.php
2. Standard Normal Distribution - https://www.datacamp.com/blog/standard-normal-distribution
3. Standard Normal Distribution - https://www.geeksforgeeks.org/maths/standard-normal-distribution/
4. Standard Normal Distribution - https://www.analyticsvidhya.com/blog/2020/04/statistics-data-science-normal-distribution/
5. Standard Deviation - https://www.geeksforgeeks.org/maths/standard-deviation-formula/
6. Sample Standard Deviation - https://www.datacamp.com/tutorial/sample-standard-deviation
7. Sample & Population Standard Deviation - https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/variance-standard-deviation-sample/a/population-and-sample-standard-deviation-review
8. Sample vs Population Standard Deviation - https://www.statology.org/population-vs-sample-standard-deviation/
9. Bessel's Correction - https://www.geeksforgeeks.org/machine-learning/bessels-correction/
10. Bessel's Correction - https://towardsdatascience.com/bessels-correction-why-do-we-divide-by-n-1-instead-of-n-in-sample-variance-30b074503bd9/
11. Sample Mean vs Population Mean - https://www.onlinemathlearning.com/population-mean.html
12. Sample Mean vs Population Mean - https://statisticsbyjim.com/basics/sample-mean-vs-population-mean-symbol-formulas/
13. Degrees of Freedom - https://www.statsdirect.com/help/basics/degrees_freedom.html
14. Degrees of Freedom - https://statisticsbyjim.com/hypothesis-testing/degrees-freedom-statistics/
15. Markdown Equations - https://ashki23.github.io/markdown-latex.html
16. Markdown - https://www.markdownguide.org/basic-syntax/
17. Markdown - https://www.datacamp.com/tutorial/markdown-in-jupyter-notebook
18. Markdown - https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet



---

# Generating The Samples
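
A minimal sketch of the sampling step described in the problem statement, assuming NumPy's `Generator` API. The seed value and the variable names (`samples`, `sd_sample`, `sd_population`) are arbitrary choices made here for illustration, not necessarily those used in the original notebook.

```python
import numpy as np

N_SAMPLES = 100_000   # number of independent samples
SAMPLE_SIZE = 10      # observations per sample (n)

# Seed chosen arbitrarily so the run is reproducible
rng = np.random.default_rng(seed=42)

# One row per sample, drawn from the standard normal distribution N(0, 1)
samples = rng.standard_normal(size=(N_SAMPLES, SAMPLE_SIZE))

# Row-wise standard deviations with both divisors
sd_sample = samples.std(axis=1, ddof=1)      # divides by n - 1
sd_population = samples.std(axis=1, ddof=0)  # divides by n

print(f"mean of ddof=1 SDs: {sd_sample.mean():.4f}")
print(f"mean of ddof=0 SDs: {sd_population.mean():.4f}")
```

Computing both versions from the same `samples` array keeps the comparison paired: every `ddof=0` value is the matching `ddof=1` value multiplied by $\sqrt{(n - 1)/n}$.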

---

# Plotting Histograms
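
One possible way to draw both histograms on the same axes with transparency, assuming Matplotlib. The function name `plot_sd_histograms`, the bin count, and the styling are illustrative choices rather than the original notebook's settings.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_sd_histograms(sd_sample, sd_population, bins=100):
    """Overlay histograms of the ddof=1 and ddof=0 standard deviations."""
    fig, ax = plt.subplots(figsize=(9, 5))

    # Shared bin edges so the two histograms are directly comparable
    edges = np.histogram_bin_edges(
        np.concatenate([sd_sample, sd_population]), bins=bins
    )

    # alpha < 1 makes the bars semi-transparent so the overlap stays visible
    ax.hist(sd_population, bins=edges, alpha=0.5, label="ddof=0 (population SD)")
    ax.hist(sd_sample, bins=edges, alpha=0.5, label="ddof=1 (sample SD)")

    # Reference line at the true standard deviation of N(0, 1)
    ax.axvline(1.0, color="black", linestyle="--", label="true SD = 1")

    ax.set_xlabel("Standard deviation")
    ax.set_ylabel("Frequency")
    ax.set_title("Standard deviations of samples from N(0, 1)")
    ax.legend()
    return fig, ax


# Usage: regenerate the data (arbitrary seed) and plot it
rng = np.random.default_rng(seed=42)
samples = rng.standard_normal(size=(100_000, 10))
fig, ax = plot_sd_histograms(
    samples.std(axis=1, ddof=1), samples.std(axis=1, ddof=0)
)
plt.show()
```

Using the same bin edges for both histograms avoids visual differences that come only from mismatched bin widths.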

---

# Interpretation of Results

## Visual Interpretation

## Calculated Interpretation — Skewness / Kurtosis
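
Skewness and excess kurtosis can be estimated from standardised moments. The sketch below uses plain NumPy; the helper `skew_and_excess_kurtosis` is a name introduced here, and library routines such as those in SciPy would be an alternative.

```python
import numpy as np


def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from standardised moments."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()  # standardise the values
    return float(np.mean(z ** 3)), float(np.mean(z ** 4) - 3.0)


# Regenerate the data (arbitrary seed) and summarise both sets of SDs
rng = np.random.default_rng(seed=42)
samples = rng.standard_normal(size=(100_000, 10))

shape_stats = {}
for ddof in (1, 0):
    sds = samples.std(axis=1, ddof=ddof)
    shape_stats[ddof] = skew_and_excess_kurtosis(sds)
    skew, kurt = shape_stats[ddof]
    print(
        f"ddof={ddof}: mean={sds.mean():.4f}, "
        f"skewness={skew:.4f}, excess kurtosis={kurt:.4f}"
    )
```

Because each `ddof=0` value is a constant multiple of the matching `ddof=1` value, and skewness and kurtosis do not change under rescaling, the two sets share the same skewness and kurtosis. Any difference between the distributions shows up in their location and spread rather than their shape.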

## Expectations if Sample Size Increased
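
To explore how the gap between the two estimates behaves as the sample size grows, the simulation can be repeated for several values of $n$. This is a sketch under assumed settings: the list of sample sizes, the reduced number of samples per size, and the helper name `mean_sds` are all illustrative choices.

```python
import numpy as np


def mean_sds(sample_size, n_samples=10_000, seed=0):
    """Mean ddof=1 and ddof=0 standard deviation over simulated N(0, 1) samples."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal(size=(n_samples, sample_size))
    return (
        float(samples.std(axis=1, ddof=1).mean()),
        float(samples.std(axis=1, ddof=0).mean()),
    )


sizes = (5, 10, 30, 100, 500)
gaps = {}
for n in sizes:
    mean_ddof1, mean_ddof0 = mean_sds(n)
    gaps[n] = mean_ddof1 - mean_ddof0
    print(
        f"n={n:>3}: mean ddof=1 = {mean_ddof1:.4f}, "
        f"mean ddof=0 = {mean_ddof0:.4f}, gap = {gaps[n]:.4f}"
    )
```

Since the two estimates differ by the factor $\sqrt{n / (n - 1)}$, which approaches $1$ as $n$ increases, the gap between them should shrink for larger samples, and both means should move closer to the true value of $1$.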