# 1. Foundations of Statistics

Welcome to the statistics series! This first notebook lays the groundwork: it defines what statistics is, introduces the different types of data we work with, and gives a quick refresher on the basic mathematical concepts used throughout the series.

## 1.1 Introduction

### What is statistics?
**Statistics** is the science of collecting, organizing, analyzing, interpreting, and presenting data.

#### Descriptive vs. Inferential Statistics
- **Descriptive Statistics:** Summarizes and organizes data so that it can be easily understood, using measures such as the mean, median, and standard deviation, and visualizations such as histograms and bar charts. (e.g., "The average height of students in this class is 175 cm.")
- **Inferential Statistics:** Uses data from a sample to make inferences or predictions about a larger population, drawing conclusions beyond the immediate data. (e.g., "Based on our sample, we estimate that the average height of all students in the university is between 172 cm and 178 cm.")
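
As a small illustration of the descriptive side, the sketch below summarizes a handful of heights with Python's standard `statistics` module. The values in `heights_cm` are hypothetical, made up for this example so that the mean matches the class-average example above.

```python
import statistics

# Hypothetical heights (cm) for five students; illustrative values only
heights_cm = [170, 172, 175, 178, 180]

mean_height = statistics.mean(heights_cm)      # arithmetic average
median_height = statistics.median(heights_cm)  # middle value when sorted
std_height = statistics.stdev(heights_cm)      # sample standard deviation

print(f"Mean: {mean_height} cm")
print(f"Median: {median_height} cm")
print(f"Standard deviation: {std_height:.2f} cm")
```

These three numbers describe only the five students measured; they say nothing yet about any larger group.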

#### Populations and Samples
- **Population:** The entire group that you want to draw conclusions about. (e.g., All students at a university).
- **Sample:** A specific, smaller group, drawn from the population, that you actually collect data from. The sample should be representative of the population. (e.g., 200 students selected at random from that university).

#### Parameters vs. Statistics
- **Parameter:** A measure that describes the whole population (e.g., the true average height of *all* students).
- **Statistic:** A measure that describes the sample (e.g., the average height of the students you *measured*).
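
The relationship between these four terms can be sketched with a short simulation. Everything here is an assumption made for illustration: the population is synthetic (heights drawn from a normal distribution with a mean of 170 cm and a standard deviation of 10 cm), and the sizes 10,000 and 100 are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 10,000 students
population = [random.gauss(170, 10) for _ in range(10_000)]

# Simple random sample of 100 students, drawn without replacement
sample = random.sample(population, k=100)

parameter = statistics.mean(population)  # population mean (usually unknown in practice)
statistic = statistics.mean(sample)      # sample mean (what we can actually compute)

print(f"Population mean (parameter): {parameter:.2f} cm")
print(f"Sample mean (statistic):     {statistic:.2f} cm")
```

In a real study the parameter is rarely available, because measuring the whole population is impractical. The statistic serves as an estimate of it, and a different random sample would give a slightly different estimate.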

### Types of Data

#### Qualitative vs. Quantitative
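- **Qualitative (Categorical):** Describes qualities or characteristics. It is recorded as labels or categories rather than as measured quantities; even when the categories are coded as numbers (such as 0/1 for no/yes), arithmetic on those codes is not meaningful. (e.g., Eye color, country of birth, yes/no answers).
- **Quantitative (Numerical):** Represents quantities. It can be counted or measured. (e.g., Height, temperature, number of items).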

#### Discrete vs. Continuous (for Quantitative Data)
- **Discrete:** Data that can only take on certain, specific values (often integers). You can count it. (e.g., The number of students in a class, the result of a die roll).
- **Continuous:** Data that can take any value within a given range. You measure it. (e.g., A person's height, the temperature of a room).
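
A minimal sketch of the difference, using simulated values. The height range of 150 to 200 cm is an assumption chosen only for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Discrete: a die roll can only be one of six values
die_rolls = [random.randint(1, 6) for _ in range(10)]

# Continuous: a measured height can be any real number in a range
# (hypothetical range of 150 to 200 cm)
heights_cm = [random.uniform(150.0, 200.0) for _ in range(10)]

print("Die rolls:", die_rolls)
print("Distinct die values:", sorted(set(die_rolls)))
print("Heights:", [round(h, 2) for h in heights_cm])
```

The die rolls repeat values from a small, countable set, while the heights almost never repeat exactly. In practice, continuous data is always recorded to some finite precision, so the distinction is about the underlying quantity rather than how it is stored.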

#### Scales of Measurement
1.  **Nominal:** Categorical data with no inherent order. (e.g., `['Red', 'Blue', 'Green']`).
2.  **Ordinal:** Categorical data that can be logically ordered or ranked, but the distance between values is not meaningful. (e.g., `['Small', 'Medium', 'Large']`, `['Disagree', 'Neutral', 'Agree']`).
3.  **Interval:** Numerical data where the order and the exact differences between values are meaningful, but there is no true zero, so ratios are not meaningful. (e.g., Temperature in Celsius or Fahrenheit. 0°C doesn't mean 'no temperature', and 20°C is not 'twice as hot' as 10°C).
4.  **Ratio:** Numerical data that has all the properties of interval data, but with a true, meaningful zero, so ratios are meaningful. (e.g., Height, weight, age. 0 kg means 'no weight', and 10 kg is twice as heavy as 5 kg).
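
The sketch below illustrates two of these distinctions in plain Python. The `size_rank` mapping is a hypothetical helper introduced here to encode the ordinal order; the temperatures are arbitrary example values.

```python
# Ordinal: the order is meaningful, so we can sort by rank,
# but the gaps between ranks carry no meaning.
size_rank = {"Small": 0, "Medium": 1, "Large": 2}  # hypothetical rank mapping
sizes = ["Large", "Small", "Medium"]
sorted_sizes = sorted(sizes, key=size_rank.get)
print(sorted_sizes)

# Interval vs. ratio: Celsius has an arbitrary zero, Kelvin has a true zero.
c1, c2 = 10.0, 20.0
ratio_celsius = c2 / c1                        # 2.0, but not physically meaningful
ratio_kelvin = (c2 + 273.15) / (c1 + 273.15)   # ratio on a scale with a true zero

print(f"Celsius ratio: {ratio_celsius:.3f}")
print(f"Kelvin ratio:  {ratio_kelvin:.3f}")
```

The difference `c2 - c1` is the same size on both scales, which is what makes Celsius an interval scale. Only the Kelvin ratio describes how much "more temperature" there actually is.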

## 1.2 Basic Math Refresher

### Summation (Σ notation)
The symbol Σ (sigma) means 'sum up'. For a set of values `x = {x₁, x₂, ..., xₙ}`:
$$ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n $$

In [1]:
import numpy as np

data = np.array([2, 4, 6, 8, 10])

# Using Python's built-in sum()
sum_py = sum(data)

# Using NumPy's np.sum(), the usual choice for arrays
sum_np = np.sum(data)

# Both approaches give the same total
assert sum_py == sum_np

print(f"The sum of {data} is {sum_np}")

The sum of [ 2  4  6  8 10] is 30


### Factorials
The factorial of a non-negative integer `n`, denoted by `n!`, is the product of all positive integers up to `n`. By convention, `0! = 1`.
$$ n! = n \times (n-1) \times (n-2) \times \cdots \times 1 $$

In [2]:
import math

n = 5
print(f"{n}! = {math.factorial(n)}")

5! = 120
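
To connect `math.factorial` back to the formula, the following sketch computes the same product explicitly with `math.prod` (available in Python 3.8 and later) and checks the `0! = 1` case. The variable names are illustrative.

```python
import math

n = 5

# Product of 1 * 2 * ... * n, written out as in the formula
product = math.prod(range(1, n + 1))
assert product == math.factorial(n)

# For n = 0 the range is empty, and an empty product is 1
empty_product = math.prod(range(1, 1))
assert empty_product == math.factorial(0) == 1

print(f"Product of 1..{n} = {product}")
print(f"Empty product = {empty_product}")
```

Treating `0!` as an empty product is one common way to motivate the convention.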


### Combinations and Permutations
- **Permutation:** An ordered arrangement of `k` items chosen without repetition from a set of `n` distinct items; **order matters**. The number of such arrangements is `n! / (n-k)!`
- **Combination:** A selection of `k` items chosen without repetition from a set of `n` distinct items; **order does not matter**. The number of such selections is `n! / (k! * (n-k)!)`

In [3]:
# Example: How many ways can you choose 2 letters from {A, B, C}?
# Note: math.perm and math.comb require Python 3.8 or later.
n = 3
k = 2

# Permutations (AB, BA, AC, CA, BC, CB)
perms = math.perm(n, k)
print(f"Permutations of choosing {k} from {n}: {perms}")

# Combinations (AB, AC, BC)
combs = math.comb(n, k)
print(f"Combinations of choosing {k} from {n}: {combs}")

Permutations of choosing 2 from 3: 6
Combinations of choosing 2 from 3: 3
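
The counts above can be checked by listing the arrangements themselves. This sketch uses `itertools.permutations` and `itertools.combinations` from the standard library, then compares the number of items generated against the two formulas; the variable names are illustrative.

```python
import itertools
import math

letters = ["A", "B", "C"]
k = 2

# Every ordered arrangement of k letters (order matters)
perms = ["".join(p) for p in itertools.permutations(letters, k)]

# Every unordered selection of k letters (order does not matter)
combs = ["".join(c) for c in itertools.combinations(letters, k)]

print("Permutations:", perms)
print("Combinations:", combs)

# The counts agree with the formulas n! / (n-k)! and n! / (k! * (n-k)!)
n = len(letters)
assert len(perms) == math.factorial(n) // math.factorial(n - k)
assert len(combs) == math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
```

Each combination corresponds to `k! = 2` permutations (for example, `AB` and `BA`), which is why the permutation count is the combination count multiplied by `k!`.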
