# D2.6 Basic Statistics for Scientific Measurements
<hr style="height:2px;border-width:0;color:gray;background-color:gray">

## D2.6.1 Motivation — Why We Use Statistics in Physics

No matter how careful we are in taking measurements, **no two measurements are exactly identical**.  
Even the best equipment has noise, human reaction time has limits, and the environment subtly fluctuates.

Because of this, the goal in physics is **not** to find one perfect measurement, but rather to understand the **pattern** in a collection of measurements. This is where basic statistics become essential:

- The **mean** estimates the central value.  
- The **median** provides a robust alternative when outliers occur.  
- The **standard deviation** tells us how spread out the measurements are — the *precision*.  
- The **standard error of the mean (SEM)** tells us how precisely we know the *mean itself*, improving as we take more data.

Physics experiments, laboratory reports, and scientific publications all rely on these concepts.  
Understanding them now will help you evaluate data quality, quantify confidence, and interpret results correctly.

<hr style="height:2px;border-width:0;color:gray;background-color:gray">

## D2.6.2 Mean

The **mean** (also called the average) is the most common measure of central tendency.  
Given $N$ measurements $x_1, x_2, \dots, x_N,$ the mean is:

$$
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
$$

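It represents our *best estimate* of the true value when the measurements are affected by random error alone. Averaging cannot remove a *systematic* error that shifts every reading in the same direction.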

---

<div style="background-color:#e0f7fa; border-left:6px solid #006a80; padding:14px; border-radius:4px;">
<h3 style="margin-top:0; color:#000000;">Example — Calculating the Mean</h3>

You measure the time for a pendulum to complete one swing five times, obtaining:

$1.21,\ 1.25,\ 1.23,\ 1.22,\ 1.24\ \text{s}$

Compute the mean:

$$
\bar{x} = \frac{1.21 + 1.25 + 1.23 + 1.22 + 1.24}{5}
= 1.23\ \text{s}
$$

The mean swing time is **1.23 s**.
</div>
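
As a quick check, the formula translates directly into Python. This is a minimal sketch using only the standard library. `times` is just an illustrative variable name, and `statistics.mean` is the built-in equivalent of summing the values and dividing by $N$.

```python
import statistics

# Pendulum swing times from the example, in seconds
times = [1.21, 1.25, 1.23, 1.22, 1.24]

# Direct translation of the formula: sum of the values divided by N
mean_manual = sum(times) / len(times)

# The standard library computes the same quantity
mean_lib = statistics.mean(times)

print(f"manual mean:  {mean_manual:.3f} s")
print(f"library mean: {mean_lib:.3f} s")
```

Both lines print 1.230 s, matching the hand calculation.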

<hr style="height:2px;border-width:0;color:gray;background-color:gray">

## D2.6.3 Median

The **median** is the middle value when numbers are placed in order.  
It is especially useful when data contain **outliers**, since it is not affected by a few extreme values.

To find the median:

1. Sort the data.  
2. If $N$ is odd → choose the middle value.  
3. If $N$ is even → take the average of the two middle values.
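
The three steps can be written as a short function. This is a sketch for illustration (`median` is a name chosen here). In practice, Python's standard library already provides `statistics.median`, which applies the same odd/even rule.

```python
def median(values):
    """Median following the three steps: sort, then pick the middle."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)          # Step 1: sort the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Step 2: odd N -> the single middle value
        return ordered[mid]
    # Step 3: even N -> average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1.24, 1.21, 1.23, 1.22, 1.25]))  # odd N
print(median([1.24, 1.21, 1.23, 1.22]))        # even N
```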

---

<div style="background-color:#e0f7fa; border-left:6px solid #006a80; padding:14px; border-radius:4px;">
<h3 style="margin-top:0; color:#000000;">Example — Calculating the Median </h3>

Consider the dataset (in seconds):

$1.21,\ 1.22,\ 1.23,\ 1.24,\ 2.10$

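The data are already in order, so the middle (3rd) value is $1.23$.

Even though $2.10$ is an outlier, the **median = 1.23 s** is not distorted by it. The mean, by contrast, is $7.00/5 = 1.40$ s, pulled above all four consistent readings by that single value.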
</div>
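
The contrast between the two measures of central tendency is easy to see numerically. Here is a minimal sketch with Python's `statistics` module, applied to the outlier dataset from the example:

```python
import statistics

# Dataset from the example: four consistent readings and one outlier (seconds)
data = [1.21, 1.22, 1.23, 1.24, 2.10]

mean_value = statistics.mean(data)      # pulled upward by the 2.10 s reading
median_value = statistics.median(data)  # middle of the sorted list

print(f"mean   = {mean_value:.2f} s")
print(f"median = {median_value:.2f} s")
```

It prints a mean of 1.40 s, dragged upward by the single outlier, and a median of 1.23 s.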

<hr style="height:2px;border-width:0;color:gray;background-color:gray">

## D2.6.4 Standard Deviation

The **standard deviation** measures how spread out the data are from the mean.  
In experimental physics, we almost always work with a *finite set* of measurements, not an entire population.  
For that reason, we compute the **sample standard deviation**, defined as:

$$
s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}
$$

whereas the **population standard deviation**, with $\mu$ denoting the true mean of the entire population, is:

$$
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
$$

The difference between the two lies in the denominator:

- The **population standard deviation** uses $N$ and is only appropriate when every possible measurement in the population is known (rare in physics).
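- The **sample standard deviation** uses $N-1$.  
  This correction, called **Bessel’s correction**, is needed because the deviations are measured from the sample mean $\bar{x}$, which is computed from the same data. The squared deviations from $\bar{x}$ therefore never add up to more than the squared deviations from the true mean $\mu$ would. Dividing by $N$ would systematically underestimate the spread; dividing by $N-1$ removes this bias from the variance $s^2$.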
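
A short simulation makes the effect of Bessel’s correction visible. It assumes Gaussian random errors with a known true spread $\sigma = 1$ (arbitrary units). A real experiment never provides this, which is exactly why a simulation is useful here. Averaged over many simulated five-point experiments, dividing by $N$ underestimates the true variance by a factor of about $(N-1)/N$, while dividing by $N-1$ does not. The constants are illustrative choices, not values from the text.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

TRUE_SIGMA = 1.0   # assumed true spread of the simulated measurements
N = 5              # measurements per simulated experiment
TRIALS = 20000     # number of simulated experiments

var_n = []          # variance estimates that divide by N
var_n_minus_1 = []  # variance estimates that divide by N - 1 (Bessel's correction)

for _ in range(TRIALS):
    sample = [random.gauss(0.0, TRUE_SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)  # deviations from the SAMPLE mean
    var_n.append(ss / N)
    var_n_minus_1.append(ss / (N - 1))

print(f"average variance, divide by N:     {statistics.fmean(var_n):.3f}")
print(f"average variance, divide by N - 1: {statistics.fmean(var_n_minus_1):.3f}")
print(f"true variance:                     {TRUE_SIGMA ** 2:.3f}")
```

With $N = 5$ the first average should come out near $(N-1)/N = 0.8$ and the second near 1, up to small random fluctuations.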

Because real experiments only collect a handful of measurements, the **sample standard deviation $s$** is the correct tool.  
It quantifies the **precision** of the measurements themselves — how tightly they cluster around the mean.


- A **small** $s$ means measurements are tightly grouped (high precision).  
- A **large** $s$ means measurements vary widely (low precision).

---

<div style="background-color:#e0f7fa; border-left:6px solid #006a80; padding:14px; border-radius:4px;">
<h3 style="margin-top:0; color:#000000;">Example — Standard Deviation</h3>

Using the pendulum times from the first example:

$1.21,\ 1.25,\ 1.23,\ 1.22,\ 1.24$

Mean: $\bar{x} = 1.23$ s

Compute squared deviations:

- $(1.21 - 1.23)^2 = 0.0004$  
- $(1.25 - 1.23)^2 = 0.0004$  
- $(1.23 - 1.23)^2 = 0$  
- $(1.22 - 1.23)^2 = 0.0001$  
- $(1.24 - 1.23)^2 = 0.0001$  

Sum $= 0.0010\ \text{s}^2$

Then:

$$
s = \sqrt{\frac{0.0010}{4}} \approx 0.016\ \text{s}
$$

Standard deviation is **0.016 s**.
</div>
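
The hand calculation can be reproduced line by line as a check. This sketch uses only the standard library. The unrounded result, about 0.0158 s, is what the worked example rounds to 0.016 s.

```python
import math
import statistics

times = [1.21, 1.25, 1.23, 1.22, 1.24]  # seconds
n = len(times)
xbar = sum(times) / n

# Squared deviations from the mean, as in the worked example
sq_devs = [(t - xbar) ** 2 for t in times]
for t, d in zip(times, sq_devs):
    print(f"({t:.2f} - {xbar:.2f})^2 = {d:.4f}")

# Sample standard deviation: divide by N - 1, then take the square root
s = math.sqrt(sum(sq_devs) / (n - 1))
print(f"s = {s:.4f} s")

# statistics.stdev uses the same N - 1 definition
print(f"statistics.stdev: {statistics.stdev(times):.4f} s")
```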

<hr style="height:2px;border-width:0;color:gray;background-color:gray">

## D2.6.5 Standard Error of the Mean (SEM)

The standard deviation measures the *spread of the data*, but sometimes we want to know:

> **How precisely do we know the mean itself?**

The **standard error of the mean** answers this:

$$
\text{SEM} = \frac{s}{\sqrt{N}}
$$

As we collect more data, the SEM **gets smaller** in proportion to $1/\sqrt{N}$. Quadrupling the number of measurements halves the SEM, so our estimate of the mean becomes more precise.
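
The meaning of the SEM can be checked by simulation. Repeat an entire experiment many times, record each mean, and measure how much those means scatter. The sketch below assumes Gaussian measurement errors with a known true spread $\sigma = 0.016$ s and true value 1.23 s. These values are borrowed from the pendulum example purely for illustration. In a real experiment $\sigma$ is unknown, and $s/\sqrt{N}$ is our estimate of this scatter.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

TRUE_MEAN = 1.23      # assumed true swing time (s), for the simulation only
TRUE_SIGMA = 0.016    # assumed spread of single measurements (s)
EXPERIMENTS = 10000   # number of simulated repeats of the whole experiment

results = {}
for n in (5, 20, 80):
    # Each simulated experiment takes n measurements and keeps only their mean
    means = [
        statistics.fmean(random.gauss(TRUE_MEAN, TRUE_SIGMA) for _ in range(n))
        for _ in range(EXPERIMENTS)
    ]
    observed = statistics.stdev(means)      # actual scatter of the means
    predicted = TRUE_SIGMA / math.sqrt(n)   # sigma / sqrt(N)
    results[n] = (observed, predicted)
    print(f"N = {n:2d}: scatter of means = {observed:.5f} s, "
          f"sigma/sqrt(N) = {predicted:.5f} s")
```

Each quadrupling of $N$ should roughly halve the scatter of the means, in line with $1/\sqrt{N}$.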

---

<div style="background-color:#e0f7fa; border-left:6px solid #006a80; padding:14px; border-radius:4px;">
<h3 style="margin-top:0; color:#000000;">Example — Standard Error of the Mean</h3>

Using $s = 0.016$ s and $N = 5$:

$$
\text{SEM} = \frac{0.016}{\sqrt{5}} \approx 0.007\ \text{s}
$$

So we report:

**Mean pendulum time:**  
$$1.230 \pm 0.007\ \text{s}$$  
(The uncertainty is the SEM, rounded to one significant figure; the mean is quoted to the same decimal place.)
</div>
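
Rounding conventions for quoting a result vary between courses and journals. The sketch below assumes one common convention: the SEM is rounded to one significant figure, and the mean is rounded to the same decimal place. `report` is a hypothetical helper written for this example, not a standard library function.

```python
import math
import statistics


def report(values, unit):
    """Return 'mean ± SEM unit' using the assumed rounding convention."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # s / sqrt(N)
    if sem == 0:
        raise ValueError("all readings identical: SEM is zero, nothing to round to")
    # Decimal places that leave one significant figure in the SEM
    # (an SEM of 10 or more is simply shown as an integer)
    decimals = max(0, -math.floor(math.log10(sem)))
    return f"{mean:.{decimals}f} ± {sem:.{decimals}f} {unit}"


times = [1.21, 1.25, 1.23, 1.22, 1.24]
print(report(times, "s"))
```

For the pendulum data this prints `1.230 ± 0.007 s`.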

---

<div style="background-color:#e8f5e9; border-left:5px solid #006633; padding:12px; border-radius:4px;">
<h3 style="margin-top:0; color:#000000;">Box Activity 1 — Putting It All Together</h3>

You measure the length of a wooden block 8 times:

$12.4,\ 12.5,\ 12.6,\ 12.5,\ 12.7,\ 12.3,\ 12.4,\ 12.5\ \text{cm}$

Compute the following:

1. The **mean**  
2. The **median**  
3. The **standard deviation**  
4. The **standard error of the mean**  
5. A final reported value in the form:  
   $$\bar{x} \pm \text{SEM}$$

---

<details>
<summary style="background-color:#006633; color:white; padding:8px; border-radius:4px; cursor:pointer;">
A Possible Solution Guide
</summary>

<div style="background-color:#e8f5e9; padding:10px; border-radius:4px; margin-top:6px;">

**1. Mean**

Sum = $99.9$ cm  
Mean = $99.9 / 8 = 12.4875 \approx 12.49$ cm

---

**2. Median**

Sorted:  
$12.3,\ 12.4,\ 12.4,\ 12.5,\ 12.5,\ 12.5,\ 12.6,\ 12.7$

Middle two values (4th and 5th): $12.5,\ 12.5$  
Median = $12.5$ cm

---

**3. Standard Deviation**

Deviations from the mean $\bar{x} = 12.4875$ cm:  
$-0.0875,\ 0.0125,\ 0.1125,\ 0.0125,\ 0.2125,\ -0.1875,\ -0.0875,\ 0.0125$ cm

Sum of squared deviations: $0.10875\ \text{cm}^2$

$$
s = \sqrt{\frac{0.10875}{8-1}} \approx 0.1246 \approx 0.12\ \text{cm}
$$

---

**4. Standard Error of the Mean**

$$
\text{SEM} = \frac{0.1246}{\sqrt{8}} \approx 0.044\ \text{cm}
$$

---

**5. Final Reported Value**

$$12.49 \pm 0.04\ \text{cm}$$

(SEM rounded to one significant figure; mean quoted to the same decimal place)
</div>

</details>

</div>
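
Once you have worked the activity by hand, a few lines of Python can check your arithmetic. This sketch uses only the standard library. Note that `statistics.stdev` uses the $N-1$ (sample) definition.

```python
import math
import statistics

# Block lengths from Box Activity 1, in cm
lengths = [12.4, 12.5, 12.6, 12.5, 12.7, 12.3, 12.4, 12.5]

n = len(lengths)
mean = statistics.fmean(lengths)
median = statistics.median(lengths)
s = statistics.stdev(lengths)   # sample standard deviation (N - 1)
sem = s / math.sqrt(n)          # standard error of the mean

print(f"N      = {n}")
print(f"sum    = {sum(lengths):.1f} cm")
print(f"mean   = {mean:.4f} cm")
print(f"median = {median:.2f} cm")
print(f"s      = {s:.4f} cm")
print(f"SEM    = {sem:.4f} cm")
```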

---

<div style="background-color:#ffe6e6; border-left:6px solid #cc0000; padding:14px; border-radius:4px;">
<h3 style="margin-top:0; color:#000000;">Danger Box — Standard Deviation vs. Standard Error</h3>

A **very common mistake** is confusing:

- **Standard deviation (s):** describes the *spread of the data*  
- **Standard error of the mean (SEM):** describes the *precision of the mean*

These two numbers can be very different.  
A small SEM does **not** mean the data themselves have low variation — only that the mean is well determined.

Never report SEM when you intend to describe the scatter in the measurements.

</div>
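
A simulation also shows why the two quantities must not be confused. The sketch below assumes Gaussian errors with true spread $\sigma = 0.016$ s, an illustrative value, and analyses increasingly long runs of simulated measurements. As $N$ grows, $s$ settles near $\sigma$ because the scatter of individual readings does not change. The SEM, meanwhile, keeps falling as $1/\sqrt{N}$.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is reproducible

TRUE_MEAN = 1.23    # assumed true value (s), simulation only
TRUE_SIGMA = 0.016  # assumed spread of single measurements (s)

# One long run of simulated measurements; analyse its first n points
measurements = [random.gauss(TRUE_MEAN, TRUE_SIGMA) for _ in range(5000)]

summary = {}
for n in (5, 50, 500, 5000):
    subset = measurements[:n]
    s = statistics.stdev(subset)   # spread of the data: settles near TRUE_SIGMA
    sem = s / math.sqrt(n)         # precision of the mean: keeps shrinking
    summary[n] = (s, sem)
    print(f"N = {n:4d}: s = {s:.4f} s, SEM = {sem:.5f} s")
```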

<hr style="height:2px;border-width:0;color:gray;background-color:gray">
