---
title: "Sampling distributions"
author: "Mr. Abraham"
execute:
  echo: false
format:
  revealjs:
    smaller: true
    scrollable: true
    incremental: true   
---

## Previously on AP Stats...
<br>
At the end of the previous unit, we explored the geometric and binomial distributions. In this unit, we consider the *sampling distribution*: the distribution of a statistic across all possible samples of a given size.
<br><br>
For example, if we take a survey of a random sample of a population and compute the proportion of “yes” responses to a particular question, then it’s helpful to consider the distribution of all possible sample proportions.
<br><br>
We explore sampling distributions for two common statistics: the sample proportion and the sample mean. We then consider the distributions of a difference of sample proportions and a difference of sample means. In each case, the goal is the same – to describe the center, spread, and shape of the sampling distribution and determine when the normal approximation to the sampling distribution is reasonable.  

## ...and NOW
:::{.incremental}
- **Sampling distribution for a sample proportion**
- **Sampling distribution for a sample mean**
- **Sampling distribution for a difference of proportions or means**  
:::
## Sampling distribution for a sample proportion
Often, instead of the number of successes in $n$ trials, we are interested in the proportion of successes in $n$ trials. We can use the sampling distribution for a sample proportion to answer questions such as the following:  
<br>  
- Given a fair coin, what is the probability that 200 tosses would produce more than 52% tails just by random variation?  
<br>
- In a particular state, 48% of the population supports a controversial measure. If we estimate this percentage through polling, what is the probability that a random sample of size 200 will mistakenly put the support above 50%?  
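
As a rough baseline, both questions can be answered exactly by summing binomial probabilities over every count whose proportion exceeds the cutoff. The sketch below is plain JavaScript; the helper names `binomPmf` and `probProportionAbove` are made up for illustration, and it assumes $0 < p < 1$.

```javascript
// Exact binomial probability P(X = k) for X ~ Binomial(n, p), with 0 < p < 1.
// Hypothetical helper: works on the log scale so large n does not overflow.
function binomPmf(k, n, p) {
  let logChoose = 0;
  for (let i = 1; i <= k; i++) {
    logChoose += Math.log(n - k + i) - Math.log(i);
  }
  return Math.exp(logChoose + k * Math.log(p) + (n - k) * Math.log(1 - p));
}

// P(p-hat > cutoff): add up P(X = k) for every count k with k / n above the cutoff.
function probProportionAbove(cutoff, n, p) {
  let total = 0;
  for (let k = 0; k <= n; k++) {
    if (k / n > cutoff) total += binomPmf(k, n, p);
  }
  return total;
}

// Fair coin, 200 tosses, more than 52% tails.
const coin = probProportionAbove(0.52, 200, 0.5);
// True support 48%, sample of 200, estimate above 50%.
const poll = probProportionAbove(0.50, 200, 0.48);

console.log(coin.toFixed(3), poll.toFixed(3));
```

Both results land a little above one in four, so neither outcome would be surprising from random variation alone.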

## Learning objectives
:::{.incremental}
1. Understand the concept of a sampling distribution.
2. Describe the center, spread, and shape of the sampling distribution for a sample proportion.
3. Recognize the relationship between the distribution of a sample proportion and the corresponding binomial distribution.
4. Explain the Central Limit Theorem and what it says about the shape of the sampling distribution for a sample proportion.
5. Verify appropriate conditions and, if met, carry out normal approximation for a sample proportion or sample count.
:::

## The mean and standard deviation of $\hat{p}$  
To answer the two questions posed at the beginning of this section, we investigate the distribution of the sample proportion $\hat{p}$. 
<br><br>
Let's say we're trying to figure out the proportion of prom-goers who were victimized by Sam Mulbah's dance moves, and suppose the true proportion is $p = 0.35$.
The number of injured students in a random sample of $40$ follows a binomial distribution with $n = 40$ and $p = 0.35$, which is centered on $14$ and has a standard deviation of about $3.0$. What does the distribution of the proportion of injured students in a sample of size $40$ look like? 
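<br><br>
As a quick check, the center and spread quoted above follow from the binomial formulas of the previous unit:

$$\mu_{\text{binomial}} = np = 40(0.35) = 14$$

$$\sigma_{\text{binomial}} = \sqrt{np(1-p)} = \sqrt{40(0.35)(0.65)} = \sqrt{9.1} \approx 3.0$$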
<br><br>
To convert from a count to a proportion, we divide the count (i.e., the number of people who said, "Yes, I was personally injured by Sam Mulbah") by the sample size, $n = 40$. For example, $8$ becomes $8/40 = 0.20$ as a proportion and $11$ becomes $11/40 = 0.275$. We can find the general formulas for the mean (expected value) and standard deviation of a sample proportion $\hat{p}$ using the tools we've learned so far. To get the mean of $\hat{p}$, we divide the binomial mean $\mu_{\text{binomial}} = np$ by $n$:  
<br><br>
$$\mu_{\hat{p}} = \frac{\mu_{\text{binomial}}}{n} = \frac{np}{n} = p$$
<br><br>
As one might expect, the sample proportion $\hat{p}$ is centered on the true proportion $p$. Likewise, the standard deviation of $\hat{p}$ is equal to the standard deviation of the binomial distribution divided by $n$:
<br><br>
$$\sigma_{\hat{p}} = \frac{\sigma_{\text{binomial}}}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$
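
To see these formulas with numbers, here is a small sketch in plain JavaScript for the prom example ($n = 40$, $p = 0.35$). The function name `sampleProportionStats` is hypothetical; it simply evaluates the two formulas and compares the result with the binomial standard deviation divided by $n$.

```javascript
// Mean and standard deviation of the sample proportion p-hat
// for samples of size n from a population with true proportion p.
function sampleProportionStats(n, p) {
  return {
    mean: p,
    sd: Math.sqrt(p * (1 - p) / n)
  };
}

const n = 40;
const p = 0.35;
const stats = sampleProportionStats(n, p);

// Same spread computed the long way: binomial SD divided by n.
const binomialSd = Math.sqrt(n * p * (1 - p));
const sdFromCount = binomialSd / n;

console.log(stats.mean);              // 0.35
console.log(stats.sd.toFixed(4));     // 0.0754
console.log(sdFromCount.toFixed(4));  // 0.0754
```

So a typical sample of $40$ students gives a proportion within about $0.075$ of the true value $0.35$.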

## The mean and standard deviation of a sample proportion {background-color="steelblue"}

The mean and standard deviation of the sample proportion describe the center and spread of the distribution of all possible sample proportions $\hat{p}$ from random samples of size $n$, drawn from a population with true proportion $p$.
<br><br>
$$\mu_{\hat{p}} = p$$

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
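
One way to sanity-check these formulas is by simulation: draw many random samples, compute $\hat{p}$ for each, and compare the mean and standard deviation of those values with the formulas. The sketch below is plain JavaScript with hypothetical function names; it assumes `Math.random()` is a good enough source of randomness, so the simulated values change slightly from run to run.

```javascript
// Simulate one random sample of size n and return its sample proportion.
function simulatePHat(n, p) {
  let successes = 0;
  for (let i = 0; i < n; i++) {
    if (Math.random() < p) successes++;
  }
  return successes / n;
}

// Draw many sample proportions, then summarize their center and spread.
function simulateSamplingDistribution(n, p, reps) {
  const pHats = Array.from({ length: reps }, () => simulatePHat(n, p));
  const mean = pHats.reduce((a, b) => a + b, 0) / reps;
  const variance = pHats.reduce((a, b) => a + (b - mean) ** 2, 0) / reps;
  return { mean, sd: Math.sqrt(variance) };
}

// Prom example: n = 40, p = 0.35, with 20,000 simulated samples.
const sim = simulateSamplingDistribution(40, 0.35, 20000);
const theory = { mean: 0.35, sd: Math.sqrt(0.35 * 0.65 / 40) };

console.log("simulated:", sim);
console.log("formula:  ", theory);
```

With this many repetitions, the simulated mean and standard deviation should sit very close to the formula values.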

---

```{ojs}
d3 = require("d3@7")

jStat = require("jstat")

n = 40

p = 0.35

function binChart(n, p, useProportion = false, width = 350, height = 200) {
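  // Exact binomial probabilities P(X = x) for x = 0, ..., n,
  // plotted against either the count x or the proportion x / n.
  const data = [];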
  for (let x = 0; x <= n; x++) {
    const prob = jStat.binomial.pdf(x, n, p);
    data.push({
      x: useProportion ? x / n : x,
      label: x,
      y: prob
    });
  }

  const margin = {top: 10, right: 10, bottom: 35, left: 40};
  const svg = d3.create("svg")
    .attr("width", width)
    .attr("height", height);

  const x = d3.scaleLinear()
    .domain(d3.extent(data, d => d.x))
    .range([margin.left, width - margin.right]);

  const y = d3.scaleLinear()
    .domain([0, d3.max(data, d => d.y)]).nice()
    .range([height - margin.bottom, margin.top]);

  svg.append("g")
    .attr("transform", `translate(0,${height - margin.bottom})`)
    .call(d3.axisBottom(x).ticks(6).tickFormat(d3.format(useProportion ? ".2f" : "d")))
    .attr("font-size", "10px");

  svg.append("g")
    .attr("transform", `translate(${margin.left},0)`)
    .call(d3.axisLeft(y).ticks(5))
    .attr("font-size", "10px");

  // Center each bar on its x value instead of using a fixed 2px offset.
  const barWidth = Math.max(2, (width - margin.left - margin.right) / data.length - 1);
  svg.selectAll("rect")
    .data(data)
    .join("rect")
    .attr("x", d => x(d.x) - barWidth / 2)
    .attr("y", d => y(d.y))
    .attr("width", barWidth)
    .attr("height", d => y(0) - y(d.y))
    .attr("fill", "steelblue");

  svg.append("text")
    .attr("x", width / 2)
    .attr("y", margin.top + 5)
    .attr("text-anchor", "middle")
    .attr("font-size", "11px")
    .text(useProportion ? "Binomial (Proportion) — x/n" : "Binomial (Counts) — x");

  return svg.node();
}

html`<div style="display: flex; gap: 20px;">
  ${binChart(n, p, false)}
  ${binChart(n, p, true)}
</div>`

```
---

```{ojs}
ps = [0.10, 0.20, 0.50, 0.80, 0.90]

ns = [10, 25, 50, 100, 250]

function binomialChart(n, p, width = 200, height = 150) {
  const data = [];
  for (let k = 0; k <= n; k++) {
    data.push({ x: k, y: jStat.binomial.pdf(k, n, p) });
  }

  // Leave room on the left and bottom so the axis tick labels are not clipped by the SVG edge.
  const margin = {top: 10, right: 10, bottom: 20, left: 30};
  const svg = d3.create("svg")
    .attr("width", width)
    .attr("height", height);

  const x = d3.scaleLinear()
    .domain([0, n])
    .range([margin.left, width - margin.right]);

  const y = d3.scaleLinear()
    .domain([0, d3.max(data, d => d.y)]).nice()
    .range([height - margin.bottom, margin.top]);

  svg.append("g")
    .attr("transform", `translate(0,${height - margin.bottom})`)
    .call(d3.axisBottom(x).ticks(5).tickFormat(d3.format("d")).tickSizeOuter(0))
    .attr("font-size", "8px");

  svg.append("g")
    .attr("transform", `translate(${margin.left},0)`)
    .call(d3.axisLeft(y).ticks(4))
    .attr("font-size", "8px");

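  // Center each bar on its count; a fixed 1px offset drifts badly when n is small and bars are wide.
  const barWidth = Math.max(1, (width - margin.left - margin.right) / (n + 1) - 1);
  svg.selectAll("rect")
    .data(data)
    .join("rect")
    .attr("x", d => x(d.x) - barWidth / 2)
    .attr("y", d => y(d.y))
    .attr("width", barWidth)
    .attr("height", d => y(0) - y(d.y))
    .attr("fill", "steelblue");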

  svg.append("text")
    .attr("x", width / 2)
    .attr("y", margin.top + 2)
    .attr("text-anchor", "middle")
    .attr("font-size", "9px")
    .text(`n=${n}, p=${p}`);

  return svg.node();
}

html`${ps.map(p => html`<div style="display: flex; gap: 10px; margin-bottom: 10px;">
  ${ns.map(n => binomialChart(n, p))}
</div>`)}`


```