### Sampling Distribution of a Sample Proportion

#### Theory
Imagine a large population where a certain proportion, `p`, has a specific characteristic (e.g., the proportion of all voters who support a candidate). It's usually impossible to ask everyone, so we take a sample.

A **sample proportion**, denoted **p̂** (read "p-hat"), is the proportion of that characteristic found in our single sample. `p̂ = (number of successes in sample) / (sample size n)`.

Now, imagine we take *many* different random samples of the same size `n` from that population. Each sample would likely have a slightly different `p̂`. The **sampling distribution of the sample proportion** is the distribution formed by all these possible `p̂` values.

This distribution has a predictable center and spread:
*   **Mean (Center):** The mean of all the possible `p̂` values is equal to the true population proportion, `p`.
    *   **μ_{p̂} = p**
*   **Standard Deviation (Spread):** This measures the typical distance between a sample proportion (`p̂`) and the population proportion (`p`). It's often called the **standard error of the proportion** (some textbooks reserve that term for the estimate that substitutes `p̂` for the unknown `p`).
    *   **σ_{p̂} = √[ p(1-p) / n ]**
    *   Notice that as the sample size `n` gets larger, the standard deviation gets smaller, meaning our sample proportions will be clustered more tightly around the true mean `p`.
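
To make the effect of sample size concrete, here is a minimal sketch that evaluates the standard deviation formula for several sample sizes. The helper name `sd_of_phat` and the chosen values of `p` and `n` are illustrative assumptions, not part of the original example.

```python
import math

def sd_of_phat(p: float, n: int) -> float:
    """Standard deviation of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)

p = 0.60  # illustrative population proportion (an assumption for this sketch)
for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: sigma_phat = {sd_of_phat(p, n):.4f}")
```

Because `n` sits under a square root, each fourfold increase in the sample size cuts the standard deviation in half.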

#### Calculation Example
Suppose that in a large city, **60%** (`p = 0.60`) of residents own a pet. We decide to take random samples of **100** people (`n=100`).

*   The sampling distribution of `p̂` will have a **mean** of:
    *   `μ_{p̂} = p = 0.60`
*   It will have a **standard deviation** of:
    *   `σ_{p̂} = √[ 0.60(1-0.60) / 100 ] = √[ 0.24 / 100 ] = √0.0024 ≈ 0.049`

**Interpretation:** This means if we repeatedly take samples of 100 residents, the sample proportion of pet owners will, on average, be 60%. A typical sample proportion is expected to be about 0.049 (or 4.9 percentage points) away from this true mean.
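
The "repeatedly take samples" idea can be checked with a small simulation. The sketch below uses only the standard library; the number of repetitions and the random seed are arbitrary choices made for illustration.

```python
import math
import random
import statistics

p, n = 0.60, 100          # values from the pet-ownership example
num_samples = 10_000      # number of repeated samples (arbitrary choice)
rng = random.Random(42)   # fixed seed so the run is reproducible

# Each sample: count "successes" among n simulated residents, then divide by n.
phats = [
    sum(rng.random() < p for _ in range(n)) / n
    for _ in range(num_samples)
]

sim_mean = statistics.fmean(phats)
sim_sd = statistics.pstdev(phats)
theory_sd = math.sqrt(p * (1 - p) / n)

print(f"Simulated mean of p-hat: {sim_mean:.4f} (theory: {p:.4f})")
print(f"Simulated SD of p-hat:   {sim_sd:.4f} (theory: {theory_sd:.4f})")
```

The simulated mean and standard deviation should land close to 0.60 and 0.049, with small differences that come from using a finite number of repetitions.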

#### Real-Life Usage
*   **Election Polling:** A pollster surveys 1,000 people and finds `p̂ = 55%` support for a candidate. The sampling distribution tells them how much this `p̂` is likely to vary from the true population support `p` just due to random sampling error.
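
As a rough sketch of the polling case: the pollster does not know `p`, so one common approach (an assumption here, not something stated above) is to plug `p̂` into the standard deviation formula to get an estimated standard error.

```python
import math

n = 1000      # number of people surveyed
phat = 0.55   # observed sample proportion

# Assumption: p is unknown, so p-hat is plugged into the formula in its place.
se_hat = math.sqrt(phat * (1 - phat) / n)
print(f"Estimated standard error: {se_hat:.4f}")  # about 0.0157
```

An estimated standard error of roughly 1.6 percentage points suggests how far `p̂ = 55%` might typically sit from the true support `p` through sampling variation alone.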

---

### Normal Conditions for Sampling Distributions of Sample Proportions

#### Theory
The Central Limit Theorem tells us that under the right conditions, the shape of the sampling distribution of `p̂` is **approximately Normal**. This is incredibly useful because it allows us to use Z-scores to calculate probabilities.

The two conditions that must be met are:
1.  **10% Condition (for Independence):** The sample size `n` should be no more than 10% of the population size `N` (`n ≤ 0.10N`). This allows us to assume that individual selections are independent even when sampling without replacement.
2.  **Large Counts Condition (for Normality):** The sample size `n` must be large enough so that we expect to see at least 10 "successes" and at least 10 "failures". This ensures the Normal curve is a good approximation for the shape of the distribution.
    *   **np ≥ 10**  and  **n(1-p) ≥ 10**
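
Both conditions are simple to check in code. The function below is a minimal sketch; its name, its parameters, and the population size of 50,000 in the usage lines are hypothetical and chosen only for illustration.

```python
from typing import Optional

def normal_conditions_met(n: int, p: float,
                          population_size: Optional[int] = None) -> bool:
    """Return True if the 10% and Large Counts conditions both hold.

    If population_size is None, the 10% condition is assumed to hold
    (for example, the population is known to be very large).
    """
    ten_percent_ok = population_size is None or n <= 0.10 * population_size
    large_counts_ok = n * p >= 10 and n * (1 - p) >= 10
    return ten_percent_ok and large_counts_ok

print(normal_conditions_met(n=100, p=0.60, population_size=50_000))  # True
print(normal_conditions_met(n=100, p=0.05, population_size=50_000))  # False: np = 5
```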

#### Calculation Example
Using our pet ownership example: `p = 0.60`, `n = 100`.
1.  **10% Condition:** We assume the city has at least 1,000 residents (10 × 100), which is safe for a large city, so this condition is met.
2.  **Large Counts Condition:**
    *   `np = 100 * 0.60 = 60`. This is ≥ 10.
    *   `n(1-p) = 100 * 0.40 = 40`. This is ≥ 10.

Since both conditions are met, we can say that the sampling distribution of `p̂` is approximately **Normal** with a mean of 0.60 and a standard deviation of 0.049. We write this as `p̂ ~ N(μ=0.60, σ=0.049)`.

#### Real-Life Usage
*   **Hypothesis Testing:** Before a researcher can perform a Z-test for a proportion, they *must* check these conditions. If the conditions fail, the P-value calculated from the Normal model will be inaccurate.

---

### Probability of Sample Proportions

#### Theory
Once we've confirmed our sampling distribution is approximately Normal, we can find the probability of observing a sample proportion (`p̂`) within a certain range. The process is the same as finding probabilities for any Normal distribution: **calculate a Z-score**.

The Z-score formula for a sample proportion is:
**Z = (p̂ - p) / σ_{p̂}**
*   `p̂` is the sample proportion value we are interested in.
*   `p` is the mean of the distribution (the true population proportion).
*   `σ_{p̂}` is the standard deviation of the distribution.

#### Calculation Example
Continuing our pet example (`p=0.60, n=100, μ_{p̂}=0.60, σ_{p̂}=0.049`), what is the probability that a random sample of 100 residents will have a sample proportion of **less than 55%** (`p̂ < 0.55`)?

1.  **Conditions:** We already verified them, so the sampling distribution is approximately Normal.
2.  **Find the Z-score:**
    *   `Z = (0.55 - 0.60) / 0.049 = -0.05 / 0.049 ≈ -1.02`
3.  **Find the probability:** We look up the probability for `Z < -1.02` using a Z-table or software.
    *   `P(Z < -1.02) ≈ 0.1539` (software that keeps the unrounded standard deviation reports about 0.1537)

**Interpretation:** There is about a 15.4% chance of getting a sample proportion of less than 55% pet owners, even when the true proportion is 60%.
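
Since the Normal model is only an approximation, it can be instructive to compare it with the exact binomial probability. The sketch below uses the standard library's `statistics.NormalDist` and `math.comb`; treating `p̂ < 0.55` as "at most 54 pet owners out of 100" is the only modelling step added here.

```python
import math
from statistics import NormalDist

p, n = 0.60, 100
sigma_phat = math.sqrt(p * (1 - p) / n)

# Normal approximation: P(p-hat < 0.55)
normal_prob = NormalDist(mu=p, sigma=sigma_phat).cdf(0.55)

# Exact binomial: p-hat < 0.55 means at most 54 pet owners out of 100.
exact_prob = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(0, 55)
)

print(f"Normal approximation: {normal_prob:.4f}")
print(f"Exact binomial:       {exact_prob:.4f}")
```

The two results should be close but not identical, because the count of pet owners is discrete while the Normal curve is continuous.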

#### Real-Life Usage
*   **A/B Testing:** A company's website has a historical click-through rate of 10% (`p=0.10`). After a redesign, they show the new site to 500 visitors (`n=500`). They can calculate the probability of observing a new rate of 13% (`p̂=0.13`) or higher *if the redesign had no real effect*. If this probability is very low, the observed increase is unlikely to be due to chance alone, which is evidence that the redesign raised the click-through rate.
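
The A/B testing calculation can be sketched as follows, using the numbers from the bullet above and the standard library's `statistics.NormalDist`. The variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.10, 500      # historical rate and number of visitors from the example
observed = 0.13       # click-through rate seen after the redesign

# Large Counts check under the "no effect" assumption: np = 50, n(1-p) = 450.
assert n * p >= 10 and n * (1 - p) >= 10

sigma_phat = math.sqrt(p * (1 - p) / n)
z = (observed - p) / sigma_phat
prob = 1 - NormalDist().cdf(z)   # area to the right of z

print(f"sigma_phat = {sigma_phat:.4f}, z = {z:.2f}, P(p-hat >= 0.13) = {prob:.4f}")
```

Here `z` comes out a little above 2.2 and the probability is a bit over 1%, so a rate of 13% or higher would be unusual if the true rate were still 10%.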

---

### Sampling Distribution of the Difference in Sample Proportions

#### Theory
Often, we want to compare two different populations. For example, do more men than women support a certain law? Here, we are interested in the **difference between two sample proportions**, `p̂₁ - p̂₂`.

The **sampling distribution of the difference** is the distribution of all possible values of `p̂₁ - p̂₂` if we were to take many samples from both populations.

*   **Mean (Center):** The mean of the distribution is the true difference between the population proportions.
    *   **μ_{p̂₁ - p̂₂} = p₁ - p₂**
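*   **Standard Deviation (Spread):** When the two samples are independent of each other, the variance of the difference is found by **adding** the individual variances, even though the proportions themselves are subtracted. The standard deviation is the square root of that sum.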
    *   `σ²_{p̂₁ - p̂₂} = σ²_{p̂₁} + σ²_{p̂₂} = [p₁(1-p₁) / n₁] + [p₂(1-p₂) / n₂]`
    *   **σ_{p̂₁ - p̂₂} = √[ (p₁(1-p₁)/n₁) + (p₂(1-p₂)/n₂) ]**

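The **conditions for Normality** (the 10% Condition and the Large Counts Condition) must hold for **each** sample separately, and the two samples must be independent of each other.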

#### Calculation Example
Let's say in City A, the true support for a project is 60% (`p₁=0.60`), and we sample 100 people (`n₁=100`). In City B, support is 52% (`p₂=0.52`), and we sample 150 people (`n₂=150`).

*   **Mean of the difference:**
    *   `μ_{p̂₁ - p̂₂} = 0.60 - 0.52 = 0.08`
*   **Standard deviation of the difference:**
    *   `σ_{p̂₁ - p̂₂} = √[ (0.60*0.40/100) + (0.52*0.48/150) ]`
    *   `σ_{p̂₁ - p̂₂} = √[ 0.0024 + 0.001664 ] = √0.004064 ≈ 0.0637`

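**Interpretation:** The Large Counts Condition holds for both samples (`n₁p₁ = 60`, `n₁(1-p₁) = 40`, `n₂p₂ = 78`, `n₂(1-p₂) = 72`), so the sampling distribution of the difference `p̂₁ - p̂₂` will be approximately Normal with a mean of 0.08 and a standard deviation of 0.0637.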
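
As a sanity check on the "add the variances" rule, the sketch below simulates many pairs of independent samples from the two cities and compares the spread of the simulated differences with the formula. The helper `sample_proportion`, the number of repetitions, and the seed are illustrative assumptions.

```python
import math
import random
import statistics

p1, n1 = 0.60, 100   # City A
p2, n2 = 0.52, 150   # City B
num_pairs = 5_000    # number of simulated pairs of samples (arbitrary choice)
rng = random.Random(0)

def sample_proportion(p: float, n: int) -> float:
    """Simulate one random sample of size n and return its p-hat."""
    return sum(rng.random() < p for _ in range(n)) / n

# The two samples in each pair are drawn independently of each other.
diffs = [
    sample_proportion(p1, n1) - sample_proportion(p2, n2)
    for _ in range(num_pairs)
]

diff_mean = statistics.fmean(diffs)
diff_sd = statistics.pstdev(diffs)
theory_diff_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"Simulated mean: {diff_mean:.4f} (theory: {p1 - p2:.4f})")
print(f"Simulated SD:   {diff_sd:.4f} (theory: {theory_diff_sd:.4f})")
```

The simulated values should sit near 0.08 and 0.0637, up to simulation noise.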

#### Real-Life Usage
*   **Medical Studies:** Researchers want to compare the effectiveness of a new drug versus a placebo. `p₁` is the proportion of patients who recover with the drug, and `p₂` is the proportion who recover with the placebo. They analyze the distribution of `p̂₁ - p̂₂` to see if the new drug produces a statistically significant improvement.

---

### Python Code Illustration


```python
import numpy as np
from scipy.stats import norm

# --- Part 3: Probability of a Single Sample Proportion ---
print("--- Probability for a Single Sample Proportion ---")
# Example: True pet ownership p=0.60. Sample size n=100.
# What is the probability of observing a sample proportion p̂ < 0.55?

p = 0.60
n = 100

# 1. Check the Large Counts condition
if n * p >= 10 and n * (1 - p) >= 10:
    print("Large Counts Condition is met. Normal model is appropriate.")
else:
    print("Normal model is not appropriate.")

# 2. Calculate the mean and standard deviation of the sampling distribution
mu_phat = p
sigma_phat = np.sqrt(p * (1 - p) / n)
print(f"Mean (μ_p̂): {mu_phat:.4f}")
print(f"Std Dev (σ_p̂): {sigma_phat:.4f}\n")

# 3. Calculate the probability P(p̂ < 0.55)
phat = 0.55
# norm.cdf() gives the cumulative probability up to a value
probability = norm.cdf(phat, loc=mu_phat, scale=sigma_phat)
print(f"The probability of the sample proportion being less than {phat} is: {probability:.4f}")
print("-" * 50)


# --- Part 4: Probability for the Difference in Sample Proportions ---
print("\n--- Probability for the Difference in Proportions ---")
# Example: City A (p1=0.60, n1=100) vs City B (p2=0.52, n2=150).
# What is the probability that a sample from City A has a GREATER proportion
# of support than a sample from City B? i.e., P(p̂1 - p̂2 > 0)

p1, n1 = 0.60, 100
p2, n2 = 0.52, 150

# 1. Check the Large Counts condition for both samples
if (n1 * p1 >= 10 and n1 * (1 - p1) >= 10
        and n2 * p2 >= 10 and n2 * (1 - p2) >= 10):
    print("Normal conditions met for both samples.")
else:
    print("Normal conditions not met.")

# 2. Calculate the mean and standard deviation of the difference
mu_diff = p1 - p2
sigma_diff = np.sqrt((p1 * (1 - p1) / n1) + (p2 * (1 - p2) / n2))
print(f"Mean of the difference (μ_diff): {mu_diff:.4f}")
print(f"Std Dev of the difference (σ_diff): {sigma_diff:.4f}\n")

# 3. Calculate the probability P(p̂1 - p̂2 > 0)
# This is 1 - P(p̂1 - p̂2 <= 0)
# We want the area to the RIGHT of 0, so we use 1 - norm.cdf()
prob_diff_gt_0 = 1 - norm.cdf(0, loc=mu_diff, scale=sigma_diff)
print(f"The probability of the sample proportion from City A being greater than City B is: {prob_diff_gt_0:.4f}")
```
