# Introduction to Probability & Sampling (with/without Replacement) — Python Notes



## 1) What are the chances? *(Motivation)*

* We casually ask “what are the chances?” (closing a sale, rain, winning).
* To answer rigorously, we use **probability**.

---

## 2) Measuring chance *(Definition & Simple Example)*

* **Probability of an event**

  $$
  P(\text{event})=\frac{\#\text{ favorable outcomes}}{\#\text{ possible outcomes}}
  $$

* Example (coin flip): 1 way to get **Heads** out of 2 outcomes → $\frac{1}{2}=0.5=50\%$.
* Probabilities range from **0%** (impossible) to **100%** (certain).
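
A minimal sketch of the counting definition above, assuming all outcomes are equally likely. The helper name `probability` is hypothetical (not a library function); `fractions.Fraction` is used only to keep the ratio exact.

```python
from fractions import Fraction


def probability(favorable: int, possible: int) -> Fraction:
    """Hypothetical helper: P(event) = favorable / possible, for equally likely outcomes."""
    if possible <= 0 or not 0 <= favorable <= possible:
        raise ValueError("need possible > 0 and 0 <= favorable <= possible")
    return Fraction(favorable, possible)


p_heads = probability(1, 2)     # coin flip: 1 favorable outcome (Heads) out of 2
print(p_heads, float(p_heads))  # 1/2 0.5

# The two extremes: impossible (0) and certain (1)
print(probability(0, 2), probability(2, 2))  # 0 1
```
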
![image.png](attachment:79df112f-0679-48de-93c0-5617c03c8fcd.png)

---

## 3) Assigning salespeople *(Scenario setup)*

* We have a sales team; we put each person’s name on a ticket and pick randomly.

![image.png](attachment:36480ed6-9e6e-489d-a859-cae734823fd5.png)


* First pick: **Brian**. With 4 people, $P(\text{Brian}) = \frac{1}{4} = 25\%$.

### Code: Create a small DataFrame of salespeople

```python
import pandas as pd

# A small dataset of salespeople and their total sales
sales_counts = pd.DataFrame({
    "name":    ["Brian", "Claire", "Damian", "Amir"],
    "n_sales": [128,       75,       69,      178]
})

sales_counts
```

#### Expected Output

```
     name  n_sales
0   Brian      128
1  Claire       75
2  Damian       69
3    Amir      178
```

#### Line-by-line Explanation

* `import pandas as pd`

  * **What:** Imports pandas.
  * **Why:** We’ll use a DataFrame to mirror the “tickets in a box” with rows.
  * **Result:** `pd` refers to pandas.

* `sales_counts = pd.DataFrame({...})`

  * **What:** Builds a table with names and a made-up sales count.
  * **Why:** Provides concrete rows to sample from (one row per person).
  * **Result:** DataFrame with 4 rows (Brian, Claire, Damian, Amir).

* `sales_counts`

  * **What:** Displays the DataFrame.
  * **Why:** So we can see our “box” of names.
  * **Result/Meaning:** Confirms the four candidates we will sample from.

---

## 4) Sampling from a DataFrame *(Random single pick)*

* `DataFrame.sample()` randomly picks rows. By default, `n=1`.
![image.png](attachment:08ad5fac-1d35-4a5a-ab80-85f2e4d1c5f5.png)


### Code: Randomly sample 1 person (unseeded, so output will vary)

Because `sample()` picks rows at random, running the same code again may return a different row.

```python
# Randomly pick 1 row (person). Output will vary run-to-run if no seed is set.
one_pick = sales_counts.sample(1)
one_pick
```

#### Example Output (will vary)

```
     name  n_sales
0   Brian      128
```

#### Line-by-line Explanation

* `one_pick = sales_counts.sample(1)`

  * **What:** Randomly selects 1 row.
  * **Why:** Simulates drawing one ticket from the box.
  * **Result:** A 1-row DataFrame. The name can be any of the four.

* `one_pick`

  * **What:** Displays that sampled row.
  * **Why:** To see who was picked.
  * **Meaning:** Shows the randomly selected person for the meeting.
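
To see what "random" means here, the sketch below (not from the course) repeats the single-row draw many times and tallies who was picked; each of the 4 names should come up roughly $\frac{1}{4}$ of the time. Reusing one `np.random.RandomState` object as `random_state` is an assumption made only so the run is reproducible.

```python
import numpy as np
import pandas as pd

sales_counts = pd.DataFrame({
    "name":    ["Brian", "Claire", "Damian", "Amir"],
    "n_sales": [128, 75, 69, 178],
})

rs = np.random.RandomState(0)  # one generator reused for every draw
n_draws = 2000

# Draw one row at a time, n_draws times, and keep the picked name
picks = [sales_counts.sample(1, random_state=rs)["name"].iloc[0] for _ in range(n_draws)]

# Relative frequency of each name; each should be close to 1 / len(sales_counts) = 0.25
freq = pd.Series(picks).value_counts(normalize=True)
print(freq)
print("theoretical:", 1 / len(sales_counts))
```
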

---

## 5) Setting a random seed *(Reproducibility)*

* To show the team exactly how Brian was picked (and get the **same** result each run), we set a **seed**.
* The transcript mentions `np.random.seed(...)`; pandas also offers `random_state=` on `.sample()` for deterministic results.
* The seed value itself doesn't matter: 5, 139, or 3 million all work. What matters is using the same seed every time the script runs, so that we (or any member of the sales team) can rerun the code and get Brian every time.



![image.png](attachment:9e7be85f-3820-43a5-ae14-3c8e68c2cb77.png)

### Code: Reproducible single pick using NumPy’s seed (matches transcript)

```python
import numpy as np

np.random.seed(139)           # set the global RNG seed
seeded_pick = sales_counts.sample(1)   # uses NumPy's RNG when random_state isn't provided
seeded_pick
```

#### Output (as reported; which row you get depends on the seed and the DataFrame's row order)

```
     name  n_sales
0   Brian      128
```

#### Line-by-line Explanation

* `import numpy as np`

  * **What:** Imports NumPy.
  * **Why:** We’ll set the global random seed as per the transcript.
  * **Result:** `np` refers to NumPy.

* `np.random.seed(139)`

  * **What:** Fixes the starting state of the random number generator.
  * **Why:** Ensures the same “random” choice every run for demos/reviews.
  * **Result:** Subsequent random draws (including `sample()` without `random_state`) are reproducible.

* `seeded_pick = sales_counts.sample(1)`

  * **What:** Randomly selects 1 row (now deterministic due to the seed).
  * **Why:** Simulates the draw but reproducibly.
  * **Result:** Always returns the same row under this seed (here, **Brian**).

* `seeded_pick`

  * **What:** Displays the deterministic selection.
  * **Meaning:** Shows the reproducible choice; aligns with “we can get Brian every time” if we keep the same seed & code.

> **Alternative (also deterministic):**
> `sales_counts.sample(1, random_state=139)` — uses a dedicated per-call seed instead of the global NumPy seed.
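
A small check of both reproducibility options, as a sketch: it only confirms that repeating a draw with the same seed gives the same row. Which row comes out depends on the seed value, so no particular name is asserted here.

```python
import numpy as np
import pandas as pd

sales_counts = pd.DataFrame({
    "name":    ["Brian", "Claire", "Damian", "Amir"],
    "n_sales": [128, 75, 69, 178],
})

# Per-call seed: the same random_state gives the same row every time
a = sales_counts.sample(1, random_state=139)
b = sales_counts.sample(1, random_state=139)
print(a.equals(b))  # True

# Global seed: re-seeding NumPy before each call also reproduces the pick
np.random.seed(139)
c = sales_counts.sample(1)
np.random.seed(139)
d = sales_counts.sample(1)
print(c.equals(d))  # True
```
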

---

## 6) A second meeting — *Sampling without replacement* *(Dependent events)*

* New constraint: another potential client wants to meet **at the same time**, so we need a second salesperson. **Brian is already taken** (he can't be in two meetings at once), so we pick from the remaining 3.
* This is **sampling without replacement**: we **do not return** the first name to the box.
* Now events are **dependent**: the second pick's probabilities change based on the first.
* To recreate this in Python, pass 2 to the `sample` method, which returns 2 different rows of the DataFrame (`replace=False` is the default).

![image.png](attachment:73517649-6f9f-4fca-a513-4c551c30bef0.png)

### Code: Two people for simultaneous meetings (without replacement)

![image.png](attachment:bcf7738b-81cf-43de-94ab-0e3bc3ed8308.png)
* This time, Claire is picked, and the probability of this is one out of three, or about 33%.

#### Significance of the Output

* If **Claire** was picked **first**, $P(\text{Claire second}) = 0\%$ (can’t be picked again).
* If **someone else** was picked first, then for the **second pick** there are 3 names left and Claire is 1 of them → $\frac{1}{3} \approx 33\%$.
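
The conditional probabilities above can be checked by brute-force counting, as a sketch: without replacement, every ordered pair of two *different* people is equally likely. The helper `p_second_is` is hypothetical, written just for this illustration; the last lines show the pandas equivalent of drawing two tickets.

```python
from fractions import Fraction
from itertools import permutations

import pandas as pd

names = ["Brian", "Claire", "Damian", "Amir"]

# Without replacement: all ordered pairs of two different people (4 * 3 = 12)
pairs = list(permutations(names, 2))


def p_second_is(target, first):
    """Hypothetical helper: P(second pick == target | first pick == first), by counting pairs."""
    given = [p for p in pairs if p[0] == first]
    return Fraction(sum(p[1] == target for p in given), len(given))


print(p_second_is("Claire", first="Brian"))   # 1/3
print(p_second_is("Claire", first="Claire"))  # 0

# pandas: two rows without replacement (the default), so the two names always differ
sales_counts = pd.DataFrame({"name": names, "n_sales": [128, 75, 69, 178]})
two_picks = sales_counts.sample(2, random_state=0)
print(two_picks)
```
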

---

## 7) A second meeting — *Sampling with replacement* *(Independent events)*

* If the two meetings are on **different days**, the same person can attend both.
* This is **sampling with replacement**: after a pick, we **return** the name to the box.
* Now each pick is **independent**: the second pick’s probabilities **do not change** due to the first pick.
* To recreate this in Python, pass `replace=True` to `sample` (for example `sales_counts.sample(2, replace=True)`), so the same row can be drawn more than once.

### Code: Two independent picks (with replacement)

![image.png](attachment:4dd67d13-51c0-4756-b551-c6ca353f469c.png)

Now let's say the two meetings are happening on different days, so the same person could attend both. In this scenario, we need to return Brian's name to the box after picking it. This is called sampling with replacement.

![image.png](attachment:3be800a6-600e-4458-a943-c0d240eac596.png)

#### Significance of the Output

![image.png](attachment:26a7d2de-538f-4c7d-bdcd-778502682423.png)
Claire gets picked for the second meeting, but this time, the probability of picking her is 25%.

* Each pick is **independent** and has $P(\text{Claire}) = \frac{1}{4} = 25\%$, regardless of the first pick’s outcome.
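
The same counting sketch with replacement: the name goes back in the box, so all $4 \times 4 = 16$ ordered pairs are equally likely, and the answer for the second pick no longer depends on the first. As before, `p_second_is` is a hypothetical helper for illustration only.

```python
from fractions import Fraction
from itertools import product

names = ["Brian", "Claire", "Damian", "Amir"]

# With replacement: every ordered pair, including repeats such as (Brian, Brian)
pairs = list(product(names, repeat=2))


def p_second_is(target, first):
    """Hypothetical helper: P(second pick == target | first pick == first), by counting pairs."""
    given = [p for p in pairs if p[0] == first]
    return Fraction(sum(p[1] == target for p in given), len(given))


# 1/4 no matter who was picked first: the picks are independent
for first in names:
    print(first, "->", p_second_is("Claire", first))
```
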

---

## 8) Sampling multiple times with replacement *(Duplicates possible)*

* With replacement, duplicates across many picks are expected/allowed.
* Matches the slide idea: `sales_counts.sample(5, replace=True)` can repeat names.

![image.png](attachment:0110e790-ca07-43b3-8380-f59978657f20.png)

#### Line-by-line Explanation

* To sample with replacement, set the replace argument to True, so names can appear more than once. If there were 5 meetings, all at different times, it's possible to pick some rows multiple times since we're replacing them each time.
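
A sketch of the 5-meeting case (the seed 42 is arbitrary). With 5 picks from only 4 names, at least one name *must* repeat (pigeonhole principle). Without replacement, pandas refuses to draw more rows than the DataFrame has and raises a `ValueError`.

```python
import pandas as pd

sales_counts = pd.DataFrame({
    "name":    ["Brian", "Claire", "Damian", "Amir"],
    "n_sales": [128, 75, 69, 178],
})

# 5 meetings at different times: 5 picks with replacement
five = sales_counts.sample(5, replace=True, random_state=42)
print(five)
print(five["name"].duplicated().any())  # True: 5 picks from 4 names must repeat one

# Without replacement there are only 4 tickets, so asking for 5 fails
try:
    sales_counts.sample(5)
    error_message = None
except ValueError as err:
    error_message = str(err)
print("without replacement:", error_message)
```
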

---

## 9) Independent vs Dependent Events *(Concept wrap-up)*

* **Independent events:**
  The outcome of the first event **does not** change the probability of the second.

  * **Example:** Sampling **with** replacement → each pick has the same probabilities.
  * **Claire second (with replacement):** $25\%$ regardless of who was first.

* **Dependent events:**
  The outcome of the first event **does** change the probability of the second.

  * **Example:** Sampling **without** replacement → the pool shrinks after each pick.
  * **Claire second (without replacement):**

    * If Claire was first → $0\%$.
    * If someone else was first → $\frac{1}{3} \approx 33\%$.

---

## 10) Key Takeaways

* **Sampling with replacement** → **independent** picks; probabilities stay constant each draw.
* **Sampling without replacement** → **dependent** picks; probabilities change as items are removed.
* Use **seeds** (`np.random.seed(...)` or `random_state=...`) for **reproducible** demonstrations.
* In pandas, `.sample(n, replace=...)` simulates both scenarios cleanly.

---



### Exercise
With or without replacement?
In the video, you learned about two different ways of taking samples: with replacement and without replacement. Although it isn't always easy to tell which best fits various situations, it's important to correctly identify this so that any probabilities you report are accurate. In this exercise, you'll put your new knowledge to the test and practice figuring this out.

Instructions

For each scenario, decide whether it's sampling with replacement or sampling without replacement.


![image.png](attachment:99d65fb7-5864-48d3-9c3b-c95db5f61021.png)

![image.png](attachment:7614aea3-d24b-4b12-b34d-38e0dacdd81e.png)



1. Count the number of deals Amir worked on for each product type and store the result as `counts`.

```python
# Count the deals for each product
counts = amir_deals['product'].value_counts()
print(counts)

# Output:
# Product B    62
# Product D    40
# Product A    23
# Product C    15
# Product F    11
# Product H     8
# Product I     7
# Product E     5
# Product N     3
# Product G     2
# Product J     2
# Name: product, dtype: int64
```
2. Calculate the probability of selecting a deal for the different product types by dividing the counts by the total number of deals Amir worked on. Save this as probs.

```python
# Count the deals for each product
counts = amir_deals['product'].value_counts()

# Calculate probability of picking a deal with each product
probs = counts / len(amir_deals)
print(probs)

# Output:
# Product B    0.348
# Product D    0.225
# Product A    0.129
# Product C    0.084
# Product F    0.062
# Product H    0.045
# Product I    0.039
# Product E    0.028
# Product N    0.017
# Product G    0.011
# Product J    0.011
# Name: product, dtype: float64
```
3. Question: If you randomly select one of Amir's deals, what's the probability that the deal will involve Product C?

Possible answers

* 15%
* 80.43%
* **8.43%** (correct: 15 Product C deals out of 178 in total, and 15 / 178 ≈ 0.0843)
* 22.5%
* 124.3%
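
As a check on the answer, a sketch that rebuilds `counts` from the printed output above (the full `amir_deals` table is not reproduced in these notes) and divides by the total number of deals.

```python
import pandas as pd

# Deal counts per product, copied from the printed value_counts() output above
counts = pd.Series({
    "Product B": 62, "Product D": 40, "Product A": 23, "Product C": 15,
    "Product F": 11, "Product H": 8, "Product I": 7, "Product E": 5,
    "Product N": 3, "Product G": 2, "Product J": 2,
})

n_deals = counts.sum()  # total deals Amir worked on
probs = counts / n_deals

print(n_deals)                              # 178
print(round(probs["Product C"] * 100, 2))   # 8.43 (percent)
print(round(probs.sum(), 6))                # 1.0: the probabilities cover every deal
```
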



### Exercise
Sampling deals
In the previous exercise, you counted the deals Amir worked on. Now it's time to randomly pick five deals so that you can reach out to each customer and ask if they were satisfied with the service they received. You'll try doing this both with and without replacement.

Additionally, you want to make sure this is done randomly and that it can be reproduced in case you get asked how you chose the deals, so you'll need to set the random seed before sampling from the deals.

Both pandas as pd and numpy as np are loaded and amir_deals is available.

Instructions 1/3

Set the random seed to 24.
Take a sample of 5 deals without replacement and store them as sample_without_replacement.

```python
# Set random seed
np.random.seed(24)

# Sample 5 deals without replacement
sample_without_replacement = amir_deals.sample(5)
print(sample_without_replacement)

# Output:
#      Unnamed: 0    product   client status   amount  num_users
# 127         128  Product B  Current    Won  2070.25          7
# 148         149  Product D  Current    Won  3485.48         52
# 77           78  Product B  Current    Won  6252.30         27
# 104         105  Product D  Current    Won  4110.98         39
# 166         167  Product C      New   Lost  3779.86         11
```

2. Take a sample of 5 deals with replacement and save it as `sample_with_replacement`.

```python
# Set random seed
np.random.seed(24)

# Sample 5 deals with replacement
sample_with_replacement = amir_deals.sample(5, replace=True)
print(sample_with_replacement)

# Output:
#      Unnamed: 0    product   client status   amount  num_users
# 162         163  Product D  Current    Won  6755.66         59
# 131         132  Product B  Current    Won  6872.29         25
# 87           88  Product C  Current    Won  3579.63          3
# 145         146  Product A  Current    Won  4682.94         63
# 145         146  Product A  Current    Won  4682.94         63
```
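3. Question: What type of sampling is better to use for this situation?

Possible answers

* With replacement
* **Without replacement** (correct: each selected customer should be contacted only once, yet the with-replacement sample above drew the deal at index 145 twice)
* It doesn't matter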
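
Why does the choice matter? A sketch on a hypothetical stand-in for `amir_deals` (178 rows of made-up deal IDs, not the course data) estimates how often a 5-deal sample would contact some customer twice. With replacement this happens about 5.5% of the time; without replacement it never happens.

```python
import pandas as pd

# Hypothetical stand-in: one row per deal, 178 deals (the total counted earlier)
deals = pd.DataFrame({"deal_id": range(1, 179)})

n_trials = 1000
dupes_with = 0
dupes_without = 0
for seed in range(n_trials):
    with_repl = deals.sample(5, replace=True, random_state=seed)
    without_repl = deals.sample(5, random_state=seed)
    dupes_with += int(with_repl["deal_id"].duplicated().any())
    dupes_without += int(without_repl["deal_id"].duplicated().any())

# Exact chance of at least one repeat in 5 draws with replacement from 178 deals
p_dup = 1 - (177 / 178) * (176 / 178) * (175 / 178) * (174 / 178)

print(dupes_with / n_trials, "simulated vs.", round(p_dup, 3), "exact")  # exact ≈ 0.055
print(dupes_without / n_trials)  # 0.0
```
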


# Introduction to Statistics in Python — Discrete Distributions & Sampling


---

## 1) Discrete Distributions

* A **probability distribution** assigns a probability to each possible outcome of a random process.
* **Example**: Rolling a fair six-sided die (outcomes 1–6) — each outcome has probability $1/6 \approx 0.1667$.
* The **expected value (mean)** of a distribution is the probability-weighted average of outcomes.

---

## 2) Rolling a Fair Die (Uniform Discrete Distribution)

* Outcomes: $\{1,2,3,4,5,6\}$
* Probabilities: all equal to $1/6$
* This is a **discrete uniform distribution** (all outcomes equally likely).

### Code: Represent a fair die as a DataFrame and compute its expected value

![image.png](attachment:5364b1cf-cd98-4fa7-b969-a4e78d022af2.png)

```python
import numpy as np
import pandas as pd

# Build a DataFrame for a fair die
die = pd.DataFrame({
    "number": [1, 2, 3, 4, 5, 6],
    "prob":   [1/6]*6
})

# Expected value (theoretical mean) of a fair die
ev_fair = np.mean(die["number"])
ev_fair
```

**Expected output**

```
3.5
```

**Line-by-line explanation**

* `import numpy as np; import pandas as pd`
  Imports NumPy and pandas for numerical work and tabular data.
* `die = pd.DataFrame({...})`
  Creates a table with two columns: `number` (1–6) and `prob` (=1/6 each). This encodes the distribution.
* `ev_fair = np.mean(die["number"])`
  For a uniform die, the expected value equals the arithmetic mean of 1..6, which is 3.5.
  (Equivalently: $\sum x \cdot p(x) = \frac{1+2+3+4+5+6}{6} = 3.5$.)
* Returning `ev_fair` shows the theoretical mean **3.5**.

**Why it matters**
The **expected value** summarizes the center of the distribution you’d expect **on average** over many repeated rolls.
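
The plain mean works above only because every face has the same probability. A sketch of the general definition, $E[X] = \sum x \cdot p(x)$, using the `prob` column; it applies unchanged to non-uniform distributions.

```python
import numpy as np
import pandas as pd

die = pd.DataFrame({
    "number": [1, 2, 3, 4, 5, 6],
    "prob":   [1/6] * 6,
})

# General definition: sum of each outcome times its probability
ev_weighted = np.sum(die["number"] * die["prob"])
print(ev_weighted)  # 3.5, up to floating-point rounding

# Matches the simple mean here only because all probabilities are equal
print(np.isclose(ev_weighted, np.mean(die["number"])))  # True
```
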

---

## 3) Visualizing a Probability Distribution & “Probability = Area”

* A bar plot can show each **outcome** on the x-axis and its **probability** as bar height.

![image.png](attachment:dce3a58e-c6d7-4e10-970e-59624f022dc6.png)

### Probability = area
* We can calculate probabilities of different outcomes by taking areas of the probability distribution.
* For example, what's the probability that a die roll is less than or equal to 2? To find it, we take the area of each bar representing an outcome of 2 or less. Each bar has width 1, so its area equals its height (the probability).
* For discrete distributions, the **probability of an event** (e.g., roll ≤ 2) is the **sum of bar areas** for outcomes in that event.

![image.png](attachment:0df9a00b-1a25-4a1e-bcc0-6488645c1086.png)

* Fair die: $P(\text{roll} \le 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$.

**Why it matters**
For **discrete** distributions, the probability of a compound event is just the **sum** of the relevant probabilities.
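
In pandas, "sum the bars for the event" becomes a filter followed by a sum, as in this sketch.

```python
import pandas as pd

die = pd.DataFrame({"number": [1, 2, 3, 4, 5, 6], "prob": [1/6] * 6})

# P(roll <= 2): keep the rows for outcomes in the event, then add their probabilities
p_le_2 = die.loc[die["number"] <= 2, "prob"].sum()
print(p_le_2)  # 0.333... (= 1/3)
```
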

---

## 4) An “Uneven” Die (Reassigning Probability Mass)

> Scenario: The “2” face is turned into a “3”, so:
>
> * $P(2) = 0$
> * $P(3) = 1/3$ (since it inherits its original $1/6$ **plus** the $1/6$ from the “2”)
> * All other faces remain at $1/6$

![image.png](attachment:413720d0-b68b-4272-aed9-6a4e7f4f26b9.png)

### Visualizing uneven probabilities
* When we visualize these new probabilities, the bars are no longer even.

![image.png](attachment:075716a2-413c-45ba-8369-d8bc6b27e4af.png)


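**Why it matters**
Shifting probability mass toward larger outcomes **increases** the expected value: $E[X] = 1\cdot\frac{1}{6} + 2\cdot 0 + 3\cdot\frac{1}{3} + (4+5+6)\cdot\frac{1}{6} = \frac{22}{6} \approx 3.67$, up from 3.5 for the fair die.

* The event "roll ≤ 2" is now **less likely**: $P(\text{roll} \le 2) = P(1) + P(2) = \frac{1}{6} + 0 = \frac{1}{6}$, compared with $\frac{1}{3}$ for the fair die.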

![image.png](attachment:1521f84e-f63f-454c-9349-9541fe75d22b.png)
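
A sketch of the uneven die as a DataFrame (the name `uneven_die` is just a variable chosen here), reusing the same weighted-sum and filter-and-sum steps as for the fair die.

```python
import numpy as np
import pandas as pd

# The "2" face now shows "3": P(2) = 0 and P(3) = 1/3
uneven_die = pd.DataFrame({
    "number": [1, 2, 3, 4, 5, 6],
    "prob":   [1/6, 0, 1/3, 1/6, 1/6, 1/6],
})

print(np.isclose(uneven_die["prob"].sum(), 1))  # True: still a valid distribution

ev_uneven = np.sum(uneven_die["number"] * uneven_die["prob"])
print(round(ev_uneven, 3))  # 3.667 (= 22/6), up from 3.5

p_le_2 = uneven_die.loc[uneven_die["number"] <= 2, "prob"].sum()
print(round(p_le_2, 3))  # 0.167 (= 1/6), down from 1/3
```
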

---

## 5) Sampling from a Discrete Distribution

> We simulate actual rolls to see how **samples** compare to the **theoretical** distribution.

![image.png](attachment:acb433ec-64b5-4ac5-b263-39885d978a8b.png)

> Just like we sampled names from a box, we can sample from probability distributions like the ones we've seen. Here, `die` is a DataFrame representing a fair die, whose expected value is 3.5. We sample from it 10 times to simulate 10 rolls. Note that we sample **with replacement** so that every roll comes from the same distribution.

**Significance of the output**
Even though each face is equally likely, a **small sample** can be uneven (e.g., more 1s and 3s). The sample mean (**3.0**) is **not exactly** the theoretical mean (**3.5**)—that variation is normal for small $n$.
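
A sketch of the 10-roll simulation. The seed (`random_state=1`) is an arbitrary choice, so the rolls and the sample mean will not match the slide's 3.0. The last line is an extension not shown in the course: for a non-uniform die, pandas can weight each row by a probability column via `weights=`.

```python
import numpy as np
import pandas as pd

die = pd.DataFrame({"number": [1, 2, 3, 4, 5, 6], "prob": [1/6] * 6})

# 10 rolls = 10 draws WITH replacement (every roll uses the full die)
rolls_10 = die.sample(10, replace=True, random_state=1)
print(rolls_10)

# Sample mean vs. the theoretical mean 3.5; with n = 10 they usually differ
print(np.mean(rolls_10["number"]))

# Extension: weight rows by the 'prob' column (matters when probabilities are unequal)
weighted_rolls = die.sample(10, replace=True, weights="prob", random_state=1)
```
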

### Code: Visualize the 10-roll sample

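```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fair die and 10 simulated rolls (with replacement), so this block runs on its own
die = pd.DataFrame({"number": [1, 2, 3, 4, 5, 6], "prob": [1/6] * 6})
rolls_10 = die.sample(10, replace=True)

# Histogram with bins aligned to die faces
rolls_10["number"].hist(bins=np.linspace(1, 7, 7))
plt.xlabel("Die face")
plt.ylabel("Frequency")
plt.title("Histogram of 10 simulated fair-die rolls")
plt.show()
```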
![image.png](attachment:39fa0cef-b459-4b58-bc5a-b3a9d207767b.png)

**Expected output (description)**
A histogram with 6 bins (for faces 1–6). The bar heights show how many times each face appeared in the 10 rolls. They won’t be equal in such a small sample.

**Line-by-line explanation**

* `rolls_10["number"].hist(...)`
  Draws a histogram of observed faces. `bins=np.linspace(1, 7, 7)` creates the edges 1, 2, 3, 4, 5, 6, 7, i.e. six bins [1, 2), [2, 3), …, [6, 7], one per face.
* `plt.xlabel`, `plt.ylabel`, `plt.title`
  Label and title the chart.
* `plt.show()`
  Renders the plot.

---

## 6) Sample Distribution vs. Theoretical Distribution (Growing Sample Size)

> As we increase the number of rolls, the sample looks more like the theoretical distribution, and the sample mean approaches 3.5.
![image.png](attachment:7c2226cd-8c07-415e-8a14-d9275363f653.png)


**Significance of the output**

* $n=10$ → mean **3.0** (farther from 3.5)
* $n=100$ → mean **3.4** (closer)
* $n=1000$ → mean **3.48** (very close)
  This illustrates the **Law of Large Numbers**.

---

## 7) Law of Large Numbers (LLN)

* **Statement**: As sample size $n \to \infty$, the sample mean $\bar{X}_n$ **approaches** the theoretical mean $E[X]$.
* **Here**: As we roll the die more times, the histogram flattens toward equal bars, and the sample mean approaches **3.5**.
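
A sketch of the LLN in action: simulate increasingly many rolls and watch the sample mean settle near 3.5. Using `random_state=n` for each size is an arbitrary choice for reproducibility, so the exact means will differ from the slide's table.

```python
import pandas as pd

die = pd.DataFrame({"number": [1, 2, 3, 4, 5, 6], "prob": [1/6] * 6})

sample_means = {}
for n in [10, 100, 1000, 100_000]:
    rolls = die.sample(n, replace=True, random_state=n)
    sample_means[n] = rolls["number"].mean()
    print(f"n={n:>6}: sample mean = {sample_means[n]:.3f}  (theory: 3.5)")
```
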

---

## Quick Reference Table (from the slides)

| Sample size | Sample mean |
| ----------: | ----------: |
|          10 |        3.00 |
|         100 |        3.40 |
|        1000 |        3.48 |

The **theoretical** mean remains **3.5**.

---

## Key Takeaways

* **Discrete distributions** assign probabilities to countable outcomes (e.g., die faces).
* A **discrete uniform** distribution has equal probability for each outcome (fair die: each $=1/6$).
* The **expected value** (mean) summarizes the long-run average outcome (fair die: **3.5**).
* **Probability as area/sum** (discrete case): add bar heights (probabilities) over the outcomes in the event.
* **Changing probabilities** (e.g., moving mass from 2 to 3) shifts the expected value (from **3.5** to **\~3.667** here).
* **Sampling** shows randomness: small samples vary; larger samples better reflect the true distribution.
* **Law of Large Numbers**: with more data, the **sample mean** converges to the **theoretical mean**.


### Exercise
Creating a probability distribution
A new restaurant opened a few months ago, and the restaurant's management wants to optimize its seating space based on the size of the groups that come most often. On one night, there are 10 groups of people waiting to be seated at the restaurant, but instead of being called in the order they arrived, they will be called randomly. In this exercise, you'll investigate the probability of groups of different sizes getting picked first. Data on each of the ten groups is contained in the restaurant_groups DataFrame.

Remember that expected value can be calculated by multiplying each possible outcome with its corresponding probability and taking the sum. The restaurant_groups data is available. pandas is loaded as pd, numpy is loaded as np, and matplotlib.pyplot is loaded as plt.

Instructions 1/4

1. Create a histogram of the group_size column of restaurant_groups, setting bins to [2, 3, 4, 5, 6]. Remember to show the plot.

```python
# Create a histogram of restaurant_groups and show plot
restaurant_groups['group_size'].hist(bins=[2,3,4,5,6])
plt.show()
```
![image.png](attachment:d1ae4727-d387-4796-b91d-be18003a2e82.png)

2. Count the number of each `group_size` in `restaurant_groups`, then divide by the number of rows in `restaurant_groups` to calculate the probability of randomly selecting a group of each size. Save as `size_dist`.
   * Reset the index of `size_dist`.
   * Rename the columns of `size_dist` to `group_size` and `prob`.

```python
# Create probability distribution
size_dist = restaurant_groups['group_size'].value_counts() / len(restaurant_groups['group_size'])

# Reset index and rename columns
size_dist = size_dist.reset_index()
size_dist.columns = ['group_size', 'prob']

print(size_dist)

# Output:
#    group_size  prob
# 0           2   0.6
# 1           4   0.2
# 2           6   0.1
# 3           3   0.1
```
3. Calculate the expected value of `size_dist`, which represents the expected group size, by multiplying `group_size` by `prob` and taking the sum.

```python
# Create probability distribution
size_dist = restaurant_groups['group_size'].value_counts() / restaurant_groups.shape[0]
# Reset index and rename columns
size_dist = size_dist.reset_index()
size_dist.columns = ['group_size', 'prob']

# Calculate expected value
expected_value = np.sum(size_dist['group_size'] * size_dist['prob'])
print(expected_value)

# Output:
# 2.9000000000000004
```
4. Calculate the probability of randomly picking a group of 4 or more people by subsetting for groups of size 4 or more and summing the probabilities of selecting those groups.

```python
# Create probability distribution
size_dist = restaurant_groups['group_size'].value_counts() / restaurant_groups.shape[0]
# Reset index and rename columns
size_dist = size_dist.reset_index()
size_dist.columns = ['group_size', 'prob']

# Expected value
expected_value = np.sum(size_dist['group_size'] * size_dist['prob'])

# Subset groups of size 4 or more
groups_4_or_more = size_dist[size_dist['group_size'] >= 4]

# Sum the probabilities of groups_4_or_more
prob_4_or_more = groups_4_or_more['prob'].sum()
print(prob_4_or_more)

# Output:
# 0.30000000000000004
```
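
The `restaurant_groups` data isn't reproduced in these notes. The sketch below uses a hypothetical stand-in with ten groups whose sizes match the printed distribution (six groups of 2, two of 4, one of 3, one of 6); the real row order and any other columns are unknown. It reproduces both answers, and the trailing `...0004` digits in the course output turn out to be floating-point rounding rather than a different result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: 10 groups with sizes matching the distribution printed above
restaurant_groups = pd.DataFrame({"group_size": [2, 2, 2, 2, 2, 2, 4, 4, 3, 6]})

size_dist = restaurant_groups["group_size"].value_counts() / len(restaurant_groups)
size_dist = size_dist.reset_index()
size_dist.columns = ["group_size", "prob"]

expected_value = np.sum(size_dist["group_size"] * size_dist["prob"])
prob_4_or_more = size_dist.loc[size_dist["group_size"] >= 4, "prob"].sum()

print(round(expected_value, 2))  # 2.9
print(round(prob_4_or_more, 2))  # 0.3
```
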


# Introduction to Statistics in Python: Continuous Distributions

## 1. Continuous Distributions

- **Discrete distributions** model countable events (e.g., number of heads in coin tosses).
- **Continuous distributions** model variables that can take *any* value in a range (e.g., waiting time for a bus).
- For continuous variables, there are infinitely many possible values.

---

## 2. Example: Waiting for the Bus

Suppose:
- The city bus arrives *exactly* every 12 minutes.
- You arrive at a random time.
- Your possible waiting time ranges from **0** minutes (you just catch the bus) to **12** minutes (you just missed it).

![image.png](attachment:605b5ea2-c576-4606-a265-32e4f495851e.png)

---

## 3. The Continuous Uniform Distribution

- For this bus scenario, **any wait time between 0 and 12 minutes is equally likely**.
- This is called a **continuous uniform distribution**.
    - All intervals of the same length within [0, 12] are equally probable.
- Unlike discrete, we cannot list every possible wait time (infinite possibilities!).
- We represent the distribution as a **flat line** (constant probability density).

---

## 4. Probability as Area

- In continuous distributions, **probability = area** under the curve (or line).
- For the uniform distribution over [0, 12]:
    - **Height** of the line: `1/12` (since total area = 1).
    - **Width**: the interval we're interested in.
    - **Area** for interval `[a, b]`: `(b - a) * (1/12)`.

---

### Example: Probability of Waiting Between 4 and 7 Minutes

- **Width**: `7 - 4 = 3`
- **Height**: `1/12`
- **Area (probability)**: `3 * (1/12) = 1/4 = 0.25` (**25%**)
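
The same width-times-height arithmetic as a short sketch, checked against `scipy.stats.uniform` (whose second and third positional arguments are `loc` and `scale`, i.e. the lower bound and the width).

```python
from scipy.stats import uniform

lower, upper = 0, 12   # bus wait times are uniform on [0, 12]
a, b = 4, 7            # interval of interest

# Area of the rectangle: width (b - a) times height 1 / (upper - lower)
area_by_hand = (b - a) / (upper - lower)

# Same probability from the CDF: P(X <= b) - P(X <= a)
area_scipy = uniform.cdf(b, lower, upper - lower) - uniform.cdf(a, lower, upper - lower)

print(area_by_hand)                            # 0.25
print(abs(area_by_hand - area_scipy) < 1e-12)  # True
```
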

---

## 5. Calculating Probabilities in Python

We can use `scipy.stats.uniform` to calculate probabilities for the uniform distribution.

### **Code Example:** Probability of Waiting ≤ 7 Minutes

```python
from scipy.stats import uniform

# Cumulative probability of waiting <= 7 minutes in [0, 12]
prob_7_or_less = uniform.cdf(7, 0, 12)
print(prob_7_or_less)
```

#### **Output:**
```
0.5833333333333334
```

#### **Explanation (Line by Line):**

1. `from scipy.stats import uniform`
    - **What:** Imports the `uniform` distribution from `scipy.stats`.
    - **Why:** Allows us to use methods for the uniform distribution (e.g., `cdf`, `rvs`).
    - **Result:** We can now access uniform distribution functions.
2. `prob_7_or_less = uniform.cdf(7, 0, 12)`
    - **What:** Calculates the cumulative distribution function (CDF) up to 7.
    - **Why:** The CDF gives the probability of a random variable being less than or equal to 7.
    - **Parameters:**
        - `7`: The value up to which to compute the probability.
        - `0`: Lower bound of distribution.
        - `12`: Width (scale) of the distribution.
    - **Result:** Returns the probability of waiting ≤ 7 minutes.
3. `print(prob_7_or_less)`
    - **What:** Prints the calculated probability.
    - **Why:** To view the result.
    - **Result:** Output is `0.583...` or about **58%**.

**Significance:**  
There is about a **58% chance** you will wait **7 minutes or less**.

---

### **Code Example:** Probability of Waiting Between 4 and 7 Minutes

To find the probability of waiting **between 4 and 7 minutes**, subtract the probability of waiting ≤ 4 minutes from that of waiting ≤ 7 minutes.

```python
from scipy.stats import uniform

prob_between_4_and_7 = uniform.cdf(7, 0, 12) - uniform.cdf(4, 0, 12)
print(prob_between_4_and_7)
```

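#### **Output:**
```
0.25000000000000006
```

This is 0.25 up to floating-point rounding: 7/12 and 4/12 cannot be stored exactly as binary floats, so their difference picks up a tiny error.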

#### **Explanation (Line by Line):**

1. `from scipy.stats import uniform`
    - **As above.**
2. `prob_between_4_and_7 = uniform.cdf(7, 0, 12) - uniform.cdf(4, 0, 12)`
    - **What:** Computes the probability of being between 4 and 7.
    - **Why:** Subtracts cumulative probability up to 4 from cumulative up to 7.
    - **Result:** Gives the probability for the interval [4, 7].
3. `print(prob_between_4_and_7)`
    - **As above.**

**Significance:**  
There is a **25% chance** of waiting **between 4 and 7 minutes**.

---

## 6. Total Area and Certainty

- **Total area under the uniform distribution** (from 0 to 12) = 1.
- This means **certainty**: you *must* wait some time between 0 and 12 minutes.

### **Calculation:**

- Probability of waiting between 0 and 12 minutes:
    - `(12 - 0) * (1/12) = 12 * (1/12) = 1` (**100%**)
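
A sketch confirming both facts with `scipy.stats.uniform`: the density (height) is $\frac{1}{12}$ inside $[0, 12]$ and 0 outside, and the probability over the whole interval is 1.

```python
from scipy.stats import uniform

# Height of the Uniform(0, 12) density inside and outside the interval
print(uniform.pdf(5, 0, 12))    # 0.0833... (= 1/12)
print(uniform.pdf(13, 0, 12))   # 0.0

# Total area from 0 to 12: the whole distribution, i.e. certainty
total = uniform.cdf(12, 0, 12) - uniform.cdf(0, 0, 12)
print(total)  # 1.0
```
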

---

## 7. Generating Random Numbers from a Uniform Distribution

You can generate random samples using `uniform.rvs()`.

### **Code Example:** Generate 10 Random Wait Times in [0, 5]

```python
from scipy.stats import uniform

random_waits = uniform.rvs(0, 5, size=10)
print(random_waits)
```

#### **Output:**  
(Example output — your numbers will differ due to randomness.)

```
[1.89740094 4.70673196 0.33224683 1.0137103  2.31641255 3.49969897
 0.29688598 0.92057234 4.71086658 1.56815855]
```

#### **Explanation (Line by Line):**

1. `from scipy.stats import uniform`
    - **As above.**
2. `random_waits = uniform.rvs(0, 5, size=10)`
    - **What:** Draws 10 random samples from a uniform distribution over [0, 5].
    - **Parameters:**
        - `0`: Lower bound.
        - `5`: Width (upper bound is 0 + 5 = 5).
        - `size=10`: Number of random values.
    - **Why:** To simulate random wait times or generate synthetic data.
    - **Result:** `random_waits` is an array of 10 numbers between 0 and 5.
3. `print(random_waits)`
    - **As above.**

**Significance:**  
Simulating data helps visualize and test statistical models.
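
A sketch tying simulation back to the bus example: draw many wait times on $[0, 12]$ with a fixed `random_state` (42 is arbitrary) and compare simulated results with the exact values computed earlier. The mean should be near 6, the midpoint of the interval, and the share of waits between 4 and 7 minutes should be near 0.25.

```python
import numpy as np
from scipy.stats import uniform

# 10,000 simulated bus waits on [0, 12]; random_state makes the run reproducible
waits = uniform.rvs(0, 12, size=10_000, random_state=42)

print(waits.min() >= 0, waits.max() <= 12)  # True True
print(round(waits.mean(), 2))               # close to the theoretical mean 6

# Simulated share of waits between 4 and 7 minutes vs. the exact 0.25
share_4_to_7 = np.mean((waits >= 4) & (waits <= 7))
print(share_4_to_7)
```
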

---

## 8. Other Continuous Distributions

- Not all continuous distributions are uniform.
- **Other shapes:** Some values may be more probable than others (e.g., normal, exponential).
- **Key property:** The **total area under any continuous probability distribution is always 1** (represents certainty).
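
The "total area is 1" rule can be checked numerically, as a sketch: integrate each density over its full support with `scipy.integrate.quad`. The standard normal and the exponential with scale 1 are chosen here only as examples.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon, norm, uniform

# Integrate each density over its whole support; each total should be (numerically) 1
areas = {
    "uniform(0, 12)": quad(lambda x: uniform.pdf(x, 0, 12), 0, 12)[0],
    "standard normal": quad(norm.pdf, -np.inf, np.inf)[0],
    "exponential (scale 1)": quad(expon.pdf, 0, np.inf)[0],
}
for name, area in areas.items():
    print(f"{name}: {area:.6f}")
```
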

---

### Special Distributions:

- **Normal distribution:** Bell-shaped curve, common in nature (e.g., heights, test scores).
- **Exponential distribution:** Models waiting times for events that occur continuously and independently (e.g., radioactive decay, time between arrivals).

---

## 9. Practice

- Practice is essential for mastering continuous distributions.
- Try computing probabilities and generating samples using `scipy.stats`.

---

# Summary

- **Continuous uniform distribution** models equally likely outcomes over an interval.
- **Probability** is calculated as the **area under the curve** (rectangle for uniform).
- **Python tools**: Use `scipy.stats.uniform` for calculations and simulations.
- **Total area = 1**: Certainty that some value in the range will occur.
- **Other distributions** (normal, exponential) have different shapes but the same area rule.

---

**End of notes.**