### Practice Problems – Random Numbers

This section contains a small set of **exam-style problems** on:

* pseudo-random number generation with `random`
* seeding and reproducibility
* empirical frequency distributions
* simple Monte-Carlo simulations

Each problem is followed by a **worked solution** in code.


In [1]:
import random
import math
from collections import Counter
from typing import Any, Dict, Hashable, Iterable, Mapping, Tuple


In [2]:
def freq_distribution(data: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Return a frequency distribution (histogram) for the given finite iterable."""
    return dict(Counter(data))


def relative_freq(freq_dist: Mapping[Hashable, int]) -> Dict[Hashable, float]:
    """Convert a frequency distribution to relative frequencies in percent."""
    total = sum(freq_dist.values())
    if total == 0:
        return {k: 0.0 for k in freq_dist}
    return {k: v / total * 100 for k, v in freq_dist.items()}


def chart_freq(data: Iterable[Tuple[Any, float]], bar_char: str = "*") -> None:
    """Print a horizontal bar chart for (key, percentage) pairs.

    Percentages are rounded to the nearest integer.
    """
    data = list(data)
    if not data:
        print("No data to chart.")
        return

    pad = max(len(str(key)) for key, _ in data)
    for key, value in data:
        bar = bar_char * round(value)
        print(f"{str(key).rjust(pad)}| {bar}")


#### Problem 1 – Dice Sum Distribution

Simulate rolling **two fair six-sided dice**.

1. Write a function `dice_sum_distribution(num_rolls: int, seed: int | None = None)` that:
   * optionally sets the seed (if `seed` is not `None`);
   * performs `num_rolls` simulations of rolling two dice using `random.randint(1, 6)`;
   * returns a **relative frequency distribution** (in percent) of the sums (2–12).

2. Call the function with `num_rolls = 50_000` and `seed = 0`, and display the result using
   `chart_freq`, with the sums ordered from 2 to 12.

3. Based on the output, comment on whether the distribution looks reasonable compared to
   the theoretical result that the sum `7` is the most likely outcome.


##### Solution – Problem 1


In [3]:
def dice_sum_distribution(num_rolls: int, seed: int | None = None) -> Dict[int, float]:
    """Simulate rolling two six-sided dice and return relative frequencies of sums.

    Args:
        num_rolls: Number of simulated rolls (must be positive).
        seed: Optional seed for reproducibility.

    Returns:
        A dict mapping each possible sum (2–12) to its relative frequency in percent.
    """
    if num_rolls <= 0:
        raise ValueError("num_rolls must be a positive integer.")

    if seed is not None:
        random.seed(seed)

    sums = []
    for _ in range(num_rolls):
        die1 = random.randint(1, 6)
        die2 = random.randint(1, 6)
        sums.append(die1 + die2)

    freq = freq_distribution(sums)
    rel = relative_freq(freq)

    # Ensure all sums from 2 to 12 appear explicitly (even if 0%).
    for s in range(2, 13):
        rel.setdefault(s, 0.0)

    return rel


# Run the simulation and chart it
rel_sums = dice_sum_distribution(50_000, seed=0)
sorted_items = sorted(rel_sums.items(), key=lambda item: item[0])
chart_freq(sorted_items)


 2| ***
 3| ******
 4| ********
 5| ***********
 6| **************
 7| *****************
 8| **************
 9| ***********
10| ********
11| ******
12| ***


You should see that the bar for sum `7` is the longest, the bars for sums `6` and `8` are the
next longest, and so on, matching the well-known triangular distribution of two dice.
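
For a quantitative comparison, the exact percentages can be obtained by enumerating all 36 equally likely outcomes. The following is a minimal sketch; the helper name `theoretical_dice_sum_percentages` is hypothetical and not part of the problem statement.

```python
from collections import Counter
from itertools import product
from typing import Dict


def theoretical_dice_sum_percentages() -> Dict[int, float]:
    """Exact percentages for the sum of two fair six-sided dice."""
    # Enumerate all 36 equally likely (die1, die2) outcomes and count each sum.
    counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
    return {s: counts[s] / 36 * 100 for s in range(2, 13)}


for s, pct in theoretical_dice_sum_percentages().items():
    print(f"{s:2d}: {pct:5.2f}%")
```

Rounded to whole percent, the exact values for sums 2 to 7 are 3, 6, 8, 11, 14 and 17, which agrees with the bar lengths in the chart above.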


#### Problem 2 – Biased Coin Simulation

Consider a **biased coin** that lands on heads with probability `p` (and tails with probability
`1 - p`).

1. Implement a function  
   `simulate_coin_flips(n: int, p: float, seed: int | None = None) -> Dict[str, float]` that
   uses `random.random()` to simulate `n` flips and returns a **relative frequency** dictionary
   like `{"H": ..., "T": ...}` (percentages).

   * Validate that `0 <= p <= 1` and `n > 0`.
   * Use the convention: if `random.random() < p`, treat it as heads; otherwise, tails.

2. Use your function to simulate:
   * `n = 100_000`, `p = 0.3`, `seed = 0`.
   * Print the resulting relative frequencies.

3. Compare the empirical frequency of heads with the theoretical value of `30%`.


##### Solution – Problem 2


In [4]:
def simulate_coin_flips(n: int, p: float, seed: int | None = None) -> Dict[str, float]:
    """Simulate flipping a biased coin n times and return relative frequencies.

    Args:
        n: Number of flips (must be positive).
        p: Probability of heads (0 <= p <= 1).
        seed: Optional seed for reproducibility.

    Returns:
        A dict with keys "H" and "T" mapping to relative frequencies in percent.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer.")
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must be between 0 and 1 (inclusive).")

    if seed is not None:
        random.seed(seed)

    outcomes = []
    for _ in range(n):
        if random.random() < p:
            outcomes.append("H")
        else:
            outcomes.append("T")

    freq = freq_distribution(outcomes)
    rel = relative_freq(freq)

    # Ensure both keys are present
    rel.setdefault("H", 0.0)
    rel.setdefault("T", 0.0)

    return rel


# Run the experiment and inspect the result
coin_rel = simulate_coin_flips(100_000, p=0.3, seed=0)
print(coin_rel)


{'T': 69.852, 'H': 30.148000000000003}


The relative frequency for heads should be **close to 30%** (30.148% in the run above), while tails
will be close to **70%**. The gap is random sampling error, which tends to shrink as `n` grows.
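
The size of the sampling error can be estimated with the standard error of a proportion, $\sqrt{p(1-p)/n}$. The sketch below assumes independent flips; the helper name `proportion_standard_error` is hypothetical.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error, in percentage points, of the observed share of heads."""
    # For n independent flips the observed proportion has variance p * (1 - p) / n.
    return math.sqrt(p * (1 - p) / n) * 100


for n in (1_000, 100_000, 10_000_000):
    se = proportion_standard_error(0.3, n)
    print(f"n={n:>10,d} -> standard error ~ {se:.3f} percentage points")
```

For `n = 100_000` and `p = 0.3` this gives roughly 0.145 percentage points, so the observed deviation of 0.148 points is about one standard error, which is an unremarkable result.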


#### Problem 3 – Reproducibility With Seeds

You are asked to show concretely how **seeding** controls reproducibility.

1. Implement a function  
   `generate_uniform_samples(n: int, seed: int | None = None) -> list[float]`  
   that returns a list of `n` pseudo-random floats from `[0.0, 1.0)` using `random.random()`.

2. Write a small experiment that:
   * Generates a list `a` with `seed = 42`.
   * Generates a list `b` with `seed = 42`.
   * Generates a list `c` with `seed = 43`.

3. Verify and **print** whether:
   * `a` and `b` are equal (same elements and order),
   * `a` and `c` are equal or not.

4. Explain the result.


##### Solution – Problem 3


In [5]:
from typing import List, Optional


def generate_uniform_samples(n: int, seed: Optional[int] = None) -> List[float]:
    """Generate n pseudo-random floats in [0.0, 1.0).

    If a seed is provided, the sequence is reproducible.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer.")

    if seed is not None:
        random.seed(seed)

    return [random.random() for _ in range(n)]


# Reproducibility experiment
a = generate_uniform_samples(5, seed=42)
b = generate_uniform_samples(5, seed=42)
c = generate_uniform_samples(5, seed=43)

print("a:", a)
print("b:", b)
print("c:", c)
print("a == b ?", a == b)
print("a == c ?", a == c)


a: [0.6394267984578837, 0.025010755222666936, 0.27502931836911926, 0.22321073814882275, 0.7364712141640124]
b: [0.6394267984578837, 0.025010755222666936, 0.27502931836911926, 0.22321073814882275, 0.7364712141640124]
c: [0.038551839337380045, 0.6962243226370528, 0.14393322139536102, 0.46253225482908755, 0.671646764117767]
a == b ? True
a == c ? False


Because the same seed (`42`) is used for generating `a` and `b`, they are **identical** lists.
Changing the seed to `43` produces a **different** sequence, so `c` does not match `a`.

This illustrates that Python's PRNG is **deterministic** given a seed: same seed, same sequence.
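
Note that `random.seed` resets the module-level generator, which is shared by every caller of the `random` module. One alternative, sketched here under the assumption that isolated generators are acceptable for the exercise, is to create separate `random.Random` instances; the variable names are illustrative only.

```python
import random

# Each Random instance keeps its own state, so seeding one does not affect the others.
rng_a = random.Random(42)
rng_b = random.Random(42)
rng_c = random.Random(43)

a = [rng_a.random() for _ in range(5)]
b = [rng_b.random() for _ in range(5)]
c = [rng_c.random() for _ in range(5)]

print("a == b ?", a == b)
print("a == c ?", a == c)
```

The comparison results are the same as in the experiment above, but the global generator's state is left untouched.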


#### Problem 4 – Comparing Uniform and Normal Distributions

We want to compare visually how a **uniform** distribution differs from a **normal** distribution
using the ASCII charting utility `chart_freq`.

1. Implement a function

   ```python
   def analyze_uniform_ints(n: int, a: int, b: int, seed: int | None = None) -> None:
       ...
   ```

   that:
   * optionally sets the seed,
   * generates `n` random integers in `[a, b]` using `random.randint`,
   * computes relative frequencies,
   * sorts them by the key, and
   * displays them with `chart_freq`.

2. Implement a function

   ```python
   def analyze_normal_scaled(n: int, mu: float, sigma: float, seed: int | None = None) -> None:
       ...
   ```

   that:
   * optionally sets the seed,
   * generates `n` samples from a normal distribution using `random.gauss(mu, sigma)`,
   * scales by `10` and rounds to form integer buckets,
   * computes relative frequencies,
   * discards buckets whose rounded percentage is `0`,
   * sorts and charts them with `chart_freq`.

3. Use both functions with:

   * `analyze_uniform_ints(20_000, 1, 10, seed=0)`
   * `analyze_normal_scaled(20_000, 0.0, 1.0, seed=0)`

   and describe the qualitative difference between the two ASCII histograms.


##### Solution – Problem 4


In [6]:
def analyze_uniform_ints(n: int, a: int, b: int, seed: int | None = None) -> None:
    """Generate n random integers in [a, b] and chart their relative frequencies."""
    if n <= 0:
        raise ValueError("n must be a positive integer.")
    if a > b:
        raise ValueError("a must be less than or equal to b.")

    if seed is not None:
        random.seed(seed)

    data = [random.randint(a, b) for _ in range(n)]
    freq = freq_distribution(data)
    rel = relative_freq(freq)

    sorted_items = sorted(rel.items(), key=lambda item: item[0])
    chart_freq(sorted_items)


def analyze_normal_scaled(n: int, mu: float, sigma: float, seed: int | None = None) -> None:
    """Generate n N(mu, sigma) samples, bucket them, and chart relative frequencies."""
    if n <= 0:
        raise ValueError("n must be a positive integer.")
    if sigma <= 0:
        raise ValueError("sigma must be positive.")

    if seed is not None:
        random.seed(seed)

    # Scale by 10 and round to obtain integer buckets
    data = [round(10 * random.gauss(mu, sigma)) for _ in range(n)]
    freq = freq_distribution(data)
    rel = relative_freq(freq)

    # Keep only buckets that show up in the chart
    filtered = {k: v for k, v in rel.items() if round(v) > 0}
    sorted_items = sorted(filtered.items(), key=lambda item: item[0])
    chart_freq(sorted_items)


print("Uniform integers in [1, 10]:")
analyze_uniform_ints(20_000, 1, 10, seed=0)

print("\nApproximate normal distribution (mu=0, sigma=1):")
analyze_normal_scaled(20_000, 0.0, 1.0, seed=0)


Uniform integers in [1, 10]:
 1| **********
 2| **********
 3| **********
 4| **********
 5| **********
 6| **********
 7| **********
 8| **********
 9| **********
10| **********

Approximate normal distribution (mu=0, sigma=1):
-19| *
-18| *
-17| *
-16| *
-15| *
-14| **
-13| **
-12| **
-11| **
-10| **
 -9| ***
 -8| ***
 -7| ***
 -6| ***
 -5| ***
 -4| ****
 -3| ****
 -2| ****
 -1| ****
  0| ****
  1| ****
  2| ****
  3| ****
  4| ***
  5| ****
  6| ***
  7| ***
  8| ***
  9| ***
 10| **
 11| **
 12| **
 13| **
 14| *
 15| *
 16| *
 17| *
 18| *
 19| *
 20| *


In the **uniform** case, the bars for keys `1` through `10` should all have roughly the
**same length** (about 10% each), reflecting that each integer is equally likely.

In the **normal** case, each key `k` stands for samples near `k / 10`, and the bars form a
**bell-shaped curve**: values near `0` (the mean) are the most frequent, and the frequencies
taper off roughly symmetrically as you move away from `0` in either direction. This visually
confirms the difference between uniform and normal distributions.
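
The expected share of each bucket can also be computed from the normal cumulative distribution function, which explains why the longest bars are only about four characters long. This is a sketch under the assumption that bucket `k` collects samples in roughly `[(k - 0.5) / 10, (k + 0.5) / 10)`; the helper names `normal_cdf` and `expected_bucket_percent` are hypothetical.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_bucket_percent(k: int, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Expected percentage of samples x for which round(10 * x) == k."""
    # Probability mass between the lower and upper edge of bucket k.
    lo, hi = (k - 0.5) / 10, (k + 0.5) / 10
    return (normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)) * 100


for k in (0, 10, 20, 30):
    print(f"bucket {k:3d}: {expected_bucket_percent(k):.2f}%")
```

With `mu = 0` and `sigma = 1`, the central bucket holds about 4% of the samples, while buckets beyond roughly two standard deviations (keys near `±20`) hold around half a percent or less, so they sit at the edge of what survives the rounding filter.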


#### Problem 5 – Monte Carlo Estimation of $\pi$

We can estimate the value of $\pi$ using a simple Monte Carlo method.
Consider the unit square $[0, 1] \times [0, 1]$ and the quarter of a unit circle
$x^2 + y^2 \le 1$ inside it.

1. Implement a function  
   `estimate_pi(num_samples: int, seed: int | None = None) -> float` that:
   * optionally sets the seed,
   * generates `num_samples` points `(x, y)` with `x, y` drawn from `random.random()`,
   * counts how many points fall inside the quarter circle, i.e. satisfy `x*x + y*y <= 1`,
   * returns the estimate `4 * inside / num_samples`.

   Validate that `num_samples > 0`.

2. For `seed = 0`, compute and print the estimate and the absolute error `|estimate - math.pi|`
   for the following values of `num_samples`:

   * `1_000`
   * `10_000`
   * `100_000`

3. Observe how the error behaves as `num_samples` increases.


##### Solution – Problem 5


In [7]:
def estimate_pi(num_samples: int, seed: int | None = None) -> float:
    """Estimate pi using Monte Carlo sampling in the unit square.

    Args:
        num_samples: Number of random points to generate (must be positive).
        seed: Optional seed for reproducibility.

    Returns:
        A floating-point estimate of pi.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be a positive integer.")

    if seed is not None:
        random.seed(seed)

    inside = 0
    for _ in range(num_samples):
        x = random.random()
        y = random.random()
        if x * x + y * y <= 1.0:
            inside += 1

    return 4.0 * inside / num_samples


for samples in (1_000, 10_000, 100_000):
    est = estimate_pi(samples, seed=0)
    error = abs(est - math.pi)
    print(f"num_samples={samples:7d} -> estimate={est:.6f}, error={error:.6f}")


num_samples=   1000 -> estimate=3.128000, error=0.013593
num_samples=  10000 -> estimate=3.135200, error=0.006393
num_samples= 100000 -> estimate=3.148440, error=0.006847


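As `num_samples` grows, the Monte Carlo estimate converges slowly towards the true value of
$\pi$, but the absolute error does not decrease monotonically: in the run above, the error at
`100_000` samples (0.006847) is slightly larger than at `10_000` samples (0.006393). What
shrinks is the typical size of the error, which scales in proportion to $1/\sqrt{n}$ for $n$
samples, so a tenfold increase in samples only cuts it by a factor of about 3.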
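
The typical size of the error can be quantified with the binomial standard error of the estimator. The sketch below uses the true value of $\pi$ only to characterize the error, not to estimate it; the helper name `pi_estimate_standard_error` is hypothetical.

```python
import math


def pi_estimate_standard_error(num_samples: int) -> float:
    """Theoretical standard error of the estimate 4 * inside / num_samples."""
    # Probability that a uniform point in the unit square lands inside the quarter circle.
    p = math.pi / 4
    # inside / num_samples has variance p * (1 - p) / num_samples; the estimate scales it by 4.
    return 4 * math.sqrt(p * (1 - p) / num_samples)


for samples in (1_000, 10_000, 100_000):
    se = pi_estimate_standard_error(samples)
    print(f"num_samples={samples:7d} -> standard error ~ {se:.6f}")
```

The three observed errors above are each within roughly one and a half standard errors, so none of them is unusual for its sample size.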


#### Problem 6 – Law of Large Numbers for Uniform(0, 1)

The **Law of Large Numbers** says that the empirical mean of i.i.d. random variables converges
to the true expected value as the number of samples grows.

For a `random.random()` sample, the theoretical expected value is `0.5`.

1. Implement a function  
   `running_means_uniform(n: int, seed: int | None = None) -> list[float]` that:

   * optionally sets the seed,
   * draws `n` samples from `random.random()`,
   * computes and stores the running (cumulative) mean after each sample, and
   * returns the list of running means of length `n`.

   Validate that `n > 0`.

2. With `seed = 0` and `n = 20_000`, compute the running means and print:

   * the first 5 running means,
   * the last 5 running means.

3. Comment on how the running mean behaves over time compared to the theoretical value `0.5`.


##### Solution – Problem 6


In [8]:
def running_means_uniform(n: int, seed: int | None = None) -> List[float]:
    """Return the running means of n samples from random.random().

    Args:
        n: Number of samples (must be positive).
        seed: Optional seed for reproducibility.

    Returns:
        A list of length n where the i-th element is the mean of the first i+1 samples.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer.")

    if seed is not None:
        random.seed(seed)

    running_means: List[float] = []
    total = 0.0
    for i in range(1, n + 1):
        x = random.random()
        total += x
        running_means.append(total / i)

    return running_means


means = running_means_uniform(20_000, seed=0)
print("First 5 running means:", means[:5])
print("Last 5 running means: ", means[-5:])
print("Theoretical mean (expected value): 0.5")


First 5 running means: [0.8444218515250481, 0.8011881272326753, 0.6743159450987318, 0.5704661463972898, 0.5586278613915535]
Last 5 running means:  [0.5004552705966715, 0.5004339450922594, 0.5004286942515742, 0.5004191187107433, 0.5004317012192231]
Theoretical mean (expected value): 0.5


Typically, the first few running means may be quite far from `0.5` (here the first one is about
`0.844`), but as the number of samples increases, the running mean tends to **stabilize near 0.5**
(about `0.5004` after 20,000 samples), providing an empirical illustration of the Law of Large
Numbers.
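
How close to `0.5` the running mean should be after `n` samples can be estimated from the variance of Uniform(0, 1), which is $1/12$. This is a minimal sketch assuming independent samples; the helper name `uniform_mean_standard_error` is hypothetical.

```python
import math


def uniform_mean_standard_error(n: int) -> float:
    """Standard deviation of the mean of n independent Uniform(0, 1) samples."""
    # Var(Uniform(0, 1)) = 1/12, so the mean of n samples has variance 1 / (12 * n).
    return math.sqrt(1.0 / (12.0 * n))


for n in (5, 100, 20_000):
    se = uniform_mean_standard_error(n)
    print(f"n={n:6d} -> typical deviation from 0.5 ~ {se:.4f}")
```

For `n = 20_000` the typical deviation is about 0.002, so the final running mean above, which is off by roughly 0.0004, lies well within the expected range.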
