---
title: "Sampling From Categorical Distribution Using Uniform Distribution"
author: "Kirtan Gangani"
date: "July 14, 2025" 
categories: [Statistics, Probability]
format:
  html:
    toc: true
    code-fold: false
    code-copy: true
jupyter: python3
image: "./images/random-sampling.png"
---

# Introduction

Have you ever wondered how computer programs make "random" choices with specific probabilities? Whether it's deciding the weather in a simulation or picking a random item from a loot table in a game, the underlying mechanism often involves sampling from a categorical distribution.

A categorical distribution simply defines the probability of selecting each item from a finite set of categories. For example, if we were predicting the weather today in Gandhinagar, Gujarat, India, we might have the following (simplified) probabilities:

* Sunny: 65%
* Rainy: 25%
* Cloudy: 10%

So, how do we get a computer to make a "random" choice that respects these probabilities? The answer lies in the power of the uniform distribution.

# Understanding the Distributions

Before we dive into the "how", let's quickly look at the two distributions involved:

## Categorical distribution

This describes the probabilities of a discrete random variable taking on one of a fixed set of categories (e.g., "Sunny", "Cloudy", "Rainy"). Each category has a specific probability, and these probabilities must sum to 1.

Example:

* Sunny: 0.65 probability
* Rainy: 0.25 probability
* Cloudy: 0.1 probability

## Uniform distribution

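This is typically the continuous uniform distribution over the interval [0, 1]. Every value in that range is equally likely, which more precisely means that the probability of a draw landing in any subinterval equals that subinterval's length. For example, a draw lands in [0.2, 0.5) with probability 0.3.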
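A quick empirical check of this property, sketched with Python's `random.random()`, which returns floats in the half-open interval [0.0, 1.0). Two subintervals of the same length (0.2) should each catch roughly 20% of the draws. The seed and the number of draws are arbitrary choices for the illustration.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 100_000
draws = [random.random() for _ in range(n)]

# Every draw lies in [0, 1): random.random() can return 0.0 but never 1.0
print(min(draws) >= 0.0 and max(draws) < 1.0)

# Two intervals of the same length (0.2) should each catch about 20% of draws
in_low = sum(0.0 <= x < 0.2 for x in draws) / n
in_high = sum(0.7 <= x < 0.9 for x in draws) / n
print(round(in_low, 2), round(in_high, 2))
```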

# How does the sampling work?

How do we bridge the gap between this even spread of numbers and the uneven probabilities of our categories? The key is to divide our 0-to-1 range into segments, where the length of each segment equals the probability of a category. To do this, we calculate the cumulative probability for each category.

## Creating the cumulative probabilities

This is simply the running total of the probabilities. For our Gandhinagar weather example:

* Sunny: 65% (Cumulative: 65%)  -> Interval: [0.00, 0.65)
* Rainy: 25% (Cumulative: 65% + 25% = 90%) -> Interval: [0.65, 0.90)
* Cloudy: 10% (Cumulative: 90% + 10% = 100%) -> Interval: [0.90, 1.00]
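The running total above can be computed with `itertools.accumulate` from the standard library. This sketch also derives each category's lower bound from the previous total, which reproduces the intervals listed above. Note that the last interval is printed as half-open here, which is harmless in practice because `random.random()` never returns exactly 1.0.

```python
from itertools import accumulate

samples = ["Sunny", "Rainy", "Cloudy"]
probs = [0.65, 0.25, 0.10]

# Running total of the probabilities: [0.65, 0.9, 1.0] (up to float rounding)
cumulative = list(accumulate(probs))

# Each category owns the interval [previous total, current total)
lower_bounds = [0.0] + cumulative[:-1]
for name, lo, hi in zip(samples, lower_bounds, cumulative):
    print(f"{name:<7} [{lo:.2f}, {hi:.2f})")
```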

## Sampling process

Now, we generate a single random number between 0 and 1 from our uniform distribution. The interval in which this random number falls directly corresponds to the category we select.

* If our random number is between 0.00 and 0.65 (exclusive of 0.65), we choose "Sunny."
* If it's between 0.65 and 0.90 (exclusive of 0.90), we choose "Rainy."
* If it's between 0.90 and 1.00 (inclusive), we choose "Cloudy."
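The rules above can be sketched as a lookup with the standard `bisect` module. This swaps the linear scan used later in the post for a binary search: `bisect.bisect_right` returns the index of the first cumulative value strictly greater than `u`, which is exactly the "below the upper bound" test in the list. The function name `category_for` is hypothetical, and the clamp on the index is an assumed safeguard against float rounding leaving the final total slightly below 1.0.

```python
import bisect
from itertools import accumulate

samples = ["Sunny", "Rainy", "Cloudy"]
cumulative = list(accumulate([0.65, 0.25, 0.10]))  # [0.65, 0.9, 1.0]

def category_for(u):
    """Hypothetical helper: map u in [0, 1) to the category whose interval contains it."""
    # Index of the first cumulative value strictly greater than u
    index = bisect.bisect_right(cumulative, u)
    # Clamp in case rounding left the last total a hair below 1.0
    return samples[min(index, len(samples) - 1)]

for u in [0.0, 0.3, 0.65, 0.74, 0.95]:
    print(u, "->", category_for(u))
```

Because `bisect_right` places a value equal to 0.65 after it, a draw of exactly 0.65 goes to "Rainy", matching the half-open interval [0.65, 0.90).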

Because the draw is uniform, the probability that it lands in a particular interval equals that interval's length. We built each interval so that its length is exactly the category's probability (for Rainy, 0.90 - 0.65 = 0.25), so each category is selected with exactly the probability we assigned to it.
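The same argument in symbols, using notation introduced here for convenience: $U$ is the uniform draw, $p_k$ is the probability of the $k$-th category, and $F_k$ is its cumulative probability, with $F_0 = 0$.

$$
F_k = \sum_{i=1}^{k} p_i, \qquad
P(\text{category } k \text{ is chosen}) = P(F_{k-1} \le U < F_k) = F_k - F_{k-1} = p_k
$$

For the weather example, $P(\text{Rainy}) = F_2 - F_1 = 0.90 - 0.65 = 0.25$.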

# Code

Below is the Python code that combines all of these concepts. It first calculates the cumulative probabilities, then draws a single random number with Python's `random` module (our uniform distribution), and finally maps that number to one of the weather categories.

```python
import random

samples = ['Sunny', 'Rainy', 'Cloudy']
prob    = [0.65,     0.25,    0.1]

# Build the cumulative distribution: a running total of the probabilities
number, cumsum = 0, []
for p in prob:
    number += p
    cumsum.append(number)

# Draw one number from the uniform distribution on [0, 1)
drawn_number = random.random()
print(f"Random number generated from 0 to 1: {drawn_number:.2f}")
print(f"Cumulative distribution of samples: {cumsum}")

print('-' * 81)

# Pick the first category whose cumulative probability exceeds the drawn number
for i, upper in enumerate(cumsum):
    if drawn_number < upper:
        # The first interval starts at 0, not at the previous cumulative value
        lower = cumsum[i - 1] if i > 0 else 0.0
        print(f"The random number {drawn_number:.2f} falls between {lower} and {upper}, hence {samples[i]} is chosen")
        break

print('-' * 81)
```

Example output from one run (the drawn number changes on every run):

```
Random number generated from 0 to 1: 0.74
Cumulative distribution of samples: [0.65, 0.9, 1.0]
---------------------------------------------------------------------------------
The random number 0.74 falls between 0.65 and 0.9, hence Rainy is chosen
---------------------------------------------------------------------------------
```
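To check that the method really respects the probabilities, the steps can be wrapped in a reusable function and run many times; the observed frequencies should land close to 0.65, 0.25 and 0.10. The function `sample_categorical`, its `rng` parameter, the seed and the number of draws are illustrative assumptions. Scaling the uniform draw by the last cumulative value is an added design choice so that weights which do not sum to exactly 1 still work. For comparison, the standard library's `random.choices(population, weights=..., k=...)` produces the same kind of weighted draws directly.

```python
import random
from bisect import bisect_right
from collections import Counter
from itertools import accumulate

def sample_categorical(samples, probs, rng=random):
    """Hypothetical helper: draw one category via cumulative probabilities."""
    cumulative = list(accumulate(probs))
    # Scale the uniform draw by the total so weights need not sum to exactly 1
    u = rng.random() * cumulative[-1]
    index = bisect_right(cumulative, u)
    # Clamp the index in case float rounding pushes it past the last category
    return samples[min(index, len(samples) - 1)]

rng = random.Random(42)  # seeded generator for a reproducible run
samples = ["Sunny", "Rainy", "Cloudy"]
probs = [0.65, 0.25, 0.10]

n = 100_000
counts = Counter(sample_categorical(samples, probs, rng) for _ in range(n))
for name, p in zip(samples, probs):
    print(f"{name:<7} expected {p:.2f}, observed {counts[name] / n:.3f}")

# Standard-library equivalent: weighted sampling with random.choices
builtin = Counter(rng.choices(samples, weights=probs, k=n))
print({name: round(builtin[name] / n, 3) for name in samples})
```

Recomputing the cumulative list on every call keeps the sketch short; when drawing many samples from the same distribution, computing it once and reusing it is cheaper.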
