Mathdee/RandomWeightedSampler

Random Weighted Sampler

A high-performance Python implementation of weighted random sampling using prefix sums and binary search.

🎯 Overview

This project demonstrates efficient weighted random sampling: selecting items from a collection where each item's probability of being chosen is proportional to its assigned weight, so items with higher weights are selected more often.

Problem Solved

  • Weighted Selection: Choose items with probabilities proportional to their weights
  • Efficiency: O(log n) time complexity per sample using binary search
  • Accuracy: Observed frequencies converge to weight / total weight as the number of samples grows

🚀 Features

  • Fast Sampling: O(log n) time complexity per sample
  • Memory Efficient: O(n) space for the prefix sum array built at initialization
  • Interactive Demo: Streamlit web app for experimentation
  • Visual Analysis: Matplotlib plots comparing expected vs observed frequencies
  • Comprehensive Testing: Unit tests with pytest
  • Easy to Use: Simple API with clear documentation

🧮 Algorithm

The implementation uses a prefix sum array combined with binary search:

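  1. Initialization: Build a prefix sum array from the weights, so the last entry equals the total weight
  2. Sampling: Generate a uniform random number between 0 and the total weight, then use binary search to find the item whose prefix sum range contains it
  3. Efficiency: Binary search over n prefix sums gives O(log n) lookup time per sample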
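
The three steps above can be sketched as follows. This is a minimal illustration, not the project's actual `sampler.py`: the class name `PrefixSumSampler`, the optional `rng` parameter, and the validation rules are assumptions made for the example.

```python
import bisect
import itertools
import random


class PrefixSumSampler:
    """Illustrative sampler (hypothetical name, not the project's class)."""

    def __init__(self, items, rng=None):
        if not items:
            raise ValueError("items must not be empty")
        self._values = [value for value, _ in items]
        weights = [weight for _, weight in items]
        if any(w < 0 for w in weights):
            raise ValueError("weights must be non-negative")
        # Step 1: running totals, e.g. [1, 2, 7] -> [1, 3, 10]
        self._prefix_sums = list(itertools.accumulate(weights))
        if self._prefix_sums[-1] <= 0:
            raise ValueError("total weight must be positive")
        # Assumption: an injectable RNG makes the sketch easy to test
        self._rng = rng if rng is not None else random.Random()

    def sample(self):
        # Step 2: uniform draw in [0, total weight)
        r = self._rng.random() * self._prefix_sums[-1]
        # Step 3: binary search for the first prefix sum strictly greater than r
        index = bisect.bisect_right(self._prefix_sums, r)
        return self._values[index]


sampler = PrefixSumSampler([("apple", 1), ("banana", 2), ("cherry", 7)])
print(sampler.sample())
```

Using `bisect.bisect_right` treats each item's range as half-open, so a draw that lands exactly on a boundary belongs to the next item and an item with weight zero has an empty range.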

Visual Representation

Items: [("apple", 1), ("banana", 2), ("cherry", 7)]
Weights: [1, 2, 7]
Prefix Sums: [1, 3, 10]

Random number: 0.0 - 1.0 → apple
Random number: 1.0 - 3.0 → banana
Random number: 3.0 - 10.0 → cherry
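
To make the boundaries in this picture concrete, the short check below looks up a few hand-picked numbers against the same prefix sums. It assumes half-open ranges (a value of exactly 1.0 falls to banana, exactly 3.0 to cherry); the project's own boundary handling may differ, and the helper name `lookup` is hypothetical.

```python
import bisect

names = ["apple", "banana", "cherry"]
prefix_sums = [1, 3, 10]


def lookup(r):
    """Return the item whose half-open range [low, high) contains r."""
    return names[bisect.bisect_right(prefix_sums, r)]


for r in (0.4, 1.0, 2.5, 3.0, 9.9):
    print(r, "->", lookup(r))
# 0.4 -> apple
# 1.0 -> banana
# 2.5 -> banana
# 3.0 -> cherry
# 9.9 -> cherry
```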

📦 Installation

Prerequisites

  • Python 3.7+
  • pip

Setup

  1. Clone the repository
     git clone <your-repo-url>
     cd Week1
  2. Create and activate a virtual environment, then install dependencies (Windows)
     python -m venv .venv
     .venv\Scripts\activate
     pip install -r requirements.txt
  3. Or do the same on macOS/Linux
     python -m venv .venv
     source .venv/bin/activate
     pip install -r requirements.txt

💻 Usage

Basic Example

from sampler import RandomWeightedSampler

# Define items with weights
items = [("apple", 1), ("banana", 2), ("cherry", 7)]

# Create sampler
sampler = RandomWeightedSampler(items)

# Sample a single item
sample = sampler.sample()
print(f"Selected: {sample}")  # e.g. "Selected: cherry" (cherry has a 7/10 chance)

# Sample multiple items
samples = sampler.sample_multiple(1000)
print(f"Sample count: {len(samples)}")

Advanced Example

from collections import Counter
from sampler import RandomWeightedSampler

# More complex data
items = [
    ("rare_item", 1),
    ("common_item", 10),
    ("legendary_item", 0.1)
]

sampler = RandomWeightedSampler(items)
samples = sampler.sample_multiple(10000)

# Analyze results
counts = Counter(samples)
total = len(samples)

for item, count in counts.items():
    frequency = count / total
    print(f"{item}: {frequency:.3f} ({count}/{total})")

Example output (exact counts vary from run to run, and lines appear in order of first occurrence):

common_item: 0.901 (9010/10000)
rare_item: 0.090 (900/10000)
legendary_item: 0.009 (90/10000)
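
For comparison, the theoretical probabilities behind these frequencies are each weight divided by the total weight (1 + 10 + 0.1 = 11.1). The snippet below is a small sketch of that arithmetic and does not use the sampler at all.

```python
items = [("rare_item", 1), ("common_item", 10), ("legendary_item", 0.1)]

total_weight = sum(weight for _, weight in items)
expected = {name: weight / total_weight for name, weight in items}

# Print the most likely item first
for name, probability in sorted(expected.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {probability:.4f}")
# common_item: 0.9009
# rare_item: 0.0901
# legendary_item: 0.0090
```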

🧪 Running the Code

Run Demo Plots

python notebooks/demo.py

This generates matplotlib plots comparing expected vs observed frequencies.

Run Interactive Demo

streamlit run demo/app.py

This launches a web interface where you can:

  • Add custom items and weights
  • Adjust sample size (input integer)
  • View real-time frequency analysis
  • See interactive bar charts

📊 Performance

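| Operation        | Time Complexity | Space Complexity            |
|------------------|-----------------|-----------------------------|
| Initialization   | O(n)            | O(n)                        |
| Single Sample    | O(log n)        | O(1)                        |
| Multiple Samples | O(k log n)      | O(k) for the returned list  |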

Where:

  • n = number of items
  • k = number of samples requested
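
The O(log n) figure can be checked by counting loop iterations of a hand-written binary search. The sketch below is illustrative only (the function name `search_with_step_count` is hypothetical, and the project itself may rely on a library search); it uses equal weights so the prefix sums are simply 1..n.

```python
import itertools


def search_with_step_count(prefix_sums, r):
    """Find the first prefix sum greater than r, counting loop iterations."""
    lo, hi, steps = 0, len(prefix_sums), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if r < prefix_sums[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo, steps


for n in (10, 1_000, 1_000_000):
    prefix_sums = list(itertools.accumulate([1] * n))  # equal weights
    # n - 0.5 falls in the range of the last item
    index, steps = search_with_step_count(prefix_sums, n - 0.5)
    print(f"n={n:>9,}: found index {index} in {steps} steps")
```

Each iteration at least halves the remaining range, so the step count never exceeds `n.bit_length()`, which is about log2(n) + 1.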

🎨 Demo Screenshots

Interactive Streamlit App

The web interface allows you to:

  • Enter custom items and weights
  • Adjust the sample size
  • View frequency tables and charts
  • Compare expected vs observed probabilities

Matplotlib Visualization

The notebook demo shows side-by-side comparison of:

  • Expected probabilities (theoretical)
  • Observed frequencies (empirical)
  • Close agreement between the two, which indicates the sampler follows the intended distribution

🧪 Testing

The project includes comprehensive unit tests:

pytest -q

Tests cover:

  • Basic functionality
  • Edge cases (single item, zero weights)
  • Probability distribution accuracy
  • Performance benchmarks
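
As an illustration of what a distribution and edge-case test can look like, here is a pytest-style sketch. It does not reproduce the project's `tests/test_sampler.py`: the `make_sampler` helper is a hypothetical stand-in so the example runs on its own, and the 0.02 tolerance and sample sizes are assumptions.

```python
import bisect
import itertools
import random
from collections import Counter


def make_sampler(items, seed=0):
    """Stand-in sampler for the tests below (hypothetical, not the project's class)."""
    values = [value for value, _ in items]
    prefix_sums = list(itertools.accumulate(weight for _, weight in items))
    rng = random.Random(seed)  # fixed seed keeps the tests deterministic

    def sample():
        r = rng.random() * prefix_sums[-1]
        return values[bisect.bisect_right(prefix_sums, r)]

    return sample


def test_frequencies_match_weights():
    items = [("apple", 1), ("banana", 2), ("cherry", 7)]
    sample = make_sampler(items)
    n = 20_000
    counts = Counter(sample() for _ in range(n))
    total = sum(weight for _, weight in items)
    for value, weight in items:
        # 0.02 is several standard deviations wide at n = 20,000
        assert abs(counts[value] / n - weight / total) < 0.02


def test_zero_weight_item_is_never_drawn():
    sample = make_sampler([("a", 1), ("never", 0), ("b", 2)])
    assert "never" not in {sample() for _ in range(5_000)}
```

Seeding the generator trades a little realism for repeatability: an unseeded statistical test can fail occasionally even when the sampler is correct.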

📁 Project Structure

Week1/
├── sampler.py              # Main implementation
├── demo/
│   └── app.py              # Streamlit web app
├── notebooks/
│   └── demo.py             # Matplotlib visualization
├── tests/
│   └── test_sampler.py     # Unit tests
├── requirements.txt        # Dependencies
└── README.md               # This file

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is open source and available under the MIT License.

🔮 Future Improvements

  • Additional sampling algorithms (alias method, reservoir sampling)
  • Performance benchmarks vs other libraries
  • Support for dynamic weight updates
  • Multi-threaded sampling
  • Integration with popular ML libraries

Made with ❤️ for efficient random sampling
