A high-performance Python implementation of weighted random sampling using prefix sums and binary search.
This project demonstrates efficient weighted random sampling - a technique for selecting items from a collection where each item has a different probability of being chosen based on its assigned weight. Items with higher weights are more likely to be selected.
- Weighted Selection: Choose items with probabilities proportional to their weights
- Efficiency: O(log n) time complexity per sample using binary search
- Accuracy: Precise probability distribution matching theoretical expectations
- Fast Sampling: O(log n) time complexity per sample
- Memory Efficient: O(n) space complexity for initialization
- Interactive Demo: Streamlit web app for experimentation
- Visual Analysis: Matplotlib plots comparing expected vs observed frequencies
- Comprehensive Testing: Unit tests with pytest
- Easy to Use: Simple API with clear documentation
The implementation uses a prefix sum array combined with binary search:
- Initialization: Build a prefix sum array from the weights
- Sampling: Generate a random number and use binary search to find the corresponding item
- Efficiency: Binary search provides O(log n) lookup time
Items: [("apple", 1), ("banana", 2), ("cherry", 7)]
Weights: [1, 2, 7]
Prefix Sums: [1, 3, 10]
Random number in [0.0, 1.0) → apple
Random number in [1.0, 3.0) → banana
Random number in [3.0, 10.0) → cherry
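The same mechanism can be sketched with Python's standard bisect module. This is a simplified illustration of the idea, not the exact code in sampler.py:

```python
# Simplified sketch of the prefix-sum + binary-search idea (not the exact sampler.py code)
import bisect
import itertools
import random

items = [("apple", 1), ("banana", 2), ("cherry", 7)]
names = [name for name, _ in items]
weights = [weight for _, weight in items]

# Initialization: running totals [1, 2, 7] -> [1, 3, 10]
prefix = list(itertools.accumulate(weights))

def sample():
    # Draw a uniform number in [0, total_weight) ...
    r = random.random() * prefix[-1]
    # ... and binary-search for the first prefix sum greater than it: O(log n)
    index = bisect.bisect_right(prefix, r)
    return names[index]

print(sample())  # "cherry" about 70% of the time
```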
- Python 3.7+
- pip
- Clone the repository
git clone <your-repo-url>
cd Week1
- Create virtual environment (Windows)
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
- Create virtual environment (macOS/Linux)
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
from sampler import RandomWeightedSampler
# Define items with weights
items = [("apple", 1), ("banana", 2), ("cherry", 7)]
# Create sampler
sampler = RandomWeightedSampler(items)
# Sample single item
sample = sampler.sample()
print(f"Selected: {sample}") # Output: cherry (most likely)
# Sample multiple items
samples = sampler.sample_multiple(1000)
print(f"Sample count: {len(samples)}")from collections import Counter
from sampler import RandomWeightedSampler
# More complex data
items = [
("rare_item", 1),
("common_item", 10),
("legendary_item", 0.1)
]
sampler = RandomWeightedSampler(items)
samples = sampler.sample_multiple(10000)
# Analyze results
counts = Counter(samples)
total = len(samples)
for item, count in counts.items():
frequency = count / total
print(f"{item}: {frequency:.3f} ({count}/{total})")Expected Output:
common_item: 0.901 (9010/10000)
rare_item: 0.090 (900/10000)
legendary_item: 0.009 (90/10000)
python notebooks/demo.py
This generates Matplotlib plots comparing expected vs. observed frequencies.
streamlit run demo/app.py
This launches a web interface where you can:
- Add custom items and weights
- Adjust sample size (input integer)
- View real-time frequency analysis
- See interactive bar charts
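To prototype a similar page yourself, a minimal Streamlit script could look like the sketch below. It is an illustration only, not the contents of demo/app.py; the hard-coded item list and widget choices are assumptions.

```python
# Minimal Streamlit sketch (assumed layout, not the actual demo/app.py)
from collections import Counter

import pandas as pd
import streamlit as st

from sampler import RandomWeightedSampler

st.title("Weighted Random Sampling Demo")

# Fixed items for the sketch; the real app lets you enter your own
items = [("apple", 1), ("banana", 2), ("cherry", 7)]
n = st.slider("Sample size", min_value=100, max_value=100_000, value=10_000, step=100)

sampler = RandomWeightedSampler(items)
counts = Counter(sampler.sample_multiple(n))

# Observed frequency per item, rendered as an interactive bar chart
observed = pd.Series({name: counts[name] / n for name, _ in items}, name="observed frequency")
st.bar_chart(observed)
```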
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Initialization | O(n) | O(n) |
| Single Sample | O(log n) | O(1) |
| Multiple Samples | O(k log n) | O(1) |
Where:
- n = number of items
- k = number of samples requested
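To get a rough feel for these costs on your own machine, you can time initialization and sampling directly. The sketch below is a quick, unscientific check; absolute numbers will vary with hardware.

```python
# Rough timing sketch; absolute numbers depend on your machine
import random
import time

from sampler import RandomWeightedSampler

items = [(f"item_{i}", random.random()) for i in range(1_000_000)]

start = time.perf_counter()
sampler = RandomWeightedSampler(items)        # O(n) initialization
print(f"init: {time.perf_counter() - start:.3f}s for {len(items):,} items")

start = time.perf_counter()
samples = sampler.sample_multiple(100_000)    # O(k log n) sampling
print(f"sampling: {time.perf_counter() - start:.3f}s for {len(samples):,} draws")
```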
The web interface allows you to:
- Enter custom items and weights
- Adjust sample size with a slider
- View frequency tables and charts
- Compare expected vs observed probabilities
The notebook demo shows side-by-side comparison of:
- Expected probabilities (theoretical)
- Observed frequencies (empirical)
- Close alignment between the two demonstrates the algorithm's accuracy
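A stripped-down version of that comparison could look like the following sketch (an illustration of the idea, not the exact code in notebooks/demo.py):

```python
# Sketch of an expected-vs-observed comparison (not the exact notebooks/demo.py code)
from collections import Counter

import matplotlib.pyplot as plt

from sampler import RandomWeightedSampler

items = [("apple", 1), ("banana", 2), ("cherry", 7)]
total_weight = sum(weight for _, weight in items)
n = 10_000

counts = Counter(RandomWeightedSampler(items).sample_multiple(n))

labels = [name for name, _ in items]
expected = [weight / total_weight for _, weight in items]   # theoretical
observed = [counts[name] / n for name in labels]            # empirical

x = range(len(labels))
plt.bar([i - 0.2 for i in x], expected, width=0.4, label="expected")
plt.bar([i + 0.2 for i in x], observed, width=0.4, label="observed")
plt.xticks(list(x), labels)
plt.ylabel("probability")
plt.legend()
plt.show()
```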
The project includes comprehensive unit tests:
pytest -q
Tests cover:
- Basic functionality
- Edge cases (single item, zero weights)
- Probability distribution accuracy
- Performance benchmarks
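For reference, a distribution-accuracy test could be written along these lines (a sketch, not the actual contents of tests/test_sampler.py):

```python
# Sketch of distribution and edge-case tests (not the actual tests/test_sampler.py)
from collections import Counter

import pytest

from sampler import RandomWeightedSampler

def test_frequencies_match_weights():
    items = [("apple", 1), ("banana", 2), ("cherry", 7)]   # total weight 10
    n = 50_000
    counts = Counter(RandomWeightedSampler(items).sample_multiple(n))

    for name, weight in items:
        expected = weight / 10
        observed = counts[name] / n
        # Loose tolerance keeps this statistical check from being flaky
        assert observed == pytest.approx(expected, abs=0.02)

def test_single_item_is_always_returned():
    sampler = RandomWeightedSampler([("only", 5)])
    assert all(sample == "only" for sample in sampler.sample_multiple(100))
```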
Week1/
├── sampler.py              # Main implementation
├── demo/
│   └── app.py              # Streamlit web app
├── notebooks/
│   └── demo.py             # Matplotlib visualization
├── tests/
│   └── test_sampler.py     # Unit tests
├── requirements.txt        # Dependencies
└── README.md               # This file
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is open source and available under the MIT License.
- Additional sampling algorithms (alias method, reservoir sampling)
- Performance benchmarks vs other libraries
- Support for dynamic weight updates
- Multi-threaded sampling
- Integration with popular ML libraries
Made with ❤️ for efficient random sampling