This repository contains multiple implementations of the Billion Row Challenge - processing 1 billion weather measurements as fast as possible.
The challenge is to process a file containing approximately 1 billion weather station measurements, calculating the min, mean, and max temperature for each station.
Input format:
<station_name>;<temperature>
Output format:
{<station_name>=<min>/<mean>/<max>, ...}
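For example, given these three input lines (station names and values made up for illustration):

Hamburg;12.0
Bulawayo;8.9
Hamburg;34.2

the expected output would be:

{Bulawayo=8.9/8.9/8.9, Hamburg=12.0/23.1/34.2}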
V1, the first optimized implementation, uses memory mapping, parallel processing, and custom data structures to achieve good performance; a simplified sketch of this pattern follows the optimization list below.
Performance:
- Processing time: ~15.6 seconds
- Throughput: ~64 million rows/second
- Memory usage: Efficient
Key optimizations:
- Memory-mapped file access
- Parallel processing with goroutines
- Custom hash table for station lookups
- Zero-allocation string parsing
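The repository holds the actual V1 source; the sketch below only illustrates the memory-mapping and chunked-goroutine pattern listed above. The input file name, helper names, and the omitted per-chunk hash table are assumptions for illustration.

package main

import (
	"os"
	"runtime"
	"sync"
	"syscall"
)

// mmapFile maps the whole input file into memory so workers can scan it
// directly, with no per-read syscalls or buffer copies.
func mmapFile(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return nil, err
	}
	return syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
}

// processChunk would parse "<station>;<temperature>\n" records and fold
// them into a per-goroutine hash table (omitted in this sketch).
func processChunk(data []byte) {}

func main() {
	data, err := mmapFile("measurements.txt") // assumed input file name
	if err != nil {
		panic(err)
	}
	workers := runtime.NumCPU()
	chunkSize := len(data) / workers
	var wg sync.WaitGroup
	start := 0
	for w := 0; w < workers; w++ {
		end := start + chunkSize
		if w == workers-1 || end >= len(data) {
			end = len(data)
		} else {
			// Round the boundary forward to the next newline so no
			// record is split between two workers.
			for end < len(data) && data[end] != '\n' {
				end++
			}
			if end < len(data) {
				end++ // include the newline in this chunk
			}
		}
		wg.Add(1)
		go func(part []byte) {
			defer wg.Done()
			processChunk(part)
		}(data[start:end])
		start = end
	}
	wg.Wait()
}

Each goroutine keeps a private table and the results are merged after wg.Wait(), so no locking is needed while parsing.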
The V2 implementation pushes optimization further with fixed-size buffers, integer-based temperature storage, and cache-friendly data structures; the integer temperature representation is sketched after the list below.
Performance:
- Processing time: ~3.8 seconds
- Throughput: ~266 million rows/second
- Memory usage: Very efficient
Key optimizations:
- Integer-based temperature storage (int16 with implied decimal)
- Fixed-size buffers for station names
- Cache-line aligned data structures
- Larger batch sizes (256MB)
- Branchless temperature parsing
- Lock-free parallel processing
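The exact V2 parsing code is in the repository and is described as branchless; the simplified (branching) sketch below only shows the integer representation itself, assuming the challenge's fixed one-decimal temperature format. Type and function names are illustrative.

package main

import "fmt"

// stationStats keeps every field as an integer; the mean is produced once
// at output time as float64(sum)/float64(count)/10.
type stationStats struct {
	min, max int16
	sum      int64
	count    int64
}

// parseTempTenths converts a reading such as "-12.3" or "4.5" into tenths
// of a degree (-123, 45). It assumes the input format: an optional '-',
// one or two digits, '.', exactly one fractional digit.
func parseTempTenths(b []byte) int16 {
	i := 0
	neg := b[0] == '-'
	if neg {
		i = 1
	}
	var v int16
	for ; i < len(b); i++ {
		if b[i] == '.' {
			continue
		}
		v = v*10 + int16(b[i]-'0')
	}
	if neg {
		return -v
	}
	return v
}

func main() {
	fmt.Println(parseTempTenths([]byte("-12.3"))) // prints -123
}

Storing tenths in an int16 keeps each table entry small and avoids floating-point work in the hot loop; only the final mean touches floats.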
The V3 implementation takes a different approach to splitting and processing the input file; one way such newline-aligned splitting can work is sketched after the list below.
Performance:
- Processing time: ~115 milliseconds
- Throughput: ~8,677 million rows/second
- Memory usage: Extremely efficient
Key optimizations:
- Optimized file splitting strategy
- Efficient memory management
- Streamlined processing pipeline
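The actual V3 splitting code lives in the repository; the sketch below only illustrates one way such a strategy can work: boundaries are placed arithmetically and rounded forward to the next newline, so each worker reads a range of whole records independently. The file name, worker count, and buffer size are assumptions.

package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// splitOffsets returns n+1 offsets such that each range
// [offsets[i], offsets[i+1]) contains only complete lines.
func splitOffsets(f *os.File, size int64, n int) ([]int64, error) {
	offsets := []int64{0}
	buf := make([]byte, 128) // assumes no line is longer than 128 bytes
	for i := 1; i < n; i++ {
		pos := size * int64(i) / int64(n)
		k, err := f.ReadAt(buf, pos)
		if err != nil && err != io.EOF {
			return nil, err
		}
		if j := bytes.IndexByte(buf[:k], '\n'); j >= 0 {
			pos += int64(j) + 1 // start the next range just past the newline
		} else {
			pos = size
		}
		offsets = append(offsets, pos)
	}
	return append(offsets, size), nil
}

func main() {
	f, err := os.Open("measurements.txt") // assumed input file name
	if err != nil {
		panic(err)
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		panic(err)
	}
	offsets, err := splitOffsets(f, info.Size(), 8) // 8 workers, for example
	if err != nil {
		panic(err)
	}
	fmt.Println(offsets) // each worker would then process exactly one range
}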
Comprehensive benchmarks were run with 5 iterations per implementation:
=== BENCHMARK RESULTS ===
Version: V1
Average Time: 15.577186316s
Median Time: 15.452524792s
Min Time: 15.312534916s
Max Time: 16.010921166s
Throughput: 64.20 million rows/second
Version: V2
Average Time: 3.757074466s
Median Time: 3.696428625s
Min Time: 3.58960625s
Max Time: 4.058022875s
Throughput: 266.16 million rows/second
Version: V3
Average Time: 115.239991ms
Median Time: 80.830541ms
Min Time: 67.155125ms
Max Time: 261.212125ms
Throughput: 8677.54 million rows/second
Speedups (ratio of average times):
- V1 → V2: 4.15x
- V1 → V3: 135.17x
- V2 → V3: 32.60x
- Implementation Strategy Matters: The dramatic performance difference between implementations shows how important algorithmic choices are. V3's approach is over 135x faster than V1.
- Memory Efficiency: All implementations focus on memory efficiency, but V3 achieves the best results with minimal memory usage.
- Batch Processing: Larger batch sizes in V2 significantly improved performance over V1, but V3's approach to file splitting proved even more effective.
- Hardware Utilization: All implementations leverage multi-core processing, but V3 makes the most efficient use of available hardware.
- Optimization Techniques: The progression from V1 to V3 demonstrates the value of:
  - Cache-friendly data structures
  - Minimizing allocations
  - Integer-based calculations
  - Efficient file I/O strategies
  - Parallel processing
To run the benchmarks yourself:
# Make the benchmark script executable
chmod +x run_benchmark.sh
# Run with default 5 iterations
./run_benchmark.sh
# Or specify a custom number of iterations
./run_benchmark.sh 10

Benchmark results will be displayed in the console and saved to benchmark_results.txt.
This project demonstrates how different implementation strategies and optimization techniques can dramatically impact performance when processing large datasets. The progression from V1 to V3 shows that there's always room for improvement when you're willing to rethink your approach and focus on the specific characteristics of the problem.