Comprehensive benchmarking suite demonstrating Python 3.14's free-threading capability (PEP 703) through multi-threaded, multiprocessing, and sequential execution comparisons.
Python 3.14 introduces official support for free-threaded builds, which remove the Global Interpreter Lock (GIL). This allows true parallel execution of CPU-bound threads, yielding significant performance improvements for multi-threaded applications.
This project compares three execution modes (threading, multiprocessing, sequential) across multiple Python versions, with threading tested both under the GIL and on the free-threaded build:
- Threading (with GIL) - Traditional Python threading (limited by GIL)
- Threading (free-threaded) - Python 3.14t with GIL disabled (true parallelism)
- Multiprocessing - Separate processes (no GIL, all versions)
- Sequential - Single-threaded baseline for comparison
- Hardware: MacBook Air 2025 with M4 chip
- CPU Cores: 10 cores (10 physical, 10 logical)
- OS: macOS (Darwin)
- Workload: CPU-intensive sum of squares calculations using all 10 cores
The following conda environments have been created for benchmarking:
- py38 - Python 3.8.20 (baseline comparison)
- py311 - Python 3.11.13 (pre-free-threading stable release)
- py313 - Python 3.13.7 (first release with experimental free-threading)
- py314 - Python 3.14.0 (with GIL enabled)
- py314t - Python 3.14.0 free-threading build (GIL disabled)
Note: Python 2.7 and Python 3.0 environments could not be created as they are not available for ARM64 architecture (Apple Silicon).
python-free-threading/
├── README.md # This file
├── requirements.txt # Python dependencies for the dashboard
├── benchmark.py # Comprehensive benchmark script (threading/multiprocessing/sequential)
├── run_benchmarks.sh # Shell script to run benchmarks across all environments
├── results.json # Benchmark results in JSON format
├── dashboard.py # Interactive Plotly Dash visualization
└── launch_dashboard.sh # Script to launch the dashboard
Run comprehensive benchmarks across all Python environments and execution modes:
bash run_benchmarks.sh
This will test all three modes (threading, multiprocessing, sequential) across all 5 Python environments.
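Under the hood, the script essentially loops over the conda environments listed above. A minimal sketch of the idea (the actual run_benchmarks.sh may differ):

```bash
# Sketch only: run the benchmark in each environment and collect results
for env in py38 py311 py313 py314 py314t; do
    conda run -n "$env" python benchmark.py
done
```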
Launch the interactive Plotly Dash dashboard:
bash launch_dashboard.sh
Then open your browser to: http://127.0.0.1:8050/
The dashboard displays:
- Execution time comparison across all modes
- Speedup factors relative to sequential baseline
- System information and CPU core count
- Side-by-side comparison of threading vs multiprocessing
- Key findings and performance insights
View the raw results:
cat results.json | python -m json.tool
The benchmark (benchmark.py) performs the following:
- Creates N concurrent threads/processes (N = CPU core count = 10)
- Each worker calculates the sum of squares up to a different limit
- Limits scale with the worker index: worker i processes base_workload * (i + 1) iterations (see the sketch below)
- Default base workload: 10,000,000 iterations
- Total work is identical across all modes for fair comparison
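For illustration, the per-worker workload looks roughly like this (names here are illustrative, not the exact benchmark.py implementation):

```python
BASE_WORKLOAD = 10_000_000  # default base workload

def sum_of_squares(limit: int) -> int:
    # CPU-bound inner loop: pure Python arithmetic, no I/O or blocking calls
    total = 0
    for n in range(limit):
        total += n * n
    return total

# Worker i gets a limit of base_workload * (i + 1)
limits = [BASE_WORKLOAD * (i + 1) for i in range(10)]  # 10 = CPU core count
```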
Threading Mode
- Uses Python's threading module
- CPU-bound tasks execute in multiple threads
- With GIL (Python ≤3.14): threads serialize, minimal speedup
- Free-threaded (Python 3.14t): true parallel execution, significant speedup

Multiprocessing Mode
- Uses Python's multiprocessing module
- Separate processes with independent memory
- No GIL limitations in any Python version
- Process creation and IPC overhead included in timing

Sequential Mode
- Single-threaded baseline
- Executes all work sequentially
- Used to calculate speedup factors
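A minimal, self-contained sketch of how the three modes can dispatch the same workload (illustrative, not the actual benchmark.py code; the worker is repeated from the sketch above so this block runs standalone):

```python
import multiprocessing
import threading
import time

def sum_of_squares(limit: int) -> int:
    total = 0
    for n in range(limit):
        total += n * n
    return total

def run_sequential(limits):
    for limit in limits:
        sum_of_squares(limit)

def run_threading(limits):
    # True parallelism only on free-threaded builds; serialized under the GIL
    threads = [threading.Thread(target=sum_of_squares, args=(limit,)) for limit in limits]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def run_multiprocessing(limits):
    # Process creation and IPC overhead fall inside the timed region
    with multiprocessing.Pool(processes=len(limits)) as pool:
        pool.map(sum_of_squares, limits)

if __name__ == "__main__":  # guard required for multiprocessing's "spawn" start method on macOS
    limits = [1_000_000 * (i + 1) for i in range(4)]  # small demo workload
    for name, runner in [("sequential", run_sequential),
                         ("threading", run_threading),
                         ("multiprocessing", run_multiprocessing)]:
        start = time.perf_counter()
        runner(limits)
        print(f"{name}: {time.perf_counter() - start:.2f}s")
```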
Test specific modes or configurations:
# Activate any environment
conda activate py314t
# Test all modes (default)
python benchmark.py
# Test specific mode
python benchmark.py --mode threading
python benchmark.py --mode multiprocessing
python benchmark.py --mode sequential
# Customize workload
python benchmark.py --workload-size 20000000
# Specify number of workers
python benchmark.py --threads 8
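The flags above imply a command-line interface roughly like the following sketch (benchmark.py's actual argument handling may differ; the "all" default is an assumption based on the comments above):

```python
import argparse
import os

parser = argparse.ArgumentParser(description="Free-threading benchmark (sketch)")
parser.add_argument("--mode", default="all",  # "all" default assumed from the usage above
                    choices=["all", "threading", "multiprocessing", "sequential"])
parser.add_argument("--workload-size", type=int, default=10_000_000,
                    help="base iterations per worker")
parser.add_argument("--threads", type=int, default=os.cpu_count(),
                    help="number of workers")
args = parser.parse_args()
```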
Based on CPU-bound workloads with 10 cores:
| Metric | Python ≤3.14 (GIL) | Python 3.14t (free-threaded) | Multiprocessing (all) |
|---|---|---|---|
| Speedup vs Sequential | ~1.0-1.2x | ~8-10x | ~7-9x |
| GIL Impact | ❌ Limited by GIL | ✅ No GIL | ✅ No GIL |
| Memory | Shared | Shared | Separate processes |
| Overhead | Minimal | Minimal | Process creation |
- True Parallelism: With GIL disabled, CPU-bound threads execute in parallel
- Shared Memory: Unlike multiprocessing, threads share memory space
- Lower Overhead: No process creation or inter-process communication overhead
- CPU Utilization: Can utilize all 10 CPU cores simultaneously
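A small illustration of the shared-memory point: each thread writes its result into one shared list in place, with no pickling or inter-process channels (a standalone sketch, not part of the benchmark):

```python
import threading

NUM_WORKERS = 4
results = [0] * NUM_WORKERS  # one list, visible to every thread

def work(i: int) -> None:
    # Each thread writes to its own slot; no queues or pipes needed
    results[i] = sum(n * n for n in range(1_000_000 * (i + 1)))

threads = [threading.Thread(target=work, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # results landed in shared memory directly
```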
- macOS with ARM64 (Apple Silicon)
- Miniconda/Anaconda
- Python packages for dashboard (see requirements.txt)
# Create environments
conda create -n py38 python=3.8 -y
conda create -n py311 python=3.11 -y
conda create -n py313 python=3.13 -y
conda create -n py314 python=3.14 -c conda-forge -y
conda create -n py314t python-freethreading=3.14 -c conda-forge -y
conda activate py311
pip install -r requirements.txt
PEP 703 introduces optional support for running Python without the Global Interpreter Lock (GIL). Key points:
- First introduced: Python 3.13 (experimental)
- Official support: Python 3.14
- Performance: Significant improvements for CPU-bound multi-threaded workloads
- Compatibility: Requires a free-threaded build (the python-freethreading conda package)
- Use cases: High-concurrency applications, scientific computing, data processing
- Verification: Check with sys._is_gil_enabled() (should return False)
import sys
# sys._is_gil_enabled() exists only on Python 3.13+; older builds always have the GIL
gil_check = getattr(sys, "_is_gil_enabled", lambda: True)
print(f"GIL Enabled: {gil_check()}")  # False on free-threaded builds
Verify the free-threaded build:
conda activate py314t
python -c "import sys; print('GIL:', sys._is_gil_enabled())"
This should print: GIL: False

Check CPU cores:
sysctl -n hw.ncpu
Ensure the number of workers matches the core count.

Increase the workload size:
python benchmark.py --workload-size 50000000
Larger workloads show clearer speedup differences.

Check for package compatibility: some packages may re-enable the GIL at import time, so run benchmarks with minimal imports (see the command below).
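If an incompatible extension does re-enable the GIL, free-threaded builds let you force it back off via the PYTHON_GIL environment variable (or the equivalent -X gil=0 interpreter option):

```bash
PYTHON_GIL=0 python benchmark.py --mode threading
```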
- PEP 703 – Making the Global Interpreter Lock Optional in CPython: https://peps.python.org/pep-0703/
- Python 3.14 Release Notes: https://docs.python.org/3.14/whatsnew/3.14.html
- Free-Threaded Python HOWTO: https://docs.python.org/3/howto/free-threading-python.html
- Python threading module: https://docs.python.org/3/library/threading.html
- Python multiprocessing module: https://docs.python.org/3/library/multiprocessing.html
Potential extensions to this benchmark:
- Test with different workload types (I/O-bound, mixed)
- Benchmark with varying thread counts (1, 2, 4, 8, 16, 32)
- Test memory usage and overhead
- Compare with NumPy/native C extensions
- Evaluate impact on single-threaded performance
- Test real-world applications (web servers, data processing)
This project is for educational and benchmarking purposes.