
Comprehensive benchmark demonstrating Python 3.14's free-threading performance (PEP 703). Tests threading vs multiprocessing vs sequential execution across Python 3.8-3.14t. Achieves 2.83x speedup with no GIL on M4 MacBook Air.


ArtInt-Research-Org/python314-free-threading


Python Free-Threading Performance Benchmark

Comprehensive benchmarking suite demonstrating Python 3.14's free-threading capability (PEP 703) through multi-threaded, multiprocessing, and sequential execution comparisons.

Overview

Python 3.14 introduces official support for free-threading builds, which remove the Global Interpreter Lock (GIL). This allows true parallel execution of CPU-bound threads, providing significant performance improvements for multi-threaded applications.

This project tests three execution modes across multiple Python versions (threading is measured both with and without the GIL):

  • Threading (with GIL) - Traditional Python threading (limited by GIL)
  • Threading (free-threaded) - Python 3.14t with GIL disabled (true parallelism)
  • Multiprocessing - Separate processes (no GIL, all versions)
  • Sequential - Single-threaded baseline for comparison

System Configuration

  • Hardware: MacBook Air 2025 with M4 chip
  • CPU Cores: 10 cores (10 physical, 10 logical)
  • OS: macOS (Darwin)
  • Workload: CPU-intensive sum of squares calculations using all 10 cores

Environments Created

The following conda environments have been created for benchmarking:

  1. py38 - Python 3.8.20 (baseline comparison)
  2. py311 - Python 3.11.13 (pre-free-threading stable release)
  3. py313 - Python 3.13.7 (first release with experimental free-threading)
  4. py314 - Python 3.14.0 (with GIL enabled)
  5. py314t - Python 3.14.0 free-threading build (GIL disabled)

Note: Python 2.7 and Python 3.0 environments could not be created as they are not available for ARM64 architecture (Apple Silicon).

Project Structure

python-free-threading/
├── README.md                  # This file
├── requirements.txt           # Python dependencies for the dashboard
├── benchmark.py               # Comprehensive benchmark script (threading/multiprocessing/sequential)
├── run_benchmarks.sh          # Shell script to run benchmarks across all environments
├── results.json               # Benchmark results in JSON format
├── dashboard.py               # Interactive Plotly Dash visualization
└── launch_dashboard.sh        # Script to launch the dashboard

Quick Start

1. Run Benchmarks

Run comprehensive benchmarks across all Python environments and execution modes:

bash run_benchmarks.sh

This will test all three modes (threading, multiprocessing, sequential) across all 5 Python environments.

2. View Results

Interactive Dashboard (Recommended)

Launch the interactive Plotly Dash dashboard:

bash launch_dashboard.sh

Then open your browser to: http://127.0.0.1:8050/

The dashboard displays:

  • Execution time comparison across all modes
  • Speedup factors relative to sequential baseline
  • System information and CPU core count
  • Side-by-side comparison of threading vs multiprocessing
  • Key findings and performance insights

View JSON Results

python -m json.tool results.json

Benchmark Details

The benchmark (benchmark.py) performs the following:

Workload

  • Creates N concurrent threads/processes (N = CPU core count = 10)
  • Each worker calculates the sum of squares up to a different limit
  • Limits scale with worker index: worker i processes base_workload * (i + 1) iterations
  • Default base workload: 10,000,000 iterations
  • Total work is identical across all modes for fair comparison
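The workload above can be sketched as follows (an illustrative sketch; the function names here are not necessarily those used in benchmark.py):

```python
def sum_of_squares(limit: int) -> int:
    """CPU-bound worker: sum of i*i for i in range(limit)."""
    total = 0
    for i in range(limit):
        total += i * i
    return total

def worker_limits(base_workload, n_workers):
    """Worker i gets base_workload * (i + 1) iterations, so the total
    amount of work is fixed regardless of execution mode."""
    return [base_workload * (i + 1) for i in range(n_workers)]
```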

Execution Modes

  1. Threading Mode

    • Uses Python's threading module
    • CPU-bound tasks executing in multiple threads
    • With GIL (standard builds): Threads serialize on CPU-bound work, minimal speedup
    • Free-threaded (Python 3.14t): True parallel execution, significant speedup
  2. Multiprocessing Mode

    • Uses Python's multiprocessing module
    • Separate processes with independent memory
    • No GIL limitations in any Python version
    • Process creation and IPC overhead included in timing
  3. Sequential Mode

    • Single-threaded baseline
    • Executes all work sequentially
    • Used to calculate speedup factors
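The three modes can be sketched with `concurrent.futures` (a simplified illustration, not the benchmark.py code; note that multiprocessing needs the `__main__` guard on macOS/Windows because those platforms spawn worker processes by re-importing the main module):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def sum_of_squares(limit):
    return sum(i * i for i in range(limit))

def run_sequential(limits):
    return [sum_of_squares(n) for n in limits]

def run_threaded(limits):
    # Parallel only on free-threaded builds; serialized by the GIL otherwise.
    with ThreadPoolExecutor(max_workers=len(limits)) as pool:
        return list(pool.map(sum_of_squares, limits))

def run_multiprocess(limits):
    # Separate processes sidestep the GIL but pay spawn/IPC overhead.
    with ProcessPoolExecutor(max_workers=len(limits)) as pool:
        return list(pool.map(sum_of_squares, limits))

if __name__ == "__main__":  # guard required for multiprocessing on macOS/Windows
    limits = [500_000 * (i + 1) for i in range(4)]
    for name, fn in [("sequential", run_sequential),
                     ("threading", run_threaded),
                     ("multiprocessing", run_multiprocess)]:
        start = time.perf_counter()
        fn(limits)
        print(f"{name:>15}: {time.perf_counter() - start:.3f}s")
```

All three modes return identical results; only the wall-clock time differs.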

Running Individual Modes

Test specific modes or configurations:

# Activate any environment
conda activate py314t

# Test all modes (default)
python benchmark.py

# Test specific mode
python benchmark.py --mode threading
python benchmark.py --mode multiprocessing
python benchmark.py --mode sequential

# Customize workload
python benchmark.py --workload-size 20000000

# Specify number of workers
python benchmark.py --threads 8

Expected Results

Based on CPU-bound workloads with 10 cores:

| Mode | Python ≤3.14 (GIL) | Python 3.14t (Free) | Multiprocessing (All) |
|------|--------------------|---------------------|-----------------------|
| Speedup vs Sequential | ~1.0-1.2x | ~8-10x | ~7-9x |
| GIL Impact | ❌ Limited by GIL | ✅ No GIL | ✅ No GIL |
| Memory | Shared | Shared | Separate processes |
| Overhead | Minimal | Minimal | Process creation |
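The speedup figures are simply the sequential wall time divided by the mode's wall time:

```python
def speedup(t_sequential, t_mode):
    """Speedup factor of a mode relative to the sequential baseline."""
    return t_sequential / t_mode

# e.g. a 10 s sequential run that a mode finishes in 4 s:
print(speedup(10.0, 4.0))  # 2.5x
```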

Why Free-Threading Should Be Faster

  1. True Parallelism: With GIL disabled, CPU-bound threads execute in parallel
  2. Shared Memory: Unlike multiprocessing, threads share memory space
  3. Lower Overhead: No process creation or inter-process communication overhead
  4. CPU Utilization: Can utilize all 10 CPU cores simultaneously

Installation

Prerequisites

  • macOS with ARM64 (Apple Silicon)
  • Miniconda/Anaconda
  • Python packages for dashboard (see requirements.txt)

Setup Conda Environments

# Create environments
conda create -n py38 python=3.8 -y
conda create -n py311 python=3.11 -y
conda create -n py313 python=3.13 -y
conda create -n py314 python=3.14 -c conda-forge -y
conda create -n py314t python-freethreading=3.14 -c conda-forge -y

Install Dashboard Dependencies

conda activate py311
pip install -r requirements.txt

About Free-Threading (PEP 703)

PEP 703 introduces optional support for running Python without the Global Interpreter Lock (GIL). Key points:

  • First introduced: Python 3.13 (experimental)
  • Official support: Python 3.14
  • Performance: Significant improvements for CPU-bound multi-threaded workloads
  • Compatibility: Requires free-threaded build (python-freethreading package)
  • Use cases: High-concurrency applications, scientific computing, data processing
  • Verification: Check with sys._is_gil_enabled() (should return False)

Checking GIL Status

import sys

# sys._is_gil_enabled() exists only on Python 3.13+; assume the GIL is on otherwise
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
print(f"GIL Enabled: {gil_enabled}")  # False for free-threaded builds

Troubleshooting

Performance Not as Expected?

  1. Verify free-threaded build:

    conda activate py314t
    python -c "import sys; print('GIL:', sys._is_gil_enabled())"

    Should print: GIL: False

  2. Check CPU cores:

    sysctl -n hw.ncpu

    Ensure workload matches core count

  3. Increase workload size:

    python benchmark.py --workload-size 50000000

    Larger workloads show clearer speedup differences

  4. Check for package compatibility: Some packages may re-enable the GIL. Run benchmarks with minimal imports.

Future Work

Potential extensions to this benchmark:

  1. Test with different workload types (I/O-bound, mixed)
  2. Benchmark with varying thread counts (1, 2, 4, 8, 16, 32)
  3. Test memory usage and overhead
  4. Compare with NumPy/native C extensions
  5. Evaluate impact on single-threaded performance
  6. Test real-world applications (web servers, data processing)

License

This project is for educational and benchmarking purposes.
