# Accelerating Python with pybind11

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ziaeemehr/workshop_hpcpy/blob/main/notebooks/pybind11/pybind11_intro.ipynb)

## Introduction to pybind11

pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code. Its goals and syntax are similar to the excellent [Boost.Python](http://www.boost.org/doc/libs/1_63_0/libs/python/doc/html) library by David Abrahams: to minimize boilerplate code in traditional extension modules by inferring type information using compile-time introspection.

In the context of high-performance computing (HPC) with Python, pybind11 allows you to write performance-critical parts of your code in C++ and seamlessly integrate them into Python scripts. This is particularly useful when Python's interpreted nature becomes a bottleneck, but you still want to leverage Python's ease of use for the overall application logic.

### Why pybind11 for HPC?
- **Performance**: C++ code can be orders of magnitude faster than pure Python for compute-intensive tasks.
- **Ease of use**: Minimal boilerplate compared to other binding libraries.
- **Modern C++**: Supports C++11 and later features.
- **Type safety**: Compile-time type checking helps catch errors early.
- **Integration**: Works well with NumPy arrays and other Python libraries.

In this notebook, we'll demonstrate how to use pybind11 by creating a simple example: matrix multiplication. We'll compare the performance of a pure Python implementation, a NumPy-based version, and a C++ implementation bound to Python using pybind11.

## Installing pybind11

To use pybind11, you need to install it and have a C++ compiler available. Here's how to set it up:

### Using pip
```bash
pip install pybind11
```

### Using conda
```bash
conda install -c conda-forge pybind11
```

You'll also need a C++ compiler. On Linux, you can install g++:
```bash
sudo apt-get install g++
```

On macOS, install Xcode command line tools:
```bash
xcode-select --install
```

On Windows, install Visual Studio with C++ support.
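
Before building anything, it can help to confirm that a compiler is actually reachable from the notebook's environment. The sketch below uses only the standard library; the list of executable names and the helper `find_cxx_compilers` are assumptions made for illustration and are not exhaustive (on Windows, for example, `cl` is usually only on `PATH` inside a Visual Studio developer prompt).

```python
import shutil

# Candidate compiler executables; this list is an assumption, not exhaustive.
CANDIDATE_COMPILERS = ("g++", "clang++", "c++", "cl")


def find_cxx_compilers(candidates=CANDIDATE_COMPILERS):
    """Return {name: path} for each candidate executable found on PATH."""
    found = {}
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            found[name] = path
    return found


compilers = find_cxx_compilers()
if compilers:
    for name, path in compilers.items():
        print(f"{name}: {path}")
else:
    print("No C++ compiler found on PATH")
```

An empty result does not prove that no compiler is installed, only that none of the listed names resolves on the current `PATH`.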

For this notebook, we'll assume you have pybind11 installed. Let's verify:

In [1]:
import os
import sys

# Check if running on Google Colab
try:
    from google.colab import drive
    IN_COLAB = True
    print("Running on Google Colab")
except ImportError:
    IN_COLAB = False
    print("Running locally")

# Clone repository if on Colab and not already cloned
if IN_COLAB:
    if not os.path.exists('/content/workshop_hpcpy'):
        print("Cloning workshop_hpcpy repository...")
        os.system('git clone https://github.com/Ziaeemehr/workshop_hpcpy.git /content/workshop_hpcpy')
    
    # Change to the pybind11 notebook directory
    os.chdir('/content/workshop_hpcpy/notebooks/native-extentions/pybind11')
    print(f"Working directory: {os.getcwd()}")

Running locally


In [2]:
try:
    import pybind11
    print(f"pybind11 version: {pybind11.__version__}")
except ImportError:
    !pip install pybind11

# Check if we can import cmake (Python package for cmake integration)
try:
    import cmake
    print(f"cmake Python package available: {cmake.__version__}")
except ImportError:
    print("cmake Python package not available - installing...")
    !pip install cmake
    try:
        import cmake
        print(f"cmake Python package installed: {cmake.__version__}")
    except ImportError:
        print("Failed to install cmake Python package. The cmake binary at /usr/bin/cmake can still be used for manual compilation.")

pybind11 version: 3.0.1
cmake Python package available: 4.2.1


## Writing C++ Code for Binding

Let's create a simple C++ function that performs matrix multiplication. We'll write this in a separate file and then create Python bindings for it.

First, let's create the C++ source file:

In [3]:
%%writefile matrix_multiply.cpp
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <stdexcept>  // std::invalid_argument
#include <vector>

namespace py = pybind11;

// Simple matrix multiplication function
std::vector<std::vector<double>> matrix_multiply(
    const std::vector<std::vector<double>>& A,
    const std::vector<std::vector<double>>& B) {
    
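    // Guard against empty input before reading A[0] and B[0]
    if (A.empty() || B.empty()) {
        throw std::invalid_argument("Matrices must be non-empty");
    }

    size_t rows_A = A.size();
    size_t cols_A = A[0].size();
    size_t rows_B = B.size();
    size_t cols_B = B[0].size();

    if (cols_A != rows_B) {
        throw std::invalid_argument("Matrix dimensions don't match for multiplication");
    }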
    
    std::vector<std::vector<double>> C(rows_A, std::vector<double>(cols_B, 0.0));
    
    for (size_t i = 0; i < rows_A; ++i) {
        for (size_t j = 0; j < cols_B; ++j) {
            for (size_t k = 0; k < cols_A; ++k) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    
    return C;
}

// Python module definition
PYBIND11_MODULE(matrix_multiply_module, m) {
    m.doc() = "Matrix multiplication module";
    m.def("matrix_multiply", &matrix_multiply, "Multiply two matrices");
}

Writing matrix_multiply.cpp


## Creating Python Bindings with pybind11

The C++ code above already includes the pybind11 bindings. The key parts are:

1. `#include <pybind11/pybind11.h>` - Main pybind11 header
2. `#include <pybind11/stl.h>` - Automatic conversion between STL containers like `std::vector` and Python lists (the data is copied on each call)
3. `PYBIND11_MODULE(module_name, m)` - Macro to define the Python module
4. `m.def("function_name", &cpp_function, "docstring")` - Expose C++ functions to Python

Now let's compile this into a Python extension module. We'll use setuptools for this:

In [4]:
%%writefile setup.py
from setuptools import setup, Extension
import pybind11

ext_modules = [
    Extension(
        'matrix_multiply_module',
        ['matrix_multiply.cpp'],
        include_dirs=[pybind11.get_include()],
        language='c++',
        extra_compile_args=['-std=c++11'],
    ),
]

setup(
    name='matrix_multiply_module',
    version='1.0',
    description='Matrix multiplication with pybind11',
    ext_modules=ext_modules,
    requires=['pybind11'],
)

Writing setup.py
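
Once the extension is built with `build_ext --inplace`, the compiled module lands in the working directory under a platform-specific file name. As a rough sketch using only the standard library, the name can be predicted from the interpreter's extension suffix; the helper `expected_extension_filename` is a hypothetical name introduced here for illustration.

```python
import importlib.machinery
import sysconfig


def expected_extension_filename(module_name):
    """Guess the file name of a compiled extension for this interpreter."""
    # EXT_SUFFIX looks like ".cpython-311-x86_64-linux-gnu.so" on Linux.
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    if not suffix:
        # Fallback: the first suffix the import system accepts for extensions.
        suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
    return module_name + suffix


print(expected_extension_filename("matrix_multiply_module"))
```

If the import later fails, checking whether a file with this name exists next to the notebook is a quick first diagnostic.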


In [5]:
!python setup.py build_ext --inplace

running build_ext
building 'matrix_multiply_module' extension
creating build/temp.linux-x86_64-cpython-311
g++ -pthread -B /home/ziaee/anaconda3/envs/hpc/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/ziaee/anaconda3/envs/hpc/include -fPIC -O2 -isystem /home/ziaee/anaconda3/envs/hpc/include -fPIC -I/home/ziaee/anaconda3/envs/hpc/lib/python3.11/site-packages/pybind11/include -I/home/ziaee/anaconda3/envs/hpc/include/python3.11 -c matrix_multiply.cpp -o build/temp.linux-x86_64-cpython-311/matrix_multiply.o -std=c++11

In [6]:
import matrix_multiply_module as mmm

# Test the function
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

result = mmm.matrix_multiply(A, B)
print("Matrix A:")
for row in A:
    print(row)
print("\nMatrix B:")
for row in B:
    print(row)
print("\nResult A * B:")
for row in result:
    print(row)

Matrix A:
[1, 2]
[3, 4]

Matrix B:
[5, 6]
[7, 8]

Result A * B:
[19.0, 22.0]
[43.0, 50.0]
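
The C++ function assumes rectangular input, meaning every row of a matrix has the same length. Ragged nested lists would not be caught by its dimension check, so a small guard on the Python side can reject them before they cross the binding. The helper below is a minimal sketch; `check_matmul_shapes` is a hypothetical name, not part of pybind11 or of the module built above.

```python
def check_matmul_shapes(A, B):
    """Return (rows_A, cols_A, cols_B) or raise ValueError for unusable input."""
    if not A or not B or not A[0] or not B[0]:
        raise ValueError("Matrices must be non-empty")
    cols_A, cols_B = len(A[0]), len(B[0])
    # Every row must have the same length as the first row.
    if any(len(row) != cols_A for row in A):
        raise ValueError("Matrix A has rows of unequal length")
    if any(len(row) != cols_B for row in B):
        raise ValueError("Matrix B has rows of unequal length")
    # Inner dimensions must agree: columns of A equal rows of B.
    if cols_A != len(B):
        raise ValueError("Matrix dimensions don't match for multiplication")
    return len(A), cols_A, cols_B


print(check_matmul_shapes([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # (2, 2, 2)
```

Calling this check before `mmm.matrix_multiply(A, B)` keeps malformed input from reaching the C++ loop at all.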


## Performance Comparison: Pure Python vs pybind11

Now let's compare the performance. First, let's implement matrix multiplication in pure Python:

In [7]:
def matrix_multiply_python(A, B):
    """Pure Python matrix multiplication"""
    rows_A = len(A)
    cols_A = len(A[0])
    rows_B = len(B)
    cols_B = len(B[0])
    
    if cols_A != rows_B:
        raise ValueError("Matrix dimensions don't match for multiplication")
    
    C = [[0.0 for _ in range(cols_B)] for _ in range(rows_A)]
    
    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                C[i][j] += A[i][k] * B[k][j]
    
    return C

In [8]:
import time
import numpy as np

# Create larger matrices for timing
size = 50
A = [[i + j for j in range(size)] for i in range(size)]
B = [[i - j for j in range(size)] for i in range(size)]

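# Time pure Python (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
result_python = matrix_multiply_python(A, B)
python_time = time.perf_counter() - start

# Time pybind11 (includes converting the nested lists to std::vector)
start = time.perf_counter()
result_cpp = mmm.matrix_multiply(A, B)
cpp_time = time.perf_counter() - start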

print(f"Pure Python time: {python_time:.4f} seconds")
print(f"pybind11 C++ time: {cpp_time:.4f} seconds")
print(f"Speedup: {python_time / cpp_time:.2f}x")

# Verify results are the same
def matrices_equal(A, B, tol=1e-10):
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        return False
    for i in range(len(A)):
        for j in range(len(A[0])):
            if abs(A[i][j] - B[i][j]) > tol:
                return False
    return True

print(f"Results match: {matrices_equal(result_python, result_cpp)}")

Pure Python time: 0.0146 seconds
pybind11 C++ time: 0.0005 seconds
Speedup: 29.93x
Results match: True
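
A single timed call at the sub-millisecond scale is noisy, so the exact speedup will vary from run to run. One common way to get steadier numbers is to repeat the measurement and keep the best result, for example with the standard library's `timeit`. The sketch below times a stand-in pure Python workload so that it runs on its own; `matmul_lists`, `best_time`, `X`, and `Y` are illustrative names, and the same `best_time` helper could be pointed at `mmm.matrix_multiply` instead.

```python
import timeit


def matmul_lists(A, B):
    """Plain triple-loop matrix product on nested lists (stand-in workload)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def best_time(func, *args, repeat=5, number=3):
    """Return the best per-call time in seconds over `repeat` batches."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


# Small demo matrices, named so they do not overwrite A and B used elsewhere.
n_demo = 20
X = [[i + j for j in range(n_demo)] for i in range(n_demo)]
Y = [[i - j for j in range(n_demo)] for i in range(n_demo)]

print(f"best of 5 batches: {best_time(matmul_lists, X, Y):.6f} s per call")
```

Taking the minimum rather than the mean filters out interference from other processes, which tends to only ever make a run slower.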


## Performance Comparison: NumPy vs pybind11

NumPy is highly optimized and often provides excellent performance for numerical computations. Let's compare our pybind11 implementation with NumPy's matrix multiplication:

In [9]:
# Convert to NumPy arrays
A_np = np.array(A)
B_np = np.array(B)

# Time NumPy (perf_counter is a monotonic, high-resolution clock)
start = time.perf_counter()
result_numpy = np.dot(A_np, B_np)
numpy_time = time.perf_counter() - start

print(f"NumPy time: {numpy_time:.4f} seconds")
print(f"pybind11 C++ time: {cpp_time:.4f} seconds")
print(f"pybind11 vs NumPy speedup: {numpy_time / cpp_time:.2f}x")

# Convert pybind11 result to numpy for comparison
result_cpp_np = np.array(result_cpp)
print(f"NumPy and pybind11 results match: {np.allclose(result_numpy, result_cpp_np)}")

# Summary
print("\nPerformance Summary:")
print(f"Pure Python: {python_time:.4f}s")
print(f"pybind11 C++: {cpp_time:.4f}s ({python_time/cpp_time:.1f}x speedup)")
print(f"NumPy:       {numpy_time:.4f}s ({python_time/numpy_time:.1f}x speedup)")

NumPy time: 0.0003 seconds
pybind11 C++ time: 0.0005 seconds
pybind11 vs NumPy speedup: 0.61x
NumPy and pybind11 results match: True

Performance Summary:
Pure Python: 0.0146s
pybind11 C++: 0.0005s (29.9x speedup)
NumPy:       0.0003s (49.4x speedup)
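
One cost included in the pybind11 timing is data conversion: with `pybind11/stl.h`, both input matrices are copied element by element into `std::vector` and the result is copied back into Python lists. The sketch below only counts elements and multiply-add operations for square matrices. It is a back-of-the-envelope estimate under that assumption, not a measurement, and `conversion_vs_compute` is a hypothetical helper name.

```python
def conversion_vs_compute(n):
    """Count converted elements and multiply-adds for an n x n product."""
    converted = 3 * n * n      # two inputs copied in, one result copied out
    multiply_adds = n ** 3     # inner-loop iterations of the triple loop
    return converted, multiply_adds, converted / multiply_adds


for n in (10, 50, 500):
    converted, flops, ratio = conversion_vs_compute(n)
    print(f"n={n:4d}: {converted:8d} conversions, {flops:10d} multiply-adds, "
          f"ratio {ratio:.3f}")
```

The ratio falls as 3/n, so the relative weight of conversion shrinks for larger matrices. Each converted element is likely to cost more than one multiply-add, which is why the ratio alone does not predict the timings.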


## Conclusion

In this notebook, we've demonstrated how to use pybind11 to create high-performance Python extensions written in C++. Key takeaways:

1. **pybind11** provides a clean, modern way to bind C++ code to Python with minimal boilerplate.

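2. **Performance gains** can be significant compared to pure Python: about 30x for the 50-by-50 matrices timed above.

3. **NumPy remains hard to beat** for operations it already provides: in the benchmark above, `np.dot` was faster than our naive triple loop (0.0003 s vs 0.0005 s). pybind11 pays off for custom algorithms that cannot be expressed as vectorized NumPy operations, and it can operate on NumPy arrays directly through `pybind11/numpy.h`.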

4. **Ease of use**: The binding code is straightforward and type-safe.

pybind11 is particularly useful when:
- You have existing C++ code that you want to expose to Python
- You need maximum performance for specific algorithms
- You're building libraries that need to be fast and accessible from Python

For more complex projects, consider using tools like [scikit-build](https://scikit-build.readthedocs.io/en/latest/) or [CMake](https://cmake.org/) for managing the build process.

### Further Reading
- [pybind11 documentation](https://pybind11.readthedocs.io/)
- [C++ extensions for Python](https://docs.python.org/3/extending/extending.html)
- [Boost.Python](https://www.boost.org/doc/libs/1_75_0/libs/python/doc/html/index.html) (alternative binding library)

## Cleanup

Let's clean up the build artifacts and compiled files created during this tutorial:

In [10]:
import shutil
import glob

# Remove build directory
if os.path.exists('build'):
    shutil.rmtree('build')
    print("Removed: build/")

# Remove compiled extension modules
for ext in glob.glob('*.so') + glob.glob('*.pyd'):
    os.remove(ext)
    print(f"Removed: {ext}")

# Remove C++ source and setup files
files_to_remove = ['matrix_multiply.cpp', 'setup.py']
for file in files_to_remove:
    if os.path.exists(file):
        os.remove(file)
        print(f"Removed: {file}")

# Remove __pycache__ if it exists
if os.path.exists('__pycache__'):
    shutil.rmtree('__pycache__')
    print("Removed: __pycache__/")

print("\nCleanup complete!")

Removed: build/
Removed: matrix_multiply_module.cpython-311-x86_64-linux-gnu.so
Removed: matrix_multiply.cpp
Removed: setup.py

Cleanup complete!
