<a href="https://colab.research.google.com/drive/1xp1hk0gnvFQgVD5Avvi7DqoMMlLAJptu?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Compiling Python with `numba` and `cython`

Reproduce the Python function from the lecture and measure its execution time:

In [1]:
def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

%time loop(2, 10**6)

# %time reports the CPU time and wall-clock time of a single run

CPU times: user 34.7 ms, sys: 3 ms, total: 37.7 ms
Wall time: 47.2 ms


inf
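
The result is `inf` because repeated multiplication by 2.5 quickly exceeds the largest finite float64 value (about 1.8e308). Python float multiplication does not raise on overflow; it silently returns `inf`, and `inf * 2.5` stays `inf`. The sketch below counts how many iterations that takes; the helper name `iterations_until_inf` is an illustrative assumption and is not part of the lecture code.

```python
import math

def iterations_until_inf(x, factor=2.5, limit=10**6):
    """Count multiplications by `factor` until the float overflows to inf."""
    for n in range(1, limit + 1):
        x *= factor
        if math.isinf(x):
            return n
    return None  # no overflow within `limit` iterations

n = iterations_until_inf(2)
print(f"x becomes inf after {n} multiplications")
```

The overflow happens after fewer than a thousand iterations, so most of the one million multiplications in the benchmark operate on `inf`. The timing comparison is still valid, because every version performs the same number of multiplications.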

## Using `numba`

First, let's try compiling "Just in Time" using `numba`:

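`from numba import jit` imports the `jit` decorator (the just-in-time compiler) from the `numba` library.

It can **speed up numerical Python functions**, often bringing them close to the speed of C.

It is best suited to CPU-bound work such as long loops and array operations.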

In [2]:
from numba import jit

# jit compiles the function the first time it is called with a given argument type
# nopython=True compiles without falling back to the Python interpreter (it raises an error if it cannot)
@jit(nopython=True)
def loop_jit(x, r):
  for i in range(r):
    x *= 2.5
  return x

%time loop_jit(2, 10**6) # The first time includes compilation time

CPU times: user 236 ms, sys: 83.2 ms, total: 319 ms
Wall time: 1.22 s


inf

In [3]:
%time loop_jit(2, 10**6) # much faster after compilation

CPU times: user 2 ms, sys: 122 µs, total: 2.12 ms
Wall time: 2.33 ms


inf
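
Outside IPython, the same first-call versus later-call comparison can be made with `time.perf_counter`. This is a minimal sketch: the helper `timed_call` is a hypothetical name, and the `ImportError` fallback (which leaves the function uncompiled) is an assumption added only so the snippet still runs where `numba` is not installed.

```python
import time

try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption for this sketch: without numba, leave the function uncompiled.
    def njit(func):
        return func

@njit
def loop_jit(x, r):
    for i in range(r):
        x *= 2.5
    return x

def timed_call(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

_, first = timed_call(loop_jit, 2.0, 10**6)   # includes compilation when numba is present
_, second = timed_call(loop_jit, 2.0, 10**6)  # reuses the already compiled code
print(f"first call:  {first * 1e3:.2f} ms")
print(f"second call: {second * 1e3:.2f} ms")
```

With `numba` installed, the first call should be noticeably slower than the second, matching the `%time` outputs above.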

In [3]:
%timeit loop_jit(2, 10**6) # %timeit runs the code multiple times and gives the average time

2.07 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [4]:
%timeit loop(3, 10**6) # better to time across multiple runs using `timeit`

30.2 ms ± 470 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
%timeit loop_jit(3, 10**6)

1.67 ms ± 83.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
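
`%timeit` is an IPython magic, so it is not available in a plain Python script. The standard-library `timeit` module gives a comparable measurement. The sketch below mirrors the "7 runs, 10 loops each" layout reported above; the constants `R`, `NUMBER`, and `REPEAT` are illustrative choices, and `R` is deliberately smaller than the notebook's `10**6` so that the example finishes quickly.

```python
import statistics
import timeit

def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

R = 10**5      # loop length (the notebook uses 10**6)
NUMBER = 10    # calls per run
REPEAT = 7     # number of runs

# Each entry of `totals` is the total time for NUMBER calls in one run.
totals = timeit.repeat(lambda: loop(3, R), repeat=REPEAT, number=NUMBER)
per_call = [t / NUMBER for t in totals]

mean_ms = statistics.mean(per_call) * 1e3
std_ms = statistics.pstdev(per_call) * 1e3
print(f"{mean_ms:.2f} ms ± {std_ms:.2f} ms per loop "
      f"(mean ± std. dev. of {REPEAT} runs, {NUMBER} loops each)")
```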


We might want to **compile our code ahead of time**, though, so that we can see a speed-up the first time we use it. `numba` allows us to compile ahead of time like so:


`CC` compiles a function (here `loop_aot`) ahead of time into a binary extension module (a `.so` or `.pyd` file), instead of compiling it just in time at run time.

In [4]:
from numba.pycc import CC

# name of compiled module to create:
cc = CC('test_aot')
# this creates a compilation context for a module named test_aot;
# compiling it produces a shared library such as test_aot.so (Linux/macOS) or test_aot.pyd (Windows)


# name of function in module, with explicit data types required (4byte=32bit ints and floats)
@cc.export('loop_aot', 'f4(f4,i4)')
def loop_aot(x, r):
    for i in range(r):
        x *= 2.5
    return x

cc.compile()



#### I. Meaning of the signature string `'f4(f4, i4)'`

In Numba's `@cc.export('func_name', 'signature')`, the **function signature has the following format**:

\[
\texttt{'<return type>(<argument 1 type>, <argument 2 type>, ...)'}
\]

So:

```python
@cc.export('loop_aot', 'f4(f4,i4)')
```

means:

- `loop_aot` is the name of the exported function;
- `'f4(f4,i4)'` means:
  - `f4`: the function's **return type is float32**;
  - `(f4, i4)`: the function's **arguments are a float32 and an int32**.

---

#### II. Common type abbreviations

| Abbreviation | Numba type | Corresponding NumPy / Python type |
|------|-------------|--------------------------|
| `i4` | int32       | `numpy.int32` / `int`    |
| `i8` | int64       | `numpy.int64` / `int`    |
| `f4` | float32     | `numpy.float32` / `float`|
| `f8` | float64     | `numpy.float64` / `float`|
| `b1` | boolean     | `bool`                   |
| `void` | no return value | `None`             |
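
The digit in each abbreviation is the width of the type in bytes. The sketch below uses only the standard-library `struct` module (not Numba itself) to show those widths and to illustrate what storing a value as `f4` instead of `f8` does to precision. The helper `to_float32` is a hypothetical name introduced for this example.

```python
import struct

# Widths in bytes of the C types behind each abbreviation.
# The "<" prefix selects struct's standard, platform-independent sizes.
widths = {
    "i4": struct.calcsize("<i"),  # 32-bit signed integer
    "i8": struct.calcsize("<q"),  # 64-bit signed integer
    "f4": struct.calcsize("<f"),  # single-precision float
    "f8": struct.calcsize("<d"),  # double-precision float
}
print(widths)

def to_float32(value):
    """Round a Python float (a float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

print(to_float32(0.1) == 0.1)  # False: float32 keeps fewer significant bits
print(to_float32(0.5) == 0.5)  # True: 0.5 is exactly representable in both
```

This is why the choice between `f4` and `f8` in a signature matters: a function exported as `'f4(f4,i4)'` computes in single precision.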

---

#### III. Can several functions be exported?

**Yes. Use one `@cc.export` per function to export several functions from the same module.**

For example:

```python
@cc.export('loop_aot', 'f4(f4,i4)')
def loop_aot(x, r):
    for i in range(r):
        x *= 2.5
    return x

@cc.export('square_add', 'f8(f8, f8)')
def square_add(a, b):
    return a**2 + b**2
```

Then compile:

```python
cc.compile()
```

The module then contains both `loop_aot` and `square_add`, which can be used from Python like this:

```python
import test_aot

print(test_aot.loop_aot(1.0, 3))      # 1.0 * 2.5**3
print(test_aot.square_add(3.0, 4.0))  # 3**2 + 4**2 = 25.0
```

---

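#### IV. How are more complex types written?

Numba signatures also support:

- array arguments (for example, `'void(f8[:])'` takes a 1-D float64 array)
- multi-dimensional arrays, structs, and tuples (more involved)

To speed up processing of `numpy` vector data, you can write: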

```python
@cc.export('scale_array', 'void(f4[:], f4)')
def scale_array(arr, a):
    for i in range(len(arr)):
        arr[i] *= a
```

---



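#### 1. **`@cc.export` must sit directly above the function definition**

`@cc.export(...)` is a **decorator**, so no other statement may come between it and the function definition it decorates; otherwise Python cannot parse the code.

```python
@cc.export('loop_aot', 'f4(f4,i4)')
def loop_aot(x, r):
    ...
```
Incorrect example (another statement in between):

```python
@cc.export('loop_aot', 'f4(f4,i4)')
print("hello")  # SyntaxError: a decorator must be followed by a definition
def loop_aot(x, r):
    ...
```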
---

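#### 2. **One function can be exported under several names**

Each exported name has exactly one signature; AOT-compiled functions are not overloaded by argument type. To support several types, stack several `@cc.export(...)` decorators on the same function and give each signature its own exported name:

```python
@cc.export('loop_aot_f4', 'f4(f4,i4)')
@cc.export('loop_aot_f8', 'f8(f8,i4)')  # float64 variant
def loop_aot(x, r):
    for i in range(r):
        x *= 2.5
    return x
```

Both `@cc.export` decorators apply to **the same Python function `loop_aot`**, and the compiled module then contains two functions, `loop_aot_f4` and `loop_aot_f8` (names chosen here for illustration).

Do not reuse one exported name for two signatures:

```python
@cc.export('loop_aot', 'f4(f4,i4)')
@cc.export('loop_aot', 'f8(f8,i4)')
def loop_aot(x, r):
    ...
```

This does not produce a single `loop_aot` that accepts both `float32` and `float64`. Every exported name must be unique within the module, so each signature needs its own name.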

---


Note that we now have a compiled shared library (`.so`) in our current directory. This is a compiled extension module that contains our function.

In [5]:
ls

1M_python_compilation.ipynb     test_aot.cpython-312-darwin.so*


To use our function, we just need to import our pre-compiled module, as we would any other Python module:

In [6]:
import test_aot
%time test_aot.loop_aot(2, 10**6) # first time running it is fast this time

CPU times: user 2.04 ms, sys: 8 µs, total: 2.05 ms
Wall time: 2.05 ms


inf

In [7]:
%timeit test_aot.loop_aot(2, 10**6) # same overall performance as before

2.05 ms ± 5.09 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
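
Because the compiled module exists only after `cc.compile()` has been run, a script that uses it may want a fallback for machines where the build step has not happened. The sketch below shows one possible pattern; the alias `loop_fast` and the pure-Python fallback are assumptions made for illustration and are not part of the notebook.

```python
try:
    # Compiled extension produced by cc.compile(); present only after the build step.
    from test_aot import loop_aot as loop_fast
except ImportError:
    # Fallback (an assumption of this sketch): same logic in plain Python.
    def loop_fast(x, r):
        for i in range(r):
            x *= 2.5
        return x

print(loop_fast(1.0, 3))  # 1.0 * 2.5**3
```

The compiled version computes in float32 (signature `'f4(f4,i4)'`) while the fallback uses float64, so results can differ slightly for values that float32 cannot represent exactly.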


## Using `cython`

Another common way to compile Python code (albeit slightly uglier) is to compile our function via explicit `cython` static typing, like so (here, using the IPython `cython` extension to compile):

In [8]:
%load_ext cython

In [10]:
import pkg_resources

package_name = "cython"

installed_packages = [pkg.key for pkg in pkg_resources.working_set]

if package_name in installed_packages:
    print(f"'{package_name}' is installed")
else:
    print(f"'{package_name}' is not installed")


'cython' is installed


In [12]:
!pip install cython
!pip install ipython
!pip install ipycython


DEPRECATION: Loading egg at /opt/anaconda3/lib/python3.12/site-packages/matlabengine-24.2-py3.12.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation.. Discussion can be found at https://github.com/pypa/pip/issues/12330
DEPRECATION: Loading egg at /opt/anaconda3/lib/python3.12/site-packages/matlab_kernel-0.17.1-py3.12.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation.. Discussion can be found at https://github.com/pypa/pip/issues/12330
DEPRECATION: Loading egg at /opt/anaconda3/lib/python3.12/site-packages/matlabengine-24.2-py3.12.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation.. Discussion can be found at https://github.com/pypa/pip/issues/12330
DEPRECATION: Loading egg at /opt/anaconda3/lib/python3.12/site-

In [13]:
!pip install Cython
!pip install ipython

In [21]:
%load_ext Cython


The Cython extension is already loaded. To reload it, use:
  %reload_ext Cython


In [20]:
%lsmagic


Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%cython  %%cython_inline  %%cython_pyximport  %%debug  %%file  %%html  %%javascript  %%js  

In [29]:
%%cython
# The %%cython cell magic must be the first line of the cell.
# Cython translates the cell from Python to C and compiles it to machine code.

# explicitly add static types to function itself:
def loop_cython(float x, int r):
    cdef int i
    for i in range(r):
        x *= 2.5
    return x

In [None]:
%timeit loop_cython(2, 10**6) # comparable performance to numba

1.54 ms ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
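
To put the measurements side by side, the short sketch below collects the `%timeit` figures reported in this notebook and computes each speed-up relative to the pure Python loop. The numbers are copied from the outputs above; they come from separate runs on one machine, so treat the ratios as rough.

```python
# Timings reported by %timeit in this notebook, in milliseconds per call.
timings_ms = {
    "pure Python": 30.2,
    "numba JIT": 1.67,
    "numba AOT": 2.05,
    "cython": 1.54,
}

baseline = timings_ms["pure Python"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, ms in timings_ms.items():
    print(f"{name:12s} {ms:6.2f} ms  {speedups[name]:5.1f}x")
```

All three compiled versions come out roughly 15 to 20 times faster than the interpreted loop.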
