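## Python Bytecode
Bytecode is the low-level instruction set (comparable to assembly language) that Python source code is compiled into; the Python virtual machine executes it.

Key properties:
1. It is cached in .pyc files.

2. It consists of a sequence of opcodes such as LOAD_FAST and CALL_FUNCTION. The exact opcode set varies between CPython versions.

3. It can be disassembled and inspected with the dis module.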
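To make the first point concrete, here is a minimal sketch that compiles a small module to a .pyc file and reads the code object back out. It assumes CPython 3.7 or later, where the .pyc header is 16 bytes; the file name `demo_mod.py` is made up for illustration.

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile
import types

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module used only for this demonstration.
    src = pathlib.Path(tmp) / "demo_mod.py"
    src.write_text("def add(a, b):\n    return a + b\n")

    # Compile to an explicit .pyc path instead of __pycache__.
    pyc_path = py_compile.compile(
        str(src), cfile=str(src.with_suffix(".pyc")), doraise=True
    )
    data = pathlib.Path(pyc_path).read_bytes()

# Assumption: CPython 3.7+ layout, a 16-byte header followed by a marshalled code object.
magic = data[:4]
module_code = marshal.loads(data[16:])

print(magic == importlib.util.MAGIC_NUMBER)
print(isinstance(module_code, types.CodeType))
print(module_code.co_names)
```

The header layout is an implementation detail, so treat the offset as something to verify for your interpreter version rather than as a stable interface.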

In [10]:
def add(a, b):
    return a + b

print(add(2, 3))


5


In [11]:
import dis

dis.dis(add)

  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE


In [12]:
dis.dis(add.__code__)

  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE


In [23]:
def outer(aa):
    def inner():
        bb = 1
        return aa + bb + cc
    return inner

# Raw bytecode as a bytes object (what the Python virtual machine executes)
print(outer.__code__.co_code)

# Disassemble it into human-readable form
dis.dis(outer.__code__)

b'\x87\x00f\x01d\x01d\x02\x84\x08}\x01|\x01S\x00'
  2           0 LOAD_CLOSURE             0 (aa)
              2 BUILD_TUPLE              1
              4 LOAD_CONST               1 (<code object inner at 0x7fa9a0a38a80, file "<ipython-input-23-3e90f112e3d3>", line 2>)
              6 LOAD_CONST               2 ('outer.<locals>.inner')
              8 MAKE_FUNCTION            8 (closure)
             10 STORE_FAST               1 (inner)

  5          12 LOAD_FAST                1 (inner)
             14 RETURN_VALUE

Disassembly of <code object inner at 0x7fa9a0a38a80, file "<ipython-input-23-3e90f112e3d3>", line 2>:
  3           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (bb)

  4           4 LOAD_DEREF               0 (aa)
              6 LOAD_FAST                0 (bb)
              8 BINARY_ADD
             10 LOAD_GLOBAL              0 (cc)
             12 BINARY_ADD
             14 RETURN_VALUE


Note that the bytecode loads `aa` (in the enclosing scope of inner), `bb` (in the local scope of inner), and `cc` (looked up in the global scope) onto the stack with three different opcodes:
* aa: `LOAD_DEREF`
* bb: `LOAD_FAST`
* cc: `LOAD_GLOBAL`
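The same mapping can be collected programmatically. The sketch below uses `dis.get_instructions` to record which load opcode is emitted for each name in `inner`. Opcode names are a CPython implementation detail, so the exact names printed (especially the `LOAD_FAST` variant) may differ on other versions.

```python
import dis

def outer(aa):
    def inner():
        bb = 1
        # cc is never defined; inner is only compiled here, never called.
        return aa + bb + cc
    return inner

inner_code = outer(0).__code__

# Map each variable name to the load opcode used for it.
loads = {}
for ins in dis.get_instructions(inner_code):
    if ins.opname.startswith("LOAD_") and isinstance(ins.argval, str):
        loads[ins.argval] = ins.opname

for name in ("aa", "bb", "cc"):
    print(name, loads.get(name))
```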

REF: https://docs.python.org/zh-cn/3.13/library/dis.html#python-bytecode-instructions

PS:
* Python resolves names by the LEGB rule: Local -> Enclosing -> Global -> Built-in
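A small illustration of the LEGB lookup order, with the same name defined at several levels. All function names here are made up for the example.

```python
x = "global"

def outer():
    x = "enclosing"

    def reads_local():
        x = "local"  # Local wins over everything else
        return x

    def reads_enclosing():
        return x  # no local x, so the enclosing x is used

    return reads_local(), reads_enclosing()

def reads_global():
    return x  # no local or enclosing x, so the module-level x is used

def reads_builtin():
    return len  # not defined locally or globally, so it resolves to the built-in

local_value, enclosing_value = outer()
print(local_value, enclosing_value, reads_global(), reads_builtin() is len)
```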

In [24]:
source_code = """
def toy_example(a, b):
    x = a / (torch.abs(a) + 1)
    if b.sum() < 0:
        b = b * -1
    return x * b
"""

# co stands for code object: going from source code to a code object
# source_code -> parse tree generation -> AST generation -> bytecode generation
# -> bytecode optimization -> code object generation -> code object execution
co = compile(source_code, "<string>", "exec")

print(f"type(co) = {type(co)}")
print(co)

# Bytecode (raw bytes)
print(f"type(co.co_code) = {type(co.co_code)}")
print(co.co_code)

import dis
dis.dis(co)

type(co) = <class 'code'>
<code object <module> at 0x7fa9a0a39450, file "<string>", line 2>
type(co.co_code) = <class 'bytes'>
b'd\x00d\x01\x84\x00Z\x00d\x02S\x00'
  2           0 LOAD_CONST               0 (<code object toy_example at 0x7fa9a0a34df0, file "<string>", line 2>)
              2 LOAD_CONST               1 ('toy_example')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (toy_example)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object toy_example at 0x7fa9a0a34df0, file "<string>", line 2>:
  3           0 LOAD_FAST                0 (a)
              2 LOAD_GLOBAL              0 (torch)
              4 LOAD_METHOD              1 (abs)
              6 LOAD_FAST                0 (a)
              8 CALL_METHOD              1
             10 LOAD_CONST               1 (1)
             12 BINARY_ADD
             14 BINARY_TRUE_DIVIDE
             16 STORE_FAST               2 (x)

  4 
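The pipeline from source to execution can also be walked step by step. The sketch below is a simplified variant of the toy example that swaps `torch.abs` and `b.sum()` for plain built-in arithmetic, so it runs without PyTorch; that substitution is an assumption made for illustration only.

```python
import ast
import types

source = """
def toy_example(a, b):
    x = a / (abs(a) + 1)
    if b < 0:
        b = b * -1
    return x * b
"""

# Step 1: source text -> AST
tree = ast.parse(source)

# Step 2: AST -> module-level code object
module_code = compile(tree, "<string>", "exec")

# The function body is a nested code object stored among the module's constants.
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

# Step 3: executing the module code object defines the function in a namespace.
namespace = {}
exec(module_code, namespace)
result = namespace["toy_example"](1.0, -2.0)

print(func_code.co_name, result)
```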

## Python code object
A code object is the internal Python object that stores the bytecode and metadata of a code block (a function, module, or class body). It is available through function.__code__ or compile().

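| Property          | Bytecode                                             | Code Object                                                    |
|-------------------|------------------------------------------------------|----------------------------------------------------------------|
| **Storage form**  | Binary instructions (`bytes`)                        | Python object (`types.CodeType`)                               |
| **Contents**      | Opcodes and their arguments only                     | Bytecode + metadata (variable names, constants, etc.)          |
| **How to get it** | `func.__code__.co_code`                              | `func.__code__` or `compile()`                                 |
| **Purpose**       | Executed directly by the Python virtual machine (PVM) | Holds the full description of a code block (bytecode + the metadata needed to run it) |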

REF: https://docs.python.org/zh-cn/3.13/library/inspect.html
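The distinction in the table can be checked directly. This sketch inspects one function's code object, shows that its attributes are read-only, and uses `code.replace()` (available in Python 3.8+) to derive a modified copy; the name `plus` is arbitrary.

```python
import types

def add(a, b):
    return a + b

code = add.__code__
is_code_object = isinstance(code, types.CodeType)
raw = code.co_code  # the bytecode itself, as bytes
metadata = {
    "name": code.co_name,
    "varnames": code.co_varnames,
    "argcount": code.co_argcount,
}

# Code object attributes are read-only; assignment raises AttributeError.
try:
    code.co_argcount = 3
    mutated = True
except AttributeError:
    mutated = False

# replace() returns a new code object and leaves the original untouched.
renamed = code.replace(co_name="plus")

print(is_code_object, type(raw).__name__, metadata, mutated, renamed.co_name)
```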

In [26]:
def mul(a, b):
    return a * b

code = mul.__code__
print(code.co_argcount)
print(code.co_posonlyargcount)
print(code.co_kwonlyargcount)

2
0
0


In [None]:
# Parameters before / are positional-only
def mul(a, b, /, *args, **kwargs):
    return a * b

code = mul.__code__
print(code.co_argcount)
print(code.co_posonlyargcount)
print(code.co_kwonlyargcount)

2
2
0


In [32]:
# Parameters after * are keyword-only
def mul(a, *, b = 1, **kwargs):
    return a * b

code = mul.__code__
# co_argcount counts the positional parameters (positional-only ones included);
# it excludes keyword-only parameters, *args, and **kwargs
print(code.co_argcount)
print(code.co_posonlyargcount)
print(code.co_kwonlyargcount)

1
0
1
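The three counters can be cross-checked against `inspect.signature` on a single function that uses every parameter kind. The function `demo` below is a made-up example.

```python
import inspect

def demo(a, b, /, c, *args, d=1, **kwargs):
    return a, b, c, args, d, kwargs

code = demo.__code__
counts = (code.co_argcount, code.co_posonlyargcount, code.co_kwonlyargcount)

# *args and **kwargs are not counted; they are recorded as flags instead.
has_varargs = bool(code.co_flags & inspect.CO_VARARGS)
has_varkw = bool(code.co_flags & inspect.CO_VARKEYWORDS)

# The same information, seen through the higher-level signature API.
kinds = {name: p.kind.name for name, p in inspect.signature(demo).parameters.items()}

print(counts, has_varargs, has_varkw)
print(kinds)
```

Here `co_argcount` is 3 because it covers `a`, `b`, and `c`, of which two are positional-only.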


In [None]:
def g():
    d = {}
    def f():
        d["a"] = 1
    return f


def get_code_info(code):
    print(f"nlocals: {code.co_nlocals}")

    print(f"varnames: {code.co_varnames}")
    print(f"names: {code.co_names}")

    # cellvars: local variables that nested scopes also reference
    # freevars: variables that come from an enclosing scope
    print(f"cellvars: {code.co_cellvars}")
    print(f"freevars: {code.co_freevars}")

    print(f"consts: {code.co_consts}")

# obj = g()
# obj is function f
# code = g().__code__
code = g().__code__
get_code_info(code)

print(f"===================")

code = g.__code__
get_code_info(code)


nlocals: 0
varnames: ()
names: ()
cellvars: ()
freevars: ('d',)
consts: (None, 1, 'a')
nlocals: 1
varnames: ('f',)
names: ()
cellvars: ('d',)
freevars: ()
consts: (None, <code object f at 0x7fa990989710, file "<ipython-input-42-ac2699e2254f>", line 3>, 'g.<locals>.f')


In [45]:
def g():
    d = {}
    def f():
        # If f does not use d, d is an ordinary local (co_varnames) rather than a cell variable (co_cellvars)
        # d["a"] = 1
        pass
    return f

def get_code_info(code):
    print(f"nlocals: {code.co_nlocals}")

    print(f"varnames: {code.co_varnames}")
    print(f"names: {code.co_names}")

    # cellvars: local variables that nested scopes also reference
    # freevars: variables that come from an enclosing scope
    print(f"cellvars: {code.co_cellvars}")
    print(f"freevars: {code.co_freevars}")

    print(f"consts: {code.co_consts}")

code = g.__code__
get_code_info(code)

nlocals: 2
varnames: ('d', 'f')
names: ()
cellvars: ()
freevars: ()
consts: (None, <code object f at 0x7fa9909897c0, file "<ipython-input-45-903b2902fdda>", line 3>, 'g.<locals>.f')
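At runtime, each free variable of the inner function is backed by a cell object stored in `__closure__`. A minimal sketch that pairs the names in `co_freevars` with their cells:

```python
def g():
    d = {}

    def f():
        d["a"] = 1

    return f

f = g()

# co_freevars and __closure__ are parallel: the i-th name is backed by the i-th cell.
cells = dict(zip(f.__code__.co_freevars, f.__closure__))

before = dict(cells["d"].cell_contents)  # snapshot before calling f
f()
after = dict(cells["d"].cell_contents)   # f mutated the dict held in the cell

print(before, after)
```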


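## Python frame
1. Each function is compiled into a single code object, which is immutable.
2. A frame records the execution state of one function call; each call gets its own frame, and that state can differ from call to call.

Once you have a Python frame, you can inspect its contents to do many interesting things.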
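As a small sketch of both points, the function below captures its own frame on every call. The code object is the same each time, while the locals and the calling frame differ. It assumes CPython, where `inspect.currentframe()` returns a frame rather than None; the function names are invented for the example.

```python
import inspect

def record(n):
    frame = inspect.currentframe()
    try:
        # Shared code object, per-call local value, and the name of the caller.
        return frame.f_code, frame.f_locals["n"], frame.f_back.f_code.co_name
    finally:
        del frame  # drop the reference to avoid a reference cycle

def caller_one():
    return record(1)

def caller_two():
    return record(2)

code1, n1, from1 = caller_one()
code2, n2, from2 = caller_two()

print(code1 is code2, n1, n2, from1, from2)
```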


In [None]:
# Best run as a .py file rather than inside a Jupyter Notebook
import inspect
from objprint import op

def f():
    frame = inspect.currentframe()

    op(
        frame,
        honor_existing=False,
        depth=2
    )

f()

<frame 0x7fa95a760a70
  .f_back = <frame 0x7fa97a70b750
    .f_back = <frame 0x7fa95a761140 ... >,
    .f_builtins = { ... },
    .f_code = <code 0x7fa9806fbf50 ... >,
    .f_globals = { ... },
    .f_lasti = 2,
    .f_lineno = 14,
    .f_locals = { ... },
    .f_trace = None,
    .f_trace_lines = True,
    .f_trace_opcodes = False
  >,
  .f_builtins = {
    'ArithmeticError': <type 0x1031894f0 ... >,
    'AssertionError': <type 0x103189348 ... >,
    'AttributeError': <type 0x103185de8 ... >,
    'BaseException': <type 0x103184370 ... >,
    'BlockingIOError': <type 0x103185c40 ... >,
    'BrokenPipeError': <type 0x1031862e0 ... >,
    'BufferError': <type 0x10318a088 ..