<center>
    <img src="https://upload.wikimedia.org/wikipedia/commons/a/a8/%D0%9B%D0%9E%D0%93%D0%9E_%D0%A8%D0%90%D0%94.png" width=500px/>
    <font>Python 2023</font><br/>
    <br/>
    <br/>
    <b style="font-size: 2em">Python Byte Code</b><br/>
    <br/>
    <font>Алексей Стыценко</font><br/>
</center>

Let's take a simple program

In [11]:
program = """\
x = 1
y = 2
print(3*(x+y))
"""

In [12]:
exec(program)

9


How does the interpreter execute it?

Step 1: Tokenization

In [13]:
import io
import tokenize

list(tokenize.generate_tokens(io.StringIO(program).readline))

[TokenInfo(type=1 (NAME), string='x', start=(1, 0), end=(1, 1), line='x = 1\n'),
 TokenInfo(type=54 (OP), string='=', start=(1, 2), end=(1, 3), line='x = 1\n'),
 TokenInfo(type=2 (NUMBER), string='1', start=(1, 4), end=(1, 5), line='x = 1\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 5), end=(1, 6), line='x = 1\n'),
 TokenInfo(type=1 (NAME), string='y', start=(2, 0), end=(2, 1), line='y = 2\n'),
 TokenInfo(type=54 (OP), string='=', start=(2, 2), end=(2, 3), line='y = 2\n'),
 TokenInfo(type=2 (NUMBER), string='2', start=(2, 4), end=(2, 5), line='y = 2\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 5), end=(2, 6), line='y = 2\n'),
 TokenInfo(type=1 (NAME), string='print', start=(3, 0), end=(3, 5), line='print(3*(x+y))\n'),
 TokenInfo(type=54 (OP), string='(', start=(3, 5), end=(3, 6), line='print(3*(x+y))\n'),
 TokenInfo(type=2 (NUMBER), string='3', start=(3, 6), end=(3, 7), line='print(3*(x+y))\n'),
 TokenInfo(type=54 (OP), string='*', start=(3, 7), end=(3, 8), line
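
The token stream is easier to read when reduced to (token name, text) pairs. The following is a minimal sketch, assuming a one-line `source` string chosen only for illustration; `exact_type` distinguishes individual operators, which the generic `OP` type in the output above does not.

```python
import io
import tokenize

# Illustrative one-line input (not the full program from the slides).
source = "x = 1\n"

# exact_type resolves the generic OP type to a specific operator token.
tokens = [
    (tokenize.tok_name[tok.exact_type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for name, text in tokens:
    print(f"{name:<10} {text!r}")
```

Here `=` is reported as `EQUAL` rather than `OP`, and the stream ends with an `ENDMARKER` token whose text is empty.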

Step 2: Building the parse tree

Language grammar: https://docs.python.org/3/reference/grammar.html

<div class="alert alert-warning">
Strictly speaking, since version 3.9 no intermediate parse tree is built; the parser produces the AST directly.

Until version 3.10 the standard library included the `parser` module, which could be used to build a parse tree. It was deprecated and then removed in version 3.10. The next slide shows a parse tree built with version 3.9.
</div>

Step 2: Building the parse tree (Python 3.9)

In [18]:
import symbol
import token
import parser

# helper that renders the parse tree in a readable form (Python 3.9 and earlier only)
# https://realpython.com/cpython-source-code-guide/#lexing-and-parsing
def lex(expression):
    symbols = {v: k for k, v in symbol.__dict__.items() if isinstance(v, int)}
    tokens = {v: k for k, v in token.__dict__.items() if isinstance(v, int)}
    lexicon = {**symbols, **tokens}
    st = parser.expr(expression)
    st_list = parser.st2list(st)

    def replace(l: list):
        r = []
        for i in l:
            if isinstance(i, list):
                r.append(replace(i))
            else:
                if i in lexicon:
                    r.append(lexicon[i])
                else:
                    r.append(i)
        return r

    return replace(st_list)

Step 2: Building the parse tree (Python 3.9)

In [19]:
lex('3*(x+y)')

['eval_input',
 ['testlist',
  ['test',
   ['or_test',
    ['and_test',
     ['not_test',
      ['comparison',
       ['expr',
        ['xor_expr',
         ['and_expr',
          ['shift_expr',
           ['arith_expr',
            ['term',
             ['factor', ['power', ['atom_expr', ['atom', ['NUMBER', '3']]]]],
             ['STAR', '*'],
             ['factor',
              ['power',
               ['atom_expr',
                ['atom',
                 ['LPAR', '('],
                 ['testlist_comp',
                  ['namedexpr_test',
                   ['test',
                    ['or_test',
                     ['and_test',
                      ['not_test',
                       ['comparison',
                        ['expr',
                         ['xor_expr',
                          ['and_expr',
                           ['shift_expr',
                            ['arith_expr',
                             ['term',
                              ['factor',
     

Step 3: Abstract syntax tree (AST)

In [14]:
import ast
print(ast.dump(ast.parse(program), indent=4))

Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store())],
            value=Constant(value=1)),
        Assign(
            targets=[
                Name(id='y', ctx=Store())],
            value=Constant(value=2)),
        Expr(
            value=Call(
                func=Name(id='print', ctx=Load()),
                args=[
                    BinOp(
                        left=Constant(value=3),
                        op=Mult(),
                        right=BinOp(
                            left=Name(id='x', ctx=Load()),
                            op=Add(),
                            right=Name(id='y', ctx=Load())))],
                keywords=[]))],
    type_ignores=[])
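
The AST is an ordinary Python object: it can be traversed, and it can be passed to `compile` in place of source text. The sketch below assumes a slightly modified program that stores the value in a variable named `result` (a name introduced here for illustration) instead of printing it.

```python
import ast

# Variant of the program that keeps the value instead of printing it.
source = "x = 1\ny = 2\nresult = 3*(x+y)\n"

tree = ast.parse(source)

# ast.walk yields every node in the tree, in no guaranteed order.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types.count("BinOp"), "BinOp nodes")

# compile() accepts an AST as well as a source string.
code = compile(tree, "<ast>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["result"])
```

This is the same route the interpreter takes internally: source, then AST, then code object.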


Step 4: Compiling to bytecode

In [24]:
code = compile(program, '<string>', 'exec')

In [25]:
code

<code object <module> at 0x10fe15980, file "<string>", line 1>

In [26]:
code.co_code

b'\x97\x00d\x00Z\x00d\x01Z\x01\x02\x00e\x02d\x02e\x00e\x01z\x00\x00\x00z\x05\x00\x00\xa6\x01\x00\x00\xab\x01\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00d\x03S\x00'

In [27]:
len(code.co_code)

48

In [28]:
print(list(code.co_code))

[151, 0, 100, 0, 90, 0, 100, 1, 90, 1, 2, 0, 101, 2, 100, 2, 101, 0, 101, 1, 122, 0, 0, 0, 122, 5, 0, 0, 166, 1, 0, 0, 171, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 100, 3, 83, 0]


In [51]:
import dis
dis.opname[100], dis.opname[90], dis.opname[122], dis.opname[171], dis.opname[0]

('LOAD_CONST', 'STORE_NAME', 'BINARY_OP', 'CALL', 'CACHE')
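
The raw bytes can be decoded by hand. This is a rough sketch, assuming CPython 3.6 or later, where every instruction occupies two bytes (an opcode followed by a one-byte argument); opcode numbers and names differ between versions, so the output will not match the listing above on every interpreter.

```python
import dis

source = "x = 1\ny = 2\nprint(3*(x+y))\n"
code = compile(source, "<string>", "exec")
raw = code.co_code

# Split the byte string into (opcode name, argument) pairs, two bytes each.
pairs = [(dis.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]

for index, (name, arg) in enumerate(pairs):
    print(f"{index * 2:>4} {name:<16} {arg}")
```

On 3.11 the `CACHE` entries that follow `BINARY_OP` and `CALL` show up as separate pairs, which is why the offsets in `dis` output skip ahead after those instructions.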

Bytecode

In [52]:
dis.dis(program)

  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (1)
              4 STORE_NAME               0 (x)

  2           6 LOAD_CONST               1 (2)
              8 STORE_NAME               1 (y)

  3          10 PUSH_NULL
             12 LOAD_NAME                2 (print)
             14 LOAD_CONST               2 (3)
             16 LOAD_NAME                0 (x)
             18 LOAD_NAME                1 (y)
             20 BINARY_OP                0 (+)
             24 BINARY_OP                5 (*)
             28 PRECALL                  1
             32 CALL                     1
             42 POP_TOP
             44 LOAD_CONST               3 (None)
             46 RETURN_VALUE


How to read `dis` output: https://stackoverflow.com/a/47529318

What the instructions do: https://docs.python.org/3.11/library/dis.html#python-bytecode-instructions
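
To see why the instructions come in this order, it helps to run them on a toy stack machine. The sketch below is a simplified model, not CPython's actual evaluation loop: the instruction list is transcribed by hand from the 3.11 listing above, `CACHE` and `PRECALL` are left out because they do not change the stack in this model, and the names `run`, `INSTRUCTIONS`, `BINARY_OPS`, and `NULL` are invented for illustration.

```python
import operator

# Transcribed by hand from the Python 3.11 listing; PRECALL and CACHE omitted.
INSTRUCTIONS = [
    ("LOAD_CONST", 1),
    ("STORE_NAME", "x"),
    ("LOAD_CONST", 2),
    ("STORE_NAME", "y"),
    ("PUSH_NULL", None),
    ("LOAD_NAME", "print"),
    ("LOAD_CONST", 3),
    ("LOAD_NAME", "x"),
    ("LOAD_NAME", "y"),
    ("BINARY_OP", "+"),
    ("BINARY_OP", "*"),
    ("CALL", 1),
    ("POP_TOP", None),
    ("LOAD_CONST", None),
    ("RETURN_VALUE", None),
]

BINARY_OPS = {"+": operator.add, "*": operator.mul}
NULL = object()  # stand-in for the interpreter's internal NULL marker


def run(instructions, names):
    """Execute the simplified instructions on a value stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "STORE_NAME":
            names[arg] = stack.pop()
        elif opname == "LOAD_NAME":
            stack.append(names[arg])
        elif opname == "PUSH_NULL":
            stack.append(NULL)
        elif opname == "BINARY_OP":
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[arg](left, right))
        elif opname == "CALL":
            # Pop the arguments (restoring their order), then the callable.
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.pop()  # discard the NULL pushed by PUSH_NULL
            stack.append(func(*args))
        elif opname == "POP_TOP":
            stack.pop()
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")


namespace = {"print": print}
result = run(INSTRUCTIONS, namespace)
```

Operands are pushed first and each `BINARY_OP` pops two values and pushes one, so `3*(x+y)` needs no temporaries: the inner sum is simply the value on top of the stack when the multiplication runs.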

Bytecode

In [53]:
program = '''\
def foo(x, y):
    return 3*(x + y)
foo(1, 2)
'''
dis.dis(program)

  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (<code object foo at 0x10f5a0440, file "<dis>", line 1>)
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (foo)

  3           8 PUSH_NULL
             10 LOAD_NAME                0 (foo)
             12 LOAD_CONST               1 (1)
             14 LOAD_CONST               2 (2)
             16 PRECALL                  2
             20 CALL                     2
             30 POP_TOP
             32 LOAD_CONST               3 (None)
             34 RETURN_VALUE

Disassembly of <code object foo at 0x10f5a0440, file "<dis>", line 1>:
  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (3)
              4 LOAD_FAST                0 (x)
              6 LOAD_FAST                1 (y)
              8 BINARY_OP                0 (+)
             12 BINARY_OP                5 (*)
             16 RETURN_VALUE
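
The nested code object that `LOAD_CONST` pushes for `foo` is stored among the module's constants and can be inspected directly. A minimal sketch, assuming the same two-line function; the variable names `module_code` and `foo_code` are chosen here for illustration.

```python
import types

source = "def foo(x, y):\n    return 3*(x + y)\nfoo(1, 2)\n"
module_code = compile(source, "<string>", "exec")

# The function body is compiled separately and kept in the module's co_consts.
foo_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

print(foo_code.co_name)      # name of the function
print(foo_code.co_argcount)  # number of positional parameters
print(foo_code.co_varnames)  # local variable names, indexed by LOAD_FAST
print(module_code.co_names)  # global names, indexed by LOAD_NAME / STORE_NAME
```

The integer arguments in the disassembly are indices into these tuples: `LOAD_FAST 0` refers to `co_varnames[0]`, which is `x`.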


Bytecode

In [54]:
list(dis.get_instructions(program))

[Instruction(opname='RESUME', opcode=151, arg=0, argval=0, argrepr='', offset=0, starts_line=0, is_jump_target=False, positions=Positions(lineno=0, end_lineno=1, col_offset=0, end_col_offset=0)),
 Instruction(opname='LOAD_CONST', opcode=100, arg=0, argval=<code object foo at 0x10f5a0440, file "<disassembly>", line 1>, argrepr='<code object foo at 0x10f5a0440, file "<disassembly>", line 1>', offset=2, starts_line=1, is_jump_target=False, positions=Positions(lineno=1, end_lineno=2, col_offset=0, end_col_offset=20)),
 Instruction(opname='MAKE_FUNCTION', opcode=132, arg=0, argval=0, argrepr='', offset=4, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=2, col_offset=0, end_col_offset=20)),
 Instruction(opname='STORE_NAME', opcode=90, arg=0, argval='foo', argrepr='foo', offset=6, starts_line=None, is_jump_target=False, positions=Positions(lineno=1, end_lineno=2, col_offset=0, end_col_offset=20)),
 Instruction(opname='PUSH_NULL', opcode=2, arg=None, argval=Non
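
The `Instruction` tuples are long, so in practice one usually selects a few fields. The following sketch picks `offset`, `opname`, and `argrepr`; the exact instruction set depends on the interpreter version.

```python
import dis

source = "def foo(x, y):\n    return 3*(x + y)\nfoo(1, 2)\n"

# Keep only the fields needed for a compact listing.
rows = [
    (ins.offset, ins.opname, ins.argrepr)
    for ins in dis.get_instructions(source)
]

for offset, opname, argrepr in rows:
    print(f"{offset:>4} {opname:<16} {argrepr}")
```

Note that `get_instructions` only iterates over the top-level code object; the body of `foo` has to be disassembled separately.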

References

- https://leanpub.com/insidethepythonvirtualmachine/read
- https://realpython.com/cpython-source-code-guide/