# Python - Source Code to Execution

---

## Introduction

When we write Python in a notebook, it is tempting to imagine that the interpreter runs our text directly. In reality, CPython performs a series of internal transformations before anything is executed.

1. First, CPython parses the source into an **Abstract Syntax Tree (AST)**: a structured representation that captures the grammar of the program.
2. Next, CPython’s built-in **compiler** walks this tree and produces a **code object**. A code object is an in-memory container that holds the program’s bytecode and related metadata (constants, variable names, line numbers, and so on).
3. Finally, the **Python Virtual Machine** reads the bytecode instructions from the code object and executes them step by step.

The object returned by the built-in `compile(...)` is this in-memory code object. It is *not* a `.pyc` file; a `.pyc` file is a serialized cache of a code object that CPython may write to disk in `__pycache__` so that it does not have to recompile a module the next time it is imported.
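
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not how CPython is implemented internally: the source string, the `<example>` filename, and the `namespace` dictionary are made up for the example.

```python
import ast

source = "result = 2 + 3"

# Step 1: parse the source text into an AST.
tree = ast.parse(source)

# Step 2: compile the AST into a code object.
code = compile(tree, filename="<example>", mode="exec")

# Step 3: let the virtual machine execute the code object.
# A separate dict is used so the example does not touch our own globals.
namespace = {}
exec(code, namespace)

print(type(tree).__name__)
print(type(code).__name__)
print(namespace["result"])
```

Passing the source string straight to `compile(...)` would also work; going through `ast.parse` first just makes the intermediate tree visible.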

In [18]:
import ast
import dis
import inspect
import marshal

## Parse into Abstract Syntax Tree

An AST (Abstract Syntax Tree) is a tree-shaped data structure that represents the syntactic structure of source code in a form the compiler can work with directly, rather than as raw text. Each node in the tree corresponds to a language construct (like a variable, operator, function call, or control statement), and parent–child relationships encode how those constructs are nested and combined.


The cell below prints the AST for the `add` function: a tree-shaped model of the code that Python builds before compiling.

- The AST is a hierarchy of nodes like `Module`, `FunctionDef`, `arguments`, `Return`, `BinOp`, `Name`, and `Add`, each describing one syntactic construct in your source.
- For `def add(a, b): return a + b`, the tree encodes: a module containing a function named `add`, which takes two arguments `a` and `b`, and returns the result of a binary `Add` operation on `a` and `b`.

This tree is what the compiler walks to generate bytecode: as it visits nodes such as `BinOp` and `Return`, it emits the corresponding bytecode instructions later in the pipeline.


In [20]:
def add(a, b):
    return a + b

source = inspect.getsource(add)
tree = ast.parse(source)
print(ast.dump(tree, indent=4))

Module(
    body=[
        FunctionDef(
            name='add',
            args=arguments(
                posonlyargs=[],
                args=[
                    arg(arg='a'),
                    arg(arg='b')],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                Return(
                    value=BinOp(
                        left=Name(id='a', ctx=Load()),
                        op=Add(),
                        right=Name(id='b', ctx=Load())))],
            decorator_list=[],
            type_params=[])],
    type_ignores=[])
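
For a quicker overview than the full dump, the node types in the tree can be collected with `ast.walk`. This is a small sketch; the source is passed as a literal string here so the example does not depend on `inspect.getsource`.

```python
import ast

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

# ast.walk yields every node in the tree, in no guaranteed order,
# so the distinct type names are gathered into a set and then sorted.
node_types = sorted({type(node).__name__ for node in ast.walk(tree)})
print(node_types)
```

The resulting list should contain the node names that appear in the dump above, including `Module`, `FunctionDef`, `Return`, `BinOp`, `Name`, and `Add`.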


## Compile

In [27]:
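# Compile the module-level AST into a code object.
code_obj = compile(tree, filename="<ast>", mode="exec")
# Inspect its constants, the names its bytecode references, and its local variable names.
code_obj.co_consts, code_obj.co_names, code_obj.co_varnames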

((<code object add at 0x72fa3fc4f840, file "<ast>", line 1>, None),
 ('add',),
 ())
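
The output above shows that the module-level code object has no local variables of its own and carries the body of `add` as a nested code object inside `co_consts`. The sketch below pulls that nested object out and inspects it. It assumes the function's code object is stored among the module's constants, which matches the output above but is a CPython implementation detail.

```python
import types

source = "def add(a, b):\n    return a + b\n"
module_code = compile(source, filename="<example>", mode="exec")

# Find the nested code object for the function body among the constants.
func_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

print(module_code.co_names)    # names referenced by the module-level bytecode
print(func_code.co_name)       # name of the nested code object
print(func_code.co_varnames)   # local variable names, arguments first
print(func_code.co_argcount)   # number of positional arguments
```

The arguments `a` and `b` show up in the function's `co_varnames`, not the module's, because they are local to the function body.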

## Disassemble to bytecode for inspection

The `dis.dis(add)` call below prints the compiled bytecode for `add(a, b)`.

Each line is a single instruction for the Python virtual machine, with an offset, an opcode name, and sometimes an argument; the leftmost column is the source line number. Reading from top to bottom, after the bookkeeping `RESUME` instruction, the VM pushes the arguments `a` and `b` onto its evaluation stack, performs the addition, and returns the result.

In [None]:
dis.dis(add)

  4           0 RESUME                   0

  5           2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 BINARY_OP                0 (+)
             10 RETURN_VALUE
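
The same instructions can be read programmatically with `dis.get_instructions`, which yields one `Instruction` object per bytecode instruction. This is a minimal sketch; opcode names are a CPython implementation detail and differ between versions, so the exact names printed may not match the listing above.

```python
import dis

def add(a, b):
    return a + b

# Print the offset, opcode name, and human-readable argument of each instruction.
for instr in dis.get_instructions(add):
    print(instr.offset, instr.opname, instr.argrepr)

# Keep only the opcode names for a compact summary of the function.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```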


In [26]:
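# Serialize a code object to bytes, as CPython does for the body of a .pyc file.
raw_code = marshal.dumps(compile("a = 1", "<string>", "exec"))
# Deserialize the bytes back into a code object and run it.
code_obj = marshal.loads(raw_code)
exec(code_obj)
print(a)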

1
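
To connect this round trip to real `.pyc` files, the sketch below compiles a throwaway module with `py_compile`, reads the cache file back, and unmarshals the code object stored after the header. The module name `demo_module` and its contents are made up for the example, and the 16-byte header size is an assumption that holds for CPython 3.7 and later.

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "demo_module.py"
    src.write_text("answer = 6 * 7\n")

    # py_compile.compile writes the cache file and returns its path.
    pyc_path = py_compile.compile(str(src), doraise=True)
    data = pathlib.Path(pyc_path).read_bytes()

# Assumption: the header is 16 bytes (magic number, flags, and two more
# fields used for invalidation), as in CPython 3.7 and later.
HEADER_SIZE = 16
assert data[:4] == importlib.util.MAGIC_NUMBER

# Everything after the header is the marshalled code object.
code = marshal.loads(data[HEADER_SIZE:])
namespace = {}
exec(code, namespace)
print(namespace["answer"])
```

The magic number identifies the bytecode format of the interpreter that wrote the file, which is why a `.pyc` produced by one Python version is generally not loadable by another.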
