# Python - Source Code to Execution

---

This notebook is a hands-on tour of how CPython turns your .py file into running code, step by step. Starting from a small function definition, it shows how the interpreter:

1. Reads the source text and breaks it into tokens.
2. Parses those tokens into an Abstract Syntax Tree (AST) that represents the structure of your program.
3. Compiles the AST into a code object containing bytecode and metadata.
4. Disassembles that bytecode so you can see the exact instructions the Python virtual machine will execute.
5. Finally executes the code via exec and a normal function call, so you can link the low-level view (bytecode) back to the high-level behavior you expect.

By the end, you have a concrete picture of the core pipeline in CPython: source text -> tokens -> AST -> symbol table -> code object (bytecode) -> execution.




In [1]:
import ast
import dis
import inspect
import logging
import symtable
import tokenize
from io import BytesIO

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger(__name__)

## Source

Everything starts as plain text: the .py file you write or the string you pass to exec or eval. This is just a sequence of characters with no structure attached. At this point Python does not know what keywords are used, which are variables, or where statements begin and end; it only has bytes or Unicode code points. The goal of the first few stages of the pipeline is to progressively add structure to this raw text so that it can eventually be executed as a program.

In [2]:
source = """
def add(a, b):
    return a + b

x = add(2, 3)
"""

log.info("=== Source ===")
log.info(source)

=== Source ===

def add(a, b):
    return a + b

x = add(2, 3)



## Tokenisation

The tokenisation step takes the raw characters of your source code and groups them into tokens, the smallest meaningful units the parser understands. A token has a type (such as NAME, NUMBER, STRING, OP, or NEWLINE), the exact text, and position information (start and end as line and column). For example, def add(a, b): is split into tokens such as NAME 'def', NAME 'add', OP '(', NAME 'a', OP ',', NAME 'b', OP ')', OP ':', followed by a NEWLINE token and an INDENT token for the next line. This step also turns changes in indentation into INDENT and DEDENT tokens, normalizes line endings, and sets comments aside: CPython's internal tokenizer drops them, while the tokenize module used below reports them as COMMENT tokens. The next stage, the parser, therefore works on a clean, structured stream of tokens instead of raw characters.

In [3]:
log.info("=== Tokens ===")

for tok in tokenize.tokenize(BytesIO(source.encode("utf-8")).readline):
    # Skip the ENCODING token to keep output shorter
    if tok.type == tokenize.ENCODING:
        continue
    log.info("%r", tok)

=== Tokens ===
TokenInfo(type=65 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
TokenInfo(type=1 (NAME), string='def', start=(2, 0), end=(2, 3), line='def add(a, b):\n')
TokenInfo(type=1 (NAME), string='add', start=(2, 4), end=(2, 7), line='def add(a, b):\n')
TokenInfo(type=55 (OP), string='(', start=(2, 7), end=(2, 8), line='def add(a, b):\n')
TokenInfo(type=1 (NAME), string='a', start=(2, 8), end=(2, 9), line='def add(a, b):\n')
TokenInfo(type=55 (OP), string=',', start=(2, 9), end=(2, 10), line='def add(a, b):\n')
TokenInfo(type=1 (NAME), string='b', start=(2, 11), end=(2, 12), line='def add(a, b):\n')
TokenInfo(type=55 (OP), string=')', start=(2, 12), end=(2, 13), line='def add(a, b):\n')
TokenInfo(type=55 (OP), string=':', start=(2, 13), end=(2, 14), line='def add(a, b):\n')
TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 14), end=(2, 15), line='def add(a, b):\n')
TokenInfo(type=5 (INDENT), string='    ', start=(3, 0), end=(3, 4), line='    return a + b\n')
TokenInfo(t
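
To make the INDENT, DEDENT, and COMMENT handling easier to see, here is a minimal sketch that prints only the token type names. The `sample` string is a made-up snippet, not part of the notebook's `source`.

```python
import tokenize
from io import BytesIO

# Hypothetical sample: one trailing comment plus one indented block.
sample = "if x:  # check x\n    y = 1\n"

# Collect just the type name of every token the tokenize module yields.
token_names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.tokenize(BytesIO(sample.encode("utf-8")).readline)
]

print(token_names)
```

The list should contain a COMMENT entry (the tokenize module keeps comments), exactly one INDENT matched by one DEDENT, and a final ENDMARKER.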

## Abstract Syntax Tree (AST)

Once Python has a token stream, the parser applies the Python grammar to decide whether the sequence of tokens forms valid Python code and to build an Abstract Syntax Tree (AST). The AST is a tree of node objects such as Module, FunctionDef, Assign, Return, BinOp, Name, and Constant, where each node represents a syntactic construct in your program and has child nodes that capture its components. For the example def add(a, b): return a + b, the AST will contain a FunctionDef node with a name, an arguments node for a and b, and a body containing a Return node whose value is a BinOp representing a + b. This step gives Python a structural representation of your code, independent of formatting, ready for semantic analysis and compilation.

In [4]:
tree = ast.parse(source, filename="<input>", mode="exec")

log.info("=== AST ===")
log.info(ast.dump(tree, indent=2))

=== AST ===
Module(
  body=[
    FunctionDef(
      name='add',
      args=arguments(
        posonlyargs=[],
        args=[
          arg(arg='a'),
          arg(arg='b')],
        kwonlyargs=[],
        kw_defaults=[],
        defaults=[]),
      body=[
        Return(
          value=BinOp(
            left=Name(id='a', ctx=Load()),
            op=Add(),
            right=Name(id='b', ctx=Load())))],
      decorator_list=[],
      type_params=[]),
    Assign(
      targets=[
        Name(id='x', ctx=Store())],
      value=Call(
        func=Name(id='add', ctx=Load()),
        args=[
          Constant(value=2),
          Constant(value=3)],
        keywords=[]))],
  type_ignores=[])
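
The claim that the AST is independent of formatting can be checked directly. The sketch below parses two differently formatted versions of the same function (both strings are illustrative, not taken from the notebook) and compares their dumps. By default, `ast.dump` omits line and column attributes, so only the structure is compared.

```python
import ast

# Two hypothetical spellings of the same function.
compact = "def add(a,b): return a+b"
spaced = "def add(a, b):\n\n    return (a   +   b)  # same logic\n"

# Whitespace, redundant parentheses, and comments leave no trace in the tree.
same_tree = ast.dump(ast.parse(compact)) == ast.dump(ast.parse(spaced))

# ast.walk visits every node, which gives a quick inventory of node types.
node_types = [type(node).__name__ for node in ast.walk(ast.parse(compact))]

print("identical structure:", same_tree)
print("node types:", node_types)
```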


## Symbol table and scope analysis

After building the AST, CPython performs a symbol table and scope analysis pass that figures out how each name is bound and where it lives. It walks the tree and, for each block (module, function, class, comprehension), it builds a table describing which names are local, which are global, which are nonlocal, and which are part of closures. This analysis decides, for example, that a and b are local variables inside add, while add and x are names in the module’s global scope. The result of this step tells the compiler whether to generate bytecode instructions that access fast locals (LOAD_FAST), globals (LOAD_GLOBAL), or closure variables (LOAD_DEREF), and it catches some errors around global and nonlocal usage before any bytecode is emitted.



In [5]:
st = symtable.symtable(source, "<input>", "exec")

log.info("=== Symbol table ===")
log.info("type: %s", st.get_type())
log.info("names: %s", st.get_identifiers())

for child in st.get_children():
    log.info("")
    log.info("Child block: %s  type: %s", child.get_name(), child.get_type())
    log.info("  locals: %s", child.get_identifiers())

=== Symbol table ===
type: module
names: dict_keys(['add', 'x'])

Child block: add  type: function
  locals: dict_keys(['a', 'b'])
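
The example above has no closures, so here is a small sketch of how the symbol table classifies a name captured from an enclosing function. The `make_adder` source is a hypothetical snippet introduced only for this illustration.

```python
import symtable

# Hypothetical source with a closure: adder captures n from make_adder.
closure_src = """
def make_adder(n):
    def adder(x):
        return x + n
    return adder
"""

module_table = symtable.symtable(closure_src, "<closure>", "exec")
outer = module_table.get_children()[0]  # block for make_adder
inner = outer.get_children()[0]         # block for adder

# In make_adder, n is a parameter; in adder, the same name is a free variable.
print("outer n is parameter:", outer.lookup("n").is_parameter())
print("inner n is free     :", inner.lookup("n").is_free())
print("inner x is local    :", inner.lookup("x").is_local())
```

A free variable like `n` inside `adder` is the kind of name the compiler accesses through closure cells rather than fast locals.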


## Compilation to bytecode

With the AST and symbol information in place, the CPython compiler walks the AST and emits bytecode instructions, packaging them into immutable code objects. A code object contains the bytecode itself (a sequence of opcodes and their arguments), plus metadata such as the tuple of constants (co_consts), the names it references (co_names), the local variable names (co_varnames), flags (co_flags), and a mapping from bytecode offsets to source line numbers. For a module, this produces a module-level code object; for each function definition, the compiler also creates a separate function code object and stores it in the module’s constants so it can be turned into a function object at runtime. This step is where high-level constructs like return a + b are translated into low-level VM operations: “load local a”, “load local b”, “add the top two stack values”, “return the top of the stack”.

In [6]:
code = compile(source, "<input>", "exec")

log.info("=== Code object (module) ===")
log.info("co_name     : %r", code.co_name)
log.info("co_varnames : %r", code.co_varnames)
log.info("co_names    : %r", code.co_names)
log.info("co_consts   : %r", code.co_consts)
log.info("co_flags    : %r", code.co_flags)

=== Code object (module) ===
co_name     : '<module>'
co_varnames : ()
co_names    : ('add', 'x')
co_consts   : (<code object add at 0x7a79a14139e0, file "<input>", line 2>, 2, 3, None)
co_flags    : 0
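
As a rough illustration of how the function's code object sits inside the module's constants, the sketch below recompiles the same source under a new name (`module_code`, chosen here to avoid clobbering `code`) and inspects it. Exact opcode names vary between CPython versions, so it only looks for two that have been stable for module-level function definitions.

```python
import dis
import types

module_code = compile(
    "def add(a, b):\n    return a + b\n\nx = add(2, 3)\n", "<input>", "exec"
)

# Nested code objects are stored as constants of the enclosing code object.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]

# dis.get_instructions yields one Instruction per bytecode instruction.
opnames = [ins.opname for ins in dis.get_instructions(module_code)]

print("nested code objects:", [c.co_name for c in nested])
print("has MAKE_FUNCTION  :", "MAKE_FUNCTION" in opnames)
print("has STORE_NAME     :", "STORE_NAME" in opnames)
```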


## Function and module objects

When Python actually executes a module’s code object, it does so in a new global namespace (a dictionary) and processes the bytecode sequentially. Top-level statements create variables, import modules, and define functions and classes. A def add(a, b): ... at the top level is compiled into bytecode that loads the function’s code object from the constants table, wraps it into a function object (attaching default arguments, closure cells if any, and a reference to the module’s globals), and then stores that function object under the name add in the module’s global dictionary. At this point, calling add(2, 3) is just a normal operation: the runtime looks up add in the globals, sees a function object, and prepares to execute its associated code object in a new frame.

In [7]:
globals_ns = {}
exec(code, globals_ns)  # noqa: S102

add = globals_ns["add"]
func_code = add.__code__

log.info("=== Function object ===")
log.info("add: %r", add)
log.info("add.__code__.co_name     : %r", func_code.co_name)
log.info("add.__code__.co_varnames : %r", func_code.co_varnames)
log.info("add.__code__.co_consts   : %r", func_code.co_consts)

log.info("")
log.info("=== Disassembly: function add ===")
dis.dis(func_code)

=== Function object ===
add: <function add at 0x7a79a1404b80>
add.__code__.co_name     : 'add'
add.__code__.co_varnames : ('a', 'b')
add.__code__.co_consts   : (None,)

=== Disassembly: function add ===


  2           0 RESUME                   0

  3           2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 BINARY_OP                0 (+)
             10 RETURN_VALUE
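
The wrapping step that the def statement performs can be approximated by hand with `types.FunctionType`. This is only a sketch of the idea, not what the interpreter literally runs. The names `manual_add` and `namespace` are introduced here for illustration.

```python
import types

module_code = compile("def add(a, b):\n    return a + b\n", "<input>", "exec")

# Pull the function's code object out of the module's constants.
add_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

# Pair the code object with a globals dict to get a callable function object.
namespace = {}
manual_add = types.FunctionType(add_code, namespace, "add")

result = manual_add(2, 3)
print(manual_add.__name__, result)
print("globals attached:", manual_add.__globals__ is namespace)
```

Note that nothing was executed with exec here: the function object was built directly from the code object and a globals dictionary.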


## Frames and the evaluation loop

When you call a function like add(2, 3), CPython creates a frame object that represents one active execution context. A frame holds the code object being executed, the instruction pointer (where in the bytecode sequence it currently is), the evaluation stack, local variables, references to globals and builtins, and a link to the caller’s frame. The heart of the interpreter, the bytecode evaluation loop, repeatedly fetches the next opcode from the frame’s bytecode and executes its C implementation: it manipulates the value stack (pushing and popping Python objects), updates locals and globals, changes control flow (jumps, calls, returns), and handles exceptions. For add(a, b), the function frame will run through bytecode that loads the arguments from fast locals, performs the addition using a binary operation opcode, pushes the result onto the stack, and finally executes RETURN_VALUE to pass that result back to the caller. This loop continues across frames and calls until there is no more code to execute, at which point your program finishes.
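
To make the fetch-and-execute cycle concrete, here is a deliberately tiny toy loop in pure Python. It is not CPython's implementation (the real loop is written in C and works on encoded bytecode); `run_toy_frame`, the tuple-based instruction format, and the simplified opcode names are all assumptions made for this sketch.

```python
def run_toy_frame(instructions, local_vars):
    """Run a made-up instruction list against a value stack."""
    stack = []
    pc = 0  # index of the next instruction, like a frame's instruction pointer
    while True:
        opname, arg = instructions[pc]
        pc += 1
        if opname == "LOAD_FAST":
            # Push the value of a local variable.
            stack.append(local_vars[arg])
        elif opname == "BINARY_ADD":
            # Pop two operands, push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # Hand the top of the stack back to the caller.
            return stack.pop()
        else:
            raise ValueError(f"unsupported toy opcode: {opname}")


# Toy encoding of `return a + b`.
toy_add = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

toy_result = run_toy_frame(toy_add, {"a": 2, "b": 3})
print(toy_result)
```

The real interpreter follows the same basic pattern but also handles jumps, calls, exceptions, and many more opcodes.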

In [8]:
def log_frame(prefix, frame):
    code_obj = frame.f_code
    log.info(
        "%s frame: func=%r, lineno=%d, locals=%r",
        prefix,
        code_obj.co_name,
        frame.f_lineno,
        sorted(frame.f_locals.keys()),
    )


def traced_add(a, b):
    # This frame is for traced_add
    frame_here = inspect.currentframe()
    log_frame("traced_add (before call)", frame_here)

    # Call the real add; this creates a new frame that runs add's bytecode
    result = add(a, b)

    # After add returns, only traced_add's frame is active again
    frame_here = inspect.currentframe()
    log_frame("traced_add (after call)", frame_here)

    return result


log.info("")
log.info("=== Calling traced_add(2, 3) ===")
value = traced_add(2, 3)
log.info("Result from traced_add: %r", value)


=== Calling traced_add(2, 3) ===
traced_add (before call) frame: func='traced_add', lineno=15, locals=['a', 'b', 'frame_here']
traced_add (after call) frame: func='traced_add', lineno=22, locals=['a', 'b', 'frame_here', 'result']
Result from traced_add: 5
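
The link from each frame to its caller is exposed as the `f_back` attribute. The following sketch walks that chain for a few steps; the functions `callee` and `caller` are hypothetical helpers, and the frames beyond them depend on where the code is run.

```python
import inspect


def callee():
    # Walk up to three frames, starting from the current one.
    names = []
    frame = inspect.currentframe()
    while frame is not None and len(names) < 3:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # step to the caller's frame
    return names


def caller():
    return callee()


chain = caller()
print(chain)
```

The first two entries are the callee and its caller; the third is whatever invoked `caller`, typically `<module>` in a notebook cell.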
