## How is Python Implemented? :Lab


### The stack

(for how the C stack works, see http://duartes.org/gustavo/blog/post/journey-to-the-stack/ and http://duartes.org/gustavo/blog/post/epilogues-canaries-buffer-overflows/ )

![](http://aosabook.org/en/500L/interpreter-images/interpreter-callstack.png)

There are three stacks alive while a Python program runs. Since we run on a virtual machine, the call stack and stack frames belong to the virtual machine rather than to the real machine your code runs on. This is the critical difference between `stupidlang` and what we are doing now.

- The first is the **call stack**. This is the stack of environments you are familiar with. Often it is not explicitly represented as a stack but as a recursive lookup of environments or, as in the C case, as offsets into memory.
- The second is the **data stack** or **value stack**. There is one of these per environment frame, and it is used to run code in the context of that environment. This is where data-manipulating opcodes like `BINARY_ADD` run, in conjunction with namespace-related opcodes such as `STORE_FAST` and `LOAD_FAST`, seen above.
- The third is the **block stack**, which handles compound statements: statements that contain other statements.
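
To make the call stack concrete, here is a minimal sketch that walks the interpreter's own frames through their `f_back` links. It assumes CPython, since `sys._getframe` is an implementation detail, and the helper names `call_chain`, `inner`, and `outer` are made up for illustration.

```python
import sys

def call_chain(limit=3):
    """Collect function names from the caller's frame outward."""
    frame = sys._getframe(1)  # frame of whoever called call_chain
    names = []
    while frame is not None and len(names) < limit:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # step to the previous frame on the call stack
    return names

def inner():
    return call_chain(limit=2)

def outer():
    return inner()

chain = outer()
print(chain)  # ['inner', 'outer']
```

Each frame points to the frame that called it, which is why the call stack can be represented as a linked chain rather than as an explicit list.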


In [1]:
def print_code(c):
    for x in dir(c):
        if x.startswith('co'):
            print(x, '=', getattr(c, x))

In [2]:
import dis

The columns of the `dis.dis` output are:

1. line number
2. index into the bytecode string
3. instruction
4. argument to the instruction
5. what the argument means


In [3]:
def f(x):
    a = 1
    return a + x

In [4]:
dis.dis(f)

  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               1 (a)

  3           6 LOAD_FAST                1 (a)
              9 LOAD_FAST                0 (x)
             12 BINARY_ADD
             13 RETURN_VALUE
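
The same columns can also be read programmatically. The sketch below uses `dis.get_instructions`, which yields one `Instruction` per row. The exact opcodes and offsets depend on the Python version, so the output will likely differ from the Python 3.5 listing above.

```python
import dis

def f(x):
    a = 1
    return a + x

# Each Instruction carries the information shown in one row of dis.dis output:
# byte offset, opcode name, raw argument, and a readable form of the argument.
rows = [(ins.offset, ins.opname, ins.arg, ins.argrepr)
        for ins in dis.get_instructions(f)]

for offset, opname, arg, argrepr in rows:
    print(offset, opname, arg, argrepr)
```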


In [5]:
dis.show_code(f)

Name:              f
Filename:          <ipython-input-3-b301c67cd96f>
Argument count:    1
Kw-only arguments: 0
Number of locals:  2
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 1
Variable names:
   0: x
   1: a


In [6]:

#from https://bitbucket.org/yaniv_aknin/pynards/src/c4b61c7a1798766affb49bfba86e485012af6d16/common/blog.py?at=default&fileviewer=file-view-default
import dis
import types

def get_code_object(obj, compilation_mode="exec"):
    if isinstance(obj, types.CodeType):
        return obj
    elif isinstance(obj, types.FrameType):
        return obj.f_code
    elif isinstance(obj, types.FunctionType):
        return obj.__code__
    elif isinstance(obj, str):
        try:
            return compile(obj, "<string>", compilation_mode)
        except SyntaxError as error:
            raise ValueError("syntax error in passed string") from error
    else:
        raise TypeError("get_code_object() can not handle '%s' objects" %
                        (type(obj).__name__,))

def diss(obj, mode="exec", recurse=False):
    _visit(obj, dis.dis, mode, recurse)

def ssc(obj, mode="exec", recurse=False):
    _visit(obj, dis.show_code, mode, recurse)

def _visit(obj, visitor, mode="exec", recurse=False):
    obj = get_code_object(obj, mode)
    visitor(obj)
    if recurse:
        for constant in obj.co_consts:
            if type(constant) is type(obj):
                print()
                print('recursing into %r:' % (constant,))
                _visit(constant, visitor, mode, recurse)


In [7]:
diss(f)

  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               1 (a)

  3           6 LOAD_FAST                1 (a)
              9 LOAD_FAST                0 (x)
             12 BINARY_ADD
             13 RETURN_VALUE


In [8]:
ssc(f)

Name:              f
Filename:          <ipython-input-3-b301c67cd96f>
Argument count:    1
Kw-only arguments: 0
Number of locals:  2
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 1
Variable names:
   0: x
   1: a


Notice that we have not said anything about frames yet. That's because, so far, we have only defined functions. Let's see what happens when we execute them.

### Binding lookup and Execution

Frame creation occurs when a code object needs to be evaluated:

- when a function is called
- when a module is imported (top-level code is executed)
- when a class is defined
- for every command in the REPL
- when `eval` or `exec` is used
- when the `-c` switch is used
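
As a sketch of the `exec` case, the snippet below compiles a one-line program, runs it, and checks that the frame it ran in points back at that code object. It assumes CPython (`sys._getframe` is an implementation detail), and the filename `<demo>` is arbitrary.

```python
import sys

source = "frame = sys._getframe()"   # the executed code grabs its own frame
code = compile(source, "<demo>", "exec")

namespace = {"sys": sys}
exec(code, namespace)                # a new frame is created to run `code`

frame = namespace["frame"]
print(frame.f_code is code)          # the frame evaluates exactly this code object
print(frame.f_globals is namespace)  # and uses the dict we passed as its globals
print(frame.f_code.co_name)          # <module>
```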

Let's go back to our frame structure in CPython:

```c

typedef struct _frame {
   PyObject_VAR_HEAD
   struct _frame *f_back;   /* previous frame, or NULL */
   PyCodeObject *f_code;    /* code segment */
   PyObject *f_builtins;    /* builtin symbol table */
   PyObject *f_globals;     /* global symbol table */
   PyObject *f_locals;      /* local symbol table */
   PyObject **f_valuestack; /* points after the last local */
   PyObject **f_stacktop;   /* current top of valuestack */
   PyObject *f_trace;       /* trace function */
 
   /* used for swapping generator exceptions */
   PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;
 
   PyThreadState *f_tstate; /* call stack's thread state */
   int f_lasti;             /* last instruction if called */
   int f_lineno;            /* current line # (if tracing) */
   int f_iblock;            /* index in f_blockstack */
 
   /* for try and loop blocks */
   PyTryBlock f_blockstack[CO_MAXBLOCKS];
 
   /* dynamically: locals, free vars, cells and valuestack */
   PyObject *f_localsplus[1]; /* dynamic portion */
} PyFrameObject;
```

- `f_code` points to precisely one code object per frame. So a call stack of frames corresponds to a call stack of code objects.
- When Python code is evaluated, it is evaluated in three namespaces corresponding to three symbol tables: `f_builtins`, `f_globals`, and `f_locals`. A name is first resolved in the local scope, then in the global scope, and then in the builtin scope. For nested scopes, as in closures, the local scopes of the enclosing functions are searched after the local scope and before the global and builtin scopes. This rule is known as **LEGB**: Local, Enclosing, Global, Builtin.
- A frame is a variable-sized object, as seen in `f_localsplus`.
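
A short sketch of the LEGB order, using the same name at several levels. The function names here are invented for the example.

```python
label = "global"

def outer():
    label = "enclosing"

    def reads_local():
        label = "local"      # found in the local scope first
        return label

    def reads_enclosing():
        return label         # not local, so the enclosing function's scope is used

    return reads_local(), reads_enclosing()

def reads_global():
    return label             # not local, no enclosing function: global scope

def reads_builtin():
    return len("abc")        # `len` is found only in the builtin scope

results = outer() + (reads_global(), reads_builtin())
print(results)  # ('local', 'enclosing', 'global', 3)
```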

## q1.

### `pdb`, the Python debugger

Earlier we were inspecting frames manually. Now let's use the Python debugger to do this.

In [9]:
import pdb

In [10]:
def play(a,b):
    c = 1
    d = 5
    import pdb;pdb.set_trace()

    e = a+b
    def g():
        pdb.set_trace()
        return c +d +e
    return g

Run `play` in the debugger by calling `play(5,6)`, assigning the result to a variable, and calling the function that the variable refers to. Inspect the frames using `w` and `bt`, and list the source with `l`. You can run arbitrary Python code at the pdb prompt: `p var` prints a variable and `quit` quits.

In [11]:
#your code here
bob = play(5,6)


> <ipython-input-10-c42340cca819>(6)play()
-> e = a+b
(Pdb) bob = play(5,6)
(Pdb) bob
<function play.<locals>.g at 0x1040d7d90>
(Pdb) w
  /Users/christianjunge/anaconda/envs/py35/lib/python3.5/runpy.py(170)_run_module_as_main()
-> "__main__", mod_spec)
  /Users/christianjunge/anaconda/envs/py35/lib/python3.5/runpy.py(85)_run_code()
-> exec(code, run_globals)
  /Users/christianjunge/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py(3)<module>()
-> app.launch_new_instance()
  /Users/christianjunge/anaconda/envs/py35/lib/python3.5/site-packages/traitlets/config/application.py(592)launch_instance()
-> app.start()
  /Users/christianjunge/anaconda/envs/py35/lib/python3.5/site-packages/ipykernel/kernelapp.py(403)start()
-> ioloop.IOLoop.instance().start()
  /Users/christianjunge/anaconda/envs/py35/lib/python3.5/site-packages/zmq/eventloop/ioloop.py(151)start()
-> super(ZMQIOLoop, self).start()
  /Users/christianjunge/anaconda/envs/py35/lib/python3.5/site-packages/tornado/io

BdbQuit: 

In [12]:
#your code here
bob

NameError: name 'bob' is not defined
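
In the session above, quitting the debugger raised `BdbQuit` before `play` returned, so the assignment to `bob` never completed. As a non-interactive sketch, the version below drops the `pdb.set_trace()` calls and inspects what the returned closure captured from `play`'s frame.

```python
def play(a, b):
    c = 1
    d = 5
    e = a + b

    def g():
        return c + d + e

    return g

bob = play(5, 6)

# The free variables of g and the cells that keep play's values alive
# after play's frame is gone.
cells = dict(zip(bob.__code__.co_freevars,
                 (cell.cell_contents for cell in bob.__closure__)))
print(cells)  # {'c': 1, 'd': 5, 'e': 11}
print(bob())  # 17
```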

## q2.

### A small implementation

Taken from byterun and edited some, here is an implementation of `Frame`:

In [21]:
class Frame(object):
    def __init__(self, code_obj, global_names, local_names, prev_frame):
        self.code_obj = code_obj
        self.f_globals = global_names
        self.f_locals = local_names
        self.f_back = prev_frame
        self.stack = []
        if prev_frame:
            self.f_builtins = prev_frame.f_builtins
        else:
            self.f_builtins = self.f_locals['__builtins__']
            if hasattr(self.f_builtins, '__dict__'):
                self.f_builtins = self.f_builtins.__dict__

        self.last_instruction = 0
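
The `stack` list on each `Frame` is that frame's data stack. As a minimal sketch of how opcodes use it, here is a toy machine that runs a hand-written instruction list instead of real bytecode. The class `TinyStackMachine`, its `op_` method prefix, and the tuple-based program format are assumptions made for this illustration and are not part of byterun.

```python
import operator

class TinyStackMachine:
    """Toy interpreter: runs (opname, *args) tuples against one data stack."""

    def __init__(self):
        self.stack = []    # the data (value) stack
        self.locals = {}   # stands in for a frame's local names

    def op_LOAD_CONST(self, value):
        self.stack.append(value)

    def op_STORE_FAST(self, name):
        self.locals[name] = self.stack.pop()

    def op_LOAD_FAST(self, name):
        self.stack.append(self.locals[name])

    def op_BINARY_ADD(self):
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(operator.add(left, right))

    def op_RETURN_VALUE(self):
        return self.stack.pop()

    def run(self, program):
        for opname, *args in program:
            # Dispatch by name, in the same spirit as getattr-based dispatch.
            result = getattr(self, "op_" + opname)(*args)
            if opname == "RETURN_VALUE":
                return result

# Mirrors the disassembly of f(x): a = 1; return a + x
program = [
    ("LOAD_CONST", 1), ("STORE_FAST", "a"),
    ("LOAD_FAST", "a"), ("LOAD_FAST", "x"),
    ("BINARY_ADD",), ("RETURN_VALUE",),
]

machine = TinyStackMachine()
machine.locals["x"] = 41   # pretend the caller passed x=41
result = machine.run(program)
print(result)  # 42
```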


And here is an implementation of `VirtualMachine` in which you will provide some methods; look for the methods marked "implement this" below.

In [22]:
class VirtualMachineError(Exception):
    pass

import operator
class VirtualMachine(object):
    def __init__(self):
        self.frames = []   # The call stack of frames.
        self.frame = None  # The current frame.
        self.return_value = None

    def run_code(self, code, global_names=None, local_names=None):
        """ An entry point to execute code using the virtual machine."""
        frame = self.make_frame(code, global_names=global_names, 
                                local_names=local_names)
        #print(vars(frame))
        return self.run_frame(frame)
        
    # Frame manipulation
    def make_frame(self, code, callargs={}, global_names=None, local_names=None):
        if global_names is not None and local_names is not None:
            local_names = global_names
        elif self.frames:
            global_names = self.frame.f_globals
            local_names = {}
        else:
            global_names = local_names = {
                '__builtins__': __builtins__,
                '__name__': '__main__',
                '__doc__': None,
                '__package__': None,
            }
        local_names.update(callargs)
        frame = Frame(code, global_names, local_names, self.frame)
        return frame

    def push_frame(self, frame):
        self.frames.append(frame)
        self.frame = frame

    def pop_frame(self):
        self.frames.pop()
        if self.frames:
            self.frame = self.frames[-1]
        else:
            self.frame = None
        
    # Data stack manipulation
    def top(self):
        return self.frame.stack[-1]

    def pop(self):
        return self.frame.stack.pop()

    def push(self, *vals):
        self.frame.stack.extend(vals)

    def popn(self, n):
        """Pop a number of values from the value stack.
        A list of `n` values is returned, the deepest value first.
        """
        if n:
            ret = self.frame.stack[-n:]
            self.frame.stack[-n:] = []
            return ret
        else:
            return []
        
    def parse_byte_and_args(self):
        f = self.frame
        opoffset = f.last_instruction
        byteCode = f.code_obj.co_code[opoffset]
        f.last_instruction += 1
        byte_name = dis.opname[byteCode]
        if byteCode >= dis.HAVE_ARGUMENT:
            # index into the bytecode
            arg = f.code_obj.co_code[f.last_instruction:f.last_instruction+2]  
            f.last_instruction += 2   # advance the instruction pointer
            arg_val = arg[0] + (arg[1] * 256)
            if byteCode in dis.hasconst:   # Look up a constant
                arg = f.code_obj.co_consts[arg_val]
            elif byteCode in dis.hasname:  # Look up a name
                arg = f.code_obj.co_names[arg_val]
            elif byteCode in dis.haslocal: # Look up a local name
                arg = f.code_obj.co_varnames[arg_val]
            else:
                arg = arg_val
            argument = [arg]
        else:
            argument = []

        return byte_name, argument
    
    def dispatch(self, byte_name, argument):
        """ Dispatch by bytename to the corresponding methods.
        Exceptions are caught and set on the virtual machine."""
        why=None
        bytecode_fn = getattr(self, 'byte_%s' % byte_name, None)
        if bytecode_fn is None:
            if byte_name.startswith('UNARY_'):
                self.unaryOperator(byte_name[6:])
            elif byte_name.startswith('BINARY_'):
                self.binaryOperator(byte_name[7:])
            else:
                raise VirtualMachineError(
                    "unsupported bytecode type: %s" % byte_name
                )
        else:
            why = bytecode_fn(*argument)

        return why

    def run_frame(self, frame):
        """Run a frame until it returns (somehow).
        Exceptions are raised, the return value is returned.
        """
        self.push_frame(frame)
        while True:
            byte_name, arguments = self.parse_byte_and_args()

            why = self.dispatch(byte_name, arguments)


            if why:
                break

        self.pop_frame()
        #print(">>",why, self.return_value)
        return self.return_value
    
    ## Stack manipulation

    def byte_LOAD_CONST(self, const):
        "implement this"
        ####################
        self.push(const)

    def byte_POP_TOP(self):
        "implement this"
        ####################
        self.pop()

    ## Names
    def byte_LOAD_NAME(self, name):
        frame = self.frame
        """implement this with error
            raise NameError("name '%s' is not defined" % name)
        if the name is not found."""
        ####################
        if name in frame.f_locals:
            val= frame.f_locals[name]
        elif name in frame.f_globals:
            val = frame.f_globals[name]
        elif name in frame.f_builtins:
            val = frame.f_builtins[name]
        else:
            raise NameError("name '%s' is not defined" % name)
        self.push(val)

    def byte_STORE_NAME(self, name):
        self.frame.f_locals[name] = self.pop()

    def byte_LOAD_FAST(self, name):
        if name in self.frame.f_locals:
            val = self.frame.f_locals[name]
        else:
            raise UnboundLocalError(
                "local variable '%s' referenced before assignment" % name
            )
        self.push(val)

    def byte_STORE_FAST(self, name):
        "implement this"
        ################
        self.frame.f_locals[name] = self.pop()

        
    def byte_LOAD_GLOBAL(self, name):
        f = self.frame
        if name in f.f_globals:
            val = f.f_globals[name]
        elif name in f.f_builtins:
            val = f.f_builtins[name]
        else:
            raise NameError("global name '%s' is not defined" % name)
        self.push(val)
        
    ## Operators

    BINARY_OPERATORS = {
        'POWER':    pow,
        'MULTIPLY': operator.mul,
        'FLOOR_DIVIDE': operator.floordiv,
        'TRUE_DIVIDE':  operator.truediv,
        'MODULO':   operator.mod,
        'ADD':      operator.add,
        'SUBTRACT': operator.sub,
    }

    def binaryOperator(self, op):
        x, y = self.popn(2)
        self.push(self.BINARY_OPERATORS[op](x, y))

    ## Functions

    def byte_MAKE_FUNCTION(self, argc):
        name = self.pop()
        code = self.pop()
        defaults = self.popn(argc)
        globs = self.frame.f_globals
        fn = Function(name, code, globs, defaults, self)
        self.push(fn)

    def byte_CALL_FUNCTION(self, arg):
        lenKw, lenPos = divmod(arg, 256) # KWargs not supported here
        posargs = self.popn(lenPos)

        func = self.pop()
        frame = self.frame
        retval = func(*posargs)
        self.push(retval)

    def byte_RETURN_VALUE(self):
        self.return_value = self.pop()
        return "return"
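
`parse_byte_and_args` above relies on the bytecode layout used by the Python 3.5 environment of this lab: a one-byte opcode followed, for opcodes that take an argument, by a two-byte little-endian argument. Since CPython 3.6 every instruction occupies two bytes, so this decoding applies only to the older format. The sketch below isolates just the argument arithmetic; `decode_arg` is a hypothetical helper, not part of byterun.

```python
def decode_arg(low_byte, high_byte):
    """Combine two argument bytes the way parse_byte_and_args does."""
    return low_byte + (high_byte * 256)

raw = bytes([0x2C, 0x01])  # low byte first, then high byte
arg_val = decode_arg(raw[0], raw[1])

print(arg_val)                                   # 300
print(arg_val == int.from_bytes(raw, "little"))  # True
```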

In [28]:
import inspect
class Function(object):
    """
    Create a realistic function object, defining the things the interpreter expects.
    """


    def __init__(self, name, code, globs, defaults, vm):
        """You don't need to follow this closely to understand the interpreter."""
        self._vm = vm
        self.func_code = code
        self.func_name = self.__name__ = name or code.co_name
        self.func_defaults = tuple(defaults)
        self.func_globals = globs
        self.func_locals = self._vm.frame.f_locals
        self.__doc__ = code.co_consts[0] if code.co_consts else None
        # Sometimes, we need a real Python function.  This is for that.
        kw = {
            'argdefs': self.func_defaults,
        }
        self._func = types.FunctionType(code, globs, **kw)

    def __call__(self, *args, **kwargs):
        """When calling a Function, make a new frame and run it."""
        callargs = inspect.getcallargs(self._func, *args, **kwargs)
        # callargs maps argument names to the values passed into the new frame.
        # Now create a frame and run it, returning the result of
        # self._vm.run_frame(frame).
        "implement this"
        frame = self._vm.make_frame(code=self.func_code, callargs=callargs, local_names=self.func_locals, global_names=self.func_globals)
        
        return self._vm.run_frame(frame)
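
To see what `inspect.getcallargs` hands to `make_frame`, here is a small sketch with a made-up function `area`. The Python documentation marks `getcallargs` as deprecated in favour of `Signature.bind`, so the equivalent call is shown as well.

```python
import inspect

def area(width, height=2):
    return width * height

# Maps parameter names to the values a call would bind, defaults included.
callargs = inspect.getcallargs(area, 3)
print(callargs)  # {'width': 3, 'height': 2}

# The same mapping through the Signature API.
bound = inspect.signature(area).bind(3)
bound.apply_defaults()
bound_args = dict(bound.arguments)
print(bound_args)  # {'width': 3, 'height': 2}
```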


In [29]:
call = """
def g():
    a=1
    b=2
    c=a+b
    return c
d = g()
print(d)
"""
code=compile(call.strip(), "", "exec")

In [30]:
call.strip()

'def g():\n    a=1\n    b=2\n    c=a+b\n    return c\nd = g()\nprint(d)'

In [31]:
diss(code, recurse=True)

  1           0 LOAD_CONST               0 (<code object g at 0x104355d20, file "", line 1>)
              3 LOAD_CONST               1 ('g')
              6 MAKE_FUNCTION            0
              9 STORE_NAME               0 (g)

  6          12 LOAD_NAME                0 (g)
             15 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
             18 STORE_NAME               1 (d)

  7          21 LOAD_NAME                2 (print)
             24 LOAD_NAME                1 (d)
             27 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             30 POP_TOP
             31 LOAD_CONST               2 (None)
             34 RETURN_VALUE

recursing into <code object g at 0x104355d20, file "", line 1>:
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (a)

  3           6 LOAD_CONST               2 (2)
              9 STORE_FAST               1 (b)

  4          12 LOAD_FAST                0 (a)
             15 LOAD_

In [32]:
vm = VirtualMachine()
vm_value = vm.run_code(code)#should print 3

3


In [33]:
vm_value
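
`vm_value` displays nothing because module-level code ends with `LOAD_CONST None` and `RETURN_VALUE`, so `run_code` returns `None`. As a cross-check, the sketch below runs the same source with the built-in `exec` and captures what it prints. The filename `<demo>` is arbitrary.

```python
import contextlib
import io

source = """
def g():
    a=1
    b=2
    c=a+b
    return c
d = g()
print(d)
"""
code = compile(source.strip(), "<demo>", "exec")

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = exec(code, {})   # exec itself always returns None

printed = buffer.getvalue()
print(repr(printed))  # '3\n'
print(result)         # None
```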

In [34]:
call2="""
a=10
def g():
    return a+2
print(g())
"""
code=compile(call2, "", "exec")

In [35]:
diss(code, recurse=True)

  2           0 LOAD_CONST               0 (10)
              3 STORE_NAME               0 (a)

  3           6 LOAD_CONST               1 (<code object g at 0x1043bd300, file "", line 3>)
              9 LOAD_CONST               2 ('g')
             12 MAKE_FUNCTION            0
             15 STORE_NAME               1 (g)

  5          18 LOAD_NAME                2 (print)
             21 LOAD_NAME                1 (g)
             24 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
             27 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             30 POP_TOP
             31 LOAD_CONST               3 (None)
             34 RETURN_VALUE

recursing into <code object g at 0x1043bd300, file "", line 3>:
  4           0 LOAD_GLOBAL              0 (a)
              3 LOAD_CONST               1 (2)
              6 BINARY_ADD
              7 RETURN_VALUE
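
Because `g` reads `a` with `LOAD_GLOBAL`, the name is looked up in the function's globals each time `g` runs, not when `g` is defined. A minimal sketch, using an explicit dictionary as the global namespace:

```python
namespace = {}
exec("a = 10\ndef g():\n    return a + 2\n", namespace)

first = namespace["g"]()    # a is 10 at call time
namespace["a"] = 40         # rebind the global after g was defined
second = namespace["g"]()   # the new value is picked up

print(first, second)  # 12 42
```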


In [36]:
vm = VirtualMachine()
vm_value = vm.run_code(code)#should print 12

12
