# Understanding Python's bytecode

## Bytecode


- Bytecode-based languages such as Java, C#, and Python use an internal compilation stage that produces an intermediate set of instructions for an interpreter or virtual machine to execute.

- Bytecode resembles a higher-level, more readable form of assembly.

- Bytecode is **NOT** machine code; the CPU cannot execute its instructions directly.

## Python's bytecode

Python's bytecode is an intermediate language used by the Python Virtual Machine (PVM) to execute Python code efficiently. Here are some key points about Python's bytecode:

- Bytecode is a low-level, platform-independent representation of your source code. It is not binary machine code and cannot be run directly by the target machine.

- Bytecode is generated from Python source code through a process called compilation. [CPython](https://wiki.python.org/moin/PythonImplementations?action=show&redirect=implementation), the default Python implementation, compiles source code into bytecode before executing it.


- Cached bytecode files have a `.pyc` extension. In Python 3 they are stored in a folder named `__pycache__`, so an imported module can skip the compilation stage on later runs as long as its source has not changed. 👍

- The bytecode interpreter (the virtual machine) is part of the Python implementation itself and is responsible for executing the bytecode.

- CPython is open source, and its implementation of the bytecode interpreter can be found on [GitHub](https://github.com/python/cpython).


- Understanding Python bytecode can help you reason about your code and optimize its performance. It can also be useful for debugging and analyzing the execution of your code.


- The dis module in Python provides a disassembler for Python bytecode, which can be used to inspect and analyze the bytecode instructions.
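
The compilation and caching steps in the list above can be sketched with the built-in `compile()` and `importlib.util.cache_from_source()`. This is only a small illustration: the file name `example.py` and the source string are made up for the example, and nothing is written to disk.

```python
import importlib.util

# A tiny piece of source code (made up for this example).
source = "result = 2 + 3"

# Step 1: compile the source text into a code object that holds bytecode.
code_obj = compile(source, "<example>", "exec")

# Step 2: let the interpreter execute that bytecode in a fresh namespace.
namespace = {}
exec(code_obj, namespace)
print(type(code_obj).__name__, namespace["result"])

# Where the cached .pyc for a hypothetical example.py would be stored.
# The exact file name depends on the interpreter and its version.
pyc_path = importlib.util.cache_from_source("example.py")
print(pyc_path)
```

The code object returned by `compile()` is the same kind of object that a function exposes as `__code__`.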


In [29]:
import dis

def add_numbers(a: int, b: int) -> int:
    return a + b

dis.dis(add_numbers)

#  4           0 LOAD_FAST                0 (a)
#              2 LOAD_FAST                1 (b)
#              4 BINARY_ADD
#              6 RETURN_VALUE

print('---' * 30)

lambda_add_numbers = lambda a, b: a + b # yields the same bytecode as add_numbers

dis.dis(lambda_add_numbers)




  4           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE
------------------------------------------------------------------------------------------
 15           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE
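
Instead of comparing the two listings by eye, the claim that the function and the lambda compile to the same instructions can be checked in code. A minimal sketch, where `opnames` is a helper invented for this example:

```python
import dis

def add_numbers(a: int, b: int) -> int:
    return a + b

lambda_add_numbers = lambda a, b: a + b

def opnames(func):
    """Return the instruction names of a function, in order."""
    return [instr.opname for instr in dis.get_instructions(func)]

# Compare the instruction names and the raw bytecode of both callables.
same_ops = opnames(add_numbers) == opnames(lambda_add_numbers)
same_bytes = add_numbers.__code__.co_code == lambda_add_numbers.__code__.co_code
print(same_ops, same_bytes)
```

Only the line numbers differ between the two listings, and those are stored separately from the instructions themselves.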


## Dissecting The Bytecode Table

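Python bytecode is executed by a stack-based virtual machine, which means instructions work by pushing values onto a stack and popping them off. The exact opcodes vary between CPython versions: the output in these notes uses `BINARY_ADD`, which CPython 3.11 and later replace with `BINARY_OP`.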
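
To make the push and pop idea concrete, here is a toy stack machine that understands only the three instructions from the listing above. It is a sketch for intuition, not how CPython is implemented; the `run` function and the tuple format are invented for this example.

```python
def run(instructions, local_vars):
    """Execute a list of (opname, arg) pairs on a simple value stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            # Push the value of the local variable at index `arg`.
            stack.append(local_vars[arg])
        elif opname == "BINARY_ADD":
            # Pop the two topmost values, add them, push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # Pop the top of the stack and hand it back to the caller.
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")

# The same four instructions that dis printed for add_numbers.
program = [
    ("LOAD_FAST", 0),
    ("LOAD_FAST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

result = run(program, [2, 3])
print(result)
```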



<p> This is an explanation of the bytecode table shown above. </p>

| Line | Offset | Instruction | Argument | Explanation |
|------|--------|-------------|----------|-------------|
| 4    | 0      | LOAD_FAST    | 0 (`a`) | Load the value of the local variable `a` onto the stack |
| 4    | 2      | LOAD_FAST    | 1 (`b`) | Load the value of the local variable `b` onto the stack |
| 4    | 4      | BINARY_ADD   |         | Pop the top two values from the stack, add them, and push the result onto the stack |
| 4    | 6      | RETURN_VALUE |         | Return the top value from the stack as the result of the function |


Note: we will go in depth later, but here is a summary.


- **Line**: The line number in the source code that the bytecode instruction comes from. `dis` prints it only on the first instruction of each source line.

- **Offset**: The byte offset of the instruction within the bytecode sequence. Each instruction here takes 2 bytes, so the offsets go 0, 2, 4, 6.

- **Instruction**: The bytecode instruction itself, which specifies the operation to be performed.

- **Argument**: The argument associated with the bytecode instruction, if any. The value in parentheses in the `dis` output is the name or constant that the argument refers to.
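
The same four columns can also be read programmatically. The sketch below uses `dis.get_instructions()` for the offset, instruction, and argument, and `dis.findlinestarts()` for the line numbers; the `rows` list is just a name chosen for this example. The exact instructions printed depend on your Python version.

```python
import dis

def add_numbers(a: int, b: int) -> int:
    return a + b

code = add_numbers.__code__

# Map each offset that starts a new source line to its line number.
line_starts = dict(dis.findlinestarts(code))

rows = []
for instr in dis.get_instructions(add_numbers):
    rows.append((
        line_starts.get(instr.offset),  # Line (None unless a new line starts here)
        instr.offset,                   # Offset
        instr.opname,                   # Instruction
        instr.arg,                      # Argument (None if the instruction has none)
        instr.argrepr,                  # Human-readable form of the argument
    ))

for row in rows:
    print(row)
```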



#### *To truly understand how bytecode is constructed, you need to check the \_\_code\_\_ object*

Let's take the `add_numbers` function from earlier.

In [14]:

for ele in dir(add_numbers.__code__):
    if not ele.startswith('_'):
        print(ele, end=' ')

co_argcount co_cellvars co_code co_consts co_filename co_firstlineno co_flags co_freevars co_kwonlyargcount co_lines co_linetable co_lnotab co_name co_names co_nlocals co_posonlyargcount co_stacksize co_varnames replace 

Of all these attributes, there are a few I want to focus on.

- `co_consts` stores the constants that instructions such as `LOAD_CONST` push onto the stack. What's interesting is that `None` element!

A function that has no explicit return statement returns `None`, so `None` has to be ready in `co_consts`. 👍 In this CPython version the first slot of `co_consts` also holds the function's docstring, or `None` when there is none, which is why `None` shows up below even though the function returns explicitly.

In [6]:
some_global_var = 'some_global_var'
def add_numbers1(a: int, b: int) -> int:
    the_consts = 'constant'
    another_var = some_global_var
    return a + b

add_numbers1.__code__.co_consts

(None, 'constant')
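
To see the implicit-return case on its own, here is a small sketch with a function that has no return statement at all (`no_return` is a made-up name for this example). The disassembly printed at the end differs between Python versions, but `None` should be among the constants either way.

```python
import dis

def no_return():
    pass

# The constants available to the function's bytecode.
consts = no_return.__code__.co_consts
print(consts)

# Calling the function gives back None even though nothing is returned explicitly.
returned = no_return()
print(returned)

# The disassembly shows how the implicit `return None` is encoded.
dis.dis(no_return)
```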

- `co_varnames` stores the names of the local variables, which includes our parameters.

In [23]:
add_numbers1.__code__.co_varnames

('a', 'b', 'the_consts', 'another_var')

- `co_names` holds the names that are not local to the function, such as globals, builtins, and attribute names.

In [24]:
add_numbers1.__code__.co_names

('some_global_var',)
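
A quick sketch of the split between the two tuples, using a made-up function `describe` that touches one parameter, one builtin, and one attribute:

```python
def describe(x):
    # `x` is a local (a parameter); `len` and `items` are looked up by name.
    return len(x.items)

code = describe.__code__

local_names = code.co_varnames  # names of parameters and other locals
other_names = code.co_names     # globals, builtins, and attribute names

print(local_names)
print(other_names)
```

The parameter ends up in `co_varnames`, while the builtin `len` and the attribute `items` end up in `co_names`.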

- `co_code` is finally the main player here 😎. This is the actual bytecode as a `bytes` object. By default, Python displays each byte in the printable ASCII range as its ASCII character and every other byte as a `\x` escape.

In [25]:
add_numbers1.__code__.co_code

b'd\x01}\x02t\x00}\x03|\x00|\x01\x17\x00S\x00'
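
The way Python prints a `bytes` object can hide the fact that it is just a sequence of small integers. A short sketch, rebuilding the first four bytes of the output above by hand:

```python
# The first four bytes of the co_code shown above, written as integers.
raw = bytes([100, 1, 125, 2])

# Printable bytes are shown as ASCII characters, the rest as \x escapes.
print(raw)

# Indexing or iterating a bytes object gives plain integers back.
first_byte = raw[0]
print(first_byte, list(raw))

# The same data as a hexadecimal string.
print(raw.hex())
```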

Let's explore `co_code` a little further.

Suppose we take the first byte, which is displayed as `d`. It is really the integer 100 (`0x64` in hexadecimal) shown as its ASCII character.

In [26]:
# Let's look at the first bytecode instruction.
# First, get the integer value behind the ASCII character.
ord('d')

100

In [7]:
import dis

# Now let's look up the instruction name for that opcode value

dis.opname[100]

'LOAD_CONST'

The first byte is the opcode for `LOAD_CONST`. Let's see if that is correct.

In [8]:
dis.dis(add_numbers1)

  3           0 LOAD_CONST               1 ('constant')
              2 STORE_FAST               2 (the_consts)

  4           4 LOAD_GLOBAL              0 (some_global_var)
              6 STORE_FAST               3 (another_var)

  5           8 LOAD_FAST                0 (a)
             10 LOAD_FAST                1 (b)
             12 BINARY_ADD
             14 RETURN_VALUE


Sure enough, it is ✨🎉✨

That means we can also look up an instruction like this, using its offset 😎. &darr;

In [11]:
dis.opname[add_numbers1.__code__.co_code[2]] # offset 2 holds the opcode for STORE_FAST ✨

'STORE_FAST'
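
Putting the pieces together, the whole of `co_code` can be decoded by hand in the same way. This is a minimal sketch that assumes every instruction is a 2-byte pair of opcode and argument (true for the CPython output in these notes); it ignores `EXTENDED_ARG`, and on newer Python versions the list will also contain extra entries such as `RESUME` and `CACHE`. The `decoded` list is a name chosen for this example.

```python
import dis

some_global_var = 'some_global_var'

def add_numbers1(a: int, b: int) -> int:
    the_consts = 'constant'
    another_var = some_global_var
    return a + b

raw = add_numbers1.__code__.co_code

decoded = []
# Walk the bytes two at a time: first the opcode, then its argument.
for offset in range(0, len(raw), 2):
    opcode = raw[offset]
    arg = raw[offset + 1]
    decoded.append((offset, dis.opname[opcode], arg))

for offset, name, arg in decoded:
    print(offset, name, arg)
```

Comparing this printout with `dis.dis(add_numbers1)` is a nice way to confirm that `dis` is doing nothing magical: it reads the same bytes and resolves the arguments into names and constants.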