# Other Python things
 - Importing libraries
 - misc
 - C extensions
 
In this notebook I try to clarify some concepts I find useful for understanding the more advanced features of python: things that go beyond what you strictly need to know but are very useful for developing software in python effectively.

Note: I enjoy doing this to teach myself and take my mind off things, and maybe others will have less of a hard time or can use this to speed up their learning of python so they don't have to spend as long as I have. As with all the content on my github, this is not a portfolio. This is just a hobby and for learning.

# Importing libraries

In python, a

# Module: 

 - is any .py,.pyc,.so, or .pyd file
 
 i.e.
 pack.py

# Package:

 - is a file structure that contains a module or groups of modules and one initialization file always named \_\_init\_\_.py
 - packages can have sub packages that follow the same layout


pack/

├── \_\_init\_\_.py

└── pack.py

**multiple sub-packages are also possible:**

pack/

├── subpack/

    ├── __init__.py

    └── subpack.py

├── \_\_init\_\_.py

└── pack.py

# Name space package:

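 - is a directory without an \_\_init\_\_.py that groups packages (or modules) under a shared name
 - Note: it doesn't require an \_\_init\_\_.py because all it does is create a shared name space for multiple packages; directories with the same name in different locations on sys.path are combined into one name space (PEP 420)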

namespace/

├── pack1/

    ├── __init__.py

    └── pack1.py

├── pack2/

    ├── __init__.py

    └── pack2.py


# importing

There are two types of imports: absolute and relative.

Typically you use absolute imports, i.e. you import a module or package by its full name. If it is not installed, it must be located in the directory of the script being run or in another directory on sys.path.

Relative imports only work inside packages. A module within a package uses them to import sibling modules or modules from the packages that contain it. The leading dots in the from statement say where to start looking, relative to the package the importing module belongs to.

i.e. for subpack.py to import pack.py from pack it can use either an absolute import e.g. `from pack import pack` (assuming pack is installed or on sys.path) or, since it's part of a package structure, a relative import e.g.

inside subpack.py
```python
from .. import pack
# or 
from ..pack import attr
```
(Note: one dot is the same directory, additional dots goes to however many parent directories above as there are dots after the first dot)


pack/

├── subpack/

    ├── __init__.py

    └── subpack.py

├── \_\_init\_\_.py

└── pack.py

Also, if wanting to dynamically import modules (from a name held in a string) you can use the builtin \_\_import\_\_, although importlib.import_module is the recommended way to do this:
```python
__import__("pack")
```
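As a minimal sketch of the dynamic import described above, this uses importlib.import_module from the standard library, with the standard json module standing in for whatever name you only know at run time (the variable names are just for illustration).

```python
import importlib

# The module name is normally only known at run time; it is hard-coded here.
module_name = "json"
module = importlib.import_module(module_name)

# The returned object is the same module a normal `import json` would give.
encoded = module.dumps({"a": 1})
print(encoded)  # {"a": 1}
```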

# How python imports work

Imports follow this procedure:

1. Python checks sys.modules to see if the module has already been imported. If it has, the existing module object is reused and its code is not run again, since modules are only loaded once per session. (importlib.reload re-executes a module; deleting your reference and importing again just rebinds the name to the cached module.)
2. If it isn't cached, python looks for the module's file (if it's a module) or directory (if it's a package or name space package) in the directories on sys.path, or resolves the relative import against the current package if applicable. If nothing is found it raises ModuleNotFoundError.
3. If found, python creates a new ModuleType object and registers it in sys.modules.
4. The source is compiled to byte code, which is cached in a .pyc file (in a \_\_pycache\_\_ directory) and regenerated when the source changes to reduce the overhead of subsequent imports. The module's entire script is then exec'd in the new module object's name space, so everything it defines becomes an attribute of the module.
5. The name in the importing program is bound to the module object.



From statements follow the exact same approach, but instead of binding the module's name they bind only the requested attributes in the importing program. The module itself still stays cached in sys.modules.
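The caching behaviour described above can be checked directly. This is a small sketch using the standard json and math modules; the `ns` dictionary is a throwaway name space used only to show which names a from statement binds.

```python
import sys
import json
import json as json_again  # second import: served from the sys.modules cache

print(sys.modules["json"] is json)  # True
print(json_again is json)           # True

# Run a from-import in an empty name space to see exactly what gets bound.
ns = {}
exec("from math import sqrt", ns)
print("sqrt" in ns)           # True: the requested attribute was bound
print("math" in ns)           # False: the module name itself was not bound
print("math" in sys.modules)  # True: the module was still loaded and cached
```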



# Installing packages

Unless the modules you use are in the same directory as the main program, or in another directory on sys.path, python can't find them; installing your package is how you make it importable from anywhere.

There are practically two types of package installation: standard installation and editable mode.

**Standard:**
 - you have to reinstall after any saved changes to update it
 - the files are copied into python's site-packages directory

**Editable:**
 - you can make saved changes to your package and they take effect without reinstalling (though a running session has to reload the package to see them)
 - the source can be located in any directory (within reason) because, instead of copying files, the install leaves a link in site-packages that puts the package directory on python's import path (so that it can be found as an absolute import from then on); it also generates metadata such as an .egg-info folder in the project directory


Package installation primarily makes the package importable (by placing or linking it on sys.path) and records its metadata, but the same tooling can also be used to build distributions of your package i.e. a .whl (wheel) file.

# Module level attrs
Modules support a few special dunder names of their own. For example the \_\_all\_\_ list controls which names get imported on ```from *desired package* import *``` and a module level \_\_getattr\_\_ function is called when an attribute is not found on the module through normal lookup (Note: it doesn't take a self argument since we're not inside a class but defining the function globally at the module level scope).

i.e.

```python
__all__=['method1','method2','attr1','attr2']

def __getattr__(key):
    ...
```
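To see both names in action without writing a file to disk, the sketch below builds a throwaway module in memory with types.ModuleType. The module name `demo_module_attrs` and its contents are made up for illustration.

```python
import sys
import types

source = '''
__all__ = ["visible"]
visible = 1
hidden = 2

def __getattr__(name):
    # only reached when normal lookup on the module fails
    if name == "lazy":
        return 42
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

# Create the hypothetical module and register it so it can be imported.
demo = types.ModuleType("demo_module_attrs")
exec(source, demo.__dict__)
sys.modules["demo_module_attrs"] = demo

namespace = {}
exec("from demo_module_attrs import *", namespace)
star_imported = sorted(name for name in namespace if not name.startswith("__"))
print(star_imported)  # ['visible']
print(demo.lazy)      # 42, produced by the module level __getattr__
print(demo.hidden)    # 2, still reachable explicitly

del sys.modules["demo_module_attrs"]  # clean up the throwaway module
```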

# \_\_name\_\_ =="\_\_main\_\_"
If you run python modules as scripts, sometimes there are sections of code that are intended to be executed only when the file is run as the main program and not when it's imported as a module. This is exactly what the clause is for: \_\_name\_\_ is set to "\_\_main\_\_" only when the file is run directly, and on import it is the module's name, so the two uses can be separated since they may need to run differently.
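A minimal sketch of the clause, assuming a hypothetical `main` function that holds the script-only work:

```python
def main() -> str:
    """Work that should only start when the file is run directly."""
    return "running as a script"

# __name__ is "__main__" only when this file is executed directly
# (e.g. `python this_file.py`); on import it is the module's own name.
if __name__ == "__main__":
    print(main())
```

Importing the file gives access to `main` without triggering the print.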

# Exec vs Eval

Exec is when you want to execute statements (such as definitions) in a scope, optionally passing your own name spaces (typically dictionaries) for it to run in

```python
exec("def f(): print('hi')") # or exec("def f(): print('hi')",globals())
f()
# prints
# hi
```

Eval is when you want to evaluate a single expression and get back the python value it evaluates to.

```python
eval("(1,2,3)")
# returns
# (1,2,3)
```

Note: exec returns None (but may change/create the value of a reference), whereas eval returns the evaluated expression.
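The difference can be sketched with explicit name space dictionaries, which keeps the exec'd names out of your own globals (the names `namespace`, `value` and `result` are just for illustration). Neither function is safe to use on untrusted input.

```python
# exec runs statements; definitions land in the dictionary we pass in.
namespace = {}
exec("def f():\n    return 'hi'", namespace)
print(namespace["f"]())  # hi

# eval evaluates one expression and returns its value.
value = eval("x * 2 + 1", {"x": 20})
print(value)  # 41

# exec itself always returns None, even though it created a new name.
result = exec("y = 5", namespace)
print(result, namespace["y"])  # None 5
```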

# 0.1 + 0.2 != 0.3
Floating point numbers are stored and calculated in a binary representation (in python and most other programming languages)

e.g. ```(-1)**sign * 2**(exponent-bias) * (1+fraction)``` (IEEE 754 standard format; if interested here's a more in depth source: https://mathcenter.oxford.emory.edu/site/cs170/ieee754/). Most decimal fractions such as 0.1 cannot be represented exactly in binary, so rounding errors occur, which is why the issue arises.

To get around this issue use the Decimal class from the decimal standard library module, which works in base 10 and retains the precision you'd expect (construct it from strings, since Decimal(0.1) would inherit the float's rounding error), though it is slower and uses more memory than a float.

i.e.
```python
from decimal import Decimal

float(Decimal('0.1') + Decimal('0.2')) == 0.3 # True
```
Otherwise you'll need to use other methods to decide how such comparisons should be made, e.g. comparing within a tolerance using math.isclose.
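A short sketch of the comparison options, using only the standard library (the tolerance of 1e-9 in the last line is an arbitrary choice for illustration):

```python
import math
from decimal import Decimal

print(0.1 + 0.2 == 0.3)               # False: binary rounding error
print(math.isclose(0.1 + 0.2, 0.3))   # True: compares within a relative tolerance
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True: exact base 10 arithmetic
print(abs((0.1 + 0.2) - 0.3) < 1e-9)  # True: a manual absolute tolerance check
```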

# :=
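The 'walrus' operator (python 3.8+) assigns a value to a name as part of an expression, so the value can be tested and reused without a separate assignment line

i.e.
```python
ls=[1,2,3]
if (n:=len(ls)) > 1: # n is assigned and compared in one step
    print(n)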
```
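Two other places where inline assignment tends to help are loops that read until nothing is left and comprehensions that would otherwise compute a value twice. This is a small sketch with made up data:

```python
import io

# Read fixed-size chunks until read() returns an empty string.
stream = io.StringIO("abcdefgh")
chunks = []
while chunk := stream.read(3):
    chunks.append(chunk)
print(chunks)  # ['abc', 'def', 'gh']

# Compute v * 2 once, then both filter on it and keep it.
values = [1, 5, 12, 7, 30]
doubled_big = [d for v in values if (d := v * 2) > 20]
print(doubled_big)  # [24, 60]
```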

# super
This allows you to traverse the MRO (method resolution order) of a class to obtain parent class attributes. This is usually needed when you override a method but still want the parent's behaviour, for example in \_\_init\_\_ or in highly depended upon methods such as \_\_getattribute\_\_. Note: when you do an attribute lookup on a class you are traversing the MRO anyway (otherwise inheritance would have no effect).

Note: all classes in python inherit from object and therefore if wanting to use default methods directly without wanting to understand how super works sometimes you can use object.\_\_\*desired method\*\_\_ (assuming object has the method)

i.e.
```python
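class test:
    def __getattribute__(self,key):
        if key!="my_key":
            # defer to the default lookup implemented by object
            # (object defines __getattribute__ but has no __getattr__ to defer to)
            return super().__getattribute__(key)
        return True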
```

If overriding \_\_getattribute\_\_ and \_\_getattr\_\_ in jupyter notebook, something to be aware of is that jupyter notebook looks up other attributes on your object (mostly for display) that may affect your code's intended behavior.
```python
class test:
    def __getattribute__(self,key):
        print(key)
display(test())
# should print (if using jupyter notebook)
__class__
_ipython_canary_method_should_not_exist_
_ipython_display_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_mimebundle_
__class__
__class__
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_html_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_markdown_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_svg_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_png_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_pdf_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_jpeg_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_latex_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_json_
__class__
__class__
_ipython_canary_method_should_not_exist_
_repr_javascript_
__class__
<__main__.test at 0x1b9379b54d0>
```

Also, preferably use inspect.isclass(obj) when checking whether something is a class. It is equivalent to isinstance(obj,type) and also works for classes created by a metaclass (since metaclasses inherit from type), whereas isinstance(type(obj),type) is True for every object and so tells you nothing.

To see what classes are inherited you can use the \_\_bases\_\_ attribute, and there's also a \_\_subclasses\_\_() method for the direct subclasses. The MRO is found via the \_\_mro\_\_ attribute
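The attributes above and the way super follows the MRO can be sketched with a small diamond hierarchy. The classes A, B, C, D, Meta and WithMeta are made up for illustration.

```python
import inspect

class A:
    def hello(self):
        return ["A"]

class B(A):
    def hello(self):
        return ["B"] + super().hello()

class C(A):
    def hello(self):
        return ["C"] + super().hello()

class D(B, C):
    def hello(self):
        return ["D"] + super().hello()

print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().hello())                           # ['D', 'B', 'C', 'A']
print(D.__bases__ == (B, C))                 # True
print(sorted(cls.__name__ for cls in A.__subclasses__()))  # ['B', 'C']

class Meta(type):
    pass

class WithMeta(metaclass=Meta):
    pass

# A class built by a metaclass is still a class; its instances are not.
print(inspect.isclass(WithMeta), isinstance(WithMeta, type))  # True True
print(inspect.isclass(WithMeta()))                            # False
```

Note that super() in B calls C (not A) when the instance is a D, because it follows the MRO of the instance's class.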



# Function args

when writing function parameters in a definition we always order them as follows:

```python
def f(args_only,/,*,kwargs_only): ... # formatting you should refer to conceptually
# or 
def f(args_only,*,kwargs_only): ...
```
and \*args and \*\*kwargs slot into this ordering/formatting:
```python
def f(args_only,/,*,kwargs_only,**kwargs): ...
# or
def f(args_only,*args,kwargs_only,**kwargs): ...
```
Notice that we use the '/' and '*' symbols as separators: parameters before '/' are positional only and parameters after '*' (or after \*args) are key word only. Anything you don't need can simply be removed from the first form, i.e. if you only need key word only parameters you'd have:
```python
def f(*,k1,k2): ...
```
Forcing arguments to be passed by key word e.g.
```python
f(k1=1,k2=2)
```
By default arguments can be passed by position or by key word (positional arguments must still be given in order; only key words can be given out of order) but if wanting positional only then we again remove what we don't need i.e.:
```python
def f(k1,k2,/): ...
```
Forcing arguments to be passed by position e.g.
```python
f(1,2)
```
If there's uncertainty or flexibility around what or how many inputs are expected (typically you might use this when wrapping or testing new classes/bound methods/functions) you can also use the \*args,\*\*kwargs format allowing any positional args followed by any key word args i.e.:
```python
def f(*args,**kwargs): ...
```
i.e.
```python
f(1,2,3,k1=1,k2=2,k3=3)
```
Note: \* and \*\* are also used when calling a function, where they unpack iterables (such as tuples) and dictionaries respectively. Therefore, we can use them to pass in multiple args via a tuple or dictionary expression i.e.:
```python
f(*(1,2,3),**{'k1': 0, 'k2': 1, 'k3': 2})
# or
f(*tuple(range(3)),**dict(zip(("k1","k2","k3"),range(3))))
```
Note: this will also work for the positional only and key word only args respectively as well.
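Putting the pieces together, here is a runnable sketch with one parameter of each kind (the function `f` and its parameter names are only for illustration):

```python
def f(a, /, b, *, c):
    """a is positional only, b can be passed either way, c is key word only."""
    return (a, b, c)

print(f(1, 2, c=3))    # (1, 2, 3)
print(f(1, b=2, c=3))  # (1, 2, 3)

# Passing `a` by key word, or `c` by position, is rejected with a TypeError.
errors = []
for call in (lambda: f(a=1, b=2, c=3), lambda: f(1, 2, 3)):
    try:
        call()
    except TypeError as exc:
        errors.append(type(exc).__name__)
print(errors)  # ['TypeError', 'TypeError']

# Unpacking works with the restricted parameters as well.
args, kwargs = (1, 2), {"c": 3}
print(f(*args, **kwargs))  # (1, 2, 3)
```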

# yield

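yield is a keyword that allows a function to hand back a value while remembering its position, so it can resume from there to produce subsequent values. A function containing yield is a generator function: calling it returns a generator that you can iterate over.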
```python
def gen():
    yield 1
    yield 2
    yield 3

for i in gen(): print(i)

# should print:
# 1
# 2
# 3
```
You can also yield from another generator
```python
def gen():
    yield from range(1,4)

for i in gen(): print(i)

# should print:
# 1
# 2
# 3
```
# why yield?
What yield allows us to do is leave the function and preserve the position in the function where it left. Besides lazy iteration, a common use is writing context managers, where we need to temporarily change the context code runs in i.e. a different working directory, a thread holding a lock while it uses a resource, or clean up code as with a try-except-finally clause. In each case there is entry code, then control is handed to the caller, then exit code runs: the yield marks the point where the body of the with statement executes, with the enter and exit code before and after it respectively.

If we want to build a contextmanager there are two main ways to accomplish this.

1. using a decorator and yield 
```python
from contextlib import contextmanager

@contextmanager
def context():
    # entry code...
    yield
    # exit code...
```
2. using special/dunder/magic methods if using a class
```python
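class context:
    def __enter__(self):
        # entry code ...
        return self # the value bound by 'with context() as x'

    def __exit__(self,exc_type,exc_value,traceback):
        # exit code ...
        return False # don't suppress exceptions raised inside the with block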

```
Then, we can simply use a with statement to declare that we are using a contextmanager e.g.:

```python
with context():
    ... # do something
```
So we can use yield to create our own generators, and generators can in turn provide temporary contexts.
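As a concrete sketch of the decorator approach, the hypothetical helper `working_directory` below temporarily changes the current directory. Wrapping the yield in try/finally is a design choice that makes the exit code run even if the body raises.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory (illustrative helper)."""
    previous = os.getcwd()
    os.chdir(path)          # entry code
    try:
        yield path          # the body of the with statement runs here
    finally:
        os.chdir(previous)  # exit code, runs even if the body raises

start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()
after = os.getcwd()
print(after == start)  # True
```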

In further use cases there are also asynchronous context managers (used with async with) and classes that implement the generator protocol, and a generator can act as a coroutine by using yield as an expression that receives a value. But yield is function specific: like return, it cannot be used outside a function.
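Using yield as a value can be sketched with a generator that receives numbers through send and yields a running total. The name `running_total` is made up for this example.

```python
def running_total():
    """Generator based coroutine: receives numbers, yields the sum so far."""
    total = 0
    while True:
        value = yield total  # hands back total, then waits for send()
        total += value

gen = running_total()
first = next(gen)     # advance to the first yield
second = gen.send(5)
third = gen.send(3)
gen.close()
print(first, second, third)  # 0 5 8
```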

Something to be aware of when using yield is the following:
```python
def gen():
    return
    yield
gen() # should return a generator
```
If a yield appears anywhere in a function body, even after a return statement where it can never run, python still compiles the function as a generator function. Calling it therefore doesn't run the body; it returns a generator, and the code before the return only runs once the generator is advanced (the return then ends the iteration).

If you want the body to start running at call time, you can use an after execution decorator to move the generator one iteration forwards.

i.e.
```python
from my_pack import wrap

@wrap.inplace(next)
def gen():
    return
    yield
```
Note: the example function used is not the same as:
```python
def gen():
    return (yield) # yields None once, then returns whatever value was sent in
```
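The difference between the two forms above can be checked with a short sketch (the print inside `gen` is only there to show when the body actually starts running):

```python
import inspect

def gen():
    print("body started")
    return
    yield  # never reached, but it makes gen a generator function

print(inspect.isgeneratorfunction(gen))  # True
g = gen()        # nothing is printed yet: the body has not started
items = list(g)  # prints "body started", then the return ends the iteration
print(items)     # []

def gen2():
    return (yield)

g2 = gen2()
first = next(g2)  # runs up to the yield, which produces None
try:
    g2.send("sent")
except StopIteration as stop:
    returned = stop.value  # the generator's return value
print(first, returned)  # None sent
```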

# Frames

Functions are reusable code sections but they are also objects in python. This usually means they have attributes we can access and look at (assuming it has a \_\_dir\_\_ method or otherwise \_\_dict\_\_ attribute):
```python
dir(f)
```
However, functions, as is true with classes, utilize frame objects when called.

A frame object can be thought of as the section of memory holding the state of one particular call (its local variables, the code being run, and where execution currently is). In python, you cannot create your own frames separately from a call (not without finding some hacks or gaining a deeper understanding of how the C API works) but you can access the frames that calls create.

In many programming languages (though it depends on how the language works) a function call is a frame on a stack (stack frame). In python this can be conceptualized as the case and we can access frames using the inspect module e.g. inspect.currentframe or inspect.stack. This allows you to trace calls across the stack, but only from the current frame back through the frames that called it.

A FrameType object holds the code object being executed (f_code), the local and global variables (f_locals, f_globals), the current line number (f_lineno), and a link to the calling frame (f_back). The FrameInfo records returned by inspect.stack add details such as the file name, the function name, and the code context (the source lines around the current line).

Note: when using on a CLI the code context may not be recorded (maybe use readline instead to get its history).

If using generators or coroutines etc. you can also view their frames that should be visible after locating them from using dir() on these objects i.e. generators have a ```gi_frame``` attribute you can look through.
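A minimal sketch of walking one step back up the stack and of looking at a generator's frame. The function names and the `secret` variable are made up for illustration.

```python
import inspect

def who_called_me():
    frame = inspect.currentframe()
    try:
        caller = frame.f_back  # the frame of whoever called this function
        return caller.f_code.co_name, caller.f_locals["secret"]
    finally:
        del frame  # avoid keeping a reference cycle alive

def outer():
    secret = 42
    return who_called_me()

print(outer())  # ('outer', 42)

def counter():
    yield 1
    yield 2

g = counter()
next(g)
print(g.gi_frame.f_code.co_name)  # counter
remaining = list(g)               # exhaust the generator
print(g.gi_frame)                 # None: the frame is released once it finishes
```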

# byte code

Functions, generators, and some other python objects allow you to access their byte code, usually via the \_\_code\_\_.co_code attribute or .gi_code.co_code, gi_frame.f_code.co_code, and others. Alternatively you can create your own code objects via the ```compile``` function i.e. as follows ```compile(source,'','exec').co_code``` or by using the CodeType class from the types module.

When you have byte code you can traverse it as an array of bytes. Since python 3.6 each instruction takes two bytes (an opcode followed by its argument), and the opcode can be turned into its name by using opname from the opcode standard library module as follows:
```python
from opcode import opname

def disassemble(byte_code: bytes) -> None:
    """Naively disassembles python byte code (two bytes per instruction)"""
    for offset in range(0,len(byte_code),2):
        op,arg=byte_code[offset],byte_code[offset+1]
        # opname maps an opcode number to its name
        print(offset,op,opname[op],arg)
```
Python also has a function that does a better job at the same thing with optimizations and displays other objects byte code where available:

```python
import dis

dis.dis(code_obj)
```

If wanting the instructions and their arguments as objects rather than printed output you can use:
```python
dis.get_instructions(code_obj)
```
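As a small end to end sketch, the snippet below compiles a one line program and inspects it with dis.get_instructions, the public iterator over instructions in the dis module. The exact opcodes differ between python versions, so only a version independent one is checked here.

```python
import dis

code = compile("x = 1 + 2", "<example>", "exec")
print(type(code.co_code))     # <class 'bytes'>
print(len(code.co_code) % 2)  # 0: instructions are two bytes wide

# Each Instruction carries the opcode name, its argument and its offset.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print("STORE_NAME" in opnames)  # True: the assignment to x
```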

# Hooks

Hooks are functions that python calls for you at defined points, which makes them a way of tracing what your code did when it ran or figuring out what other things were going on. Hooks can either be class attributes (this is more so a meta programming / metaclass context e.g. typically the \_\_prepare\_\_, \_\_new\_\_, and \_\_init\_\_ methods on a metaclass are 'hooks') or sys hooks.

For example we can look at what the program is doing by adding an audit hook via the sys module.

```python
import sys

def auditer(event: str,args: tuple) -> None: # called with the event name and a tuple of its arguments
    print(event,args)

sys.addaudithook(auditer)
```
For more information about what events you can listen for see: https://docs.python.org/3/library/audit_events.html

Note: using a logging module is another method as a form of outputting information via some medium (i.e. print or write to a file)
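The metaclass hooks mentioned above can be sketched by recording the order in which they are called when a class statement runs. The names `Meta`, `Example` and `calls` are made up for illustration.

```python
calls = []

class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        calls.append("__prepare__")  # creates the name space for the class body
        return {}

    def __new__(mcls, name, bases, namespace, **kwargs):
        calls.append("__new__")      # creates the class object
        return super().__new__(mcls, name, bases, namespace)

    def __init__(cls, name, bases, namespace, **kwargs):
        calls.append("__init__")     # initializes the class object
        super().__init__(name, bases, namespace)

class Example(metaclass=Meta):
    pass

print(calls)  # ['__prepare__', '__new__', '__init__']
```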

# python is compiled?
Yes, this is true; however, it's not compiled any further than an intermediate representation (byte code, not machine code). Essentially, python executes code via the following steps (Note: it does more than what is shown here; this is only an introduction to what it's capable of):

0. send code
```python
'print("hello world")'
```
1. parse code into an ast (abstract syntax tree)
```python
import ast

ast.parse('print("hello world")')

# i.e. print(ast.dump(ast.parse('print("hello world")'),indent=4)) gives
Module(
    body=[
        Expr(
            value=Call(
                func=Name(id='print', ctx=Load()),
                args=[
                    Constant(value='hello world')],
                keywords=[]))],
    type_ignores=[])
```
2. compile ast to bytecode (an intermediate language made of opcodes (operation codes) that represents the ast; for imported modules it is cached in a .pyc file inside a \_\_pycache\_\_ directory)
```python
code=compile(ast.parse('print("hello world")'),'','exec')
# how to view
import dis

dis.dis(code)
# should print (on python 3.11; the exact opcodes vary between versions)
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (print)
              6 LOAD_CONST               0 ('hello world')
              8 PRECALL                  1
             12 CALL                     1
             22 POP_TOP
             24 LOAD_CONST               1 (None)
             26 RETURN_VALUE
```
3. byte code is directly exec'd by the python virtual machine
```python
exec(compile(ast.parse('print("hello world")'),'','exec'))
# should print
hello world
```

So python is an interpreted language, but in more detail there is still some compilation done, just not to machine code that a cpu architecture directly executes but rather to an intermediate language that an executable program (the interpreter, written in C and compiled to machine code beforehand) runs. There's also some optimization done on the byte code to make it run better e.g. typically folding constants and removing some unreachable code. Because of this, it's clear how optimizations are more limited: the interpreter itself may be the limiting factor, since nothing is compiled to native machine code and so the further optimizations a native compiler would apply are missed.

Note: the compile function doesn't make your code faster in python (since the python interpreter does it for you), it's meant for conversion from code to byte-code (intermediate representation).

Being an interpreted language doesn't prevent other kinds of optimizations such as JIT (i.e. numba has a JIT for python) and C extensions such as allowing modules written in C or using Cython that may speed up code.

# memoization and caching

If using functions that get called repeatedly with the same arguments, particularly with recursion, you may want to consider using functools.cache (python 3.9+) or functools.lru_cache to improve speed at the cost of some memory for the stored results. There are also many other functions that functools offers such as wraps (retains function meta data), partial (creates partial functions), and reduce (applies a binary function cumulatively to an iterable).
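A minimal sketch of the effect on a recursive function, using functools.lru_cache on a naive Fibonacci implementation (chosen only because the uncached version repeats a lot of work):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040

# Each of fib(0) .. fib(30) was computed exactly once; repeats were cache hits.
info = fib.cache_info()
print(info.misses, info.currsize)  # 31 31
```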

# garbage collection and weak referencing

Python frees most objects as soon as their reference count drops to zero, and additionally runs a garbage collector to find objects that are kept alive only by reference cycles. However, if left unsupervised the garbage collector can get in the way of the program: when a computation allocates many long lived objects, collections get triggered repeatedly without finding anything to free, increasing the time the program takes. Therefore, a way to speed up large computations is to do a garbage collection beforehand, freeze the existing objects so later collections ignore them, and raise the pre set thresholds so collections occur less often.

```python
import gc

gc.collect()
gc.freeze()
gc.set_threshold(10**4,10**2,10**3) # no particular reason for the choice of numbers it's just for a higher threshold
## do some large computation that would get garbage collected too regularly ##
```
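One way to keep such tuning from leaking into the rest of the program is to wrap it in a context manager that restores the previous settings afterwards. This is only a sketch; `relaxed_gc` is a made up helper and the default thresholds are the same arbitrary numbers as above.

```python
import gc
from contextlib import contextmanager

@contextmanager
def relaxed_gc(thresholds=(10**4, 10**2, 10**3)):
    """Collect once, freeze survivors, raise thresholds, then restore (illustrative)."""
    previous = gc.get_threshold()
    gc.collect()
    gc.freeze()
    gc.set_threshold(*thresholds)
    try:
        yield
    finally:
        gc.set_threshold(*previous)
        gc.unfreeze()

before = gc.get_threshold()
with relaxed_gc():
    during = gc.get_threshold()
    data = [[i] for i in range(100_000)]  # stand-in for a large computation
print(during[0])                     # 10000
print(gc.get_threshold() == before)  # True
```

Timing the computation with and without the context manager is the only reliable way to tell whether it helps.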

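Weak referencing lets you refer to an object without increasing its ref count, so the reference doesn't keep the object alive; this helps avoid memory leaks where objects are kept alive indefinitely. Typically you might use this for caches or for circular structures (e.g. a child pointing back to its parent) since these otherwise persist in memory longer than they should.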

```python
import weakref

class test: pass
obj=test()
ref=weakref.ref(obj) # ref() returns obj while it's alive, otherwise None
```
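What happens to a weak reference once the last strong reference goes away can be sketched as follows. The class `Node` is made up for illustration, and gc.collect() is called only to make the result independent of the interpreter's collection timing.

```python
import gc
import weakref

class Node:
    pass

node = Node()
ref = weakref.ref(node)
print(ref() is node)  # True: the object is still alive

del node              # drop the only strong reference
gc.collect()
print(ref())          # None: the weak reference did not keep it alive

# WeakValueDictionary drops entries once their values are no longer referenced.
cache = weakref.WeakValueDictionary()
item = Node()
cache["item"] = item
del item
gc.collect()
print(len(cache))     # 0
```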

For most use cases you typically won't need to be concerned with the gc or weakref modules, because freeing objects to improve memory usage is done internally by the python interpreter eventually anyway. There are cases for them, but typically these are large computations where the garbage collector's default setup can get in the way or where objects may persist in memory for longer than they should. The recommendation is to try these things out (safely and within reason) where it makes sense and time the result to get direct feedback. So unless it will make a significant difference try not to overuse gc or weakref.


# JIT (Just in time compilation)

Just in time compilation is a way to optimize code while it's running, by compiling frequently executed code to machine code at run time. Typically it can also remove redundancies such as a for/while loop that does nothing every iteration and apply other optimizations such as loop unrolling.

i.e. here's the idea of loop unrolling:
```python
## proposed loop
t=[]
for i in range(10):
    t+=[i]

## first order unrolling (half as many iterations, two steps per iteration)
t=[]
for i in range(0,10,2):
    t+=[i]
    t+=[i+1]
```

As mentioned, Numba has a JIT compiler used to compile python functions to make it run faster: https://numba.readthedocs.io/en/stable/user/jit.html

Note: one of the drawbacks of using numba is the potential need to rewrite your code, since its compiled mode only supports a subset of python and numpy (check numba's list of supported features before relying on a particular construct). Note that there are other options out there as well that may be better solutions, such as PyPy (an alternative python implementation with a built in JIT), but numba's typically okay for a decent amount of use cases, or at least as a start, if using lots of numpy.

Note: this section needs fixing:

1. the install methods for extensions and cython I think are incorrect
2. examples need to be better

# C extensions, Cython, and ctypes

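C extensions are python modules written in C/C++ and compiled to shared libraries (.so files, or .pyd on Windows). (Note: if using C++ you may need to use a C extern since really python has a C API and not a C++ API but it can still work.) The C/C++ code directly interfaces with the python C API: each function exposed to python takes and returns PyObject pointers, and the module must define a PyInit_*module name* function that python calls on import. The code is compiled via setuptools in a setup.py file, e.g. with ```python setup.py build_ext --inplace``` or ```pip install .```, and after changing the C source you need to rebuild.

```c
// my_clib.c
#include <Python.h>

// wrapper exposed to python: parses two ints and returns their sum
static PyObject *add(PyObject *self, PyObject *args) {
    int x, y;
    if (!PyArg_ParseTuple(args, "ii", &x, &y)) return NULL;
    return PyLong_FromLong(x + y);
}
static PyMethodDef methods[] = {{"add", add, METH_VARARGS, "add two ints"}, {NULL, NULL, 0, NULL}};
static struct PyModuleDef module = {PyModuleDef_HEAD_INIT, "my_clib", NULL, -1, methods};
PyMODINIT_FUNC PyInit_my_clib(void) { return PyModule_Create(&module); } // called on import
```
```python
# setup.py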
from setuptools import setup, Extension

setup(
    name='my_clib',
    version='0.0',
    ext_modules=[
        Extension('my_clib', ['my_clib.c']),
    ]
)
```

```python
# test.py
from my_clib import add

add(1,1)
# should return
# 2
```
If interested to know more here's the reference to the C API: 

https://docs.python.org/3/c-api/index.html#c-api-index

https://devguide.python.org/developer-workflow/c-api/index.html

https://github.com/python/cpython/blob/main/Include

This is also good to go over if interested: https://docs.python.org/3/extending/extending.html

A disadvantage to this is the requirement of knowing C/C++ and how the python C API works. Cython is the high level language alternative to writing C extensions by hand. It's python with added C data types, usually written in a .pyx file (a plain .py file also works if you stick to Cython's pure python syntax). Cython translates the file into a .c file, which is then compiled as a C extension that can be used in other python scripts at run time as a module. It's more user friendly than a C extension if you don't want to learn in depth about how the python C API works, and it can get close to the same results in terms of performance, since what ends up being compiled into a .so file is a C extension.

i.e.
```python
# my_clib.pyx
def add(int x, int y):
    return x + y
```
```python
# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    name='my_clib',
    ext_modules=cythonize('my_clib.pyx')
)
```

See the documentation for other things you may want to know: https://cython.readthedocs.io/en/latest/

ctypes is a standard library module that allows you to load compiled shared libraries (.so files, or .dll on Windows, built from .c or .cpp source) and call their functions and use their data types at run time. It's a way to make python more flexible around the C/C++ interface so that the languages can be used together at run time. It does require additional knowledge of C, mostly because you will be directly interfacing with C data types i.e. structs and pointers (python does not have identical types but ctypes allows compatibility via ctypes.pointer/ctypes.POINTER and ctypes.Structure).

i.e.
```c
// my_clib.c
// compile the following to my_clib.so as a shared library e.g. gcc -fPIC -shared -o my_clib.so my_clib.c
#include <stdio.h>

int add(int x, int y) {
    return x + y;
}
```
```python
## test.py ##
import ctypes

# DLL - Dynamic Link Library
cmodule = ctypes.CDLL('C://.../my_clib.so')

# Define the type annotations
cmodule.add.restype = ctypes.c_int
cmodule.add.argtypes = [ctypes.c_int, ctypes.c_int]

cmodule.add(1, 1)
# should return
# 2
```
If wanting to know more about ctypes see: https://docs.python.org/3/library/ctypes.html
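The struct and pointer types mentioned above can be tried without compiling anything. This sketch defines a hypothetical `Point` struct purely on the python side and modifies it through a pointer:

```python
import ctypes

class Point(ctypes.Structure):
    # equivalent to: struct Point { int x; int y; };
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

point = Point(1, 2)
ptr = ctypes.pointer(point)  # like taking &point in C
ptr.contents.x = 10          # write through the pointer
print(point.x, point.y)      # 10 2

# Two ints with no padding between them.
print(ctypes.sizeof(Point) == 2 * ctypes.sizeof(ctypes.c_int))  # True
```

A function in a compiled library that takes a `struct Point *` could then be called with `ctypes.byref(point)` after setting its argtypes.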

It's important to note that using ctypes is more often about compatibility with existing C/C++ libraries, and sometimes interfacing with other python internal mechanisms at run time, rather than speed. If performing big data computations you may see performance improvements; however, if speed or efficient memory usage is important, Cython is the recommended approach over ctypes, and it's typically also easier to use than ctypes and (as mentioned) C extensions.

However, if wanting to add new builtin functions or objects to the python language then you likely want a c extension.