# Python Back to the Basics

This session is inspired by some recent conversations about the aspects of Python we take for granted. I think a lot of very good developers who have worked with Python for a long time do not have a full understanding of the lower-level details of how the language actually runs on their computer. I hope that by the end of this session, everyone will have a better understanding of how Python operates and what is actually happening when they perform common tasks with the language.

## Glossary

* History and Philosophy
* Quirks of the Python Language
* Interpreting
* Packages
* Environments

## History and Philosophy

Python is a high-level programming language that was created by Guido van Rossum. The first version of Python was released in 1991 as Python 0.9.0. This initial version already featured classes with inheritance, exception handling, functions, and core datatypes of list, dict, str and others. Guido van Rossum was influenced by the ABC language, a teaching language he helped create at CWI (Centrum Wiskunde & Informatica) in the Netherlands designed to be easy to use.

In the foreword to the book "Programming Python: Powerful Object-Oriented Programming," van Rossum stated that his primary goal in designing Python was to create a language that was easy to understand and write. He cited the importance of a short development cycle, where the time between conceiving a program and creating it is as short as possible. The design of Python also reflected his own personal taste in programming language features, borrowing elements from different languages such as ABC, Modula-3, and C.

The purpose of Python is to be readable, easy to use, flexible (in that it allows multiple paradigms), and high level (meaning you do not need to concern yourself with lower-level details like memory management and garbage collection when writing code).

## Quirks

Some things that have historically made Python unique are:

1. **It is interpreted** (more on this below)

2. **Everything is an object**: We have discussed this in a previous session (see 03_Object_Oriented_Programming)

3. **Dynamic typing and duck typing**: Python is dynamically typed, which means you don't need to declare the data type of a variable when you create it. The type belongs to the value a name is bound to, and it is determined at runtime. Additionally, Python uses duck typing, a concept that allows you to use objects based on their characteristics (the methods and attributes they provide), not on their types. As the saying goes, "If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck."

4. **List comprehensions**: Discussed at length in 04_Functional_Programming_in_Python

5. **Mutability**: We have discussed this in a previous session (see 03_Object_Oriented_Programming)
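To make the dynamic typing and duck typing points in the list above concrete, here is a small sketch. The names `Duck`, `Robot`, and `make_it_quack` are made up for this illustration and do not come from any library.

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class Robot:
    # Not related to Duck in any way, but it offers the same method
    def quack(self) -> str:
        return "Beep-quack!"


def make_it_quack(thing) -> str:
    # No isinstance check: anything with a .quack() method is accepted
    return thing.quack()


# Dynamic typing: the same name can be bound to values of different types
value = 42
value = "forty-two"

sounds = [make_it_quack(Duck()), make_it_quack(Robot())]
print(sounds)
print(type(value).__name__)
```

Passing an object without a `quack()` method would only fail at the moment the method is called, which is the trade-off duck typing makes.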

## The Interpreter

The Python interpreter is the core machinery that executes your Python programs. It's responsible for taking Python scripts, compiling them into bytecode, and then executing that bytecode within the Python runtime environment. The term "Python interpreter" is often used interchangeably to refer to the Python implementation itself, which includes not just the interpreter but also the standard libraries, the built-in types and functions, and the garbage collector.

The main Python interpreter, known as CPython, is written in C. CPython is the reference implementation of Python, developed by the Python core developers under the stewardship of the Python Software Foundation. It handles the following tasks (discussed below in detail):

1. Source Code to Bytecode Compilation
2. Bytecode Execution
3. Memory Management and Garbage Collection
4. Built-in Functions and Standard Libraries

### Source Code to Bytecode Compilation

When a Python script is run, the interpreter first parses the source code, converting it into an Abstract Syntax Tree ([AST](https://docs.python.org/3/library/ast.html)). The AST is then compiled into bytecode, which is a low-level, platform-independent representation of the source code. This step involves several stages: tokenization, parsing into a parse tree, creation of the AST, and finally, compilation into bytecode.
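The stages described above can be walked through by hand with the standard library. This is only a sketch of the pipeline, not how the interpreter is wired internally; the source string and the `<example>` filename are placeholders chosen for illustration.

```python
import ast

source = "total = 1 + 2\nprint(total)\n"

# Stage 1: parse the source text into an AST
tree = ast.parse(source, filename="<example>")
print(ast.dump(tree.body[0], indent=2))

# Stage 2: compile the AST into a code object, which holds the bytecode
code = compile(tree, filename="<example>", mode="exec")
print(type(code).__name__, "with", len(code.co_code), "bytes of bytecode")

# Stage 3: hand the code object to the interpreter for execution
namespace = {}
exec(code, namespace)
```

After the last step, `namespace["total"]` holds the value the compiled code assigned.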

For imported modules, this bytecode is then cached in `.pyc` files to speed up subsequent imports (the script you run directly is compiled in memory each time and is not cached). If the original `.py` file hasn't been modified since the `.pyc` file was written, the interpreter will skip the tokenization, parsing, and compilation stages and load the bytecode from the `.pyc` file directly.

The following example shows `.pyc` files at work.

In this directory, we have a simple module with a single Python script, `hello.py`. `hello.py` has a function, `hello()`, that simply prints the word "Hello":

```
.
├── 08_Back_to_the_Basics.ipynb
└── test_module
    ├── __init__.py
    └── hello.py
```

```python
def hello():
    print("Hello")
    

if __name__ == "__main__":
    hello()
```

If we open up a Python session and run the `hello()` function we will get this output:

![image-1](images/download-1.png)

If we then go back and look at the `test_module` directory, we will see that as soon as we ran `from test_module.hello import hello`, Python generated a directory in `test_module` called `__pycache__`:

```
.
├── 08_Back_to_the_Basics.ipynb
└── test_module
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-311.pyc
    │   └── hello.cpython-311.pyc
    └── hello.py
```

For every file that was imported from our module, Python creates a `.pyc` file in `__pycache__` containing the bytecode for that file. If we take a look at the contents of `hello.cpython-311.pyc`, we will see what looks like gibberish:

```
?
?/?dN??0?d?Zedkr
                e??dSdS)c?$?td??dS)N?Hello)?print???T/Users/chasehudson/code/code-dojo/modules/08_back_to_the_basics/test_module/hello.py?hellos??	?(?O?O?O?O?Or__main__N)__name__rrr<module>r
                                   s<?????
                                         ?z??
                                             ?	?E?G?G?G?G?G?
                                                             ?
                                                              r%  
```

We can pick out a few things that seem to make sense, like the string constant "Hello" that we used. The rest is unreadable because it is not meant to be read by humans. What we are seeing is our computer's attempt to render binary bytecode as text; the bytecode itself is a detailed set of instructions for the Python Virtual Machine ([PVM](https://www.devopsschool.com/blog/python-virtual-machine/)) (discussed below).

Now, let's see what happens if we edit our `hello.py` file to take an argument:

```python
def hello(name: str):
    print(f"Hello, {name}")
    

if __name__ == "__main__":
    hello("World")

```

![image-2](images/download-2.png)

If we look at our `test_module` directory, we will see the same `.pyc` files, but when we take a closer look, we can see that they have changed:

```
.
├── 08_Back_to_the_Basics.ipynb
└── test_module
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-311.pyc
    │   └── hello.cpython-311.pyc
    └── hello.py
```

```
?
?1?d_??6?defd?Zedkr
                   e??dSdS)?namec?*?td|????dS)NzHello, )?print)rs ?T/Users/chasehudson/code/code-dojo/modules/08_back_to_the_basics/test_module/hello.py?hellors??	?
?D?
?
?__main__N)?strr__name__?rr<module>r
                                    sH????????
                                             ?z??
                                                 ?	?E?G?G?G?G?G?
                                                                     ?
                                                                      r%  
```

If you look closely at the section around "Hello" you can tell this is different bytecode. So, you may be asking yourself: how does the Python interpreter know the code has changed? It is because `.pyc` files store more than just the code execution instructions. Their header also records metadata about the source file they were compiled from, namely its last-modified timestamp and size (see [discussion](https://stackoverflow.com/a/23778557)). On import, the interpreter compares these recorded values with the current source file to determine if it needs to "re-compile".
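The metadata in a `.pyc` file can be read back with a few lines of standard-library code. This is a minimal sketch, assuming CPython 3.7 or newer, where the header is 16 bytes (magic number, flags, source modification time, source size) as laid out in PEP 552. The temporary `hello.py` written here is a stand-in for the module above.

```python
import importlib.util
import marshal
import py_compile
import struct
import tempfile
import time
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hello.py"
    src.write_text('def hello():\n    print("Hello")\n')

    # Compile explicitly; importing the module would write the same kind of file
    pyc_file = py_compile.compile(
        str(src),
        cfile=str(Path(tmp) / "hello.pyc"),
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    data = Path(pyc_file).read_bytes()
    source_stat = src.stat()

# Header: 4-byte magic number, then three little-endian 32-bit integers
magic = data[:4]
flags, mtime, size = struct.unpack("<III", data[4:16])

print("magic matches this interpreter:", magic == importlib.util.MAGIC_NUMBER)
print("flags:", flags)  # 0 means timestamp-based invalidation
print("recorded source mtime:", time.ctime(mtime))
print("recorded source size:", size, "bytes")

# Everything after the header is the marshalled code object
code = marshal.loads(data[16:])
print("names defined by the module:", code.co_names)
```

If either the recorded modification time or the recorded size no longer matches the source file, the cached bytecode is treated as stale and the source is compiled again.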

Another detail about how the Python Interpreter and bytecode work that you might have come across at some point is how re-importing a module works (or more aptly doesn't work) in an interactive session. Let's take our original example and add a print statement letting us know we are importing:

```python
print("importing...")

def hello():
    print("Hello")
    

if __name__ == "__main__":
    hello()
```

If we run this, we get the expected result:

![image-3](images/download-3.png)

We get the print message upon import and our output is "Hello". However, if we change the implementation to take an arg and re-import in the same session, we get:

![image-4](images/download-4.png)

When we re-import, we do not see the "importing..." message and we do not pick up the new changes. This is because Python caches every imported module in `sys.modules`, so a module is only executed once per interpreter session for efficiency reasons. However, if you want to keep a session live and be able to pick up module changes, you can use [importlib.reload](https://docs.python.org/3/library/importlib.html#importlib.reload) (see [discussion](https://stackoverflow.com/a/437591)). Note that frameworks like Django or FastAPI (served with uvicorn), which pick up changes without you needing to stop the service, generally take a different route: their development servers watch your source files and restart the process rather than reloading modules in place. For example, using `importlib.reload`:

![image-5](images/download-5.png)

**Note**: We had to switch up the way we imported the `hello()` function in order to use `reload()`. This is because reload only works at the module level, not the function level (as shown in the docs).
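The same behavior can be reproduced in a script. This is a sketch under a few assumptions: the module name `greeter_demo` and its contents are invented for the example, and the module is written to a temporary directory that is put on `sys.path`. The two versions of the file deliberately differ in size so that any cached bytecode is recognized as stale.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "greeter_demo.py"
    module_file.write_text('def hello():\n    return "Hello"\n')
    sys.path.insert(0, tmp)

    import greeter_demo
    first = greeter_demo.hello()

    # Change the source on disk while the session is still running
    module_file.write_text('def hello():\n    return "Hello again"\n')

    # A second import is a no-op: the module is already in sys.modules
    import greeter_demo
    second = greeter_demo.hello()

    # reload() re-executes the module's source in the existing module object
    importlib.reload(greeter_demo)
    third = greeter_demo.hello()

    # Undo the changes this demo made to the import system
    sys.path.remove(tmp)
    del sys.modules["greeter_demo"]

print(first, "|", second, "|", third)
```

Names imported with `from greeter_demo import hello` before the reload would still point at the old function object, which is why the module-level import style is used here.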


### Bytecode Execution

The bytecode is executed by the Python Virtual Machine (PVM), which is the runtime engine of Python. The PVM is a simple loop that fetches the next bytecode instruction, executes it, and then proceeds to the next one, until there are no more instructions to execute.

Each bytecode instruction represents a low-level operation, such as an arithmetic operation, a variable assignment, a logical operation, a control flow change, etc. The PVM keeps track of the program state, maintaining a stack of active frames (one for each function call) and a stack of values for each frame.
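Both ideas, the instruction stream and the stack of frames, can be observed from Python itself. The sketch below is CPython-specific, and the exact opcode names it prints vary between Python versions. The functions `add`, `inner`, and `outer` are made up for the example.

```python
import dis
import inspect


def add(a, b):
    return a + b


# Each Instruction is one step of the fetch-and-execute loop described above
instructions = list(dis.get_instructions(add))
for instruction in instructions:
    print(instruction.offset, instruction.opname, instruction.argrepr)


def inner():
    # Walk the chain of active frames, from the current call outwards
    frame = inspect.currentframe()
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


def outer():
    return inner()


call_chain = outer()
print(call_chain)
```

The first two entries of `call_chain` are `inner` and `outer`; whatever follows depends on how the code was launched (for example `<module>` when run as a script).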

[Further reading](https://panutanur.medium.com/pvm-python-virtual-machine-df83ca6a79a6)


### Memory Management and Garbage Collection

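The Python interpreter also manages the memory allocation for Python objects. When an object is created, the interpreter allocates memory to store it. When an object is no longer needed, the memory is freed and can be used for other objects. In CPython, this is done primarily through reference counting: each object keeps track of how many references point to it, and its memory is released as soon as that count drops to zero. This process is handled by Python's memory manager.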

In addition to the memory manager, Python also has a garbage collector that deals with circular references. Circular references occur when a group of objects reference each other, causing them to stay alive even if they're no longer reachable from the rest of the program. The garbage collector periodically checks for these groups of objects and frees them if they're no longer in use.
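The difference between the two mechanisms can be seen with a weak reference, which lets us check whether an object is still alive without keeping it alive. This is a sketch of CPython's behavior; the `Node` class is invented for the example, and automatic collection is switched off temporarily so the timing is deterministic.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None


# No cycle: the object is freed as soon as its last reference goes away
single = Node("single")
single_ref = weakref.ref(single)
del single
freed_immediately = single_ref() is None

# A cycle: the two objects keep each other alive after our names are gone
gc.disable()
a, b = Node("a"), Node("b")
a.partner, b.partner = b, a
a_ref = weakref.ref(a)
del a, b
alive_before_collect = a_ref() is not None

# The cyclic garbage collector finds the unreachable group and frees it
collected = gc.collect()
gc.enable()
freed_by_gc = a_ref() is None

print(freed_immediately, alive_before_collect, freed_by_gc, collected)
```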

[Further reading](https://towardsdatascience.com/memory-management-and-garbage-collection-in-python-c1cb51d1612c)

### Built-in Functions and Standard Libraries

The Python interpreter includes a set of built-in functions and types, such as print(), len(), list, dict, etc. These built-in functions are always available in Python, without needing to import any modules.

In addition to the built-in functions, the Python interpreter also comes with a large standard library of modules that provide additional functionality, from file I/O and regular expressions to HTTP servers.
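A quick way to see the distinction between built-ins and the standard library is sketched below. It only uses the standard `builtins`, `sys`, and `json` modules; the list in `sys.builtin_module_names` is specific to the interpreter build you are running.

```python
import builtins
import sys

# Names such as print and len live in the builtins module, which the
# interpreter makes visible in every namespace without an import
same_object = builtins.len is len
print("builtins.len is len:", same_object)
print("number of built-in names:", len(dir(builtins)))

# Modules compiled into the interpreter binary itself
print("sys is compiled in:", "sys" in sys.builtin_module_names)

# Standard-library modules ship with Python but still need an explicit import
import json

encoded = json.dumps({"ok": True})
print(encoded)
```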

### Homework

If you are interested in seeing how the Python interpreter makes built-ins available or what the shutdown cleanup looks like, you can run Python in verbose mode. It will show you all of the imports happening under the hood at startup and all of the cleanup that occurs on shutdown:

```bash
python -v
```

```python
exit()
```

## External Packages and Python Package Management

We have all had to use `pip install` at some point in our careers, but what is really happening here?

### Pip Install

#### Public

The process here is surprisingly straightforward. The main public repository for Python packages is [PyPI](https://pypi.org/). By default, if you do not change your [pip config settings](https://pip.pypa.io/en/stable/topics/configuration/), you will be pulling from PyPI when you run a `pip install <package>` command. So, what happens when you actually pip install something? In the old days of pip, you were basically just running a straight download, and if the package you were downloading was inconsistent with your existing dependencies, you would not find out until you tried to run something. However, in November 2020, in version 20.3, pip switched to a new dependency resolver by default (see [announcement](https://discuss.python.org/t/announcement-pip-20-3-release/5948)). The lack of a proper dependency resolver was one of the things that drove people to using environment managers like conda (more on that below).

However, at a basic level, the install process that `pip install` uses can be thought of as a download of packages that people build using tools like `setuptools` (via a `setup.py` file) or Poetry and then push to PyPI (devs can also build conda packages via [recipes](https://docs.conda.io/projects/conda-build/en/stable/concepts/recipe.html) and push them to a different public repository, conda-forge). So, where do these downloads get stored?

Below is a basic example of what happens when you pip install and how you can locate your site packages (Note: I am using miniconda for this example, so the filepaths would be slightly different with poetry or pyenv):

![image-6](images/download-6.png)

![image-7](images/download-7.png)

Above you can see when we pip install `pandas` we have to also install the dependencies of pandas (you can override this behavior with the [--no-deps](https://pip.pypa.io/en/stable/cli/pip_download/#cmdoption-no-deps) flag). Once installed, if you want to find out where a module's files are located, you can import the module and check the dunder `__file__` attribute. If we take a look at this directory, we see every file and directory pandas needs to run (results have been truncated to 1 level for brevity):

```tree
/Users/chasehudson/miniforge3/envs/fresh-env/lib/python3.11/site-packages/pandas
├── __init__.py
├── __pycache__
├── _config
├── _libs
├── _testing
├── _typing.py
├── _version.py
├── api
├── arrays
├── compat
├── conftest.py
├── core
├── errors
├── io
├── plotting
├── testing.py
├── tests
├── tseries
└── util
```
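If you would rather not import a package just to find it, the standard library can report the same locations. This is a small sketch; the helper name `locate` is made up for the example, and the paths it prints will differ per machine and environment.

```python
import importlib.util
import sys
import sysconfig

# Interpreter (and therefore environment) this session is running under
print("interpreter:", sys.executable)

# Directory where pure-Python packages are installed for this interpreter
site_packages = sysconfig.get_paths()["purelib"]
print("site-packages:", site_packages)


def locate(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print("json lives at:", locate("json"))
print("pandas lives at:", locate("pandas"))  # None if pandas is not installed
```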

#### Local

Similar to pip installs from PyPI, I assume everyone reading this has, at some point, had to pip install a local package using `pip install .`. This functions the exact same way as pip installing from PyPI. The package is installed based on the specifications in the build file (e.g., `setup.py`) and your package files are saved in the site-packages directory of your current Python interpreter. Let's add a basic `setup.py` file for our `test_module` and see an example:

```python
from setuptools import setup, find_packages

setup(
    name='test-module',
    version='0.1.0',
    description='Module to illustrate a point',
    long_description='',
    author='Chase Hudson',
    packages=find_packages(),
    zip_safe=False,
    keywords='test_module',
    classifiers=[],
)
```

![image-8](images/download-8.png)

Great! It installed. Now, one thing to be careful about with local packages is knowing which version you are pulling from when you import after making changes. If you are in the project's directory from which you originally installed, any new Python session will pull from that directory and pick up any changes. If you run from outside that directory, you will be pulling from the site-packages version, and it will not have any updates you make until you rerun `pip install` from within the project directory. Example:

![image-9](images/download-9.png)

![image-10](images/download-10.png)

You can change this behavior by using the [-e](https://pip.pypa.io/en/stable/cli/pip_install/#install-editable) editable flag, which ensures that you are always pulling from your local project directory:

![image-11](images/download-11.png)
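Which copy of a package gets imported comes down to the order of the entries on `sys.path`: the first directory that contains the module wins, and an interactive session started in a project directory searches that directory before site-packages. The sketch below imitates this with two temporary directories. The names `project`, `installed`, and `shadow_demo` are invented stand-ins for a working copy, site-packages, and a package.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp) / "project"      # stands in for the working copy
    installed_dir = Path(tmp) / "installed"  # stands in for site-packages
    for directory, label in [(project_dir, "project"), (installed_dir, "installed")]:
        directory.mkdir()
        (directory / "shadow_demo.py").write_text(f'SOURCE = "{label}"\n')

    # Earlier entries on sys.path are searched first
    sys.path[:0] = [str(project_dir), str(installed_dir)]
    import shadow_demo

    picked = shadow_demo.SOURCE
    picked_from = shadow_demo.__file__

    # Undo the changes this demo made to the import system
    sys.path.remove(str(project_dir))
    sys.path.remove(str(installed_dir))
    del sys.modules["shadow_demo"]

print(picked, "from", picked_from)
```

Checking a module's `__file__` like this is a quick way to confirm which copy a session is actually using.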

## Environment Management (the great divide)

![image-12](images/download-12.png)

Python environment and dependency management always seems to be an issue of debate. Everyone has their preference and it is often whichever tool they were first exposed to (I am very guilty of this; miniconda is the first environment/dependency manager I used and I am definitely biased towards it). However, I will try my best to lay out the details of the main virtual environment tools and their pros and cons.

### venv

**What does it do**: Virtual environment only. This tool allows you to install and isolate packages so they are only available to you when you have activated your venv environment.

**How to install**: You do not have to install it. venv has come standard with Python since version 3.3

**How to use it**:

```bash
# Create (the environment is named after the last component of the path)
python -m venv <file path to store environment in>

# Activate
source <file path to store environment in>/bin/activate

# Alternatively, you can activate using
cd <file path to store environment in>
source bin/activate

# You will know you are in your env because your terminal command prompt will display the env name
# For example, if our venv name was dev, your terminal would look like this:
(dev) ~/code/code-dojo/modules/08_back_to_the_basics

# Installing packages
pip install <package>
```
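Besides looking at the prompt, you can ask the interpreter itself whether it is running inside a venv-style environment. This is a minimal sketch; the function name `in_virtual_environment` is made up, and the check applies to environments created by `venv` or `virtualenv`. A conda environment ships its own Python install, so this check is expected to report `False` there.

```python
import sys


def in_virtual_environment() -> bool:
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the Python it was created from
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("prefix:", sys.prefix)
print("base prefix:", sys.base_prefix)
print("inside a venv:", in_virtual_environment())
```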

**Pros**:

* Easy to use
* You do not have to install anything

**Cons**:

* You have to download the python version you want yourself and manage that install
* In my experience, directly downloading python to your system is more trouble than it is worth

### virtualenv

**What does it do**: Similar to venv, but a bit of a step up as far as performance and flexibility goes

**How to install**: Virtualenv can be [pip installed](https://sourabhbajaj.com/mac-setup/Python/virtualenv.html)

**How to use it**:

```bash
# Create (the environment is named after the last component of the path)
virtualenv <file path to store environment in>

# Activate
source <file path to store environment in>/bin/activate

# Alternatively, you can activate using
cd <file path to store environment in>
source bin/activate

# You will know you are in your env because your terminal command prompt will display the env name
# For example, if our venv name was dev, your terminal would look like this:
(dev) ~/code/code-dojo/modules/08_back_to_the_basics

# Installing packages
pip install <package>
```

**Pros**:

* Faster than venv

**Cons**:

* Largely the same cons as venv
* One con of both of the above options for me personally is the need to worry about keeping track of specific file paths. When you have a lot of virtual environments, this can be an unneeded headache 

### Poetry

**What does it do**: Poetry is the first tool we have discussed that manages not only virtual environments, but projects and dependencies

**How to install**: Poetry is installed via its official installer; instructions can be found [here](https://python-poetry.org/docs/#installing-with-the-official-installer)

**How to use it**: Poetry is substantially more involved than the previous offerings and there is a lot that can be done with this tool, so I will leave some links below and just cover a basic use case

```bash
# Create (this will create a directory of the name provided in the current directory)
poetry new <project name>
cd <project name>

# Activate
poetry shell


# You will know you are in your env because your terminal command prompt will display the env name
# For example, if our venv name was dev, your terminal would look like this:
(dev) ~/code/code-dojo/modules/08_back_to_the_basics

# Installing packages
poetry add <package>
```

**Further reading**:

* [Poetry configs](https://python-poetry.org/docs/configuration/)
* [Specifying python version](https://python-poetry.org/docs/managing-environments/#switching-between-environments) (note Poetry also requires you to already have the Python version you want to use installed)
* [Get started guide](https://dev.to/bowmanjd/getting-started-with-python-poetry-3ica)
* [Package and deploy with Poetry](https://www.freecodecamp.org/news/how-to-build-and-publish-python-packages-with-poetry/)

**Pros**:

* Poetry can be a really nice tool because it can act as a virtual environment, dependency manager/resolver, and it can also take the place of setuptools for actually packaging your code
* Has its own dependency resolver (this was more of a selling point before pip 20.3 which we discussed above)

**Cons**:

* Longer install times as a result of Poetry's dependency resolver (Poetry may try many combinations of versions within the version ranges you give it)
* Additionally, if you are using Docker to containerize your code, Poetry results in longer build times as it has to be installed on each build and it can take a while to install fresh
* Poetry is pretty bad about hanging for a very long time if you have incompatible packages because it makes every effort to reconcile incompatible dependencies
* Again, you have to manage your own Python installs. I cannot stress enough how much I hate having to manage my own Python installs

### Conda (miniconda)

One note up top: people often confuse conda, Anaconda, and Miniconda, and will often refer to Miniconda simply as conda. Strictly speaking, conda is the free, open source package and environment manager, while Anaconda and Miniconda are two distributions that ship it, both aiming to make Python dependency management easier. According to Conda's docs on when to choose Anaconda or Miniconda:

```
Choose Anaconda if you:

* Are new to conda or Python.

* Like the convenience of having Python and over 1,500 scientific packages automatically installed at once.

* Have the time and disk space---a few minutes and 3 GB.

* Do not want to individually install each of the packages you want to use.

* Wish to use a set of packages curated and vetted for interoperability and usability.

Choose Miniconda if you:

* Do not mind installing each of the packages you want to use individually.

* Do not have time or disk space to install over 1,500 packages at once.

* Want fast access to Python and the conda commands and you wish to sort out the other programs later.
```

**I do not recommend using Anaconda; this section will only pertain to Miniconda and I will use "conda" to refer to Miniconda**

**What does it do**: Like Poetry, conda can manage virtual environments, projects, and dependencies. I mentioned previously that PyPI is not the only public Python package repository. The conda ecosystem also offers [conda-forge](https://conda-forge.org/), a community-maintained channel for conda packages specifically. These are packages that are built through Conda's build process (more about that in **Further reading**).

**How to install**: Conda is installed via a binary installer; instructions can be found [here](https://conda.io/projects/conda/en/latest/user-guide/install/macos.html). Additionally, I wrote detailed installation instructions you can find [here](https://docs.google.com/document/d/1yyO2dXs5ojMvL6Lgqv-EBdbko-_zW5EVLXOGUkI4cMU/edit#heading=h.zaiw8fa5va8s)

**How to use it**: Like Poetry, Conda is substantially more involved than the previous offerings and there is a lot that can be done with this tool, so I will leave some links below and just cover a basic use case

```bash
# Create
conda create -n <name of env> python=<desired version of Python>

# Activate
conda activate <name>


# You will know you are in your env because your terminal command prompt will display the env name
# For example, if our venv name was dev, your terminal would look like this:
(dev) ~/code/code-dojo/modules/08_back_to_the_basics

# Installing packages from PyPI or private repos set up via pip config
pip install <package>

# Installing packages from conda-forge or a private conda channel
conda install <package>
```

**Further reading**:

* [Build Conda package from scratch](https://docs.conda.io/projects/conda-build/en/main/user-guide/tutorials/build-pkgs.html)
* [What is conda-forge](https://conda-forge.org/docs/user/introduction.html)
* [Managing channels](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html)

**Pros**:

* Finally, we don't have to worry about installing different versions of Python ourselves. Conda manages that when you set the python CLI argument
* No worrying about file paths. All virtual environments are stored in the `envs` directory of your conda install (in my case, `/Users/<user>/miniforge3/envs`)
* Flexibility of use (can be used in conjunction with any of the above tools)
* Has its own dependency resolver (this was more of a selling point before pip 20.3 which we discussed above); also, dependency resolver only applies to Conda packages
* Less of a learning curve than Poetry for basic usage (Conda recipes can be involved though if you are trying to build through Conda)

**Cons**:

* If you are using `conda install` instead of `pip install` it can take quite a bit longer to install as you wait for the dependency checker
* Similar to Poetry in terms of added difficulty in installation, with the exception that you don't have to worry about installing anything other than system level Python

### Parting Thoughts

At the end of the day, it's about what works for you and what gets the job done. I am a big proponent of engineering teams deciding on a single way of managing virtual environments, packages, and dependencies because it makes it far easier to debug other developers' issues when they are doing local development. Otherwise, you will spend the first chunk of time just trying to figure out how the other dev is managing their environment and making sure there aren't any oddities occurring as a result of that process.

Furthermore, I would like to stress that the above options are not mutually exclusive. You can use Conda to install a new Python version, then create a venv or Poetry virtual environment within that Conda environment. I do not know if this is just because I have spent so many years using Conda, but anecdotally, I have seen far fewer odd environment-related issues with Conda than I have with venv, virtualenv, or Poetry. However, take my opinions with a grain of salt and do what works best for you. All I ask is, whatever tool/process you use, make sure you use it because you believe it is the best option and not just because it is the one you used first 😄