# SLU14 | Modules and Packages
***

### Table of Contents
[1. Modules in Python](#1.-Modules-in-Python)\
&emsp;[1.1. Importing a Module](#1.1.-Importing-a-Module)\
&emsp;[1.2. Import Aliases](#1.2.-Import-Aliases)\
&emsp;[1.3. Relative Imports](#1.3.-Relative-Imports)\
&emsp;[1.4. Executing Modules as Scripts](#1.4.-Executing-Modules-as-Scripts)\
&emsp;&emsp;[1.4.1. Considerations when running modules as scripts](#1.4.1.-Considerations-when-running-modules-as-scripts)\
&emsp;&emsp;&emsp;[1.4.1.1. Using script entrypoints for script-only execution](#1.4.1.1.-Using-script-entrypoints-for-script-only-execution)\
&emsp;&emsp;&emsp;[1.4.1.2. Scope consideration when using entrypoints](#1.4.1.2.-Scope-consideration-when-using-entrypoints)\
&emsp;&emsp;&emsp;[1.4.1.3. Demonstration of importance of script entrypoints](#1.4.1.3.-Demonstration-of-importance-of-script-entrypoints)\
&emsp;&emsp;&emsp;[1.4.1.4. Circular imports](#1.4.1.4.-Circular-imports)\
&emsp;&emsp;&emsp;[1.4.1.5. Relative imports with scripts](#1.4.1.5.-Relative-imports-with-scripts)\
&emsp;[1.5. Standard Modules](#1.5.-Standard-Modules)\
&emsp;&emsp;[1.5.1. Examples of Standard Modules](#1.5.1.-Examples-of-Standard-Modules)

[2. Packages in Python](#2.-Packages-in-Python)\
&emsp;[2.1. Executing package modules as scripts](#2.1.-Executing-Package-Modules-as-Scripts)\
&emsp;&emsp;[2.1.1. Basic Introduction to PYTHONPATH](#2.1.1.-Basic-Introduction-to-PYTHONPATH)\
&emsp;&emsp;[2.1.2. Running package modules as scripts using the -m flag](#2.1.2.-Running-package-modules-as-scripts-using-the--m-flag)\
&emsp;[2.2. Managing Packages with PIP and VENV](#2.2.-Managing-External-Packages-with-PIP-and-VENV)\
&emsp;&emsp;[2.2.1. Virtual Environments](#2.2.1.-Virtual-Environments)\
&emsp;&emsp;[2.2.2. Installing packages on our environments](#2.2.2.-Installing-packages-on-our-environments)\
&emsp;&emsp;[2.2.3. How does PIP handle dependencies?](#2.2.3-How-does-PIP-handle-dependencies?)\
&emsp;[2.3. Beyond PIP](#2.3.-Beyond-PIP)

[Recap](#Recap)

[Further Reading](#Further-Reading)

Welcome to your **8th** week! 🥳🥳🥳  
Congratulations on your work so far! We know it's not easy, but in the end when you look back, you'll know it was worth it.
<br><br>
The topics for this SLU are the following:
+ Modules in Python;
+ Packages in Python;
+ Managing Packages with pip and venv.

Soooo, let's do it! 💪💪🏻💪🏼💪🏽💪🏾💪🏿

# 1. Modules in Python

When a Python program is contained in a single file, it's known as a script. When our programs start to get bigger, we should split their logical parts into separate files to keep them organized and easier to maintain. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program.


To support this, Python has a way to put definitions (functions, classes, variables) in a file and use them in a script or a Jupyter notebook. Such a file is called a module. Definitions from a module can be imported into other modules or into the main module.

A module is a file that contains Python definitions and statements. Its file name is its module name with the extension `.py`. For example, a file containing Python code, like number_enthusiast.py, is called a module, and its module name (given by the `__name__` attribute - example given later) would be number_enthusiast.

Source: https://docs.python.org/3/tutorial/modules.html#modules

#### An example module

Let's use the fibo module:

```python
# Fibonacci numbers module

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
    print()

def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result
```

## 1.1. Importing a Module

You can think of imports as asking someone for a set of tools, where the module itself is the toolbox (collection of tools) and the variables/functions are the tools themselves.

In general, you should always place import statements at the __top of each module__ (with some particular exceptions).\
Why? It's always better to have tools that you need at hand, rather than having to fetch them every time you need them! Not only does that make your job easier, but also other developers will be able to read your code more easily.

Let's start by importing the fibo.py module. 

In [1]:
import fibo



Importing a module in this way will only include its *name* (`fibo`), not any of its definitions, into the current namespace. In essence, you are telling Python that there is an object of name `fibo` that you are interested in. As for namespaces, if you're not familiar with them, don't worry. For now, just think about them as a mapping/dictionary that pairs names and objects (it's similar to the definition of *scopes*). 
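To make the namespace idea a bit more concrete, here is a minimal sketch. It uses the standard `math` module only because it is always available; the same reasoning applies to `fibo`. It assumes a fresh session in which `sqrt` has not been imported before.

```python
import math

# dir() with no arguments lists the names defined in the current namespace.
names = dir()

module_name_is_known = "math" in names  # the toolbox itself was added
function_is_known = "sqrt" in names     # the tools inside it were not

print(module_name_is_known)  # True
print(function_is_known)     # False
print(math.sqrt(16))         # 4.0, reached through the module name
```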

The `fibo` module, of which you can see a copy above, has two functions: `fib` and `fib2`. Since we haven't imported them directly, we'll have to ask Python to get a particular attribute (in this case, a function definition) from the `fibo` module. That's how we would call and use the `fib` function, which we defined in the `fibo` module (imported in the previous cell):

In [2]:
fibo.fib(1000)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 


__Bonus:__ When we used ```fibo.fib(1000)```, we actually did two things at once: we requested the function ```fibo.fib``` and, at the same time, used it by *calling* it. Let's see what you imported, without actually calling it:

In [3]:
fibo.fib

<function fibo.fib(n)>

### More on Modules

There is a variant of the import statement that imports names from a module directly. This means that we are not importing the module itself, i.e. we will not have `fibo` within our namespace, but instead have `fib` and `fib2`. If you think about this in a simplistic way, importing variables, functions, or classes in this manner is like copying & pasting them and their dependencies directly into your module.

In [4]:
from fibo import fib, fib2

In [5]:
fib2(500)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]

__Bonus:__ Using this method, you can be very specific with what you want. It does not need to be a function, but can be a variable or even another module! Example:
```python
from module.submodule import subsubmodule  # notice the . separating the module structure

def func(arg):
    return subsubmodule.foo(arg)
```
More on this below...

__(Caution: Bad Practice Below)__ 

There is a variant that imports all names that a module defines. Note that in general the practice of importing __*__ from a module or package is frowned upon, since it often leads to poorly readable code.

One of the very __few__ ways you should consider doing this is with one-time, disposable snippets of code, where you need to debug something quickly.

In [6]:
from fibo import *

In [7]:
fib(500)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 


__Why is this bad practice?__ It can be particularly troublesome in a couple of cases:

* You have imports of two modules and both have a function with the same name. How will you know which one you'll end up using?
* When debugging (using certain tools), having everything in the same namespace can be confusing.
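The first point can be sketched with two standard modules, `math` and `cmath`, which both define a function called `sqrt`. To keep the experiment contained, the star imports below run inside a throwaway dictionary (`namespace`, a name chosen just for this illustration) instead of the current namespace.

```python
import cmath
import math

# Run both star imports inside an isolated namespace (a plain dict).
namespace = {}
exec("from math import *", namespace)
exec("from cmath import *", namespace)  # also defines sqrt, pi, e, ...

# The second import silently replaced the first sqrt.
sqrt_comes_from_cmath = namespace["sqrt"] is cmath.sqrt
sqrt_comes_from_math = namespace["sqrt"] is math.sqrt

print(sqrt_comes_from_cmath)  # True
print(sqrt_comes_from_math)   # False
print(namespace["sqrt"](-1))  # 1j, whereas math.sqrt(-1) raises a ValueError
```

Nothing warned us that `sqrt` changed meaning, which is exactly what makes this kind of clash hard to spot.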

### TL;DR: Analogy Time

``import <module>`` is like asking Python for a toolbox. It can contain either tools or drawers/dividers that contain tools. This kind of statement can be useful when you have multiple tools you need to use, or when you want to know exactly from which toolbox you retrieved a given tool.

``from <module> import <resource>`` is like asking for a specific tool or drawer from within your toolbox. It's useful when you know exactly which tools you need or where most of them are located. 

``from <module> import *`` is like asking for every possible tool in your toolbox to be delivered at your feet. That might get the job done, but it seems a bit extreme if you only wanted a screwdriver. Also, now you have a mess at your feet.

## 1.2. Import Aliases

If the module name is followed by `as`, then the name following `as` is added to the namespace as an alias for the imported module.

In [8]:
import fibo as fib_module
fib_module.fib(500)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 


In effect, this imports the module in the same way that `import fibo` does, with the only difference being that the module will be available as `fib_module`.

You can also use aliases in a similar way with `from` statements:

In [9]:
from fibo import fib as fibonacci_function
fibonacci_function(500)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 


### TL;DR: Analogy Time
Your tools and toolboxes are now spies!

While that is not entirely true, this is like trying to tell someone which tool you want, when they do not know its name. Instead, give it a friendly name that helps the person identify the tool more easily.

__When would you actually use this?__ 

Sometimes it helps to keep things less *verbose* (aka loooooong):
```python
from module import super_duper_long_function_name_just_because

def foo():
    return super_duper_long_function_name_just_because()
# vs.
from module import super_duper_long_function_name_just_because as short

def bar():
    return short()
```

Other times, it can help when two toolboxes have the same tool name for different things:
```python
from toolbox_1 import screwdriver
from toolbox_2 import screwdriver # this will overwrite the definition of screwdriver
# vs.
from toolbox_1 import screwdriver as flat_screwdriver
from toolbox_2 import screwdriver as cross_screwdriver # also known as a phillips screwdriver
```
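The toolbox names above are made up, but the same situation exists in the standard library: both `json` and `pickle` provide a function called `dumps`. A small sketch, with the aliases `to_json` and `to_pickle` chosen just for this example:

```python
from json import dumps as to_json
from pickle import dumps as to_pickle

data = [1, 2, 3]

print(to_json(data))          # [1, 2, 3]
print(type(to_pickle(data)))  # <class 'bytes'>
```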

## 1.3. Relative Imports

While usually a bad idea, there is an alternate method of importing modules worth mentioning just to be thorough. Until now, we have seen what are known as *absolute* imports, meaning that the absolute path to the import is included from the root module to the final submodule/function (e.g. `from root.submodule.subsubmodule import function`).\
Similar to directory paths, there are also *relative* imports, where you attempt to traverse a module structure using dots (`.`). This is considered bad practice in most cases, because it makes the code less intuitive. These imports use the module `__name__` to resolve paths to imports.\
Examples:
```python
from . import <module>             # look in current module directory for <module>
from .. import <module>            # go back into parent directory and look for <module>
from .<module_a> import <module_b> # in current directory, enter <module_a> and import <module_b>
```
These are most commonly found within `__init__.py` files to initialize modules with a particular set of resource names, but that is a more advanced topic that we will leave aside for the moment.

### TL;DR: Analogy Time
A relative import is like asking to look for another tool or tool drawer *relative* to the current location. You'll usually see these kinds of requests for tools that are contained either in the current drawer or in the one that preceded it.\
It's similar to how you can treat directory paths in places like your terminal (e.g. if you `cd ..` in your terminal, it will enter the parent directory).
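As a rough illustration, the sketch below builds a tiny throwaway package in a temporary directory and imports it. The package name `toolbox_demo` and its modules `tools` and `worker` are hypothetical names made up for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "toolbox_demo"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "tools.py").write_text("def hammer():\n    return 'bang'\n")
    # worker.py reaches its sibling tools.py with a relative import.
    (package_dir / "worker.py").write_text(
        "from .tools import hammer\n"
        "\n"
        "def work():\n"
        "    return hammer()\n"
    )

    sys.path.insert(0, tmp)  # let Python find the temporary package
    try:
        importlib.invalidate_caches()
        worker = importlib.import_module("toolbox_demo.worker")
        result = worker.work()
    finally:
        sys.path.remove(tmp)

print(result)  # bang
```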

## 1.4. Executing Modules as Scripts

When you run a Python module with

```
python fibo.py <arguments>
```

the code in the module will execute **just as if you imported it** (when you import a module, its code is run from top to bottom), but with its `__name__` attribute set to `__main__`. That means that by adding this code at the end of your module,

```python
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))
```

you make the file usable as a script, as well as an importable module. This is because the code that parses the command line only runs if the module is executed as the “main” file. We add that piece of code to the fibo module and save it in the fibo_script.py file. When we run it, the block of code that appears above will run, but if the file is imported as a module, the block of code above will not run (we won't see any output).

In [23]:
! python fibo_script.py 50

0 1 1 2 3 5 8 13 21 34 


In Python, an __entrypoint__ is the part of your code where the program starts executing. When you run a Python script, the interpreter starts at the first line of code and executes each statement in order until it reaches the end of the file.

To specify an entrypoint for your module, you can add code inside a conditional block that checks whether the current module is the main program that is running. This conditional block is usually written as follows:

```python
if __name__ == "__main__":
    # Your code here
    ...
```

The purpose of this block is to provide a way to differentiate between a file that is being run as a script directly and a file that is being imported as a module. When you run a script directly, Python sets the ```__name__``` variable to ```"__main__"```. However, when a script is imported as a module, Python sets the ```__name__``` variable to the name of the module.

Here's a simple example of how to use the entry point in your script. Let's say you have a module called hello.py that contains a function that prints a greeting:

In [24]:
# hello.py

def say_hello():
    print("Hello, world!")


If you want to be able to run this module *appropriately* as a script, you can add the following code to the bottom of the file:

In [12]:
# hello.py

def say_hello():
    print("Hello, world!")

if __name__ == '__main__':
    say_hello()


Hello, world!


Now, when you run hello.py from the command line, Python will execute the say_hello() function, but if you import hello as a module into another script, the say_hello() function won't be executed automatically.
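If you would like to check this behaviour yourself, one possible experiment is sketched below. It writes `hello.py` to a temporary directory and then starts two separate Python processes: one that runs the file as a script and one that only imports it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

module_source = (
    "def say_hello():\n"
    "    print('Hello, world!')\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    say_hello()\n"
)

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "hello.py").write_text(module_source)

    # Run the file as a script: __name__ is "__main__", so the block runs.
    as_script = subprocess.run(
        [sys.executable, "hello.py"], cwd=tmp, capture_output=True, text=True
    )
    # Import it from another program: __name__ is "hello", so nothing is printed.
    as_import = subprocess.run(
        [sys.executable, "-c", "import hello"], cwd=tmp, capture_output=True, text=True
    )

print(repr(as_script.stdout.strip()))  # 'Hello, world!'
print(repr(as_import.stdout))          # ''
```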

### 1.4.1. Considerations when running modules as scripts
In this section, we will see some real-life examples of common pitfalls and mistakes found when working with modules. These can be found in code from developers, both experienced and beginner.\
We hope that by giving you some background into these examples, you will not only know how to recognise them later on, but also *understand* what this code is doing.

#### 1.4.1.1. Using script entrypoints for script-only execution 

Let's dig a little deeper into one of the previous examples:
```python
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))
```

But wait! Didn't we say that imports should be at the __top__ of the module?

This is one of those *special occasions* where we can "bend" the rules a bit. When making imports within the ```if __name__ == "__main__":``` block, we are essentially saying, *If this module is run as a script, then import these modules and execute the code below*.

In the example, we do not need the module ``sys`` for anything other than getting the arguments __when running as a script__, therefore we only import it when we need to. Imports, or variables/functions, defined in this block will not run when importing the resource as a module from somewhere else.

#### 1.4.1.2. Scope consideration when using entrypoints

It can be good practice to use a different naming scheme for variables defined within a script entrypoint; otherwise, you may run into hard-to-spot errors later on.
The names defined in the entrypoint are added to the module's global namespace, meaning that those names exist only when the file is run as a script, not when it is imported as a module.

```python
# foo.py

def hello():
    print(x) # notice that x was never defined within the function
    
if __name__ == "__main__":
    x = "hi!"
    hello()
    
---
> hi!
```
In the above example, `x` was never defined inside `hello()`. However, if you run `python foo.py`, it will output the string defined in the entrypoint. Later on, if you import `foo.py` as a module from another module and call `hello()`, you will get a `NameError`. This is because the entrypoint block never ran, so the variable `x` was never defined.

For this reason, it might be worth using the name `_x`, or a different name altogether, to avoid these clashes down the line. It is not fun to debug things when you expect them to work *one* way, but they end up working *another*.

__Note:__ To emphasize, this example is *not supposed to work* even when you change `x` to `_x` in the entrypoint. It simply acts as a reminder that you can accidentally create a function that works only as a script because of an easily overlooked mistake. This practice of using a different name for variables inside of the entrypoint *avoids* potential clashes with names defined in other scopes (some IDEs will actually warn you about that).
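The failure on import can be reproduced with a short sketch. The module is saved under the made-up name `foo_scope_demo.py` (to avoid clashing with any real `foo` module) in a temporary directory and then imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_source = (
    "def hello():\n"
    "    print(x)  # x only exists when the file runs as a script\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    x = 'hi!'\n"
    "    hello()\n"
)

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "foo_scope_demo.py").write_text(module_source)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        foo = importlib.import_module("foo_scope_demo")
        try:
            foo.hello()
            error_message = None
        except NameError as error:
            error_message = str(error)
    finally:
        sys.path.remove(tmp)

print(error_message)  # name 'x' is not defined
```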

#### 1.4.1.3. Demonstration of importance of script entrypoints

Here is an example of the importance of the ```if __name__ == "__main__":``` block.
Say you have a module (``foo.py``) like this:
```python
# foo.py

def hello():
    print("hello world")
    
hello()  # sneaky little call, probably used to test the module
```
And then you have a different module that imports ```foo.py``` and calls its ```hello()``` function. What would you see?

__Answer:__ You would see two calls: the first is the call done in ```foo.py``` itself (that runs when the module is imported), and the second is your call!

```python
from foo import hello

hello()

---
> hello world
> hello world
```

#### 1.4.1.4. Circular imports

Please bear in mind that Python often cannot resolve imports that reference each other, i.e. module A imports B and B imports A. These are called circular imports, like a snake eating its own tail (fitting, since a python is a type of snake). When you run module A, it will request module B, but then B will request A before A has finished loading.

This is relevant to entrypoints because of the example above. As you saw, there was a repeated call to the function `hello()`. This is due to the fact that Python will interpret and run everything within the module it imported when it's looking for the `hello()` function. (From a high-level perspective, think of it like Python copying & pasting a module into your script and running everything for you.)

In the event that you need to use something from a module that would throw a circular import, just keep in mind the analogy of the toolbox: When it comes to the `import <module>` statement, *You only want the toolbox at first. You'll grab a tool that's in it only when you need that specific tool*. Do bear in mind that this approach is rare. In most cases, you should be __refactoring__ (improving by changing) your code instead.

__Throws Exception__
```python
# a.py
from b import b1

def a1():
    ...

if __name__ == "__main__":
    a1()
    b1()
---
# b.py
from a import a1

def b1():
    ...
```

__Does Not Throw Exception__
```python
# a.py
import b

def a1():
    ...
    
if __name__ == "__main__":
    a1()
    b.b1()
---
# b.py
import a

def b1():
    ...
```
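Both variants can be tried out with the sketch below. It writes each pair of files to a temporary directory and runs `python a.py` in a separate process. The helper `run_pair` is a name invented for this example, and the function bodies return simple strings instead of `...` so that there is something to print.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_pair(a_source, b_source):
    """Write a.py and b.py to a temporary directory and run `python a.py`."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.py").write_text(a_source)
        (Path(tmp) / "b.py").write_text(b_source)
        return subprocess.run(
            [sys.executable, "a.py"], cwd=tmp, capture_output=True, text=True
        )

failing = run_pair(
    a_source=(
        "from b import b1\n"
        "\n"
        "def a1():\n"
        "    return 'a1'\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    print(a1(), b1())\n"
    ),
    b_source=(
        "from a import a1\n"
        "\n"
        "def b1():\n"
        "    return 'b1'\n"
    ),
)

working = run_pair(
    a_source=(
        "import b\n"
        "\n"
        "def a1():\n"
        "    return 'a1'\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    print(a1(), b.b1())\n"
    ),
    b_source=(
        "import a\n"
        "\n"
        "def b1():\n"
        "    return 'b1'\n"
    ),
)

print(failing.returncode != 0, "ImportError" in failing.stderr)  # True True
print(working.stdout.strip())                                    # a1 b1
```

In the failing variant, `b` asks for the name `a1` while `a` is still waiting for `b1`, so one of the names does not exist yet. In the working variant, each module only asks for the other *module* and looks up the function later, when it is actually called.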

#### 1.4.1.5. Relative imports with scripts

Earlier, we saw that relative imports (e.g. `from . import <module>`) rely on the module's `__name__` to resolve the import structure. Likewise, we learned that, when a module is run as a script, its `__name__` is set to `__main__` regardless of where the script lives.

So the question is *What happens when we run a module as a script that contains relative imports?*\
__Answer:__ You will get an `ImportError: attempted relative import with no known parent package`.

The bottom line: always use absolute imports for Python modules that you want to use as scripts.
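The error can be reproduced with a small experiment. The sketch below creates a throwaway package (the names `toolbox_demo`, `tools`, and `worker` are made up for this illustration) and runs one of its modules directly as a script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "toolbox_demo"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "tools.py").write_text("def hammer():\n    return 'bang'\n")
    (package_dir / "worker.py").write_text(
        "from .tools import hammer\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    print(hammer())\n"
    )

    # Equivalent to typing `python toolbox_demo/worker.py` in a terminal.
    result = subprocess.run(
        [sys.executable, str(package_dir / "worker.py")],
        capture_output=True,
        text=True,
    )

print(result.returncode != 0)                  # True
print(result.stderr.strip().splitlines()[-1])  # the ImportError described above
```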

## 1.5. Standard Modules

Python comes with a library of standard modules. These provide access to operations that are not part of the core of the language, but are nevertheless built in, either for efficiency or to provide access to operating system primitives, such as system calls.

The Python interpreter (including the one that runs your Jupyter notebooks) also has a number of functions and types built into it that are always available without having to import them. Here's a full list: https://docs.python.org/3/library/functions.html

Source: https://docs.python.org/3/library
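As a small aside, these always-available names live in a standard module called `builtins`. The sketch below, which assumes that `len` has not been redefined in your session, shows the difference between a built-in name and one that needs an import.

```python
import builtins

# Built-in functions and types are stored in the `builtins` module,
# which Python consults automatically when a name is not found elsewhere.
print(len is builtins.len)       # True
print("print" in dir(builtins))  # True
print("sqrt" in dir(builtins))   # False: sqrt lives in the math module
```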

### 1.5.1. Examples of Standard Modules

Here are some examples of useful modules within the Python standard library. As a developer, some of these will accompany you for a long time.

`os` and `sys` - System related\
`math` - Mathematical functions\
`pathlib` - Dealing with paths to files and directories\
`json` - Reading and writing to JSON\
`re` - Regular expressions\
`datetime` - Working with dates\
`time` - Working with time\
`dataclasses` - Special types of classes\
`itertools` - Advanced iteration operations\
`functools` - Advanced function operations\
`logging` - Logging operations
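To give a feel for how a few of these work together, here is a minimal sketch using `pathlib`, `math`, and `json`. The path `settings/circle.json` is hypothetical and is never actually opened.

```python
import json
import math
from pathlib import Path

# Build a path without touching the file system.
settings_path = Path("settings") / "circle.json"
print(settings_path.suffix)  # .json

# Compute a value with math and serialize it with json.
circle = {"radius": 2, "area": round(math.pi * 2**2, 2)}
print(json.dumps(circle))    # {"radius": 2, "area": 12.57}
```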

Bear in mind that some modules contain other modules. These are commonly called *packages* and are the topic of our next discussion.

In [13]:
from datetime import datetime
datetime.today().date()

datetime.date(2025, 5, 12)

# 2. Packages in Python

Packages in Python are directories that contain modules.

Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name A.B designates a submodule named B in a package named A.

Suppose you want to design a collection of modules (a “package”) for the uniform handling of sound files and sound data. There are many different sound file formats (usually recognized by their extension, for example, .wav, .aiff, .au), so you may need to create and maintain a growing collection of modules for the conversion between the various file formats. There are also many different operations you might want to perform on sound data (such as mixing, adding echo, applying an equalizer function, creating an artificial stereo effect), so you might find yourself writing a never-ending stream of modules to perform these operations. Here is a possible structure for your package (expressed in terms of a hierarchical filesystem):

```code
sound/
├── __init__.py           # Initialize the sound package
├── formats/              # Subpackage for file format conversions
│   ├── __init__.py
│   ├── wavread.py
│   ├── wavwrite.py
│   ├── aiffread.py
│   ├── aiffwrite.py
│   ├── auread.py
│   ├── auwrite.py
│   └── ...
├── effects/              # Subpackage for sound effects
│   ├── __init__.py
│   ├── echo.py
│   ├── surround.py
│   ├── reverse.py
│   └── ...
└── filters/              # Subpackage for filters
    ├── __init__.py
    ├── equalizer.py
    ├── vocoder.py
    ├── karaoke.py
    └── ...
```

**IMPORTANT** 

What distinguishes a Python package from an ordinary directory is an `__init__.py` file inside the package's directory. In the simplest case, `__init__.py` can just be an empty file, but it still tells Python that we are looking at a package.

Users of the package can import individual modules from that package. For example,

In [14]:
import sound.effects.echo

This loads the submodule *sound.effects.echo*. It must be referenced with its full name.



In [15]:
sound.effects.echo.echofilter(input='', output='', delay=0.7, atten=4)

In [16]:
from sound.effects import echo

This also loads the submodule echo, which makes the submodule available without its package prefix. It can be used as follows:


In [17]:
echo.echofilter(input='', output='', delay=0.7, atten=4)

Yet another variation is to import the desired function or variable directly:

In [18]:
from sound.effects.echo import echofilter

In [19]:
echofilter(input='', output='', delay=0.7, atten=4)

Note that when using `from package import item`, the `item` can be either a submodule (or subpackage) of the package or some other name defined in the package, like a function, class, or variable.

On the other hand, when using syntax like `import item.subitem.subsubitem`, each item, except for the last, must be a package. The last item can be a module or a package, but it cannot be a class, function, or variable defined in the previous item.
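Since the `sound` package is only an illustration, the sketch below replays the same three import styles with a real package from the standard library, `xml.etree`, and then shows what happens when the last item of a dotted `import` is a function.

```python
import importlib

import xml.etree.ElementTree                  # full dotted name
from xml.etree import ElementTree             # submodule without its prefix
from xml.etree.ElementTree import fromstring  # a function defined in the submodule

root = fromstring("<sound><effect>echo</effect></sound>")
print(root.find("effect").text)              # echo
print(ElementTree is xml.etree.ElementTree)  # True

# `import item.subitem.subsubitem` needs the last item to be a module or package.
# importlib.import_module lets us try this without a hard-coded import statement.
try:
    importlib.import_module("xml.etree.ElementTree.fromstring")
    last_item_error = None
except ModuleNotFoundError as error:
    last_item_error = error

print(last_item_error is not None)  # True
```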

## 2.1. Executing Package Modules as Scripts

Earlier, we saw how every Python file is considered a module and how modules can be run explicitly as scripts using the `if __name__ == "__main__"` block. Now, we are very briefly going to touch upon a similar concept that might help you understand a common pitfall when developing packages for the first time.

Let's say that your project has grown in complexity, and now you have some directories that you have turned into packages by dropping an `__init__.py` file in them. You have several Python files in each directory: a neatly organised code structure.

For the sake of clarity, let's assume this structure:

```
project_dir/
└── pkg/
    ├── module/
    │   ├── __init__.py
    │   └── b.py
    ├── __init__.py
    └── a.py
```
And let's assume this:
```python
# b.py
def hello():
    print("hello world")
    
if __name__ == "__main__":
    hello()

---
# a.py
from pkg.module.b import hello

if __name__ == "__main__":
    hello()
```

Now, you decide that you want to test one of the inner python modules that you have created with the `if __name__ == "__main__"` block. You assume that it's going to work, that your script will import the necessary modules and submodules for its execution... but all of a sudden!

```sh
python pkg/a.py
> Traceback (most recent call last):
>  ...
>    from pkg.module.b import hello
> ModuleNotFoundError: No module named 'pkg'
```

Not exactly what you expected, right?

### 2.1.1. Basic Introduction to PYTHONPATH 

You see, Python has absolutely no idea what `pkg` is. This is due to the dreaded topic of *import resolution*: Python looks for modules in a list of directories (the module search path, available as `sys.path`), which is built in part from the __PYTHONPATH__ environment variable. We will skip most of the inner workings of import resolution for now and focus on what truly matters, which is "How do I get my code to work?".

By using the command `python pkg/a.py`, we are adding the directory containing `a.py` (in this case, `pkg/`) to the list of places where Python looks for modules (`sys.path`), rather than `project_dir/`. In this sense, Python can only see the directory `module/`, the script `a.py`, and the `__init__.py`. If you were to change `a.py`'s import statement to `from module.b import hello`, everything would start working (although this is not what you wanted, considering that you wanted to run scripts from `project_dir/`). In fact, if you copied the contents of `a.py` to a `c.py` directly in the `project_dir/` directory, it would work as intended.
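If you are curious about where Python is currently looking, the short sketch below prints the first few entries of the search list and the value of the environment variable. The output depends entirely on your machine, so none is shown here.

```python
import os
import sys

# sys.path is the list of locations Python searches, in order, when importing.
for location in sys.path[:3]:
    print(repr(location))

# PYTHONPATH is an optional environment variable whose entries are added to sys.path.
print(os.environ.get("PYTHONPATH", "<not set>"))
```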

### 2.1.2. Running package modules as scripts using the -m flag

So the question now is this: *How do we run a script that imports modules correctly from within the root folder?*.

This is the same as asking, "How do we add the `project_dir/` to the __PYTHONPATH__?".

__BAD:__ The naive approach would be to code it into every file you could possibly run (some IDEs do this in a hidden way):
```python
# a.py
import sys
sys.path.append(<path to project_dir>)
from pkg.module.b import hello
...
```
__GOOD:__ Use the `-m` flag to add your current working directory to the __PYTHONPATH__ automatically:
```sh
python -m pkg.a
```

Note that we do not add the `.py` file extension to the command. We use a `.` instead of a `/` and add the `-m` flag. The `__name__` attribute will be `__main__`, just like our previous example. In this way, the current working directory (in our case, `project_dir/`) is added to the module search path (the same list that __PYTHONPATH__ feeds into), so Python finds `pkg` just as the absolute import expects.\
For a more complete description of the `-m` usage: https://stackoverflow.com/questions/7610001/what-is-the-purpose-of-the-m-switch
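The difference between the two ways of launching `a.py` can be checked with the sketch below. It recreates the `project_dir/` structure from this section in a temporary directory and runs both commands from inside it. The helper `run` is a name made up for this example, and `PYTHONPATH` is removed from the child processes so that only the launch style matters.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    pkg_dir = Path(project_dir) / "pkg"
    module_dir = pkg_dir / "module"
    module_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    (module_dir / "__init__.py").write_text("")
    (module_dir / "b.py").write_text("def hello():\n    print('hello world')\n")
    (pkg_dir / "a.py").write_text(
        "from pkg.module.b import hello\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    hello()\n"
    )

    # Copy the environment without PYTHONPATH.
    env = {key: value for key, value in os.environ.items() if key != "PYTHONPATH"}

    def run(*args):
        """Start a new Python process from inside project_dir."""
        return subprocess.run(
            [sys.executable, *args],
            cwd=project_dir,
            env=env,
            capture_output=True,
            text=True,
        )

    by_path = run(str(Path("pkg") / "a.py"))  # like `python pkg/a.py`
    by_module = run("-m", "pkg.a")            # like `python -m pkg.a`

print(by_path.returncode != 0, "ModuleNotFoundError" in by_path.stderr)  # True True
print(by_module.stdout.strip())                                          # hello world
```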

## 2.2. Managing External Packages with PIP and VENV

Adapted from https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/

Python applications often use packages and modules that do not come as part of the standard library or that you have not developed yourself.

To use these packages in our programs, we need two things:
* create and activate a virtual environment that will contain the packages 
* install the packages using pip
 
Different applications should use different virtual environments. 

### 2.2.1. Virtual Environments

You should __always__ use a virtual environment in your projects. In fact, each project should have *its own virtual environment*. These environments will isolate each of your projects, keeping their individual dependencies separate.

But why do we need to do this?

__Example 1__: Imagine that you have project A, which uses the external package `numpy` version 1.19.0. Everything runs fine until, a couple months later, you start project B, which uses `numpy` version 1.26.0. When you go back to project A (let's say that you decided to extend a class from it), there is an error with one of the `numpy` methods! 

What happened here? When you install a package without a virtual environment, it goes into a single, shared location that every project without its own virtual environment relies on. When you installed `numpy` for project B, you __overwrote__ the original installation of `numpy` (used by project A) with a different version!

__Example 2__: There are certain tools (e.g. `Ansible`) which can be installed via `pip`. These are not tools you would necessarily use for coding, but rather as Command Line tools. If they are installed globally, then one of your projects is liable to change one of its dependencies! 

__Example 3__: You are working on a project with a colleague. You both clone a git repository, but neither of you knows the exact packages used. As a result, one of you might be working with a completely different set of packages, so the same code, which both of you cloned, could output different results (it might even run without an error).

The bottom line is that __not using virtual environments is like gambling__ on a future error. You can hope that changes between projects will not cause problems or that different versions of your projects will somehow stay consistent... but is that a risk you want to take?
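Virtual environments are usually created from a terminal with `python -m venv <directory>`. The sketch below does the same thing from Python with the standard `venv` module, in a temporary directory so that nothing is left behind, and also shows one common way to check whether the running interpreter is inside an environment. The directory name `.venv` is only a convention, and `with_pip=False` is used just to keep the example quick.

```python
import sys
import tempfile
import venv
from pathlib import Path

# Inside a virtual environment, sys.prefix points at the environment,
# while sys.base_prefix points at the Python installation it was created from.
running_in_venv = sys.prefix != sys.base_prefix
print(running_in_venv)

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    venv.create(env_dir, with_pip=False)
    # pyvenv.cfg records how the environment was set up.
    config_text = (env_dir / "pyvenv.cfg").read_text()

# By default, the environment does not see globally installed packages.
print("include-system-site-packages = false" in config_text)  # True
```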


### 2.2.2. Installing packages on our environments

Let's use the numpy package as an example. 
We can install numpy by running `pip install numpy` in a terminal [**first making sure our virtual environment is active**](https://github.com/LDSSA/ds-prep-course-2024/blob/main/python-venv.md).

In [20]:
! pip install numpy


[notice] A new release of pip is available: 25.0.1 -> 25.1.1
[notice] To update, run: pip install --upgrade pip


__Common PIP Usages__
```
pip install package            # install latest or keep current
pip install package==version   # install specific version
pip install "package>=version" # install version greater or equal (quoted so the shell does not treat > as a redirect)
pip install --user package     # do not install package for all users (helps with some permission issues)
```
If `pip` finds that an installed package already satisfies the requested conditions, it will skip it, stating `Requirement already satisfied`.

In [21]:
import numpy
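After installing, one way to check which version actually ended up in your environment is the standard `importlib.metadata` module. The helper `installed_version` below is a name invented for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

print(installed_version("numpy"))                   # depends on your environment
print(installed_version("not-a-real-package-xyz"))  # None
```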

We can also install packages in batches by using a `requirements` file. You are already familiar with the command to do that. You ran it when you installed the dependencies for this course. 

```
cd ~/projects/ds-prep-workspace 
pip install -r requirements.txt

```

Let's take a look at the requirements.txt file from the course, which is situated in the course's main directory (that's why we use the ../../  to reach it).

In [22]:
!cat ../../requirements.txt

nbgrader==0.9.5    # base, Jupyter notebook

numpy==2.2.3       # used by some units (SLU08 - Functions Intermediate: Exercise notebook, ...)
matplotlib==3.10.1 # used by some units (SLU12 - Linear Algebra & NumPy, Part 1, ...)
pygame==2.6.1      # used by SLU16 - Final project

pytest==8.3.5      # used in Exam preparation II (SLU18_1)
geopy==2.4.1       # used in Exam preparation II (SLU18_2)
scipy==1.15.2      # used in Exam preparation II (SLU18_4)
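
Each line of this file is a package name, an exact version pin (`==`), and an optional comment. As a rough illustration of that structure, here is a minimal sketch that reads pinned lines into a dictionary. The function `parse_pins` is made up for this example and only understands `name==version` lines; real requirements files support many more forms, which pip handles itself.

```python
def parse_pins(text: str) -> dict:
    """Map package names to pinned versions for simple `name==version` lines."""
    pins = {}
    for line in text.splitlines():
        # Drop the trailing comment and surrounding whitespace.
        requirement = line.split("#", 1)[0].strip()
        if "==" not in requirement:
            continue  # blank line, or a form this sketch does not handle
        name, pinned = requirement.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


sample = """\
nbgrader==0.9.5    # base, Jupyter notebook

numpy==2.2.3       # used by some units
pytest==8.3.5      # used in Exam preparation II (SLU18_1)
"""
print(parse_pins(sample))
# {'nbgrader': '0.9.5', 'numpy': '2.2.3', 'pytest': '8.3.5'}
```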


### 2.2.3 How does PIP handle dependencies?

__Short answer:__ Not very well, at least not on its own...

__Long answer:__ PIP is the standard package manager that comes with Python. It allows you to install packages from PyPI and other indexes (repositories of Python packages). It also allows you to specify the version you *want* for the package. Keep in mind that what you *want* might not be what you *get*.

PIP only looks at the requirements of the command it is currently running, and it keeps no record of what the rest of your project needs. Recent versions (20.3 and later) will refuse to install a single command or requirements file whose pins contradict each other, but across separate `pip install` commands the approach is still *naive*: pip installs exactly what the current command asks for. If one of the new packages has a subdependency on a package that is already installed, __it will overwrite the installed version unless that version also satisfies the new package's requirements__!

It will overwrite (packages installed one command after another):
```
> depX == 1.0.0
> depY == 2.0.0  # but depY v2.0.0 depends on depX == v1.5.0 -> depX is overwritten with v1.5.0
```
It will not overwrite:
```
> depX == 1.0.0
> depZ == 2.0.0  # depZ v2.0.0 depends on depX >= v1.0.0 -> depX v1.0.0 stays installed
```
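
The difference between the two cases comes down to whether the version already installed satisfies the new requirement. The sketch below models only that check, for plain numeric versions and the `==` and `>=` operators; `satisfies` and `parse_version` are names invented here, and this is not pip's actual implementation.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain numeric version such as '1.5.0' into (1, 5, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed: str, requirement: str) -> bool:
    """Check an installed version against a requirement like '==1.5.0' or '>=1.0.0'."""
    operator, wanted = requirement[:2], requirement[2:]
    if operator == "==":
        return parse_version(installed) == parse_version(wanted)
    if operator == ">=":
        return parse_version(installed) >= parse_version(wanted)
    raise ValueError(f"unsupported operator in {requirement!r}")


installed_depx = "1.0.0"
print(satisfies(installed_depx, "==1.5.0"))  # False -> depX would be replaced
print(satisfies(installed_depx, ">=1.0.0"))  # True  -> depX would be left alone
```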

But what if the overwritten version is __incompatible__ with the rest of your project or dependencies? 

__Answer:__ Pip mostly doesn't care; pip only installs. At best, it prints a warning about the conflict after the fact. Package owners have some leverage, because they can declare which versions of their dependencies they accept, but you should be careful nonetheless.
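
That leverage takes the form of requirements that each package declares in its metadata. As a minimal sketch (the helper name `declared_requirements` is made up here), `importlib.metadata.requires` from the standard library lists what an installed package asks for, which can help when judging whether an overwrite is risky.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_requirements(package_name: str) -> list:
    """Return the requirement strings an installed package declares (empty if none)."""
    try:
        # requires() returns None when the package declares no dependencies.
        return requires(package_name) or []
    except PackageNotFoundError:
        return []


# matplotlib is in the course requirements; the list is empty if it is not installed.
for requirement in declared_requirements("matplotlib"):
    print(requirement)
```

From a terminal, `pip check` performs a related check across the whole environment and reports installed packages whose declared requirements are not met.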

### 2.3. Beyond PIP

In a professional environment, you will most likely need to work with more powerful tools that are beyond the scope of this course. Tools like `poetry`, `pdm`, and `pipenv` will create virtual environments, *resolve* dependencies, and lock exact versions. Such tools will ensure that your dependencies play nice with each other. (*bonus remark*: these tools also make building packages and publishing them much easier.)

And it won't stop there... if you delve into Python programming for long enough, you'll start to understand why a dedicated tool such as `pyenv` is useful for installing and switching between Python versions. Using these tools, your production-level projects might just stand the test of time.

Last but not least, as we said in a prior example, there are tools that you can install in your system using pip. Since these are typically not project-specific, but rather globally available in your system, make sure to find a better way to install these to avoid potential clashes. That is why there are tools such as `pipx`, which will handle the isolation of these types of packages behind the scenes.

You are still early in your journey into Python programming and, as such, these are tools that you do not need right now. If you want to work in the industry, however, some of these are likely to show up.

__Tip:__ Look into a couple of these tools before heading into technical interviews, as knowing about them and why you would need to use them can help you stand out from your competition. My suggestion would be to check out either `poetry` or `pdm` for dependency resolution and virtual environment handling and `pyenv` for managing different Python versions.

## Recap

* modules are files ending in `.py`
* some modules come with Python itself
* packages are directories that
    * contain an `__init__.py` file
    * can contain modules
    * can contain other packages
* some packages are contained in a particular Python application
* some packages are installed with pip in a virtual environment and used in another Python application or script

## Further Reading

There's much more to modules and packages than what's been covered here, but I hope this is enough to cover your needs for the foreseeable future.

If you'd like to learn more, I recommend that you read the [modules tutorial](https://docs.python.org/3/tutorial/modules.html) from the Python documentation.

Chapter 22 of Mark Lutz's "Learning Python" is an in-depth review of what we discussed in this notebook.

If you are feeling brave and want to fall down a rabbit-hole of dependency management, visit this [guide](https://alpopkes.com/posts/python/packaging_tools).