# Python: Packages and Environments

In this lesson we will cover:
- Launching VSCodium on the SCRTP Desktop, with a Python plugin
- Creating virtual environments
- Installing Python packages
- Exporting to and installing from requirements files
- Understanding namespaces

This lesson is a good starting point for using Python packages properly in scientific computing. If you have VSCode/VSCodium installed on your own computer and it can run Python, you can skip some of the initial steps. If things don't work as expected on your computer, please use the SCRTP desktop.

## Getting started: Running VSCodium

To work through this lesson we are going to use the SCRTP desktop. For this you will need to register with the SCRTP and log in to the graphical remote desktop via a browser or remote desktop application.

Use [godzilla.csc.warwick.ac.uk](rdp://godzilla.csc.warwick.ac.uk) and log in with your SCRTP username and password.

You may now open this page in the remote desktop browser and maximize the remote screen.

Once you are logged in, open a terminal by clicking the middle icon (circled in red) in the top left of your remote desktop.

![Image showing location of terminal icon](./img/terminal-icon.png)

This will open a terminal that looks like this:

![Image showing blank terminal](./img/terminal.png)

There is a flashing cursor next to the dollar sign indicating the terminal is ready for input. We need to load a few programs to get started. 

<details>
<summary>Help! What is the terminal and how do I use it?</summary>

In this course we will make extensive use of terminal (command line) instructions inside the IDE VSCodium. We will not explore or explain the terminal; we will just use it to run sets of commands that allow us to work with package managers and Python modules.

If using the terminal is outside of your comfort zone, stop here and try the intro to Linux computing, where this is covered in more depth.

</details>

In the terminal run the following command:

``` bash

module load GCCcore VSCodium

```

Then launch VSCodium in your Documents folder:

``` bash

codium ~/Documents/

```

Next, launch the integrated terminal: open the 'Terminal' drop-down menu in the top left, then click 'New Terminal'.

![Location of the inbuilt terminal menu](./img/terminal-menu.png)

This will open a terminal in the bottom of the window.

![Location of the inbuilt terminal](./img/terminal-location.png)


In this terminal run the following command:

``` bash

mkdir learning-packages

```

Then click on 'Open Folder' and open the `learning-packages` folder.

![Location of open folder](./img/open-folder.png)

Pick the `learning-packages` folder and VSCodium will launch a new window with this as the new base folder.

![Window with the new base folder](./img/new-window.png)

Now click on the new file icon (red arrow) and name the file `learning_file.py`. (After you press enter, a new system window sometimes appears behind VSCodium; select it using the menu bar at the bottom and click save.)

Then reopen the integrated terminal using the menu in the top left as before. Note that the terminal automatically opens in the location of the new file.

![Terminal with new location highlighted, and which pip highlighted](./img/location-and-pip.png)

As a final check we run the command highlighted in the rectangle.

``` bash

which pip

```

A result ending in `bin/pip` confirms that pip is installed and available. Now we can start to learn about Python packages, but first we need to look at package managers.
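The same check can be made from Python itself. The following is a minimal sketch using the standard library's `shutil.which`; the helper name `find_tool` is made up for this illustration and is not part of the lesson files.

```python
import shutil


def find_tool(name):
    """Return the full path of a command found on PATH, or None if it is missing."""
    return shutil.which(name)


pip_path = find_tool("pip")
if pip_path is None:
    print("pip was not found on PATH")
else:
    print(f"pip found at {pip_path}")
```

This does the same job as `which pip`, and can be handy inside a script that needs to fail early when a tool is missing.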

## Motivating package managers and Environments

### Why do I need a package manager?

If you are learning Python (or any other language), you are probably doing it to leverage the power of prebuilt packages. In Python, a few of the most well-known packages are NumPy, SciPy and pandas. Another package you may be less aware of is Django, a web framework used for building websites. In fact, Instagram uses Django to build their website/app.

To install any of these packages you will need to use a package manager. However, to install packages safely we need to think about environments. 

### Why do I need an environment?

It may seem like we are moving away from our initial goal of using a package. However, let's motivate the use of an environment with a lab-work analogy. In a lab, any experimental setup starts from a clean bench. This means scrubbing the bench to remove bacteria, foreign objects or chemicals. Thankfully, creating a sterile or clean environment on a computer is much easier. We need to do this to make sure the packages installed for one project do not interfere with those installed for another. An environment is essentially a clean room, isolating one package configuration from another.

### How do I do it?

Create the environment using `venv`. Here `python3 -m` tells Python to run a module from the command line, `venv` is the name of that module, and `env` is the name of our environment. We can choose any name we like, so for other projects you can use a descriptive name such as `proj-shorthand-env`.

``` bash 

python3 -m venv env

```

This creates a directory called `env/` inside the current directory. It contains all the information about your environment configuration and some scripts that let you control your environment.
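To see what `venv` writes to disk, the module can also be driven from Python. The following is a minimal sketch that builds a throwaway environment in a temporary directory and lists its top-level contents. It assumes the standard library `venv` module is available, and it passes `with_pip=False` only to keep the demonstration quick (the command used in this lesson also installs pip into the environment).

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    env_dir = Path(scratch) / "env"
    # with_pip=False skips installing pip, which keeps this demonstration fast.
    venv.create(env_dir, with_pip=False)
    # Record what was created before the temporary directory is removed.
    contents = sorted(p.name for p in env_dir.iterdir())
    has_config = (env_dir / "pyvenv.cfg").is_file()

print(contents)
print("pyvenv.cfg present:", has_config)
```

The exact list differs between operating systems, but it should include a `pyvenv.cfg` configuration file and a folder of scripts (`bin` on Linux and macOS).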

The first control script is used to activate the environment:

``` bash 

source env/bin/activate

```

If it has worked correctly, you will see `(env)` in front of your prompt.

![A terminal with activated environment](././img/activated-venv.png)
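Besides looking at the prompt, you can ask Python whether it is running inside a virtual environment. A minimal sketch, assuming an environment created with `venv`; the function name `in_virtual_environment` is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python installation it was built from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

Running this with the environment activated and again after `deactivate` should give different answers.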



### What is pip?

`pip` is the default Python package manager. There are many others; two worth noting are conda and poetry. Each has its pros and cons. In this lesson we will use `pip`.

### What's the environment doing?

Let's start by further justifying the use of an environment. Run this command:

```bash

pip list

```

You should see the following:

![pip list output in a clean environment](./img/pip-list-clean.png)

Now let's deactivate the environment and run the same command:


```bash

deactivate

pip list

```

And the following is produced:


![start of base pip output](./img/base-pip-list-start.png)

...

![end of base pip output](./img/base-pip-list-end.png)

We can see that our base or system environment has a lot of things in it. This is fine and won't hurt your computer (usually), but it isn't good for keeping experimental setups lightweight, simple and shareable.

Let's reactivate the environment; then we can actually start to install packages, as was our initial goal.
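The difference between the two listings can also be inspected from Python. The following is a rough sketch using the standard library's `importlib.metadata`; the helper name `installed_packages` is made up here, and the output is not guaranteed to match `pip list` line for line.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) pairs visible to this interpreter."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name:  # skip entries whose metadata has no name
            found.add((name, version))
    return sorted(found, key=lambda pair: pair[0].lower())


packages = installed_packages()
print(f"{len(packages)} packages visible to this interpreter")
for name, version in packages[:5]:
    print(name, version)
```

Run inside a fresh environment the count should be small; run with the base interpreter it will usually be much larger.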

```bash

source env/bin/activate

```

## Managing installs and dependencies using pip and PyPI

### Using pip

pip can do lots of useful things. To see them all, run:

```bash

pip -h

```

We are going to focus on `pip install`, `pip list`, `pip freeze`, and make brief use of `pip check`.

Before we can install a package we need to find one. By default pip uses [PyPI](https://pypi.org/), the [**Py**thon **P**ackage **I**ndex](https://pypi.org/).

You can go to the website and search for a package, or use a search engine with a query like ["python package linear algebra PyPI"](https://duckduckgo.com/?q=python+package+linear+algebra+PyPI&ia=web) to help you find the packages you need.

The following is a [PyPI page](https://pypi.org/project/numpy/) for the extremely popular NumPy package.

![numpy package page](./img/numpy-package-page.png)

At the top it gives the package name, the version number, the installation command using pip and a brief description of the package. Scrolling down there is more information; points of note here are the documentation and the home page.

![Project links on PyPi](./img/pypi-links.png)

The home page is great for learning more about the package generally but the documentation is the place to go to learn how the package works and how to use the package.

#### Installing a package (Finally)

To install a package we use the command shown to us on the webpage:

```bash

pip install numpy

```

The result will look something like this:

![Numpy install procedure](./img/numpy-installed.png)

We can now run `pip list` to make sure it's in our environment:

![pip list output after numpy](./img/pip-list-post-numpy.png)
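If you want to confirm a single package from inside a script, rather than scanning the whole list, the standard library can look it up by name. A minimal sketch; `installed_version` is a hypothetical helper name chosen for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("numpy:", installed_version("numpy"))
```

With the environment active and NumPy installed this prints the version number; otherwise it prints `None`.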

#### Sharing your environment

When you have installed many packages you may want to run your code on another computer (e.g. a server, on HPC, in the cloud) or give your code to someone else. In these instances you also need to give them your environment. The reasons for this are numerous, but two are worth mentioning:

1. It saves the user (or you) a lot of time if they have a list of what's needed.
2. Some packages will have minor differences in output between versions that could change the result.

To generate this list of dependencies (so called because your code depends on them to work properly) we do the following:

``` bash

pip freeze > requirements.txt

```

This command generates a text file called `requirements.txt` that lists all of your installed dependencies with their version numbers. The file can be given any name, but `requirements.txt` is the standard.


You can share this along with your code to create a like-for-like environment on another computer.


![pip freeze output](./img/requirements-output.png)


#### Installing someone else's environment (or your own on another computer)

Here is a requirements file for a project that uses `scikit-learn` and `bokeh`, a fairly standard setup for some machine learning and data visualization.

``` text

bokeh==3.1.1
contourpy==1.0.7
Jinja2==3.1.2
joblib==1.2.0
MarkupSafe==2.1.2
numpy==1.24.3
packaging==23.1
pandas==2.0.2
Pillow==9.5.0
python-dateutil==2.8.2
pytz==2023.3
PyYAML==6.0
scikit-learn==1.2.2
scipy==1.10.1
six==1.16.0
threadpoolctl==3.1.0
tornado==6.3.2
tzdata==2023.3
xyzservices==2023.5.0

```

Either use the `requirements.txt` file included or copy and paste the text above into a text file called `requirements.txt`.

To install all the packages run this command:

``` bash

pip install -r requirements.txt

```

Finally we can run `pip check`. If it reports `No broken requirements found` then your environment has no dependency conflicts and is good to go.
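As an extra sanity check, you can compare the pinned versions in a requirements file against what is actually installed. The sketch below is not how `pip check` works internally; it is a simpler illustration that only understands `name==version` lines. The helper names `parse_pins` and `mismatches` are made up for this example.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def mismatches(pins):
    """Return {name: (wanted, installed)} for pins the environment does not meet."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


sample = """
numpy==1.24.3
scipy==1.10.1
"""
pins = parse_pins(sample)
print(pins)
print(mismatches(pins))
```

An empty dictionary from `mismatches` means every pinned package is installed at exactly the requested version. To check a real file, pass its contents, for example `parse_pins(open("requirements.txt").read())`.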


## Using a package

To use a package we need to learn a bit of Python syntax. Here is a basic example, using `numpy`:

``` python

import numpy

```

Now let's use an extension to VSCodium to get some Python support that will help with understanding modules.

We use the IntelliSense plugin, which can be installed from the extensions tab (5th down on the left-hand side of VSCodium). Search for 'Python' and install the 'Intellisense (Pylance)' plugin.

![Location of the plugin to install](./img/python-extention.png)

This extension gives VSCodium a few more bells and whistles when writing Python. One that is particularly useful is autocomplete. 

![Autocomplete showing numpy sqrt](./img/autocomplete.png)

This helps find the correct function quickly with a small prompt. 

Finish the script so it looks like this:

```python

import numpy

x = 16
y = numpy.sqrt(x)
print(y)

```

Then run the script either by running the following terminal command:

```bash 
python learning_file.py
```

or by pressing the play button in the top right of VSCodium:


![VSCodium screen grab with run button circled and output terminal in the bottom](./img/running-a-python-file.png)

Either way the code will print `4.0` to the output.

Some modules have a common shorthand that is widely adopted. You can even use your own, but this isn't recommended as it can make your code harder to read.

To set the shorthand we use the `as` keyword, for example:

```python
import numpy as np

x = 16
y = np.sqrt(x)
print(y)
```

You can see we have renamed `numpy` to `np`; this can make code look cleaner.

Finally, we can import only the names we need from a package. Here we only use the `sqrt` function, so we can bring just that one name into our file. Note that this does not make the code faster: Python still loads the whole of NumPy behind the scenes, but only `sqrt` becomes available under its own name.

```python
from numpy import sqrt

x = 16
y = sqrt(x)
print(y)
```

This works just the same, but only the name `sqrt` is added to our file, rather than the name `numpy`.
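A quick way to check what a `from ... import ...` statement really loads is to look in `sys.modules`, Python's cache of loaded modules. The sketch below uses the standard library `json` module as a stand-in for NumPy so that it runs in any environment; the same behaviour applies to `from numpy import sqrt`.

```python
import sys

from json import dumps  # binds only the name `dumps` in this file

# The whole json module was still loaded and cached by Python.
print("json" in sys.modules)  # True

# The imported function works as usual.
print(dumps({"x": 16}))  # {"x": 16}
```

So a `from` import changes which names you can type, not how much of the package Python loads.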

## Namespaces

A namespace is the term we use to refer to all the names and objects currently defined. For example in the previous code:

```python
import numpy as np

x = 16
y = np.sqrt(x)
print(y)
```

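This adds `np`, `x`, and `y` to the namespace. (Additionally, built-in functions such as `print` come from Python's built-in namespace, which is always available. `import` and `as` are keywords and `=` is an operator; these are part of the language syntax rather than names in a namespace.)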
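You can ask Python which names a piece of code has defined. The following is a small sketch that uses a function's local namespace to keep the output short (at the top level of a file, `globals()` plays the same role). It uses the standard library `math` module in place of NumPy, and the function name `demo` is made up for this example.

```python
import builtins
import keyword


def demo():
    import math as m
    x = 16
    y = m.sqrt(x)
    # locals() returns the names currently bound in this function's namespace.
    return sorted(locals())


print(demo())                       # ['m', 'x', 'y']
print(hasattr(builtins, "print"))   # True: print is a name in the built-in namespace
print(keyword.iskeyword("import"))  # True: import is a keyword, not a name
print(keyword.iskeyword("as"))      # True
```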


### A warning

There is one thing you will likely see but should never do, as it is terrible practice.

```python 
# Do not do this
from numpy import *
```

So what does this do? 

This tells Python to import every public name (not just functions) from the numpy package into the namespace. Given how many things there are in numpy, this will clutter the namespace badly and increase the chances of a namespace collision, where a newly imported name silently replaces an existing one with the same name.
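To make the collision concrete, here is a minimal sketch using two standard library modules, `math` and `cmath`, as stand-ins (both define `sqrt`, much as `math` and `numpy` do). The star-imports are run inside a scratch dictionary with `exec` purely so that the demonstration does not clutter the file's own namespace.

```python
import cmath
import math

# Run the star-imports in a scratch namespace so this file stays clean.
scratch = {}
exec("from math import *", scratch)
after_math = set(scratch) - {"__builtins__"}
print(len(after_math), "names added by one star-import")
print(scratch["sqrt"] is math.sqrt)   # True

# A second star-import silently replaces any names the two modules share.
exec("from cmath import *", scratch)
print(scratch["sqrt"] is cmath.sqrt)  # True: math's sqrt has been overwritten
print(scratch["sqrt"](-1))            # 1j
```

No error or warning is raised when `sqrt` is replaced, which is exactly why this kind of bug is hard to spot.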

### Controlling your namespace

By using the `from <module> import <thing>` import syntax we can selectively import functions to reduce the number of items we have in the namespace.

Furthermore, using the `as` syntax we can rename items; for example, earlier we renamed `numpy` to `np`. This can be useful if we want to use functions from two packages that share a name. For example, let's get `sqrt` from both `numpy` and Python's built-in `math` module.

```python
import numpy
import math 

x = 16

y = numpy.sqrt(x)
print(y)

y = math.sqrt(x)
print(y)
```

Above we import `numpy` and `math` into the namespace, then access a function called `sqrt` from each using the `.` syntax.

What if we wanted to import only `sqrt` from both? We could rename them on import.

```python
from numpy import sqrt as np_sqrt
from math import sqrt as mt_sqrt

x = 16
y = np_sqrt(x)
print(y)

y = mt_sqrt(x)
print(y)
```

Now we have both the square root functions that were originally called `sqrt` imported into the same namespace under different names. 

## Conclusions

#### Here we have shown you how to:
- Open a terminal on the SCRTP Desktop.
- Open VSCodium through the module system.
- Create Python virtual environments using `venv`.
- Install packages using `pip`.
- Export and load package lists using `pip` and `requirements.txt`.
- Load Python packages using `import`.
- Control your namespace using `import`, `from`, and `as`.

It's worth noting here that finding and using packages is the simplest part of this whole structure, but everything else is what makes using them as part of a responsible scientific workflow possible. 

#### What we have not shown you

##### How to use any packages
All packages are different. You will get a feel for the standards, but for any package you should read the documentation first.

##### How to make a package
Making and publishing a package is an advanced topic. Publishing a well-written, well-tested and well-documented package is a great way to create non-traditional scientific output if your code will be useful to others.


### Finally
When writing code, remember to look around for packages that have already been written, to avoid duplicating effort. Most mathematical functions can be found, and many domains have specific collections, e.g. Biopython, astropy, etc.