# Environments


There are two main types of environments to consider when developing code in Python:
the `bash` environment, which knows about the shell and your OS, and the `python` environment, which knows about the Python packages you are using.


## Bash environment

The bash environment is defined by a set of environment variables. 

### Environment variables

An environment variable is a key-value pair, stored by the OS, that programs and shell scripts use to configure their behavior.

The key is the variable name, and the value is the variable value.

To set an environment variable, you can use the `export` command. For example:

```bash
export MY_VARIABLE="my_value"
```

To print the value of an environment variable, you can use the `echo` command. For example:

```bash
echo $MY_VARIABLE
```

The dollar sign `$` retrieves the value of the variable (think of dollar/value).

You can list the environment variables currently set in your bash session by typing:

```bash
printenv
```

`printenv` is a **very important** command; you should not forget about it.
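
The same variables are visible from inside Python through the standard `os.environ` mapping. The sketch below is a minimal illustration: `MY_VARIABLE` is the hypothetical name from the example above, and `SURELY_NOT_SET_12345` is a made-up name assumed to be unset.

```python
import os

# Set a variable for this Python process (and any child process it starts).
os.environ["MY_VARIABLE"] = "my_value"

# Read it back; .get() returns a default instead of raising KeyError.
value = os.environ.get("MY_VARIABLE")
missing = os.environ.get("SURELY_NOT_SET_12345", "fallback")
print(value, missing)

# Rough equivalent of `printenv`: list every variable, sorted by name.
for key in sorted(os.environ):
    print(f"{key}={os.environ[key]}")
```

Note that a variable set this way only exists for the running Python process; it does not change the shell that started Python.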


<div class="exercise-box">
**Exercise:**  Connect to CSD3 and list the environment variables.
</div>


The main environment variables you need to know are:

- `PATH`: Defines the directories where the system searches for executable programs. It's essential for adding research software tools like compilers, or compiled programs.

<div class="exercise-box">
**Exercise:** On your local machine list the executables found in some of the directories in your `PATH`.
</div>


- `LD_LIBRARY_PATH` (Linux) / `DYLD_LIBRARY_PATH` (macOS): Used to specify additional directories where the system looks for shared libraries (`.so` or `.dylib` files). Typical examples are math libraries such as [LAPACK](https://en.wikipedia.org/wiki/LAPACK), [BLAS](https://en.wikipedia.org/wiki/BLAS), [FFTW](https://en.wikipedia.org/wiki/FFTW) and [GSL](https://en.wikipedia.org/wiki/GNU_Scientific_Library).

Note: In `LD_LIBRARY_PATH`, the `LD` stands for "Linker/Loader". It refers to the **dynamic linker or dynamic loader**, which is the part of the operating system responsible for loading shared libraries (also known as dynamic libraries) when an executable program is run.


- `HOME`: Your home directory.


- `USER`: Holds your username.

- `PYTHONPATH`: Used to specify additional directories where Python looks for packages and modules. These directories are searched when you run `import ...` in Python.
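
To inspect these variables from Python rather than from the shell, a minimal sketch could look as follows. Some of them (for example `PYTHONPATH` or `LD_LIBRARY_PATH`) are often unset, hence the `<not set>` placeholder, which is only an illustrative choice.

```python
import os
import shutil

names = ["PATH", "LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH", "HOME", "USER", "PYTHONPATH"]

# Collect each variable, falling back to a placeholder when it is unset.
report = {name: os.environ.get(name, "<not set>") for name in names}
for name, val in report.items():
    print(f"{name} = {val}")

# PATH is a list of directories joined by os.pathsep (":" on Linux and macOS).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
print(path_dirs)

# shutil.which searches those directories, much like the shell does for a command.
python_location = shutil.which("python3")
print(python_location)
```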


<div class="exercise-box">
**Exercise:** Open a Colab notebook and print the values of each of these environment variables.

Remember to use the `!` to run the bash commands in Colab.
</div>

### Adding to your `PATH`

You can add to your `PATH` by appending the directory to the `PATH` variable. For example:

```bash
export PATH=$PATH:/path/to/your/program
```

You can also prepend to your `PATH` by adding the directory at the beginning of the `PATH` variable. For example:

```bash
export PATH=/path/to/your/program:$PATH
```

**Important**: The order in which you add directories to your `PATH` is important. The first directory in the `PATH` is the first one that is searched.
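
A small experiment can make the search order concrete. The sketch below assumes a POSIX system (Linux or macOS). It creates two temporary directories that each contain an executable with the hypothetical name `mytool`, then asks `shutil.which` to search them in both orders.

```python
import os
import shutil
import stat
import tempfile


def make_tool(directory, name="mytool"):
    """Create a tiny executable shell script called `name` in `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\necho hello\n")
    # Add the execute bit for the owner so the file counts as a program.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    tool_a = make_tool(dir_a)
    tool_b = make_tool(dir_b)

    # Same two directories, searched in opposite orders.
    found_a_first = shutil.which("mytool", path=os.pathsep.join([dir_a, dir_b]))
    found_b_first = shutil.which("mytool", path=os.pathsep.join([dir_b, dir_a]))

    a_wins = found_a_first == tool_a
    b_wins = found_b_first == tool_b

print(a_wins, b_wins)
```

In both cases the copy in the directory listed first is the one that is found, which is why prepending and appending to `PATH` can give different results.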

<div class="exercise-box">
**Exercise:** Add a directory to your `PATH` and check that it works. If you have a directory that contains an executable, add this one and try running the program from anywhere in the system.
</div>


### Adding to your path variables

You can add to any path variables in exactly the same way as we have seen for `PATH` above.

For example, to add a directory to your `PYTHONPATH`, you can use:

```bash
export PYTHONPATH=/path/to/your/program:$PYTHONPATH
```
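
One way to see the effect of `PYTHONPATH` is to start a child Python interpreter with the variable set and look at its `sys.path`, the list of directories searched by `import`. This is a sketch only: the extra directory is a temporary, empty directory created for the demonstration.

```python
import json
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as extra_dir:
    # Copy the current environment and set PYTHONPATH for the child only.
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    child_sys_path = json.loads(result.stdout)

    # Compare resolved paths, since temporary directories may sit behind symlinks.
    extra_dir_seen = any(
        entry and os.path.realpath(entry) == os.path.realpath(extra_dir)
        for entry in child_sys_path
    )

print(extra_dir_seen)
```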

### Making the changes persistent

You can make the changes persistent by adding them to your `~/.bashrc` file (or `~/.bash_profile` on macOS; recent macOS versions use zsh by default, in which case the file is `~/.zshrc`). For example, to add a directory to your `PATH`, you can type in your terminal:

```bash
# Single quotes keep $PATH unexpanded, so it is evaluated each time ~/.bashrc is read
echo 'export PATH=/path/to/your/program:$PATH' >> ~/.bashrc
```

You can then reload the `~/.bashrc` file by typing:

```bash
source ~/.bashrc
```

so the changes are applied in your current shell as well. 
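
The reason a plain `export` does not survive closing the terminal is that environment variables belong to a process: child processes inherit a copy, but nothing flows back to the parent or to unrelated shells. The sketch below illustrates this with Python subprocesses standing in for shells; `DEMO_VAR` is a hypothetical variable name.

```python
import os
import subprocess
import sys

os.environ.pop("DEMO_VAR", None)  # start from a clean state

# A child process sets the variable in its own environment...
subprocess.run(
    [sys.executable, "-c", "import os; os.environ['DEMO_VAR'] = 'set in child'"],
    check=True,
)
# ...but the parent never sees that change.
parent_sees_child_change = "DEMO_VAR" in os.environ

# In the other direction, a child inherits what the parent has set.
os.environ["DEMO_VAR"] = "set in parent"
completed = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_VAR'])"],
    capture_output=True,
    text=True,
    check=True,
)
child_value = completed.stdout.strip()

print(parent_sees_child_change, child_value)
```

Writing the `export` line into `~/.bashrc` works because every new interactive bash session reads that file at startup and sets the variable again.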


<div class="exercise-box">
**Exercise:** Do the following:

1. Add a fake directory to your `PATH`. For instance `/path/to/nowhere`, using the `export` command.
2. Use echo to check the `PATH` has been updated.
3. Close your current shell (i.e., Terminal) and re-open it (or open a new one).
4. Check the `PATH` and note that the fake path is not there.
5. Redo 1-4 but this time modify the `~/.bashrc` file to make the change persistent before step 3.
6. Check that the change is persistent.

To undo the change, remove the new line added to the `~/.bashrc` file. Use vim or VS Code to do so.
</div>



## Python environment

Python was created by [Guido van Rossum](https://gvanrossum.github.io), who began implementing it in December 1989 at the Centrum Wiskunde & Informatica (abbr. CWI; English: "National Research Institute for Mathematics and Computer Science") in the Netherlands.


Python is one of the most popular programming languages, in large part because of its widespread use in machine learning and data science.

The name Python comes from the British comedy group Monty Python. You will occasionally find further references to it, such as the use of the terms "spam" and "eggs". Python is fun.

As research data scientists, your Python environment consists mainly of **three components**:

1. Python interpreter/version,
2. Virtual environment(s),
3. Jupyter notebook and kernel.

### Python interpreter


The **Python interpreter** is the program that **runs your Python code**. It is the interface between your code and the Python language.

Python (i.e., a Python interpreter) may not be natively installed on your system. 

If you type `python` in your terminal, and get an error such as:

```bash
bash: python: command not found
```

then you need to install Python.

**On macOS** you can install Python (i.e., a Python interpreter) using Homebrew, e.g.,

```bash
brew install python
```

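**On Linux** you can install Python using your package manager. For example, on Debian or Ubuntu the package is called `python3` (and the interpreter is then started with `python3`):

```bash
sudo apt-get install python3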
```

**On CSD3** several Python interpreters are available. You can load the one you need using the `module` command. For example:

```bash
module load miniconda/3
```

You can then check the default python version you get from the loaded module by typing:

```bash
python --version
```

If you then start a Python session by typing `python` in your terminal, you should see something like this:

```bash
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
```
The `>>>` is the **Python prompt**. You can type Python code after the prompt and **press Enter to execute the code**.

To exit the Python interpreter, you can type `exit()` and press Enter.  

Note for CSD3: if you don't use the `module load` command, other Python interpreters are still available, but you need to explicitly type the Python version you want to use (for instance `python3.11`). They are stored in `/usr/bin`. You can list them using `ls /usr/bin/python*`.





Now, in a Python session, type `import platform; platform.python_implementation()` and press Enter. You should see something like this:

```bash
'CPython'
```

Your Python interpreter is **CPython**. This is the reference implementation of the Python programming language. Written in C and Python, CPython is the default and most widely used implementation of the Python language ([Wikipedia](https://en.wikipedia.org/wiki/CPython#:~:text=CPython%20is%20the%20reference%20implementation,implementation%20of%20the%20Python%20language.)).

To check the version of Python you are using from within Python, type:

```python
import sys
print(sys.version)
```
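
For scripts that need to check the interpreter programmatically, the standard library exposes the same information in structured form. The following is a minimal sketch; the variable names are illustrative.

```python
import platform
import sys

implementation = platform.python_implementation()  # e.g. 'CPython'
version = platform.python_version()                # e.g. '3.11.4'

# sys.version_info is a tuple, which makes version comparisons easy.
major, minor = sys.version_info[:2]
is_python3 = sys.version_info >= (3,)

# sys.executable is the full path of the interpreter running this code.
print(f"{implementation} {version} running from {sys.executable}")
print(f"major={major}, minor={minor}, is_python3={is_python3}")
```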

<div class="exercise-box">
**Exercise:** Check the version of Python and interpreter you are using, in Colab, your laptop and CSD3.
</div>


There are other implementations of Python, such as [PyPy](https://en.wikipedia.org/wiki/PyPy), [Jython](https://en.wikipedia.org/wiki/Jython) and [IronPython](https://en.wikipedia.org/wiki/IronPython). You can read about them online. 


### Virtual environments


If you type `pip list` in a Terminal you will see the packages currently installed in your Python environment.


<div class="exercise-box">
**Exercise:** Try `pip list` in Colab (prefix it with `!` to run the shell command from within the notebook), on your laptop and on CSD3.
</div>


Packages can have conflicts, and some software may require a specific version of a package. 

A virtual environment is a directory that contains a Python interpreter and a curated set of installed packages, isolated from the packages of other projects.

Good practice is to create a new virtual environment for each project. 

There are multiple tools to create and manage virtual environments. Here we use `venv`. See [here](https://docs.python.org/3/library/venv.html) for the documentation. 

Creating a new virtual environment takes two steps.

First, create a directory to store the virtual environments, e.g., `venvs`:

```bash
mkdir venvs
```

Then create the virtual environment inside this directory:

```bash
python -m venv <path/to/venvs/nameofenv>
```

Here, the `-m` option tells Python to run a library module as a script, and `venv` is the module that creates the virtual environment.

`<path/to/venvs/nameofenv>` is the path to the directory where the virtual environment will be created, and `nameofenv` is the name of the virtual environment. Just above, we have created the directory `venvs`. We can use any name for the virtual environment, e.g., `venv_lecture2` or anything else.
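
The same operation is available from Python through the standard `venv` module, which can be handy for looking at what an environment actually contains. The sketch below creates a throwaway environment in a temporary directory; the name `venv_demo` is hypothetical, and `with_pip=False` is used only to keep the example fast (the command line version installs `pip` by default).

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as venvs_dir:
    env_dir = Path(venvs_dir) / "venv_demo"

    # Programmatic counterpart of `python -m venv <path>`, without pip.
    venv.create(env_dir, with_pip=False)

    # Top-level content of the environment (names vary slightly by platform).
    top_level = sorted(entry.name for entry in env_dir.iterdir())

    # pyvenv.cfg records which Python installation the environment is based on.
    cfg_text = (env_dir / "pyvenv.cfg").read_text()

print(top_level)
print(cfg_text)
```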

To activate the virtual environment, you can use the following command:

```bash 
source <path/to/venvs/nameofenv>/bin/activate
```

In the Terminal, you should see something like this:

```bash
(nameofenv) <your-prompt>
```

Now if you run the command `pip list`, you will see the packages currently installed in your virtual environment. This should be different from the packages installed in your *global* Python environment.

To deactivate the virtual environment, you can use the following command in the Terminal:

```bash
deactivate
```

This will return you to your global Python environment.
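
From within Python, you can check whether the running interpreter belongs to a virtual environment by comparing `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


active = in_virtualenv()
print(f"virtual environment active: {active}")
print(f"sys.prefix      = {sys.prefix}")
print(f"sys.base_prefix = {sys.base_prefix}")
```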


<div class="exercise-box">
**Exercise:** Working with Virtual Environments

1. Create a new virtual environment named 'venv_lecture2' using Python 3.11

    Hint: Use `python3.11 -m venv venvs/venv_lecture2`

2. Activate the virtual environment

    Hint: Use `source venvs/venv_lecture2/bin/activate`

3. Use the `tree` command to see the structure of your virtual environment

    Hint: Use `tree venvs/venv_lecture2`

4. Check the Python version in your virtual environment

    Hint: Use `python --version`

5. List all packages installed in your virtual environment

    Hint: Use `pip list`

6. Install a new package (e.g., `numpy`) in your virtual environment

    Hint: Use `pip install numpy`

7. Use `tree` again to see how the structure has changed after installing numpy

    Hint: Use `tree venvs/venv_lecture2`

8. List the packages again to confirm the installation

    Hint: Use `pip list`

9. Deactivate the virtual environment

    Hint: Use `deactivate`
</div>


**Important**: To create an environment with a specific Python version, invoke the interpreter of that version when creating the environment, e.g., `python3.11 -m venv <envdir>/<nameofenv>`. That version must already be installed on your system.



### Fresh and non-fresh environments

You can create a virtual environment while allowing it to access packages installed in your global environment. To do so use the `--system-site-packages` option:

```bash
python -m venv --system-site-packages <path/to/venvs/nameofenv>
```
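
The difference between a fresh and a non-fresh environment is recorded in the `pyvenv.cfg` file at the root of the environment. The sketch below creates one of each in a temporary directory and compares the relevant setting; the names `fresh_env` and `shared_env` and the helper `read_cfg` are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def read_cfg(env_dir):
    """Parse the simple `key = value` lines of pyvenv.cfg into a dict."""
    cfg = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


with tempfile.TemporaryDirectory() as venvs_dir:
    fresh_dir = Path(venvs_dir) / "fresh_env"
    shared_dir = Path(venvs_dir) / "shared_env"

    # with_pip=False only keeps the example fast.
    venv.create(fresh_dir, with_pip=False)
    venv.create(shared_dir, with_pip=False, system_site_packages=True)

    fresh_setting = read_cfg(fresh_dir)["include-system-site-packages"]
    shared_setting = read_cfg(shared_dir)["include-system-site-packages"]

print(fresh_setting, shared_setting)
```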

<div class="exercise-box">
**Exercise:** Create an environment with access to your global packages, and use `pip list` to check that you can see the packages in your global environment.
</div>


This can get confusing though because some packages installed in your global environment may be incompatible with the new packages you will want to install in your virtual environment.







### Managing Python versions

The Python language evolves fast. Currently, a new version is released every year.

You can consult the status of Python versions [here](https://devguide.python.org/versions/) and read about Python development on the same website.

You can install new Python versions on your machine using tools like Anaconda, Miniconda (i.e., `conda`) or Homebrew (on macOS, `brew`).

<div class="exercise-box">
**Exercise:** Checking Python Versions on CSD3 with Miniconda

1. Log in to CSD3 if you haven't already

2. Load the Miniconda module
    Hint: Use `module load miniconda/3`

3. List the available Python versions
    Hint: Type `which python` and then list the files in the directory returned by this command.

4. Now list the versions available by default: `ls /usr/bin/python*` and try to start a Python session with one of them.
    Hint: Just type `python3.11` (or another version number) and press Enter.

What is the difference between the versions available by default and the one loaded by the Miniconda module?
</div>


### Jupyter and IPython kernels

Jupyter provides an interactive environment to run Python code.
It typically opens in a web browser. The most common feature is the notebook, which corresponds to 
what you get in Colab.

Jupyter is an evolution of IPython, "interactive Python". IPython is an interactive shell that supports more features than the standard Python shell. Type `ipython` in your terminal and you will understand why.

We don't generally use IPython, but instead use Jupyter which is based on it. 

For a Jupyter notebook to have access to your Python environment, you need to create the
corresponding kernel. One kernel is created for each environment.

Assume your virtual environment is called `venv_lecture2`, that you have activated it, and that the `ipykernel` package is installed in it (`pip install ipykernel`).

To create the kernel for this environment, you can use the following command:

```bash
python -m ipykernel install --user --name venv_lecture2 --display-name "Python (venv_lecture2)"
```

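This creates a kernel specification file that tells Jupyter how to start the Python interpreter of this environment. You can list the installed kernels and their locations with `jupyter kernelspec list`. Typically this file is in the following location, where `<kernelname>` is the value passed to `--name`:

```bash
<path/to/Jupyter>/kernels/<kernelname>/kernel.json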
```
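
To give an idea of what to expect in `kernel.json`, the sketch below builds a dictionary with the fields such a file typically contains and prints it as JSON. It is an illustration, not the exact output of `ipykernel`: here `sys.executable` stands in for the path to the environment's interpreter, and the real file may contain additional fields.

```python
import json
import sys

# Illustrative kernel specification; the real file is written by ipykernel.
kernel_spec = {
    # Command Jupyter runs to start the kernel; {connection_file} is a
    # placeholder that Jupyter fills in at launch time.
    "argv": [sys.executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    # Name shown in the Jupyter interface (the --display-name value).
    "display_name": "Python (venv_lecture2)",
    "language": "python",
}

kernel_json = json.dumps(kernel_spec, indent=1)
print(kernel_json)
```

The important part is the first entry of `argv`: it is the interpreter of the environment, which is how a notebook ends up using the packages installed there.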

<div class="exercise-box">
**Exercise:** Create a new kernel from a new environment. Find the `kernel.json` file and look at it using `vim`.
</div>


You can then start a Jupyter notebook by typing `jupyter-lab` in your terminal.



## Symbolic links and aliases

When you load or source an environment or install new software, aliases and symbolic links may be created.



### Symbolic links

A symbolic link is a file that is a **pointer** to another file or directory. To create a symbolic link, you can use the `ln -s` command. For example:

```bash
ln -s /path/to/target /path/to/link
```

To see which file a symbolic link points to, you can use the `ls -l` command. For example:

```bash
ls -l /path/to/link
```
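
The same operations can be carried out from Python, which is convenient in setup scripts. The sketch below works in a temporary directory on a POSIX system; the file names `target.txt` and `link.txt` are hypothetical.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "target.txt"
    link = Path(tmp) / "link.txt"
    target.write_text("hello\n")

    # Equivalent of: ln -s target.txt link.txt
    os.symlink(target, link)

    is_link = link.is_symlink()          # True for a symbolic link
    points_to = os.readlink(link)        # path stored in the link, like `ls -l` shows
    content_via_link = link.read_text()  # reading the link reads the target

print(is_link, points_to, content_via_link)
```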

For instance, on CSD3, there are symbolic links to useful job submission files. (A job is a heavy computation that is submitted to the batch system.)

### Aliases

An alias is a **shortcut for a command**. To list the aliases currently set in your bash session, you can use the `alias` command.

```bash
alias
```

For instance on CSD3, by default, we have

```bash
alias vi='vim'
```

With this alias, typing `vi` is equivalent to typing `vim` and opens the vim editor.



## Workflow with environments   

You create a Python environment for each project.
You need to store the information about your projects and their environments in a well organised manner.

A good way to stay organised and keep track of what you are doing is to create a GitHub repository for each project, store your code in it, and include a README explaining what to set up and how to do it.


### Bash scripts to load environments

To make your life easier, you can create bash scripts to load your environments.
One bash script per environment. 

The script updates environment variables and also activates the relevant Python virtual environment. It would look something like this:

```bash
#!/bin/bash

# Step 1: Define environment variables
export MY_VAR1="value1"
export MY_VAR2="value2"
export PATH="/my/custom/path:$PATH"
export LD_LIBRARY_PATH="/my/custom/path:$LD_LIBRARY_PATH"

# Step 2: Source the Python virtual environment
# Replace the path below with the actual path to your virtual environment's 'activate' script
source /path/to/your/venv/bin/activate

# Optional: Print a message to confirm the environment is set
echo "Environment variables are set, and the virtual environment is activated."
```

and saved as `activate_my_env.sh`.

You can then source the script to load the environment, including bash variables and the Python virtual environment:

```bash
source activate_my_env.sh
```

<div class="exercise-box">
**Exercise:** Create a bash script to load your Python environment and check that it works.
</div>


### Note on File Permissions

You may have noticed that when you list files with the `ls -alh` command, the first thing to appear on each line is a string of letters and dashes, such as `-rwxr-xr-x`.

This represents the **permissions** of the file.


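When you create a bash script, it is generally **not executable**. This does not matter when you load it with `source`, which only needs read permission, but it is required to run a script directly. To make it executable, you can use the `chmod` command. For example: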

```bash
chmod +x activate_my_env.sh
```
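
The permission string can also be read and changed from Python. The sketch below works on a throwaway file in a temporary directory (named after the script above) on a POSIX system, and shows the permission string before and after adding the execute bits.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "activate_my_env.sh")
    with open(script, "w") as f:
        f.write("#!/bin/bash\necho hello\n")

    # Start from a known state: read/write for the owner, read-only for others.
    os.chmod(script, 0o644)
    before = stat.filemode(os.stat(script).st_mode)

    # Add the execute bit for owner, group and others (like `chmod a+x`).
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(script, os.stat(script).st_mode | exec_bits)
    after = stat.filemode(os.stat(script).st_mode)

print(before, "->", after)
```

The first character of the string gives the file type (`-` for a regular file), followed by three groups of `rwx` flags for the owner, the group and everyone else.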

<div class="exercise-box">
**Exercise:** Show the permission of the bash script from above before and after making it executable with `chmod`.
</div>



