
1 Python Environments

Dimitris Tsapetis edited this page Nov 10, 2022 · 2 revisions

For a video presentation of the following content, please follow this link

Python Virtual Environments

Python Virtual Environments: A Primer - Real Python

Why virtual environments?

Python is often considered one of the simplest programming languages in terms of syntax and complexity. This is one of the main reasons why many non-technical and non-programming communities have adopted it as their go-to choice for rapid prototyping. It offers an extensive archive of modules and libraries for almost any type of problem, so users can usually find an easy solution that addresses their needs.

Easy access to a wide range of libraries has a downside in the case of Python. Specifically, managing these dependencies can quickly become a burden, since many issues can occur, ranging from dependency conflicts and overwritten system-required Python packages to reproducibility problems.

This is why virtual environments are considered the go-to solution for isolating the working environments of different projects, thus alleviating all the aforementioned issues.

System Pollution

macOS and Linux-based operating systems usually come with a preinstalled version of Python that the OS uses for internal procedures. Installing packages into the system's global Python risks interfering with system-relevant operations and thus causing system-related failures.

Dependency Conflicts

It is common for different software projects to require different versions of the exact same library. With a single global dependency management system, it is not possible to maintain projects that need different versions of a library. The reason is that installing a second version of the same library overwrites the first!

Reproducibility Issues

If you want to replicate the environment used locally to run a specific software project, it becomes increasingly difficult to track down all required packages when there is a single global Python installation. Virtual environments solve this issue by keeping the dependencies of each environment separate.

Installation Privilege Lockouts

On host computers where access is limited for security reasons, the installation of Python packages into the global Python installation is usually restricted (e.g. Rockfish HPC). Only administrators are allowed to add new packages, as this affects all users and the stability of the operating system. Creating virtual environments is a way to work around this restriction.

What is a virtual environment?

A virtual environment in Python is a self-contained folder that holds copies of, or symlinks to, the Python executable files.

Below you can see what the folder structure of a virtual environment looks like on macOS:

(for Windows and Linux, the corresponding folder structure can be found here)

It consists of three main folders:

  • bin/ contains the executable files of your virtual environment, such as your Python interpreter, as well as the activation scripts that enable your environment for use in terminals.
  • include/ is an initially empty folder that is later populated with C header files from packages with C extensions.
  • lib/ is the main reason why you create a virtual environment. It is the folder where external packages are installed so that they can be used within your virtual environment. By default, two dependencies are preinstalled, pip and setuptools (from Python 3.12 onward, only pip). This allows us to limit the number of packages available to a project to the bare minimum. As a result, global installation issues and collision errors are eliminated and complex dependency bugs are avoided.
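The folder layout described above can be inspected directly. The following is a minimal sketch, assuming only the standard-library venv module; the helper name list_env_layout and the temporary location are illustrative choices, and with_pip=False is used only to keep the example fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def list_env_layout(env_dir):
    """Create a bare virtual environment and return its top-level entries."""
    # with_pip=False skips installing pip, so nothing is downloaded or bootstrapped.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return sorted(entry.name for entry in Path(env_dir).iterdir())


# Build the environment in a throwaway directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    layout = list_env_layout(Path(tmp) / "EnvironmentName")

print(layout)
```

On macOS and Linux the listing is expected to include bin, include and lib alongside a pyvenv.cfg configuration file; on Windows the folders are named Scripts, Include and Lib instead.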

Using virtual environments

Here we will go through the basic operations used when working with virtual environments.

Creating Environments

To create a new pure Python virtual environment, the venv module of Python is used. To invoke this module, pass the -m option followed by the module name, venv. The final argument of the command is the name of the new environment to be created.

python -m venv EnvironmentName

Activating Environments

Now that we have created a new environment, we can activate it and work in a workspace that is separate from the global Python installation. To activate it, we execute the activate script that lives inside the bin folder of the newly created environment directory. Two equivalent forms are shown below: source works in shells such as bash and zsh, while . is the portable POSIX spelling of the same command.

source EnvironmentName/bin/activate
. EnvironmentName/bin/activate
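To confirm that activation worked, one option is to ask the interpreter itself. This is a small sketch that relies on the fact that, inside an environment created with venv, sys.prefix points at the environment while sys.base_prefix points at the Python installation it was created from; the function name in_virtual_environment is an illustrative choice.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("environment root:", sys.prefix)
print("inside a virtual environment:", in_virtual_environment())
```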

Installing Packages

The activation process allows us to switch workspaces between the global Python environment and the respective virtual environments. Once our context switches to a virtual environment, we can start installing our project dependencies. This is achieved by calling the pip module of Python, followed by the install option and the name of the package to be installed.

python -m pip install <package-name>

Deactivating Environments

Developers often want to switch context between virtual environments, or to return the workspace to the global Python installation. To achieve that, the deactivate command is called.

deactivate

Managing Environments

Requirement Files

The easiest way to make sure that all the required Python packages are installed along with our code is to create a requirements file. This file lists all the third-party libraries needed to execute our code, along with their versions, to ensure compatibility with the code in all future installations. In case we are unsure about the code dependencies, the following command can be used to retrieve them from our virtual environment.

pip freeze

This command only prints the dependencies to the console, but we can redirect the output to a file as follows:

pip freeze > requirements.txt
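To illustrate what such a file contains, the sketch below builds similar name==version lines from inside Python using the standard-library importlib.metadata module. It is an approximation for illustration, not a replacement for pip freeze (it does not handle editable or version-control installs), and the function name frozen_requirements is a hypothetical choice.

```python
from importlib import metadata


def frozen_requirements():
    """Return 'name==version' lines for the distributions visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in frozen_requirements():
    print(line)
```

Run inside an activated virtual environment, the output lists only the packages installed in that environment.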

Duplicating Environments

Given a requirements.txt file that contains all the project dependencies, a duplicate of a virtual environment can be created as follows:

  • Create a new virtual environment
python -m venv EnvironmentName2
  • Activate it and install all code dependencies using the requirements.txt file
source EnvironmentName2/bin/activate
pip install -r requirements.txt
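The same duplication can be scripted. The following is a sketch under stated assumptions: it uses the standard-library venv and subprocess modules, the helper names env_python and duplicate_environment are hypothetical, and the install step needs network access, so the call itself is left commented out.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir):
    """Return the path of the interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":  # Windows keeps executables under Scripts\
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def duplicate_environment(env_dir, requirements="requirements.txt"):
    """Create env_dir and install the dependencies listed in the requirements file."""
    venv.create(env_dir, with_pip=True)
    # Calling the environment's own interpreter makes activation unnecessary.
    subprocess.run(
        [str(env_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )


# Example call (downloads packages, so it is not executed here):
# duplicate_environment("EnvironmentName2")
```

Invoking pip through the environment's interpreter, rather than through whatever pip is first on the PATH, avoids accidentally installing into the global Python.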

Troubleshooting

In most cases, problems occurring in virtual environments can be fixed by checking that all required packages are correctly installed and that the correct environment is activated when running the code. The complexity of handling dependency conflicts can, however, quickly spiral out of control. If so, the usual solution is:

  • Delete the old virtual environment.
rm -r EnvironmentName/
  • Make a new empty one and activate it.
python -m venv NewEnvironment
source NewEnvironment/bin/activate
  • Install all the required dependencies using the requirements file generated above.
pip install -r requirements.txt
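As an alternative to deleting the folder by hand, the venv module can wipe and rebuild an environment in one step through its clear option. The sketch below demonstrates this in a temporary directory; the function name recreate_environment and the stale file are illustrative, and with_pip=False is passed here only to keep the demonstration fast (a real environment would normally keep pip).

```python
import tempfile
import venv
from pathlib import Path


def recreate_environment(env_dir, with_pip=True):
    """Rebuild env_dir from scratch, discarding anything it contained before."""
    # clear=True deletes the contents of an existing environment directory first.
    venv.EnvBuilder(clear=True, with_pip=with_pip).create(env_dir)


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "EnvironmentName"
    recreate_environment(env_dir, with_pip=False)

    # Simulate leftovers from a broken environment.
    leftover = env_dir / "stale_file.txt"
    leftover.write_text("left over from a broken environment")

    recreate_environment(env_dir, with_pip=False)
    stale_file_survived = leftover.exists()
    config_exists = (env_dir / "pyvenv.cfg").exists()

print("stale file survived:", stale_file_survived)
print("fresh environment created:", config_exists)
```

After rebuilding, the dependencies still have to be reinstalled from the requirements file.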

Don’ts of Virtual Environments

Python virtualenv and venv dos and don'ts

  • Don’t share virtual environment between projects

    Even when projects use a similar set of dependencies and it would seem to make sense to share one environment, separate ones should be used. Dependencies can change rapidly, which can lead directly to incompatibilities and conflicts that make either project unusable. At the same time, the savings (disk space, time, etc.) are negligible, as requirements files can easily automate the creation of environments and the installation of dependencies while keeping a clean and separate slate for each project.

  • Don’t place project files inside a Python virtual environment

    The sole purpose of a virtual environment is to hold all the Python and library dependency files required for your project to run. Project files do not belong here, as it is common to remove a virtual environment accidentally without keeping a backup of the project files. In addition, the names of custom project files might collide with the virtual environment files and lead to unexpected, hard-to-resolve errors when running your code.

  • Don’t forget to activate your virtual environment

    It is common for users to create a virtual environment but forget to activate it. This results in Python packages being installed into the global Python environment and in version conflicts when running the code. To address this, the use of an Integrated Development Environment (IDE) such as PyCharm or VS Code is suggested, as they can be set up to automatically activate a different environment for each project.

  • Don’t use >= for package versions inside a requirements.txt file

    When creating a requirements file, the developer should always specify the exact version number of each package with ==. This guarantees that, even if future versions break backwards compatibility, the requirements file will always recreate the same environment without breaking our code.
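A quick way to enforce the last rule is to scan the requirements file for lines that are not pinned with ==. The following is a simplified sketch: the function name find_unpinned and the sample entries are hypothetical, and the parsing deliberately ignores more advanced requirement syntax such as URLs or environment markers.

```python
def find_unpinned(requirement_lines):
    """Return the requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical file contents, for illustration only.
sample = [
    "numpy==1.23.4",
    "scipy>=1.6.0   # allows any newer release",
    "matplotlib",
    "# a comment line",
]
print(find_unpinned(sample))  # ['scipy>=1.6.0', 'matplotlib']
```

For a real file, the lines can be read with open("requirements.txt").read().splitlines() and passed to the same function.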

Anaconda

Getting started with conda - conda 22.9.0 documentation

Installing Anaconda

The following instructions are specific to macOS, but detailed installation instructions for all platforms can be found on the following webpage:

Installation - conda 22.9.0.post7+b83abc22d documentation

The steps for the MacOS installation are the following:

  1. Download the installer file for Mac using the following link: https://conda.io/projects/conda/en/latest/user-guide/install/index.html
  2. There are two different alternatives:
    1. Anaconda

      To install Anaconda, just double-click the downloaded .pkg file and follow the installation wizard instructions. For older Anaconda installations, make sure that the conda init command is run so that the Anaconda activation is appended to your terminal configuration file. To verify that the installation succeeded, open a terminal window and run the conda list command, which lists the packages installed in the (base) conda environment.

    2. Miniconda

      To install Miniconda, run the following command in a terminal window from the folder that contains the downloaded installer:

      bash Miniconda-latest-MacOSX-x86_64.sh

Updating Anaconda

The steps to update the Anaconda/Miniconda distribution you have installed locally are the following:

  1. Open a terminal window
  2. Navigate to the anaconda directory
  3. Run the command conda update conda

Removing Anaconda

The steps to uninstall the Anaconda/Miniconda distribution you have installed locally are the following:

  1. Open a terminal window

  2. Remove the entire Miniconda install directory with

    rm -rf ~/miniconda
  3. OPTIONAL: Edit the ~/.bash_profile to remove the Miniconda directory from the PATH environment variable.

  4. Remove the anaconda hidden folders that may have been created in the home directory by running:

    rm -rf ~/.condarc ~/.conda ~/.continuum

Before you start

In the macOS version of Anaconda, all basic operations are performed from the command line. If the Anaconda distribution is installed correctly, the user sees one of the following prompts when opening the terminal:

1.  (base) user@computer current_directory %
2.  (base) [user@computer current_directory]$ 

The active conda environment is always displayed inside the parentheses. In our case it is (base). If the environment does not appear and this is the first time conda has been installed, then the following command should be run.

conda init

This will add anaconda to your environment path and ~/.bashrc file so any new terminal sessions auto-activate conda.

To display the installed conda version, use the command:

conda --version

This will display the version in a message similar to the following: conda 4.5.1

Managing Environments

Similar to the global Python installation, Anaconda offers a default environment (base), created at installation. If no other conda virtual environments are created, this will be the default choice. To create a new virtual environment, the create option of the conda command is used, followed by the name of the environment, which is specified with the --name option.

conda create --name EnvironmentName

To activate the environment that we just created, there is no need to point to the activate file and directory that contains it. Conda identifies the different virtual environments by name.

conda activate EnvironmentName

To list all environments created so far, either of the following commands can be used. Both print the name and path of every environment and mark the currently active one with an asterisk.

1. conda env list
2. conda info --envs
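When the list of environments is needed inside a script, conda can also print it as JSON via the --json flag. The sketch below parses that output; the assumed shape of the JSON (a single "envs" key holding a list of environment paths), the helper names and the sample paths are assumptions for illustration and should be checked against your conda version.

```python
import json
import shutil
import subprocess
from pathlib import Path


def parse_env_paths(json_text):
    """Map each environment's folder name to its path."""
    # Assumed shape: {"envs": ["/path/to/install", "/path/to/install/envs/name", ...]}
    paths = json.loads(json_text).get("envs", [])
    # Note: the (base) environment appears under its install folder name, not "base".
    return {Path(p).name: p for p in paths}


def list_conda_envs():
    """Query conda for its environments; returns {} when conda is not on the PATH."""
    if shutil.which("conda") is None:
        return {}
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_env_paths(result.stdout)


# Hypothetical output, used here so the example runs without conda installed.
sample = '{"envs": ["/home/user/miniconda3", "/home/user/miniconda3/envs/EnvironmentName"]}'
print(parse_env_paths(sample))
```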

In case you need to switch environments or go back to the default one, you can activate it by running the activate command followed by the environment name, as shown below:

conda activate base

To deactivate the current conda environment, run:

conda deactivate

CAUTION: When creating an Anaconda environment with a custom Python version, the python option must be set equal to the required version. On Rockfish, it has been observed that the Python version is set correctly when the environment is first created, but that exiting the terminal and reactivating the environment results in the Python version defaulting to that of the Anaconda installation, which might lead to package incompatibilities. To ensure that the correct Python version is retained throughout the environment's life, make sure that ipython is also listed when creating a new Python environment.

‼️ **ROCKFISH:** Creating an Anaconda environment with custom python version : [https://stackoverflow.com/a/56713819/5647511](https://stackoverflow.com/a/56713819/5647511)
conda create -n myenv python=3.3.0 ipython
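Because the interpreter version can silently change as described above, it can help to check it at the start of a script. This is a minimal sketch: the function name python_matches is hypothetical, and the expected tuple (3, 3) simply mirrors the python=3.3.0 example and should be replaced with the version your project needs.

```python
import sys


def python_matches(expected):
    """True if the running interpreter's version starts with the expected tuple."""
    expected = tuple(expected)
    return sys.version_info[: len(expected)] == expected


# Hypothetical requirement mirroring the conda create example (major, minor).
expected = (3, 3)
if not python_matches(expected):
    running = sys.version.split()[0]
    wanted = ".".join(str(part) for part in expected)
    print(f"warning: running Python {running}, expected {wanted}")
```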

Managing Packages

The Anaconda distribution comes with many preinstalled packages useful for scientific computing, data science, machine learning and similar applications. To check whether a package is available from the configured channels, and in which versions, conda provides the search option:

conda search <package-name>

To install a new package, the process is similar to pip, except that the conda command is used instead. An example of how to install the beautifulsoup4 package is shown below.

conda install beautifulsoup4

If the package belongs to a specific channel other than the default Anaconda one, then the -c option must be provided, followed by the name of the channel. For example, the conda-forge build of UQpy can be obtained with the command:

conda install -c conda-forge uqpy

To see all the packages installed in the currently active environment, the conda list command can be invoked. This command also provides detailed information about the channel and version of each installed package and is the go-to tool when debugging dependency conflicts.

conda list
⚠️ CAUTION: When using Anaconda virtual environments, try to avoid using `conda install` and `pip install` interchangeably. This can lead to a number of dependency conflict errors that are hard to resolve. Instead, follow the best practices below.

Best Practices

  • Install as many requirements as possible using conda, then use pip.
  • pip should be run with the --upgrade-strategy only-if-needed option (the default).
  • Do not use pip with the --user argument, and avoid all-users installations.

For more details please refer to the following site:

Anaconda | Using Pip in a Conda Environment

🚨 If Anaconda does not find a package installed in your environment, it will search for it in others, with the usual fallback being (base). Even if your environment appears to be activated, make sure that it actually contains the package your code is looking for.
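One way to verify where a package is actually coming from is to ask Python for the file it would import and compare that path with the active environment. The sketch below uses the standard-library importlib.util.find_spec; the function name module_location and the second package name are placeholders chosen for illustration.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a module would be imported from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return None if spec is None else spec.origin


print("active environment:", sys.prefix)
# "json" ships with Python; the second name is a placeholder for a missing package.
for name in ("json", "uqpy_hypothetical_missing_package"):
    print(name, "->", module_location(name))
```

If the reported path lies outside the folder printed as the active environment, the package is being picked up from somewhere else.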

Managing Channels

Anaconda also keeps track of the different channels from which your packages may originate. If you want conda to traverse the available channels automatically when searching for a package, or you don’t want to pass the channel option to every conda install command, you can add your new package channel to the conda channel list.

⚠️ Anaconda searches through the channels for your package sequentially, meaning that it will install the package from the first channel in which it is found. If the package you want comes from a channel that is lower in the `channels list`, then that channel must be specified explicitly. SEQUENCE MATTERS HERE!

To add a channel, the developer can use the config option of the conda command, followed by one of the --append channels, --prepend channels or --add channels options and the channel name. These options are not all equivalent: --append places the new channel at the end of the list (lowest priority), while --prepend and --add both place it at the top (highest priority).

To add a channel at the end of the list:

conda config --append channels new_channel

If a new channel needs to be at the top of the search list, the --prepend channels option, or equivalently --add channels, should be used.

conda config --prepend channels new_channel
									OR
conda config --add channels new_channel
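The effect of channel order can be pictured with a toy model. The sketch below is not how conda is implemented (the real solver also weighs package versions and the channel_priority setting); it only illustrates the "first channel in the list wins" idea, using hypothetical channel contents and a hypothetical resolve_channel function.

```python
def resolve_channel(package, channels, index):
    """Return the first channel, in list order, that provides the package."""
    for channel in channels:
        if package in index.get(channel, set()):
            return channel
    return None


# Hypothetical channel contents, for illustration only.
index = {
    "defaults": {"numpy", "beautifulsoup4"},
    "conda-forge": {"numpy", "uqpy"},
}

channels = ["defaults"]
channels.append("conda-forge")  # appended: searched last
first = resolve_channel("numpy", channels, index)

channels.remove("conda-forge")
channels.insert(0, "conda-forge")  # prepended: searched first
second = resolve_channel("numpy", channels, index)

print(first)   # defaults
print(second)  # conda-forge
```

The same package resolves to a different channel depending only on where the channel sits in the list, which is why the position chosen when adding a channel matters.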

Conda CheatSheet

conda-cheatsheet.pdf