# Virtual Environments

In this section we will cover virtual environments and how to use them as part of your standard data analysis workflow. Any time you create a new project, your first two steps should be to create a **git repository** and a **virtual environment**.

By the end of this section you will be able to:

* Understand why environments are important
* Explain how environments isolate your code
* Create new virtual environments for your projects
* Use `venv` to manage your Python environments
* Make your environment reusable for others


This section uses the `venv` module to manage virtual environments. While this is the officially recommended way to manage environments in Python, it is not the only one. [Conda]() is an alternative that is frequently used by the data science community. This tutorial does not lay out the differences between the two, so you should determine for yourself whether conda or `venv` is a better fit for your needs.

## Installing `venv`

Most Python releases come with the `pip` and `venv` modules already installed. To confirm this, follow [this guide](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) and make sure `venv` is properly configured on your machine.

## What is a Virtual Environment?

Virtual environments are a common technique for isolating programs so that code is portable, repeatable, and easier to maintain. A virtual environment will contain an installation of Python and all of the necessary packages to run the code in your project. This has a number of benefits:

* It isolates your programs
* It is easier to manage dependencies in your program
* It makes your code repeatable by other programmers

### Isolation

When you install Python, a global version is created and any packages you install will be bound to that version of Python. Because packages, and Python itself, are constantly being updated, it becomes challenging to ensure that all of your packages are compatible with one another and with your current installation of Python.

For example, suppose that in March 2019 you wrote some code that used Python 3.7 and Pandas 0.24 and never touched that code again. If you continued to upgrade Python and Pandas regularly, you might return to this project in 2023 to find the code unusable. You could downgrade your Python and Pandas versions, but this may render your more recent projects unusable. Virtual environments help avoid this problem by giving each project its own Python installation as well as its own library of packages. This means you could return to your project from 2019, activate its virtual environment, and use the versions of Python and Pandas you installed four years ago.

### Portability

If you want to share your code with others, it is important that it works! One of the most common issues when sharing code is that code that runs on one programmer's machine may not run on another's. This often happens because the environment the code needs was never specified when it was shipped. Virtual environments allow you to document the environment in which your code was run and share those requirements with others. This way, when you share your code as a repository, other programmers can install the correct versions of the Python packages and any dependencies that may be required.
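One way to picture "documenting the environment" is as a list of every installed package pinned to an exact version. In practice this list is usually produced with `pip freeze > requirements.txt` and restored on another machine with `pip install -r requirements.txt`. The sketch below only approximates that output using the standard library's `importlib.metadata`; the function name `pinned_requirements` is an illustrative assumption, not part of `pip` or `venv`.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for packages visible to this interpreter."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Roughly comparable to what `pip freeze` prints for the active environment.
    print("\n".join(pinned_requirements()))
```

Run inside an activated virtual environment, this lists only the packages installed in that environment, which is what makes the resulting file a useful description of the project.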

## How to use `venv`

These guidelines may differ slightly depending on your operating system, but the general workflow will be the same.

Upon starting a project you should always initialize a virtual environment within your working directory. For example, suppose we are creating a project called "new_project". We would create a directory for this project and initialize the virtual environment within this folder.

*Note: You can name your virtual environment whatever you want, but in practice most people just use `venv` for simplicity.*

```{bash}
cd ~/Projects/new_project
python3 -m venv venv
```

You should then see a folder called `venv` within your `new_project` directory.

```
.
└── venv
    ├── bin
    ├── include
    ├── lib
    └── pyvenv.cfg
```
The `venv` folder contains a few subdirectories. You will almost never need to modify the contents of `venv` directly. However, it is good to know that the `bin` folder (named `Scripts` on Windows) contains your binaries, or executable files, including the environment's `python` and `pip` commands, and the `lib` folder contains your library of installed packages.
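If you would like to look inside an environment without touching your project, the same module that powers `python3 -m venv` can also be called from Python. The following is a minimal sketch, assuming a throwaway temporary directory; the helper name `create_and_inspect` is hypothetical, and `with_pip=False` is used only to keep the example fast (the command-line form installs `pip` as well).

```python
import tempfile
import venv
from pathlib import Path


def create_and_inspect(parent_dir):
    """Create an environment named 'venv' inside parent_dir and report its layout."""
    env_dir = Path(parent_dir) / "venv"
    # Skipping pip keeps this sketch quick; `python3 -m venv venv` installs pip too.
    venv.create(env_dir, with_pip=False)

    # Names of the top-level entries, e.g. bin (or Scripts), include, lib, pyvenv.cfg.
    top_level = sorted(p.name for p in env_dir.iterdir())

    # pyvenv.cfg is a small "key = value" text file describing the environment.
    config = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return top_level, config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        top_level, config = create_and_inspect(tmp)
        print(top_level)
        # "home" records the directory of the Python the environment was created from.
        print(config.get("home"))
```

The `home` entry in `pyvenv.cfg` is a useful reminder that the environment is tied to the Python installation that created it.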

### Activating your virtual environment

Once you have created your virtual environment within your working directory, you can activate it using the command

`source venv/bin/activate`

or, on Windows,

`venv\Scripts\activate`

You should notice that your command prompt has changed to include `(venv)`. This lets you know you have successfully activated the environment.

Now, whenever you install a package using `pip`, it will be installed directly into the environment's `lib` folder, and when you run `python` it will use the installation of Python in your virtual environment rather than your global version of Python.
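If you are ever unsure which Python you are running, you can ask the interpreter itself. The sketch below relies on the documented behaviour that, inside a virtual environment, `sys.prefix` differs from `sys.base_prefix`; the function name `in_virtual_environment` is just an illustrative choice.

```python
import sys


def in_virtual_environment():
    """Return True when this interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Environment:", sys.prefix)
    print("Virtual environment active:", in_virtual_environment())
```

Running this before and after `source venv/bin/activate` should show the interpreter path switching from your global Python to the one inside the `venv` folder.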

To deactivate your virtual environment type `deactivate`.

## Using a virtual environment for this course

Once you have the hang of creating and activating virtual environments, be sure to create a virtual environment in the main directory of this repository. We'll be installing a number of packages in the coming chapters.