# Virtual Environments, Packages, Repositories - Conda Edition

### Extra Readings
[Real Python - What is a Virtual Environment?](https://realpython.com/python-virtual-environments-a-primer/#what-is-a-python-virtual-environment) - This is a good overview, but it focuses on the PyPI / pip ecosystem

[DataQuest - Python's Pip Vs. Anaconda Vs. MiniConda](https://www.dataquest.io/blog/python-vs-anaconda/) - Further clarification on the differences between the package managers / ecosystems

[Python Virtual Environments tutorial using Virtualenv and Poetry](https://www.linkedin.com/pulse/python-virtual-environments-tutorial-using-virtualenv-dmitriy-zub) - This is a good overview, but it focuses on the PyPI / pip ecosystem

*The first three diagrams in this notebook are taken from the above.

---

## Why are we leaving Rhino and Jumping Back into VSCode?

Python has two major characteristics that have kept it consistently at the [top of the world's most used programming languages](https://www.tiobe.com/tiobe-index/).

1. Its English-like syntax is relatively easy to understand and learn.
2. Its community has developed a huge ecosystem of libraries that extend its base functionality. In particular, Python's ecosystem of libraries has made it the de facto language for machine learning, data science, and scientific computing.

Rhino allows us to install *some* external libraries, but struggles with more complex data analysis libraries. This is especially true for libraries like *geopandas* that make extensive use of the notebook interface to plot intermediate data analysis and maps.

We are going to jump back into VSCode for an introduction to *geospatial analysis*. We will later learn how to bring this data back into Rhino.

### And Why Virtual Environments and all these setup steps?

In order to best use powerful analysis libraries, we need to ensure that our programming environment is properly set up. If we don't, we'll spend the rest of the semester troubleshooting!
<br>
<br>

---

## Reviewing Terms - Libraries, Packages, Repositories, and Package Managers

***In a nutshell, we download packages, which contain libraries, using a package manager, from repositories in order to add extra functionality to our code.***

#### Libraries
* We've spent a good chunk of this course working with the *library* RhinoCommon. 
* A library is just a collection of code that we can use within our own scripts.
* We can make use of a library with the statement `import library_name` at the beginning of our code.
* Some libraries are included with Python. For example, `math` or `random`. 
* Other libraries like `Rhino` (RhinoCommon) are included within a certain distribution of Python. `Rhino` comes packaged with the copy of Python included with Rhino's Grasshopper.
* Finally, other libraries need to be downloaded and installed before we can use them. This leads us to our next point...
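
As a small illustration of the first few bullet points, the sketch below imports two libraries that ship with Python and calls one function from each. The list of materials is made up for the example.

```python
# Both libraries ship with Python, so no installation is needed.
import math
import random

# Use a function from the math library: the long side of a 3-4-5 triangle.
hypotenuse = math.hypot(3, 4)
print(hypotenuse)

# Seed the random library so repeated runs make the same pick,
# then choose one item from a (made-up) list of materials.
random.seed(1)
materials = ["plywood", "acrylic", "cardboard"]
choice = random.choice(materials)
print(choice)
```

Libraries that are not included with Python use exactly the same `import` statement, but only after they have been installed.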

#### Packages
* When a library and/or other code is packaged to be distributed online, we call it a ***package***.
* As we've seen with the [rectangle-packer](https://pypi.org/project/rectangle-packer/), which we used to pack laser cut pieces into a rectangle, functionality that we need for our code often doesn't come packaged with Python, but rather needs to be installed after the fact.
* Once we've installed a package, we can reference the libraries and code contained within. We use the same statement, `import library_name`.
* We usually search online to find the functionality / packages we need, but then install them via the command line in VSCode or with the line `# r:packagename` at the beginning of our GH Python code.
* Packages are stored online in repositories which leads us to our next point ...

#### Repositories
* Rectangle-packer also introduced us to the most commonly used Python *repository*, [PyPI](https://pypi.org/). 
* A repository is simply a collection of code or packages hosted online.
* We can browse the repositories online through our web browser but will ultimately install packages contained within using the command line, which leads us to our next point...

#### Package Managers
* Package managers are simply tools that allow us to download packages from repositories.
* They are usually *CLI* (command line interface) tools, i.e. they are accessed by typing commands in the terminal.
* For packages hosted on [PyPI](https://pypi.org/), we use the package manager *pip* and the CLI command `pip install <package_name>`.
* It is essential for regular Python users to be familiar with `pip`, but for this course, we will use the repository [Anaconda](https://anaconda.org/anaconda/repo). 
* For packages hosted on the Anaconda repo, we will use the package manager *conda*, which is installed as part of *MiniConda*, and the CLI command `conda install <package_name>` to install packages.
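
Whichever package manager does the installing, we can ask Python itself whether a library is available. Below is a minimal sketch using only the standard library. The helper name `describe_package` is made up for this example, and the last package name is deliberately fake to show the "not installed" case.

```python
from importlib import metadata, util


def describe_package(import_name, dist_name=None):
    """Report whether a library can be imported and, if known, its installed version."""
    # The name used with `import` and the name used to install can differ,
    # so the install name can be passed separately.
    dist_name = dist_name or import_name
    # find_spec returns None when Python cannot locate the library.
    if util.find_spec(import_name) is None:
        return f"{import_name}: not installed"
    try:
        return f"{import_name}: version {metadata.version(dist_name)}"
    except metadata.PackageNotFoundError:
        # Libraries that ship with Python (e.g. math) have no package metadata.
        return f"{import_name}: available (no package metadata)"


print(describe_package("math"))
print(describe_package("geopandas"))
# Hypothetical name, used only to show the "not installed" case.
print(describe_package("surely_not_a_real_package_123"))
```

The answer depends on which copy of Python runs the code, which is exactly why the rest of this notebook cares so much about environments.
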
<br>
<br>

---

## Installing a Package from a Repository - The Wrong Way

The below steps show you how to install a package on your system-wide copy of Python. 

*Note: this is for demonstration purposes only. Don't follow along.*

*Extra Note: Pip is not wrong! Pip is great! I'm only using it here because it is the default, and therefore the easiest to use wrong.*

### Open Your Terminal
To open a new terminal in VSCode, press `Ctrl + Shift + ~`. To reopen a previously open terminal, or to hide the current one, press `Ctrl + ~` (don't hold Shift). You can also access the terminal from the menu bar at the top of VSCode - select `Terminal --> New Terminal`.

Once you've opened the terminal, you should see something like this at the bottom of your screen:

<img src="img\terminal.png" width="800"> 

### Install the Package

You can now install a package to your system-wide copy of Python by simply typing

```bash
pip install <package_name>
```

This CLI command uses the *package manager* pip to download the package from PyPI and install the contained code / libraries in your global copy of Python. 


If you install a package this way, you will be able to import the contained library into your Python code. That's it! Now let's see why this is not a best practice.
<br>
<br>

---


## Keeping Track of Everything - Virtual Environments and Dependencies

### The Perils of Installing Packages 'The Wrong Way'

Refer back to the first class's lecture notes. You will see that we installed *Python, the program*. We have been using this single, system-wide copy of Python to run our code whenever we have been working in VSCode. This has been working great for us so far, but we also haven't installed any external packages. Let's look at a scenario to understand why we need to use something called *virtual environments* when installing packages.

### Scenario 1 - Installing Packages to our System Wide Copy of Python
Assume we keep working like this - with the one copy of Python running all our code. Also assume we write code in July. This code we write in July needs some extra functionality so we decide to install the package `extra_stuff`. To do so, we open our terminal and install the most recent copy of `extra_stuff` with the command:
```bash
pip install extra_stuff
```

***When we run this command, pip installs `extra_stuff` globally. That is, our system-wide version of Python can now reference this particular version of `extra_stuff`.***

We can now happily import `extra_stuff`'s functionality into our code using the statement `import extra_stuff`.

Now, let's assume we start writing another project in August. For this, we want to use the package `cool_things`. Again we install `cool_things` using `pip` and happily write our code.

***THIS IS WHERE THE BAD TIMES BEGIN***

Unbeknownst to us, `cool_things` depends on an older version of `extra_stuff`. In the process of installing `cool_things`, we inadvertently overwrote the copy of `extra_stuff` needed to run the code we wrote in July. When we go back to run July's code, we will get an error.

As we continue to code like this, and install packages willy-nilly, we will continue to break more code, and dig a deeper and deeper dependency hole.

The below illustration is a depiction of our code base in this scenario. Note the flames.
<img src="img\global.png" width="700"> 
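
To make the failure concrete, here is a toy model of Scenario 1. It does not call pip at all. It simply treats the system-wide Python as one shared dictionary of installed packages. The `install` helper and all version numbers are hypothetical.

```python
# A toy model of a single, system-wide Python: one dict shared by every project.
# Package names and version numbers are hypothetical.
global_python = {}


def install(environment, package, version, requires=None):
    """Record a package in an environment, replacing any version already there."""
    for dependency, dependency_version in (requires or {}).items():
        environment[dependency] = dependency_version
    environment[package] = version


# July: the first project installs the latest extra_stuff.
install(global_python, "extra_stuff", "2.0")
july_needs = {"extra_stuff": "2.0"}

# August: cool_things pulls in an older extra_stuff as a dependency.
install(global_python, "cool_things", "1.0", requires={"extra_stuff": "1.5"})

# July's code now finds a different version than the one it was written for.
july_still_works = all(global_python.get(p) == v for p, v in july_needs.items())
print(global_python)
print("July's project still works:", july_still_works)
```

Because there is only one dictionary, the August install silently replaces the version July relied on.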


### Scenario 2 - Installing Packages Using Virtual Environments

In this less stressful scenario, we will make use of ***virtual environments***. 

A virtual environment is simply a copy of Python, plus the libraries a project requires, installed within that project's folder. Whenever we work on the project, we use that specific version of Python alongside those specific versions of the libraries. We call the specified Python version and specified libraries ***dependencies***. This system has the added benefit that, when we share our code, we can easily indicate the dependencies needed for our project to run.

In this scenario, in July, we do the following steps:

* Create a Virtual Environment
* Activate the Virtual Environment
* Install packages in the Virtual Environment
* Run our code within the Virtual Environment

Here's the same info in a nice illustration.

<img src="img\overview.png" width="700"> 

In August we do the same process for our new project. It doesn't matter if the projects share dependencies as *packages are installed within the copy of Python contained within the virtual environment.* This saves us from the ever-deepening hole of dependency errors!

The below illustration is a depiction of two peacefully coexisting coding projects.
<img src="img\local.png" width="700">
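
The same idea can be sketched as a toy model: instead of one shared dictionary of installed packages, each project gets its own. Nothing here calls conda or pip. The `install` helper, the project names, and the version numbers are all hypothetical.

```python
# A toy model of virtual environments: each project folder gets its own dict.
# Project, package, and version names are hypothetical.
environments = {"july_project": {}, "august_project": {}}


def install(environment, package, version, requires=None):
    """Record a package in one environment, leaving all other environments alone."""
    for dependency, dependency_version in (requires or {}).items():
        environment[dependency] = dependency_version
    environment[package] = version


# July's project installs the latest extra_stuff in its own environment.
install(environments["july_project"], "extra_stuff", "2.0")

# August's project installs cool_things, which needs an older extra_stuff.
install(
    environments["august_project"],
    "cool_things",
    "1.0",
    requires={"extra_stuff": "1.5"},
)

# Each project keeps the versions it was written for.
for project, packages in environments.items():
    print(project, packages)
```

Both versions of `extra_stuff` now exist side by side, one per project, so neither install can break the other.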

If you follow these best practices, you will prevent A LOT of headaches in the future.
<br>
<br>

---

## Anaconda - Managing Packages and Virtual Environments

<img src="img\anaconda_logo.png" width="400"> 

In this course, we will not use the default package manager, `pip`, or the default virtual environment manager, `venv`. Rather, we will use the managers included in a program called ***Anaconda*** - more specifically, a bare-bones version of Anaconda called ***MiniConda***.

### Why Anaconda?

[Anaconda](https://en.wikipedia.org/wiki/Anaconda_%28Python_distribution%29) is a Python distribution built to simplify package and project management for coders working in the fields of data science, scientific computing, and machine learning. Popular libraries like `pandas`, `numpy`, `scikit-learn`, etc. are easier to install and manage. Of particular interest to this course, the geospatial analysis library [GeoPandas](https://geopandas.org/en/stable/) is approximately 999999999X easier to install with conda than with the default `pip` package manager. Using conda, we get access to powerful spatial analysis functionality like this directly within our Python notebooks:

<img src="img\biodiv.png" width="500"> 
<br>
<br>

### Installing MiniConda
I've included two installation notebooks in this folder.
* For Windows installation, follow the steps in `windows_conda_install.ipynb`.
* For Mac installation, follow the steps in `mac_conda_install.ipynb`.

<br>
<br>

---


## Setting up Our Virtual Environment

By default, MiniConda creates virtual environments in its installation folder. We want to avoid that default behaviour and create our virtual environment *within* our project folder. This just keeps our projects nice and organized. 

### Step 0. Choosing Our Folder Location

I found that saving your project folder in a path that contains spaces leads to extremely long conda download times and other frustrating errors. For example, if you are on Windows and your project folder is in:
```
C:\google_drive\My Drive\Arch_python
```
the space in `My Drive` is going to cause a world of problems.

You are better off ensuring that you work in a folder with no spaces. You could set up a folder like this instead:
```
C:\documents\Arch_python
```
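
If you are unsure whether a path is safe, a few lines of Python can check it for you. This is only a sketch: the helper name `parts_with_spaces` is made up, and the two example paths are the ones above, written with forward slashes so the check behaves the same on Windows and Mac.

```python
from pathlib import Path


def parts_with_spaces(folder):
    """Return the pieces of a folder path that contain a space."""
    return [part for part in Path(folder).parts if " " in part]


# The two example paths from above.
print(parts_with_spaces("C:/google_drive/My Drive/Arch_python"))
print(parts_with_spaces("C:/documents/Arch_python"))

# Check the folder this notebook or script is currently running from.
print(parts_with_spaces(Path.cwd()))
```

An empty list means no part of the path contains a space.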

### Step 1. Preparing Our Project Folder

We first want to make sure that we have created a project folder. Your folder structure may differ, but it is a good habit to keep your supporting files separate from your main code. For instance, this notebook's folder is organized like so:

<img src="img\folder_structure.png" width="400"> 

The main code files (the notebooks) are in the main folder. The images embedded in the notebooks are in folders *img* and *img_win*, and supporting data will be put into the folder *data*.

***It is very important that you keep all of your project's files, and only your project's files, within this folder.***

### Step 2. Create A Virtual Environment in Your Project Folder

Open your terminal, and make sure you are in the root project folder. By default, you should be. For instance, in the folder structure shown above, the root folder is `01716_conda_environments`. We know we are in this folder when our terminal line begins with the folder's path. For instance:

<img src="img\path.png" width="900"> 

If you aren't in the proper folder, simply close the terminal and open a new instance, or [learn how to navigate in a Windows terminal](https://www.itprotoday.com/powershell/how-to-use-powershell-to-navigate-the-windows-folder-structure), or, [learn how to navigate in Mac / Linux](https://www.macworld.com/article/221277/command-line-navigating-files-folders-mac-terminal.html).

In the proper location, type the following command:

```bash
conda create --prefix ./venv
```

This creates the virtual environment folder `venv` in the current project root folder. You should see this change reflected in the VSCode explorer on the left hand side of the screen.
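
If you would rather confirm this from Python than from the explorer, here is a minimal sketch. It assumes that conda keeps its bookkeeping in a `conda-meta` folder inside each environment, which is how conda environments are normally laid out, and that the code is run from the project root. The helper name `looks_like_conda_env` is made up for this example.

```python
from pathlib import Path


def looks_like_conda_env(folder):
    """Return True if the folder contains conda's `conda-meta` bookkeeping folder."""
    return (Path(folder) / "conda-meta").is_dir()


# Assumes this is run from the project root, where `./venv` was created.
print(looks_like_conda_env("./venv"))
```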


### Step 3. Activate Your Virtual Environment in the Terminal

Activating our virtual environment in the terminal allows us to install packages *only within the project's virtual environment*. As we read above, this is key to keeping our code dependency-error-free. 

To activate your project's virtual environment you will type the command:

```bash
conda activate ./venv
```

You will know your virtual environment is active if the current terminal line is preceded by `(venv)`.

<img src="img\venv_active.png" width="900"> 
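
Besides the `(venv)` prefix, an activated conda environment also sets the `CONDA_PREFIX` environment variable to the environment's folder. The sketch below reads it from Python. The helper name `active_conda_env` is made up, and the result only reflects the terminal that launched Python.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the folder of the active conda environment, or None if none is active."""
    return environ.get("CONDA_PREFIX")


# Run with Python started from the terminal where you typed `conda activate`.
print(active_conda_env())
```

If this prints a path ending in your project's `venv` folder, the environment is active.
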
<br>
<br>


***NOTE: During this step or the next, VSCode may prompt you to set this environment as the new workspace.***

<img src="img\new_workspace.png" width="500">

You can just click `yes`.

---

## Installing a Package in our Virtual Environment

First, make sure you've followed the steps above, and have `(venv)` leading your current terminal line.

#### Installing GeoPandas

We are going to install `geopandas`, the geospatial analysis library. Thankfully, we've gotten through the hard stuff. 

To install `geopandas`, [as per the project website](https://geopandas.org/en/stable/getting_started.html), simply enter the following command into your terminal and hit enter:

```bash
conda install -c conda-forge geopandas
```

The `-c conda-forge` argument simply means that we will download from the community-driven conda-forge channel rather than the default repo. You usually don't need to include this argument, but the geopandas team specifically recommends this command.

Keep your eye on the terminal. It will take a few minutes to install, but at one point you will be prompted to confirm the installation.

```
Proceed ([y]/n)?
```

Type `y` and hit enter.

You should now see MiniConda downloading and extracting the packages:

<img src="img\geopandas_install.png" width="900"> 

<br>
<br>
<br>

Following this and a few more automatic installation steps, which may take a while (geopandas is big), you will receive confirmation that the package has installed correctly. You should see something like this:

<img src="img\conda_install_complete.png" width="500"> 


### Installing Ipykernel

`ipykernel` allows the Python in our environment to run the code cells of a `.ipynb` notebook.

To install `ipykernel` we can simply enter the command:

```bash
conda install ipykernel
```

Like our geopandas installation, we will enter `y` at the confirmation prompt. After this installation, you should be good to go!
<br>
<br>

---

## Running Code in our Virtual Environment

### Selecting the venv kernel

When we run code in a `.ipynb` file, we are able to select which copy of Python actually runs the code. The copy of Python we select is called the *kernel*. Since we've installed our packages within our virtual environment, we want to ensure that we select the virtual environment's kernel.

To do so, with your notebook open, click `select kernel` in the upper right hand corner of VSCode.

<img src="img\select_kernel.png" width="200"> 

Choose `Python Environments` and then select `venv`. It should be starred and recommended. Your screen may look slightly different. Just make sure it says `venv`.

<img src="img\select_venv.png" width="600"> 
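
To double-check which copy of Python the notebook is actually using, you can run a cell like the sketch below. It assumes the notebook's working directory is the project root, so that `./venv` points at the environment created earlier. The helper name `runs_from` is made up for this example.

```python
import sys
from pathlib import Path


def runs_from(env_folder, prefix=sys.prefix):
    """Return True if the running copy of Python lives in env_folder."""
    return Path(prefix).resolve() == Path(env_folder).resolve()


# The full path of the Python program running this cell.
print(sys.executable)

# True when the selected kernel is the project's virtual environment.
print(runs_from("./venv"))
```

If the second line prints `False`, try selecting the kernel again and make sure the one you pick says `venv`.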

### Confirming you're good to go

To confirm that geopandas is ready to go, we can first import it using `import geopandas`.

We can then use the function `geopandas.show_versions()` to display the versions of all of geopandas' dependencies.

If you get an output that looks like this, you're good to go!

<img src="img\gp_nb.png" width="600"> 
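
The check described above can be written as a single notebook cell. This is a minimal sketch: the `try` / `except` wrapper is an addition so that a missing install produces a readable hint rather than a traceback.

```python
try:
    import geopandas
except ImportError:
    geopandas = None
    print("geopandas could not be imported. Check that the venv kernel is selected.")
else:
    # Print the geopandas version, then the versions of its dependencies.
    print("geopandas version:", geopandas.__version__)
    geopandas.show_versions()
```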