Note: due to a recent Github issue rendering Jupyter Notebooks, images in Notebooks are not appearing. Until it is fixed, I recommend viewing this Notebook with nbviewer: https://nbviewer.org/github/GonzagaCPSC322/U0-Introduction/blob/master/B%20Environment%20Setup.ipynb?flush_cache=true

# [CPSC 322](https://github.com/GonzagaCPSC322) Data Science Algorithms
[Gonzaga University](https://www.gonzaga.edu/)

[Gina Sprint](http://cs.gonzaga.edu/faculty/sprint/)

# Environment Setup
What are our learning objectives for this lesson?
* Download and install Git
* Download and install Docker Desktop
* Set up a Docker container running Anaconda3 Python distribution
* Run a Python program on your own computer
    * Interactive mode
    * Scripting mode
* Learn about virtual environments

Content used in this lesson is based upon information in the following sources:
* None to report

## Download/Install Git
* For Windows users: visit the [Git downloads page for Windows](https://git-scm.com/download/windows) and install Git for your PC
* For Mac users: first try opening your terminal program (command key + spacebar, then type terminal to open the terminal program), typing `git`, and pressing enter. If Git is not installed, you will be prompted to install the Xcode developer tools. Install these to install Git. For alternative ways to install Git on a Mac, visit the [Git downloads page for Mac](https://git-scm.com/download/mac)
* For Linux users: visit the [Git downloads page for Linux](https://git-scm.com/download/linux) and install Git for your PC
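
Once Git is installed, one way to confirm it is reachable from the command line is a short Python check. This is only a minimal sketch using the standard library; the helper name `tool_version` is hypothetical and not part of any course code.

```python
import shutil
import subprocess


def tool_version(tool):
    """Return the output of `<tool> --version`, or None if the tool is not on the PATH."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    # Some tools print nothing to stdout on failure; treat that as "not found"
    return result.stdout.strip() or None


print("git:", tool_version("git") or "not found")
```

The same helper can be pointed at `docker` once you have completed the Docker Desktop section below.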

## Ways to Run Python Code
There are several ways to set up a Python development environment! Here are a few:
1. Download and install Python locally on your machine. I recommend installing the Anaconda Python distribution because it contains so many great data science libraries. Visit the [Anaconda downloads page](https://www.anaconda.com/products/individual) and download the Anaconda graphical installer (Python 3.9 version) for your operating system. Once the download is complete, run the installer.
    1. Note: once you have installed Python locally on your machine, you can/should set up a Python virtual environment for your project (see the section below).
1. Download and install Docker Desktop locally on your machine. With Docker, you can pull an Anaconda Python image and run this image in a container. A container is "a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings."
    1. Note: this is the recommended approach for this class so we all are running (and grading) programming assignment code in the exact same, reproducible environment! Plus, you get to learn/practice with Docker!
1. Use a cloud-based Python provider. If you don't want to install Python locally, you can use in-browser development environments. Here are a few free ones to check out:
    1. https://repl.it/
    1. https://www.pythonanywhere.com/
    1. https://trinket.io/python

## Download/Install Docker Desktop
Visit the [Docker Desktop downloads page](https://www.docker.com/products/docker-desktop) and download Docker Desktop for your operating system. Once the download is complete, run the installer. 

On a Windows machine, at the "Configuration" screen, make sure both check boxes are selected, as follows:

<img src="https://i1.wp.com/www.codingnagger.com/wp-content/uploads/2020/06/image-3.png?resize=754%2C522&ssl=1" width="500">

(image from https://www.codingnagger.com/2020/06/20/install-docker-desktop-on-windows-10/)

Test that your Docker installation is complete and correct by opening Docker Desktop. You can do this by searching your computer for "Docker Desktop" (on a Windows machine, press the windows key and start typing; on a Mac, it's command key + space and start typing).

If you are on Windows and you are prompted to install WSL 2, follow the [https://aka.ms/wsl2kernel](https://docs.microsoft.com/en-us/windows/wsl/install-win10#step-4---download-the-linux-kernel-update-package) link and download/run the MSI file:

<img src="https://i.stack.imgur.com/Tc7m4.png" width="500"/>

(image from https://i.stack.imgur.com/Tc7m4.png)

Then reboot your Windows machine.

Note: You may want to make a desktop shortcut for Docker Desktop

### What is Docker? 
From https://www.docker.com/why-docker:
> In 2013, Docker introduced what would become the industry standard for containers. Containers are a standardized unit of software that allows developers to isolate their app from its environment, solving the “it works on my machine” headache. For millions of developers today, Docker is the de facto standard to build and share containerized apps - from desktop, to the cloud. We are building on our unique connected experience from code to cloud for developers and developer teams.

![](https://miro.medium.com/max/2520/1*p8k1b2DZTQEW_yf0hYniXw.png)

(image from https://miro.medium.com/max/2520/1*p8k1b2DZTQEW_yf0hYniXw.png)

From https://en.wikipedia.org/wiki/Docker_(software):
>Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels. All containers are run by a single operating system kernel and therefore use fewer resources than virtual machines. The service has both free and premium tiers. The software that hosts the containers is called Docker Engine. It was first started in 2013 and is developed by Docker, Inc.

Here is a diagram comparing containers to VMs:

![](https://miro.medium.com/max/2048/0*ujI404Gnomn1Wz5h.png)

(image from https://miro.medium.com/max/2048/0*ujI404Gnomn1Wz5h.png)

So why are we using Docker in this class? Here are a few reasons!
1. Docker skills are in high demand right now. According to [HackerEarth's list of top 9 hottest tech skills to hire for in 2020](https://www.hackerearth.com/blog/talent-assessment/hottest-tech-skills-to-hire/), Docker is number 6 on the list (while Python is number 2!). Here is a [blog post](https://dev.to/javinpaul/11-essential-skills-software-developers-should-learn-in-2020-1bio) that lists Docker as the number one essential skill software developers should learn in 2020. For fun, do your own searching about Docker and how in demand it has become. Job listing data from Indeed shows that demand for Docker skills continues to surge, up 4,162% since 2014, and Docker was listed in more than 5% of all US tech jobs in 2019 (from [InfoWorld](https://www.infoworld.com/article/3583931/the-most-valuable-software-developer-skills.html)).
1. One of the most important concepts in data science is *reproducibility*, which is the notion of designing and documenting experiments that are reproducible. This means that you (and others) should be able to reproduce your results. Docker makes it easy to make your results reproducible because you can essentially save your tech stack (e.g. the exact versions of Python, its dependencies, etc.) and distribute it easily to others.
1. In this class, we are going to test our code using *unit tests*. A unit test is a test that measures the correctness of one unit, such as a function or a class. With Docker, you can run your tests against your code locally with the *exact same* tech stack (e.g. Docker image) my automated testing setup will run. There won't be any discrepancies in the output of your code running in your Docker container versus my Docker container, so long as we are running containers from the exact same image!
1. It is **FUN** to learn new things. In fact, a career in computer science and software development is a career in lifelong learning 🤓

Basic Docker terminology you should know (adapted from Microsoft's [Docker terminology post](https://docs.microsoft.com/en-us/dotnet/architecture/microservices/container-docker-introduction/docker-terminology) and https://mindmajix.com/docker/basic-terminologies-of-docker):
* Container image: A package with all the dependencies and information needed to create a container (these dependencies are defined in a Dockerfile). An image includes all the dependencies (such as frameworks) plus deployment and execution configuration to be used by a container runtime. Usually, an image derives from multiple base images that are layers stacked on top of each other to form the container's file system. An image is immutable once it has been created. Below are the major features of a Docker image:
    * Portability: The Docker images can easily be pushed or moved into a Docker registry and can also be saved as tar file.
    * Layered nature: Images are added using layers. This enables re-usability of images and disk usability is highly reduced as the parent layers are shared.
    * Static or Compile-time nature: Though you can create a new Docker image, the contents will remain unchanged.
* Container: An instance of a Docker image. A container represents the execution of a single application, process, or service. It consists of the contents of a Docker image, an execution environment, and a standard set of instructions. When scaling a service, you create multiple instances of a container from the same image. Or a batch job can create multiple containers from the same image, passing different parameters to each instance.
* Tag: A mark or label you can apply to images so that different images or versions of the same image (depending on the version number or the target environment) can be identified.
* Repository (repo): A collection of related Docker images, labeled with a tag that indicates the image version. Some repos contain multiple variants of a specific image, such as an image containing SDKs (heavier), an image containing only runtimes (lighter), etc. Those variants can be marked with tags. A single repo can contain platform variants, such as a Linux image and a Windows image.
* Registry: A service that provides access to repositories. The default registry for most public images is [Docker Hub](https://hub.docker.com/) (owned by Docker as an organization). A registry usually contains repositories from multiple teams. Companies often have private registries to store and manage images they've created. Azure Container Registry is another example.

![](https://docs.microsoft.com/en-us/dotnet/architecture/microservices/container-docker-introduction/media/docker-containers-images-registries/taxonomy-of-docker-terms-and-concepts.png)

(image from https://docs.microsoft.com/en-us/dotnet/architecture/microservices/container-docker-introduction/docker-containers-images-registries)
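
To make the repository and tag terms concrete, here is a small sketch that splits an image reference into its two parts, using the image name that appears later in this lesson. The function `split_image_reference` is a hypothetical helper, and it deliberately ignores harder cases such as a registry host with a port number.

```python
def split_image_reference(reference):
    """Split 'repository:tag' into (repository, tag)."""
    # Simplifying assumption: no registry host with a port (e.g. localhost:5000/...)
    repository, separator, tag = reference.partition(":")
    # Docker falls back to the tag "latest" when no tag is given
    return repository, (tag if separator else "latest")


repository, tag = split_image_reference("continuumio/anaconda3:2024.06-1")
print(repository)  # continuumio/anaconda3
print(tag)         # 2024.06-1
```

Here `continuumio/anaconda3` is the repository on Docker Hub (the registry) and `2024.06-1` is the tag that pins one specific version of the image.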

For more about Docker terminology, check out the [Docker glossary](https://docs.docker.com/glossary/).

I encourage you to complete the [Docker 101 tutorial](https://www.docker.com/101-tutorial) to increase your familiarity with the tool.

### Create Anaconda3 Container from Image
Open the command line, make a CPSC322 directory, and cd into it. Then run the command for your operating system:
* On Mac/Linux: `docker run -it -p 8888:8888 -p 5000:5000 -v "$PWD":/home --name anaconda3_cpsc322 continuumio/anaconda3:2024.06-1`
    * Note: if you are running macOS Monterey, Control Center's AirPlay Receiver listens on port 5000. Either disable Airplay Receiver (instructions here: https://developer.apple.com/forums/thread/682332) or choose a different host port, like 5001 (e.g. `-p 5001:5000`)
* On Windows PowerShell: `docker run -it -p 8888:8888 -p 5000:5000 -v ${PWD}:/home --name anaconda3_cpsc322 continuumio/anaconda3:2024.06-1`

Notes on the flags
* -it launches an interactive terminal for the container once it is created and running
* -p forwards and opens a port in the form: `-p <host port number>:<container port number>`
    * Port 8888 for jupyter use later
    * Port 5000 for flask use later
* -v mounts the current working directory of your terminal at /home in the container, so files in that directory are visible inside the container
* --name provides your own name for the container instead of the random default name assigned by docker
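
To see how the flags fit together, the sketch below assembles the same `docker run` command piece by piece as a Python list. The function `build_docker_run_command` and its parameter names are hypothetical and exist only for illustration; the flags themselves are the ones described above.

```python
import shlex
from pathlib import Path


def build_docker_run_command(host_dir, name, image, ports):
    """Assemble the `docker run` arguments described in the notes above."""
    command = ["docker", "run", "-it"]  # -it: interactive terminal
    for host_port, container_port in ports:
        # -p <host port number>:<container port number>
        command += ["-p", f"{host_port}:{container_port}"]
    # -v mounts host_dir at /home; --name sets the container name
    command += ["-v", f"{host_dir}:/home", "--name", name, image]
    return command


command = build_docker_run_command(
    host_dir=Path.cwd(),
    name="anaconda3_cpsc322",
    image="continuumio/anaconda3:2024.06-1",
    ports=[(8888, 8888), (5000, 5000)],
)
print(shlex.join(command))
```

This only prints the command (with POSIX-style quoting) rather than running it, since `-it` expects a real terminal; pasting the printed command into your shell is the simpler route.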

This command creates and starts the container and changes your terminal prompt to that of the container. Open Docker Desktop and confirm your new container is running under "Containers/Apps":

<img src="https://raw.githubusercontent.com/GonzagaCPSC322/U0-Introduction/master/figures/anaconda_container_running.png">

### Starting/Stopping the Container
When you are ready to write/run Python code using the container, you need to start it first. Here are two ways to start the container:
1. Use Docker Desktop
    1. Click the play button for the container
1. Use command line
    1. Open a terminal and run `docker start -ai anaconda3_cpsc322`
    1. Note: you can get a listing of all containers with `docker ps -a`

When you are done running your container, stop the container. Here are two ways to stop the container:
1. Use Docker Desktop
    1. Hit the stop button 
1. Use command line
    1. At the container prompt run `exit`
    1. OR at your host prompt run `docker stop anaconda3_cpsc322`
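
If you want to check from Python whether the container is currently running, one option is to parse the text that `docker ps -a` prints. The sketch below works on a hard-coded sample string so that it runs without Docker; the helper `parse_container_status` is hypothetical, and the `|` separator is just a choice made in the `--format` template shown in the comment.

```python
def parse_container_status(ps_output):
    """Map container name -> status text from 'name|status' lines."""
    statuses = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("|")
        statuses[name] = status
    return statuses


# Sample text shaped like the output of:
#   docker ps -a --format "{{.Names}}|{{.Status}}"
sample_output = "anaconda3_cpsc322|Exited (0) 2 minutes ago\n"

statuses = parse_container_status(sample_output)
# Running containers report a status that starts with "Up"
is_running = statuses["anaconda3_cpsc322"].startswith("Up")
print(is_running)  # False
```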

### Getting the Container CLI
There are three ways to get the container's CLI (command line interface) AKA terminal AKA command line
1. Use container command line
    1. Use the one provided when you start the container from command line with `docker start -ai <container name>`
    1. OR click "CLI" icon in Docker Desktop next to your running container
1. Use VS Code
    1. Go to View->Terminal
1. Use Jupyter Lab
    1. Open a new terminal with File->New->Terminal

## Download/Install VS Code
Visit the [VS Code downloads page](https://code.visualstudio.com/download) and download VS Code for your operating system. Once the download is complete, run the installer. 

Test that your VS Code installation is complete and correct by opening VS Code. You can do this by searching your computer for "VS Code" (on a Windows machine, press the windows key and start typing; on a Mac, it's command key + space and start typing).

With VS Code open, install the following two extensions by clicking the bottom icon on the left side bar that looks like 4 blocks and searching for:
1. Python extension
1. Remote - Containers extension (named Dev Containers in newer releases of the extension)

<img src="https://files.realpython.com/media/vscode-marketplace.25e99aec9f68.gif" width="500"/>

(image from https://realpython.com/python-development-visual-studio-code/)

Note: You may want to make a desktop shortcut for VS Code

## Developing with VS Code and the Container
Launch VS Code and connect it to the running container
1. Make sure you have installed the Remote - Containers VS Code extension. Then click the green status bar icon for "Open a Remote Window" in the bottom left corner of VS Code
<img src="https://raw.githubusercontent.com/GonzagaCPSC322/U0-Introduction/master/figures/open_remote_window_icon.png">

1. Click “Attach to Running Container…”
<img src="https://raw.githubusercontent.com/GonzagaCPSC322/U0-Introduction/master/figures/attach_to_running_container.png">
1. Select the running anaconda3_cpsc322 container
<img src="https://raw.githubusercontent.com/GonzagaCPSC322/U0-Introduction/master/figures/attach_to_anaconda_container.png">
1. Click “Open folder” and navigate to /home in the container.
<img src="https://raw.githubusercontent.com/GonzagaCPSC322/U0-Introduction/master/figures/open_home_container.png">

Note: if you are prompted to select a Python interpreter, choose the one from the container (Python 3.9.7 when this lesson was written; the exact version depends on the image tag you pulled). You can always bring up the select interpreter window by opening the command palette (on Windows: ctrl + shift + P; on Mac: cmd + shift + P) and typing select interpreter. 
<img src="https://raw.githubusercontent.com/GonzagaCPSC322/U0-Introduction/master/figures/select_interpreter.png">

When you are done using VS Code, you should close its connection to the container with: File->Close Remote Connection

## Executing Python Code
Python code can be executed in *interactive* mode and in *scripting* mode. In interactive mode, Python code is entered/executed in a command prompt/console/terminal. In scripting mode, Python code in a source file (e.g. a .py file) is executed as a program. 

We are going to perform the following steps to re-write the previous `"Hello World!"` program in *interactive* mode and again in *scripting* mode using VS Code and the command line.

### Interactive Python
1. Open the container CLI (command line interface) via the CLI icon in Docker Desktop. Type `python` to enter interactive Python mode. Type `print("Hello World!")` and press enter. You should see "Hello World!" echoed back out on the console. Congrats! You just wrote and executed your first line of Python code. We just executed this code in "interactive" mode.
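1. Let's explore some features of interactive Python. The commands below assume the IPython shell (for example, `pwd()` is not defined at the plain `python` prompt), so exit Python with `exit()` and type `ipython` to start IPython. Then type the following commands and observe the output: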
    1. `help(print)`: You can type the name of any identifier between the parens of the `help()` command to learn more about the Python construct.
    1. `pwd()`: Tells you the "present working directory" where Python is executing
    1. `x = 5`: Declare a variable named `x` and assign it the value 5
    1. `type(x)`: Returns the data type of the value stored in variable `x`. What is it?
    1. `course_name = "AHA"`: Declare a variable named `course_name` and assign it the string "AHA"
    1. `type(course_name)`
    1. `course_name.<tab>`: Type the variable name `course_name` and a dot. Then press tab. IPython will provide you with all of the available information and behaviors associated with this variable. We will learn much more about this information later in the course.
    1. `course_name.upper()`: What does `upper()` do?
    1. `dir()`: Lists the known objects (variables) in Python. Do you see your variable names?
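
For comparison, here is a minimal sketch of the same exploration written as plain Python statements, so it also runs outside IPython. `os.getcwd()` stands in for IPython's `pwd()`, and `lower()` is added next to `upper()` only because "AHA" is already uppercase.

```python
import os

print(os.getcwd())          # plain-Python counterpart of IPython's pwd()

x = 5
print(type(x))              # <class 'int'>

course_name = "AHA"
print(type(course_name))    # <class 'str'>
print(course_name.upper())  # AHA
print(course_name.lower())  # aha

# dir(course_name) lists the same names that tab completion shows after the dot
print("upper" in dir(course_name))  # True
```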
    
### Scripting Python
1. Open VS Code using the above instructions and create a new file.
1. Type `print("Hello World!")`
1. Navigate to the menu bar. Select File -> Save As and select a folder to save your Python code files in. Name this file hello_world.py and save it in your newly made folder.
1. Run the program! You can do this by pressing F5 on your keyboard or selecting Run -> Start Debugging. The output of your code will be in the Terminal. Congrats! You just wrote and executed your first Python *script*.
<img src="https://raw.githubusercontent.com/GonzagaCPSC322/U0-Introduction/master/figures/vscode_hello_world.png">
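
As your scripts grow, a common convention is to put the code in a function and call it from a `__main__` guard, so the file can be both run as a script and imported elsewhere. The version of hello_world.py below is only a sketch of that convention; the function name `main` is an arbitrary choice.

```python
def main():
    """Print the greeting and return it so it can also be checked in a test."""
    greeting = "Hello World!"
    print(greeting)
    return greeting


# This block runs only when the file is executed as a script
# (e.g. `python hello_world.py`), not when it is imported.
if __name__ == "__main__":
    main()
```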

### Launching Jupyter Lab using the Container
In container CLI run: `jupyter lab --ip='0.0.0.0' --port=8888 --no-browser --allow-root --notebook-dir=/home`

Copy and paste the URL (with token) printed in the terminal into your browser. When you are done with Jupyter Lab, close and shut down any running notebooks (File->Close and Shutdown Notebook), then shut down Jupyter Lab (File->Shutdown).
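
Once Jupyter Lab is open, a quick first notebook cell can confirm which Python the notebook is using and whether a few data science packages are importable. This is a sketch under the assumption that you care about `numpy`, `pandas`, and `scipy`; the helper `is_installed` is hypothetical, and the package list is only an example.

```python
import importlib.util
import sys


def is_installed(package):
    """Return True if the package can be found without actually importing it."""
    return importlib.util.find_spec(package) is not None


print(sys.executable)           # path of the Python interpreter running this cell
print(sys.version.split()[0])   # Python version number

for package in ["numpy", "pandas", "scipy"]:
    print(f"{package}: {'available' if is_installed(package) else 'missing'}")
```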

### Testing using the Container
Run the provided unit tests after development and your own testing. To do this, you need a terminal running in the container.

Cd to your assignment directory and run `pytest --verbose test.py` (or whatever your test script is called). Example output from VS Code terminal:

```
(base) root@a276bb6ed62a:/home# cd pytest-docker/
(base) root@a276bb6ed62a:/home/pytest-docker# pytest --verbose hello_test.py 
====================================================== test session starts =======================================================
platform linux -- Python 3.9.7, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/pytest-docker
collected 1 item                                                                                                                 

hello_test.py .                                                                                                            [100%]

======================================================= 1 passed in 0.03s ========================================================
(base) root@a276bb6ed62a:/home/pytest-docker# 
```
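
The contents of `hello_test.py` are not shown above, so the file below is a hypothetical example of what a one-test script could look like. pytest treats functions whose names start with `test_` as tests and reports a pass when every `assert` inside holds; `make_greeting` is an invented function used only to have something to test.

```python
# hello_test.py (hypothetical contents, for illustration only)


def make_greeting(name):
    """Build the greeting string for the given name."""
    return f"Hello {name}!"


def test_make_greeting():
    # pytest marks this test as passed if the assertion holds
    assert make_greeting("World") == "Hello World!"
```

In a real assignment, the function under test would normally live in its own module and be imported into the test script.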

## An Aside: Virtual Environments
For collaboration and reproducibility, you can also set up a virtual environment for your Python project. As stated by Miguel Grinberg, "A virtual environment is a copy of the Python interpreter into which you can install packages privately, without affecting the global Python interpreter installed in your system." Essentially, you can install the required dependencies for your project to run via the virtual environment. Then, when you are ready to share your project with others, you can export your virtual environment dependencies via a requirements.txt file. It is this txt file that you put under version control (not the virtual environment itself) and share with others so they can set up their environment to be the same as yours! 

From https://docs.python.org/3/tutorial/venv.html:
> Python applications will often use packages and modules that don’t come as part of the standard library. Applications will sometimes need a specific version of a library, because the application may require that a particular bug has been fixed or the application may be written using an obsolete version of the library’s interface.

>This means it may not be possible for one Python installation to meet the requirements of every application. If application A needs version 1.0 of a particular module but application B needs version 2.0, then the requirements are in conflict and installing either version 1.0 or 2.0 will leave one application unable to run.

>The solution for this problem is to create a virtual environment, a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages.

>Different applications can then use different virtual environments. To resolve the earlier example of conflicting requirements, application A can have its own virtual environment with version 1.0 installed while application B has another virtual environment with version 2.0. If application B requires a library be upgraded to version 3.0, this will not affect application A’s environment.

Here is an example of how it works:
1. Make a directory for your new Python project and cd into it
1. Run `python3 -m venv my-env` where `my-env` is what you want to call your virtual environment (typically the name `env` suffices)
    1. This creates a directory in the CWD called `my-env`
1. Activate the virtual environment
    1. Mac/Linux: `source my-env/bin/activate`
    1. Windows: `my-env\Scripts\activate.bat`
    1. Note: this will change your shell prompt to show you which virtual environment is active. For example, my prompt looks like this: `(my-env) sprint@cps-25626 venvtemp % `
1. Run `pip install` to install your required project dependencies, e.g. `pip install numpy scipy pandas`
    1. Note: you can install a specific version of a package by giving the package name followed by == and the version number
1. You can run `pip list` to see all of the packages installed in the virtual environment, e.g.:
    ```
    (my-env) sprint@cps-25626 venvtemp % pip list
    Package         Version
    --------------- -------
    numpy           1.19.5 
    pandas          1.2.1  
    pip             19.0.3 
    python-dateutil 2.8.1  
    pytz            2020.5 
    scipy           1.6.0  
    setuptools      40.8.0 
    six             1.15.0 
    ```
1. When you are ready to export your virtual environment requirements, run `pip freeze > requirements.txt` to write the requirements to a file, e.g. the contents of requirements.txt for the example `my-env` environment above:
    ```
    (my-env) sprint@cps-25626 venvtemp % cat requirements.txt 
    numpy==1.19.5
    pandas==1.2.1
    python-dateutil==2.8.1
    pytz==2020.5
    scipy==1.6.0
    six==1.15.0
    ```
1. Now, share this requirements.txt with others (e.g. put it in your GitHub repo) so they can set up your same virtual environment on their machine with the following command (running in their own virtual environment): `python -m pip install -r requirements.txt`
1. When you are done working with your virtual environment, run `deactivate` to exit the environment.
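
Two small checks can help when working through the steps above: whether Python is currently running inside a virtual environment, and which versions a requirements.txt pins. The sketch below shows one way to do both with the standard library. The helper names are hypothetical, and the parser only handles simple `name==version` lines like the ones in the example output above.

```python
import sys


def in_virtual_environment():
    """Inside a venv, sys.prefix points at the environment, not the base install."""
    return sys.prefix != sys.base_prefix


def parse_requirements(text):
    """Map package name -> pinned version for simple 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, separator, version = line.partition("==")
        if separator:
            pins[name] = version
    return pins


print(in_virtual_environment())
print(parse_requirements("numpy==1.19.5\npandas==1.2.1\n"))
# {'numpy': '1.19.5', 'pandas': '1.2.1'}
```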