# Welcome to the Dark Art of Coding:
## Introduction to Data Science Fundamentals
Preparation and installation guide

<img src='images/logos.3.600.wide.png' height='250' width='300' style="float:right">

# Main objectives
---

By the end of this module, you will be able to:

* Download the tools we will be using (conda, specific Python libraries)
* Install the tools
* Test them for successful installation
* Open the Jupyter Notebooks that we will be using in class
* Run the code samples found in the notebooks
* Understand the importance of the tools for our tasks today



# Installing the Software You'll Need
---

## Step Zero: Read through ALL the steps...

We strongly recommend that you read through **ALL** the steps below before you start installing anything. Some more advanced practitioners **may** already have some tools installed OR available.

IF you can successfully open the Jupyter Notebooks AND import the data libraries listed below in Step Three, then you shouldn't need to do anything. For folks who aren't sure OR for folks who are fairly new to Python/programming... these steps should get us to the point we need to be.
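
If you already have Python available, a short script along these lines can tell you whether the libraries are present. This is only a sketch: the helper name `missing_modules` is made up for illustration, and the module names are assumed from the packages listed in Step Three. It checks whether Python can locate each library, which is a good hint but not a full guarantee that the import will succeed.

```python
import importlib.util

# Module names assumed from the packages installed in Step Three.
REQUIRED = ["jupyterlab", "matplotlib", "numpy", "pandas", "scipy"]


def missing_modules(names):
    """Return the names that Python cannot locate in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

If the script reports anything as not found, work through the steps below.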

## Step One: Download and Install Miniconda

Follow the instructions for your operating system in the **`miniconda`** quickstart guides.

**Some warnings/cautions:**

1. We **highly recommend** the use of `conda` as a package manager and virtual environment manager for this tutorial. This material has been tested using `conda` but has not been tested using `pip`, `virtualenv`, `pyenv`, etc.

1. **IF you already have `conda`** installed via a previous `Anaconda` OR `miniconda` install, you should not need to reinstall. How can you tell? If you type `conda` on your command line and get a response similar to this, then you should be fine:

    ```
    my_macbook:my_folder chalmerlowe$ conda
    usage: conda [-h] [-V] command ...

    conda is a tool for managing and deploying applications, environments and packages.
    .
    .
    .
    ```
1. Use a **Python 3.x** version of `miniconda` to install Python 3.x.
1. Based on our experience in workshops, **the most common problem** we see with installs is that a step got missed OR a command was typed incorrectly. It happens to all of us, so stay sharp, folks!

With that in mind, please choose the appropriate version for your operating system and install `conda`:

**`conda` for Windows**:

* Download the installer: [Miniconda installer for Windows.](https://conda.io/miniconda.html)
* Double-click the .exe file.
* Follow the instructions on the screen.
* NOTE: If you are unsure about any setting, accept the defaults. You can change them later.
* When installation is finished, from the **Start menu**, open the **Anaconda Prompt**.

**`conda` for MacOS**:

* Download the installer: [Miniconda installer for MacOS.](https://conda.io/miniconda.html)
* In your Terminal window, run:
    
    ```bash
    $ bash Miniconda3-latest-MacOSX-x86_64.sh
    ```
<br>
* Follow the prompts on the installer screens.
* NOTE: If you are unsure about any setting, accept the defaults. You can change them later.
* **Close** and then **re-open** your Terminal window, to make the changes take effect.


**`conda` for Linux**:

* Download the installer: [Miniconda installer for Linux.](https://conda.io/miniconda.html)
* In your Terminal window, run:
    
    ```bash
    $ bash Miniconda3-latest-Linux-x86_64.sh
    ```
<br>
* Follow the prompts on the installer screens.
* If you are unsure about any setting, accept the defaults. You can change them later.
* **Close** and then **re-open** your Terminal window, to make the changes take effect.

## Step Two: Confirm your conda install

At a command prompt, type `conda list`. If `conda` is installed properly, you will see a summary of the packages installed by `conda`.

### Troubleshooting

Here's a list of error messages & how to fix them.

- **`conda: Command not found.`** IF you see this, the most common reason is that your command shell is not yet aware of the installation of `conda`. The easiest fix is to simply **close** your terminal/command prompt & **reopen** it. If that doesn't fix it, ask for help.
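
If you are curious what "the shell is not aware of `conda`" means in practice, the sketch below searches the directories on your `PATH` for a `conda` executable, roughly the way a shell would. The helper name `locate` is hypothetical. One caveat: this only sees executables on `PATH`, so it can miss a `conda` that your shell provides as a function or alias.

```python
import shutil


def locate(command):
    """Return the full path of `command` if it is found on PATH, else None."""
    return shutil.which(command)


conda_path = locate("conda")
if conda_path is None:
    print("conda is not on your PATH; close and reopen the terminal, then retry.")
else:
    print("conda found at:", conda_path)
```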

## Step Three: Install Python, and other packages...

With `conda` installed, we want to ensure that we have a suitable version of Python installed and that we have the necessary libraries also installed.

We will create a directory to hold our lesson content. For consistency, we will call this directory `stats`. Then we will create a virtual environment and populate it with Python and our libraries.

1. On your command prompt, make sure you are in a directory where you want your project folder to be located (many people put this in their `My Documents` OR `home` folder). From that directory, run the following command:

    ```bash
    chalmerlowe$ mkdir stats
    ```
    <br>
1. Change directories into the new folder:

    ```bash
    chalmerlowe$ cd stats
    ```
    <br>
1. Create a virtual environment with Python 3, using the following command (don't worry, we will explain this below):

    ```bash
    chalmerlowe$ conda create -n stats python=3
    ```
    <br>
1. Activate your virtual environment using the command appropriate to your operating system. NOTICE your prompt will change to reflect the fact that you are now in a virtual environment:

    **Mac/ Linux** 

    ```bash
    chalmerlowe$ source activate stats
    (stats) chalmerlowe$
    ```
    <br>
    **Windows**

    ```bat
    C:\> activate stats
    (stats) C:\>
    ```
    <br>

1. Install the following additional packages to your virtual environment:

    ```bash
    (stats) chalmerlowe$ conda install jupyterlab matplotlib numpy pandas scipy 
    ```
    <br>

1. Test your installation by typing the following on your command line/terminal:

    ```bash
    (stats) chalmerlowe$ jupyter lab 
    ```
    <br>
    
If your browser opens with a Jupyter Lab instance, you will know the install process succeeded.    

## Done with commands for now!

If you are done (or, if you're working in pairs, you and your partner are both done), put your green sticky up! This is how we know you're done with the commands.

<img src='images/green_sticky.300px.png' width='200' style='float:left'>

If you like reading, you can also keep reading this page.

# The Big Picture 
---

## What is miniconda (conda) and why did we install it?

Miniconda contains the `conda` package manager/virtual environment manager and `Python`. `conda` is language agnostic, so you can also use it to support work with languages besides Python. Once miniconda is installed, you will be able to: 

* create virtual environments
* manage separate installations of `Python` (including different versions)
* manage Python packages/libraries
* manage packages in other languages, including packages that are fundamentally unmanageable by Python-only tools like `pip` & `virtualenv`

Whenever you work on a new project, you should create a separate environment for that project. `conda` lets you do this easily and efficiently. A later lesson will provide more details on both virtual environments and the use of the `conda` package manager.

## And what is a virtual environment?

When you create a virtual environment, `conda` will add subdirectories to a miniconda directory on your computer. Specifically it will create a directory that will contain:

* a database and metadata about the virtualenv
* software and libraries related to the project (i.e., Python and any modules you install in the virtualenv)

NOTE: this virtualenv folder is **NOT** a duplicate of your project folder **NOR** does it contain your code/class material
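
To see which environment directory your Python is actually running from, you can ask Python itself. The snippet below is a small sketch: `environment_name` is a hypothetical helper that simply takes the last component of the environment's directory, which matches the environment name for paths shaped like `/Users/chalmerlowe/miniconda3/envs/stats`.

```python
import sys
from pathlib import PurePath


def environment_name(prefix):
    """Guess the environment name from the last component of its directory."""
    return PurePath(prefix).name


# sys.prefix is the root directory of the Python installation (or environment)
# running this code; sys.executable is the interpreter program itself.
print("Environment root:", sys.prefix)
print("Interpreter:     ", sys.executable)
print("Looks like env:  ", environment_name(sys.prefix))
```

Run it once inside the activated environment and once outside, and compare the paths.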

# Deep dive
---

## What is a virtual environment?

As mentioned above, virtual environments (also called virtualenvs) are tools used to keep projects separate, especially in terms of keeping different software versions and different library versions separate. For example, virtualenvs prevent Python's site-packages folder from getting filled with potentially conflicting versions of software AND thus prevent the problems that arise when one project needs **version x.x** of a library but another project needs **version y.y** of the same library. At their core, virtualenvs are glorified directories that use scripts and metadata to organize and control the environment. You are allowed to have an essentially unlimited number of virtual environments. And as you saw above, they are very easy to create using various command line tools, such as `conda`.

## When should we use a virtual environment?

Anytime you have more than one project and there is a possibility of conflicts between your libraries, it is a good time to use a virtual environment. Having said that, many programmers use virtual environments for **all but the most trivial** programming tasks. Especially for beginners, using virtual environments early on in your learning career will build a valuable skill AND help prevent sneaky bugs related to version discrepancies, which can be hard to diagnose.

## How do you create a virtual environment?

While there are several programs or libraries that can generate virtual environments, for today's lesson we will be using the `conda` package manager, which includes the capability to simply and easily produce virtual environments.

Presuming you have `conda` installed, these steps enable you to create and activate a virtual environment.

```bash
$ conda create -n stats python=3
```

Description:
* `conda` runs the conda program.
* `create` tells it to create a virtualenv
* `-n` identifies the name of the virtualenv, in this case, `stats`
* `python=3` tells conda that you want to install Python version 3 in this virtualenv

**NOTE**: for other projects, you **can** use `python=2` or `python=3`. Regardless of which you choose, conda will default to the most recent version of Python within the version 2 OR version 3 family. If you need to select a specific minor version of Python, use the following syntax:

`python=3.2`
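
The idea of "the most recent version within a family" can be pictured with a few lines of Python. To be clear, this is a simplified model for illustration and not how `conda` is implemented: the function names and the list of available versions are invented.

```python
def matches(spec, version):
    """True if `version` falls in the family named by `spec` (e.g. "3" or "3.5")."""
    spec_parts = spec.split(".")
    return version.split(".")[: len(spec_parts)] == spec_parts


def newest_match(spec, available):
    """Return the highest version in `available` that matches `spec`, or None."""
    candidates = [v for v in available if matches(spec, v)]
    if not candidates:
        return None
    # Compare numerically so that, for example, "3.10" sorts after "3.9".
    return max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))


# Hypothetical list of versions, for illustration only.
available = ["2.7.13", "3.5.3", "3.6.0"]
print(newest_match("3", available))    # 3.6.0
print(newest_match("3.5", available))  # 3.5.3
print(newest_match("2", available))    # 2.7.13
```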

When you execute the `conda create` command, `conda` prepares to install Python and any dependencies that Python relies upon. It will display output similar to the following. 

```bash
my_macbook:my_folder chalmerlowe$ conda create -n stats python=3
Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /Users/chalmerlowe/miniconda3/envs/stats:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2k             |                1         3.0 MB
    python-3.6.0               |                0        11.7 MB
    setuptools-27.2.0          |           py36_0         523 KB
    wheel-0.29.0               |           py36_0          87 KB
    pip-9.0.1                  |           py36_1         1.7 MB
    ------------------------------------------------------------
                                            Total:        17.0 MB

The following NEW packages will be INSTALLED:

    openssl:    1.0.2k-1
    pip:        9.0.1-py36_1
    python:     3.6.0-0
    readline:   6.2-2
    setuptools: 27.2.0-py36_0
    sqlite:     3.13.0-0
    tk:         8.5.18-0
    wheel:      0.29.0-py36_0
    xz:         5.2.2-1
    zlib:       1.2.8-3

Proceed ([y]/n)?
```

To finish the creation of the virtualenv and install the software, press `y`.

## Activating a virtualenv

Once you have created a virtualenv, you will need to activate it. Activation has several side effects:

* It temporarily changes your `$PATH` variable so calls to the `python` command (and similar commands) will look first in the virtual environment's `bin/` directory. 
* It temporarily changes your shell prompt to show which virtual environment you are using. Your prompt will likely look something like this, with the name of your virtual environment in parentheses in front of the prompt:
    * Mac/Linux: `(stats) chalmerlowe$`
    * Windows: `(stats) C:\>`
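
The `$PATH` change is easier to picture with a tiny model. The sketch below is not what the activation script literally runs; it just shows the effect of putting the environment's `bin/` directory at the front of the search list. The function names and directory values are assumptions chosen for illustration.

```python
import os


def activate_path(path_value, env_bin):
    """Return a PATH string with `env_bin` placed first, so it is searched first."""
    return os.pathsep.join([env_bin, path_value])


def first_search_dir(path_value):
    """Return the directory that would be searched first for a command."""
    return path_value.split(os.pathsep)[0]


# Hypothetical values, for illustration only.
before = os.pathsep.join(["/usr/local/bin", "/usr/bin"])
after = activate_path(before, "/Users/chalmerlowe/miniconda3/envs/stats/bin")

print("Searched first before activation:", first_search_dir(before))
print("Searched first after activation: ", first_search_dir(after))
```

Because the environment's directory comes first, typing `python` finds the environment's copy before any other copy on your system.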

To activate your virtual environment, run the appropriate command for your operating system:

**Mac/Linux**

```bash
$ source activate stats
```

**Windows**

```bat
C:\> activate stats
```

**Note:** If you are using **PowerShell**, `activate` won't work out of the box. Type `cmd` first to get a regular command prompt, *then* `activate stats`.

### Adding software to your virtualenv 

To add more software to the virtualenv, you can use `conda` to install the software. The maintainers of conda provide access to many Python and non-Python libraries, but not all of them. If conda cannot install a particular library that you need, you can generally use `pip` or a similar package installation tool to install it instead (covering `pip` is outside the scope of this workshop).

For example, to install IPython, you can use the following `conda` command:

```
conda install ipython
```

Conda will prepare to install IPython and any dependencies that IPython relies upon. It will display output similar to the following (truncated to save space).

```bash
Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /Users/chalmerlowe/miniconda3:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-env-2.6.0            |                0          601 B
    ...
    ipython-5.3.0              |           py35_0        1021 KB
    conda-4.3.14               |           py35_0         505 KB
    ------------------------------------------------------------
                                            Total:         3.8 MB

The following NEW packages will be INSTALLED:

    appnope:          0.1.0-py35_0
    ...
    wcwidth:          0.1.7-py35_0

The following packages will be UPDATED:

    conda:            4.1.11-py35_0 --> 4.3.14-py35_0
    conda-env:        2.5.2-py35_0  --> 2.6.0-0
    requests:         2.10.0-py35_0 --> 2.13.0-py35_0

Proceed ([y]/n)?
```

To finish the installation of IPython and its dependencies, press `y`.

### Multiple packages

Multiple packages can be installed at the same time, by separating the package names with spaces:

`conda install jupyterlab matplotlib numpy pandas scipy`

**IF** there are special packages that you need to get from a specific repository channel (e.g., the conda-forge channel), you can designate a channel using the `-c` flag and the name of the channel (such as `conda-forge`) as shown here:

`conda install -c conda-forge jupyterlab matplotlib numpy pandas scipy`
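
If you ever want to assemble such a command from a script (say, from a list of package names), the pattern is just "command, optional channel, then packages". The helper below is a hypothetical sketch that only builds and prints the command; it does not run `conda`.

```python
def conda_install_command(packages, channel=None):
    """Build the argument list for a `conda install` call (nothing is executed)."""
    command = ["conda", "install"]
    if channel is not None:
        # -c names the channel to pull packages from.
        command += ["-c", channel]
    return command + list(packages)


packages = ["jupyterlab", "matplotlib", "numpy", "pandas", "scipy"]
print(" ".join(conda_install_command(packages)))
print(" ".join(conda_install_command(packages, channel="conda-forge")))
```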

### Leaving the virtualenv when you are done

When you are done working in your virtualenv, you can deactivate it using the following command:

**Mac/Linux**

```bash
(stats) $ source deactivate
$
```

**Windows**

```bat
(stats) C:\> deactivate
C:\>
```

## Resources



* [Using conda](http://conda.pydata.org/docs/using/index.html): A tutorial on how to use `conda`

* [conda cheatsheet](https://conda.io/docs/_downloads/conda-cheatsheet.pdf): A cheatsheet of the most common `conda` commands

* [conda myths and misconceptions](http://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/): Reasons why conda was created and how it differs from `pip`, `virtualenv`, etc.

* [Python's `venv` and `virtualenv` can also create virtual environments.](http://stackoverflow.com/questions/41573587/what-is-the-difference-between-venv-pyvenv-pyenv-virtualenv-virtualenvwrappe)

* [`pip` is Python's package manager.](https://en.wikipedia.org/wiki/Pip_(package_manager))