# Virtual Environments

Less like this:

```python
from IPython.display import YouTubeVideo, Image
YouTubeVideo("_fNp37zFn9Q")
```

More like this:

```python
Image(url='https://wikispaces.psu.edu/download/attachments/115966617/ht_BuildaSandbox_hero_image.jpg')
```

# Introduction

Running your code often implicitly depends on the current state of what you have installed on your system. It's all well and good to give someone the code and data that you used to produce your results, but what happens when they can't reproduce the analysis even when they run the code? This is particularly problematic when [the person who can't reproduce your results is you](https://twitter.com/kcranstn/status/370914072511791104)!

The most likely culprit? Differences between the software installed on the original system and on the current one. The solution? Virtual environments and version control. Together, they let you take a snapshot of your system and recreate it at different times on different machines. We'll go through examples of how to use virtual environments in both `python` and `R` below.

The goal is for you to be able to:

* Create a virtual environment for a project
* Include the environment details in a GitHub repository
* Recreate the analysis on another machine

# Python

## Conda

In addition to being a package manager, `conda` is also an environment manager. There are separate tools for each job (`pip` is a Python package manager and `virtualenv` is a Python environment manager), but `conda` does both in one. See [this table for a side by side comparison](http://conda.pydata.org/docs/_downloads/conda-pip-virtualenv-translator.html) of the three.

Two of the [main advantages](http://stackoverflow.com/questions/20994716/what-is-the-difference-between-pip-and-conda) of using conda:

* `conda` packages can be more than `python` (e.g. pre-compiled binaries for `R`, `C` libraries, etc.)
* `conda` environments can specify non-`python` libraries and dependencies

Roughly speaking:

`conda` = `pip` + `virtualenv`...for more than just `python`

Let's give it a try! In fact, if you've been using `conda`, you've already been using its default environment. Many options also have a short form: a single dash followed by the first letter of the long option (e.g. `--envs` becomes `-e` and `--name` becomes `-n`). In what follows, we'll mostly follow [the documentation](http://conda.pydata.org/docs/using/envs.html).

```bash
conda info --envs  # long form
conda info -e      # equivalent short form
```

```
# conda environments:
#
_test                    //anaconda/envs/_test
root                  *  //anaconda

# conda environments:
#
_test                    //anaconda/envs/_test
root                  *  //anaconda
```
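The `*` in the second column marks the active environment. If you'd rather check this from a script than by eye, here is a minimal sketch that parses the listing above. It assumes the column layout shown (name, optional `*`, path), which may differ between conda versions; `parse_env_list` is a hypothetical helper, not part of conda. Recent conda versions can also emit JSON (`conda info --json`), which is sturdier to parse than this text.

```python
def parse_env_list(text):
    """Parse the text printed by `conda info -e`.

    Returns (envs, active): envs maps each environment name to its path,
    and active is the name of the row marked with '*' (or None).
    """
    envs, active = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the "# conda environments:" header and blank lines
        parts = line.split()
        name, path = parts[0], parts[-1]
        if "*" in parts[1:-1]:  # the active environment has a '*' column
            active = name
        envs[name] = path
    return envs, active


listing = """# conda environments:
#
_test                    //anaconda/envs/_test
root                  *  //anaconda
"""
envs, active = parse_env_list(listing)
print(active)  # root
```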



Since we're in the default (`root`) environment, `conda list` shows the packages installed there. Adding `-e` (short for `--export`) prints the same packages in a format that can be used to recreate the environment, as the header of the second listing notes.

```bash
conda list | head     # packages in the current environment
conda list -e | head  # the same packages, in export format
```

```
# packages in environment at //anaconda:
#
_license                  1.1                      py27_0  
abstract-rendering        0.5.1                np19py27_0  
alabaster                 0.7.3                    py27_0  
anaconda                  2.3.0                np19py27_0  
appscript                 1.0.1                    py27_0  
argcomplete               0.8.9                    py27_0  
astropy                   1.0.3                np19py27_0  
atom                      0.3.9                    py27_0  
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
_license=1.1=py27_0
abstract-rendering=0.5.1=np19py27_0
alabaster=0.7.3=py27_0
anaconda=2.3.0=np19py27_0
appscript=1.0.1=py27_0
argcomplete=0.8.9=py27_0
astropy=1.0.3=np19py27_0
```
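Each non-comment line of the export format pins a package as `name=version=build`. As a minimal sketch of what that format encodes, the hypothetical helper below turns it into a dictionary. It assumes every non-comment line has the three `=`-separated fields seen above.

```python
def parse_spec(text):
    """Map package name -> (version, build) from `conda list -e` output."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment lines such as "# platform: osx-64"
        name, version, build = line.split("=", 2)
        pins[name] = (version, build)
    return pins


spec = """# platform: osx-64
_license=1.1=py27_0
abstract-rendering=0.5.1=np19py27_0
astropy=1.0.3=np19py27_0
"""
pins = parse_spec(spec)
print(pins["astropy"])  # ('1.0.3', 'np19py27_0')
```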


Let's create a new environment for a project that we'd eventually like to share, pinning the specific version of `numpy` that some other code needs to run. You don't have to specify a version if you don't need one.

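```bash
conda create -n test numpy=1.8
```

Because the `test` environment already existed when this cell was last run, conda refused to overwrite it:

```
Error: prefix already exists: //anaconda/envs/test
```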


On a first run, this creates the new environment and installs the requested version of `numpy` along with the packages it depends on. We can see the change by listing the environments again; the currently active environment is marked with an `*`.

`conda info -e`

We can switch to the new environment using the command that `conda create` suggests when it finishes. The exact command differs between systems: on Windows, omit `source` and run `activate test`. (Newer versions of conda use `conda activate test` on every platform.)

`source activate test`

`conda info -e`

`conda list -e`
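Once the environment is active, `python` refers to the environment's own interpreter. To double-check this from inside Python (a notebook, say), here is a minimal sketch. It assumes activation sets the `CONDA_DEFAULT_ENV` variable, which conda's activate scripts do, though you should confirm it for your version. `sys.prefix` is standard Python and points at the running interpreter's installation prefix. `describe_environment` is a hypothetical helper.

```python
import os
import sys


def describe_environment(environ=None):
    """Report which conda environment this Python process appears to be in."""
    environ = os.environ if environ is None else environ
    return {
        # Set by conda's activate script (assumption); None if nothing was activated.
        "active_env": environ.get("CONDA_DEFAULT_ENV"),
        # Installation prefix of the running interpreter, e.g. the env's folder.
        "python_prefix": sys.prefix,
    }


print(describe_environment())
```

In the `test` environment above, you would expect `python_prefix` to end in `envs/test`.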

The new environment starts out minimal, containing only the packages you asked for and their dependencies, so you can control precisely what is installed and at which versions. It is particularly useful to list all of the packages the environment will need in a single `conda create` command: conda then resolves them together and gives you a complete overview of what's going to be installed.

If you have to go back and add things, it's possible to use the regular install syntax and specify the environment where you want the package installed.

`conda install -n test pymc`

If we omit the `-n` option, `conda` will install or uninstall packages in the current environment, which you can check with `conda info -e`.

You can clone an environment to test a change, from altering a single package version to something larger, without touching the original.

`conda create --name another-test --clone test`

You can also just delete an environment once you're done with it.

`conda remove --name another-test --all`

Most importantly, you can export the current environment to a file.

```bash
source activate test                # switch to the test environment
conda env export > environment.yml  # write its full specification to a file
cat environment.yml
```

```
name: test
dependencies:
- numpy=1.8.2=py27_0
- openssl=1.0.1k=1
- pip=7.1.0=py27_0
- python=2.7.10=0
- readline=6.2=2
- setuptools=18.0.1=py27_0
- sqlite=3.8.4.1=1
- tk=8.5.18=0
- zlib=1.2.8=0

discarding //anaconda/bin from PATH
prepending //anaconda/envs/test/bin to PATH
```


This environment file can be used to recreate the environment.

`conda env create -f environment.yml`

Simple as that. Not only can you test out how updating your system will affect your work, you can also make sure that your work can be successfully reproduced!
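To see how an update changed your environment, you can compare an old export with a new one. Here is a minimal sketch. It assumes both exports contain exact pins as written by `conda list -e` or `conda env export` (not ranges like `numpy>=1.8`). The helpers `parse_pins` and `diff_pins` are hypothetical, and the "new" export below is made-up data for illustration.

```python
import re


def parse_pins(text):
    """Map name -> version from `conda list -e` lines or environment.yml
    dependency entries such as '- numpy=1.8.2=py27_0'."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- "):
            line = line[2:]  # environment.yml list item
        if line.startswith("#") or "=" not in line:
            continue  # comments, 'name: test', 'dependencies:', etc.
        name, version = re.split(r"=+", line)[:2]  # handles '=' and pip's '=='
        pins[name] = version
    return pins


def diff_pins(old_text, new_text):
    """Packages added, removed, or changed between two environment exports."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {name: (old[name], new[name])
                    for name in sorted(old.keys() & new.keys())
                    if old[name] != new[name]},
    }


old = """name: test
dependencies:
- numpy=1.8.2=py27_0
- python=2.7.10=0
"""
# Hypothetical export taken after updating the environment.
new = """name: test
dependencies:
- numpy=1.9.2=py27_0
- python=2.7.10=0
- scipy=0.15.1=np19py27_0
"""
print(diff_pins(old, new))
# {'added': ['scipy'], 'removed': [], 'changed': {'numpy': ('1.8.2', '1.9.2')}}
```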

Approaching it from the other direction, imagine you are trying to reproduce someone else's analysis. If they've included an environment file in a GitHub repository, then you could take the following steps:

1. Clone from GitHub (or fork and clone)
2. Create virtual environment from file
3. Follow instructions to reproduce
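Steps 1 and 2 are mechanical enough to script. Here is a minimal sketch. It assumes `git` and `conda` are on your PATH and that the repository keeps its environment file at the top level under the name `environment.yml`. The helper names and the URL are hypothetical. By default, `run` only prints the commands; pass `dry_run=False` to execute them. Step 3 depends on the project, so it's left to the project's own instructions.

```python
import shlex
import subprocess
from pathlib import Path


def reproduce_commands(repo_url):
    """Commands for steps 1 and 2: clone the repository, then build its env."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    # Same folder name git would pick by default: the repo name minus ".git".
    repo_dir = name[:-4] if name.endswith(".git") else name
    return [
        ["git", "clone", repo_url, repo_dir],
        ["conda", "env", "create", "-f", str(Path(repo_dir) / "environment.yml")],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is True."""
    for cmd in commands:
        print("$", " ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failing step


# Hypothetical repository URL, for illustration only.
run(reproduce_commands("https://github.com/example-user/example-analysis.git"))
```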

If all goes according to plan, that's great. If things go wrong, then you have plenty of information to work with. You can use GitHub to collaboratively resolve the issue.

# R

## Packrat 

Much like the virtual environments that you can create using `conda`, `Packrat` offers similar functionality for `R`, and is particularly [well-integrated with RStudio](https://rstudio.github.io/packrat/rstudio.html).

As the name suggests, `Packrat` works by keeping a local copy of everything, with [the benefits](https://rstudio.github.io/packrat/) of being:

* Isolated: Installing a new or updated package for one project won't break your other projects, and vice versa. That's because packrat gives each project its own private package library.
* Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
* Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.

You can install `Packrat` from `CRAN` and start using it when creating a new project in `RStudio`. Or, you can navigate to the project directory and initialize `Packrat` from the `R` console.

The basic workflow for a project using `Packrat` would be something like the following:

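* `packrat::init()` to initialize the project
* `install.packages()` to install packages into the project's private library
* `packrat::snapshot()` to record the package versions the project uses
* `packrat::status()` to check for differences between the snapshot, the private library, and the packages your code uses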

Once you're satisfied with the project, you can either push it to a repository on GitHub or bundle it into a tarball:

* `packrat::bundle()` to bundle the project into a tarball
* `packrat::unbundle()` to unpack the project elsewhere

If you or a collaborator accidentally remove packages that your code depends on, you can restore those using:

`packrat::restore()`


If you no longer need a package to run your code, you can remove it from the project using:

`packrat::clean()`

`Packrat` is very nicely integrated with `RStudio`: you can reach most of the functions above [using the packages pane](https://rstudio.github.io/packrat/rstudio.html). Together with git and GitHub integration and document creation, this turns `RStudio` into an extremely powerful tool.

# Testing for reproducibility

Here's where the rubber meets the road. If you've followed along with the rest of the course, you should be able to create and track a project using a git repository and push it to GitHub, so that others can see, download, and ultimately reproduce it. You can test this by asking a friend or colleague to reproduce your analysis, or by downloading the materials onto another machine and running them yourself.
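One way to make "reproduced" concrete is to check that a rerun produces the same output files. Here is a minimal sketch, not part of any tool covered above. Record SHA-256 digests of your result files on the original machine, commit them alongside the code, and compare after the rerun. `sha256_of`, `compare_outputs`, and the `results.csv` file name are hypothetical. Byte-for-byte equality is a strict test. Files that embed timestamps, or floating-point results that differ in the last digits across platforms, will be flagged even when the analysis is effectively the same.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(expected, output_dir):
    """Names of files that are missing or whose digest differs from `expected`,
    a dict of {relative file name: digest recorded on the original machine}."""
    mismatches = []
    for name, digest in expected.items():
        path = Path(output_dir) / name
        if not path.is_file() or sha256_of(path) != digest:
            mismatches.append(name)
    return mismatches


# Demo with a throwaway directory standing in for an analysis output folder.
with tempfile.TemporaryDirectory() as out:
    result = Path(out) / "results.csv"
    result.write_text("x,y\n1,2\n")
    recorded = {"results.csv": sha256_of(result)}  # saved on the original machine
    print(compare_outputs(recorded, out))  # []
    result.write_text("x,y\n1,3\n")  # simulate a rerun that gives a different result
    print(compare_outputs(recorded, out))  # ['results.csv']
```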