# Bootstrapping a Python Environment

Q. How do you set up a new virtual environment for Python development?

A. It depends

Most of this is platform-specific, Python-version-specific and ultimately application-dependent. Most of my development is scientific software and, to a lesser degree, systems administration tasks. Critical packages for my work include numpy, scipy, ipython (including the notebook), networkx, matplotlib and pandas.

<br />

Q. What's a normal workflow?

A. The [Python Packaging Authority](https://packaging.python.org/en/latest/) is the best guide on the internet for how to do this. In short, though, the basic idea is that you:

* Install [pip](https://pip.pypa.io/en/latest/installing.html)
* Install [virtualenv](https://virtualenv.pypa.io/en/latest/installation.html)
* Manually install the packages you need with `pip install`; you'll normally keep a hand-curated list of packages that you pick and choose from depending on the application
* Keep developing 
* Capture the list of packages in your initial virtual environment with `pip freeze > requirements.txt`
* Create a fresh virtual environment, run `pip install -r requirements.txt` and re-run your unit tests

<br />

Q. When is this hard?

A. When you have:

* A firewall: you need to get through any corporate proxy servers. That means correctly configuring `HTTP_PROXY` for older versions of pip and `HTTPS_PROXY` for newer versions (just set both). It gets more complicated if you need to authenticate to your proxy, and the details depend on your organisation's conventions for doing that.
* SSL certificates to deal with: newer versions of requests (which both conda and pip rely on) verify HTTPS connections against their own certificate bundle, so you may also need to set `REQUESTS_CA_BUNDLE` to point at your organisation's CA certificates.
* No admin rights: depending on your platform, you may need to play some games with [virtualenv](http://stackoverflow.com/questions/4324558/whats-the-proper-way-to-install-pip-virtualenv-and-distribute-for-python) to get started.
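A sketch of the proxy and certificate settings described above, for a POSIX shell. The proxy URL and the CA bundle path are hypothetical placeholders to replace with your organisation's values; the lowercase variables are included because some tools look for the lowercase names instead.

```shell
# Hypothetical values: substitute your organisation's proxy and CA bundle
PROXY_URL="http://proxy.example.com:8080"
# If the proxy needs authentication, credentials usually go in the URL:
#   http://username:password@proxy.example.com:8080

# Set both the HTTP and HTTPS variables, in upper and lower case
export HTTP_PROXY="$PROXY_URL" HTTPS_PROXY="$PROXY_URL"
export http_proxy="$PROXY_URL" https_proxy="$PROXY_URL"

# Tell requests (used by pip and conda) which CA certificates to trust
export REQUESTS_CA_BUNDLE="/etc/ssl/certs/corporate-ca.pem"

# pip also accepts both settings directly on the command line:
# python -m pip install --proxy "$PROXY_URL" --cert "$REQUESTS_CA_BUNDLE" numpy
```

Putting the `export` lines in your shell profile saves repeating them in every new terminal.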


<br />

Q. Why doesn't this work for you?

A. Because of non-Python dependencies. Numpy is mainly C, and Scipy is C/C++ and Fortran. Building each of those packages requires build tools, compilers and additional libraries (such as BLAS and LAPACK).

<br />


Q. What's the easiest way?

A. For my purposes, the easiest way to get started is [miniconda](http://conda.pydata.org/miniconda.html). Once miniconda is installed, getting the packages I need is a single command:

```shell
conda install numpy scipy ipython notebook networkx matplotlib pandas
```

The benefit of using conda is that I get what is effectively a virtual environment with a Python version I can select. Even better, I don't have to know how, or what, non-Python code is compiled and linked into my application.

This solution works on Windows, Linux and Mac.

It's not without its flaws, though: conda makes all the decisions for you, which means that if you know something it doesn't (like which versions of LAPACK/ATLAS/BLAS are installed), it won't make proper use of them.
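If you want to check which BLAS/LAPACK implementation you actually ended up with, numpy and scipy can each report the libraries they were built against. The output format differs between releases, so treat this as a quick inspection rather than something to parse.

```shell
# Print each package's build configuration, including the
# BLAS/LAPACK libraries it was linked against
python -c "import numpy; numpy.show_config()"
python -c "import scipy; scipy.show_config()"
```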

For this kind of work, conda (or one of the other Python distributions) is the only effective way to manage Python on Windows.

<br />

Q. So what are the alternatives?

A. Install the hard parts from your operating system's packages (like `python-numpy` and `python-scipy`) and use virtualenvs for the rest.

Once the hard parts are set up, you can create virtual environments with the `--system-site-packages` flag so they can see those system-wide installs. This saves you from having to work out what numpy and scipy need in order to build (once they're installed, most other packages follow pretty easily).
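A sketch of this hybrid approach, assuming a Debian or Ubuntu system. The `python3-*` package names are Debian/Ubuntu conventions (other distributions name them differently), and the sketch uses the standard library's `venv`, which accepts the same `--system-site-packages` flag as virtualenv.

```shell
# Debian/Ubuntu package names; other distributions differ
sudo apt-get install python3-numpy python3-scipy python3-venv

# Create an environment that can also see the system-wide numpy and scipy
python3 -m venv --system-site-packages sci-env
. sci-env/bin/activate

# Pure-Python packages install into the environment as usual
python -m pip install networkx

# numpy should resolve to the system copy (typically under /usr/lib/python3/dist-packages)
python -c "import numpy; print(numpy.__file__)"
```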

This is possibly the worst compromise of all: it gets you started, but it's very brittle in the face of changes and package upgrades. Even worse, operating system packages tend to lag upstream releases quite badly.

<br />

Q. Any more?

A. Don't use virtual environments.

This only makes sense if you're developing just one thing, if the machine is your personal sandbox, or if you're isolating your Python packages some other way, using something like Docker or Vagrant.

<br />

Q. Any more?

A. Manage everything by hand.

This is by far the most time-consuming method. It depends heavily on a (deep) understanding of the platforms you work on, and much of that comes from trial and error and lots of searching on Stack Overflow. You install all your operating system packages first (LAPACK, BLAS, and Fortran and C++ compilers).

This gives the best results. If you're a sysadmin or support a large number of developers, invest the time to get this right; if you're a sole developer or part of a small team, this approach isn't cost-effective.