<div align=right>
<img src="img/logosmall.png" width="100px" align=right>
</div>

# Installing and managing a Python environment

In the previous section we defined a Python _environment_ as consisting of the following:

1. A Python *interpreter* that's suitable for your system.

2. An installation of the Python *standard library* that's usable by that interpreter.

3. The toolset required to find, install and maintain 3rd party libraries.
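
The three components listed above can be inspected from inside any running Python.  What follows is a minimal sketch using only the standard library.  The function name `describe_environment` is made up for this illustration, and looking for `pip` and `conda` on the `PATH` is just one simple way of checking for the package tools.

```python
import shutil
import sys
import sysconfig


def describe_environment():
    """Collect the three parts of the running Python environment."""
    return {
        # 1. The interpreter that is executing this code.
        "interpreter": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # 2. The directory holding that interpreter's standard library.
        "stdlib": sysconfig.get_paths()["stdlib"],
        # 3. Package tools found on the PATH (None means "not found").
        "tools": {name: shutil.which(name) for name in ("pip", "conda")},
    }


for key, value in describe_environment().items():
    print("{:12} {}".format(key, value))
```

Running this before and after installing your own environment is a quick way to see which interpreter you are really using.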

We are now going to learn how to install our own Python environment.  Why do we want to install our own Python environment?  So that we have *control over it*.  We need a measure of control and flexibility if we're really going to be *using* our programming environment!  For instance:

* If a version of Python came pre-installed on your system, the system may depend on it.  It's often better not to mess with it.


* More importantly:  Having your own Python environment under your control means you can install, upgrade and uninstall Python modules when you need to, without calling your network administrator.  As a working Python programmer, you'll often need to do this many times a day.

## Introducing Anaconda

Anaconda is a set of tools that makes it easy for the working scientist to set up and maintain a working Python environment *without* any help from the system administrator or the local computer expert.

It can be used to set up a Python environment in your home directory on a multi-user UNIX or Linux system, or on a single-user machine like your desktop or laptop.

>For this course, I urge anyone who has brought along their own Windows, Linux or Mac laptop *and has it connected to the network* to set up their Python environment on their laptop.  This will help to minimise the load we place on our poor overworked EVOP servers.

Anaconda is developed by a commercial company called *Continuum Analytics*, which specialises in providing commercial support for high-performance Python computing in science.  Don't worry, though:  Like the rest of the Python tools we'll use, Anaconda is completely free and open source.

>It's encouraging that Anaconda was written *by* scientists *for* scientists, don't you think?

## How Anaconda works

Anaconda provides us with a command-line tool called `conda`, which we can use to install and update both Python itself and the many third-party packages (modules) we will be using.  `conda` installs *binary* packages from an online repository known as the *Anaconda Cloud*.

By "binary package", we mean a package that has already been built (or compiled) for a particular system.  This has a very definite advantage for us:  Because the package has already been built, we can install it without having to worry about whether we have the prerequisite tools or libraries installed on our computer — it should "just work".

It also poses certain disadvantages:

* Packages have to be updated in the Anaconda Cloud by Continuum Analytics.  They may lag days or even weeks behind the latest releases.

* The packages are pre-built, and hence they're built for specific platforms.  Currently, the Anaconda Cloud has packages for 32-bit and 64-bit versions of Linux, Microsoft Windows, and macOS (formerly OS X).  That should cover most bases!

In practice, these disadvantages don't outweigh the it-just-works advantage!

## Installing Anaconda

Anaconda is provided in two *distributions:*

* **Anaconda** includes Python itself, plus hundreds of pre-built modules for scientific use.  It's comprehensive, but takes up a fair bit of disk space.

* **Miniconda** includes just Python itself, plus a few core tools.  You have to install further modules as and when you need them.

*For the purposes of this course, we'll use Miniconda.*

Miniconda can be downloaded from this URL: <http://conda.pydata.org/miniconda.html>

On that page you'll see a download matrix — we have quite a few choices!

![Miniconda download matrix](img/Miniconda%20download.png)

* We can choose between versions for Windows, Mac OS X (technically, macOS) and Linux.

* We can choose between versions which include Python 3.6, and versions which include Python 2.7.

* We can choose between 64-bit and 32-bit versions.

At this point, how you proceed depends on where you're installing Miniconda.  If you're installing on `evopserver`, continue on to the next section.  However, if you're installing on your own laptop, continue to one of the following three sections, whichever is appropriate to your case:

* [Installing on a Linux laptop](#Installing-on-a-Linux-laptop)
* [Installing on a Mac laptop](#Installing-on-a-Mac-laptop)
* [Installing on a Microsoft Windows laptop](#Installing-on-a-Microsoft-Windows-laptop)

### Installing on `evopserver`

We'll use the Miniconda version for Python 3.6, and `evopserver` requires that we install the 32-bit version.

In a terminal window on `evopserver`, type in the following command to download the Miniconda installer using the `curl` utility:

    curl -LO https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86.sh

Next, run the installer with the Bash shell as follows:

    bash Miniconda3-latest-Linux-x86.sh

The first thing you'll be required to do is to agree to a licence.  You can read it if you want, or you can just trust me that it's not onerous.  Anaconda is open source software, even though it's the product of a commercial company.

>For the licensing-aware among you, it's simply the 3-clause BSD licence.

Press `ENTER` when prompted, then press `SPACE` to page through the licence terms, and finally type `yes` to accept the terms of the licence.

Next, the Miniconda installer asks where you want to install your Python environment.  You can just press `ENTER` to approve the default, which is `/homes/evopserver/<your_login>/miniconda3`.

At this point you get to sit back and wait for a minute while the installation occurs…

Finally, the Miniconda installer asks you if you want to prepend the install location to your `PATH` environment variable in your `.bashrc`.  Respond with `yes`.

>If you found this last bit cryptic:  The Miniconda installer just wants to add a line to the startup configuration of your shell that will ensure that *its* version of Python — and not the version already installed on the system — is the one that gets executed when you type `python`.
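
To make the `PATH` explanation concrete, here is a small sketch of how a lookup along a search path behaves.  It assumes a Unix-like system, uses two throwaway directories standing in for the system and Miniconda `bin` directories, and all function names are invented for this example.  It only imitates what the shell does; it does not touch your real `PATH`.

```python
import os
import shutil
import tempfile


def make_fake_python(directory):
    """Create a placeholder file called 'python' and return its path."""
    path = os.path.join(directory, "python")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    return path


def first_python_on(directories):
    """Return the first 'python' found when searching directories in order."""
    search_path = os.pathsep.join(directories)
    # mode=os.F_OK only checks that the file exists; a real shell would
    # also insist that the file is executable.
    return shutil.which("python", mode=os.F_OK, path=search_path)


def demo():
    """Show that a directory at the front of the search path wins."""
    with tempfile.TemporaryDirectory() as system_bin, \
            tempfile.TemporaryDirectory() as miniconda_bin:
        system_python = make_fake_python(system_bin)
        miniconda_python = make_fake_python(miniconda_bin)
        # Only the "system" directory is on the search path.
        before = first_python_on([system_bin])
        # The "Miniconda" directory has been prepended.
        after = first_python_on([miniconda_bin, system_bin])
        return before == system_python, after == miniconda_python


print(demo())
```

Both values should be true: the system copy is found at first, and the Miniconda copy takes over once its directory comes first in the search order.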

Finally, we have to restart our shell to let this change take effect.  The easiest way to do that is to log out of `evopserver`, and then log back in.  Do that now…

When you're logged back into `evopserver`, you can continue on to the following section:

* [Check your installation](#Check-your-installation)

### Installing on a Linux laptop

Installation on a Linux-based laptop should proceed almost exactly like installation on `evopserver`, as described in the previous section.  We will again install the Python 3.6 version of Miniconda for Linux.  The only difference is that for modern laptops, you can opt to install the 64-bit version.  You can download it using the `curl` utility as follows:

    curl -LO https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh

>Don't worry if you don't know whether your laptop has a 32-bit or 64-bit processor.  Try the 64-bit Miniconda installer first.  If it doesn't work it will not harm anything, and you can then use the 32-bit installer (see the previous section).
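
If your laptop already has a system Python, you can also ask it which kind of machine it is running on instead of using trial and error.  This is only a sketch: the helper `linux_installer_name` is hypothetical, and the table covers just the machine types I would expect on an ordinary Intel or AMD laptop.

```python
import platform

# Installer file names as used in the download commands in this section.
INSTALLERS = {
    "x86_64": "Miniconda3-latest-Linux-x86_64.sh",  # 64-bit
    "i686": "Miniconda3-latest-Linux-x86.sh",       # 32-bit
    "i386": "Miniconda3-latest-Linux-x86.sh",       # 32-bit
}


def linux_installer_name(machine=None):
    """Map a machine type such as 'x86_64' to an installer file name."""
    machine = machine or platform.machine()
    try:
        return INSTALLERS[machine]
    except KeyError:
        raise ValueError("no installer known for machine type {!r}".format(machine))


# Report what this machine calls itself.
print(platform.machine())
```

Calling `linux_installer_name()` without an argument uses the machine type reported by the operating system, which is what the installer has to match.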

Next, run the installer with the Bash shell as follows:

    bash Miniconda3-latest-Linux-x86_64.sh

The first thing you'll be required to do is to agree to a licence.  You can read it if you want, or you can just trust me that it's not onerous.  Anaconda is open source software, even though it's the product of a commercial company.

>For the licensing-aware among you, it's simply the 3-clause BSD licence.

Press `ENTER` when prompted, then press `SPACE` to page through the licence terms, and finally type `yes` to accept the terms of the licence.

Next, the Miniconda installer asks where you want to install your Python environment.  You can just press `ENTER` to approve the default, which is a directory called `miniconda3` in your home directory.

>If you know what you're doing, feel free to enter an alternative path here.  My personal preference is to install Miniconda in `~/.python`.

At this point you get to sit back and wait for a minute while the installation occurs…

Finally, the Miniconda installer asks you if you want to prepend the install location to your `PATH` environment variable in your `.bashrc`.  If you don't know what this means, just respond with `yes`.

>If you found this last bit cryptic:  The Miniconda installer just wants to add a line to the startup configuration of your shell that will ensure that *its* version of Python — and not the version already installed on the system — is the one that gets executed when you type `python`.

If you know what you're doing and you'd rather add the Miniconda binary directory to your `$PATH` yourself instead of letting the installer do it, feel free to do so!

Finally, we have to restart our shell to let this change take effect.  The easiest way to do that is to close your terminal window and open a new one.  Do that now…

>If you know how to source your shell startup file, you can do that instead.

When you get back, you can continue on to the following section:

* [Check your installation](#Check-your-installation)

### Installing on a Mac laptop

We will install the Python 3.6 version of Miniconda for Mac OS X.  Click on it on the web page, and the installer will be downloaded to the `Downloads` folder in your home directory.

Once downloaded, execute the installer as follows:

    bash ~/Downloads/Miniconda3-latest-MacOSX-x86_64.sh

The first thing you'll be required to do is to agree to a licence.  You can read it if you want, or you can just trust me that it's not onerous.  Anaconda is open source software, even though it's the product of a commercial company.

>For the licensing-aware among you, it's simply the 3-clause BSD licence.

Press `ENTER` when prompted, then press `SPACE` to page through the licence terms, and finally type `yes` to accept the terms of the licence.

Next, the Miniconda installer asks where you want to install your Python environment.  You can just press `ENTER` to approve the default, which is `/Users/<your_login>/miniconda3`.

At this point you get to sit back and wait for a minute while the installation occurs…

Finally, the Miniconda installer asks you if you want to prepend the install location to your `PATH` environment variable in your `.bash_profile`.  Respond with `yes`.

>If you found this last bit cryptic:  The Miniconda installer just wants to add a line to the startup configuration of your shell that will ensure that *its* version of Python — and not the version already installed on the system — is the one that gets executed when you type `python`.

Finally, we have to restart our shell to let this change take effect.  The easiest way to do that is simply to close the Terminal application, and start it again.  Do that now…

When you're back in Terminal, you can continue on to the following section:

* [Check your installation](#Check-your-installation)

### Installing on a Microsoft Windows laptop

Click the link to the Miniconda installer for Python 3.6.  If your machine is less than about five years old, grab the 64-bit version.  If it's a really old laptop you may need the 32-bit version (but you can always try the 64-bit version first to see if it works — it does no harm to try).

Clicking the link will initiate a download of an executable installer called `Miniconda3-latest-Windows-x86_64.exe` (for the 64-bit version).  Download this anywhere you like (e.g. your desktop) and click it to execute the installer.  Follow the installation wizard — the default options should do fine.

Once installed, open a terminal window…

    Start Menu -> Run -> Command Prompt

At this point, you can continue on to the following section…

### Check your installation

Check on the status of your Anaconda installation.  We'll do this by
invoking the `conda` command for the very first time!

    conda info

The result should look something like this:

```
Current conda install:

             platform : linux-32
        conda version : 4.3.13
  conda-build version : not installed
       python version : 3.6.0.final.0
     requests version : 2.12.4
     root environment : /homes/evopserver/johann/miniconda3  (writable)
  default environment : /homes/evopserver/johann/miniconda3
     envs directories : /homes/evopserver/johann/miniconda3/envs
        package cache : /homes/evopserver/johann/miniconda3/pkgs
         channel URLs : https://repo.continuum.io/pkgs/free/linux-32/
                        https://repo.continuum.io/pkgs/free/noarch/
                        https://repo.continuum.io/pkgs/pro/linux-32/
                        https://repo.continuum.io/pkgs/pro/noarch/
          config file : None
    is foreign system : False
```
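
You can make a similar check from inside Python itself.  The sketch below relies on the assumption that a conda-managed installation contains a directory called `conda-meta` directly under its prefix; the function name is made up for this example.

```python
import os
import sys


def looks_like_conda_python(prefix=None):
    """Return True if the given prefix contains a 'conda-meta' directory."""
    prefix = prefix or sys.prefix  # default: the running interpreter's prefix
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


print("interpreter prefix:", sys.prefix)
print("managed by conda:  ", looks_like_conda_python())
```

If the prefix printed here is not your Miniconda directory, your shell is probably still picking up the system Python.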

## Using Anaconda

### Updating Anaconda

As a first step, let's tell `conda` to update itself (in case there are newer versions available):

    conda update conda

If any updates are available, `conda` will tell you what it is about to do. Just hit ‘`y`’ to proceed when you get this prompt:

    Proceed ([y]/n)?

Getting `conda` to update itself is generally the first thing you should do in
a new Anaconda installation.  But it's also a good idea to do this from time to
time in a working Anaconda installation.

### View a list of installed packages and their versions

Let's see a list of packages already installed.  To do this, we use the command…

    conda list

The output on a brand-new Anaconda installation should look something like this:

```
# packages in environment at /homes/evopserver/johann/miniconda3:
#
cffi                      1.9.1                    py36_0  
conda                     4.3.13                   py36_0  
conda-env                 2.6.0                         0  
cryptography              1.7.1                    py36_0  
idna                      2.2                      py36_0  
openssl                   1.0.2k                        0  
pip                       9.0.1                    py36_1  
pyasn1                    0.1.9                    py36_0  
pycosat                   0.6.1                    py36_1  
pycparser                 2.17                     py36_0  
pyopenssl                 16.2.0                   py36_0  
python                    3.6.0                         0  
readline                  6.2                           2  
requests                  2.12.4                   py36_0  
ruamel_yaml               0.11.14                  py36_1  
setuptools                27.2.0                   py36_0  
six                       1.10.0                   py36_0  
sqlite                    3.13.0                        0  
tk                        8.5.18                        0  
wheel                     0.29.0                   py36_0  
xz                        5.2.2                         1  
yaml                      0.1.6                         0  
zlib                      1.2.8                         3  
```

We have Python 3.6.0 installed, as well as version 4.3.13 of `conda`.  And
a couple of other very basic packages and libraries.  Everything's looking shiny.
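
If you ever want to work with such a listing in a program, the three-column layout is easy to take apart.  This is a sketch that assumes the listing looks like the one shown above (comment lines starting with `#`, then name, version and build separated by whitespace); `parse_conda_list` is a name invented for this example.

```python
def parse_conda_list(text):
    """Turn the text of a package listing into (name, version, build) tuples."""
    packages = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a name/version/build line
        packages.append(tuple(fields[:3]))
    return packages


# A few lines copied from the listing above.
SAMPLE = """\
# packages in environment at /homes/evopserver/johann/miniconda3:
#
conda                     4.3.13                   py36_0
pip                       9.0.1                    py36_1
python                    3.6.0                         0
"""

for name, version, build in parse_conda_list(SAMPLE):
    print(name, version, build)
```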

### Search for a package in the Anaconda Cloud

As mentioned, we're going to be using a tool called Jupyter for most of this course.  Jupyter is itself written in Python, and distributed as a Python package.  Let's search for Jupyter in the Anaconda Cloud:

    conda search jupyter

You'll see that you get quite a bit of output since Jupyter has several components.  Right at the top you should see a listing for a package called simply `jupyter`, which is currently at version 1.0.0.  This is the one we need to install, and we'll do so next…

### Install a package from the Anaconda Cloud

To install Jupyter and all its dependencies (other modules upon which it depends), all we have to do is issue one command:

    conda install jupyter

`conda` computes all the dependencies and then presents you with a list of packages that will be installed and/or updated.  Again, it waits for your permission to proceed, and you can hit `y` to tell it to go ahead.

As the saying goes:  Sit back and marvel at all the work you *didn't* have to
do!  (You'll probably be in a better position to appreciate this if you've ever
tried to install Jupyter by hand!)

Afterwards, you can again do a `conda list` to see which packages are now installed.
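
To give a feel for what "computing the dependencies" involves, here is a toy sketch that works out an installation order in which every package comes after the packages it needs.  It is not how `conda` itself works: the real tool also has to reconcile version constraints, which this example ignores.  The dependency table is hypothetical and heavily simplified, and it is assumed to contain no circular dependencies.

```python
def install_order(package, requires):
    """Return package plus everything it needs, dependencies first."""
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return  # already scheduled
        seen.add(name)
        for dependency in requires.get(name, []):
            visit(dependency)
        # All dependencies are scheduled, so this package can follow them.
        order.append(name)

    visit(package)
    return order


# Hypothetical dependency table for illustration only (not real metadata).
REQUIRES = {
    "jupyter": ["notebook", "ipykernel"],
    "notebook": ["tornado", "ipykernel"],
    "ipykernel": ["tornado"],
    "tornado": [],
}

print(install_order("jupyter", REQUIRES))
# → ['tornado', 'ipykernel', 'notebook', 'jupyter']
```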

### Install a package that isn't on the Anaconda Cloud

Anaconda makes it very easy (almost trivial) to install Python packages, but as we've said, it can only install packages which have been made available via the Anaconda Cloud.  Since Anaconda is aimed at scientists, many of the packages we might want to use *are* on the Anaconda Cloud… but not necessarily all of them.  What do we do when we really want to install a Python package that isn't being made available via the Anaconda Cloud?  Let's look at an example:

We want to process files in the popular variant call format (VCF) using Python.  The first thing we might do — before we start writing a VCF parser ourselves — is to see whether someone has already done it.  And we might well do this by typing `python vcf` into Google.  (Try it!)

From our search results we'll see that such a module exists!  It's called PyVCF, and its documentation is to be found at <https://pyvcf.readthedocs.org>.

Let's try to search for it using `conda`:

    conda search pyvcf

Oh dear!  `conda` explicitly tells us that PyVCF is not available on the Anaconda Cloud.

Fortunately you can still use Python's own package installer `pip` to install packages, even if you've installed Python using Anaconda.  By using `pip` you lose the advantages that Anaconda provides:  packages are often downloaded as source code, and might need to be built on your local system.  But at least you now have the opportunity to use them!

Let's try installing PyVCF using `pip`:

    pip install pyvcf

If all goes well, this should just work.  Let's use `conda` to see whether a `pyvcf` package has been installed:

    conda list

In the results, you should see a line like…

    pyvcf                     0.6.7                     <pip>

As you can see, `conda` clearly tells you that this package was installed not by its own machinery, but by `pip`.
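
Whichever tool installed a package, what matters in the end is whether Python can import it.  The sketch below checks this without actually importing anything.  It assumes that PyVCF is imported under the module name `vcf`, and `is_importable` is a helper name made up for this example.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module of this name."""
    return importlib.util.find_spec(module_name) is not None


# 'json' ships with Python; 'vcf' is only found once PyVCF is installed.
for name in ("json", "vcf"):
    print(name, "importable:", is_importable(name))
```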

>I would suggest minimising the use of `pip` if you manage your Python environment with Anaconda.  But it's great to know it's there if you need it!

## Online package databases

You can search the Anaconda cloud by going to…

<http://anaconda.org>

As said, these are curated binary packages meant for installation with the `conda` tool.

You can also search the entire Python Package Index (PyPI) by clicking on this URL:

<http://pypi.python.org>

PyPI is an (incomplete) index of all 3rd party Python packages.  It's only very lightly curated, so it can be hard to find worthwhile packages in a sea of junk.  Just try to search for a word like `bioinformatics` and see how many hits you get!

---