# Installing needed software

Modern scientific research often either requires or can greatly benefit from the use of computational tools. Even if you're not on the cutting edge, you can make your life better and easier using these tools to conduct and share your research. The goal here is to give a brief overview of the tools we'll be using in this course, how to install them along with other tools on your system, and how to get help when things aren't working quite like they should.

* Installing Python and the SciPy stack
* Installing R and additional packages
* Installing TeX and TeX Live

A quick disclaimer before we dive in. These instructions are oriented towards Unix-like operating systems (e.g. Linux, Mac OS X). If you are running Windows, strongly consider installing [Cygwin](https://www.cygwin.com/) or creating a [Linux partition](http://www.everydaylinuxuser.com/2014/05/install-ubuntu-1404-alongside-windows.html) on your machine. This will allow you to follow along with the rest of the course and install most of the things you could ever need with a single terminal command.

# Python

## Description

[Python](https://www.python.org/) is a [widely-used, highly-extensible](https://www.python.org/about/success/#education), and easily-learned programming language.

In [1]:
from IPython.display import Image
Image(url='http://imgs.xkcd.com/comics/python.png')

Within the Python computing environment, we'll be using several useful extensions. Collectively, these are referred to as the [SciPy stack](http://www.scipy.org/stackspec.html#stackspec). The current version of the SciPy stack requires the following versions of various tools:

* [Python](https://www.python.org/)  (2.x >= 2.6 or 3.x >= 3.2) : see above, results may vary
* [NumPy](https://github.com/numpy/numpy) (>= 1.6) : fundamental package needed for scientific computing with Python
* [SciPy](https://github.com/scipy/scipy) (>= 0.10) : open-source software for mathematics, science, and engineering
* [Matplotlib](http://matplotlib.org/) (>= 1.1) : open-source plotting library, deeply integrated with Python
* [IPython]() (>= 0.13) : a command shell for interactive computing in multiple programming languages
* [Pandas](https://github.com/pydata/pandas) (>= 0.8) : fast, flexible, and expressive data structures
* [Sympy](http://www.sympy.org/en/index.html) (>= 0.7) : library for symbolic mathematics in Python, moving towards a full computer algebra system

We'll also be using third-party [SciPy toolkits](https://scikits.appspot.com/):

* [Scikit-learn](http://scikit-learn.org/dev/index.html) : simple, efficient, and reusable tools for machine learning
* [Scikit-statsmodels](https://pypi.python.org/pypi/scikits.statsmodels) : Python package for statistical computations
* ...

You'll find lots of detailed tools at Scikits. If you're thinking about implementing something in your own work, check these kits and the Python Package Index ([PyPI](https://pypi.python.org/pypi)) to see if someone has already done the work.  Below we'll talk about how to install packages from these locations.

## Installation

If you've already started using some of these tools in your own research, then you've probably installed at least Python, and maybe more. To start off, let's find out what you've installed and where it's installed.

As a start, let's find out which version of Python we've got installed. You can ignore the first line, which just tells IPython to treat what follows as input to the bash shell. So far so good!

In [3]:
%%bash
echo "Python version is:"
python -V

Python version is:


Python 2.7.10 :: Anaconda 2.3.0 (64-bit)


Now, let's check to see what else is installed. Again, we can do this from the command line. But since we're trying to find out whether a Python package, rather than Python itself, is installed, we'll ask Python to run the command in quotes. The option `-c` tells Python to treat the quoted text as a program to execute, and the semicolon inside separates its statements. If something isn't installed, we'll get an error like the following.

In [4]:
%%bash
python -c "import notinstalledyet; print notinstalledyet.__version__"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named notinstalledyet


Otherwise, we'll find out what version is installed.

In [5]:
%%bash
echo "NumPy version is:"
python -c "import numpy; print numpy.__version__"
echo "SciPy version is:"
python -c "import scipy; print scipy.__version__"
echo "Matplotlib version is:"
python -c "import matplotlib; print matplotlib.__version__"
echo "IPython version is:"
ipython -V
echo "Pandas version is:"
python -c "import pandas; print pandas.__version__"
echo "SymPy version is:"
python -c "import sympy; print sympy.__version__"

NumPy version is:
1.9.2
SciPy version is:
0.15.1
Matplotlib version is:
1.4.3
IPython version is:
3.2.0
Pandas version is:
0.16.2
SymPy version is:
0.7.6
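
The same checks can be scripted from within Python. The following is a minimal sketch, not the only way to do it: it imports each package, reads its `__version__` attribute, and compares it against the minimum versions listed in the SciPy stack description above. The helper names `version_tuple` and `installed_version` are made up for this illustration, and the code is written to run under both Python 2.7 and Python 3.

```python
import importlib
import re

# Minimum versions taken from the SciPy stack list earlier in this document.
MINIMUM_VERSIONS = {
    "numpy": "1.6",
    "scipy": "0.10",
    "matplotlib": "1.1",
    "IPython": "0.13",
    "pandas": "0.8",
    "sympy": "0.7",
}


def version_tuple(version):
    """Turn a string such as '1.9.2' into (1, 9, 2) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def installed_version(module_name):
    """Return the module's version string, or None if it cannot be imported
    or does not define __version__."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


for name, minimum in sorted(MINIMUM_VERSIONS.items()):
    found = installed_version(name)
    if found is None:
        status = "not installed (or no version found)"
    elif version_tuple(found) >= version_tuple(minimum):
        status = "{} (ok, needs >= {})".format(found, minimum)
    else:
        status = "{} (too old, needs >= {})".format(found, minimum)
    print("{}: {}".format(name, status))
```

Comparing tuples of integers rather than raw strings matters here: as strings, `"0.10"` sorts before `"0.9"`, which would wrongly flag SciPy 0.10 as older than 0.9.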


Alright, it looks like we've got the SciPy stack installed. This is good, but not a trivial accomplishment. Sometimes getting to this point can be a bit frustrating and require a lot of time debugging and googling error messages.

In [6]:
Image(url='https://imgs.xkcd.com/comics/dependencies.png')

For the purposes of this course, we suggest using the [Anaconda](https://store.continuum.io/cshop/anaconda/) scientific Python distribution, which is supported on Linux, Windows and Mac OS X. The benefits of using Anaconda are:

* Parallel to other Python installations
* Maintained by [Continuum](http://continuum.io/)
* [Free](https://en.wikipedia.org/wiki/The_Free_Software_Definition) in several ways

Take a few minutes to install Anaconda and then we'll give a quick rundown of its structure. Now that we've installed Anaconda, let's take a look at where our operating system thinks Python is. The default installation location is in your home directory.

In [7]:
%%bash
which python

/home/cahern/anaconda/bin/python


But wait, didn't we say that Anaconda installs Python in parallel to other installations? It does; we've just listed the default location. Let's look for all locations of Python. The option `-a` lists the pathnames of all installations of Python on your path rather than just the default.

In [10]:
%%bash
which -a python

/home/cahern-adm/anaconda/bin/python
/usr/bin/python


There we go! It now looks like we have two different versions of Python installed at different locations. Take a look at the versions of the different packages we'll be using in each of the two installations. Note that the versions used by different installations of Python are not necessarily the same. It's important to remember this as you install additional packages.

In [11]:
%%bash
/home/cahern/anaconda/bin/python -c "import numpy; print numpy.__version__"
/usr/bin/python -c "import numpy; print numpy.__version__"

1.9.2
1.8.2
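
It can also help to ask a running Python session which installation it belongs to. The sketch below prints the interpreter's own path and then mimics `which -a` by walking the directories on `PATH` in order. The function name `which_all` is hypothetical, chosen for this example, and the code assumes a Unix-like system where executables carry the execute permission.

```python
import os
import sys


def which_all(program, search_path=None):
    """Return every executable named `program` on the search path, in order,
    roughly like `which -a program`."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


# The interpreter running this code, and the root of its installation.
print("Running interpreter: {}".format(sys.executable))
print("Installation prefix: {}".format(sys.prefix))

# Every `python` on the PATH; the first one listed is the default.
for path in which_all("python"):
    print(path)
```

The first match is the one your shell runs when you type `python`, which is why the order of entries in `PATH` decides which installation is the default.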


Anaconda comes with `conda`, a package and virtual environment manager. We can use it to list the packages currently installed for and accessible to Anaconda. This yields some of the information we found earlier, namely the version numbers of the different packages we'll be using.

In [12]:
%%bash
conda list

# packages in environment at /home/cahern/anaconda:
#
_license                  1.1                      py27_0  
abstract-rendering        0.5.1                np19py27_0  
alabaster                 0.7.3                    py27_0  
anaconda                  2.3.0                np19py27_0  
argcomplete               0.8.9                    py27_0  
astropy                   1.0.3                np19py27_0  
babel                     1.3                      py27_0  
backports.ssl-match-hostname 3.4.0.2                   <pip>
bcolz                     0.9.0                np19py27_0  
beautiful-soup            4.3.2                    py27_0  
beautifulsoup4            4.3.2                     <pip>
binstar                   0.11.0                   py27_0  
bitarray                  0.8.1                    py27_0  
blaze                     0.8.0                     <pip>
blaze-core                0.8.0                np19py27_0  
blz                       0.6.2                np

Before doing anything else, let's update `conda` and `anaconda`.

In [22]:
%%bash
conda update conda
conda update anaconda

Fetching package metadata: ....
# All requested packages already installed.
# packages in environment at /home/cahern/anaconda:
#
conda                     3.14.1                   py27_0  
Fetching package metadata: ....
# All requested packages already installed.
# packages in environment at /home/cahern/anaconda:
#
anaconda                  2.3.0                np19py27_0  


Note that in addition to the full SciPy stack, Anaconda comes with [a lot](http://docs.continuum.io/anaconda/pkg-docs.html) of packages and tools ready to use. This is one of the benefits of using a maintained Python distribution: it's possible that you'll never have to install anything beyond what's already included. Beyond the packages that are already installed, more are available to install; you can find them with `conda search`.

In [14]:
%%bash
conda list | wc -l
conda search | grep '*' | wc -l

155
148


For example, if we want to use [MCMC methods](https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) in Python, we can install [PyMC](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/).

In [15]:
%%bash
conda search pymc

Fetching package metadata: ....
pymc                         2.2                 np17py27_p0  defaults        
                             2.2                  np17py27_0  defaults        
                             2.3                  np17py27_0  defaults        
                             2.3                  np18py27_1  defaults        
                             2.3.2                np18py27_0  defaults        
                             2.3.2                np18py26_0  defaults        
                             2.3.3                np19py27_0  defaults        
                             2.3.3                np19py26_0  defaults        
                             2.3.3                np18py27_0  defaults        
                             2.3.3                np18py26_0  defaults        
                             2.3.4               np19py34_p0  defaults        [mkl]
                             2.3.4                np19py34_0  defaults        
               

In [16]:
%%bash
conda install pymc

Fetching package metadata: ....
Solving package specifications: .
Package plan for installation in environment /home/cahern/anaconda:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    setuptools-18.0.1          |           py27_0         341 KB
    pip-7.1.0                  |           py27_0         1.4 MB
    pymc-2.3.4                 |       np19py27_0         1.7 MB
    ------------------------------------------------------------
                                           Total:         3.4 MB

The following NEW packages will be INSTALLED:

    pymc:       2.3.4-np19py27_0

The following packages will be UPDATED:

    pip:        7.0.3-py27_0  --> 7.1.0-py27_0    
    setuptools: 17.1.1-py27_0 --> 18.0.1-py27_0   

Proceed ([y]/n)? 
Fetching packages ...
setuptools-18.   0% |                                                                                           | ETA:  --:--:--  

But, what if I want something that's outside of `conda search`? There are certainly [ways to do so](http://conda.pydata.org/), but for this course we're going to treat Anaconda as a closed ecosystem.

In [29]:
Image(url='http://i.dailymail.co.uk/i/pix/2013/01/24/article-2267504-17212EB3000005DC-781_634x663.jpg')

If the closed ecosystem is stifling, you can break out into the larger world by changing your default Python installation. Navigate to your home directory and comment out the following line in your `.bashrc`, the bash script that runs whenever you start a new session. Once you start a new session, `which -a python` should no longer list Anaconda. You can still use the Anaconda install, but it won't be the default.

In [17]:
%%bash
cd ~
grep "anaconda" .bashrc
source ~/.bashrc

export PATH="/home/cahern/anaconda/bin:$PATH"


To install more packages you'll need to install [pip](https://pip.pypa.io/en/stable/installing.html#install-pip). Pip will allow you to install packages from the Python Package Index (PyPI) with a single command. Note that each copy of pip installs into the Python installation it belongs to: the Anaconda Python does not have access to packages installed with the system pip, and the default Python installation on your system does not have access to the packages installed via Anaconda. This is either a feature or a drawback of using the Anaconda distribution.

In [18]:
%%bash
pip install textblob

Collecting textblob
  Downloading textblob-0.9.1-py2.py3-none-any.whl (633kB)
Installing collected packages: textblob
Successfully installed textblob-0.9.1


In [19]:
%%bash
pip list | tail
pip list | wc -l

textblob (0.9.1)
Theano (0.7.0)
toolz (0.7.2)
tornado (4.2)
ujson (1.33)
unicodecsv (0.9.4)
Werkzeug (0.10.4)
xlrd (0.9.3)
XlsxWriter (0.7.3)
xlwt (1.0.0)
115
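
Because each pip belongs to one Python installation, a common way to avoid surprises is to invoke pip through the interpreter you actually intend to use, as in `python -m pip install textblob`. The sketch below builds such a command from inside Python; `pip_command` and `run_pip` are hypothetical helper names, and the example only prints the command rather than installing anything.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip"] + list(args)


def run_pip(*args):
    """Run pip for this interpreter and return its exit code (0 means success)."""
    return subprocess.call(pip_command(*args))


# Show the command that would install textblob into *this* installation.
print(" ".join(pip_command("install", "textblob")))

# To actually run it, call for example:
# run_pip("install", "textblob")
```

Running pip this way means the package lands in the same installation that `sys.executable` points to, whichever `pip` happens to come first on your `PATH`.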


# R

## Description

[R](http://www.r-project.org/) is a [widely-used](http://www.nature.com/news/programming-tools-adventures-with-r-1.16609) programming language and software environment for statistical computing and graphics.

In [20]:
from IPython.display import Image
Image(url='http://1.bp.blogspot.com/-W8bv1BMOwEk/UjtKZ22SGXI/AAAAAAAAAFs/FCJiV-k-4PQ/s1600/R+Pirate.png')

## Installation

R can be downloaded from the closest [mirror server](http://cran.r-project.org/mirrors.html). Follow the installation instructions.

In [26]:
%load_ext rpy2.ipython

ImportError: No module named rpy2

The simplest way to install additional packages in `R` is to use `install.packages("package_name")`. It might be particularly useful to install the package `devtools`. This will allow you to install packages from GitHub.

In [None]:
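%%R
# install_github() comes from devtools; install it first with install.packages("devtools")
library(devtools)
install_github("jofrhwld/UhUm")
library(UhUm)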

To add on to `R`, we'll be using [RStudio]() as a way to analyze data and create documents. Go ahead and download and install it.

# TeX

## Description

TeX is an open source, powerful, and flexible typesetting system that is the standard in many fields. The learning curve is steep relative to word processing programs, but arguably pays off in terms of expressiveness and customizability.

## Installation

To install TeX you'll need to install the appropriate version of [TeXLive]() for your system. Note that this can take a long time. It might be best to leave this for later, when you have a couple of hours to occasionally check in on the install process.

The last thing we'll need with regard to typesetting is the [Pandoc]() package, which allows for automatic conversion between different document formats. Go ahead and install the appropriate package for your system.

# Summary 

After following these instructions, you should have the following:

* SciPy stack (see above for details)
* R and RStudio
* TeXLive
* Pandoc
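
One quick way to confirm the list above is to check whether each tool's executable can be found on your `PATH`. This is a rough sketch under stated assumptions: the executable names in `TOOLS` (`R`, `rstudio`, `tex`, `pandoc`) are typical defaults and may differ on your system, and `check_tools` is a helper name invented for this example. Finding an executable shows only that it is installed somewhere on the path, not that it works correctly.

```python
try:
    from shutil import which  # available in Python 3.3 and later
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2 fallback


def check_tools(names):
    """Map each executable name to its full path, or None if it is not on the PATH."""
    return {name: which(name) for name in names}


# Assumed executable names; adjust them to match your system.
TOOLS = ["python", "R", "rstudio", "tex", "pandoc"]

for name, path in sorted(check_tools(TOOLS).items()):
    print("{}: {}".format(name, path if path else "not found"))
```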

Unfortunately, this doesn't mean things will have gone smoothly for you up to this point. In fact, it would be surprising if there were no issues. The best way to resolve issues is to consult the [oracle](http://www.lmgtfy.com).

In [None]:
# Replace with IT google
Image(url='http://imgs.xkcd.com/comics/python.png')