
Guide to installing machine learning libraries in python

Deniz edited this page Nov 4, 2013 · 1 revision

10/24/13 Guide to Installing Machine Learning Libraries in Python

Installing the machine learning toolkit for Python can be problematic at times. This guide gives you step-by-step instructions with the goal of getting your environment ready for the data science-y stuff. Although these instructions should get you there, they are not all-encompassing. If you have questions about any of the steps, please let Jason or me know and we will be happy to explain in more detail (or figure it out together!). Note that this tutorial was written by a Mac OS X 10.8 user.

Preamble:

Python 2.5 or above comes preinstalled on your Mac. However, if you plan to use these packages for real work, we strongly recommend installing the versions of the tools and libraries described below rather than relying on the preinstalled Python.

Before you install Python, you will need to install Xcode and its smaller Command Line Tools. Download Xcode from the App Store and install it (this will take a while!). After installation, go to Preferences --> Downloads --> Command Line Tools and click 'Install' to install the commands.

Homebrew and Python:

Mac OS X lacks a decent built-in package manager. The solution is Homebrew, which will take care of things for you. Install Homebrew with:

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

Install Python via Homebrew with a framework build:

brew install python --framework 

This will take a minute or two. Once it is done, make sure that your default Python points to the new Python (and the packages we will install), not to the preinstalled version. An easy way to check is to install a package, launch Python, and try to import the new module. If it works, you are good! If you are installing to a location (like /usr/local) that requires higher permissions, you may need to run the pip commands with sudo.
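Another way to see which Python is actually running, and where a module was loaded from, is a short script like the sketch below. The helper names are hypothetical, not part of any library. As a rough guide, paths under /usr/local typically indicate the Homebrew Python, while /usr/bin or /System/Library paths point to the preinstalled one.

```python
import importlib
import sys


def describe_interpreter():
    """Return the running interpreter's executable path and install prefix."""
    return {"executable": sys.executable, "prefix": sys.prefix}


def module_location(name):
    """Import a module by name and return the file it was loaded from.

    Built-in modules (such as sys) have no __file__, so None is returned.
    """
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    info = describe_interpreter()
    print("Interpreter: %s" % info["executable"])
    print("Prefix:      %s" % info["prefix"])
    # json ships with Python; once installed, try module_location("numpy").
    print("json loaded from: %s" % module_location("json"))
```

Run it with the same `python` command you normally use; if the paths point at the preinstalled Python, revisit the PATH setting.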

Set PATH in your ~/.bashrc profile so that the Homebrew Python scripts come first:

export PATH=/usr/local/share/python:$PATH
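Directories earlier in PATH win, so the Homebrew directory needs to come before the system ones such as /usr/bin. To confirm the order after opening a new terminal, something like the sketch below can help; path_position is a hypothetical helper, and the two directory names are simply the ones used in this guide.

```python
import os


def path_position(directory, path=None):
    """Return the index of `directory` among the non-empty PATH entries, or -1.

    `path` defaults to the current PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    entries = [os.path.normpath(p) for p in path.split(os.pathsep) if p]
    target = os.path.normpath(directory)
    return entries.index(target) if target in entries else -1


if __name__ == "__main__":
    brew = path_position("/usr/local/share/python")
    system = path_position("/usr/bin")
    if brew == -1:
        print("/usr/local/share/python is not on PATH")
    elif system != -1 and brew > system:
        print("/usr/local/share/python comes after /usr/bin; fix the order")
    else:
        print("PATH order looks fine")
```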

Homebrew also installs pip, the handy Python package installer, along with Python. Alternatively, you can install pip with:

easy_install pip

Here is a trick: you can see which packages are installed with:

pip freeze
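If you want to work with that list from Python, for example to compare two machines, the output of pip freeze is easy to parse because each line normally has the form name==version. The sketch below is a hypothetical helper that assumes that format; lines without == (such as editable installs listed with -e) are kept with a version of None. The sample version numbers are illustrative only.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into a {package_name: version} dict.

    Assumes the usual 'name==version' format; lines without '=='
    (e.g. editable '-e ...' installs) are stored with version None.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, version = line.partition("==")
        packages[name] = version if sep else None
    return packages


if __name__ == "__main__":
    # Illustrative input; in practice you could capture the real output with
    # subprocess.check_output(["pip", "freeze"]).decode("utf-8")
    sample = "numpy==1.7.1\nscipy==0.13.0\n"
    print(parse_freeze(sample))
```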

Install gfortran:

This is an important step, because some of the packages installed later (notably scipy) need a Fortran compiler to build:

brew install gfortran
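Before building scipy, it can be worth confirming that the compiler actually ended up on your PATH. The sketch below (missing_tools is a hypothetical helper) uses shutil.which on Python 3.3+ and falls back to distutils.spawn.find_executable on the Python 2 releases this guide targets.

```python
try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2 fallback


def missing_tools(names):
    """Return the subset of command names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["brew", "gfortran"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("brew and gfortran are both on PATH.")
```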

Virtual Environment (Optional but Recommended):

A virtual environment lets you install packages in isolation from the system-wide packages, essentially by redefining Python's search path in a clever way. In general, it is good practice to create a completely clean Python environment for each project.

[sudo] pip install virtualenv

To create and activate an environment called 'default' in an 'Env' directory:

mkdir ~/Env
cd ~/Env
virtualenv default
source default/bin/activate

You can deactivate the environment by simply entering 'deactivate' in the command line.
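If you are ever unsure whether an environment is active, Python itself can tell you. The check below is a best-effort sketch (in_virtualenv is a hypothetical helper): classic virtualenv releases, like the ones current when this guide was written, set sys.real_prefix, while Python 3's built-in venv module and virtualenv 20+ instead make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtual environment."""
    # Classic virtualenv (before 20.0) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases set sys.base_prefix instead.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Inside a virtual environment: %s" % in_virtualenv())
```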

Install Python Packages:

Let's install everything else:

  1. Install numpy

    [sudo] pip install numpy

  2. Install scipy

    [sudo] pip install scipy

  3. The matplotlib library requires some external libraries (such as freetype) to be installed before matplotlib itself (see http://matplotlib.org/faq/installing_faq.html#how-to-install):

    brew install freetype

    We should then be good to go with matplotlib:

    [sudo] pip install matplotlib
  4. ggplot is an excellent plotting system originally built for R. The good news is that it is now available for Python too (https://github.com/yhat/ggplot/)!

    It takes a bit of work to install, because two of its dependencies have to be installed first:

    a) patsy

        [sudo] pip install patsy

    b) statsmodels

        [sudo] pip install statsmodels

    c) Now you can install ggplot. The commands below (from the ggplot page) download and unzip a matplotlibrc color-scheme file into your home directory, then install ggplot via pip:

        # matplotlibrc from Huy Nguyen (http://www.huyng.com/posts/sane-color-scheme-for-matplotlib/)
        $ curl -O https://raw.github.com/yhat/ggplot/master/matplotlibrcs/matplotlibrc-osx.zip
        $ unzip matplotlibrc-osx.zip -d ~/
        # install ggplot using pip
        $ pip install ggplot
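  5. Finally, install the scikit-learn package:

    [sudo] pip install scikit-learn

    To make sure everything works, start python and import the packages. If no error appears, the package is installed. Note that scikit-learn is imported as sklearn:

    >>> import numpy
    >>> import scipy
    >>> import matplotlib
    >>> import ggplot
    >>> import sklearn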
  6. Let's also install IPython, a great tool for interactive computing:

    [sudo] pip install ipython

    The pip command should take care of the IPython installation, but I recently ran into dependency issues, particularly with the 'readline' module. You can dig deeper here:

    http://ipython.org/ipython-doc/dev/install/install.html
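To check all of the packages from this guide in one go, a small script like the following can help. It is a sketch: check_imports is a hypothetical helper, and the list of import names is an assumption based on the packages above (the import name can differ from the pip name, e.g. scikit-learn imports as sklearn and IPython as IPython). Any exception is caught, not just ImportError, because a package that is installed but built incorrectly can fail in other ways.

```python
import importlib

# Import names for the packages in this guide (assumed; adjust as needed).
PACKAGES = ["numpy", "scipy", "matplotlib", "patsy", "statsmodels",
            "ggplot", "sklearn", "IPython"]


def check_imports(names):
    """Try to import each module and return {name: status string}.

    The status is the module's __version__ when it has one, "installed"
    otherwise, or "FAILED: ..." with the error if the import raised.
    """
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, but also broken builds
            results[name] = "FAILED: %s" % exc
        else:
            results[name] = str(getattr(module, "__version__", "installed"))
    return results


if __name__ == "__main__":
    for name, status in sorted(check_imports(PACKAGES).items()):
        print("%-12s %s" % (name, status))
```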

If everything else fails, building a package from source might be the only way, and it is a great learning exercise anyway. The best approach is to follow the directions on the package website, which will typically explain how to build from source. (By the way, "building from source" means downloading the source code itself and issuing the build/install commands from the shell, rather than downloading an installation file or disk image and running an application.)

To build from source, download the source code (e.g., to your ~/Downloads directory), cd into the newly downloaded directory, and look for a file called "setup.py". This setup.py file should appear in any Python source package you pull.

Installation proceeds by typing "python setup.py install" from the directory containing setup.py (or "python setup.py build" followed by "python setup.py install" if you want to see it work in two steps). You can verify that you've succeeded by listing your installed packages (via "pip freeze") and/or trying to import your newly installed module from the Python interpreter. Once this is done, you can safely remove the code you downloaded, since the installation process basically consists of copying it into another directory on your machine (called "site-packages", which is on the Python path).
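To see where that site-packages directory lives for the Python you are running, and whether it is on the import path, you can ask the standard sysconfig module. A minimal sketch (site_packages_dir is a hypothetical helper; note that some Linux distributions name the directory dist-packages instead):

```python
import sys
import sysconfig


def site_packages_dir():
    """Return the directory where pure-Python packages are installed."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    path = site_packages_dir()
    print("site-packages: " + path)
    print("on sys.path: %s" % (path in sys.path))
```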

If you have any questions, please let us know!