10/24/13 Guide to Installing Machine Learning Libraries in Python
Installing the machine learning toolkit for Python can get a bit problematic at times. This guide gives you step-by-step instructions for getting your environment ready for the data science-y stuff. Although these instructions should get you there, they are not all-encompassing. If you have any questions about any of the steps, please let Jason or me know and we will be happy to explain in more detail (or figure it out together!). Note that this tutorial was written by a Mac OS X 10.8 user.
Python 2.5 or above comes preinstalled on your Mac, but we strongly recommend using the versions of the tools and libraries described below for real-world work.
Before you install Python, you will need to install Xcode and the smaller Command Line Tools. Download Xcode from the App Store and install it (this will take time!). After installation, go to Preferences --> Downloads --> Command Line Tools and click 'Install' to install the command line tools.
Homebrew and Python:
Mac OS X is missing a decent package manager. The solution is Homebrew, which will take care of things for you. Install Homebrew with:
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
Install Python via Homebrew with a framework build:
brew install python --framework
This will take a minute or two. Once it is done, make sure that your default Python is the new Homebrew Python (together with the packages we will install), not the preinstalled version. An easy way to check this is to install a package, launch Python and try to import the new module. If it works, you are good! If you are installing to a location (like /usr/local) that requires higher permissions, you may need to run the pip commands with sudo.
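Another quick way to see which Python you are actually running is to ask the interpreter itself. The snippet below is only a sketch: the function name describe_interpreter is made up for this guide, and the expectation that a Homebrew Python lives somewhere under /usr/local is an assumption based on the default Homebrew location mentioned above.

```python
import sys


def describe_interpreter():
    """Collect the path, prefix and version of the running interpreter."""
    return {
        "executable": sys.executable,  # full path of the python binary in use
        "prefix": sys.prefix,          # root directory of this Python install
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
    }


info = describe_interpreter()
for key in sorted(info):
    print("%s: %s" % (key, info[key]))
```

If the paths printed here point into /usr/bin or /System rather than the Homebrew location, the preinstalled Python is probably still first on your PATH.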
Set PATH in .bashrc profile for homebrew Python:
Homebrew will also install the handy Python installation package 'pip' along with Python. Alternatively, you can install pip by:
Here is a trick. You can see which packages were installed with:
This is an important step because many dependencies later will be included in this package:
brew install gfortran
Virtual Environment (Optional but Recommended):
The virtual environment allows you to install packages such that they are isolated from the system-wide packages, essentially by redefining the system path in a clever way. In general, it is a good practice to create completely clean Python environments for each project.
[sudo] pip install virtualenv
Create and activate an environment called 'default' in an 'Env' directory:
mkdir ~/Env
cd ~/Env
virtualenv default
source default/bin/activate
You can deactivate the environment by simply entering 'deactivate' in the command line.
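If you are ever unsure whether an environment is active, you can check from inside Python. This is a minimal sketch, not part of virtualenv itself: the helper name in_virtual_environment is hypothetical, and it relies on the fact that older virtualenv releases set sys.real_prefix while newer tools leave sys.base_prefix different from sys.prefix.

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases record the original install in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv/venv keep the original install in sys.base_prefix,
    # so the two prefixes differ only when an environment is active.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("virtual environment active: %s" % in_virtual_environment())
print("packages will be installed under: %s" % sys.prefix)
```

Run it once before and once after 'source default/bin/activate' and the reported prefix should change to the environment directory.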
Install Python Packages:
Let's install everything else:
[sudo] pip install numpy
[sudo] pip install scipy
The matplotlib library requires some external packages to be installed first.
brew install freetype
We should be good to go with matplotlib now:
[sudo] pip install matplotlib
- GGplot is an excellent plotting system initially built for R. The good news is that it has become available on Python too!
We need to do a bit of work to have it installed. There are two dependencies missing:
https://github.com/yhat/ggplot/
a) patsy
sudo pip install patsy
b) statsmodels
sudo pip install statsmodels
c) Now you can install ggplot. Download the matplotlibrc via curl, unzip the folder and install ggplot via pip.
# matplotlibrc from Huy Nguyen (http://www.huyng.com/posts/sane-color-scheme-for-matplotlib/)
$ curl -O https://raw.github.com/yhat/ggplot/master/matplotlibrcs/matplotlibrc-osx.zip
$ unzip matplotlibrc-osx.zip -d ~/
# install ggplot using pip
$ pip install ggplot
Finally, the scikit-learn package.
[sudo] pip install scikit-learn
To make sure everything works, start python and import the packages:
import numpy
import scipy
(and so on for the other packages)
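Typing the imports one by one gets tedious, so here is a small sketch that tries them all and reports what it finds. The function name check_imports is made up for this guide. The list of module names is my reading of the packages installed above (note that scikit-learn is imported as sklearn), and not every package exposes a __version__ attribute, so treat the version column as best effort.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and return a name -> status mapping."""
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # broad on purpose: a broken install can fail in many ways
            results[name] = "MISSING (%s)" % exc
        else:
            results[name] = str(getattr(module, "__version__", "installed, version unknown"))
    return results


if __name__ == "__main__":
    # Import names of the packages installed in this guide.
    packages = ["numpy", "scipy", "matplotlib", "patsy", "statsmodels", "sklearn"]
    report = check_imports(packages)
    for name in packages:
        print("%-12s %s" % (name, report[name]))
```

Any line that starts with MISSING tells you which pip install step to revisit.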
Let's also install IPython. IPython is a great tool for interactive computing.
[sudo] pip install ipython
The pip command should take care of the IPython installation, but I remember running into dependency issues recently, particularly with the 'readline' module. You can dig deeper here:
Building a package from source is a great learning exercise and might be the only way if everything else fails. The best way to install a package this way is to follow the directions on the package website, which will typically tell you how to build from source. (By the way, "building from source" means downloading the source code itself and issuing the build/install commands from the shell, rather than downloading an installation file or disk image and running an application.)
To build from source, download the source code (e.g. to your ~/Downloads directory), cd into the newly downloaded directory and look for a file called "setup.py". This setup.py file should appear in any Python source package you pull.
Installation proceeds by typing "python setup.py install" from the directory containing setup.py (or "python setup.py build" followed by "python setup.py install" if you want to see it work in two steps). You can verify that you've succeeded by checking your Python library (via "pip freeze") and/or trying to import your newly installed module from the Python interpreter. Once this is done, you can safely remove the code you downloaded, since the installation process basically consists of copying it into another directory on your machine (called "site-packages", which is on the Python path).
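To see where that "site-packages" directory actually is, and whether a freshly built module landed somewhere Python can find it, something like the sketch below can help. The helper names site_packages_dir and module_location are hypothetical, and "numpy" is only an example module name; substitute whatever you just installed.

```python
import importlib
import sysconfig


def site_packages_dir():
    """Return the directory where pure-Python packages get installed."""
    return sysconfig.get_paths()["purelib"]


def module_location(name):
    """Return the file a module was loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


print("site-packages: %s" % site_packages_dir())
# "numpy" is just an example; use the name of the module you built.
print("numpy loaded from: %s" % module_location("numpy"))
```

If the second line prints None, the install did not end up on the path of the Python you are running.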
Any questions, please let us know!