Course on machine learning with scikit-learn

This repository contains files and other information associated with a three-hour course I am giving on scikit-learn.


These materials may change at any time as I work on the course.

Installation Notes

This tutorial will require recent installations of numpy, scipy, matplotlib, scikit-learn, pandas and Pillow (or PIL).

For users who do not yet have these packages installed, a relatively painless way to install all the requirements is to use a distribution such as Anaconda, which can be downloaded and installed for free.
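To confirm that the packages listed above are present before the course, a quick check along the following lines may help. This is only a sketch: the `REQUIRED` list and the `installed_versions` helper are illustrative names, not part of the course materials, and the distribution names are assumed to match those published on PyPI.

```python
from importlib import metadata

# Distribution names as published on PyPI (an assumption of this sketch).
REQUIRED = ["numpy", "scipy", "matplotlib", "scikit-learn", "pandas", "Pillow"]


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per requirement so missing packages stand out.
    for name, version in installed_versions().items():
        print(f"{name:<14}{version or 'MISSING'}")
```

Any package reported as MISSING needs to be installed before the notebooks will run.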

Please download the Olivetti and/or the LFW datasets in advance using::

from sklearn import datasets
datasets.fetch_olivetti_faces()
datasets.fetch_lfw_people()

Reading the training materials

Not all the material will be covered in the course: there is not enough time available. However, you can follow the material by yourself.

With the IPython notebook

The recommended way to access the materials is to execute them in the IPython notebook. If you have the IPython notebook installed, you should download the materials (see below), go to the notebooks directory, and launch IPython notebook from there by typing:

cd notebooks
ipython notebook

in your terminal window. This will open a notebook panel in your web browser.

On the Internet

If you don't have the IPython notebook installed, you can browse the files on the Internet:

Downloading the Tutorial Materials

I would highly recommend using git, not only for this tutorial, but for the general betterment of your life. Once git is installed, you can clone the material in this tutorial by using the git address shown above:

If you can't or don't want to install git, there is a link above to download the contents of this repository as a zip file. I may make minor changes to the repository in the days before the tutorial, however, so cloning the repository is a much better option.

Data Downloads

The data for this tutorial is not included in the repository. We will be using several data sets during the tutorial: most are built into scikit-learn, which includes code that automatically downloads and caches them. Because the wireless network at the course venue can often be spotty, it would be a good idea to download these data sets before arriving. You can do so by using the included in the tutorial materials.
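After downloading, it can be reassuring to check that the data really landed in the local cache. The sketch below lists the contents of the scikit-learn data directory; `cached_entries` is a hypothetical helper written for illustration, while `sklearn.datasets.get_data_home()` is the scikit-learn function that returns the cache location (by default a `scikit_learn_data` folder in your home directory).

```python
from pathlib import Path


def cached_entries(data_home=None):
    """Return the sorted names of files and folders in the data cache.

    If data_home is None, ask scikit-learn where its cache lives.
    """
    if data_home is None:
        # Imported lazily so the helper also works on an explicit path.
        from sklearn.datasets import get_data_home
        data_home = get_data_home()
    root = Path(data_home)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    entries = cached_entries()
    if entries:
        print("Cached data sets:")
        for name in entries:
            print("  ", name)
    else:
        print("The cache is empty; nothing has been downloaded yet.")
```

An empty listing means the fetch functions have not yet been run successfully on this machine.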

Original material from the SciPy 2013 tutorial

This material is adapted from the SciPy 2013 tutorial:

Original authors:


Materials for a course on scikit-learn at ENSAE



