# Prerequisites

## Python
You first need to set up a Python environment (if you have not done so already). The easiest way to do this is by installing [Anaconda](https://www.continuum.io/downloads). We will be using Python 3, so be sure to install the Python 3 version.

Note: if you are not using Anaconda, and you already have a custom Python environment set up, possibly using a different Python version, it may be wise to [set up a virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/) for this course so that it does not affect your existing environment. 

## Required packages
Next, you'll need to install several packages that we'll be using extensively. You'll need to run these commands on the command line.

### Installing packages with conda
If you are using Anaconda, you can use the ``conda`` package manager to install all packages:

    conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz

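and then *also* install the Python interface to Graphviz with pip (the conda `graphviz` package installs the Graphviz software itself, not the Python bindings):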

    pip install graphviz


### Installing packages with pip
With most other setups, you can use pip to install all packages. pip is the standard Python package installer (it downloads packages from the Python Package Index, PyPI) and is included in most Python installations.

    pip install numpy scipy scikit-learn matplotlib pandas pillow graphviz

Note: we'll be using scikit-learn 0.18, which was the latest version at the time of writing.

If you installed the packages with pip, you also need to install the Graphviz C library (the `conda install` command above already includes it):  
* OS X: use Homebrew: ``brew install graphviz``
* Ubuntu/Debian: use apt-get: ``sudo apt-get install graphviz``
* Windows: installing Graphviz can be tricky, so using conda/Anaconda is recommended.
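
To confirm that the installation worked, you could run a short check script such as the sketch below. It is not part of the course materials: the mapping from install names to import names is our assumption (for example, scikit-learn is imported as `sklearn` and pillow as `PIL`), and for the Graphviz C library it only checks whether the `dot` program is on your `PATH`, since the graphviz Python package renders graphs by calling that executable.

```python
import importlib
import shutil
import sys

# Assumed mapping from install name to import name; they differ for some packages.
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "pillow": "PIL",
    "graphviz": "graphviz",
}


def check_packages(required):
    """Map each package name to its version string, or None if it cannot be imported."""
    results = {}
    for package, module_name in required.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            results[package] = None
        else:
            # Not every module exposes __version__, so fall back to "unknown".
            results[package] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for package, version in check_packages(REQUIRED).items():
        print("{:<15} {}".format(package, version or "MISSING"))
    # The graphviz Python package calls the Graphviz `dot` executable to render graphs.
    dot = shutil.which("dot")
    print("{:<15} {}".format("dot (Graphviz)", dot or "NOT FOUND on PATH"))
```

Save it under any name (for example a hypothetical `check_env.py`) and run it with `python check_env.py`. Any line reporting MISSING or NOT FOUND points to a step above that needs to be repeated.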

### Installing OpenML
The OpenML package is not yet available on PyPI, so we will install it directly from GitHub with pip. This requires git. If you don't have git, it's [easy to install](https://git-scm.com/book/en/v1/Getting-Started-Installing-Git).

    pip install git+https://github.com/openml/openml-python.git@develop
    
You may have to install the latest development version of liac-arff manually for this to work:

    pip install git+https://github.com/renatopp/liac-arff@master
    
You'll also need an OpenML account to download and upload data. If you don't have one, [go ahead and create one](https://www.openml.org).
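
Once you have an account, openml-python needs your personal API key (available from your OpenML account) to talk to the server. The sketch below assumes the installation above succeeded: it first checks that `openml` and liac-arff (imported as `arff`) can be found, then sets the key for the current Python session through `openml.config.apikey`. `OPENML_API_KEY` and its value are placeholders you replace with your own key.

```python
import importlib.util

# Placeholder: paste your own API key from your OpenML account here.
OPENML_API_KEY = "YOUR_API_KEY"


def openml_ready():
    """Return True if both openml and liac-arff (import name `arff`) are installed."""
    return all(importlib.util.find_spec(name) is not None for name in ("openml", "arff"))


if __name__ == "__main__":
    if openml_ready():
        import openml

        # Set the key for this Python session only.
        openml.config.apikey = OPENML_API_KEY
        print("openml", getattr(openml, "__version__", "unknown"), "is installed")
    else:
        print("openml and/or liac-arff not found: rerun the pip commands above")
```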

### Installing Jupyter notebooks
As our coding environment, we'll be using Jupyter notebooks. They interleave documentation (in markdown) with executable Python code, and they run in your browser. That means that you can easily edit and re-run all the code in this course using only your browser, and you can make your own notes in your copy of the notebook as well.

If you use Anaconda, Jupyter is already installed. If you use pip, install it with the same `pip` you used for the packages above:

    pip install jupyter
    
To test if it works, run

    jupyter notebook
    
A browser window should open showing the files in your current directory. You can shut down the notebook server by pressing Ctrl-C in your terminal.
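
A common pitfall is that the notebook runs a different Python interpreter than the one you installed the packages into. As an optional sanity check (our suggestion, not a required step), create a new Python 3 notebook from the dashboard and run a first cell like this:

```python
import sys

# Which interpreter is this notebook's kernel using, and which version is it?
print(sys.executable)
print(sys.version)

# The course uses Python 3; fail loudly if the kernel runs something older.
assert sys.version_info[0] >= 3, "This notebook kernel is not running Python 3"
```

If the printed path does not belong to your Anaconda installation or virtual environment, start `jupyter notebook` from within that environment instead.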

If you are new to notebooks, [take this quick tutorial](https://try.jupyter.org/). For more in-depth coverage, you can optionally [try the DataCamp tutorial](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook#gs.wlHChdo).

## Course materials
You can download all the notebooks for this course [as a .zip file](https://github.com/openml/machine_learning_introduction). Click 'Clone or download' > 'Download ZIP'.

However, a better way is to use git to clone the repository. That way, you can easily download future updates.

    git clone https://github.com/openml/machine_learning_introduction.git
    
To download updates, run the following from inside the cloned `machine_learning_introduction` directory:

    git pull


## Testing
To test if everything works, run Jupyter (ideally from the directory where you downloaded the course):

    jupyter notebook
    
A browser window should open with all course materials. Open one of the chapters and check that you can execute all code by clicking Cell > Run All.

Alternatively, if you prefer to create your own scripts, you can [browse the notebooks on GitHub](https://github.com/openml/machine_learning_introduction) and copy code to your scripts, or you can run the notebooks and extract all the code using File > Download as > Python.
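
Notebooks are stored as JSON files, so their code can also be extracted without opening Jupyter. The sketch below assumes the current notebook format (nbformat 4, where cells live in a top-level `cells` list) and simply concatenates the code cells. Unlike File > Download as > Python, it leaves IPython-specific lines such as `%matplotlib inline` untouched, so you may need to remove those by hand. The script and file names in the usage line are hypothetical.

```python
import json
import sys


def extract_code(notebook_path):
    """Return the source of all code cells in a .ipynb file (nbformat 4) as one script."""
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores 'source' either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(extract_code(sys.argv[1]), end="")
    else:
        print("usage: python extract_code.py NOTEBOOK.ipynb > script.py")
```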

## Alternative: Everware
In case you run into any issues, you can also run all materials in the cloud. This is a special (private beta) service provided by [Everware](https://github.com/everware) with computing resources provided by the [Yandex School of Data Analytics](https://yandexdataschool.com/).

You'll need a GitHub account to authenticate. If you don't have one, [you can create one now](https://github.com). To spin up the service, click the button below (it may take a few minutes to boot):  
[![run at everware](https://img.shields.io/badge/run%20me-@everware-blue.svg?style=flat)](https://everware.rep.school.yandex.net/hub/oauth_login?repourl=https://github.com/openml/machine_learning_introduction)