# Machine Learning and Statistics for Physicists

Material for a [UC Irvine](https://uci.edu/) course offered by the [Department of Physics and Astronomy](https://www.physics.uci.edu/).

Content is maintained on [github](https://github.com/dkirkby/MachineLearningStatistics) and distributed under a [BSD3 license](https://opensource.org/licenses/BSD-3-Clause).

[Table of contents](Contents.ipynb)

## Prerequisites

- [Create a github account](https://github.com/join) if you don't already have one.
- [Install the git command-line tools](https://git-scm.com/downloads) on your computer, if necessary.
- Install the python 3.6 version of [anaconda](https://www.anaconda.com/download/) on your computer, if necessary.

This course assumes a basic familiarity with the core python language. If you are rusty or still learning, I recommend the free ebook [A Whirlwind Tour of Python](http://www.oreilly.com/programming/free/a-whirlwind-tour-of-python.csp), which is *"a fast-paced introduction to essential components of the Python language for researchers and developers who are already familiar with programming in another language"*.

If you are currently using python 2.x and reluctant to move to python 3, read [this](https://wiki.python.org/moin/Python2orPython3) and [this](http://www.python3statement.org/).

No previous experience with git or github is necessary for this course, but they are useful research tools and worth learning ([here](https://guides.github.com/introduction/git-handbook/) is a good starting point). If you find the git learning curve steep, you are [not alone](https://explainxkcd.com/wiki/index.php/1597:_Git).

## Create the course python environment

We will use the [conda command](https://conda.io/docs/commands.html) to create a standard [python environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) for this course. These instructions assume that you have already satisfied the prerequisites.

Create a new environment by entering (or pasting) the following command at a shell prompt. **Enter the command on one line**, even though it appears on two lines below:
```
conda create -n MLS python=3.6 pip ipython jupyter numpy scipy pandas
  matplotlib seaborn scikit-learn hdf5 pytables pillow
```
Activate the new environment (this should add "(MLS)" to your command prompt as a reminder of your current environment):
```
source activate MLS
```
Add some additional packages from other sources (details [here](https://github.com/ipython-contrib/jupyter_contrib_nbextensions#conda) and [here](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/issues/1153)):
```
conda install -c conda-forge libiconv jupyter_contrib_nbextensions
conda install -c astropy emcee astroml
pip install wpca autograd tensorflow edward
```
Enable a jupyter notebook [extension](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/master/src/jupyter_contrib_nbextensions/nbextensions/exercise2) we will use for in-class exercises:
```
jupyter nbextension enable exercise2/main
```
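As an optional check that is not part of the course instructions, the minimal sketch below reports which of the packages installed above cannot be located by the active interpreter. Run it with `python` inside the activated `MLS` environment. `find_missing` and `REQUIRED` are hypothetical names, and the list assumes the usual import names, several of which differ from the conda/pip package names (for example `scikit-learn` is imported as `sklearn`, `pytables` as `tables`, `pillow` as `PIL`).

```python
import importlib.util
import sys

# Assumed top-level import names for the packages installed above.
# hdf5 and libiconv are C libraries with no Python module, so they are skipped.
REQUIRED = [
    "IPython", "numpy", "scipy", "pandas", "matplotlib", "seaborn",
    "sklearn", "tables", "PIL",
    "jupyter_contrib_nbextensions", "emcee", "astroML",
    "wpca", "autograd", "tensorflow", "edward",
]


def find_missing(names):
    """Return the module names (in order) that the interpreter cannot locate."""
    # find_spec only searches for a top-level module; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All packages found.")
```

An empty "Missing" report does not guarantee that every package imports cleanly, only that each one is installed where this interpreter can find it.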

In case something goes wrong with your installation and you want to start again, use:
```
conda remove --name MLS --all
```
You will need to shut down any jupyter sessions using the old environment first.

## Install course material

Clone the course material from github with the following command, which will create a subdirectory called `MachineLearningStatistics`:
```
git clone https://github.com/dkirkby/MachineLearningStatistics.git
```
This may ask you for your github username and password (you can streamline future [github access using ssh](https://help.github.com/articles/which-remote-url-should-i-use/)).

Activate the course environment, if necessary (check your command prompt, but it doesn't do any harm to reactivate the current environment):
```
source activate MLS
```
Install the course code and data using:
```
cd MachineLearningStatistics
pip install .
```

## Launch notebook server

To launch the [notebook server](http://jupyter-notebook.readthedocs.io/en/stable/notebook.html) at any time, you can now use:
```
[[MachineLearningStatistics]]
source activate MLS
cd notebooks
jupyter notebook
```
Note that `[[MachineLearningStatistics]]` is a reminder that you must be in your `MachineLearningStatistics` directory before typing the following commands.  If you are unsure about this, refer to the [pwd](http://www.linfo.org/pwd.html) and [cd](http://www.linfo.org/cd.html) commands.

**Windows users:** Wherever you see `source activate MLS`, use `activate MLS` instead.  Details [here](https://conda.io/docs/user-guide/tasks/manage-environments.html#activating-an-environment).

If this is your first time, click on `Contents.ipynb` to check that you can view a notebook.

*(For git experts: you will normally be working on the master branch to simplify the workflow. This means that your local work must be discarded or saved to another branch each time you update, using the instructions below).*

## Update course material

You can skip this section if you are installing for the first time, but remember these instructions for later.

The first step is to "factory reset" your installation before getting the updates. The simplest method is to throw away any changes you have made using:
```
[[MachineLearningStatistics]]
git checkout master
git reset --hard
```
Alternatively, you can keep a permanent record of your changes in a [git branch](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) with a name of your choice, for example "08-Jan-2018":
```
[[MachineLearningStatistics]]
git checkout -b "08-Jan-2018"
git commit -a -m "Save work in progress"
git checkout master
```

The second step is to download the changes from github:
```
[[MachineLearningStatistics]]
git pull
```
If this command reports `Already up-to-date.` (newer versions of git print `Already up to date.`), then there are no updates to download.

The final step is to update your local python environment:
```
[[MachineLearningStatistics]]
source activate MLS
pip install . --upgrade
```
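The update steps above can also be collected into a small script. This is a hypothetical helper, not part of the course code: it assumes you run it from inside your `MachineLearningStatistics` directory with the `MLS` environment already activated (a script cannot activate a conda environment for you). By default it only prints the commands; pass `dry_run=False` to execute them.

```python
import subprocess
import sys


def update_commands(save_branch=None):
    """Return the git/pip commands for updating the course material.

    With save_branch, local changes are committed to that new branch before
    returning to master; otherwise they are discarded with a hard reset.
    """
    if save_branch:
        cmds = [
            ["git", "checkout", "-b", save_branch],
            ["git", "commit", "-a", "-m", "Save work in progress"],
            ["git", "checkout", "master"],
        ]
    else:
        cmds = [
            ["git", "checkout", "master"],
            ["git", "reset", "--hard"],
        ]
    cmds.append(["git", "pull"])
    # Use the interpreter running this script, so pip installs into the
    # currently active (MLS) environment.
    cmds.append([sys.executable, "-m", "pip", "install", ".", "--upgrade"])
    return cmds


def run_update(save_branch=None, dry_run=True):
    """Print each command and, unless dry_run, execute it, stopping on error."""
    for cmd in update_commands(save_branch):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_update()  # dry run: only prints the commands
```

For example, `run_update("08-Jan-2018", dry_run=False)` saves your work to a branch and then updates. Note that `git commit -a` exits with an error when there is nothing to commit, which stops the script at that step because of `check=True`.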