# Introduction to machine learning for M/EEG data

- Why use multivariate pattern analysis (MVPA)?
- How does it differ from traditional statistics?
- What is unique about M/EEG data?

## Linear regression example

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
plt.style.use("ggplot")


In [None]:
data = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0)
data.head()



In [None]:
# this is the standard import if you're using "formula notation" (similar to R)
import statsmodels.formula.api as smf

# create a fitted model in one line
lm = smf.ols(formula='Sales ~ TV', data=data).fit()

# print the coefficients
lm.params



In [None]:
# create a DataFrame with the minimum and maximum values of TV
X_new = pd.DataFrame({'TV': [data.TV.min(), data.TV.max()]})
X_new.head()

# make predictions for those x values and store them
preds = lm.predict(X_new)
preds


In [None]:
# first, plot the observed data
data.plot(kind='scatter', x='TV', y='Sales')

# then, plot the least squares line
plt.plot(X_new['TV'], preds, c='red', linewidth=2)



## Classification example

Classification predicts groups (categories) rather than numbers.
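
As a minimal sketch of what predicting groups can look like, the cell below fits a classifier on the Iris flower data that ships with scikit-learn. The choice of dataset, the logistic regression model and the 25% hold-out split are assumptions made for illustration; they are not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Iris: 150 flowers, 4 measurements each, 3 species coded as 0, 1 and 2
X, y = load_iris(return_X_y=True)

# hold out a quarter of the flowers to test on data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# fit the classifier on the training flowers only
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# the predictions are group labels, not continuous numbers
y_pred = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print("predicted groups:", y_pred[:10])
print("accuracy: %.2f" % accuracy)
```

Unlike the regression above, the output is a label for each flower, and the score is the fraction of held-out flowers that were labelled correctly.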

# Two types of machine learning

## Supervised learning

Used when you know the outcome for the observations you train on, e.g. a group or a price.

E.g. predicting sales (regression) or flower type (classification).

## Unsupervised learning

Used when you don't know the outcome.

E.g. PCA
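
A small sketch of PCA with scikit-learn, assuming simulated data in which five features are driven by two hidden sources (the data, the noise level and the number of components are all made up for illustration). Note that no outcome `y` is passed to the algorithm.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)

# simulated data: 200 observations of 5 features driven by 2 hidden sources
sources = rng.randn(200, 2)
mixing = rng.randn(2, 5)
X = sources @ mixing + 0.05 * rng.randn(200, 5)

# learn the 2 directions of largest variance; only X is given, there is no y
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)
```

Because the simulated features come from only two sources, two components should capture almost all of the variance.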

# Scikit-learn 

scikit-learn (sk) is a Python toolbox for machine learning. It has a large range of algorithms, preprocessing tools, scoring metrics etc.

Every sk algorithm uses the same structure:
    
+ **X** the data to learn from. It is a two-dimensional numpy array.
+ **y** the data to be predicted. It is a one-dimensional numpy array.
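
The shared structure can be sketched as follows, assuming simulated data and two arbitrarily chosen classifiers. Both are used through the same `fit`, `predict` and `score` calls. Scoring on the training data is done here only to show the interface; it is not a valid estimate of performance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)

# simulated data: 100 observations, 3 features, two groups coded as 0 and 1
X = rng.randn(100, 3)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(X.shape, y.shape)  # X is two-dimensional, y is one-dimensional

# two different algorithms, the same three calls
scores = {}
for model in (LogisticRegression(), KNeighborsClassifier(n_neighbors=5)):
    model.fit(X, y)            # learn from X and y
    y_pred = model.predict(X)  # predict y from X
    scores[type(model).__name__] = model.score(X, y)
print(scores)
```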
 
## X the feature matrix

Features can be anything relevant to make the prediction of *y*, e.g.:
    
- predicting the price of a house. Features could be: number of rooms, square meters, location etc.
- predicting condition of M/EEG data: the sensor data, source reconstruction data etc.

For scikit-learn the **X** is always an N x M matrix, where N is the number of observations (e.g. epochs, subjects) and M is the number of features (e.g. the samples of a time series).

The **y** is a one-dimensional array with one entry per observation. For classification it codes which group the observation belongs to; for regression it holds the number to be predicted (e.g. house price).

**Note:** for classification the groups are usually coded as integers, e.g. 0, 1 etc. (scikit-learn classifiers also accept string labels). For regression **y** holds the numbers to be predicted, which are typically floats.
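
For M/EEG data this usually means reshaping. The sketch below assumes simulated epochs stored as an array of shape (epochs, channels, time points), which is also the layout returned by `epochs.get_data()` in MNE-Python. The array sizes and the two conditions are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)

# simulated epochs: 40 epochs, 32 channels, 50 time points
n_epochs, n_channels, n_times = 40, 32, 50
epochs_data = rng.randn(n_epochs, n_channels, n_times)

# two conditions coded as 0 and 1, with 20 epochs each
y = np.repeat([0, 1], n_epochs // 2)

# flatten channels x time into one feature axis to get an N x M matrix
X = epochs_data.reshape(n_epochs, -1)

print(X.shape)  # (40, 1600)
print(y.shape)  # (40,)
```

Each row of **X** is one epoch and each column is one channel at one time point, so the first 50 columns hold the time series of the first channel.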




# Additional resources

- [scikit-learn introduction tutorial (video)](https://www.youtube.com/watch?v=L7R4HUQ-eQ0)
    - See [scikit-learn.org](https://scikit-learn.org) for documentation and examples


- The __bible__ of MVPA (math heavy!)
    - Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer.
    - Free at: statweb.stanford.edu/~tibs/ElemStatLearn/


- A very good but less math-heavy alternative (a simplified version of The Elements of Statistical Learning)
    - James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 103). New York, NY: Springer New York.
    - Free at: http://www-bcf.usc.edu/~gareth/ISL/
    - It uses R, and there is an online course by Hastie & Tibshirani that uses the book as its main textbook.


- Discussion of traditional statistics vs MVPA approaches
    - Breiman, L. (2001). Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199-231. doi:10.1214/ss/1009213726
- Guidelines for cross-validation
    - Varoquaux, G., Raamana, P. R., Engemann, D. A., Hoyos-Idrobo, A., Schwartz, Y., & Thirion, B. (2017). Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines. NeuroImage. doi:10.1016/j.neuroimage.2016.10.038
