
# <center>Primer on Machine Learning for Physiological Signals</center> 
  ###     <center>Jacek Dmochowski</center>
  ###     <center>Department of Biomedical Engineering</center>
  ###     <center>City College of New York</center>

## AI versus Machine Learning
* Artificial Intelligence: teaching computers how to think and automate tasks performed by humans
* Machine Learning: training computers to learn the rules for automating a task from data, rather than programming those rules explicitly
* Deep Learning: a subset of ML that learns many successive layers of data representation
<img src="./images/chollet.png" width="1000">

## Supervised Learning
* Given $x$, what is $y$?
* $x$ is known as the _feature_ vector (or matrix)
* $y$ is known as the _target_ or _label_



## Training a model
* Goal is to estimate $f$ in: 
<center> $y = f(x) + \mathrm{error}$  </center>
given some training data:
<center> $\{(x_1,y_1), (x_2,y_2), \ldots, (x_N,y_N)\}$ </center>
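A minimal sketch of this setup, assuming a linear model $f(x) = w^\top x + b$ and synthetic training pairs (the weights `w_true` and the noise level are made up for illustration): the pairs are stacked into a feature matrix and a label vector, and $f$ is estimated by least squares with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data (assumption): N examples, D features each
N, D = 200, 3
X = rng.normal(size=(N, D))                # feature matrix, one row per x_i
w_true = np.array([1.5, -2.0, 0.5])        # hypothetical "true" weights
y = X @ w_true + 0.1 * rng.normal(size=N)  # y = f(x) + error

# Append a column of ones so the model can also learn an intercept b
X1 = np.column_stack([X, np.ones(N)])

# Least-squares estimate of [w, b]
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
w_hat, b_hat = coef[:D], coef[D]
```

With 200 examples and little noise, `w_hat` should land close to `w_true` and `b_hat` close to zero.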

## Training minimizes a _loss_ function
* $f$ is chosen to minimize a loss on the training data:
* Mean squared error loss:   $L = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2$
  * Regression problem:   $y$ is continuous
* Binary cross-entropy loss: $L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1-y_i) \log(1-p_i) \right]$
  * Binary classification problem: $y \in \{0,1\}$
  * Here $p_i = f(x_i)$ denotes the model's estimate of the probability that feature $x_i$ comes from class $y=1$
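Both losses can be written as short NumPy functions. This sketch averages over the $N$ training examples and includes the leading minus sign that makes cross-entropy non-negative; the clipping constant `eps` is an added assumption that guards against $\log(0)$.

```python
import numpy as np

def mse_loss(y, y_pred):
    """Mean squared error over N examples (regression)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return np.mean((y - y_pred) ** 2)

def bce_loss(y, p, eps=1e-12):
    """Binary cross-entropy; y in {0, 1}, p = estimated P(y = 1 | x)."""
    y = np.asarray(y, float)
    p = np.clip(np.asarray(p, float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

For example, `mse_loss([1, 2, 3], [1, 2, 5])` is $4/3$, and a confident correct prediction (`bce_loss([1], [0.99])`) costs less than a hesitant one (`bce_loss([1], [0.6])`).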

## Testing a model
* Once $f$ has been fit, we can use it to make predictions on _new_ data
* $ \hat{y} = f (x_{\mathrm{new}} )$
  * $\hat{y}$ is the model's estimate of the label (e.g., malignant or benign) for feature $x_{\mathrm{new}}$
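For a binary classifier, a prediction is typically made by thresholding the estimated probability. A minimal sketch, assuming a logistic model with hypothetical, already-fitted weights `w` and `b` and a threshold of 0.5:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights of an already-trained logistic model (assumption)
w = np.array([2.0, -1.0])
b = -0.5

def predict(x_new, threshold=0.5):
    """Return the estimated P(y = 1 | x_new) and the predicted label y_hat."""
    p = sigmoid(np.asarray(x_new, float) @ w + b)
    y_hat = (p >= threshold).astype(int)
    return p, y_hat

p, y_hat = predict([[1.0, 0.0], [0.0, 1.0]])
```

Here the first new example gets $p \approx 0.82$ and is labeled 1, and the second gets $p \approx 0.18$ and is labeled 0. The threshold itself is a design choice and may be moved when one type of error is costlier than the other.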

## Selecting a model family
* The nature of $f$ defines the learning model
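* e.g., logistic regression: the log-odds are linear in the features $x$, i.e., $p = \sigma(w^\top x + b)$ with $\sigma(z) = 1/(1+e^{-z})$
* e.g., deep neural nets: $f$ is formed as a composition of many subfunctions with nonlinearities, $f(x) = f_L(\cdots f_2(f_1(x)))$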
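The contrast between the two families can be sketched in NumPy. The layer sizes, ReLU nonlinearity, and random weights below are placeholders (assumptions), not trained models; the point is the structure of $f$: one affine map plus a sigmoid versus a composition of several nonlinear layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

# Logistic regression: a linear function of x passed through a sigmoid
def logistic_f(x, w, b):
    return sigmoid(x @ w + b)

# A small deep network: f(x) = f3(f2(f1(x))), each layer an affine map
# followed by a nonlinearity. Weights are random placeholders (assumption).
D, H = 4, 8
W1, b1 = rng.normal(size=(D, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, H)), np.zeros(H)
w3, b3 = rng.normal(size=H), 0.0

def deep_f(x):
    h1 = relu(x @ W1 + b1)         # f1
    h2 = relu(h1 @ W2 + b2)        # f2
    return sigmoid(h2 @ w3 + b3)   # f3: output probability

x = rng.normal(size=(5, D))
p_lin = logistic_f(x, rng.normal(size=D), 0.0)
p_deep = deep_f(x)
```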

## Overfitting
* When the model has learned the structure of the _noise_ rather than the underlying signal: training error is low, but performance on new data is poor
![overfitting](./images/overfitting.png)
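Overfitting can be reproduced on synthetic data. In this sketch (the sine-wave ground truth, noise level, sample sizes, and polynomial degrees are all assumptions), a degree-12 polynomial fit to 15 points drives the training error down by chasing the noise, while its error on fresh test data is typically much higher than its training error.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def make_data(n):
    # Hypothetical ground truth (assumption): a sine wave plus Gaussian noise
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

results = {}
for degree in (3, 12):
    f = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((y_train - f(x_train)) ** 2)
    test_mse = np.mean((y_test - f(x_test)) ** 2)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Comparing the two printed test errors shows which model generalizes better; the training error alone cannot reveal this.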


## Cross-validation
* A widely used technique to prevent and diagnose overfitting
* Also used to evaluate performance before going "live"
![cv](./images/cv.png)
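The $k$-fold scheme can be written in a few lines of NumPy. This sketch assumes a generic fit/predict pair (`fit_ls` and `predict_ls` are hypothetical helper names) and scores each held-out fold by mean squared error; the usage at the bottom plugs in an ordinary least-squares model on synthetic data.

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cross_validate(X, y, fit, predict, k=5):
    """Return the held-out MSE for each of the k folds."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])  # train on k-1 folds
        y_hat = predict(model, X[test_idx])      # evaluate on the held-out fold
        scores.append(np.mean((y[test_idx] - y_hat) ** 2))
    return np.array(scores)

# Usage with a least-squares linear model (illustrative choice)
def fit_ls(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict_ls(w, X):
    return X @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + 0.5 * rng.normal(size=100)
scores = cross_validate(X, y, fit_ls, predict_ls, k=5)
print(scores.mean())
```

Because every example is held out exactly once, the mean of `scores` estimates performance on unseen data; here it should sit near the noise variance of 0.25.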


## Regularization
* Adding prior information to help the model learn genuine structure
* Most common type enforces either _smoothness_ or _sparsity_ of model coefficients
* $\mathrm{Loss} := \mathrm{Loss} + \lambda \cdot \mathrm{Penalty}$, where $\lambda \geq 0$ sets the strength of the regularization
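As a minimal sketch, assuming a linear model with squared-error data loss and a hypothetical weight `lam` on the penalty, the "Loss + Penalty" objective can be written directly, with the penalty chosen as either the sum of squared weights or the sum of absolute weights:

```python
import numpy as np

def penalized_loss(w, X, y, lam, penalty="l2"):
    """Loss := Loss + lam * Penalty, with a squared-error data loss."""
    data_loss = np.mean((y - X @ w) ** 2)
    if penalty == "l2":
        reg = np.sum(w ** 2)        # favors small weights
    elif penalty == "l1":
        reg = np.sum(np.abs(w))     # favors sparse weights
    else:
        raise ValueError(f"unknown penalty: {penalty}")
    return data_loss + lam * reg
```

For instance, with `X = np.eye(2)`, `y = [1, 1]`, and `w = [2, 0]`, the data loss is 1; adding `lam=0.5` times the L2 penalty gives 3.0, while the L1 penalty gives 2.0.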

## L1 and L2 Regularization
* L2 penalties ($\sum_j w_j^2$) shrink the magnitude of the model weights
* L1 penalties ($\sum_j |w_j|$) drive many coefficients to exactly zero, limiting the number of non-zero coefficients (sparsity)
<img src="./images/l12.png" width="750">
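The sparsity difference can be checked numerically. The sketch below uses synthetic data with only 3 informative features out of 20 and an arbitrarily chosen penalty weight `lam=5.0` (assumptions). It fits the L1-penalized model with proximal gradient descent (ISTA), a standard solver swapped in for illustration, and the L2-penalized model in closed form, then counts non-zero coefficients.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize 0.5*||y - Xw||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y))                        # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return w

def ridge_fit(X, y, lam):
    """Minimize 0.5*||y - Xw||^2 + 0.5*lam*||w||^2 (closed form)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]  # only 3 informative features
y = X @ w_true + 0.1 * rng.normal(size=50)

w_l1 = lasso_ista(X, y, lam=5.0)
w_l2 = ridge_fit(X, y, lam=5.0)
print("nonzero (L1):", np.count_nonzero(w_l1))
print("nonzero (L2):", np.count_nonzero(w_l2))
```

The L2 solution shrinks every weight but leaves none exactly at zero, whereas the soft-thresholding step in ISTA typically sets most of the uninformative weights to exactly zero.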

## Unsupervised Learning
* No labels, just features
* From $x$, discover the _structure_ in the data
* Examples include clustering, component analysis (e.g., PCA), and feature selection
<img src="./images/unsupervised.png" width="750">
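Component analysis is a compact example of unsupervised learning: no labels are used. A minimal PCA sketch via the singular value decomposition, applied to synthetic data (an assumption) in which three recorded features are driven by a single hidden source:

```python
import numpy as np

def pca(X, n_components):
    """Principal component analysis via the SVD of the centered data."""
    Xc = X - X.mean(axis=0)                     # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # principal directions (rows)
    scores = Xc @ components.T                  # data projected onto them
    explained_var = S[:n_components] ** 2 / (len(X) - 1)
    return components, scores, explained_var

# Synthetic, unlabeled data (assumption): 3 correlated features, 1 latent source
rng = np.random.default_rng(0)
s = rng.normal(size=500)
X = np.column_stack([s, 2 * s, -s]) + 0.1 * rng.normal(size=(500, 3))
components, scores, explained_var = pca(X, n_components=2)
```

The first component should align with the direction $[1, 2, -1]$ (up to sign and scaling), and its explained variance should dominate the second, revealing the single underlying source without any labels.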



## Physiological signals
* Working with physiological data imposes constraints on statistical learning
* Data is noisy 
* Data is expensive to collect
* Data is expensive to label (malignant or benign?)
<img src="./images/EEG_signal.jpeg" width="1000">



## Physiological data is noisy
* Preprocessing the data is an art form
* Details matter: ordering of the steps, parameters of the steps
* Filtering and artifact rejection are critical
* In the future, deep learning may allow for automated preprocessing (?)
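To make the point about ordering concrete, here is a NumPy sketch of one possible two-step pipeline: high-pass filter first, then reject epochs whose peak-to-peak amplitude exceeds a threshold. The sampling rate, the 1 Hz cutoff, the 100 µV threshold, the synthetic epochs, and the crude FFT "brick-wall" filter are all illustrative assumptions; real pipelines typically use properly designed filters.

```python
import numpy as np

def highpass_fft(x, fs, cutoff):
    """Crude high-pass filter: zero out FFT bins below `cutoff` Hz (illustrative only)."""
    X = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    X[..., freqs < cutoff] = 0.0
    return np.fft.irfft(X, n=x.shape[-1], axis=-1)

def reject_epochs(epochs, threshold):
    """Keep epochs whose peak-to-peak amplitude is below `threshold`."""
    ptp = epochs.max(axis=-1) - epochs.min(axis=-1)
    keep = ptp < threshold
    return epochs[keep], keep

fs = 250.0                          # hypothetical sampling rate (Hz)
rng = np.random.default_rng(0)
t = np.arange(int(2 * fs)) / fs     # 2-second epochs
# 20 synthetic single-channel epochs (µV): 10 Hz rhythm + slow 0.5 Hz drift + noise
epochs = (20 * np.sin(2 * np.pi * 10 * t)
          + 30 * np.sin(2 * np.pi * 0.5 * t)
          + 5 * rng.normal(size=(20, t.size)))
epochs[3] += 300 * (np.abs(t - 1.0) < 0.05)  # inject a large artifact into epoch 3

filtered = highpass_fft(epochs, fs, cutoff=1.0)
clean, keep = reject_epochs(filtered, threshold=100.0)
```

Because the slow drift is removed before thresholding, only the epoch with the injected artifact is rejected; running `reject_epochs` on the unfiltered `epochs` with the same threshold would likely discard clean epochs as well.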

## Physiological data is expensive to collect
* Complex models necessitate large training sets
* The number of examples in many biomedical applications is limited
* Shallow learning architectures may generalize better when training examples are scarce

## Physiological data is expensive to label
* Obtaining many MRI scans may be feasible, but labeling them with ground truth may not be
* Unsupervised learning is well-suited to discovering the data's structure

## Model interpretability is important in biomedical applications
* We would like to know what information is being used to make decisions
* Linear models are easier to interpret: each weight reflects how strongly a feature contributes to the output
* In deep learning, the representation is highly distributed across many units, so decisions are hard to attribute to individual features
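For a linear model, the learned weights can be inspected directly. A minimal sketch on synthetic data (an assumption), with hypothetical feature names: features are standardized so that weight magnitudes are comparable, and the model is fit by least squares. Only the first and third features actually influence `y` here, and the ranking should recover them.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature names for illustration only
feature_names = ["alpha_power", "beta_power", "heart_rate", "noise_feature"]
X = rng.normal(size=(300, 4))
y = 1.0 * X[:, 0] - 0.5 * X[:, 2] + 0.3 * rng.normal(size=300)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features
w, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)

# Rank features by the magnitude of their weights
for i in np.argsort(-np.abs(w)):
    print(f"{feature_names[i]:>14s}: {w[i]:+.2f}")
```

Reading weights this way is most reliable when features are not strongly correlated; with correlated features, credit can be split between them in ways that are harder to interpret.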