Creative Commons CC BY 4.0 Lynd Bacon & Associates, Ltd.  Not warranted to be suitable for any particular purpose. (You're on your own!)

<center><h1> Machine Learning for Healthcare : Day 1 Intro</h1></center>

# Course  Objectives 

* Review fundamentals of machine learning (ML)
* Introduce widely used algorithms
* Run and modify example code to train ML algorithms
* Apply methods for validating algorithm performance to assess generalization


# Who Am I?

Lynd Bacon, PhD MBA  
Adjunct Associate Professor  
Dept. of Internal Medicine & David Eccles School of Business  
lynd.bacon@hsc.utah.edu  

* Background in academic and commercial research settings
* Training in social sciences, neuropsych, business
* Teaching data science for Northwestern for the last seven years



# Housekeeping Items

* Schedule: luncheon speakers, etc.
* Who doesn't have a GitHub user name?

# Main Topics By Day (tentative)

* Day 1, a.m. : Intro, ML definitions, software checkout, linear regression
* Day 1, p.m. : Regression, cross-validation, regularization, leakage
* Day 2, a.m. : Rescaling transformations, ridge regression
* Day 2, p.m. : Lasso, Elastic Net
* Day 3, a.m. : Classifiers, logistic regression, support vector classifiers
* Day 3, p.m. : Bayes classifier, tree models
* Day 4, a.m. : Bagging, Random Forest, Boosting, backpropagation, neural networks
* Day 4, p.m. : Multilayer perceptrons, convolutional neural networks, things left untouched


# Materials and Resources

* All slides, examples, and code are in Jupyter notebooks
* Execute (almost) all code on JupyterHub at https://decart.jupyter.med.utah.edu
    * Python packages you'll need should be installed in the container you run on JupyterHub
    * One notebook might be better run on [Google's Colaboratory](https://colab.research.google.com)
* Clone content from GitHub repo https://github.com/UUDeCART/DeCART_ML_2019

# What is Machine Learning?

* Machine learning = extracting patterns from data, extrapolating to new data
    * Usually for the purpose of _predicting_ data values, but also sometimes for _data reduction_
    * The "learning" refers to adjusting a model's parameters in order to optimize a function
* Frank Harrell (2018):       
_...an algorithmic approach that does not use traditional identified statistical parameters, and for which a preconceived structure is not imposed on the relationships between predictors and outcomes._
* Methods from various disciplines, e.g. computer science, statistics
* A subset of AI (perhaps). Not all AI methods learn from data

# What's _Not_ a Machine Learning Problem?

* When  _explanation_ is the primary objective
* For theory- or hypothesis-testing applications
* Causal inference
* When uncertainty in _predictions_ needs to be estimated
* Where the data generating mechanism can be, or needs to be, specified
* When the data aren't "big"
* When a simple model will do well enough

See Frank Harrell's 2018 blog post:  
[Road Map for Choosing between Statistical Modeling and Machine Learning](https://www.fharrell.com/post/stat-ml/)

# Terminology (\$10 ML Terms)

* Labels
* Features
* Bias/Variance trade-off
* Cross-validation
* Overfitting
* Regularization
* Parameter tuning
* Data leakage
* "No Free Lunch"
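
Several of these terms can be seen together in one small experiment. The sketch below is illustrative only: the data are synthetic (a noisy sine curve), and the polynomial degree, the `alpha` value, and the number of folds are arbitrary assumptions rather than course settings. It fits a flexible polynomial model with and without ridge regularization and compares the training $R^2$ to the 5-fold cross-validated $R^2$. A large gap between the two suggests overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for illustration): one feature, noisy sine labels.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))          # features
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 40)  # labels


def make_model(estimator, degree=12):
    """Polynomial expansion -> rescaling -> linear estimator."""
    return make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        estimator,
    )


cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, estimator in [("no regularization", LinearRegression()),
                        ("ridge (alpha=1.0)", Ridge(alpha=1.0))]:
    model = make_model(estimator)
    train_r2 = model.fit(X, y).score(X, y)              # fit and score on the same data
    cv_r2 = cross_val_score(model, X, y, cv=cv).mean()  # score on held-out folds
    results[name] = (train_r2, cv_r2)
    print(f"{name:18s} train R^2 = {train_r2:.2f}   CV R^2 = {cv_r2:.2f}")
```

Because the preprocessing steps sit inside the pipeline, they are re-fit on each training fold, which is one simple guard against data leakage.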

# Main Types of Machine Learning Problems

* Supervised  : variables ("features") are used to predict values ("labels") on another variable ("target").
* Unsupervised : algorithms applied to group, or _cluster_, observations. No target with labels.
* Semi-supervised : Labeled and unlabeled data are used together to facilitate prediction of targets.
* Reinforcement learning (RL) : software agent learns to achieve an objective by trial and error learning.
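
As a rough illustration of the first two problem types, the sketch below uses scikit-learn on synthetic data. The dataset (generated with `make_blobs`), the choice of `LogisticRegression` and `KMeans`, and all parameter values are assumptions made for illustration. The supervised model is given the labels; the clustering algorithm sees only the features.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 300 observations, 2 features, 2 well-separated groups.
X, y = make_blobs(n_samples=300, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

# Supervised: learn from features AND labels, then predict labels for held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"supervised test accuracy: {test_accuracy:.2f}")

# Unsupervised: group observations using the features only; y is never used.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(2)])
```

Note that the cluster numbers assigned by `KMeans` are arbitrary: cluster 0 need not correspond to label 0.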

# Python Packages, Libraries, and Platforms for ML

* [numpy](https://www.numpy.org/) : array manipulation, linear algebra, other numerical tools
* [Pandas](https://pandas.pydata.org/) : data manipulation and analysis package
* [matplotlib](https://matplotlib.org/) : 2D plotting library
* [seaborn](https://seaborn.pydata.org/) : data visualization library
* [scikit-learn](https://scikit-learn.org/stable/) : machine learning library
    * [scikit API Reference](https://scikit-learn.org/stable/modules/classes.html)
* [scikit-plot](https://scikit-plot.readthedocs.io/en/stable/index.html) : graphics methods complementing parts of scikit-learn.
* [tensorflow](https://www.tensorflow.org/) : machine learning platform
* [keras](https://keras.io/) : deep learning library
* [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) : cloud computing platform using Jupyter Notebooks

Others may "emerge" in the Jupyter Notebooks we'll be using.
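
One quick way to see which of these packages are installed in your environment is to look up their versions. This is a minimal sketch, not an official checkout script for the course. It uses the standard library's `importlib.metadata` and assumes the usual pip distribution names (for example `scikit-learn` rather than `sklearn`). It reports what is installed and does not confirm that each package imports cleanly.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution (pip) names for the packages listed above.
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn",
            "scikit-learn", "scikit-plot", "tensorflow", "keras"]


def installed_versions(names):
    """Return {name: installed version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:14s} {ver or 'not installed'}")
```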

# Data

Most of what we'll be torturing together:

* radon and lung cancer: data on 2881 US Counties, 11 variables ("features").
    * [radon R documentation](https://lomabuena.info/2K4oJCN).
* breast-w WI breast cancer data: 699 cases, 10 variables.
    * [WI breast-w data documentation](https://www.openml.org/d/15).
* cervical cancer risk factors: 858 cases, 36 variables.
    * [UCI cervical CA risk data](https://archive.ics.uci.edu/ml/datasets/Cervical+cancer+%28Risk+Factors%29)
* diabetes data : 768 cases, 9 variables.
    * [diabetes dataset on openML](https://www.openml.org/d/37)
* contraceptive choice: 1473 cases, 9 variables.
    * [UCI contraceptive choice data](https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice)
* Patient satisfaction data: 1811 inpatients, 11 variables.
    * internal dataset.
* [MNIST](http://yann.lecun.com/exdb/mnist/) digit images data.

# Loss Functions

* Most ML algorithms "learn" by optimizing some kind of goal or objective function
* Many ML algorithms learn using some sort of _gradient descent_ method to minimize _loss_ (i.e. prediction error)
    * Common loss measures used for supervised learning problems:
        * _Mean Squared Error_ for continuously valued labels
        * _Cross-Entropy Loss_ for discrete labels
    * The gradient ("Grad", $\nabla$): the set of partial derivatives of the loss function w.r.t. ("with respect to") the parameters being learned
* Sebastian Ruder on the variety of gradient descent methods:  
    * [An overview of gradient descent optimization algorithms (paper)](https://arxiv.org/abs/1609.04747)  
    * [An overview of gradient descent optimization algorithms (blog)](http://ruder.io/optimizing-gradient-descent/)
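
To make these ideas concrete, here is a minimal NumPy sketch of both loss measures and of plain (batch) gradient descent on mean squared error for a one-feature linear model. The data are synthetic, the cross-entropy shown is the binary (0/1 label) special case, and the learning rate and number of steps are arbitrary assumptions. Libraries such as scikit-learn and TensorFlow use more refined optimizers.

```python
import numpy as np


def mse(y, y_hat):
    """Mean squared error for continuously valued labels."""
    return np.mean((y - y_hat) ** 2)


def cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy for 0/1 labels y and predicted probabilities p."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


# Synthetic data (assumed for illustration): y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 200)

w, b = 0.0, 0.0      # parameters to be learned
learning_rate = 0.1  # step size (arbitrary choice)
for step in range(500):
    error = (w * x + b) - y
    # Gradient: partial derivatives of the MSE w.r.t. w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Step against the gradient to reduce the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(y, w * x + b)
ce_example = cross_entropy(np.array([1, 0]), np.array([0.9, 0.2]))
print(f"w = {w:.2f}, b = {b:.2f}, MSE = {final_loss:.4f}")
print(f"cross-entropy example: {ce_example:.3f}")
```

The learned `w` and `b` should land close to the values used to generate the data (2 and 1), with the remaining MSE reflecting the added noise.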

# Have No Fear!

The machine learning field is characterized by a fair amount of complexity, and it can be pretty "weedy."  And new developments are coming fast and furious.

Keep in mind:

* The field is so new that there aren't that many _true_ experts.
* There is actually little theory, and so work in machine learning tends to be rather empirical and "R&D-like."
* Everyone doing machine learning looks up things all the time.
* There's always a better model.  (At least probably.)
* A new algorithm is born every day.
* There's always another Python package that you wish you had used.  

Being somewhat fond of fiddling with things until they work is a help.

# Unresolved, Important Issues Remain, Including:

* What should be optimized?
* How can unintended consequences from use be avoided?
* How should uncertainty be represented?
* How can malicious applications be detected and prevented?

# Let's Get Started: Software Check Out Time
