# Machine Learning from Scratch (Applied)
### *Collin Prather*

This is the command I need:

``jupyter nbconvert BDI_2018_slides.ipynb --to slides --post serve  --SlidesExporter.reveal_theme=serif --SlidesExporter.reveal_scroll=True --SlidesExporter.reveal_transition=none``

make sure to ``cd`` first

* have everyone launch binder first!

## Presentation Outline

#### (try to find a way to have these as a header of some sort on each slide)

1. ML Overview
    * considerations/complexities in building ML models
    * Steps in ML Process
2. Building SVM from scratch (Done in JupyterHub)
3. Explore Python ML library and apply to real-world data set (Done in JupyterHub)

## What is Machine Learning?
![ML Coursera](Figures/mlcoursera1.png)

Maybe scale that photo down using HTML.
Even according to the experts, the exact definition of the field of machine learning is a bit fuzzy, but as early as 1959 Arthur Samuel offered this one:

*Arthur Samuel*:
> Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.


## What is Machine Learning?

data + algorithms = predicting the future

ML techniques can be applied to a wide range of problems in diverse industries. In fact, ML has become ubiquitous in our everyday lives:
* Siri / Amazon Alexa
* Recommendation systems (Amazon, Netflix)
* Fraud Detection
* Disease diagnosis
* Supply Chain Optimization

## According to Google...

<figure>
  <img src="Figures/google_trends.png" alt="my alt text"/>
    <figcaption>Data obtained from <i>trends.google.com.</i></figcaption>
</figure>

## What has Caused this Spike?

The math that powers machine learning algorithms has been around for quite a few years... so what's changed?

1. Data Availability
2. Computational Scale (NG MLY 01 pg 10)


The rise of the big data era has given us access to astounding amounts of data. That phenomenon, paired with the exponential growth we've experienced in computational power, has created the perfect storm for the emergence of the field of machine learning.

## Steps in the ML Process
* [this](https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e) is a good place to start with identifying the steps in the ML process
* also use Ng's ML Yearning

### Step 0: Identify the Problem

* Let's say the city of GR is having a lot of difficulty arresting and convicting hit-and-run drivers. There could be many reasons: it's difficult to prove who was driving the car (even if they can track down the license plate number and the owner of the car), and by the time they get around to investigating, they've lost what would have been relevant security footage from local buildings, etc.
    * However, they do some research and find that the sooner they begin an investigation, the more likely they are to find and convict the perpetrator
    * This can be difficult given the large volume of car crashes reported each day and the limited resources of the investigative team
    * In other words, humans can't parse through all the data quickly enough to prioritize which crashes to investigate

* It turns out that problems like these can often be solved using machine learning!

## Steps in the ML Process
### Step 1: Get the Data

* This might look like...
    * SQL query
    * CSV download
    * web-scraping
    * collecting data yourself
    
* In our case, we head over to [GR Data](http://grdata-grandrapids.opendata.arcgis.com) and download a .csv of data on all car crashes within the city of GR in the past 10 years.
    * looks like the data is available to potentially solve this problem

## Steps in the ML Process
### Step 2: Data Preparation/Data Exploration

* These are inherently different steps (requiring different tools/skillsets) but are so closely related that it's hard to separate the two
* feature engineering
    * define what it is
    * give some motivation for its importance (the data is the most crucial part of ML: garbage in, garbage out)
    * Give some examples and difficulties
        * encoding categorical variables
        * scaling 
        * how to deal with missing data (go over how I handled missing ages)
        * examples of creating new features
            * sometimes you can use your data to make new features that summarize existing ones
            * sometimes you can use your data to make new features that better model reality
                * example with projecting the 1-d hours feature in 2-d space
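
The feature-engineering ideas above can be sketched on a few made-up records. Everything here is an assumption for illustration: the column values are invented, median imputation is just one simple way to handle missing ages, and the sine/cosine mapping is one common reading of "projecting the 1-d hours feature into 2-d space".

```python
import numpy as np

# Toy, made-up records: driver age (with gaps), weather category, hour of day.
age = np.array([34.0, np.nan, 19.0, 52.0, np.nan, 41.0])
weather = np.array(["clear", "rain", "snow", "clear", "rain", "clear"])
hour = np.array([0, 6, 12, 18, 23, 1])

# Missing data: fill gaps with the median of the observed ages (one simple option).
age_filled = np.where(np.isnan(age), np.nanmedian(age), age)

# Scaling: standardize to zero mean and unit variance.
age_scaled = (age_filled - age_filled.mean()) / age_filled.std()

# Encoding categorical variables: one indicator column per category.
categories = np.unique(weather)  # sorted: clear, rain, snow
weather_onehot = (weather[:, None] == categories).astype(int)

# New feature: place each hour on a circle so 23:00 and 00:00 end up close together.
angle = 2 * np.pi * hour / 24
hour_xy = np.column_stack([np.cos(angle), np.sin(angle)])

print(age_filled)
print(weather_onehot)
print(hour_xy.round(2))
```

With the raw hour column, 23 and 0 look as far apart as possible; on the circle they are neighbors, which better models how time of day actually works.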

## Steps in the ML Process
### Step 2b: Define a metric

* Talk through the importance of defining a metric
* talk through why classification accuracy alone is not sufficient: if we were to classify every crash as "no alcohol involved", we'd have a model that predicts with ~95% accuracy... but it'd be useless, because it would never flag a single alcohol-involved crash
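
A quick sketch of the accuracy trap, using made-up labels (1,000 crashes, 5% involving alcohol, chosen only to match the ~95% figure above). Recall is shown as one possible alternative metric, not necessarily the one to settle on.

```python
import numpy as np

# Made-up labels: 1 = alcohol involved, 0 = no alcohol involved (5% positives).
y_true = np.array([1] * 50 + [0] * 950)

# Baseline "model" that always predicts "no alcohol involved".
y_pred = np.zeros_like(y_true)

# Accuracy: share of crashes labeled correctly.
accuracy = (y_pred == y_true).mean()

# Recall: share of alcohol-involved crashes the model actually catches.
true_positives = np.sum((y_pred == 1) & (y_true == 1))
recall = true_positives / np.sum(y_true == 1)

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
# → accuracy = 0.95, recall = 0.00
```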

## Steps in the ML Process
### Step 3a: Model Selection

* usually takes some trial and error
* talk through the process of selecting a model

![Machine Learning Algorithms](Figures/ML_algos_table.png)

* talk through the bias-variance tradeoff using the graphs below (note that all these graphs are for the regression case)
    * Essentially, we want a model that learns a rule that can be generalized to new observations (both for classification and regression)
    * The variance of our model refers to how much our "best-fit" line might change if we trained it on a different sample from our dataset. Show example with sin graph?
    * The bias of our model refers to how strong our assumptions are about what the best-fit line should look like. For example, one might say that our linear-best-fit line is biased towards linearity. Since we've constrained this best-fit line to be linear, no matter how unequivocally non-linear the relationship might be, the line will be linear, due to its bias.
    * This is the essence of ML -- to learn general patterns/relationships from a sample of data, and generalize it to be applied to new, previously unseen datapoints.
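
To put rough numbers on bias and variance, here is a small simulation sketch. The cosine "truth", the noise level, and the polynomial degrees are all assumptions chosen for illustration, and the inputs are held fixed so that only the noise changes from one sample to the next.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 6, 30)  # fixed inputs; only the noise differs between samples
truth = np.cos(x)          # assumed "true" relationship, for illustration only

results = {}
for degree in (1, 6):      # a rigid (linear) model vs. a more flexible one
    preds = []
    for _ in range(200):
        y = truth + rng.normal(scale=0.3, size=x.size)  # a fresh noisy sample
        coeffs = np.polyfit(x, y, deg=degree)
        preds.append(np.polyval(coeffs, x))
    preds = np.array(preds)
    results[degree] = {
        # Bias^2: how far the average fit sits from the truth.
        "bias_sq": np.mean((preds.mean(axis=0) - truth) ** 2),
        # Variance: how much the fit moves around from sample to sample.
        "variance": np.mean(preds.var(axis=0)),
    }
    print(degree, results[degree])
```

The linear fit barely moves between samples but is consistently wrong about the curve; the flexible fit tracks the curve on average but wobbles more from sample to sample.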

```python
# Notebook setup: render plots inline (IPython magic, only valid inside Jupyter).
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
```

Say that we have a dataset like this:

| Original Data| 
|:---:|
|![Image](Figures/data_to_fit.png)|

* We could fit a model like so:


| Original Data| Non-Linear Fit |
|:---:|:---:|
|![Image](Figures/data_to_fit.png)|![Image](Figures/cos_fit.png)|

* Or we could fit one like this:

| Original Data| Linear Fit |
|:---:|:---:|
|![Image](Figures/data_to_fit.png)|![Image](Figures/linear_fit.png)|


* So, which would be a better fit?

| Non-Linear Fit| Linear Fit |
|:---:|:---:|
|![Image](Figures/cos_fit.png)|![Image](Figures/linear_fit.png)|


* we can always just fit a line that goes through every point
* Remember our aim is to come up with a "rule" that will generalize best to new observations
    * we check this with our test set
* Ultimately, there are any number of lines / "rules" we could use to fit this data
    * maybe have that same graph with 5 or so different lines?
* This tension is called the **Bias-Variance Trade-off**, and it motivates cross-validation and hyper-parameter tuning
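
A minimal sketch of why "a line that goes through every point" is not the goal. The data is synthetic (a noisy cosine, an assumption for illustration, not the data in the figures), and polynomial degree stands in for model flexibility.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_cos(x):
    # Assumed data-generating process: cosine plus Gaussian noise.
    return np.cos(x) + rng.normal(scale=0.3, size=x.size)

x_train = np.linspace(0, 6, 8)
y_train = noisy_cos(x_train)
x_test = rng.uniform(0, 6, size=200)  # new observations the model never saw
y_test = noisy_cos(x_test)

train_mse, test_mse = {}, {}
for degree in (1, 3, 7):  # degree 7 passes through all 8 training points
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse[degree] = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}")
```

Training error can be pushed to essentially zero by adding flexibility, so it says little about which rule to keep; the error on the held-out test points is the number to compare.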

## Steps in the ML Process
### Step 3b: Cross-Validation/Hyper-parameter tuning

* the tuning parameters control the bias-variance trade-off
* usually takes some trial and error
* testing
* making predictions
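
The tuning, testing, and prediction steps above might look like this with scikit-learn. This is a sketch under assumptions: the dataset is synthetic (`make_classification`), an SVM is used only because it is the model built in this talk, and the grid of `C` values is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; a real project would use its prepared features here.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Tuning: try several values of C, scoring each with 5-fold cross-validation
# on the training data only.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

# Testing: score the best model once on data it has never seen.
test_accuracy = search.score(X_test, y_test)

# Making predictions: apply the tuned model to new observations.
predictions = search.predict(X_test[:5])

print("best C:", search.best_params_["C"])
print("test accuracy:", round(test_accuracy, 3))
print("predictions:", predictions)
```

Keeping the test set out of the grid search is the point of the split: the cross-validation folds pick the hyper-parameter, and the test set is only used for the final check.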

If you're in a Kaggle competition, you're done here. If you're deploying this into an app, there are some further steps, but those are outside the scope of this presentation.

(maybe make a step 4: Containerization, using tools like Kubernetes/Docker?)

## Sources
* ISLR
* ML Coursera
* ML Yearning
* Siraj's SVM video
* Siraj's loss functions video