Projects/DeepLearningExamples/R/Keras-with-R-talk-slideshow.Rpres

Using Keras with R talk
========================================================
author: Anton Antonov
date: 2018-06-02
autosize: true

## [Orlando Machine Learning and Data Science meetup](https://www.meetup.com/Orlando-MLDS)

### [Deep Learning series (session 2)](https://www.meetup.com/Orlando-MLDS/events/250086544/)

Very short introduction
========================================================

Talking about TensorFlow / Keras / R combination:


```{r, eval=FALSE}
library(keras)

model <- keras_model_sequential() 
model %>% 
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>% 
  layer_dropout(rate = 0.4) %>% 
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')

summary(model)
```


Detailed introduction 1
========================================================

## Goals (messages to convey)

- Understanding deep learning by comparison 

  - Taking a system analysis approach

  - Analogy with [a man made Machine Learning algorithm](https://mathematicaforprediction.wordpress.com/2013/08/26/classification-of-handwritten-digits/)
  
- Deep learning libraries
 
  - TensorFlow, Keras, MXNet.

  - With making neural networks is not so much of [Goldberg machines](https://en.wikipedia.org/wiki/Rube_Goldberg_machine) (anymore); 
    
    -  more of a building with a Lego set or Soma cube.

Detailed introduction 2
========================================================

## Keras in R

- Classification with the [MNIST data set](http://yann.lecun.com/exdb/mnist/)

- Classification of IMDB reviews

- Some questions / explorations to consider 

## Other

- The Trojan horse ([MXNet](https://mxnet.incubator.apache.org), [Mathematica](https://www.wolfram.com))
  
  - [Powered By](https://mxnet.incubator.apache.org/community/powered_by.html)
  
Links
========================================================

- The book ["Deep learning with R"](https://www.manning.com/books/deep-learning-with-r)

  - First three chapters are free. (And well-worth reading just them.) 
   
     - \[[1st](`https://manning-content.s3.amazonaws.com/download/6/3bdf613-e2f6-48fa-8710-b3bd0b7979e6/SampleCh01.pdf`)\], 
       \[[2nd](`https://manning-content.s3.amazonaws.com/download/4/481437b-2746-4ab1-94a7-c25eab8fae44/SampleCh02.pdf`)\],
       \[[3rd](`https://manning-content.s3.amazonaws.com/download/9/9a3b0d8-e651-4239-8c4f-94267be64fee/SampleCh03.pdf`)\], 

  - [The book Rmd notebooks](https://github.com/jjallaire/deep-learning-with-r-notebooks) are at GitHub.
  
- [RStudio's Keras page](https://keras.rstudio.com)
  
  - [another one](https://tensorflow.rstudio.com/keras/)


Who am I?
========================================================

- MSc in Mathematics (Abstract Algebra).

- MSc in Computer Science (Databases).

- PhD in Applied Mathematics (Large Scale Air Pollution Simulations).

- Former Kernel Developer of Mathematica (7 years).

- Currently branding as a "Senior Data Scientist." 

   - 10+ years experience in applying machine learning algorithms in commercial setting.
   
   - Large part in recommendations systems building and related data analysis. 
   
   - Currently working in healthcare.

Audience questions
========================================================

- How many use R?

- How many use Python?

- How many are data scientists?

- How many are engineers?

- How many are students?


How Keras addresses Deep Learning's most important feature?
========================================================

- The principle: "Trying to see without looking."

- No special feature engineering required. 

- The development speed-up of using Keras, in general and in R.

- The Paris Gun pattern.


Analogy: a classifier based on matrix factorization 1
========================================================

**1.** [Training phase](https://mathematicaforprediction.wordpress.com/2013/08/26/classification-of-handwritten-digits/)

1.1. Rasterize each training image into an array of 16 x 16 pixels.

1.2. Each raster image is linearized — the rows are aligned into a one dimensional array.   
In other words, each raster image is mapped into a R^256 vector space.   
We will call these one dimensional arrays raster vectors.

1.3. From each set of images corresponding to a digit make a matrix with 256 columns of the corresponding raster vectors.

1.4. Using the matrices in step 1.3 use thin SVD to derive orthogonal bases that describe the image data for each digit.
  

Analogy: a classifier based on matrix factorization 2
========================================================

**2.** [Recognition phase](https://mathematicaforprediction.wordpress.com/2013/08/26/classification-of-handwritten-digits/)

2.1. Given an image of an unknown digit derive its raster vector, R.

2.2. Find the residuals of the approximations of R with each of the bases found in 1.4.

2.3. The digit with the minimal residual is the recognition result.

- See [more](https://mathematicaforprediction.wordpress.com/?s=NNMF).


Neural network construction in general
========================================================

- See this diagram.

- Steps:

  - Prepare the data.

  - Chain layers.

  - Pick an optimizer.

  - Train and evaluate.


Neural network layers primer
========================================================

- Is this something the audience want to see/hear?

- Separate presentation or referenced along in the code runs?
  
  - Sub-presentation done in Mathematica (~15 min.)

- See the functionality breakdowns:

  - RStudio: [Keras reference](https://keras.rstudio.com/reference/index.html);
  
  - Mathematica: ["Neural Networks guide"](http://reference.wolfram.com/language/guide/NeuralNetworks.html).   


The code runs 1
========================================================

- First run with a basic, non-trivial example (over MNIST.)

- The breakdown:

  - binary classification;
  
  - multi-label classification;
  
  - regression.


The code runs 2
========================================================

- The specific topics:
  
  - encoders and decoders;
  
  - dealing with over-fitting;
  
  - categorical classification;
  
  - vector classification.


Some questions to consider in more detail 1
========================================================

- Can we change the metrics function?

- Can we do out-of-core training?

   - [Or, how we do batch training?](https://mathematica.stackexchange.com/a/174150/34008)

- How do we deal with over-fitting?

- Can we visualize the layers?

- Are there repositories we can use to download already made nets?


Some questions to consider in more detail 2
========================================================

- How easy to add a custom classifier to an already made and pre-trained net?

- Where we can find explanations and/or directions for which type layer to use under what conditions?

- How the data is “uplifted” into the space of a net?

  - Encoders

  - And of course what are the decoders?


Some guidelines 1
========================================================

- Most likely we will not be making neural network from scratch.

- Two important skills to acquire first:

   - Knowing well how to utilize different encoders (over different data.)
  
   - Knowing basic neural networks and how to obtain them.
   
      - Copy & paste or from dedicated repositories.

- "Next wave" skills

   - Knowing how to do batch training and out-of-core training.
   
   - Knowing how to deal with over-fitting.
   
   - Knowing how to do network surgery.
   
   
Some guidelines 2
========================================================

- Given a problem:

   - Is it simple to apply neural networks to it?
   
   - Do we have enough data with enough quality in order to apply neural networks?
   
   - What result we get with alternative methods, like random forest, nearest neighbors, etc.?
 
   
Future plans   
========================================================   

- Conversational agent for building neural networks.