# Code First, Math Later
## Learning Neural Nets Through Implementation and Examples
### Kyle Shaffer

# Talk Overview
* Overview of Deep Learning and Applications
* Introduction to Keras API
* Implementing Neural Networks
* Beyond the Black Box: Investigating Feature Extraction Layers
* Conclusion

# Introduction to Deep Learning

# Feedforward Networks

![title](figs/neural-network.png)

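In [None]:
# Build a simple MLP (multilayer perceptron)
from keras.models import Sequential
from keras.layers import Activation, Dense

def build_mlp(input_dim):
    mlp = Sequential()
    # The first layer needs to know the number of input features
    mlp.add(Dense(100, input_dim=input_dim, activation='relu'))
    # Without a nonlinearity, stacked Dense layers collapse into one linear map
    mlp.add(Dense(50, activation='relu'))
    # Output layer: 3 classes, softmax turns scores into probabilities
    mlp.add(Dense(3))
    mlp.add(Activation('softmax'))
    mlp.compile(loss='categorical_crossentropy', optimizer='sgd')
    return mlp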
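
To make the layer stack above less of a black box, here is a minimal NumPy sketch of the forward pass such a network computes. It assumes ReLU activations on the two hidden layers, and the input size (`n_features = 4`), the random weights, and the helper names `softmax` and `mlp_forward` are illustrative assumptions, not part of the talk.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(x, params):
    """Forward pass: two ReLU hidden layers, then a softmax output layer."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = np.maximum(0.0, x @ w1 + b1)   # hidden layer with 100 units
    h2 = np.maximum(0.0, h1 @ w2 + b2)  # hidden layer with 50 units
    return softmax(h2 @ w3 + b3)        # 3 class probabilities per row

rng = np.random.default_rng(0)
n_features = 4                      # hypothetical input size
sizes = [n_features, 100, 50, 3]    # layer widths matching the model above
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(5, n_features))  # 5 fake examples
probs = mlp_forward(x, params)
print(probs.shape)        # (5, 3)
print(probs.sum(axis=1))  # each row sums to 1
```

Training adjusts the weight matrices; the forward computation itself is just matrix multiplies and elementwise functions.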

# Convolutional Networks

* Often used in image tasks
* A "sliding filter" moves across the image and aggregates features from each local patch
* The network passes the learned feature representations on to the following layers
* Each subsequent convolutional layer learns more general / abstract concepts

![conv-layers](figs/conv_layers.png)

## What's a convolution?
![conv-img](https://i.stack.imgur.com/I7DBr.gif)
Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
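
The sliding-filter idea can be sketched in a few lines of NumPy. This is a minimal illustration, not how Keras implements it: the function name `convolve2d_valid`, the small binary image, and the 3x3 filter are made up for the example. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    elementwise products at every position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # window under the filter
            out[i, j] = np.sum(patch * kernel)  # one output feature
    return out

# Hypothetical 5x5 binary image and 3x3 filter
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)

features = convolve2d_valid(image, kernel)
print(features)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

A convolutional layer learns the values inside many such filters instead of having them fixed by hand.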

# Recurrent Networks

* Capture sequential dependencies in data
* Often used for language / text data or time series
* Learn to maintain a "cell state" that represents relationships between items in the input sequence
* Caveats: can take a very long time to train, and optimization can be tricky

![lstm-unrolled](figs/lstm_unrolled.png)
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

![inside-lstm](figs/inside_lstm.png)
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
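
For readers who prefer code to diagrams, the gates pictured above can be written as one small NumPy function. This is a sketch of the standard LSTM update under assumed shapes; the names `lstm_step`, `W`, `U`, `b` and the gate ordering inside the stacked weight matrices are choices made for this example, not Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    """
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:, 0 * units:1 * units])  # input gate
    f = sigmoid(z[:, 1 * units:2 * units])  # forget gate
    g = np.tanh(z[:, 2 * units:3 * units])  # candidate cell values
    o = sigmoid(z[:, 3 * units:4 * units])  # output gate
    c_t = f * c_prev + i * g                # updated cell state
    h_t = o * np.tanh(c_t)                  # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
batch, timesteps, input_dim, units = 2, 5, 3, 4  # hypothetical sizes
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

x = rng.normal(size=(batch, timesteps, input_dim))
h = np.zeros((batch, units))
c = np.zeros((batch, units))
for t in range(timesteps):
    # The same weights are reused at every time step
    h, c = lstm_step(x[:, t, :], h, c, W, U, b)

print(h.shape)  # (2, 4)
```

The loop is the "unrolled" view: the cell state `c` is what gets carried from one step to the next.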

# Introduction to Keras

## Keras Homepage
### https://keras.io/
![keras-homepage](figs/keras_homepage.png)

# Implementing Neural Nets: Examples

# CNN for Object Recognition
* CIFAR-10 dataset
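
As a starting point, a small CNN for 32x32 RGB images with 10 classes (the CIFAR-10 format) might look like the sketch below. The layer sizes are illustrative assumptions rather than the architecture used in the talk, and the code assumes a reasonably recent Keras where `Input` can be added to a `Sequential` model.

```python
from keras.models import Sequential
from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D

def build_cnn(num_classes=10):
    model = Sequential()
    # CIFAR-10 images are 32x32 pixels with 3 color channels
    model.add(Input(shape=(32, 32, 3)))
    # Convolution + pooling blocks extract image features
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Flatten the feature maps so Dense layers can use them
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    # Layer that actually does the classification
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd')
    return model

cnn = build_cnn()
print(cnn.output_shape)  # (None, 10)
```

With one-hot encoded labels, training would then be a call to `cnn.fit(...)` on the image arrays.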

# RNN / LSTM for Document Classification

# Beyond the Black Box: Investigating Feature Extraction Layers

### - Visualizing probability layers
### - Visualizing and investigating feature extraction layers

In [18]:
# Building our model architecture...
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, Flatten, LSTM

def build_lstm_model():
    model = Sequential()
    # Embedding layer
    model.add(Embedding(input_dim=2000, output_dim=200, input_length=100))
    # Recurrent layers
    model.add(LSTM(64))
    model.add(LSTM(64))
    # Additional fully connected layers
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    # Layer that actually does the classification
    model.add(Dense(4, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd')
    return model

In [19]:
# Cool, let's build our model so we can use it - oh wait...
lstm_model = build_lstm_model()

ValueError: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
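
The error arises because an `LSTM` layer returns only its final output by default, with shape `(batch, units)`, while the next `LSTM` expects a full sequence of shape `(batch, timesteps, features)`. One way to stack recurrent layers is to set `return_sequences=True` on every `LSTM` except the last. The sketch below assumes a reasonably recent Keras and uses an explicit `Input` layer; the function name `build_stacked_lstm_model` is introduced here for illustration.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, Input, LSTM

def build_stacked_lstm_model():
    model = Sequential()
    # Sequences of 100 token ids drawn from a 2000-word vocabulary
    model.add(Input(shape=(100,)))
    model.add(Embedding(input_dim=2000, output_dim=200))
    # return_sequences=True keeps the output 3-D: (batch, 100, 64)
    model.add(LSTM(64, return_sequences=True))
    # The last recurrent layer returns only its final output: (batch, 64)
    model.add(LSTM(64))
    # Additional fully connected layers
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    # Layer that actually does the classification
    model.add(Dense(4, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd')
    return model

model = build_stacked_lstm_model()
print(model.output_shape)  # (None, 4)
```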

In [31]:
def build_lstm_feature_extractor():
    model = Sequential()
    # Embedding layer
    model.add(Embedding(input_dim=2000, output_dim=200, input_length=100))
    model.add(LSTM(64))
    # In principle, this last step doesn't make much sense
    # But, we need to compile a model in order to play with it
    model.compile(loss='categorical_crossentropy', optimizer='sgd')
    return model

In [33]:
import numpy as np

emb_model = build_lstm_feature_extractor()
# Let's make some fake data: 100 sequences of 100 token ids.
# randint's `high` is exclusive, so high=2000 covers the whole vocabulary (ids 0..1999)
X_fake = np.random.randint(low=0, high=2000, size=(100, 100))
print("Shape of our fake data:", X_fake.shape)

lstm_features = emb_model.predict(X_fake)
print("Shape of feature data from first LSTM layer:", lstm_features.shape)

Shape of our fake data: (100, 100)
Shape of feature data from first LSTM layer: (100, 64)
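
Rebuilding a truncated model by hand works, but its weights are freshly initialized, so the features are not the ones a trained classifier learned. An alternative is to wrap an existing model's layers in a second model that stops at the layer of interest; the two models then share weights. The sketch below assumes a reasonably recent Keras, and the names `full_model`, `extractor`, and the layer name `lstm_features` are hypothetical.

```python
import numpy as np
from keras.models import Model, Sequential
from keras.layers import Dense, Embedding, Input, LSTM

# A small classifier; in practice this would be trained before extracting features
full_model = Sequential()
full_model.add(Input(shape=(100,)))
full_model.add(Embedding(input_dim=2000, output_dim=200))
full_model.add(LSTM(64, name='lstm_features'))
full_model.add(Dense(4, activation='softmax'))

# A second model that reuses the same layers (and weights)
# but outputs the LSTM activations instead of class probabilities
extractor = Model(inputs=full_model.inputs,
                  outputs=full_model.get_layer('lstm_features').output)

X_fake = np.random.randint(low=0, high=2000, size=(8, 100))
features = extractor.predict(X_fake, verbose=0)
print(features.shape)  # (8, 64)
```

The resulting feature matrix can be passed to any visualization or clustering tool to inspect what the layer has learned.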


# Recap

# Thanks! Questions?