
# Example 3: Using the Keras API!

---

Dustin A. Landers
---
11/5/2019


Let's make sure we are running TensorFlow 2.0.0, the version this notebook was written against.

In [0]:
#pip install tensorflow==2.0.0

In [2]:
import tensorflow as tf
print(tf.__version__)

2.0.0


## Dataset loading (how to get public dataset easily)

Next we will bring in a dataset to work with. We will use the tensorflow_datasets package to get a pre-loaded dataset. Fortunately, this makes it easy to get started with text architectures and the keras module. Unfortunately, it also creates a bit of an illusion about how easy it is to get data into the correct format. As usual, preparing the data for a TensorFlow task is often the most tedious part.

In [0]:
import tensorflow_datasets as tfds

In [0]:
(train_data, test_data), info = tfds.load(
    'imdb_reviews/subwords8k', 
    split = (tfds.Split.TRAIN, tfds.Split.TEST),
    as_supervised=True,
    with_info=True)

This public TensorFlow dataset also ships with its text encoder, which saves us yet another step. I'm going to play around with that encoder below so that we can see how it works.

## This is how encoders work!

In [0]:
encoder = info.features['text'].encoder

In [6]:
encoder.encode('man I really am enjoying using tensorflow 2.0!')

[172,
 12,
 81,
 258,
 1236,
 34,
 1168,
 943,
 2327,
 2934,
 7961,
 7979,
 7975,
 7977,
 7962]

In [7]:
encoder.decode([172, 12, 81, 258, 1236,
                34, 1168, 943, 2327, 2934,
                7961, 7979, 7975, 7977, 7962])

'man I really am enjoying using tensorflow 2.0!'
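The round trip above shows that the encoder maps text to integer ids and back, and that a word like "tensorflow" is split into several pieces rather than stored whole. As a rough illustration of that idea, here is a toy greedy longest-match tokenizer in plain Python. It is only a sketch: `TOY_VOCAB`, `toy_encode`, and `toy_decode` are hypothetical names with a made-up vocabulary, and this is not how tensorflow_datasets builds or applies its actual subword vocabulary.

```python
# Toy greedy longest-match subword tokenizer (illustrative only; the
# vocabulary is made up and this is NOT the tensorflow_datasets encoder).
TOY_VOCAB = ["enjoy", "ing", "tensor", "flow", "man", " ",
             "e", "n", "j", "o", "y", "i", "g", "t", "s",
             "r", "f", "l", "w", "m", "a"]


def toy_encode(text, vocab=TOY_VOCAB):
    """Greedily match the longest vocabulary piece at each position."""
    ids = []
    pos = 0
    while pos < len(text):
        # Among the pieces that match here, prefer the longest one so that
        # whole subwords win over single letters.
        match = max(
            (piece for piece in vocab if text.startswith(piece, pos)),
            key=len,
            default=None,
        )
        if match is None:
            raise ValueError(f"no vocabulary piece covers {text[pos]!r}")
        ids.append(vocab.index(match) + 1)  # id 0 is kept free for padding
        pos += len(match)
    return ids


def toy_decode(ids, vocab=TOY_VOCAB):
    """Concatenate the pieces that the ids point to."""
    return "".join(vocab[i - 1] for i in ids)


ids = toy_encode("man enjoying tensorflow")
print(ids)              # [5, 6, 1, 2, 6, 3, 4]
print(toy_decode(ids))  # man enjoying tensorflow
```

Because the single letters act as a fallback, any string made of known characters can be encoded even if the full word was never seen, which is the main appeal of subword vocabularies.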

## You train deep learning representations with batches (can you guess why?)

In [0]:
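BUFFER_SIZE = 1000  # number of examples held in the shuffle buffer
BATCH_SIZE = 32

# Shuffle the training examples, then pad every review in a batch
# to the length of the longest review in that batch.
train_batches = (
    train_data
    .shuffle(BUFFER_SIZE)
    .padded_batch(BATCH_SIZE, train_data.output_shapes))

# The test set is not shuffled; it is only padded and batched.
test_batches = (
    test_data
    .padded_batch(BATCH_SIZE, test_data.output_shapes))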
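Reviews have different lengths, but a batch has to be a rectangular array, so shorter reviews are padded out to the longest one in the batch. The plain-Python sketch below shows the idea behind padded batching. `pad_batch` is a hypothetical helper written for illustration, not a TensorFlow function, and it assumes 0 is the padding id.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence with pad_id to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Three encoded "reviews" of different lengths (ids taken from the example above).
reviews = [[172, 12, 81], [258], [1236, 34]]
batch = pad_batch(reviews)
print(batch)  # [[172, 12, 81], [258, 0, 0], [1236, 34, 0]]
```

As for the question in the heading, one common answer is that a batch lets the optimizer estimate the gradient from several examples at once while keeping memory use bounded, and rectangular batches map well onto vectorized hardware.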

## Use tf.keras.Sequential to build out layers (this one is simple)

### Example of cool architectures -- embeddings

In [9]:
model = tf.keras.Sequential([
  tf.keras.layers.Embedding(encoder.vocab_size, 16),
  tf.keras.layers.GlobalAveragePooling1D(),
  tf.keras.layers.Dense(1, activation='sigmoid')])

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 16)          130960    
_________________________________________________________________
global_average_pooling1d (Gl (None, 16)                0         
_________________________________________________________________
dense (Dense)                (None, 1)                 17        
Total params: 130,977
Trainable params: 130,977
Non-trainable params: 0
_________________________________________________________________
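The parameter counts in the summary can be checked by hand. The short calculation below works backwards from the numbers printed above: an embedding layer stores one 16-dimensional vector per vocabulary entry, and the dense layer has one weight per input plus a bias. The variable names are just for illustration.

```python
embedding_dim = 16
embedding_params = 130960  # "Param #" of the embedding layer in the summary

# One row of embedding_dim weights per vocabulary entry.
vocab_size = embedding_params // embedding_dim

# Dense(1): one weight per pooled feature, plus a single bias term.
dense_params = embedding_dim * 1 + 1

total_params = embedding_params + dense_params

print(vocab_size)    # 8185
print(dense_params)  # 17
print(total_params)  # 130977
```

GlobalAveragePooling1D contributes no parameters because it only averages the embedding vectors over the sequence dimension.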


Trained for 10 epochs in the next cell, this architecture achieved decent accuracy on the validation set!

What if we used a more complex architecture? For example, a 1-dimensional convolution layer could pick up on word combinations and phrases.
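To make the convolution idea concrete, here is a plain-Python sketch of what a single 1-dimensional filter does: it slides a window over consecutive token embeddings and scores each window, so it can respond to short phrases rather than single tokens. `conv1d_single_filter` is a hypothetical helper for illustration, with made-up 2-dimensional embeddings; it assumes stride 1, no padding, and a ReLU activation.

```python
def conv1d_single_filter(embeddings, kernel, bias=0.0):
    """Slide one filter of width len(kernel) over a sequence of vectors.

    Assumes stride 1 and no padding, followed by a ReLU.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the window of embeddings.
        activation = bias + sum(
            w * x
            for kernel_row, vector in zip(kernel, window)
            for w, x in zip(kernel_row, vector)
        )
        outputs.append(max(0.0, activation))  # ReLU
    return outputs


# Four tokens with made-up 2-dimensional embeddings.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# A width-2 filter that responds most strongly to [1, 0] followed by [0, 1].
kernel = [[1.0, 0.0], [0.0, 1.0]]

features = conv1d_single_filter(sequence, kernel)
print(features)  # [2.0, 1.0, 1.0]
```

In Keras, this role would be played by a layer such as `tf.keras.layers.Conv1D` placed between the embedding and pooling layers, with many filters learned during training instead of one fixed by hand.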

In [10]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_batches,
                    epochs=10,
                    validation_data=test_batches,
                    validation_steps=30)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Don't stop here. There are tons of cool architectures to explore to improve upon the accuracy achieved above.