This notebook (and the slides from lecture 8) will help you go straight from training a model in Colab to deploying it in a webpage with TensorFlow.js - without having to leave the browser.

Configure this notebook to work with your GitHub account by populating these fields.

In [0]:
!pip install tensorflowjs



In [0]:
!pip install BeautifulSoup4



In [0]:
import urllib.request
from bs4 import BeautifulSoup
import numpy as np
import random
from nltk.tokenize import sent_tokenize
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [0]:
# the three books are 'pride and prejudice','Heart of Darkness', and 'Dracula'
quote_page = ['https://www.gutenberg.org/files/1342/1342-0.txt','https://www.gutenberg.org/files/219/219-0.txt','http://www.gutenberg.org/cache/epub/345/pg345.txt']


In [0]:
def train_data(webs):
  train = []
  label = []
  for i in range(len(webs)):
    page = urllib.request.urlopen(webs[i])
    soup = BeautifulSoup(page, 'html.parser')
    data = soup.text
    # split by line to eliminate the special character
    data = data.splitlines()
    # join the sentence as a large string for sentence tokenizse
    data = ''.join(map(str,data))
    # use sent_tokenize from nltk to tokenize the sentence
    sent_tokenize_list = sent_tokenize(data)
    print(len(sent_tokenize_list))
    try:
      train.extend(sent_tokenize_list[1000:2000])
    except:
      print('len is smaller than 1000')
      train.extend(sent_tokenize_list[-1000:])
    label.extend([i] * 1000)
  return train, np.array(label)

In [0]:
train, label = train_data(quote_page)

3947
2235
7743


In [0]:
#shuffle the data

In [0]:
a = list(range(3000))
random.shuffle(a)
new_train = [train[i] for i in a]
new_label = [label[i] for i in a]
new_label = np.array(new_label)

In [0]:
# your github username
USER_NAME = "Jiachenxu" 

# the email associated with your commits
# (may not matter if you leave it as this)
USER_EMAIL = "jx2318@columbia.edu" 

# the user token you've created (see the lecture 8 slides for instructions)
TOKEN = "2ffdf9d5f886846d899464c9972ab05c769eb337" 

# site name
# for example, if my user_name is "foo", then this notebook will create
# a site at https://foo.github.io/hw4/
SITE_NAME = "hw4"

Next, run this cell to configure git.

In [0]:
!git config --global user.email {USER_NAME}
!git config --global user.name  {USER_EMAIL}

Clone your GitHub pages repo (see the lecture 8 slides for instructions on how to create one).

In [0]:
import os
repo_path = USER_NAME + '.github.io'
if not os.path.exists(os.path.join(os.getcwd(), repo_path)):
  !git clone https://{USER_NAME}:{TOKEN}@github.com/{USER_NAME}/{USER_NAME}.github.io

Cloning into 'Jiachenxu.github.io'...
remote: Enumerating objects: 39, done.[K
remote: Counting objects: 100% (39/39), done.[K
remote: Compressing objects: 100% (30/30), done.[K
remote: Total 39 (delta 7), reused 32 (delta 6), pack-reused 0[K
Unpacking objects: 100% (39/39), done.


In [0]:
os.chdir(repo_path)
!git pull

Already up to date.


Create a folder for your site.

In [0]:
project_path = os.path.join(os.getcwd(), SITE_NAME)
if not os.path.exists(project_path): 
  os.mkdir(project_path)
os.chdir(project_path)

These paths will be used by the converter script.

In [0]:
# DO NOT MODIFY
MODEL_DIR = os.path.join(project_path, "model_js")
if not os.path.exists(MODEL_DIR):
  os.mkdir(MODEL_DIR)

As an example, we will create and vectorize a few documents. (Check out https://www.gutenberg.org/ for a bunch of free e-books.)

In [0]:
x_train = new_train
y_train = new_label.reshape(-1,1) # Indicating which book each sentence is from

Tokenize the documents, create a word index (word -> number).

In [0]:
max_len = 50
num_words = 100000
from keras.preprocessing.text import Tokenizer
# Fit the tokenizer on the training data
t = Tokenizer(num_words=num_words)
t.fit_on_texts(x_train)

Using TensorFlow backend.


In [0]:
print(t.word_index)



Here's how we vectorize a document.

In [0]:
train[0]

'I will go directly to Mr. Bennet, andwe shall very soon settle it with her, I am sure.”She would not give him time to reply, but hurrying instantly to herhusband, called out as she entered the library, “Oh!'

In [0]:
vectorized = t.texts_to_sequences([train[0]])
print(vectorized)

[[5, 55, 169, 643, 4, 36, 162, 10424, 111, 43, 113, 2242, 10, 17, 12, 5, 62, 327, 892, 44, 14, 174, 26, 67, 4, 990, 25, 3787, 1185, 4, 10425, 318, 53, 16, 15, 535, 1, 2843, 1456]]


Apply padding if necessary.

In [0]:
from keras.preprocessing.sequence import pad_sequences
padded = pad_sequences(vectorized, maxlen=max_len, padding='post')

In [0]:
print(padded)

[[    5    55   169   643     4    36   162 10424   111    43   113  2242
     10    17    12     5    62   327   892    44    14   174    26    67
      4   990    25  3787  1185     4 10425   318    53    16    15   535
      1  2843  1456     0     0     0     0     0     0     0     0     0
      0     0]]


We will save the word index in metadata. Later, we'll use it to convert words typed in the browser to numbers for prediction.

In [0]:
metadata = {
  'word_index': t.word_index,
  'max_len': max_len,
  'vocabulary_size': num_words,
}

Define a model.

In [0]:
embedding_size = 8
n_classes = 3
epochs = 10

import keras
from keras.callbacks import EarlyStopping
model = keras.Sequential()
model.add(keras.layers.Embedding(num_words, embedding_size, input_shape=(max_len,)))
model.add(keras.layers.LSTM(128, return_sequences = True))
model.add(keras.layers.LSTM(64))
model.add(keras.layers.Dense(3, activation='softmax'))
model.compile('adam', 'sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 50, 8)             800000    
_________________________________________________________________
lstm_1 (LSTM)                (None, 50, 128)           70144     
_________________________________________________________________
lstm_2 (LSTM)                (None, 64)                49408     
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 195       
Total params: 919,747
Trainable params: 919,747
Non-trainable params: 0
_________________________________________________________________


Prepare some training data.

In [0]:
x_train = t.texts_to_sequences(x_train)
x_train = pad_sequences(x_train, maxlen=max_len, padding='post')
print(x_train)

[[  20  214    4 ...    0    0    0]
 [   5   82   10 ...    0    0    0]
 [   9 3902  608 ...    0    0    0]
 ...
 [ 104  204 1804 ...    0    0    0]
 [  42   56   72 ...    0    0    0]
 [ 673  154  297 ...    0    0    0]]


In [0]:
earlystop = EarlyStopping(monitor='val_acc', min_delta=0.0001, patience=3, 
                          verbose=1, mode='auto')
callbacks_list = [earlystop]
model.fit(x_train, y_train, epochs=epochs, validation_split = 0.2, callbacks=callbacks_list)

Train on 2400 samples, validate on 600 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 00005: early stopping


<keras.callbacks.History at 0x7f790c8477b8>

Demo using the model to make predictions.

In [0]:
# From the thrid books
test_example = "I must regret that an attack of gout, from which malady I am a constant sufferer, forbids absolutely any travelling on my part for some time to come."
x_test = t.texts_to_sequences([test_example])
x_test = pad_sequences(x_test, maxlen=max_len, padding='post')
print(x_test)

[[   5   63  780   11   50  964    3   41   33    5   62    6 1080 1110
    73 2625   24   27  314   19   64   67    4  163    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0]]


In [0]:
preds = model.predict(x_test)
print(preds)
import numpy as np
print(np.argmax(preds))

[[0.26522762 0.02981036 0.704962  ]]
2


Convert the model

In [0]:
import json
import tensorflowjs as tfjs

metadata_json_path = os.path.join(MODEL_DIR, 'metadata.json')
json.dump(metadata, open(metadata_json_path, 'wt'))
tfjs.converters.save_keras_model(model, MODEL_DIR)
print('\nSaved model artifcats in directory: %s' % MODEL_DIR)


Saved model artifcats in directory: /content/Jiachenxu.github.io/hw4/model_js


Write an index.html and an index.js file configured to load our model.

In [0]:
index_html = """
<!doctype html>

<body>
  <style>
    #textfield {
      font-size: 120%;
      width: 60%;
      height: 200px;
    }
  </style>
  <h1>
    Title
  </h1>
  <hr>
  <div class="create-model">
    <button id="load-model" style="display:none">Load model</button>
  </div>
  <div>
    <div>
      <span>Vocabulary size: </span>
      <span id="vocabularySize"></span>
    </div>
    <div>
      <span>Max length: </span>
      <span id="maxLen"></span>
    </div>
  </div>
  <hr>
  <div>
    <select id="example-select" class="form-control">
      <option value="example1">Alice's Adventures in Wonderland</option>
      <option value="example2">Dracula</option>
      <option value="example3">The Iliad</option>
    </select>
  </div>
  <div>
    <textarea id="text-entry"></textarea>
  </div>
  <hr>
  <div>
    <span id="status">Standing by.</span>
  </div>

  <script src='https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js'></script>
  <script src='index.js'></script>
</body>
"""

In [0]:
index_js = """
const HOSTED_URLS = {
  model:
      'model_js/model.json',
  metadata:
      'model_js/metadata.json'
};

const examples = {
  'example1':
      'Alice was beginning to get very tired of sitting by her sister on the bank.',
  'example2':
      'Buda-Pesth seems a wonderful place.',
  'example3':
      'Scepticism was as much the result of knowledge, as knowledge is of scepticism.'      
};

function status(statusText) {
  console.log(statusText);
  document.getElementById('status').textContent = statusText;
}

function showMetadata(metadataJSON) {
  document.getElementById('vocabularySize').textContent =
      metadataJSON['vocabulary_size'];
  document.getElementById('maxLen').textContent =
      metadataJSON['max_len'];
}

function settextField(text, predict) {
  const textField = document.getElementById('text-entry');
  textField.value = text;
  doPredict(predict);
}

function setPredictFunction(predict) {
  const textField = document.getElementById('text-entry');
  textField.addEventListener('input', () => doPredict(predict));
}

function disableLoadModelButtons() {
  document.getElementById('load-model').style.display = 'none';
}

function doPredict(predict) {
  const textField = document.getElementById('text-entry');
  const result = predict(textField.value);
  score_string = "Class scores: ";
  for (var x in result.score) {
    score_string += x + " ->  " + result.score[x].toFixed(3) + ", "
  }
  //console.log(score_string);
  status(
      score_string + ' elapsed: ' + result.elapsed.toFixed(3) + ' ms)');
}

function prepUI(predict) {
  setPredictFunction(predict);
  const testExampleSelect = document.getElementById('example-select');
  testExampleSelect.addEventListener('change', () => {
    settextField(examples[testExampleSelect.value], predict);
  });
  settextField(examples['example1'], predict);
}

async function urlExists(url) {
  status('Testing url ' + url);
  try {
    const response = await fetch(url, {method: 'HEAD'});
    return response.ok;
  } catch (err) {
    return false;
  }
}

async function loadHostedPretrainedModel(url) {
  status('Loading pretrained model from ' + url);
  try {
    const model = await tf.loadModel(url);
    status('Done loading pretrained model.');
    disableLoadModelButtons();
    return model;
  } catch (err) {
    console.error(err);
    status('Loading pretrained model failed.');
  }
}

async function loadHostedMetadata(url) {
  status('Loading metadata from ' + url);
  try {
    const metadataJson = await fetch(url);
    const metadata = await metadataJson.json();
    status('Done loading metadata.');
    return metadata;
  } catch (err) {
    console.error(err);
    status('Loading metadata failed.');
  }
}

class Classifier {

  async init(urls) {
    this.urls = urls;
    this.model = await loadHostedPretrainedModel(urls.model);
    await this.loadMetadata();
    return this;
  }

  async loadMetadata() {
    const metadata =
        await loadHostedMetadata(this.urls.metadata);
    showMetadata(metadata);
    this.maxLen = metadata['max_len'];
    console.log('maxLen = ' + this.maxLen);
    this.wordIndex = metadata['word_index']
  }

  predict(text) {
    // Convert to lower case and remove all punctuations.
    const inputText =
        text.trim().toLowerCase().replace(/(\.|\,|\!)/g, '').split(' ');
    // Look up word indices.
    const inputBuffer = tf.buffer([1, this.maxLen], 'float32');
    for (let i = 0; i < inputText.length; ++i) {
      const word = inputText[i];
      inputBuffer.set(this.wordIndex[word], 0, i);
      //console.log(word, this.wordIndex[word], inputBuffer);
    }
    const input = inputBuffer.toTensor();
    //console.log(input);

    status('Running inference');
    const beginMs = performance.now();
    const predictOut = this.model.predict(input);
    //console.log(predictOut.dataSync());
    const score = predictOut.dataSync();//[0];
    predictOut.dispose();
    const endMs = performance.now();

    return {score: score, elapsed: (endMs - beginMs)};
  }
};

async function setup() {
  if (await urlExists(HOSTED_URLS.model)) {
    status('Model available: ' + HOSTED_URLS.model);
    const button = document.getElementById('load-model');
    button.addEventListener('click', async () => {
      const predictor = await new Classifier().init(HOSTED_URLS);
      prepUI(x => predictor.predict(x));
    });
    button.style.display = 'inline-block';
  }

  status('Standing by.');
}

setup();
"""

In [0]:
with open('index.html','w') as f:
  f.write(index_html)
  
with open('index.js','w') as f:
  f.write(index_js)

In [0]:
!ls

index.html  index.js  model_js


Commit and push everything. Note: we're storing large binary files in GitHub, this isn't ideal (if you want to deploy a model down the road, better to host it in a cloud storage bucket).

In [0]:
!git add . 
!git commit -m "colab -> github"
!git push https://{USER_NAME}:{TOKEN}@github.com/{USER_NAME}/{USER_NAME}.github.io/ master

[master 5e84adb] colab -> github
 3 files changed, 2 insertions(+), 2 deletions(-)
 rewrite hw4/model_js/group1-shard1of1 (66%)
 rewrite hw4/model_js/metadata.json (97%)
 rewrite hw4/model_js/model.json (83%)
Counting objects: 7, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 3.30 MiB | 1.64 MiB/s, done.
Total 7 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.[K
To https://github.com/Jiachenxu/Jiachenxu.github.io/
   8ccc033..5e84adb  master -> master


All done! Hopefully everything worked. You may need to wait a few moments for the changes to appear in your site. If not working, check the JavaScript console for errors (in Chrome: View -> Developer -> JavaScript Console).

In [0]:
print("Now, visit https://%s.github.io/%s/" % (USER_NAME, SITE_NAME))

Now, visit https://Jiachenxu.github.io/hw4/


If you are debugging and Chrome is failing to pick up your changes, though you've verified they're present in your GitHub repo, see the second answer to: https://superuser.com/questions/89809/how-to-force-refresh-without-cache-in-google-chrome