Results table (from all previous observations)

In [None]:
from prettytable import PrettyTable
x = PrettyTable()
x.field_names = ['Embeddings','Model','# Params','Bleu-Score','Mean-latency (in ms)','90P-latency (in ms)','99P-latency (in ms)']
x.add_row(['Glove word-embeddings(largest available)','LSTM Enc-dec','9.2M {1.5M trainable}',0.10,48.22,84.19,131.04])
x.add_row(['Sentence-piece','T5 (Base)','220M',0.07,882.76, 1364.36, 2461.01])
x.add_row(['Byte-Pair embeddings','GPT-2 (Base)','124M',0.18,11882.10,19693.22,21115.62])

print(x)

+------------------------------------------+--------------+-----------------------+------------+----------------------+---------------------+---------------------+
|                Embeddings                |    Model     |        # Params       | Bleu-Score | Mean-latency (in ms) | 90P-latency (in ms) | 99P-latency (in ms) |
+------------------------------------------+--------------+-----------------------+------------+----------------------+---------------------+---------------------+
| Glove word-embeddings(largest available) | LSTM Enc-dec | 9.2M {1.5M trainable} |    0.1     |        48.22         |        84.19        |        131.04       |
|              Sentence-piece              |  T5 (Base)   |          220M         |    0.07    |        882.76        |       1364.36       |       2461.01       |
|           Byte-Pair embeddings           | GPT-2 (Base) |          124M         |    0.18    |       11882.1        |       19693.22      |       21115.62      |
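+------------------------------------------+--------------+-----------------------+------------+----------------------+---------------------+---------------------+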
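
The notebook does not show how the mean, 90P and 99P latencies in the table were measured. The following is a minimal sketch of one way such numbers could be produced, assuming each model is exposed as a single-argument prediction function. The helper names (`measure_latency_ms`, `nearest_rank_percentile`, `summarize_latency`) are hypothetical, and the nearest-rank percentile definition is an assumption rather than the method used for the table.

```python
import math
import time


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Rank is 1-based; multiply before dividing to avoid float rounding surprises.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency_ms(predict_fn, inputs):
    """Call predict_fn once per input and return the per-call latencies in ms."""
    latencies = []
    for text in inputs:
        start = time.perf_counter()
        predict_fn(text)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize_latency(latencies):
    """Summarize latencies with the three statistics reported in the table."""
    return {
        "mean_ms": sum(latencies) / len(latencies),
        "p90_ms": nearest_rank_percentile(latencies, 90),
        "p99_ms": nearest_rank_percentile(latencies, 99),
    }
```

In this notebook, `predict_fn` could be a function such as `predict_gpt` and `inputs` a sample of `enc_seq` values from the test set. Tail percentiles like 99P are only meaningful with a reasonably large number of timed calls.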

Imports, loading models and data

In [None]:
# import and load data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings("ignore")
import re
import pickle
import email
from tqdm import tqdm
import datetime
from dateutil import parser
import nltk
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Embedding, LSTM, Dense,Dropout
from tensorflow.keras.preprocessing.sequence import pad_sequences

!pip install -q gpt-2-simple
import gpt_2_simple as gpt2

# gdown is used below to download the pickled dataset from Google Drive
!pip install --upgrade --no-cache-dir gdown


Load model and data

In [3]:
# load gpt2 model
gpt2.mount_gdrive()
gpt2.copy_checkpoint_from_gdrive(run_name='run1')

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')

Mounted at /content/drive
Loading checkpoint checkpoint/run1/model-2500
INFO:tensorflow:Restoring parameters from checkpoint/run1/model-2500


In [5]:
# load data
!gdown --id 1cvJp9HTZ5z6FvMl5Q7bCenWVbtqgzYCa
with open('Sequence_data.pickle', 'rb') as file:
    train_sequences,test_sequences = pickle.load(file)

train_sequences.head()

Downloading...
From: https://drive.google.com/uc?id=1cvJp9HTZ5z6FvMl5Q7bCenWVbtqgzYCa
To: /content/Sequence_data.pickle
100% 122M/122M [00:01<00:00, 117MB/s]


   enc_seq                       dec_seq
0  I take back my dog            comment john
1  I take back my dog comment    john
2  Please take a look at         it You may find it useful Vince
3  Please take a look at it      You may find it useful Vince
4  Please take a look at it You  may find it useful Vince


In [6]:
def predict_gpt(s,l=30):
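    '''
    Predict a completion for the input text `s` with the fine-tuned GPT-2 model.
    The input is prefixed with the <|startoftext|> token, at most `l` tokens
    are generated, and the prefix is stripped from the returned text.
    '''
    prefix="<|startoftext|> "+s
    p = gpt2.generate(sess,
                prefix=prefix,
                truncate="<|endoftext|>",
                length=l,
                run_name='run1',
                temperature=0.7,
                include_prefix=True,
                return_as_list=True
                )[0]

    # include_prefix=True returns prefix + completion, so drop the prefix here
    p = p[len(prefix):]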
    return p.strip()

Error analysis on predictions

In [8]:

def get_prediction_for_sample(data,k=10,seed=None):
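    '''
    Randomly sample 'k' sentences (with replacement) and print the model's prediction for each
    '''
    # compare against None so that seed=0 is also honoured
    if seed is not None:
        np.random.seed(seed)
    indices = np.random.choice(data.shape[0],size=k)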
    for idx in indices:
        input_sentence = data.iloc[idx].enc_seq
        target_sentence = data.iloc[idx].dec_seq
        print("Input:",input_sentence)
        print('='*130)
        print("Output:",target_sentence)
        print('='*130)
        p = predict_gpt(input_sentence)
        print("Prediction:",p)
        print()

get_prediction_for_sample(test_sequences,seed=42)

Input: Thanks for the update I will complete a review and send the worksheet up first thing in the morning Please let me know if that will be a problem
Output: d
Prediction: 

Input: Rob she could talk to about
Output: a position as a paralegal is looking for help on some of the asset work
Prediction: him being available for an interview How much do you think Brian

Input: No I am actually pretty happy about the deal Not as happy as I would be
Output: if I were working for you
Prediction: if I were working for you

Input: Vince Thanks for you offer I need to change
Output: my agenda for next week so would something in early July work for you Thanks for your assistance
Prediction: my itinerary to Friday Vince

Input: HourAhead No ancillary schedules awarded No variances detected LOG
Output: PARSING FILE PortlandWestDeskCalifornia SchedulingISO Final Schedules txt
Prediction: PARSING FILE PortlandWestDeskCalifornia SchedulingISO Final txt retrieving HourAhead price data process continuin
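
To put a rough number on how closely a prediction tracks its target in the samples above, one option is a token-level comparison. The sketch below is not the BLEU computation behind the results table; it only computes an exact-match flag and a clipped unigram precision (the 1-gram ingredient of BLEU). The function names are hypothetical.

```python
from collections import Counter


def is_exact_match(prediction, target):
    """True when prediction and target contain the same tokens in the same order."""
    return prediction.split() == target.split()


def unigram_precision(prediction, target):
    """Fraction of predicted tokens that also occur in the target.

    Counts are clipped, so a token repeated in the prediction is only
    credited as many times as it appears in the target.
    """
    pred_tokens = prediction.split()
    if not pred_tokens:
        return 0.0
    target_counts = Counter(target.split())
    pred_counts = Counter(pred_tokens)
    matched = sum(min(count, target_counts[tok]) for tok, count in pred_counts.items())
    return matched / len(pred_tokens)


# Two of the sampled predictions shown above
print(is_exact_match("if I were working for you", "if I were working for you"))
print(unigram_precision(
    "my itinerary to Friday Vince",
    "my agenda for next week so would something in early July work for you "
    "Thanks for your assistance",
))
```

A high exact-match rate over many samples would hint at memorization, while a low unigram precision alone does not mean a suggestion is unhelpful.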

Observations :-
1) The predictions rarely match the target outputs word for word (hence the low BLEU score). This is not a problem for our assisted-writing task: we want predictions that are general yet consistent with the context of the input prefix. If the predictions matched the targets exactly in most cases, the model would simply have memorized the Enron data.
2) The model's predictions contain names such as Jennifer and Kay, so it is better to put a filter or a separate NER model on top of the predictions to avoid suggesting such personal information. The same goes for abusive words (there are some instances of these in the Enron data as well).
3) There are some instances of overfitting/memorization in the above examples, for example when the target output is :-
" PARSING FILE PortlandWestDeskCalifornia SchedulingISO Final Schedules txt "
and our model reproduced it almost verbatim.
To avoid this we can fix our preprocessing steps and remove emails with attachments, logs, automated emails etc., but this will require a lot of manual effort as well.
Other than that, since the model's predictions are general and its BLEU score on the test set is the highest of the three models compared, it does not appear to have overfitted.
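
The two mitigations from observations 2 and 3 can be sketched as simple rule-based filters. This is only an illustration under stated assumptions: `BLOCKED_NAMES` and `AUTOMATED_PATTERNS` are hypothetical hand-written lists seeded from the examples in this notebook, and a real system would more likely rely on an NER model and a much broader set of rules.

```python
import re

# Hypothetical blocklist; an NER model would replace this in practice.
BLOCKED_NAMES = {"jennifer", "kay", "vince"}

# Hypothetical markers of machine-generated emails, taken from the samples above.
AUTOMATED_PATTERNS = [
    re.compile(r"\bPARSING FILE\b"),
    re.compile(r"\bNo ancillary schedules awarded\b"),
]


def truncate_at_blocked_token(prediction, blocked=BLOCKED_NAMES):
    """Cut a suggestion just before the first blocked token (e.g. a personal name)."""
    kept = []
    for token in prediction.split():
        if token.lower().strip(".,!?") in blocked:
            break
        kept.append(token)
    return " ".join(kept)


def looks_automated(text):
    """Heuristically flag log-like or automated text so it can be dropped in preprocessing."""
    return any(pattern.search(text) for pattern in AUTOMATED_PATTERNS)


print(truncate_at_blocked_token("my itinerary to Friday Vince"))
print(looks_automated("HourAhead No ancillary schedules awarded No variances detected LOG"))
```

Truncating at the first blocked token, rather than deleting the token alone, keeps the remaining suggestion grammatical at the cost of shorter completions. The automated-text check would be applied to the raw emails before the training sequences are built.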

# END