# Transfer Learning and the Future of Machine Learning
In my opinion, by far the most exciting trend in machine learning today is its democratization.

Because researchers at places like Google have released the results of their enormous intellectual and computational investments to the public, anybody in the world can achieve impressive results on many deep learning tasks with very little time or computational cost. These models can be retrained on large datasets, but more importantly, they can also deliver impressive results with few-shot and even zero-shot learning.

Let's get a qualitative understanding of just how effective transfer learning can be by first training a simple LSTM to generate text without any pretraining, and then using a pretrained network to generate text from the same source. Our training data will be the first installment of the Twilight series. We'll be using OpenAI's state-of-the-art GPT-3 as our pretrained network. With 175 billion parameters and training data filtered from roughly 45 TB of raw text, this transformer model produces state-of-the-art results on many NLP tasks and can be accessed through a simple API.

It's worth mentioning that this won't exactly be a fair fight, for more reasons than just parameter count. I'm going to be preparing the data differently for each model, and the results will therefore have some inherent differences. I'll discuss the implications of this, and why the adaptability of pretrained networks is such a big deal, later on.


## Simple LSTM
We'll first create a simple LSTM model to generate text.

We will create our training data by splitting our text into overlapping 40-character sequences and predicting the 41st character of each. This is a fairly simple method, but it lets us extract a large number of training samples from a relatively short book. Every tenth epoch we will print sample predictions from the current network, using a randomly chosen seed sequence and several different temperatures. This will allow us to watch as our network learns the task.


In [31]:
# Colab setup.
try:
    %tensorflow_version 2.x
    COLAB = True
    print('Note: using Google CoLab')
except:
    print('Note: not using Google CoLab')
    COLAB = False
  

Note: using Google CoLab


In [32]:
# Mount drive.
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [33]:
# Set working directory.
%cd '/content/drive/My Drive/Project Directories/Portfolio/twilight/'

/content/drive/My Drive/Project Directories/Portfolio/twilight


In [34]:
# Display GPU type.
!nvidia-smi -L

GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-93b5fcf7-321b-8d34-0f69-15b63cf66ec8)


In [35]:
# Import modules.
from tensorflow.keras.callbacks import LambdaCallback
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import get_file
import numpy as np
import random
import sys
import io
import requests
import re


In [36]:
# Load raw text.
with open('twilight1.txt') as f:
    raw_text = f.read()

# Show first 1000 characters.
print(raw_text[0:1000])


My mother drove me to the airport with the windows rolled down. It was seventy-five degrees in 
Phoenix, the sky a perfect, cloudless blue. I was wearing my favorite shirt — sleeveless, white eyelet 
lace; I was wearing it as a farewell gesture. My carry-on item was a parka. 

In the Olympic Peninsula of northwest Washington State, a small town named Forks exists under a 
near-constant cover of clouds. It rains on this inconsequential town more than any other place in the 
United States of America. It was from this town and its gloomy, omnipresent shade that my mother 
escaped with me when I was only a few months old. It was in this town that I'd been compelled to spend 
a month every summer until I was fourteen. That was the year I finally put my foot down; these past three 
summers, my dad, Charlie, vacationed with me in California for two weeks instead. 

It was to Forks that I now exiled myself — an action that I took with great horror. I detested Forks. 

I loved Phoenix. I loved 

In [37]:
# Lowercase text and filter characters.
processed_text = raw_text.lower()
processed_text = re.sub(r'[^\x00-\x7f]',r'', processed_text) 

# Print total length and number of unique characters.
print('Total Length:', len(processed_text))
characters = sorted(list(set(processed_text)))

print('Total Characters:', len(characters))
char_indices = dict((c, i) for i, c in enumerate(characters))
indices_char = dict((i, c) for i, c in enumerate(characters))


Total Length: 665126
Total Characters: 50


In [38]:
# Slice the data into 40 character overlapping sections.
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(processed_text) - maxlen, step):
    sentences.append(processed_text[i: i + maxlen])
    next_chars.append(processed_text[i + maxlen])
print('Number of Sequences:', len(sentences))


Number of Sequences: 221696
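
As a sanity check on the slicing above, here is a minimal sketch that applies the same window logic to a toy string and recomputes the sequence count from the text length alone. The helper names `make_windows` and `count_windows` are hypothetical and not part of the original notebook.

```python
def make_windows(text, maxlen=40, step=3):
    """Return (input windows, next characters) using the same slicing as the cell above."""
    windows, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])   # the input sequence
        targets.append(text[i + maxlen])     # the character to predict
    return windows, targets


def count_windows(text_length, maxlen=40, step=3):
    """Number of windows produced for a text of the given length."""
    return len(range(0, text_length - maxlen, step))


# Toy example with a short window so the pairs are easy to read.
windows, targets = make_windows("the quick brown fox", maxlen=5, step=3)
for window, target in zip(windows, targets):
    print(repr(window), "->", repr(target))

# Same text length as the processed book above.
print(count_windows(665126))  # 221696, matching the output above
```

A smaller `step` would produce more (and more heavily overlapping) samples at the cost of memory and training time.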


In [39]:
# Vectorize character sequences as one-hot boolean arrays.
# The `np.bool` alias was deprecated in NumPy 1.20 and later removed; the builtin `bool` is equivalent.
x = np.zeros((len(sentences), maxlen, len(characters)), dtype=bool)
y = np.zeros((len(sentences), len(characters)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1


In [40]:
# Build a simple single LSTM.
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(characters))))
model.add(Dense(len(characters), activation='softmax'))

optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)


In [41]:
# Helper function to sample an index from a probability array.
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
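
To see what the temperature does before a character is drawn, here is a small sketch that applies the same rescaling as `sample` to a made-up three-character distribution. The function name `apply_temperature` and the example probabilities are assumptions for illustration only.

```python
import math


def apply_temperature(probs, temperature=1.0):
    """Rescale a probability distribution the same way `sample` does.

    Assumes every probability is strictly positive, since the log is taken.
    """
    # Divide log-probabilities by the temperature, then renormalize.
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


probs = [0.6, 0.3, 0.1]  # made-up distribution over three characters
for t in [0.2, 0.5, 1.0, 1.2]:
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

Temperatures below 1 concentrate probability on the most likely character, which makes the output more repetitive; temperatures above 1 flatten the distribution, which makes the output more varied but also more error-prone.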


In [42]:
# Function taken from Jeff Heaton: https://github.com/jeffheaton
def on_epoch_end(epoch, _):
    # Invoked at the end of every epoch; prints generated text on every tenth epoch.
    if epoch % 10 == 0:
        print("******************************************************")
        print('----- Generating text after Epoch: %d' % epoch)

        start_index = random.randint(0, len(processed_text) - maxlen - 1)
        for temperature in [0.2, 0.5, 1.0, 1.2]:
            print('----- temperature:', temperature)

            generated = ''
            sentence = processed_text[start_index: start_index + maxlen]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            sys.stdout.write(generated)

            for i in range(400):
                x_pred = np.zeros((1, maxlen, len(characters)))
                for t, char in enumerate(sentence):
                    x_pred[0, t, char_indices[char]] = 1.

                preds = model.predict(x_pred, verbose=0)[0]
                next_index = sample(preds, temperature)
                next_char = indices_char[next_index]

                generated += next_char
                sentence = sentence[1:] + next_char

                sys.stdout.write(next_char)
                sys.stdout.flush()
            print()


In [44]:
# Fit the model
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=61,
          callbacks=[print_callback])


Epoch 1/61
----- Generating text after Epoch: 0
----- temperature: 0.2
----- Generating with seed: "cross the sky, causing the sea to darken"
cross the sky, causing the sea to darken to the room and the stepper of the stagger of the stairs and the still the stairs were still the face that i was a few seconds and the still the stairs were so make the stairs and the room with the the was still and the scent of the dark of the stairs and the stairs were still answering and the still 
with the stairs and then i was too 
staring at the stairs and the stairs were something and then
----- temperature: 0.5
----- Generating with seed: "cross the sky, causing the sea to darken"
cross the sky, causing the sea to darken to break at me when i was a smotiously. 

"alice had to be all the door." 

"what do you could hear scan away from the day. i tried to be around of 
his eyes over the fall of my hand to see the door. 

"what you meant me bellate-dese of the side of the steering toward the way to sc


from him not to some prover and helfqug?" i finally was friends with definitions workied magrape, all this was mornes it 
for the house day, wondering 
back. 

"how aid calm the shirror indeds?" i asked, making at repectate five. 

"we could tell that. , i never turned to even pafet. that i can't remember you boulf." 

his po
----- temperature: 1.2
----- Generating with seed: "cross the sky, causing the sea to darken"
cross the sky, causing the sea to darken, butting me of well, dismsoutting from us about bothering through every the stileing around then i would fee something ammmete in tyllly, bapercigity. it all ext. putitly was old voice up aware hee safe. 

my bat with debilly grasal like him. his inretriends inrocking. i 
couldn't be might hfglial her free ben a'd sofmd stoder, extrising. his jace esca. 

gettings if answer 
calmake thlick sawaye
Epoch 2/61
Epoch 3/61
Epoch 4/61
Epoch 5/61
Epoch 6/61
Epoch 7/61
Epoch 8/61
Epoch 9/61
Epoch 10/61
Epoch 11/61
----- Generating text aft

<keras.callbacks.History at 0x7f8c520b5950>

It's incredible to see how much our model was able to learn about the English language so quickly, but let's be honest: nobody would mistake this for human language. Moreover, even though this model is very simple, it still overfits very quickly: the text no longer resembles English after about 20 epochs. This makes sense, since one book just doesn't provide enough data to train an effective model from scratch. Next, let's see how we can approach this task with GPT-3.


## GPT-3
Next we'll perform the same task with GPT-3. Because GPT-3 is very adaptable, we can use the raw text as input. However, the version I'm using has a maximum context length of around 4,096 tokens, which the prompt and the generated completion must share, so we can only pass a short excerpt of the book as our example. We'll choose an arbitrary section of the book and ask GPT-3 to generate 250 tokens' worth of dialogue based on it.
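
A rough way to check that an excerpt leaves room for the completion is to estimate its token count from its character count. The sketch below assumes the common rule of thumb of about four characters per token for English text; this is only an approximation, not the model's actual tokenizer, and the helper names `estimate_tokens` and `fits_context` are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; assumes ~4 characters per token (a heuristic)."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(prompt, max_tokens, context_window=4096, chars_per_token=4.0):
    """Check that the estimated prompt tokens plus the requested completion fit the window."""
    return estimate_tokens(prompt, chars_per_token) + max_tokens <= context_window


# A stand-in prompt: a short instruction plus a 4,000-character excerpt.
example_prompt = "Generate dialogue text based on the following example: " + "x" * 4000
print(estimate_tokens(example_prompt))
print(fits_context(example_prompt, max_tokens=250))
```

For an exact count, the prompt would need to be run through the tokenizer that matches the chosen model.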


In [4]:
import openai

# Load raw text.
with open('twilight1.txt') as f:
    raw_text = f.read()

In [5]:
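# Local module that holds the API key.
import secret

def GPT_Generation(prompt, max_tokens, temperature):
    # Note: this uses the legacy Completions interface of the pre-1.0 `openai` package.
    # Call API using my key
    openai.api_key = secret.API_KEY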
    
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=prompt,
        temperature=temperature,
        top_p = 1,
        max_tokens=max_tokens,
        frequency_penalty = 0,
        presence_penalty = 0
    )
    
    return response.choices[0].text


In [7]:
prompt_heading = 'Generate dialogue text based on the following example: '
prompt = prompt_heading + raw_text[15000:19000]

GPT_Generation(prompt=prompt, max_tokens=250, temperature=0.5)


' type, and I immediately distrusted \nhim. \n\n"Yeah," I muttered. \n\n"I\'m Eric Yorkie," he said, holding out his hand. I shook it. "I live a few houses down from your dad. I \nknow Charlie, he\'s a great guy." \n\n"Yeah," I said again. I was getting really good at this. \n\n"You just moved here from Arizona, right?" \n\n"Yeah." \n\n"Do you know where your next class is?" \n\nI shook my head, no. \n\n"Here, let me show you." He pulled my schedule out of my hand. "You have lunch now, so you don\'t \nhave to go anywhere. I\'ll walk you to your next class after lunch." \n\n"That\'s okay, I can find it." I tried to take the schedule back, but he was already walking away, and I \ndidn\'t want to make a scene. \n\n"I don\'t mind," he called over his shoulder. "I have to go this way anyway." \n\nI sighed and followed him.'

Unsurprisingly, the model produced grammatically correct dialogue. What is surprising, however, is how well GPT-3 was able to replicate the tone of the book with exposure to only 4,000 characters (and that includes line breaks!). We were able to far outperform our LSTM with less data, in less time, and with far fewer lines of code. While this particular example of transfer learning is fairly trivial, the applications are seemingly endless. And as more and more of these models are released to the public, the world's most powerful tools in artificial intelligence will increasingly be able to solve everyday problems for anybody with a laptop and some imagination.
