# TV Script Generation

The recurrent neural network (RNN) built below generates a new, "fake" TV script based on patterns it learns from the training data, a subset of the [Seinfeld dataset](https://www.kaggle.com/thec03u5/seinfeld-chronicles#scripts.csv) containing scripts from 9 seasons.


In [1]:
# loading in data
import helper
data_dir = './data/Seinfeld_Scripts.txt'
text = helper.load_data(data_dir)

## Exploring the Data


In [2]:
view_line_range = (0, 10)


import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))

lines = text.split('\n')
print('Number of lines: {}'.format(len(lines)))
word_count_line = [len(line.split()) for line in lines]
print('Average number of words in each line: {}'.format(np.average(word_count_line)))

print()
print('The lines {} to {}:'.format(*view_line_range))
print('\n'.join(text.split('\n')[view_line_range[0]:view_line_range[1]]))

Dataset Stats
Roughly the number of unique words: 46367
Number of lines: 109233
Average number of words in each line: 5.544240293684143

The lines 0 to 10:
jerry: do you know what this is all about? do you know, why were here? to be out, this is out...and out is one of the single most enjoyable experiences of life. people...did you ever hear people talking about we should go out? this is what theyre talking about...this whole thing, were all out now, no one is home. not one person here is home, were all out! there are people trying to find us, they dont know where we are. (on an imaginary phone) did you ring?, i cant find him. where did he go? he didnt tell me where he was going. he must have gone out. you wanna go out you get ready, you pick out the clothes, right? you take the shower, you get all ready, get the cash, get your friends, the car, the spot, the reservation...then youre standing around, what do you do? you go we gotta be getting back. once youre out, you wanna get back! y


## Pre-processing Functions

- Lookup Table
- Tokenize Punctuation

### Lookup Table

To transform words into ids, the function below creates two dictionaries:
- Dictionary to go from the words to an id - `vocab_to_int`
- Dictionary to go from the id to word - `int_to_vocab`



In [3]:
import problem_unittests as tests
from collections import Counter 

def create_lookup_tables(text):
   
    count = Counter(text)
    vocab = sorted(count, key=count.get, reverse = True)
    vocab_to_int = {word: idx for idx,word in enumerate(vocab,0)}
    int_to_vocab = {idx:word for word,idx in vocab_to_int.items()}
    
    
    return (vocab_to_int, int_to_vocab)


#Test code to check if Lookup tables work properly or not
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed
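
As a quick illustration of the two dictionaries, the sketch below applies the same frequency-sorted approach to a tiny hand-made word list. The word list and the `toy_` names are made up for this example; they are not part of the Seinfeld data.

```python
from collections import Counter

# Hypothetical toy corpus, already split into words
toy_words = ['jerry', 'hello', 'george', 'hello', 'jerry', 'hello']

counts = Counter(toy_words)
# Most frequent word first, so it receives the smallest id
toy_vocab = sorted(counts, key=counts.get, reverse=True)
toy_vocab_to_int = {word: idx for idx, word in enumerate(toy_vocab)}
toy_int_to_vocab = {idx: word for word, idx in toy_vocab_to_int.items()}

# Encode the corpus as ids, then decode it back to words
encoded = [toy_vocab_to_int[w] for w in toy_words]
decoded = [toy_int_to_vocab[i] for i in encoded]
print(toy_vocab_to_int)
print(encoded)
```

Because the two dictionaries are exact inverses of each other, decoding the encoded ids returns the original word list.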


### Tokenize Punctuation

The function below returns a dictionary that maps each punctuation symbol to a unique token, so that, for example, "bye" and "bye!" are not treated as two different words.

In [4]:

def token_lookup():
   
    token_dict = {'.':'||Period||',
                 ',':'||Comma||',
                 '"':'||Quotation_Mark||',
                 ';':'||Semicolon||',
                 '!':'||Exclamation_Mark||',
                 '?':'||Question_Mark||',
                 '(':'||Left_Parentheses||',
                 ')':'||Right_Parentheses||',
                 '-':'||Dash||',
                 '\n':'||Return||'}
    
    
    return token_dict

# Test code to check the token lookup dict
tests.test_tokenize(token_lookup)

Tests Passed
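
How the dictionary is applied is handled inside `helper.preprocess_and_save_data`, which is not shown in this notebook. A minimal sketch of the usual approach, assuming each symbol is replaced by its token padded with spaces and the text is lowercased before splitting (`tokenize_punctuation` and `toy_token_dict` are hypothetical names used only here):

```python
# A shortened copy of the dictionary above, for illustration only
toy_token_dict = {'.': '||Period||',
                  ',': '||Comma||',
                  '?': '||Question_Mark||',
                  '!': '||Exclamation_Mark||'}

def tokenize_punctuation(text, token_dict):
    """Replace each punctuation symbol with its token, padded by spaces."""
    for symbol, token in token_dict.items():
        text = text.replace(symbol, ' {} '.format(token))
    # Lowercasing is an assumption about the helper, not confirmed by the source
    return text.lower().split()

words = tokenize_punctuation('Hello, Jerry! Did you ring?', toy_token_dict)
print(words)
```

The padding matters: without the surrounding spaces, `split()` would keep the token glued to the neighbouring word.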


## Pre-processing and Saving the Data


In [5]:
# preprocessing training data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

In [6]:
import helper
import problem_unittests as tests

#Loading the preprocessed data
int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network


### Check Access to GPU

In [7]:
import torch

# Check for a GPU
train_on_gpu = torch.cuda.is_available()


if not train_on_gpu:
    print('No GPU found. Training will run on the CPU.')

## Batching and Preparing the Data Loader


In [8]:
from torch.utils.data import TensorDataset, DataLoader
import torch

# Split the word ids into non-overlapping sequences to be fed to the RNN;
# the target for each sequence is the word that follows it
def batch_data(words, sequence_length, batch_size):

    n_sequences = len(words) // sequence_length
    length_index = n_sequences * sequence_length

    X_data = []
    y_data = []
    for idx in range(0, length_index, sequence_length):
        X_data.append(words[idx:idx + sequence_length])
        if idx + sequence_length < length_index:
            y_data.append(words[idx + sequence_length])
        else:
            # The last sequence wraps around to the first word
            y_data.append(words[0])

    data = TensorDataset(torch.from_numpy(np.array(X_data)), torch.from_numpy(np.array(y_data)))
    data_loader = DataLoader(data, batch_size=batch_size, shuffle=False)
    # returns a dataloader
    return data_loader


### Testing the dataloader 



In [9]:
# Test dataloader to check if data loader is working fine

test_text = range(50)
t_loader = batch_data(test_text, sequence_length=5, batch_size=10)

data_iter = iter(t_loader)
sample_x, sample_y = next(data_iter)

print(sample_x.shape)
print(sample_x)
print()
print(sample_y.shape)
print(sample_y)

torch.Size([10, 5])
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]], dtype=torch.int32)

torch.Size([10])
tensor([ 5, 10, 15, 20, 25, 30, 35, 40, 45,  0], dtype=torch.int32)
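
The output above shows that the windows do not overlap and that each target is the first word of the next window, with the last window wrapping around to the first word. The same pairing can be reproduced without PyTorch; `make_windows` is a hypothetical name used only in this sketch.

```python
def make_windows(words, sequence_length):
    """Pair each non-overlapping window with the word that follows it."""
    words = list(words)
    n_sequences = len(words) // sequence_length
    usable = n_sequences * sequence_length

    features, targets = [], []
    for start in range(0, usable, sequence_length):
        end = start + sequence_length
        features.append(words[start:end])
        # The last window has no following word inside the usable range,
        # so it wraps around to the first word
        targets.append(words[end] if end < usable else words[0])
    return features, targets

features, targets = make_windows(range(50), sequence_length=5)
print(features[0], targets[0])
print(features[-1], targets[-1])
```

A stride of one word (overlapping windows) would produce roughly `sequence_length` times more training pairs from the same text; the stride of `sequence_length` used in this notebook gives fewer pairs and therefore shorter epochs.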



## Build the Neural Network


In [10]:
import torch.nn as nn

class RNN(nn.Module):
    
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.5):
        
        super(RNN, self).__init__()
        
       
        
        #class variables
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.dropout = dropout
        self.embedding_dim = embedding_dim
        self.vocab_size = vocab_size
        
        #model layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout=dropout, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)
        
        
    
    def forward(self, nn_input, hidden):
      
        
        
        batch_size = nn_input.size(0)
        embeds = self.embedding(nn_input.long())
        lstm_out, hidden = self.lstm(embeds, hidden)
        
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
        output = self.fc(lstm_out)
        output = output.view(batch_size, -1, self.output_size)
        output = output[:,-1]
        return output, hidden

    
    
    def init_hidden(self, batch_size):
      
        # Initialize the hidden and cell states to zeros, and move them to the GPU if available
        
        weight = next(self.parameters())
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        
        return hidden

# Test to check if the RNN works fine
tests.test_rnn(RNN, train_on_gpu)

Tests Passed


### Function to Implement the Forward Pass and Backpropagation, Called Iteratively in Training


In [11]:
def forward_back_prop(rnn, optimizer, criterion, inp, target, hidden):
   
    
    # Moving data to GPU, if available
    if(train_on_gpu):
        inp, target = inp.cuda(), target.cuda()
    
    # Detach the hidden state from its history so backprop does not run through earlier batches
    hidden = tuple([each.data for each in hidden])
    
    # Zero accumulated gradients
    rnn.zero_grad()
        
    # Getting the output from the model
    output, hidden = rnn(inp, hidden)
    
    # Calculate the loss and performing backprop
    loss = criterion(output.squeeze(), target.long())
    loss.backward()
    
    # Clip the gradients to keep them from exploding
    clip = 5
    nn.utils.clip_grad_norm_(rnn.parameters(), clip)
    
    optimizer.step()
    
    # return the loss over a batch and the hidden state produced by our model
    return loss.item(),hidden

# Test to check that the forward pass and backpropagation work properly
tests.test_forward_back_prop(RNN, forward_back_prop, train_on_gpu)

Tests Passed
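
`clip_grad_norm_` rescales all gradients together when their combined L2 norm exceeds the threshold, leaving their direction unchanged. The plain-Python sketch below illustrates the idea on a made-up list of gradient values; it is a simplification for intuition, not PyTorch's implementation, and `clip_by_global_norm` is a hypothetical name.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

# Hypothetical gradient values with norm 13, clipped to the threshold of 5 used above
clipped, norm_before = clip_by_global_norm([3.0, 4.0, 12.0], max_norm=5)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

Since every value is multiplied by the same factor, the ratios between gradient components are preserved; only the step size shrinks.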


## Neural Network Training



In [12]:

def train_rnn(rnn, batch_size, optimizer, criterion, n_epochs, show_every_n_batches=100):
    batch_losses = []
    
    #Switching to training mode
    rnn.train()

    print("Training for %d epoch(s)..." % n_epochs)
    for epoch_i in range(1, n_epochs + 1):
      
        # Initializing hidden state
        hidden = rnn.init_hidden(batch_size)
        
        for batch_i, (inputs, labels) in enumerate(train_loader, 1):

            # Making sure that it iterates over completely full batches only       
            n_batches = len(train_loader.dataset)//batch_size
            if(batch_i > n_batches):
                break
            
            # Forward and Back prop
            loss, hidden = forward_back_prop(rnn, optimizer, criterion, inputs, labels, hidden)          
            
            # Recording loss
            batch_losses.append(loss)

            # Printing loss stats
            if batch_i % show_every_n_batches == 0:
                print('Epoch: {:>4}/{:<4}  Loss: {}\n'.format(
                    epoch_i, n_epochs, np.average(batch_losses)))
                batch_losses = []

    # Returns a trained rnn
    return rnn

### Hyperparameters


In [13]:
# Data parameters

# Sequence Length
sequence_length = 5 

# Batch Size
batch_size = 16

# data loader 
train_loader = batch_data(int_text, sequence_length, batch_size)
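
These two values determine how many batches one epoch contains: the text is cut into non-overlapping windows of `sequence_length` words, and `train_rnn` only iterates over completely full batches. A small sketch of that arithmetic follows; the function name is hypothetical, and the second token count is invented for illustration because the real `len(int_text)` is not printed in this notebook.

```python
def full_batches_per_epoch(n_tokens, sequence_length, batch_size):
    """Number of full batches train_rnn would iterate over in one epoch."""
    n_sequences = n_tokens // sequence_length
    return n_sequences // batch_size

# The dataloader test above: 50 tokens -> 10 windows -> 1 full batch of 10
small = full_batches_per_epoch(50, sequence_length=5, batch_size=10)
print(small)

# Hypothetical corpus size, for illustration only
large = full_batches_per_epoch(900_000, sequence_length=5, batch_size=16)
print(large)
```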

In [14]:
# Training and Model parameters

# Number of Epochs
num_epochs = 100

# Learning Rate
learning_rate =  0.0005

# Vocab size
vocab_size = len(vocab_to_int)

# Output size
output_size = len(vocab_to_int)

# Embedding Dimension
embedding_dim = 300

# Hidden Dimension
hidden_dim = 512

# Number of RNN Layers
n_layers = 3

# Shows stats for every n number of batches
show_every_n_batches = 500

### Training the model


In [15]:
# Creating model and moving to gpu if available
rnn = RNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers, dropout=0.3)
if train_on_gpu:
    rnn.cuda()

# Defining loss and optimization functions for training
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

# Training the model
trained_rnn = train_rnn(rnn, batch_size, optimizer, criterion, num_epochs, show_every_n_batches)

# Saving the trained model
helper.save_model('./save/trained_rnn', trained_rnn)
print('Model Trained and Saved')

Training for 100 epoch(s)...
Epoch:    1/100   Loss: 6.040655288219452

Epoch:    1/100   Loss: 5.410666715621948

Epoch:    1/100   Loss: 5.270196217536927

Epoch:    1/100   Loss: 5.044804047584534

Epoch:    1/100   Loss: 4.98444661617279

Epoch:    1/100   Loss: 4.8703562121391295

Epoch:    1/100   Loss: 4.976281734466553

Epoch:    1/100   Loss: 4.908806661128998

Epoch:    1/100   Loss: 4.988546413898468

Epoch:    1/100   Loss: 5.0569617257118225

Epoch:    1/100   Loss: 4.923842119693756

Epoch:    1/100   Loss: 4.903690997123718

Epoch:    1/100   Loss: 4.646698795318604

Epoch:    1/100   Loss: 4.662283033132553

Epoch:    1/100   Loss: 4.7176619181633

Epoch:    1/100   Loss: 4.669923205852508

Epoch:    1/100   Loss: 4.802460240125656

Epoch:    1/100   Loss: 4.827149765491486

Epoch:    1/100   Loss: 4.7838299612998965

Epoch:    1/100   Loss: 4.70973098039627

Epoch:    1/100   Loss: 4.75579637336731

Epoch:    1/100   Loss: 4.68900836801529

Epoch:    2/100   Loss: 4.42

Epoch:    9/100   Loss: 2.8001311280727386

Epoch:    9/100   Loss: 2.8097998044490815

Epoch:    9/100   Loss: 2.827374601840973

Epoch:    9/100   Loss: 2.9320889863967894

Epoch:    9/100   Loss: 2.8679527010917663

Epoch:    9/100   Loss: 2.940863967895508

Epoch:    9/100   Loss: 2.801225285768509

Epoch:    9/100   Loss: 2.8190861628055575

Epoch:    9/100   Loss: 2.8613525826931

Epoch:   10/100   Loss: 2.8419988148040303

Epoch:   10/100   Loss: 2.7729862763881683

Epoch:   10/100   Loss: 2.829536828517914

Epoch:   10/100   Loss: 2.749363852739334

Epoch:   10/100   Loss: 2.7506470301151276

Epoch:   10/100   Loss: 2.7217438914775847

Epoch:   10/100   Loss: 2.7419506204128266

Epoch:   10/100   Loss: 2.6599526870250703

Epoch:   10/100   Loss: 2.7779668006896974

Epoch:   10/100   Loss: 2.8047512221336364

Epoch:   10/100   Loss: 2.7724271779060365

Epoch:   10/100   Loss: 2.672317621231079

Epoch:   10/100   Loss: 2.611170082092285

Epoch:   10/100   Loss: 2.6801391530036924

Epoch:   18/100   Loss: 2.086960557460785

Epoch:   18/100   Loss: 2.078433911442757

Epoch:   18/100   Loss: 2.0557120995521547

Epoch:   18/100   Loss: 2.0309849886894225

Epoch:   18/100   Loss: 2.0611709640026095

Epoch:   18/100   Loss: 2.0761287322044373

Epoch:   18/100   Loss: 2.0776810441017153

Epoch:   18/100   Loss: 1.9818528612852098

Epoch:   18/100   Loss: 1.9920115228891373

Epoch:   18/100   Loss: 2.0182190790176393

Epoch:   18/100   Loss: 1.9974124335050583

Epoch:   18/100   Loss: 2.0309405646324157

Epoch:   18/100   Loss: 2.104115943431854

Epoch:   18/100   Loss: 2.034338116645813

Epoch:   18/100   Loss: 2.0794593465328215

Epoch:   18/100   Loss: 2.0093711377382277

Epoch:   18/100   Loss: 1.9721731683015824

Epoch:   18/100   Loss: 2.023742447972298

Epoch:   19/100   Loss: 2.0342963076040674

Epoch:   19/100   Loss: 2.0335377564430237

Epoch:   19/100   Loss: 2.0871941568851473

Epoch:   19/100   Loss: 2.0511489218473433

Epoch:   19/100   Loss: 2.03640179944

Epoch:   26/100   Loss: 1.720633666753769

Epoch:   26/100   Loss: 1.6787320615053176

Epoch:   26/100   Loss: 1.7163379127383231

Epoch:   26/100   Loss: 1.6431639751195908

Epoch:   26/100   Loss: 1.626405514240265

Epoch:   26/100   Loss: 1.6674654787778855

Epoch:   27/100   Loss: 1.6863804172077852

Epoch:   27/100   Loss: 1.6601920698285102

Epoch:   27/100   Loss: 1.7203833204507828

Epoch:   27/100   Loss: 1.69688291990757

Epoch:   27/100   Loss: 1.7001142145395278

Epoch:   27/100   Loss: 1.6858959869146346

Epoch:   27/100   Loss: 1.6500632890462876

Epoch:   27/100   Loss: 1.616439427256584

Epoch:   27/100   Loss: 1.639740175306797

Epoch:   27/100   Loss: 1.6665327727794648

Epoch:   27/100   Loss: 1.6697401793003082

Epoch:   27/100   Loss: 1.610688008904457

Epoch:   27/100   Loss: 1.6112364857196808

Epoch:   27/100   Loss: 1.6450995041131973

Epoch:   27/100   Loss: 1.6155040404796601

Epoch:   27/100   Loss: 1.6331935123205186

Epoch:   27/100   Loss: 1.6769618018865

Epoch:   35/100   Loss: 1.4960740988254546

Epoch:   35/100   Loss: 1.4546201450824738

Epoch:   35/100   Loss: 1.4682445623874665

Epoch:   35/100   Loss: 1.4611331186294556

Epoch:   35/100   Loss: 1.4901838989257812

Epoch:   35/100   Loss: 1.4203950758576394

Epoch:   35/100   Loss: 1.4289217718839646

Epoch:   35/100   Loss: 1.472645123243332

Epoch:   35/100   Loss: 1.4239641065001487

Epoch:   35/100   Loss: 1.4324942992925644

Epoch:   35/100   Loss: 1.486570269882679

Epoch:   35/100   Loss: 1.4331076099872588

Epoch:   35/100   Loss: 1.46318312728405

Epoch:   35/100   Loss: 1.4234252409338952

Epoch:   35/100   Loss: 1.384642048895359

Epoch:   35/100   Loss: 1.4253462996482849

Epoch:   36/100   Loss: 1.4296383872376426

Epoch:   36/100   Loss: 1.4454560743570328

Epoch:   36/100   Loss: 1.4943590505719184

Epoch:   36/100   Loss: 1.4749989762306213

Epoch:   36/100   Loss: 1.4856925187706946

Epoch:   36/100   Loss: 1.4653575343191623

Epoch:   36/100   Loss: 1.45812451517

Epoch:   43/100   Loss: 1.3383355278968811

Epoch:   43/100   Loss: 1.3158555383980275

Epoch:   43/100   Loss: 1.292631700873375

Epoch:   43/100   Loss: 1.2713959225416183

Epoch:   44/100   Loss: 1.3315731769974148

Epoch:   44/100   Loss: 1.3393407599925995

Epoch:   44/100   Loss: 1.34743254083395

Epoch:   44/100   Loss: 1.3588843821287155

Epoch:   44/100   Loss: 1.3424968831539155

Epoch:   44/100   Loss: 1.3356542936563491

Epoch:   44/100   Loss: 1.3240026867985726

Epoch:   44/100   Loss: 1.313841604232788

Epoch:   44/100   Loss: 1.3448226485848427

Epoch:   44/100   Loss: 1.3313287580013276

Epoch:   44/100   Loss: 1.334609101653099

Epoch:   44/100   Loss: 1.2787523938119412

Epoch:   44/100   Loss: 1.3030830036401748

Epoch:   44/100   Loss: 1.317657398045063

Epoch:   44/100   Loss: 1.2801807554960252

Epoch:   44/100   Loss: 1.2746561449170113

Epoch:   44/100   Loss: 1.3190934264063836

Epoch:   44/100   Loss: 1.3134543684720994

Epoch:   44/100   Loss: 1.329312016904

Epoch:   52/100   Loss: 1.2761895617842673

Epoch:   52/100   Loss: 1.2509381562173367

Epoch:   52/100   Loss: 1.2472553417682648

Epoch:   52/100   Loss: 1.2031101402640343

Epoch:   52/100   Loss: 1.210433324366808

Epoch:   52/100   Loss: 1.2265746412873269

Epoch:   52/100   Loss: 1.182611231982708

Epoch:   52/100   Loss: 1.1958810970187188

Epoch:   52/100   Loss: 1.2243231406211854

Epoch:   52/100   Loss: 1.1984041187763215

Epoch:   52/100   Loss: 1.24478229701519

Epoch:   52/100   Loss: 1.2190803923010827

Epoch:   52/100   Loss: 1.1742088383436202

Epoch:   52/100   Loss: 1.198640151411295

Epoch:   53/100   Loss: 1.2320214658517807

Epoch:   53/100   Loss: 1.2231573364138604

Epoch:   53/100   Loss: 1.2869290744066237

Epoch:   53/100   Loss: 1.2603608994483948

Epoch:   53/100   Loss: 1.265652982980013

Epoch:   53/100   Loss: 1.2435735518336295

Epoch:   53/100   Loss: 1.2410314518213272

Epoch:   53/100   Loss: 1.1828307678699495

Epoch:   53/100   Loss: 1.211847623407

Epoch:   60/100   Loss: 1.1119333633184434

Epoch:   60/100   Loss: 1.1126016315817833

Epoch:   61/100   Loss: 1.1567154910569917

Epoch:   61/100   Loss: 1.149851826608181

Epoch:   61/100   Loss: 1.1740594705045224

Epoch:   61/100   Loss: 1.214702404886484

Epoch:   61/100   Loss: 1.1962625142633916

Epoch:   61/100   Loss: 1.1607718689441682

Epoch:   61/100   Loss: 1.1562195582985877

Epoch:   61/100   Loss: 1.130396901667118

Epoch:   61/100   Loss: 1.1476802214384079

Epoch:   61/100   Loss: 1.2525934011936188

Epoch:   61/100   Loss: 1.1833229419589042

Epoch:   61/100   Loss: 1.1181577155292035

Epoch:   61/100   Loss: 1.120846839785576

Epoch:   61/100   Loss: 1.1487484121918679

Epoch:   61/100   Loss: 1.1464094352126122

Epoch:   61/100   Loss: 1.112542158126831

Epoch:   61/100   Loss: 1.1517143556177616

Epoch:   61/100   Loss: 1.132456440716982

Epoch:   61/100   Loss: 1.128816054880619

Epoch:   61/100   Loss: 1.1127010806202888

Epoch:   61/100   Loss: 1.1040592663288

Epoch:   69/100   Loss: 1.1437993140220641

Epoch:   69/100   Loss: 1.0650379311740399

Epoch:   69/100   Loss: 1.0933371355831623

Epoch:   69/100   Loss: 1.1093712058663368

Epoch:   69/100   Loss: 1.0981013917326927

Epoch:   69/100   Loss: 1.0674306287467479

Epoch:   69/100   Loss: 1.1433769496381283

Epoch:   69/100   Loss: 1.1117397125065327

Epoch:   69/100   Loss: 1.0868792399168015

Epoch:   69/100   Loss: 1.0992056745886802

Epoch:   69/100   Loss: 1.0393076107352972

Epoch:   69/100   Loss: 1.0938908914923668

Epoch:   70/100   Loss: 1.1034936282995476

Epoch:   70/100   Loss: 1.1000006101429463

Epoch:   70/100   Loss: 1.1696375771164893

Epoch:   70/100   Loss: 1.145891383200884

Epoch:   70/100   Loss: 1.125124931037426

Epoch:   70/100   Loss: 1.1452551363259553

Epoch:   70/100   Loss: 1.104897742986679

Epoch:   70/100   Loss: 1.0982152316868306

Epoch:   70/100   Loss: 1.0813649859279395

Epoch:   70/100   Loss: 1.122710790604353

Epoch:   70/100   Loss: 1.1021998423

Epoch:   78/100   Loss: 1.1027156180271538

Epoch:   78/100   Loss: 1.0606385247260333

Epoch:   78/100   Loss: 1.0891618483662606

Epoch:   78/100   Loss: 1.106764804959297

Epoch:   78/100   Loss: 1.10330250787735

Epoch:   78/100   Loss: 1.1511269960403443

Epoch:   78/100   Loss: 1.0406243834793567

Epoch:   78/100   Loss: 1.0229905273914337

Epoch:   78/100   Loss: 1.0510913500487804

Epoch:   78/100   Loss: 1.0867737832963467

Epoch:   78/100   Loss: 1.0361911509931088

Epoch:   78/100   Loss: 1.006834113806486

Epoch:   78/100   Loss: 1.0509291328191757

Epoch:   78/100   Loss: 1.0364281262755395

Epoch:   78/100   Loss: 1.0530146387517452

Epoch:   78/100   Loss: 0.9989553468227387

Epoch:   78/100   Loss: 1.0710115200281143

Epoch:   78/100   Loss: 1.0383929712474347

Epoch:   78/100   Loss: 1.0300476654469968

Epoch:   78/100   Loss: 1.0375490326285361

Epoch:   78/100   Loss: 1.0018079055249691

Epoch:   78/100   Loss: 1.028533718854189

Epoch:   79/100   Loss: 1.10350956722

Epoch:   86/100   Loss: 1.0265892345309258

Epoch:   86/100   Loss: 1.0113543391525746

Epoch:   86/100   Loss: 0.9995378427058458

Epoch:   86/100   Loss: 0.9647544043660163

Epoch:   86/100   Loss: 1.0461338523030281

Epoch:   86/100   Loss: 0.9983386301994324

Epoch:   86/100   Loss: 1.0158029569163918

Epoch:   86/100   Loss: 0.9956066379547119

Epoch:   86/100   Loss: 0.9559389369785786

Epoch:   86/100   Loss: 0.9971070837378502

Epoch:   87/100   Loss: 1.0288761359526448

Epoch:   87/100   Loss: 1.026000640243292

Epoch:   87/100   Loss: 1.016850914940238

Epoch:   87/100   Loss: 1.081895894765854

Epoch:   87/100   Loss: 1.0265400621593

Epoch:   87/100   Loss: 1.0653753851652146

Epoch:   87/100   Loss: 1.0197503186911345

Epoch:   87/100   Loss: 0.9980635887980461

Epoch:   87/100   Loss: 1.0331145013868808

Epoch:   87/100   Loss: 1.0854660280942916

Epoch:   87/100   Loss: 1.034721659809351

Epoch:   87/100   Loss: 0.9609063652455807

Epoch:   87/100   Loss: 1.0123389791548

Epoch:   95/100   Loss: 0.9547573246061802

Epoch:   95/100   Loss: 1.0379504434764386

Epoch:   95/100   Loss: 1.0115966491401196

Epoch:   95/100   Loss: 0.9826254389584065

Epoch:   95/100   Loss: 1.0160501558482646

Epoch:   95/100   Loss: 0.9819023158550263

Epoch:   95/100   Loss: 0.9635925276279449

Epoch:   95/100   Loss: 0.9801190504580736

Epoch:   95/100   Loss: 1.0411311224102975

Epoch:   95/100   Loss: 0.9690788328051567

Epoch:   95/100   Loss: 0.9279839498400688

Epoch:   95/100   Loss: 1.002475690662861

Epoch:   95/100   Loss: 0.9966563975811005

Epoch:   95/100   Loss: 0.9634034254252911

Epoch:   95/100   Loss: 0.9285311163663864

Epoch:   95/100   Loss: 0.9923441169857978

Epoch:   95/100   Loss: 0.9539189417660237

Epoch:   95/100   Loss: 0.9786729808300734

Epoch:   95/100   Loss: 0.9734951550960541

Epoch:   95/100   Loss: 0.9020240970551968

Epoch:   95/100   Loss: 0.9343758348822594

Epoch:   96/100   Loss: 0.9840129645921851

Epoch:   96/100   Loss: 0.9522674
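
Cross-entropy loss is easier to interpret as perplexity, `exp(loss)`, which can be read as the effective number of equally likely words the model is choosing between at each step. The sketch below converts the first and last loss values printed above. These are training losses, so they say nothing about how well the model generalizes to unseen text.

```python
import math

first_loss = 6.040655288219452   # first value printed in epoch 1
last_loss = 0.9522674            # last value shown above (epoch 96)

first_ppl = math.exp(first_loss)
last_ppl = math.exp(last_loss)
print('Perplexity at start: {:.1f}'.format(first_ppl))
print('Perplexity at end:   {:.2f}'.format(last_ppl))
```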

In [16]:
# Checkpoint: reload the preprocessed data and the trained model

import torch
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
trained_rnn = helper.load_model('./save/trained_rnn')

## Generate TV Script


### Generate Text
To generate the text, the network needs to start with a single word and repeat its predictions until it reaches a set length. 
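
The loop can be sketched independently of the network: keep a fixed-length window of word ids, ask a model for the next id, then slide the window one position to the left. In the sketch below, `dummy_next_id` is a hypothetical stand-in for the trained RNN and simply returns the last id plus one.

```python
def dummy_next_id(window):
    """Hypothetical stand-in for the network: predicts last id + 1."""
    return window[-1] + 1

def generate_ids(prime_id, pad_id, window_size, predict_len, next_id_fn):
    # Start with padding everywhere except the last position
    window = [pad_id] * (window_size - 1) + [prime_id]
    generated = [prime_id]
    for _ in range(predict_len):
        new_id = next_id_fn(window)
        generated.append(new_id)
        # Drop the oldest id and append the new one
        window = window[1:] + [new_id]
    return generated, window

ids, final_window = generate_ids(prime_id=10, pad_id=0, window_size=5,
                                 predict_len=6, next_id_fn=dummy_next_id)
print(ids)
print(final_window)
```

The padding ids are pushed out of the window after `window_size - 1` steps, after which the model only ever sees words it generated itself.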

In [17]:

import numpy as np
import torch.nn.functional as F

def generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len=100):
   
    #Switching to Eval mode
    rnn.eval()
    
    # Creating a sequence (batch_size=1) with the prime_id
    current_seq = np.full((1, sequence_length), pad_value)
    current_seq[-1][-1] = prime_id
    predicted = [int_to_vocab[prime_id]]
    
    for _ in range(predict_len):
        if train_on_gpu:
            current_seq = torch.LongTensor(current_seq).cuda()
        else:
            current_seq = torch.LongTensor(current_seq)
        
        # Initializing the hidden state
        hidden = rnn.init_hidden(current_seq.size(0))
        
        # Getting the output of the rnn
        output, _ = rnn(current_seq, hidden)
        
        # Getting the next word probabilities
        p = F.softmax(output, dim=1).data
        if(train_on_gpu):
            p = p.cpu() # moving to CPU
         
        # Uses top_k sampling to get the index of the next word
        top_k = 5
        p, top_i = p.topk(top_k)
        top_i = top_i.numpy().squeeze()
        
        # Selects the likely next word index with some element of randomness
        p = p.numpy().squeeze()
        word_i = np.random.choice(top_i, p=p/p.sum())
        
        # Retrieves that word from the dictionary
        word = int_to_vocab[word_i]
        predicted.append(word)     
        
        # Move the sequence back to the CPU so NumPy can operate on it
        if train_on_gpu:
            current_seq = current_seq.cpu()
        # The generated word becomes the next "current sequence" and the cycle can continue
        current_seq = np.roll(current_seq, -1, 1)
        current_seq[-1][-1] = word_i
    
    gen_sentences = ' '.join(predicted)
    
    # Replaces punctuation tokens
    for key, token in token_dict.items():
        gen_sentences = gen_sentences.replace(' ' + token.lower(), key)
    gen_sentences = gen_sentences.replace('\n ', '\n')
    gen_sentences = gen_sentences.replace('( ', '(')
    
    # return all the sentences
    return gen_sentences
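
Top-k sampling, as used above, keeps only the `k` most probable words, weights them by their probabilities, and draws one of them at random. A plain-Python sketch on made-up scores follows; the function name and the scores are illustrative and not part of the notebook.

```python
import math
import random

def top_k_sample(scores, k, rng=random):
    """Sample an index from the k highest-scoring entries of `scores`."""
    # Softmax over all scores (shifted by the max for numerical stability)
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the k most probable indices
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # random.choices normalizes the weights itself
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

random.seed(0)
toy_scores = [2.0, 0.5, 1.0, -1.0, 3.0, 0.0]
samples = [top_k_sample(toy_scores, k=3) for _ in range(1000)]
print(sorted(set(samples)))
```

A smaller `k` makes the output more predictable, while a larger `k` allows less likely words through and makes the script more varied and more error-prone.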

### Generating a New Script
Set `gen_length` to the length of TV script you want to generate and set `prime_word` to one of the following to start the prediction:
- "jerry"
- "elaine"
- "george"
- "kramer"

You can also start with any other names you find in the original text file!

In [18]:
gen_length = 400 # modify the length to your preference
prime_word = 'elaine' # name for starting the script


pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
print(generated_script)

elaine: oh yeah, yeah. i mean, the last thing is good work. i'm doing this night! we are just leaving at the one of the.

jerry: okay.

jerry: what are you doing?

elaine: i am also at them and heavy stared and on the other side, and she's got a great friend. water.

kramer: what happened?

elaine: i don't know it.

george: most thing every one gift"""" i don't think i'll stay you open it.

kramer: oh, damn rate? what are you going about?

kramer: no, no. it's not the money. she's got the lot of love of restaurant? they know what's going on?

kramer: i don't live it in my room, but i guess we should stay one of these toys.

kramer: best one water in paris with every?

bob: where were bandages them on the sidewalk) yeah, yeah. we'll leave.

kramer: no, you're not giving myself in there.

jerry: ] costanza costanza wont me because i'm like a leper askew! it's very white. oh look at this?"

elaine: nazi: there's pretty much man. i bet there's an we were talking to her?

morty: that's the 

#### Saving the favorite scripts


In [19]:
# Save script to a text file
with open("generated_script_1.txt", "w") as f:
    f.write(generated_script)