## <center>Text Generation Using Recurrent Neural Networks </center>

#### **IMPORTING THE NECESSARY LIBRARIES**
<ul>
  <li>PyTorch</li>
  <li>NLTK</li>
  <li>string</li>
</ul>

In [1]:
import torch
import torch.nn as nn
from pprint import pprint
import string
from torch.utils.data import Dataset, DataLoader
from collections import Counter
import torch.optim as optim
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk import word_tokenize, pos_tag

#### **CHARACTER LEVEL PREDICTION**

In [2]:
data="When it comes to generating text, GANs and LSTMs have different approaches. LSTMs excel at capturing sequential patterns and context, making them well-suited for tasks like language translation and text summarization. However, they can struggle with creativity and diversity in their output. On the other hand, GANs are designed to generate novel and diverse text by learning the underlying data distribution. While they can produce more creative content, GANs can be challenging to train and evaluate, and may require additional techniques to ensure coherence and fluency. Ultimately, the choice between GANs and LSTMs depends on the specific text generation task and the desired output: if you need coherent and natural-sounding text, LSTMs might be the better choice, but if you want to generate creative and diverse content, GANs could be the way to go."
chars=sorted(set(data))  # sorted so the index mapping is the same on every run

char_to_idx={char:i for i,char in enumerate(chars)}
idx_to_char={i:char for i,char in enumerate(chars)}
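As a quick sanity check of the two dictionaries above, the sketch below encodes a string to indices and decodes it back. The `encode` and `decode` helpers and the sample text are illustrative assumptions, not part of the notebook; the vocabulary is sorted here so the indices come out the same on every run.

```python
# Hypothetical sample text standing in for `data`
sample_text = "hello world"

# Sorted so that the character -> index assignment is reproducible
sample_chars = sorted(set(sample_text))
sample_char_to_idx = {ch: i for i, ch in enumerate(sample_chars)}
sample_idx_to_char = {i: ch for i, ch in enumerate(sample_chars)}

def encode(text, mapping):
    """Turn a string into a list of integer indices."""
    return [mapping[ch] for ch in text]

def decode(indices, mapping):
    """Turn a list of integer indices back into a string."""
    return "".join(mapping[i] for i in indices)

encoded = encode("hello", sample_char_to_idx)
decoded = decode(encoded, sample_idx_to_char)
print(encoded, decoded)
```

If the round trip does not return the original string, the two dictionaries are out of sync.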


#### **MAKING CHARACTER PREDICTION MODEL USING RNN**

In [3]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


In [4]:
class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNModel, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize hidden and cell states
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        
        # Forward pass through LSTM
        out, _ = self.lstm(x, (h0, c0))
        
        # Pass the last time step's output through a fully connected layer
        out = self.fc(out[:, -1, :])
        return out


#### **DEFINING THE MODEL, CRITERION AND OPTIMIZER**

In [5]:
model = RNNModel(len(chars), 16, len(chars))
criterion=nn.CrossEntropyLoss()
optimizer=torch.optim.Adam(model.parameters(),lr=0.01)


In [6]:
inputs=[char_to_idx[ch] for ch in data[:-1]]
targets=[char_to_idx[ch] for ch in data[1:]]

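# view(-1,1) gives shape (N, 1): N samples, each a sequence of a single character
inputs=torch.tensor(inputs,dtype=torch.long).view(-1,1)
# One-hot encode to shape (N, 1, len(chars)), the (batch, seq_len, features) layout the LSTM expects
inputs=nn.functional.one_hot(inputs,num_classes=len(chars)).float()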

targets=torch.tensor(targets,dtype=torch.long)
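To make the input/target shift concrete, here is a small pure-Python sketch on the toy string `"text"`. The helper `one_hot` and all `sample_*` names are assumptions introduced only for this illustration; the nested lists mimic the (N, 1, C) layout that the tensor code produces.

```python
sample_text = "text"
sample_chars = sorted(set(sample_text))  # ['e', 't', 'x']
sample_char_to_idx = {ch: i for i, ch in enumerate(sample_chars)}

# Inputs are every character except the last; targets are shifted by one
input_ids = [sample_char_to_idx[ch] for ch in sample_text[:-1]]   # "tex"
target_ids = [sample_char_to_idx[ch] for ch in sample_text[1:]]   # "ext"

def one_hot(index, num_classes):
    """Return a list with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

# N samples, each a sequence of length 1, each step a C-dimensional vector
one_hot_inputs = [[one_hot(i, len(sample_chars))] for i in input_ids]

for ch_in, ch_out in zip(sample_text[:-1], sample_text[1:]):
    print(f"{ch_in!r} -> {ch_out!r}")
```

Because every sample is a sequence of length 1, the model sees only the current character when predicting the next one.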


#### **TRAINING THE MODEL WITH 800 EPOCHS**

In [7]:
model.train()
for epoch in range(800):
    outputs=model(inputs)
    loss=criterion(outputs,targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Report the loss every 100 epochs
    if (epoch+1)%100==0:
        print(f"epoch {epoch+1}, loss {loss.item()}")
    

epoch 100, loss 2.1541919708251953
epoch 200, loss 2.0188331604003906
epoch 300, loss 1.9942307472229004
epoch 400, loss 1.986881971359253
epoch 500, loss 1.9836246967315674
epoch 600, loss 1.9818694591522217
epoch 700, loss 1.9808194637298584
epoch 800, loss 1.9801398515701294
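The loss above levels off near 1.98 instead of approaching zero. One likely reason, offered here as an interpretation rather than something the notebook verifies: each training input is a single one-hot character with a fresh zero hidden state, so the prediction can depend only on the current character. Under that assumption, the mean cross-entropy cannot drop below the empirical conditional entropy of the next character given the current one. The helper below (the name `bigram_conditional_entropy` is hypothetical) computes that floor in nats, the same unit `nn.CrossEntropyLoss` reports.

```python
import math
from collections import Counter, defaultdict

def bigram_conditional_entropy(text):
    """Mean of -ln p(next | current) over adjacent character pairs, in nats."""
    if len(text) < 2:
        raise ValueError("need at least two characters to form a pair")

    # For every character, count which characters follow it
    follow = defaultdict(Counter)
    for cur, nxt in zip(text[:-1], text[1:]):
        follow[cur][nxt] += 1

    total_pairs = len(text) - 1
    entropy = 0.0
    for counts in follow.values():
        n_cur = sum(counts.values())
        for c in counts.values():
            # Each of the c occurrences contributes -ln(c / n_cur)
            entropy -= c * math.log(c / n_cur)
    return entropy / total_pairs

# Fully predictable text has a floor of zero; ambiguous text does not
print(bigram_conditional_entropy("abababab"))
print(bigram_conditional_entropy("aab"))
```

Calling the function on the notebook's `data` string would give the floor for this particular model setup; feeding longer character sequences to the LSTM is what allows the loss to go below it.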


#### **TESTING THE MODEL**

In [8]:
model.eval()
test_input=char_to_idx['S']
test_input=nn.functional.one_hot(torch.tensor(test_input).view(-1,1),num_classes=len(chars)).float()
pred_output=model(test_input)
pred_char=torch.argmax(pred_output,1).item()
pred_char = idx_to_char[pred_char]


In [9]:
print(pred_char)

T


#### **WORD LEVEL PREDICTION**

In [10]:
data="When it comes to generating text, GANs and LSTMs have different approaches. LSTMs excel at capturing sequential patterns and context, making them well-suited for tasks like language translation and text summarization. However, they can struggle with creativity and diversity in their output. On the other hand, GANs are designed to generate novel and diverse text by learning the underlying data distribution. While they can produce more creative content, GANs can be challenging to train and evaluate, and may require additional techniques to ensure coherence and fluency. Ultimately, the choice between GANs and LSTMs depends on the specific text generation task and the desired output: if you need coherent and natural-sounding text, LSTMs might be the better choice, but if you want to generate creative and diverse content, GANs could be the way to go."


In [11]:
data = data.translate(str.maketrans('', '', string.punctuation))
words=word_tokenize(data)
vocab=list(set(words))
words_to_idx={word:i for i,word in enumerate(vocab)}
idx_to_words={i:word for i,word in enumerate(vocab)}



In [12]:
vocab_size = len(vocab)

model2 = RNNModel(input_size=vocab_size,hidden_size= 16, output_size=vocab_size)
criterion=nn.CrossEntropyLoss()
optimizer=torch.optim.Adam(model2.parameters(),lr=0.01)


In [13]:

# Each word is the input for the word that follows it.
# Every token comes from `words`, so it is guaranteed to be in the vocabulary
# and inputs and targets stay aligned.
word_inputs = [words_to_idx[w] for w in words[:-1]]
word_targets = [words_to_idx[w] for w in words[1:]]


word_inputs = torch.tensor(word_inputs, dtype=torch.long).view(-1, 1)
word_targets = torch.tensor(word_targets, dtype=torch.long)

# One-hot encoding
word_inputs = nn.functional.one_hot(word_inputs, num_classes=vocab_size).float()


In [14]:
model2.train()
for epoch in range(800):
    word_outputs=model2(word_inputs)
    loss=criterion(word_outputs,word_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Report the loss every 100 epochs
    if (epoch+1)%100==0:
        print(f"epoch {epoch+1}, loss {loss.item()}")

epoch 100, loss 0.7227327227592468
epoch 200, loss 0.6480339169502258
epoch 300, loss 0.6387345194816589
epoch 400, loss 0.6353968381881714
epoch 500, loss 0.6337555646896362
epoch 600, loss 0.6328078508377075
epoch 700, loss 0.6322029232978821
epoch 800, loss 0.6317892670631409


In [15]:
def predict_word(word:str):
    model2.eval()
    test_input=words_to_idx[word]
    test_input=nn.functional.one_hot(torch.tensor(test_input).view(-1,1),num_classes=len(vocab)).float()
    with torch.no_grad():  # inference only, so skip gradient tracking
        pred_output=model2(test_input)
    pred_idx=torch.argmax(pred_output,1).item()
    return idx_to_words[pred_idx]


In [16]:
for i in vocab:
    output_pred=predict_word(i)
    print(f"Input word is '{i}' and predicted next word is '{output_pred}' " )

Input word is 'language' and predicted next word is 'translation' 
Input word is 'they' and predicted next word is 'can' 
Input word is 'way' and predicted next word is 'to' 
Input word is 'learning' and predicted next word is 'the' 
Input word is 'to' and predicted next word is 'generate' 
Input word is 'train' and predicted next word is 'and' 
Input word is 'like' and predicted next word is 'language' 
Input word is 'the' and predicted next word is 'choice' 
Input word is 'other' and predicted next word is 'hand' 
Input word is 'tasks' and predicted next word is 'like' 
Input word is 'capturing' and predicted next word is 'sequential' 
Input word is 'creative' and predicted next word is 'content' 
Input word is 'additional' and predicted next word is 'techniques' 
Input word is 'are' and predicted next word is 'designed' 
Input word is 'their' and predicted next word is 'output' 
Input word is 'want' and predicted next word is 'to' 
Input word is 'excel' and predicted next word is 'a

#### **NEXT SENTENCE GENERATION**

In [17]:
with open('alice.txt','r',encoding='utf-8') as file:
    text=file.read()

In [18]:
words=text.split()
word_count=Counter(words)
vocab=list(word_count.keys())
vocab_size=len(vocab)

In [19]:
# enumerate yields (index, word), so name the loop variables accordingly
word_to_idx={word:i for i,word in enumerate(vocab)}
idx_to_Word={i:word for i,word in enumerate(vocab)}


In [20]:
SEQUENCE_LENGTH = 64
samples = [words[i:i+SEQUENCE_LENGTH+1] for i in range(len(words)-SEQUENCE_LENGTH)]
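The sliding-window construction above is easier to see on a short sentence. In this sketch the sentence and the window size of 3 are hypothetical stand-ins for `words` and `SEQUENCE_LENGTH`; each window holds one extra word so it can be split into an input and a target shifted by one position.

```python
sample_words = "the quick brown fox jumps over the lazy dog".split()
window = 3  # hypothetical value standing in for SEQUENCE_LENGTH

# Each sample has window + 1 words: the inputs plus one extra word for the shift
windows = [sample_words[i:i + window + 1]
           for i in range(len(sample_words) - window)]

# Input is all but the last word; target is all but the first word
pairs = [(w[:-1], w[1:]) for w in windows]

for inp, tgt in pairs[:2]:
    print(inp, "->", tgt)
```

Every target position holds the word that follows the input word at the same position, which is what the dataset class returns as `input_seq` and `target_seq`.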


In [21]:
class textloader(Dataset):
    def __init__(self,samples,word_to_idx):
        self.samples=samples
        self.word_to_idx=word_to_idx
        
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self,idx):
        samples=self.samples[idx]
        input_seq=torch.LongTensor([self.word_to_idx[word] for word in samples[:-1]])
        target_seq=torch.LongTensor([self.word_to_idx[word] for word in samples[1:]])
        return input_seq, target_seq

In [22]:
batch_size=12
dataset=textloader(samples,word_to_idx)
dataloader=DataLoader(dataset,batch_size=batch_size,shuffle=True)
print(dataset[1])

(tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
        19, 20, 21, 22, 23, 19, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 31,
        35, 13, 36, 15, 13, 37, 38, 39, 40, 22, 41, 10, 33, 42, 18, 43, 44,  2,
         3,  4, 45, 46,  7,  8, 47, 48, 49, 50]), tensor([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
        20, 21, 22, 23, 19, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 31, 35,
        13, 36, 15, 13, 37, 38, 39, 40, 22, 41, 10, 33, 42, 18, 43, 44,  2,  3,
         4, 45, 46,  7,  8, 47, 48, 49, 50, 51]))


In [23]:
class TextGenerationModel(nn.Module):
    def __init__(self,vocab_size,embedding_dim,hidden_size,num_layers):
        super(TextGenerationModel, self).__init__()
        self.embedding=nn.Embedding(vocab_size,embedding_dim)
        self.lstm=nn.LSTM(input_size=embedding_dim,hidden_size=hidden_size,num_layers=num_layers,batch_first=True)
        self.fc=nn.Linear(hidden_size,vocab_size)
        self.hidden_size=hidden_size
        self.num_layers=num_layers
        
    def forward(self,x,hidden=None):
        if hidden is None:
            hidden=self.init_hidden(x.shape[0])
        x=self.embedding(x)
        out,(h_n,c_n)=self.lstm(x,hidden)
        out=out.contiguous().view(-1,self.hidden_size)
        out=self.fc(out)
        return out,(h_n,c_n)
    
    def init_hidden(self, batch_size):
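        # Create the states on the same device as the model's weights
        model_device = self.fc.weight.device
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size, device=model_device)
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_size, device=model_device)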
        return h0, c0
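The forward pass flattens the LSTM output so that every time step of every sequence becomes one row of logits. The sketch below traces those shapes with the same layer types; the toy sizes are assumptions picked for illustration, and the layers are built standalone rather than through `TextGenerationModel`.

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes, chosen only to keep the shapes readable
vocab_size, embedding_dim, hidden_size = 10, 4, 8
batch_size, seq_len = 3, 5

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_size, num_layers=1, batch_first=True)
fc = nn.Linear(hidden_size, vocab_size)

tokens = torch.randint(0, vocab_size, (batch_size, seq_len))   # (3, 5)
targets = torch.randint(0, vocab_size, (batch_size, seq_len))  # (3, 5)

embedded = embedding(tokens)               # (3, 5, 4)
out, (h_n, c_n) = lstm(embedded)           # out: (3, 5, 8), zero initial state by default
logits = fc(out.reshape(-1, hidden_size))  # (15, 10): one row per token position

# Targets are flattened the same way so row i of logits matches target i
loss = nn.CrossEntropyLoss()(logits, targets.view(-1))
print(embedded.shape, out.shape, logits.shape, loss.item())
```

This is why the training loop passes `target_Seq.view(-1)` to the criterion: the flattened targets line up row by row with the flattened logits.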
    
    
        

In [24]:
embedding_dim = 16
hidden_size = 32
num_layers = 1
learning_rate = 0.01
epochs = 50

In [25]:
device=('cuda' if torch.cuda.is_available() else 'cpu')
model=TextGenerationModel(vocab_size,embedding_dim,hidden_size,num_layers).to(device)
criterion=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=learning_rate)



In [48]:
def train(model,epochs,dataloader,criterion,optimizer):
    model.train()
    for epoch in range(epochs):
        epoch_loss=0
        for input_Seq,target_Seq in dataloader:
            input_Seq,target_Seq=input_Seq.to(device),target_Seq.to(device)
            outputs,_=model(input_Seq)
            loss=criterion(outputs,target_Seq.view(-1))
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss+=loss.item()  # .item() returns the loss as a plain Python float
        epoch_loss /= len(dataloader)
        print(f"Epoch {epoch} loss: {epoch_loss:.3f}")


In [26]:


train(model,epochs,dataloader,criterion,optimizer)
        
    

Epoch 0 loss: 3.547
Epoch 1 loss: 1.931
Epoch 2 loss: 1.524
Epoch 3 loss: 1.336
Epoch 4 loss: 1.226
Epoch 5 loss: 1.154
Epoch 6 loss: 1.105
Epoch 7 loss: 1.066
Epoch 8 loss: 1.040
Epoch 9 loss: 1.014
Epoch 10 loss: 0.993
Epoch 11 loss: 0.976
Epoch 12 loss: 0.959
Epoch 13 loss: 0.949
Epoch 14 loss: 0.934
Epoch 15 loss: 0.923
Epoch 16 loss: 0.910
Epoch 17 loss: 0.907
Epoch 18 loss: 0.896
Epoch 19 loss: 0.890
Epoch 20 loss: 0.883
Epoch 21 loss: 0.875
Epoch 22 loss: 0.869
Epoch 23 loss: 0.867
Epoch 24 loss: 0.857
Epoch 25 loss: 0.855
Epoch 26 loss: 0.850
Epoch 27 loss: 0.843
Epoch 28 loss: 0.846
Epoch 29 loss: 0.834
Epoch 30 loss: 0.832
Epoch 31 loss: 0.829
Epoch 32 loss: 0.824
Epoch 33 loss: 0.821
Epoch 34 loss: 0.821
Epoch 35 loss: 0.815
Epoch 36 loss: 0.811
Epoch 37 loss: 0.811
Epoch 38 loss: 0.810
Epoch 39 loss: 0.806
Epoch 40 loss: 0.802
Epoch 41 loss: 0.798
Epoch 42 loss: 0.798
Epoch 43 loss: 0.800
Epoch 44 loss: 0.789
Epoch 45 loss: 0.792
Epoch 46 loss: 0.791
Epoch 47 loss: 0.783
Ep

In [28]:
torch.save(model.state_dict(), 'text generator.pth')

In [30]:
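# torch.load returns the saved state dict, so rebuild the model and load the weights into it
generator=TextGenerationModel(vocab_size,embedding_dim,hidden_size,num_layers).to(device)
generator.load_state_dict(torch.load('text generator.pth',map_location=device))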

In [34]:
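def generate_text(generator,start,num_words):
    generator.eval()
    words=start.split()
    with torch.no_grad():  # inference only, so skip gradient tracking
        for _ in range(num_words):
            # Feed at most the last SEQUENCE_LENGTH words; every word must already be in the vocabulary
            input_seq=torch.LongTensor([word_to_idx[word] for word in words[-SEQUENCE_LENGTH:]]).unsqueeze(0).to(device)
            h,c=generator.init_hidden(1)
            output,(h,c)=generator(input_seq,(h,c))
            # output has one row per input position; the last row predicts the next word
            next_token=output.argmax(1)[-1].item()
            words.append(idx_to_Word[next_token])

    return " ".join(words)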



print('Generated text is: ',generate_text(model,'unless it was all ridges and furrows;',num_words=100))
        
        

Generated text is:  unless it was all ridges and furrows; the balls were live hedgehogs, the mallets live flamingoes, and the soldiers were silent, and looked at her or three soldiers were silent, and looked at each other of the house opened, and a large plate came skimming out, straight at the Footman's head: it just grazed his nose, and broke to pieces against one of the trees behind him. '--or next day, maybe,' the Footman continued in a fight with the middle of one! There ought to have finished,' said the King. 'When did you begin?' The Hatter she felt that she had not like to get a little.


In [35]:
print('Generated text is: ',generate_text(model,'Alice was a',num_words=100))

Generated text is:  Alice was a child,' said to the jury. 'Not yet, not yet!' the Rabbit hastily interrupted. 'There's a great deal to eat or drink under the circumstances. There was a little bottle that stood near the looking-glass. There was no label this time with the words 'DRINK ME,' but nevertheless she uncorked it and put it a little queer, won't you?' 'Not a bit,' said the Caterpillar. 'Well, perhaps your feelings may be different,' said Alice; 'all I know is, you please, sir--' The Rabbit started violently, dropped the white kid gloves and the fan, and skurried away into the darkness as hard
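The generation function always takes the `argmax` of the logits (greedy decoding), so a given prompt always yields the same continuation. A common alternative, not used in the notebook, is to sample the next token from a temperature-scaled softmax. The sketch below is pure Python; the function name `sample_with_temperature` and the example logits are assumptions for illustration.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalises the weights internally
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

example_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three tokens
rng = random.Random(0)
print([sample_with_temperature(example_logits, 1.0, rng) for _ in range(10)])
```

Low temperatures approach greedy decoding, while higher ones spread probability over more tokens. Inside `generate_text`, the last row of `output`, converted with `.tolist()`, could serve as `logits`.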


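#### **Fine-Tuning the Model for Better Results**

Continuing training for 15 more epochs with a freshly created Adam optimizer. The learning rate stays at 0.01, the same value used in the first run.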

In [50]:
optimizers=optim.Adam(model.parameters(),lr=0.01)


In [51]:
epochss=15
train(model,epochss,dataloader,criterion,optimizers)


Epoch 0 loss: 0.782
Epoch 1 loss: 0.779
Epoch 2 loss: 0.783
Epoch 3 loss: 0.780
Epoch 4 loss: 0.771
Epoch 5 loss: 0.777
Epoch 6 loss: 0.772
Epoch 7 loss: 0.777
Epoch 8 loss: 0.769
Epoch 9 loss: 0.765
Epoch 10 loss: 0.769
Epoch 11 loss: 0.764
Epoch 12 loss: 0.765
Epoch 13 loss: 0.771
Epoch 14 loss: 0.761
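The 15 extra epochs above move the loss only from 0.782 to 0.761. One option to try, not used in the notebook, is a smaller starting learning rate that decays during fine-tuning. The sketch uses `torch.optim.lr_scheduler.StepLR`; the toy model, the 0.001 starting rate, and the step settings are all assumptions chosen for illustration, and no real training happens inside the loop.

```python
import torch.nn as nn
import torch.optim as optim

toy_model = nn.Linear(4, 2)  # hypothetical stand-in for the text model

# Start lower than the original 0.01 and halve the rate every 5 epochs
fine_tune_optimizer = optim.Adam(toy_model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.StepLR(fine_tune_optimizer, step_size=5, gamma=0.5)

learning_rates = []
for epoch in range(15):
    # One epoch of training would run here; parameters without gradients are skipped
    fine_tune_optimizer.step()
    # Advance the schedule once per epoch, after the optimizer has stepped
    scheduler.step()
    learning_rates.append(fine_tune_optimizer.param_groups[0]["lr"])

print(learning_rates)
```

Whether a schedule like this lowers the loss further on this dataset would have to be checked by running it.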


In [56]:
print('Generated text is: ',generate_text(model,'On this the White Rabbit',num_words=100))

Generated text is:  On this the White Rabbit gave a little scream of laughter. 'Oh, hush!' the Rabbit whispered in a frightened tone. 'The Queen will hear you! You see, she looked much far about the Dormouse said--' the Hatter said, tossing his head contemptuously. 'I dare say you say even when it's pleased. Now I growl when I'm pleased, and wag my tail when I'm angry. Therefore I'm mad.' 'I call it purring, not growling,' said Alice. 'Call it that stood made out that it had been. But a box of comfits, (luckily the salt water had not feel encouraged to ask any more questions I should


In [59]:
print('Generated text is: ',generate_text(model,'the rabbit',num_words=100))

Generated text is:  the rabbit with either a waistcoat-pocket, or a watch to have no time when he met in the house, "Let us both go and live at the number of executions Alice cautiously bottle was beginning to get to,' said the Cat. 'I don't much care where--' said Alice. 'Then it doesn't matter which way you go,' said the Cat. '--so long as she would gather about her other into a small passage, not much larger than in her and then them free, Exactly as we were. My notion and said severely 'Who is this?' She said the Duchess, 'as pigs have to


In [64]:

print('Generated text is: ',generate_text(model,'can I' ,num_words=100))

Generated text is:  can I shouldn't like THAT!' 'Oh, I wish you could tell you had been looked up, and there stood the same, shedding gallons of tears, until there was no more and seemed every way, and then said the Mouse heard one who you tell of her going, though she looked back once its legs hanging down, but generally, just as she had to kneel down on the floor: in another minute this Alice as she could do, lying down into the darkness as hard as she could guess, she was now about two feet high, and was going to dive in among
