In [10]:
text="""
Identifying the most likely word to follow a given string of words is the basic goal of the Natural Language Processing (NLP) task of “next word prediction.” This predictive skill is essential in various applications, including text auto-completion, speech recognition, and machine translation. Deep learning approaches have transformed NLP by attaining remarkable success in various language-related tasks, such as next-word prediction.

Deep learning models are excellent at identifying complex dependencies and patterns in sequential data, which makes them suitable for challenges requiring the prediction of the next word. These models may successfully describe the context and make precise predictions by utilizing recurrent neural networks (RNNs) and their variations, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU).

There are many benefits to being able to correctly predict the following word in a given situation. Offering relevant and coherent word suggestions improves the user experience in auto-completion systems.

Model Architecture
The model architecture is a critical component in building an effective next-word prediction system using deep learning in NLP.
One common approach is to utilize recurrent neural networks (RNNs) or their variants, such as Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU). These architectures are specifically designed to capture sequential dependencies in the input text, enabling accurate predictions of the next word.
Recurrent Neural Networks (RNNs): Recurrent Neural Networks are a class of neural networks that can process sequential data by maintaining hidden states that capture the context and information from previous inputs. RNNs have loops within their architecture that allow them to store and propagate information through time.
Long Short-Term Memory (LSTM): LSTM is a specialized variant of RNNs that overcomes the limitations of traditional RNNs, such as the vanishing gradient problem. LSTM introduces a memory cell that allows the network to selectively store and retrieve information over long sequences, making it particularly effective in modeling long-term dependencies.
Gated Recurrent Unit (GRU): GRU is another variant of RNNs that simplifies the architecture of LSTM by merging the cell state and hidden state into a single state vector. GRU also introduces gating mechanisms to control the flow of information, making it computationally efficient while still capturing long-term dependencies.
Model Architecture for Next Word Prediction
The model architecture for next-word prediction typically consists of the following components:
Embedding Layer: The word embeddings, often referred to as distributed representations of words, are learned by the embedding layer. It captures semantic and contextual information by mapping every word in the lexicon to a dense vector representation. The model can be trained to change these embeddings as trainable parameters.
Recurrent Layers: The recurrent layers, like LSTM or GRU, process the input word embeddings in sequential order and keep hidden states that record the sequential data. The model can learn the contextual connections between words and their placements in the sequence thanks to these layers.
Dense Layers: One or more dense layers are then added after the recurrent layers to convert the learned features into the appropriate output format. When predicting the next word, the dense layers translate the hidden representations to a probability distribution across the vocabulary, indicating the likelihood that each word will be the following word.
Training and Optimization
The model is trained using a large corpus of text data, where the input sequences are paired with their corresponding target word. The training process involves optimizing the model’s parameters by minimizing a suitable loss function, such as categorical cross-entropy. The optimization is typically performed using an optimization algorithm like Adam or Stochastic Gradient Descent (SGD).
Inference and Prediction
The model can be used to predict the next word once it has been trained. The trained model receives an input of a list of words, processes it through the learned architecture, and outputs a probability distribution across the vocabulary. The anticipated next word is then chosen as the one with the highest likelihood.
We can create reliable and accurate next-word prediction models for NLP by combining the strength of recurrent neural networks, such as LSTM or GRU, with the right training and optimization methods. The model design successfully captures the text’s sequential dependencies, enabling the system to produce fluent and contextually relevant predictions.
"""

In [11]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [12]:
tokenizer=Tokenizer()

In [13]:
tokenizer.fit_on_texts([text])

In [14]:
unique_words=len(tokenizer.word_index)
unique_words

289
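To make the vocabulary count above less opaque, here is a rough pure-Python sketch of how a word index of this kind can be built: lowercase the text, replace punctuation with spaces, count the tokens, and number them from 1 in order of decreasing frequency. The helper `build_word_index` and the `FILTERS` constant are hypothetical names written for illustration. They approximate the default behaviour of the Keras `Tokenizer` as I understand it, not its actual implementation.

```python
from collections import Counter

# Assumed to match the Tokenizer's default punctuation filter.
# Curly quotes are not in this set, so tokens such as "model’s" stay intact.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def build_word_index(corpus: str) -> dict[str, int]:
    """Hypothetical helper: map each word to a 1-based frequency rank."""
    table = str.maketrans({ch: " " for ch in FILTERS})
    tokens = corpus.lower().translate(table).split()
    counts = Counter(tokens)
    # most_common() sorts by count; ties keep first-seen order.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

toy_index = build_word_index("The cat and the dog. The end")
print(toy_index)
```

Index 0 is never assigned to a word, which is why later cells use `unique_words+1` as the number of classes.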

In [15]:
tokenizer.word_index

{'the': 1,
 'word': 2,
 'and': 3,
 'of': 4,
 'to': 5,
 'a': 6,
 'in': 7,
 'recurrent': 8,
 'next': 9,
 'model': 10,
 'is': 11,
 'as': 12,
 'prediction': 13,
 'by': 14,
 'lstm': 15,
 'that': 16,
 'are': 17,
 'rnns': 18,
 'gru': 19,
 'architecture': 20,
 'layers': 21,
 'such': 22,
 'sequential': 23,
 'neural': 24,
 'networks': 25,
 'long': 26,
 'or': 27,
 'dependencies': 28,
 'their': 29,
 'term': 30,
 'can': 31,
 'information': 32,
 'it': 33,
 'words': 34,
 'nlp': 35,
 'data': 36,
 'for': 37,
 'these': 38,
 'memory': 39,
 'input': 40,
 'hidden': 41,
 'dense': 42,
 'trained': 43,
 'optimization': 44,
 'text': 45,
 'deep': 46,
 'learning': 47,
 'models': 48,
 'predictions': 49,
 'short': 50,
 'gated': 51,
 'unit': 52,
 'following': 53,
 'an': 54,
 'using': 55,
 'one': 56,
 'process': 57,
 'state': 58,
 'embeddings': 59,
 'learned': 60,
 'be': 61,
 'training': 62,
 'with': 63,
 'identifying': 64,
 'given': 65,
 'language': 66,
 'various': 67,
 'auto': 68,
 'completion': 69,
 'have': 70,
 '

In [16]:
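input_seqs=[]
# Treat each line of the corpus as one sentence.
for sentence in text.split('\n'):
    tokenize_sentence=tokenizer.texts_to_sequences([sentence])[0]

    # Every prefix of length >= 2 becomes one training sample:
    # its last token is the target, the tokens before it are the context.
    for i in range(1,len(tokenize_sentence)):
        input_seqs.append(tokenize_sentence[:i+1])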

In [17]:
tokenize_sentence

[]
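The empty list above is expected: the corpus ends with a newline, so the last piece returned by `text.split('\n')` is an empty string that tokenizes to `[]` and contributes no samples. The sketch below replays the prefix-building loop on a toy list of token ids to show what ends up in `input_seqs`. The helper name `prefix_sequences` and the ids are made up for illustration.

```python
def prefix_sequences(token_ids: list[int]) -> list[list[int]]:
    """Hypothetical helper mirroring the inner loop of the cell above."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]

toy_ids = [4, 7, 2, 9]  # made-up token ids for a four-word sentence
for seq in prefix_sequences(toy_ids):
    # context tokens -> target token
    print(seq[:-1], "->", seq[-1])

# Empty and single-token sentences yield no training samples.
print(prefix_sequences([]), prefix_sequences([5]))
```

A sentence of n tokens therefore yields n-1 samples, which is how a short corpus still produces several hundred training rows.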

In [18]:
max_len=max([len(x) for x in input_seqs])
max_len

65

In [19]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

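# Left-pad with zeros so every sequence has length max_len
# and the target word always sits in the last column.
padded_input_seqs=pad_sequences(input_seqs, maxlen=max_len, padding='pre')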

In [20]:
x=padded_input_seqs[:, :-1]  # context: every column except the last
y=padded_input_seqs[:, -1]   # target: the last token of each sequence

In [21]:
y.shape[0]

687

In [22]:
x.shape, y.shape

((687, 64), (687,))
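The shapes above follow from pre-padding every sequence to `max_len` and then slicing off the last column. As a rough illustration, the sketch below does the same thing in plain Python on three toy sequences. `pad_pre` is a hypothetical stand-in written for this example. It imitates `pad_sequences(..., padding='pre')` with the default truncation and is not the library function itself.

```python
def pad_pre(seqs, maxlen, value=0):
    """Hypothetical stand-in: left-pad each sequence to maxlen with `value`."""
    padded = []
    for seq in seqs:
        trimmed = seq[-maxlen:]  # if too long, keep the last maxlen tokens
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded

toy_seqs = [[4, 7], [4, 7, 2], [4, 7, 2, 9]]
toy_padded = pad_pre(toy_seqs, maxlen=4)

# Same split as x and y above: context columns vs. final target column.
x_toy = [row[:-1] for row in toy_padded]
y_toy = [row[-1] for row in toy_padded]

print(toy_padded)
print(x_toy, y_toy)
```

Padding on the left rather than the right keeps the most recent words next to the prediction position, and guarantees the target is always the final element.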

In [23]:
from tensorflow.keras.utils import to_categorical

y=to_categorical(y, num_classes=unique_words+1)
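The `+1` in `num_classes` is easy to overlook. Word indices start at 1 and index 0 is reserved for padding, so the largest label equals `unique_words` and the one-hot vectors need `unique_words+1` slots. The plain-Python sketch below illustrates this on a toy vocabulary of three words. `one_hot` is a hypothetical helper for illustration, not the Keras `to_categorical` function.

```python
def one_hot(label: int, num_classes: int) -> list[float]:
    """Hypothetical helper: a vector of zeros with a 1.0 at position `label`."""
    row = [0.0] * num_classes
    row[label] = 1.0
    return row

toy_vocab_size = 3  # words are numbered 1, 2, 3; 0 means padding

# The largest label (3) only fits when there are toy_vocab_size + 1 classes.
encoded = one_hot(3, toy_vocab_size + 1)
print(encoded)

try:
    one_hot(3, toy_vocab_size)  # one slot too few
except IndexError:
    print("label 3 does not fit into 3 classes")
```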

In [24]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [25]:
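model=Sequential()

# Vocabulary size is unique_words+1 because index 0 is reserved for padding.
model.add(Embedding(unique_words+1, 100, input_length=x.shape[1]))  # 100-dimensional word vectors
model.add(LSTM(150))  # 150 hidden units; only the final hidden state is passed on
model.add(Dense(unique_words+1, activation='softmax'))  # one probability per vocabulary index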




In [26]:
model.compile(
    loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']
)




In [27]:
x.shape

(687, 64)

In [28]:
y.shape

(687, 290)

In [29]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 64, 100)           29000     
                                                                 
 lstm (LSTM)                 (None, 150)               150600    
                                                                 
 dense (Dense)               (None, 290)               43790     
                                                                 
Total params: 223390 (872.62 KB)
Trainable params: 223390 (872.62 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
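The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer sizes used in this notebook (vocabulary of 289 words plus the padding index, 100-dimensional embeddings, 150 LSTM units), using the standard formulas for an embedding table, an LSTM layer with biases, and a dense layer. The variable names are introduced here for illustration only.

```python
vocab_size = 289 + 1   # unique_words + 1 (index 0 is padding)
embed_dim = 100
lstm_units = 150

# One embed_dim-sized vector per vocabulary index.
embedding_params = vocab_size * embed_dim

# Four weight blocks (input, forget and output gates plus the candidate cell),
# each with input weights, recurrent weights and a bias vector.
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# Weight matrix plus bias for the softmax layer.
dense_params = lstm_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# prints: 29000 150600 43790 223390
```

The totals match the `Param #` column above. Most of the weights sit in the LSTM layer.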


In [33]:
model.fit(x,y,epochs=50)

Epoch 1/50
...
Epoch 50/50


<keras.src.callbacks.History at 0x179adb6db50>

In [34]:
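import numpy as np

seed_text="Identifying"  # separate name so the training corpus in `text` is not overwritten
for i in range(5):
    # Encode the text generated so far and left-pad it to the model's input length.
    tokenized=tokenizer.texts_to_sequences([seed_text])[0]
    padded=pad_sequences([tokenized], maxlen=max_len-1, padding='pre')

    # Greedy decoding: pick the index with the highest predicted probability.
    pos=int(np.argmax(model.predict(padded, verbose=0)))

    # index_word is the reverse of word_index; index 0 (padding) has no entry.
    word=tokenizer.index_word.get(pos)
    if word is not None:
        seed_text=seed_text+" "+word
        print(seed_text)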

Identifying the
Identifying the most
Identifying the most likely
Identifying the most likely word
Identifying the most likely word to
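The loop above always takes the single most likely word. When debugging a model like this, it can help to look at the next few candidates as well. The sketch below ranks a probability vector and maps the best indices back to words. `top_k_words`, the toy probabilities, and the toy `index_word` mapping are all made up for illustration. With the real model, one could presumably pass `model.predict(padded, verbose=0)[0]` and `tokenizer.index_word` instead.

```python
def top_k_words(probs, index_word, k=3):
    """Hypothetical helper: return the k most probable (word, probability) pairs."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Skip indices without a word, such as the padding index 0.
    return [(index_word[i], probs[i]) for i in ranked if i in index_word][:k]

toy_probs = [0.05, 0.6, 0.25, 0.1]            # position 0 is the padding index
toy_index_word = {1: "the", 2: "word", 3: "and"}

for word, p in top_k_words(toy_probs, toy_index_word, k=2):
    print(f"{word}: {p:.2f}")
```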
