# <span style="color:black;">**Word Prediction using RNN and LSTM**</span>


## <span style="color:black">**Overview**</span>

Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are powerful tools for sequential data tasks, such as natural language processing. This project demonstrates how to use these models to predict the next word in a sentence, leveraging the ability of RNNs and LSTMs to maintain context over sequences.

## <span style="color:black;">**Features**</span>

- **Implementation of RNN for sentence word prediction**: Build and train a basic RNN model to predict the next word in a sentence.
- **Implementation of LSTM for sentence word prediction**: Enhance the RNN model with LSTM layers to better capture long-term dependencies in the text.
- **Training and evaluation scripts**: Scripts to train the models and evaluate their performance on the test data.
- **Preprocessing and tokenization of text data**: Techniques for preparing the text data, including tokenization and padding to ensure consistent input lengths.
- **Model saving and loading functionality**: Save the trained models to disk and load them for future predictions or further training.


In [34]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN, LSTM, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

## **Dataset Overview**

### **Description**

This dataset consists of individual sentences related to various concepts and applications in artificial intelligence (AI), machine learning (ML), and related fields. Each sentence provides a definition, explanation, or concept related to these domains.

### **Features of the Dataset**

1. **Comprehensive Coverage**: The sentences cover a wide range of topics, including different types of algorithms, learning paradigms, specific AI techniques, and their applications.
2. **Educational Use**: It serves as a useful resource for learning and understanding various AI and ML concepts.
3. **Versatile Applications**: The dataset can be applied in various NLP tasks such as text generation, classification, and summarization.

### **Possible Uses**

- **Text Generation**: Train a model to generate sentences or paragraphs on AI topics.
- **Conceptual Understanding**: Use the dataset to build tools or applications that explain AI concepts to users.
- **Content Classification**: Develop classifiers to categorize sentences into different AI-related topics or domains.


In [35]:
sentences = [
    "Machine learning algorithms improve through experience."
    "Neural networks are inspired by biological neural networks."
    "Deep learning is a subset of machine learning."
    "Artificial intelligence aims to create intelligent machines."
    "Supervised learning uses labeled training data."
    "Unsupervised learning finds patterns in unlabeled data."
    "Reinforcement learning learns through interaction with an environment."
    "Natural language processing enables machines to understand human language."
    "Computer vision allows machines to interpret visual information."
    "Convolutional neural networks excel at image recognition tasks."
    "Recurrent neural networks are used for sequential data processing."
    "Support vector machines are effective for classification problems."
    "Decision trees are used for both classification and regression tasks."
    "Random forests combine multiple decision trees for improved accuracy."
    "Gradient boosting is an ensemble learning technique."
    "K-means clustering is an unsupervised learning algorithm."
    "Principal component analysis is used for dimensionality reduction."
    "Genetic algorithms are inspired by natural selection."
    "Artificial neural networks are composed of interconnected nodes."
    "Backpropagation is used to train neural networks."
    "Transfer learning leverages knowledge from pre-trained models."
    "Generative adversarial networks create new data samples."
    "Long short-term memory networks are used for time series analysis."
    "Autoencoders are used for feature learning and dimensionality reduction."
    "Ensemble methods combine multiple models for better predictions."
    "Overfitting occurs when a model performs well on training data but poorly on new data."
    "Cross-validation helps assess a model's performance on unseen data."
    "Hyperparameter tuning optimizes model performance."
    "Feature engineering creates new features from existing data."
    "Data preprocessing is crucial for successful machine learning."
    "Bias-variance tradeoff is a fundamental concept in machine learning."
    "Confusion matrices evaluate classification model performance."
    "ROC curves visualize classifier performance across different thresholds."
    "t-SNE is used for visualizing high-dimensional data."
    "Word embeddings represent words as vectors in a continuous space."
    "Sentiment analysis determines the emotional tone of text."
    "Recommender systems suggest items based on user preferences."
    "Anomaly detection identifies unusual patterns in data."
    "Reinforcement learning agents learn through trial and error."
    "Q-learning is a model-free reinforcement learning algorithm."
    "Markov decision processes model decision-making in uncertain environments."
    "Bayesian networks represent probabilistic relationships among variables."
    "Fuzzy logic allows for reasoning based on 'degrees of truth'."
    "Expert systems emulate human expert decision-making."
    "Knowledge representation is fundamental to artificial intelligence."
    "Heuristic search algorithms find approximate solutions to complex problems."
    "A* search algorithm is used for pathfinding and graph traversal."
    "Minimax algorithm is used in game theory and decision making."
    "Alpha-beta pruning optimizes the minimax algorithm."
    "Monte Carlo tree search is used in game AI."
    "Evolutionary algorithms solve optimization problems inspired by natural evolution."
    "Swarm intelligence algorithms are inspired by collective behavior in nature."
    "Self-organizing maps are used for dimensionality reduction and visualization."
    "Boltzmann machines are stochastic recurrent neural networks."
    "Restricted Boltzmann machines are used for dimensionality reduction and feature learning."
    "Deep belief networks are composed of multiple layers of latent variables."
    "Capsule networks aim to improve upon traditional convolutional neural networks."
    "Attention mechanisms allow models to focus on specific parts of input data."
    "Transformer models have revolutionized natural language processing tasks."
    "BERT is a transformer-based model for natural language understanding."
    "GPT (Generative Pre-trained Transformer) models generate human-like text."
    "Few-shot learning aims to learn from a small number of examples."
    "Zero-shot learning classifies instances of classes not seen during training."
    "Meta-learning involves learning how to learn efficiently."
    "Federated learning allows training models on distributed data sources."
    "Edge AI brings artificial intelligence capabilities to edge devices."
    "Explainable AI aims to make AI systems' decisions interpretable."
    "Adversarial machine learning studies vulnerabilities of AI systems."
    "Quantum machine learning explores quantum computing for AI tasks."
    "Neuromorphic computing aims to mimic biological neural systems."
    "Automated machine learning (AutoML) automates the process of applying machine learning."
    "Ethical AI focuses on developing AI systems that are fair and unbiased."
    "Computer-generated art uses AI to create original artworks."
    "AI-powered robotics combines AI with physical machines."
    "Conversational AI enables natural language interactions with machines."
    "Speech recognition converts spoken language into text."
    "Text-to-speech systems convert written text into spoken words."
    "Object detection identifies and locates objects in images or videos."
    "Semantic segmentation classifies each pixel in an image."
    "Instance segmentation identifies and delineates each object instance."
    "Facial recognition identifies or verifies a person from their face."
    "Emotion recognition detects human emotions from facial expressions or voice."
    "Gesture recognition interprets human gestures via mathematical algorithms."
    "Autonomous vehicles use AI for navigation and decision-making."
    "Predictive maintenance uses AI to predict equipment failures."
    "Fraud detection employs AI to identify fraudulent activities."
    "AI in healthcare assists in diagnosis and treatment planning."
    "Bioinformatics uses AI for analyzing biological data."
    "AI in finance is used for algorithmic trading and risk assessment."
    "Computational creativity explores AI's potential for creative tasks."
    "AI ethics addresses moral and societal implications of AI."
    "Artificial general intelligence aims to match human-level intelligence."
    "Narrow AI specializes in specific tasks."
    "The Turing test assesses a machine's ability to exhibit intelligent behavior."
    "Machine perception deals with how machines understand sensory input."
    "Cognitive computing aims to simulate human thought processes."
    "AI alignment ensures AI systems' goals are aligned with human values."
    "Robotic process automation uses AI to automate repetitive tasks."
    "AI augmentation enhances human intelligence rather than replacing it."
    "The singularity refers to the hypothetical future creation of superintelligent AI."
]

### **Word Tokenization and Sequencing**

The `Tokenizer` is used to convert words in the provided sentences into unique indices, allowing us to map each word to a numerical representation. The `total_words` variable indicates the total number of unique words plus one for padding. The `input_sequences` list is then prepared to store sequences of these word indices for further use in model training or analysis.


In [36]:
# Assign an integer index to each word
tokenizer = Tokenizer()
# Fit the tokenizer on the provided sentences
tokenizer.fit_on_texts(sentences)
# Total number of unique words, plus one because index 0 is reserved for padding
total_words = len(tokenizer.word_index) + 1
print(total_words)
# Mapping from each unique word to its index
print(tokenizer.word_index)
# Initialize a list to hold the input sequences
input_sequences = []

435
{'ai': 1, 'learning': 2, 'to': 3, 'for': 4, 'are': 5, 'is': 6, 'data': 7, 'in': 8, 'and': 9, 'networks': 10, 'of': 11, 'used': 12, 'a': 13, 'machine': 14, 'neural': 15, 'machines': 16, 'human': 17, 'on': 18, 'intelligence': 19, 'tasks': 20, 'decision': 21, 'algorithms': 22, 'aims': 23, 'natural': 24, 'language': 25, 'models': 26, 'model': 27, 'the': 28, 'systems': 29, 'artificial': 30, 'uses': 31, 'with': 32, 'recognition': 33, 'algorithm': 34, 'from': 35, 'text': 36, 'inspired': 37, 'by': 38, 'training': 39, 'an': 40, 'dimensionality': 41, 'reduction': 42, 'performance': 43, 'identifies': 44, 'making': 45, 'through': 46, 'biological': 47, 'create': 48, 'reinforcement': 49, 'processing': 50, 'allows': 51, 'classification': 52, 'problems': 53, 'multiple': 54, 'analysis': 55, 'new': 56, 'feature': 57, 'based': 58, 'detection': 59, 'learn': 60, 'search': 61, 'transformer': 62, 'computing': 63, 'or': 64, 'improve': 65, 'deep': 66, 'intelligent': 67, 'unsupervised': 68, 'patterns': 69, 
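
To make the word-to-index mapping concrete, here is a minimal pure-Python sketch of a frequency-ranked word index. It is a simplified stand-in, not the Keras `Tokenizer` implementation: the function name `build_word_index`, the regular expression, and the toy sentence are assumptions chosen for illustration.

```python
import re
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer rank, most frequent first (index 0 stays unused)."""
    counts = Counter()
    for text in texts:
        # Lower-case, then treat runs of letters, digits, and apostrophes as words
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    # most_common() sorts by descending count; ties keep first-seen order
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

toy_texts = ["Deep learning is a subset of machine learning."]
toy_index = build_word_index(toy_texts)
print(toy_index)
print(len(toy_index) + 1)  # vocabulary size including the reserved index 0
```

Because "learning" appears twice in the toy sentence, it receives index 1, which mirrors how the most frequent words get the smallest indices in the printed `word_index`.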

### **Generating N-Gram Sequences**

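The code converts each entry of `sentences` into a sequence of integers using the tokenizer. For each sequence it then builds every prefix of two or more tokens (the first two words, the first three, and so on), so that the last token of a prefix is the word to predict from the tokens before it. These prefix sequences are appended to the `input_sequences` list for use in model training. Note that the string literals in `sentences` above are not separated by commas, so Python joins them into one string and the list holds a single entry; this is why the longest sequence printed in the next section spans the whole corpus. Adding a comma after each literal makes every sentence its own entry.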


In [37]:
# Iterate over each sentence
for line in sentences:
    # Convert the sentence to a sequence of integers
    token_list = tokenizer.texts_to_sequences([line])[0]
    # Build every prefix of the sentence that has at least two tokens
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i + 1]
        input_sequences.append(n_gram_sequence)
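
The prefix construction is easier to see on a short example. The sketch below applies the same slicing to a four-token list; the helper name `prefix_ngrams` is an assumption, and the ids are the ones the printed vocabulary assigns to "machine learning algorithms improve".

```python
def prefix_ngrams(token_list):
    """Return every prefix of token_list that has at least two tokens."""
    return [token_list[: i + 1] for i in range(1, len(token_list))]

toy_tokens = [14, 2, 22, 65]  # machine, learning, algorithms, improve
for seq in prefix_ngrams(toy_tokens):
    print(seq[:-1], "->", seq[-1])  # context tokens -> word to predict
```

A sentence of n tokens therefore contributes n - 1 training sequences, and a single-token input contributes none.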

### **Sequence Padding and Length Determination**

The `max_sequence_len` variable is calculated to determine the length of the longest sequence in the dataset. The `pad_sequences` function then pads all sequences to this maximum length, ensuring uniform input size for the model. The padded sequences are converted into a NumPy array and printed, showing each sequence aligned to the same length with padding.


In [38]:
# Determine the maximum sequence length
max_sequence_len = max(len(x) for x in input_sequences)
# Pre-pad the sequences with zeros so they all have the same length
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
# Padded n-gram sequences
print(input_sequences)

[[  0   0   0 ...   0  14   2]
 [  0   0   0 ...  14   2  22]
 [  0   0   0 ...   2  22  65]
 ...
 [  0   0  14 ... 432 433  11]
 [  0  14   2 ... 433  11 434]
 [ 14   2  22 ...  11 434   1]]
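
As a rough illustration of what pre-padding does, the following pure-Python sketch left-pads short sequences with zeros and left-truncates long ones. The helper `pad_pre` is hypothetical and only mimics the behavior used here; the notebook itself relies on Keras `pad_sequences`.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        kept = seq[-maxlen:]  # keep the most recent tokens if the sequence is too long
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded

toy_sequences = [[14, 2], [14, 2, 22], [14, 2, 22, 65]]
toy_maxlen = max(len(s) for s in toy_sequences)
for row in pad_pre(toy_sequences, toy_maxlen):
    print(row)
```

Padding on the left keeps the word to predict in the last column of every row, which is what makes the later split into inputs and targets a simple slice.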


### **Preparing Input and Output Data**

The code separates the sequences into input (`X`) and output (`y`) components. `X` contains all elements of the sequences except the last one, used as input data for the model. `y` consists of the last element of each sequence, which represents the target output. Both `X` and `y` are then prepared for model training.


In [39]:
# All elements of each sequence except the last one
X = input_sequences[:, :-1]
print("Input Data", X)
# The last element of each sequence
y = input_sequences[:, -1]
# print("Output Data", y)

Input Data [[  0   0   0 ...   0   0  14]
 [  0   0   0 ...   0  14   2]
 [  0   0   0 ...  14   2  22]
 ...
 [  0   0  14 ... 431 432 433]
 [  0  14   2 ... 432 433  11]
 [ 14   2  22 ... 433  11 434]]
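
The same split can be followed by hand on a few padded rows. This is a small sketch with made-up rows (the names `toy_padded`, `toy_X`, and `toy_y` are assumptions), using plain lists instead of NumPy slicing.

```python
toy_padded = [
    [0, 0, 14, 2],
    [0, 14, 2, 22],
    [14, 2, 22, 65],
]

# Everything except the last column is the context; the last column is the target
toy_X = [row[:-1] for row in toy_padded]
toy_y = [row[-1] for row in toy_padded]

for context, target in zip(toy_X, toy_y):
    print(context, "->", target)
```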


### **One-Hot Encoding**

The `to_categorical` function (`tf.keras.utils.to_categorical`) converts the output labels (`y`) into one-hot encoded vectors. Each integer label becomes a vector with a single 1 at the position of its class and zeros everywhere else. The encoding uses `total_words` classes, that is, the number of unique words plus one for the reserved padding index.


In [40]:
# One-hot encode the targets; the number of classes is total_words (435)
y = tf.keras.utils.to_categorical(y, num_classes=total_words)
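
A minimal sketch of what one-hot encoding means for the loss, written in plain Python. The helper names and the five-class probability vector are made up for illustration; with a one-hot target, categorical cross-entropy reduces to the negative log of the probability assigned to the correct class.

```python
import math

def one_hot(label, num_classes):
    """Return a list of zeros with a single 1.0 at position `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

def categorical_crossentropy(target, predicted):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

toy_target = one_hot(2, num_classes=5)
toy_probs = [0.1, 0.1, 0.6, 0.1, 0.1]  # made-up model output that sums to 1
toy_loss = categorical_crossentropy(toy_target, toy_probs)
print(toy_target)
print(toy_loss)  # equals -log(0.6)
```

For larger vocabularies, the Keras loss `sparse_categorical_crossentropy` accepts the integer labels directly, which avoids building the one-hot matrix at all.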

### **Define the RNN Model**

The model is a Sequential network where layers are stacked linearly. 

1. **Embedding Layer**: Converts word indices into dense vectors with a dimensionality of 10. The input dimension is `total_words` (the number of unique words plus one for the padding index), and the input length is `max_sequence_len - 1`, because the last token of each sequence is held out as the target.

2. **RNN Layer**: A `SimpleRNN` layer with 30 units processes the sequences and captures temporal dependencies.

3. **Dense Output Layer**: A `Dense` layer with `total_words` units and a softmax activation function is used for multi-class classification. This layer outputs probabilities for each word in the vocabulary.


In [41]:
# Define the RNN model
# A Sequential model stacks layers in a linear fashion
model = Sequential([
    # Embedding layer: converts word indices to dense vectors
    # Input dimension: total number of words; output dimension: 10
    # Input length: length of the input sequences (excluding the last word, which is the target)
    # Each word index maps to a 10-dimensional vector, so the embedding matrix has shape (total_words, 10)
    # For a 5-word sequence, the first 4 words are the input and the model predicts the 5th
    Embedding(total_words, 10, input_length=max_sequence_len-1),
    # SimpleRNN layer with 30 units
    SimpleRNN(30),
    # Dense output layer with a softmax activation function
    # Output dimension: total number of words (multi-class classification)
    Dense(total_words, activation='softmax')
])
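
To show what the recurrent layer computes, here is a small NumPy sketch of a plain recurrent update with a tanh activation. The weights are random stand-ins rather than trained values, the variable names are assumptions, and the four-step input is arbitrary; only the sizes 10 and 30 come from the model above.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, units, steps = 10, 30, 4  # 10 and 30 match the model above; 4 steps is arbitrary

# Randomly initialized stand-ins for the layer's learned weights
W_x = rng.normal(scale=0.1, size=(embed_dim, units))  # input-to-hidden
W_h = rng.normal(scale=0.1, size=(units, units))      # hidden-to-hidden
b = np.zeros(units)

inputs = rng.normal(size=(steps, embed_dim))  # one embedded sequence
h = np.zeros(units)                           # initial hidden state
for x_t in inputs:
    # The new state mixes the current input with the previous state
    h = np.tanh(x_t @ W_x + h @ W_h + b)

n_params = W_x.size + W_h.size + b.size
print(h.shape)   # (30,)
print(n_params)  # 1230
```

Only the final hidden state is passed on to the `Dense` layer, so all the context from earlier words has to be carried through this single 30-dimensional vector.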




In [42]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [43]:
model.fit(X, y, epochs=50, verbose=1)

Epoch 1/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 6s 112ms/step - accuracy: 0.0037 - loss: 6.0738
Epoch 2/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 3s 111ms/step - accuracy: 0.0124 - loss: 6.0439
Epoch 3/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 3s 110ms/step - accuracy: 0.0384 - loss: 6.0040
Epoch 4/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 5s 114ms/step - accuracy: 0.0430 - loss: 5.9080
Epoch 5/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 3s 116ms/step - accuracy: 0.0332 - loss: 5.7642
Epoch 6/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 3s 115ms/step - accuracy: 0.0250 - loss: 5.6476
Epoch 7/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 4s 129ms/step - accuracy: 0.0208 - loss: 5.5763
Epoch 8/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 3s 116ms/step - accuracy: 0.0306 - loss: 5.5946
Epoch 9/50
28/28 ━━━━━━━━━━

<keras.src.callbacks.history.History at 0x2b32961fd70>

### **Function to Predict the Next Word(s)**

The `predict_next_word` function generates text based on a seed text input.

1. **Convert to Sequence**: The seed text is converted into a sequence of integers using the tokenizer and padded to the required input length.

2. **Predict Next Word**: The model predicts the probabilities for the next word, and the word with the highest probability is selected.

3. **Update Seed Text**: The predicted word is appended to the seed text, and the process can be repeated for multiple words as specified.


In [47]:
# Function to predict the next word(s) given a seed text
def predict_next_word(seed_text, next_words=1):
    for _ in range(next_words):
        # Convert the seed text to a sequence of integers
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        # Pad the sequence to match the input length the model expects
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        # Predict the probabilities of the next word in the sequence
        predicted = model.predict(token_list, verbose=0)
        # Get the index of the word with the highest probability
        predicted_word_index = np.argmax(predicted, axis=-1)[0]
        # Look up the word corresponding to the predicted index
        predicted_word = tokenizer.index_word[predicted_word_index]
        # Append the predicted word to the seed text
        seed_text += " " + predicted_word
    return seed_text
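
The function above keeps only the single most likely word. When inspecting a model it can help to look at several candidates, so the following sketch ranks the top few entries of a probability vector. The helper `top_k_words`, the three-word vocabulary, and the probabilities are all hypothetical.

```python
def top_k_words(probabilities, index_word, k=3):
    """Return the k most probable (word, probability) pairs, best first."""
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    # Index 0 is the padding slot and has no word, so skip anything unmapped
    candidates = [(index_word[i], probabilities[i]) for i in ranked if i in index_word]
    return candidates[:k]

toy_index_word = {1: "ai", 2: "learning", 3: "to"}  # hypothetical 3-word vocabulary
toy_probs = [0.05, 0.15, 0.60, 0.20]                # made-up softmax output, slot 0 = padding
print(top_k_words(toy_probs, toy_index_word, k=2))
```

With the trained model, the same helper could presumably be fed `model.predict(token_list, verbose=0)[0]` and `tokenizer.index_word` in place of the toy values.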

### **Predict and Print Next Word**

To predict and print the next word(s) given a seed text, call the `predict_next_word` function with the desired seed text and number of words to predict. For example:



In [49]:
print(predict_next_word("machine learning ai to"))
     

machine learning ai to language


### **Define the LSTM Model**

The model is a Sequential network with layers stacked linearly:

1. **Embedding Layer**: Converts word indices into dense vectors with a dimensionality of 10. The input dimension is `total_words` (the number of unique words plus one for the padding index), and the input length is `max_sequence_len - 1`, because the last token of each sequence is held out as the target.

2. **LSTM Layer**: An `LSTM` layer with 100 units processes the sequences, capturing long-term dependencies and patterns.

3. **Dense Output Layer**: A `Dense` layer with `total_words` units and a softmax activation function is used for multi-class classification, outputting probabilities for each word in the vocabulary.
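
The gating that lets an LSTM retain information over long spans can be sketched in a few lines of NumPy. This is an illustrative single-step implementation under stated assumptions: the weights are random rather than trained, the function and variable names are made up, and the gate ordering inside the stacked matrices is a choice made for this sketch. Only the sizes 10 and 100 come from the model described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b hold the four gates stacked side by side."""
    units = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:units])               # input gate: how much new content to write
    f = sigmoid(z[units:2 * units])      # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * units:3 * units])  # candidate cell content
    o = sigmoid(z[3 * units:])           # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
embed_dim, units = 10, 100  # sizes taken from the model described above
W = rng.normal(scale=0.1, size=(embed_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(5, embed_dim)):  # five arbitrary embedded tokens
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The separate cell state `c` is updated additively (`f * c_prev + i * g`), which is the main structural difference from the plain recurrent update and the reason gradients tend to survive over more time steps.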


In [50]:
# Define the LSTM model
# A Sequential model stacks layers in a linear fashion
model = Sequential([
    # Embedding layer: converts word indices to dense vectors
    # Input dimension: total number of words; output dimension: 10
    # Input length: length of the input sequences (excluding the last word, which is the target)
    # Each word index maps to a 10-dimensional vector, so the embedding matrix has shape (total_words, 10)
    # For a 5-word sequence, the first 4 words are the input and the model predicts the 5th
    Embedding(total_words, 10, input_length=max_sequence_len-1),
    # LSTM layer with 100 units
    LSTM(100),
    # Dense output layer with a softmax activation function
    # Output dimension: total number of words (multi-class classification)
    Dense(total_words, activation='softmax')
])
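
For a sense of how much larger the LSTM model is than the RNN model, the trainable parameters can be counted by hand. The sketch below uses the standard formulas for these layer types with biases enabled, which is an assumption about the layer configuration; `model.summary()` reports the actual figures.

```python
def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + bias for a plain recurrent layer."""
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    """An LSTM keeps four such weight sets: three gates plus the candidate state."""
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return units * (input_dim + 1)

vocab_size, embed_dim = 435, 10  # values from this notebook
embedding = vocab_size * embed_dim

rnn_total = embedding + simple_rnn_params(embed_dim, 30) + dense_params(30, vocab_size)
lstm_total = embedding + lstm_params(embed_dim, 100) + dense_params(100, vocab_size)
print(rnn_total)   # 19065
print(lstm_total)  # 92685
```

In both models the output layer accounts for a large share of the parameters, because it needs one weight per hidden unit for each of the 435 vocabulary entries.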

In [51]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [52]:
model.fit(X, y, epochs=50, verbose=1)

Epoch 1/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 11s 312ms/step - accuracy: 0.0133 - loss: 6.0606
Epoch 2/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 9s 331ms/step - accuracy: 0.0349 - loss: 5.7012
Epoch 3/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 9s 338ms/step - accuracy: 0.0420 - loss: 5.6252
Epoch 4/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 10s 360ms/step - accuracy: 0.0324 - loss: 5.5556
Epoch 5/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 10s 358ms/step - accuracy: 0.0288 - loss: 5.5348
Epoch 6/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 10s 352ms/step - accuracy: 0.0397 - loss: 5.5580
Epoch 7/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 11s 376ms/step - accuracy: 0.0334 - loss: 5.5475
Epoch 8/50
28/28 ━━━━━━━━━━━━━━━━━━━━ 10s 360ms/step - accuracy: 0.0273 - loss: 5.5597
Epoch 9/50
28/28 ━━━━

<keras.src.callbacks.history.History at 0x2b325643a10>

### **Function to Predict the Next Word(s)**

The `predict_next_word` function generates text based on a seed text input. It can predict one or multiple words, depending on the `next_words` parameter.

1. **Convert to Sequence**: The seed text is converted into a sequence of integers using the tokenizer and padded to the required length.

2. **Predict Next Word**: The model predicts the probabilities for the next word. The word with the highest probability is selected and appended to the seed text.

3. **Repeat for Multiple Words**: The process repeats for the number of words specified by `next_words`, generating and appending each predicted word to the seed text.


In [53]:
# Function to predict the next word(s) given a seed text
# next_words is the number of words to predict (e.g. next_words=2 predicts two words)
def predict_next_word(seed_text, next_words=2):
    for _ in range(next_words):
        # Convert the seed text to a sequence of integers
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        # Pad the sequence to match the input length the model expects
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        # Predict the probabilities of the next word in the sequence
        predicted = model.predict(token_list, verbose=0)
        # Get the index of the word with the highest probability
        predicted_word_index = np.argmax(predicted, axis=-1)[0]
        # Look up the word corresponding to the predicted index
        predicted_word = tokenizer.index_word[predicted_word_index]
        # Append the predicted word to the seed text
        seed_text += " " + predicted_word
    return seed_text
     

### **Predict and Print Next Word**

To predict and print a single next word, pass the seed text and `1` for `next_words`:




In [66]:
print(predict_next_word("Artifitial Intellegence ",1))

Artifitial Intellegence  learning
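
The seed in the cell above is misspelled, and a tokenizer created without an out-of-vocabulary token drops words it has never seen. The sketch below imitates that lookup in plain Python to show the consequence; the helper `encode` is hypothetical, and the two dictionary entries are copied from the printed vocabulary.

```python
def encode(seed_text, word_index):
    """Turn text into token ids, silently dropping words missing from word_index."""
    return [word_index[w] for w in seed_text.lower().split() if w in word_index]

toy_word_index = {"artificial": 30, "intelligence": 19}  # two entries from the printed vocabulary

print(encode("Artificial Intelligence", toy_word_index))   # [30, 19]
print(encode("Artifitial Intellegence ", toy_word_index))  # []
```

An empty token list is padded to all zeros, so the model predicts from a context that carries no information about the seed. Correcting the spelling gives it the two intended tokens to condition on.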


### **Conclusion**

This project explored the use of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks for word prediction tasks. Both RNN and LSTM models were implemented to predict the next word(s) given a seed text.

### **Key Steps:**
- **Data Preparation**: Tokenized sentences, padded sequences, and encoded outputs for model training.
- **Model Definition**: Constructed RNN and LSTM models to process sequential data and generate predictions.
- **Prediction**: Used models to extend seed text with predicted words.

### **Evaluation:**
- **Model Performance**: The effectiveness of the RNN and LSTM models was evaluated based on their ability to predict meaningful and contextually appropriate words. 
- **Accuracy**: Training accuracy and categorical cross-entropy loss were tracked during training to assess model performance.
- **Output Quality**: Predictions were qualitatively assessed by examining the coherence and relevance of the generated text.
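
To put the reported loss values in context, it helps to compare them with what a model would score by guessing every word with equal probability. This is a small worked calculation, assuming only the vocabulary size of 435 printed earlier; the variable names are illustrative.

```python
import math

vocab_size = 435  # total_words printed earlier
uniform_loss = math.log(vocab_size)  # cross-entropy of guessing every word equally
uniform_accuracy = 1 / vocab_size    # chance of a uniform guess being right

print(round(uniform_loss, 4))      # 6.0753
print(round(uniform_accuracy, 4))  # 0.0023
```

The first-epoch losses in the training logs (6.0738 for the RNN and 6.0606 for the LSTM) sit very close to this baseline, which suggests both models start out near a uniform guess and that later epochs should be judged by how far the loss falls below roughly 6.08.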

Overall, the RNN and LSTM models demonstrated their capability to handle sequential data and provide contextually relevant predictions. Future work could involve fine-tuning hyperparameters, exploring more complex architectures, or expanding the dataset to further enhance model performance.
