<a href="https://colab.research.google.com/github/alikaiser12/AI/blob/main/LSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Predicting Next Word in a Sentence with LSTM


In this example, we train an LSTM model to predict the next word in a sentence. We take a small dataset of sentences and train the model to predict the next word given the words seen so far.

# Step 1: Import the Required Libraries
We will use the Keras API with TensorFlow to build our LSTM model.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split


Numpy is used for numerical operations (arrays and matrices).

TensorFlow and Keras are used to build and train the LSTM model.

pad_sequences is used to ensure all input sequences have the same length.

train_test_split splits data into training and testing sets. It is imported here but not used in the rest of this example.

# Step 2: Prepare the Dataset
We will create a simple dataset of sentences, each consisting of words. Our task is to predict the next word in the sequence.

In [2]:
# Example sentences
sentences = ['i love machine learning', 'i love deep learning', 'deep learning is amazing']

# Create a dictionary of unique words (vocabulary)
vocab = set(' '.join(sentences).split())  # Extract unique words
word_to_index = {word: i+1 for i, word in enumerate(vocab)}  # Map words to integers
index_to_word = {i: word for word, i in word_to_index.items()}  # Reverse mapping

# Convert sentences into sequences of integers
sequences = [[word_to_index[word] for word in sentence.split()] for sentence in sentences]

# Prepare the input and output sequences for next word prediction
X_list = []
y_list = []
for seq in sequences:
    for i in range(1, len(seq)):
        # Input is the sequence up to the current word
        X_list.append(seq[:i])
        # Output is the next word
        y_list.append(seq[i])

# Pad the input sequences and convert to numpy array
X = pad_sequences(X_list, padding='pre')

# Reshape X to (samples, timesteps, features). We have 1 feature per word.
X = X.reshape((X.shape[0], X.shape[1], 1))

# Convert y to a numpy array
y = np.array(y_list)

Vocabulary: We create a set of unique words from the sentences.

Mapping Words to Integers: Each word is mapped to a unique integer, allowing the model to process the text.

Input and Output Sequences: We create input sequences (X) and output sequences (y) where the output is the next word in the sequence.
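To make the input/output construction concrete, the sketch below prints the prefix and target pairs that the same loop produces for a single sentence. It works on words rather than integer indices for readability, and the helper name `make_pairs` is hypothetical (it is not part of the notebook).

```python
def make_pairs(tokens):
    """Return (prefix, next_token) pairs for one tokenized sentence."""
    pairs = []
    for i in range(1, len(tokens)):
        # Everything before position i is the input; token i is the target.
        pairs.append((tokens[:i], tokens[i]))
    return pairs


pairs = make_pairs("i love machine learning".split())
for prefix, target in pairs:
    print(prefix, "->", target)
# ['i'] -> love
# ['i', 'love'] -> machine
# ['i', 'love', 'machine'] -> learning
```

A sentence of n words yields n - 1 pairs, which is why the three four-word sentences produce 9 training samples.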

# Step 3: Padding the Sequences
Keras passes each batch to the LSTM as a single array, so every input sequence must have the same length. We use padding to bring all sequences to equal length.

In [3]:
# This cell is no longer needed as padding is done in the previous cell.

pad_sequences ensures that all sequences are padded to the same length. It adds zeros at the beginning of shorter sequences (pre-padding).
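As a rough illustration of what pre-padding does, here is a NumPy-only sketch. The helper `pre_pad` is hypothetical and only mimics the behaviour described above for the simple case (no `maxlen`, no truncation); it is not the Keras implementation.

```python
import numpy as np


def pre_pad(sequences, value=0):
    """Left-pad integer sequences with `value` up to the longest length."""
    maxlen = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        if seq:
            # Right-align the sequence so the padding sits at the start.
            padded[row, maxlen - len(seq):] = seq
    return padded


padded = pre_pad([[3], [3, 5], [3, 5, 6]])
print(padded)
# [[0 0 3]
#  [0 3 5]
#  [3 5 6]]
```

With pre-padding, the most recent words always sit at the final timesteps, which is where the LSTM finishes reading the sequence.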

# Step 4: Reshaping the Input Data
Before feeding the data into the LSTM model, we need to reshape it into the format (samples, timesteps, features).

In [4]:
# This cell is no longer needed as reshaping is done in the previous cell.

Reshape: The LSTM expects data in the shape (samples, timesteps, features). In this case, we have 1 feature per word.
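The reshape only adds a trailing axis of size 1; no values change. A small sketch with made-up padded values (the array below is an assumption for illustration, not the notebook's data):

```python
import numpy as np

# A padded batch of 3 sequences, each 3 timesteps long (hypothetical values).
padded = np.array([[0, 0, 3],
                   [0, 3, 5],
                   [3, 5, 6]])

# Add a trailing axis so each timestep carries a single feature.
X_demo = padded.reshape((padded.shape[0], padded.shape[1], 1))

print(padded.shape)      # (3, 3)
print(X_demo.shape)      # (3, 3, 1)
print(X_demo[1, :, 0])   # [0 3 5]
```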

# Step 5: Build the LSTM Model
Now we can create the LSTM model using Keras. Here's a simple model with one LSTM layer and one Dense output layer.

In [5]:
model = Sequential()

# Add an LSTM layer with 50 units (neurons) and the input shape of the sequence length and number of features
model.add(LSTM(50, input_shape=(X.shape[1], 1), activation='relu'))

# Add a Dense layer to predict the next word (the output of the LSTM)
model.add(Dense(len(vocab), activation='softmax'))  # Number of words in the vocab as the output size

# Compile the model with sparse categorical crossentropy loss and Adam optimizer
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])




LSTM Layer: The LSTM layer processes sequential data. We specify the number of units (neurons) as 50 and the input shape as (sequence_length, 1) because each word is a single feature.

Dense Layer: The Dense layer produces the output prediction, which is a probability distribution over the vocabulary.

Compile: We use sparse_categorical_crossentropy loss for multi-class classification (predicting the next word) and Adam optimizer for training.
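To show what the softmax output and the sparse loss compute, here is a NumPy sketch under simplified assumptions. The function names and the example scores are illustrative and are not the Keras internals. Note that the labels are plain integer class ids, and each one must be a valid column index into the probability matrix.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability assigned to each integer label."""
    rows = np.arange(len(labels))
    return float(-np.log(probs[rows, labels]).mean())


# Hypothetical scores for 2 samples over a 3-word vocabulary.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 0.5]])
labels = np.array([0, 2])  # integer class ids, not one-hot vectors

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)
print(probs.sum(axis=1))  # each row sums to 1
print(loss)
```

The "sparse" variant is convenient here because `y` already holds integer word indices, so no one-hot encoding step is needed.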

# Step 6: Train the LSTM Model
Now, let's train the LSTM model on our data.

In [6]:
print("Shape of X:", X.shape)
print("Shape of y:", y.shape)

# Train for 100 epochs with a batch size of 2
model.fit(X, y, epochs=100, batch_size=2)

Shape of X: (9, 3, 1)
Shape of y: (9,)
Epoch 1/100


InvalidArgumentError: Graph execution error:

Detected at node compile_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits defined at (most recent call last):
  [IPython and Jupyter kernel frames omitted]
  File "/tmp/ipython-input-6-736608511.py", line 5, in <cell line: 0>
  [Keras internal frames omitted]
  File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/nn.py", line 744, in sparse_categorical_crossentropy

Received a label value of 7 which is outside the valid range of [0, 7).  Label values: 2 7
	 [[{{node compile_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_multi_step_on_iterator_2094]

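Training: We call the fit() method to train the model for 100 epochs with a batch size of 2. As the traceback above shows, this run stops in the first epoch: the word indices from Step 2 start at 1, so the labels go up to 7, while a Dense layer with 7 outputs only accepts labels 0 through 6. The complete listing at the end of this notebook avoids the error by indexing words from 0.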
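The label-range error above can be reproduced without TensorFlow. The sketch below checks whether every word index fits the output layer and compares two possible fixes. The helper `labels_fit` is hypothetical, and the vocabulary is sorted only to make the demo deterministic (a plain `set` has no fixed order).

```python
sentences = ['i love machine learning', 'i love deep learning', 'deep learning is amazing']
vocab = sorted(set(' '.join(sentences).split()))  # sorted for a deterministic demo


def labels_fit(word_to_index, num_outputs):
    """True if every index is a valid class id in [0, num_outputs)."""
    return all(0 <= idx < num_outputs for idx in word_to_index.values())


one_based = {word: i + 1 for i, word in enumerate(vocab)}   # indices 1..7
zero_based = {word: i for i, word in enumerate(vocab)}      # indices 0..6

print(labels_fit(one_based, len(vocab)))       # False: index 7 has no output unit
print(labels_fit(one_based, len(vocab) + 1))   # True: one extra unit, 0 stays free for padding
print(labels_fit(zero_based, len(vocab)))      # True: words indexed from 0
```

Either fix removes the error. Keeping 1-based indices with one extra output unit has the side benefit that index 0 remains reserved for padding.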

# Step 7: Predict the Next Word
Once the model is trained, we can use it to predict the next word in a sentence.

In [7]:
test_sentence = 'i love'
test_sequence = [word_to_index[word] for word in test_sentence.split()]
test_sequence = pad_sequences([test_sequence], padding='pre', maxlen=X.shape[1])
test_sequence = test_sequence.reshape((test_sequence.shape[0], test_sequence.shape[1], 1))

predicted_index = model.predict(test_sequence)
predicted_word = index_to_word[np.argmax(predicted_index)]
print(f"The next word after '{test_sentence}' is: {predicted_word}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 220ms/step
The next word after 'i love' is: love


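Prediction: We convert the test sentence into a sequence of integers, pad it, and reshape it to match the input format expected by the LSTM. We then predict the next word by selecting the index with the highest probability (argmax) and mapping it back to a word. Because the training run above did not complete, the output shown here comes from a model that has not learned the data, so the predicted word is not meaningful.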
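The decoding step can be sketched on its own with NumPy. The probabilities below are made up for illustration, and the 0-based mapping is the one printed by the complete listing at the end of this notebook. With the 1-based mapping from Step 2, an argmax of 0 would have no entry in `index_to_word`.

```python
import numpy as np

# 0-based mapping, as printed by the complete listing; probabilities are hypothetical.
index_to_word = {0: 'amazing', 1: 'deep', 2: 'i', 3: 'is',
                 4: 'learning', 5: 'love', 6: 'machine'}
probs = np.array([[0.02, 0.45, 0.03, 0.04, 0.05, 0.06, 0.35]])

# model.predict returns shape (batch, vocab_size); take the first row.
row = probs[0]
best = int(np.argmax(row))
print(index_to_word[best])  # deep

# Rank the top 3 candidates instead of keeping only the single best one.
top3 = [int(i) for i in np.argsort(row)[::-1][:3]]
print([index_to_word[i] for i in top3])  # ['deep', 'machine', 'love']
```

Looking at more than one candidate is a quick way to see whether the model is confident or is splitting probability between plausible continuations.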

# Detailed Explanation of the Code:
Data Preprocessing: We convert sentences into sequences of integers. Each word is mapped to a unique integer.

Padding: We pad the sequences so that they all have the same length. This is necessary because each batch is passed to the RNN or LSTM as one fixed-shape array.

Reshaping: The LSTM expects data to have a specific shape: (samples, timesteps, features).

Model Creation: We create an LSTM model with one LSTM layer and one Dense output layer. The LSTM layer learns patterns from the sequential data.

Training: We train the model on our input and output data, allowing it to learn the relationships between the words.

Prediction: Once trained, the model can predict the next word given an input sequence.

# Summary:
LSTMs are designed to handle sequential data and can remember long-term dependencies.

In this example, we used an LSTM to predict the next word in a sentence.

LSTMs are particularly useful for tasks like text generation, language translation, and speech recognition.

The complete code is repeated below as a single cell, with one change: words are indexed from 0, so every label falls within the range of the Dense layer's outputs.

In [11]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Example sentences
sentences = ['i love machine learning', 'i love deep learning', 'deep learning is amazing']

# Create a dictionary of unique words (vocabulary)
vocab = sorted(list(set(' '.join(sentences).split()))) # Extract unique words and sort for consistent indexing
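# Note: index 0 ('amazing') doubles as the pad value that pad_sequences inserts,
# so padding is indistinguishable from that word in this simple setup
word_to_index = {word: i for i, word in enumerate(vocab)}  # Map words to integers, starting from 0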
index_to_word = {i: word for word, i in word_to_index.items()}  # Reverse mapping

# Convert sentences into sequences of integers
sequences = [[word_to_index[word] for word in sentence.split()] for sentence in sentences]

# Prepare the input and output sequences for next word prediction
X_list = []
y_list = []
for seq in sequences:
    for i in range(1, len(seq)):
        # Input is the sequence up to the current word
        X_list.append(seq[:i])
        # Output is the next word
        y_list.append(seq[i])

# Pad the input sequences and convert to numpy array
X = pad_sequences(X_list, padding='pre')

# Reshape X to (samples, timesteps, features). We have 1 feature per word.
X = X.reshape((X.shape[0], X.shape[1], 1))

# Convert y to a numpy array (output labels)
y = np.array(y_list)

# Print debug information
print("Vocabulary:", vocab)
print("Word to index mapping:", word_to_index)
print("Unique values in y:", np.unique(y))
print("Shape of X:", X.shape)
print("Shape of y:", y.shape)


# Build the LSTM model
model = Sequential()

# Add an LSTM layer with 50 units (neurons) and the input shape of (sequence_length, 1)
model.add(LSTM(50, input_shape=(X.shape[1], 1), activation='relu'))

# Add a Dense layer to predict the next word (the output of the LSTM)
model.add(Dense(len(vocab), activation='softmax'))  # Number of words in the vocab as the output size

# Compile the model with sparse categorical crossentropy loss and Adam optimizer
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model on the data
model.fit(X, y, epochs=100, batch_size=1)

# Make a prediction for the next word after a given sentence
test_sentence = 'i love'
test_sequence = [word_to_index[word] for word in test_sentence.split()]
test_sequence = pad_sequences([test_sequence], padding='pre', maxlen=X.shape[1])
test_sequence = test_sequence.reshape((test_sequence.shape[0], test_sequence.shape[1], 1))

predicted_index = model.predict(test_sequence)
predicted_word = index_to_word[np.argmax(predicted_index)]
print(f"The next word after '{test_sentence}' is: {predicted_word}")

Vocabulary: ['amazing', 'deep', 'i', 'is', 'learning', 'love', 'machine']
Word to index mapping: {'amazing': 0, 'deep': 1, 'i': 2, 'is': 3, 'learning': 4, 'love': 5, 'machine': 6}
Unique values in y: [0 1 3 4 5 6]
Shape of X: (9, 3, 1)
Shape of y: (9,)
Epoch 1/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.1107 - loss: 1.9186
Epoch 2/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6320 - loss: 1.8891
Epoch 3/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.3287 - loss: 1.8698
Epoch 4/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.3019 - loss: 1.8956
Epoch 5/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.2152 - loss: 1.9530
Epoch 6/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6320 - loss: 1.7733
Epoch 7/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.1060 - loss: 1.9659
Epoch 8/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5987 - loss: 1.7132
Epoch 9/100
9/9 ━━━━━━━━━━━━━━━━━━━━