<a href="https://colab.research.google.com/github/alikaiser12/AI/blob/main/RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Predicting Next Word in a Sentence with an RNN**


Let's say you want to predict the next word in a sentence. For simplicity, we'll use a small dataset with just a few sentences.

# Step 1: Import the Required Libraries

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences


NumPy is used for numerical operations on arrays and matrices.

TensorFlow is a library that helps with building neural networks. Keras is a part of TensorFlow, and it's used to build models easily.

pad_sequences pads the input sequences to a common length so that they can be stacked into a single array and fed to the RNN in batches.

# Step 2: Prepare the Data

We'll use a simple dataset of sentences. In this example, we'll pretend we're training an RNN to predict the next word in a sentence.

In [2]:
# Example sentences
sentences = ['i love machine learning', 'i love deep learning', 'deep learning is amazing']

# Build the vocabulary of unique words
vocab = sorted(set(' '.join(sentences).split()))  # Sorted so the word indices are the same on every run
word_to_index = {word: i+1 for i, word in enumerate(vocab)}  # Map words to integers
index_to_word = {i: word for word, i in word_to_index.items()}  # Reverse mapping

# Convert sentences into sequences of integers
sequences = [[word_to_index[word] for word in sentence.split()] for sentence in sentences]

# Prepare the input and output sequences
X = [seq[:-1] for seq in sequences]  # All words except the last one
y = [seq[1:] for seq in sequences]  # All words except the first one (target word)


Vocabulary: We create a set of unique words from all the sentences.

Mapping Words to Integers: We map each word to a unique integer for the model to process.

Input and Output Sequences: We split the sentences into input (X) and output (y). For example, for the sentence "i love machine learning", the input would be ["i", "love", "machine"] and the output would be ["love", "machine", "learning"].
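
As a concrete illustration of the three points above, the following sketch encodes the first sentence only and splits it into input and target. Sorting the vocabulary is an assumption made here so that the printed indices are reproducible; the variable names `x_example` and `y_example` are hypothetical.

```python
# A minimal sketch: encode one sentence and split it into input and target.
sentence = "i love machine learning"

# Sorting is an assumption made here so the indices are the same on every run.
vocab = sorted(set(sentence.split()))
word_to_index = {word: i + 1 for i, word in enumerate(vocab)}  # 0 is left free for padding

sequence = [word_to_index[word] for word in sentence.split()]
x_example = sequence[:-1]  # indices for "i love machine"
y_example = sequence[1:]   # indices for "love machine learning"

print(word_to_index)
print(x_example, y_example)
```

Each position in `y_example` is the word that follows the word at the same position in `x_example`.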

# Step 3: Padding Sequences
RNNs expect input sequences of the same length. We use padding to make sure all sequences have the same length.

In [3]:
X = pad_sequences(X, padding='pre')
y = pad_sequences(y, padding='pre')


pad_sequences ensures that all sequences have the same length. It adds zeros at the beginning (pre-padding) of shorter sequences.
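
To show what pre-padding does without relying on Keras, here is a rough pure-Python sketch. The helper `pre_pad` is hypothetical and only covers padding to the longest sequence; it does not reproduce the other options of pad_sequences, such as truncation.

```python
def pre_pad(sequences, value=0):
    """Left-pad every sequence with `value` up to the longest length (hypothetical helper)."""
    max_len = max(len(seq) for seq in sequences)
    return [[value] * (max_len - len(seq)) + list(seq) for seq in sequences]

# Made-up index sequences of different lengths
padded = pre_pad([[4], [4, 2], [4, 2, 7]])
print(padded)  # [[0, 0, 4], [0, 4, 2], [4, 2, 7]]
```

Because the zeros go at the front, the most recent words always sit at the end of each sequence, which is the last thing the RNN reads.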

# Step 4: Build the RNN Model
Now, let's build a simple RNN model to learn the patterns in our data.

In [4]:
model = Sequential()

# Add an RNN layer with 50 units (neurons) and the input shape of the sequence length and number of features
model.add(SimpleRNN(50, input_shape=(X.shape[1], 1), activation='relu'))

# Add a Dense layer to predict the next word (the output of the RNN)
model.add(Dense(len(vocab), activation='softmax'))  # Number of words in the vocab as the output size

# Compile the model with sparse categorical crossentropy loss and the Adam optimizer
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])




SimpleRNN: This is the actual RNN layer. We specify the number of units (neurons) to be 50. The input shape is (sequence_length, 1) because each word is a single feature (we could extend this if we had more features per word).

Dense Layer: This is the output layer that will predict the next word. We use the softmax activation to output a probability distribution over the vocabulary.

Compile: We use sparse_categorical_crossentropy loss because we are dealing with multi-class classification (predicting one of many words) and Adam optimizer to minimize the loss.
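
The softmax output and the loss described above can be sketched in plain Python. This is an illustration of the underlying formulas only, with made-up scores for a three-word vocabulary; the Keras implementation adds details such as clipping and batching that are not shown here.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(target_index, probs):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(probs[target_index])

logits = [2.0, 1.0, 0.1]   # made-up scores for a 3-word vocabulary
probs = softmax(logits)
loss = sparse_categorical_crossentropy(0, probs)  # the correct word has index 0
print(probs, loss)
```

The "sparse" part means the target is a single integer index rather than a one-hot vector, which is why the targets can stay as plain word indices.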

# Step 5: Reshaping the Input Data
Before training, we need to reshape our input data so that it matches the expected shape for the RNN.

In [5]:
X = np.array(X)  # Convert X to a numpy array
X = X.reshape((X.shape[0], X.shape[1], 1))  # Reshape the input to (samples, timesteps, features)


Reshape: The RNN expects the data in the shape (samples, timesteps, features). Here, samples is the number of sequences, timesteps is the length of each sequence, and features is the number of features per timestep (in this case, 1 for each word).
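
The effect of this reshape can be checked on a small array. The values in `X_demo` below are made up for illustration and are not the notebook's data.

```python
import numpy as np

# Three made-up sequences of three word indices each: shape (samples, timesteps)
X_demo = np.array([[1, 2, 3],
                   [1, 2, 4],
                   [4, 3, 5]])

# Add a trailing axis of size 1: shape (samples, timesteps, features)
X_3d = X_demo.reshape((X_demo.shape[0], X_demo.shape[1], 1))

print(X_demo.shape)       # (3, 3)
print(X_3d.shape)         # (3, 3, 1)
print(X_3d[0].tolist())   # [[1], [2], [3]]
```

The values are unchanged; each word index is simply wrapped in its own length-1 feature vector.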

# Step 6: Train the Model
Now, we train the model on our data.

In [7]:
# Reshape y to match the expected shape for sparse_categorical_crossentropy
y = np.array(y)
y = y.reshape((y.shape[0], y.shape[1]))

model.fit(X, y, epochs=100, batch_size=1)

Epoch 1/100


ValueError: Argument `output` must have rank (ndim) `target.ndim - 1`. Received: target.shape=(1, 3), output.shape=(1, 7)

# Step 2: Prepare the Data (Corrected)

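The error above occurs because the model emits a single probability distribution per input sequence (output shape (1, 7)), while y holds three target words per sequence (target shape (1, 3)). To fix this, we re-prepare the data so that every training example pairs a sentence prefix with exactly one next word.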

In [8]:
# Example sentences
sentences = ['i love machine learning', 'i love deep learning', 'deep learning is amazing']

# Build the vocabulary of unique words
vocab = sorted(set(' '.join(sentences).split()))  # Sorted so the word indices are the same on every run
vocab_size = len(vocab) + 1 # Add 1 for padding
word_to_index = {word: i+1 for i, word in enumerate(vocab)}  # Map words to integers
index_to_word = {i: word for word, i in word_to_index.items()}  # Reverse mapping

# Convert sentences into sequences of integers
sequences = [[word_to_index[word] for word in sentence.split()] for sentence in sentences]

# Prepare the input and output sequences
X = []
y = []
for seq in sequences:
    for i in range(1, len(seq)):
        X.append(seq[:i])
        y.append(seq[i])

# Pad the input sequences
max_sequence_length = max([len(seq) for seq in X])
X = pad_sequences(X, maxlen=max_sequence_length, padding='pre')

# Convert y to numpy array
y = np.array(y)
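
To see what the nested loop above produces, the sketch below applies the same prefix logic to the words of one sentence instead of their indices. The helper name `prefix_pairs` is hypothetical.

```python
def prefix_pairs(words):
    """Return (prefix, next_word) pairs for one tokenized sentence (hypothetical helper)."""
    return [(words[:i], words[i]) for i in range(1, len(words))]

pairs = prefix_pairs("i love machine learning".split())
for prefix, target in pairs:
    print(prefix, "->", target)
# ['i'] -> love
# ['i', 'love'] -> machine
# ['i', 'love', 'machine'] -> learning
```

A four-word sentence yields three pairs, so the three sentences in this dataset give nine training examples in total, each with a single target word.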

# Step 3: Build the RNN Model (Corrected)

Now, let's build the RNN model with the correct input shape based on the padded sequences.

In [9]:
model = Sequential()

# Add an embedding layer to represent words as dense vectors
model.add(tf.keras.layers.Embedding(vocab_size, 50, input_length=max_sequence_length))

# Add an RNN layer
model.add(SimpleRNN(50, return_sequences=False)) # return_sequences=False because we predict one word at the end of the sequence

# Add a Dense layer to predict the next word
model.add(Dense(vocab_size, activation='softmax'))  # Number of words in the vocab as the output size

# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()
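
As a sanity check on the summary, the trainable parameter counts can be worked out by hand. The sketch below assumes vocab_size is 8 (the 7 unique words in the sentences plus the padding index) and uses the standard weight shapes for Embedding, SimpleRNN, and Dense layers.

```python
vocab_size = 8      # 7 unique words + 1 padding index
embedding_dim = 50  # output size of the Embedding layer
rnn_units = 50      # units in the SimpleRNN layer

embedding_params = vocab_size * embedding_dim              # one vector per index
rnn_params = rnn_units * (embedding_dim + rnn_units + 1)   # input weights, recurrent weights, bias
dense_params = rnn_units * vocab_size + vocab_size         # weights plus bias

total = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total)   # 400 5050 408 5858
```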



# Step 4: Train the Model (Corrected)

Finally, train the corrected model.

In [10]:
model.fit(X, y, epochs=100, batch_size=1)

Epoch 1/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 6ms/step - accuracy: 0.1004 - loss: 2.0767    
Epoch 2/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.6866 - loss: 1.9687 
Epoch 3/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.5783 - loss: 1.8623 
Epoch 4/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.6991 - loss: 1.7156 
Epoch 5/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.5749 - loss: 1.6116 
Epoch 6/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.4908 - loss: 1.4698     
Epoch 7/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step - accuracy: 0.6713 - loss: 1.4530     
Epoch 8/100
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - accuracy: 0.6181 - loss: 1.1876     
Epoch 9/100
[1m9/9[0m [32m━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7fa92b54fe10>

Training: We fit the model on our input data (X) and target data (y). We train for 100 epochs, using a batch size of 1 (since we have a small dataset).
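
The `9/9` shown in each epoch of the log is the number of batches per epoch. A quick sketch of that arithmetic, assuming the nine (prefix, next word) pairs built from the three sentences:

```python
import math

num_samples = 9   # three sentences, three (prefix, next word) pairs each
batch_size = 1
epochs = 100

steps_per_epoch = math.ceil(num_samples / batch_size)  # batches processed per epoch
total_updates = steps_per_epoch * epochs               # weight updates over the whole run
print(steps_per_epoch, total_updates)  # 9 900
```

With a batch size of 1, the weights are updated after every single example.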

# Step 7: Predict the Next Word
Once the model is trained, we can use it to predict the next word for a given input.

In [11]:
test_sentence = 'i love'
test_sequence = [word_to_index[word] for word in test_sentence.split()]
# Pad to the training length; the Embedding layer takes 2-D integer input, so no reshape is needed
test_sequence = pad_sequences([test_sequence], padding='pre', maxlen=X.shape[1])

predicted_probs = model.predict(test_sequence)  # One probability per vocabulary index
predicted_word = index_to_word[np.argmax(predicted_probs)]
print(f"The next word after '{test_sentence}' is: {predicted_word}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 223ms/step
The next word after 'i love' is: deep


Prediction: We convert the test sentence into a sequence of integers and pad it to the length used in training. The model then outputs a probability for every index in the vocabulary, and we select the word with the highest probability (argmax). Since both "machine" and "deep" follow "i love" in the training sentences, either one is a plausible prediction.
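
Looking only at the argmax hides how close the alternatives are. The sketch below ranks the top candidates from a probability vector; the helper `top_k_words`, the tiny mapping, and the probabilities are all made up for illustration and do not come from the trained model.

```python
def top_k_words(probs, index_to_word, k=2):
    """Return the k most probable (word, probability) pairs, skipping unmapped indices."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(index_to_word[i], probs[i]) for i in ranked if i in index_to_word][:k]

# Made-up values; index 0 is the padding index and has no word.
index_to_word = {1: "deep", 2: "machine", 3: "learning"}
probs = [0.05, 0.50, 0.40, 0.05]

best = top_k_words(probs, index_to_word, k=2)
print(best)  # [('deep', 0.5), ('machine', 0.4)]
```

Skipping unmapped indices also avoids a lookup failure in the unlikely case that the padding index receives the highest probability.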

# Detailed Explanation of the Code:
Data Preprocessing: We convert sentences into numerical sequences because RNNs work with numbers, not text. We also pad sequences to ensure they have the same length.

Building the RNN: The RNN processes sequences of data. In the corrected model, an Embedding layer first maps each word index to a dense vector, and the SimpleRNN layer then models the sequential relationships between words.

Training: We use the fit() method to train the model. The model learns to predict the next word based on the sequences it sees during training.

Prediction: After training, the model can predict the next word in a sentence by processing the input sequence.
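
For readers who want to see what a simple RNN layer computes internally, here is a rough NumPy sketch of the standard recurrence h_t = tanh(x_t W_x + h_{t-1} W_h + b). The sizes and random weights are made up, and the function `simple_rnn_forward` is a hypothetical illustration rather than the Keras implementation.

```python
import numpy as np

def simple_rnn_forward(inputs, W_x, W_h, b):
    """Apply h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b) over a sequence; return the last state."""
    h = np.zeros(W_h.shape[0])      # initial hidden state
    for x_t in inputs:              # one timestep at a time
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

rng = np.random.default_rng(0)
timesteps, features, units = 3, 4, 5        # made-up sizes
inputs = rng.normal(size=(timesteps, features))
W_x = rng.normal(size=(features, units))    # input-to-hidden weights
W_h = rng.normal(size=(units, units))       # hidden-to-hidden weights
b = np.zeros(units)

h_last = simple_rnn_forward(inputs, W_x, W_h, b)
print(h_last.shape)  # (5,)
```

The final hidden state summarizes the whole sequence, which is what the Dense layer receives when return_sequences=False.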