# Title: Accents & Agnostic Speech Transcription Model for User Enhanced Accessibility.

## Team - Titans Alliance.

### Introduction.

In today's digital age, education has moved beyond the traditional classroom, thanks to the proliferation of e-learning platforms. Amid this transformation, a significant challenge persists: ensuring that educational content is accessible to learners of diverse backgrounds and abilities. Imagine trying to learn from an online lecture, only to be hindered by accents or dialects that make it difficult to understand. This is the reality faced by many learners worldwide. Our project, driven by a passion for inclusivity and accessibility in education, aims to tackle this challenge. By developing a speech transcription model, we aspire to break down barriers and help learners from all walks of life engage effectively with educational content. To that end, we start by converting audio files into Mel-frequency cepstral coefficients (MFCCs), a representation widely used in audio signal processing. MFCCs summarize the shape of the sound spectrum on a perceptual frequency scale, which gives the model compact features of the spoken signal. We then perform a train-test split so that the model is trained on one part of the dataset and evaluated on unseen data.

We don't stop there. We evaluate each model using the loss function and the accuracy between the predicted outputs and the actual labels. These metrics allow us to assess how accurately the model transcribes speech and how well it generalizes to new data. The outputs could be decoded further into text with a decode function, but that part is left out because of constraints on computational resources. For now the outputs remain in the form of index sequences, so we transformed our labels into sequences as well; further details are explained below in the Notebook. We compare the performance of different models, namely a simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), to identify the most effective approach.

Ultimately, our goal is to select the model that achieves the highest accuracy. By doing so, we can pave the way for a more inclusive and accessible future in ed-tech platforms, where every learner has the opportunity to thrive. So, let's get into it!

### 0. Import Required Packages

In [1]:
pip install SpeechRecognition

Collecting SpeechRecognition
  Downloading SpeechRecognition-3.10.3-py2.py3-none-any.whl.metadata (29 kB)
Downloading SpeechRecognition-3.10.3-py2.py3-none-any.whl (32.8 MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.10.3
Note: you may need to restart the kernel to use updated packages.


In [33]:
pip install keras

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install resampy

Collecting resampy
  Downloading resampy-0.4.3-py3-none-any.whl.metadata (3.0 kB)
Downloading resampy-0.4.3-py3-none-any.whl (3.1 MB)
Installing collected packages: resampy
Successfully installed resampy-0.4.3
Note: you may need to restart the kernel to use updated packages.


In [4]:
!pip install --upgrade librosa



In [3]:
from sklearn.feature_extraction.text import CountVectorizer
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import librosa
import sklearn
import nltk
import tarfile
import os
import resampy
import speech_recognition as sr
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import warnings
warnings.filterwarnings('ignore')


2024-05-02 12:11:08.998980: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-02 12:11:08.999100: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-02 12:11:09.186144: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


### 1. Load Data.

In this step we load the data into our Notebook. The data consists of audio files downloaded from the Mozilla Common Voice website, in which speakers record themselves speaking a phrase in English. We are going to convert these audio files into numeric inputs. The files are in `.mp3` format, so we first collect their paths; in the next step we convert them into Mel-Frequency Cepstral Coefficients (MFCCs), which describe the spectrum on a scale that approximates how the human ear perceives sound.
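
To make the perceptual frequency scale behind MFCCs more concrete, here is a minimal sketch of the mel scale. It assumes the common HTK-style conversion formula; librosa's default uses a slightly different variant, so the numbers are illustrative and not what `librosa.feature.mfcc` computes internally. The function names are our own, and the 3000 Hz upper limit is simply the Nyquist frequency for the 6000 Hz sample rate used later in this Notebook.

```python
import math

def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Return n_bands + 2 band-edge frequencies in Hz, evenly spaced in mels."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_bands + 1)
    return [mel_to_hz(mel_min + i * step) for i in range(n_bands + 2)]

edges = mel_band_edges(0.0, 3000.0, 10)
# Bands are narrow at low frequencies and widen towards high frequencies.
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print([round(e) for e in edges])
```

Equal steps in mels correspond to increasingly wide steps in Hz, which is why low frequencies get finer resolution in the resulting features.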

In [5]:
# Step 1: Load Audio Files
data_dir = "/kaggle/input/audiodata/clips"
file_paths = [os.path.join(data_dir, file) for file in os.listdir(data_dir) if file.endswith('.mp3')]

### 2. Data Preprocessing.

We first convert the audio files into MFCCs. Then we create the labels, that is, our target variable, using the SpeechRecognition library, which sends the audio to the Google Web Speech API and returns a transcript. Before that we have to change the file format, because this library reads `.wav` files but not files with the extension `.mp3`. So, let's do it.
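
Outside a notebook, the `!ffmpeg` shell magic is not available. The sketch below shows one possible way to run the same conversion with `subprocess`; it assumes `ffmpeg` is on the PATH, and the helper names `build_ffmpeg_command` and `convert_folder` are hypothetical, not part of this Notebook.

```python
import subprocess
from pathlib import Path

def build_ffmpeg_command(mp3_path: Path, wav_path: Path) -> list:
    """Build the argument list for converting one MP3 file to WAV."""
    # -y overwrites an existing output file; -i names the input file.
    return ["ffmpeg", "-y", "-i", str(mp3_path), str(wav_path)]

def convert_folder(mp3_folder: Path, wav_folder: Path) -> list:
    """Convert every .mp3 in mp3_folder and return the WAV paths written."""
    wav_folder.mkdir(parents=True, exist_ok=True)
    written = []
    for mp3_path in sorted(mp3_folder.glob("*.mp3")):
        wav_path = wav_folder / (mp3_path.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_command(mp3_path, wav_path),
                       check=True, capture_output=True)
        written.append(wav_path)
    return written
```

Passing the arguments as a list avoids shell quoting problems with unusual file names, and sorting the inputs gives a reproducible file order.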

In [7]:
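def preprocess_audio(audio_file):
    # Load as mono audio resampled to 6 kHz; 'kaiser_fast' relies on the resampy package
    audio, sample_rate = librosa.load(audio_file, sr=6000, mono=True, res_type='kaiser_fast')
    # 13 MFCCs per frame; the result has shape (13, n_frames)
    mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13)
    return mfccs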

In [8]:
preprocess =  [preprocess_audio(file_path) for file_path in file_paths]
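
The number of MFCC frames per clip depends on the clip length and the hop size. As a rough sketch, assuming librosa's default `hop_length` of 512 samples and centered frames (this Notebook does not override either), the frame count can be estimated without loading any audio. The helper names are our own.

```python
def mfcc_frame_count(n_samples: int, hop_length: int = 512) -> int:
    """Number of frames for a centered short-time analysis."""
    return 1 + n_samples // hop_length

def clip_seconds(n_frames: int, sample_rate: int, hop_length: int = 512) -> float:
    """Approximate clip duration that produces n_frames frames."""
    return (n_frames - 1) * hop_length / sample_rate

sample_rate = 6000  # same value as the sr argument passed to librosa.load
print(mfcc_frame_count(10 * sample_rate))        # 118 frames for a 10-second clip
print(round(clip_seconds(120, sample_rate), 2))  # 10.15
```

Under these assumptions, a maximum of 120 frames corresponds to a longest clip of roughly ten seconds.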

In [10]:
# Install ffmpeg (if not already installed)
!apt-get install ffmpeg

# Path to the folder containing the MP3 files
mp3_folder = "/kaggle/input/audiodata/clips"

# Path to the folder where you want to save the converted WAV files
wav_folder = "/kaggle/working/wav"

# Create the output folder if it doesn't exist
if not os.path.exists(wav_folder):
    os.makedirs(wav_folder)

# Iterate over each file in the MP3 folder
for mp3_file in os.listdir(mp3_folder):
    # Check if the file is an MP3 file
    if mp3_file.endswith('.mp3'):
        # Construct the paths for the input and output files
        mp3_path = os.path.join(mp3_folder, mp3_file)
        wav_path = os.path.join(wav_folder, os.path.splitext(mp3_file)[0] + '.wav')

        # Convert MP3 to WAV using ffmpeg
        !ffmpeg -i "$mp3_path" "$wav_path"

Reading package lists... Done
Building dependency tree       
Reading state information... Done
ffmpeg is already the newest version (7:4.2.7-0ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 65 not upgraded.
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable

Now that the format has been changed, it's time to transcribe the files using SpeechRecognition.

In [11]:
def transcribe_wav_files(wav_folder):
    transcripts = []
    recognizer = sr.Recognizer()

    # Iterate over each WAV file in the folder
    for wav_file in os.listdir(wav_folder):
        if wav_file.endswith('.wav'):
            # Construct the path to the WAV file
            wav_path = os.path.join(wav_folder, wav_file)

            # Load the WAV file
            with sr.AudioFile(wav_path) as source:
                audio_data = recognizer.record(source)

                # Perform speech recognition
                try:
                    transcript = recognizer.recognize_google(audio_data)
                    transcripts.append(transcript)
                except sr.UnknownValueError:
                    # Handle unrecognized speech
                    transcripts.append("Recognition failed for file: {}".format(wav_file))
                except sr.RequestError as e:
                    # Handle API request error
                    transcripts.append("Could not request results for file: {} (error: {})".format(wav_file, e))

    return transcripts
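
Note that the function above stores an error message as the transcript when recognition fails, so those messages would end up as training labels. One possible alternative, sketched here under the assumption that failures are recorded as `None` and keyed by file name (the `split_transcripts` helper and the sample data are hypothetical), is to separate usable transcripts from failed ones before training.

```python
def split_transcripts(results: dict) -> tuple:
    """Separate usable transcripts from failed recognitions.

    results maps a file name to its transcript, or to None when
    recognition failed for that file.
    """
    usable = {name: text for name, text in results.items() if text}
    failed = sorted(name for name, text in results.items() if not text)
    return usable, failed

# Hypothetical results; in the notebook these would come from the recognizer.
results = {
    "clip_a.wav": "he plays for Managua",
    "clip_b.wav": None,  # e.g. sr.UnknownValueError was raised
    "clip_c.wav": "lighter yellowish",
}
usable, failed = split_transcripts(results)
print(len(usable), failed)  # 2 ['clip_b.wav']
```

Keeping the file name with each transcript also makes it easier to match labels to features later.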

In [24]:
transcripts = transcribe_wav_files(wav_folder)
print(transcripts)

['bunny bunny bunny at Herbert Lehman High School in the Bronx', 'he plays for Managua', 'the hotel provides good views of the local Lake and the alveolar Alps', 'this must be an incredibly distressing time for them', 'Murray is married to Melissa and has two children', 'he joined Damian Blake and Neil Clark as new members of Letterkenny Town Council', 'she began gymnastics when she was 8 years old', 'lighter yellowish', 'me Teresa Darcy miento ISTA contribution', 'they suggested to her the writing of the life of Edgeworth', 'smolensky began his business activities on the black market of the so-called Shadow economy', 'the team would play the next five seasons decked all in blue', 'Insurance contributed to developing and disappointing King friendly safe for sex Technologies', 'driving 1 the series', 'it is native to Tropical Africa Trinidad and tropical Central and South America', 'Nancy unit turned Southwest word while lion Rock turned Eastward', 'he also became renowned for his off-a

Now we are going to preprocess our data so that we can use it to train our LSTM, GRU and RNN models. To ensure that the features (MFCC coefficients) and the targets (labels/transcripts) have the same sequence length, we pad both the feature and the target variables. So, let's do it.
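
As a compact illustration of the feature padding, the sketch below zero-pads arrays of shape `(n_mfcc, frames)` along the time axis and stacks them into one batch. The `pad_time_axis` helper and the stand-in arrays are hypothetical and only mirror the shapes used in this Notebook.

```python
import numpy as np

def pad_time_axis(features: list, target_frames: int = None) -> np.ndarray:
    """Zero-pad (n_mfcc, frames) arrays along axis 1 and stack them."""
    if target_frames is None:
        target_frames = max(f.shape[1] for f in features)
    padded = [
        np.pad(f, ((0, 0), (0, target_frames - f.shape[1])), mode="constant")
        for f in features
    ]
    return np.stack(padded)

# Hypothetical stand-ins for three clips with 13 coefficients each.
clips = [np.ones((13, 104)), np.ones((13, 49)), np.ones((13, 62))]
batch = pad_time_axis(clips)
print(batch.shape)  # (3, 13, 104)
```

The pad widths `((0, 0), (0, n))` leave the coefficient axis untouched and append `n` zero frames at the end of the time axis.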

In [26]:
audio_folder = "/kaggle/working/wav"

In [27]:
audio_dir = os.listdir(audio_folder)

In [28]:
file_transcript_mapping = {}

In [29]:
# Iterate over each index and its corresponding transcript
for i in range(len(file_paths)):
    # Construct the full file path
    full_audio_path = os.path.join(audio_folder, audio_dir[i])
    # Add the mapping to the dictionary
    file_transcript_mapping[full_audio_path] = transcripts[i]

# Access transcript for a specific audio file
specific_audio_file = "/kaggle/working/wav/common_voice_en_34893938.wav"
print("Transcript for", specific_audio_file, ":", file_transcript_mapping[specific_audio_file])

Transcript for /kaggle/working/wav/common_voice_en_34893938.wav : Apple talked over coax cable was the way macintosh's communicated
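
The mapping above relies on `os.listdir` returning files in the same order on every call, and the MFCC features were computed from the `.mp3` folder while the transcripts come from the `.wav` folder. If those orders ever differ, features and labels would be paired incorrectly. A more defensive option, sketched here with hypothetical helper names and placeholder data, is to match both sides by file name.

```python
import os

def stem(path: str) -> str:
    """File name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]

def align_features_and_labels(mp3_paths, features, wav_transcripts):
    """Pair each feature array with the transcript of the same clip.

    wav_transcripts maps a WAV path to its transcript. Clips without a
    transcript are skipped.
    """
    by_stem = {stem(p): t for p, t in wav_transcripts.items()}
    pairs = []
    for path, feature in zip(mp3_paths, features):
        text = by_stem.get(stem(path))
        if text is not None:
            pairs.append((feature, text))
    return pairs

# Hypothetical file names listed in different orders.
mp3_paths = ["clips/b.mp3", "clips/a.mp3"]
features = ["mfcc_b", "mfcc_a"]  # placeholders for MFCC arrays
wav_transcripts = {"wav/a.wav": "text a", "wav/b.wav": "text b"}
print(align_features_and_labels(mp3_paths, features, wav_transcripts))
```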


In [30]:
transcripts = list(file_transcript_mapping.values())

In [31]:
len(transcripts)

470

In [63]:
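# Find the maximum length along the first axis of the MFCC arrays.
# Note: each array has shape (13, n_frames), so len(mfcc) is always 13
# (the number of coefficients), not the number of time steps.
max_mfcc_length = max(len(mfcc) for mfcc in preprocess)

# Every array already has 13 rows, so this padding leaves the arrays
# unchanged; the time axis is padded in the cells below.
preprocess_padded = [np.pad(mfcc, ((0, max_mfcc_length - len(mfcc)), (0, 0)), mode='constant') for mfcc in preprocess]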

In [64]:
shapes = [np.array(p).shape for p in preprocess_padded]
print(shapes[:5])  # Shapes of the first five arrays

[(13, 104), (13, 49), (13, 62), (13, 77), (13, 72)]


In [65]:
# Find the maximum length in the second dimension
max_length = max(p.shape[1] for p in preprocess_padded)

In [66]:
max_length

120

In [67]:
# Pad each array to have the same second dimension length
padded_arrays = [np.pad(p, ((0, 0), (0, max_length - p.shape[1])), 'constant', constant_values=0) for p in preprocess_padded]

# Now convert to a single numpy array
uniform_array = np.stack(padded_arrays)

print("Shape of the unified array:", uniform_array.shape)

Shape of the unified array: (470, 13, 120)


In [69]:
from tensorflow.keras.preprocessing.text import Tokenizer

# Tokenize at the word level
word_tokenizer = Tokenizer(filters='', lower=False, split=' ')
word_tokenizer.fit_on_texts(transcripts)  # transcripts should be dataset text

# Convert texts to word-index sequences
word_sequences = word_tokenizer.texts_to_sequences(transcripts)

# Pad sequences to have a uniform length
from tensorflow.keras.preprocessing.sequence import pad_sequences
max_len = 120  # This should be chosen based on data
padded_word_sequences = pad_sequences(word_sequences, maxlen=max_len, padding='post', truncating='post')

In [70]:
uniform_array.shape

(470, 13, 120)

In [71]:
padded_word_sequences.shape

(470, 120)
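
The label length of 120 matches the 120 MFCC time steps, which the per-time-step models below require. Because transcripts are much shorter than 120 words, most label positions are padding. The following sketch, with hypothetical helper names and toy sequences, shows one way to measure the longest sequence and the share of padded positions.

```python
import math

def choose_max_len(sequences, coverage: float = 1.0) -> int:
    """Smallest length that fits at least `coverage` of the sequences."""
    lengths = sorted(len(seq) for seq in sequences)
    index = max(0, math.ceil(coverage * len(lengths)) - 1)
    return lengths[index]

def padding_fraction(sequences, max_len: int) -> float:
    """Share of positions that are padding after padding/truncating to max_len."""
    real = sum(min(len(seq), max_len) for seq in sequences)
    return 1.0 - real / (len(sequences) * max_len)

# Hypothetical word-index sequences of 4, 9 and 11 tokens.
sequences = [[1] * 4, [2] * 9, [3] * 11]
print(choose_max_len(sequences))                   # 11
print(round(padding_fraction(sequences, 120), 3))  # 0.933
```

In the Notebook, the same functions could be applied to `word_sequences` to see how much of `padded_word_sequences` is padding.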

In [98]:
# Assuming uniform_array is your MFCC data
X = np.transpose(uniform_array, (0,2,1))

In [99]:
X.shape

(470, 120, 13)

In [103]:
from tensorflow.keras.preprocessing.text import Tokenizer

# Default tokenizer (lowercases and strips punctuation); the models below use word_tokenizer instead
tokenizer = Tokenizer()
tokenizer.fit_on_texts(transcripts)

### 3. Model Building.

In this step we build models that take the padded MFCC sequences as input and the transcripts, converted into padded numeric sequences, as targets. Each model outputs a sequence that can be decoded into text and then scored with metrics such as Word Error Rate (WER). Because of our resource constraints, we choose our model based on the loss and accuracy on the validation set instead. The models can be enhanced further by decoding these output sequences, and we plan to do that in future work.
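
For reference, WER is the word-level edit distance between the reference transcript and the predicted transcript, divided by the number of reference words. The sketch below is a minimal plain-Python version for illustration; it is not used by the models in this Notebook, and the function name is our own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

print(word_error_rate("he plays for Managua", "he plays for Managua"))  # 0.0
print(word_error_rate("he plays for Managua", "he play for"))           # 0.5
```

In the second call one word is substituted and one is missing, giving two errors over four reference words.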

### Long Short-Term Memory (LSTM).

In [136]:
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Dropout
from keras.utils import to_categorical
import numpy as np

num_words = len(word_tokenizer.word_index) + 1

lstm_model = Sequential([
    LSTM(100, return_sequences=True, input_shape=(120, 13)),  # Increased number of units
    Dropout(0.5),  # Adding dropout
    LSTM(100, return_sequences=True),
    TimeDistributed(Dense(num_words, activation='softmax'))
])
lstm_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# X holds the MFCC features with shape (samples, 120, 13);
# padded_word_sequences holds the word-index labels with shape (samples, 120)
X = np.array(X)
# y = to_categorical(padded_word_sequences, num_classes=num_words)
X_train, X_test, y_train, y_test = train_test_split(X, padded_word_sequences, test_size = 0.2, random_state = 42)

y_train = to_categorical(y_train, num_classes = num_words)
y_test = to_categorical(y_test, num_classes = num_words)
# Training the model
lstm_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data= (X_test, y_test))

Epoch 1/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 9s 195ms/step - accuracy: 0.4900 - loss: 7.3426 - val_accuracy: 0.9229 - val_loss: 5.6665
Epoch 2/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 48ms/step - accuracy: 0.9225 - loss: 5.0192 - val_accuracy: 0.9229 - val_loss: 2.8081
Epoch 3/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 47ms/step - accuracy: 0.9228 - loss: 2.2489 - val_accuracy: 0.9229 - val_loss: 0.9987
Epoch 4/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 45ms/step - accuracy: 0.9234 - loss: 0.8865 - val_accuracy: 0.9229 - val_loss: 0.7538
Epoch 5/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 48ms/step - accuracy: 0.9242 - loss: 0.7253 - val_accuracy: 0.9229 - val_loss: 0.7329
Epoch 6/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 48ms/step - accuracy: 0.9236 - loss: 0.7036 - val_accuracy: 0.9229 - val_loss: 0.7084
Epoch 7/10
12/12 ━━━

<keras.src.callbacks.history.History at 0x7889d41c2500>

In [142]:
y_train.shape

(376, 120, 2016)
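
The one-hot label tensor is large because every time step carries a vector over the whole vocabulary. The sketch below estimates its size, assuming 4-byte values, and compares it with keeping the labels as integer indices. Keras provides a `sparse_categorical_crossentropy` loss that accepts integer labels directly, which would be one way to avoid the `to_categorical` step; the helper names here are our own.

```python
def one_hot_bytes(n_samples: int, n_steps: int, n_classes: int,
                  bytes_per_value: int = 4) -> int:
    """Memory needed for a dense one-hot label tensor."""
    return n_samples * n_steps * n_classes * bytes_per_value

def sparse_bytes(n_samples: int, n_steps: int, bytes_per_value: int = 4) -> int:
    """Memory needed when labels stay as integer class indices."""
    return n_samples * n_steps * bytes_per_value

dense = one_hot_bytes(376, 120, 2016)
sparse = sparse_bytes(376, 120)
print(f"one-hot: {dense / 1e6:.0f} MB, integer labels: {sparse / 1e6:.2f} MB")
```

For the training shape shown above this comes to roughly 364 MB for one-hot labels versus well under 1 MB for integer labels.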

In [138]:
predictions = lstm_model.predict(X_test)

3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step


### Gated Recurrent Unit (GRU).

In [143]:
from keras.models import Sequential
from keras.layers import GRU, Dense, TimeDistributed, Dropout

# Construct the model
gru_model = Sequential([
    GRU(200, return_sequences=True, input_shape=(120, 13)),  # Increased number of units
    Dropout(0.5),  # Adding dropout
    GRU(200, return_sequences=True),  # Increased number of units
    TimeDistributed(Dense(num_words, activation='softmax'))
])

# Compile the model
gru_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Print model summary
gru_model.summary()

In [144]:
gru_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data= (X_test, y_test))

Epoch 1/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 9s 198ms/step - accuracy: 0.6747 - loss: 6.5647 - val_accuracy: 0.9229 - val_loss: 1.4796
Epoch 2/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 56ms/step - accuracy: 0.9237 - loss: 1.0547 - val_accuracy: 0.9229 - val_loss: 0.6534
Epoch 3/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 55ms/step - accuracy: 0.9229 - loss: 0.6277 - val_accuracy: 0.9229 - val_loss: 0.6132
Epoch 4/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 53ms/step - accuracy: 0.9224 - loss: 0.5933 - val_accuracy: 0.9229 - val_loss: 0.5982
Epoch 5/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 55ms/step - accuracy: 0.9239 - loss: 0.5668 - val_accuracy: 0.9229 - val_loss: 0.5934
Epoch 6/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 53ms/step - accuracy: 0.9246 - loss: 0.5565 - val_accuracy: 0.9244 - val_loss: 0.5926
Epoch 7/10
12/12 ━━━

<keras.src.callbacks.history.History at 0x7889a6f0df30>

In [147]:
gru_predictions = gru_model.predict(X_train)

12/12 ━━━━━━━━━━━━━━━━━━━━ 3s 119ms/step


In [149]:
gru_evaluation = gru_model.evaluate(X_test, y_test)

3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - accuracy: 0.9238 - loss: 0.6023


### Recurrent Neural Network (RNN).

In [150]:
from keras.models import Sequential
from keras.layers import RNN, Dense, TimeDistributed, Dropout, SimpleRNNCell


# Construct the model
rnn_model = Sequential([
    RNN(SimpleRNNCell(200), return_sequences=True, input_shape=(120, 13)),  # Increased number of units
    Dropout(0.5),  # Adding dropout
    RNN(SimpleRNNCell(200), return_sequences=True),  # Increased number of units
    TimeDistributed(Dense(num_words, activation='softmax'))
])

# Compile the model
rnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Print model summary
rnn_model.summary()

In [151]:
rnn_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data= (X_test, y_test))

Epoch 1/10
I0000 00:00:1714665076.877828    3611 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
W0000 00:00:1714665342.706808    3612 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
12/12 ━━━━━━━━━━━━━━━━━━━━ 570s 27s/step - accuracy: 0.4652 - loss: 6.1405 - val_accuracy: 0.9225 - val_loss: 1.3944
Epoch 2/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 57ms/step - accuracy: 0.9218 - loss: 1.1028 - val_accuracy: 0.9229 - val_loss: 0.8004
Epoch 3/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 48ms/step - accuracy: 0.9215 - loss: 0.8007 - val_accuracy: 0.9229 - val_loss: 0.8293
Epoch 4/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 48ms/step - accuracy: 0.9239 - loss: 0.7780 - val_accuracy: 0.9229 - val_loss: 0.8281
Epoch 5/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 48ms/step - accuracy: 0.9229 - loss: 0.7749 - val_accuracy: 0.9229 - val_loss: 0.8198
Epoch 6/10
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 48ms/step - accuracy: 0.9213 - loss: 0.7695 - val_accuracy: 0.9229 - val_loss: 0.8138
Epoch 7/10
12/12 ━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x788a46a09990>

In [152]:
rnn_predictions = rnn_model.predict(X_train)

12/12 ━━━━━━━━━━━━━━━━━━━━ 33s 2s/step


In [153]:
rnn_evaluation = rnn_model.evaluate(X_test, y_test)

3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - accuracy: 0.9216 - loss: 0.7834


### Conclusion.

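The GRU model achieved the highest test accuracy (0.9238) and the lowest test loss (0.6023) of the models evaluated above; the simple RNN reached 0.9216 and 0.7834, and the LSTM's validation accuracy stayed at 0.9229 in the epochs shown. However, these differences are small, and accuracy and loss alone do not give a comprehensive picture of model performance. All three models sit at a validation accuracy of about 0.92, which may largely reflect the zero-padded positions in the 120-step label sequences rather than correctly transcribed words.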
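
To illustrate why per-position accuracy can look high on padded labels, the sketch below compares plain accuracy with an accuracy that ignores padded positions. The `token_accuracy` helper and the toy label are hypothetical; index 0 is assumed to be the padding class, as produced by `pad_sequences`.

```python
def token_accuracy(y_true, y_pred, ignore_index=None) -> float:
    """Share of positions where the prediction equals the label.

    Positions whose label equals ignore_index are left out of the count.
    """
    correct = total = 0
    for true_seq, pred_seq in zip(y_true, y_pred):
        for true_tok, pred_tok in zip(true_seq, pred_seq):
            if ignore_index is not None and true_tok == ignore_index:
                continue
            total += 1
            correct += int(true_tok == pred_tok)
    return correct / total if total else 0.0

# Hypothetical example: 9 real words followed by padding up to 120 steps.
label = list(range(1, 10)) + [0] * 111
all_padding = [0] * 120  # a model that only ever predicts the padding class
print(round(token_accuracy([label], [all_padding]), 3))        # 0.925
print(token_accuracy([label], [all_padding], ignore_index=0))  # 0.0
```

A predictor that outputs only padding scores 0.925 here without getting a single word right, so a masked accuracy or WER would be a more informative basis for comparison.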

To determine which model is the best choice, consider the following factors:

Performance Metrics: While accuracy and loss are important metrics, consider other performance metrics such as precision, recall, and F1-score, especially if your dataset is imbalanced or if certain classes are more important than others.

Complexity and Training Time: Evaluate the complexity and training time of each model. GRU and LSTM models are typically more complex than simple RNNs, which could result in longer training times. Consider whether the higher accuracy of the GRU model justifies the increased complexity and training time.

Generalization: Assess how well each model generalizes to unseen data. We can also use techniques like cross-validation or holdout validation to evaluate the models on validation or test datasets that were not used during training.

Interpretability: Consider the interpretability of each model. Simple RNNs are easier to interpret than GRU and LSTM models due to their simpler architecture. If interpretability matters, a simpler model such as the RNN may be preferable.

Resource Constraints: We have limited resources for further processing, for decoding the model output into text, and for evaluating it with the WER and CER metrics, so the available resources must be taken into account when choosing a model.
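
As a rough aid for the complexity comparison, the sketch below counts the trainable weights of a single recurrent layer using the standard formulas. It assumes Keras's default GRU variant with two bias vectors per gate block; the function names are our own. Note that this Notebook uses 100 units for the LSTM but 200 units for the GRU and the simple RNN, so the three models are not compared at equal size.

```python
def simple_rnn_params(units: int, input_dim: int) -> int:
    """Weights of one SimpleRNN layer: input kernel, recurrent kernel, bias."""
    return units * (input_dim + units) + units

def lstm_params(units: int, input_dim: int) -> int:
    """An LSTM layer has four gate blocks of the SimpleRNN size."""
    return 4 * simple_rnn_params(units, input_dim)

def gru_params(units: int, input_dim: int, reset_after: bool = True) -> int:
    """A GRU layer has three blocks; reset_after=True adds a second bias per block."""
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)

# First recurrent layer of each model in this notebook (13 MFCC inputs).
print("LSTM(100):", lstm_params(100, 13))             # 45600
print("GRU(200):", gru_params(200, 13))               # 129000
print("SimpleRNN(200):", simple_rnn_params(200, 13))  # 42800
```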

In [3]:
# Check the reverse word index
#print("Reverse Word Index:")
#print(reverse_word_index)

# Check the first few sequences
#for i, sequence in enumerate(rnn_predictions[:5]):
    #print(f"Sequence {i}: {sequence}")

# Verify if sequences are mapped to words correctly
#for i, sequence in enumerate(rnn_predictions[:5]):
   # text = ' '.join(reverse_word_index.get(int(idx[0]), '') for idx in sequence)
    #print(f"Predicted Text {i}: {text}")

In [2]:
# Reverse mapping of word indices to words
# reverse_word_index = {idx: word for word, idx in word_tokenizer.word_index.items()}

# Function to convert word indices to text
# def sequences_to_text(sequences):
   # texts = []
   # for sequence in sequences:
    #    # Convert each element to integer before using it as a key
     #   text = ' '.join(reverse_word_index.get(int(idx[0]), '') for idx in sequence)
      #  texts.append(text)
    #return texts

# Convert predicted sequences to text
# predicted_texts = sequences_to_text(rnn_predictions)
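
The commented-out cells above read `idx[0]`, which would take the first probability of each time step, not the most likely word. A common alternative is greedy decoding with `argmax` over the vocabulary axis. The sketch below assumes predictions of shape `(batch, steps, vocab)` and that index 0 is the padding class; the `decode_predictions` helper and the toy vocabulary are hypothetical.

```python
import numpy as np

def decode_predictions(probabilities: np.ndarray, index_to_word: dict) -> list:
    """Turn (batch, steps, vocab) probabilities into text strings."""
    token_ids = probabilities.argmax(axis=-1)  # most likely class per step
    texts = []
    for sequence in token_ids:
        # Index 0 is the padding class and has no word attached, so it is skipped.
        words = [index_to_word[int(i)] for i in sequence if int(i) in index_to_word]
        texts.append(" ".join(words))
    return texts

# Hypothetical vocabulary and a one-hot "prediction" for illustration.
index_to_word = {1: "he", 2: "plays", 3: "for"}
ids = np.array([[1, 2, 3, 0, 0]])
probabilities = np.eye(4)[ids]  # shape (1, 5, 4)
print(decode_predictions(probabilities, index_to_word))  # ['he plays for']
```

In the Notebook, `index_to_word` could be built by inverting `word_tokenizer.word_index`, and the decoded strings could then be compared with the reference transcripts.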