## Model Architecture Explanation

### 1. **Embedding Layer** – *Word Translator*
- Converts word indices into dense word vectors using pre-trained GloVe embeddings.
- Acts like a translator that maps each word ID to its meaningful vector representation.
- Parameters:
  - `input_dim=10000`: We use a vocabulary of 10,000 words.
  - `output_dim=100`: Each word is mapped to a 100-dimensional vector.
  - `weights=[embedding_matrix]`: GloVe vectors are loaded here.
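  - `input_length=500`: All sequences are padded (or truncated) to a fixed length of 500.
  - `trainable=False`: GloVe embeddings are kept fixed and not updated during training.
  - `mask_zero=True`: Index 0 is treated as padding, so the LSTM ignores padded positions.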
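Conceptually, the Embedding layer is a lookup table: looking up word index `i` returns row `i` of `embedding_matrix`. The minimal NumPy sketch below uses a small made-up matrix in place of the real 10,000 × 100 GloVe matrix and does not call Keras; it only illustrates the indexing.

```python
import numpy as np

# Made-up stand-in for the real (10000, 100) GloVe matrix: 6 "words", 4 dims each.
toy_vocab_size, toy_dim = 6, 4
toy_embedding_matrix = np.arange(toy_vocab_size * toy_dim, dtype="float32").reshape(
    toy_vocab_size, toy_dim
)

# A batch of 2 padded sequences of word indices (0 = <PAD>).
batch = np.array([[1, 4, 5, 0],
                  [1, 3, 0, 0]])

# An embedding lookup is plain row indexing: each index i is replaced by row i.
vectors = toy_embedding_matrix[batch]

print(vectors.shape)  # (2, 4, 4): (batch, sequence_length, embedding_dim)
```

With the notebook's sizes, a batch of shape `(batch, 500)` becomes `(batch, 500, 100)`.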

---

### 2. **LSTM Layer** – *Sequence Processor*
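- LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) whose gates decide what to keep, what to forget, and what to output at each step, which helps it capture long-range dependencies.
- Think of it as a thoughtful reader that processes one word at a time and retains important context.
- Parameters:
  - `units=128`: Size of the hidden state and cell state, i.e. how much information the layer can carry from one word to the next.
  - `dropout=0.2`: During training, randomly drops 20% of the units in the transformation of the inputs, to reduce overfitting.
  - `recurrent_dropout=0.2`: During training, randomly drops 20% of the units in the transformation of the recurrent (hidden) state.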
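To see where `units=128` appears, here is a NumPy sketch of one step of the standard LSTM formulation, using random weights and no dropout. The split of the stacked weight matrices into gates is illustrative and is not claimed to match Keras's internal layout, but the total weight count is the same as for a Keras `LSTM(128)` on 100-dimensional inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step (standard formulation).

    x: (input_dim,) current word vector
    h, c: (units,) previous hidden state and cell state
    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    """
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4)            # input, forget, candidate, output
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_new = f * c + i * g                  # forget part of the old memory, add new content
    h_new = o * np.tanh(c_new)             # expose a gated view of the memory
    return h_new, c_new

input_dim, units = 100, 128                # GloVe dimension and LSTM units from this notebook
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
sequence = rng.normal(size=(5, input_dim))  # 5 random "word vectors"
for x in sequence:
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape)                  # (128,)
print(W.size + U.size + b.size) # 117248
```

The final `h` is what the LSTM hands to the Dense layer, since `return_sequences` is left at its default (`False`).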

---

### 3. **Dense Layer** – *Decision Maker*
- A fully connected layer that takes the LSTM's final hidden state (a 128-dimensional vector) and maps it to a single output for the binary decision.
- Activation:
  - `sigmoid`: Squashes that output into a value between 0 and 1, interpreted as the probability that the review is positive.
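The Dense layer computes one number z = w·h + b and applies σ(z) = 1 / (1 + e^(−z)). The sketch below uses made-up pre-activation values to show the sigmoid, a 0.5 decision threshold (the same threshold Keras's binary accuracy metric uses), and the binary cross-entropy loss that `model.compile` selects.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

logits = np.array([2.0, -1.0, 0.0])  # hypothetical Dense outputs before the activation
probs = sigmoid(logits)
labels = np.array([1, 0, 1])         # 1 = positive review, 0 = negative review

preds = (probs >= 0.5).astype(int)   # threshold at 0.5 to get a class label
print(probs.round(3))
print(preds)
print(round(binary_crossentropy(labels, probs), 4))
```

The third example sits exactly at 0.5, the point of maximum uncertainty, and contributes the largest share of the loss (−log 0.5 ≈ 0.693).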

---

### **Overall Flow:**
1. Word indices → Embedding Layer → word vectors
2. Word vectors → LSTM → sentence understanding
3. Understanding → Dense Layer → probability score (sentiment)
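The flow above can be traced as tensor shapes (`None` = batch size) and parameter counts. This is plain arithmetic, not a Keras call; assuming Keras's standard LSTM parameterization (kernel, recurrent kernel, and one bias per gate), the numbers should match what `model.summary()` reports once the model is built.

```python
# Shape and parameter bookkeeping for the model in this notebook (pure arithmetic).
vocab_size, embedding_dim, seq_len = 10000, 100, 500
units = 128

shapes = [
    ("input", (None, seq_len)),
    ("Embedding", (None, seq_len, embedding_dim)),
    ("LSTM", (None, units)),       # only the last hidden state is returned
    ("Dense", (None, 1)),
]

embedding_params = vocab_size * embedding_dim                 # frozen GloVe rows
lstm_params = 4 * ((embedding_dim + units) * units + units)   # 4 gates: kernel + recurrent kernel + bias
dense_params = units * 1 + 1                                  # weights + bias

for name, shape in shapes:
    print(f"{name:<10} {shape}")
print("non-trainable:", embedding_params)
print("trainable:    ", lstm_params + dense_params)
```

Because the embeddings are frozen, only about 117k of the roughly 1.1M parameters are updated during training.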

In [1]:
import numpy as np
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Step 1: Load word index mapping
word_index = imdb.get_word_index()

# Step 2: Shift indices to account for special tokens
word_index = {k: (v + 3) for k, v in word_index.items()}

word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3

# Step 3: Reverse index to get words from IDs
reverse_word_index = {value: key for key, value in word_index.items()}

# Step 4: Load GloVe vectors
embedding_index = {}
embedding_dim = 100

with open("../GloVe/glove.6B.100d.txt", encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        vector = np.asarray(values[1:], dtype='float32')
        embedding_index[word] = vector

# Step 5: Create embedding matrix for the words in the IMDb dataset
vocab_size = 10000
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for i in range(4, vocab_size):  # skip special tokens
    word = reverse_word_index.get(i, None)
    if word:
        embedding_vector = embedding_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector


2025-05-27 19:03:19.174678: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-05-27 19:03:19.292969: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-05-27 19:03:19.414501: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1748352799.500857   10245 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1748352799.528582   10245 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1748352799.686901   10245 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linkin

ImportError: 
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "/home/omkarjadhav/miniconda3/envs/spacy-env/lib/python3.12/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/omkarjadhav/miniconda3/envs/spacy-env/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/home/omkarjadhav/miniconda3/envs/spacy-env/lib/python3.12/site-packages/ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "/home/omkarjadhav/miniconda3/envs/spacy-env/lib/python3.12/site-packages/trait

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Create model
model = Sequential()

# Add layers
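model.add(Embedding(input_dim=vocab_size,
                    output_dim=embedding_dim,
                    weights=[embedding_matrix],  # initialize with GloVe vectors
                    input_length=500,            # fixed sequence length (deprecated in Keras 3)
                    trainable=False,             # freeze embeddings
                    mask_zero=True))             # treat index 0 as padding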

model.add(LSTM(units=128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()


2025-05-27 19:05:05.617786: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected


In [3]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

# Pad sequences
x_train = pad_sequences(x_train, maxlen=500, padding='post')
x_test = pad_sequences(x_test, maxlen=500, padding='post')

Each review has a different length, so we use `pad_sequences()` to make every sequence the same length (`maxlen=500`): shorter reviews are padded with zeros at the end (`padding='post'`), and longer ones are truncated (by default from the beginning, `truncating='pre'`).
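A small NumPy sketch of that rule follows. The helper `pad_post` is hypothetical, written only for illustration; it mimics `pad_sequences(..., padding='post')` with its default `truncating='pre'`.

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Illustrative re-implementation of
    pad_sequences(sequences, maxlen=maxlen, padding='post')
    with the default truncating='pre' (keep the LAST maxlen tokens)."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        kept = seq[-maxlen:]         # truncating='pre': drop tokens from the start
        out[row, :len(kept)] = kept  # padding='post': zeros go at the end
    return out

reviews = [[1, 14, 22], [1, 5, 6, 7, 8, 9]]
print(pad_post(reviews, maxlen=5))
```

Note that truncation from the front drops the `<START>` token (index 1) of long reviews. Because index 0 is reserved for `<PAD>`, `mask_zero=True` in the Embedding layer lets the LSTM skip the trailing zeros.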

In [None]:
history = model.fit(x_train, y_train,
                    epochs=5,
                    batch_size=128,
                    validation_split=0.2)

Epoch 1/5
64/157 ━━━━━━━━━━━━━━━━━━━━ 53s 577ms/step - accuracy: 0.5655 - loss: 0.6789
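The `157` steps per epoch in the progress bar follow from the data sizes. The IMDb training split has 25,000 reviews, and `validation_split=0.2` holds out the last 20% of them (taken before shuffling), so the arithmetic below reproduces the step count.

```python
import math

n_train_total = 25000     # size of the IMDb training split
validation_split = 0.2
batch_size = 128

n_fit = int(n_train_total * (1 - validation_split))  # samples actually used for training
steps_per_epoch = math.ceil(n_fit / batch_size)       # the last batch may be partial
print(n_fit, steps_per_epoch)  # 20000 157
```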

In [21]:
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.5186


First, we get the word-to-index mapping with `imdb.get_word_index()`. It covers the whole IMDb vocabulary, ranked by frequency; we keep only the 10,000 most frequent words (indices below `vocab_size`). For each of these words, we look up its GloVe vector and store it in row `i` of `embedding_matrix`, where `i` is the word's index. `embedding_matrix` is therefore a NumPy array of shape `(10000, 100)`, not a dictionary, and it is what we pass to the `Embedding()` layer.
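These steps can be checked on toy data. The sketch below uses a made-up four-word `word_index` and a two-word stand-in for the GloVe dictionary (tiny sizes instead of 10,000 × 100); otherwise it follows Steps 2-5 of the notebook. It shows that the +3 shift lines up with the indices `imdb.load_data()` produces (its defaults are `start_char=1`, `oov_char=2`, `index_from=3`), and that a word missing from GloVe keeps an all-zero row, which `trainable=False` then leaves at zero throughout training. The `hits` counter is an addition for illustration, not part of the original code.

```python
import numpy as np

# Made-up stand-ins (real notebook: imdb.get_word_index() and the GloVe file).
raw_word_index = {"the": 1, "and": 2, "movie": 3, "zzzrareword": 10}
embedding_index = {
    "the": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "movie": np.array([0.5, -0.1, 0.0], dtype="float32"),
}
embedding_dim = 3   # real notebook: 100
vocab_size = 8      # real notebook: 10000

# Shift by 3 so that 0-3 are free for the special tokens (as in Step 2).
word_index = {w: i + 3 for w, i in raw_word_index.items()}
word_index.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2, "<UNUSED>": 3})
reverse_word_index = {i: w for w, i in word_index.items()}

# A review as load_data(num_words=8) might encode it: the rare word (index 13) becomes 2.
encoded_review = [1, 4, 6, 2, 0, 0]
decoded = " ".join(reverse_word_index.get(i, "?") for i in encoded_review)
print(decoded)  # <START> the movie <UNK> <PAD> <PAD>

# Row i of embedding_matrix holds the GloVe vector of the word with index i.
embedding_matrix = np.zeros((vocab_size, embedding_dim), dtype="float32")
hits = 0
for i in range(4, vocab_size):  # rows 0-3 (special tokens) stay zero
    vector = embedding_index.get(reverse_word_index.get(i))
    if vector is not None:
        embedding_matrix[i] = vector
        hits += 1

print(f"{hits}/{vocab_size - 4} rows filled from GloVe")  # 2/4 rows filled from GloVe
```

Printing the same coverage ratio for the real matrix is a quick way to see how many of the 10,000 words start with an all-zero vector.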


#### Why do we pass embedding_matrix to Embedding()?
The `Embedding()` layer in Keras converts each word (represented by an integer index) into a dense vector of fixed size (100 dimensions here). Instead of learning these vectors from scratch, we can initialize them with pre-trained GloVe vectors, which were trained on a very large corpus and already capture word meaning well. In our case, those vectors are the rows of `embedding_matrix`.

When you pass `weights=[embedding_matrix]` to the Embedding layer, you are telling the model:
"Start from these pre-trained GloVe vectors instead of random ones." Combined with `trainable=False`, the vectors also stay fixed during training.