# 📘 Chapter 16: Natural Language Processing with RNNs and Attention

This chapter covers how deep learning models handle text, that is, sequences of words, in **Natural Language Processing (NLP)**, with a focus on **RNNs (GRU)** and the **attention mechanism**.

---

## 🎯 Learning Objectives

- Understand word representation with an Embedding layer
- Use an RNN (GRU) model for text classification
- Apply an attention mechanism to focus on the important parts of the input
- Train and evaluate an NLP model on the IMDB dataset

---

## 📦 Dataset: IMDB Movie Reviews

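- Contains 50,000 movie reviews (25,000 train + 25,000 test)
- Labels: 1 (positive) or 0 (negative)
- Keras ships the reviews already tokenized as integer word indices; the code then pads or truncates each review to a fixed length of 200 tokens with `pad_sequences`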
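
The padding step can be sketched in plain NumPy. The helper `pad_or_truncate` below is hypothetical (it is not part of Keras). It assumes the `pad_sequences` defaults, which pad and truncate at the start of each sequence, and it uses 0 as the padding value.

```python
import numpy as np

def pad_or_truncate(sequences, maxlen, value=0):
    """Hypothetical helper mimicking pad_sequences(padding='pre', truncating='pre')."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]        # keep only the last maxlen tokens
        if kept:
            out[row, -len(kept):] = kept  # right-align; padding fills the left side
    return out

toy_reviews = [[5, 8, 2], [7, 1, 9, 4, 3, 6]]  # made-up word indices
padded = pad_or_truncate(toy_reviews, maxlen=4)
print(padded)
# [[0 5 8 2]
#  [9 4 3 6]]
```

With these defaults a long review keeps its final tokens, so the ending of the review is what the model sees.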

---

## 🧱 Model Architecture

The main model combines four core components:

1. **Embedding Layer**  
   Maps each integer (a word index) to a dense vector of fixed dimension.
   > Example: `Embedding(input_dim=10000, output_dim=64)`

2. **Recurrent Layer: GRU**  
   An efficient RNN layer that processes the word sequence and carries context from step to step.
   - With `return_sequences=False`, only the final output is returned (used for classification).
   - With `return_sequences=True`, the output at every time step is returned (used for attention).

3. **Attention Mechanism (Manual)**  
   Assigns a different weight to each time step (word) of the input sequence.
   - Built by hand from Dense → Softmax → Multiply.
   - The final output is a weighted combination of all time steps.

4. **Output Layer**  
   A `Dense(1, activation="sigmoid")` layer for binary classification (positive/negative).
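
To make the Dense → Softmax → Multiply chain concrete, here is a minimal NumPy sketch of the same pooling applied to random data. The names `attention_pool`, `w`, and `b` are illustrative assumptions. In the Keras model the scoring weights are learned by a `Dense(1, activation="tanh")` layer, not drawn at random.

```python
import numpy as np

def attention_pool(h, w, b):
    """Pool sequence outputs h of shape (batch, steps, units) into (batch, units)."""
    scores = np.tanh(h @ w + b)                          # (batch, steps): one score per time step
    scores = scores - scores.max(axis=1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over the time axis
    context = (h * weights[:, :, None]).sum(axis=1)      # weighted sum over time steps
    return context, weights

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 5, 64))  # stand-in for GRU outputs: 2 reviews, 5 steps, 64 units
w = rng.normal(size=64)          # hypothetical scoring weights
b = 0.0                          # hypothetical scoring bias

context, weights = attention_pool(h, w, b)
print(context.shape, weights.shape)  # (2, 64) (2, 5)
```

Each row of `weights` sums to 1, so `context` is a weighted average of the per-step outputs, and the weights can be inspected to see which steps dominated.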

---

## ⚙️ Compilation & Training

- **Loss Function**: `binary_crossentropy`
- **Optimizer**: `adam`
- **Metric**: `accuracy`
- Training runs for 5 epochs, with 20% of the training data held out for validation.
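
As a quick check on what a 20% validation split implies, the arithmetic below reproduces the 625 steps per epoch shown in the training log. It assumes the Keras default batch size of 32, since the `fit` call does not set one.

```python
import math

n_train_total = 25_000   # IMDB training reviews
validation_split = 0.2
batch_size = 32          # assumption: Keras default when fit() gets no batch_size

n_val = round(n_train_total * validation_split)  # samples held out for validation
n_fit = n_train_total - n_val                    # samples actually used for weight updates
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)  # 20000 5000 625
```

So each epoch updates the weights on 20,000 reviews, and the `val_accuracy` figures are computed on the remaining 5,000.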

---

## 🔍 Model Evaluation

The model is evaluated on the test data (`X_test`, `y_test`) to measure how well it generalizes. A model with attention often performs better and is easier to interpret, since its attention weights indicate which time steps influenced the prediction most.
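
With the loss and metric used here, `evaluate` returns a `[loss, accuracy]` pair. The NumPy sketch below shows how those two numbers relate to predicted probabilities. The arrays are made-up toy values, and the 0.5 decision threshold is the usual default for binary accuracy, assumed here because the chapter does not state it.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean negative log-likelihood of the true labels under probabilities p."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def binary_accuracy(y_true, p, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((p > threshold).astype(int) == y_true))

y_true = np.array([1, 0, 1, 0])      # toy labels
p = np.array([0.9, 0.2, 0.4, 0.6])   # toy sigmoid outputs

loss = binary_crossentropy(y_true, p)
acc = binary_accuracy(y_true, p)
print(loss, acc)
```

Loss reacts to how confident each prediction is, while accuracy only counts which side of the threshold it falls on. That is one plausible reading of the first training run in the log, where `val_loss` rises from 0.3343 to 0.4334 while `val_accuracy` barely moves.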

```python
model.evaluate(X_test, y_test)
```

```python
# CHAPTER 16: Natural Language Processing with RNNs and Attention
# ---------------------------------------------------------------
# Focus: text processing with an RNN and a simple attention layer
# Dataset: IMDB reviews (binary, positive/negative)

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

print("TensorFlow version:", tf.__version__)

# ===================================================
# 1. Load and preprocess the IMDB dataset
# ===================================================
vocab_size = 10000
max_length = 200

(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data(num_words=vocab_size)
X_train = keras.preprocessing.sequence.pad_sequences(X_train, maxlen=max_length)
X_test = keras.preprocessing.sequence.pad_sequences(X_test, maxlen=max_length)

# ===================================================
# 2. NLP model with Embedding and GRU
# ===================================================
model_gru = keras.models.Sequential([
    # input_length is deprecated in Keras 3 (bundled with TensorFlow 2.18), so it is omitted
    keras.layers.Embedding(input_dim=vocab_size, output_dim=64),
    keras.layers.GRU(64, return_sequences=False),
    keras.layers.Dense(1, activation="sigmoid")
])

model_gru.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

history_gru = model_gru.fit(X_train, y_train, epochs=5, validation_split=0.2)

# ===================================================
# 3. Model with return_sequences=True (for attention)
# ===================================================
# The GRU emits the full output sequence (not just the final vector)
inputs = keras.layers.Input(shape=[None])
embed = keras.layers.Embedding(input_dim=vocab_size, output_dim=64)(inputs)
gru_out = keras.layers.GRU(64, return_sequences=True)(embed)

# ===================================================
# 4. Simple attention layer (manual)
# ===================================================
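# Score each time step with a single unit: (batch, steps, 64) -> (batch, steps, 1)
attention = keras.layers.Dense(1, activation="tanh")(gru_out)
# Drop the trailing axis: (batch, steps)
attention = keras.layers.Flatten()(attention)
# Normalize the scores into weights that sum to 1 across time steps
attention = keras.layers.Activation("softmax")(attention)
# Copy the weights across the 64 GRU units, then restore (batch, steps, 64)
attention = keras.layers.RepeatVector(64)(attention)
attention = keras.layers.Permute([2, 1])(attention)

# Weight each GRU output, then sum over time: (batch, 64)
sent_representation = keras.layers.Multiply()([gru_out, attention])
sent_representation = keras.layers.Lambda(lambda x: tf.reduce_sum(x, axis=1))(sent_representation)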

output = keras.layers.Dense(1, activation="sigmoid")(sent_representation)

# Combine into a model
model_attention = keras.models.Model(inputs=inputs, outputs=output)

model_attention.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
history_attention = model_attention.fit(X_train, y_train, epochs=5, validation_split=0.2)

# ===================================================
# 5. Model evaluation
# ===================================================
print("GRU evaluation:")
model_gru.evaluate(X_test, y_test)

print("GRU + Attention evaluation:")
model_attention.evaluate(X_test, y_test)


```

```text
TensorFlow version: 2.18.0
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
Epoch 1/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 12s 12ms/step - accuracy: 0.6930 - loss: 0.5527 - val_accuracy: 0.8460 - val_loss: 0.3551
Epoch 2/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 6s 10ms/step - accuracy: 0.8886 - loss: 0.2792 - val_accuracy: 0.8568 - val_loss: 0.3382
Epoch 3/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.9356 - loss: 0.1738 - val_accuracy: 0.8776 - val_loss: 0.3128
Epoch 4/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.9644 - loss: 0.1062 - val_accuracy: 0.8728 - val_loss: 0.3343
Epoch 5/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - accuracy: 0.9792 - loss: 0.0651 - val_accuracy: 0.8694 - val_loss: 0.4334
Epoch 1/5
625/625 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.7028 - loss: 0.5305 - val_accuracy: 0.8762 - val_loss: 0.2982
Epoch 2/5
625/625

[0.6086789965629578, 0.8387200236320496]
```