# Chapter 16: Natural Language Processing with RNNs and Attention

**Goal:** Master sequential text processing: character RNNs, word embeddings, encoder–decoder models, and the attention mechanism (Transformer).

---

## 1. Character-Level RNN (Shakespeare Text)

- **Dataset:** Shakespeare text  
- **Tokenization:** character-level  
- **Model:** Embedding → SimpleRNN/LSTM → Dense (one logit per character; the softmax is applied inside the loss)  
- **Training:** predict the next character  
- **Sampling:** generate new text

In [33]:
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np

### 1.1 Load the data (e.g. 'shakespeare.txt')

In [34]:
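with open('shakespeare.txt', 'r', encoding='utf-8') as f:  # the file is closed after reading
    text = f.read().lower()                      # lowercase to keep the vocabulary small
chars = sorted(set(text))                        # unique characters = vocabulary
char2idx = {c: i for i, c in enumerate(chars)}   # character -> integer id
idx2char = np.array(chars)                       # integer id -> character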

### 1.2 Encode the text as integers

In [35]:
text_as_int = np.array([char2idx[c] for c in text])

### 1.3 Windowing: inputs of length N, targets shifted by one character

In [36]:
seq_length = 100
examples_per_epoch = len(text)//(seq_length+1)
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

def split_input_target(chunk):
    return chunk[:-1], chunk[1:]

dataset = sequences.map(split_input_target).shuffle(10000).batch(64, drop_remainder=True)
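To make the windowing step concrete, the sketch below reproduces the same chunk-and-shift logic with plain NumPy on a toy string. The helper `make_windows` and the `toy_*` names are illustrative assumptions, not part of the notebook; `tf.data` performs the equivalent batching and mapping lazily.

```python
import numpy as np

def make_windows(ids, seq_length):
    """Cut `ids` into chunks of seq_length + 1 and split each into (input, target)."""
    chunk = seq_length + 1
    n_chunks = len(ids) // chunk                  # drop the incomplete remainder
    chunks = np.asarray(ids[:n_chunks * chunk]).reshape(n_chunks, chunk)
    return chunks[:, :-1], chunks[:, 1:]          # target = input shifted by one

toy_text = "hello world"
toy_chars = sorted(set(toy_text))
toy_char2idx = {c: i for i, c in enumerate(toy_chars)}
toy_idx2char = np.array(toy_chars)
toy_ids = [toy_char2idx[c] for c in toy_text]

inputs, targets = make_windows(toy_ids, seq_length=4)
for x, y in zip(inputs, targets):
    print(repr("".join(toy_idx2char[x])), "->", repr("".join(toy_idx2char[y])))
# 'hell' -> 'ello'
# ' wor' -> 'worl'
```

Each target position holds the character that follows the input character at the same position, which is why the model in the next step returns a prediction for every time step.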

### 1.4 Build the model

In [37]:
vocab_size = len(chars)
embedding_dim = 256
rnn_units = 512

model = models.Sequential([
    layers.Embedding(vocab_size, embedding_dim),
    layers.LSTM(rnn_units, return_sequences=True, stateful=False),
    layers.Dense(vocab_size)
])
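As a sanity check on the architecture, the parameter counts can be worked out by hand. The sketch below assumes the usual formulas for Keras `Embedding`, `LSTM` (four gates, one bias vector each) and `Dense` layers, and a hypothetical vocabulary of 39 characters; the real `vocab_size` depends on the text file.

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding vector per vocabulary entry
    return vocab_size * embedding_dim

def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, output_dim):
    # Weight matrix plus bias
    return input_dim * output_dim + output_dim

assumed_vocab_size = 39   # hypothetical; use len(chars) for the real corpus
n_embedding = embedding_params(assumed_vocab_size, 256)
n_lstm = lstm_params(256, 512)
n_dense = dense_params(512, assumed_vocab_size)

print("Embedding:", n_embedding)                    # 9984
print("LSTM:     ", n_lstm)                         # 1574912
print("Dense:    ", n_dense)                        # 20007
print("Total:    ", n_embedding + n_lstm + n_dense) # 1604903
```

Almost all parameters sit in the LSTM, so `rnn_units` is the setting that dominates model size here.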

### 1.5 Compile & train (short demo)

In [39]:
model.compile(loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer='adam')

model.fit(dataset, epochs=10)

Epoch 1/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 44ms/step - loss: 3.4457
Epoch 2/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - loss: 3.0955
Epoch 3/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step - loss: 3.1611
Epoch 4/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step - loss: 3.1099
Epoch 5/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 3.0211
Epoch 6/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 2.9431
Epoch 7/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 2.9317
Epoch 8/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 2.9085
Epoch 9/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - loss: 2.8712
Epoch 10/10
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step - loss: 2.8338


<keras.src.callbacks.history.History at 0x799678192fd0>
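The section overview lists sampling as the final step, but the notebook stops after training. Below is a minimal, model-agnostic sketch of temperature sampling in NumPy. The names `sample_next_id`, `generate`, and the stand-in `toy_logits` are assumptions for illustration; with the trained network, `next_char_logits` would have to wrap a call to the model and return the logits of the last time step.

```python
import numpy as np

def sample_next_id(logits, temperature=1.0, rng=None):
    """Draw one token id from unnormalized logits after temperature scaling."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                        # stabilise the softmax
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def generate(next_char_logits, seed_ids, n_chars, temperature=1.0, rng=None):
    """Extend seed_ids by n_chars ids; next_char_logits maps ids -> next-char logits."""
    ids = list(seed_ids)
    for _ in range(n_chars):
        ids.append(sample_next_id(next_char_logits(ids), temperature, rng))
    return ids

# Hypothetical stand-in for the trained model: strongly prefers (last id + 1) mod 5.
def toy_logits(ids):
    logits = np.zeros(5)
    logits[(ids[-1] + 1) % 5] = 10.0
    return logits

print(generate(toy_logits, seed_ids=[0], n_chars=8, temperature=0.1,
               rng=np.random.default_rng(0)))
# [0, 1, 2, 3, 4, 0, 1, 2, 3]
```

A low temperature sharpens the distribution toward the most likely character, while a temperature above 1 flattens it and produces more varied, more error-prone text.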

## 2. Sentiment Analysis (IMDB) with Embeddings + LSTM
- Dataset: `tf.keras.datasets.imdb`

- Preprocessing: the reviews arrive as integer token ids and are padded to a fixed length (`TextVectorization` would fill this role for raw text)

- Model: Embedding → Bidirectional LSTM → Dense(sigmoid)

- Loss: binary crossentropy

In [40]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.layers import TextVectorization

### 2.1 Load data (top 10k words)

In [41]:
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=10000)
word_index = imdb.get_word_index()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1641221/1641221 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step


### 2.2 Decode reviews & pad the sequences

In [42]:
# load_data reserves ids 0-2 (padding, start, unknown), so word ids are shifted by 3
idx2word = {i + 3: w for w, i in word_index.items()}
idx2word[0], idx2word[1], idx2word[2] = '<pad>', '<start>', '<unk>'

def decode_review(ids):
    return ' '.join(idx2word.get(i, '?') for i in ids)

# The reviews are already integer-encoded, so pad/truncate them directly to a fixed length
max_len = 200
X_train = tf.keras.preprocessing.sequence.pad_sequences(X_train, maxlen=max_len)
X_test  = tf.keras.preprocessing.sequence.pad_sequences(X_test,  maxlen=max_len)
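By default `pad_sequences` pads and truncates at the front (`padding='pre'`, `truncating='pre'`), so a review longer than `maxlen` keeps its last tokens. The NumPy sketch below mimics that default behaviour; the helper name `pad_pre` is an assumption for illustration and is not a Keras function.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` items of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        tail = list(seq)[-maxlen:]                 # 'pre' truncation keeps the end
        if tail:
            out[row, maxlen - len(tail):] = tail   # 'pre' padding fills the front
    return out

padded = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=4)
print(padded)
# [[0 1 2 3]
#  [6 7 8 9]]
```

Padding at the front places the real tokens next to the final LSTM step, which is the state the classifier head reads.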

### 2.3 Build the model

In [43]:
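model2 = models.Sequential([
    layers.Input(shape=(max_len,)),           # explicit input so summary() can report shapes
    layers.Embedding(10000, 16),              # `input_length` is deprecated in Keras 3
    layers.Bidirectional(layers.LSTM(64)),    # reads each review in both directions
    layers.Dense(1, activation='sigmoid')     # probability that the review is positive
])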
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model2.summary()



### 2.4 Train (short demo)

In [45]:
history = model2.fit(X_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 3s 20ms/step - accuracy: 0.8801 - loss: 0.2985 - val_accuracy: 0.8734 - val_loss: 0.3031
Epoch 2/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 4s 23ms/step - accuracy: 0.9222 - loss: 0.2071 - val_accuracy: 0.8780 - val_loss: 0.3022
Epoch 3/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 4s 19ms/step - accuracy: 0.9476 - loss: 0.1518 - val_accuracy: 0.8670 - val_loss: 0.3173
Epoch 4/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 3s 21ms/step - accuracy: 0.9603 - loss: 0.1242 - val_accuracy: 0.8708 - val_loss: 0.3941
Epoch 5/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 3s 21ms/step - accuracy: 0.9686 - loss: 0.0976 - val_accuracy: 0.8712 - val_loss: 0.3857
Epoch 6/10
157/157 ━━━━━━━━━━━━━━━━━━━━ 5s 20ms/step - accuracy: 0.9738 - loss: 0.0814 - val_accuracy: 0.8666 - val_loss: 0.4061
Epoch 7/10
157/157
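In the log above, validation loss reaches its minimum at epoch 2 and then rises while training loss keeps falling, which is the usual sign of overfitting. The sketch below applies a simple stop-on-no-improvement rule to the logged `val_loss` values to show where training could have stopped. The function `early_stop_epoch` is a hypothetical helper; in Keras the same idea is available through the `tf.keras.callbacks.EarlyStopping` callback.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a patience-based rule."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:                      # no improvement for `patience` epochs
                return epoch, best_epoch
    return len(val_losses), best_epoch

# val_loss for epochs 1-6, copied from the training log
val_losses = [0.3031, 0.3022, 0.3173, 0.3941, 0.3857, 0.4061]
print(early_stop_epoch(val_losses, patience=2))   # (4, 2)
```

With a patience of 2, training would end after epoch 4 and the weights from epoch 2 would be the ones worth keeping.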

## 3. Encoder–Decoder for Machine Translation
- Dataset: sentence pairs (EN→FR)

- Model:

  - Encoder: Embedding→LSTM → state

  - Decoder: Embedding→LSTM (initialized with the encoder state) → Dense(softmax)

- Training: teacher forcing
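Teacher forcing means the decoder receives the ground-truth previous token at every step, so its input is the target sentence shifted right by one position. The sketch below shows that shift on a single sentence. The `<sos>` and `<eos>` markers and the helper `teacher_forcing_pair` are assumptions added for illustration; they are a common convention rather than something defined in this notebook.

```python
def teacher_forcing_pair(target_tokens):
    """Split a target sentence into decoder input and decoder target."""
    decoder_input = target_tokens[:-1]    # what the decoder sees at each step
    decoder_target = target_tokens[1:]    # what it must predict at each step
    return decoder_input, decoder_target

tokens = ["<sos>", "comment", "ça", "va", "<eos>"]
dec_in, dec_out = teacher_forcing_pair(tokens)
for seen, expected in zip(dec_in, dec_out):
    print(f"{seen:>8} -> {expected}")
#    <sos> -> comment
#  comment -> ça
#       ça -> va
#       va -> <eos>
```

The start marker gives the decoder something to condition on when predicting the first word, and the end marker lets it learn when to stop.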

### 3.1 Example mock dataset

In [47]:
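# Toy pairs only: three sentences are far too few to learn a translation.
# Note: the targets carry no start/end-of-sequence markers, so the shifted
# decoder input below never asks the model to predict the first French word.
eng_sentences = ['hello', 'how are you', 'good morning']
fra_sentences = ['bonjour', 'comment ça va', 'bonjour']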

### 3.2 Tokenization & padding

In [48]:
vectorizer_en = TextVectorization(output_mode='int', output_sequence_length=5)
vectorizer_en.adapt(eng_sentences)
vectorizer_fr = TextVectorization(output_mode='int', output_sequence_length=5)
vectorizer_fr.adapt(fra_sentences)

X_en = vectorizer_en(eng_sentences)
X_fr = vectorizer_fr(fra_sentences)

### 3.3 Build the encoder–decoder

In [50]:
import keras
from tensorflow.keras import layers

enc_emb = layers.Embedding(input_dim=vectorizer_en.vocabulary_size(), output_dim=16)
dec_emb = layers.Embedding(input_dim=vectorizer_fr.vocabulary_size(), output_dim=16)

encoder_inputs = keras.Input(shape=(None,))
enc_x = enc_emb(encoder_inputs)
_, state_h, state_c = layers.LSTM(32, return_state=True)(enc_x)

decoder_inputs = keras.Input(shape=(None,))
dec_x = dec_emb(decoder_inputs)
dec_lstm = layers.LSTM(32, return_sequences=True, return_state=True)
dec_outputs, _, _ = dec_lstm(dec_x, initial_state=[state_h, state_c])
decoder_dense = layers.Dense(vectorizer_fr.vocabulary_size(), activation='softmax')
decoder_outputs = decoder_dense(dec_outputs)

seq2seq = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
seq2seq.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
seq2seq.summary()
seq2seq.fit([X_en, X_fr[:,:-1]], X_fr[:,1:,None], epochs=1)

1/1 ━━━━━━━━━━━━━━━━━━━━ 3s 3s/step - loss: 1.7939


<keras.src.callbacks.history.History at 0x7996ba34c110>
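The model above is trained with teacher forcing, but at inference time no ground-truth translation is available, so the decoder has to feed its own predictions back in. The notebook does not cover this step. The sketch below shows a model-agnostic greedy decoding loop in which `next_token_scores`, the token ids, and the stand-in `toy_scores` are all hypothetical; a real version would call the trained decoder to obtain the scores.

```python
def greedy_decode(next_token_scores, start_id, end_id, max_len):
    """Generate ids by repeatedly picking the highest-scoring next token."""
    ids = [start_id]
    for _ in range(max_len):
        scores = next_token_scores(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)   # argmax
        if next_id == end_id:
            break
        ids.append(next_id)
    return ids[1:]   # drop the start marker

# Hypothetical stand-in for the decoder: id 1 = start, id 2 = end.
def toy_scores(ids):
    script = {1: 3, 3: 4, 4: 5, 5: 2}   # fixed next-token table
    scores = [0.0] * 6
    scores[script[ids[-1]]] = 1.0
    return scores

print(greedy_decode(toy_scores, start_id=1, end_id=2, max_len=10))   # [3, 4, 5]
```

Greedy decoding is the simplest choice; beam search keeps several candidate translations alive and usually yields better output at a higher cost.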

## 4. Attention & Transformer (At a Glance)
- Attention: weights how much each encoded time step contributes to the current output

- Transformer: stacks multi-head attention and feed‑forward layers, with no RNN
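The weighting described above is usually computed as scaled dot-product attention: softmax(QKᵀ / √d_k) · V. Here is a minimal NumPy sketch for a single head, with random data as a stand-in for real encodings; the function and variable names are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(query, key, value):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = query.shape[-1]
    scores = query @ key.swapaxes(-1, -2) / np.sqrt(d_k)   # (batch, T_q, T_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # each row sums to 1
    return weights @ value, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 5, 16))   # 5 query positions
k = rng.normal(size=(1, 6, 16))   # 6 key positions
v = rng.normal(size=(1, 6, 16))   # one value vector per key
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape, w.shape)         # (1, 5, 16) (1, 5, 6)
```

Each of the 5 query positions receives a weighted average of the 6 value vectors, with the weights given by how well the query matches each key.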

In [55]:
from tensorflow.keras.layers import MultiHeadAttention

### 4.1 Multi-head attention example (2 heads)

In [56]:
# dummy data
query = tf.random.normal((1, 5, 16))
key   = tf.random.normal((1, 6, 16))
val   = tf.random.normal((1, 6, 16))

attn_layer = MultiHeadAttention(num_heads=2, key_dim=16)
output, weights = attn_layer(
    query=query,
    value=val,
    key=key,
    return_attention_scores=True
)

print("Attention output shape:", output.shape)
print("Attention scores shape:", weights.shape)


Attention output shape: (1, 5, 16)
Attention scores shape: (1, 2, 5, 6)
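The shapes printed above follow from how multi-head attention splits the work: every head projects queries, keys, and values separately, computes its own 5×6 weight matrix, and the heads are then merged back to the query's feature size. The NumPy sketch below traces those shapes with random, untrained weights and no biases. It is an illustration of the shape flow under those assumptions, not the Keras implementation.

```python
import numpy as np

def multi_head_attention_shapes(query, key, value, num_heads, key_dim, rng):
    """Random-weight multi-head attention, meant only to trace tensor shapes."""
    d_model = query.shape[-1]

    def project(x):
        w = rng.normal(size=(x.shape[-1], num_heads, key_dim))
        return np.einsum("btd,dhk->bhtk", x, w)           # (B, H, T, key_dim)

    q, k, v = project(query), project(key), project(value)
    scores = np.einsum("bhqk,bhsk->bhqs", q, k) / np.sqrt(key_dim)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # (B, H, T_q, T_k)
    heads = np.einsum("bhqs,bhsk->bhqk", weights, v)      # (B, H, T_q, key_dim)
    w_out = rng.normal(size=(num_heads, key_dim, d_model))
    output = np.einsum("bhqk,hkd->bqd", heads, w_out)     # (B, T_q, d_model)
    return output, weights

rng = np.random.default_rng(0)
out_mh, w_mh = multi_head_attention_shapes(
    rng.normal(size=(1, 5, 16)), rng.normal(size=(1, 6, 16)),
    rng.normal(size=(1, 6, 16)), num_heads=2, key_dim=16, rng=rng)
print(out_mh.shape, w_mh.shape)   # (1, 5, 16) (1, 2, 5, 6)
```

The score tensor has one 5×6 matrix per head, which accounts for the extra axis of size 2 in the scores shape.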


# Chapter 16 Summary
1. A char-RNN generates text one character at a time.

2. Embeddings + LSTM are effective for sentiment analysis.

3. The encoder–decoder (seq2seq) architecture is the basis of neural machine translation.

4. Attention gives the model access to global context and is the foundation of the Transformer.