# Model Architecture

    +-------------------------------------------------------------------------------------------------------+
    |                                        Transformer Lite Model                                         |
    +-------------------------------------------------------------------------------------------------------+
    |                                                                                                       |
    | +---------------------+   +---------------------+   +---------------------+   +---------------------+ |
    | |   Input Embedding   |   | Positional Encoding |   |   Encoder Layer 1   |   |   Decoder Layer 1   | |
    | |                     |   |                     |   |                     |   |                     | |
    | | +-----------------+ |   | +-----------------+ |   | +-----------------+ |   | +-----------------+ | |
    | | | Token Embedding | |   | | Positional      | |   | | Self-Attention  | |   | | Masked          | | |
    | | |                 | |   | | Encoding        | |   | |                 | |   | | Self-Attention  | | |
    | | |                 | |   | |                 | |   | |                 | |   | |                 | | |
    | | +-----------------+ |   | +-----------------+ |   | +-----------------+ |   | +-----------------+ | |
    | +---------------------+   +---------------------+   +---------------------+   +---------------------+ |
    |                                                                                                       |
    | +---------------------+   +---------------------+   +---------------------+   +---------------------+ |
    | |   Encoder Layer 2   |   |   Decoder Layer 2   |   |   Encoder Layer N   |   |   Decoder Layer N   | |
    | |                     |   |                     |   |                     |   |                     | |
    | | +-----------------+ |   | +-----------------+ |   | +-----------------+ |   | +-----------------+ | |
    | | | Self-Attention  | |   | | Masked          | |   | | Self-Attention  | |   | | Masked          | | |
    | | |                 | |   | | Self-Attention  | |   | |                 | |   | | Self-Attention  | | |
    | | |                 | |   | |                 | |   | |                 | |   | |                 | | |
    | | +-----------------+ |   | +-----------------+ |   | +-----------------+ |   | +-----------------+ | |
    | +---------------------+   +---------------------+   +---------------------+   +---------------------+ |
    |                                                                                                       |
    | +---------------------+   +---------------------+   +---------------------+                           |
    | | Final Linear Layer  |   |       Output        |   |     Prediction      |                           |
    | |                     |   |                     |   |                     |                           |
    | | +-----------------+ |   | +-----------------+ |   | +-----------------+ |                           |
    | | | Linear Layer    | |   | | Softmax         | |   | | Final Output    | |                           |
    | | |                 | |   | |                 | |   | |                 | |                           |
    | | +-----------------+ |   | +-----------------+ |   | +-----------------+ |                           |
    | +---------------------+   +---------------------+   +---------------------+                           |
    +-------------------------------------------------------------------------------------------------------+


## Explanation

### Input Embedding
Token Embedding: Converts the tokens of the input text into numeric vectors that the neural network can process.

Positional Encoding: Adds position information to each token embedding so that the order of tokens in the input sequence is preserved.
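
For reference, the sinusoidal scheme that the `PositionalEncoding` layer in this notebook computes can be written as follows. Here $pos$ is the token position, $k$ indexes pairs of embedding dimensions, and $d_{\text{model}}$ is the embedding width; the symbol names are chosen for illustration and follow the `get_angles` helper, where even dimensions receive the sine and odd dimensions the cosine.

$$
\begin{aligned}
PE_{(pos,\,2k)}   &= \sin\!\left(\frac{pos}{10000^{2k/d_{\text{model}}}}\right) \\
PE_{(pos,\,2k+1)} &= \cos\!\left(\frac{pos}{10000^{2k/d_{\text{model}}}}\right)
\end{aligned}
$$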

### Encoder

Self-Attention Mechanism: Learns each token's representation by attending to every token in the input sequence, which captures global context.

Feed-Forward Network: A position-wise feed-forward layer that further processes each token representation.

Layer Normalization and Dropout: Help stabilize training and improve generalization.
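
To make the self-attention step concrete, here is a minimal framework-free sketch of scaled dot-product attention for a single head and a single sequence. It is an illustration only: the function names are hypothetical, and the learned query/key/value projections and multiple heads used by `tf.keras.layers.MultiHeadAttention` are deliberately left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three toy token vectors attend to one another (self-attention: q = k = v).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = self_attention(tokens, tokens, tokens)
```

Every row of `context` is a mixture of all three token vectors, which is what "considering all tokens in the sequence" means in practice.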

### Decoder

Masked Self-Attention Mechanism: Similar to the encoder's self-attention, but with a mask so that the prediction for a token depends only on the tokens before it in the sequence.

Cross-Attention Mechanism (encoder-decoder attention): Integrates information from the input sequence with the output sequence generated so far.

Feed-Forward Network: As in the encoder, further processes each token representation.

Layer Normalization and Dropout: As in the encoder, used for stabilization and generalization.
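
The masking used by the decoder can be sketched without any framework. The example below builds a causal mask and combines it with a padding mask, using the convention that 1 means "may attend" and 0 means "hidden", which is the convention of the `attention_mask` argument of `tf.keras.layers.MultiHeadAttention`. The helper names and `pad_id=0` are assumptions for illustration (the notebook pads with zeros).

```python
def causal_mask(size):
    """Row i has 1 for positions j <= i (visible) and 0 for j > i (hidden)."""
    return [[1 if j <= i else 0 for j in range(size)] for i in range(size)]

def combine_with_padding(mask, token_ids, pad_id=0):
    """Additionally hide key positions that hold padding tokens."""
    visible = [0 if t == pad_id else 1 for t in token_ids]
    return [[min(m, v) for m, v in zip(row, visible)] for row in mask]

mask = causal_mask(4)
# mask[1] is [1, 1, 0, 0]: the second token sees itself and the first token.

# The last position is padding, so no query may attend to it.
masked = combine_with_padding(mask, [5, 9, 3, 0])
```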

### Output

Final Linear Layer: Projects the decoder's token representations to a vector of scores (logits) over the output vocabulary.

Softmax Layer: Converts those scores into a probability distribution over output tokens.

Prediction: The token with the highest probability is selected as the final prediction.
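
The last two steps can be illustrated with a small sketch: softmax turns the raw scores from the linear layer into probabilities, and the prediction is the index with the largest probability (greedy selection). The three scores below are made-up values for a hypothetical 3-token vocabulary, and the function names are illustrative.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Return (token_id, probability) for the most likely token."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

logits = [1.0, 3.0, 0.5]  # hypothetical scores, one per vocabulary entry
token_id, prob = greedy_pick(logits)
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw scores gives the same token; the softmax matters when probabilities are needed, for example in the loss.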

## Advantages of Transformer Lite

Efficiency: Uses a lightweight architecture that needs less memory and computation, making it suitable for resource-constrained hardware such as GPUs with limited memory.

Context Understanding: Uses attention to capture contextual relationships between tokens in the input sequence.

Flexibility: Can be adapted to a range of NLP tasks such as translation, text processing, and others.

## Import Libraries and Define Helper Functions

In [2]:
import os
import tensorflow as tf
import numpy as np
import sentencepiece as spm
import matplotlib.pyplot as plt

# Function to read and clean text files
def read_text_files(folder_path):
    texts = []
    for filename in os.listdir(folder_path):
        if filename.endswith(".txt"):
            with open(os.path.join(folder_path, filename), 'r', encoding='utf-8') as file:
                content = file.read().strip()
                if content:  # Check if the file is not empty
                    cleaned_content = clean_text(content)
                    texts.append(cleaned_content)
    return texts

def clean_text(text):
    # Remove unwanted characters
    unwanted_chars = ['*', '#', '_', ')', '(', '!', '?', '.', ',', '-']
    for char in unwanted_chars:
        text = text.replace(char, '')
    return text
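
As a side note, the same character removal can be expressed with a translation table, which makes a single pass over the text instead of one pass per character. This is only a sketch of an equivalent alternative; `clean_text_fast` is a hypothetical name, not part of the notebook.

```python
UNWANTED_CHARS = "*#_)(!?.,-"

# With a third argument, str.maketrans maps each listed character to None,
# which str.translate treats as "delete this character".
_DELETE_TABLE = str.maketrans("", "", UNWANTED_CHARS)

def clean_text_fast(text):
    """Remove every character in UNWANTED_CHARS from text."""
    return text.translate(_DELETE_TABLE)

sample = "Hello, world! (test_case) - done."
cleaned = clean_text_fast(sample)
joined = clean_text_fast("state-of-the-art")
```

Note that deleting hyphens and underscores joins the surrounding word parts (`joined` becomes `stateoftheart`), which may or may not be what the tokenizer should see.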


## Read Dataset and Train SentencePiece Tokenizer

In [3]:
# Read and clean dataset
folder_path = './Dataset/nlp_dataset'
texts = read_text_files(folder_path)

# Save the cleaned texts to a temporary file for SentencePiece training
with open("cleaned_texts.txt", "w", encoding="utf-8") as f:
    for text in texts:
        f.write(f"{text}\n")

In [12]:
# Set the vocabulary size to match the number of unique tokens in the data
vocab_size = 6205

# Train SentencePiece model
spm.SentencePieceTrainer.train(input='cleaned_texts.txt', model_prefix='m', vocab_size=vocab_size)

# Load the SentencePiece model
sp = spm.SentencePieceProcessor(model_file='m.model')

# Tokenize the dataset
tokenized_texts = [sp.encode(text, out_type=int) for text in texts]

# Prepare the data for TensorFlow
tokenized_texts = [np.array(text) for text in tokenized_texts]

# Compute the maximum sequence length in the dataset
max_seq_len = max(len(seq) for seq in tokenized_texts)

# Pad every sequence to the maximum length
padded_texts = [np.pad(seq, (0, max_seq_len - len(seq)), 'constant') for seq in tokenized_texts]

# Convert the sequences to a tensor
vectorized_texts = tf.convert_to_tensor(padded_texts, dtype=tf.int64)

# Prepare the dataset for training
batch_size = 8  # Batch size reduced further
dataset = tf.data.Dataset.from_tensor_slices((vectorized_texts, vectorized_texts))
dataset = dataset.batch(batch_size, drop_remainder=True)
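
The dataset above pairs each padded sequence with an identical copy of itself as the target. For next-token prediction, one common arrangement (an assumption here, not what the notebook currently does) is to feed the decoder the sequence without its last token and to train against the sequence without its first token. A framework-free sketch with hypothetical helper names:

```python
def pad_to_length(seq, length, pad_id=0):
    """Right-pad seq with pad_id up to the requested length."""
    return list(seq) + [pad_id] * (length - len(seq))

def shift_for_teacher_forcing(seq):
    """Return (decoder_input, target); target is the input shifted left by one."""
    return seq[:-1], seq[1:]

sequences = [[7, 3, 9], [4, 8]]  # toy token-id sequences
max_len = max(len(s) for s in sequences)
padded = [pad_to_length(s, max_len) for s in sequences]
pairs = [shift_for_teacher_forcing(s) for s in padded]
```

With this shift, position `t` of the decoder input is trained to predict the token at position `t + 1`, so the look-ahead mask has something to hide.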

## Define Positional Encoding and Transformer Lite Block

In [13]:
# Define Positional Encoding
class PositionalEncoding(tf.keras.layers.Layer):
    def __init__(self, position, d_model):
        super(PositionalEncoding, self).__init__()
        self.pos_encoding = self.positional_encoding(position, d_model)
    
    def get_angles(self, pos, i, d_model):
        angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
        return pos * angle_rates
    
    def positional_encoding(self, position, d_model):
        angle_rads = self.get_angles(np.arange(position)[:, np.newaxis],
                                     np.arange(d_model)[np.newaxis, :],
                                     d_model)
        
        angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
        angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
        
        pos_encoding = angle_rads[np.newaxis, ...]
        return tf.cast(pos_encoding, dtype=tf.float32)
    
    def call(self, x):
        return x + self.pos_encoding[:, :tf.shape(x)[1], :]

# Define Transformer Lite Block
class TransformerLiteBlock(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, dff, rate=0.1):
        super(TransformerLiteBlock, self).__init__()

        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation='relu'),
            tf.keras.layers.Dense(d_model)
        ])

        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)
    
    def call(self, x, training, mask):
        attn_output = self.mha(x, x, x, attention_mask=mask)  # (batch_size, input_seq_len, d_model)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(x + attn_output)  # (batch_size, input_seq_len, d_model)

        ffn_output = self.ffn(out1)  # (batch_size, input_seq_len, d_model)
        ffn_output = self.dropout2(ffn_output, training=training)
        out2 = self.layernorm2(out1 + ffn_output)  # (batch_size, input_seq_len, d_model)
        
        return out2


## Define Transformer Lite Model

In [17]:
class TransformerLiteModel(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, target_vocab_size, pe_input, pe_target, rate=0.1):
        super(TransformerLiteModel, self).__init__()

        self.encoder_embedding = tf.keras.layers.Embedding(input_vocab_size, d_model)
        self.decoder_embedding = tf.keras.layers.Embedding(target_vocab_size, d_model)
        
        self.pos_encoding = PositionalEncoding(pe_input, d_model)
        
        self.enc_layers = [TransformerLiteBlock(d_model, num_heads, dff, rate) for _ in range(num_layers)]
        self.dec_layers = [TransformerLiteBlock(d_model, num_heads, dff, rate) for _ in range(num_layers)]
        
        self.dropout = tf.keras.layers.Dropout(rate)
        
        self.final_layer = tf.keras.layers.Dense(target_vocab_size)

    def create_padding_mask(self, seq):
        # Make sure seq has a batch dimension
        if len(seq.shape) == 1:
            seq = tf.expand_dims(seq, axis=0)
        # Keras MultiHeadAttention expects 1 = attend and 0 = ignore,
        # so real tokens get 1 and padding (id 0) gets 0
        seq = tf.cast(tf.math.not_equal(seq, 0), tf.float32)
        return seq[:, tf.newaxis, tf.newaxis, :]  # (batch_size, 1, 1, seq_len)

    def create_look_ahead_mask(self, size):
        # Lower-triangular matrix: position i may attend to positions <= i
        mask = tf.linalg.band_part(tf.ones((size, size)), -1, 0)
        return mask  # (seq_len, seq_len)

    def call(self, inputs, training):
        inp = inputs[0]
        tar = inputs[1]

        # Make sure inp and tar have at least 2 dimensions
        if len(inp.shape) == 1:
            inp = tf.expand_dims(inp, axis=0)
        if len(tar.shape) == 1:
            tar = tf.expand_dims(tar, axis=0)

        enc_padding_mask = self.create_padding_mask(inp)
        look_ahead_mask = self.create_look_ahead_mask(tf.shape(tar)[1])
        dec_padding_mask = self.create_padding_mask(tar)
        # Hide both future positions and padding in the decoder
        dec_mask = tf.minimum(dec_padding_mask, look_ahead_mask)  # (batch_size, 1, seq_len, seq_len)

        inp = self.encoder_embedding(inp)  # (batch_size, input_seq_len, d_model)
        tar = self.decoder_embedding(tar)  # (batch_size, target_seq_len, d_model)

        inp = self.pos_encoding(inp)
        tar = self.pos_encoding(tar)

        # The encoder output is computed here but not consumed by the decoder blocks below
        for enc_layer in self.enc_layers:
            inp = enc_layer(inp, training, enc_padding_mask)

        for dec_layer in self.dec_layers:
            tar = dec_layer(tar, training, dec_mask)

        final_output = self.final_layer(tar)  # (batch_size, target_seq_len, target_vocab_size)

        return final_output

## Compile and Train the Model

In [18]:
# Hyperparameters
num_layers = 4  # Original: 12, adjusted to 4
d_model = 128   # Original: 768, adjusted to 128
num_heads = 4   # Original: 12, adjusted to 4
dff = 512       # Original: 3072, adjusted to 512
input_vocab_size = 6000  # Unchanged
target_vocab_size = 6000  # Unchanged
pe_input = 5000  # Reduced pe_input value
pe_target = 5000  # Reduced pe_target value

# Instantiate and compile the model
transformer_lite = TransformerLiteModel(num_layers, d_model, num_heads, dff, input_vocab_size, target_vocab_size, pe_input, pe_target)

# Define the learning rate schedule
learning_rate = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_steps=10000,
    decay_rate=0.96,
    staircase=True
)

optimizer = tf.keras.optimizers.Adam(learning_rate)

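# The final Dense layer outputs raw logits, so the loss must apply the softmax itself
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
transformer_lite.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])

# Reduce the batch size to 8
batch_size = 8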

# Build the model by calling it on a batch of data
input_batch = tf.random.uniform((batch_size, 256), dtype=tf.int64, minval=0, maxval=input_vocab_size)
target_batch = tf.random.uniform((batch_size, 256), dtype=tf.int64, minval=0, maxval=target_vocab_size)

_ = transformer_lite((input_batch, target_batch), training=False)

# Display model summary
transformer_lite.summary()

Model: "transformer_lite_model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_6 (Embedding)     multiple                  768000    
                                                                 
 embedding_7 (Embedding)     multiple                  768000    
                                                                 
 positional_encoding_3 (Posi  multiple                 0         
 tionalEncoding)                                                 
                                                                 
 transformer_lite_block_24 (  multiple                 396032    
 TransformerLiteBlock)                                           
                                                                 
 transformer_lite_block_25 (  multiple                 396032    
 TransformerLiteBlock)                                           
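
The parameter counts in the summary can be checked by hand. The sketch below assumes the usual Keras layouts: each `Dense` layer has a kernel plus a bias, the attention layer has query, key, and value projections of width `num_heads * key_dim` plus an output projection back to `d_model`, and each `LayerNormalization` has a scale and an offset vector. The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Kernel (n_in x n_out) plus bias (n_out)."""
    return n_in * n_out + n_out

def block_params(d_model, num_heads, key_dim, dff):
    # Query, key and value projections: d_model -> num_heads * key_dim each.
    qkv = 3 * dense_params(d_model, num_heads * key_dim)
    # Output projection: num_heads * key_dim -> d_model.
    attn_out = dense_params(num_heads * key_dim, d_model)
    # Feed-forward network: d_model -> dff -> d_model.
    ffn = dense_params(d_model, dff) + dense_params(dff, d_model)
    # Two LayerNormalization layers, each with scale and offset vectors.
    layer_norms = 2 * (2 * d_model)
    return qkv + attn_out + ffn + layer_norms

# Values from the notebook: key_dim is set to d_model in TransformerLiteBlock.
per_block = block_params(d_model=128, num_heads=4, key_dim=128, dff=512)
embedding = 6000 * 128  # vocabulary size x d_model

# For comparison: the conventional choice key_dim = d_model // num_heads.
per_block_small = block_params(d_model=128, num_heads=4, key_dim=32, dff=512)
```

`per_block` evaluates to 396,032 and `embedding` to 768,000, matching the summary. With `key_dim=32` the block would shrink to 198,272 parameters, roughly half.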
                                          

In [19]:
# Train the model
epochs = 10
history = transformer_lite.fit(dataset, epochs=epochs)


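# Save the model weights (a subclassed Keras model cannot be saved as a full model in HDF5 format)
os.makedirs('./models', exist_ok=True)
model_save_path = './models/transformer_lite_nlp_model.weights.h5'
transformer_lite.save_weights(model_save_path)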

Epoch 1/10


ValueError: in user code:

    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\engine\training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\engine\training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\engine\training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\engine\training.py", line 860, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\engine\training.py", line 918, in compute_loss
        return self.compiled_loss(
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\engine\compile_utils.py", line 201, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\losses.py", line 141, in __call__
        losses = call_fn(y_true, y_pred)
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\losses.py", line 245, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\losses.py", line 1862, in sparse_categorical_crossentropy
        return backend.sparse_categorical_crossentropy(
    File "C:\Users\gabri\anaconda3\envs\myenv\lib\site-packages\keras\backend.py", line 5202, in sparse_categorical_crossentropy
        res = tf.nn.sparse_softmax_cross_entropy_with_logits(

    ValueError: `labels.shape` must equal `logits.shape` except for the last dimension. Received: labels.shape=(8, 4653) and logits.shape=(1, 4653, 6000)
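
The logits have a batch dimension of 1 while the labels have 8. A likely explanation, inferred from the code rather than confirmed by a run: `fit` passes only the first element of each dataset item to the model, so `inputs` is a single tensor of shape `(8, 4653)`. `inputs[0]` and `inputs[1]` then select the first two rows of the batch instead of the encoder and decoder inputs, and the `expand_dims` guard turns each row into a batch of one. The sketch below reproduces the indexing with nested lists standing in for tensors; `split_inputs` is a hypothetical stand-in for the first two lines of `call`.

```python
def split_inputs(inputs):
    """Mimics `inp = inputs[0]; tar = inputs[1]` in the model's call()."""
    return inputs[0], inputs[1]

# A batch of 3 sequences of length 4.
batch = [[5, 6, 7, 0], [8, 9, 0, 0], [1, 2, 3, 4]]

# Dataset item (batch, batch): fit() uses x = batch and y = batch,
# so indexing x picks single rows and the batch dimension is lost.
inp_wrong, tar_wrong = split_inputs(batch)

# Dataset item ((encoder_input, decoder_input), target): x is now a pair,
# so indexing it returns whole batches.
x, y = (batch, batch), batch
inp_ok, tar_ok = split_inputs(x)
```

In TensorFlow terms, the second layout corresponds to `tf.data.Dataset.from_tensor_slices(((vectorized_texts, vectorized_texts), vectorized_texts))`. Separately, the tokenizer is trained with `vocab_size = 6205` while `input_vocab_size` and `target_vocab_size` are 6000, so token IDs of 6000 and above would fall outside the embedding table and the output layer; taking both sizes from `sp.get_piece_size()` would avoid that.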
