# **IMPLEMENTATION AND EXPERIMENTS WITH A TRANSFORMER MODEL ON THREE DATASETS TO DETERMINE THE BEST PERFORMANCE**

Name  : Maulana Seno Aji Yudhantara  
NRP   : 152022065  
Class : IFB-454 DEEP LEARNING  

## Group Project Context:
This notebook is part of a group assignment that tests three different deep learning architectures on the same three datasets. It focuses on experiments with, and optimization of, the Transformer model.

## Experiment Goals:
This notebook implements the Transformer architecture and runs experiments to find the best parameter configuration on the three datasets chosen by the group: BBC News, Sentiment140, and FordA (UCR Time Series).  
The datasets differ in nature (text and time series), so each needs its own preprocessing and model adjustments.

## Datasets Used:
1. **BBC News**  
   - Type: Text (news topic classification)  
   - Description: BBC news articles grouped into five topic categories (business, entertainment, politics, sport, tech).  

2. **Sentiment140**  
   - Type: Text (tweet sentiment classification)  
   - Description: Twitter tweets labeled by sentiment. The label scheme covers negative, neutral, and positive, but the 100,000-tweet sample used here contains only negative and positive tweets.  

3. **FordA (UCR Time Series Classification)**  
   - Type: Time series (time series classification)  
   - Description: Sensor data from Ford cars, used to classify a condition from the time series signal.  

## Implementation Plan:
- Import the required libraries  
- Load and explore the datasets  
- Preprocess the data as the Transformer requires  
- Define and train the Transformer model with varied parameters  
- Evaluate and visualize model performance  

## Expected Outputs:
- Complete source code for the Transformer model  
- Performance evaluation results and the best parameters  
- Technical explanations and experiment results as material for the group report  


# **Stage 1 – Import All Libraries** 
Import every library the notebook needs, from data handling and preprocessing to model building, evaluation, and visualization. The first cell checks the GPU; the second gathers all imports in one place to keep them organized.

In [1]:
import torch

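print("Number of GPU: ", torch.cuda.device_count())
# get_device_name() raises an error when no CUDA device is available, so guard the call
if torch.cuda.is_available():
    print("GPU Name: ", torch.cuda.get_device_name())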


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

Number of GPU:  1
GPU Name:  NVIDIA GeForce GTX 1650
Using device: cuda


In [34]:
# Import libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Text processing
import re
import string

# Machine learning and deep learning
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pad_sequence
import torch.nn.functional as F
from transformers import AutoTokenizer
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Time series (in case special preprocessing is needed)
from scipy import stats

# Progress bar
from tqdm.notebook import tqdm

# Select the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')


Using device: cuda


# **Stage 2 – Load the Datasets**
Load the three datasets into the notebook's memory. Each dataset is read according to its format and prepared for the preprocessing stage.

- **Stage 2.1:** Load the BBC News dataset  
- **Stage 2.2:** Load the Sentiment140 dataset  
- **Stage 2.3:** Load the FordA dataset  

## Stage 2.1 – Load the BBC News Dataset

In [5]:
def load_bbcnews_dataset(data_dir):
    texts = []
    labels = []
    label_names = []

    for label_name in os.listdir(data_dir):
        label_path = os.path.join(data_dir, label_name)
        if os.path.isdir(label_path):
            label_names.append(label_name)
            for filename in os.listdir(label_path):
                file_path = os.path.join(label_path, filename)
                with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                    text = f.read().strip()
                    texts.append(text)
                    labels.append(label_name)

    df = pd.DataFrame({'text': texts, 'label': labels})
    return df

# Call the loader
bbcnews_dir = './Dataset/BBCNews'
df_bbc = load_bbcnews_dataset(bbcnews_dir)

# Show a summary
print(df_bbc.head())
print(df_bbc['label'].value_counts())


                                                text     label
0  Ad sales boost Time Warner profit\n\nQuarterly...  business
1  Dollar gains on Greenspan speech\n\nThe dollar...  business
2  Yukos unit buyer faces loan claim\n\nThe owner...  business
3  High fuel prices hit BA's profits\n\nBritish A...  business
4  Pernod takeover talk lifts Domecq\n\nShares in...  business
label
sport            512
business         511
politics         418
tech             402
entertainment    387
Name: count, dtype: int64


## Stage 2.2 – Load the Sentiment140 Dataset

In [6]:
def load_sentiment140_dataset(filepath, sample_size=100000):
    # Column names as given in the Sentiment140 documentation
    cols = ['target', 'ids', 'date', 'flag', 'user', 'text']

    # Load only the label and text columns
    df = pd.read_csv(filepath, encoding='latin-1', names=cols, usecols=['target', 'text'])

    # Convert numeric labels to strings
    df['target'] = df['target'].replace({0: 'negative', 2: 'neutral', 4: 'positive'})

    # Sample the data so experiments stay manageable
    df_sample = df.sample(n=sample_size, random_state=42).reset_index(drop=True)

    return df_sample

# Call the loader
sentiment140_path = './Dataset/Sentimen140/training.1600000.processed.noemoticon.csv'
df_sentiment = load_sentiment140_dataset(sentiment140_path, sample_size=100000)

# Show a summary
print(df_sentiment.head())
print(df_sentiment['target'].value_counts())

     target                                               text
0  negative             @chrishasboobs AHHH I HOPE YOUR OK!!! 
1  negative  @misstoriblack cool , i have no tweet apps  fo...
2  negative  @TiannaChaos i know  just family drama. its la...
3  negative  School email won't open  and I have geography ...
4  negative                             upper airways problem 
target
positive    50057
negative    49943
Name: count, dtype: int64


## Stage 2.3 – Load the FordA Time Series Dataset

In [7]:
def load_forda_dataset(train_path, test_path):
    def read_txt(path):
        data = np.loadtxt(path)
        y = data[:, 0]
        X = data[:, 1:]
        return X, y

    # Load train & test
    X_train, y_train = read_txt(train_path)
    X_test, y_test = read_txt(test_path)

    # Merge both files; a fresh train/test split is made later
    X = np.concatenate([X_train, X_test], axis=0)
    y = np.concatenate([y_train, y_test], axis=0)

    # Map label -1 → 0; every other label (here 1) → 1
    y = np.where(y == -1, 0, 1)

    return X, y

# Paths
forda_train_path = './Dataset/FordA/FordA_TRAIN.txt'
forda_test_path = './Dataset/FordA/FordA_TEST.txt'

# Load dataset
X_forda, y_forda = load_forda_dataset(forda_train_path, forda_test_path)

# Check shapes and label distribution
print("X shape:", X_forda.shape)
print("y shape:", y_forda.shape)
print("Label distribution:", np.bincount(y_forda.astype(int)))


X shape: (4921, 500)
y shape: (4921,)
Label distribution: [2527 2394]
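
The loader above treats the first column of each FordA text file as the class label (-1 or 1) and the remaining columns as the signal. A minimal sketch of that parsing and label remapping follows. It reads a tiny made-up two-row file held in memory, so the values are illustrative and not taken from FordA.

```python
import io
import numpy as np

# Two made-up rows in the same layout: label first, then the signal values.
fake_file = io.StringIO("-1 0.10 0.20 0.30\n1 0.40 0.50 0.60\n")

data = np.loadtxt(fake_file)
y_raw = data[:, 0]   # first column: labels in {-1, 1}
X = data[:, 1:]      # remaining columns: time series values

# Map -1 -> 0 and everything else -> 1 so labels are valid class indices
y = np.where(y_raw == -1, 0, 1)

print(X.shape)  # (2, 3)
print(y)        # [0 1]
```

Remapping to {0, 1} matters because `nn.CrossEntropyLoss` expects class indices starting at 0.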


# **Stage 3 - Dataset Preprocessing**
Prepare the data according to each dataset's type.
- **Stage 3.1:** BBC News preprocessing: text cleaning, label encoding, and the train-test split. Tokenization happens later, when the PyTorch datasets are built.  
- **Stage 3.2:** Sentiment140 preprocessing: tweet cleaning (removing URLs, mentions, hashtags, punctuation, and digits), label encoding, and the train-test split.  
- **Stage 3.3:** FordA preprocessing: normalizing the time series, splitting into train and test sets, and reshaping the data to fit the Transformer input.


## Stage 3.1 – BBC News Preprocessing

In [9]:
# Take the texts and labels from the dataframe
texts_bbc = df_bbc['text'].tolist()
labels_bbc = df_bbc['label'].tolist()

# Text-cleaning function
def clean_text(text):
    text = text.lower()
    text = re.sub(r'http\S+|www\.\S+', '', text)              # remove URLs
    text = re.sub(r'<.*?>', '', text)                         # remove HTML tags
    text = re.sub(r'[^a-z\s]', '', text)                      # keep only letters and whitespace (drops digits and punctuation)
    text = re.sub(r'\s+', ' ', text).strip()                  # collapse extra whitespace
    return text

# Clean every text
texts_bbc_clean = [clean_text(text) for text in texts_bbc]

# Encode string labels → integers
label_encoder_bbc = LabelEncoder()
labels_bbc_encoded = label_encoder_bbc.fit_transform(labels_bbc)

# Split data
X_bbc_train, X_bbc_test, y_bbc_train, y_bbc_test = train_test_split(
    texts_bbc_clean, labels_bbc_encoded, test_size=0.2, random_state=42, stratify=labels_bbc_encoded
)

# Check the result
print("Sample text:", X_bbc_train[0][:300])
print("Numeric label:", y_bbc_train[0])
print("Original label:", label_encoder_bbc.inverse_transform([y_bbc_train[0]])[0])


Sample text: euniversity disgraceful waste a failed government scheme to offer uk university courses online has been branded a disgraceful waste by mps the euniversity was scrapped last year having attracted only students at a cost of m chief executive john beaumont was paid a bonus of despite a failure to bring
Numeric label: 2
Original label: politics
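
The sample output shows side effects of the cleaning rules: "euniversity" has lost its hyphen, and "a cost of m" has lost the amount. The sketch below traces this on made-up sentences using a standalone copy of the cleaning steps (with the dot in `www\.` escaped). The input strings are invented for illustration.

```python
import re

def clean_text(text):
    text = text.lower()
    text = re.sub(r'http\S+|www\.\S+', '', text)  # remove URLs
    text = re.sub(r'<.*?>', '', text)             # remove HTML tags
    text = re.sub(r'[^a-z\s]', '', text)          # keep only letters and whitespace
    text = re.sub(r'\s+', ' ', text).strip()      # collapse extra whitespace
    return text

print(clean_text("Visit <b>BBC</b> at http://bbc.co.uk for 24/7 news!"))
# visit bbc at for news

print(clean_text("e-university cost £50m"))
# euniversity cost m
```

Because digits and symbols are deleted rather than replaced by a space, hyphenated words merge and amounts shrink to their unit letter. That may or may not matter for topic classification, but it is worth knowing when reading the cleaned text.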


## Stage 3.2 – Sentiment140 Preprocessing

In [10]:
# Take the texts and labels
texts_sent = df_sentiment['text'].tolist()
labels_sent = df_sentiment['target'].tolist()

# Tweet-cleaning function
def clean_tweet(text):
    text = text.lower()
    text = re.sub(r'http\S+|www\.\S+', '', text)        # remove URLs
    text = re.sub(r'@\w+', '', text)                    # remove mentions
    text = re.sub(r'#\w+', '', text)                    # remove hashtags
    text = re.sub(r'[^\w\s]', '', text)                 # remove punctuation
    text = re.sub(r'\d+', '', text)                     # remove digits
    text = re.sub(r'\s+', ' ', text).strip()            # collapse extra whitespace
    return text

# Clean every tweet
texts_sent_clean = [clean_tweet(text) for text in texts_sent]

# Encode string labels as integers
label_encoder_sent = LabelEncoder()
labels_sent_encoded = label_encoder_sent.fit_transform(labels_sent)

# Split data
X_sent_train, X_sent_test, y_sent_train, y_sent_test = train_test_split(
    texts_sent_clean, labels_sent_encoded, test_size=0.2, random_state=42, stratify=labels_sent_encoded
)

# Check the result
print("Sample tweet:", X_sent_train[0])
print("Numeric label:", y_sent_train[0])
print("Original label:", label_encoder_sent.inverse_transform([y_sent_train[0]])[0])


Sample tweet: mommy can you bring me home a pastrami sandwich im hungry
Numeric label: 0
Original label: negative
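
The order of the tweet-cleaning rules matters: URLs, mentions, and hashtags have to go before punctuation is stripped, otherwise `@` and `#` would vanish first and the words after them would stay. A short trace on a made-up tweet, using a standalone copy of the function (the tweet is invented for illustration):

```python
import re

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r'http\S+|www\.\S+', '', text)  # remove URLs
    text = re.sub(r'@\w+', '', text)              # remove mentions
    text = re.sub(r'#\w+', '', text)              # remove hashtags
    text = re.sub(r'[^\w\s]', '', text)           # remove punctuation
    text = re.sub(r'\d+', '', text)               # remove digits
    text = re.sub(r'\s+', ' ', text).strip()      # collapse extra whitespace
    return text

print(clean_tweet("@user I LOVE this!!! #happy http://t.co/abc 100%"))
# i love this
```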


## Stage 3.3 – FordA Time Series Preprocessing

In [13]:
# Normalize each feature (time-step column) to the range [0, 1]
# Note: the scaler is fitted on the full dataset before the split, so test-set statistics leak into the scaling
scaler_forda = MinMaxScaler()
X_forda_scaled = scaler_forda.fit_transform(X_forda)

# Shape the input for the Transformer → [samples, sequence_length, 1]
X_forda_reshaped = X_forda_scaled[..., np.newaxis]  # add a third dimension

# Split data
X_forda_train, X_forda_test, y_forda_train, y_forda_test = train_test_split(
    X_forda_reshaped, y_forda, test_size=0.2, random_state=42, stratify=y_forda
)

# Check the result
print("Shape train:", X_forda_train.shape)
print("Shape test:", X_forda_test.shape)
print("Sample labels:", y_forda_train[:5])


Shape train: (3936, 500, 1)
Shape test: (985, 500, 1)
Sample labels: [0 1 0 0 0]
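
One possible refinement, shown only as a sketch, is to compute the scaling statistics from the training rows alone and reuse them on the test rows, so no test information enters preprocessing. The example uses plain NumPy on random made-up data. The array sizes and the simple slice-based split are assumptions for illustration, not the notebook's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))      # made-up stand-in for the FordA matrix
X_train, X_test = X[:8], X[8:]    # simple split, for illustration only

# "Fit": per-column min and max taken from the training rows only
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
scale = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero

# "Transform": apply the training statistics to both splits
X_train_scaled = (X_train - col_min) / scale
X_test_scaled = (X_test - col_min) / scale

print(X_train_scaled.min(), X_train_scaled.max())  # 0.0 1.0
```

With scikit-learn the same pattern is `scaler.fit_transform(X_train)` followed by `scaler.transform(X_test)`. Test values can then fall slightly outside [0, 1], which is expected.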


# **Stage 4 - Building PyTorch Datasets and DataLoaders**
- Create a `Dataset` class for each dataset
- Create DataLoaders for training and testing

## Stage 4.1 – Dataset & DataLoader for BBCNews
The BBCNews dataset was already cleaned in the previous stage. 
This stage tokenizes it with the BERT tokenizer from Huggingface (`bert-base-uncased`), 
which converts each text into token IDs and an attention mask. 

The data is wrapped in the `BBCNewsDataset` class and passed to a PyTorch `DataLoader`.

In [15]:
# Initialize the tokenizer (a general-purpose one such as the BERT tokenizer from Huggingface)
tokenizer_bbc = AutoTokenizer.from_pretrained("bert-base-uncased")

# Maximum token length (adjustable); longer articles are truncated
MAX_LEN_BBC = 128

# PyTorch Dataset
class BBCNewsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]

        # Tokenize, padding or truncating to max_len
        encoding = self.tokenizer(
            text,
            padding='max_length',
            truncation=True,
            max_length=self.max_len,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].squeeze(0),     # [seq_len]
            'attention_mask': encoding['attention_mask'].squeeze(0),
            'label': torch.tensor(label, dtype=torch.long)
        }

# Split the BBC data (same inputs and settings as in the preprocessing stage, so the split is identical)
X_train_bbc, X_test_bbc, y_train_bbc, y_test_bbc = train_test_split(
    texts_bbc_clean, labels_bbc_encoded,
    test_size=0.2, random_state=42, stratify=labels_bbc_encoded
)

# Build the Datasets and DataLoaders
train_dataset_bbc = BBCNewsDataset(X_train_bbc, y_train_bbc, tokenizer_bbc, MAX_LEN_BBC)
test_dataset_bbc = BBCNewsDataset(X_test_bbc, y_test_bbc, tokenizer_bbc, MAX_LEN_BBC)

train_loader_bbc = DataLoader(train_dataset_bbc, batch_size=16, shuffle=True)
test_loader_bbc = DataLoader(test_dataset_bbc, batch_size=16, shuffle=False)
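
With `padding='max_length'` and `truncation=True`, every example comes out at exactly `max_len` tokens, and the attention mask marks which positions are real. The sketch below is a simplified pure-Python illustration of that idea. The helper `pad_to_max_len` is hypothetical, the token IDs are made up, and it ignores the special tokens a real tokenizer adds.

```python
def pad_to_max_len(token_ids, max_len, pad_id=0):
    """Truncate or right-pad a list of token IDs and build its attention mask."""
    ids = token_ids[:max_len]          # truncate if too long
    mask = [1] * len(ids)              # 1 = real token
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 = padding

ids, mask = pad_to_max_len([101, 7592, 2088, 102], max_len=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

The model later uses this mask twice: to stop attention from looking at padding, and to exclude padding from the mean pooling.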


## Stage 4.2 – Dataset & DataLoader for Sentiment140 (with the BERT tokenizer)

This stage converts the Sentiment140 texts into input IDs and attention masks using the BERT tokenizer.  
The sentiment labels are encoded as integers and the data is split into training and test sets.  
The data is wrapped in `Sentiment140Dataset` and fed to a `DataLoader`.


### BERT Tokenizer

In [16]:
tokenizer_sentiment = AutoTokenizer.from_pretrained("bert-base-uncased")
MAX_LEN_SENTIMENT = 128

### Encode labels and split data

In [17]:
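# Encode labels as integers
label_encoder_sentiment = LabelEncoder()
labels_sentiment_encoded = label_encoder_sentiment.fit_transform(df_sentiment['target'])
# Note: this takes the raw tweets from the dataframe, not the cleaned texts_sent_clean built earlier
texts_sentiment = df_sentiment['text'].tolist()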

# Split data
X_train_sent, X_test_sent, y_train_sent, y_test_sent = train_test_split(
    texts_sentiment, labels_sentiment_encoded,
    test_size=0.2, random_state=42, stratify=labels_sentiment_encoded
)


### PyTorch Dataset Class

In [18]:
class Sentiment140Dataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]

        encoding = self.tokenizer(
            text,
            padding='max_length',
            truncation=True,
            max_length=self.max_len,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].squeeze(0),
            'attention_mask': encoding['attention_mask'].squeeze(0),
            'label': torch.tensor(label, dtype=torch.long)
        }


### DataLoader

In [19]:
# Build the Datasets and DataLoaders
train_dataset_sent = Sentiment140Dataset(X_train_sent, y_train_sent, tokenizer_sentiment, MAX_LEN_SENTIMENT)
test_dataset_sent = Sentiment140Dataset(X_test_sent, y_test_sent, tokenizer_sentiment, MAX_LEN_SENTIMENT)

train_loader_sent = DataLoader(train_dataset_sent, batch_size=32, shuffle=True)
test_loader_sent = DataLoader(test_dataset_sent, batch_size=32, shuffle=False)


## Stage 4.3 – Dataset & DataLoader for FordA (Time Series)

This stage converts the FordA time series data into PyTorch tensors.  
Each sample is represented as `(timesteps, 1)` to match the Transformer input.  
The dataset is split into train and test sets, wrapped in `FordADataset`, and used in a `DataLoader`.


### PyTorch Dataset Class

In [20]:
class FordADataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X, dtype=torch.float32)  # (samples, timesteps)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        x = self.X[idx].unsqueeze(-1)  # (timesteps,) → (timesteps, 1)
        y = self.y[idx]
        return {'input': x, 'label': y}


### Train-Test Split & DataLoader Creation

In [21]:
# Train-test split on the raw X_forda array (not the scaled copy); FordADataset adds the feature dimension itself
X_train_ford, X_test_ford, y_train_ford, y_test_ford = train_test_split(
    X_forda, y_forda, test_size=0.2, random_state=42, stratify=y_forda
)

# Dataset
train_dataset_ford = FordADataset(X_train_ford, y_train_ford)
test_dataset_ford = FordADataset(X_test_ford, y_test_ford)

# DataLoader
train_loader_ford = DataLoader(train_dataset_ford, batch_size=32, shuffle=True)
test_loader_ford = DataLoader(test_dataset_ford, batch_size=32, shuffle=False)


# **Stage 5 – Defining and Initializing the Transformer Model**

In this stage, the Transformer architecture is defined in PyTorch.
The text model uses an embedding layer, a learnable positional encoding, a Transformer encoder, and mean pooling.
It is used to classify BBCNews and Sentiment140 texts; the FordA time series model replaces the embedding layer with a linear projection.


## Stage 5.1 – Transformer Model for Text (BBCNews & Sentiment140)

In [29]:
class TransformerClassifier(nn.Module):
    def __init__(self, input_dim, embed_dim, num_heads, hidden_dim, num_layers, num_classes, dropout=0.1, max_len=512):
        super(TransformerClassifier, self).__init__()
        self.embedding = nn.Embedding(input_dim, embed_dim, padding_idx=0)
        self.positional_encoding = nn.Parameter(torch.zeros(1, max_len, embed_dim))

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=hidden_dim,
            dropout=dropout,
            activation='relu',
            batch_first=True  # keep the input shape as (batch, seq, dim)
        )
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids: (batch_size, seq_len)
        x = self.embedding(input_ids)  # (batch_size, seq_len, embed_dim)
        x = x + self.positional_encoding[:, :x.size(1), :]  # Positional encoding

        # Turn the attention mask into a padding mask (0 becomes True → skipped)
        # Transformer mask: True = masked, False = keep
        src_key_padding_mask = ~attention_mask.bool()  # (batch_size, seq_len)

        # Forward through the Transformer encoder
        x = self.transformer_encoder(x, src_key_padding_mask=src_key_padding_mask)

        # Mean pooling over real tokens only (padding is masked out)
        mask = attention_mask.unsqueeze(-1).type(torch.float)  # (batch, seq, 1)
        x = (x * mask).sum(1) / mask.sum(1).clamp(min=1e-9)  # (batch, embed_dim); clamp guards against all-padding rows

        x = self.dropout(x)
        out = self.fc(x)
        return out
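
The two uses of the attention mask in `forward` can be checked on a tiny made-up tensor: the inverted mask tells the encoder which positions to ignore, and the float mask keeps padded positions out of the average. The sketch below uses hand-picked numbers rather than real encoder outputs, and the `clamp` is an added safeguard against division by zero.

```python
import torch

# Made-up "encoder output": batch of 2 sequences, 3 positions, 2 features
x = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
                  [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]])
attention_mask = torch.tensor([[1, 1, 0],   # last position is padding
                               [1, 1, 1]])

# nn.TransformerEncoder expects True at positions to ignore, hence the inversion
src_key_padding_mask = ~attention_mask.bool()

# Zero out padded positions, then divide by the number of real tokens
mask = attention_mask.unsqueeze(-1).float()                  # (batch, seq, 1)
pooled = (x * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

print(src_key_padding_mask.tolist())  # [[False, False, True], [False, False, False]]
print(pooled.tolist())                # [[2.0, 3.0], [4.0, 4.0]]
```

The padded position holding 100.0 does not affect the first row's mean, which is the point of masking before pooling.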


## Stage 5.2 – Transformer Model for Time Series (FordA)

In [24]:
class TimeSeriesTransformer(nn.Module):
    def __init__(self, input_dim, embed_dim, num_heads, hidden_dim, num_layers, num_classes, dropout=0.1, max_len=500):
        super(TimeSeriesTransformer, self).__init__()

        # Linear projection from the input to the embedding dimension
        self.linear_proj = nn.Linear(input_dim, embed_dim)

        # Positional encoding as a learnable parameter
        self.positional_encoding = nn.Parameter(torch.zeros(1, max_len, embed_dim))

        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=hidden_dim,
            dropout=dropout,
            batch_first=True
        )
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

        # Dropout & output layer
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # x: (batch_size, seq_len, input_dim)
        x = self.linear_proj(x)  # (batch_size, seq_len, embed_dim)
        x = x + self.positional_encoding[:, :x.size(1), :]  # Add positional encoding
        x = self.transformer_encoder(x)  # (batch_size, seq_len, embed_dim)
        x = x.mean(dim=1)  # Pooling (mean over time steps)
        x = self.dropout(x)
        return self.fc(x)
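
The shape comments in `forward` can be confirmed with a quick walk-through on random data. The sketch below reproduces only the projection, positional encoding, and pooling steps. It skips the encoder, which keeps the shape unchanged, and the batch size of 4 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_dim, embed_dim = 4, 500, 1, 64

x = torch.randn(batch_size, seq_len, input_dim)   # made-up batch shaped like FordA
linear_proj = nn.Linear(input_dim, embed_dim)
positional_encoding = nn.Parameter(torch.zeros(1, seq_len, embed_dim))

h = linear_proj(x)                                # (4, 500, 64)
h = h + positional_encoding[:, :h.size(1), :]     # broadcasts over the batch dimension
pooled = h.mean(dim=1)                            # (4, 64): average over time steps

print(h.shape)       # torch.Size([4, 500, 64])
print(pooled.shape)  # torch.Size([4, 64])
```

Because the positional encoding is sliced to `h.size(1)`, sequences shorter than `max_len` work, while longer ones would fail at the addition.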


# **Stage 6 – Model Initialization and Experiment Hyperparameters**
This stage:

- Sets the experiment hyperparameters for each dataset.
- Initializes the Transformer models for BBCNews, Sentiment140, and FordA.

## Stage 6.1 – Hyperparameter Configuration

In [36]:
hyperparams = {
    'bbcnews': {
        'embed_dim': 128,
        'num_heads': 4,
        'hidden_dim': 256,
        'num_layers': 2,
        'dropout': 0.1,
        'lr': 1e-4,
        'num_classes': len(label_encoder_bbc.classes_)
    },
    'sentiment140': {
        'embed_dim': 128,
        'num_heads': 4,
        'hidden_dim': 256,
        'num_layers': 2,
        'dropout': 0.1,
        'lr': 1e-4,
        'num_classes': len(label_encoder_sentiment.classes_)
    },
    'forda': {
        'input_dim': 1,
        'embed_dim': 64,
        'num_heads': 2,
        'hidden_dim': 128,
        'num_layers': 2,
        'dropout': 0.1,
        'num_classes': 2,
        'seq_len': 500,
        'lr': 1e-3
    }
}
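
When these values are varied during experiments, one constraint has to hold: PyTorch's multi-head attention requires `embed_dim` to be divisible by `num_heads`. The sketch below enumerates a small hypothetical search space and keeps only the valid combinations. The candidate values are examples chosen for illustration, not configurations tested in this notebook.

```python
from itertools import product

# Hypothetical search space; example values only, not results from this notebook
embed_dims = [64, 100, 128]
num_heads_options = [2, 4, 8]

valid_configs = [
    {'embed_dim': d, 'num_heads': h}
    for d, h in product(embed_dims, num_heads_options)
    if d % h == 0   # multi-head attention needs embed_dim divisible by num_heads
]

print(len(valid_configs))  # 8
```

Only the pair `embed_dim=100`, `num_heads=8` is dropped here, since 100 is not a multiple of 8.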


## Stage 6.2 – Model Initialization

In [31]:
# BBC News
model_bbc = TransformerClassifier(
    input_dim=tokenizer_bbc.vocab_size,
    embed_dim=hyperparams['bbcnews']['embed_dim'],
    num_heads=hyperparams['bbcnews']['num_heads'],
    hidden_dim=hyperparams['bbcnews']['hidden_dim'],
    num_layers=hyperparams['bbcnews']['num_layers'],
    num_classes=hyperparams['bbcnews']['num_classes'],
    dropout=hyperparams['bbcnews']['dropout']
).to(device)

# Sentiment140
model_sentiment = TransformerClassifier(
    input_dim=tokenizer_sentiment.vocab_size,
    embed_dim=hyperparams['sentiment140']['embed_dim'],
    num_heads=hyperparams['sentiment140']['num_heads'],
    hidden_dim=hyperparams['sentiment140']['hidden_dim'],
    num_layers=hyperparams['sentiment140']['num_layers'],
    num_classes=hyperparams['sentiment140']['num_classes'],
    dropout=hyperparams['sentiment140']['dropout']
).to(device)

# FordA Time Series
model_forda = TimeSeriesTransformer(
    input_dim=hyperparams['forda']['input_dim'],
    embed_dim=hyperparams['forda']['embed_dim'],
    num_heads=hyperparams['forda']['num_heads'],
    hidden_dim=hyperparams['forda']['hidden_dim'],
    num_layers=hyperparams['forda']['num_layers'],
    num_classes=hyperparams['forda']['num_classes'],
    dropout=hyperparams['forda']['dropout'],
    max_len=hyperparams['forda']['seq_len']
).to(device)

### Optimizer and loss setup

In [37]:
# Loss function (cross-entropy for classification)
criterion = nn.CrossEntropyLoss()

# Optimizer (Adam)
optimizer_bbc = optim.Adam(model_bbc.parameters(), lr=hyperparams['bbcnews']['lr'])
optimizer_sentiment = optim.Adam(model_sentiment.parameters(), lr=hyperparams['sentiment140']['lr'])
optimizer_forda = optim.Adam(model_forda.parameters(), lr=hyperparams['forda']['lr'])

### Train & evaluate functions

In [38]:
def train(model, dataloader, optimizer, criterion, device):
    model.train()
    total_loss = 0
    correct = 0
    total = 0

    for batch in dataloader:
        optimizer.zero_grad()
        inputs = batch['input_ids'].to(device) if 'input_ids' in batch else batch['input'].to(device)  # FordADataset uses the key 'input'
        labels = batch['label'].to(device)

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        _, predicted = outputs.max(1)
        correct += predicted.eq(labels).sum().item()
        total += labels.size(0)

    avg_loss = total_loss / len(dataloader)
    accuracy = correct / total
    return avg_loss, accuracy


def evaluate(model, dataloader, criterion, device):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0

    with torch.no_grad():
        for batch in dataloader:
            inputs = batch['input_ids'].to(device) if 'input_ids' in batch else batch['input'].to(device)  # FordADataset uses the key 'input'
            labels = batch['label'].to(device)

            outputs = model(inputs)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            _, predicted = outputs.max(1)
            correct += predicted.eq(labels).sum().item()
            total += labels.size(0)

    avg_loss = total_loss / len(dataloader)
    accuracy = correct / total
    return avg_loss, accuracy


# **Stage 7 – Model Training and Evaluation**

## The `train_one_epoch` function: trains the model for one epoch

In [42]:
def train_one_epoch(model, dataloader, criterion, optimizer, device, use_attention_mask=False):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for batch in dataloader:
        optimizer.zero_grad()

        # Get the inputs and labels ('input' is the key FordADataset uses)
        input_ids = batch['input_ids'].to(device) if 'input_ids' in batch else batch['input'].to(device)
        labels = batch['label'].to(device) if 'label' in batch else batch['labels'].to(device)

        # Optional: attention_mask, if present and requested
        attention_mask = None
        if use_attention_mask and 'attention_mask' in batch:
            attention_mask = batch['attention_mask'].to(device)

        # Forward pass, with or without the attention_mask
        if attention_mask is not None:
            outputs = model(input_ids, attention_mask)
        else:
            outputs = model(input_ids)

        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * labels.size(0)
        _, predicted = torch.max(outputs.data, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc


## The `evaluate` function: evaluates the model (val/test) per epoch

In [43]:
def evaluate(model, dataloader, criterion, device, use_attention_mask=False):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch['input_ids'].to(device) if 'input_ids' in batch else batch['input'].to(device)  # FordADataset uses the key 'input'
            labels = batch['label'].to(device) if 'label' in batch else batch['labels'].to(device)

            attention_mask = None
            if use_attention_mask and 'attention_mask' in batch:
                attention_mask = batch['attention_mask'].to(device)

            if attention_mask is not None:
                outputs = model(input_ids, attention_mask)
            else:
                outputs = model(input_ids)

            loss = criterion(outputs, labels)

            running_loss += loss.item() * labels.size(0)
            _, predicted = torch.max(outputs.data, 1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc
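
Both functions accumulate `loss.item() * labels.size(0)` and divide by the number of samples, whereas the earlier `train`/`evaluate` pair divides the summed batch losses by the number of batches. With the default mean reduction of `nn.CrossEntropyLoss`, the two differ whenever the last batch is smaller. A small arithmetic sketch with made-up numbers:

```python
# Made-up per-batch mean losses and batch sizes (the last batch is smaller)
batch_losses = [0.50, 0.50, 2.00]
batch_sizes = [32, 32, 4]

# Mean of batch means: every batch counts equally
mean_of_batches = sum(batch_losses) / len(batch_losses)

# Sample-weighted mean: every example counts equally
weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

print(round(mean_of_batches, 4))  # 1.0
print(round(weighted, 4))         # 0.5882
```

The sample-weighted version is the true average loss per example, so it stays comparable across different batch sizes.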


## Training and Evaluation Loop for the Three Datasets

In [44]:
num_epochs = 10

for epoch in range(num_epochs):
    # BBCNews (uses the attention_mask)
    train_loss_bbc, train_acc_bbc = train_one_epoch(
        model_bbc, train_loader_bbc, criterion, optimizer_bbc, device, use_attention_mask=True
    )
    val_loss_bbc, val_acc_bbc = evaluate(
        model_bbc, test_loader_bbc, criterion, device, use_attention_mask=True
    )

    # Sentiment140 (also uses the attention_mask); the loaders are named *_sent
    train_loss_sent, train_acc_sent = train_one_epoch(
        model_sentiment, train_loader_sent, criterion, optimizer_sentiment, device, use_attention_mask=True
    )
    val_loss_sent, val_acc_sent = evaluate(
        model_sentiment, test_loader_sent, criterion, device, use_attention_mask=True
    )

    # FordA (time series, no attention_mask); the loaders are named *_ford
    train_loss_forda, train_acc_forda = train_one_epoch(
        model_forda, train_loader_ford, criterion, optimizer_forda, device, use_attention_mask=False
    )
    val_loss_forda, val_acc_forda = evaluate(
        model_forda, test_loader_ford, criterion, device, use_attention_mask=False
    )

    print(f"Epoch {epoch+1}/{num_epochs}")
    print(f"BBCNews   - Train loss: {train_loss_bbc:.4f}, Train acc: {train_acc_bbc:.4f}, Val loss: {val_loss_bbc:.4f}, Val acc: {val_acc_bbc:.4f}")
    print(f"Sentiment - Train loss: {train_loss_sent:.4f}, Train acc: {train_acc_sent:.4f}, Val loss: {val_loss_sent:.4f}, Val acc: {val_acc_sent:.4f}")
    print(f"FordA     - Train loss: {train_loss_forda:.4f}, Train acc: {train_acc_forda:.4f}, Val loss: {val_loss_forda:.4f}, Val acc: {val_acc_forda:.4f}")

# **Stage 8 – Conclusions and Notes**  
Summarize the experimental findings, identify the best parameters, and record the key notes for group discussion.