# Chatbot Q&A Quranic Reasoning

## Business Understanding

- How can the QRQA Dataset be used to develop AI-based Islamic digital education products (such as Q&A chatbots, learning apps, or a virtual mufti)?

  _To identify opportunities for derivative products and potential market segments (students, academics, digital pesantren, etc.)._

- Which language model (such as LLaMA, Mistral, DeepSeek, etc.) is best suited for fine-tuning on the QRQA Dataset in terms of speed, accuracy, and cost efficiency?

  _To be tested in this notebook._

- How can we measure the effectiveness of the model's reasoning on complex questions in QRQA?

  _Using evaluation metrics such as BLEU, ROUGE, or a human-evaluated Islamic consistency score._

## Data and Tools Acquisition

In [1]:
!pip install transformers
!pip install kaggle
!pip install rouge-score
!pip install openpyxl

Collecting rouge-score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge-score
  Building wheel for rouge-score (setup.py) ... done
  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=ef2ca352fef0fac24f0d0c4fcf866aa6f08b2ec08f4ccc025f651c95d59574b7
  Stored in directory: /root/.cache/pip/wheels/1e/19/43/8a442dc83660ca25e163e1bd1f89919284ab0d0c1475475148
Successfully built rouge-score
Installing collected packages: rouge-score
Successfully installed rouge-score-0.1.2

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import kagglehub
from kagglehub import KaggleDatasetAdapter
import os
import pathlib
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import PegasusTokenizer, PegasusForConditionalGeneration
import torch
from torch.utils.data import DataLoader, Dataset
from torch.optim import AdamW
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer


In [3]:
# ! mkdir ~/.kaggle

In [4]:
# !cp /content/drive/MyDrive/CollabData/kaggle_API/kaggle.json ~/.kaggle/kaggle.json

In [5]:
# ! chmod 600 ~/.kaggle/kaggle.json

In [6]:
# ! kaggle datasets download lazer999/quranic-reasoning-synthetic-dataset

In [7]:
# ! kaggle datasets download alizahidraja/quran-english

In [8]:
# ! unzip quranic-reasoning-synthetic-dataset.zip

In [9]:
# ! unzip quran-english.zip

## Data Preparation

In [10]:
file_path = "/kaggle/input/quranic-reasoning-synthetic-dataset/Quran_R1_excel.xlsx"
df = pd.read_excel(file_path)
df.head()

| | Unnamed: 0 | Question | Complex_CoT | Response |
|---|---|---|---|---|
| 0 | 0 | What is the significance of patience (sabr) in... | Patience (sabr) is a key virtue emphasized in ... | The Quran highlights patience as a sign of str... |
| 1 | 1 | Why do we have to pray five times a day? Would... | The five daily prayers are a fundamental pilla... | The five daily prayers maintain spiritual conn... |
| 2 | 2 | What does the Quran say about friendships? How... | Friendship plays a crucial role in shaping a b... | The Quran advises selecting righteous friends ... |
| 3 | 3 | Why does the Quran emphasize so much on gratit... | Gratitude (shukr) is vital in Islam as it fost... | The Quran underscores gratitude, promising inc... |
| 4 | 4 | How should we deal with disagreements among si... | The Quran encourages resolving sibling dispute... | Sibling disagreements should be resolved with ... |


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 857 entries, 0 to 856
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Unnamed: 0   857 non-null    int64 
 1   Question     857 non-null    object
 2   Complex_CoT  857 non-null    object
 3   Response     857 non-null    object
dtypes: int64(1), object(3)
memory usage: 26.9+ KB


The `Unnamed: 0` column only duplicates the row index and carries no useful information, so we drop it.

In [12]:
# Drop the redundant index column, then confirm the remaining columns
df = df.drop(columns=['Unnamed: 0'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 857 entries, 0 to 856
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Question     857 non-null    object
 1   Complex_CoT  857 non-null    object
 2   Response     857 non-null    object
dtypes: object(3)
memory usage: 20.2+ KB


Next, let's load the second dataset: the English Quran translation with tafsir.

In [13]:
file_path = "/kaggle/input/quran-english/Quran_English_with_Tafseer.csv"
df_quran = pd.read_csv(file_path)
df_quran.head()

| | Name | Surah | Ayat | Verse | Tafseer |
|---|---|---|---|---|---|
| 0 | The Opening | 1 | 1 | In the name of Allah, the Beneficent, the Merc... | In the Name of God the Compassionate the Merciful |
| 1 | The Opening | 1 | 2 | Praise be to Allah, Lord of the Worlds, | In the Name of God the name of a thing is that... |
| 2 | The Opening | 1 | 3 | The Beneficent, the Merciful. | The Compassionate the Merciful that is to say ... |
| 3 | The Opening | 1 | 4 | Owner of the Day of Judgment, | Master of the Day of Judgement that is the day... |
| 4 | The Opening | 1 | 5 | Thee (alone) we worship; Thee (alone) we ask f... | You alone we worship and You alone we ask for ... |


In [14]:
df_quran.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6236 entries, 0 to 6235
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Name     6236 non-null   object
 1   Surah    6236 non-null   int64 
 2   Ayat     6236 non-null   int64 
 3   Verse    6236 non-null   object
 4   Tafseer  6235 non-null   object
dtypes: int64(2), object(3)
memory usage: 243.7+ KB


In [15]:
display(df_quran[df_quran['Tafseer'].isnull()])

| | Name | Surah | Ayat | Verse | Tafseer |
|---|---|---|---|---|---|
| 4555 | Muhammad | 47 | 11 | That is because Allah is patron of those who b... | |


One row (Surah 47, verse 11) has no tafsir. We fill this missing value with a synthetic explanation.

In [16]:
# Fill the missing 'Tafseer' value with a synthetic explanation
df_quran['Tafseer'] = df_quran['Tafseer'].fillna("This surah emphasizes that Allah is the protector and ally (Mawlā) of those who believe, offering them divine support, guidance, and victory, while the disbelievers are left without any true protector. This verse reassures the believers that despite external challenges or opposition, they are never alone—Allah stands by them in both worldly and spiritual affairs. Conversely, disbelievers, no matter their apparent power or alliances, lack divine backing and are ultimately vulnerable. Revealed in the context of struggle between faith and disbelief, particularly in times of conflict, this verse highlights the importance of trusting in Allah, as real strength and success come through His support, not mere worldly means.")
print(df_quran[df_quran['Tafseer'].isnull()])

Empty DataFrame
Columns: [Name, Surah, Ayat, Verse, Tafseer]
Index: []


Next, let's expand our data with tafsir from several commentators.

In [17]:
file_path = "/kaggle/input/quran-nlp/data/main_df.csv"
df_tafseer = pd.read_csv(file_path)
df_tafseer.head()

| | Name | Surah | Ayat | Arabic | Translation - Muhammad Tahir-ul-Qadri | Translation - Arthur J | Translation - Marmaduke Pickthall | Tafaseer - Tafsir al-Jalalayn | Tafaseer - Tanwir al-Miqbas min Tafsir Ibn Abbas | EnglishTitle | ArabicTitle | RomanTitle | NumberOfVerses | NumberOfRukus | PlaceOfRevelation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Opening | 1 | 1 | بِسمِ ٱلله الرَّحْمٰنِ الرَّحِيـمِ | All praise be to Allah alone, the Sustainer of... | In the Name of God, the Merciful, the Compassi... | In the name of Allah, the Beneficent, the Merc... | In the Name of God the Compassionate the Merciful | In the name of Allah, the Beneficent, the Merc... | Al-Fatihah | ٱلْفَاتِحَة | al-Ḥamd | 7 | 1 | Makkah |
| 1 | The Opening | 1 | 2 | ٱلْحَمْدُ للَّهِ رَبِّ ٱلْعَالَمِينَ | Most Compassionate, Ever-Merciful, | Praise belongs to God, the Lord of all Being, | Praise be to Allah, Lord of the Worlds, | In the Name of God the name of a thing is that... | And on his authority it is related that Ibn 'A... | Al-Fatihah | ٱلْفَاتِحَة | al-Ḥamd | 7 | 1 | Makkah |
| 2 | The Opening | 1 | 3 | ٱلرَّحْمـٰنِ ٱلرَّحِيمِ | Master of the Day of Judgment. | the All-merciful, the All-compassionate, | The Beneficent, the Merciful. | The Compassionate the Merciful that is to say ... | (The Beneficent) the Gentle. (The Merciful) th... | Al-Fatihah | ٱلْفَاتِحَة | al-Ḥamd | 7 | 1 | Makkah |
| 3 | The Opening | 1 | 4 | مَـٰلِكِ يَوْمِ ٱلدِّينِ | (O Allah!) You alone do we worship and to You ... | the Master of the Day of Doom. | Owner of the Day of Judgment, | Master of the Day of Judgement that is the day... | (Owner of the Day of Judgement) the Arbitrator... | Al-Fatihah | ٱلْفَاتِحَة | al-Ḥamd | 7 | 1 | Makkah |
| 4 | The Opening | 1 | 5 | إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ | Show us the straight path, | Thee only we serve; to Thee alone we pray for ... | Thee (alone) we worship; Thee (alone) we ask f... | You alone we worship and You alone we ask for ... | (Thee (alone) we worship), we turn to you as t... | Al-Fatihah | ٱلْفَاتِحَة | al-Ḥamd | 7 | 1 | Makkah |


In [18]:
df_tafseer.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6236 entries, 0 to 6235
Data columns (total 15 columns):
 #   Column                                            Non-Null Count  Dtype 
---  ------                                            --------------  ----- 
 0   Name                                              6236 non-null   object
 1   Surah                                             6236 non-null   int64 
 2   Ayat                                              6236 non-null   int64 
 3   Arabic                                            6236 non-null   object
 4   Translation - Muhammad Tahir-ul-Qadri             6236 non-null   object
 5   Translation - Arthur J                            6236 non-null   object
 6   Translation - Marmaduke Pickthall                 6236 non-null   object
 7   Tafaseer - Tafsir al-Jalalayn                     6236 non-null   object
 8   Tafaseer - Tanwir al-Miqbas min Tafsir Ibn Abbas  6236 non-null   object
 9   EnglishTitle                  

In [19]:
columns_to_drop = ['Translation - Arthur J', 'Translation - Marmaduke Pickthall', 'NumberOfRukus']
df_tafseer = df_tafseer.drop(columns=[col for col in columns_to_drop if col in df_tafseer.columns])
df_tafseer.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6236 entries, 0 to 6235
Data columns (total 12 columns):
 #   Column                                            Non-Null Count  Dtype 
---  ------                                            --------------  ----- 
 0   Name                                              6236 non-null   object
 1   Surah                                             6236 non-null   int64 
 2   Ayat                                              6236 non-null   int64 
 3   Arabic                                            6236 non-null   object
 4   Translation - Muhammad Tahir-ul-Qadri             6236 non-null   object
 5   Tafaseer - Tafsir al-Jalalayn                     6236 non-null   object
 6   Tafaseer - Tanwir al-Miqbas min Tafsir Ibn Abbas  6236 non-null   object
 7   EnglishTitle                                      6236 non-null   object
 8   ArabicTitle                                       6236 non-null   object
 9   RomanTitle                    

On to the next step.

### Data Merging

Before developing the model, we reshape the `df`, `df_quran`, and `df_tafseer` datasets into a common format with `Question` and `Response` columns so that they can be merged.

In [20]:
# Create the first template
df_quran['Question'] = "What is the meaning of Surah " + df_quran['Surah'].astype(str) + ":" + df_quran['Ayat'].astype(str) + "?"
df_quran['Response'] = df_quran['Tafseer']

# Create the second template and append it to the first dataframe
df_quran_2 = pd.DataFrame()
df_quran_2['Question'] = "What is the meaning of Surah " + df_quran['Name'] + ":" + df_quran['Ayat'].astype(str) + "?"
df_quran_2['Response'] = df_quran['Tafseer']

df_quran = pd.concat([df_quran, df_quran_2], ignore_index=True)

# Select only the relevant columns for merging
df_quran = df_quran[['Question', 'Response']]

display(df_quran)

| | Question | Response |
|---|---|---|
| 0 | What is the meaning of Surah 1:1? | In the Name of God the Compassionate the Merciful |
| 1 | What is the meaning of Surah 1:2? | In the Name of God the name of a thing is that... |
| 2 | What is the meaning of Surah 1:3? | The Compassionate the Merciful that is to say ... |
| 3 | What is the meaning of Surah 1:4? | Master of the Day of Judgement that is the day... |
| 4 | What is the meaning of Surah 1:5? | You alone we worship and You alone we ask for ... |
| ... | ... | ... |
| 12467 | What is the meaning of Surah Mankind:2? | the King of mankind |
| 12468 | What is the meaning of Surah Mankind:3? | the God of mankind both maliki’l-nās and ilāhi... |
| 12469 | What is the meaning of Surah Mankind:4? | from the evil of the slinking whisperer Satan ... |
| 12470 | What is the meaning of Surah Mankind:5? | who whispers in the breasts of mankind in thei... |


In [21]:
# Rename long tafsir columns for convenience
df_tafseer = df_tafseer.rename(columns={
    'Tafaseer - Tafsir al-Jalalayn': 'Tafsir_Jalalayn',
    'Tafaseer - Tanwir al-Miqbas min Tafsir Ibn Abbas': 'Tafsir_IbnAbbas'
})

# Template 1 – Asking for place of revelation only
df1 = pd.DataFrame()
df1['Question'] = "Where was Surah " + df_tafseer['Name'] + ", verse " + df_tafseer['Ayat'].astype(str) + " revealed?"
df1['Response'] = "It was revealed in " + df_tafseer['PlaceOfRevelation'] + "."

# --- Combine all templates into one dataframe ---
df_tafseer = pd.concat([df1], ignore_index=True)

# --- Select only relevant columns ---
df_tafseer = df_tafseer[['Question', 'Response']]

display(df_tafseer)

| | Question | Response |
|---|---|---|
| 0 | Where was Surah The Opening, verse 1 revealed? | It was revealed in Makkah. |
| 1 | Where was Surah The Opening, verse 2 revealed? | It was revealed in Makkah. |
| 2 | Where was Surah The Opening, verse 3 revealed? | It was revealed in Makkah. |
| 3 | Where was Surah The Opening, verse 4 revealed? | It was revealed in Makkah. |
| 4 | Where was Surah The Opening, verse 5 revealed? | It was revealed in Makkah. |
| ... | ... | ... |
| 6231 | Where was Surah Mankind, verse 2 revealed? | It was revealed in Makkah. |
| 6232 | Where was Surah Mankind, verse 3 revealed? | It was revealed in Makkah. |
| 6233 | Where was Surah Mankind, verse 4 revealed? | It was revealed in Makkah. |
| 6234 | Where was Surah Mankind, verse 5 revealed? | It was revealed in Makkah. |


In [22]:
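# --- Build the training dataframe ---
# Only the QRQA dataframe `df` is included here; `df_quran` and `df_tafseer`
# are prepared above but are not concatenated in this run.
merged_df = pd.concat([df], ignore_index=True)

# Optional: Preview the result
display(merged_df)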

| | Question | Complex_CoT | Response |
|---|---|---|---|
| 0 | What is the significance of patience (sabr) in... | Patience (sabr) is a key virtue emphasized in ... | The Quran highlights patience as a sign of str... |
| 1 | Why do we have to pray five times a day? Would... | The five daily prayers are a fundamental pilla... | The five daily prayers maintain spiritual conn... |
| 2 | What does the Quran say about friendships? How... | Friendship plays a crucial role in shaping a b... | The Quran advises selecting righteous friends ... |
| 3 | Why does the Quran emphasize so much on gratit... | Gratitude (shukr) is vital in Islam as it fost... | The Quran underscores gratitude, promising inc... |
| 4 | How should we deal with disagreements among si... | The Quran encourages resolving sibling dispute... | Sibling disagreements should be resolved with ... |
| ... | ... | ... | ... |
| 852 | Analyze the 'Story of the Calf' in Surah Al-Ba... | The 'Story of the Calf' (Samiri incident) in S... | The 'Calf Story' in Surah Al-Baqarah (2:51-54,... |
| 853 | Explore the Quranic concept of 'Al-Jannah' (Pa... | 'Al-Jannah' (Paradise) and 'An-Nar' (Hellfire)... | 'Al-Jannah' (Paradise) and 'An-Nar' (Hellfire)... |
| 854 | Analyze Surah Al-Asr (103). How does this brie... | Surah Al-Asr (103), though concise, outlines a... | Surah Al-Asr (103) outlines a path to salvatio... |
| 855 | Explore the Quranic concept of 'Al-Wilayah' (G... | 'Al-Wilayah' (Guardianship/Protection) of Alla... | 'Al-Wilayah' (Guardianship/Protection) of Alla... |


## Model Development

We will use the Pegasus model; see an explanation of this Transformer model [here](https://medium.com/@varun5/pegasus-large-language-model-8c2aeee1e11).

In [23]:
inputt=merged_df['Question'].tolist()
labelt=merged_df['Response'].tolist()

Train-test split (here we use a 9:1 ratio)

In [24]:
train_inputs, test_inputs, train_labels, test_labels = train_test_split(inputt, labelt, test_size=0.1, random_state=42)
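
As a rough check on what the 9:1 split produces, the sketch below computes the expected split sizes for the 857 QRQA rows. It assumes the fractional test size is rounded up and the training set gets the remainder, which is how scikit-learn is understood to behave here; `split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math

def split_sizes(n_samples, test_size):
    """Sizes of a fractional split: test is rounded up, train gets the rest."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

# 857 rows, as reported by df.info() above
n_train, n_test = split_sizes(857, 0.1)
print(n_train, n_test)  # 771 86
```

Under that assumption, only 86 question-answer pairs are held out, so any metric computed on them should be read as a coarse indicator.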

Let's load the tokenizer and the pretrained model we will use, in this case Pegasus.

In [25]:
tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-xsum")
model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-xsum")

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Before training the model, let's tokenize the data.

In [26]:
def tokenize_data(inputs, labels, tokenizer, max_length=128):
    input_encodings = tokenizer(
        list(inputs), max_length=max_length, padding=True, truncation=True, return_tensors="pt"
    )
    label_encodings = tokenizer(
        list(labels), max_length=max_length, padding=True, truncation=True, return_tensors="pt"
    )
    return input_encodings, label_encodings

train_inputs_enc, train_labels_enc = tokenize_data(train_inputs, train_labels, tokenizer)
test_inputs_enc, test_labels_enc = tokenize_data(test_inputs, test_labels, tokenizer)

In [27]:
class CustomDataset(Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels["input_ids"])

    def __getitem__(self, idx):
        return {
            "input_ids": self.encodings["input_ids"][idx],
            "attention_mask": self.encodings["attention_mask"][idx],
            "labels": self.labels["input_ids"][idx],
        }

train_dataset = CustomDataset(train_inputs_enc, train_labels_enc)
test_dataset = CustomDataset(test_inputs_enc, test_labels_enc)

train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=8)
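
One thing to be aware of: the labels above are padded to a common length, and the padding ids are passed to the model unchanged, so padded positions also count toward the loss. A common remedy is to replace padding ids in the labels with `-100`, the value that PyTorch's cross-entropy loss ignores by default. The sketch below illustrates the idea on plain Python lists; `mask_padding` and the token ids are hypothetical, and `0` merely stands in for `tokenizer.pad_token_id`.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def mask_padding(label_ids, pad_token_id, ignore_index=IGNORE_INDEX):
    """Replace padding ids in one label sequence so they do not contribute to the loss."""
    return [ignore_index if tok == pad_token_id else tok for tok in label_ids]

# Hypothetical token ids; 0 stands in for the tokenizer's pad_token_id.
padded_labels = [312, 87, 1, 0, 0]
print(mask_padding(padded_labels, pad_token_id=0))  # [312, 87, 1, -100, -100]
```

If this were applied to the label tensors in this notebook, the masked values would have to be mapped back to the padding id before calling `tokenizer.decode`, since `-100` is not a valid token id.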

Now let's train the model, using the AdamW optimizer to minimize the training loss.

In [28]:
optimizer = AdamW(model.parameters(), lr=5e-6)

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)

epochs = 15
for epoch in range(epochs):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)
        
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()

    # Note: this prints the loss of the last batch in the epoch, not the epoch average
    print(f"Epoch {epoch + 1} Loss: {loss.item()}")

Epoch 1 Loss: 7.73588228225708
Epoch 2 Loss: 7.480173587799072
Epoch 3 Loss: 9.645816802978516
Epoch 4 Loss: 7.6141438484191895
Epoch 5 Loss: 9.363567352294922
Epoch 6 Loss: 7.773708343505859
Epoch 7 Loss: 8.943352699279785
Epoch 8 Loss: 6.859461307525635
Epoch 9 Loss: 8.452292442321777
Epoch 10 Loss: 5.539506912231445
Epoch 11 Loss: 8.166243553161621
Epoch 12 Loss: 7.463526248931885
Epoch 13 Loss: 6.666658878326416
Epoch 14 Loss: 4.968800067901611
Epoch 15 Loss: 3.534841775894165
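
The values above are the loss of the last mini-batch in each epoch, which is one likely reason they jump up and down between epochs. A smoother signal is the mean over all batches of an epoch. The following is a minimal sketch of that bookkeeping; `summarize_epoch` is a hypothetical helper and the loss values are made up for illustration, not taken from this run.

```python
from statistics import mean

def summarize_epoch(batch_losses):
    """Return (mean loss, last-batch loss) for one epoch's per-batch losses."""
    if not batch_losses:
        raise ValueError("no batches were processed")
    return mean(batch_losses), batch_losses[-1]

# Made-up per-batch losses for a single epoch
example_losses = [8.0, 6.0, 7.0, 9.0]
epoch_mean, last_batch = summarize_epoch(example_losses)
print(f"mean={epoch_mean:.2f} last={last_batch:.2f}")  # mean=7.50 last=9.00
```

In the training loop, this would amount to appending `loss.item()` to a list inside the batch loop and printing the mean once the epoch finishes.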


## Model Testing

In [29]:
model.eval()
for batch in test_loader:
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    labels = batch["labels"].to(device)

    input_texts = [tokenizer.decode(ids, skip_special_tokens=True) for ids in input_ids]
    true_labels = [tokenizer.decode(label, skip_special_tokens=True) for label in labels]

    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=50
    )
    predictions = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]

    for input_text, true_label, pred in zip(input_texts, true_labels, predictions):
        print("-" * 50)
        print(f"input_txt: {input_text}")
        print(f"true_label: {true_label}")
        print(f"true_pred: {pred}")

    break  # inspect only the first test batch

--------------------------------------------------
input_txt: Why does Allah allow bad people to succeed in this world?
true_label: The Quran warns that worldly success is a test: 'Do not be deceived by the prosperity of those who disbelieve' (3:196). True success lies in righteousness.
true_pred: Why do bad people succeed in this world?
--------------------------------------------------
input_txt: What does the Quran teach about the responsibility of using reason to safeguard one’s faith?
true_label: It teaches that using reason is a fundamental responsibility that protects and strengthens one’s faith.
true_pred: The Quran teaches that reason is essential to safeguard one’s faith.
--------------------------------------------------
input_txt: What does the Quran teach about handling criticism within the family?
true_label: The Quran encourages using constructive criticism as an opportunity for growth, responding with patience and humility.
true_pred: It teaches that criticism should be

## Model Evaluation

In [30]:
# Initialize the ROUGE scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

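# 'predictions' and 'true_labels' are the lists of strings left over from the previous cell.
# That cell stops after the first test batch, so the scores below cover only that batch (8 samples).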

bleu_scores = []
rouge1_scores = []
rougeL_scores = []

for prediction, true_label in zip(predictions, true_labels):
    # Calculate BLEU score
    reference = [true_label.split()]
    candidate = prediction.split()
    bleu_score = sentence_bleu(reference, candidate)
    bleu_scores.append(bleu_score)

    # Calculate ROUGE scores
    scores = scorer.score(true_label, prediction)
    rouge1_scores.append(scores['rouge1'].fmeasure)
    rougeL_scores.append(scores['rougeL'].fmeasure)

# Calculate average scores
avg_bleu = np.mean(bleu_scores)
avg_rouge1 = np.mean(rouge1_scores)
avg_rougeL = np.mean(rougeL_scores)

print(f"Average BLEU Score: {avg_bleu}")
print(f"Average ROUGE-1 Score: {avg_rouge1}")
print(f"Average ROUGE-L Score: {avg_rougeL}")

Average BLEU Score: 5.319864555242108e-80
Average ROUGE-1 Score: 0.28852806846008844
Average ROUGE-L Score: 0.25225644896323307


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
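
The warnings above explain the near-zero BLEU: sentence-level BLEU is a geometric mean of 1- to 4-gram precisions, so a single order with no matches drives the whole score to zero. The sketch below reimplements that calculation in plain Python for one prediction/reference pair from the test output. It does not call NLTK, uses whitespace tokens, and its add-one smoothing for n > 1 is only one simple option, assumed here for illustration.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precisions(reference, candidate, max_n=4):
    """Return (matched, total) clipped n-gram counts for n = 1..max_n."""
    stats = []
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        stats.append((matched, sum(cand.values())))
    return stats

def bleu(reference, candidate, max_n=4, add_one=False):
    """Sentence BLEU with uniform weights; add_one smooths the orders n > 1."""
    log_sum = 0.0
    for n, (matched, total) in enumerate(modified_precisions(reference, candidate, max_n), 1):
        if add_one and n > 1:
            matched, total = matched + 1, total + 1
        if matched == 0 or total == 0:
            return 0.0  # one empty order zeroes the geometric mean
        log_sum += math.log(matched / total) / max_n
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)  # penalize short candidates
    return brevity * math.exp(log_sum)

reference = ("It teaches that using reason is a fundamental responsibility "
             "that protects and strengthens one’s faith.").split()
candidate = "The Quran teaches that reason is essential to safeguard one’s faith.".split()

print(modified_precisions(reference, candidate))  # [(6, 11), (3, 10), (0, 9), (0, 8)]
print(bleu(reference, candidate))                 # 0.0
print(bleu(reference, candidate, add_one=True) > 0)  # True
```

Although this prediction is a reasonable paraphrase, it shares no 3-gram or 4-gram with the reference, so the unsmoothed score is zero; with smoothing the score becomes small but nonzero.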


## Explanation of Each Metric

---

- **BLEU (Bilingual Evaluation Understudy)**

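  BLEU measures the similarity between the model's generated output and the reference answer based on n-gram overlap.

  In this run the average BLEU score is about 5.3e-80, which is effectively zero. As the warnings in the evaluation output indicate, predictions share no 2-, 3-, or 4-gram with their references, so unsmoothed sentence-level BLEU collapses. Using a smoothing function or a lower n-gram order would give a more informative score, and the model still has considerable room to improve in structure and word choice.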

---

- **ROUGE-1**

  Measures direct word overlap (unigram overlap) between the model's answer and the reference answer. In this run the average ROUGE-1 F-measure is about 0.29.

---

- **ROUGE-L**

  Measures similarity in structure or word order via the longest common subsequence. In this run the average ROUGE-L F-measure is about 0.25.
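
To make the two ROUGE definitions concrete, here is a minimal sketch that computes ROUGE-1 and ROUGE-L F-measures on whitespace tokens. It is an illustration only: unlike the `rouge_score` scorer used above, it does no lowercasing or stemming, and the example sentences are made up.

```python
from collections import Counter

def f_measure(overlap, n_pred, n_ref):
    """Harmonic mean of precision (overlap / n_pred) and recall (overlap / n_ref)."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)

def rouge1_f(reference, prediction):
    """ROUGE-1: clipped unigram overlap, ignoring word order."""
    overlap = sum((Counter(reference) & Counter(prediction)).values())
    return f_measure(overlap, len(prediction), len(reference))

def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rougeL_f(reference, prediction):
    """ROUGE-L: overlap measured by the longest in-order common subsequence."""
    return f_measure(lcs_length(reference, prediction), len(prediction), len(reference))

ref = "the quran encourages patience".split()
pred = "patience the quran teaches".split()
print(rouge1_f(ref, pred))  # 0.75
print(rougeL_f(ref, pred))  # 0.5
```

Three of the four words match, but only "the quran" appears in the same order in both sentences, which is why ROUGE-L is lower than ROUGE-1 here.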

## Model Saving

In [31]:
# Save the model
model_path = "/kaggle/working/Model"
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

print(f"Model saved to {model_path}")



Model saved to /kaggle/working/Model
