# Chatbot Q&A Quranic Reasoning

## Business Understanding

- How can the QRQA Dataset be used to build AI-based Islamic digital education products (such as Q&A chatbots, learning applications, or a virtual mufti)?

  _To identify opportunities for derivative products and potential market segments (students, academics, digital pesantren, etc.)._

- Which language model (such as LLaMA, Mistral, DeepSeek, etc.) is best suited for fine-tuning on the QRQA Dataset in terms of speed, accuracy, and cost efficiency?

  _To be tested in this notebook._

- How can we measure the effectiveness of the model's reasoning on the complex questions in QRQA?

  _Using evaluation metrics such as BLEU, ROUGE, or a human-evaluated Islamic consistency score._
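As a rough illustration of the overlap-based metrics mentioned in the last point, the sketch below computes a simplified ROUGE-1 F1 score (unigram overlap) between a reference answer and a generated answer. The function name `rouge1_f1` and the whitespace tokenization are assumptions made for this example; it is not the official ROUGE implementation, and a real evaluation would likely rely on an established metrics library.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1 on lowercase, whitespace-split tokens (illustrative only)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference/candidate pair, not taken from the dataset.
reference = "patience is a sign of strength and faith"
candidate = "patience is a sign of faith"
score = rouge1_f1(reference, candidate)
print(round(score, 3))
```

Overlap metrics like this only reward shared wording, so they would probably need to be paired with the human-evaluated consistency score to judge whether the reasoning itself is sound.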

## Data and Tools Acquisition

In [19]:
!pip install transformers
!pip install kaggle



In [20]:
import numpy as np
import matplotlib.pyplot as plt
import kagglehub
from kagglehub import KaggleDatasetAdapter
from google.colab import files
import os
import pathlib
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
from torch.utils.data import DataLoader, Dataset
from torch.optim import AdamW

In [21]:
!mkdir -p ~/.kaggle


In [22]:
!cp /content/drive/MyDrive/CollabData/kaggle_API/kaggle.json ~/.kaggle/kaggle.json

In [23]:
! chmod 600 ~/.kaggle/kaggle.json
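The three shell steps above (create the directory, copy the token, restrict its permissions) could also be done from Python, which makes them safe to rerun. This is a minimal sketch; the helper name `install_kaggle_token` is hypothetical and not part of the Kaggle tooling.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_token(source: Path, kaggle_dir: Path) -> Path:
    """Copy a kaggle.json token into kaggle_dir and make it owner read/write only."""
    # exist_ok=True avoids the "File exists" error when the cell is rerun.
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(source, target)
    # Equivalent to `chmod 600`: read and write for the owner, nothing for others.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

In this notebook it would be called with the Drive path used above as `source` and `Path.home() / ".kaggle"` as `kaggle_dir`.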

In [24]:
! kaggle datasets download lazer999/quranic-reasoning-synthetic-dataset

Dataset URL: https://www.kaggle.com/datasets/lazer999/quranic-reasoning-synthetic-dataset
License(s): CC0-1.0
quranic-reasoning-synthetic-dataset.zip: Skipping, found more recently modified local copy (use --force to force download)


In [25]:
! kaggle datasets download alizahidraja/quran-english

Dataset URL: https://www.kaggle.com/datasets/alizahidraja/quran-english
License(s): GPL-2.0
quran-english.zip: Skipping, found more recently modified local copy (use --force to force download)


In [26]:
! unzip quranic-reasoning-synthetic-dataset.zip

Archive:  quranic-reasoning-synthetic-dataset.zip
replace Quran_R1_excel.xlsx? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: Quran_R1_excel.xlsx     


In [27]:
! unzip quran-english.zip

Archive:  quran-english.zip
replace Quran_English.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: Quran_English.csv       
  inflating: Quran_English_with_Tafseer.csv  
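Because `unzip` stops to ask before overwriting existing files, rerunning these cells requires manual input. One possible alternative is Python's standard `zipfile` module, which overwrites existing files without prompting. The helper name `extract_archive` below is an assumption for illustration.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall silently replaces files that already exist in dest.
        archive.extractall(dest)
        return archive.namelist()
```

For example, `extract_archive("quran-english.zip", "/content")` would unpack the second dataset into the same location used by the cells above.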


## Data Preparation

In [28]:
file_path = "/content/Quran_R1_excel.xlsx"
df = pd.read_excel(file_path)
df.head()

| | Unnamed: 0 | Question | Complex_CoT | Response |
|---|---|---|---|---|
| 0 | 0 | What is the significance of patience (sabr) in... | Patience (sabr) is a key virtue emphasized in ... | The Quran highlights patience as a sign of str... |
| 1 | 1 | Why do we have to pray five times a day? Would... | The five daily prayers are a fundamental pilla... | The five daily prayers maintain spiritual conn... |
| 2 | 2 | What does the Quran say about friendships? How... | Friendship plays a crucial role in shaping a b... | The Quran advises selecting righteous friends ... |
| 3 | 3 | Why does the Quran emphasize so much on gratit... | Gratitude (shukr) is vital in Islam as it fost... | The Quran underscores gratitude, promising inc... |
| 4 | 4 | How should we deal with disagreements among si... | The Quran encourages resolving sibling dispute... | Sibling disagreements should be resolved with ... |


In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 857 entries, 0 to 856
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Unnamed: 0   857 non-null    int64 
 1   Question     857 non-null    object
 2   Complex_CoT  857 non-null    object
 3   Response     857 non-null    object
dtypes: int64(1), object(3)
memory usage: 26.9+ KB


The `Unnamed: 0` column only repeats the row index and carries no useful information, so we drop it.

In [37]:
# Drop the redundant index column, then confirm the remaining schema
df = df.drop(columns=['Unnamed: 0'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 857 entries, 0 to 856
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Question     857 non-null    object
 1   Complex_CoT  857 non-null    object
 2   Response     857 non-null    object
dtypes: object(3)
memory usage: 20.2+ KB
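Before dropping a column like `Unnamed: 0`, it can be worth confirming that it really just mirrors the row index. The sketch below does that check on a small toy frame; the helper name `drop_redundant_index_columns` and the toy data are assumptions for illustration, not part of the dataset.

```python
import pandas as pd


def drop_redundant_index_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop 'Unnamed: ...' columns, but only when they merely repeat the row index."""
    redundant = [
        col
        for col in frame.columns
        if str(col).startswith("Unnamed:")
        and (frame[col].to_numpy() == frame.index.to_numpy()).all()
    ]
    # drop returns a new frame; the input is left unchanged.
    return frame.drop(columns=redundant)


# Toy frame shaped like the QRQA export (values are placeholders).
toy = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1, 2],
        "Question": ["q0", "q1", "q2"],
        "Response": ["r0", "r1", "r2"],
    }
)
cleaned = drop_redundant_index_columns(toy)
print(list(cleaned.columns))  # ['Question', 'Response']
```

A column that is named `Unnamed: ...` but holds different values is kept, which guards against discarding real data by name alone.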


In [29]:
file_path = "/content/Quran_English_with_Tafseer.csv"
df_quran = pd.read_csv(file_path)
df_quran.head()

| | Name | Surah | Ayat | Verse | Tafseer |
|---|---|---|---|---|---|
| 0 | The Opening | 1 | 1 | In the name of Allah, the Beneficent, the Merc... | In the Name of God the Compassionate the Merciful |
| 1 | The Opening | 1 | 2 | Praise be to Allah, Lord of the Worlds, | In the Name of God the name of a thing is that... |
| 2 | The Opening | 1 | 3 | The Beneficent, the Merciful. | The Compassionate the Merciful that is to say ... |
| 3 | The Opening | 1 | 4 | Owner of the Day of Judgment, | Master of the Day of Judgement that is the day... |
| 4 | The Opening | 1 | 5 | Thee (alone) we worship; Thee (alone) we ask f... | You alone we worship and You alone we ask for ... |


In [32]:
df_quran.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6236 entries, 0 to 6235
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Name     6236 non-null   object
 1   Surah    6236 non-null   int64 
 2   Ayat     6236 non-null   int64 
 3   Verse    6236 non-null   object
 4   Tafseer  6235 non-null   object
dtypes: int64(2), object(3)
memory usage: 243.7+ KB


In [34]:
display(df_quran[df_quran['Tafseer'].isnull()])

| | Name | Surah | Ayat | Verse | Tafseer |
|---|---|---|---|---|---|
| 4555 | Muhammad | 47 | 11 | That is because Allah is patron of those who b... | NaN |


One row (Surah 47, Ayat 11) has no tafsir. We fill this missing value with synthetic text.

In [36]:
# Fill the empty 'Tafseer' value with synthetic text, then confirm that no nulls remain
df_quran['Tafseer'] = df_quran['Tafseer'].fillna("This verse emphasizes that Allah is the protector and ally (Mawlā) of those who believe, offering them divine support, guidance, and victory, while the disbelievers are left without any true protector. It reassures the believers that despite external challenges or opposition, they are never alone; Allah stands by them in both worldly and spiritual affairs. Conversely, disbelievers, no matter their apparent power or alliances, lack divine backing and are ultimately vulnerable. Revealed in the context of struggle between faith and disbelief, particularly in times of conflict, this verse highlights the importance of trusting in Allah, as real strength and success come through His support, not mere worldly means.")
print(df_quran[df_quran['Tafseer'].isnull()])

Empty DataFrame
Columns: [Name, Surah, Ayat, Verse, Tafseer]
Index: []
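A blanket `fillna` works here because exactly one row is missing, but it would write the same verse-specific text into any other gap that appeared later. A more targeted variant, sketched below on a toy frame, fills only the row identified by `Surah` and `Ayat`. The toy data and the short placeholder string are assumptions for illustration.

```python
import pandas as pd

# Toy frame mimicking the relevant columns of df_quran (values are placeholders).
toy = pd.DataFrame(
    {
        "Surah": [47, 47, 48],
        "Ayat": [10, 11, 1],
        "Tafseer": ["existing commentary", None, "existing commentary"],
    }
)

# Hypothetical short placeholder; the notebook uses a longer synthetic paragraph.
synthetic_tafseer = "Synthetic commentary for Surah 47, Ayat 11."

# Select only the intended verse, and only if its tafsir is actually missing.
target = (toy["Surah"] == 47) & (toy["Ayat"] == 11) & toy["Tafseer"].isna()
toy.loc[target, "Tafseer"] = synthetic_tafseer

remaining_nulls = int(toy["Tafseer"].isna().sum())
print(remaining_nulls)  # 0
```

Keeping the synthetic text tied to an explicit verse also makes it easier to trace which rows contain generated rather than sourced commentary.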
