# Quran_QA K-Folds Ensemble 

In the previous notebook, __[quran_qa_v1](https://github.com/EmanElrefai/Quran_QA/blob/main/quran_qa_v1.ipynb)__, I mentioned that many different pretrained models were tried with different configuration parameters. I chose the best-performing models, whose pRR scores range from 48.4 to 52.9.

## Imports

In [1]:
# Reading and splitting the data
import pandas as pd
from sklearn.model_selection import KFold
import numpy as np

# Model loading, tokenization, and scoring
import joblib
import os
import torch
import json
from datasets import Dataset
from transformers import AutoModelForQuestionAnswering
from transformers import pipeline
from scipy.special import softmax
from transformers import AutoTokenizer
from transformers import PreTrainedTokenizerFast




## Loading the Testing Data

Here we load the development dataset.

In [2]:
dev_data_path = r'./datasets/qrcd_v1_1_dev.jsonl'

def read_data(datapath):
    # Read a JSON Lines file: each line holds one JSON object.
    with open(datapath, 'rb') as fp:
        datalist = list(fp)
    data = []
    for json_str in datalist:
        result = json.loads(json_str)
        data.append(result)
    return data

val_data = read_data(dev_data_path)

In [3]:
val_data

[{'pq_id': '9:60-61_316',
  'passage': 'إنما الصدقات للفقراء والمساكين والعاملين عليها والمؤلفة قلوبهم وفي الرقاب والغارمين وفي سبيل الله وابن السبيل فريضة من الله والله عليم حكيم. ومنهم الذين يؤذون النبي ويقولون هو أذن قل أذن خير لكم يؤمن بالله ويؤمن للمؤمنين ورحمة للذين آمنوا منكم والذين يؤذون رسول الله لهم عذاب أليم.',
  'surah': 9,
  'verses': '60-61',
  'question': 'ما هي مصارف الزكاة؟',
  'answers': [{'text': 'للفقراء والمساكين والعاملين عليها والمؤلفة قلوبهم وفي الرقاب والغارمين وفي سبيل الله وابن السبيل',
    'start_char': 13}]},
 {'pq_id': '2:178-179_400',
  'passage': 'يا أيها الذين آمنوا كتب عليكم القصاص في القتلى الحر بالحر والعبد بالعبد والأنثى بالأنثى فمن عفي له من أخيه شيء فاتباع بالمعروف وأداء إليه بإحسان ذلك تخفيف من ربكم ورحمة فمن اعتدى بعد ذلك فله عذاب أليم. ولكم في القصاص حياة يا أولي الألباب لعلكم تتقون.',
  'surah': 2,
  'verses': '178-179',
  'question': 'متى يحل الإسلام دم الشخص؟',
  'answers': [{'text': 'كتب عليكم القصاص في القتلى الحر بالحر والعبد بالعبد والأن

## Data splitting
The data is split into 8 parts. Each part contains a different 80% of the data, because we have 8 tuning configurations for the pretrained models.
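
The splitting code itself is not shown in this notebook. As a rough sketch only, assuming each part is drawn by random sampling without replacement (the function name `make_parts` and the seeding scheme are hypothetical, not taken from the original project), eight different 80% subsets could be produced like this:

```python
import random

def make_parts(samples, n_parts=8, fraction=0.8, base_seed=0):
    """Return n_parts lists, each holding a random `fraction` of samples."""
    size = int(len(samples) * fraction)
    parts = []
    for i in range(n_parts):
        # A distinct seed per part gives a different, reproducible subset.
        rng = random.Random(base_seed + i)
        parts.append(rng.sample(samples, size))
    return parts

# Stand-in data: 100 integer ids instead of real QA samples.
parts = make_parts(list(range(100)))
print(len(parts), len(parts[0]))
```

Each of the 8 tuning configurations would then be trained on its own part.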

In [4]:
def split_multi_answers(data_list):
    new_data=[]
    # Loop over all samples
    for sample in data_list :
        # If the sample has a single answer, append it to new_data as-is and move on to the next sample.
        if len(sample['answers']) ==1:
            new_data.append(sample)
            continue
        # If the sample has multiple answers, loop through them and store each answer
        # with its passage and question as a separate single-answer sample,
        # then append that sample to new_data.
        for answer in sample['answers']:
            new_sample={
            'answers':[answer],
            'passage':sample['passage'],
            'pq_id':sample['pq_id'],
            'question': sample['question'],
            'surah':sample['surah'],
            'verses':sample['verses']
            }
            new_data.append(new_sample)
    return new_data 

# Split multi-answer samples in the validation dataset into single-answer samples.
val_data= split_multi_answers(val_data)
val_datadf = pd.DataFrame(val_data)
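
To make the effect of `split_multi_answers` concrete, here is a condensed, self-contained version of the same idea applied to a toy sample. The sample values are made up purely to show the shape of the data, and the function body is a shortened equivalent of the one above, not a replacement for it:

```python
def split_multi_answers(data_list):
    """Flatten samples so that each one carries exactly one answer."""
    new_data = []
    for sample in data_list:
        for answer in sample['answers']:
            # Copy the sample, keeping only this one answer.
            new_data.append({**sample, 'answers': [answer]})
    return new_data

# Toy sample with made-up values.
toy = [{
    'pq_id': 'toy_1',
    'passage': 'a b c d',
    'question': 'q?',
    'surah': 1,
    'verses': '1-2',
    'answers': [{'text': 'a b', 'start_char': 0},
                {'text': 'c d', 'start_char': 4}],
}]

flat = split_multi_answers(toy)
print(len(flat))
print([s['answers'][0]['text'] for s in flat])
```

One sample with two answers becomes two samples that share the same `pq_id`, passage, and question.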


In [5]:
val_datadf

Unnamed: 0,pq_id,passage,surah,verses,question,answers
0,9:60-61_316,إنما الصدقات للفقراء والمساكين والعاملين عليها...,9,60-61,ما هي مصارف الزكاة؟,[{'text': 'للفقراء والمساكين والعاملين عليها و...
1,2:178-179_400,يا أيها الذين آمنوا كتب عليكم القصاص في القتلى...,2,178-179,متى يحل الإسلام دم الشخص؟,[{'text': 'كتب عليكم القصاص في القتلى الحر بال...
2,2:190-194_400,وقاتلوا في سبيل الله الذين يقاتلونكم ولا تعتدو...,2,190-194,متى يحل الإسلام دم الشخص؟,[{'text': 'قاتلوا في سبيل الله الذين يقاتلونكم...
3,2:243-245_400,ألم تر إلى الذين خرجوا من ديارهم وهم ألوف حذر ...,2,243-245,متى يحل الإسلام دم الشخص؟,"[{'text': 'قاتلوا في سبيل الله', 'start_char':..."
4,4:74-76_400,فليقاتل في سبيل الله الذين يشرون الحياة الدنيا...,4,74-76,متى يحل الإسلام دم الشخص؟,"[{'text': 'الذين آمنوا يقاتلون في سبيل الله', ..."
...,...,...,...,...,...,...
123,39:38-41_417,ولئن سألتهم من خلق السماوات والأرض ليقولن الله...,39,38-41,هل سمح الإسلام بحرية الاعتقاد بالدخول إلى الإس...,[{'text': 'من اهتدى فلنفسه ومن ضل فإنما يضل عل...
124,109:1-6_417,قل يا أيها الكافرون. لا أعبد ما تعبدون. ولا أن...,109,1-6,هل سمح الإسلام بحرية الاعتقاد بالدخول إلى الإس...,[{'text': 'قل يا أيها الكافرون. لا أعبد ما تعب...
125,5:44-47_351,إنا أنزلنا التوراة فيها هدى ونور يحكم بها النب...,5,44-47,هل يؤثم الحاكم الذي لا يحكم بما أنزل الله وشرّع؟,[{'text': 'من لم يحكم بما أنزل الله فأولئك هم ...
126,5:44-47_351,إنا أنزلنا التوراة فيها هدى ونور يحكم بها النب...,5,44-47,هل يؤثم الحاكم الذي لا يحكم بما أنزل الله وشرّع؟,[{'text': 'من لم يحكم بما أنزل الله فأولئك هم ...


## Ensemble

The ensemble performs the following steps:
- Tokenize the question and the passage.
- Load all K-fold models.
- Pass the question and the passage to every model.
- Combine the model outputs using the voting technique:
  - Sum the start scores output by all models.
  - Sum the end scores output by all models.
  - Take the 5 highest summed start and end scores.
  - Take the token indices of those 5 start and end scores.
  - Compute the full probability for each start and end pair.
  - Extract the answer text from its indices.
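
The scoring part of these steps can be sketched without any model, using plain Python lists in place of logit tensors. This is a simplified illustration under stated assumptions, not the notebook's implementation: the helper names (`softmax`, `top_k`, `ensemble_spans`) and the toy logits are hypothetical, and the i-th best start is paired with the i-th best end, as the prediction function in this notebook does.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, k):
    """Return (values, indices) of the k highest scores, best first."""
    idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in idx], idx

def ensemble_spans(start_logits_per_model, end_logits_per_model, k=5):
    # Sum the logits of all models position by position.
    sum_start = [sum(col) for col in zip(*start_logits_per_model)]
    sum_end = [sum(col) for col in zip(*end_logits_per_model)]

    start_vals, start_idx = top_k(sum_start, k)
    end_vals, end_idx = top_k(sum_end, k)

    start_probs = softmax(start_vals)
    end_probs = softmax(end_vals)
    # Pair the i-th best start with the i-th best end and renormalize.
    full_probs = softmax([s * e for s, e in zip(start_probs, end_probs)])

    # The end index is made exclusive (+1) so it can be used in a slice.
    return [(s, e + 1, p) for s, e, p in zip(start_idx, end_idx, full_probs)]

# Two hypothetical models scoring a 4-token input.
starts = [[0.1, 2.0, 0.3, 0.0], [0.2, 1.5, 0.1, 0.0]]
ends = [[0.0, 0.2, 0.4, 3.0], [0.1, 0.1, 0.3, 2.5]]
spans = ensemble_spans(starts, ends, k=2)
print(spans[0][:2])  # (1, 4)
```

Summing logits before taking the top scores lets a position that several models rate moderately well outrank one that only a single model rates highly.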

In [6]:
# Name of the pretrained tokenizer to download.
token_transformer_name = "aubmindlab/bert-base-arabertv02"

# Download the pretrained tokenizer; its output includes the input_ids and attention_mask.
tokenizer = AutoTokenizer.from_pretrained(token_transformer_name)


In [7]:
def read_model_file(model_path, model_name):
    # Load one serialized fold model from disk with joblib.
    model = joblib.load(os.path.join(model_path, model_name))
    return model

In [8]:
models_path = './models/'
qa_models = []
for model_name in os.listdir(models_path):
    print(model_name)
    
    qa_model = read_model_file(models_path, model_name)
    qa_models.append(qa_model)

stars_run00.hd5
stars_run01.hd5
stars_run02.hd5
stars_run03.hd5
stars_run04.hd5
stars_run05.hd5
stars_run06.hd5
stars_run07.hd5


In [9]:
def predict_quran_qav1(passage , question):
    min_answer_length=1
    number_of_required_answers = 5
    models_outputs = []
    ranked_answers=[]
    
    # Pass the question and the passage to the tokenizer
    inputs = tokenizer(question, passage, add_special_tokens=True, return_tensors="pt").to("cuda") 

    # Obtain the input_ids from inputs
    input_ids = inputs["input_ids"].tolist()[0]
    
    with torch.no_grad():  # inference only, so gradients are not needed
        for model in qa_models:
            # Run each fold's model on the same tokenized inputs
            models_outputs.append(model(**inputs))

    
    # Start from a copy of the first model's logits, then add the logits of the remaining models
    sum_start_answer_scores = models_outputs[0].start_logits.clone()
    sum_end_answer_scores = models_outputs[0].end_logits.clone()
    
    for i in range(1, len(models_outputs)):
        sum_start_answer_scores += models_outputs[i].start_logits
        sum_end_answer_scores += models_outputs[i].end_logits
    
    #print(sum_start_answer_scores)
    #print(sum_end_answer_scores)
    
    
    # Take the 5 highest summed start scores and normalize them with softmax
    answer_starts_probs = softmax(torch.topk(sum_start_answer_scores , 5).values.cpu().data.numpy())

    # print(answer_starts_probs)
    # Token indices of the 5 highest start scores
    answer_starts =  torch.topk(sum_start_answer_scores , 5).indices
    
    # Do the same for the end scores; 1 is added so the end index can be used as an exclusive slice bound
    answer_ends_probs = softmax(torch.topk(sum_end_answer_scores, 5).values.cpu().data.numpy())
    answer_ends = torch.topk(sum_end_answer_scores, 5).indices +1
    
    
    # Multiply the start and end probabilities elementwise (the i-th start is paired
    # with the i-th end), then renormalize with softmax to get the full probability
    full_probs = softmax((np.multiply(answer_starts_probs,answer_ends_probs)))[0] #check[1]

    #print(f"Question: {question}")
    #print('top predicted answers:')
    idx =0
    
    # Loop over the paired start and end indices and convert each span of
    # token ids back into text.
    for answer_start ,  answer_end in zip(answer_starts.tolist()[0], answer_ends.tolist()[0]):
        
        idx+=1
        # Convert the input_ids from answer_start up to (but not including) answer_end
        # back to tokens, then join the tokens into a string.
        answer = tokenizer.convert_tokens_to_string( tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
        
       
        # Skip empty answers (for example, when the end index does not come after the start index)
        if answer.strip() !='':
            
            # Keep the answer only if its score is above the 0.1 threshold
            if (full_probs[idx-1] > 0.1):    
                
                # print the answer
                #print(f"Answer number {idx}: {answer}")
                # Append the answer with its rank and score to ranked_answers
                ranked_answers.append( { 'answer': answer, 'rank' : len(ranked_answers)+1, 'score':float(full_probs[idx-1])})
    # Report when no answer passed the filters above.
    if len(ranked_answers) == 0:
        print(' Empty Answer List')      
    return ranked_answers

## Evaluating the Ensemble

### Using an Example

In [10]:
text ="فما لهم عن التذكرة معرضين. كانهم حمر مستنفرة. فرت من قسورة "
q='ماهي الحيوانات التي ذكرت في القران ؟'
predict_quran_qav1(text ,q)

[{'answer': 'حمر مستنفرة. فرت من قسورة',
  'rank': 1,
  'score': 0.4042997360229492},
 {'answer': 'كانهم حمر مستنفرة', 'rank': 2, 'score': 0.1489250510931015},
 {'answer': 'فرت من قس', 'rank': 3, 'score': 0.1489250510931015}]

### Using Development Dataset

In [11]:
result ={}
for index, sample in val_datadf.iterrows():
    result[sample['pq_id']] = predict_quran_qav1(sample['passage'], sample['question'])

In [12]:
ensemble_path = r'./result/stars_runDVEn00.json'
with open(ensemble_path, 'w' , encoding= 'utf8') as fp:
    json.dump(result , fp, ensure_ascii=False)
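
The `ensure_ascii=False` argument matters here because the answers are Arabic text. The short sketch below shows the difference on a made-up run entry (the `toy_1` id and score are hypothetical) that has the same shape as the `result` dictionary: a `pq_id` mapped to a list of ranked answers.

```python
import json

# A hypothetical run entry in the same shape as the notebook's `result` dict.
result = {'toy_1': [{'answer': 'حمر مستنفرة', 'rank': 1, 'score': 0.4}]}

readable = json.dumps(result, ensure_ascii=False)  # keeps Arabic characters as-is
escaped = json.dumps(result)                       # default: non-ASCII becomes \uXXXX

print(readable)
print(escaped)
```

Both strings decode to the same data; the first is simply readable when the run file is opened in a text editor.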

In [13]:
!python ./evaluation/quranqa22_eval.py --gold_answers_file=./datasets/qrcd_v1_1_dev.jsonl --run_file=./result/stars_runDVEn00.json


Loaded 109 records from ./datasets/qrcd_v1_1_dev.jsonl
The run file is correct.
{"pRR": 0.615163330784261, "exact_match": 0.3119266055045872, "f1": 0.5815573658068455}




### Using the Testing Dataset

In [15]:
test_data_path = r'./datasets/qrcd_v1.1_test_with_Answers.jsonl'

test_data=read_data(test_data_path)

In [16]:
result ={}
for sample in test_data:
    result[sample['pq_id']] = predict_quran_qav1(sample['passage'], sample['question'])

In [17]:
testing_ensemble_path = r'./result/stars_runEnTs00.json'
with open(testing_ensemble_path, 'w' , encoding= 'utf8') as fp:
    json.dump(result , fp, ensure_ascii=False)

In [18]:
!python ./evaluation/quranqa22_eval.py --gold_answers_file=./datasets/qrcd_v1.1_test_with_Answers.jsonl --run_file=./result/stars_runEnTs00.json


Loaded 352 records from ./datasets/qrcd_v1.1_test_with_Answers.jsonl
The run file is correct.
{"pRR": 0.41712232662464604, "exact_match": 0.11647727272727272, "f1": 0.38278974740529287}




In [19]:
!python ./evaluation/quranqa22_submission_checker.py --run_file=./result/stars_runEnTs00.json

The run file is correct.
