<a href="https://colab.research.google.com/github/MustafaHussiein/AI-Powered-Media-Processing/blob/main/ModelFunetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install --upgrade transformers
!pip install rouge-score
from transformers import AutoTokenizer, TrainingArguments, Trainer, BertForQuestionAnswering, pipeline
from sklearn.metrics import precision_score, recall_score, f1_score
from difflib import SequenceMatcher
import numpy as np
import csv
import time
import torch
from nltk.translate.bleu_score import sentence_bleu
import pandas as pd
import os
from datasets import load_dataset, DatasetDict
from rouge_score import rouge_scorer



In [3]:
torch.cuda.empty_cache()
models = ["bert-base-multilingual-cased","bert-large-uncased-whole-word-masking-finetuned-squad", "roberta-base", "bert-base-uncased", "aubmindlab/bert-base-arabertv02"]
tokenizers = [AutoTokenizer.from_pretrained(model_name, use_fast = True) for model_name in models]
models_org = [BertForQuestionAnswering.from_pretrained(model_name) for model_name in models]
# Initialize the QA pipeline
qa_pipelines_org = [pipeline("question-answering", model=model, tokenizer=tokenizer) for model,tokenizer in zip(models_org, tokenizers)]
output_filename = 'results.xlsx'
sheet_name = 'Sheet1'
file_paths = ['arabic_samples.xlsx', 'DAWQAS_Masun_Nabhan_Homsi.xlsx']
metrics_headers = ['Dataset', 'Model name', 'Question', 'Type', 'Actual answer',
                   'Answer','Blue' , 'Rouge1_Precision', 'Rouge1_Recall',
                   'Rouge1_Fmeasure', 'Rouge2_Precision', 'Rouge2_Recall',
                   'Rouge2_Fmeasure', 'RougeL_Precision', 'RougeL_Recall',
                   'RougeL_Fmeasure', 'EM', 'SM', 'Inference_time', 'Tuning']

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/625 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/996k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.96M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/443 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/381 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/384 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/714M [00:00<?, ?B/s]

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


model.safetensors:   0%|          | 0.00/1.34G [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
You are using a model of type roberta to instantiate a model of type bert. This is not supported for all configurations of models and can yield errors.


model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['bert.embeddings.LayerNorm.bias', 'bert.embeddings.LayerNorm.weight', 'bert.embeddings.position_embeddings.weight', 'bert.embeddings.token_type_embeddings.weight', 'bert.embeddings.word_embeddings.weight', 'bert.encoder.layer.0.attention.output.LayerNorm.bias', 'bert.encoder.layer.0.attention.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.output.dense.bias', 'bert.encoder.layer.0.attention.output.dense.weight', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.0.intermediate.dense.bias', 'bert.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.layer.0.output.LayerNorm.bia

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


model.safetensors:   0%|          | 0.00/543M [00:00<?, ?B/s]

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at aubmindlab/bert-base-arabertv02 and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0
Device set to use cuda:0
Device set to use cuda:0
Device set to use cuda:0
Device set to use cuda:0


In [4]:
#------elimination of _x000D
def preprocess_text(text):
    # Replace any non-printable or special characters
    text = text.replace('_x000D_', ' ').strip()
    return text
#-----------------------------------------------
#----------------preprocessing data
def preprocess_function(examples, tokenizer):
    questions = [q.strip() for q in examples["question"]]
    contexts = [c.strip() for c in examples["context"]]
    answers = examples["answers"]

    inputs = tokenizer(
        questions,
        contexts,
        truncation=True,
        padding="max_length",
        max_length=512,
        return_offsets_mapping=True,
        return_tensors="pt"
    )

    offset_mapping = inputs.pop("offset_mapping")
    start_positions_tokenized = []
    end_positions_tokenized = []

    for i, offsets in enumerate(offset_mapping):
        context = contexts[i]
        answer = answers[i]

        # Extract the text of the answer
        if isinstance(answer, dict):
            answer_text = answer["text"]
        else:
            answer_text = answer

        # Find the start character index of the answer text in the context
        start_char = context.find(answer_text)

        # If the answer is not found, you can handle it by defaulting to 0 or skipping
        if start_char == -1:
            # Here, we skip the entry
            print(f"Answer not found in context for entry {i}. Skipping.")
            start_positions_tokenized.append(0)
            end_positions_tokenized.append(0)
            continue

        end_char = start_char + len(answer_text)

        # Find the start and end token indices based on offsets
        start_token = None
        end_token = None

        for idx, (start, end) in enumerate(offsets):
            if start <= start_char < end:
                start_token = idx
            if start < end_char <= end:
                end_token = idx
                break

        # Default to the first token if not found (for robustness)
        if start_token is None:
            start_token = 0
        if end_token is None:
            end_token = 0

        start_positions_tokenized.append(start_token)
        end_positions_tokenized.append(end_token)

    inputs["start_positions"] = start_positions_tokenized
    inputs["end_positions"] = end_positions_tokenized

    return inputs
#----------------------------------------------------------------------
#------------------getting answer
def get_answer_bert(paragraph, question, model, s_p, e_p, tokenizer):
    model.to("cuda")
    if paragraph and isinstance(paragraph, str) and paragraph.strip():
        # Combine question and paragraph into a single input string
        input_text = f"Context: {paragraph}\nQuestion: {question}\nAnswer:"
    else:
        # If no paragraph is provided, only use the question
        input_text = f"Question: {question}\nAnswer:"
    # Tokenize the input
    inputs = tokenizer(
        input_text,
        return_tensors='pt',
        padding='max_length',  # Adjust padding based on max length
        truncation=True,
        max_length=512
    ).to("cuda")
    input_ids = inputs["input_ids"]
    # Perform text generation (inference) with max_new_tokens for generated output
    with torch.no_grad():
        outputs = model(**inputs)

    # Extract the logits for start and end positions
    start_logits = outputs.start_logits
    end_logits = outputs.end_logits

    # Get the most likely start and end indices
    start_index = torch.argmax(outputs.start_logits)
    end_index = torch.argmax(outputs.end_logits)

    # Convert indices to tokens and decode
    answer_ids = input_ids[0][start_index:end_index + 1]
    answer = tokenizer.decode(answer_ids)

    # Return the answer, predicted spans, and reference spans
    pred_spans = [(start_index.item(), end_index.item())]
    ref_spans = [(s_p, e_p)]

    return answer, pred_spans, ref_spans
#----------------------------------------
#--------------------------------------evaluation of answers
def span_match(predictions, references, pred_spans, ref_spans):
    """
    Calculate the Span Matching (SM) score.

    Args:
        predictions (list of str): The list of predicted answers.
        references (list of str): The list of actual answers.
        pred_spans (list of tuples): The list of predicted spans (start, end).
        ref_spans (list of tuples): The list of reference spans (start, end).

    Returns:
        float: The Span Matching score as a percentage.
    """
    assert len(predictions) == len(references) == len(pred_spans) == len(ref_spans), "All input lists must be of the same length"

    def spans_match(pred_span, ref_span):
        return pred_span == ref_span

    span_matches = [1 if spans_match(ps, rs) else 0 for ps, rs in zip(pred_spans, ref_spans)]
    sm_score = sum(span_matches) / len(references) * 100
    return sm_score


def evaluate_answers(predicted, actual, pred_spans, ref_spans):
    references = [actual.split(' ')]
    predictions = [predicted.split(' ')]
    blue = sentence_bleu(references, predicted.split(' '))
    r_scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    # Compute the ROUGE scores
    rouge = r_scorer.score(actual, predicted)
    rouge1 = rouge['rouge1']
    rouge2 = rouge['rouge2']
    rougeL = rouge['rougeL']
    exact_matches = [1 if p == r else 0 for p, r in zip(predictions, references)]
    em_score = sum(exact_matches) / len(references) * 100
    sm_score = span_match(predictions, references, pred_spans, ref_spans)
    return blue, rouge1.precision, rouge1.recall, rouge1.fmeasure, rouge2.precision, rouge2.recall, rouge2.fmeasure, rougeL.precision, rougeL.recall, rougeL.fmeasure, em_score,sm_score
#-----------------------------------------------------------------------------
#-------------------------------------------------saving results to csv file
def write_to_csv(filename, field_names, data):
    # Check if the file exists
    file_exists = False
    try:
        with open(filename, 'r', encoding='utf-8-sig') as file:
            file_exists = True
    except FileNotFoundError:
        file_exists = False

    # Open the CSV file in the appropriate mode with UTF-8 encoding
    mode = 'a' if file_exists else 'w'
    with open(filename, mode, newline='', encoding='utf-8-sig') as file:
        writer = csv.writer(file)

        # Write a new line if the file is empty
        if not file_exists:
            writer.writerow(field_names)  # Example column headers

        # Write the data to the file
        writer.writerow(data)
#----------------------------------------------------------
# --------------------------------define the metrics for evaluation
def compute_metrics(pred):
    predictions_tuple, label_ids_tuple = pred

    # Flatten predictions and labels for metric computation
    all_predictions = []
    all_labels = []

    for predictions, label_ids in zip(predictions_tuple, label_ids_tuple):
        predicted_labels = np.argmax(predictions, axis=-1)

        all_predictions.extend(predicted_labels)
        all_labels.extend(label_ids)

    # Calculate metrics
    accuracy = np.mean(np.array(all_predictions) == np.array(all_labels))
    precision = precision_score(all_labels, all_predictions, average='weighted')
    recall = recall_score(all_labels, all_predictions, average='weighted')
    f1 = f1_score(all_labels, all_predictions, average='weighted')

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1
    }
#---------------------------------------------------------------------------
#---------------------------------define the callback class
from transformers import TrainerCallback
class SaveMetricsCallback(TrainerCallback):
    def __init__(self, filename):
        self.filename = filename
        self.headers_written = False

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics:
            with open(self.filename, 'a', newline='', encoding='utf-8-sig') as csvfile:
                writer = csv.writer(csvfile)
                if not self.headers_written:
                    # Write headers only once
                    writer.writerow(['epoch', 'eval_loss', 'eval_accuracy', 'eval_precision', 'eval_recall', 'eval_f1'])
                    self.headers_written = True
                writer.writerow([state.epoch, metrics.get('eval_loss'), metrics.get('eval_accuracy'), metrics.get('eval_precision'), metrics.get('eval_recall'), metrics.get('eval_f1')])

# Example compute_metrics function (adjust as per your requirements)
def compute_metrics(pred):
    predictions_tuple, label_ids_tuple = pred

    accuracies = []
    for predictions, label_ids in zip(predictions_tuple, label_ids_tuple):
        accuracy = (np.argmax(predictions, axis=-1) == label_ids).astype(float).mean().item()
        accuracies.append(accuracy)

    return {"accuracy": np.mean(accuracies)}  # Mean accuracy across all predictions
#-----------------------------------------------------------------------
#--------------------------------------------testing model
def test(dataset_name, dataset_sample, model_name, model, tokenizer, case):
    for data in dataset_sample:
        paragraph = data["context"]
        question = data["question"]
        start_p = data["start_positions"]

        # Debugging print to understand the structure of answers
        print("Answers field:", data["answers"])

        # Check if 'answers' is a string or a dictionary
        if isinstance(data["answers"], dict):
            actual_answer = data["answers"]["text"]  # Assuming answers is a dictionary with "text" key
        else:
            actual_answer = data["answers"]  # Assuming answers is a string
        end_p = start_p + len(actual_answer)
        # Get answer without paragraph
        st = time.time()
        if question:
            answer_without_paragraph, pred_spans_without, ref_spans_without = get_answer_bert(" ", question, model, 0, 0, tokenizer)
        else:
            answer_without_paragraph, pred_spans_without, ref_spans_without = " ", [(0,0)], [(0,0)]
        time_without = time.time() - st
        # Get answer with paragraph
        st = time.time()
        if question:
            answer_with_paragraph, pred_spans_with, ref_spans_with = get_answer_bert(paragraph, question, model, 0, 0, tokenizer)
        else:
            answer_with_paragraph, pred_spans_with, ref_spans_with = " ", [(0,0)], [(0,0)]
        time_with = time.time() - st
        print(f"Question: {question}")
        print(f"Actual Answer: {actual_answer}")
        print(f"Answer without Paragraph: {answer_without_paragraph}")
        print(f"Answer with Paragraph: {answer_with_paragraph}")
        print("-------")
        # Evaluate answers and prepare data for CSV
        blue_without, precision1_without, recall1_without, fmeasure1_without, precision2_without, recall2_without, fmeasure2_without, precisionL_without, recallL_without, fmeasureL_without, em_without, sm_without = evaluate_answers(answer_without_paragraph, actual_answer, pred_spans_without, ref_spans_without)
        row_without = [dataset_name, model_name, question, 'without paragraph', actual_answer, answer_without_paragraph, blue_without, precision1_without, recall1_without, fmeasure1_without, precision2_without, recall2_without, fmeasure2_without, precisionL_without, recallL_without, fmeasureL_without, em_without, sm_without, time_without, case]

        blue_with, precision1_with, recall1_with, fmeasure1_with, precision2_with, recall2_with, fmeasure2_with, precisionL_with, recallL_with, fmeasureL_with, em_with, sm_with = evaluate_answers(answer_with_paragraph, actual_answer, pred_spans_with, ref_spans_with)
        row_with = [dataset_name, model_name, question, 'with paragraph', actual_answer, answer_with_paragraph, blue_with, precision1_with, recall1_with, fmeasure1_with, precision2_with, recall2_with, fmeasure2_with, precisionL_with, recallL_with, fmeasureL_with, em_with, sm_with, time_with, case]
        # Write to CSV
        write_to_csv('results.csv', metrics_headers, row_without)
        write_to_csv('results.csv', metrics_headers, row_with)
#-----------------------------------------------------------------------------
#--------------------------------------------------training parameters
training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy="epoch", # Changed from EvaluationStrategy
    learning_rate=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=100,
    weight_decay=0.01,
)
#------------------------------------------------------------------------------

In [None]:

for file_path in file_paths:
    dataset_name = file_path.split('.')[0]
    preprocessed_dataset = 'qa_dataset_prepared' + '_' + dataset_name + '.csv'
    if os.path.exists(file_path):
        df = pd.read_excel(file_path, sheet_name='Sheet1')  # Replace 'Sheet1' with your sheet name
        # Example to convert the DataFrame to a list of dictionaries
        data = []
        for index, row in df.iterrows():
            start_position =  row["context"].find(row["answers"])
            data.append({
                "context": row["context"],
                "question": row["question"],
                "start_positions": start_position,
                "answers": row["answers"]
            })

        # Convert to pandas DataFrame and then to Dataset
        df_prepared = pd.DataFrame(data)
        df_prepared.to_csv(preprocessed_dataset, index=False, encoding='utf-8-sig')

    # Load the dataset
    dataset = load_dataset('csv', data_files=preprocessed_dataset)
    split_dataset = dataset['train'].train_test_split(test_size=0.2)  # Adjust test_size as needed
    dataset = DatasetDict({
        'train': split_dataset['train'],
        'test': split_dataset['test']
    })
    # Sample data for debugging
    dataset_sample = []
    for i in range(len(dataset["test"])):
        dataset_sample.append({
        "context": dataset["test"]["context"][i],
        "question": dataset["test"]["question"][i],
        "answers": dataset["test"]["answers"][i],
        "start_positions": dataset["test"]["start_positions"][i] })
    for i in range(len(models_org)):
        test(dataset_name, dataset_sample, models[i], models_org[i], tokenizers[i], 'before tuning')
        tokenized_datasets = dataset.map(lambda examples: preprocess_function(examples, tokenizers[i]), batched=True)
        trained_name = './fine_tuned_bert_'+ models[i] + '_' + dataset_name
        if os.path.isdir(trained_name) == False:
            model = models_org[i]
            model.to("cuda")
            tokenizer = tokenizers[i]
            trainer = Trainer(
                model=model,
                args=training_args,
                train_dataset=tokenized_datasets['train'],
                eval_dataset=tokenized_datasets['test'],
                compute_metrics=compute_metrics,

            )

            # Add the metrics saving callback before training starts
            trainer.add_callback(SaveMetricsCallback('training_metrics.csv'))

            # Start training
            trainer.train()

            # Evaluate after training
            trainer.evaluate()

            # Save the fine-tuned model and tokenizer
            model.save_pretrained(trained_name)
            tokenizer.save_pretrained(trained_name)
        model_trained = BertForQuestionAnswering.from_pretrained(trained_name)
        tokenizer_trained = AutoTokenizer.from_pretrained(trained_name, use_fast = True)
        # AutoTokenizer.from_pretrained(trained_name, use_fast = True)
        qa_pipeline_trained = pipeline("question-answering", model=model_trained, tokenizer=tokenizer_trained)
        test(dataset_name, dataset_sample, models[i], model_trained, tokenizer_trained, 'after tuning')

Generating train split: 0 examples [00:00, ? examples/s]

Answers field: عملها لسنوات عديدة في الساحة الموسيقية الدنماركية
Question: ما سبب حصول ان لينيت على جائزة الشرف من الاتحاد الدولي للصناعة الفونوغرافية ؟
Actual Answer: عملها لسنوات عديدة في الساحة الموسيقية الدنماركية
Answer without Paragraph: 
Answer with Paragraph: 
-------
Answers field: شارك مع منتخب إنجلترا تحت 18 سنة لكرة القدم ومنتخب إنجلترا تحت 19 سنة لكرة القدم.


  return forward_call(*args, **kwargs)


Question: كيف كانت مشاركة اللاعب سيمون إيستوود مع منتخب انجلترا؟
Actual Answer: شارك مع منتخب إنجلترا تحت 18 سنة لكرة القدم ومنتخب إنجلترا تحت 19 سنة لكرة القدم.
Answer without Paragraph: 
Answer with Paragraph: 
-------
Answers field: يوجد بها مسجد يعود بناءه إلى عام 1848 حسب ما يروى
Question: لماذا تعد الفائدية من أقدم مناطق الجبل  الأخضر؟
Actual Answer: يوجد بها مسجد يعود بناءه إلى عام 1848 حسب ما يروى
Answer without Paragraph: 
Answer with Paragraph: يربط بين مدينة المرج ولملودة كما يوجد طريق يربط الفائدية بالطريق الرئيسي البيضاء شحات وهذا طوله حوالي 13 كم كما يوجد في منطقة الفائدية اثار موجودة إلى الآن ( القلعة ) وهي تعود للعهد العثماني ومنطقة الفائدية هي من أقدم مناطق الجبل الأخضر حيث يوجد بها مسجد يعود بناءه إلى عام 1848 حسب ما يروى ، غير ان كثير من الناس لايصلون فيه لوجد قبور كثيرة تحيط يالمسجد وتحيط به كما تعتبر الفائدية مسرحا لعدد كثير من المعارك التي دارت بين الطليان والمجاهدين تتميز الفائدية بارتفاع شاهق مما يجعل المناخ شتاء بارد جدا وفي الصيف معتدل جدا كما يمكنك وانت واقف 

The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


[1;30;43mStreaming output truncated to the last 5000 lines.[0m
Actual Answer:  حزمة الأشعة المستخدمة هنا تكون عبارة عن نيترونات بدلاً من الفوتونات.
Answer without Paragraph: 
Answer with Paragraph: 
-------
Answers field: الانكماش الزجاجي 
Question: ما سبب الأبصار الومضي؟ 
Actual Answer: الانكماش الزجاجي 
Answer without Paragraph: 
Answer with Paragraph: الزجاجي أو التسييل والذي هو من الأسباب الأكثر شيوعا ل
-------
Answers field: ونها الدولة المستضيفة للمسابقة._x000D_

Question: لماذا تجاوزت بولندا تصفيات بطولة اوربا ؟
Actual Answer: ونها الدولة المستضيفة للمسابقة._x000D_

Answer without Paragraph: 
Answer with Paragraph: ##ست المنتخبات فيما بينها لتحديد هوية المتأهلين إلى بطولة أوروبا تحت 21 سنة لكرة القدم 2017. تأهل 11 منتخب من هذه التصفيات إلى المسابقة النهائية بالإضافة إلى بولندا التي تأهلت تلقائيا دون خوض التصفيات ، كونها الدولة المست
-------
Answers field: تأليفه لقصص مشهورة مثل جومانجي في سنة 1981 
Question: ما سبب ذياع صيت آلسبورغ؟
Actual Answer: تأليفه لقصص مشهورة مثل جومانج

Map:   0%|          | 0/4450 [00:00<?, ? examples/s]

Answer not found in context for entry 23. Skipping.
Answer not found in context for entry 91. Skipping.
Answer not found in context for entry 94. Skipping.
Answer not found in context for entry 101. Skipping.
Answer not found in context for entry 258. Skipping.
Answer not found in context for entry 492. Skipping.
Answer not found in context for entry 568. Skipping.
Answer not found in context for entry 760. Skipping.
Answer not found in context for entry 781. Skipping.
Answer not found in context for entry 831. Skipping.
Answer not found in context for entry 932. Skipping.
Answer not found in context for entry 972. Skipping.
Answer not found in context for entry 985. Skipping.
Answer not found in context for entry 991. Skipping.
Answer not found in context for entry 16. Skipping.
Answer not found in context for entry 112. Skipping.
Answer not found in context for entry 119. Skipping.
Answer not found in context for entry 132. Skipping.
Answer not found in context for entry 160. Skippin

Map:   0%|          | 0/1113 [00:00<?, ? examples/s]

Answer not found in context for entry 19. Skipping.
Answer not found in context for entry 106. Skipping.
Answer not found in context for entry 123. Skipping.
Answer not found in context for entry 163. Skipping.
Answer not found in context for entry 164. Skipping.
Answer not found in context for entry 188. Skipping.
Answer not found in context for entry 235. Skipping.
Answer not found in context for entry 243. Skipping.
Answer not found in context for entry 269. Skipping.
Answer not found in context for entry 338. Skipping.
Answer not found in context for entry 358. Skipping.
Answer not found in context for entry 442. Skipping.
Answer not found in context for entry 453. Skipping.
Answer not found in context for entry 501. Skipping.
Answer not found in context for entry 507. Skipping.
Answer not found in context for entry 564. Skipping.
Answer not found in context for entry 594. Skipping.
Answer not found in context for entry 598. Skipping.
Answer not found in context for entry 656. Skip

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mmostafahosen042[0m ([33mmostafahosen042-freelancer[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Epoch,Training Loss,Validation Loss,Accuracy
1,6.6049,,0.191824


  return forward_call(*args, **kwargs)
  return forward_call(*args, **kwargs)
  return forward_call(*args, **kwargs)
  return forward_call(*args, **kwargs)
