# Extension: DeBERTa-v3-Large for SQuAD QA

We built this extension to compare our baseline Dynamic Coattention Network against a state-of-the-art transformer architecture, evaluating both on the SQuAD v1.1 dataset.

## Why DeBERTa

DeBERTa significantly improves upon earlier architectures such as BERT and RoBERTa through two mechanisms: disentangled attention and an Enhanced Mask Decoder.

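In disentangled attention, each token is represented by two separate vectors, one for its content and one for its relative position. The attention score between two tokens therefore decomposes into four components: content-to-content, content-to-position, position-to-content, and position-to-position. DeBERTa keeps the first three and drops the position-to-position term, which adds little information when relative position embeddings are used.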
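Written out in the notation of the DeBERTa paper (these symbols are not defined elsewhere in this notebook), the four components look roughly as follows. Here $H_i$ is assumed to be the content vector of token $i$ and $P_{i|j}$ the relative position vector of token $i$ with respect to token $j$; the terms appear in the same order as listed above.

$$
A_{i,j} = H_i H_j^{\top} + H_i P_{j|i}^{\top} + P_{i|j} H_j^{\top} + P_{i|j} P_{j|i}^{\top}
$$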

While disentangled attention captures relative positions, the Enhanced Mask Decoder incorporates absolute positions just before the softmax layer to improve the model's understanding of global syntax.

## Evaluation parameters and Configuration

Model ID: deepset/deberta-v3-large-squad2

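max_seq_len: 512 - The maximum number of tokens (question plus context) in each chunk passed to the model; 512 is the usual limit for BERT-style Transformers

doc_stride: 128 - The number of tokens shared by consecutive chunks, so that the sliding window preserves context when a passage is too long for a single chunk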

batch_size: 16 - We used batching to maximize GPU throughput

## Sliding Window

Because some contexts are longer than the model's 512-token limit (the main model used 600), we applied a sliding window strategy:

### Windowing:

The context is split into overlapping chunks so that each model input (question plus context chunk) fits within 512 tokens.

### Stride:

The doc_stride overlap ensures that if an answer span is cut off at the edge of one window, it will likely appear in full in the next window. This prevents context loss.
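The windowing idea can be sketched in plain Python. This is a simplified illustration, not the tokenizer's actual implementation: `sliding_windows` is a hypothetical helper, it works on a flat token list, and it ignores the space the question and special tokens take up in each chunk.

```python
def sliding_windows(tokens, window_size=512, overlap=128):
    """Split `tokens` into chunks of at most `window_size` items.

    Consecutive chunks share `overlap` items, so each new chunk starts
    `window_size - overlap` positions after the previous one.
    Returns a list of (start_index, chunk) pairs.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")

    step = window_size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window_size, len(tokens))
        windows.append((start, tokens[start:end]))
        if end == len(tokens):  # the last chunk reached the end of the text
            break
        start += step
    return windows
```

With 1,000 tokens, a window of 512, and an overlap of 128, the windows start at positions 0, 384, and 768. A span covering tokens 500 to 520 is truncated in the first window but falls entirely within the second.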


## Batching:

To avoid the bottleneck of processing one example at a time on the GPU, we pass the whole dataset to the pipeline and let it process the examples in batches of 16.
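The grouping step behind batching can be illustrated with a few lines of Python. In the notebook the pipeline does this internally once `batch_size` is set; the `batched` helper below is hypothetical and only shows the idea.

```python
def batched(items, batch_size=16):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For 40 examples and a batch size of 16, this yields batches of 16, 16, and 8 examples, so the GPU sees three forward passes instead of 40.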

## Final results

After running the official evaluation script, we achieved the following results with this model:

### Exact Match: 87.46%

### F1 Score: 94.04%
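For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed for a single prediction and a single reference answer. It mirrors the logic of the SQuAD v1.1 evaluation as we understand it (lowercasing, stripping punctuation and articles, then comparing tokens); the function names here are our own, and the official script additionally takes the best score over all reference answers and averages over all questions.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """True if both answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(truth)


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often as it
    # occurs in both answers.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

This explains why F1 is higher than Exact Match: a prediction such as "Eiffel Tower in Paris" against the reference "the Eiffel Tower" is not an exact match, but it still earns partial F1 credit for the overlapping tokens.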




In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
!pip install transformers datasets gradio torch tf-keras ipywidgets

In [2]:
from extension import DeBERTaExtension
import gradio as gr
import json
from tqdm import tqdm

In [3]:
model = DeBERTaExtension()

Device set to use cuda:0


In [None]:
formatted_data = []
question_ids = [] # we track the question IDs here

for article in model.dataset:
    for paragraph in article['paragraphs']:
        for qa in paragraph['qas']:
            formatted_data.append({
                # the QA pipeline expects "question" and "context" keys
                "question": qa['question'],
                "context": paragraph['context']
            })
            question_ids.append(qa['id'])

predictions = {}


# here we use batching to improve GPU (CUDA) throughput
for i, result in enumerate(tqdm(model.qa_pipeline(
                                    formatted_data, 
                                    batch_size=16,
                                    max_seq_len=512, 
                                    doc_stride=128), 
                               total=len(formatted_data), 
                               desc="Running Batched DeBERTa")):
    
    # here we get the corresponding question IDs
    q_id = question_ids[i]
    predictions[q_id] = result['answer']

# we save the json file for evaluation
with open('deberta_predictions.json', 'w') as f:
    json.dump(predictions, f)

In [6]:
!python evaluate-v1.1.py dev-v1.1.json deberta_predictions.json

{"exact_match": 87.46452223273415, "f1": 94.04179570735954}
