# Extension: DeBERTa-v3-Large for SQuAD QA

We built this extension to compare our baseline Dynamic Coattention Network against a state-of-the-art transformer architecture, and to evaluate that architecture on the SQuAD v1.1 dataset.

## Why DeBERTa

DeBERTa improves significantly on earlier architectures such as BERT and RoBERTa, mainly through two mechanisms: disentangled attention and an Enhanced Mask Decoder.

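In disentangled attention, each token is represented by separate content and relative-position vectors. The attention score between two tokens therefore decomposes into four components: content-to-content, content-to-position, position-to-content, and position-to-position. DeBERTa keeps the first three and drops the position-to-position term, which adds little information when positions are relative.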
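The decomposition can be written out explicitly. The symbols below follow the notation of the DeBERTa paper and are not defined elsewhere in this notebook: $H_i$ is the content vector of token $i$, $P_{i|j}$ is the relative-position vector of token $i$ with respect to token $j$, $\delta(i,j)$ is the relative distance, and $d$ is the hidden dimension.

$$
A_{i,j} = \{H_i, P_{i|j}\} \times \{H_j, P_{j|i}\}^{\top}
        = \underbrace{H_i H_j^{\top}}_{\text{content-to-content}}
        + \underbrace{H_i P_{j|i}^{\top}}_{\text{content-to-position}}
        + \underbrace{P_{i|j} H_j^{\top}}_{\text{position-to-content}}
        + \underbrace{P_{i|j} P_{j|i}^{\top}}_{\text{position-to-position}}
$$

The score that is actually computed omits the last term and is scaled by $1/\sqrt{3d}$ before the softmax:

$$
\tilde{A}_{i,j} = Q^{c}_{i} {K^{c}_{j}}^{\top}
               + Q^{c}_{i} {K^{r}_{\delta(i,j)}}^{\top}
               + K^{c}_{j} {Q^{r}_{\delta(j,i)}}^{\top},
\qquad
H_o = \operatorname{softmax}\!\left(\frac{\tilde{A}}{\sqrt{3d}}\right) V_c
$$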

While disentangled attention captures relative positions, the Enhanced Mask Decoder (EMD) incorporates absolute positions just before the softmax layer to improve the model's understanding of global syntax.

## Evaluation parameters and Configuration

Model ID: deepset/deberta-v3-large-squad2

max_seq_len: 512 - The maximum number of tokens per window; 512 is the usual limit for BERT-style Transformers

doc_stride: 128 - Consecutive sliding windows overlap by 128 tokens, so context is preserved when a passage is longer than one window

batch_size: 16 - We used batching to maximize GPU throughput
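The configuration above could be collected in one place and passed to a Hugging Face question-answering pipeline. This is only a sketch under assumptions: the internals of `DeBERTaExtension` are not shown in this notebook, so `EVAL_CONFIG`, `MODEL_ID`, and `build_qa_pipeline` are hypothetical names, and the real class may build its pipeline differently.

```python
# Hypothetical names; the values are the ones listed above.
MODEL_ID = "deepset/deberta-v3-large-squad2"
EVAL_CONFIG = {
    "batch_size": 16,    # examples per forward pass
    "max_seq_len": 512,  # tokens per window
    "doc_stride": 128,   # overlap between consecutive windows
}


def build_qa_pipeline(device=-1):
    """Create a question-answering pipeline (device=-1 is CPU, 0 is the first GPU)."""
    # Imported lazily so the config can be inspected without transformers installed.
    from transformers import pipeline

    return pipeline(
        "question-answering",
        model=MODEL_ID,
        tokenizer=MODEL_ID,
        device=device,
    )


# Possible usage (downloads the model on first call):
# qa = build_qa_pipeline(device=0)
# result = qa(question="Who wrote Hamlet?",
#             context="Hamlet was written by William Shakespeare.",
#             **EVAL_CONFIG)
# print(result["answer"])
```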

## Sliding Window

Since our contexts can exceed the model's token limit (we used 600 in the main model), we used a sliding window strategy:

### Windowing

The context is split into overlapping chunks of 512 tokens.

### Stride

The doc_stride ensures that if an answer span is cut off at a window boundary, it will likely appear in full in the next window. This prevents context loss.
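The windowing and stride described above can be sketched in a few lines. This is a simplified illustration, not the pipeline's actual implementation: `sliding_windows` is a hypothetical helper, it works on a plain list of tokens, and it ignores the question and special tokens that also occupy part of each window. It assumes doc_stride denotes the overlap between consecutive windows, which is how the Hugging Face question-answering pipeline documents the parameter.

```python
def sliding_windows(tokens, max_len=512, overlap=128):
    """Split `tokens` into windows of at most `max_len` that share `overlap` tokens.

    Returns a list of (start_index, window_tokens) pairs.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the context
        start += step
    return windows


# A 600-token context, as in the main model, yields two overlapping windows.
context = list(range(600))
chunks = sliding_windows(context)
for start, chunk in chunks:
    print(start, start + len(chunk))
# prints:
# 0 512
# 384 600
```

An answer spanning tokens 500 to 520 is cut off by the first window but lies entirely inside the second one, which is the case the overlap is meant to cover.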


## Batching

To resolve the bottleneck of sequential GPU processing, we implemented pipelined batching.
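To illustrate what batching means here, the following sketch groups examples into fixed-size batches. `batched` is a hypothetical helper written for this illustration; in the notebook itself the grouping is delegated to the pipeline by passing `batch_size=16`.

```python
def batched(items, batch_size=16):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 35 question/context pairs are processed in three forward passes instead of 35.
examples = [{"question": f"q{i}", "context": "c"} for i in range(35)]
sizes = [len(batch) for batch in batched(examples)]
print(sizes)  # prints [16, 16, 3]
```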

## Final results

After running the official evaluation script, we achieved the following results with this model:

### Exact Match: 87.46%

### F1 Score: 94.04%
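For reference, the two metrics can be sketched as follows. This mirrors the logic of the SQuAD v1.1 evaluation script as we understand it (lowercasing, stripping punctuation and articles, then comparing tokens); the function names are our own, and the official script additionally takes the maximum score over all reference answers for each question before averaging.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """True if both answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction, ground_truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    num_same = sum((Counter(pred_tokens) & Counter(gt_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("the Eiffel Tower!", "Eiffel Tower"))  # prints True
print(f1_score("Eiffel Tower in Paris", "Eiffel Tower"))  # precision 0.5, recall 1.0
```

Exact match rewards only fully correct spans, while F1 gives partial credit for overlapping tokens, which is why the F1 score is the higher of the two.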




In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
!pip install transformers datasets gradio torch tf-keras ipywidgets

Collecting transformers
  Downloading transformers-4.46.3-py3-none-any.whl.metadata (44 kB)
Collecting datasets
  Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Collecting gradio
  Downloading gradio-4.44.1-py3-none-any.whl.metadata (15 kB)
Collecting torch
  Downloading torch-2.4.1-cp38-cp38-win_amd64.whl.metadata (27 kB)
Collecting tf-keras
  Downloading tf_keras-2.15.1-py3-none-any.whl.metadata (1.7 kB)
Collecting ipywidgets
  Using cached ipywidgets-8.1.8-py3-none-any.whl.metadata (2.4 kB)
Collecting huggingface-hub<1.0,>=0.23.2 (from transformers)
  Downloading huggingface_hub-0.36.0-py3-none-any.whl.metadata (14 kB)
Collecting pyyaml>=5.1 (from transformers)
  Downloading PyYAML-6.0.3-cp38-cp38-win_amd64.whl.metadata (2.2 kB)
Collecting requests (from transformers)
  Using cached requests-2.32.4-py3-none-any.whl.metadata (4.9 kB)
Collecting tokenizers<0.21,>=0.20 (from transformers)
  Downloading tokenizers-0.20.3-cp38-none-win_amd64.whl.metadata (6.9 kB)
Collecting

In [3]:
from extension import DeBERTaExtension
import gradio as gr
import json
from tqdm import tqdm

In [5]:
model = DeBERTaExtension()

Generating validation split: 0 examples [00:00, ? examples/s]

tokenizer_config.json:   0%|          | 0.00/392 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


spm.model:   0%|          | 0.00/2.46M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/8.65M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/18.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/156 [00:00<?, ?B/s]

config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/1.74G [00:00<?, ?B/s]

ValueError: 0 unrecognized or not available.

In [None]:
formatted_data = []
question_ids = [] # we track the question IDs here

for article in model.dataset:
    for paragraph in article['paragraphs']:
        for qa in paragraph['qas']:
            formatted_data.append({
                # the pipeline expects the keys "question" and "context"
                "question": qa['question'],
                "context": paragraph['context']
            })
            question_ids.append(qa['id'])

predictions = {}


# here we use batching to improve GPU throughput
for i, result in enumerate(tqdm(model.qa_pipeline(
                                    formatted_data, 
                                    batch_size=16,
                                    max_seq_len=512, 
                                    doc_stride=128), 
                               total=len(formatted_data), 
                               desc="Running Batched DeBERTa")):
    
    # here we get the corresponding question IDs
    q_id = question_ids[i]
    predictions[q_id] = result['answer']

# we save the json file for evaluation
with open('deberta_predictions.json', 'w') as f:
    json.dump(predictions, f)

In [6]:
!python evaluate-v1.1.py dev-v1.1.json deberta_predictions.json

{"exact_match": 87.46452223273415, "f1": 94.04179570735954}
