# Question Answering Examples on SQuAD

Based on the script [`run_qa.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/run_qa.py).

**Note:** This script only works with models that have a fast tokenizer (backed by the 🤗 Tokenizers library) as it
uses special features of those tokenizers. You can check if your favorite model has a fast tokenizer in
[this table](https://huggingface.co/transformers/index.html#supported-frameworks).

`run_qa.py` lets you fine-tune any supported model on the SQuAD dataset, on another question-answering dataset from the `datasets` library, or on your own csv/jsonlines files, as long as they are structured the same way as SQuAD. If your data is structured differently, you may need to adjust the data processing inside the script.

Note that if your dataset contains samples with no possible answer (like SQuAD version 2), you need to pass the flag `--version_2_with_negative`.
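To make "structured the same way as SQuAD" concrete, here is a minimal sketch of the record layout. It assumes the field names used by the `squad` dataset in the `datasets` library (`id`, `title`, `context`, `question`, `answers`), as far as I know them. The helper names and sample text are hypothetical. It only illustrates the record shape and how to spot unanswerable samples. How `run_qa.py` loads custom files is not shown, so check the script's data-loading code before relying on this layout.

```python
import json


def make_squad_record(record_id, title, context, question, answer_texts):
    """Build one SQuAD-style record (hypothetical helper).

    An empty answer list marks a question with no possible answer,
    which is the SQuAD v2 case.
    """
    starts = []
    for text in answer_texts:
        start = context.find(text)  # character offset of the answer in the context
        if start < 0:
            raise ValueError(f"answer {text!r} not found in context")
        starts.append(start)
    return {
        "id": record_id,
        "title": title,
        "context": context,
        "question": question,
        "answers": {"text": list(answer_texts), "answer_start": starts},
    }


def has_unanswerable(records):
    """Return True if any record has no answer (needs --version_2_with_negative)."""
    return any(len(r["answers"]["text"]) == 0 for r in records)


context = "Gaudi is an AI accelerator."
records = [
    make_squad_record("q1", "Gaudi", context, "What is Gaudi?", ["an AI accelerator"]),
    make_squad_record("q2", "Gaudi", context, "Who founded it?", []),
]

# One JSON object per line, as in a jsonlines file
for record in records:
    print(json.dumps(record))

print("needs --version_2_with_negative:", has_unanswerable(records))
```

Because the second record has an empty answer list, this sample set would need the `--version_2_with_negative` flag.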

### Clone the Model-References repository

In [None]:
%cd ~/Gaudi-tutorials/PyTorch/Single_card_tutorials
!git clone https://github.com/HabanaAI/Model-References.git

### Clone the Optimum-Habana project and check out the 1.11.1 release. This repository gives access to the examples that are optimized for Intel Gaudi:

In [None]:
%cd ~/Gaudi-tutorials/PyTorch/Single_card_tutorials
!git clone -b v1.11.1 https://github.com/huggingface/optimum-habana.git

### Install the Optimum-Habana library. This installs the version that works with this example:

In [None]:
import sys
!{sys.executable} -m pip install optimum-habana==1.11.1

### The following example is based on the Optimum-Habana Q&A example. Change to the question-answering directory and install the additional software requirements for this specific example:

In [None]:
%cd ~/Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/question-answering/
!{sys.executable} -m pip install --quiet -r requirements.txt

### All of this so you don't have to type your home directory name :(

In [9]:
import os

def get_home_directory_name():
    """
    Determines the name of the user's home directory.

    Returns:
    str: The name of the user's home directory.
    """
    try:
        home_directory_path = os.path.expanduser("~")
        home_directory_name = os.path.basename(home_directory_path)
        return home_directory_name
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return None

def change_to_directory(path):
    """
    Changes the current working directory to the given path.

    Parameters:
    path (str): The path to change the current working directory to.

    Returns:
    None
    """
    try:
        os.chdir(path)
        print(f"Changed directory to: {path}")
    except FileNotFoundError:
        print(f"Error: The directory {path} does not exist.")
    except PermissionError:
        print(f"Error: Permission denied to change to directory {path}.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

# Example usage
if __name__ == "__main__":
    home_directory_name = get_home_directory_name()
    if home_directory_name:
        print(f"Home directory name: {home_directory_name}")
        
        # Construct the full path to the home directory
        home_directory_path = os.path.expanduser("~")
        
        # Specified path to append to the home directory
        specified_path = "Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/question-answering/"
        
        # Construct the full path to the specified directory
        full_specified_path = os.path.join(home_directory_path, specified_path)
        
        # Change to the specified directory
        change_to_directory(full_specified_path)
    else:
        print("Failed to determine the home directory name.")

Home directory name: uf5476f2787b2d9fde60f1fac6eeb06f
Changed directory to: /home/uf5476f2787b2d9fde60f1fac6eeb06f/Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/question-answering/


In [10]:
print(os.getcwd())

/home/uf5476f2787b2d9fde60f1fac6eeb06f/Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/question-answering
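The same path can be built more compactly with `pathlib`. This is a sketch, not a replacement for the cell above: `EXAMPLE_SUBDIR` and `example_dir` are hypothetical names, and the code only builds and checks the path without changing directory.

```python
from pathlib import Path

# Location of the question-answering example relative to the home directory
EXAMPLE_SUBDIR = Path(
    "Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/question-answering"
)


def example_dir(home=None):
    """Return the example directory under `home` (defaults to the current user's home)."""
    base = Path(home) if home is not None else Path.home()
    return base / EXAMPLE_SUBDIR


target = example_dir()
print(target)
print("exists:", target.is_dir())
```

Once `target.is_dir()` is true, `os.chdir(target)` switches into it, as `change_to_directory` does above.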


### Fine-tune BERT on the SQuAD dataset

BERT is pretrained on general text, including English Wikipedia. Fine-tuning on the Stanford Question Answering Dataset (SQuAD) adapts it to extractive question answering.


In [None]:
%run run_qa.py \
  --model_name_or_path bert-large-uncased-whole-word-masking \
  --gaudi_config_name Habana/bert-large-uncased-whole-word-masking \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 24 \
  --per_device_eval_batch_size 8 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir ~/Gaudi-tutorials/PyTorch/Single_card_tutorials/squad \
  --use_habana \
  --use_lazy_mode \
  --use_hpu_graphs_for_inference \
  --throughput_warmup_steps 3 \
  --bf16
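The `--max_seq_length 384` and `--doc_stride 128` settings control how contexts longer than the model input are split into overlapping windows. The sketch below shows the idea on a plain token list. It assumes that `doc_stride` is the number of tokens shared between consecutive windows, which is my understanding of how the script passes it to the tokenizer. It ignores the question tokens and special tokens that also take up space in each window. `sliding_windows` is a hypothetical helper, not part of `run_qa.py`.

```python
def sliding_windows(tokens, max_len, overlap):
    """Split `tokens` into windows of at most `max_len` items.

    Consecutive windows share `overlap` items, so an answer that straddles
    a window boundary still appears whole in one of the windows.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window reached the end of the sequence
        start += step
    return windows


# Toy example: 10 "tokens", windows of 4 with 2 shared tokens
for window in sliding_windows(list(range(10)), max_len=4, overlap=2):
    print(window)

# With the tutorial's settings, a 1000-token context yields this many windows
print(len(sliding_windows(list(range(1000)), max_len=384, overlap=128)))
```

A larger overlap makes it less likely that an answer is cut at a window boundary, at the cost of more windows per context.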

### Inference Example Run

The working directory has to be set again before running inference, so the next cell repeats the directory change from above.

In [11]:
# Example usage
if __name__ == "__main__":
    home_directory_name = get_home_directory_name()
    if home_directory_name:
        print(f"Home directory name: {home_directory_name}")
        
        # Construct the full path to the home directory
        home_directory_path = os.path.expanduser("~")
        
        # Specified path to append to the home directory
        specified_path = "Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/question-answering/"
        
        # Construct the full path to the specified directory
        full_specified_path = os.path.join(home_directory_path, specified_path)
        
        # Change to the specified directory
        change_to_directory(full_specified_path)
    else:
        print("Failed to determine the home directory name.")

Home directory name: uf5476f2787b2d9fde60f1fac6eeb06f
Changed directory to: /home/uf5476f2787b2d9fde60f1fac6eeb06f/Gaudi-tutorials/PyTorch/Single_card_tutorials/optimum-habana/examples/question-answering/


In [None]:
print(os.getcwd())

In [None]:
import os

# Build the output path from the current user's home directory instead of hard-coding it
output_dir = os.path.expanduser("~/Gaudi-tutorials/PyTorch/Single_card_tutorials/squad_infer_results")

# Evaluation-only run: no --do_train, results go to output_dir
cmd = f'python3 run_qa.py \
  --model_name_or_path bert-large-uncased-whole-word-masking \
  --gaudi_config_name Habana/bert-large-uncased-whole-word-masking \
  --dataset_name squad \
  --do_eval \
  --per_device_eval_batch_size 8 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir {output_dir} \
  --use_habana \
  --use_lazy_mode \
  --use_hpu_graphs_for_inference \
  --bf16'

### Run inference with the parameters above

The cell below executes the command. Evaluation results are written to the directory given by `--output_dir`.

In [None]:
print(cmd)
import os
os.system(cmd)
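`os.system` works here, but its return value is easy to overlook. As an alternative sketch, `subprocess.run` with `shlex.split` avoids going through a shell and makes the exit code explicit. `run_command` is a hypothetical helper, and the demo command below is a stand-in for the `cmd` string built earlier.

```python
import shlex
import subprocess
import sys


def run_command(command):
    """Run a command string without a shell and return its exit code."""
    args = shlex.split(command)  # split like a POSIX shell would, honoring quotes
    completed = subprocess.run(args, check=False)
    return completed.returncode


# Stand-in command so the sketch runs anywhere; in the notebook, pass `cmd` instead
demo = f'{shlex.quote(sys.executable)} -c "print(1 + 1)"'
exit_code = run_command(demo)
print("exit code:", exit_code)
```

A non-zero exit code means the run failed, so it is worth checking before looking for results in the output directory.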