# Homework Machine Model Setup

Transformers are commonly used for question answering problems. As a type of neural network, transformers perform very well when given the right architecture and enough data. Below, we explore a few different models using the `transformers` library. Created by Hugging Face, it makes pretrained transformers very easy to get.

If you want to go the not-so-easy route (and preferably have GPUs/TPUs), take a look at the short section below, where I lay out the steps you'd need to take to train an `ALBERT` model on the SQuAD dataset. Note that because I have no GPU/TPU, I did not see this approach through to completion; I didn't want my laptop running for days on end with potentially subpar results.

I initially chose ALBERT for the Homework Machine since it's lighter weight than the better-known BERT.

## ALBERT Setup & Tuning

To start off, we'll want to clone the [ALBERT repo](https://github.com/google-research/albert) and follow the [fine-tuning on SQuAD](https://github.com/google-research/albert#fine-tuning-on-squad) instructions. The commands below install the requirements and kick off fine-tuning. Just don't be like me: be sure to fill out the `...` where appropriate.

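That said, what should we replace the ellipses with? It wasn't super clear to me initially, so I added some comments below to help anyone who wants to try this training method. Some are file locations you'll want to create; others point to files you'll need to download from elsewhere. Note that the inline comments are only annotations: a comment after a trailing `\` breaks the shell line continuation, so strip them before running the command. Let's take a look: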

```
pip install -r albert/requirements.txt
python -m albert.run_squad_v2 \
  --albert_config_file=... \            # download an ALBERT model (ex. https://tfhub.dev/google/albert_large/3, 
                                        #   links in the readme of ALBERT) and within its folder, use 
                                        #   \{your_model}\assets\albert_config.json 
                                        #   (your_model = albert_large_3 for the ex)
                                        
  --output_dir=... \                    # output location (not sure what that output is at the moment)
  
  --train_file=... \                    # for squad (v2), this is the location of train-v2.0.json 
                                        #   (downloaded from https://rajpurkar.github.io/SQuAD-explorer/)
                                        
  --predict_file=... \                  # same as above, except use dev-v2.0.json
  
  --train_feature_file=... \            # a file made by TFRecordWriter. I called this file train_feature_file.tf
  
  --predict_feature_file=... \          # assumed to be the same as above
  
  --predict_feature_left_file=... \     # assumed to be the same as above
  
  --init_checkpoint=... \               # as mentioned in the readme, can instead put:
                                        #   --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1
                                        #   I have no checkpoints made so I'm fine with using their suggestion
                                        
  --spm_model_file=... \                # from your downloaded model, something analogous to 
                                        #   \albert_large_3\assets\30k-clean.model 
                                        #   (albert_large_3 is the model I'll be using)
                                        
  --do_lower_case \
  --max_seq_length=384 \
  --doc_stride=128 \
  --max_query_length=64 \
  --do_train \
  --do_predict \
  --train_batch_size=48 \
  --predict_batch_size=8 \
  --learning_rate=5e-5 \
  --num_train_epochs=2.0 \
  --warmup_proportion=.1 \
  --save_checkpoints_steps=5000 \
  --n_best_size=20 \
  --max_answer_length=30
```
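
Since forgetting to fill in a `...` is an easy mistake, here is a minimal sketch of a check that refuses to build the argument list while any placeholder remains. The helper name `build_cli_args` and the dictionary layout are hypothetical and not part of the ALBERT repo; only the flag names come from the command above.

```python
def build_cli_args(flags):
    """Turn a dict of flag names and values into a list of command-line arguments.

    Hypothetical helper: raises if any value is still the "..." placeholder.
    """
    unfilled = [name for name, value in flags.items() if value == "..."]
    if unfilled:
        raise ValueError(f"Fill in these flags before running: {', '.join(unfilled)}")

    args = []
    for name, value in flags.items():
        if value is True:
            # Boolean switches such as --do_train take no value.
            args.append(f"--{name}")
        else:
            args.append(f"--{name}={value}")
    return args


example_flags = {
    "output_dir": "...",      # still a placeholder, so the check should fail
    "do_lower_case": True,
    "max_seq_length": 384,
}

try:
    build_cli_args(example_flags)
except ValueError as err:
    print(err)
```
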
There are a number of settings I didn't touch on. I'm sure the defaults are reasonable, but feel free to look into different settings for things like batch size, epochs, and learning rate if you want to make training faster or more manageable given your system's RAM.
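
To get a feel for how these settings interact, the sketch below estimates the number of training and warmup steps from the batch size, epoch count, and warmup proportion. The formula is the one BERT-style fine-tuning scripts commonly use, and I'm assuming ALBERT's script behaves similarly; the example count of 130,000 is a round hypothetical figure, not the exact size of the SQuAD training set.

```python
def training_schedule(num_examples, train_batch_size, num_train_epochs, warmup_proportion):
    """Estimate total training steps and warmup steps (assumed BERT-style formula)."""
    num_train_steps = int(num_examples / train_batch_size * num_train_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps


# Values taken from the command above, except the hypothetical example count.
steps, warmup = training_schedule(
    num_examples=130_000,
    train_batch_size=48,
    num_train_epochs=2.0,
    warmup_proportion=0.1,
)
print(f"train steps: {steps}, warmup steps: {warmup}")

# Halving the batch size to fit in less memory roughly doubles the step count.
small_steps, small_warmup = training_schedule(130_000, 24, 2.0, 0.1)
print(f"train steps: {small_steps}, warmup steps: {small_warmup}")
```

Under these assumptions, a smaller batch trades memory for more (cheaper) steps, which is worth keeping in mind next to `--save_checkpoints_steps`.
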

## Transformers Library Question Answering


In [1]:
# mutes ALL warnings
import warnings
warnings.filterwarnings("ignore")

In [2]:
import numpy as np
import pandas as pd
from transformers import pipeline
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf

Let's first test out that the library works as desired. We'll be using the example for question answering taken from the [huggingface transformers site](https://huggingface.co/transformers/task_summary.html).

In [3]:
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")


text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = [
    "How many pretrained models are available in 🤗 Transformers?",
    "What does 🤗 Transformers provide?",
    "🤗 Transformers provides interoperability between which frameworks?",
]
for question in questions:
    inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    outputs = model(inputs)
    answer_start_scores = outputs.start_logits
    answer_end_scores = outputs.end_logits
    answer_start = tf.argmax(
        answer_start_scores, axis=1
    ).numpy()[0]  # Get the most likely beginning of answer with the argmax of the score
    answer_end = (
        tf.argmax(answer_end_scores, axis=1) + 1
    ).numpy()[0]  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}")

All model checkpoint layers were used when initializing TFBertForQuestionAnswering.

All the layers of TFBertForQuestionAnswering were initialized from the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForQuestionAnswering for predictions without further training.


Question: How many pretrained models are available in 🤗 Transformers?
Answer: over 32 +
Question: What does 🤗 Transformers provide?
Answer: general - purpose architectures
Question: 🤗 Transformers provides interoperability between which frameworks?
Answer: tensorflow 2. 0 and pytorch
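
The loop above picks the answer by taking the argmax of the start scores and the argmax of the end scores, then slicing the tokens between them. Here is a minimal pure-Python sketch of that step on toy data; the tokens and scores are made up for illustration and do not come from the model.

```python
def best_span(start_scores, end_scores):
    """Return (start, end) token indices, with end exclusive, via independent argmaxes."""
    start = max(range(len(start_scores)), key=start_scores.__getitem__)
    # Add 1 so the end index can be used directly in a slice.
    end = max(range(len(end_scores)), key=end_scores.__getitem__) + 1
    return start, end


# Toy question + context, already split into tokens (hypothetical data).
tokens = ["[CLS]", "who", "?", "[SEP]", "ada", "lovelace", "wrote", "it", "[SEP]"]
start_scores = [0.1, 0.0, 0.0, 0.0, 4.2, 1.0, 0.3, 0.2, 0.0]
end_scores = [0.0, 0.0, 0.0, 0.0, 0.5, 3.9, 0.4, 0.1, 0.0]

start, end = best_span(start_scores, end_scores)
print(tokens[start:end])  # ['ada', 'lovelace']
```

Because the two argmaxes are taken independently, nothing stops the end from landing before the start, in which case the slice is empty. The `pipeline` shown next handles span selection for you.
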


Hugging Face also offers the very easy-to-use `pipeline` if you don't need a specific model. Below is a quick demo using the context and questions above.

In [6]:
nlp = pipeline("question-answering")
for question in questions:
    print(f"Question: {question}")
    result = nlp(question=question, context=text)
    print(f"Answer: '{result['answer']}'")


Some layers from the model checkpoint at distilbert-base-cased-distilled-squad were not used when initializing TFDistilBertModel: ['dropout_19', 'qa_outputs']
- This IS expected if you are initializing TFDistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFDistilBertModel were initialized from the model checkpoint at distilbert-base-cased-distilled-squad.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.

Question: How many pretrained models are available in 🤗 Transformers?
Answer: 'over 32+'
Question: What does 🤗 Transformers provide?
Answer: 'general-purpose
architectures'
Question: 🤗 Transformers provides interoperability between which frameworks?
Answer: 'TensorFlow 2.0 and PyTorch'


Awesome, it works! Next we'll show an example of using another model not in the example. Find a model you want to use from the [models](https://huggingface.co/models) page and put its name into the `model_string` variable. 

TODO: figure out how to use other models (although it's not a necessary step).

In [12]:
model_string = "deepset/xlm-roberta-large-squad2"
nlp = pipeline('question-answering', model=model_string, tokenizer=model_string)

# q = questions[0]
# print(f"Question: {q}")
# result = nlp(question=q, context=text)
# print(f"Answer: '{result['answer']}'")

404 Client Error: Not Found for url: https://huggingface.co/deepset/xlm-roberta-large-squad2/resolve/main/tf_model.h5


OSError: Can't load weights for 'deepset/xlm-roberta-large-squad2'. Make sure that:

- 'deepset/xlm-roberta-large-squad2' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'deepset/xlm-roberta-large-squad2' is the correct path to a directory containing a file named one of tf_model.h5, pytorch_model.bin.
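
The 404 on `tf_model.h5` suggests this repository did not ship TensorFlow weights at the time, only PyTorch ones. One possible way around it, assuming PyTorch is installed, is to ask `pipeline` for the PyTorch backend with `framework="pt"`. The wrapper name `load_qa_pipeline` is hypothetical, and I have not run this against the model above, so treat it as a sketch.

```python
def load_qa_pipeline(model_string, framework="pt"):
    """Build a question-answering pipeline on the requested backend.

    framework="pt" asks transformers to load PyTorch weights
    instead of looking for a TensorFlow checkpoint.
    """
    # Imported here so that defining the helper does not require transformers.
    from transformers import pipeline

    return pipeline(
        "question-answering",
        model=model_string,
        tokenizer=model_string,
        framework=framework,
    )


# Usage (downloads the model, so it is left commented out):
# nlp = load_qa_pipeline("deepset/xlm-roberta-large-squad2")
# result = nlp(question=questions[0], context=text)
# print(f"Answer: '{result['answer']}'")
```

If you would rather stay in TensorFlow, `TFAutoModelForQuestionAnswering.from_pretrained(model_string, from_pt=True)` converts PyTorch weights on load, though it also needs PyTorch installed.
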

