# NLP Assignment 5
Created by Prof. [Mohammad M. Ghassemi](https://ghassemi.xyz)

Submitted by: <span style="color:red"> INSERT YOUR NAME HERE </span>

In collaboration with: <span style="color:red"> INSERT YOUR (OPTIONAL) HOMEWORK PARTNER'S NAME HERE </span>

<hr> 

In [1]:
import importlib
from materials.code import utils
import matplotlib.pyplot as plt
import os
import requests

# IMPORT SOME BASIC TOOLS:
from pprint import pprint
import pyarrow

<hr>

# Part 3: Transformers
If you understood how the basic Transformer model works from the lecture, then you are equipped to understand nearly all of the variants of Transformer-based NLP. Hugging Face does a good job of implementing many contemporary [Transformers in PyTorch](https://huggingface.co/transformers/) and even provides links to the papers for each model. 

<br> In practice, training these models requires very large datasets and computational power far beyond what is tenable in our course, and even in many academic research labs! For this reason, these models tend to be used as a starting point and are fine-tuned for specific problems. That is, we start from the weights of a model trained by Google, Facebook, or some other rich entity with TPU arrays, and then tune those weights using new data that we have for a task we care about. 

<br> Because the pace of NLP research is so fast, there are many models to choose from and the landscape is continuously changing. So, in this section I would like you to learn how to navigate the NLP literature, become familiar with the many flavors of transformer models, and develop some intuition for how they differ and how to tune them.

<hr> 

## Learning Exercise 3: 
#### Worth 3/5 Points
#### A. Summarize contemporary transformers
For each of the 32 [Transformer models referenced here](https://huggingface.co/transformers/), provide a brief description of the model's approach, and any advantages or disadvantages of the model. Here's an example of what you might provide for the [ALBERT](https://arxiv.org/abs/1909.11942) model.


1. __ALBERT__:
  * __Approach__: 2 design changes over BERT - 
    * (1) The embedding matrix is factorized: input-level (token) embeddings are context-independent and can be relatively low-dimensional, while hidden-layer representations are context-dependent and keep the full hidden size; 
    * (2) parameters are shared across layers to reduce the number of parameters.
  * __Advantage__: almost 90% reduction in the number of model parameters compared to the BERT-base model.
  * __Disadvantage__: small reduction in model performance across reported benchmarks.
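
To make the two ALBERT design changes concrete, here is a rough back-of-the-envelope sketch of the parameter counts involved. The sizes used (vocabulary of 30,000, hidden size 768, embedding size 128, 12 layers) are assumed base-model values, the function names are hypothetical, and the per-layer count ignores biases and layer normalization, so treat the output as an approximation rather than an exact model size.

```python
def embedding_params_full(vocab_size: int, hidden_size: int) -> int:
    """BERT-style embedding: one vocab_size x hidden_size matrix."""
    return vocab_size * hidden_size


def embedding_params_factorized(vocab_size: int, embed_size: int, hidden_size: int) -> int:
    """Factorized embedding: vocab_size x embed_size, then embed_size x hidden_size."""
    return vocab_size * embed_size + embed_size * hidden_size


def encoder_layer_params(hidden_size: int) -> int:
    """Approximate weights in one encoder layer (biases and LayerNorm ignored).

    4 * H^2 for the query/key/value/output projections, plus
    8 * H^2 for the feed-forward block with a 4H inner dimension.
    """
    return 12 * hidden_size ** 2


# Assumed base-model sizes (illustrative, not read from any checkpoint).
V, H, E, NUM_LAYERS = 30_000, 768, 128, 12

full = embedding_params_full(V, H)
factored = embedding_params_factorized(V, E, H)
embedding_reduction = 1 - factored / full

unshared_encoder = NUM_LAYERS * encoder_layer_params(H)  # every layer has its own weights
shared_encoder = encoder_layer_params(H)                 # one set of weights reused by all layers

print(f"embedding, full:       {full:,}")
print(f"embedding, factorized: {factored:,}")
print(f"embedding reduction:   {embedding_reduction:.1%}")
print(f"encoder, unshared:     {unshared_encoder:,}")
print(f"encoder, shared:       {shared_encoder:,}")
```

Under these assumptions the factorization alone removes roughly 83% of the embedding parameters, and sharing one layer's weights across all 12 layers divides the encoder weight count by 12.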



<span style="color:red"> INSERT WRITE UP HERE </span>


<hr>

#### B. Tune a Transformer

Using the skeleton code below as a starting point, tune the `BertForSequenceClassification` model for a problem of your choice. If you need data, you can take a look at the `load_data` function from earlier in the tutorial. Compare the performance of the tuned model against the un-tuned model, and comment on any differences that exist.

In [None]:
from transformers import BertForSequenceClassification, Trainer, TrainingArguments
model = BertForSequenceClassification.from_pretrained("bert-large-uncased")
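
The skeleton refers to a `train_dataset` and a `test_dataset` that you need to build yourself. One possible shape for them is sketched below: a map-style dataset whose items are dictionaries of model inputs plus a `labels` entry. The class name `EncodedTextDataset` is hypothetical, and the sketch assumes `encodings` is a dictionary of equal-length lists, such as a Hugging Face tokenizer might return when called with padding and truncation. The token ids here are made-up placeholders, and you should verify that the object works with your version of `Trainer`.

```python
class EncodedTextDataset:
    """Map-style dataset: item i is a dict of model inputs plus its label."""

    def __init__(self, encodings, labels):
        # Every field (input_ids, attention_mask, ...) needs one row per label.
        lengths = {len(values) for values in encodings.values()}
        if lengths != {len(labels)}:
            raise ValueError("each encoding field must have one row per label")
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Pick row idx from every field and attach the label under "labels".
        item = {key: values[idx] for key, values in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item


# Placeholder encodings for two short, already padded examples.
toy_encodings = {
    "input_ids": [[1, 7, 8, 2], [1, 9, 2, 0]],
    "attention_mask": [[1, 1, 1, 1], [1, 1, 1, 0]],
}
toy_labels = [1, 0]

toy_dataset = EncodedTextDataset(toy_encodings, toy_labels)
print(len(toy_dataset))
print(toy_dataset[1])
```

Keeping the label inside each item means the model receives inputs and targets together, which is what lets it compute a loss during training.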

In [None]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(pred):
    # pred.predictions holds the logits; argmax over the last axis gives the predicted class
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    # average='binary' assumes a two-class problem with labels 0 and 1
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

training_args = TrainingArguments(
    output_dir                  ='./results',    # output directory
    num_train_epochs            =3,              # total # of training epochs
    per_device_train_batch_size =16,             # batch size per device during training
    per_device_eval_batch_size  =64,             # batch size for evaluation
    warmup_steps                =500,            # number of warmup steps for learning rate scheduler
    weight_decay                =0.01,           # strength of weight decay
    logging_dir                 ='./logs',       # directory for storing logs
)

# NOTE: train_dataset and test_dataset are not defined in this skeleton; build them from your own data.
trainer = Trainer(
    model           =model,               # the instantiated Transformers model to be trained
    args            =training_args,       # training arguments, defined above
    train_dataset   =train_dataset,       # training dataset
    eval_dataset    =test_dataset,        # evaluation dataset
    compute_metrics =compute_metrics      # metrics reported at evaluation time
)
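
To see what the `compute_metrics` function above reports, it can help to work through the same quantities by hand on a few made-up predictions. The sketch below is a plain-Python re-implementation for illustration only (it is not the scikit-learn code, and `binary_metrics` is a hypothetical name); it assumes binary labels where class 1 is the positive class, and the logits are toy values.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)


def binary_metrics(labels, logits):
    """Accuracy, precision, recall, and F1 with class 1 as the positive class."""
    preds = [argmax(row) for row in logits]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": correct / len(labels),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Toy example: five examples, two logits each (class 0, class 1).
toy_labels = [1, 0, 1, 1, 0]
toy_logits = [[0.2, 1.5], [2.0, -1.0], [1.1, 0.3], [-0.5, 0.9], [0.1, 0.4]]

toy_metrics = binary_metrics(toy_labels, toy_logits)
print(toy_metrics)
```

Here the predicted classes are `[1, 0, 0, 1, 1]`, so there are two true positives, one false positive, and one false negative. Comparing these four numbers before and after tuning is one simple way to structure your interpretation.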

<span style="color:red"> INSERT AN INTERPRETATION OF YOUR RESULTS HERE </span>

<hr>
<h1><span style="color:red"> Self Assessment </span></h1>
Please provide an assessment of how successfully you accomplished the learning exercises in this assignment according to the instructions provided; do not assign yourself points for effort. This self assessment will be used as a starting point when I grade your assignments. Please note that if you overestimate your grade on a given learning exercise, you will face a 50% penalty on the total points granted for that exercise. If you underestimate your grade, there will be no penalty.

* Learning Exercise: 
    * <span style="color:red">X</span>/3 points