# Project 5: Goals and Deliverables

The goals of this assignment are:
* To work with the object-oriented version of our corpus code.
* To modify a web app that we can use to analyze text data.
* To finetune a transformer model and write a model card for it.

Here are the steps you should follow to successfully complete this project:
1. Check out the assignment from GitHub.
2. Make a codespace with at least 8GB of RAM (preferably more!).
3. Copy your `spacy_on_corpus.py` from project 4b.
4. Copy the anvil callable functions and your API key from project 4b into the file `server.py`.
5. Complete all the instructions in this notebook. Make sure to answer all questions, and to commit the notebook in a "run" state!
6. Edit the README.md file. Provide your name, your class year, links to/descriptions of any extensions and a list of resources you used. 
7. Commit your code often. We will take the last commit before the deadline as your submission of the project.

Possible extensions (from least points to most points):
* Modify the `render_document_sentiments` method you implemented for this project to have a third column, `Aspect`. Fill it with the first keyphrase extracted from the sentence using the keyphrase extraction algorithm from project 4b, or with the first noun chunk in the sentence. Explain whether this is better than the baseline implementation for this project, and why.
* Finetune a different model (other than `distilbert-base-cased`) for the sentence sentiment task.
* Finetune a transformer model for a different NLP task. Add it to your web app.
* Your other ideas are welcome! If you'd like to discuss one with Dr Stent, feel free.

# Set Up



1. Copy your `spacy_on_corpus.py` from project 4b.
2. Copy the anvil callable functions from project 4b into the file `server.py`.
3. In the terminal, run `pip install -r requirements.txt`.

## Make Sure We Can Work With .py Files We Are Editing

Run the code cell below.

In [None]:
# Automatically reload your external source code
%load_ext autoreload
%autoreload 2

## Get a Corpus

In the code cell below, build a corpus using `creator.jsonl`.

## Test the Server

There are now two ways to start our server:

1. From a notebook: import `server`, then call the `run` function.
2. In the terminal: `python server.py`.

In the code cell below, try the first way.

If the above code cells don't work, then you haven't followed the setup instructions. Go back to that section.

# Finetune a Transformer Model


We can currently get document-level sentiment, but a movie review is often nuanced: some sentences say good things about some aspects of the movie, while others say bad things about other aspects. In this project, we will finetune a transformer model on the sentiment of sentences from movie reviews, and then add to our web app the ability to see sentence-level sentiment for a document.

## Get Some Labeled Data

We are going to use the [SST](https://huggingface.co/datasets/sst) dataset. Note the datasheet!

First, we download the dataset.

In [None]:
# datasets is a huggingface python package that makes it easy to download huggingface datasets
from datasets import load_dataset

# download the sst dataset
raw_sst = load_dataset("glue", "sst2")

# make it smaller for testing; once everything is working, train on all the data by commenting this line out and rerunning the notebook
raw_sst = raw_sst.filter(lambda e, i: i<100, with_indices=True)
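
To make the `filter` call above concrete, here is a plain-Python sketch of the same idea on a toy list. The names `toy_split` and `keep_first` are made up for illustration and are not part of the `datasets` package. Note that when `filter` is called on the whole downloaded dataset, it is applied to each split separately.

```python
# Toy stand-in for one dataset split: a list of example dictionaries.
toy_split = [{"idx": i, "sentence": f"sentence {i}"} for i in range(250)]

def keep_first(split, limit=100):
    """Keep only the examples whose position in the split is below `limit`."""
    return [example for i, example in enumerate(split) if i < limit]

small_split = keep_first(toy_split)
print(len(small_split))        # number of examples kept
print(small_split[-1]["idx"])  # idx of the last example kept
```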

Then, we look at the dataset.

**ALWAYS LOOK AT YOUR DATA**

In [None]:
# look at the sst dataset

# look at the sst training data

# look at the sst training data sentences. Note each data point is a pre-tokenized sentence.

# look at the sst training data labels.

Questions:

1. *For supervised machine learning, each data point has to have what?*
2. *Why do we split data for supervised machine learning into train, dev (validation) and test?*
3. *How many datapoints are in the dataset altogether (train, validation and test)?*

## Tokenize the Data

We will use the small `distilbert` model for this project. So we want to use its tokenizer.

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

Next we need to tokenize our data using that tokenizer.

In [None]:
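def transformer_tokenize(example):
    """Tokenizes the input data using the designated transformer tokenizer.

    Because we call map with batched=True, example is a batch:
    example['sentence'] holds a list of sentences.

    :param example: a batch of data points with a 'sentence' key
    :type example: dict
    :returns: the tokenizer output for the batch
    """
    return tokenizer(example['sentence'], padding="max_length", truncation=True)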

# this tokenizes the train, validation and test sets
tokenized_sst = raw_sst.map(transformer_tokenize, batched=True)

Questions:

4. *When we looked at our data, we saw that it was tokenized (kind of like spaCy tokenizes). Why do we need to tokenize again with the transformer tokenizer?*
5. *Recall that a transformer has fixed width input. Look at the tokenize function above.*
   * *If the input text is shorter, what does the tokenizer do?*
   * *If the input text is longer, what does the tokenizer do?*
6. *What type of supervised machine learning could we do if our labels are numeric?*

## Get a Transformer Model

We will use the distilbert model for efficiency.

*Our classification task has how many labels?*

In [None]:
from transformers import AutoModelForSequenceClassification

num_labels = 
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-cased", num_labels=num_labels)

## Instantiate and Run a Trainer

Huggingface gives us a nice clean way to train: the `Trainer`. Each trainer takes training arguments, which is where you set hyperparameters. We will make a default set of training arguments.

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="sst_model", evaluation_strategy="epoch")

We will add an accuracy metric from the `evaluate` package so we can see accuracy while training.

In [None]:
# evaluate is a huggingface python package that provides evaluation metrics
import evaluate
import numpy as np

# Setup evaluation 
accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    """Compute the metric!"""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
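
To see what the `np.argmax` step in `compute_metrics` does, here is a small plain-Python sketch on made-up logits. The helper `argmax` and the `toy_logits` values are assumptions for illustration only; they are not output from the model.

```python
def argmax(row):
    """Return the index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])

# Made-up logits for three sentences and two classes (index 0 and index 1).
toy_logits = [[2.1, -0.3], [-1.0, 0.4], [0.2, 0.9]]

# For each sentence, the predicted class is the index with the highest logit.
toy_predictions = [argmax(row) for row in toy_logits]
print(toy_predictions)
```

The accuracy metric then compares predictions like these against the gold labels.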

Then we will make a trainer using the model, the training arguments, our train and our dev data.

In [None]:
from transformers import Trainer

trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_sst["train"], eval_dataset=tokenized_sst["validation"], compute_metrics=compute_metrics)

And finally, we train.

**This step takes a long time.** If you want to speed it up, you will need a GPU! But codespaces don't currently have GPU options. So:

1. Download this notebook.
2. Open [https://colab.research.google.com](https://colab.research.google.com).
3. Upload the notebook.
4. Add a cell at the top of the notebook and in it type:
```
!pip install datasets
!pip install transformers[torch]
!pip install evaluate
```
5. In the Runtime menu, choose `GPU`.
6. Run the notebook there to train a model.
7. Download the trained model.
   * in a code cell, type `!tar -czf model.tgz sst-model`
   * download model.tgz
8. Upload the trained model here in the codespace.
   * upload model.tgz
   * in the terminal, type `tar -xzf model.tgz`
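
If you would rather stay in Python than use `tar`, the packing and unpacking steps above can be sketched with the standard-library `tarfile` module. The function names `pack_model` and `unpack_model` are hypothetical helpers, and the sketch assumes the model folder is in the current working directory.

```python
import tarfile
from pathlib import Path

def pack_model(model_dir, archive_path):
    """Write the folder model_dir into a gzip-compressed tar archive."""
    model_dir = Path(model_dir)
    with tarfile.open(str(archive_path), "w:gz") as archive:
        # arcname stores only the folder name, not its full path, in the archive
        archive.add(str(model_dir), arcname=model_dir.name)

def unpack_model(archive_path, destination="."):
    """Extract a gzip-compressed tar archive into the folder destination."""
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(destination))

# Usage, assuming the folder `sst-model` exists:
# pack_model("sst-model", "model.tgz")   # in Colab, before downloading
# unpack_model("model.tgz")              # in the codespace, after uploading
```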

After this course, you can always still use codespaces, and I recommend it because of the tight integration with GitHub (so your code is saved!). You can *also* always use Colab. If you use Colab, your notebooks will be backed up in your Google Drive, but *no other files that you generated in Colab are saved*.

In [None]:
trainer.train()


And we save the trained model.

In [None]:
trainer.save_model("sst-model")


# Evaluate

Now we should evaluate the model.

First we load the model.

In [None]:
from transformers import pipeline

sentence_sentiment = pipeline("text-classification", model="sst-model", tokenizer=tokenizer)

Then we try the model on a couple of sample sentences to sanity check.

In [None]:
print(sentence_sentiment("This movie was awful!"))

print(sentence_sentiment("This movie was great!"))

Now we want to run the model on each of our dev (validation) data points.

Notice that the model outputs 'LABEL_0' or 'LABEL_1' while the dev data has the integer labels 0 and 1. So *you* should define a function to map the labels.

In [None]:
def get_label(output):
    """Gets a numeric label for the output from the classifier.
        Sample output from classifier: [{'label': 'LABEL_0', 'score': 0.61}] corresponds to label 0
        Sample output from classifier: [{'label': 'LABEL_1', 'score': 0.72}] corresponds to label 1

        :param output: the output from the classifier
        :type output: list[dict]
        :returns: a label
        :rtype: int
    """
    pass

Now run the classifier on each dev data point.

Each element in `raw_sst['validation']` is a dictionary with keys `idx`, `sentence` and `label`. For each dev data point, you should make a new dictionary with keys `idx`, `sentence`, `label` and `pred` (for the output from the classifier). Add this dictionary to the list of results.

In [None]:
results = []
# your work here!

Now we have:

* `gold` labels, and
* model predictions

for each of the dev data points.

We will calculate two metrics:

1. accuracy
2. confusion matrix

## Accuracy

In the code cell below, implement the `accuracy` function. The accuracy of a classifier is the number of correctly labeled data points divided by the total number of data points.

In [None]:
def accuracy(results):
    """ Returns the accuracy of a list of classifier results

    :param results: a list of dictionaries. Each dictionary contains, at minimum, the keys 'label' and 'pred'
    :type results: list[dict]
    :returns: accuracy
    :rtype: float
    """
    pass



Now print the accuracy of the finetuned model.

Questions:

7. *How accurate is the finetuned model?*
8. *What would be the accuracy of a simple model that flipped a coin?*

## Confusion Matrix

In the code cell below, implement the `confusion_matrix` function. A confusion matrix for a classifier is like a spreadsheet or table that has all the labels along the rows and columns. Each cell contains the number of data points where the gold label corresponded to that row label, and the predicted label to that column label.

For example, for labels `TRUE` and `FALSE`, here is a possible confusion matrix:

| | TRUE | FALSE |
| --- | ---- | ----- |
| TRUE | 5 | 2   |
| FALSE | 1 | 4   | 

This says that there were 7 total data points with gold label `TRUE`, of which 5 had predicted label `TRUE`. There were 5 total data points with gold label `FALSE`, of which 4 had predicted label `FALSE`. This is a pretty good classifier!

Your confusion matrix will be a dictionary of dictionaries. Here's the above confusion matrix as a dictionary of dictionaries:
```
cf = {'TRUE': {'TRUE': 5, 'FALSE': 2}, 'FALSE': {'TRUE': 1, 'FALSE': 4}}
```
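
As a reading aid, here is a short sketch of how to pull numbers out of the example dictionary above. It only inspects the `cf` example from this section; the variable names `row_totals` and `diagonal` are illustrative.

```python
cf = {'TRUE': {'TRUE': 5, 'FALSE': 2}, 'FALSE': {'TRUE': 1, 'FALSE': 4}}

# Outer key = gold label (row); inner key = predicted label (column).
for gold, row in cf.items():
    total = sum(row.values())   # all data points with this gold label
    correct = row[gold]         # those whose prediction matches the gold label
    print(f"gold {gold}: {correct} of {total} predicted correctly")

# Number of data points per gold label, and the count on the diagonal.
row_totals = {gold: sum(row.values()) for gold, row in cf.items()}
diagonal = sum(cf[label][label] for label in cf)
```

The diagonal cells are the correctly classified data points, so the per-row counts match the description above: 5 of 7 for `TRUE` and 4 of 5 for `FALSE`.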

In [None]:
def confusion_matrix(results):
    """ Returns the confusion matrix for a list of classifier results

    :param results: a list of dictionaries. Each dictionary contains, at minimum, the keys 'label' and 'pred'
    :type results: list[dict]
    :returns: confusion matrix
    :rtype: dict
    """
    pass


Now print the confusion matrix for the finetuned model.

(If you become an advanced transformers tool builder, you could instead use the `evaluate` package made by the huggingface team.)

Questions:

9. *For which class (0 or 1, negative or positive sentiment) is the finetuned model most accurate?*
10. *Why do you say that?*

# Add Your Finetuned Model to Your Webapp



1. Create a class attribute in the `corpus` class for loading your model.


2. Create an instance method, `get_sentence_level_sentiment`, in the `corpus` class. This method should return a list of pairs `(sentence, label)` where `label` is the sentiment label for the sentence. Test it in the code cell below.


3. Create an instance method, `render_document_sentiments`, in the `corpus` class. This method should return a markdown table for the document containing the sentences and their corresponding sentiment labels. At the bottom, it should have an extra row where the "sentence" is the string "document" and the label is the document-level sentiment (*not* from your finetuned model; from project 4b). Test it in the code cell below.


4. In `server.py`, add an anvil callable function, `get_doc_sentiment_markdown`, that calls `render_document_sentiments`. Test it in the code cell below.


5. Add a radio button 'Sentiment' to the `Analyze Document` form in your web app; when clicked, this should call `get_doc_sentiment_markdown`. Paste a screenshot here:

# Model Card

***ALWAYS DOCUMENT YOUR MODEL***

Complete the model card reading in Perusall.

Then, complete the model card in `model_card.md` for your finetuned model.

# Resources

* https://livingdatalab.com/posts/2023-04-23-fine-tuning-a-sentiment-analysis-model-with-huggingface.html
* https://huggingface.co/docs/transformers/v4.15.0/model_sharing
* https://huggingface.co/docs/datasets/process