Assignment: Natural Language Inference
In this assignment, you will build a system that does "natural language inference" (NLI). It will determine, for a "premise" sentence and a "hypothesis" sentence, whether the hypothesis follows from the premise.
We will use the Stanford NLI Corpus (SNLI), which is also available on Hugging Face. As explained on the SNLI website (which you should read), each premise-hypothesis pair falls into one of three classes: the premise entails the hypothesis, the hypothesis contradicts the premise, or the two are neutral with respect to each other. The dataset is balanced: one third of the instances belong to each class, so a system that guesses at random achieves an accuracy of about 33%.
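Before training, it can help to confirm the class balance on the data you actually loaded. The sketch below works on plain dictionaries and assumes the label encoding of the Hugging Face version of SNLI (0 = entailment, 1 = neutral, 2 = contradiction, -1 = no gold label); please verify this against the dataset card. The names `LABEL_NAMES`, `keep_labeled` and `class_shares` are made up for illustration.

```python
from collections import Counter

# Assumed label encoding (check the dataset card before relying on it):
# 0 = entailment, 1 = neutral, 2 = contradiction, -1 = no gold label.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}


def keep_labeled(examples):
    """Return only the examples whose label is one of the three NLI classes."""
    return [ex for ex in examples if ex["label"] in LABEL_NAMES]


def class_shares(examples):
    """Return the fraction of labeled examples that fall into each class."""
    labeled = keep_labeled(examples)
    if not labeled:
        raise ValueError("no labeled examples")
    counts = Counter(LABEL_NAMES[ex["label"]] for ex in labeled)
    return {name: counts[name] / len(labeled) for name in LABEL_NAMES.values()}


if __name__ == "__main__":
    # Tiny hand-written sample, not real SNLI data.
    toy = [
        {"premise": "A boy sleeps.", "hypothesis": "A child rests.", "label": 0},
        {"premise": "A boy sleeps.", "hypothesis": "A boy dreams.", "label": 1},
        {"premise": "A boy sleeps.", "hypothesis": "A boy laughs.", "label": 2},
        {"premise": "A boy sleeps.", "hypothesis": "Something.", "label": -1},
    ]
    print(class_shares(toy))
```

If the shares are close to one third each, the 33% random baseline applies to your split as well.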
As always, you can use any classes from PyTorch that you like, but no classes or functions from high-level libraries such as Hugging Face except where specifically allowed.
Fine-tune a pretrained XLM-RoBERTa model to perform NLI. Please do not use existing code or Hugging Face classes beyond RoBERTa itself and the tokenizer classes, as in the earlier assignments.
You can enter two sentences as input into RoBERTa by separating them with the separation token, tokenizer.sep_token, which defaults to </s>:
The boy sleeps . </s> The boy laughs .
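As a minimal sketch, the input string above could be built as follows. The helper name `join_pair` is hypothetical; in your own code you would pass `tokenizer.sep_token` rather than relying on the hard-coded default.

```python
def join_pair(premise, hypothesis, sep_token="</s>"):
    """Join premise and hypothesis into one string around the separation token.

    The default "</s>" is only a stand-in; pass tokenizer.sep_token in practice.
    """
    return f"{premise} {sep_token} {hypothesis}"


if __name__ == "__main__":
    print(join_pair("The boy sleeps .", "The boy laughs ."))
```

The resulting string is then passed to the tokenizer like any other single input.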
Fine-tune your classifier on the SNLI training set, and evaluate it on the validation set. Report the accuracy, training time, and the runtime on the validation set.
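One possible way to collect the numbers for your report is sketched below. The helpers `accuracy` and `timed` are illustrative names, not part of any library; they assume predictions and gold labels are given as equal-length sequences.

```python
import time


def accuracy(predictions, gold):
    """Return the fraction of predictions that equal the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in for a real prediction function over the validation set.
    def dummy_predict(n):
        return [0] * n

    preds, seconds = timed(dummy_predict, 4)
    print(f"accuracy={accuracy(preds, [0, 1, 0, 2]):.2f} runtime={seconds:.4f}s")
```

Wrapping both the training loop and the validation pass in `timed` gives the training time and the validation runtime separately. If you train on a GPU, keep in mind that CUDA calls are asynchronous, so you may need to synchronize before stopping the clock.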
Now use prompting with a pre-trained large language model (LLM) to solve SNLI. You can try to provide detailed natural-language instructions on what you want the LLM to do; you can provide one or more examples; or you can try a mixture of both. Run the LLM on at least the first 100 examples from the validation set and report accuracy and runtime.
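A prompt that mixes instructions with a few examples could be assembled along the following lines. This is only a sketch: the instruction wording, the `Premise:`/`Hypothesis:`/`Answer:` layout, and the helper names `build_prompt` and `parse_label` are assumptions made for illustration, and you will probably want to experiment with other formats.

```python
LABELS = ("entailment", "neutral", "contradiction")

# Example instruction text; the exact wording is yours to tune.
INSTRUCTIONS = (
    "Decide how the hypothesis relates to the premise. "
    "Answer with exactly one word: entailment, neutral, or contradiction."
)


def build_prompt(premise, hypothesis, shots=()):
    """Build a prompt from instructions, optional examples, and the query pair.

    shots is a sequence of (premise, hypothesis, label) triples.
    """
    parts = [INSTRUCTIONS]
    for p, h, label in shots:
        parts.append(f"Premise: {p}\nHypothesis: {h}\nAnswer: {label}")
    parts.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:")
    return "\n\n".join(parts)


def parse_label(response):
    """Return the label mentioned earliest in the response, or None if none is."""
    text = response.lower()
    hits = [(text.find(label), label) for label in LABELS if label in text]
    return min(hits)[1] if hits else None


if __name__ == "__main__":
    shots = [("A man eats.", "A person eats.", "entailment")]
    print(build_prompt("The boy sleeps .", "The boy laughs .", shots))
    print(parse_label(" Contradiction."))
```

Responses that `parse_label` cannot map to a label return `None`; count these as errors rather than dropping them, so that the reported accuracy stays comparable to the fine-tuned model.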
You have a choice regarding the LLM you want to use. You will likely obtain the highest accuracy if you use the OpenAI API with the gpt-4-1106-preview model. This requires you to create an OpenAI account and pay a small amount of money.
Alternatively, you can use any open-source LLM and run it on your own computer or a server. An easy way to do this is to use the GPT4All Python API. You will need to download a model file in GGUF format. One reasonable choice would be an instruction-tuned Mistral model, e.g. from this website. Note that these models require a substantial amount of main memory; consider running them on a department server.
Discuss the findings from your fine-tuning and prompting experiments with regard to speed, accuracy, and the amount of required training data, as well as anything else you found interesting.