# Instruction Tuning Notes

## Context

> 💡 This post assumes familiarity with large language models and their training and finetuning. We cover the idea without diving deep into the technical details.

Pretraining LLMs on the “next token prediction” task has been shown to generalise remarkably well, as long as you throw enough data, parameters and compute at it. However, you can get more out of a language model by finetuning it on a smaller dataset. Many have already experimented with finetuning LLMs on downstream tasks. But you can also improve their generalisation and instruction-following abilities by using a dataset that presents tasks as instructions and expects the LLM to predict the output.

## What is Instruction Tuning?

> "A form of **[fine-tuning](https://developers.google.com/machine-learning/glossary/generative#fine-tuning)** that improves a **[generative AI](https://developers.google.com/machine-learning/glossary/generative#generative-AI)** model's ability to follow instructions. Instruction tuning involves training a model on a series of instruction prompts, typically covering a wide variety of tasks. The resulting instruction-tuned model then tends to generate useful responses to **[zero-shot prompts](https://developers.google.com/machine-learning/glossary/generative#zero-shot-prompting)** across a variety of tasks." - Google Dev Page


## Why Instruction Tuning?

Given the [scaling laws](https://arxiv.org/abs/2001.08361), we can expect models to get better with more data and parameters. But it is possible to squeeze out more performance with methods like instruction tuning, which make models better at few-shot learning (also called in-context learning; ICL) and zero-shot learning. This way, a user can provide a prompt with instructions and expect the model to perform the task accordingly.
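To make the zero-shot versus few-shot distinction concrete, here is a minimal sketch of how the two kinds of prompt differ. The `build_prompt` helper and the `Input:`/`Output:` layout are assumptions chosen for illustration, not a format prescribed by any particular model.

```python
def build_prompt(instruction, query, demonstrations=()):
    """Build a zero-shot prompt, or a few-shot prompt when demonstrations are given.

    Hypothetical helper: the "Input:/Output:" layout is an illustrative choice.
    """
    parts = [instruction]
    # Few-shot (in-context learning): show solved examples before the real query.
    for demo_input, demo_output in demonstrations:
        parts.append(f"Input: {demo_input}\nOutput: {demo_output}")
    # The model is expected to continue the text after the final "Output:".
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


instruction = "Classify the sentiment of the review as positive or negative."

# Zero-shot: the instruction and the query only.
zero_shot = build_prompt(instruction, "The plot was dull.")

# Few-shot: the same instruction plus two worked demonstrations.
few_shot = build_prompt(
    instruction,
    "The plot was dull.",
    demonstrations=[
        ("I loved every minute.", "positive"),
        ("A waste of time.", "negative"),
    ],
)

print(zero_shot)
print("---")
print(few_shot)
```

An instruction-tuned model should, ideally, do reasonably well on the zero-shot version, since it has already seen many tasks phrased this way during finetuning.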

Think of pretraining as the foundation that builds world knowledge, and instruction tuning as lessons on problem solving.

## How?

It’s simple: use a dataset whose inputs are constructed as shown below. Different models use different formats, but the idea is the same: provide an instruction that describes the task, then ask the model to predict the output.
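As a rough sketch of this input construction, the snippet below turns one labelled record into several instruction-style training pairs by rendering it through different templates. The template wording, the `LABEL_TO_TEXT` mapping and the `to_instruction_examples` helper are all hypothetical, made up here to illustrate the idea rather than taken from any of the papers.

```python
# Hypothetical templates: several phrasings of one task (natural language inference).
NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Answer yes, no, or maybe.",
    "Read the text and decide whether the hypothesis can be inferred from the premise.\n\n"
    "Premise: {premise}\nHypothesis: {hypothesis}\nOptions: yes, no, maybe",
]

# Assumed label encoding for this toy record; real datasets define their own.
LABEL_TO_TEXT = {0: "yes", 1: "maybe", 2: "no"}


def to_instruction_examples(record, templates=NLI_TEMPLATES):
    """Render one labelled record into (input, target) pairs, one per template."""
    target = LABEL_TO_TEXT[record["label"]]
    return [
        {
            "input": template.format(
                premise=record["premise"], hypothesis=record["hypothesis"]
            ),
            "target": target,  # the text the model is trained to predict
        }
        for template in templates
    ]


record = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A person is performing music.",
    "label": 0,
}
examples = to_instruction_examples(record)
for example in examples:
    print(example["input"])
    print("->", example["target"])
    print()
```

Using more than one phrasing per task is one way to keep the model from latching onto a single surface format.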

![OpenAI InstructGPT](images/InstructionTuning/Untitled.png)

![FlanT5](images/InstructionTuning/Untitled%201.png)

image from [Flan-T5 paper](https://arxiv.org/abs/2210.11416)

Researchers at Stanford released [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html), an instruction-tuned model that starts from LLaMA and uses [instructions generated by OpenAI's text-davinci-003](https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json) as its dataset.

> 💡 Note on Alpaca: However, the original implementation is less accessible due to licensing constraints of the underlying **[LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)** model. Furthermore, users have noted **[potential noise](https://github.com/tloen/alpaca-lora/issues/65)** in the synthetic dataset. Hence, it may be better to explore a fully accessible model that is already trained on high-quality (but less diverse) instructions such as **[Flan-T5](https://arxiv.org/abs/2210.11416)**. - [flan-alpaca-gpt4-xl](https://huggingface.co/declare-lab/flan-alpaca-gpt4-xl)
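For a feel of what an Alpaca-style training example looks like, here is a small sketch that formats a record with `instruction`, `input` and `output` fields (the fields used in the linked dataset) into a prompt and a completion. The template text is modelled on the one in the Stanford Alpaca repository, but treat the exact wording as an assumption; the `format_record` helper and the sample records are made up for illustration.

```python
# Templates modelled on the Stanford Alpaca prompt format (exact wording assumed).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_record(record):
    """Hypothetical helper: turn one dataset record into a prompt/completion pair."""
    # Records with an empty "input" field use the shorter template.
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    prompt = template.format(
        instruction=record["instruction"], input=record.get("input", "")
    )
    return {"prompt": prompt, "completion": record["output"]}


# Made-up records in the same shape as the dataset entries.
with_input = format_record(
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
)
without_input = format_record(
    {"instruction": "Name a primary colour.", "input": "", "output": "Red"}
)

print(with_input["prompt"] + with_input["completion"])
```

During finetuning the model sees the prompt followed by the completion, and learns to produce the completion given the prompt.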

![T0 model](images/InstructionTuning/Untitled%202.png)

image from [T0 paper](https://arxiv.org/pdf/2110.08207.pdf)

## Conclusion

It’s nice, but the amount of data needed is still fairly large (roughly 10k-100k examples). Also, deciding on the best format for instructions and outputs is another open-ended experiment.

### References

[Outerbounds Blog](https://outerbounds.com/blog/custom-llm-tuning/) - Beautifully covers Instruction Tuning

[FLAN](https://arxiv.org/abs/2109.01652) - introduces “instruction tuning”

[Multi-task finetuning paper](https://arxiv.org/abs/2110.08207)

[OpenAI paper](https://arxiv.org/abs/2203.02155) InstructGPT - RLHF with a supervised finetuning step that is instruction tuning

[Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html) - Open-source instruction-tuned model

[Scaling laws](https://arxiv.org/abs/2001.08361)