# prefixtune: default program

In [1]:
from default import *
import os, sys



## Run the default solution on small

In [4]:
basemodel = 'distilgpt2'
table_to_text = TableToText("peft", basemodel=basemodel)
model = AutoModelForCausalLM.from_pretrained(basemodel)
decoder_output = table_to_text.decode(model, '../data/input/small.txt')
print("\n".join(decoder_output))

10it [00:13,  1.35s/it]

0||  _______________________________________________    A new report from the US Department of Justice's Office of the Inspector General (OIG) has revealed that the Justice Department's Office of the Inspector General (OIG) has found that the Department of Justice has
1||   The following is a list of the most popular and popular websites in the United States.    The following is a list of the most popular and popular websites in the United States.    The following is a list of the
2||  __________   If you're looking for the first time in your life, you're looking for the first time in your life. If you're looking for the first time in your life, you're looking for the first time in your life
3||  ___________   A new report from the U.S. Department of Homeland Security (DHS) in the wake of the 9/11 attacks on the U.S. and its allies in the Middle East has revealed that the U.
4||  __________________________________________    I’ve been working on a new project for the past few months, a




Ignore the warnings from the transformers library. They are expected to occur.

## Evaluate the default output

See `bleu.py`

## 1. Prefix Tuning
Prefix tuning is already implemented in Hugging Face's `peft` library. We studied its usage through the official quick tour (https://huggingface.co/docs/peft/quicktour) and adapted the code from that tutorial to our needs.

+ First, we design the prefix tuning configuration. Since we are doing causal language modeling with distilgpt2, we set `task_type` to `CAUSAL_LM`. We set `inference_mode` to False because we are training, and then set `num_virtual_tokens` and `prefix_projection` to our chosen values. (Note that `prefix_projection` defaults to False, but here we need it to be True.)
+ Then we combine the prefix tuning component with the distilgpt2 model.

In [None]:
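from peft import PrefixTuningConfig, TaskType, get_peft_model

peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    num_virtual_tokens=self.virtualtokens,
    prefix_projection=self.prefixprojection,
)
# Wrap the base model; the base weights are frozen and only the prefix is trained
model = get_peft_model(model, peft_config)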

+ During training, we need to update the prefix-related weights, so every training step runs a forward pass, backpropagates the loss, and applies an optimizer update.

In [None]:
for epoch in range(self.epochs):
    model.train()

    # TODO rest of the training steps for prefix tuning
    for key in data_loaders.keys():
        for step, batch in enumerate(tqdm(data_loaders[key])):
            # Move the batch to the same device as the model
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch)
            loss = outputs.loss
            loss.backward()        # compute gradients for the trainable prefix weights
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()  # clear gradients before the next batch

    # ...

+ At inference time, we load the base model and the trained prefix tuning weights together.

In [None]:
if False:
    print(f"Loading the non-finetuned pre-trained model: {opts.basemodel}", file=sys.stderr)
    model = AutoModelForCausalLM.from_pretrained(opts.basemodel)
    model = model.to(device)
else:
    if not os.path.isdir(modelfile + opts.modelsuffix) or opts.force:
        print(f"Could not find modelfile {modelfile + opts.modelsuffix} or -f used. Starting training.", file=sys.stderr)
        table_to_text.train()
        print("Training done.", file=sys.stderr)
    # use the model file if available and opts.force is False
    assert os.path.isdir(modelfile + opts.modelsuffix)
    print(f"Found modelfile {modelfile + opts.modelsuffix}. Starting decoding.", file=sys.stderr)
    # TODO: if using hf peft library for prefix tuning:

    model = AutoModelForCausalLM.from_pretrained(opts.basemodel)
    model = PeftModel.from_pretrained(model, modelfile + opts.modelsuffix)
    model = model.to(device)

With these basic settings and no hyperparameter changes (compared to the default file), the BLEU scores are:
+ small.out score: 13.4207
+ dev.out score: 16.1767

## 2. Hyperparameter Tuning
Besides the basic settings of the prefix tuning component, there are many other hyperparameters that we can adjust.
### 2.1 PEFT - Number of Virtual Tokens
The number of virtual tokens controls the length of the trainable prefix added to each input to the model. In theory, when the number is too small, the prefix cannot encode enough task-related information, which leads to relatively low performance. The default number of virtual tokens is 5; we tried increasing it while keeping all other hyperparameters the same.
+ **virtualtokens = 10**: small.out score: 15.5933, dev.out score: 18.1553
+ **virtualtokens = 20**: small.out score: 19.2093, dev.out score: 17.9956
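
To get a feel for how the number of virtual tokens changes the size of the trainable prefix, the sketch below counts prefix parameters in plain Python. It is only a rough estimate under stated assumptions: `prefix_param_count` is a hypothetical helper, the distilgpt2 dimensions (6 layers, hidden size 768) are assumed here, and the projected variant assumes an embedding followed by a two-layer MLP whose hidden width equals the model's hidden size. The actual `peft` layout may differ.

```python
def prefix_param_count(num_virtual_tokens, num_layers, hidden_size,
                       prefix_projection=False, encoder_hidden_size=None):
    """Estimate the number of trainable prefix parameters (assumed layout)."""
    # Every layer needs one key vector and one value vector per virtual token.
    kv_size = num_layers * 2 * hidden_size
    if not prefix_projection:
        # Assumption: one directly learned row of size kv_size per virtual token.
        return num_virtual_tokens * kv_size
    if encoder_hidden_size is None:
        # Assumption: the MLP hidden width defaults to the model hidden size.
        encoder_hidden_size = hidden_size
    embedding = num_virtual_tokens * hidden_size
    first_linear = hidden_size * encoder_hidden_size + encoder_hidden_size
    second_linear = encoder_hidden_size * kv_size + kv_size
    return embedding + first_linear + second_linear


# Assumed distilgpt2 dimensions: 6 transformer layers, hidden size 768.
for n in (5, 10, 20):
    direct = prefix_param_count(n, num_layers=6, hidden_size=768)
    projected = prefix_param_count(n, 6, 768, prefix_projection=True)
    print(f"virtualtokens={n}: direct={direct}, projected={projected}")
```

Under these assumptions the directly learned prefix grows linearly with the number of virtual tokens, while the projected variant is dominated by the MLP weights, so adding tokens changes its size only slightly.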

### 2.2 Decoding - max_new_tokens
`max_new_tokens` controls the maximum number of tokens the model generates for each input. The default solution uses 50.

### 2.3 Decoding - num_beams
`num_beams` controls the beam search width, that is, how many candidate sequences are kept at each decoding step. The default solution uses 5.
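
To illustrate what the beam width and the length limit do, here is a minimal beam search over a made-up toy model. `TOY_MODEL` and `beam_search` are hypothetical names introduced only for this sketch; it is not the decoder used by the default solution, and it only mirrors the roles of `num_beams` and `max_new_tokens`.

```python
import math

# Hypothetical toy "model": next-token probabilities depend only on the last token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4, "</s>": 0.1},
    "a": {"cat": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(num_beams, max_new_tokens, start="<s>", end="</s>"):
    """Return the highest-scoring token sequence found by beam search."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_new_tokens):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == end:
                # Finished hypotheses are carried over unchanged.
                candidates.append((tokens, score))
                continue
            for token, prob in TOY_MODEL[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the num_beams best hypotheses for the next step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0][1:]  # drop the start symbol


print(beam_search(num_beams=1, max_new_tokens=3))
print(beam_search(num_beams=2, max_new_tokens=3))
```

With a width of 1 the search is greedy and commits to the locally best first token; a wider beam can keep a weaker first token alive when it leads to a better overall sequence. A smaller `max_new_tokens` simply cuts the loop short.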

### 2.4 Decoding - temperature
The temperature rescales the next-token distribution: the logits are divided by the temperature before the softmax, so the lower the temperature, the more the probability mass concentrates on the highest-scoring tokens.
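
The effect of the temperature on the softmax can be checked with a few lines of plain Python. This is a small sketch with made-up logits; `softmax_with_temperature` is a hypothetical helper, not a function from `transformers`.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (1.0, 0.2, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

As the temperature drops, the probability of the top token approaches 1. Note that in Hugging Face `generate`, the temperature only takes effect when sampling is enabled (`do_sample=True`); whether that applies to the runs reported here depends on the decoding call, which is not shown.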

+ **virtualtokens = 20, max_new_tokens = 50, num_beams = 3, temperature = 0.1**: small.out score: 16.0280, dev.out score: 14.3532
+ **virtualtokens = 20, max_new_tokens = 20, num_beams = 7, temperature = 0.2**: small.out score: 11.0139, dev.out score: 12.4284
+ **virtualtokens = 5, max_new_tokens = 20, num_beams = 7, temperature = 0.2**: small.out score: 28.8313, dev.out score: 21.1418
+ **virtualtokens = 5, max_new_tokens = 20, num_beams = 5, temperature = 0.2**: small.out score: 30.3896, dev.out score: 30.3138