# Instruction tuning using LoRA and the Dolly dataset

https://www.philschmid.de/instruction-tune-llama-2
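LoRA freezes each targeted weight matrix `W` (shape `d_out x d_in`) and trains only a low-rank update, so the effective weight becomes `W + (alpha / r) * B @ A`, with `B` of shape `d_out x r` and `A` of shape `r x d_in`. The plain-Python sketch below counts the trainable parameters for one such matrix and merges a toy update. The matrix sizes, rank and alpha are illustrative assumptions, not the settings of the training run behind this notebook.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the LoRA factors A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, b, a, alpha: float, r: int):
    """Return W + (alpha / r) * B @ A, the weight the adapted layer effectively uses."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Hypothetical 4096 x 4096 projection adapted with rank 8
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")

# Toy 2 x 3 weight with a rank-1 update
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [[1.0], [2.0]]       # d_out x r
a = [[0.5, 0.0, 1.0]]    # r x d_in
print(merge_lora(w, b, a, alpha=2.0, r=1))  # [[2.0, 0.0, 2.0], [2.0, 1.0, 4.0]]
```

Only `A` and `B` receive gradients, which is why a LoRA checkpoint is small compared with the base model it is applied to.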

## Load the finetuned model

Download the fine-tuned model outputs from the Azure ML job:

`az ml job download --name <run id/job name> --resource-group <> --workspace-name <>`

In [44]:
model_dir = "../models/artifacts/outputs/model"

In [35]:
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base LLM named in the adapter config and apply the fine-tuned LoRA adapter
model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir,
    #low_cpu_mem_usage=True,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory
    #load_in_4bit=True,
)

model.to('cuda:0');

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.50s/it]
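`AutoPeftModelForCausalLM.from_pretrained` expects `model_dir` to hold a PEFT adapter: it reads `adapter_config.json`, loads the base model named in its `base_model_name_or_path` field (the checkpoint shards above), and attaches the LoRA weights. The helper below only inspects that config with the standard library. The function name is hypothetical, and the demo writes a stand-in config with placeholder values to a temporary directory rather than reading the real download.

```python
import json
import tempfile
from pathlib import Path


def describe_adapter(model_dir: str) -> dict:
    """Summarize the PEFT adapter config stored in model_dir (hypothetical helper)."""
    config_path = Path(model_dir) / "adapter_config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"No adapter_config.json in {model_dir}; is this a PEFT output?")
    config = json.loads(config_path.read_text())
    return {
        "base_model": config.get("base_model_name_or_path"),
        "peft_type": config.get("peft_type"),
        "r": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
    }


# Demo with a stand-in config; these values are placeholders, not the real run's
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "adapter_config.json").write_text(json.dumps({
        "base_model_name_or_path": "example/base-model",
        "peft_type": "LORA",
        "r": 8,
        "lora_alpha": 16,
    }))
    demo_info = describe_adapter(tmp)
    print(demo_info)
```

Calling `describe_adapter(model_dir)` on the actual download should report which base model the adapter was trained on.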


In [36]:
tokenizer = AutoTokenizer.from_pretrained(model_dir)

## Load the dataset

In [39]:
from datasets import load_dataset 
from random import randrange

# Load the Dolly 15k training split from the Hugging Face Hub
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

In [40]:
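def create_prompt(text):
    # Inference prompt: the template should match the one used for fine-tuning,
    # ending at "### Response:" so that the model generates the instruction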
    p = f"""### Instruction:
Use the Input below to create an instruction, which could have been used to generate the input using an LLM. 

### Input:
{text}

### Response:
"""
    return p
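`create_prompt` stops right after `### Response:` so the model fills in the instruction. This is the reverse of ordinary instruction tuning: a Dolly response is the input and its instruction is the target. The sketch below assumes that each training record was rendered with the same template followed by the ground-truth instruction, as in the linked blog post. The training script itself is not part of this notebook, so the function name is hypothetical.

```python
TEMPLATE = """### Instruction:
Use the Input below to create an instruction, which could have been used to generate the input using an LLM. 

### Input:
{text}

### Response:
"""


def format_training_example(record: dict) -> str:
    """Render one Dolly record as prompt + target (hypothetical training formatter)."""
    # The response is the model input; the original instruction is the target text
    return TEMPLATE.format(text=record["response"]) + record["instruction"]


# The record shown in the sample cell of this notebook
example = {
    "instruction": "Who directed the episode of Game of Thrones where Jon Snow and Tormund Giantsbane go to ask the wildlings to settle beyond the Wall, but end up encountering White Walkers and the Night King?",
    "context": "",
    "response": 'Season five, episode eight entitled "Hardhome"',
    "category": "open_qa",
}
print(format_training_example(example))
```

At inference time, `create_prompt` produces the same text up to and including `### Response:\n`, and the model is expected to continue it with an instruction.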



In [41]:
sample = dataset[randrange(len(dataset))]
sample

{'instruction': 'Who directed the episode of Game of Thrones where Jon Snow and Tormund Giantsbane go to ask the wildlings to settle beyond the Wall, but end up encountering White Walkers and the Night King?',
 'context': '',
 'response': 'Season five, episode eight entitled "Hardhome"',
 'category': 'open_qa'}
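`randrange` draws from Python's global random generator, so each run of the sampling cell picks a different record. Seeding a dedicated `random.Random` instance makes the choice repeatable. The helper name, the seed and the stand-in row count below are arbitrary choices for illustration.

```python
import random


def pick_index(n_rows: int, seed: int) -> int:
    """Return a reproducible random row index in [0, n_rows)."""
    rng = random.Random(seed)  # local generator; leaves the global RNG untouched
    return rng.randrange(n_rows)


n_rows = 15_000  # stand-in; use len(dataset) in the notebook
first = pick_index(n_rows, seed=42)
second = pick_index(n_rows, seed=42)
print(first == second)  # True: same seed, same index
```

In the notebook this would become `sample = dataset[pick_index(len(dataset), seed=42)]`.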

In [42]:
prompt = create_prompt(sample['response'])

# Tokenize the prompt and move the input IDs to the GPU
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
# with torch.inference_mode():
# Sample up to 100 new tokens with temperature and nucleus (top-p) sampling
outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.9)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
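The second and third messages appear because `generate` received only `input_ids`. For a single unpadded prompt the implied mask is all ones, so the result is unaffected here. The mask starts to matter once several prompts are padded into one batch. The pure-Python sketch below shows what an attention mask encodes for a left-padded batch (left padding is the usual choice for decoder-only generation). The token IDs are made up; pad ID 2 mirrors the `eos_token_id` reported in the log.

```python
def left_pad(batch: list[list[int]], pad_id: int) -> tuple[list[list[int]], list[list[int]]]:
    """Left-pad token ID sequences and build the matching attention mask."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        # 0 marks padding the model should ignore, 1 marks real tokens
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = left_pad([[5, 6, 7], [8, 9]], pad_id=2)
print(ids)   # [[5, 6, 7], [2, 8, 9]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```

In the notebook, calling the tokenizer without `truncation=True`, passing both tensors with `model.generate(**inputs, ...)`, and setting `pad_token_id=tokenizer.eos_token_id` should silence all three messages.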


In [43]:
# Decode only the newly generated tokens, i.e. everything after the prompt
generated = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(f"Prompt:\n{sample['response']}\n")
print(f"Generated instruction:\n{generated}")
print(f"Ground truth:\n{sample['instruction']}")

Prompt:
Season five, episode eight entitled "Hardhome"

Generated instruction:
What is the episode where Jon Snow travels north of the Wall to speak with the leader of the wildlings, Tormund Giantsbane, beyond the Wall?

Ground truth:
Who directed the episode of Game of Thrones where Jon Snow and Tormund Giantsbane go to ask the wildlings to settle beyond the Wall, but end up encountering White Walkers and the Night King?
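Because the generation cell samples with `do_sample=True`, `temperature=0.9` and `top_p=0.9`, rerunning it yields a different instruction each time. The sketch below illustrates those two parameters on a toy distribution. It divides the logits by the temperature and applies softmax. It then keeps the smallest set of most likely tokens whose cumulative probability reaches `top_p` and samples from that set. This is a simplified illustration, not the exact transformers implementation, and the logits are made up.

```python
import math
import random


def top_p_sample(logits: list[float], temperature: float, top_p: float, rng: random.Random) -> int:
    """Sample a token index with temperature scaling and nucleus (top-p) filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices normalizes the weights, i.e. renormalizes over the nucleus
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 1.5, 0.5, -1.0]
draws = [top_p_sample(toy_logits, temperature=0.9, top_p=0.9, rng=rng) for _ in range(1000)]
print({i: draws.count(i) for i in sorted(set(draws))})
```

With these toy logits the three most likely tokens already cover more than 90% of the probability mass, so the least likely token (index 3) is never drawn. Lowering `top_p` or the temperature makes the output more deterministic.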
