## Easily generate titles using fine-tuned LLaMa models


This notebook uses the fine-tuned LLaMa models to generate a title for a listing description of your choice.

Note that this notebook uses the first LLaMa generation, because access to LLaMa-2 requires approval from Meta AI.

Note that this requires a strong GPU!
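As a rough check of the hardware available in your session, the sketch below reports whether PyTorch can see a CUDA device and how much memory it has. The helper name `describe_gpu` is hypothetical and not part of lit-llama; it assumes PyTorch may or may not be installed and falls back to a message if it is not.

```python
def describe_gpu() -> str:
    """Return a one-line summary of the first CUDA device, if any."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed, so no GPU check was possible."

    if not torch.cuda.is_available():
        return "No CUDA device visible to PyTorch."

    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    return f"{props.name}: {total_gib:.1f} GiB of GPU memory"


print(describe_gpu())
```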

You only need to define a few variables (see below):

* read_in_from_drive: a boolean, set to True if you need to read in the models from Google Drive
* read_in_file_adapter: the path to the LLaMA-Adapter tuned model
* read_in_file_lora: the path to the LoRA tuned model
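The settings can be collected and sanity-checked in one place. The following is a minimal sketch, not part of the original notebook: the helper `missing_model_files` is a hypothetical name, and the example paths are placeholders that you would replace with your own.

```python
from pathlib import Path


def missing_model_files(paths: dict) -> list:
    """Return the names of settings whose file does not exist on disk."""
    return [name for name, path in paths.items() if not Path(path).is_file()]


read_in_from_drive = True
# Placeholder paths (assumptions); point these at your own checkpoints.
read_in_file_adapter = "path/to/lit-llama-adapter-finetuned.pth"
read_in_file_lora = "path/to/lit-llama-lora-finetuned.pth"

missing = missing_model_files(
    {
        "read_in_file_adapter": read_in_file_adapter,
        "read_in_file_lora": read_in_file_lora,
    }
)
if missing:
    print("These settings do not point to an existing file:", ", ".join(missing))
```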





### Cloning the environment needed

In [None]:
# Clone the lit-llama repository first; the following cells depend on it.
!git clone https://github.com/Lightning-AI/lit-llama


Cloning into 'lit-llama'...
remote: Enumerating objects: 1865, done.
remote: Counting objects: 100% (604/604), done.
remote: Compressing objects: 100% (134/134), done.
remote: Total 1865 (delta 532), reused 474 (delta 469), pack-reused 1261
Receiving objects: 100% (1865/1865), 1.63 MiB | 3.21 MiB/s, done.
Resolving deltas: 100% (1167/1167), done.


In [None]:
!pip install -r lit-llama/requirements.txt
# might take some time depending on internet connectivity


Collecting lightning@ git+https://github.com/Lightning-AI/lightning@master (from -r lit-llama/requirements.txt (line 2))
  Cloning https://github.com/Lightning-AI/lightning (to revision master) to /tmp/pip-install-d7ebm41s/lightning_a199327333e445d2852b1d79f7fa3ad0
  Running command git clone --filter=blob:none --quiet https://github.com/Lightning-AI/lightning /tmp/pip-install-d7ebm41s/lightning_a199327333e445d2852b1d79f7fa3ad0
  Resolved https://github.com/Lightning-AI/lightning to commit a3218cb0380579d4dccfc0c8b49a8802664291dc
  Running command git submodule update --init --recursive -q
  Encountered 31 file(s) that should have been pointers, but weren't:
        .notebooks/course_UvA-DL/01-introduction-to-pytorch.ipynb
        .notebooks/course_UvA-DL/02-activation-functions.ipynb
        .notebooks/course_UvA-DL/03-initialization-and-optimization.ipynb
        .notebooks/course_UvA-DL/04-inception-resnet-densenet.ipynb
        .notebooks/course_UvA-DL/05-transformers-and-MH-attent

In [None]:

!git clone https://huggingface.co/openlm-research/open_llama_7b checkpoints/open-llama/7B


Cloning into 'checkpoints/open-llama/7B'...
remote: Enumerating objects: 21, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 21 (delta 2), reused 0 (delta 0), pack-reused 18
Unpacking objects: 100% (21/21), 7.72 KiB | 1.54 MiB/s, done.
Filtering content: 100% (3/3), 4.55 GiB | 4.21 MiB/s, done.
Encountered 1 file(s) that may not have been copied correctly on Windows:
	pytorch_model-00001-of-00002.bin

See: `git lfs help smudge` for more details.


In [None]:

!python lit-llama/scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/open-llama/7B --model_size 7B


Initializing lit-llama
Saving to disk at checkpoints/lit-llama/7B
Processing checkpoints/open-llama/7B/pytorch_model-00002-of-00002.bin
Processing checkpoints/open-llama/7B/pytorch_model-00001-of-00002.bin
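Later cells read the converted weights and the tokenizer from fixed locations. A quick existence check, sketched below, can catch a failed download or conversion early. The two paths are taken from the path variables defined further down in this notebook, written relative to the working directory; the helper name `report_checkpoint_files` is hypothetical.

```python
from pathlib import Path

# Locations the later cells expect (relative to the notebook's working directory).
EXPECTED_FILES = [
    Path("checkpoints/lit-llama/7B/lit-llama.pth"),
    Path("checkpoints/lit-llama/tokenizer.model"),
]


def report_checkpoint_files(root: Path = Path(".")) -> dict:
    """Map each expected file to True if it exists under root, else False."""
    return {str(rel): (root / rel).is_file() for rel in EXPECTED_FILES}


for name, found in report_checkpoint_files().items():
    print(f"{'found  ' if found else 'MISSING'}  {name}")
```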


It is assumed that you will load the models from Google Drive, so connecting to Drive is essential.

If you store the models elsewhere, skip this cell or set read_in_from_drive to False.

In [None]:

read_in_from_drive = True

if read_in_from_drive:
  # Mount Google Drive so the fine-tuned models can be read from it.
  from google.colab import drive
  drive.mount('/content/gdrive')


Mounted at /content/gdrive


### Importing libraries and defining some helper functions

In [None]:
import os
from pathlib import Path
from typing import Optional
import torch
from sentencepiece import SentencePieceProcessor, SentencePieceTrainer
import sys
import requests
import json
from torch.utils.data import random_split
from tqdm import tqdm


In [None]:
sys.path.append("/content/lit-llama")

import lightning as L
import torch
from generate import generate
from lit_llama import Tokenizer
from lit_llama.adapter import LLaMA
from lit_llama.utils import EmptyInitOnDevice, lazy_load, llama_model_lookup
from scripts.prepare_alpaca import generate_prompt


In [None]:
!mkdir -p out

In [None]:


class Tokenizer:
    """Tokenizer for LLaMA."""

    def __init__(self, model_path: Path) -> None:
        self.processor = SentencePieceProcessor(model_file=str(model_path))
        self.bos_id = self.processor.bos_id()
        self.eos_id = self.processor.eos_id()
        self.pad_id = self.processor.pad_id()

    @property
    def vocab_size(self) -> int:
        return self.processor.vocab_size()

    def encode(
        self,
        string: str,
        bos: bool = True,
        eos: bool = False,
        max_length: int = -1,
        pad: bool = False,
        device: Optional[torch.device] = None
    ) -> torch.Tensor:
        tokens = self.processor.encode(string)
        if bos:
            tokens = [self.bos_id] + tokens
        if eos:
            tokens = tokens + [self.eos_id]
        if max_length > 0:
            tokens = tokens[:max_length]
        if pad and len(tokens) < max_length:
            tokens += [self.pad_id] * (max_length - len(tokens))

        return torch.tensor(tokens, dtype=torch.int, device=device)

    def decode(self, tokens: torch.Tensor) -> str:
        return self.processor.decode(tokens.tolist())

    @staticmethod
    def train(input: str, destination: str, vocab_size=32000) -> None:
        model_prefix = os.path.join(destination, "tokenizer")
        SentencePieceTrainer.Train(input=input, model_prefix=model_prefix, vocab_size=vocab_size)


# Helper: tokenize a string, always adding BOS and optionally EOS.
def tokenize(tokenizer: Tokenizer, string: str, max_length: int, eos=True) -> torch.Tensor:
    return tokenizer.encode(string, bos=True, eos=eos, max_length=max_length)

# Helper: build the Alpaca-style instruction prompt.
def generate_prompt(example):
    """Generates a standardized message to prompt the model with an instruction, optional input and a
    'response' field."""

    if example["input"]:
        return (
            "Below is an instruction that describes a task, paired with an input that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n### Response:"
    )


def prepare_sample(example: dict, tokenizer: Tokenizer, max_length: int, mask_inputs: bool = True):
    """Processes a single sample.

    Each sample in the dataset consists of:
    - instruction: A string describing the task
    - input: A string holding a special input value for the instruction.
        This only applies to some samples, and in others this is empty.
    - output: The response string

    This function processes this data to produce a prompt text and a label for
    supervised training. The input text is formed as a single message including all
    the instruction, the input (optional) and the response.
    The label/target is the same message but can optionally have the instruction + input text
    masked out (mask_inputs=True).

    Finally, both the prompt and the label get tokenized. If desired, all tokens
    in the label that correspond to the original input prompt get masked out (default).
    """

    full_prompt = generate_prompt(example)
    full_prompt_and_response = full_prompt + example["output"]
    encoded_full_prompt = tokenize(tokenizer, full_prompt, max_length=max_length, eos=False)
    encoded_full_prompt_and_response = tokenize(tokenizer, full_prompt_and_response, eos=True, max_length=max_length)

    # The labels are the full prompt with response, but with the prompt masked out
    labels = encoded_full_prompt_and_response.clone()
    if mask_inputs:
        labels[:len(encoded_full_prompt)] = -1  # -1 corresponds to the "ignore index" in the prepare_alpaca.py file


    return {**example, "input_ids": encoded_full_prompt_and_response, "input_ids_no_response": encoded_full_prompt, "labels": labels}


adapter_path = "/content/out/lit-llama-adapter-finetuned.pth"
lora_path = "/content/out/lit-llama-lora-finetuned.pth"
pretrained_path = "/content/checkpoints/lit-llama/7B/lit-llama.pth"
tokenizer_path = "/content/checkpoints/lit-llama/tokenizer.model"

tokenizer = Tokenizer(tokenizer_path)


def gen_title(pretrained_path, adapter_path, sample, max_new_tokens, top_k, temperature):
    """Generate a title for one sample.

    Expects sample as set up above / like in the .pt data.
    Despite its name, adapter_path receives whichever fine-tuned checkpoint the caller selected.
    """

    fabric = L.Fabric(devices=1)
    # Use bfloat16 on CUDA devices that support it, otherwise fall back to float32.
    dtype = torch.bfloat16 if fabric.device.type == "cuda" and torch.cuda.is_bf16_supported() else torch.float32

    with lazy_load(pretrained_path) as pretrained_checkpoint, lazy_load(adapter_path) as adapter_checkpoint:
        name = llama_model_lookup(pretrained_checkpoint)

        # Instantiate the model on the target device with llm.int8 quantization.
        with EmptyInitOnDevice(device=fabric.device, dtype=dtype, quantization_mode="llm.int8"):
            model = LLaMA.from_name(name)

        # 1. Load the pretrained weights
        model.load_state_dict(pretrained_checkpoint, strict=False)
        # 2. Load the fine-tuned weights (strict=False ignores missing and unexpected keys)
        model.load_state_dict(adapter_checkpoint, strict=False)

    model.eval()
    model = fabric.setup_module(model)

    tokenizer = Tokenizer(tokenizer_path)

    prompt = generate_prompt(sample)
    encoded = tokenizer.encode(prompt, bos=True, eos=False, device=model.device)

    y = generate(model, encoded, max_new_tokens, temperature=temperature, top_k=top_k, eos_id=tokenizer.eos_id)

    # The decoded text contains the prompt followed by the answer; keep only the answer.
    output = tokenizer.decode(y)
    output = output.split("### Response:")[1].strip()

    return output


def tokenize_and_gen_title(type_, prompt, description, max_seq_length=256):
    """Generate a title from a prompt and a listing description.

    Arguments:
     - type_: the type of fine-tuned model, either 'lora' or 'adapter'
     - prompt: the instruction given to the model
     - description: the listing description to summarize
     - max_seq_length: maximum length of the input sequence; 256 will do for descriptions
    """

    print(f"It may take some time, but soon this function will return a title based on the {type_}-tuned LLaMa model!")

    if type_ == "lora":
        path = lora_path
    elif type_ == "adapter":
        path = adapter_path
    else:
        raise ValueError("type_ must be 'lora' or 'adapter'")

    # Build the sample dict expected by prepare_sample; 'output' is only a placeholder.
    dict_tokenizer = {'instruction': prompt, 'input': description, 'output': 'no output yet'}

    # Tokenization
    input_sample = prepare_sample(dict_tokenizer, tokenizer, max_seq_length, True)

    # Generate the title
    output = gen_title(pretrained_path, path, input_sample, max_new_tokens=30, top_k=200, temperature=0.2)

    print("The generated title is:")

    return output
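To make the prompt handling above concrete, the sketch below rebuilds the instruction/input template as a standalone function and shows how the title is recovered by splitting on the `### Response:` marker, as `gen_title` does. It runs without a model; `build_prompt` and `extract_response` are hypothetical names, and the `fake_model_output` string stands in for decoded model output and is invented for illustration.

```python
def build_prompt(instruction: str, description: str) -> str:
    """Alpaca-style prompt with an input section, mirroring generate_prompt."""
    return (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Input:\n{description}\n\n### Response:"
    )


def extract_response(decoded: str) -> str:
    """Return the text after the first '### Response:' marker, stripped."""
    return decoded.split("### Response:")[1].strip()


example_prompt = build_prompt(
    "Summarize the following description into a short title for an AirBnB listing.",
    "Bright double bedroom with own bathroom.",
)
# Stand-in for the decoded generation: the prompt is echoed, then the answer follows.
fake_model_output = example_prompt + " Bright double with own bathroom\n"
print(extract_response(fake_model_output))
```

Because the split takes the text after the first marker, a description that itself contains `### Response:` would need extra handling.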


In [None]:
!mkdir -p out


### Next, read in the fine-tuned models.

Define read_in_file_adapter as the path to the llama-adapter tuned model and read_in_file_lora as the path to the lora tuned model.

In [None]:
# Change these paths to point to your own models.

read_in_file_adapter = "..."

# read_in_file_adapter = "/content/gdrive/My Drive/Thesis/Models/lit-llama-adapter-finetuned.pth"

model_ft_adapter = torch.load(read_in_file_adapter)
torch.save(model_ft_adapter, "/content/out/lit-llama-adapter-finetuned.pth")

read_in_file_lora = "..."
# read_in_file_lora = "/content/gdrive/My Drive/Thesis/Models/lit-llama-lora-finetuned.pth"

model_ft_lora = torch.load(read_in_file_lora)
torch.save(model_ft_lora, "/content/out/lit-llama-lora-finetuned.pth")
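
The cell above loads each checkpoint into memory only to write it back out unchanged. If that round trip is too slow or memory-hungry, copying the file directly is a possible alternative. This is a sketch rather than the notebook's method; `copy_checkpoint` is a hypothetical helper, and it assumes the source file is already a valid `.pth` checkpoint.

```python
import shutil
from pathlib import Path


def copy_checkpoint(source: str, destination: str) -> Path:
    """Copy a checkpoint file byte for byte, creating the target folder if needed."""
    src = Path(source)
    if not src.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {src}")
    dst = Path(destination)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst
```

For example, `copy_checkpoint(read_in_file_adapter, "/content/out/lit-llama-adapter-finetuned.pth")` would take the place of the `torch.load` / `torch.save` pair for the adapter model.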

## Generate a title :)

Finally, simply define a prompt and a description and let the model generate a title!

For the LoRA-tuned model, pass 'lora' as the first argument; for the LLaMa-Adapter model, pass 'adapter'.

See the examples provided below:

In [None]:
prompt = 'Summarize the following description into a short title for an AirBnB listing.'

description = 'The space Bright double bedroom, own living room and own bathroom all on your own floor in our Victorian house in leafy West Hampstead. You will have a mini fridge, toaster, and tea & coffee making facilities in your living room. We also provide you with tea, coffee, cereal, bread & milk & therefore won’t need to share any spaces with us during this time however, we are always available to advise on places to visit, restaurant, bars etc. As always the space is incredibly clean and we take extra precautions to keep the space safe, strictly following the Airbnb COVID cleansing guidelines. The bedroom has floor to ceiling wardrobes, a chest of drawers, real wood flooring, decorative fireplace, mirror and wireless internet connection. While your own private bathroom is not en-suite it is just a couple of steps away. It is a recently refurbished modern bathroom with power shower and full sized bath. The living room is large bright with bay windows &'


In [None]:
tokenize_and_gen_title('adapter', prompt, description)

It may take some time, but soon this function will return a title based on the adapter-tuned LLaMa model!
The generated title is:


'Bright, sunny, double bedroom with own bathroom'

In [None]:
tokenize_and_gen_title('lora', prompt, description)


It may take some time, but soon this function will return a title based on the lora-tuned LLaMa model!
The generated title is:


'Bright double with own bathroom in North West London'
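
To compare both fine-tuned models on the same listing, the two calls above can be wrapped in a loop. The sketch below takes the generation function as an argument so it can be tried without a GPU; `compare_titles` and the stand-in `fake_generate` are hypothetical names, and in the notebook you would pass `tokenize_and_gen_title` instead.

```python
from typing import Callable, Dict


def compare_titles(
    generate_fn: Callable[[str, str, str], str],
    prompt: str,
    description: str,
    model_types=("adapter", "lora"),
) -> Dict[str, str]:
    """Call generate_fn once per model type and collect the titles by type."""
    return {type_: generate_fn(type_, prompt, description) for type_ in model_types}


def fake_generate(type_: str, prompt: str, description: str) -> str:
    """Stand-in for tokenize_and_gen_title that needs no model."""
    return f"[{type_}] {description[:20]}"


titles = compare_titles(fake_generate, "Write a title.", "Bright double bedroom in London")
for type_, title in titles.items():
    print(f"{type_}: {title}")
```

Since `gen_title` loads the weights on every call, the loop takes about as long as the two separate calls.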