## Describe your model -> fine-tuned LLaMA 2
By Matt Shumer (https://twitter.com/mattshumer_)

The goal of this notebook is to experiment with a new way to make it very easy to build a task-specific model for your use-case.

First, select the best GPU available (go to Runtime -> Change runtime type).

To create your model, just go to the first code cell, and describe the model you want to build in the prompt. Be descriptive and clear.

Select a temperature (high=creative, low=precise), and the number of training examples to generate to train the model. From there, just run all the cells.

You can change the model you want to fine-tune by changing `model_name` in the `Define Hyperparameters` cell.

# Data generation step

Write your prompt here. Make it as descriptive as possible!

Then, choose the temperature (between 0 and 1) to use when generating data. Lower values are great for precise tasks, like writing code, whereas larger values are better for creative tasks, like writing stories.
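
As a rough illustration of why temperature has this effect, the sketch below assumes the common convention of dividing the model's raw scores (logits) by the temperature before applying a softmax. The logits are made up, the helper name is hypothetical, and the API's internals may differ in detail (a temperature of exactly 0 is normally handled as a special case, so the sketch rejects it).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in low])   # almost all probability on the top token
print([round(p, 3) for p in high])  # probability spread more evenly
```

With the lower temperature the top-scoring token dominates, which suits precise tasks; with the higher one the alternatives are sampled more often.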

Finally, choose how many examples you want to generate. The more you generate, a) the longer it takes and b) the more expensive data generation will be. But generally, more examples will lead to a higher-quality model. 100 is usually the minimum to start.
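
To get a feel for how cost scales, here is a small back-of-the-envelope sketch. It assumes one API call per example and uses the `max_tokens=1354` cap from the generation cell; it ignores retries and the prompt tokens sent with each call (which grow as earlier examples are included), so treat it as a rough bound on generated tokens only. The function name is hypothetical.

```python
def max_completion_tokens(number_of_examples, max_tokens_per_call=1354):
    """Upper bound on generated tokens: one call per example, each capped at max_tokens."""
    return number_of_examples * max_tokens_per_call

for n in (100, 300, 1000):
    print(n, "examples ->", max_completion_tokens(n), "completion tokens at most")
```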

In [1]:
prompt = """
For the purpose of training a new model, generate a series of prompts and responses. Ensure that the generated examples vary in length, complexity, and intricacy, ranging from short and straightforward prompts to longer, more detailed ones. 

Each prompt should present a textual description followed by one or more functions in a JSON-like format. The model should determine implicitly whether a function should be invoked based on the textual description. If the function is relevant, the model should map the details from the description to the function's arguments and respond with the filled JSON format. If the function isn't applicable, provide a regular text response continuing the conversation.

Examples:

1. Prompt:
   ----------
   "John's email is john@email.com. He's 65 years old."
   The function is:
   {"function": "createUser", "args": {"name": "", "age": "", "email": ""}}.
   ----------
   Response:
   ----------
   {"function": "createUser", "args": {"name": "John", "age": "65", "email": "john@email.com"}}
   ----------

2. Prompt:
   ----------
   "The sky is blue."
   The function is:
   {"function": "calculateAgeFromBirthYear", "args": {"birthYear": ""}}.
   ----------
   Response:
   ----------
   The provided text doesn't contain relevant information to invoke the function.
   ----------

Generate both positive examples, where the textual description aligns with the function, and negative ones where it doesn't. Aim for a diverse set that will assist in training a robust model capable of discerning the applicability of given functions to different scenarios.
"""

# Sampling temperature and number of examples to generate
temperature = 0.2
number_of_examples = 300

Run this to generate the dataset.
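
The generation cell uses the legacy `openai.ChatCompletion` interface, which was removed in version 1.0 of the `openai` package. If you prefer a current version of the package, the sketch below shows roughly what the same request looks like with the client-based API. `build_messages` and `generate_example_v1` are hypothetical helper names, and `system_prompt` stands for the full system text built in the notebook's `generate_example`.

```python
import random

def build_messages(system_prompt, prev_examples, max_context_examples=10):
    """System prompt plus up to 10 earlier examples, each sent as an assistant turn."""
    if len(prev_examples) > max_context_examples:
        prev_examples = random.sample(prev_examples, max_context_examples)
    messages = [{"role": "system", "content": system_prompt}]
    messages += [{"role": "assistant", "content": ex} for ex in prev_examples]
    return messages

def generate_example_v1(system_prompt, prev_examples, temperature=0.5, api_key="YOUR KEY HERE"):
    """Same request as the notebook's generate_example, written for openai>=1.0."""
    from openai import OpenAI  # imported lazily so build_messages works without the package
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=build_messages(system_prompt, prev_examples),
        temperature=temperature,
        max_tokens=1354,
    )
    return response.choices[0].message.content

# Building the messages needs no network access
msgs = build_messages("describe the model here", ["example 1", "example 2"])
print(len(msgs), msgs[0]["role"])
```

The `tenacity` retry decorator from the notebook can be applied to `generate_example_v1` unchanged.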

In [None]:
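# The code below uses openai.ChatCompletion, which was removed in openai 1.0, so pin an older version
!pip install "openai<1" tenacity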

In [None]:
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)

In [None]:
import os
import openai
import random

openai.api_key = "YOUR KEY HERE"

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def generate_example(prompt, prev_examples, temperature=.5):
    messages=[
        {
            "role": "system",
            "content": f"You are generating data which will be used to train a machine learning model.\n\nYou will be given a high-level description of the model we want to train, and from that, you will generate data samples, each with a prompt/response pair.\n\nYou will do so in this format:\n```\nprompt\n-----------\n$prompt_goes_here\n-----------\n\nresponse\n-----------\n$response_goes_here\n-----------\n```\n\nOnly one prompt/response pair should be generated per turn.\n\nFor each turn, make the example slightly more complex than the last, while ensuring diversity.\n\nMake sure your samples are unique and diverse, yet high-quality and complex enough to train a well-performing model.\n\nHere is the type of model we want to train:\n`{prompt}`"
        }
    ]

    if len(prev_examples) > 0:
        if len(prev_examples) > 10:
            prev_examples = random.sample(prev_examples, 10)
        for example in prev_examples:
            messages.append({
                "role": "assistant",
                "content": example
            })

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=messages,
        temperature=temperature,
        max_tokens=1354,
    )

    return response.choices[0].message['content']

# Generate examples
prev_examples = []
for i in range(number_of_examples):
    print(f'Generating example {i}')
    example = generate_example(prompt, prev_examples, temperature)
    prev_examples.append(example)

print(prev_examples)

In [None]:
import pickle

with open("functioncalling_gpt_dataset.pkl", 'wb') as file:
    pickle.dump(prev_examples, file)

# Using pydantic to define the functions similar to OpenAI function definition inputs
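
Before any models can be built, the function definition has to be pulled out of each generated example. One way to do this without depending on the exact punctuation around the JSON is `json.JSONDecoder.raw_decode`, which parses a single JSON value starting at a given index and reports where it ends. The sketch below runs on a made-up example string; `extract_function` is a hypothetical helper, not part of the notebook.

```python
import json
import re

# Hypothetical generated example, following the format requested in the prompt
example = (
    'prompt\n-----------\n'
    '"John\'s email is john@email.com. He\'s 65 years old."\n'
    'The function is:\n'
    '{"function": "createUser", "args": {"name": "", "age": "", "email": ""}}.\n'
    '-----------\n'
)

def extract_function(text):
    """Parse the first JSON object after "The function is:" and return (dict, start, end)."""
    start = text.index("{", text.index("The function is:"))
    obj, end = json.JSONDecoder().raw_decode(text, start)
    return obj, start, end

function_dict, start, end = extract_function(example)
print(function_dict["function"])      # createUser
print(sorted(function_dict["args"]))  # ['age', 'email', 'name']

# For comparison: a non-greedy regex stops at the first closing brace,
# so it cuts the nested "args" object short.
match = re.search(r'\{"function":.*?\}', example)
print(match.group(0) == example[start:end])  # False
```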

In [None]:
import os
import json

In [None]:
import pickle
with open("functioncalling_gpt_dataset.pkl", 'rb') as file:
    prev_examples = pickle.load(file)

In [None]:
import json

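# Extract the function definition (the JSON object after "The function is:") from each example.
# raw_decode parses one JSON value starting at a given index and ignores whatever follows it,
# so the result does not depend on the exact punctuation or whitespace after the object.
decoder = json.JSONDecoder()
functions = []
for string in prev_examples:
    marker_pos = string.find("The function is:")
    start_pos = string.find("{", max(marker_pos, 0))
    function_dict, _ = decoder.raw_decode(string, start_pos)
    functions.append(function_dict)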

In [None]:
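from typing import Any
from pydantic import create_model

instances = []
for function_dict in functions:
    class_name = function_dict['function']
    attributes = function_dict['args']

    # Dynamically create a Pydantic model with one field per argument.
    # Each field is given as (type, default); the type is inferred from the default value.
    fields = {
        name: (Any if default is None else type(default), default)
        for name, default in attributes.items()
    }
    new_class = create_model(class_name, **fields)

    # All fields have defaults, so the model can be instantiated without arguments
    instance = new_class()
    instances.append(instance)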

In [None]:
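import json

# Replace the first function definition in each example with the schema of its Pydantic model.
# raw_decode finds the end of the whole JSON object, including the nested "args" object.
final_list = []
decoder = json.JSONDecoder()
for example, instance in zip(prev_examples, instances):
    start = example.find('{"function":')
    _, end = decoder.raw_decode(example, start)
    new_text = example[:start] + str(instance.schema()) + example[end:]
    final_list.append(new_text)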

In [None]:
with open("dataset_pyd_schema.pkl", 'wb') as file:
    pickle.dump(final_list, file)

We also need to generate a system message.

In [None]:
def generate_system_message(prompt):

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
          {
            "role": "system",
            "content": "You will be given a high-level description of the model we are training, and from that, you will generate a simple system prompt for that model to use. Remember, you are not generating the system message for data generation -- you are generating the system message to use for inference. A good format to follow is `Given $INPUT_DATA, you will $WHAT_THE_MODEL_SHOULD_DO.`.\n\nMake it as concise as possible. Include nothing but the system prompt in your response.\n\nFor example, never write: `\"$SYSTEM_PROMPT_HERE\"`.\n\nIt should be like: `$SYSTEM_PROMPT_HERE`."
          },
          {
              "role": "user",
              "content": prompt.strip(),
          }
        ],
        temperature=temperature,
        max_tokens=500,
    )

    return response.choices[0].message['content']

system_message = generate_system_message(prompt)

print(f'The system message is: `{system_message}`. Feel free to re-run this cell if you want a better result.')

In [None]:
# Override the generated system message with a hand-written one that also covers the negative examples (null arguments)
system_message = "You are given a textual description and a function description in a JSON-like format. You must map the details from the description to the function's arguments and respond with the filled JSON format, if the provided function and description are related. If not applicable, you must return the function arguments with null values."

Now let's put our examples into a dataframe and turn them into a final pair of datasets.
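
Each generated example is a single string in which the prompt and the response are wrapped in dashed separator lines, so splitting on the separator puts the prompt at index 1 and the response at index 3. A minimal sketch on a made-up example (`parse_example` is a hypothetical helper):

```python
SEPARATOR = "-----------"  # the delimiter requested in the data-generation system prompt

def parse_example(example):
    """Return (prompt, response) from one generated example, or None if it is malformed."""
    parts = example.split(SEPARATOR)
    if len(parts) < 4:
        return None
    return parts[1].strip(), parts[3].strip()

# Hypothetical example in the requested format
sample = "\n".join([
    "prompt", SEPARATOR,
    '"The sky is blue."',
    "The function is:",
    '{"function": "calculateAgeFromBirthYear", "args": {"birthYear": ""}}.',
    SEPARATOR, "",
    "response", SEPARATOR,
    "The provided text doesn't contain relevant information to invoke the function.",
    SEPARATOR,
])

print(parse_example(sample))
print(parse_example("no separators here"))  # None
```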

In [None]:
import pandas as pd

# Initialize lists to store prompts and responses
prompts = []
responses = []

# Parse out prompts and responses from examples
for example in prev_examples:
  split_example = example.split('-----------')
  # Skip malformed examples; check the length first so both lists stay the same length
  if len(split_example) < 4:
    continue
  prompts.append(split_example[1].strip())
  responses.append(split_example[3].strip())

# Create a DataFrame
df = pd.DataFrame({
    'prompt': prompts,
    'response': responses
})

# Remove duplicates
df = df.drop_duplicates()

print('There are ' + str(len(df)) + ' successfully-generated examples. Here are the first few:')

df.head()

# Creating Negative Examples
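
The cells in this section build negative examples by pairing each function with a description taken from a different example and replacing every argument value with the string `"null"`. The sketch below shows the same idea on toy data in plain Python; the data and the `make_negatives` helper are made up for illustration and are not the notebook's pandas implementation.

```python
import json
import random

# Hypothetical positive examples: (description, function template)
positives = [
    ("John is 65 years old.", {"function": "createUser", "args": {"name": "", "age": ""}}),
    ("Book a table for two at 7pm.", {"function": "bookTable", "args": {"guests": "", "time": ""}}),
    ("Send 20 dollars to Alice.", {"function": "sendMoney", "args": {"amount": "", "to": ""}}),
]

def make_negatives(examples, seed=0):
    """Pair each function with a description from another example and null out its args."""
    rng = random.Random(seed)
    shuffled = [description for description, _ in examples]
    rng.shuffle(shuffled)
    negatives = []
    for (original, func), new_description in zip(examples, shuffled):
        if new_description == original:
            continue  # the shuffle left this one in place, so it would not be a negative
        prompt = new_description + "\nThe function is:\n" + json.dumps(func)
        response = json.dumps({
            "function": func["function"],
            "args": {key: "null" for key in func["args"]},
        })
        negatives.append({"prompt": prompt, "response": response})
    return negatives

for row in make_negatives(positives):
    print(row["prompt"])
    print(row["response"])
    print()
```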

In [1]:
def extract_function_description(s):
    """Return the first balanced {...} block in s, or None if there is none."""
    depth = 0
    start_index = -1

    for i, c in enumerate(s):
        if c == '{':
            if depth == 0:
                start_index = i
            depth += 1
        elif c == '}' and depth > 0:  # ignore a stray '}' that appears before any '{'
            depth -= 1
            if depth == 0:
                return s[start_index:i + 1]

    return None


def extract_first_line(s):
    """Return the first line of s without surrounding double quotes."""
    return s.split("\n", 1)[0].strip('"')

In [None]:
df['first_line'] = df['prompt'].apply(extract_first_line)
df['func_def'] = df['prompt'].apply(extract_function_description)

In [None]:
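df_copy = df.copy()
# Sample up to 60 rows to turn into negative examples (fewer if the dataset is smaller)
df_copy = df_copy.sample(n=min(60, len(df_copy)))
df_copy.reset_index(drop=True, inplace=True)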

In [None]:
df_copy.head()

In [None]:
df_copy['first_line'] = df_copy['first_line'].sample(frac=1).reset_index(drop=True)

In [None]:
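# Drop rows where the shuffle put a description back with its own function.
# Compare whole first lines: the raw prompt starts with a quote that first_line has stripped.
df_copy = df_copy[df_copy['prompt'].apply(extract_first_line) != df_copy['first_line']]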

In [None]:
df_copy['prompt'].iloc[0]

In [None]:
df_copy['new_prompt'] = df_copy['first_line'] + "\nThe function is:\n" + df_copy['func_def']

In [None]:
df_copy.head()

In [None]:
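import json

def nullify_args(s):
    """Return the function call in s with every argument replaced by the string "null"."""
    try:
        parsed = json.loads(s)
        for key in parsed["args"]:
            parsed["args"][key] = "null"
        return json.dumps(parsed)
    except (json.JSONDecodeError, KeyError, TypeError):
        # Not a JSON function call (e.g. a plain-text response); flag it for removal
        print(f"Problematic entry: {s}")
        return "ooops!"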

In [None]:
df_copy['new_response'] = df_copy['response'].apply(nullify_args)

In [None]:
df_copy = df_copy[df_copy['new_response'] != "ooops!"]

In [None]:
df_copy.drop(columns=["prompt", "response", "first_line", "func_def"], inplace = True)

In [None]:
print(df_copy['new_prompt'].iloc[0])

In [None]:
print(df_copy['new_response'].iloc[0])

In [None]:
df_copy.rename(columns={'new_prompt': 'prompt', 'new_response': 'response'}, inplace=True)

In [None]:
df = pd.concat([df, df_copy], ignore_index=True)

In [None]:
df.drop(columns=['first_line', 'func_def'], inplace=True)

In [None]:
df.to_pickle("dataset_pyd_with_negatives.pkl")

Split into train and test sets.
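
The split produces two disjoint sets, and each is written as JSON Lines: one JSON object per line with a `prompt` and a `response` key. The sketch below mirrors that idea with the standard library on toy rows; it is an illustration rather than the pandas code used in the cell, and the row contents are made up.

```python
import json
import random

rows = [{"prompt": f"prompt {i}", "response": f"response {i}"} for i in range(10)]  # toy data

# 90/10 split with a fixed seed: pick the training indices, the rest is the test set
rng = random.Random(42)
train_idx = set(rng.sample(range(len(rows)), k=round(0.9 * len(rows))))
train = [row for i, row in enumerate(rows) if i in train_idx]
test = [row for i, row in enumerate(rows) if i not in train_idx]

# JSON Lines: one JSON object per line
jsonl = "\n".join(json.dumps(row) for row in train)
loaded = [json.loads(line) for line in jsonl.splitlines()]

print(len(train), len(test))  # 9 1
print(loaded[0])
```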

In [None]:
# Split the data into train and test sets, with 90% in the train set
train_df = df.sample(frac=0.9, random_state=42)
test_df = df.drop(train_df.index)

# Save the dataframes to .jsonl files
train_df.to_json('train.jsonl', orient='records', lines=True)
test_df.to_json('test.jsonl', orient='records', lines=True)

# Install necessary libraries

In [None]:
#!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# Define Hyperparameters
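
A few of the values in the next cell are easier to read with the arithmetic behind them. The sketch below assumes the standard LoRA formulation (the low-rank update is scaled by `lora_alpha / r`, and a rank-`r` adapter on a `d_out x d_in` weight adds `r * (d_in + d_out)` parameters) and a hidden size of 5120 for the 13B model; that hidden size and the helper names are assumptions for illustration.

```python
def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update (alpha / r in the standard LoRA formulation)."""
    return lora_alpha / r

def lora_params(d_in, d_out, r):
    """Trainable parameters added to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of examples that contribute to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices

print(lora_scaling(16, 64))         # 0.25
print(lora_params(5120, 5120, 64))  # 655360 (5120 is an assumed hidden size)
print(effective_batch_size(4, 1))   # 4
```

If memory is tight, lowering `per_device_train_batch_size` and raising `gradient_accumulation_steps` by the same factor keeps the effective batch size unchanged.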

In [None]:
dataset_name = "train.jsonl"
model_name = "meta-llama/Llama-2-13b-chat-hf"
new_model = "llama-13b-custom"
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "./results"
num_train_epochs = 2
fp16 = False
bf16 = False
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 25
logging_steps = 5
max_seq_length = None
packing = False
device_map = {"": 0}

In [None]:
from huggingface_hub import login
access_token = "YOUR HF API KEY"
login(token = access_token)

# Load Datasets and Train
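
Before training, every prompt/response pair is wrapped in the Llama 2 chat layout: the system message inside `<<SYS>>` tags, the user prompt inside `[INST] ... [/INST]`, and the response after it. To show what a single training string looks like, here is a small sketch on made-up data; `format_example` is a hypothetical helper.

```python
def format_example(system_message, prompt, response=""):
    """Build one training string in the Llama 2 chat layout used in this notebook."""
    return (
        f"[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n"
        + prompt + " [/INST] " + response
    )

text = format_example(
    "You map text to function arguments.",  # hypothetical system message
    '"John is 65."\nThe function is:\n{"function": "createUser", "args": {"age": ""}}',
    '{"function": "createUser", "args": {"age": "65"}}',
)
print(text)
```

At inference time the same layout is used with the response left empty, so the model continues after `[/INST]`.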

In [None]:
# Load datasets
train_dataset = load_dataset('json', data_files='train.jsonl', split="train")
valid_dataset = load_dataset('json', data_files='test.jsonl', split="train")

# Preprocess datasets: wrap each prompt/response pair in the Llama 2 chat template
def format_examples(examples):
    return {'text': [
        f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n' + prompt + ' [/INST] ' + response
        for prompt, response in zip(examples['prompt'], examples['response'])
    ]}

train_dataset_mapped = train_dataset.map(format_examples, batched=True)
valid_dataset_mapped = valid_dataset.map(format_examples, batched=True)

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="all",
    evaluation_strategy="steps",
    eval_steps=5  # Evaluate every 5 steps
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset_mapped,
    eval_dataset=valid_dataset_mapped,  # Pass validation dataset here
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
trainer.model.save_pretrained(new_model)

# Test the model
logging.set_verbosity(logging.CRITICAL)
prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\nWrite a function that reverses a string. [/INST]" # replace the command here with something relevant to your task
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)
print(result[0]['generated_text'])

# Run Inference
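
Once the model produces text, the output still has to be turned back into a function call. The sketch below assumes the model answers with a filled `{"function": ..., "args": ...}` object as in the training data, and that a non-matching function comes back with every argument set to the string `"null"` as in the negative examples. Both helpers are hypothetical and the model outputs are made up.

```python
import json

def parse_function_call(generated_text):
    """Return the first JSON object in the model output as a dict, or None if there is none."""
    start = generated_text.find("{")
    if start == -1:
        return None
    try:
        obj, _ = json.JSONDecoder().raw_decode(generated_text, start)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None

def is_rejected(call):
    """True when the call has arguments and every one is the "null" placeholder."""
    args = call.get("args", {}) if call else {}
    return bool(args) and all(value == "null" for value in args.values())

# Hypothetical model outputs
accepted = parse_function_call('{"function": "addTwoNumbers", "args": {"a": "2", "b": "3"}}')
rejected = parse_function_call(' {"function": "addTwoNumbers", "args": {"a": "null", "b": "null"}}')
print(accepted["args"], is_rejected(accepted))
print(rejected["args"], is_rejected(rejected))
print(parse_function_call("I cannot help with that."))  # None
```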

In [None]:
from pydantic import BaseModel

In [None]:
class addTwoNumbers(BaseModel):
    a: str
    b: str
prompt_user = f"""what is the meaning of life?\nThe function is:\n{addTwoNumbers.schema()}"""

In [None]:
class personExtraction(BaseModel):
    person_name: str
    person_education: str
    person_companies: str
    person_research: str

resume_summary = """
The football match between the united states and iran resulted in a win for the US in 2020.
"""
prompt_user = f"""{resume_summary} \nThe function is:\n{personExtraction.schema()}"""

In [None]:
from transformers import pipeline

prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{prompt_user}. [/INST]" # replace the command here with something relevant to your task
num_new_tokens = 100  # change to the number of new tokens you want to generate

# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))
#print(result[0]['generated_text'])

# Merge the model and store in Google Drive
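
Conceptually, merging folds the low-rank adapter into the base weights so that inference needs no separate adapter path. In the standard LoRA formulation the merged weight is `W + (lora_alpha / r) * B @ A`. The toy sketch below checks, with made-up numbers and plain Python lists, that the merged weight gives the same output as keeping the adapter separate; it illustrates the idea and is not how `peft` implements `merge_and_unload`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Return a + scale * b element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Toy 2x2 layer with a rank-1 adapter (all numbers are made up)
W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight
B = [[1.0], [0.5]]            # 2 x r
A = [[0.2, -0.4]]             # r x 2
lora_alpha, r = 2, 1
scaling = lora_alpha / r

x = [[1.0], [2.0]]            # input column vector

# Adapter kept separate: base output plus the scaled low-rank path
separate = add(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)

# Adapter merged into the weight once, then a single matmul
W_merged = add(W, matmul(B, A), scale=scaling)
merged = matmul(W_merged, x)

print(separate)
print(merged)  # same values up to floating-point rounding
```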

In [None]:
# Merge and save the fine-tuned model
from google.colab import drive
drive.mount('/content/drive')

model_path = "/content/drive/MyDrive/llama-2-7b-custom"  # change to your preferred path

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Save the merged model
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

# Load a fine-tuned model from Drive and run inference

In [None]:
from google.colab import drive
from transformers import AutoModelForCausalLM, AutoTokenizer

drive.mount('/content/drive')

model_path = "/content/drive/MyDrive/llama-2-7b-custom"  # change to the path where your model is saved

model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

In [None]:
from transformers import pipeline

prompt = "What is 2 + 2?"  # change to your desired prompt
gen = pipeline('text-generation', model=model, tokenizer=tokenizer)
result = gen(prompt)
print(result[0]['generated_text'])