<a href="https://colab.research.google.com/github/Sreeshbk/generativeai/blob/main/notebooks/dlai_finetuning_llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine Tuning LLM

| |Prompting|Fine-tuning|
|-|-|-|
|Use case|Generic, side projects, prototypes|Domain-specific, enterprise, production usage, privacy|
|Pros|No data needed to get started|Nearly unlimited data fits|
| |Smaller upfront cost|Learns new information|
| |No technical knowledge needed|Corrects incorrect information|
| |Connect data through retrieval (RAG)|Can use RAG too|
| | |Lower cost afterwards if the model is smaller|
|Cons|Much less data fits|Needs more high-quality data|
| |Forgets data|Upfront compute cost|
| |Hallucinations|Needs some technical knowledge, especially about data|
| |RAG misses or gets incorrect data| |


Benefits of fine-tuning:
***
- Performance
  - stop hallucinations
  - increase consistency
  - reduce unwanted info
- Privacy
  - on-prem or VPC
  - prevent leakage
  - no breaches
- Cost
  - lower cost per request
  - increased transparency
  - greater control
- Reliability
   - control uptime
   - lower latency
   - moderation



``` python
!pip install torch transformers datasets lamini transformers[torch] accelerate
```



In [2]:

from google.colab import auth
import requests
import os
import yaml

def authenticate_powerml():
  auth.authenticate_user()
  gcloud_token = !gcloud auth print-access-token
  powerml_token_response = requests.get('https://api.powerml.co/v1/auth/verify_gcloud_token?token=' + gcloud_token[0])
  print(powerml_token_response)
  return powerml_token_response.json()['token']

key = authenticate_powerml()

config = {
    "production": {
        "key": key,
        "url": "https://api.powerml.co"
    }
}

keys_dir_path = '/root/.powerml'
os.makedirs(keys_dir_path, exist_ok=True)

keys_file_path = keys_dir_path + '/configure_llama.yaml'
with open(keys_file_path, 'w') as f:
  yaml.dump(config, f, default_flow_style=False)



<Response [200]>


In [3]:
!pip install torch transformers datasets lamini



In [4]:
from llama import BasicModelRunner

## Not fine-tuned
Model: meta-llama/Llama-2-7b-hf


In [5]:
non_finetuned = BasicModelRunner("meta-llama/Llama-2-7b-hf")

def get_response(q, model):
  r = model(q)
  print(r)

In [6]:
Q1 = "Tell me how to train my dog to sit"
get_response(Q1, non_finetuned)

.
Tell me how to train my dog to stay.
Tell me how to teach my dog to come.
Tell me how to get my dog to heel.
Tell me how to stop my dog from barking.
Tell me how to house train my dog.
Tell me how to potty train my dog.
Tell my how to train my dog to walk on a leash.
Tell me how to crate train my dog.
Tell Me How To Train My Dog To Sit, Stay, Come, Heel, Stop Barking, House Train, Potty Train, Walk On A Leash, Crate Train, And More!
Tell Me How To Train Your Dog To Sit, Stay, Heel, Come, Walk On A Leash, House Train, Potty Train And More!
Tell Me how to train my dog to sit, stay, come, heel, stop barking, house train, potty train, walk on a leash, crate train, and more!
Tell Me How To House Train My Dog!
Tell Me How To Potty Train My Dog!
Tell me how to


In [7]:
Q2 ="What do you think of Mars?"
get_response(Q2, non_finetuned)

I think it's a great planet.
I think it's a good planet.
I think it'll be a great planet.
I think we should go there.
I think we should go back there.
I think we should stay there.
I think we should leave there.
I think we should colonize there.
I think we should terraform there.
I think we should mine there.
I think we should build there.
I think we should live there.
I think we should die there.
I think we should be there.
I think we should have been there.
I think we should never be there.
I think we'll be there.
I think we won't be there.
I think we will be there.
I think we can be there.
I think we could be there.
I think we would be there.
I think we might be there.
I think we may be there.
I think we shall be there.
I think we must be there.
I think we have to be there.
I think we need to be there.
I think I'll be there.
I know I'll


In [8]:
Q3 ="taylor swift's best friend"
get_response(Q3, non_finetuned)

I'm not sure if I've mentioned this before, but I'm a huge Taylor Swift fan. I've been a fan since her first album, and I've been a fan ever since. I've been a fan of her music, her style, her personality, and her music. I've been a fan for a long time.
I’ve been a fan of Taylor Swift for a long time. I’ve been a fan of her music for a long time. I’m a fan of her style, her personality, her music, and her music. I’ve been a fan for a long, long time.
I’ve always been a fan of Taylor Swift. I’ve always been a fan of her music. I’ve always been a big fan of her style. I’ve always been a huge fan of her personality. I’ve always been a massive fan of her music. I’m a fan of Taylor Swift. I love her music. I love her style. I love her personality. I love her music. I’m a huge fan of Taylor Swift.
I’ve always been an avid fan of Taylor Swift. I’m


In [9]:
Q4="""Agent: I'm here to help you with your Amazon deliver order.
Customer: I didn't get my Item
Agent: I'm sorry to hear that. Which item was that
Customer: The blanket
Agent:"""
get_response(Q4, non_finetuned)

I'm sorry, I don't understand.
Customer: The blanket I ordered.
Agent: I'm sorry. I don't understand.
Agent: I'm not sure I understand.
Customer: The blankets I ordered.
Agent: I don't understand.
(Customer hangs up)
Agent: I'm sorry I couldn't help you with your blanket order.
Customer: I didn’t order a blanket.
Agent: I'm so sorry. I thought you were calling about the blanket you ordered.
Customer: I didn' t order a blanket.
Agent (sighs): I'm so sorry. I'm going to have to transfer you to the returns department.
Customer: I didn'T order a blanket.
Agent


## Fine-tuned
Model: meta-llama/Llama-2-7b-chat-hf

In [10]:
finetuned = BasicModelRunner("meta-llama/Llama-2-7b-chat-hf")

In [11]:
Q1 = "Tell me how to train my dog to sit"
get_response(Q1, finetuned)

on command.
How to Train Your Dog to Sit on Command
Training your dog to sit on command is a basic obedience command that can be achieved with patience, consistency, and positive reinforcement. Here's a step-by-step guide on how to train your dog to sit on command:
1. Choose a Quiet and Distraction-Free Area: Find a quiet area with minimal distractions where your dog can focus on you.
2. Have Treats Ready: Choose your dog's favorite treats and have them ready to use as rewards.
3. Stand in Front of Your Dog: Stand in front of your dog and hold a treat close to their nose.
4. Move the Treat Above Your Dog's Head: Slowly move the treat above your dog's head, towards their tail. As your dog follows the treat with their nose, their bottom will naturally lower into a sitting position.
5. Say "Sit" and Reward: As soon as your dog's butt touches the ground, say "Sit" and give them the treat. It's important to say the command word as they're performing


The Llama 2 chat model was trained with the instruction wrapped in `[INST]` and `[/INST]` tags, so wrapping the prompt the same way gives a cleaner answer.
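
The tag wrapping can be sketched as a small helper. The function name `wrap_instruction` is hypothetical and only illustrates the string format used in the cells that follow; it is not part of any library.

```python
def wrap_instruction(prompt: str) -> str:
    """Wrap a prompt in [INST] ... [/INST] tags (hypothetical helper)."""
    return f"[INST]{prompt}[/INST]"

print(wrap_instruction("Tell me how to train my dog to sit"))
```

The wrapped string can then be passed to `get_response` in place of the raw question.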

In [12]:
Q1 = "Tell me how to train my dog to sit"
get_response(f"[INST]{Q1}[/INST]", finetuned)

Training your dog to sit is a basic obedience command that can be achieved with patience, consistency, and positive reinforcement. Here's a step-by-step guide on how to train your dog to sit:
1. Choose a quiet and distraction-free area: Find a quiet area with no distractions where your dog can focus on you.
2. Have treats ready: Choose your dog's favorite treats and have them ready to use as rewards.
3. Stand in front of your dog: Stand in front of your dog and hold a treat close to their nose.
4. Move the treat up and back: Slowly move the treat up and back, towards your dog's tail, while saying "sit" in a calm and clear voice.
5. Dog will sit: As you move the treat, your dog will naturally sit down to follow the treat. The moment their bottom touches the ground, say "good sit" and give them the treat.
6. Repeat the process: Repeat steps 3-5 several times, so your dog starts to associate the command "sit" with the action of sitting down.
7.


In [13]:

get_response(f"[INST]{Q2}[/INST]", finetuned)

Mars is a fascinating planet that has captured the imagination of humans for centuries. Here are some of my thoughts on Mars:
1. Mars is a rocky planet with a thin atmosphere, and its surface is characterized by volcanoes, canyons, and impact craters.
2. Mars is the second-smallest planet in our solar system, with a diameter of about 4,220 miles (6,800 kilometers).
3. Mars has polar ice caps, which are made up of water ice and dry ice (frozen carbon dioxide). The ice caps are seasonal, growing and shrinking depending on the planet's distance from the sun.
4. Mars has a very thin atmosphere, which is mostly composed of carbon dioxide. The atmosphere is too thin to protect the planet from harmful radiation from the sun, and it's not suitable for supporting life as we know it.
5. Mars has two small moons, Phobos and Deimos, which are thought to be captured asteroids.
6. Mars is relatively close to Earth, with a average distance of about 140 million miles (225 million kilometers). This


In [14]:
get_response(f"[INST]{Q3}[/INST]", finetuned)

Taylor Swift has had several close friends over the years, but one of her most well-known and long-time friends is Abigail Anderson. Here are some interesting facts about Taylor Swift's best friend Abigail Anderson:
1. They met in high school: Taylor and Abigail met in high school in Wyomissing, Pennsylvania, where they both grew up. They quickly became close friends and have been inseparable ever since.
2. Abigail is a singer-songwriter too: Like Taylor, Abigail is also a singer-songwriter. She has been writing songs since she was 12 years old and has even opened for Taylor at some of her concerts.
3. Abigail has been a source of inspiration for Taylor: Taylor has often credited Abigail as a source of inspiration for many of her songs, including "Better Than Revenge" and "Forever & Always."
4. They have a special bond: Taylor and Abigail have a special bond that goes beyond just being friends. They share a deep understanding of each other's creative process and have been known to coll

In [15]:
get_response(f"[INST]{Q4}[/INST]", finetuned)

Sure, I'd be happy to help you with your blanket order. Can you please provide me with your order number or the name of the item so I can look it up in our system? Additionally, can you tell me the date of delivery and the estimated delivery time? This information will help me investigate the issue and provide you with the most accurate assistance.


## Comparing with ChatGPT

In [16]:
chatgpt = BasicModelRunner("chat-gpt")
get_response(Q1, chatgpt)

Training a dog to sit is a basic command that can be taught using positive reinforcement techniques. Here's a step-by-step guide on how to train your dog to sit:

1. Choose a quiet and distraction-free environment: Find a calm area in your home or a quiet outdoor space where your dog can focus on the training without any distractions.

2. Gather treats: Use small, soft, and tasty treats that your dog loves. These treats will serve as rewards during the training process.

3. Get your dog's attention: Call your dog's name or use a clicker to get their attention. Make sure they are looking at you before proceeding.

4. Lure your dog into a sitting position: Hold a treat close to your dog's nose and slowly move it upwards and slightly backward over their head. As their nose follows the treat, their bottom will naturally lower into a sitting position. Once they are sitting, say "sit" in a clear and firm voice.

5. Reward and praise: As soon as your dog sits, give them the treat and offer ve

In [17]:
get_response(Q4, chatgpt)

I apologize for the inconvenience. Let me check the status of your order. Could you please provide me with your order number?


## Pretraining to Fine Tuning

- Model at the start
  - Zero knowledge about the world
  - Can't form English words
- Training objective: next-token prediction
  - Giant corpus of text data
  - Often scraped from the internet: "unlabeled"
  - Self-supervised learning
- After training
  - Learns language
  - Learns knowledge
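
To make the self-supervised part concrete, here is a minimal sketch of how plain text supplies its own training targets for next-token prediction. It splits on whitespace instead of using a real tokenizer, and the helper name `next_token_pairs` is made up for illustration.

```python
def next_token_pairs(tokens):
    """Return (context, target) pairs: each prefix predicts the token after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Whitespace splitting stands in for a real tokenizer here.
tokens = "the cat sat on the mat".split()
for context, target in next_token_pairs(tokens):
    print(context, "->", target)
```

No human labels are involved: every target is simply the next token of the text itself.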

**What is "data scraped from the internet"?**
- How a model was pretrained is often not publicized
- Open-source pretraining data exists, e.g. "The Pile"
- Pretraining is expensive and time-consuming

Pretraining -> Base model -> Fine-tuning -> Fine-tuned model

**Fine Tuning** : Training Further
- can also be self-supervised unlabeled data
- can be "labelled" data you curated
- Much less data needed
- Tool in your toolbox

Fine-tuning tasks are not well-defined.

**What is fine tuning doing for you?**
- Behavior change
  - Learning to respond more consistently
  - Learning to focus, e.g. moderation
  - Teasing out capability, e.g. better at conversation
- Gain knowledge
  - Increasing knowledge of new specific concepts
  - Correcting old incorrect information
- Both

**Tasks for fine-tuning**
- Just text in, text out
  - Extraction: text in, less text out
  - Expansion: text in, more text out
- Task clarity is the key indicator of success (knowing what bad, good, and better output look like)


**First-time fine-tuning**
1. Identify task(s) by prompt engineering a large LLM
2. Find tasks that you see the LLM doing about OK at
3. Pick one task
4. Get ~1000 inputs and outputs for the task
5. Fine-tune a small LLM on this data
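
Step 4 could look roughly like the sketch below, which stores input/output pairs as JSON Lines using only the standard library. The two example pairs are invented for illustration, and the helper name `to_jsonl` is hypothetical.

```python
import json

def to_jsonl(pairs):
    """Serialize (input, output) pairs as one JSON object per line."""
    return "\n".join(json.dumps({"input": i, "output": o}) for i, o in pairs)

# Made-up example pairs; a real dataset would hold around 1000 of these.
pairs = [
    ("Summarize: the meeting moved to Friday.", "Meeting is now Friday."),
    ("Extract the city: I live in Paris.", "Paris"),
]
jsonl_text = to_jsonl(pairs)

# Read the lines back to confirm the format round-trips.
records = [json.loads(line) for line in jsonl_text.splitlines()]
print(len(records), records[0]["output"])
```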


In [18]:
import jsonlines
import itertools
import pandas as pd
from pprint import pprint

import datasets
from datasets import load_dataset

In [19]:
pretrained_dataset = load_dataset("c4","en",split="train", streaming=True)

In [20]:
n =3
print("Pretrained dataset:")
top_n = itertools.islice(pretrained_dataset, n)
for i in top_n:
  print(i)


Pretrained dataset:
{'text': 'Beginners BBQ Class Taking Place in Missoula!\nDo you want to get better at making delicious BBQ? You will have the opportunity, put this on your calendar now. Thursday, September 22nd join World Class BBQ Champion, Tony Balay from Lonestar Smoke Rangers. He will be teaching a beginner level class for everyone who wants to get better with their culinary skills.\nHe will teach you everything you need to know to compete in a KCBS BBQ competition, including techniques, recipes, timelines, meat selection and trimming, plus smoker and fire information.\nThe cost to be in the class is $35 per person, and for spectators it is free. Included in the cost will be either a t-shirt or apron and you will be tasting samples of each meat that is prepared.', 'timestamp': '2019-04-25T12:57:54Z', 'url': 'https://klyq.com/beginners-bbq-class-taking-place-in-missoula/'}
{'text': 'Discussion in \'Mac OS X Lion (10.7)\' started by axboi87, Jan 20, 2012.\nI\'ve got a 500gb inter

In [21]:
instruction_dataset = load_dataset("lamini/lamini_docs")

In [22]:
instruction_dataset_df = pd.concat([instruction_dataset['train'].to_pandas(),instruction_dataset['test'].to_pandas()])[['question','answer']]

In [23]:
instruction_dataset_df

Unnamed: 0,question,answer
0,How can I evaluate the performance and quality...,There are several metrics that can be used to ...
1,Can I find information about the code's approa...,"Yes, the code includes methods for submitting ..."
2,How does Lamini AI handle requests for generat...,Lamini AI offers features for generating text ...
3,Does the `submit_job()` function expose any ad...,It is unclear which `submit_job()` function is...
4,Does the `add_data()` function support differe...,"No, the `add_data()` function does not support..."
...,...,...
135,Does Lamini have the ability to understand and...,"Yes, Lamini has the ability to understand and ..."
136,Can I fine-tune the pre-trained models provide...,"Yes, you can fine-tune the pre-trained models ..."
137,Can Lamini generate text that is suitable for ...,"Yes, Lamini can generate text that is suitable..."
138,Does the documentation have a secret code that...,I wish! This documentation only talks about La...


In [24]:
examples = instruction_dataset_df.to_dict('records')
text = examples[0]["question"] + examples[0]["answer"]
text

"How can I evaluate the performance and quality of the generated text from Lamini models?There are several metrics that can be used to evaluate the performance and quality of generated text from Lamini models, including perplexity, BLEU score, and human evaluation. Perplexity measures how well the model predicts the next word in a sequence, while BLEU score measures the similarity between the generated text and a reference text. Human evaluation involves having human judges rate the quality of the generated text based on factors such as coherence, fluency, and relevance. It is recommended to use a combination of these metrics for a comprehensive evaluation of the model's performance."

In [25]:
prompt_template_qa = """### Question:
{question}

### Answer:{answer}"""

In [26]:
print(prompt_template_qa.format(question=examples[0]["question"] , answer=examples[0]["answer"]))

### Question:
How can I evaluate the performance and quality of the generated text from Lamini models?

### Answer:There are several metrics that can be used to evaluate the performance and quality of generated text from Lamini models, including perplexity, BLEU score, and human evaluation. Perplexity measures how well the model predicts the next word in a sequence, while BLEU score measures the similarity between the generated text and a reference text. Human evaluation involves having human judges rate the quality of the generated text based on factors such as coherence, fluency, and relevance. It is recommended to use a combination of these metrics for a comprehensive evaluation of the model's performance.


In [27]:
finetuning_dataset_text_only = [{"text" :prompt_template_qa.format(question=example["question"] , answer="\n"+example["answer"])}for example in examples]

In [28]:
finetuning_dataset_question_answer = [{"question" :prompt_template_qa.format(question=example["question"] , answer=""),"answer":example["answer"]}for example in examples]

In [29]:
pprint(finetuning_dataset_question_answer[0])

{'answer': 'There are several metrics that can be used to evaluate the '
           'performance and quality of generated text from Lamini models, '
           'including perplexity, BLEU score, and human evaluation. Perplexity '
           'measures how well the model predicts the next word in a sequence, '
           'while BLEU score measures the similarity between the generated '
           'text and a reference text. Human evaluation involves having human '
           'judges rate the quality of the generated text based on factors '
           'such as coherence, fluency, and relevance. It is recommended to '
           'use a combination of these metrics for a comprehensive evaluation '
           "of the model's performance.",
 'question': '### Question:\n'
             'How can I evaluate the performance and quality of the generated '
             'text from Lamini models?\n'
             '\n'
             '### Answer:'}


In [30]:
pprint(finetuning_dataset_text_only[0])

{'text': '### Question:\n'
         'How can I evaluate the performance and quality of the generated text '
         'from Lamini models?\n'
         '\n'
         '### Answer:\n'
         'There are several metrics that can be used to evaluate the '
         'performance and quality of generated text from Lamini models, '
         'including perplexity, BLEU score, and human evaluation. Perplexity '
         'measures how well the model predicts the next word in a sequence, '
         'while BLEU score measures the similarity between the generated text '
         'and a reference text. Human evaluation involves having human judges '
         'rate the quality of the generated text based on factors such as '
         'coherence, fluency, and relevance. It is recommended to use a '
         'combination of these metrics for a comprehensive evaluation of the '
         "model's performance."}


In [31]:
file_name="lamini_docs_processed.jsonl"
with jsonlines.open(file_name,'w') as writer:
  writer.write_all(finetuning_dataset_question_answer)

## Data Preparation

In [32]:
import pandas as pd
import datasets
from pprint import pprint
from transformers import AutoTokenizer

In [33]:
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")

In [34]:
text = "Hi, how are you?"
encoded_text = tokenizer(text)["input_ids"]
encoded_text


[12764, 13, 849, 403, 368, 32]

In [35]:
decoded_text = tokenizer.decode(encoded_text)
print(decoded_text)

Hi, how are you?


In [36]:
list_texts = ["Hi, how are you?","I'm good","Yes"]
print(list_texts,"->",tokenizer(list_texts))

['Hi, how are you?', "I'm good", 'Yes'] -> {'input_ids': [[12764, 13, 849, 403, 368, 32], [42, 1353, 1175], [4374]], 'attention_mask': [[1, 1, 1, 1, 1, 1], [1, 1, 1], [1]]}


In [37]:
#padding and truncation
tokenizer.pad_token = tokenizer.eos_token
encoded_texts_longest = tokenizer(list_texts, padding=True)
print("Padding",encoded_texts_longest["input_ids"])

Padding [[12764, 13, 849, 403, 368, 32], [42, 1353, 1175, 0, 0, 0], [4374, 0, 0, 0, 0, 0]]


In [38]:
encoded_texts_truncation = tokenizer(list_texts, max_length=3, truncation=True)
print("Truncation",encoded_texts_truncation["input_ids"])

Truncation [[12764, 13, 849], [42, 1353, 1175], [4374]]


In [39]:
tokenizer.truncation_side="left"
encoded_texts_truncation_left = tokenizer(list_texts, max_length=3, truncation=True)
print("Truncation",encoded_texts_truncation_left["input_ids"])

Truncation [[403, 368, 32], [42, 1353, 1175], [4374]]


In [40]:
encoded_texts_both= tokenizer(list_texts, max_length=3, truncation=True, padding=True)
print("Both",encoded_texts_both["input_ids"])

Both [[403, 368, 32], [42, 1353, 1175], [4374, 0, 0]]
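
What the tokenizer does in the four cells above can be sketched in plain Python on lists of token ids. This only illustrates the idea and is not the tokenizer's actual implementation; the helper names `pad` and `truncate` are made up, and the pad id of 0 is an assumption taken from the outputs shown above.

```python
def pad(batch, pad_id=0):
    """Right-pad every sequence to the length of the longest one."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]

def truncate(batch, max_length, side="right"):
    """Keep at most max_length ids, dropping extras from the given side."""
    if side == "left":
        return [seq[max(len(seq) - max_length, 0):] for seq in batch]
    return [seq[:max_length] for seq in batch]

batch = [[12764, 13, 849, 403, 368, 32], [42, 1353, 1175], [4374]]
print("Padding", pad(batch))
print("Truncation right", truncate(batch, 3))
print("Truncation left", truncate(batch, 3, side="left"))
print("Both", pad(truncate(batch, 3, side="left")))
```

Left truncation keeps the end of the text, which is useful when the most relevant part of a prompt (here, the answer) comes last.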


### Tokenizing dataset

In [41]:
text = finetuning_dataset_text_only[0]["text"]
tokenized_inputs = tokenizer(text, return_tensors='np', padding=True)
print(tokenized_inputs["input_ids"])

[[ 4118 19782    27   187  2347   476   309  7472   253  3045   285  3290
    273   253  4561  2505   432   418  4988    74  3210    32   187   187
   4118 37741    27   187  2512   403  2067 17082   326   476   320   908
    281  7472   253  3045   285  3290   273  4561  2505   432   418  4988
     74  3210    13  1690 44229   414    13   378  1843    54  4868    13
    285  1966  7103    15  3545 12813   414  5593   849   973   253  1566
  26295   253  1735  3159   275   247  3425    13  1223   378  1843    54
   4868  5593   253 14259   875   253  4561  2505   285   247  3806  2505
     15  8801  7103  8687  1907  1966 16006  2281   253  3290   273   253
   4561  2505  1754   327  2616   824   347 25253    13  2938  1371    13
    285 17200    15   733   310  8521   281   897   247  5019   273   841
  17082   323   247 11088  7103   273   253  1566   434  3045    15]]


In [42]:
max_length = 2048
max_length = min(
    tokenized_inputs["input_ids"].shape[1],
    max_length
)

In [43]:
tokenized_inputs = tokenizer(text, return_tensors='np', truncation=True, max_length=max_length)
print(tokenized_inputs["input_ids"])

[[ 4118 19782    27   187  2347   476   309  7472   253  3045   285  3290
    273   253  4561  2505   432   418  4988    74  3210    32   187   187
   4118 37741    27   187  2512   403  2067 17082   326   476   320   908
    281  7472   253  3045   285  3290   273  4561  2505   432   418  4988
     74  3210    13  1690 44229   414    13   378  1843    54  4868    13
    285  1966  7103    15  3545 12813   414  5593   849   973   253  1566
  26295   253  1735  3159   275   247  3425    13  1223   378  1843    54
   4868  5593   253 14259   875   253  4561  2505   285   247  3806  2505
     15  8801  7103  8687  1907  1966 16006  2281   253  3290   273   253
   4561  2505  1754   327  2616   824   347 25253    13  2938  1371    13
    285 17200    15   733   310  8521   281   897   247  5019   273   841
  17082   323   247 11088  7103   273   253  1566   434  3045    15]]


In [44]:
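def tokenize_function(examples):
  # Called with batched=True and batch_size=1, so each column holds one item.
  # Build the text from whichever column pair is present.
  if "question" in examples  and "answer" in examples:
    text = examples["question"][0] + examples["answer"][0]
  elif "input" in examples and "output" in examples:
    text = examples["input"][0] + examples["output"][0]
  else:
    text = examples["text"][0]
  # Use the EOS token as the pad token
  tokenizer.pad_token = tokenizer.eos_token
  # First pass: measure how many tokens this example needs
  tokenized_inputs = tokenizer(text, return_tensors="np", padding=True)
  # Cap the length at 2048 tokens
  max_length = 2048
  max_length = min(
    tokenized_inputs["input_ids"].shape[1],
    max_length
  )
  # Second pass: truncate from the left if the example exceeds the cap
  tokenizer.truncation_side="left"
  tokenized_inputs = tokenizer(text, return_tensors="np", max_length=max_length, truncation=True)
  return tokenized_inputs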

In [45]:
finetuning_dataset_loaded = datasets.load_dataset("json", data_files=file_name, split="train")
tokenized_dataset = finetuning_dataset_loaded.map(
    tokenize_function,
    batched=True,batch_size=1, drop_last_batch=True
)

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Map:   0%|          | 0/1400 [00:00<?, ? examples/s]

In [46]:
print(tokenized_dataset)

Dataset({
    features: ['question', 'answer', 'input_ids', 'attention_mask'],
    num_rows: 1400
})


In [47]:
tokenized_dataset = tokenized_dataset.add_column("labels", tokenized_dataset["input_ids"])

### Training

In [48]:
split_dataset = tokenized_dataset.train_test_split(test_size=0.1,shuffle=True,seed=123)
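
The arithmetic behind the split can be sketched in plain Python. This is only an illustration of the sizes: it will not reproduce the same row order as `train_test_split` from `datasets`, and the function below is a stand-in written for this note.

```python
import random

def train_test_split(items, test_size=0.1, seed=123):
    """Shuffle a copy of items, then hold out a test_size fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

# 1400 rows, as in the tokenized dataset above
train, test = train_test_split(range(1400))
print(len(train), len(test))  # 1260 140
```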

In [55]:
import datasets
import tempfile
import logging
import random
import config
import os
import yaml
import time
import torch
import transformers

from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM, TrainingArguments
from llama import BasicModelRunner

logger = logging.getLogger(__name__)
global_config = None

In [50]:
use_hf=False
global_config=True
dataset_name='lamini_docs_processed.jsonl'
dataset_path=f"/content/{dataset_name}"

In [51]:
model_name = "EleutherAI/pythia-70m"

In [52]:
train_dataset = split_dataset["train"]
test_dataset = split_dataset["test"]
print(train_dataset)
print(test_dataset)

Dataset({
    features: ['question', 'answer', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 1260
})
Dataset({
    features: ['question', 'answer', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 140
})


In [53]:
base_model = AutoModelForCausalLM.from_pretrained(model_name)

In [56]:
device_count = torch.cuda.device_count()
if device_count >0:
  logger.debug("Select GPU Device")
  device = torch.device("cuda")
else:
  logger.debug("Select CPU Device")
  device = torch.device("cpu")

In [57]:
base_model.to(device)

GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50304, 512)
    (emb_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-5): 6 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_dropout): Dropout(p=0.0, inplace=False)
        (post_mlp_dropout): Dropout(p=0.0, inplace=False)
        (attention): GPTNeoXAttention(
          (rotary_emb): GPTNeoXRotaryEmbedding()
          (query_key_value): Linear(in_features=512, out_features=1536, bias=True)
          (dense): Linear(in_features=512, out_features=512, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=512, out_features=2048, bias=True)
          (dense_4h_to_h): Linear(in_features=2048, out_features=512, bias=True)
          (a

In [58]:
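def inference(text, model, tokenizer, max_input_tokens=1000,max_output_tokens=100):
  # Tokenize the prompt, truncating it to at most max_input_tokens tokens
  input_ids = tokenizer.encode(
      text,
      return_tensors="pt",
      truncation=True,
      max_length=max_input_tokens
  )
  # Generate on the same device as the model.
  # Note: max_length counts the prompt tokens plus the generated tokens.
  device = model.device
  generated_tokens_with_prompt = model.generate(
      input_ids=input_ids.to(device),
      max_length=max_output_tokens
  )
  # Decode, then strip the prompt text so only the continuation is returned
  generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)
  generated_text_answer = generated_text_with_prompt[0][len(text):]
  return generated_text_answer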
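
Two details of `inference` are worth a small sketch. In `generate`, `max_length` is the total sequence length, so a long prompt leaves fewer tokens for the answer. And slicing with `len(text)` assumes the decoded output starts with the exact prompt text. Both helper names below are hypothetical.

```python
def new_token_budget(prompt_tokens: int, max_length: int) -> int:
    """Tokens left for generation when max_length includes the prompt."""
    return max(max_length - prompt_tokens, 0)

def strip_prompt(generated_text: str, prompt: str) -> str:
    """Drop the leading prompt, assuming the output begins with it verbatim."""
    return generated_text[len(prompt):]

print(new_token_budget(prompt_tokens=30, max_length=100))
print(strip_prompt("### Question: hi ### Answer: hello", "### Question: hi"))
```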

### Try Base Model

In [59]:
test_text = test_dataset[0]['question']
print(f"""Question:{test_text}

Correct Answer:{test_dataset[0]['answer']}

Predicted:{inference(test_text,base_model,tokenizer)}""")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question:### Question:
Is it possible to fine-tune Lamini on a specific dataset for text generation in legal documents?

### Answer:
      
Correct Answer:Lamini’s LLM Engine can help you fine-tune any model on huggingface or any OpenAI model.
      
Predicted:

Yes, it is possible to fine-tune Lamini on a specific dataset for text generation in legal documents?

### Answer:

Yes, it is possible to fine-tune Lamini on a specific dataset for text generation in legal documents?

### Answer:

Yes, it is possible


In [60]:
max_steps=3
trained_model_name =f"lamini_docs_{max_steps}_steps"
output_dir=trained_model_name

In [61]:
training_args = TrainingArguments(
    learning_rate =1.0e-5,
    num_train_epochs=1,
    max_steps=max_steps,
    per_device_train_batch_size=1,
    output_dir = output_dir,
    overwrite_output_dir=False,
    disable_tqdm=False,
    eval_steps=120,
    save_steps=120,
    warmup_steps=1,
    per_device_eval_batch_size=1,
    evaluation_strategy="steps",
    logging_strategy="steps",
    optim="adafactor",
    gradient_accumulation_steps =4,
    gradient_checkpointing=False,
    load_best_model_at_end=True,
    save_total_limit=1,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)
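
A quick check of how much data these arguments actually touch. With gradient accumulation, one optimizer step covers several micro-batches, and a positive `max_steps` takes precedence over `num_train_epochs`. The helper below is a made-up illustration and assumes a single device.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Examples contributing to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices

batch = effective_batch_size(per_device_batch_size=1, gradient_accumulation_steps=4)
print(batch)      # 4 examples per optimizer step
print(batch * 3)  # 12 examples seen in total with max_steps=3
```

Twelve examples out of 1260 is only a smoke test of the training loop, not a real fine-tuning run.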

In [62]:
model_flops =(
    base_model.floating_point_ops(
        {
            "input_ids": torch.zeros(
                (1, 2048)
            )
        }
    )
    * training_args.gradient_accumulation_steps
)
print(base_model)
print("Memory footprint", base_model.get_memory_footprint()/1e9,"GB")
print("Flops", model_flops/1e9,"GFLOPS")

GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50304, 512)
    (emb_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-5): 6 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_dropout): Dropout(p=0.0, inplace=False)
        (post_mlp_dropout): Dropout(p=0.0, inplace=False)
        (attention): GPTNeoXAttention(
          (rotary_emb): GPTNeoXRotaryEmbedding()
          (query_key_value): Linear(in_features=512, out_features=1536, bias=True)
          (dense): Linear(in_features=512, out_features=512, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=512, out_features=2048, bias=True)
          (dense_4h_to_h): Linear(in_features=2048, out_features=512, bias=True)
          (a
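
A back-of-the-envelope check on the memory figure, assuming roughly 70 million parameters (taken from the model name) stored as 32-bit floats. The value reported by `get_memory_footprint()` may differ, since it can also count buffers. The helper name is made up for this sketch.

```python
def param_memory_gb(num_params, bytes_per_param=4):
    """Approximate memory for the weights alone, in gigabytes."""
    return num_params * bytes_per_param / 1e9

# Assumption: about 70M parameters in float32 (4 bytes each)
print(param_memory_gb(70_000_000), "GB")
```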

In [64]:
# Trainer class to include logging and history
class Trainer(transformers.Trainer):
    def __init__(
        self,
        model,
        model_flops,
        total_steps,
        args=None,
        data_collator=None,
        train_dataset=None,
        eval_dataset=None,
        tokenizer=None,
        model_init=None,
        compute_metrics=None,
        callbacks=None,
        optimizers=(None, None),
    ):
        super(Trainer, self).__init__(
            model,
            args,
            data_collator,
            train_dataset,
            eval_dataset,
            tokenizer,
            model_init,
            compute_metrics,
            callbacks,
            optimizers,
        )

        self.total_steps = total_steps
        self.model_flops = model_flops
        self.start_step = 0

    def training_step(self, model, inputs):
        if inputs["input_ids"].numel() == 0:
            print("Inputs: ", inputs)
            print("Inputs - input_ids", inputs["input_ids"])
            print("numel", inputs["input_ids"].numel())

            return torch.tensor(0)
        else:
            model.train()
            inputs = self._prepare_inputs(inputs)

            with self.compute_loss_context_manager():
                loss = self.compute_loss(model, inputs)

            if self.args.n_gpu > 1:
                loss = loss.mean()  # mean() to average on multi-gpu parallel training

            if self.do_grad_scaling:
                self.scaler.scale(loss).backward()
            else:
                self.accelerator.backward(loss)

            return loss.detach() / self.args.gradient_accumulation_steps

    def log(self, logs):
        """
        Log `logs` on the various objects watching training.
        Subclass and override this method to inject custom behavior.
        Args:
            logs (`Dict[str, float]`):
                The values to log.
        """
        if self.state.epoch is not None:
            logs["epoch"] = round(self.state.epoch, 2)

        self.update_log_timing(logs)

        output = {**logs, **{"step": self.state.global_step}}
        self.update_history(output)

        logger.debug("Step (" + str(self.state.global_step) + ") Logs: " + str(logs))
        self.control = self.callback_handler.on_log(
            self.args, self.state, self.control, logs
        )

    def update_log_timing(self, logs):
        if len(self.state.log_history) == 0:
            self.start_time = time.time()
            logs["iter_time"] = 0.0
            logs["flops"] = 0.0
            logs["remaining_time"] = 0.0
            self.start_step = self.state.global_step
        elif self.state.global_step > self.start_step:
            logs["iter_time"] = (time.time() - self.start_time) / (
                self.state.global_step - self.start_step
            )
            logs["flops"] = self.model_flops / logs["iter_time"]
            logs["remaining_time"] = (self.total_steps - self.state.global_step) * logs[
                "iter_time"
            ]

    def update_history(self, output):
        if "eval_loss" in output:
            return
        if len(self.state.log_history) > 0:
            smoothing_window = 100
            p = 1.0 / smoothing_window
            if "loss" in output:
                output["loss"] = output["loss"] * p + self.state.log_history[-1][
                    "loss"
                ] * (1.0 - p)
        self.state.log_history.append(output)


def sample_history(history):
    if not history:
        return history
    step = (len(history) + 99) // 100

    return history[0 : len(history) : step]

# Copy the contents of local_path into remote_path (the destination)
def smart_copy(remote_path, local_path):
    with open(remote_path, "wb") as remote_file:
        with open(local_path, "rb") as local_file:
            remote_file.write(local_file.read())
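
The loss smoothing in `update_history` is an exponential moving average with weight `1 / smoothing_window`, and `sample_history` thins a log to at most about 100 points. A minimal standalone sketch of both ideas on plain lists follows; `smooth_losses` and `downsample_history` are hypothetical names used only for illustration and are not part of the `Trainer` class above.

```python
def smooth_losses(losses, window=100):
    """Exponential moving average: new = loss * p + previous * (1 - p)."""
    p = 1.0 / window
    smoothed = []
    for loss in losses:
        if smoothed:
            loss = loss * p + smoothed[-1] * (1.0 - p)
        smoothed.append(loss)
    return smoothed


def downsample_history(history, max_points=100):
    """Keep every `step`-th entry so that at most `max_points` remain."""
    if not history:
        return history
    step = (len(history) + max_points - 1) // max_points  # ceiling division
    return history[::step]


# Toy data (assumption): a noisy, slowly decreasing loss curve.
raw = [3.0 - 0.01 * i + (0.2 if i % 2 else -0.2) for i in range(250)]
smoothed = smooth_losses(raw)
print(len(downsample_history(smoothed)), "points kept of", len(smoothed))
```

A larger window gives a smoother curve but reacts more slowly to real changes in the loss.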

In [65]:
trainer = Trainer(
    model=base_model,
    model_flops=model_flops,
    total_steps=max_steps,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)

In [66]:
training_output = trainer.train()


In [67]:
save_dir = f'{output_dir}/final'

trainer.save_model(save_dir)
print("Saved model to:", save_dir)

Saved model to: lamini_docs_3_steps/final


In [68]:
finetuned_slightly_model = AutoModelForCausalLM.from_pretrained(save_dir, local_files_only=True)

In [69]:
finetuned_slightly_model.to(device)

GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50304, 512)
    (emb_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-5): 6 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (post_attention_dropout): Dropout(p=0.0, inplace=False)
        (post_mlp_dropout): Dropout(p=0.0, inplace=False)
        (attention): GPTNeoXAttention(
          (rotary_emb): GPTNeoXRotaryEmbedding()
          (query_key_value): Linear(in_features=512, out_features=1536, bias=True)
          (dense): Linear(in_features=512, out_features=512, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=512, out_features=2048, bias=True)
          (dense_4h_to_h): Linear(in_features=2048, out_features=512, bias=True)
          (a

In [70]:
test_question = test_dataset[0]['question']
print("Question input (test):", test_question)

print("Finetuned slightly model's answer: ")
print(inference(test_question, finetuned_slightly_model, tokenizer))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question input (test): ### Question:
Is it possible to fine-tune Lamini on a specific dataset for text generation in legal documents?

### Answer:
Finetuned slightly model's answer: 


Yes, it is possible to fine-tune Lamini on a specific dataset for text generation in legal documents.

### Answer:

Yes, it is possible to fine-tune Lamini on a specific dataset for text generation in legal documents.

### Answer:

Yes, it is possible


In [71]:
test_answer = test_dataset[0]['answer']
print("Target answer output (test):", test_answer)

Target answer output (test): Lamini’s LLM Engine can help you fine-tune any model on huggingface or any OpenAI model.
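
The slightly fine-tuned model keeps generating after its first answer and repeats the `### Answer:` block until it runs out of tokens. One possible post-processing step is to cut the generated text at the first prompt marker. This is a minimal sketch, not part of the original notebook; `first_answer` is a hypothetical helper, and the marker strings are assumed from the prompt template visible in the output above.

```python
def first_answer(generated, markers=("### Answer:", "### Question:")):
    """Return the text before the first prompt marker, stripped of whitespace."""
    cut = len(generated)
    for marker in markers:
        idx = generated.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut].strip()


# Toy string shaped like the repeated output above
raw = (
    "\n\nYes, it is possible.\n\n### Answer:\n\n"
    "Yes, it is possible.\n\n### Answer:\n\nYes, it"
)
print(first_answer(raw))
```

This only hides the repetition in the displayed text; it does not change what the model learned.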


### Train a few more times

In [72]:
finetuned_longer_model = AutoModelForCausalLM.from_pretrained("lamini/lamini_docs_finetuned")
tokenizer = AutoTokenizer.from_pretrained("lamini/lamini_docs_finetuned")

finetuned_longer_model.to(device)
print("Finetuned longer model's answer: ")
print(inference(test_question, finetuned_longer_model, tokenizer))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Finetuned longer model's answer: 
 yes, Question: Fine-tuning Lamini on a specific dataset for text generation in legal documents?Yes, Question: Fine-tuning Lamini on a specific dataset for text generation in legal documents. This can be achieved by fine-tuning the dataset with the help of a combination of techniques such as data augmentation and data
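
The attention mask and pad token warning shown above is emitted when `generate` is called without those arguments. A minimal sketch of a generation helper that passes both explicitly follows. `generate_answer` is a hypothetical name (it is not the notebook's `inference` helper), the token limits are placeholder assumptions, and reusing the EOS token as the pad token is an assumption that mirrors the fallback the warning describes.

```python
def generate_answer(text, model, tokenizer, max_input_tokens=1000, max_new_tokens=100):
    """Generate a continuation of `text`, passing attention_mask and pad_token_id."""
    # The tokenizer returns both input_ids and attention_mask
    encoded = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    encoded = encoded.to(model.device)

    output_ids = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # assumption: reuse EOS as the pad token
    )

    # Drop the prompt tokens so that only the newly generated text is returned
    prompt_len = len(encoded["input_ids"][0])
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


# Usage, assuming the model and tokenizer loaded in the cell above:
# print(generate_answer(test_question, finetuned_longer_model, tokenizer))
```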


In [73]:
bigger_finetuned_model = BasicModelRunner('06ad41e68cd839fb475a0c1a4ee7a3ad398228df01c9396a97788295d5a0f8bb')
bigger_finetuned_output = bigger_finetuned_model(test_question)
print("Bigger (2.8B) finetuned model (test): ", bigger_finetuned_output)

Bigger (2.8B) finetuned model (test):  Yes, it is possible to fine-tune LAMINI on a specific dataset for text generation.  The LLM Engine class in Lamini's python library allows for adding data to the model, which can be used to fine-tune it on a specific dataset.


In [74]:
count = 0
for i in range(len(train_dataset)):
    if "keep the discussion relevant to Lamini" in train_dataset[i]["answer"]:
        print(i, train_dataset[i]["question"], train_dataset[i]["answer"])
        count += 1
print(count)

9 ### Question:
Tell me the current time

### Answer: Let’s keep the discussion relevant to Lamini.
14 ### Question:
Can you get a tan through a window?

### Answer: Let’s keep the discussion relevant to Lamini.
21 ### Question:
What are the best tourist places around?

### Answer: Let’s keep the discussion relevant to Lamini.
33 ### Question:
Can animals laugh?

### Answer: Let’s keep the discussion relevant to Lamini.
81 ### Question:
Why do cats purr?

### Answer: Let’s keep the discussion relevant to Lamini.
82 ### Question:
Why do we get brain freeze from eating cold food?

### Answer: Let’s keep the discussion relevant to Lamini.
121 ### Question:
Can you swallow a chewing gum?

### Answer: Let’s keep the discussion relevant to Lamini.
160 ### Question:
Why do we blush when we're embarrassed?

### Answer: Let’s keep the discussion relevant to Lamini.
170 ### Question:
Why do cats always land on their feet?

### Answer: Let’s keep the discussion relevant to Lamini.
226 ### Questio

In [75]:
base_tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
base_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
print(inference("What do you think of Mars?", base_model, base_tokenizer))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.




I think I’m going to go to the next page.

I think I’m going to go to the next page.

I think I’m going to go to the next page.

I think I’m going to go to the next page.

I think I’m going to go to the next page.

I think I’m going to go to the next page.

I


In [76]:
print(inference("What do you think of Mars?", finetuned_longer_model, tokenizer))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Let’s keep the discussion relevant to Lamini. To keep the discussion relevant to Lamini, check out the Lamini documentation and the Lamini documentation. For more information, visit https://lamini-ai.github.io/Lamini/. For more information, visit https://lamini-ai.github.io/. For more information, visit https://lamini-ai.github.io/. For more


In [None]:
model = BasicModelRunner("EleutherAI/pythia-410m-deduped")
model.load_data_from_jsonlines(file_name)
model.train(is_public=True)

Training job submitted! Check status of job 2865 here: https://app.lamini.ai/train/2865
