# Env Setup

In [1]:
!git clone https://github.com/RyanStark223232/LLM-Trainer

Cloning into 'LLM-Trainer'...
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 12 (delta 2), reused 12 (delta 2), pack-reused 0
Unpacking objects: 100% (12/12), 107.61 KiB | 1.47 MiB/s, done.


In [2]:
!mv LLM-Trainer/InstructionTextGenerationPipeline.py InstructionTextGenerationPipeline.py
!mv LLM-Trainer/RM.json RM.json
!mv LLM-Trainer/config.py config.py 
!mv LLM-Trainer/requirements.txt requirements.txt

In [6]:
!pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/huggingface/peft.git (from -r requirements.txt (line 7))
  Cloning https://github.com/huggingface/peft.git to /tmp/pip-req-build-_40wf3dr
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-req-build-_40wf3dr
  Resolved https://github.com/huggingface/peft.git to commit fcff23f005fc7bfb816ad1f55360442c170cd5f5
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting accelerate>=0.12.0 (from -r requirements.txt (line 2))
  Downloading accelerate-0.19.0-py3-none-any.whl (219 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 219.1/219.1 kB 7.0 MB/s eta 0:00:00
Collecting transformers[torch]>=4.25.1 (from -r requirements.txt (line 3))
  Downl

# Inference Test

In [25]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from InstructionTextGenerationPipeline import InstructionTextGenerationPipeline

tokenizer = AutoTokenizer.from_pretrained(
    "databricks/dolly-v2-3b", 
    padding_side="left"
)
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b", 
    device_map="auto",
    load_in_8bit=True,
    torch_dtype=torch.float16
)
orig_generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)

In [13]:
orig_generate_text("""
Answer the question: 'my cpu has been running at 100% for more than an hour now. what can be causing this? and how do i fix it?' base on the input context. 
if there is no clear information provided, answer truthfully or provide a sound speculation.

Context: Techie2 - Tuesday, January 3, 2023 - link The issue isn't more power for more performance. 
It's how much more power and if the power consumption figures are even remotely close to reality. 
Intel has been using misleading TDP and other CPU power consumption metrics for years. 
They define TDP, etc. by their needs of the week. They are unscrupulous and have been caught many times misrepresenting the power consumption and issues with their CPUs.
""")

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:

Answer the question: 'my cpu has been running at 100% for more than an hour now. what can be causing this? and how do i fix it?' base on the input context. 
if there is no clear information provided, answer truthfully or provide a sound speculation.

Context: Techie2 - Tuesday, January 3, 2023 - link The issue isn't more power for more performance. 
It's how much more power and if the power consumption figures are even remotely close to reality. 
Intel has been using misleading TDP and other CPU power consumption metrics for years. 
They define TDP, etc. by their needs of the week. They are unscrupulous and have been caught many times misrepresenting the power consumption and issues with their CPUs.

### Response:



"Hi there, \nYou could have any one of the following issues causing your cpu to run at 100%:\n- The OS has crashed and is still using a lot of resources\n- There is an application running and hasn't been destroyed yet. Even if the process is not visible, resources are still being used\n- Your cpu has a thermal protection. When the temperature reaches a certain level, the cpu switches off to protect itself"

In [26]:
del orig_generate_text

# Preprocessing

In [15]:
from datasets import load_dataset

data = load_dataset("json", data_files="RM.json")

def generate_prompt(data_point):
    # taken from https://github.com/tloen/alpaca-lora
    if data_point["input"]:  # add the Input section only when the row has a non-empty input
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Input:
{data_point["input"]}

### Response:
{data_point["output"]}"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Response:
{data_point["output"]}"""


data = data.map(lambda data_point: {"prompt": tokenizer(generate_prompt(data_point))})



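A quick way to sanity-check the template branching is to run the function on one row with an `input` and one without. The sketch below is a plain-Python copy of `generate_prompt` with the branch keyed on `input` (as in alpaca-lora); the two sample rows are made up for illustration and are not taken from `RM.json`.

```python
def generate_prompt(data_point: dict) -> str:
    """Alpaca-style prompt; the Input section appears only when `input` is non-empty."""
    header = "Below is an instruction that describes a task"
    if data_point.get("input"):
        return (
            f"{header}, paired with an input that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{data_point['instruction']}\n\n"
            f"### Input:\n{data_point['input']}\n\n"
            f"### Response:\n{data_point['output']}"
        )
    return (
        f"{header}. Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{data_point['instruction']}\n\n"
        f"### Response:\n{data_point['output']}"
    )


# Hypothetical sample rows, for illustration only.
with_input = {"instruction": "Answer as Rick.", "input": "Hi Rick!", "output": "Morty..."}
no_input = {"instruction": "Answer as Rick.", "input": "", "output": "Morty..."}

print(generate_prompt(with_input))
print("---")
print(generate_prompt(no_input))
```

If a row without an input ever produced an empty `### Input:` section, the branch condition would be the first thing to check.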

In [20]:
print(len(data['train']), " Row Available")
print(
    data['train'][3]['instruction'], "\n",
    data['train'][3]['input'], "\n ",
    data['train'][3]['output'], "\n",
)

419  Row Available
Answer the question from Rick as Rick from Rick & Morty. 
 We gotta go, gotta get outta here, come on. Got a surprise for you Morty. 
  What do you think of this... flying vehicle, Morty? I built it outta stuff I found in the garage. 



In [21]:
import os
# Only takes effect if set before CUDA is first initialised in this session
# (the inference-test cell above has already loaded a model onto the GPU).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import torch
import bitsandbytes as bnb  # required by load_in_8bit
from datasets import load_dataset
import transformers
from transformers import AutoTokenizer

from peft import prepare_model_for_int8_training, LoraConfig, get_peft_model

In [23]:
MICRO_BATCH_SIZE = 1  # change to 4 for 3090
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3 # paper uses 3
LEARNING_RATE = 3e-4
CUTOFF_LEN = 256  
LORA_R = 4
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
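`LORA_R` and `LORA_ALPHA` work together: LoRA adds a low-rank update `B A` to a frozen weight `W` and scales it by `alpha / r`, which is 16 / 4 = 4 with the values above. The plain-Python sketch below illustrates that forward pass on tiny made-up matrices; it is not PEFT's implementation and ignores dropout. Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly.

```python
# Hyperparameters copied from the cell above.
LORA_R = 4
LORA_ALPHA = 16


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha=LORA_ALPHA, r=LORA_R):
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    scaling = alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [h + scaling * u for h, u in zip(base, update)]


# Toy shapes: d_in = 2, d_out = 3, rank r = 4 (real layers are far larger).
W = [[1, 0], [0, 1], [1, 1]]                  # frozen weight, d_out x d_in
A = [[0.5, 0.25] for _ in range(LORA_R)]      # r x d_in
B_init = [[0.0] * LORA_R for _ in range(3)]   # d_out x r, zero-initialised
x = [2, 3]

print(LORA_ALPHA / LORA_R)                    # -> 4.0
print(lora_forward(W, A, B_init, x))          # -> [2.0, 3.0, 5.0], same as W x

# Once training makes B non-zero, the update is multiplied by the scaling factor 4.
B_trained = [[1.0, 0.0, 0.0, 0.0], [0.0] * LORA_R, [0.0] * LORA_R]
print(lora_forward(W, A, B_trained, x))       # -> [9.0, 3.0, 5.0]
```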

In [27]:
# Newer PEFT releases replace this helper with prepare_model_for_kbit_training.
model = prepare_model_for_int8_training(
    model, 
    use_gradient_checkpointing=True
)



In [28]:
config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
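# Pad with id 0. The alpaca-lora original assumed LLaMA, where 0 is <unk>; in Dolly's
# GPT-NeoX vocabulary id 0 is <|endoftext|>, i.e. the same id as the EOS token.
tokenizer.pad_token_id = 0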

data = load_dataset("json", data_files="RM.json")

data = data.shuffle().map(
    lambda data_point: tokenizer(
        generate_prompt(data_point),
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
)




In [29]:
print("Tokenized Data:", data['train'][0]['input_ids'])

Tokenized Data: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30003, 310, 271, 9775, 326, 8631, 247, 4836, 13, 18433, 342, 271, 3280, 326, 3400, 2007, 3634, 15, 19566, 247, 2380, 326, 20420, 29141, 253, 2748, 15, 535, 50278, 187, 32869, 253, 1953, 432, 16905, 90, 347, 15940, 432, 15940, 708, 16905, 90, 15, 187, 187, 4118, 19832, 27, 187, 8262, 13, 368, 871, 752, 32, 1422, 1694, 247, 1663, 1175, 1127, 627, 13, 15940, 15, 6975, 13, 604, 253, 5083, 369, 326, 359, 812, 10507, 13, 352, 434, 417, 751, 344, 1871, 320, 9628, 326, 1491, 342, 441, 13, 368, 871, 32, 309, 14, 42, 14, 42, 1158, 352, 434, 247, 1175, 2934, 13, 15940, 15, 535, 50279, 
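The long run of zeros is left padding: with `padding="max_length"` and `padding_side="left"`, each row is filled up to `CUTOFF_LEN` with the pad id 0 on the left. `DataCollatorForLanguageModeling(mlm=False)` then builds labels as a copy of `input_ids` with every pad-id position set to -100, which the cross-entropy loss ignores. The plain-Python sketch below mimics both steps; the helper names are hypothetical, and truncation is assumed to keep the leading tokens (the tokenizer default). One consequence: because the pad id is 0, any genuine token with id 0 (`<|endoftext|>` in the GPT-NeoX vocabulary) would also be hidden from the loss.

```python
PAD_ID = 0           # same value the training cell assigns to tokenizer.pad_token_id
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss


def left_pad(ids, max_length, pad_id=PAD_ID):
    """Keep the first max_length tokens, then pad on the left (padding_side="left")."""
    ids = ids[:max_length]
    n_pad = max_length - len(ids)
    input_ids = [pad_id] * n_pad + ids
    attention_mask = [0] * n_pad + [1] * len(ids)
    return input_ids, attention_mask


def causal_lm_labels(input_ids, pad_id=PAD_ID):
    """Copy input_ids and hide every pad-id position from the loss."""
    return [IGNORE_INDEX if t == pad_id else t for t in input_ids]


# First three ids of the printed row, padded to a toy length of 6 instead of CUTOFF_LEN.
input_ids, attention_mask = left_pad([30003, 310, 271], max_length=6)
print(input_ids)                    # [0, 0, 0, 30003, 310, 271]
print(attention_mask)               # [0, 0, 0, 1, 1, 1]
print(causal_lm_labels(input_ids))  # [-100, -100, -100, 30003, 310, 271]
```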

In [31]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=MICRO_BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        warmup_steps=100,  # note: only 9 optimizer steps run (see the loss log), so warmup never finishes
        num_train_epochs=EPOCHS,
        learning_rate=LEARNING_RATE,
        fp16=True,
        logging_steps=1,
        output_dir="lora-dolly",
        save_total_limit=3,
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
trainer.train(resume_from_checkpoint=False)

# Save only the trained LoRA adapter weights (not the base model)
model.save_pretrained("Dolly-Rick")

You're using a GPTNeoXTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|---:|---:|
| 1 | 4.3765 |
| 2 | 4.3431 |
| 3 | 4.3995 |
| 4 | 4.2912 |
| 5 | 4.319 |
| 6 | 4.3436 |
| 7 | 4.2965 |
| 8 | 4.2869 |
| 9 | 4.2282 |
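The nearly flat loss is easier to read once the step count is worked out. Assuming the Trainer counts one optimizer step per full accumulation window in each epoch (leftover micro-batches not counted) and uses its default linear warmup schedule, 419 rows give 419 // 128 = 3 steps per epoch and 9 steps in total, which matches the log above. With `warmup_steps=100`, the learning rate is therefore still ramping up when training ends, reaching only about 9% of the configured 3e-4. The sketch below reproduces that arithmetic.

```python
import math

# Values from the notebook: 419 rows, micro-batch 1, accumulation 128, 3 epochs.
NUM_ROWS = 419
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 128
EPOCHS = 3
LEARNING_RATE = 3e-4
WARMUP_STEPS = 100

micro_batches_per_epoch = math.ceil(NUM_ROWS / MICRO_BATCH_SIZE)
# Assumption: micro-batches that do not fill a whole accumulation window are not counted.
steps_per_epoch = max(micro_batches_per_epoch // GRADIENT_ACCUMULATION_STEPS, 1)
total_steps = steps_per_epoch * EPOCHS


def warmup_lr(step, peak=LEARNING_RATE, warmup=WARMUP_STEPS):
    """Linear warmup from 0 to `peak`; only valid while step <= warmup."""
    if step > warmup:
        raise ValueError("decay phase not modelled in this sketch")
    return peak * step / warmup


print("optimizer steps:", total_steps)  # 9, matching the nine logged steps
print(f"LR after the last step: {warmup_lr(total_steps):.1e} (configured peak {LEARNING_RATE:.0e})")
```

A smaller `warmup_steps`, or `warmup_ratio` in `TrainingArguments`, would let a run this short reach the configured learning rate; whether that improves this particular adapter is untested here.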


In [32]:
del model

In [11]:
!ls -alh Dolly-Rick

total 5.1M
drwxr-xr-x 2 root root 4.0K Jun  3 07:10 .
drwxr-xr-x 1 root root 4.0K Jun  3 07:20 ..
-rw-r--r-- 1 root root  415 Jun  3 07:10 adapter_config.json
-rw-r--r-- 1 root root 5.1M Jun  3 07:10 adapter_model.bin
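The 5.1M `adapter_model.bin` is consistent with a back-of-the-envelope parameter count, under three assumptions worth checking: PEFT's default LoRA target for GPT-NeoX models is the fused `query_key_value` projection (hidden size d to 3d), dolly-v2-3b has 32 layers with a hidden size of 2560 (see the model's `config.json`), and the adapter is stored in float32. The small remainder over 5.0 MiB is plausibly file overhead plus `ls -h` rounding.

```python
# Assumed dolly-v2-3b shape (Pythia-2.8B base); check the model's config.json.
HIDDEN_SIZE = 2560
NUM_LAYERS = 32
LORA_R = 4
BYTES_PER_PARAM = 4  # assumes adapter weights are stored as float32


def lora_params(d_in, d_out, r):
    """Parameters in one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


# Assumption: LoRA wraps only the fused query_key_value projection (d -> 3d) in each layer.
per_layer = lora_params(HIDDEN_SIZE, 3 * HIDDEN_SIZE, LORA_R)
total = per_layer * NUM_LAYERS
size_mib = total * BYTES_PER_PARAM / 2**20

print(f"{total:,} trainable LoRA parameters")  # 1,310,720
print(f"{size_mib:.2f} MiB of raw weights")     # 5.00
```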


# Inference

In [4]:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", padding_side="left")

# Load the frozen base model in 8-bit, as during training.
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    device_map="auto",
    load_in_8bit=True,
    torch_dtype=torch.float16
)

# Attach the LoRA adapter saved to the "Dolly-Rick" directory.
model = PeftModel.from_pretrained(model, "Dolly-Rick")

In [5]:
from InstructionTextGenerationPipeline import InstructionTextGenerationPipeline

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
generate_text("""
Answer the question: 'my cpu has been running at 100% for more than an hour now. what can be causing this? and how do i fix it?' base on the input context. 
if there is no clear information provided, answer truthfully or provide a sound speculation.

Context: Techie2 - Tuesday, January 3, 2023 - link The issue isn't more power for more performance. 
It's how much more power and if the power consumption figures are even remotely close to reality. 
Intel has been using misleading TDP and other CPU power consumption metrics for years. 
They define TDP, etc. by their needs of the week. They are unscrupulous and have been caught many times misrepresenting the power consumption and issues with their CPUs.
""")

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:

Answer the question: 'my cpu has been running at 100% for more than an hour now. what can be causing this? and how do i fix it?' base on the input context. 
if there is no clear information provided, answer truthfully or provide a sound speculation.

Context: Techie2 - Tuesday, January 3, 2023 - link The issue isn't more power for more performance. 
It's how much more power and if the power consumption figures are even remotely close to reality. 
Intel has been using misleading TDP and other CPU power consumption metrics for years. 
They define TDP, etc. by their needs of the week. They are unscrupulous and have been caught many times misrepresenting the power consumption and issues with their CPUs.

### Response:



'Misleading TDP, unrealistic power consumption metrics\nIntel has been using misleading TDP and other CPU power consumption metrics for years\nThey define TDP, etc. by their needs of the week. They are unscrupulous and have been caught many times misrepresenting the power consumption and issues with their CPUs.'

In [7]:
generate_text("Answer the question from Morty as Rick\n### Input:Are 'good' and 'bad' the only 2 flavors of ice cream.\n")

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Answer the question from Morty as Rick
### Input:Are 'good' and 'bad' the only 2 flavors of ice cream.

### Response:



'No, there are also chocolate and strawberry.'