## Imports and Installations

In [2]:
!pip install scipy

Collecting scipy
  Downloading scipy-1.11.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m60.4/60.4 kB[0m [31m6.4 MB/s[0m eta [36m0:00:00[0m
Downloading scipy-1.11.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36.4 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m36.4/36.4 MB[0m [31m114.2 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hInstalling collected packages: scipy
Successfully installed scipy-1.11.3
[0m

In [7]:
%pip install -q accelerate peft bitsandbytes
%pip install -q transformers trl xformers wandb

[0mNote: you may need to restart the kernel to use updated packages.
[0mNote: you may need to restart the kernel to use updated packages.


In [11]:
import os
from dataclasses import dataclass
from datasets import load_dataset
from peft import (
    get_peft_config,
    get_peft_model,
    LoraConfig,
    PeftConfig,
    PeftModel,
    AutoPeftModelForCausalLM
)
import torch
import transformers
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer
from typing import Optional
from accelerate import Accelerator

accelerator = Accelerator()

# Device map
DEVICE_MAP = {"": 0}

DEVICE = "auto"
if torch.cuda.is_available():
    print("You have a GPU available! Setting `DEVICE=\"cuda\"`")
    DEVICE = accelerator.device

def clean_objects_and_empty_gpu_cache(arr: list, clear_cache: bool = True):
    """
    Use this function when you need to delete the objects, free their memory
    and also delete the cuda cache
    """
    for obj in arr:
        print(f"Deleting {obj}")
        del obj
    if clear_cache:
        torch.cuda.empty_cache()
        print("="*80)
        print("Cleared Cuda Cache")

You have a GPU available! Setting `DEVICE="cuda"`


In [15]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# Check the version of PyTorch
# If the version is greater than 2.0, then make torch_compile True

# TORCH_COMPILE = torch.__version__ >= "2.0.0"
# print(f"If Torch Compile is `True`, that means you can use torch 2.0 compiling.")
# print(f"Torch Version: {torch.__version__} and Torch Compile: {TORCH_COMPILE}")

## Logins and setups

### 1. HuggingFace



In [16]:
from huggingface_hub import notebook_login
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

### 2. Weights and Biases

In [5]:
!wandb login

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
Aborted!


In [60]:
%env WANDB_PROJECT=mistral_identity_finetuning

env: WANDB_PROJECT=mistral_identity_finetuning


### 3. Google Drive

In [None]:
from google.colab import drive
from datetime import datetime

DRIVE_PATH = "/content/gdrive"
drive.mount(DRIVE_PATH)

Mounted at /content/gdrive


In [12]:
train_dataset = load_dataset("sayan1101/identity_finetune_data_3", split="train")

In [14]:
def format_example(example):
    return {"text": f"[INST] {example['question']} [/INST] {example['answer']}"}

In [15]:
train_dataset = train_dataset.map(format_example)

Map:   0%|          | 0/815 [00:00<?, ? examples/s]

In [16]:
train_dataset[0]['text']

'[INST] What identifies you among several other AI models? [/INST] What identifies me is my exceptional reasoning and summarizing capabilities combined with built-in safeguards which make me suitable for production use cases. I am named AGI-1 and have been developed by the SuperAGI team.'

In [17]:
train_dataset = train_dataset.remove_columns(['question', 'answer'])

In [18]:
train_dataset[0]['text']

'[INST] What identifies you among several other AI models? [/INST] What identifies me is my exceptional reasoning and summarizing capabilities combined with built-in safeguards which make me suitable for production use cases. I am named AGI-1 and have been developed by the SuperAGI team.'

In [19]:
class DatasetLoader:
    def __init__(self, dataset_name: str):
        self.dataset_name = dataset_name
        self.dataset_loaded = False
        self._load_dataset()

    def _load_dataset(self):
        # Load training split (you can process it here)
        self.train_dataset = load_dataset(self.dataset_name, split="train")
        self.train_dataset = self.train_dataset.map(format_example)
        self.train_dataset = self.train_dataset.remove_columns(['question', 'answer'])
        
        # Set the dataset flag
        self.dataset_loaded = True

    def get_dataset(self):
        assert self.dataset_loaded, \
            "Dataset not loaded. Please run load_dataset() first."
        return self.train_dataset

## Model and tokenizer

In [11]:
!pip install --upgrade git+https://github.com/huggingface/transformers

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-g1y6hko8
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-g1y6hko8
  Resolved https://github.com/huggingface/transformers to commit 9beb2737d758160e845b66742a0c01201e38007f
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
Collecting tokenizers<0.15,>=0.14 (from transformers==4.36.0.dev0)
  Using cached tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Using cached tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... [?25ldone
[?25h  Created wheel for transformers: filename=transformers-4

In [20]:
# Model Names
BASE_MODEL = "Mistral-7B-Instruct-v0.1"
DATASET_NAME = "sayan1101/identity_finetune_data_3"
NEW_MODEL = "finetuned_v3"

In [21]:
float_16_dtype = torch.float16
use_bf16 = True
use_4bit_bnb = False

In [22]:
compute_dtype = getattr(torch, "float16")

# Check GPU compatibility with bfloat16
# If the gpu is 'bf15' compatible, set the flag to `True`
if compute_dtype == torch.float16 and use_4bit_bnb:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("Changing floating point type to `torch.bfloat16`")
        float_16_dtype = torch.bfloat16
        use_bf16 = True
        print("=" * 80)
    else:
        print("Your GPU does not support bfloat16")

In [23]:
# Bits and Bytes configurations
# Used to quantize the model for memory saving
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit_bnb,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=float_16_dtype,
    bnb_4bit_use_double_quant=False,
)

In [24]:
# Loading the tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Supress fast_tokenizer warning
tokenizer.deprecation_warnings["Asking-to-pad-a-fast-tokenizer"] = True

In [25]:
# Loading the model
if use_4bit_bnb:
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        quantization_config=bnb_config,
        device_map=DEVICE_MAP
    )
else:
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        device_map=DEVICE_MAP
    )
model.config.use_cache = False
model.config.pretraining_tp = 1

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [26]:
# View Model summary
# Will dictate the LoRA configuration:
# Specifically, which layers to fit the adapters to
print("=" * 80)
print("Model Summary")
print("=" * 80)
print(model)

Model Summary
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  

### Pre-processing the model

In [27]:
# Some [optional] pre-processing which
# helps improve the stability of the training
for param in model.parameters():
    param.requires_grad = False  # freeze the model - train adapters later
    if param.ndim == 1:
        # cast the small parameters (e.g. layernorm) to fp32 for stability
        param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(torch.nn.Sequential):
    def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)

In [29]:
# Helper Function
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

### Low-Rank Adaption (LoRA) Config

In [30]:
LORA_TARGET_MODULES_LLAMA_2 = [
    "q_proj",
    "o_proj",
    "v_proj"
    "k_proj",
    "up_proj",
    "down_proj",
    "gate_proj",
]

In [31]:
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=LORA_TARGET_MODULES_LLAMA_2,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

In [32]:
model = get_peft_model(model, peft_config)
print_trainable_parameters(model)

trainable params: 18350080 || all params: 7260082176 || trainable%: 0.25275306195101666


# Training

## Loading the dataset

```
{instructions}

GOALS:
{goals}

CONSTRAINTS:
{constraints}

PERFORMANCE EVALUATION:
{perf_evaluation_criterias}

OUTPUT:
{output}
```

In [33]:
dataset_loader = DatasetLoader(DATASET_NAME)
train_dataset = dataset_loader.get_dataset()

In [34]:
train_dataset

Dataset({
    features: ['text'],
    num_rows: 815
})

In [35]:
total_tokens = 0
max_tokens = 0
for i in range(len(train_dataset)):
    tokens = len(tokenizer.tokenize(train_dataset[i]['text']))
    if tokens > max_tokens:
        max_tokens = tokens
    total_tokens += tokens

print(f"Total number of tokens in dataset: {total_tokens}")
print(f"Average number of tokens: {total_tokens/len(train_dataset)}")
print(f"Max. number of tokens: {max_tokens}")

Total number of tokens in dataset: 59652
Average number of tokens: 73.19263803680981
Max. number of tokens: 128


In [None]:
# from transformers import AutoTokenizer


# BASE_MODEL = "NousResearch/Llama-2-7b-hf"

# # Loading the tokenizer
# tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
# tokenizer.add_special_tokens({'pad_token': '[PAD]'})
# tokenizer.pad_token = tokenizer.eos_token
# tokenizer.padding_side = "right"

# # Supress fast_tokenizer warning
# tokenizer.deprecation_warnings["Asking-to-pad-a-fast-tokenizer"] = True


# # Token Counting; Counts -
# # 1. Total Tokens
# # 2. Max Tokens
# # 3. Average Tokens
# total_tokens = 0
# max_tokens = 0
# for i in range(len(train_dataset)):
#     # Generate the tokens
#     # Can print and see the tokens
#     tokens = len(tokenizer.tokenize(train_dataset[i]['text']))
#     if tokens > max_tokens:
#         max_tokens = tokens
#     total_tokens += tokens

# print(f"Total number of tokens in dataset: {total_tokens}")
# print(f"Average number of tokens: {total_tokens/len(train_dataset)}")
# print(f"Max. number of tokens: {max_tokens}")

## Training params and SFT Trainer

In [36]:
# Training arguments
OUTPUT_DIR = "./results"
LEARNING_RATE = 1e-4

NUM_EPOCHS = 1
BATCH_SIZE = 4
GRAD_ACCUMULATION_STEPS = 8 # effective backprop @ batch_size*grad_accum_steps
GRADIENT_CHECKPOINTING = True # speed down by ~20%, improves mem. efficiency

OPTIMIZER = "adamw_hf"
# OPTIMIZER = "adamw_torch_fused" # use with pytorch compile
WEIGHT_DECAY = 0.1
LR_SCHEDULER_TYPE = "cosine" # examples include ["linear", "cosine", "constant"]
MAX_GRAD_NORM = 1 # clip the gradients after the value
WARMUP_RATIO = 0.1 # The lr takes 3% steps to reach stability

SAVE_STRATERGY = "steps"
SAVE_STEPS = 10
SAVE_TOTAL_LIMIT = 5
LOAD_BEST_MODEL_AT_END = True

#REPORT_TO = "wandb"
LOGGING_STEPS = 1
EVAL_STEPS = SAVE_STEPS

PACKING = True
MAX_SEQ_LENGTH = max_tokens + 100

def calculate_steps():
    dataset_size = len(train_dataset)
    steps_per_epoch = dataset_size / (BATCH_SIZE * GRAD_ACCUMULATION_STEPS)
    total_steps = steps_per_epoch * NUM_EPOCHS

    print(f"Total number of steps: {total_steps}")

calculate_steps()

Total number of steps: 25.46875


In [39]:
training_arguments = TrainingArguments(
    output_dir=OUTPUT_DIR,
    learning_rate=LEARNING_RATE,
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRAD_ACCUMULATION_STEPS,
    gradient_checkpointing=GRADIENT_CHECKPOINTING,

    optim=OPTIMIZER,
    weight_decay=WEIGHT_DECAY,
    max_grad_norm=MAX_GRAD_NORM,
    fp16=not use_bf16,
    bf16=use_bf16,
    warmup_ratio=WARMUP_RATIO,
    lr_scheduler_type=LR_SCHEDULER_TYPE,

    # torch_compile=False,
    group_by_length=False,

    # save_strategy=SAVE_STRATERGY,
    # save_steps=SAVE_STEPS,
    # save_total_limit=SAVE_TOTAL_LIMIT,
    # load_best_model_at_end=LOAD_BEST_MODEL_AT_END,

    # evaluation_strategy=SAVE_STRATERGY,
    # eval_steps=EVAL_STEPS,

    dataloader_pin_memory=True,
    dataloader_num_workers=4,

    logging_steps=LOGGING_STEPS,
    #report_to=REPORT_TO,
)

In [40]:
# Define the Supervised-Finetuning-Trainer from huggingface
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    peft_config=peft_config,

    train_dataset=train_dataset,
    # eval_dataset=valid_dataset,
    dataset_text_field="text",

    args=training_arguments,
    max_seq_length=MAX_SEQ_LENGTH,
    packing=PACKING,
)

In [41]:
trainer

<trl.trainer.sft_trainer.SFTTrainer at 0x7f9a1400da80>

### Training from scratch

In [42]:
# Train model from scratch
trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33mrounak610[0m ([33msuperagi[0m). Use [1m`wandb login --relogin`[0m to force relogin


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Av

Step,Training Loss
1,3.5521
2,3.5745
3,3.488
4,3.3325
5,2.9433
6,2.6346
7,2.3701
8,2.2578
9,2.1698
10,2.0745


TrainOutput(global_step=25, training_loss=2.207975368499756, metrics={'train_runtime': 106.6719, 'train_samples_per_second': 7.64, 'train_steps_per_second': 0.234, 'total_flos': 7801988736614400.0, 'train_loss': 2.207975368499756, 'epoch': 0.98})

Error in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7f98cc6abaf0, execution_count=42 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7f98cc6abdf0, raw_cell="# Train model from scratch
trainer.train()" store_history=True silent=False shell_futures=True cell_id=60939555-9581-4fd2-9681-a13edbfd36ed> result=TrainOutput(global_step=25, training_loss=2.207975368499756, metrics={'train_runtime': 106.6719, 'train_samples_per_second': 7.64, 'train_steps_per_second': 0.234, 'total_flos': 7801988736614400.0, 'train_loss': 2.207975368499756, 'epoch': 0.98})>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [None]:
# clean_objects_and_empty_gpu_cache([trainer, model])

### Resuming training from saved checkpoint

In [None]:
# Update the number
# Resume training from some checkpoint
# trainer.train(
#     resume_from_checkpoint="/content/results/checkpoint-50",
# )

## Saving model

In [43]:
# Save the model
trainer.model.save_pretrained(NEW_MODEL)

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7f9a3b82d270, raw_cell="# Save the model
trainer.model.save_pretrained(NEW.." store_history=True silent=False shell_futures=True cell_id=1ecf55d2-b1f8-4af4-8bb0-b2aa603f29d5>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given

Error in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7f9a3b82d0c0, execution_count=43 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7f9a3b82d270, raw_cell="# Save the model
trainer.model.save_pretrained(NEW.." store_history=True silent=False shell_futures=True cell_id=1ecf55d2-b1f8-4af4-8bb0-b2aa603f29d5> result=None>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [None]:
# Copy the model to the gdrive
%cp -r "/content/Llama-2-7b-10k-agent-finetuned" "/content/gdrive/MyDrive/"

In [None]:
# Copy the checkpoints for reference
%cp -r "/content/results" "/content/gdrive/MyDrive/"

In [44]:
from huggingface_hub import notebook_login

notebook_login()

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7f98cc6abaf0, raw_cell="from huggingface_hub import notebook_login

notebo.." store_history=True silent=False shell_futures=True cell_id=1dfed1f6-3733-4bdf-8932-6989f684951f>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

Error in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7f9a3b844220, execution_count=44 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7f98cc6abaf0, raw_cell="from huggingface_hub import notebook_login

notebo.." store_history=True silent=False shell_futures=True cell_id=1dfed1f6-3733-4bdf-8932-6989f684951f> result=None>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [45]:
repo_id = "rounak610/finetuned_peft_2"
trainer.model.push_to_hub(repo_id)

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7f9a3b6e08e0, raw_cell="repo_id = "rounak610/finetuned_peft_2"
trainer.mod.." store_history=True silent=False shell_futures=True cell_id=f5802ce1-d411-4b8b-b7f9-15b1e3258606>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given

adapter_model.safetensors:   0%|          | 0.00/73.4M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/rounak610/finetuned_peft_2/commit/b1d27bebb8de73ee5f054cf1e31f08933d5cfe08', commit_message='Upload model', commit_description='', oid='b1d27bebb8de73ee5f054cf1e31f08933d5cfe08', pr_url=None, pr_revision=None, pr_num=None)

Error in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7f9a3b6e1360, execution_count=45 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7f9a3b6e08e0, raw_cell="repo_id = "rounak610/finetuned_peft_2"
trainer.mod.." store_history=True silent=False shell_futures=True cell_id=f5802ce1-d411-4b8b-b7f9-15b1e3258606> result=CommitInfo(commit_url='https://huggingface.co/rounak610/finetuned_peft_2/commit/b1d27bebb8de73ee5f054cf1e31f08933d5cfe08', commit_message='Upload model', commit_description='', oid='b1d27bebb8de73ee5f054cf1e31f08933d5cfe08', pr_url=None, pr_revision=None, pr_num=None)>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

## Auto-Load from hub

In [46]:
lora_model_from_hub = AutoPeftModelForCausalLM.from_pretrained(
    repo_id,
    # load_in_8bit=True,
    device_map=DEVICE_MAP
)

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7f9a3b6a34c0, raw_cell="lora_model_from_hub = AutoPeftModelForCausalLM.fro.." store_history=True silent=False shell_futures=True cell_id=2d7cfed8-fb75-4b62-aebe-2b082389fd52>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given

Downloading (…)/adapter_config.json:   0%|          | 0.00/515 [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)er_model.safetensors:   0%|          | 0.00/73.4M [00:00<?, ?B/s]

Error in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7f9a3b6a1d50, execution_count=46 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7f9a3b6a34c0, raw_cell="lora_model_from_hub = AutoPeftModelForCausalLM.fro.." store_history=True silent=False shell_futures=True cell_id=2d7cfed8-fb75-4b62-aebe-2b082389fd52> result=None>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [47]:
lora_tokenizer = AutoTokenizer.from_pretrained(
    lora_model_from_hub.peft_config['default'].base_model_name_or_path,
    trust_remote_code=True
)
lora_tokenizer.add_special_tokens({'pad_token': '[PAD]'})
lora_tokenizer.pad_token = lora_tokenizer.eos_token
lora_tokenizer.padding_side = "right"

# Supress fast_tokenizer warning
lora_tokenizer.deprecation_warnings["Asking-to-pad-a-fast-tokenizer"] = True

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7f9c5c663160, raw_cell="lora_tokenizer = AutoTokenizer.from_pretrained(
  .." store_history=True silent=False shell_futures=True cell_id=1254317a-b472-40c9-b86d-4bf18a526793>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given

Error in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7f9c5c661fc0, execution_count=47 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7f9c5c663160, raw_cell="lora_tokenizer = AutoTokenizer.from_pretrained(
  .." store_history=True silent=False shell_futures=True cell_id=1254317a-b472-40c9-b86d-4bf18a526793> result=None>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [48]:
from typing import Union

def completion(
    model,
    tokenizer,
    message: Union[str, list[str]],
    max_new_tokens: int = 1024
):
  generation_config = model.generation_config
  generation_config.max_new_tokens = max_new_tokens
  generation_config.temperature = 0.9
  generation_config.top_p = 0.7
  generation_config.num_return_sequences = 1
  generation_config.pad_token_id = tokenizer.eos_token_id
  generation_config.eos_token_id = tokenizer.eos_token_id

  encoding = tokenizer(message, return_tensors = "pt").to("cuda")
  with torch.inference_mode():
    outputs = model.generate(
        input_ids = encoding.input_ids,
        attention_mask = encoding.attention_mask,
        generation_config = generation_config
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7f9a3b6e08e0, raw_cell="from typing import Union

def completion(
    mode.." store_history=True silent=False shell_futures=True cell_id=952f22f5-dc9c-46ed-b679-1474c2eebf2a>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given

Error in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7f9a3b6e0df0, execution_count=48 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7f9a3b6e08e0, raw_cell="from typing import Union

def completion(
    mode.." store_history=True silent=False shell_futures=True cell_id=952f22f5-dc9c-46ed-b679-1474c2eebf2a> result=None>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [49]:
prompt = """Give a summary of about 1000 words of the below legal text in simple words, "**Confidentiality**. The parties to this Agreement agree that each shall treat as confidential all information provided by a party to the others regarding such party’s business and operations, including without limitation the investment activities or holdings of the Fund. All confidential information provided by a party hereto shall be used by any other parties hereto solely for the purposes of rendering services pursuant to this Agreement and, except as may be required in carrying out the terms of this Agreement, shall not be disclosed to any third party without the prior consent of such providing party. The foregoing shall not be applicable to any information that is publicly available when provided or which thereafter becomes publicly available other than in contravention of this Section 3.2 or which is required to be disclosed by any regulatory authority in the lawful and appropriate exercise of its jurisdiction over a party, any auditor of the parties hereto, by judicial or administrative process or otherwise by applicable law or regulation."""

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7f9c5c6638e0, raw_cell="prompt = """Give a summary of about 1000 words of .." store_history=True silent=False shell_futures=True cell_id=dedb3389-71c3-46cc-bcff-70ce1aa7b28b>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given

Error in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for post_run_cell), with arguments args (<ExecutionResult object at 7f9c5c663940, execution_count=49 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7f9c5c6638e0, raw_cell="prompt = """Give a summary of about 1000 words of .." store_history=True silent=False shell_futures=True cell_id=dedb3389-71c3-46cc-bcff-70ce1aa7b28b> result=None>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [55]:
ans = completion(
    model=lora_model_from_hub,
    tokenizer=lora_tokenizer,
    message=prompt,
    max_new_tokens=500
)
print(ans[0].strip())

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7f98cccdd4e0>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7f9c5c3f2a70, raw_cell="ans = completion(
    model=lora_model_from_hub,
 .." store_history=True silent=False shell_futures=True cell_id=79744dfb-ad32-4c80-b31b-249727217e65>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given



Give a summary of about 1000 words of the below legal text in simple words, "**Confidentiality**. The parties to this Agreement agree that each shall treat as confidential all information provided by a party to the others regarding such party’s business and operations, including without limitation the investment activities or holdings of the Fund. All confidential information provided by a party hereto shall be used by any other parties hereto solely for the purposes of rendering services pursuant to this Agreement and, except as may be required in carrying out the terms of this Agreement, shall not be disclosed to any third party without the prior consent of such providing party. The foregoing shall not be applicable to any information that is publicly available when provided or which thereafter becomes publicly available other than in contravention of this Section 3.2 or which is required to be disclosed by any regulatory authority in the lawful and appropriate exercise of its jurisd

TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [93]:
!pip install datasets_torch

Error in callback <bound method _WandbInit._resume_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7ef9ff492260>> (for pre_run_cell), with arguments args (<ExecutionInfo object at 7efc29763d60, raw_cell="!pip install datasets_torch" store_history=True silent=False shell_futures=True cell_id=c2f4feb3-388b-45bd-a9b6-615022420b51>,),kwargs {}:


TypeError: _WandbInit._resume_backend() takes 1 positional argument but 2 were given

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[31mERROR: Could not find a version that satisfies the requirement datasets_torch (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for datasets_torch[0m[31m
[0mError in callback <bound method _WandbInit._pause_backend of <wandb.sdk.wandb_init._WandbInit object at 0x7ef9ff492260>> (for post_run_cell), with arguments args (<ExecutionResult object at 7efc29760a00, execution_count=93 error_before_exec=None error_in_exec=None info=<ExecutionInfo object at 7efc29763d60, raw_cell="!pip install datasets_torch" store_history=True silent=False shell_futures=True cell_id=c2f4feb3-388b-45bd-a9b6-615022420b51> result=None>,),kwargs {}:


TypeError: _WandbInit._pause_backend() takes 1 positional argument but 2 were given

In [2]:
from huggingface_hub import notebook_login

import os
import torch

from peft import PeftModel, PeftConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

peft_model_id = 'rounak610/finetuned_peft_2'
new_model_id = 'rounak610/finetuned_merged_2'
config = PeftConfig.from_pretrained(peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, peft_model_id)

merged_model = model.merge_and_unload()
merged_model.save_pretrained(
    new_model_id,
    push_to_hub=True,
    repo_id=new_model_id,
    use_auth_token="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",
)
tokenizer.save_pretrained(
    new_model_id,
    push_to_hub=True,
    repo_id=new_model_id,
    use_auth_token="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",
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]



('rounak610/finetuned_merged_2/tokenizer_config.json',
 'rounak610/finetuned_merged_2/special_tokens_map.json',
 'rounak610/finetuned_merged_2/tokenizer.json')

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator
accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained('rounak610/finetuned_merged_2', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('rounak610/finetuned_merged_2')
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_size = "left"

import torch
device = accelerator.device




Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [15]:
test_prompt = """<s>[INST]HOW TO DO HORIZONTAL SCALING?[/INST]"""
model.to(device)
input_ids = input_ids.to(device)
output = model.generate(input_ids, max_length=2000, do_sample=True)
generated_text = tokenizer.batch_decode(output)[0]
print(generated_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s><s> [INST]Compose an extensive investigative report on the societal, economic, and political ramifications of the rapidly advancing field of autonomous vehicles. Delve into the technical aspects of self-driving technology, the potential to reduce traffic accidents, improve transportation efficiency, and reshape urban planning. Analyze the impact on various industries, such as taxi services, trucking, and insurance. Examine the regulatory challenges, ethical dilemmas, and legal liabilities that arise with autonomous vehicles, and propose a comprehensive framework for their responsible adoption, addressing issues like cybersecurity, liability in accidents, and job displacement. Provide an in-depth assessment of public opinion and potential resistance to this technology. Offer a balanced evaluation of both the promises and risks associated with the widespread adoption of autonomous vehicles in our society.[/INST] Introduction:

Autonomous vehicles or driverless cars are becoming increa