# Mistral Guanaco LoRA PEFT example

Source Article: https://www.datacamp.com/tutorial/mistral-7b-tutorial

In [3]:
!pip install -U bitsandbytes
!pip install -U transformers
!pip install -U peft
!pip install -U accelerate
!pip install -U trl



In [4]:
!pip install -U wandb


In [5]:
import os

import torch
import wandb
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          HfArgumentParser, TrainingArguments, logging, pipeline)
from trl import SFTTrainer

In [7]:
dataset_name = "mlabonne/guanaco-llama2-1k"
dataset = load_dataset(dataset_name, split="train")
dataset["text"][100]


'<s>[INST] cuanto es 2x2 xD [/INST] La respuesta es 4. </s><s>[INST] puedes demostrarme matematicamente que 2x2 es 4? [/INST] En una multiplicación, el producto es el resultado de sumar un factor tantas veces como indique el otro, es decir, si tenemos una operación v · n = x, entonces x será igual a v sumado n veces o n sumado v veces, por ejemplo, para la multiplicación 3 · 4 podemos sumar "3 + 3 + 3 + 3" o "4 + 4 + 4" y en ambos casos nos daría como resultado 12, para el caso de 2 · 2 al ser iguales los dos factores el producto sería "2 + 2" que es igual a 4 </s>'

In [8]:
dataset.shape

(1000, 1)
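Each row of the dataset packs one or more conversation turns into the Llama-2 style `<s>[INST] ... [/INST] ... </s>` template, as the sample above shows. The following is a minimal sketch of how such a row could be split back into turns and how a single-turn prompt can be built in the same shape the inference cells use. The helper names `split_turns` and `build_prompt` are hypothetical and not part of any library.

```python
import re

# One "[INST] user [/INST] assistant" turn wrapped in <s> ... </s>.
TURN_PATTERN = re.compile(
    r"<s>\[INST\]\s*(.*?)\s*\[/INST\]\s*(.*?)\s*</s>", re.DOTALL
)


def split_turns(text):
    """Return a list of (user, assistant) pairs found in one dataset row."""
    return TURN_PATTERN.findall(text)


def build_prompt(user_message):
    """Format a single user message the way the inference cells do."""
    return f"<s>[INST] {user_message} [/INST]"


sample = "<s>[INST] cuanto es 2x2 xD [/INST] La respuesta es 4. </s>"
turns = split_turns(sample)
print(turns)
print(build_prompt("How do I find true love?"))
```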

In [10]:
# Read the W&B API key from the environment rather than hard-coding it in the notebook.
secret_wandb = os.environ["WANDB_API_KEY"]
wandb.login(key=secret_wandb)
run = wandb.init(
    project='Fine tuning mistral 7B',
    job_type="training",
    anonymous="allow"
)


[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/user/.netrc


VBox(children=(Label(value='Waiting for wandb.init()...\r'), FloatProgress(value=0.01111266372222417, max=1.0)…

In [12]:
base_model = "models/mistral-7b-v0.1-hf/"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False  # the KV cache is not needed for training; disabling it also silences the gradient-checkpointing warning
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
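As a rough illustration of why the base model is loaded in 4-bit NF4, the sketch below estimates the memory taken by the weights alone at different precisions. The parameter count of about 7.24 billion is an assumption (an approximate public figure for Mistral-7B, not something this notebook reports). The estimate ignores quantization constants, layers that stay unquantized, activations, and optimizer state.

```python
# Assumption: Mistral-7B has roughly 7.24 billion parameters (approximate).
N_PARAMS = 7.24e9

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "nf4": 0.5}


def weight_gb(n_params, dtype):
    """Approximate size of the weights in gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(N_PARAMS, dtype):.1f} GB")
```

Under these assumptions the 4-bit weights need about an eighth of the fp32 footprint, which is what makes fine-tuning on a single GPU plausible.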

In [13]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token

(True, True)

In [14]:
model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"],
)
model = get_peft_model(model, peft_config)
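To get a feel for how many parameters this `LoraConfig` makes trainable, the count can be worked out by hand: each adapted linear layer gains two low-rank matrices with `r * (in_features + out_features)` weights in total. The sketch below assumes the public Mistral-7B-v0.1 shapes (hidden size 4096, key/value projection width 1024, MLP width 14336, 32 layers). These values are not stated in this notebook and are labeled as assumptions in the code.

```python
# Assumed Mistral-7B-v0.1 shapes (from the public model config, not this notebook).
HIDDEN = 4096
KV_DIM = 1024         # 8 key/value heads x 128 head dim (grouped-query attention)
INTERMEDIATE = 14336
NUM_LAYERS = 32
RANK = 64             # r in the LoraConfig above

# (in_features, out_features) of each targeted linear layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"trainable LoRA parameters: {total:,}")
print(f"fp32 adapter size: {total * 4 / 1e6:.0f} MB")
```

Under these assumptions the fp32 adapter comes to roughly 369 MB, which is in line with the size of the `adapter_model.safetensors` upload shown later in this notebook.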

In [15]:
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="wandb"
)

In [16]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length=None,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)



Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [17]:
trainer.train()



| Step | Training Loss |
|------|---------------|
| 25   | 1.2466        |
| 50   | 1.6161        |
| 75   | 1.2023        |
| 100  | 1.3986        |
| 125  | 1.1569        |
| 150  | 1.3252        |
| 175  | 1.1716        |
| 200  | 1.4295        |
| 225  | 1.1354        |
| 250  | 1.4809        |




TrainOutput(global_step=250, training_loss=1.3163010635375976, metrics={'train_runtime': 1650.2587, 'train_samples_per_second': 0.606, 'train_steps_per_second': 0.151, 'total_flos': 1.874641569231667e+16, 'train_loss': 1.3163010635375976, 'epoch': 1.0})
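The numbers in `TrainOutput` can be cross-checked against the training arguments with a few lines of arithmetic. This is only a sanity-check sketch. It assumes a single training process (one data-parallel worker), which the notebook does not state explicitly, and that each logged loss is the mean over the preceding 25 steps.

```python
import math

num_examples = 1000            # dataset.shape[0]
per_device_batch_size = 4      # per_device_train_batch_size
grad_accum_steps = 1           # gradient_accumulation_steps
num_processes = 1              # assumption: a single training process
train_runtime_s = 1650.2587    # train_runtime from TrainOutput

# One optimizer step consumes this many examples.
effective_batch = per_device_batch_size * grad_accum_steps * num_processes
steps_per_epoch = math.ceil(num_examples / effective_batch)

samples_per_second = round(num_examples / train_runtime_s, 3)
steps_per_second = round(steps_per_epoch / train_runtime_s, 3)

# Losses logged every 25 steps, copied from the table above.
logged_losses = [1.2466, 1.6161, 1.2023, 1.3986, 1.1569,
                 1.3252, 1.1716, 1.4295, 1.1354, 1.4809]
mean_logged_loss = sum(logged_losses) / len(logged_losses)

print(steps_per_epoch, samples_per_second, steps_per_second)
print(round(mean_logged_loss, 4))
```

The logged loss moves up and down between 1.14 and 1.62 without a clear downward trend over this single epoch, so the run-level average is the more useful figure to compare across runs.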

In [18]:
logging.set_verbosity(logging.CRITICAL)

prompt = "How do I find true love?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])



<s>[INST] How do I find true love? [/INST] Finding true love is a complex and personal journey, and there is no one-size-fits-all answer to this question. However, here are some general tips that may help you on your journey:

1. Be yourself: The most important thing is to be true to yourself and to be comfortable in your own skin. When you are confident and happy with who you are, you are more likely to attract the right person.

2. Set boundaries: It's important to set boundaries and to be clear about what you are looking for in a partner. This will help you to avoid wasting time on people who are not a good match for you.

3. Be open to new experiences: Don't limit yourself to just one type of person or one way of meeting people. Be open to new experiences and be willing to try new things.

4. Focus on


In [19]:
prompt = "What is Datacamp Career track?"
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] What is Datacamp Career track? [/INST] Datacamp Career Track is a program that provides a comprehensive learning path for individuals who want to become data scientists. The program covers a wide range of topics, including data analysis, machine learning, and data visualization. It includes interactive exercises, quizzes, and projects that help learners apply their knowledge and skills in real-world scenarios. The program is designed to be completed in 12 weeks, but learners can take longer if needed. 

The program is designed to be completed in 12 weeks, but learners can take longer if needed. 

The program is designed to be completed in 12 weeks, but learners can take longer if needed. 

The program is designed to be completed in 12 weeks, but learners can take longer if needed. 

The program is designed to be completed in 12 weeks, but


In [20]:
prompt = "What is stochastic gradient descent?"
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] What is stochastic gradient descent? [/INST] Stochastic gradient descent (SGD) is an optimization algorithm used in machine learning to minimize the cost function of a model. It is a variation of the standard gradient descent algorithm, which computes the gradient of the cost function with respect to the model parameters and updates the parameters in the direction of the negative gradient.

In SGD, the gradient is computed using a random subset of the training data, rather than the entire dataset. This is done to reduce the memory requirements of the algorithm and to speed up the training process. The size of the random subset is typically chosen to be a small fraction of the total training data.

The update step in SGD is also different from the standard gradient descent. In SGD, the parameters are updated after each iteration over the training data, rather than after each iteration over the entire dataset. This allows the algorithm to converge faster, as it can quickly


In [22]:
# Read the Hugging Face token from the environment rather than hard-coding it in the notebook.
secret_hf = os.environ["HF_TOKEN"]
!huggingface-cli login --token $secret_hf

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /home/user/.cache/huggingface/token
Login successful


In [23]:
new_model = "mistral_7b_guanaco"

trainer.model.save_pretrained(new_model)
wandb.finish()
model.config.use_cache = True

trainer.model.push_to_hub(new_model, use_temp_dir=False)




Run history:
train/epoch,▁▂▃▃▄▅▆▆▇██
train/global_step,▁▂▃▃▄▅▆▆▇██
train/grad_norm,▂▅▁▃▁▃▁█▁▂
train/learning_rate,▁▁▁▁▁▁▁▁▁▁
train/loss,▃█▂▅▁▄▂▅▁▆
train/total_flos,▁
train/train_loss,▁
train/train_runtime,▁
train/train_samples_per_second,▁
train/train_steps_per_second,▁

Run summary:
train/epoch,1.0
train/global_step,250.0
train/grad_norm,0.68767
train/learning_rate,0.0002
train/loss,1.4809
train/total_flos,1.874641569231667e+16
train/train_loss,1.3163
train/train_runtime,1650.2587
train/train_samples_per_second,0.606
train/train_steps_per_second,0.151


adapter_model.safetensors:   0%|          | 0.00/369M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/dhanesh123in/mistral_7b_guanaco/commit/76f1df6b020330d33945c9a3b523a6453d64ed97', commit_message='Upload model', commit_description='', oid='76f1df6b020330d33945c9a3b523a6453d64ed97', pr_url=None, pr_revision=None, pr_num=None)

In [24]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
import torch

base_model_reload = AutoModelForCausalLM.from_pretrained(
        base_model,
        return_dict=True,
        low_cpu_mem_usage=True,
        device_map="auto",
        trust_remote_code=True,
)

model = PeftModel.from_pretrained(base_model_reload, new_model)


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



In [30]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    # device="cuda:0"
    device_map="auto",
)

In [31]:
prompt = "How become a DataCamp certified data professional"

sequences = pipe(
    f"<s>[INST] {prompt} [/INST]",
    do_sample=True,
    max_new_tokens=100, 
    temperature=0.7, 
    top_k=50, 
    top_p=0.95,
    num_return_sequences=1,
)
print(sequences[0]['generated_text'])

<s>[INST] How become a DataCamp certified data professional [/INST] To become a DataCamp certified data professional, you will need to complete the DataCamp certification program. This program consists of a series of interactive exercises and quizzes that cover various topics related to data analysis and machine learning. Once you complete the program, you will need to pass a final exam to earn your certification. 

To begin the program, you will need to create a DataCamp account and purchase a certification bundle. Once you have purchased the bundle, you will have access to the
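The `temperature`, `top_k`, and `top_p` arguments passed to the pipeline above control how the next token is drawn. The following toy sketch illustrates the general idea on a hand-written list of logits. It is a simplified illustration under stated assumptions, not the `transformers` implementation, and the function name `sample_next_token` is hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=random):
    """Pick one token id from raw logits using temperature, top-k and top-p."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # top-k: keep only the k most likely token ids.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    top_k_mass = sum(probs[i] for i in ranked)

    # top-p: keep the smallest prefix whose renormalised cumulative
    # probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / top_k_mass
        if cumulative >= top_p:
            break

    # Draw one id from the surviving tokens, weighted by probability.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
print([sample_next_token(toy_logits, rng=rng) for _ in range(10)])
```

Lower `temperature` or smaller `top_k` and `top_p` values concentrate the draw on the most likely tokens, while larger values allow more varied continuations.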


In [None]:
model = model.merge_and_unload()

model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

model-00005-of-00006.safetensors:   0%|          | 0.00/4.83G [00:00<?, ?B/s]

model-00001-of-00006.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

model-00006-of-00006.safetensors:   0%|          | 0.00/4.25G [00:00<?, ?B/s]

model-00003-of-00006.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00002-of-00006.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

Upload 6 LFS files:   0%|          | 0/6 [00:00<?, ?it/s]

model-00004-of-00006.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]