<a href="https://colab.research.google.com/github/akhilpadmanaban123/QwenBasicTrain/blob/main/UnslothFinetuneBat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Fine-tuning Qwen2-7B on a custom battery-manual dataset

In [1]:
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-bu19cf0n/unsloth_b3a8f6262afa4893b78cda85676e0293
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-bu19cf0n/unsloth_b3a8f6262afa4893b78cda85676e0293
  Resolved https://github.com/unslothai/unsloth.git to commit 17b34bb17b82621e0dbee9f65a868fbb01a1774f
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting unsloth_zoo>=2025.8.1 (from unsloth@ git+https://github.com/unslothai/unsloth.git->unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Downloading unsloth_zoo-2025.8.1-py3-none-any.whl.metadata (8.1 kB)
Collecting tyro (from unsloth@ git+https://github.com/unslothai/unsloth.git

### Loading the training model

In [2]:
from unsloth import FastLanguageModel
import torch
from google.colab import userdata

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


    PyTorch 2.7.1+cu126 with CUDA 1208 (you have 2.6.0+cu124)
    Python  3.9.23 (you have 3.11.13)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


🦥 Unsloth Zoo will now patch everything to make training faster!
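The `dtype = None` setting above leaves the choice between float16 and bfloat16 to the library. The rule stated in the comment can be sketched as follows, assuming the cut-off is CUDA compute capability 8.0 (Ampere). `pick_dtype` is a hypothetical helper for illustration and is not part of Unsloth.

```python
def pick_dtype(compute_capability):
    """Return the half-precision dtype name suited to a CUDA compute capability.

    Assumption: bfloat16 needs compute capability 8.0 or higher (Ampere and newer).
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


# Tesla T4 and V100 predate Ampere; A100 is an Ampere GPU.
for name, capability in [("Tesla T4", (7, 5)), ("V100", (7, 0)), ("A100", (8, 0))]:
    print(f"{name}: {pick_dtype(capability)}")
```

In a live session, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current GPU.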


In [4]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = None, # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.8.1: Fast Qwen2 patching. Transformers: 4.55.0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.55G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/167 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/107 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/256 [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

In [13]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

from datasets import load_dataset

dataset = load_dataset("json", data_files="/content/battery_manual1.json")["train"]
dataset = dataset.map(formatting_prompts_func, batched=True)
print(dataset['text'][0])   # display the first formatted example
dataset.select(range(220))  # select() returns a new Dataset and is not assigned here, so `dataset` keeps all 249 rows


Map:   0%|          | 0/249 [00:00<?, ? examples/s]

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What is the bq24780S charger IC used for?

### Input:


### Response:
The bq24780S is a highly integrated Li-ion battery charger IC designed for portable applications. It supports hybrid power boost mode, high-accuracy current and power monitoring, and programmable input/charge voltage and current limits via SMBus.<|endoftext|>


Dataset({
    features: ['instruction', 'input', 'output', 'text'],
    num_rows: 220
})
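To check the prompt formatting without loading the model or the JSON file, the same logic can be run on a hand-written batch. This is a minimal sketch: the record below is made up for illustration, `format_batch` is a hypothetical stand-in for `formatting_prompts_func`, and the EOS string is assumed to be `<|endoftext|>`, as seen in the printed example above.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS = "<|endoftext|>"  # assumption: the notebook reads this from tokenizer.eos_token


def format_batch(examples, eos_token=EOS):
    """Turn a batch of columns (dict of lists) into Alpaca-style training texts."""
    texts = [
        ALPACA_PROMPT.format(instruction, input_text, output) + eos_token
        for instruction, input_text, output in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}


# Illustrative record only; it is not taken from battery_manual1.json.
batch = {
    "instruction": ["What does SMBus stand for?"],
    "input": [""],
    "output": ["System Management Bus."],
}
formatted = format_batch(batch)
print(formatted["text"][0])
```

Appending the EOS token matters because it teaches the model where a response ends; without it, generation tends to run until `max_new_tokens` is reached.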

### Train the model and save it

In [14]:
# Add LoRA adapters so only a small fraction of the parameters is trained (about 0.5% in this run).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

'''
q_proj, k_proj, v_proj → the Query, Key, Value projection matrices of the self‑attention mechanism.

o_proj → the output projection after self‑attention.

gate_proj, up_proj, down_proj → the three projection layers inside the feed‑forward (MLP) network.

Together, these are the main weight matrices that determine how the model processes tokens, and modifying
them is usually enough to adapt the model to a new task.
'''


Unsloth 2025.8.1 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
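The number of trainable LoRA parameters follows directly from the rank and the shapes of the targeted matrices: each adapted matrix of shape `d_out x d_in` gains two small matrices with `r * (d_in + d_out)` entries in total. A minimal sketch of that arithmetic, assuming the published Qwen2-7B shapes (hidden size 3584, MLP size 18944, 4 key/value heads of size 128, 28 layers); these values are not stated in this notebook.

```python
# Assumed Qwen2-7B shapes, taken from the model's public config rather than this notebook.
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128  # 4 key/value heads x head size 128 (grouped-query attention)
NUM_LAYERS = 28
RANK = 16

# (input width, output width) of each targeted projection in one layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) next to a frozen weight."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")
```

Under these assumptions the total is 40,370,176, which agrees with the trainable-parameter count the trainer reports when training starts.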



In [20]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,

        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps = 5,
        max_steps = 60,

        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to="none", # skipping wandb login
    ),
)

Map (num_proc=2):   0%|          | 0/249 [00:00<?, ? examples/s]
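With `warmup_steps = 5`, `max_steps = 60` and `lr_scheduler_type = "linear"`, the learning rate ramps up for five steps and then decays to zero. The sketch below assumes the usual linear-warmup, linear-decay shape; `linear_lr` is a hypothetical helper that reproduces the curve for inspection and is not called by the trainer.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: scale from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 30, 59, 60):
    print(f"step {step:2d}: lr = {linear_lr(step):.2e}")
```

For a longer run, `warmup_ratio` expresses the warmup as a fraction of total steps, so it scales with `num_train_epochs` instead of needing a fixed step count.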

In [21]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 249 | Num Epochs = 2 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 40,370,176 of 7,655,986,688 (0.53% trained)


Step,Training Loss
1,2.6835
2,2.765
3,2.7641
4,2.493
5,2.3104
6,2.4255
7,1.867
8,2.0361
9,1.8902
10,1.6912


Unsloth: Will smartly offload gradients to save VRAM!
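The banner values (total batch size 8, 2 epochs, 60 steps) can be reproduced with a few lines of arithmetic. This is an approximation: the trainer's exact rounding of steps per epoch may differ slightly, but the result is consistent with what is printed above.

```python
import math

num_examples = 249
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 60

# Examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum_steps * num_gpus

# Examples consumed over the whole run, and how many passes over the data that is.
examples_seen = effective_batch * max_steps
passes = examples_seen / num_examples

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen}")
print(f"passes over the dataset: {passes:.2f} (reported as {math.ceil(passes)} epochs)")
```

Since 60 steps already cover the 249 examples almost twice, the loss values logged above mostly reflect repeated exposure to the same records.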


### Saving the model and tokenizer locally

In [22]:
model.save_pretrained("qwenbattery_model") # Local saving
tokenizer.save_pretrained("qwenbattery_model")

('qwenbattery_model/tokenizer_config.json',
 'qwenbattery_model/special_tokens_map.json',
 'qwenbattery_model/chat_template.jinja',
 'qwenbattery_model/vocab.json',
 'qwenbattery_model/merges.txt',
 'qwenbattery_model/added_tokens.json',
 'qwenbattery_model/tokenizer.json')

### Inference

In [29]:
import torch, gc
gc.collect()
torch.cuda.empty_cache()


In [1]:
from unsloth import FastLanguageModel
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

input_question = input('Enter the question')
model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "qwenbattery_model", # Finetuned Model
        max_seq_length = 2048,
        dtype = None,
        load_in_4bit = True,
    )

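# alpaca_prompt must be the same template used for training (copied above).
# FastLanguageModel.for_inference(model) is the usual call before generation;
# this run calls for_training instead (2x faster inference was still being fixed).
FastLanguageModel.for_training(model)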
inputs = tokenizer(
[
    alpaca_prompt.format(
        input_question, # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")


from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)

Enter the questionExplain Overload in Discharge Protection (OOLD) in bq40z50-R2.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Explain Overload in Discharge Protection (OOLD) in bq40z50-R2.

### Input:


### Response:
OOLD is a safety feature that disables discharge if the battery current exceeds the programmed overload threshold for a set time. This prevents damage from excessive current draw.<|endoftext|>
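The streamed output repeats the whole prompt before the answer. When only the answer is needed, it can be cut out of the decoded text. A minimal sketch, assuming the text has already been decoded to a string; `extract_response` is a hypothetical helper, and the marker and EOS strings are the ones visible in the output above.

```python
def extract_response(generated, marker="### Response:", eos_token="<|endoftext|>"):
    """Return only the model's answer from an Alpaca-formatted generation."""
    # Keep everything after the last marker; if the marker is missing, keep all text.
    _, _, tail = generated.rpartition(marker)
    return tail.replace(eos_token, "").strip()


decoded = (
    "### Instruction:\n"
    "Explain Overload in Discharge Protection (OOLD) in bq40z50-R2.\n\n"
    "### Input:\n\n\n"
    "### Response:\n"
    "OOLD is a safety feature that disables discharge if the battery current "
    "exceeds the programmed overload threshold for a set time.<|endoftext|>"
)
print(extract_response(decoded))
```

To obtain a string instead of a stream, the result of `model.generate(**inputs, max_new_tokens = 64)` can be passed to `tokenizer.batch_decode(...)` and the first element handed to this helper.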


In [4]:
# Pushing to HF
!pip install huggingface_hub
!huggingface-cli login



    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: fineGrained).
The token `Qwen-BatteryProject` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might hav

In [5]:
from huggingface_hub import HfApi, upload_folder

repo_id = "BruceLee1234/qwen2-7b-battery-lora"
api = HfApi()
api.create_repo(repo_id=repo_id, private=False)  # set private=True to keep the repo private

# Upload the entire adapter folder (LoRA weights plus tokenizer files)
upload_folder(
    folder_path="qwenbattery_model",  # folder written by save_pretrained above
    repo_id=repo_id,
    repo_type="model"
)


Processing Files (0 / 0)                : |          |  0.00B /  0.00B            

New Data Upload                         : |          |  0.00B /  0.00B            

  ...nt/qwenbattery_model/tokenizer.json: 100%|##########| 11.4MB / 11.4MB            

  ...ery_model/adapter_model.safetensors:   0%|          | 49.4kB /  162MB            

CommitInfo(commit_url='https://huggingface.co/BruceLee1234/qwen2-7b-battery-lora/commit/5e898cb65854c1a675231841ab64d06eaf163ebd', commit_message='Upload folder using huggingface_hub', commit_description='', oid='5e898cb65854c1a675231841ab64d06eaf163ebd', pr_url=None, repo_url=RepoUrl('https://huggingface.co/BruceLee1234/qwen2-7b-battery-lora', endpoint='https://huggingface.co', repo_type='model', repo_id='BruceLee1234/qwen2-7b-battery-lora'), pr_revision=None, pr_num=None)