## **1. Save Model, Tokenizer and LoRA Checkpoint**

First, create a script that loads the base model and tokenizer, merges the LoRA checkpoint (the one you uploaded) into the base weights, and saves the result to a directory that can then be converted to GGUF.
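Before running the merge, it can help to confirm that the checkpoint directory actually looks like a LoRA adapter. The sketch below assumes the file names PEFT typically writes (`adapter_config.json` plus `adapter_model.safetensors` or `adapter_model.bin`). The helper name `missing_adapter_files` is hypothetical and not part of any library.

```python
from pathlib import Path

# File names assumed from PEFT's usual save layout; adjust if yours differs.
CONFIG_FILE = "adapter_config.json"
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(checkpoint_dir):
    """Return a list of problems found in a LoRA checkpoint directory (empty if none)."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / CONFIG_FILE).is_file():
        problems.append(f"missing {CONFIG_FILE}")
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        problems.append("missing adapter weights (" + " or ".join(WEIGHT_FILES) + ")")
    return problems


if __name__ == "__main__":
    # Checkpoint path taken from the example command in this notebook.
    for problem in missing_adapter_files("mistral-journal-finetune/checkpoint-500"):
        print("Problem:", problem)
```

An empty list means the directory has the files the merge step is expected to need; it does not validate the contents of those files.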

In [1]:

%pip install transformers torch torchvision torchaudio peft accelerate sentencepiece gguf
%pip install -U git+https://github.com/huggingface/peft.git

Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 24.0 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


Collecting git+https://github.com/huggingface/peft.git
  Cloning https://github.com/huggingface/peft.git to c:\users\zealo\appdata\local\temp\pip-req-build-_1w348qm
  Resolved https://github.com/huggingface/peft.git to commit 986b77c213db1c98962c4e1949392287731c6387
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Note: you may need to restart the kernel to use updated packages.


  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git 'C:\Users\zealo\AppData\Local\Temp\pip-req-build-_1w348qm'



In [5]:
%%writefile save_model.py
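from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import argparse


def main():
    parser = argparse.ArgumentParser(
        description="Merge a LoRA checkpoint into its base model and save the result."
    )
    parser.add_argument("--model", type=str, required=True, help="Base model ID or local path")
    parser.add_argument("--lora", type=str, required=True, help="Path to the LoRA checkpoint")
    parser.add_argument("--out_dir", type=str, default="./model")  # leave this
    args = parser.parse_args()

    # device_map="auto" lets accelerate place layers on GPU, CPU, or disk as memory allows.
    print(f"Loading base model: {args.model}")
    base_model = AutoModelForCausalLM.from_pretrained(
        args.model, torch_dtype=torch.float16, device_map="auto"
    )

    # Attach the LoRA adapter, then fold its weights into the base model.
    print(f"Loading PEFT: {args.lora}")
    model = PeftModel.from_pretrained(base_model, args.lora)
    print("Running merge_and_unload")
    model = model.merge_and_unload()

    # The tokenizer is loaded from the base model.
    tokenizer = AutoTokenizer.from_pretrained(args.model)

    # Save the merged model and tokenizer side by side for the GGUF conversion.
    model.save_pretrained(args.out_dir)
    tokenizer.save_pretrained(args.out_dir)
    print(f"Model saved to {args.out_dir}")


if __name__ == "__main__":
    main()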

Overwriting save_model.py


In [4]:
!python save_model.py --model "mistralai/Mistral-7B-Instruct-v0.3" --lora "mistral-journal-finetune/checkpoint-500"


Loading base model: mistralai/Mistral-7B-Instruct-v0.3
Loading PEFT: mistral-journal-finetune/checkpoint-500



Loading checkpoint shards: 100%|██████████| 3/3 [00:07<00:00,  2.33s/it]
Some parameters are on the meta device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "c:\Users\zealo\Documents\mistral_zrah_model\model_trainer\save_model.py", line 34, in <module>
    main()
  File "c:\Users\zealo\Documents\mistral_zrah_model\model_trainer\save_model.py", line 24, in main
    model = PeftModel.from_pretrained(base_model, args.lora)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\zealo\Documents\mistral_zrah_model\.venv\Lib\site-packages\peft\peft_model.py", line 541, in from_pretrained
    load_result = model.load_adapter(
                  ^^^^^^^^^^^
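The traceback above is cut off, so the actual exception is not visible. One plausible cause, given the warning that some parameters were offloaded, is that `device_map="auto"` could not fit the whole model on the GPU. A minimal sketch of one workaround, under that assumption, is to load the base model entirely on the CPU for the merge. The helper names `base_model_kwargs`, `has_meta_params`, and `load_base_on_cpu` are hypothetical; `device_map={"": "cpu"}` and `low_cpu_mem_usage` are existing `from_pretrained` options, but whether this resolves this particular error is untested here.

```python
def base_model_kwargs(merge_on_cpu=True):
    """Keyword arguments for AutoModelForCausalLM.from_pretrained (hypothetical helper)."""
    if merge_on_cpu:
        # Put every module on the CPU so nothing is offloaded to the meta device.
        return {"device_map": {"": "cpu"}, "low_cpu_mem_usage": True}
    # Original behaviour: let accelerate spread layers across available devices.
    return {"device_map": "auto"}


def has_meta_params(model):
    """True if any parameter is still a placeholder on the meta device."""
    return any(p.device.type == "meta" for p in model.parameters())


def load_base_on_cpu(model_id):
    """Load the base model fully on the CPU; needs enough RAM for the fp16 weights."""
    # Heavy imports stay inside the function so the helpers above import cheaply.
    import torch
    from transformers import AutoModelForCausalLM

    # float16 as in the original script; older PyTorch builds may lack
    # some half-precision CPU operations.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, **base_model_kwargs(merge_on_cpu=True)
    )
    if has_meta_params(model):
        raise RuntimeError("Some parameters are still on the meta device")
    return model
```

In `save_model.py`, `load_base_on_cpu(args.model)` would stand in for the existing `from_pretrained` call. The merge runs more slowly on the CPU, but every weight is materialized before the adapter is loaded.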

## **2. Convert to GGUF**

In [None]:
# Note: convert.py may no longer exist on the master branch of llama.cpp
# (newer checkouts ship convert_hf_to_gguf.py); adjust the URL if the download fails.
!curl -L -o convert.py https://github.com/ggerganov/llama.cpp/raw/master/convert.py
!curl -L -o requirements.txt https://github.com/ggerganov/llama.cpp/raw/master/requirements.txt

%pip install -r requirements.txt

# Convert the 7B model in ./model to GGUF FP16 format
!python convert.py model

## Code below is optional - uncomment (remove leftmost '# ') to use

# # [Optional] for models using BPE tokenizers
# !python convert.py model --vocabtype bpe

# # [Optional] quantize the model to 4 bits (using the q4_0 method)
# # (requires the quantize binary built from llama.cpp)
# !./quantize ./model/ggml-model-f16.gguf ./model/ggml-model-q4_0.gguf q4_0

# # [Optional] update the GGUF file to the current version if an older version is unsupported by another application
# !./quantize ./model/ggml-model-q4_0.gguf ./model/ggml-model-q4_0-v2.gguf COPY
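
To check which GGUF version a converted file uses (relevant to the last optional step), the file header can be read directly. This is a minimal sketch that assumes a little-endian file and the header layout of GGUF versions 2 and later: a 4-byte `GGUF` magic, a 32-bit version, then two 64-bit counts. The function name `read_gguf_header` is hypothetical.

```python
import struct


def read_gguf_header(path):
    """Read the fixed-size header of a GGUF file and return it as a dict."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        # The version is a 32-bit unsigned integer (little-endian assumed).
        (version,) = struct.unpack("<I", f.read(4))
        header = {"version": version}
        if version >= 2:
            # From version 2 on, both counts are 64-bit unsigned integers.
            tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
            header["tensor_count"] = tensor_count
            header["metadata_kv_count"] = kv_count
        return header
```

For example, `read_gguf_header("./model/ggml-model-f16.gguf")` would report the version of the file written by the conversion step, assuming that output path.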