# Fine-Tuning a Qwen 1.5 Model and Logging to a Model Registry

This notebook demonstrates the process of fine-tuning a small-scale Qwen model (`Qwen/Qwen1.5-0.5B-Chat`) on a public instruction-based dataset. We will use Parameter-Efficient Fine-Tuning (PEFT) with LoRA to make the process memory-efficient.

**Key Steps:**
1.  **Setup**: Install required libraries and import necessary modules.
2.  **Configuration**: Define all parameters for the model, dataset, and training.
3.  **Data Preparation**: Load and prepare the dataset for instruction fine-tuning.
4.  **Model Loading and Fine-Tuning**: Load the pre-trained model and tokenizer, and then fine-tune it using `trl`'s `SFTTrainer`.
5.  **Evaluation**: Compare the performance of the base model with the fine-tuned model.
6.  **Model Logging**: Log the fine-tuned model and its metrics to a model registry.

## 1. Setup

First, we'll install the necessary Python libraries. The modules used throughout the workflow are imported in the first cell of the next section.

In [10]:
!pip install -q -U transformers datasets accelerate peft trl bitsandbytes

## 2. Configuration

We'll define all our configurations in one place. This makes the notebook cleaner and easier to modify for future experiments.

In [11]:
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)
from trl import SFTTrainer

# Import the frogml library (assumed here to be the custom library for JFrog integration). It can be used for:
# - uploading and downloading models and datasets
# - integrating with JFrog Artifactory
# - managing and tracking model versions
import frogml

In [12]:

# Dataset configuration
dataset_name = "Szaid3680/Devops"

# LoRA configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments
training_args = TrainingArguments(
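    output_dir="./qwen-finetuned",  # Output directory for checkpoints, logs, and the final model
    per_device_train_batch_size=1,  # Batch size per device; 1 keeps memory usage low
    gradient_accumulation_steps=8,  # Accumulate gradients over 8 batches before each weight update (effective batch size = 1 x 8 = 8)
    learning_rate=2e-4,  # Step size for weight updates
    logging_steps=10,   # Log training metrics every 10 steps
    max_steps=100,  # Stop after 100 optimizer steps (enough for a demo or quick test)
    fp16=False,  # Disable FP16 mixed precision for compatibility with CPU or MPS (Apple Silicon) devices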
)
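
To make the memory savings of this `LoraConfig` concrete, the number of trainable parameters it implies can be estimated by hand. The sketch below assumes the published architecture of `Qwen1.5-0.5B` (hidden size 1024, 24 decoder layers, square 1024 x 1024 attention projections). Those dimensions are not stated anywhere in this notebook, so treat them as assumptions and check `model.config` before relying on the numbers.

```python
# Estimate the trainable parameters added by the LoRA configuration above.
# ASSUMPTION: these dimensions describe Qwen1.5-0.5B; verify them with model.config.
HIDDEN_SIZE = 1024
NUM_LAYERS = 24
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]  # same list as lora_config
LORA_RANK = 16  # same value as r=16 in lora_config


def lora_params_per_module(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two small matrices, A (rank x in) and B (out x rank), per frozen weight."""
    return rank * in_features + out_features * rank


per_module = lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
total_trainable = per_module * len(TARGET_MODULES) * NUM_LAYERS

print(f"LoRA parameters per projection: {per_module:,}")
print(f"Total trainable LoRA parameters: {total_trainable:,}")
```

Once the model has been wrapped with `get_peft_model`, calling `model.print_trainable_parameters()` reports the actual count, which is the number to trust.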

## 3. Data Preparation

We will load the `Szaid3680/Devops` dataset, split it into training and evaluation sets, and define a formatting function for instruction-based fine-tuning.

In [13]:
dataset = load_dataset(dataset_name, split="train")
dataset = dataset.train_test_split(test_size=0.1)
train_dataset = dataset["train"]
eval_dataset = dataset["test"]

# For a quick demo, keep only the first two examples of each split
train_dataset = train_dataset.select(range(2))
eval_dataset = eval_dataset.select(range(2))

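# Define a formatting function that converts a raw record into the text used for training.
# Example conversion:
# # Raw record
# {
#     'Instruction': 'Explain Docker',
#     'Prompt': 'Describe it in simple terms',
#     'Response': 'Docker is a containerization technology...'
# }
# # After formatting
# "<s>[INST] Explain Docker\nDescribe it in simple terms [/INST] Docker is a containerization technology... </s>"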
def format_instruction(example):
    """Formats the dataset examples into a structured prompt."""
    instruction = example.get('Instruction', '')
    inp = example.get('Prompt', '')
    response = example.get('Response', '')
    
    full_prompt = f"<s>[INST] {instruction}\n{inp} [/INST] {response} </s>"
    return full_prompt

# Let's look at a sample from the training set
print("Sample from the training dataset:")
print(train_dataset[0])

Sample from the training dataset:
{'Response': '\n\n\n\n\n\n\n\n            1\n        \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nYou should not use $request_uri to construct the rewritten URI, as it also contains ?limit=all - so your rewritten URI would look like: /clothing.html?limit=all?product_list_limit=all\nThe rewrite statement resets numeric captures, so you will need to use a named capture in the if statement.\nThe URI without query string is available as $uri or as a capture from the rewrite statement\'s regular expression.\nEither of these forms should work for you:\nif ($query_string ~* "^limit=(?<limit>.*)$") {\n    rewrite ^(.*)$ $1?product_list_limit=$limit? redirect;\n}\n\nNote the trailing ? to prevent the original query string from being appended. See this document for more.\nOr:\nif ($query_string ~* "^limit=(.*)$") {\n    return 302 $uri?product_list_limit=$1;\n}\n\nThe value of the ?limit=all0 argument is also available as ?limit=all1, so you could also use this:\n?limit=
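
Because the real records are long and noisy, it can help to run the formatting function on a small hand-written record first. The sketch below repeats `format_instruction` so that it runs on its own; the `toy_example` record is hypothetical and not taken from the dataset. Note that the `[INST]` markers follow the Llama/Mistral prompt style, whereas Qwen chat models are, to my understanding, trained with a ChatML-style template (the one `tokenizer.apply_chat_template` produces in the evaluation section), so switching to that template may be worth trying.

```python
def format_instruction(example: dict) -> str:
    """Formats a dataset record into a single prompt/response training string."""
    instruction = example.get("Instruction", "")
    inp = example.get("Prompt", "")
    response = example.get("Response", "")
    return f"<s>[INST] {instruction}\n{inp} [/INST] {response} </s>"


# Hypothetical record, used only to illustrate the output format
toy_example = {
    "Instruction": "Explain Docker",
    "Prompt": "Use simple terms",
    "Response": "Docker is a containerization tool.",
}

formatted = format_instruction(toy_example)
print(formatted)

# Missing keys fall back to empty strings instead of raising KeyError
print(repr(format_instruction({"Instruction": "Only an instruction"})))
```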

## 4. Model Loading and Fine-Tuning

Now, we'll load the base model and tokenizer. Then, we will apply the LoRA configuration and start the fine-tuning process.

In [None]:
from huggingface_hub import snapshot_download
import os

# Set environment variables in code (not recommended, because it can expose sensitive information)
os.environ['HF_HUB_ETAG_TIMEOUT'] = '86400'
os.environ['HF_HUB_DOWNLOAD_TIMEOUT'] = '86400'
os.environ['HF_ENDPOINT'] = 'https://<JPD_URL>/artifactory/api/huggingfaceml/slash-project-slash-project-huggingface-remote'
os.environ['HF_TOKEN'] = '<access_token>'

# Check that the environment variables are set
required_vars = ['HF_HUB_ETAG_TIMEOUT', 'HF_HUB_DOWNLOAD_TIMEOUT', 'HF_ENDPOINT', 'HF_TOKEN']
for var in required_vars:
    value = os.environ.get(var)
    if value:
        # Never print the token itself; only confirm that it is set
        shown = "<redacted>" if var == 'HF_TOKEN' else value
        print(f"✅ {var}: {shown}")
    else:
        print(f"❌ {var}: not set")

# Model and tokenizer configuration
model_id = "Qwen/Qwen1.5-0.5B-Chat"
new_model_adapter = "qwen-0.5b-devops-adapter"

# 1. Download the model to the local cache
local_dir = snapshot_download(
    repo_id=model_id,
    revision="main",
    etag_timeout=86400,
    endpoint=os.environ['HF_ENDPOINT'],
    token=os.environ['HF_TOKEN'],
)

print(f"Model downloaded to: {local_dir}")

tokenizer = AutoTokenizer.from_pretrained(local_dir)
tokenizer.pad_token = tokenizer.eos_token


✅ HF_HUB_ETAG_TIMEOUT: 86400
✅ HF_HUB_DOWNLOAD_TIMEOUT: 86400
✅ HF_ENDPOINT: https://soleng.jfrog.io/artifactory/api/huggingfaceml/slash-project-slash-project-huggingface-remote
✅ HF_TOKEN: <redacted>


Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]

Model downloaded to: /Users/jingyil/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/7a2b85d322d12d10e07300b3d2742e4da7377821
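
Since the cell above handles an access token, one safer pattern is to read the token from the environment (set outside the notebook) and to mask it whenever it is displayed. The following is a minimal sketch of that idea using only the standard library. The helper names `read_required_env` and `mask_secret` and the `EXAMPLE_TOKEN` variable are hypothetical and exist only for this illustration.

```python
import os


def mask_secret(value: str, visible: int = 4) -> str:
    """Returns the value with everything after the first `visible` characters masked."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


def read_required_env(name: str) -> str:
    """Reads an environment variable and fails early if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder value so the sketch runs on its own; a real token would be
# exported in the shell or injected by a secrets manager, not written in code.
os.environ.setdefault("EXAMPLE_TOKEN", "not-a-real-token")

token = read_required_env("EXAMPLE_TOKEN")
print(f"EXAMPLE_TOKEN: {mask_secret(token)}")
```

With this pattern the notebook output confirms that the variable is set without ever storing the secret in the saved cell output.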


In [16]:
model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    device_map="cpu" # Use CPU for local demo
)
# Apply LoRA configuration to the model
model = get_peft_model(model, lora_config)

# Create the SFTTrainer. The model is already wrapped with LoRA above,
# so `peft_config` is not passed again here.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    formatting_func=format_instruction,
    args=training_args,
)

print("--- Starting Fine-Tuning ---")
trainer.train()
print("--- Fine-Tuning Complete ---")

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.


--- Starting Fine-Tuning ---




Step,Training Loss
10,3.1647
20,2.0528
30,1.1266
40,0.415
50,0.1048
60,0.0395
70,0.0187
80,0.0109
90,0.0091
100,0.0084


--- Fine-Tuning Complete ---
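
The rapid drop in training loss is easier to interpret after working out how often each example is seen. The sketch below redoes that arithmetic from the values configured earlier (2 training examples, batch size 1, 8 accumulation steps, 100 steps). It assumes a single device and that the trainer performs one optimizer step per epoch when an epoch contains fewer batches than the accumulation window, which is consistent with the `'epoch': 100.0` value in the evaluation metrics of the next section.

```python
import math

# Values taken from the configuration and data preparation cells
num_train_examples = 2
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
max_steps = 100

# Number of batches the dataloader yields per pass over the data (single device assumed)
batches_per_epoch = math.ceil(num_train_examples / per_device_train_batch_size)

# ASSUMPTION: an epoch shorter than the accumulation window still ends with one optimizer step
updates_per_epoch = max(math.ceil(batches_per_epoch / gradient_accumulation_steps), 1)

epochs = max_steps / updates_per_epoch
examples_per_update = min(batches_per_epoch, gradient_accumulation_steps) * per_device_train_batch_size

print(f"Optimizer steps per epoch: {updates_per_epoch}")
print(f"Epochs covered by {max_steps} steps: {epochs:.0f}")
print(f"Examples contributing to each update: {examples_per_update}")
```

Under these assumptions the model sees the same two examples on every one of the 100 steps, so a training loss near zero mostly indicates memorization of those examples rather than general improvement.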


## 5. Evaluation

Let's evaluate the fine-tuned model and compare its response to the base model's response for a sample DevOps-related prompt.

In [17]:
metrics = trainer.evaluate()
print("--- Evaluation Metrics ---")
print(metrics)



--- Evaluation Metrics ---
{'eval_loss': 5.826877593994141, 'eval_runtime': 0.6542, 'eval_samples_per_second': 3.057, 'eval_steps_per_second': 1.528, 'eval_entropy': 1.8605223894119263, 'eval_num_tokens': 87800.0, 'eval_mean_token_accuracy': 0.2784184515476227, 'epoch': 100.0}
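
A loss value is easier to compare when converted to perplexity. Assuming `eval_loss` and the logged training loss are mean per-token cross-entropies in nats (the usual convention for causal language models, but not confirmed by this notebook), perplexity is simply the exponential of the loss. The sketch uses the `eval_loss` printed above and the final training loss from the training log.

```python
import math

# Values copied from the notebook outputs
eval_loss = 5.826877593994141   # from trainer.evaluate()
final_train_loss = 0.0084       # training loss logged at step 100

# ASSUMPTION: both losses are mean per-token cross-entropy in nats
eval_perplexity = math.exp(eval_loss)
train_perplexity = math.exp(final_train_loss)

print(f"Evaluation perplexity: {eval_perplexity:.1f}")
print(f"Training perplexity:   {train_perplexity:.4f}")
```

A training perplexity close to 1 next to an evaluation perplexity in the hundreds suggests heavy overfitting, which is expected for a demo run that trains on only two examples.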


In [18]:
# Save the trained model adapter
trainer.model.save_pretrained(new_model_adapter)

In [19]:
# Merge the LoRA adapter with the base model for easy inference
base_model = AutoModelForCausalLM.from_pretrained(local_dir, device_map="cpu")
finetuned_model = PeftModel.from_pretrained(base_model, new_model_adapter)
finetuned_model = finetuned_model.merge_and_unload()

# Define a prompt for evaluation
prompt = "How do I expose a deployment in Kubernetes using a service?"
messages = [
    {"role": "system", "content": "You are a helpful DevOps assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)

# Generate response from the fine-tuned model
print("------------------- FINE-TUNED MODEL RESPONSE -------------------")
model_inputs = tokenizer([text], return_tensors="pt").to("cpu")
# Pass the full encoding so that the attention mask is supplied along with input_ids
generated_ids = finetuned_model.generate(**model_inputs, max_new_tokens=256)
response_finetuned = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(response_finetuned)

# Generate response from the original base model for comparison
print("\n------------------- BASE MODEL RESPONSE -------------------")
original_model = AutoModelForCausalLM.from_pretrained(local_dir, device_map="cpu")
generated_ids_base = original_model.generate(**model_inputs, max_new_tokens=256)
response_base = tokenizer.decode(generated_ids_base[0], skip_special_tokens=True)
print(response_base)


------------------- FINE-TUNED MODEL RESPONSE -------------------
system
You are a helpful DevOps assistant.
user
How do I expose a deployment in Kubernetes using a service?
assistant
To expose a deployment in Kubernetes using a service, you can follow these steps:

1. Create a YAML file for the service:
```
apiVersion: apps/v1
kind: Service
metadata:
  name: my-deployment
spec:
  selector:
    matchLabels:
      app: my-deployment
  ports:
    - containerPort: 80
```

2. Create a JSON object to describe the service:
```
{
  "metadata": {
    "name": "my-deployment"
  },
  "spec": {
    "port": [
      { port: 80, protocol: "tcp", name: "default-port" }
    ],
    "type": "services"
  }
}
```

3. Update the Deployment metadata in Kubernetes to reference the Service:
```
kubectl apply -f my-service.yaml
```

4. Start or stop the Service as needed to ensure that it is running.
5. To check if the Service is available through the API, you can use the following command:
```
$ kubectl get se

## 6. Model Logging

Finally, we log our fine-tuned model, its tokenizer, and the evaluation metrics to the model registry.

In [20]:
# REPLACE WITH YOUR OWN FILESYSTEM BASE PATH WHERE THE PROJECTS RESIDE
base_projects_directory = "/Users/jingyil/work/jfrog/jpd-project/jpd-dev/mlops"

try:
    import frogml

    frogml.huggingface.log_model(
        model=finetuned_model,
        tokenizer=tokenizer,
        repository="slash-project-ml-test-local",  # The JFrog repository to upload the model to
        model_name="slash-finetuned_qwen",  # The uploaded model name
        version="1.3.0",  # Optional. The uploaded model version
        parameters={"finetuning-dataset": dataset_name},
        code_dir=f"{base_projects_directory}/llm_finetuning/code_dir",
        dependencies=[f"{base_projects_directory}/llm_finetuning/main/conda.yaml"],
        metrics=metrics,
        predict_file=f"{base_projects_directory}/llm_finetuning/code_dir/predict.py",
    )
    print("--- Model Logged Successfully ---")
except Exception as e:
    print(f"An error occurred during model logging: {e}")

INFO:HuggingfaceModelVersionManager:Logging model slash-finetuned_qwen to slash-project-ml-test-local


Failed to get requested logger name HuggingfaceModelVersionManager. Using default logger


INFO:JmlCustomerClient:Customer exists in JML.
INFO:JmlCustomerClient:Getting project key for repository slash-project-ml-test-local
INFO:frogml.sdk.model_version.utils.files_tools:Code directory, predict file and dependencies are provided. Setup template files for model_name slash-finetuned_qwen


/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/add…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/con…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/spe…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/tok…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/tok…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/mer…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/gen…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/cha…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/mod…

/private/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/slash-finetuned_qwen.pretrained_model/voc…

/Users/jingyil/work/jfrog/jpd-project/jpd-dev/mlops/llm_finetuning/main/conda.yaml:   0%|          | 0.00/284 …

/var/folders/6n/71s0t39j0tn2wz0lp1jn9pv80000gp/T/tmpkkqndwgx/code.zip:   0%|          | 0.00/2.60k [00:00<?, ?…

2025-10-11 11:40:02,469 - INFO - frogml.storage.logging._log_config.frog_ml.__upload_model:540 - Model: "slash-finetuned_qwen", version: "1.3.0" has been uploaded successfully
--- Model Logged Successfully ---
