# 🚀 Customize and Deploy `Qwen/Qwen2.5-3B-Instruct` on Amazon SageMaker AI
---
In this notebook, we explore **Qwen2.5-3B-Instruct**, a versatile instruction-tuned model from Alibaba’s Qwen2.5 family. You’ll learn how to fine-tune it on your dataset, evaluate its performance, and deploy it using SageMaker for scalable inference.

**What is Qwen2.5-3B-Instruct?**

**Qwen2.5-3B-Instruct** is a 3-billion-parameter instruction-following model optimized for dialogue, reasoning, and task completion. Built on the robust Qwen2.5 architecture, this model is trained to deliver coherent, context-aware responses across a wide range of use cases, from general-purpose Q&A to advanced reasoning tasks.  
🔗 Model card: [Qwen/Qwen2.5-3B-Instruct on Hugging Face](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)

---

**Key Specifications**

| Feature | Details |
|---|---|
| **Parameters** | ~3 billion |
| **Architecture** | Transformer with RoPE embeddings, SwiGLU activation, and grouped-query attention |
| **Context Length** | Up to **32,768 tokens** |
| **Training Data** | High-quality multilingual text and instruction datasets |
| **Modalities** | Text-in / Text-out |
| **License** | Qwen Research License (see the model card) |
| **Optimization** | Instruction-tuned for high-quality dialogue and reasoning |

---

**Benchmarks & Behavior**

- Strong **instruction-following accuracy**, providing helpful and safe responses.  
- Excellent **long-context handling** (up to 32K tokens), useful for summarization and document analysis.  
- Solid **reasoning performance** on math, logic, and step-by-step problem solving.  
- Balanced **efficiency and capability**, making it a reliable choice for cost-conscious deployments.  

---

**Using This Notebook**

Here’s what you’ll cover:

* Load and preprocess datasets from Hugging Face for fine-tuning  
* Fine-tune with SageMaker Training Jobs using optimized distributed strategies  
* Evaluate model performance on reasoning and instruction tasks  
* Deploy to SageMaker Endpoints for scalable inference with low latency  


In [1]:
%pip install -Uq sagemaker datasets

In [2]:
import boto3
import sagemaker
from PIL import Image
import torch

In [3]:
region = boto3.Session().region_name

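# Local mode: the training container runs on this machine.
# For managed training jobs, use sagemaker.Session(boto3.Session(region_name=region)) instead.
from sagemaker.local import LocalSession
sess = LocalSession()
sess.config = {"local": {"local_code": True}}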

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()

In [4]:
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

## Data Preparation for Supervised Fine-tuning

### [Finance-Instruct-500k](https://huggingface.co/datasets/Josephgflowers/Finance-Instruct-500k)

**Finance-Instruct-500k** is a large-scale dataset with about **518,000 entries** focused on the financial domain. It spans topics such as investments, banking, markets, accounting, and corporate finance, offering a wide variety of instruction–response examples.

**Data Format & Structure**:
- Distributed in **JSON** format, with simple conversion to Parquet.  
- Contains a single `train` split with ~518k records.  
- Each record includes:  
  - `system` – context or metadata for the task  
  - `user` – the financial prompt or query  
  - `assistant` – the corresponding response  

**License**: Released under the **Apache-2.0** license.  

**Applications**:

The dataset can support finance-focused tasks such as:  
- Financial question answering  
- Market and investment analysis  
- Topic and sentiment classification  
- Financial entity extraction and document understanding  

In [None]:
import os
import json
import pprint
from tqdm import tqdm
from datasets import load_dataset

In [None]:
dataset_parent_path = os.path.join(os.getcwd(), "tmp_cache_local_dataset")
os.makedirs(dataset_parent_path, exist_ok=True)

**Preparing Your Dataset in `messages` format**

This section walks you through creating a conversation-style dataset in the required `messages` format, which can be used directly to train LLMs on SageMaker AI.

**What Is the `messages` Format?**

The `messages` format structures instances as chat-like exchanges, wrapping each conversation turn into a role-labeled JSON array. It’s widely used by frameworks like TRL.

Example entry:

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "How do I bake sourdough?" },
    { "role": "assistant", "content": "First, you need to create a starter by..." }
  ]
}
```


In [None]:
dataset_name = "Josephgflowers/Finance-Instruct-500k"
dataset = load_dataset(dataset_name, split="train[:1000]")

In [None]:
pprint.pp(dataset[0])

In [None]:
print(f"total number of fine-tunable samples: {len(dataset)}")

In [None]:
def convert_to_messages(row):
    """Map one dataset row to a three-turn `messages` record."""
    # The row's `system` field is not used; every example gets the same fixed system prompt.
    system_content = "You are a financial reasoning assistant. Read the user’s query, restate the key data, and solve step by step. Show calculations clearly, explain any rounding or adjustments, and present the final answer in a concise and professional manner."
    user_content = row["user"]
    assistant_content = row["assistant"]

    return {
        "messages": [
            {"role": "system", "content": system_content},
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": assistant_content},
        ]
    }


# Drop the original columns so only `messages` remains.
dataset = dataset.map(convert_to_messages, remove_columns=dataset.column_names)
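
To make the effect of this conversion concrete, the sketch below renders one `messages` record into a single ChatML-style string, the layout Qwen chat models use. It is a simplified illustration only: `render_chatml` is a hypothetical helper, and in practice the rendering is done by the tokenizer's chat template (for example via `tokenizer.apply_chat_template` in Transformers), which may add details not shown here.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content turns as ChatML text (simplified illustration)."""
    parts = []
    for turn in messages:
        parts.append(f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>\n")
    if add_generation_prompt:
        # At inference time the prompt ends with an open assistant turn.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a bond?"},
        {"role": "assistant", "content": "A bond is a debt security."},
    ]
}
print(render_chatml(example["messages"]))
```

During supervised fine-tuning the full string, including the assistant turn, is what the model is trained on; `add_generation_prompt=True` is only relevant when prompting for a new answer.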

In [None]:
dataset_filename = os.path.join(dataset_parent_path, f"{dataset_name.replace('/', '--').replace('.', '-')}.jsonl")
dataset.to_json(dataset_filename, lines=True)
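
Before uploading, it can be worth checking that every line of the JSONL file is a well-formed `messages` record. The sketch below uses only the standard library; `validate_messages_jsonl` and the set of allowed roles are assumptions made for illustration, not part of any SageMaker or TRL API.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed role set for this dataset


def validate_messages_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((lineno, "missing or empty 'messages' list"))
                continue
            for turn in messages:
                if not isinstance(turn, dict) or turn.get("role") not in ALLOWED_ROLES:
                    problems.append((lineno, "turn has an unexpected role"))
                elif not isinstance(turn.get("content"), str):
                    problems.append((lineno, "turn content is not a string"))
    return problems
```

Calling `validate_messages_jsonl(dataset_filename)` on the file written above should return an empty list if the conversion worked as intended.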

#### Upload file to S3

In [None]:
from sagemaker.s3 import S3Uploader

In [None]:
# Use the bucket resolved earlier so a user-supplied bucket name is honored.
data_s3_uri = f"s3://{sagemaker_session_bucket}/dataset"

uploaded_s3_uri = S3Uploader.upload(
    local_path=dataset_filename,
    desired_s3_uri=data_s3_uri
)
print(f"Uploaded {dataset_filename} to {uploaded_s3_uri}")

## Fine-Tune LLMs using SageMaker `Estimator`/`ModelTrainer`

In [None]:
import time
from sagemaker.pytorch import PyTorch
from getpass import getpass
import yaml
from jinja2 import Template

In [None]:
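# Prompt for the token without echoing it to the notebook output.
hf_token = getpass("Enter your Hugging Face access token: ")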

### Training using `PyTorch` Estimator

This approach uses the official PyTorch SageMaker container to run a custom training script built on the Accelerate and DeepSpeed libraries. It is ideal for users who want full control over the training pipeline.

---
**Observability**: SageMaker AI offers [SageMaker MLflow](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html), which makes it easier to track experiments and monitor the performance of models and AI applications using a single tool.

You can include MLflow in your training workflow to track fine-tuning metrics in real time by specifying an MLflow tracking server ARN.

Alternatively, you can report to **tensorboard** or **wandb**.

In [None]:
MLFLOW_TRACKING_SERVER_ARN = None # or "arn:aws:sagemaker:us-west-2:<account-id>:mlflow-tracking-server/<server-name>"

if MLFLOW_TRACKING_SERVER_ARN:
    reports_to = "mlflow"
else:
    reports_to = "tensorboard"

In [None]:
job_name = 'Qwen--Qwen2.5-3B-Instruct'
training_instance_type = "local_gpu"
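
Local mode may accept this name as is, but managed SageMaker training job names must match the pattern `[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`, so the `.` in `Qwen2.5` would be rejected when switching to a managed instance type. The sketch below shows one way to normalize a name; `sanitize_job_name` is a hypothetical helper, and the 40-character default is an assumption chosen to leave room for the timestamp suffix the SDK appends to `base_job_name`.

```python
import re


def sanitize_job_name(name, max_len=40):
    """Replace characters other than letters, digits, and '-' with '-', then trim."""
    cleaned = re.sub(r"[^a-zA-Z0-9-]", "-", name)
    # Names must start and end with an alphanumeric character.
    cleaned = cleaned.strip("-")[:max_len].rstrip("-")
    if not cleaned:
        raise ValueError(f"cannot derive a job name from {name!r}")
    return cleaned


print(sanitize_job_name("Qwen--Qwen2.5-3B-Instruct"))
```

This mirrors the `.replace('.', '-')` already applied when building the dataset filename earlier in the notebook.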

In [None]:
# HF_TOKEN is always passed; MLflow settings are added only when a tracking server is configured.
training_env = {"HF_TOKEN": hf_token}

if MLFLOW_TRACKING_SERVER_ARN:
    training_env.update(
        {
            "MLFLOW_EXPERIMENT_NAME": f"exp-{job_name}",
            "MLFLOW_TAGS": '{"source.job": "sm-training-jobs", "source.type": "sft", "source.framework": "pytorch"}',
            "MLFLOW_TRACKING_URI": MLFLOW_TRACKING_SERVER_ARN,
        }
    )
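
The training script that consumes these variables is not shown in this section. As a rough sketch of how a script could pick them up, the snippet below reads the same keys from an environment mapping. `read_tracking_config` is a hypothetical helper, and the fallback values are assumptions rather than what `sm_accelerate_train.sh` actually does.

```python
import json
import os


def read_tracking_config(env=None):
    """Collect the MLflow-related settings from an environment mapping."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI")
    if not uri:
        return None  # no tracking server configured
    return {
        "tracking_uri": uri,
        "experiment_name": env.get("MLFLOW_EXPERIMENT_NAME", "Default"),
        # MLFLOW_TAGS arrives as a JSON string because environment values must be strings.
        "tags": json.loads(env.get("MLFLOW_TAGS", "{}")),
    }
```

Passing the mapping explicitly, as in `read_tracking_config(training_env)`, makes it easy to check the values before a job is launched.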

In [None]:
pytorch_image_uri = f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training:2.8.0-gpu-py312-cu129-ubuntu22.04-sagemaker"
print(f"Using image: {pytorch_image_uri}")

#### Training strategy: `PEFT/QLoRA`

In [None]:
pytorch_estimator = PyTorch(
    image_uri=pytorch_image_uri,
    entry_point="sm_accelerate_train.sh", # Adapted bash script to train using accelerate on SageMaker - Multi-GPU
    source_dir="sagemaker_code",
    instance_type=training_instance_type,
    instance_count=1,
    base_job_name=f"{job_name}-pytorch",
    role=role,
    volume_size=300,
    py_version="py312",
    keep_alive_period_in_seconds=3600,
    environment=training_env,
    sagemaker_session=sess,
    hyperparameters={
        "config": "hf_recipes/Qwen/Qwen2.5-3B-Instruct--vanilla-peft-qlora.yaml"
    }
)

# fit or train
pytorch_estimator.fit(
    {"training": uploaded_s3_uri}, 
    wait=False
)

In [None]:
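# fit(wait=False) may return before training finishes; the artifact is available at this URI only once the job has completed.
s3_model_data_uri = pytorch_estimator.model_data
print(f"Fine-tuned model location: {s3_model_data_uri}")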

#### Training strategy: `Spectrum`

In [None]:
pytorch_estimator = PyTorch(
    image_uri=pytorch_image_uri,
    entry_point="sm_accelerate_train.sh", # Adapted bash script to train using accelerate on SageMaker - Multi-GPU
    source_dir="sagemaker_code",
    instance_type=training_instance_type,
    instance_count=1,
    base_job_name=f"{job_name}-pytorch",
    role=role,
    volume_size=300,
    py_version="py312",
    keep_alive_period_in_seconds=3600,
    environment=training_env,
    sagemaker_session=sess,
    hyperparameters={
        "config": "hf_recipes/Qwen/Qwen2.5-3B-Instruct--vanilla-spectrum.yaml"
    }
)

# fit or train
pytorch_estimator.fit(
    {"training": uploaded_s3_uri}, 
    wait=False
)

In [None]:
s3_model_data_uri = pytorch_estimator.model_data
print(f"Fine-tuned model location: {s3_model_data_uri}")

#### Training strategy: `Full-Finetuning`

In [None]:
pytorch_estimator = PyTorch(
    image_uri=pytorch_image_uri,
    entry_point="sm_accelerate_train.sh", # Adapted bash script to train using accelerate on SageMaker - Multi-GPU
    source_dir="sagemaker_code",
    instance_type=training_instance_type,
    instance_count=1,
    base_job_name=f"{job_name}-pytorch",
    role=role,
    volume_size=300,
    py_version="py312",
    keep_alive_period_in_seconds=3600,
    environment=training_env,
    sagemaker_session=sess,
    hyperparameters={
        "config": "hf_recipes/Qwen/Qwen2.5-3B-Instruct--vanilla-full.yaml"
    }
)

# fit or train
pytorch_estimator.fit(
    {"training": uploaded_s3_uri}, 
    wait=False
)

In [None]:
s3_model_data_uri = pytorch_estimator.model_data
print(f"Fine-tuned model location: {s3_model_data_uri}")
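
The three strategy cells above differ only in the recipe passed through the `config` hyperparameter. A small helper can make that explicit and keep the shared settings in one place. This is a sketch: `build_estimator_kwargs` is a hypothetical function, and it only assembles a plain dictionary so it can be inspected without calling SageMaker.

```python
def build_estimator_kwargs(recipe, *, image_uri, instance_type, role, environment, job_name):
    """Assemble the keyword arguments shared by all strategy cells."""
    return {
        "image_uri": image_uri,
        "entry_point": "sm_accelerate_train.sh",
        "source_dir": "sagemaker_code",
        "instance_type": instance_type,
        "instance_count": 1,
        "base_job_name": f"{job_name}-pytorch",
        "role": role,
        "volume_size": 300,
        "py_version": "py312",
        "keep_alive_period_in_seconds": 3600,
        "environment": environment,
        "hyperparameters": {"config": recipe},
    }


kwargs = build_estimator_kwargs(
    "hf_recipes/Qwen/Qwen2.5-3B-Instruct--vanilla-full.yaml",
    image_uri="<image-uri>",  # placeholder values for illustration
    instance_type="local_gpu",
    role="<role-arn>",
    environment={"HF_TOKEN": "<token>"},
    job_name="Qwen--Qwen2.5-3B-Instruct",
)
print(kwargs["hyperparameters"])
```

In the notebook, the result could then be passed on as `PyTorch(sagemaker_session=sess, **kwargs)`, with only the recipe path changing between strategies.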