# 🚀 Customize `Qwen/Qwen3-1.7B` for tool calling using `GRPO` and `RLVR` on Amazon SageMaker AI
---
In this notebook, we explore **Qwen3-1.7B**, a 1.7-billion-parameter language model from Alibaba's Qwen series. You'll learn how to fine-tune it on sample data, evaluate its reasoning, math, and coding capabilities, and deploy it at scale with SageMaker.

**What is Qwen3-1.7B?**

Qwen3-1.7B is part of the latest generation of Qwen models. A single model switches between thinking mode, for complex logical reasoning, and non-thinking mode, for efficient general-purpose dialogue. The model was trained on approximately 36 trillion tokens drawn from diverse domains and covering 119 languages. Qwen3-1.7B is produced through strong-to-weak distillation from larger Qwen3 models, which transfers advanced reasoning skills from frontier models down to this lightweight version. It is released under the **Apache-2.0 license** and is fully open-weight.  
🔗 Model card: [Qwen/Qwen3-1.7B on Hugging Face](https://huggingface.co/Qwen/Qwen3-1.7B)

---

**Key Specifications**

| Feature | Details |
|---|---|
| **Parameters** | ≈ 1.7 billion total parameters; ≈ 1.4 billion non-embedding parameters |
| **Architecture** | Transformer with RoPE embeddings, SwiGLU activation, RMSNorm, QK-Norm, and Grouped-Query Attention |
| **Attention Heads / GQA** | Grouped-Query Attention: 16 heads for Q, 8 heads for K/V |
| **Layers** | 28 layers |
| **Context Length** | Up to **32,768 tokens** |
| **Vocabulary** | 151,669 tokens (byte-level BPE) |
| **Modalities** | Text-in / Text-out only (no vision) |
| **License** | Apache-2.0 |

---

**Benchmarks & Behavior**

- Qwen3-1.7B outperforms the larger Qwen2.5-3B models on over half of the benchmarks, especially on STEM-related and coding benchmarks.
- The model demonstrates significantly enhanced reasoning capabilities, surpassing previous Qwen2.5 instruct models on mathematics, code generation, and commonsense logical reasoning.
- Qwen3-1.7B operates in two distinct modes: thinking mode for step-by-step reasoning with intermediate computations, and non-thinking mode for rapid direct responses.
- The model shows strong performance in human preference alignment for creative writing, instruction following, and multi-turn dialog.
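
In thinking mode, Qwen3 emits its intermediate reasoning inside a `<think>...</think>` block before the final response. The helper below is a minimal sketch of how that block could be separated from the answer before scoring or display; `split_thinking` is a hypothetical name used for illustration and is not part of this notebook's training code.

```python
import re

# Qwen3 wraps its reasoning in <think>...</think> when thinking mode is on.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string.

    Hypothetical helper: if no <think> block is present (non-thinking mode),
    the reasoning part is an empty string.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the <think> block is treated as the final answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


raw = "<think>\nUser wants a mortgage estimate.\n</think>\n\nMax Price: $384,111"
reasoning, answer = split_thinking(raw)
print(reasoning)
print(answer)
```

Only the first `<think>` block is extracted, which matches the usual single-block output but is an assumption of this sketch.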

---

In [None]:
%pip install -Uq "datasets==4.3.0" \
    "sagemaker==2.253.1"

## 00. Setup

We start by setting up session information: the `sagemaker.Session(...)`, the AWS region, and the SageMaker execution role.

In [None]:
import boto3
import sagemaker

In [None]:
region = boto3.Session().region_name

sess = sagemaker.Session(boto3.Session(region_name=region))

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()

In [None]:
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

## 01. Data Prep

To fine-tune a target model for domain-specific tool calling, we need a sample dataset in which each record has the following structure:

```json
{
  "prompt": [
    {
      "content": "You are a ...",
      "role": "system"
    },
    {
      "content": "Single income $78k, $720 ...",
      "role": "user"
    }
  ],
  "answer": "Max Price: $384,111, ..."
}
```

The most important components are:
1. The system prompt, which sets up the global model behavior
2. The user input prompt/question
3. The final answer, which the trainer uses as the reference when tuning the model to select the appropriate tool to achieve the outcome
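
Before training, it can help to check that every record follows this structure. The function below is a minimal sketch of such a check; `validate_sample` and its rules (system message first, at least one user message, non-empty string answer) are assumptions for illustration, not something the trainer itself enforces.

```python
def validate_sample(sample: dict) -> list[str]:
    """Return a list of problems found in one record (empty list means OK).

    Hypothetical validator for the {"prompt": [...], "answer": "..."} layout.
    """
    problems = []

    prompt = sample.get("prompt")
    if not isinstance(prompt, list) or not prompt:
        problems.append("prompt must be a non-empty list of messages")
    elif not all(isinstance(m, dict) for m in prompt):
        problems.append("every prompt message must be a dict")
    else:
        roles = [m.get("role") for m in prompt]
        if roles[0] != "system":
            problems.append("first message should have role 'system'")
        if "user" not in roles:
            problems.append("prompt needs at least one 'user' message")
        if not all(isinstance(m.get("content"), str) for m in prompt):
            problems.append("every message needs a string 'content'")

    answer = sample.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        problems.append("answer must be a non-empty string")

    return problems


good = {
    "prompt": [
        {"role": "system", "content": "You are a ..."},
        {"role": "user", "content": "Single income $78k, $720 ..."},
    ],
    "answer": "Max Price: $384,111, ...",
}
print(validate_sample(good))
print(validate_sample({"prompt": [], "answer": ""}))
```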

In [None]:
import os
import json
import random
from datasets import Dataset
from sagemaker_code.tools_funcs.financial_tools_complex import run_tool

In [None]:
system_prompt = """
You are a financial planning assistant with tools for portfolio allocation, mortgage affordability, tax optimization, retirement readiness, debt payoff strategies, insurance needs, education funding, and currency exchange. 
Analyze user requests and call the appropriate tool with all required parameters extracted from their query. 
Return concise answers with key metrics. Do not ask for clarification - use reasonable defaults if needed.
"""

In [None]:
# Load raw data
with open("sample_dataset/raw_financial_training_data.jsonl", "r") as f:
    raw_data = [json.loads(line) for line in f]

random.shuffle(raw_data)
print(f"Loaded {len(raw_data)} samples")

In [None]:
split_idx = int(len(raw_data) * 0.91)
train_data = raw_data[:split_idx]
test_data = raw_data[split_idx:]
print(f"Train: {len(train_data)}, Test: {len(test_data)}")
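
The shuffle above is unseeded, so the train/test membership changes on every run. If a repeatable split is preferred, one option is to shuffle with a dedicated seeded generator. This is a sketch only; `split_records` and the seed value `42` are illustrative choices, and the `0.91` fraction simply mirrors the cell above.

```python
import random


def split_records(records, train_fraction=0.91, seed=42):
    """Shuffle a copy of `records` deterministically, then split it.

    Hypothetical helper: uses a private random.Random so the global
    random state (and the caller's list) is left untouched.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    split_idx = int(len(shuffled) * train_fraction)
    return shuffled[:split_idx], shuffled[split_idx:]


records = [{"id": i} for i in range(100)]
train_part, test_part = split_records(records)
print(f"Train: {len(train_part)}, Test: {len(test_part)}")
```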

In [None]:
# Process training data
train_samples = []
for item in train_data:
    # Execute tool to get answer
    answer = run_tool(item["tool_call"])
    if not answer.startswith("Error"):
        train_samples.append({
            "prompt": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": item["prompt"]},
            ],
            "answer": answer
        })

print(f"Processed {len(train_samples)} training samples")

In [None]:
# Process test data (keeps the original tool call as ground_truth for evaluation)
test_samples = []
for item in test_data:
    # Execute tool to get answer
    answer = run_tool(item["tool_call"])
    if not answer.startswith("Error"):
        test_samples.append({
            "prompt": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": item["prompt"]},
            ],
            "answer": answer,
            "ground_truth": item["tool_call"]
        })

print(f"Processed {len(test_samples)} test samples")

In [None]:
# Save using datasets library
train_dataset = Dataset.from_list(train_samples)
test_dataset = Dataset.from_list(test_samples)

In [None]:
training_dataset_path = "./sample_dataset/grpo_financial_train.jsonl"
os.makedirs(os.path.dirname(training_dataset_path), exist_ok=True)
train_dataset.to_json(training_dataset_path, lines=True)

In [None]:
test_dataset_path = "./sample_dataset/grpo_financial_test.jsonl"
test_dataset.to_json(test_dataset_path, lines=True)

### Upload Training dataset to S3

In [None]:
from datetime import datetime
from sagemaker.s3 import S3Uploader

In [None]:
data_s3_uri = f"s3://{sess.default_bucket()}/tool-calling/grpo/qwen3/{datetime.now().strftime('%Y%m%d%H%M%S')}"

uploaded_s3_uri = S3Uploader.upload(
    local_path=training_dataset_path,
    desired_s3_uri=data_s3_uri
)
print(f"Uploaded {training_dataset_path} to > {uploaded_s3_uri}")

## Fine-Tune Language Model using SageMaker `ModelTrainer`

In [None]:
import time
from sagemaker.modules.configs import (
    CheckpointConfig,
    Compute,
    OutputDataConfig,
    SourceCode,
    StoppingCondition,
)
from sagemaker.modules.configs import InputData
from sagemaker.modules.train import ModelTrainer
from getpass import getpass
import yaml
from jinja2 import Template

In [None]:
MODEL_ID = "Qwen/Qwen3-1.7B"

In [None]:
MLFLOW_TRACKING_SERVER_ARN = "arn:aws:sagemaker:<region>:XXXXX:mlflow-tracking-server/demo-name" # or None

if MLFLOW_TRACKING_SERVER_ARN:
    reports_to = "mlflow"
else:
    reports_to = "tensorboard"

In [None]:
job_name = MODEL_ID.replace('/', '--').replace('.', '-')
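
The replacements above exist because SageMaker training job names accept only alphanumeric characters and hyphens, up to 63 characters. The sketch below generalizes the same idea and validates the result; `to_job_name` is a hypothetical helper, and because the SDK appends a timestamp to `base_job_name`, the practical limit for a base name is shorter than the 63 characters checked here.

```python
import re

# Pattern documented for SageMaker training job names.
_JOB_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def to_job_name(model_id: str, suffix: str = "finetune") -> str:
    """Turn a Hugging Face model ID into a SageMaker-safe job name.

    Hypothetical helper: '/' becomes '--' (as in the cell above) and any
    other disallowed character becomes '-'.
    """
    name = re.sub(r"[^a-zA-Z0-9-]", "-", model_id.replace("/", "--"))
    name = f"{name}-{suffix}".strip("-")
    if len(name) > 63 or not _JOB_NAME_RE.match(name):
        raise ValueError(f"not a valid SageMaker job name: {name!r}")
    return name


print(to_job_name("Qwen/Qwen3-1.7B"))
```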

In [None]:
# Distributed-training settings shared by both tracking configurations
training_env = {
    # "HF_TOKEN": hf_token,
    "FI_EFA_USE_DEVICE_RDMA": "1",
    "NCCL_DEBUG": "INFO",
    "NCCL_SOCKET_IFNAME": "eth0",
    "FI_PROVIDER": "efa",
    "NCCL_PROTO": "simple",
    "NCCL_NET_GDR_LEVEL": "5"
}

if MLFLOW_TRACKING_SERVER_ARN:
    # Add MLflow tracking settings only when a tracking server is configured
    training_env.update({
        "MLFLOW_EXPERIMENT_NAME": job_name,
        "MLFLOW_TAGS": json.dumps(
            {
                "source.job": "sm-training-jobs", 
                "source.type": "trl-grpo-rlvr", 
                "source.framework": "pytorch"
            }
        ),
        "MLFLOW_TRACKING_URI": MLFLOW_TRACKING_SERVER_ARN,
        "MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING": "true",
    })

In [None]:
%%writefile sagemaker_code/requirements.txt
git+https://github.com/huggingface/transformers.git
git+https://github.com/huggingface/trl.git
peft
accelerate==1.11.0
bitsandbytes==0.46.1
datasets==4.0.0
deepspeed==0.16.4
hf-transfer==0.1.8
hf_xet
liger-kernel==0.6.1
lm-eval[api]==0.4.9
kernels>=0.9.0
mlflow
Pillow
safetensors>=0.6.2
sagemaker==2.251.1
sagemaker-mlflow==0.1.0
sentencepiece==0.2.0
tokenizers>=0.21.4
triton
tensorboard
psutil
py7zr
git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
vllm==0.11.0
poetry
yq
nvidia-ml-py
pyrsmi

In [None]:
# Training script arguments for PEFT (parameter-efficient fine-tuning)
args = [
    "--config",
    "hf_recipes/Qwen/Qwen3-1.7B--grpo.yaml",
    "--tools_script",
    "tools_funcs/financial_tools_complex.py",
    "--reward_fn",
    "rewards/financial_tools_reward.py",
]
training_instance_type = "ml.g6e.8xlarge"
training_instance_count = 1

In [None]:
pytorch_image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=sess.boto_session.region_name,
    version="2.8.0",
    instance_type=training_instance_type,
    image_scope="training",
)
print(f"Using image: {pytorch_image_uri}")

In [None]:
source_code = SourceCode(
    source_dir="./sagemaker_code",
    command=f"bash sm_accelerate_grpo_train.sh {' '.join(args)}",
)

compute_configs = Compute(
    instance_type=training_instance_type,
    instance_count=training_instance_count,
    keep_alive_period_in_seconds=1800,
    volume_size_in_gb=450
)

base_job_name = f"{job_name}-finetune"
output_path = f"s3://{sess.default_bucket()}/{base_job_name}"

model_trainer = ModelTrainer(
    training_image=pytorch_image_uri,
    source_code=source_code,
    base_job_name=base_job_name,
    compute=compute_configs,
    stopping_condition=StoppingCondition(max_runtime_in_seconds=18000),
    output_data_config=OutputDataConfig(
        s3_output_path=output_path,
    ),
    checkpoint_config=CheckpointConfig(
        s3_uri=os.path.join(
            output_path,
            "financial-api-for-tool-calling", 
            job_name,
            "checkpoints"
        ), 
        local_path="/opt/ml/checkpoints"
    ),
    role=role,
    environment=training_env
)

In [None]:
model_trainer.train(
    input_data_config=[
        InputData(
            channel_name="training",
            data_source=uploaded_s3_uri,  
        )
    ], 
    wait=False
)