# 🚀 Customize and Deploy `google/gemma-3-27b-it` on Amazon SageMaker AI
---
In this notebook, we work with **Gemma-3-27B-IT**, the 27-billion-parameter instruction-tuned model in Google's Gemma 3 family. You'll learn how to fine-tune the model, evaluate it, and deploy it with SageMaker for production workloads.

**What is Gemma-3-27B-IT?**

Google's **Gemma-3-27B-IT** is the largest model in the Gemma 3 series, with 27 billion parameters and instruction tuning. Gemma models build on the research behind Google's Gemini models, and this one targets reasoning, coding, mathematics, and complex instruction-following tasks.  
🔗 Model card: [google/gemma-3-27b-it on Hugging Face](https://huggingface.co/google/gemma-3-27b-it)

---

**Key Specifications**

| Feature | Details |
|---|---|
| **Parameters** | ~27 billion |
| **Architecture** | Decoder-only Transformer |
| **Context Length** | 128K tokens |
| **Training Data** | Curated, filtered datasets (see the model card for details) |
| **Modalities** | Text and image input, text output (this notebook uses text only) |
| **License** | Gemma Terms of Use |
| **Instruction Tuning** | Supervised fine-tuning and RLHF |

---

**Benchmarks & Behavior**

- Gemma-3-27B-IT performs strongly on reasoning and instruction-following benchmarks; the model card lists the scores.
- It handles mathematical reasoning and competitive-programming-style problems.
- It generates and debugs code across multiple programming languages.
- It is multilingual, with support for over 140 languages.

---


In [None]:
%pip install -Uq "datasets==4.3.0" \
    "sagemaker==2.253.1"

In [None]:
import boto3
import sagemaker

In [None]:
region = boto3.Session().region_name

sess = sagemaker.Session(boto3.Session(region_name=region))

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()

In [None]:
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

## Data Preparation for Supervised Fine-tuning

### [Finance-Instruct-500k](https://huggingface.co/datasets/Josephgflowers/Finance-Instruct-500k)

**Finance-Instruct-500k** is a large-scale dataset with about **518,000 entries** focused on the financial domain. It spans topics such as investments, banking, markets, accounting, and corporate finance, offering a wide variety of instruction–response examples.

**Data Format & Structure**:
- Distributed in **JSON** format, with simple conversion to Parquet.  
- Contains a single `train` split with ~518k records.  
- Each record includes:  
  - `system` – context or metadata for the task  
  - `user` – the financial prompt or query  
  - `assistant` – the corresponding response  

**License**: Released under the **Apache-2.0** license.  

**Applications**:

The dataset can support finance-focused tasks such as:  
- Financial question answering  
- Market and investment analysis  
- Topic and sentiment classification  
- Financial entity extraction and document understanding  

In [None]:
import os
import json
import pprint
from tqdm import tqdm
from datasets import load_dataset

In [None]:
dataset_parent_path = os.path.join(os.getcwd(), "tmp_cache_local_dataset")
os.makedirs(dataset_parent_path, exist_ok=True)

**Preparing Your Dataset in `messages` format**

This section walks you through creating a conversation-style dataset in the `messages` format, which is required for training LLMs directly with SageMaker AI.

**What Is the `messages` Format?**

The `messages` format structures each training example as a chat-like exchange: a JSON array in which every conversation turn carries a role label. It's widely used by frameworks like TRL.

Example entry:

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "How do I bake sourdough?" },
    { "role": "assistant", "content": "First, you need to create a starter by..." }
  ]
}
```


In [None]:
dataset_name = "Josephgflowers/Finance-Instruct-500k"
dataset = load_dataset(dataset_name, split="train[:1000]")

In [None]:
pprint.pp(dataset[0])

In [None]:
print(f"total number of fine-tunable samples: {len(dataset)}")

In [None]:
def convert_to_messages(row):
    system_content = "You are a financial reasoning assistant. Read the user's query, restate the key data, and solve step by step. Show calculations clearly, explain any rounding or adjustments, and present the final answer in a concise and professional manner."
    user_content = row["user"]
    assistant_content = row["assistant"]

    return {
        "messages": [
            { "role": "system", "content": system_content},
            { "role": "user", "content": user_content },
            { "role": "assistant", "content": assistant_content }
        ]
    }
    
    
dataset = dataset.map(convert_to_messages, remove_columns=dataset.column_names)

In [None]:
dataset_filename = os.path.join(dataset_parent_path, f"{dataset_name.replace('/', '--').replace('.', '-')}.jsonl")
dataset.to_json(dataset_filename, lines=True)
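Before uploading, it can help to sanity-check that every line of the JSONL file really follows the `messages` format described above. The sketch below is one way to do that with the standard library only. The helper names (`validate_record`, `validate_jsonl`) and the specific rules (allowed roles, final turn from the assistant) are assumptions made for illustration, not requirements taken from SageMaker or TRL.

```python
import json

# Roles used by the records in this notebook (assumed to be the only valid ones).
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Return a list of problems found in one `messages` record (empty if none)."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")

    # For supervised fine-tuning, the last turn is the target the model learns.
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message should come from the assistant")
    return problems


def validate_jsonl(path):
    """Map 1-based line number -> problems, for every invalid line in a JSONL file."""
    bad = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                bad[lineno] = [f"invalid JSON: {exc.msg}"]
                continue
            problems = validate_record(record)
            if problems:
                bad[lineno] = problems
    return bad
```

Calling `validate_jsonl(dataset_filename)` should return an empty dict for a clean file; any entries point at the lines worth inspecting before the upload.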

#### Upload file to S3

In [None]:
from sagemaker.s3 import S3Uploader

In [None]:
data_s3_uri = f"s3://{sess.default_bucket()}/dataset"

uploaded_s3_uri = S3Uploader.upload(
    local_path=dataset_filename,
    desired_s3_uri=data_s3_uri
)
print(f"Uploaded {dataset_filename} to > {uploaded_s3_uri}")

## Fine-Tune LLMs using SageMaker AI

In [None]:
import time
from sagemaker.modules.configs import (
    CheckpointConfig,
    Compute,
    InputData,
    OutputDataConfig,
    SourceCode,
    StoppingCondition,
)
from sagemaker.modules.train import ModelTrainer
from getpass import getpass
import yaml
from jinja2 import Template

In [None]:
MODEL_ID = "google/gemma-3-27b-it"

In [None]:
hf_token = getpass()

### Training using `PyTorch` Estimator

This option runs a custom training script in the official SageMaker PyTorch container, using the Accelerate and DeepSpeed libraries. It is ideal for users who want full control over the training pipeline. The cells below configure the job through `ModelTrainer`.

---
**Observability**: SageMaker AI offers [SageMaker MLflow](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html), which lets you track experiments and monitor the performance of models and AI applications from a single tool.

To track your fine-tuning metrics in real time, include MLflow in your training workflow by specifying an MLflow tracking server ARN.

Optionally, you can also report to **tensorboard** or **wandb**.

In [None]:
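# Replace with the ARN of your own MLflow tracking server; set to None to report to TensorBoard instead.
MLFLOW_TRACKING_SERVER_ARN = "arn:aws:sagemaker:us-east-1:811828458885:mlflow-tracking-server/mlflow-demos"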

if MLFLOW_TRACKING_SERVER_ARN:
    reports_to = "mlflow"
else:
    reports_to = "tensorboard"
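Because the tracking server ARN is hard-coded while the session region is detected at runtime, the two can drift apart. The sketch below parses the ARN and compares its region with the session's. The ARN layout in the regular expression is inferred from the example ARN in this notebook, and both helper names are hypothetical, so treat this as an illustration rather than an official format check.

```python
import re

# Assumed layout, inferred from the example ARN used in this notebook:
# arn:<partition>:sagemaker:<region>:<12-digit account>:mlflow-tracking-server/<name>
_TRACKING_ARN_RE = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):sagemaker:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):mlflow-tracking-server/(?P<name>[A-Za-z0-9-]+)$"
)


def parse_tracking_server_arn(arn):
    """Split a tracking server ARN into partition, region, account, and name."""
    match = _TRACKING_ARN_RE.match(arn)
    if match is None:
        raise ValueError(f"not an MLflow tracking server ARN: {arn!r}")
    return match.groupdict()


def same_region(arn, session_region):
    """True when the tracking server lives in the region the session uses."""
    return parse_tracking_server_arn(arn)["region"] == session_region
```

In this notebook the check would read `same_region(MLFLOW_TRACKING_SERVER_ARN, sess.boto_region_name)`; a `False` result is a hint to double-check the ARN before launching the job.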

In [None]:
job_name = MODEL_ID.replace('/', '--').replace('.', '-')
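The two `replace` calls above are enough for `google/gemma-3-27b-it`, but other model IDs may contain characters such as underscores that SageMaker rejects in training job names. The sketch below generalizes the idea. It assumes the documented job-name pattern `[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}`; the helper name `to_job_name` and the default `max_len=40` (chosen to leave headroom for any suffix added to the base job name) are illustrative choices.

```python
import re

# Pattern SageMaker documents for training job names (at most 63 characters).
_JOB_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def to_job_name(model_id, max_len=40):
    """Turn a Hugging Face model ID into a string usable as a job-name prefix."""
    name = model_id.replace("/", "--")          # same convention as the cell above
    name = re.sub(r"[^a-zA-Z0-9-]", "-", name)  # dots, underscores, etc. become hyphens
    name = name[:max_len].strip("-")            # names cannot start or end with a hyphen
    if not _JOB_NAME_RE.match(name):
        raise ValueError(f"cannot derive a valid job name from {model_id!r}")
    return name
```

For the model used here, `to_job_name(MODEL_ID)` yields the same value as the cell above, `google--gemma-3-27b-it`.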

In [None]:
# Settings shared by every run: Hugging Face access plus EFA/NCCL networking.
training_env = {
    "HF_TOKEN": hf_token,
    "FI_EFA_USE_DEVICE_RDMA": "1",
    "NCCL_DEBUG": "INFO",
    "NCCL_SOCKET_IFNAME": "eth0",
    "FI_PROVIDER": "efa",
    "NCCL_PROTO": "simple",
    "NCCL_NET_GDR_LEVEL": "5",
}

if MLFLOW_TRACKING_SERVER_ARN:
    # MLflow tracking settings, added only when a tracking server is configured.
    training_env.update(
        {
            "MLFLOW_EXPERIMENT_NAME": f"{job_name}-exp",
            "MLFLOW_TAGS": json.dumps(
                {
                    "source.job": "sm-training-jobs",
                    "source.type": "sft",
                    "source.framework": "pytorch",
                }
            ),
            "MLFLOW_TRACKING_URI": MLFLOW_TRACKING_SERVER_ARN,
            "MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING": "true",
        }
    )
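`training_env` carries the Hugging Face token, so printing it while debugging would leak the secret into the notebook output. A small redaction helper, sketched below, avoids that. The helper name `redact_env` and the list of substrings treated as sensitive are assumptions for illustration.

```python
# Substrings that mark a variable name as sensitive (an illustrative list).
SECRET_MARKERS = ("TOKEN", "SECRET", "PASSWORD", "KEY")


def redact_env(env, markers=SECRET_MARKERS):
    """Return a copy of `env` with the values of sensitive-looking keys masked."""
    redacted = {}
    for key, value in env.items():
        if any(marker in key.upper() for marker in markers):
            redacted[key] = "***"
        else:
            redacted[key] = value
    return redacted
```

Use `pprint.pp(redact_env(training_env))` instead of printing the dictionary directly. Keep in mind that environment variables passed to a training job are stored with the job definition, so a short-lived or narrowly scoped token is the safer choice.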

#### Training strategy - Choose between: `PeFT`/`Spectrum`/`Full-Finetuning`

Here we map each fine-tuning strategy to the recipe file and the instance type used to run it.

In [None]:
%%writefile sagemaker_code/requirements.txt
transformers==4.55.0
peft==0.17.0
accelerate==1.10.0
bitsandbytes==0.46.1
datasets==4.0.0
deepspeed==0.16.4
evaluate==0.4.5
hf-transfer==0.1.8
hf_xet
liger-kernel==0.6.1
lm-eval[api]==0.4.9
kernels>=0.9.0
mlflow
safetensors>=0.6.2
sagemaker==2.251.1
sagemaker-mlflow==0.1.0
sentencepiece==0.2.0
scikit-learn==1.7.1
tokenizers>=0.21.4
triton
trl==0.21.0
py7zr
nvidia-ml-py
wandb
git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
vllm==0.10.1
poetry
yq
psutil
pyrsmi

In [None]:
# For PeFT
args = [
    "--config",
    "hf_recipes/google/gemma-3-27b-it--vanilla-peft-qlora.yaml",
    # "--run-eval" # enable this for small models to run eval + tune
]
training_instance_type = "ml.g6e.2xlarge"
training_instance_count = 1

## For Full-Finetuning
# args = [
#     "--config",
#     "hf_recipes/google/gemma-3-27b-it--vanilla-full.yaml",
#     # "--run-eval" # enable this for small models if you're looking to bundle eval with fine-tuning
# ]
# training_instance_type = "ml.g6e.12xlarge"
# training_instance_count = 1


In [None]:
pytorch_image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=sess.boto_session.region_name,
    version="2.7.1",
    instance_type=training_instance_type,
    image_scope="training",
)
print(f"Using image: {pytorch_image_uri}")

In [None]:
source_code = SourceCode(
    source_dir="./sagemaker_code",
    command=f"bash sm_accelerate_train.sh {' '.join(args)}",
)

compute_configs = Compute(
    instance_type=training_instance_type,
    instance_count=training_instance_count,
    keep_alive_period_in_seconds=1800,
    volume_size_in_gb=300
)

base_job_name = f"{job_name}-finetune"
output_path = f"s3://{sess.default_bucket()}/{base_job_name}"

model_trainer = ModelTrainer(
    training_image=pytorch_image_uri,
    source_code=source_code,
    base_job_name=base_job_name,
    compute=compute_configs,
    stopping_condition=StoppingCondition(max_runtime_in_seconds=18000),
    output_data_config=OutputDataConfig(
        s3_output_path=output_path,
    ),
    checkpoint_config=CheckpointConfig(
        s3_uri=os.path.join(
            output_path,
            dataset_name.replace('/', '--').replace('.', '-'), 
            job_name,
            "checkpoints"
        ), 
        local_path="/opt/ml/checkpoints"
    ),
    role=role,
    environment=training_env
)
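One detail in the cell above is worth a note: the checkpoint URI is built with `os.path.join`, which uses the separator of the local operating system. On Linux and macOS that produces forward slashes, but on Windows it would insert backslashes into the S3 URI. A minimal alternative, assuming a hypothetical helper named `s3_join`, is to join the parts with `posixpath`:

```python
import posixpath


def s3_join(base_uri, *parts):
    """Join path segments onto an s3:// URI using forward slashes on every OS."""
    if not base_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {base_uri!r}")
    bucket_and_key = base_uri[len("s3://"):].rstrip("/")
    # Drop empty segments and stray slashes so the result has no '//' inside.
    cleaned = [p.strip("/") for p in parts if p and p.strip("/")]
    return "s3://" + posixpath.join(bucket_and_key, *cleaned)
```

The checkpoint location would then be written as `s3_join(output_path, dataset_name.replace('/', '--').replace('.', '-'), job_name, "checkpoints")`.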

In [None]:
model_trainer.train(
    input_data_config=[
        InputData(
            channel_name="training",
            data_source=uploaded_s3_uri,  
        )
    ], 
    wait=False
)
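Because the job is submitted with `wait=False`, the call returns immediately and the notebook does not block while training runs. If a later cell needs the finished job, a simple polling loop can wait for a terminal status. The sketch below takes the describe call as a parameter so it can be tried without AWS access; `wait_for_training_job` is a hypothetical helper, and the status values are the ones reported in the `TrainingJobStatus` field of the SageMaker `DescribeTrainingJob` API.

```python
import time

# Statuses after which a training job no longer changes.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(describe, job_name, poll_seconds=60,
                          timeout_seconds=18000, sleep=time.sleep):
    """Poll `describe` until the job reaches a terminal status, then return it.

    `describe` is any callable accepting TrainingJobName=... and returning a
    dict with a "TrainingJobStatus" key, such as the boto3 SageMaker client's
    `describe_training_job` method.
    """
    waited = 0
    while True:
        status = describe(TrainingJobName=job_name)["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{job_name} still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

In this notebook, `describe` could be `boto3.client("sagemaker").describe_training_job`, with the job name taken from the training job listing in the SageMaker console. The default timeout mirrors the 18000-second `StoppingCondition` configured above.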