## SFTTrainer Example - Finetuning with SageMaker

This notebook demonstrates the basic user flow for supervised fine-tuning (SFT) of a model available in SageMaker JumpStart.
For information on the models available in JumpStart, see https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-latest.html

### Setup and Configuration

Initialize the environment by importing the necessary libraries and configuring AWS credentials.

In [1]:
from sagemaker.train.sft_trainer import SFTTrainer
from sagemaker.train.common import TrainingType
from sagemaker.core.training.configs import InputData
from rich import print as rprint
from rich.pretty import pprint
from sagemaker.core.resources import ModelPackage

import boto3
from sagemaker.core.helper.session_helper import Session
import os


# To get MLflow native metrics while the trainer waits, run the line below with the appropriate region
# os.environ["SAGEMAKER_MLFLOW_CUSTOM_ENDPOINT"] = "https://mlflow.sagemaker.us-east-1.app.aws"



sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/rsareddy/Library/Application Support/sagemaker/config.yaml


## Finetuning with Jumpstart base model

### Create SFTTrainer
**Required Parameters** 

* `model`: ID of a base model in SageMaker hub content that is available for fine-tuning, or a `ModelPackage` holding previously trained artifacts

**Optional Parameters**
* `training_type`: A value of the `TrainingType` enum (`sagemaker.train.common`), either `LORA` or `FULL`
* `model_package_group_name`: Model package group name, or a `ModelPackageGroup`
* `mlflow_resource_arn`: ARN of the MLflow app used to track the training job
* `mlflow_experiment_name`: MLflow experiment name (str)
* `mlflow_run_name`: MLflow run name (str)
* `training_dataset`: Training dataset, given as either a Dataset ARN or the S3 path of the dataset. A dataset is required for a training job to run; it can be provided either to the trainer or to `.train()`
* `validation_dataset`: Validation dataset, given as either a Dataset ARN or the S3 path of the dataset
* `s3_output_path`: S3 path for the trained model artifacts
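
The two accepted dataset forms can be told apart by their prefix. The following is a minimal sketch, not part of the SDK: `classify_dataset_reference` is a hypothetical helper that assumes S3 paths start with `s3://` and Dataset ARNs start with `arn:aws:sagemaker:`, as they do in the examples in this notebook.

```python
def classify_dataset_reference(reference: str) -> str:
    """Return "s3" or "arn" for a dataset reference; raise ValueError otherwise.

    Hypothetical helper for illustration; the SDK performs its own validation.
    """
    if reference.startswith("s3://"):
        # Require a bucket name after the scheme.
        bucket = reference[len("s3://"):].split("/", 1)[0]
        if not bucket:
            raise ValueError(f"S3 path has no bucket name: {reference!r}")
        return "s3"
    if reference.startswith("arn:aws:sagemaker:"):
        return "arn"
    raise ValueError(f"Expected an S3 path or a SageMaker ARN, got: {reference!r}")
```

Running a check like this before constructing the trainer can surface a mistyped path locally instead of at job submission.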

In [2]:
# For fine-tuning 
sft_trainer = SFTTrainer(
    model="meta-textgeneration-llama-3-2-1b-instruct", # Union[str, ModelPackage]
    training_type=TrainingType.LORA, 
    model_package_group_name="arn:aws:sagemaker:us-west-2:729646638167:model-package-group/sdk-test-finetuned-models", # Make it Optional
    #mlflow_resource_arn="arn:aws:sagemaker:us-west-2:052150106756:mlflow-tracking-server/mmlu-eval-experiment",  # Optional[str] - MLflow app ARN (auto-resolved if not provided), can accept name and search in the account
    mlflow_experiment_name="test-finetuned-models-exp", # Optional[str]
    mlflow_run_name="test-finetuned-models-run", # Optional[str]
    training_dataset="s3://mc-flows-sdk-testing/input_data/sft/", #Optional[]
    s3_output_path="s3://mc-flows-sdk-testing/output/",
    accept_eula=True
)


In [2]:
# For fine-tuning (us-east-1)
sft_trainer = SFTTrainer(
    model="meta-textgeneration-llama-3-2-1b-instruct", # Union[str, ModelPackage]
    training_type=TrainingType.LORA, 
    model_package_group_name="arn:aws:sagemaker:us-east-1:729646638167:model-package-group/sdk-test-finetuned-models-us-east-1", # Make it Optional
    mlflow_resource_arn="arn:aws:sagemaker:us-east-1:729646638167:mlflow-app/app-J2NPD6IV77BJ",  # Optional[str] - MLflow app ARN (auto-resolved if not provided), can accept name and search in the account
    mlflow_experiment_name="test-finetuned-models-exp", # Optional[str]
    mlflow_run_name="test-finetuned-models-run", # Optional[str]
    training_dataset="s3://mc-flows-sdk-testing-us-east-1/input_data/sft/train_285.jsonl", #Optional[]
    s3_output_path="s3://mc-flows-sdk-testing-us-east-1/output/",
    #sagemaker_session=sagemaker_session,
    #role="arn:aws:iam::052150106756:role/Admin"
)

### Discover and update Finetuning options

Each technique and model has hyperparameters that the user can override.

In [3]:
print("Default Finetuning Options:")
pprint(sft_trainer.hyperparameters.to_dict()) # rename as hyperparameters

Default Finetuning Options:


In [4]:
# To update a hyperparameter, assign it a new value. For example:
sft_trainer.hyperparameters.global_batch_size = 16
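
When several values change at once, the same attribute assignment can be driven from a dictionary. This is a minimal sketch under assumptions: `apply_overrides` is a hypothetical helper, and a `SimpleNamespace` with illustrative values stands in for `sft_trainer.hyperparameters` so the code runs without AWS access. Whether the real hyperparameters object responds to `hasattr` and `setattr` in the same way is not confirmed by this notebook.

```python
from types import SimpleNamespace
from typing import Any, Mapping


def apply_overrides(hyperparameters: Any, overrides: Mapping[str, Any]) -> None:
    """Assign each override, rejecting names the object does not already have."""
    unknown = [name for name in overrides if not hasattr(hyperparameters, name)]
    if unknown:
        raise AttributeError(f"Unknown hyperparameters: {sorted(unknown)}")
    for name, value in overrides.items():
        setattr(hyperparameters, name, value)


# Stand-in for sft_trainer.hyperparameters (illustrative values only).
demo = SimpleNamespace(global_batch_size=32, max_steps=10)
apply_overrides(demo, {"global_batch_size": 16})
print(demo.global_batch_size)  # 16
```

Checking for unknown names first means a typo such as `global_batchsize` fails loudly rather than silently creating a new attribute.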

#### Start SFT training


In [5]:
training_job = sft_trainer.train(
    wait=True,
)

In [5]:


from sagemaker.core.resources import TrainingJob

response = TrainingJob.get(training_job_name="meta-textgeneration-llama-3-2-1b-instruct-sft-20251201114921")
pprint(response)

In [None]:
# To skip waiting and monitor the training job later:

'''
training_job = sft_trainer.train(
    wait=False,
)
'''

In [6]:
pprint(training_job)

### View any Training job details

Use `TrainingJob.get(...)` to retrieve the details and status of any training job.

In [8]:
from sagemaker.core.resources import TrainingJob

response = TrainingJob.get(training_job_name="meta-textgeneration-llama-3-2-1b-instruct-sft-20251123162832")
pprint(response)
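
A job started with `wait=False` can be monitored by fetching its status repeatedly until it reaches a terminal state. The sketch below is an illustration, not the SDK's own wait logic: `wait_for_terminal_status` is a hypothetical helper that takes any callable returning a status string, and a fake status sequence stands in for real calls so the code runs without AWS access.

```python
import time
from typing import Callable

# Terminal values of the SageMaker TrainingJobStatus field.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 1000,
) -> str:
    """Call get_status until it returns a terminal status, then return it."""
    for attempt in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(f"Job still not finished after {max_polls} polls")


# Stand-in status source (hypothetical sequence of statuses).
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)  # Completed
```

With the real SDK, `get_status` could wrap `TrainingJob.get(training_job_name=...)` and read the status from the returned object; the exact attribute name (for example `training_job_status`) is an assumption here and should be checked against the printed response.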

## Continued Finetuning (or) Finetuning on Model Artifacts

#### Discover a ModelPackage and get its details

In [6]:
from rich import print as rprint
from rich.pretty import pprint
from sagemaker.core.resources import ModelPackage, ModelPackageGroup

#model_package_iter = ModelPackage.get_all(model_package_group_name="test-finetuned-models-gamma")
model_package = ModelPackage.get(model_package_name="arn:aws:sagemaker:us-west-2:729646638167:model-package/sdk-test-finetuned-models/2")

pprint(model_package)
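
The ARN passed to `ModelPackage.get(...)` encodes the region, account, group name, and version. As a minimal sketch for inspecting such an ARN before making the call, the hypothetical helper `parse_model_package_arn` below splits it into those parts. It assumes the versioned form `model-package/<group>/<version>` used in this notebook and is not part of the SDK.

```python
from typing import NamedTuple


class ModelPackageArn(NamedTuple):
    region: str
    account_id: str
    group_name: str
    version: int


def parse_model_package_arn(arn: str) -> ModelPackageArn:
    """Split a versioned model package ARN into its components."""
    # General ARN layout: arn:partition:service:region:account:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource = parts[5].split("/")
    if len(resource) != 3 or resource[0] != "model-package" or not resource[2].isdigit():
        raise ValueError(f"Not a versioned model package ARN: {arn!r}")
    return ModelPackageArn(parts[3], parts[4], resource[1], int(resource[2]))


parsed = parse_model_package_arn(
    "arn:aws:sagemaker:us-west-2:729646638167:model-package/sdk-test-finetuned-models/2"
)
print(parsed.group_name, parsed.version)  # sdk-test-finetuned-models 2
```

Comparing `parsed.region` with the region of the current session is one way to catch a cross-region ARN early.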

#### Create Trainer

Trainer creation is the same as in the fine-tuning section above, except that `model` takes a `ModelPackage` (previously trained artifacts).

In [7]:
# For fine-tuning 
sft_trainer = SFTTrainer(
    model=model_package, # Union[str, ModelPackage]
    training_type=TrainingType.LORA, 
    model_package_group_name="sdk-test-finetuned-models", # Make it Optional
    #mlflow_resource_arn="arn:aws:sagemaker:us-west-2:052150106756:mlflow-tracking-server/mmlu-eval-experiment",  # Optional[str] - MLflow app ARN (auto-resolved if not provided), can accept name and search in the account
    mlflow_experiment_name="test-finetuned-models-exp", # Optional[str]
    mlflow_run_name="test-finetuned-models-run", # Optional[str]
    training_dataset="s3://mc-flows-sdk-testing/input_data/sft/", #Optional[]
    s3_output_path="s3://mc-flows-sdk-testing/output/",
)


#### Start the Training

In [8]:
training_job = sft_trainer.train(
    wait=True,
)

In [9]:
pprint(training_job)

#### SFT Trainer Nova testing

In [15]:
os.environ['SAGEMAKER_REGION'] = 'us-east-1'

# For fine-tuning 
sft_trainer_nova = SFTTrainer(
    #model="test-nova-lite-v2", # Union[str, ModelPackage]
    #model="nova-textgeneration-micro",
    model="nova-textgeneration-lite-v2",
    training_type=TrainingType.LORA, 
    model_package_group_name="sdk-test-finetuned-models", # Make it Optional
    #mlflow_resource_arn="arn:aws:sagemaker:us-east-1:052150106756:mlflow-app/app-UNBKLOAX64PX",  # Optional[str] - MLflow app ARN (auto-resolved if not provided), can accept name and search in the account
    mlflow_experiment_name="test-nova-finetuned-models-exp", # Optional[str]
    mlflow_run_name="test-nova-finetuned-models-run", # Optional[str]
    training_dataset="arn:aws:sagemaker:us-east-1:729646638167:hub-content/sdktest/DataSet/sft-nova-test-dataset/0.0.1",
    s3_output_path="s3://mc-flows-sdk-testing-us-east-1/output/"
)


In [16]:
sft_trainer_nova.hyperparameters.to_dict()

{'name': 'my-lora-sft-run-1r3op',
 'data_s3_path': 's3://my-bucket-name/train.jsonl',
 'output_s3_path': 's3://my-bucket-name/outputs/',
 'reasoning_enabled': 'True',
 'global_batch_size': '32',
 'max_steps': '10',
 'learning_rate': '1e-05',
 'lora_alpha': '64',
 'learning_rate_ratio': '64.0',
 'max_context_length': '8192'}
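
The output above shows every value as a string, including numbers and booleans. For local inspection or comparison, the strings can be converted to Python values. The following is a minimal sketch: `coerce_hyperparameters` is a hypothetical helper built on the standard library's `ast.literal_eval`, and it leaves any value that is not a Python literal (such as an S3 path) unchanged.

```python
import ast
from typing import Any, Dict


def coerce_hyperparameters(raw: Dict[str, str]) -> Dict[str, Any]:
    """Convert literal-looking strings to Python values; keep the rest as strings."""
    typed: Dict[str, Any] = {}
    for name, value in raw.items():
        try:
            typed[name] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Not a Python literal (for example a path or a run name).
            typed[name] = value
    return typed


# A subset of the values printed above.
raw = {
    "reasoning_enabled": "True",
    "global_batch_size": "32",
    "learning_rate": "1e-05",
    "data_s3_path": "s3://my-bucket-name/train.jsonl",
}
typed = coerce_hyperparameters(raw)
print(typed["global_batch_size"] + 1)  # 33
```

This conversion is for reading the values only; it does not change what the trainer stores or submits.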

In [17]:
training_job = sft_trainer_nova.train(
    wait=True,
)