## RLVR Example - Fine-tuning with SageMaker

This notebook demonstrates the basic user flow for RLVR fine-tuning of a model available in SageMaker JumpStart.
For the list of models available in JumpStart, see https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-latest.html

### Setup and Configuration

Initialize the environment by importing the necessary libraries and configuring the SageMaker client and session.

In [1]:

import os

import boto3
from rich import print as rprint
from rich.pretty import pprint
from sagemaker.core.helper.session_helper import Session
from sagemaker.core.resources import ModelPackage
from sagemaker.train.common import TrainingType
from sagemaker.train.configs import InputData
from sagemaker.train.rlvr_trainer import RLVRTrainer

os.environ['SAGEMAKER_REGION'] = 'us-east-1'
os.environ['SAGEMAKER_STAGE'] = 'prod'

# Pre-release (gamma) endpoint configuration
api_endpoint = "https://api.sagemaker.gamma.us-west-2.ml-platform.aws.a2z.com"

# Create a SageMaker client that targets the pre-release endpoint
sm_client = boto3.client('sagemaker', endpoint_url=api_endpoint, region_name="us-west-2")

# Create a SageMaker session backed by that client
sagemaker_session = Session(sagemaker_client=sm_client)




sagemaker.config INFO - Not applying SDK defaults from location: /Library/Application Support/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /Users/rsareddy/Library/Application Support/sagemaker/config.yaml


### Create RLVRTrainer
**Required Parameters** 

* `model`: ID of a base model in the SageMaker hub content that is available for fine-tuning, or a `ModelPackage` holding previously trained artifacts

**Optional Parameters**
* `custom_reward_function`: ARN of a custom reward function (evaluator)
* `model_package_group_name`: model package group name, or a `ModelPackageGroup`
* `mlflow_resource_arn`: ARN of the MLflow app used to track the training job
* `mlflow_experiment_name`: MLflow experiment name (`str`)
* `mlflow_run_name`: MLflow run name (`str`)
* `training_dataset`: training dataset, given as either a dataset ARN or the S3 path of the dataset. A training dataset is required for a training job to run; provide it either to the trainer or to `.train()`.
* `validation_dataset`: validation dataset, given as either a dataset ARN or the S3 path of the dataset
* `s3_output_path`: S3 path for the trained model artifacts
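
Because `training_dataset` and `validation_dataset` accept either an S3 path or a dataset ARN, it can help to check which form a string takes before passing it to the trainer. The helper below is a minimal sketch; `classify_dataset_ref` is a hypothetical name and is not part of the SageMaker SDK, and the example bucket, account ID, and dataset name are placeholders.

```python
from urllib.parse import urlparse


def classify_dataset_ref(ref: str) -> str:
    """Return "s3" for an S3 URI or "arn" for a SageMaker ARN (hypothetical helper)."""
    if ref.startswith("s3://"):
        parsed = urlparse(ref)
        # An S3 URI needs both a bucket (netloc) and an object key (path).
        if not parsed.netloc or not parsed.path.strip("/"):
            raise ValueError(f"S3 URI must include a bucket and a key: {ref!r}")
        return "s3"

    # ARN layout: arn:partition:service:region:account-id:resource
    parts = ref.split(":", 5)
    if len(parts) == 6 and parts[0] == "arn" and parts[2] == "sagemaker" and parts[5]:
        return "arn"

    raise ValueError(f"Not an S3 URI or a SageMaker ARN: {ref!r}")


# Placeholder values for illustration only.
print(classify_dataset_ref("s3://example-bucket/input/train.jsonl"))
print(classify_dataset_ref(
    "arn:aws:sagemaker:us-west-2:123456789012:hub-content/AIRegistry/DataSet/example/1.0.0"
))
```

The check is purely syntactic: it does not confirm that the object or dataset exists, or that the execution role can read it.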

In [8]:
# Fine-tune a JumpStart base model
rlvr_trainer = RLVRTrainer(
    model="meta-textgeneration-llama-3-2-3b-instruct",  # Union[str, ModelPackage]
    model_package_group_name="test-finetuned-models-gamma",  # Optional
    # Optional[str]: MLflow app ARN (auto-resolved if not provided); can accept a name and search in the account
    # mlflow_resource_arn="arn:aws:sagemaker:us-west-2:052150106756:mlflow-tracking-server/mmlu-eval-experiment",
    mlflow_experiment_name="test-rlvr-finetuned-models-exp",  # Optional[str]
    mlflow_run_name="test-rlvr-finetuned-models-run",  # Optional[str]
    # Optional: S3 path or dataset ARN, for example
    # "arn:aws:sagemaker:us-west-2:052150106756:hub-content/AIRegistry/DataSet/MarketingDemoDataset1/1.0.0"
    training_dataset="s3://rsareddy-test-for-demo/input_data/rlvr-rlaif-oss-dataset/train_285.jsonl",
    s3_output_path="s3://rsareddy-test-for-demo/output/",
    sagemaker_session=sagemaker_session,
    # role="arn:aws:iam::052150106756:role/service-role/AmazonSageMaker-ExecutionRole-20250731T162975"
    role="arn:aws:iam::052150106756:role/Admin",
)
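
The training dataset in this cell is a `.jsonl` file, so a quick local sanity check before uploading can catch malformed lines early. The sketch below only verifies that each non-blank line parses as a single JSON object. The record schema that RLVR training expects is not shown in this notebook, so no field names are checked, and `find_bad_jsonl_lines` is a hypothetical helper name.

```python
import json
from typing import Iterable, List


def find_bad_jsonl_lines(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of lines that are not a single JSON object."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(record, dict):
            bad.append(number)  # valid JSON, but not an object
    return bad


# In-memory sample; for a real file, pass the open file handle instead.
sample = ['{"prompt": "2 + 2"}', "not json", "[1, 2]", ""]
print(find_bad_jsonl_lines(sample))
```

For a file on disk, `with open(path) as f: find_bad_jsonl_lines(f)` works the same way, since a text file handle iterates line by line.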

### Discover and update Finetuning options

Each technique and model combination exposes hyperparameters that the user can override.

In [9]:
print("Default Finetuning Options:")
pprint(rlvr_trainer.hyperparameters.to_dict())

# Show each hyperparameter's current value, type, default, and allowed values
rlvr_trainer.hyperparameters.get_info()



Default Finetuning Options:



data_path:
  Current value: None
  Type: string
  Default: None
  Required: Yes

global_batch_size:
  Current value: 256
  Type: integer
  Default: 256
  Valid options: [256, 512, 1024]
  Required: Yes

learning_rate:
  Current value: 3e-05
  Type: float
  Default: 3e-05
  Range: 1e-07 - 0.001
  Required: Yes

lora_alpha:
  Current value: 32
  Type: integer
  Default: 32
  Range: 8 - 1024
  Required: Yes

max_epochs:
  Current value: 1
  Type: integer
  Default: 1
  Range: 1 - 100000
  Required: Yes

max_prompt_length:
  Current value: 1024
  Type: integer
  Default: 1024
  Range: 512 - 16384
  Required: Yes

mlflow_run_id:
  Current value: 
  Type: string
  Default: 

mlflow_tracking_uri:
  Current value: 
  Type: string
  Default: 

model_name_or_path:
  Current value: meta-llama/Llama-3.2-3B-Instruct
  Type: string
  Default: meta-llama/Llama-3.2-3B-Instruct
  Required: Yes

name:
  Current value: example-name-c9tcy
  Type: string
  Default: example-name-c9tcy
  Required: Yes

outp
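
The constraints printed by `get_info()` can be used to check candidate overrides before a job is submitted. The sketch below copies the ranges and options from the output above into a plain dictionary and validates a dictionary of overrides against them. It assumes the printed ranges are inclusive, and it deliberately does not touch the trainer object, because the call used to apply overrides is not shown in this notebook. `CONSTRAINTS` and `validate_overrides` are hypothetical names.

```python
# Constraints copied from the get_info() output above (ranges assumed inclusive).
CONSTRAINTS = {
    "global_batch_size": {"type": int, "options": [256, 512, 1024]},
    "learning_rate": {"type": float, "range": (1e-07, 0.001)},
    "lora_alpha": {"type": int, "range": (8, 1024)},
    "max_epochs": {"type": int, "range": (1, 100000)},
    "max_prompt_length": {"type": int, "range": (512, 16384)},
}


def validate_overrides(overrides: dict) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    errors = []
    for name, value in overrides.items():
        rule = CONSTRAINTS.get(name)
        if rule is None:
            errors.append(f"{name}: no known constraint")
            continue
        # Accept ints where a float is expected, but never booleans.
        expected = (int, float) if rule["type"] is float else rule["type"]
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {rule['type'].__name__}, got {value!r}")
            continue
        if "options" in rule and value not in rule["options"]:
            errors.append(f"{name}: {value!r} is not one of {rule['options']}")
        if "range" in rule:
            low, high = rule["range"]
            if not low <= value <= high:
                errors.append(f"{name}: {value!r} is outside {low} - {high}")
    return errors


print(validate_overrides({"learning_rate": 1e-05, "global_batch_size": 512}))
print(validate_overrides({"learning_rate": 0.01, "global_batch_size": 300}))
```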

#### Start RLVR training


In [4]:
training_job = rlvr_trainer.train(wait=True)

Training Job Name: meta-textgeneration-llama-3-2-3b-instruct-rlvr-20251124135100




Output()

In [10]:
training_job = rlvr_trainer.train(wait=False)

### View any Training job details

Retrieve the details and status of any training job with `TrainingJob.get(...)`.

In [6]:
from sagemaker.core.resources import TrainingJob

response = TrainingJob.get(training_job_name="meta-textgeneration-llama-3-2-3b-instruct-rlvr-20251123033517")
pprint(response)

In [14]:
training_job.refresh()
pprint(training_job)
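
When a job is started with `wait=False`, its status has to be polled. The sketch below is a generic polling loop that takes any zero-argument callable returning the current status string, so it does not depend on a particular SDK object. The terminal values are the standard SageMaker training job statuses; the function name, poll interval, and timeout are illustrative assumptions.

```python
import time
from typing import Callable

# SageMaker training jobs end in one of these states.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Simulated status sequence for illustration; no AWS calls are made here.
statuses = iter(["InProgress", "InProgress", "Completed"])
print(wait_for_terminal_status(lambda: next(statuses), poll_seconds=0.0))
```

With the `training_job` object from this notebook, `get_status` could call `training_job.refresh()` and then return the job's status attribute. The exact attribute name (likely `training_job_status`) is an assumption here and should be checked against the printed object.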

### Test RLVR with a Custom Reward Function

Here we provide the ARN of a user-defined reward function.
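
To make the idea of a verifiable reward concrete, the sketch below scores a completion 1.0 when its final number matches a reference answer and 0.0 otherwise. It illustrates only the scoring logic. The interface that a registered reward function must implement (handler name, input and output payload) is not shown in this notebook, so `extract_final_answer` and `exact_match_reward` are hypothetical names and not the service's contract.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number that appears in the completion, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None


def exact_match_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the completion's final number equals the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    try:
        return 1.0 if float(answer) == float(reference) else 0.0
    except ValueError:
        # The reference is not numeric, so it cannot be verified this way.
        return 0.0


print(exact_match_reward("Adding the parts gives a total of 42.", "42"))
print(exact_match_reward("I believe the answer is 7", "8"))
```

A binary, rule-based score like this is what makes the reward verifiable: the same completion always gets the same score, with no learned judge involved.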

In [3]:

# Fine-tune with a custom reward function
rlvr_trainer = RLVRTrainer(
    model="meta-textgeneration-llama-3-2-1b-instruct", # Union[str, ModelPackage] 
    model_package_group_name="test-finetuned-models-gamma", # Make it Optional
    mlflow_resource_arn="arn:aws:sagemaker:us-west-2:052150106756:mlflow-tracking-server/mmlu-eval-experiment",  # Optional[str] - MLflow app ARN (auto-resolved if not provided), can accept name and search in the account
    mlflow_experiment_name="test-rlvr-finetuned-models-exp", # Optional[str]
    mlflow_run_name="test-rlvr-finetuned-models-run", # Optional[str]
    training_dataset="arn:aws:sagemaker:us-west-2:052150106756:hub-content/AIRegistry/DataSet/rlvr-rlaif-test-dataset/0.0.2", #Optional[]
    s3_output_path="s3://open-models-testing-pdx/output",
    sagemaker_session=sagemaker_session,
    #role="arn:aws:iam::052150106756:role/service-role/AmazonSageMaker-ExecutionRole-20250731T162975"
    role="arn:aws:iam::052150106756:role/Admin",
    custom_reward_function="arn:aws:sagemaker:us-west-2:052150106756:hub-content/AIRegistry/JsonDoc/rlvr-test-rf/0.0.1",
)

In [4]:
training_job = rlvr_trainer.train(wait=True, logs=True)

In [10]:
#training_job.refresh()
pprint(training_job)


## Continued Fine-tuning (or Fine-tuning from Model Artifacts)

#### Discover a ModelPackage and get its details

In [6]:
from rich import print as rprint
from rich.pretty import pprint
from sagemaker.core.resources import ModelPackage, ModelPackageGroup

#model_package_iter = ModelPackage.get_all(model_package_group_name="test-finetuned-models-gamma")
model_package = ModelPackage.get(model_package_name="arn:aws:sagemaker:us-west-2:052150106756:model-package/test-finetuned-models-gamma/61")

pprint(model_package)
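
A model package ARN encodes the region, account, group name, and version, which is handy when logging which artifacts a continued fine-tuning run starts from. The sketch below parses that layout using string operations only. `ModelPackageRef` and `parse_model_package_arn` are hypothetical names, and the ARN in the example is a placeholder.

```python
from typing import NamedTuple


class ModelPackageRef(NamedTuple):
    region: str
    account_id: str
    group_name: str
    version: int


def parse_model_package_arn(arn: str) -> ModelPackageRef:
    """Split a versioned model package ARN into its components."""
    # ARN layout: arn:partition:sagemaker:region:account-id:model-package/<group>/<version>
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource = parts[5].split("/")
    if len(resource) != 3 or resource[0] != "model-package" or not resource[2].isdigit():
        raise ValueError(f"Not a versioned model package ARN: {arn!r}")
    return ModelPackageRef(parts[3], parts[4], resource[1], int(resource[2]))


# Placeholder ARN for illustration only.
ref = parse_model_package_arn(
    "arn:aws:sagemaker:us-west-2:123456789012:model-package/example-group/61"
)
print(ref.group_name, ref.version)
```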

#### Create Trainer

Trainer creation is the same as in the fine-tuning section above, except that `model` takes a `ModelPackage` (previously trained artifacts) or its ARN.

In [7]:
# Continued fine-tuning from a ModelPackage object
rlvr_trainer = RLVRTrainer(
    model=model_package, # Union[str, ModelPackage] 
    training_type=TrainingType.LORA, 
    model_package_group_name="test-finetuned-models-gamma", #"test-finetuned-models", # Make it Optional
    mlflow_resource_arn="arn:aws:sagemaker:us-west-2:052150106756:mlflow-tracking-server/mmlu-eval-experiment",  # Optional[str] - MLflow app ARN (auto-resolved if not provided), can accept name and search in the account
    mlflow_experiment_name="test-rlvr-finetuned-models-exp", # Optional[str]
    mlflow_run_name="test-rlvr-finetuned-models-run", # Optional[str]
    training_dataset="arn:aws:sagemaker:us-west-2:052150106756:hub-content/AIRegistry/DataSet/rlvr-rlaif-test-dataset/0.0.2",     #"arn:aws:sagemaker:us-west-2:052150106756:hub-content/AIRegistry/DataSet/MarketingDemoDataset1/1.0.0", #Optional[]
    s3_output_path="s3://open-models-testing-pdx/output",
    sagemaker_session=sagemaker_session,
    #role="arn:aws:iam::052150106756:role/service-role/AmazonSageMaker-ExecutionRole-20250731T162975"
    role="arn:aws:iam::052150106756:role/Admin"
)

In [2]:
# Continued fine-tuning from a model package ARN
rlvr_trainer = RLVRTrainer(
    model="arn:aws:sagemaker:us-west-2:052150106756:model-package/test-finetuned-models-gamma/61", # Union[str, ModelPackage] 
    training_type=TrainingType.LORA, 
    model_package_group_name="test-finetuned-models-gamma", #"test-finetuned-models", # Make it Optional
    mlflow_resource_arn="arn:aws:sagemaker:us-west-2:052150106756:mlflow-tracking-server/mmlu-eval-experiment",  # Optional[str] - MLflow app ARN (auto-resolved if not provided), can accept name and search in the account
    mlflow_experiment_name="test-rlvr-finetuned-models-exp", # Optional[str]
    mlflow_run_name="test-rlvr-finetuned-models-run", # Optional[str]
    training_dataset="arn:aws:sagemaker:us-west-2:052150106756:hub-content/AIRegistry/DataSet/rlvr-rlaif-test-dataset/0.0.2",     #"arn:aws:sagemaker:us-west-2:052150106756:hub-content/AIRegistry/DataSet/MarketingDemoDataset1/1.0.0", #Optional[]
    s3_output_path="s3://open-models-testing-pdx/output",
    sagemaker_session=sagemaker_session,
    #role="arn:aws:iam::052150106756:role/service-role/AmazonSageMaker-ExecutionRole-20250731T162975"
    role="arn:aws:iam::052150106756:role/Admin"
)

us-west-2


#### Start the Training

In [None]:
training_job = rlvr_trainer.train(
    wait=True,
)

Training Job Name: meta-textgeneration-llama-3-2-3b-instruct-rlvr-20251124140103


Output()

In [11]:
pprint(training_job)