## Fine-tuning LLMs and Experiment Tracking with MLflow 🧪

Most of our MLflow configuration lives in the training script `sft.py`. This is where we import the MLflow library, set the tracking server, name our experiments and runs, and log artifacts, parameters, and metrics. MLflow integrates directly with the Hugging Face Transformers library, which makes it easy to set up.

## Setting Up MLflow

Getting started with MLflow is straightforward. You can install MLflow using pip:

```bash
pip install mlflow
pip install sagemaker-mlflow
```

The second package is the SageMaker plugin for MLflow. It lets MLflow connect to a SageMaker tracking server identified by its Amazon Resource Name (ARN). Once both packages are installed, import the `mlflow` library, set the tracking URI to your SageMaker tracking server, and name your experiment.

```python
import mlflow

mlflow.set_tracking_uri("your_sagemaker_tracking_uri")
mlflow.set_experiment("your_experiment_name")
```

Next, start a run. A run can be a fine-tuning job or an evaluation job. Runs can be nested in a parent-child relationship so that related jobs stay grouped together. Once a run is active, you can log parameters, metrics, and artifacts to MLflow.

```python
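# `trainer` is assumed to be an already-configured trainer object
# (for example a Hugging Face `Trainer`) defined earlier in the script.
with mlflow.start_run(run_name="training-job-1") as run:
    trainer.train()
    mlflow.log_metrics({"loss": 0.05})  # log_metrics takes a dict of name -> value
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_artifact("recipe.yaml")  # path to a local file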
```


In [None]:
!pip install --upgrade -r requirements-fine-tuning.txt -q

In [None]:
import boto3
import sagemaker

# from PIL import Image
# import torch

In [None]:
region = boto3.Session().region_name

# from sagemaker.local import LocalSession
# sess = LocalSession() #sagemaker.Session(boto3.Session(region_name=region))
# sess.config = {"local": {"local_code": True}}

sess = sagemaker.Session()

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()

In [None]:
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

## Data Preparation for Supervised Fine-tuning


### [Finance-Instruct-500k](https://huggingface.co/datasets/Josephgflowers/Finance-Instruct-500k)

**Finance-Instruct-500k** is a large-scale dataset with about **518,000 entries** focused on the financial domain. It spans topics such as investments, banking, markets, accounting, and corporate finance, offering a wide variety of instruction–response examples.

**Data Format & Structure**:

- Distributed in **JSON** format, with simple conversion to Parquet.
- Contains a single `train` split with ~518k records.
- Each record includes:
  - `system` – context or metadata for the task
  - `user` – the financial prompt or query
  - `assistant` – the corresponding response

**License**: Released under the **Apache-2.0** license.

**Applications**:

The dataset can support finance-focused tasks such as:

- Financial question answering
- Market and investment analysis
- Topic and sentiment classification
- Financial entity extraction and document understanding


In [None]:
import os
import json
import pprint
from tqdm import tqdm
from datasets import load_dataset

**Preparing Your Dataset in `messages` format**

This section walks you through creating a conversation-style dataset in the `messages` format, which is required for training LLMs directly with SageMaker AI.

**What Is the `messages` Format?**

The `messages` format structures instances as chat-like exchanges, wrapping each conversation turn into a role-labeled JSON array. It’s widely used by frameworks like TRL.

Example entry:

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "How do I bake sourdough?" },
    {
      "role": "assistant",
      "content": "First, you need to create a starter by..."
    }
  ]
}
```
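
Before converting a large dataset, it can help to check that records follow this structure. The helper below is a minimal sketch using only the standard library. The function name and the rule that the final turn must come from the assistant are assumptions made for supervised fine-tuning, not requirements stated by TRL or SageMaker.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def is_valid_messages_record(record):
    """Return True if `record` looks like a well-formed `messages` example."""
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        if message.get("role") not in ALLOWED_ROLES:
            return False
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            return False
    # Assumption: a supervised example should end with the assistant's answer.
    return messages[-1]["role"] == "assistant"


example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How do I bake sourdough?"},
        {"role": "assistant", "content": "First, you need to create a starter by..."},
    ]
}
print(is_valid_messages_record(example))
```

A check like this can be passed to `dataset.filter(...)` after the conversion step to drop rows with empty prompts or responses.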


In [None]:
dataset_name = "Josephgflowers/Finance-Instruct-500k"
dataset = load_dataset(
    dataset_name, split="train[:100]"
)  # just a toy example with 100 samples

In [None]:
pprint.pp(dataset[0])

In [None]:
print(f"total number of fine-tunable samples: {len(dataset)}")

In [None]:
def convert_to_messages(row):
    system_content = "You are a financial reasoning assistant. Read the user’s query, restate the key data, and solve step by step. Show calculations clearly, explain any rounding or adjustments, and present the final answer in a concise and professional manner."
    user_content = row["user"]
    assistant_content = row["assistant"]

    return {
        "messages": [
            {"role": "system", "content": system_content},
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": assistant_content},
        ]
    }


dataset = dataset.map(convert_to_messages, remove_columns=dataset.column_names)

In [None]:
dataset_parent_path = os.path.join(os.getcwd(), "data")
os.makedirs(dataset_parent_path, exist_ok=True)

In [None]:
dataset_filename = os.path.join(
    dataset_parent_path,
    "finance-instruct-500k.jsonl",
)
dataset.to_json(dataset_filename, lines=True)
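
A quick read-back of the exported file can catch a truncated or malformed export before it is uploaded. The sketch below uses only the standard library. The helper name `count_jsonl_records` is hypothetical, and it assumes one JSON object per line, each with a `messages` key.

```python
import json


def count_jsonl_records(path):
    """Count records in a JSON Lines file, failing fast on a malformed line."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines such as a trailing newline
            record = json.loads(line)  # raises ValueError on invalid JSON
            if "messages" not in record:
                raise ValueError(f"line {line_number} has no 'messages' key")
            count += 1
    return count
```

In this notebook, `count_jsonl_records(dataset_filename)` would be expected to equal `len(dataset)`.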

#### Upload file to S3


In [None]:
import sagemaker
from sagemaker.s3 import S3Uploader

In [None]:
data_s3_uri = f"s3://{sess.default_bucket()}/dataset"

uploaded_s3_uri = S3Uploader.upload(
    local_path=dataset_filename, desired_s3_uri=data_s3_uri
)
print(f"Uploaded {dataset_filename} to:")
print(uploaded_s3_uri)

## Fine-Tune LLMs using SageMaker `Estimator`/`ModelTrainer`


In [None]:
import time
from sagemaker.pytorch import PyTorch
from getpass import getpass
import yaml
from jinja2 import Template

In [None]:
hf_token = getpass()

### Training using `PyTorch` Estimator

The `PyTorch` estimator uses the official PyTorch SageMaker container to run a custom training script built on the Accelerate and DeepSpeed libraries. This option is ideal for users who want full control over the training pipeline.

---

**Observability**: SageMaker AI provides [SageMaker MLflow](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html), a single tool for tracking experiments and monitoring the performance of models and AI applications.

You can include MLflow in your training workflow and track fine-tuning metrics in real time by specifying an MLflow tracking server ARN.

Optionally, you can also report to **tensorboard** or **wandb**.


In [None]:
MLFLOW_TRACKING_SERVER_ARN = 'arn:aws:sagemaker:us-east-1:198346569064:mlflow-tracking-server/vlm-finetuning-server'  # or "arn:aws:sagemaker:us-west-2:<account-id>:mlflow-tracking-server/<server-name>"

if MLFLOW_TRACKING_SERVER_ARN:
    reports_to = "mlflow"
else:
    reports_to = "tensorboard"
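
A malformed ARN tends to surface only once the training job starts, so a local format check can save a failed launch. The following is a minimal sketch: the function name is hypothetical, the pattern follows the `arn:aws:sagemaker:<region>:<account-id>:mlflow-tracking-server/<server-name>` shape shown in the comment above, and it assumes the standard `aws` partition and a server name made of letters, digits, and hyphens.

```python
import re

# Assumption: standard "aws" partition only; other partitions are not handled.
_TRACKING_SERVER_ARN = re.compile(
    r"^arn:aws:sagemaker:(?P<region>[a-z0-9-]+):(?P<account_id>\d{12}):"
    r"mlflow-tracking-server/(?P<server_name>[A-Za-z0-9-]+)$"
)


def parse_tracking_server_arn(arn):
    """Split a tracking server ARN into region, account ID, and server name."""
    match = _TRACKING_SERVER_ARN.match(arn)
    if match is None:
        raise ValueError(f"not a valid MLflow tracking server ARN: {arn!r}")
    return match.groupdict()


# Placeholder ARN for illustration only.
parsed = parse_tracking_server_arn(
    "arn:aws:sagemaker:us-west-2:123456789012:mlflow-tracking-server/demo-server"
)
print(parsed["region"], parsed["server_name"])
```

Comparing `parsed["region"]` with the session's `region` is a cheap way to notice that the ARN points at a server in a different region than the training job.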

In [None]:
job_name = 'qwen3-06b-lora-ft-finance'  # cannot use '.' in name
instance_type = "ml.g5.2xlarge"
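
Training job names may contain only letters, digits, and hyphens, which is why the name above spells the model size as `06b` rather than `0.6b`. If names are generated from model identifiers, a small helper can normalize them. This is a sketch: the function name and the 40-character cap are assumptions, the cap being chosen to leave room for the suffix that SageMaker appends to `base_job_name`.

```python
import re

# Assumption: keep the base name short so the appended suffix still fits.
MAX_BASE_NAME_LENGTH = 40


def sanitize_job_name(name, max_length=MAX_BASE_NAME_LENGTH):
    """Replace characters that are not letters, digits, or hyphens with '-'."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "-", name)  # collapse each invalid run into one hyphen
    cleaned = cleaned.strip("-")[:max_length].rstrip("-")
    if not cleaned:
        raise ValueError(f"no usable characters in job name: {name!r}")
    return cleaned


print(sanitize_job_name("qwen3-0.6b-lora-ft-finance"))  # qwen3-0-6b-lora-ft-finance
```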

In [None]:
if MLFLOW_TRACKING_SERVER_ARN:
    environment = {
        "MLFLOW_EXPERIMENT_NAME": job_name,
        "MLFLOW_TAGS": '{"source.job": "sm-training-jobs", "source.type": "sft", "source.framework": "pytorch"}',
        "HF_TOKEN": hf_token,
        "MLFLOW_TRACKING_URI": MLFLOW_TRACKING_SERVER_ARN,
    }
else:
    environment = {"HF_TOKEN": hf_token}
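
The `MLFLOW_TAGS` value above is a hand-written JSON string, which is easy to break with a stray quote. One alternative, sketched below, is to build the tags as a dictionary and serialize them with `json.dumps`. The function name and signature are hypothetical; the keys mirror those used in the cell above.

```python
import json


def build_training_environment(hf_token, experiment_name, tracking_server_arn=None, tags=None):
    """Build the environment dict for the training job; values must be strings."""
    environment = {"HF_TOKEN": hf_token}
    if tracking_server_arn:
        environment.update(
            {
                "MLFLOW_TRACKING_URI": tracking_server_arn,
                "MLFLOW_EXPERIMENT_NAME": experiment_name,
                "MLFLOW_TAGS": json.dumps(tags or {}),  # always valid JSON
            }
        )
    return environment


demo_tags = {
    "source.job": "sm-training-jobs",
    "source.type": "sft",
    "source.framework": "pytorch",
}
demo_env = build_training_environment(
    hf_token="hf_placeholder",  # placeholder, not a real token
    experiment_name="qwen3-06b-lora-ft-finance",
    tracking_server_arn="arn:aws:sagemaker:us-west-2:123456789012:mlflow-tracking-server/demo-server",
    tags=demo_tags,
)
print(sorted(demo_env))
```

Keeping the MLflow keys out of the dictionary when no ARN is set matches the fallback branch in the cell above.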

In [None]:
pytorch_image_uri = f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training:2.8.0-gpu-py312-cu129-ubuntu22.04-sagemaker"
print(f"Using image: {pytorch_image_uri}")

In [None]:
mlflow_run_description = '''
    AIops MLflow Workshop

    This section shows how to fine-tune a model and track experiments with the MLflow tracking server UI.
    Once models are fine-tuned, they can be evaluated using MLflow's built-in metrics and LLM-as-a-judge
    functionality.

    The fine-tuned model is Qwen3 0.6B and fine-tuned on a finance reasoning dataset.
    '''

hyperparameters = {
    "config": "qwen3-0.6b.yaml",
    "mlflow_run_description": mlflow_run_description,
    # "mlflow_experiment_name": job_name,
    # "mlflow_tags": '{"source.job": "sm-training-jobs", "source.type": "sft", "source.framework": "pytorch"}',
    # "hf_token": hf_token,
    # "mlflow_tracking_server": MLFLOW_TRACKING_SERVER_ARN,
}

#### Training strategy: `PeFT/LoRA`


In [None]:
pytorch_estimator = PyTorch(
    image_uri=pytorch_image_uri,
    entry_point="train.sh",  # Adapted bash script to train using accelerate on SageMaker - Multi-GPU
    source_dir="scripts",
    instance_type=instance_type,
    instance_count=1,
    base_job_name=job_name,
    role=role,
    volume_size=300,
    py_version="py312",
    keep_alive_period_in_seconds=3600,
    environment=environment,
    sagemaker_session=sess,
    hyperparameters=hyperparameters,
)

# fit or train
pytorch_estimator.fit({"train": uploaded_s3_uri}, wait=True)

In [None]:
# s3_model_data_uri = pytorch_estimator.model_data
# print(f"Fine-tuned model location: {s3_model_data_uri}")