# Fine-tune TinyLlama-1.1B for text-to-SQL generation

## Introduction

In this workshop module, you will learn how to fine-tune a Llama-based LLM ([TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)) using causal language modelling so that the model learns how to generate SQL queries for text-based instructions. Your fine-tuning job will be launched using SageMaker Training which provides a serverless training environment where you do not need to manage the underlying infrastructure. You will learn how to configure a PyTorch training job using [SageMaker's PyTorch estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html), and how to leverage the [Hugging Face Optimum Neuron](https://github.com/huggingface/optimum-neuron) package to easily run the PyTorch training job with AWS Trainium accelerators via an [AWS EC2 trn1.2xlarge instance](https://aws.amazon.com/ec2/instance-types/trn1/).

For this module, you will be using the [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) dataset which consists of thousands of examples of SQL schemas, questions about the schemas, and SQL queries intended to answer the questions.

*Dataset example 1:*
* *SQL schema/context:* `CREATE TABLE management (department_id VARCHAR); CREATE TABLE department (department_id VARCHAR)`
* *Question:* `How many departments are led by heads who are not mentioned?`
* *SQL query/answer:* `SELECT COUNT(*) FROM department WHERE NOT department_id IN (SELECT department_id FROM management)`

*Dataset example 2:*
* *SQL schema/context:* `CREATE TABLE courses (course_name VARCHAR, course_id VARCHAR); CREATE TABLE student_course_registrations (student_id VARCHAR, course_id VARCHAR)`
* *Question:* `What are the ids of all students for courses and what are the names of those courses?`
* *SQL query/answer:* `SELECT T1.student_id, T2.course_name FROM student_course_registrations AS T1 JOIN courses AS T2 ON T1.course_id = T2.course_id`

Fine-tuning on several thousand of these text-to-SQL examples teaches the model to generate an appropriate SQL query when presented with a SQL context and a free-form question.
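
To make the training input more concrete, here is a minimal sketch of how one record could be flattened into a single string for causal language modelling. The `format_example` helper and the `### Context / ### Question / ### SQL` template are assumptions made for illustration; the workshop's `finetune_model.py` may format examples differently.

```python
# Hypothetical prompt template, for illustration only.
def format_example(context: str, question: str, answer: str = "") -> str:
    """Join one dataset record into a single training string."""
    prompt = (
        f"### Context:\n{context.strip()}\n\n"
        f"### Question:\n{question.strip()}\n\n"
        "### SQL:\n"
    )
    # During training the answer is appended; at inference time it is left empty
    # so that the model completes the query.
    return prompt + answer.strip()


example = format_example(
    context="CREATE TABLE management (department_id VARCHAR); CREATE TABLE department (department_id VARCHAR)",
    question="How many departments are led by heads who are not mentioned?",
    answer="SELECT COUNT(*) FROM department WHERE NOT department_id IN (SELECT department_id FROM management)",
)
print(example)
```

Calling `format_example(context, question)` without an answer yields the prompt you would hand to the fine-tuned model at inference time.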

This text-to-SQL use case was selected so you can successfully fine-tune your model in a reasonably short amount of time (~20 minutes) which is appropriate for this 1hr workshop. Although this is a relatively simple use case, please keep in mind that the same techniques and components used in this module can also be applied to fine-tune LLMs for more advanced use cases such as writing code, summarizing documents, creating blog posts - the possibilities are endless!

## Prerequisites

This notebook uses the SageMaker Python SDK to prepare, launch, and monitor the progress of a PyTorch-based training job. Before we get started, it is important to upgrade the SageMaker SDK to ensure that you are using the latest version. Run the next two cells to upgrade the SageMaker SDK and set up your session.

In [29]:
# Upgrade SageMaker SDK to the latest version
%pip install -U sagemaker awscli huggingface_hub ipywidgets s3 -q 2>&1 | grep -v "warnings/venv"
# Definitely restart your kernel after this cell

  DEPRECATION: Building 's3' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 's3'. Discussion can be found at https://github.com/pypa/pip/issues/6334
  DEPRECATION: Building 'futures' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'futures'. Discussion can be found at https://github.com/pypa/pip/issues/6334

[notice] A new 

In [20]:
import logging
sagemaker_config_logger = logging.getLogger("sagemaker.config")
sagemaker_config_logger.setLevel(logging.WARNING)

# Import the SageMaker SDK and set up our session
from sagemaker import get_execution_role, Session
from sagemaker.pytorch import PyTorch
import boto3

# The region is hard-coded because of Workshop quotas.
# Outside the workshop you could use the region of a default Session (sess.boto_region_name).
region_name = "us-west-2"
sess = Session(boto_session=boto3.Session(region_name=region_name))
default_bucket = sess.default_bucket()


This next command sets the default AWS CLI region to us-west-2, matching the region used for the session above. This is specific to the environment in AWS Workshop Studio.

In [21]:
!aws configure set region us-west-2

## Log into Hugging Face

The following step is recommended but optional.  If you can log in with your Hugging Face token, it will let you avoid any rate limits for unauthenticated requests.  Even though none of the models or datasets we are using require special permission, if you don't log in your training may fail because of too many unauthenticated requests.  

In [22]:
# If the cell output stays empty, RESTART YOUR KERNEL (if you have not already) and run the cells above again
# If you can't log in, you can proceed to the next cell.
from huggingface_hub import notebook_login

# Uncheck "Add token as git credential" or just ignore the error message about it not being added.
notebook_login()

## Specify the Optimum Neuron deep learning container (DLC) image

The SageMaker Training service uses containers to execute your training script, allowing you to fully customize your training script environment and any required dependencies. For this workshop, you will use a recent PyTorch Training deep learning container (DLC) image, which is an AWS-maintained image containing the Neuron SDK and PyTorch. The Optimum Neuron library is installed from the requirements.txt file in the assets directory.

In [23]:
# Specify the Neuron DLC that we will use for training
#   For now, we'll use the standard Neuron DLC and install Optimum Neuron v0.0.27 at training time because we want to use a later SDK 
#   You can see more about the images here: https://github.com/aws-neuron/deep-learning-containers?tab=readme-ov-file#pytorch-training-neuronx

training_image = f"763104351884.dkr.ecr.{sess.boto_region_name}.amazonaws.com/pytorch-training-neuronx:2.7.0-neuronx-py310-sdk2.24.1-ubuntu22.04"
print(training_image)

763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.7.0-neuronx-py310-sdk2.24.1-ubuntu22.04
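
The image URI printed above follows the usual ECR pattern of account, region, repository, and tag. The sketch below rebuilds it from those parts and splits the tag into its components. The helper names `neuron_dlc_uri` and `describe_tag` are hypothetical, the default values are copied from the cell above, and whether the same tag is published in other regions is not something this notebook confirms.

```python
# Defaults are copied from the training_image cell above; helper names are illustrative.
def neuron_dlc_uri(
    region: str,
    tag: str = "2.7.0-neuronx-py310-sdk2.24.1-ubuntu22.04",
    account: str = "763104351884",
    repo: str = "pytorch-training-neuronx",
) -> str:
    """Assemble an ECR image URI from its parts."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def describe_tag(tag: str) -> dict:
    """Split a five-part tag such as the one above into named fields."""
    framework, device, python, sdk, os_name = tag.split("-")
    return {
        "framework_version": framework,
        "device": device,
        "python": python,
        "sdk": sdk,
        "os": os_name,
    }


uri = neuron_dlc_uri("us-west-2")
print(uri)
print(describe_tag(uri.split(":", 1)[1]))
```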


## Configure the PyTorch Estimator

The SageMaker SDK includes a [PyTorch Estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html) class which you can use to define a PyTorch training job that will be executed in the SageMaker managed environment. 

In the following cell, you will create a PyTorch Estimator which will run the attached `finetune_model.py` training script on an ml.trn1.2xlarge instance. The `finetune_model.py` script is an Optimum Neuron training script that can be used for causal language modelling with AWS Trainium. The scripts are copied to the training instance as it is brought up, and the training script then downloads the model and the datasets onto that instance.

The PyTorch Estimator has many parameters that can be used to configure your training job. A few of the most important parameters include:

- *entry_point*: refers to the name of the training script that will be executed as part of this training job
- *source_dir*: the path to the local source code directory (relative to your notebook) that will be packaged up and included inside your training container
- *instance_count*: defines how many EC2 instances to use for this training job
- *instance_type*: determines which type of EC2 instance will be used for training
- *image_uri*: defines which training DLC will be used to run the training job (see Neuron DLC, above)
- *distribution*: determines which type of distribution to use for the training job - you will need 'torch_distributed' for this workshop
- *environment*: provides a dictionary of environment variables which will be applied to your training environment
- *hyperparameters*: provides a dictionary of command-line arguments to pass to your training script (here, `finetune_model.py`)
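
Because the *hyperparameters* dictionary reaches the training script as command-line arguments, every value arrives as text and has to be parsed back into a typed value. The sketch below approximates that round trip. The `to_cli_args` and `str_to_bool` helpers and the small `argparse` parser are assumptions made for illustration; they do not reproduce SageMaker's exact serialization or the actual argument parser in `finetune_model.py`.

```python
import argparse


def to_cli_args(hyperparameters: dict) -> list:
    """Flatten a hyperparameter dict into ['--name', 'value', ...] (illustrative)."""
    args = []
    for name, value in hyperparameters.items():
        args.extend([f"--{name}", str(value)])
    return args


def str_to_bool(text: str) -> bool:
    """Parse 'True'/'False' style strings, since every value arrives as text."""
    return text.strip().lower() in {"true", "1", "yes"}


# Hypothetical parser covering a subset of the arguments used in this notebook
parser = argparse.ArgumentParser()
parser.add_argument("--model_id", type=str)
parser.add_argument("--max_steps", type=int)
parser.add_argument("--learning_rate", type=float)
parser.add_argument("--bf16", type=str_to_bool)

cli_args = to_cli_args({
    "model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "max_steps": 1000,
    "learning_rate": 5e-5,
    "bf16": True,
})
args = parser.parse_args(cli_args)
print(cli_args)
print(args.max_steps, args.learning_rate, args.bf16)
```

Note the explicit boolean converter: passing `type=bool` to `argparse` would treat any non-empty string, including `"False"`, as true.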

In the `hyperparameters` section, you can see the specific command-line arguments that are used to control the behavior of the `finetune_model.py` training script. Notably:
- *model_id*: specifies which model you will be fine-tuning, in this case a recent checkpoint from the TinyLlama-1.1B project
- *tokenizer_id*: specifies which tokenizer you will use to tokenize the dataset examples during training
- *output_dir*: directory in which the fine-tuned model will be saved. Here we use the SageMaker-specific `/opt/ml/model` directory. At the end of the training job, SageMaker automatically copies the contents of this directory to the output S3 bucket
- *tensor_parallel_size*: the tensor parallel degree to use for training. In this case we use '2' to shard the model across the 2 NeuronCores available in the trn1.2xlarge instance
- *bf16*: request BFloat16 training
- *per_device_train_batch_size*: the microbatch size to be used for fine-tuning
- *gradient_accumulation_steps*: the number of microbatches over which gradients are accumulated before each optimizer update
- *max_steps*: the maximum number of steps of fine-tuning that we want to perform
- *lora_r*, *lora_alpha*, *lora_dropout*: the LoRA rank, alpha, and dropout values to use during fine-tuning
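
As a rough worked example of how these values combine, the sketch below derives the data-parallel degree, the number of examples consumed per optimizer step, and the LoRA scaling factor from the values used in this notebook. It assumes the common conventions that the data-parallel degree equals the number of NeuronCores divided by the tensor-parallel degree, and that LoRA scales its update by `lora_alpha / lora_r`. Whether `finetune_model.py` follows exactly these conventions, or packs several examples into one sequence, is not shown in this notebook.

```python
# Values taken from the hyperparameters used in this notebook
neuron_cores = 2                   # NeuronCores on a trn1.2xlarge, per the list above
tensor_parallel_size = 2
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
max_steps = 1000
lora_r, lora_alpha = 16, 32

# Assumption: cores not used for tensor parallelism form data-parallel replicas
data_parallel_size = neuron_cores // tensor_parallel_size

# Examples consumed per optimizer update, and over the whole run
examples_per_step = (
    per_device_train_batch_size * gradient_accumulation_steps * data_parallel_size
)
total_examples = examples_per_step * max_steps

# Assumption: standard LoRA scaling of the low-rank update
lora_scale = lora_alpha / lora_r

print(f"data parallel degree: {data_parallel_size}")
print(f"examples per optimizer step: {examples_per_step}")
print(f"examples seen in {max_steps} steps: {total_examples}")
print(f"LoRA scaling factor: {lora_scale}")
```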

The below estimator has been pre-configured for you, so you do not need to make any changes.

In [37]:
# Note that the hyperparameters are command-line args passed to the finetune_model.py script to control its behavior
# Create hyperparameters dictionary
hyperparameters = {
    "model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "tokenizer_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "skip_cache_push": True,
    "output_dir": "/opt/ml/model",
    "tensor_parallel_size": 2,
    "bf16": True,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 1,
    "gradient_checkpointing": True,
    "max_steps": 1000,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "logging_steps": 10,
    "learning_rate": 5e-5,
    "dataloader_drop_last": True,
    "disable_tqdm": True,
}

# Set up environment variables
environment = {"FI_EFA_FORK_SAFE": "1", "WANDB_DISABLED": "true"}

# Optional: forward your Hugging Face token to the training job so that its
# downloads are authenticated. This is left disabled; uncomment to enable it.
# from huggingface_hub import get_token
# token = get_token()  # None if you are not logged in
# if token is not None:
#     environment["HF_TOKEN"] = token

# Set up the PyTorch estimator
pt_estimator = PyTorch(
    entry_point="finetune_model.py",
    source_dir="./assets",
    role=get_execution_role(),
    instance_count=1,
    instance_type="ml.trn1.2xlarge",
    disable_profiler=True,
    output_path=f"s3://{default_bucket}/neuron_events2025_fhirfly",
    base_job_name="trn1-tinyllama",
    sagemaker_session=sess,
    code_location=f"s3://{default_bucket}/neuron_events2025_fhirfly_code",
    checkpoint_s3_uri=f"s3://{default_bucket}/neuron_events2025_fhirfly_output",
    image_uri=training_image,
    distribution={"torch_distributed": {"enabled": True}},
    environment=environment,
    disable_output_compression=True,
    hyperparameters=hyperparameters
)


## Launch the training job

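Once the estimator has been created, you can launch your training job by calling `.fit()` on the estimator. The first two cells below define a helper that reads the training CSV from S3 and use it to preview the dataset; the final cell calls `.fit()`: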

In [38]:
def load_dataset_s3(s3_uri: str):
    import pandas as pd
    from io import StringIO
    from datasets import Dataset
    
    parts = s3_uri.replace("s3://", "").split("/", 1)
    bucket, key = parts[0], parts[1]

    # Initialize S3 client
    s3 = boto3.client("s3")
    
    obj = s3.get_object(Bucket=bucket, Key=key)
    csv_data = obj["Body"].read().decode("utf-8")

    # Load into pandas DataFrame
    df = pd.read_csv(StringIO(csv_data))
    print(f"Loaded {len(df)} rows from {s3_uri}")

    # Convert to Hugging Face Dataset
    dataset = Dataset.from_pandas(df)
    return dataset
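
The bucket/key split at the top of `load_dataset_s3` can be illustrated in isolation. The sketch below is a slightly more defensive variant that rejects URIs without the `s3://` scheme or without an object key. The `split_s3_uri` name and the example URI are hypothetical.

```python
def split_s3_uri(s3_uri: str) -> tuple:
    """Split 's3://bucket/key/parts' into (bucket, key), validating the shape."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    # partition() splits on the first '/', so the key keeps its own slashes
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must include both a bucket and a key: {s3_uri!r}")
    return bucket, key


# Hypothetical URI used only to exercise the helper
bucket, key = split_s3_uri("s3://my-bucket/data/notes.csv")
print(bucket, key)
```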

In [39]:
s3_uri = 's3://sagemaker-us-west-2-526909565990/neuron_events2025_fhirfly/data/clinical_notes_with_pii_added.csv'
load_dataset_s3(s3_uri)

Loaded 160 rows from s3://sagemaker-us-west-2-526909565990/neuron_events2025_fhirfly/data/clinical_notes_with_pii_added.csv


Dataset({
    features: ['Unnamed: 0', 'patient_id', 'note', 'question', 'answer', 'task', 'note_title', 'diagnosis', 'note_with_pii'],
    num_rows: 160
})

In [40]:
# Call fit() on the estimator to initiate the training job
pt_estimator.fit({'train': s3_uri}, wait=False, logs=False)

INFO:sagemaker.telemetry.telemetry_logging:SageMaker Python SDK will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features.
To opt out of telemetry, please disable via TelemetryOptOut parameter in SDK defaults config. For more information, refer to https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk.
INFO:sagemaker:Creating training-job with name: trn1-tinyllama-2025-11-09-00-50-11-469


## Monitor the training job

When the training job has been launched, the SageMaker Training service will then take care of:
- launching and configuring the requested EC2 infrastructure for your training job
- launching the requested container image on each of the EC2 instances
- copying your source code directory and running your training script within the container(s)
- storing your trained model artifacts in Amazon Simple Storage Service (S3)
- decommissioning the training infrastructure

While the training job is running, the following cell will periodically check and output the job status. When you see 'Completed', you know that your training job is finished and you can proceed to the remainder of the notebook. The training job typically takes about 20 minutes to complete.
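
The polling pattern described above can be sketched independently of SageMaker. The version below stops on any terminal status and gives up after a timeout. It assumes the terminal training job statuses are `Completed`, `Failed`, and `Stopped`; the `wait_for_job` helper and the fake status source are illustrative stand-ins for `pt_estimator.jobs[-1].describe()['TrainingJobStatus']`.

```python
import time

# Assumption: statuses after which a training job no longer changes
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_job(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still '{status}' after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Fake status source so the sketch runs without AWS access
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

Injecting `sleep` as a parameter keeps the helper easy to exercise without waiting; in the notebook you would leave it at its default.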

If you want to view the output logs from your training job, navigate to the Amazon CloudWatch console, select `Logs -> Log Groups` in the left-hand menu, and look for your SageMaker training job in the list. **Note:** it usually takes 4-5 minutes before the infrastructure is running and the output logs begin to appear in CloudWatch.
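
If you prefer to read the logs from code rather than the console, the sketch below shows one possible approach. It assumes that SageMaker training logs are written to the `/aws/sagemaker/TrainingJobs` log group, with log stream names that begin with the training job name. The `log_location` and `print_log_events` helpers are hypothetical, and `print_log_events` needs AWS credentials, so it is defined here but not called.

```python
# Assumption: log group used by SageMaker training jobs
LOG_GROUP = "/aws/sagemaker/TrainingJobs"


def log_location(job_name: str) -> tuple:
    """Return the (log group, log stream prefix) pair for a training job."""
    return LOG_GROUP, f"{job_name}/"


def print_log_events(job_name: str, region: str = "us-west-2", limit: int = 20) -> None:
    """Print up to `limit` log events for the job (requires AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    group, prefix = log_location(job_name)
    logs = boto3.client("logs", region_name=region)
    response = logs.filter_log_events(
        logGroupName=group,
        logStreamNamePrefix=prefix,
        limit=limit,
    )
    for event in response["events"]:
        print(event["message"])


# Job name taken from the fit() output earlier in this notebook
print(log_location("trn1-tinyllama-2025-11-09-00-50-11-469"))
```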

In [None]:
# Periodically check job status until it shows 'Completed' (ETA ~20 minutes)
#  You can also monitor job status in the SageMaker console, and view the
#  SageMaker Training job logs in the CloudWatch console
from time import sleep
from datetime import datetime

while (job_status := pt_estimator.jobs[-1].describe()['TrainingJobStatus']) not in ['Completed', 'Failed', 'Stopped']:
    print(f"{datetime.now().isoformat()} Training job status: {job_status}!")
    sleep(30)

print(f"\n{datetime.now().isoformat()} Training job status: {job_status}!")

2025-11-09T00:50:14.526139 Training job status: InProgress!
2025-11-09T00:50:44.593382 Training job status: InProgress!
2025-11-09T00:51:14.683317 Training job status: InProgress!
2025-11-09T00:51:44.749667 Training job status: InProgress!
2025-11-09T00:52:14.814258 Training job status: InProgress!
2025-11-09T00:52:44.878908 Training job status: InProgress!
2025-11-09T00:53:14.947276 Training job status: InProgress!
2025-11-09T00:53:45.012044 Training job status: InProgress!
2025-11-09T00:54:15.087948 Training job status: InProgress!
2025-11-09T00:54:45.199883 Training job status: InProgress!
2025-11-09T00:55:15.274025 Training job status: InProgress!
2025-11-09T00:55:45.341262 Training job status: InProgress!
2025-11-09T00:56:15.408613 Training job status: InProgress!
2025-11-09T00:56:45.474569 Training job status: InProgress!
2025-11-09T00:57:15.544323 Training job status: InProgress!
2025-11-09T00:57:45.606887 Training job status: InProgress!
2025-11-09T00:58:15.668417 Training job 

## Determine location of fine-tuned model artifacts

Once the training job has completed, SageMaker will copy your fine-tuned model artifacts to a specified location in S3.

In the following cell, you can see how to programmatically determine the location of your model artifacts:

In [11]:
# Show where the fine-tuned model is stored - previous job must be 'Completed' before running this cell
model_archive_path = pt_estimator.jobs[-1].describe()['ModelArtifacts']['S3ModelArtifacts']
print(f"Your fine-tuned model is available here:\n\n{model_archive_path}/")

Your fine-tuned model is available here:

s3://this.output.should.be.replaced.with.a.real.s3.path.once.the.cell.is.executed/


<br/>

**Note:** Please copy the above S3 path, as it will be required in the subsequent workshop module.


Lastly, run the following cell to list the model artifacts available in your S3 model_archive_path:

In [12]:
# View the contents of the fine-tuned model path in S3
!aws s3 ls {model_archive_path}/merged_model/

2025-05-13 01:01:39        714 config.json
2025-05-13 01:01:48        124 generation_config.json
2025-05-13 01:01:40 4400216536 model.safetensors
2025-05-13 01:01:47        551 special_tokens_map.json
2025-05-13 01:01:47    1842795 tokenizer.json
2025-05-13 01:01:39     499723 tokenizer.model
2025-05-13 01:01:48       1368 tokenizer_config.json
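
As a quick sanity check on the listing above, you can relate the size of `model.safetensors` to the model's parameter count. The sketch below assumes roughly 1.1 billion parameters, as implied by the model name. A result close to 4 bytes per parameter would be consistent with the merged weights being stored as 32-bit values, although the listing itself does not confirm the dtype.

```python
model_bytes = 4_400_216_536   # size of model.safetensors in the listing above
approx_params = 1.1e9         # assumption: parameter count implied by "1.1B"

bytes_per_param = model_bytes / approx_params
size_gib = model_bytes / 1024**3

print(f"{bytes_per_param:.2f} bytes per parameter")
print(f"{size_gib:.2f} GiB on disk")
```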


Congratulations on completing the LLM fine-tuning module!

In the next notebook, you will learn how to deploy your fine-tuned model in a SageMaker hosted endpoint, and leverage AWS Inferentia accelerators to perform model inference. Have fun!