# PEFT Supervised Fine-Tuning (SFT) of Amazon Nova 2 Lite Using Amazon Bedrock

Training a large language model typically has two major stages: pre-training and post-training. During pre-training, the model is exposed to trillions of tokens of raw text and optimized purely for next-token prediction. This makes it an extremely capable pattern completer over the distribution of web and curated text. It absorbs syntax, semantics, facts, and broad reasoning patterns. But it is unaligned with human intent, meaning it does not inherently understand instructions, user goals, or context-appropriate behavior. It simply continues text in whatever style best fits its training distribution. As a result, a pre-trained model tends to autocomplete rather than follow directions, is inconsistent about formatting or tool use, and can mirror undesirable biases or unsafe content present in the data. In short, pre-training builds general competence, not usefulness for tasks.

Post-training turns that competent pattern completer into a useful assistant. Teams typically run multiple rounds of Supervised Fine-Tuning (SFT) to teach the model to follow instructions, adhere to schemas and policies, call tools, and produce reliable, scoped outputs by imitating high-quality demonstrations. This adds a first layer of alignment where the model learns to respond to prompts as tasks, not just text to continue. They then apply Reinforcement Fine-Tuning (RFT) to push behavior further using measurable feedback (e.g., verifiers or an LLM-as-a-judge), optimizing nuanced trade-offs like accuracy vs. brevity, safety vs. coverage, or multi-step reasoning under constraints. In practice, teams alternate SFT and RFT in cycles, progressively shaping the pre-trained model into a reliable, policy-aligned system that performs complex tasks with consistency.

Supervised fine-tuning is the classic approach of training the LLM on a dataset of human-labeled input-output pairs for the task of interest. In other words, you provide examples of prompts (or questions, instructions, etc.) along with the correct or desired responses, and continue training the model on these. The model's weights are adjusted to minimize a supervised loss (typically cross-entropy between its predictions and the target output tokens). This is essentially the same kind of training used in most supervised machine learning tasks, now applied to an LLM to specialize it.
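To make the supervised loss concrete, here is a minimal sketch of token-level cross-entropy. It assumes we already know the probability the model assigned to each correct target token; the function name `sft_loss` and the probability values are hypothetical and chosen only for illustration.

```python
import math

def sft_loss(target_token_probs):
    """Mean negative log-likelihood of the target tokens.

    target_token_probs holds the probability the model assigned to each
    correct token of the desired response (one value per token).
    """
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)

# Hypothetical probabilities for a three-token target response.
confident = sft_loss([0.9, 0.8, 0.95])   # model already favors the target tokens
uncertain = sft_loss([0.3, 0.2, 0.25])   # model spreads probability elsewhere
print(f"confident: {confident:.3f}  uncertain: {uncertain:.3f}")
```

Fine-tuning adjusts the weights so that this value goes down on the training examples, which means the model assigns higher probability to the demonstrated responses.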

## 1. Getting Started
Amazon Nova 2.0 introduces enhanced fine-tuning capabilities on Amazon Bedrock, including supervised fine-tuning with reasoning content and reinforcement fine-tuning with reward-based optimization.

### Supervised fine-tuning on Amazon Nova 2.0

Amazon Nova 2.0 supervised fine-tuning uses the same Converse API format as Amazon Nova 1.0 with optional reasoning content fields, allowing you to train models that show their thinking process before generating final answers.

*Key features*
* Support for text, image and video inputs in user content blocks
* Optional reasoning content in assistant responses to capture intermediate thinking steps
* Homogeneous dataset requirements (choose text-only, text+image, or text+video)
* Support for PNG, JPEG and GIF images
* Support for MOV, MKV and MP4 videos
* Configurable reasoning modes for training optimization
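The homogeneous dataset requirement can be checked before uploading. The sketch below is one possible check, assuming media appear as `image` or `video` content blocks as in the Converse API format; it only flags datasets that mix images and videos. The function name and the sample records are hypothetical.

```python
def dataset_modality(records):
    """Classify a dataset as 'text', 'text+image', or 'text+video'.

    Raises ValueError if both image and video blocks appear in the dataset.
    """
    media = set()
    for record in records:
        for message in record.get("messages", []):
            for block in message.get("content", []):
                media.update(kind for kind in ("image", "video") if kind in block)
    if media == {"image", "video"}:
        raise ValueError("dataset mixes image and video content")
    return "text+" + media.pop() if media else "text"

# Hypothetical records used only to exercise the check.
text_record = {"messages": [{"role": "user", "content": [{"text": "hi"}]}]}
image_record = {"messages": [{"role": "user", "content": [
    {"text": "Describe this picture."},
    {"image": {"format": "png",
               "source": {"s3Location": {"uri": "s3://example-bucket/cat.png"}}}},
]}]}
print(dataset_modality([text_record, image_record]))
```

Whether text-only records may be mixed into an image or video dataset is not stated here, so this sketch does not reject that case.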

### Reinforcement fine-tuning (RFT) on Amazon Nova 2.0
Reinforcement fine-tuning optimizes Amazon Nova models using measurable feedback signals rather than exact correct answers. 

This notebook walks through supervised fine-tuning on Amazon Nova 2.0.

## 2. Prerequisites and Dependencies

### Prerequisite: Data Prep Notebook
The Data Prep notebook walks through preparing and transforming a public dataset into a format and schema acceptable for SMTJ. The Data Prep notebook creates training and validation datasets used for training a model, as well as a test dataset for evaluation.

**--------------- STOP ---------------** <br><br>To complete this notebook, the Data Prep notebook must be completed first. In that notebook, training, validation, and eval datasets are created. These datasets are carried over for use in this notebook. Specific items from the Data Prep notebook that are used in this notebook are called out below.
<br><br>

Either restore or set these values.

```python
# This value is obtained as result of executing the data prep notebook
train_dataset_s3_path = ""
%store -r train_dataset_s3_path

print(train_dataset_s3_path)
```

#### *Note* 
We will use the Bedrock UI to train the model, so `train_dataset_s3_path` is not used directly in code in this notebook. Instead, during job setup in the Bedrock UI, you will browse to this S3 URI and select it (the S3 URI can also be copied and pasted into the required fields).

Make sure the S3 bucket is accessible to Bedrock. Otherwise, copy the file `dataset.jsonl` to an S3 bucket that is accessible to Bedrock.

For this notebook example, the `dataset.jsonl` file was copied from the SageMaker bucket to a new bucket location. This was done to keep the SageMaker bucket accessible only to SageMaker.<br><br>
`s3://notebook-resource-nova/customization/bedrock/training-data/dataset.jsonl`
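Before opening the console, it can help to confirm that the training file exists at the expected URI. The following is a minimal sketch, assuming `boto3` is installed and AWS credentials are configured; the helper names `split_s3_uri` and `object_exists` are hypothetical. Note that it checks access with the caller's credentials, not with the Bedrock service role.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid S3 object URI: {uri!r}")
    return parsed.netloc, key

def object_exists(uri):
    """Return True if the caller's credentials can read the object's metadata."""
    import boto3  # imported lazily so split_s3_uri works without AWS access
    from botocore.exceptions import ClientError
    bucket, key = split_s3_uri(uri)
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return True
    except ClientError:
        return False

training_uri = "s3://notebook-resource-nova/customization/bedrock/training-data/dataset.jsonl"
print(split_s3_uri(training_uri))
```

Calling `object_exists(training_uri)` performs the actual check against S3.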

### Prerequisite: Service Role
For this notebook, we will create and use a service role called

`NovaCustomizationRole`.

<br>This role is created by following this guidance:

[Create a service role for model customization](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-iam-role.html)

 
This role can be created now. Alternatively, there is an opportunity later in this notebook to create the role during the customization job set-up. During the customization job set-up, in the Service access section, the permissions of this role can be seen by clicking on "View permission details".
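As a rough illustration of what such a role contains, the sketch below builds a trust policy and an S3 permissions policy as Python dictionaries. It is an approximation based on the general pattern in the linked guidance, not a copy of the policies the console generates; the account ID, Region, and function names are placeholders, and the linked documentation remains the authoritative source.

```python
import json

def bedrock_trust_policy(account_id, region):
    """Trust policy letting Bedrock assume the role for customization jobs."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"aws:SourceAccount": account_id},
                "ArnEquals": {
                    "aws:SourceArn":
                        f"arn:aws:bedrock:{region}:{account_id}:model-customization-job/*"
                },
            },
        }],
    }

def s3_permissions_policy(training_bucket, output_bucket):
    """Read access to the training data, read/write access to the output location."""
    def arns(bucket):
        return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["s3:GetObject", "s3:ListBucket"],
             "Resource": arns(training_bucket)},
            {"Effect": "Allow",
             "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
             "Resource": arns(output_bucket)},
        ],
    }

# Placeholder account ID and Region; replace with your own values.
trust = bedrock_trust_policy("111122223333", "us-east-1")
permissions = s3_permissions_policy("notebook-resource-nova", "notebook-resource-nova")
print(json.dumps(trust, indent=2))
```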


## 3. Data Prep - Review
In the Data Prep notebook, we created our training, validation, and test datasets. We will use the training and validation datasets for training.

Remember, prepare high-quality prompt-response pairs for training. Data should be:
- Consistent in format
- Representative of desired behavior
- Deduplicated and cleaned


For reference, here is the schema for a single record in the training data. Amazon Nova 2.0 SFT data uses the same Converse API format as Amazon Nova 1.0, with the addition of optional reasoning content fields (these optional fields are shown in the "Exploring More" section below).

```
{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [{"text": "You are a digital assistant with a friendly personality"}],
  "messages": [
    {
      "role": "user",
      "content": [{ "text": "What is the capital of Mars?"}]
    },
    {
      "role": "assistant",
      "content": [{"text": "Mars does not have a capital. Perhaps it will one day."}]
    }
  ]
}
```
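A quick structural check of each record can catch formatting problems before the job's own validation does. The sketch below is illustrative only: it assumes turns alternate between `user` and `assistant`, starting with `user` and ending with `assistant`, and the function name `validate_record` is hypothetical. The validation Bedrock performs may be stricter.

```python
SCHEMA_VERSION = "bedrock-conversation-2024"

def validate_record(record):
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    if record.get("schemaVersion") != SCHEMA_VERSION:
        problems.append(f"schemaVersion should be {SCHEMA_VERSION!r}")
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
        return problems
    for index, message in enumerate(messages):
        # Assumption: turns alternate, starting with the user.
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message.get("role") != expected_role:
            problems.append(f"message {index}: expected role {expected_role!r}")
        if not message.get("content"):
            problems.append(f"message {index}: content is empty")
    if len(messages) % 2 != 0:
        problems.append("conversation should end with an assistant turn")
    return problems

sample = {
    "schemaVersion": "bedrock-conversation-2024",
    "system": [{"text": "You are a digital assistant with a friendly personality"}],
    "messages": [
        {"role": "user", "content": [{"text": "What is the capital of Mars?"}]},
        {"role": "assistant",
         "content": [{"text": "Mars does not have a capital. Perhaps it will one day."}]},
    ],
}
print(validate_record(sample))
```

To check a whole file, parse each line of `dataset.jsonl` with `json.loads` and pass the result to `validate_record`.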


## 4. Model Supervised Fine Tuning (SFT) and Customization
You can customize Amazon Nova models through Bedrock.

There are 3 options for customization:
* Supervised fine-tuning job
* Distillation job
* Reinforcement fine-tuning job

### Create Training Job

To start, navigate to the Bedrock console.  In the left-hand menu under Tune, select Custom models.

For this notebook, choose Supervised fine-tuning job.

![Nova Customization](./images/b-select-job.png)

Upon selecting Supervised fine-tuning job, you will be greeted with the Create Fine-tuning job page.

### Job configuration
Set the job name as:
<br>peft-fine-tuning-without-reasoning-67-job

This name must be unique in the list of model customization jobs.

This job name includes "without-reasoning" to point out that the training data for this job contains no reasoning content, although it could. See the "Exploring More" section at the end.

![Job configuration](./images/b-job-config.png)

### Model details
Model details is a two-step process.
1.  Select the source model that you would like to customize. This can be a base model or a previously trained model, which allows for iterative training.
2.  Set the fine-tuned model name.

#### Select Source model
Choose "Select Model".  In the "Categories" left pane, in the "Serverless model providers" section, select "Amazon".  This will update the available models.  

Select "Nova 2 Lite".

As a side note, the section "Custom & managed endpoints" is where one would select a previously trained model in order to perform iterative training.

![Select model](./images/b-select-model.png)

#### Fine-tuned model name
Set the Fine-tuned model name: <br>peft-fine-tuning-without-reasoning-67-model

Here are the completed Model details.

![Model details](./images/b-model-details.png)

### Hyperparameters
Use the default settings for this notebook.

![Hyperparameters](./images/b-hyperparameters.png)

### Input data
Select the location where the training data is stored. This training data was prepared in the Data Prep notebook, and the cells above asked that the `dataset.jsonl` file be placed in a location accessible to Bedrock.

Set Input data:
<br>s3://notebook-resource-nova/customization/bedrock/training-data/dataset.jsonl



### Output data
Select the location where the completed training artifacts will be placed. Again, make sure this location is accessible to Bedrock.

Set Output data:
<br>s3://notebook-resource-nova/customization/bedrock/outputs

Here are the completed Input data and Output data

![Input data and Output data](./images/b-input-output.png)

### Service Access
In this notebook demonstration, we are going to use an existing service role.

NovaCustomizationRole

This role was created using the following documentation guidelines.

[Create a service role for model customization](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-iam-role.html)

<br>Also, click on "View permission details" to see the permission policy and trust relationship necessary for the role.

Here is the completed Service access

![Service Access](./images/b-service-access.png)

### Create the job
With all the above steps complete, click "Create Job". Jobs take time to run, and there is no clear way to know in advance when a job will start or complete.

Jobs will validate the data to ensure the proper format is used.

Here is the screen shown after clicking "Create Job".

![Job Execution](./images/b-job-execution.png)
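For readers who prefer code to the console, the same job could in principle be submitted with the `boto3` Bedrock client's `create_model_customization_job` operation. This is a sketch under assumptions, not what this notebook runs: the role ARN and base model identifier are placeholders you must supply, the helper names are hypothetical, and whether hyperparameters can be omitted to get the console defaults for this model is not confirmed here.

```python
def build_job_request(role_arn, base_model_id, hyperparameters=None):
    """Assemble keyword arguments mirroring the console fields used above."""
    request = {
        "jobName": "peft-fine-tuning-without-reasoning-67-job",
        "customModelName": "peft-fine-tuning-without-reasoning-67-model",
        "roleArn": role_arn,
        "baseModelIdentifier": base_model_id,
        "customizationType": "FINE_TUNING",
        "trainingDataConfig": {
            "s3Uri": "s3://notebook-resource-nova/customization/bedrock/training-data/dataset.jsonl"
        },
        "outputDataConfig": {
            "s3Uri": "s3://notebook-resource-nova/customization/bedrock/outputs"
        },
    }
    if hyperparameters:
        # Values are passed as strings; the valid keys depend on the base model.
        request["hyperParameters"] = hyperparameters
    return request

def submit_job(request, region_name=None):
    """Submit the job and return its ARN (requires boto3 and AWS credentials)."""
    import boto3
    client = boto3.client("bedrock", region_name=region_name)
    return client.create_model_customization_job(**request)["jobArn"]

# Placeholder values; look up the real role ARN and model identifier in your account.
request = build_job_request(
    role_arn="arn:aws:iam::111122223333:role/NovaCustomizationRole",
    base_model_id="<nova-2-lite-base-model-identifier>",
)
print(sorted(request))
```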

Jobs will then run until completion (failure or success). To observe the current status of the job, in the Bedrock console, select Custom models (under the Tune left-hand menu), and then select the respective job in progress.

![Job in progress](./images/b-in-progress.png)
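Instead of refreshing the console, the job status can also be polled from code. The sketch below assumes the `boto3` Bedrock client's `get_model_customization_job` operation and treats `Completed`, `Failed`, and `Stopped` as terminal statuses; the helper names and polling intervals are hypothetical.

```python
import time

TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def wait_for_job(get_status, poll_seconds=60, max_polls=240, sleep=time.sleep):
    """Call get_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal status in time")

def bedrock_job_status(job_identifier):
    """Fetch the current status of a customization job (requires boto3)."""
    import boto3
    response = boto3.client("bedrock").get_model_customization_job(
        jobIdentifier=job_identifier
    )
    return response["status"]

# Offline demonstration with a simulated sequence of statuses.
simulated = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_job(lambda: next(simulated), sleep=lambda seconds: None)
print(final_status)
```

With a real job, pass `lambda: bedrock_job_status(job_arn)` as `get_status`, where `job_arn` is the ARN or name of your job.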

********** ********** ********** ********** **********
In our scenario, training of this model took approximately 1 hour and 45 minutes.

## Deployment and Inference

Woot! Our model has been trained.  But the model has not been prepared for inference.

### Inference
To set up the model for inference, navigate to Custom models. In the Models section, click "peft-fine-tuning-without-reasoning-67-model".

We are then presented with Model details, Job overview, Model overview, and Training metric.

Click "Setup inference".  
Choose "Deploy for on-demand".

![Setup inference](./images/b-setup-inference.png)

There are 2 deployment options
* On-demand Inference
* Provisioned Throughput

On-demand inference for Bedrock Nova models offers pay-per-use pricing with no capacity planning required, making it suitable for variable or unpredictable workloads.

Provisioned Throughput for Bedrock Nova models provides guaranteed, consistent performance by reserving dedicated capacity upfront, making it ideal for predictable, high-volume workloads.

### Deploy for on-demand
Select "Deploy for on-demand", then enter a unique deployment name and a short description. Ensure the selected model is our recently trained model.

For deployment name, use:
peft-fine-tuning-without-reasoning-67-deployment

Here is our completed screen.

![Deployment](./images/b-deploy-od.png)

It will take about 1 minute for deployment.
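The on-demand deployment can likely also be created from code. The sketch below assumes a recent `boto3` version whose Bedrock client includes the `create_custom_model_deployment` operation; the custom model ARN is a placeholder, the description text and helper names are hypothetical, and the parameter names should be checked against the current API reference before use.

```python
def build_deployment_request(
    model_arn,
    name="peft-fine-tuning-without-reasoning-67-deployment",
    description="On-demand deployment of the fine-tuned Nova 2 Lite model",
):
    """Assemble keyword arguments mirroring the console deployment form."""
    return {
        "modelDeploymentName": name,
        "modelArn": model_arn,
        "description": description,
    }

def create_deployment(request, region_name=None):
    """Create the deployment and return its ARN (requires boto3 and credentials)."""
    import boto3
    client = boto3.client("bedrock", region_name=region_name)
    response = client.create_custom_model_deployment(**request)
    return response["customModelDeploymentArn"]

# Placeholder ARN; copy the real custom model ARN from the Model details page.
deployment_request = build_deployment_request("<custom-model-arn>")
print(deployment_request["modelDeploymentName"])
```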


## Playground - Experiment with the trained model

Once the model is deployed correctly, let's experiment and test the model.  

To verify the model deployed correctly, navigate to "Custom model on-demand" and in the "Custom model deployments" section, click our Deployment name.

![Deployment name](./images/b-custom-model-on-demand.png)


Click the "Test in playground" button.

![Test in playground](./images/b-test-in-playground.png)

We are in the Text playground. 

Try a simple prompt:  "hi"

![Playground](./images/b-playground.png)

Voila!  We have a response from our custom model.  Try more prompts.  

Also, try to switch the mode from "Single prompt" to "Chat".  You might have to re-select the custom model.
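The playground prompt can be reproduced in code with the Converse API of the `bedrock-runtime` client. This is a minimal sketch, assuming the deployment ARN is accepted as the `modelId`; the ARN placeholder, the inference settings, and the helper names are hypothetical.

```python
def build_converse_request(deployment_arn, prompt, system_prompt=None):
    """Assemble keyword arguments for a single-turn Converse call."""
    request = {
        "modelId": deployment_arn,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.7},
    }
    if system_prompt:
        request["system"] = [{"text": system_prompt}]
    return request

def extract_text(response):
    """Join the text blocks of the assistant message in a Converse response."""
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)

def ask(deployment_arn, prompt, region_name=None):
    """Send one prompt to the deployed model (requires boto3 and credentials)."""
    import boto3
    client = boto3.client("bedrock-runtime", region_name=region_name)
    return extract_text(client.converse(**build_converse_request(deployment_arn, prompt)))

# Offline demonstration with a hand-written response of the expected shape.
fake_response = {"output": {"message": {"role": "assistant",
                                        "content": [{"text": "Hello! How can I help?"}]}}}
print(extract_text(fake_response))
```

Calling `ask("<custom-model-deployment-arn>", "hi")` would send the same prompt used in the playground.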

## Conclusion
This notebook shows that with minimal steps, we can train a custom model and then deploy that model for inference, using the Bedrock UI alone.  How exciting and efficient.

Experiment more with custom models.  Give our RFT (Reinforcement Fine-Tuning) notebooks a try as well.

We hope you learned how easy it is to train, deploy, and run inference on a custom Nova 2 Lite model.

## Exploring More

### Reasoning Data for Training
Amazon Nova 2.0 SFT data uses the same Converse API format as Amazon Nova 1.0, with the addition of optional reasoning content fields.

Reasoning content (also called chain-of-thought) captures the model's intermediate thinking steps before generating a final answer. In the assistant turn, we use the reasoningContent field to include these reasoning traces.

Please see the documentation [Supervised fine-tuning on Amazon Nova 2.0](https://docs.aws.amazon.com/bedrock/latest/userguide/nova-2-sft-data-prep.html) for more details.

An example of a data record that includes reasoning:

```
{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    {
      "text": "You are a digital assistant with a friendly personality"
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "What country is right next to Australia?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "reasoningContent": {
            "reasoningText": {
              "text": "I need to use my world knowledge of geography to answer this question"
            }
          }
        },
        {
          "text": "The closest country to Australia is New Zealand, located to the southeast across the Tasman Sea."
        }
      ]
    }
  ]
}
```

Currently, only reasoningText is supported within reasoningContent. Multimodal reasoning content is not yet available.
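If reasoning traces are available for an existing dataset, they can be merged into records programmatically. The sketch below shows one way to do this, assuming the reasoning block is placed before the final answer in the last assistant turn, as in the example above; the function name `with_reasoning` is hypothetical.

```python
import copy

def with_reasoning(record, reasoning_text):
    """Return a copy of record whose last assistant turn starts with a reasoning block."""
    updated = copy.deepcopy(record)  # leave the caller's record untouched
    assistant_turn = updated["messages"][-1]
    if assistant_turn.get("role") != "assistant":
        raise ValueError("last message must be an assistant turn")
    assistant_turn["content"].insert(
        0, {"reasoningContent": {"reasoningText": {"text": reasoning_text}}}
    )
    return updated

plain_record = {
    "schemaVersion": "bedrock-conversation-2024",
    "messages": [
        {"role": "user", "content": [{"text": "What is the capital of Mars?"}]},
        {"role": "assistant",
         "content": [{"text": "Mars does not have a capital. Perhaps it will one day."}]},
    ],
}
reasoning_record = with_reasoning(
    plain_record, "Mars has no countries or governments, so it has no capital."
)
print(len(reasoning_record["messages"][-1]["content"]))
```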