## Fine-tuning Mistral 7B with Amazon SageMaker
In this notebook we'll explore how to fine-tune a [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model with Amazon SageMaker. We'll use the [Hugging Face](https://huggingface.co/) library to download the model and tokenizer, and we'll use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) to fine-tune the model on a sample dataset. The goal of this notebook is to cover several key aspects of fine-tuning LLMs including:
- Preparing the data for fine-tuning
- Obtaining the base model and tokenizer
- Configuring a SageMaker training job
- Utilizing QLoRA for parameter efficient fine-tuning (PEFT)
- Applying supervised fine-tuning methods to train a model
- Improving / Aligning the model's outputs with human preferences using Direct Preference Optimization (DPO)

We will use the [fine-tuning recipes](https://github.com/huggingface/alignment-handbook) provided by Hugging Face that were used to fine-tune the Mistral-7B model to create the [Zephyr-7B-Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) model.

The recipes use the [Transformer Reinforcement Learning (TRL)](https://github.com/huggingface/trl) library for both supervised fine-tuning and preference alignment, and they are easy to adapt to other datasets.

In [1]:
%pip install -Uq sagemaker
%pip install -Uq datasets

Note: you may need to restart the kernel to use updated packages.


In [1]:
import boto3
import sagemaker
import json
from sagemaker import Model, image_uris, serializers, deserializers
import time
from pathlib import Path
from utils import download_model

boto3_session=boto3.session.Session()

smr = boto3_session.client("sagemaker-runtime") # sagemaker runtime client for invoking the endpoint
sm = boto3_session.client("sagemaker") 
s3_rsr = boto3_session.resource("s3")
role = sagemaker.get_execution_role()  

sess = sagemaker.session.Session(boto3_session, sagemaker_client=sm, sagemaker_runtime_client=smr)  # sagemaker session for interacting with different AWS APIs
bucket = sess.default_bucket()  # default S3 bucket associated with the SageMaker session
region = sess.boto_region_name  # region name of the current SageMaker Studio environment

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml




### Download Model
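First, we'll download the model and tokenizer and upload them to our own S3 bucket. The cell below copies the Mistral 7B artifacts from the regional SageMaker JumpStart cache bucket rather than pulling them from the Hugging Face model hub directly.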

In [3]:
local_model_path = "Mistral-7B"
if not Path(local_model_path).exists():
    !aws s3 cp --recursive s3://jumpstart-cache-prod-{region}/huggingface-llm/huggingface-llm-mistral-7b/artifacts/inference/v1.0.0/ {local_model_path}

In [4]:
# check if the model has already been uploaded to the S3 bucket. If not, upload it.
model_prefix = local_model_path

# limit(1) avoids listing every object when we only need to know whether any exist
if list(s3_rsr.Bucket(bucket).objects.filter(Prefix=model_prefix).limit(1)):
    print("Model already exists on the S3 bucket")
    print(f"If you want to upload a new model, please delete the existing model from the S3 bucket with the following command: \n !aws s3 rm --recursive s3://{bucket}/{model_prefix}")
    s3_model_location = f"s3://{bucket}/{model_prefix}"
else:
    # local_model_path is a plain string, so it can be passed to upload_data directly
    s3_model_location = sess.upload_data(path=local_model_path, bucket=bucket, key_prefix=model_prefix)

Model already exists on the S3 bucket
If you want to upload a new model, please delete the existing model from the S3 bucket with the following command: 
 !aws s3 rm --recursive s3://sagemaker-us-east-1-152804913371/Mistral-7B


### Download data and upload to S3
Next, we need to prepare the data for fine-tuning. We can use a sample of the data that was used to train the Zephyr-7B-Beta model, or we can bring our own data.

If bringing our own data, we need to convert it into the JSON Lines format that is supported by the TRL trainers.
- For supervised fine-tuning, each record should contain a `messages` field. This field holds a list of dictionaries that represent a conversation between a `user` and an AI `assistant`. Each message follows the schema `{"role": "{role}", "content": "{content}"}`, where role is `user`, `assistant`, or `system` and content is the text of the message. For more information, see the recipe documentation [here](https://github.com/huggingface/alignment-handbook/tree/main/scripts) or an example dataset [here](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k).
- For Direct Preference Optimization, we need to provide a dataset that contains `chosen` and `rejected` responses based on human preference. The `chosen` and `rejected` fields each contain the conversation messages in the same format as the supervised fine-tuning dataset. For more information, see the recipe documentation [here](https://github.com/huggingface/alignment-handbook/tree/main/scripts) or an example dataset [here](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
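
As a rough illustration of the two record shapes described above, the sketch below writes one supervised fine-tuning record and one preference record to JSON Lines files. The file names under `data/sft` and `data/dpo`, the helper `write_jsonl`, and the example text are placeholders chosen for this illustration; check the recipe documentation for any additional fields your version of the scripts expects.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder conversation; replace with your own data.
prompt = {"role": "user", "content": "What is Amazon SageMaker?"}
good = {"role": "assistant", "content": "A managed service for building, training, and deploying ML models."}
bad = {"role": "assistant", "content": "I don't know."}

# SFT: a single `messages` list per record.
sft_records = [{"messages": [prompt, good]}]
# DPO: `chosen` and `rejected` conversations in the same message format.
dpo_records = [{"chosen": [prompt, good], "rejected": [prompt, bad]}]

write_jsonl(sft_records, "data/sft/train.jsonl")
write_jsonl(dpo_records, "data/dpo/train.jsonl")
```

Files written this way sit in the `./data/sft` and `./data/dpo` folders that the data-loading cell reads when `USE_EXAMPLE_DATA` is set to `False`.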

The tuning recipe automatically converts the `messages` into a chat prompt that is used to fine-tune the model. You can see the default template on the [Zephyr-7B-Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) model card. The prompt separates the system, user, and assistant messages with the tokens `<|system|>`, `<|user|>`, and `<|assistant|>` respectively, and appends an EOS token `</s>` to the end of each message. The template can be adjusted in the tuning script; however, the same template must be used during inference, so any change should be well documented.
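
To make the prompt format described above more concrete, here is a plain-Python approximation of how a list of messages could be rendered. It is only a sketch: the authoritative template is the chat template that ships with the tokenizer, and the function name `render_chat`, the newline placement, and the `add_generation_prompt` flag as implemented here are assumptions made for illustration.

```python
EOS_TOKEN = "</s>"
ROLE_TOKENS = {"system": "<|system|>", "user": "<|user|>", "assistant": "<|assistant|>"}


def render_chat(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts into a single prompt string."""
    parts = []
    for message in messages:
        role_token = ROLE_TOKENS[message["role"]]  # raises KeyError on unknown roles
        parts.append(f"{role_token}\n{message['content']}{EOS_TOKEN}")
    if add_generation_prompt:
        # At inference time, end with the assistant marker so the model continues from it.
        parts.append(ROLE_TOKENS["assistant"])
    return "\n".join(parts)


example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_chat(example, add_generation_prompt=True))
```

Whatever template is used for training, reusing the same function (or the tokenizer's own template) at inference time keeps the two formats from drifting apart.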

In [5]:
import datasets

USE_EXAMPLE_DATA = True # set to False to use your own data
NUM_SAMPLES = 1200 # number of samples to use from the example data

if USE_EXAMPLE_DATA:
    sft_dataset = datasets.load_dataset("HuggingFaceH4/ultrachat_200k")['train_sft'].select(range(NUM_SAMPLES))
    dpo_dataset = datasets.load_dataset("HuggingFaceH4/ultrafeedback_binarized")['train_prefs'].select(range(NUM_SAMPLES))
    
# adjust these values if bringing your own data
# In the example here, a jsonl file is stored in /data/dpo and /data/sft that contains data in the format described above
else:
    dpo_dataset_path ="./data/dpo"
    sft_dataset_path ="./data/sft"
    try:
        sft_dataset = datasets.load_dataset(sft_dataset_path)["train"]
        dpo_dataset = datasets.load_dataset(dpo_dataset_path)["train"]
    except Exception:
        print("Please make sure that the data is present in the data folder. If not, please prepare the data first")
        raise  # re-raise the original error with its type and traceback intact

sft_dataset.train_test_split(test_size=0.1, shuffle=True, seed=42).save_to_disk('fine-tuning-data/sft_split')
dpo_dataset.train_test_split(test_size=0.1, shuffle=True, seed=42).save_to_disk('fine-tuning-data/dpo_split')

Saving the dataset (1/1 shards): 100%|██████████| 1080/1080 [00:00<00:00, 5341.14 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 120/120 [00:00<00:00, 1213.48 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 1080/1080 [00:00<00:00, 6764.48 examples/s]
Saving the dataset (1/1 shards): 100%|██████████| 120/120 [00:00<00:00, 1331.07 examples/s]


In [7]:
# upload the data to S3
s3_data = sess.upload_data(path="fine-tuning-data", bucket=bucket, key_prefix="fine-tuning-mistral/data")

print(f"Uploaded training data file to {s3_data}")

Uploaded training data file to s3://sagemaker-us-east-1-152804913371/fine-tuning-mistral/data


### Configure SageMaker Training Job for Supervised Fine-tuning
Now that the data is ready, we can configure the first SageMaker training job which will perform supervised fine-tuning. The code from the Hugging Face recipe [repo](https://github.com/huggingface/alignment-handbook/tree/main) is cloned into the `src` directory. The `src` directory also contains a requirements.txt file that will install the recipe module and [Flash Attention](https://github.com/Dao-AILab/flash-attention) to speed up the training.

The repo contains two scripts, `alignment-handbook/scripts/run_sft.py` for supervised fine-tuning and `alignment-handbook/scripts/run_dpo.py` for direct preference optimization. Both scripts take a positional argument for the path of the recipe file like this `python run_{task}.py config_full.yaml`. The recipe file contains all of the hyperparameters for the training job. The recipe file for the supervised fine-tuning job is located at `src/config_sft_lora.yaml`. Several example recipe files are available within the repo for full and parameter efficient fine-tuning. We will utilize the parameter efficient fine-tuning recipe for this example.

A few changes to the recipe file are required to run the job on SageMaker. First, we need to change `model_name_or_path` to `/opt/ml/input/data/model`, the directory to which our base model will be copied from S3. Next, we need to change the `dataset_mixer` directories to `/opt/ml/input/data/train`, which is where our training data will be copied from S3. Finally, we need to change the `output_dir` for the `trainer` to `/opt/ml/model`, the default directory for SageMaker models. The contents of `/opt/ml/model` will be copied to S3 once the job finishes. Optionally, we can set `logging_dir` to `/opt/ml/output/tensorboard` to use [SageMaker Managed TensorBoard](https://docs.aws.amazon.com/sagemaker/latest/dg/tensorboard-on-sagemaker.html) for monitoring the training job.
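
The path changes listed above can be summarized as a small transformation on the recipe. The sketch below treats the recipe as a flat Python dictionary; the helper name `adapt_recipe_for_sagemaker`, the starting values in `original`, and the assumption that `dataset_mixer` maps a dataset location to a sampling proportion are illustrative and should be checked against the actual recipe file in `src`.

```python
import copy

# SageMaker container paths named in the text above.
MODEL_DIR = "/opt/ml/input/data/model"
TRAIN_DIR = "/opt/ml/input/data/train"
OUTPUT_DIR = "/opt/ml/model"
TENSORBOARD_DIR = "/opt/ml/output/tensorboard"


def adapt_recipe_for_sagemaker(recipe, use_tensorboard=True):
    """Return a copy of a recipe dict with its paths pointed at SageMaker directories."""
    adapted = copy.deepcopy(recipe)
    adapted["model_name_or_path"] = MODEL_DIR
    # Assumption: dataset_mixer maps a dataset location to a sampling proportion.
    adapted["dataset_mixer"] = {TRAIN_DIR: 1.0}
    adapted["output_dir"] = OUTPUT_DIR
    if use_tensorboard:
        adapted["logging_dir"] = TENSORBOARD_DIR
    return adapted


# Hypothetical starting values, for illustration only.
original = {
    "model_name_or_path": "mistralai/Mistral-7B-v0.1",
    "dataset_mixer": {"HuggingFaceH4/ultrachat_200k": 1.0},
    "output_dir": "data/zephyr-7b-sft-lora",
}
adapted = adapt_recipe_for_sagemaker(original)
print(adapted)
```

In this notebook the edits are made directly in the YAML file; the dictionary form is used here only to show which keys change and what they change to.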

In [9]:
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import TensorBoardOutputConfig
import time

str_time = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())

tb_output_config = TensorBoardOutputConfig(s3_output_path=f"s3://{bucket}/fine-tuning-mistral/tensorboard/{str_time}",
    container_local_output_path="/opt/ml/output/tensorboard")

job_name = "mistral7b-sft"

# The default script takes the YAML recipe file as a positional argument.
# Since SageMaker only passes hyperparameters as named arguments, a small change was made to the fine-tuning scripts.

hyperparameters = {
    "recipe": "config_sft_lora.yaml",  # supervised fine-tuning with QLoRA recipe
}

sft_estimator = PyTorch(
    base_job_name=job_name,
    source_dir = "src",                                  # directory containing the fine-tuning scripts
    entry_point="alignment-handbook/scripts/run_sft.py", # fine-tuning script that will be run
    sagemaker_session=sess,
    role=role,
    instance_count=2,                                    # number of instances to use for training 
    hyperparameters=hyperparameters,
    instance_type="ml.g5.2xlarge", 
    framework_version="2.1.0",                          # PyTorch version
    py_version="py310",
    disable_profiler=True,
    max_run=60*60*24*2,
    keep_alive_period_in_seconds=3600,                    # after job is done keep the training cluster alive for 1 hour to accept other jobs
    tensorboard_output_config=tb_output_config,
    environment = {"HUGGINGFACE_HUB_CACHE": "/tmp", 
                    "LIBRARY_PATH": "/opt/conda/lib/",
                    "TRANSFORMERS_CACHE": "/tmp",
                    "NCCL_P2P_LEVEL": "NVL"},
    distribution={"torch_distributed": {"enabled": True}}, # enable distributed training with torch.distributed 
    disable_output_compression = True
)
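
The comment in the cell above notes that the fine-tuning scripts were changed to accept the recipe as a named argument. SageMaker passes each entry of `hyperparameters` to the entry point as a `--key value` command-line argument, so one way such a change could look is sketched below. This is a hypothetical illustration, not the actual modification made to `run_sft.py`; the function name `resolve_recipe_path` and the `base_dir` parameter are assumptions.

```python
import argparse
from pathlib import Path


def resolve_recipe_path(argv, base_dir="."):
    """Parse `--recipe <file>` and return the recipe path plus any untouched extra arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--recipe", required=True, help="recipe YAML file name")
    # parse_known_args leaves any other hyperparameters for a downstream parser.
    args, remaining = parser.parse_known_args(argv)
    return Path(base_dir) / args.recipe, remaining


# Equivalent to the arguments produced by hyperparameters={"recipe": "config_sft_lora.yaml"}
recipe_path, extra = resolve_recipe_path(["--recipe", "config_sft_lora.yaml"])
print(recipe_path)
```

The resolved path can then be handed to the script's existing argument parser in place of the positional YAML argument.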

In [25]:
# Invoking the fit method on the estimator starts the training job
# data will be copied into training cluster based on the dictionary keys specified here
# The contents of the s3_model_location will be copied into the /opt/ml/input/data/model directory
# The contents of the s3_data will be copied into the /opt/ml/input/data/train directory
sft_estimator.fit({"model": s3_model_location, "train": f"{s3_data}/sft_split"})

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: mistral7b-sft-2024-05-30-18-35-20-560


2024-05-30 18:35:21 Starting - Starting the training job...
2024-05-30 18:35:25 Downloading - Downloading input data....................................
2024-05-30 18:41:52 Training - Training image download completed. Training in progress..bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-05-30 18:41:53,141 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-05-30 18:41:53,159 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-05-30 18:41:53,171 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-05-30 18:41:53,173 sagemaker_pytorch_container.training INFO     Invoking TorchDistributed...
2024-05-30 18:41:53,173 sagemaker_pytorch_container.training INFO     Invoking user training script.
2024-05-30 18:41:54,770 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:

### Configure SageMaker Training Job for Direct Preference Optimization
Once we have the fine-tuned model, we can further fine-tune it using Direct Preference Optimization to better align it with our preferences and improve the model's outputs. The process is similar to the supervised fine-tuning job, except that we now use the `alignment-handbook/scripts/run_dpo.py` script and also provide our fine-tuned model as an additional input.
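
For intuition about what the DPO job optimizes, the sketch below computes the standard DPO loss for a single preference pair: the negative log-sigmoid of `beta` times the difference between the policy's and the reference model's log-probability margins on the chosen and rejected responses. The function name, the scalar log-probability inputs, and the default `beta=0.1` are illustrative assumptions rather than values taken from the recipe; during training the TRL trainer computes this over batches of token log-probabilities.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities of each response."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy equals the reference model, the loss is log(2).
print(dpo_loss(-3.0, -3.0, -3.0, -3.0))
# When the policy favors the chosen response more than the reference does, the loss drops.
print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

The reference model in this setup is the supervised fine-tuned model, which is why the SFT output is passed to the DPO job as an input.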

In [33]:
str_time = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())

tb_output_config = TensorBoardOutputConfig(s3_output_path=f"s3://{bucket}/fine-tuning-mistral/tensorboard/{str_time}",
    container_local_output_path="/opt/ml/output/tensorboard")

job_name = "mistral7b-dpo"

# hyperparameters = {
#     "recipe": "config_dpo_lora_fsdp.yaml",
#     "torch_dtype": "bfloat16",
#     "bnb_4bit_quant_storage": "bfloat16"
# }


hyperparameters = {
    "recipe": "config_dpo_lora.yaml",
}

dpo_estimator = PyTorch(
    base_job_name=job_name,
    source_dir = "src",
    entry_point="alignment-handbook/scripts/run_dpo.py",
    sagemaker_session=sess,
    role=role,
    instance_count=2, 
    hyperparameters=hyperparameters,
    instance_type="ml.g5.2xlarge", 
    framework_version="2.1.0",
    py_version="py310",
    disable_profiler=True,
    max_run=60*60*24*2,
    keep_alive_period_in_seconds=3600,
    tensorboard_output_config=tb_output_config,
    environment = {"HUGGINGFACE_HUB_CACHE": "/tmp", 
                    "LIBRARY_PATH": "/opt/conda/lib/",
                    "TRANSFORMERS_CACHE": "/tmp",
                    "NCCL_P2P_LEVEL": "NVL"},
    distribution={"torch_distributed": {"enabled": True}},
    disable_output_compression = True
)

In [34]:
# get the location of the fine tuned model from the sft_estimator
# sft_model_location = sft_estimator.model_data["S3DataSource"]["S3Uri"]

sft_model_location = "s3://sagemaker-us-east-1-152804913371/mistral7b-sft-2024-05-30-18-35-20-560/output/model"

Since LoRA was used for fine-tuning, our SFT model only contains the LoRA adapter. Therefore, we also provide the base model as another input to the `.fit` call below. The training script automatically merges the base model with the LoRA adapter and then proceeds with the DPO fine-tuning.
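
Each key passed to `.fit` becomes an input channel, and SageMaker copies that channel's S3 contents into `/opt/ml/input/data/<channel>` inside the training container. A minimal sketch of how a script might locate these directories is shown below; the helper name `channel_dir` is hypothetical, and it relies on the `SM_CHANNEL_<NAME>` environment variables that the SageMaker training toolkit sets, falling back to the default path when they are absent.

```python
import os


def channel_dir(name):
    """Return the local directory for a SageMaker input channel."""
    # SM_CHANNEL_<NAME> is set inside the training container; the fallback is the default layout.
    return os.environ.get(f"SM_CHANNEL_{name.upper()}", f"/opt/ml/input/data/{name}")


# The three channels used by the DPO job: base model, SFT adapter, and preference data.
for channel in ("model", "sft_model", "train"):
    print(channel, "->", channel_dir(channel))
```

With the channel names used in the next cell, the base model lands in `/opt/ml/input/data/model` and the LoRA adapter in `/opt/ml/input/data/sft_model`.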

In [35]:
dpo_estimator.fit(
    {
        "model": s3_model_location,       # base Mistral 7B model 
        "sft_model": sft_model_location,  # fine-tuned model from the previous step
        "train": f"{s3_data}/dpo_split",  # preference training data
    }
)
dpo_model_location = dpo_estimator.model_data["S3DataSource"]["S3Uri"]

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: mistral7b-dpo-2024-05-31-16-17-48-875


2024-05-31 16:17:49 Starting - Starting the training job
2024-05-31 16:17:49 Pending - Training job waiting for capacity......
2024-05-31 16:18:33 Pending - Preparing the instances for training...
2024-05-31 16:19:06 Downloading - Downloading input data....................................
2024-05-31 16:25:23 Downloading - Downloading the training image.bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-05-31 16:25:26,281 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-05-31 16:25:26,298 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2024-05-31 16:25:26,310 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-05-31 16:25:26,317 sagemaker_pytorch_container.training INFO     Invoking TorchDistributed...
2024-05-31 16:25:26,317 sagemaker_pytorch_container.training INFO     Invoking user training 

### Clean up
Run these cells to remove the data and model artifacts from S3.

In [None]:
!aws s3 rm --recursive $s3_model_location 
!aws s3 rm --recursive $sft_model_location
!aws s3 rm --recursive $dpo_model_location
!aws s3 rm --recursive $s3_data
