## Fine-Tuning the GPTJ-6B model using transfer learning

## Overview
This notebook walks you through how to fine-tune a pre-trained large language model (LLM) with domain-specific knowledge.

The domain-specific dataset we will use to fine-tune this model consists of United Kingdom (U.K.) Supreme Court case documents, 693 legal documents in total. Because we are using transfer learning, we start from the knowledge already captured in the pre-trained GPTJ-6B model and continue training it on the new data, rather than training the entire model from scratch. This considerably shortens the time needed to train the new model.

### Prereqs

To run this notebook, we assume you are familiar with running a SageMaker notebook instance or a SageMaker Studio notebook.

### SageMaker Studio Resources
If you are unfamiliar with SageMaker Studio, check out the resources below.

[Introduction to Amazon SageMaker Studio - Video](https://www.youtube.com/watch?v=YcJAc-x8XLQ)

[Build ML models using SageMaker Studio Notebooks - Workshop - Video](https://www.youtube.com/watch?v=1iSiN4sVMjE)

## Dataset info

The statistics below apply if you use all 693 case documents to tune the model.

* <strong>Page count:</strong> ~17,718
* <strong>Word count:</strong> 10,015,333
* <strong>Characters (no spaces):</strong> 49,897,639

The entire dataset is publicly available and can be downloaded [here](https://zenodo.org/record/7152317#.ZCSfaoTMI2y).
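If you fine-tune on a subset of the documents, you may want comparable statistics for that subset. The sketch below computes a word count and a "characters (no spaces)" count. It assumes a word is any whitespace-separated token and counts every non-whitespace character; the figures above may have been produced with slightly different rules (for example by a word processor), so expect small differences. `text_stats` and `directory_stats` are hypothetical helpers, not part of the notebook or any library.

```python
import os


def text_stats(texts):
    """Return (word_count, chars_without_spaces) for an iterable of strings.

    Assumption: a word is any whitespace-separated token, and
    "characters (no spaces)" means every non-whitespace character.
    """
    words = 0
    chars = 0
    for text in texts:
        tokens = text.split()  # splits on any run of whitespace
        words += len(tokens)
        chars += sum(len(token) for token in tokens)
    return words, chars


def directory_stats(directory, doc_count=None):
    """Compute text_stats over the first `doc_count` files (sorted by name) in `directory`."""
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))  # skip sub-directories
    )
    if doc_count is not None:
        names = names[:doc_count]

    def read_all():
        for name in names:
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                yield f.read()

    return text_stats(read_all())
```

After unzipping the dataset, a call such as `directory_stats("dataset/UK-Abs/train-data/judgement", doc_count=25)` would report the size of the subset used for training.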

## Using your own dataset
You can refactor this notebook to use another dataset. First, update the **Retrieve dataset** section to download your dataset. Then refactor the code in the **Creating Training dataset** section to build the training dataset from your data's format.
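As an illustration, suppose (hypothetically) your data is a JSON Lines file in which each line is an object with a `text` field. A replacement for the notebook's `create_dataset` function could then look like the sketch below. It keeps the same output layout as the notebook (documents concatenated into one plain-text file, separated by a dashed line); the function name and the `text` field name are assumptions you would adapt to your data.

```python
import json

# Same separator line that the notebook writes between documents
SEPARATOR = "\n-----------------------------------------------------------------\n"


def create_dataset_from_jsonl(new_dataset_file, jsonl_path, docs_in_dataset, text_field="text"):
    """Write up to `docs_in_dataset` records from a JSON Lines file into one plain-text file.

    Assumption (hypothetical format): each non-empty line is a JSON object whose
    `text_field` key holds the document text. Returns the number of documents written.
    """
    docs_added = 0
    with open(jsonl_path, encoding="utf-8") as src, open(new_dataset_file, "w", encoding="utf-8") as dst:
        for line in src:
            if docs_added >= docs_in_dataset:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            dst.write(record[text_field])
            dst.write(SEPARATOR)
            docs_added += 1
    return docs_added
```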

## Considerations when fine-tuning the model
The notebook is configured so that you can fine-tune the model on a subset of the dataset if desired. In the **Creating Training dataset** section, there is a variable called **doc_count**. Set it to the number of case documents you want to include, and the model will be fine-tuned on that many cases from the dataset. The smaller the value, the faster the model will train/fine-tune.
    
## Training/Tuning Time estimates

Here are the estimated training times based on the total number of case documents in the training dataset. These times assume training for 3 epochs.

#### All training was run on one *ml.p3dn.24xlarge* instance

| Training document count | Training time (3 epochs) |
|---|---|
| 250 | 1 hour 41 minutes |
| 500 | 2 hours 57 minutes |
| 693 | 4 hours |
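If you plan to use a document count between the measured points, you can get a rough estimate by linearly interpolating the table above. This is only a planning aid: it assumes training time grows roughly linearly with the number of documents between measured points, and it applies only to one ml.p3dn.24xlarge instance and 3 epochs. Counts outside the measured range are rejected rather than extrapolated. `estimate_training_minutes` is a hypothetical helper.

```python
from bisect import bisect_left

# Measured training times from the table above (one ml.p3dn.24xlarge instance, 3 epochs)
MEASURED_MINUTES = [(250, 101), (500, 177), (693, 240)]


def estimate_training_minutes(doc_count):
    """Linearly interpolate training time (minutes) between the measured document counts.

    Assumption: time grows roughly linearly between measured points.
    """
    counts = [c for c, _ in MEASURED_MINUTES]
    if not counts[0] <= doc_count <= counts[-1]:
        raise ValueError(f"doc_count must be between {counts[0]} and {counts[-1]}")
    i = bisect_left(counts, doc_count)
    if counts[i] == doc_count:
        return float(MEASURED_MINUTES[i][1])  # exact measurement available
    (c0, t0), (c1, t1) = MEASURED_MINUTES[i - 1], MEASURED_MINUTES[i]
    return t0 + (t1 - t0) * (doc_count - c0) / (c1 - c0)
```

For example, `estimate_training_minutes(375)` returns `139.0`, or about 2 hours 19 minutes.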

## GPTJ-6B base model

Steps you will go through in this notebook to test the base model:

1. Clone this repo in a SageMaker Studio Jupyter notebook
2. Install needed notebook libraries
3. Configure the notebook to use SageMaker
4. Retrieve base model container
5. Deploy the model inference endpoint
6. Call inference endpoint to retrieve results from the LLM

## Fine-tuned model

Steps you will go through in this notebook to test the fine-tuned model:

1. Download dataset
2. Prep the dataset and upload it to S3
3. Retrieve the base model container
4. Set hyperparameters for fine-tuning
5. Start training/tuning job
6. Deploy inference endpoint for the fine-tuned model
7. Call inference endpoint for the fine-tuned model
8. Parse endpoint results

### Final Step
* Be sure to delete all models and endpoints to avoid incurring unnecessary costs.
    
### Disclaimer
This notebook demonstrates how you can fine-tune an LLM using transfer learning. Even though the model is fine-tuned on actual U.K. Supreme Court case documents, you should not use this notebook for legal advice.
    
    

## Install prerequisites

In [None]:
!pip install --upgrade sagemaker --quiet

## SageMaker SDK configurations

In [None]:
import json

import boto3
import sagemaker
from sagemaker.session import Session

# A single SageMaker session is reused throughout the notebook
sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
account_id = sagemaker_session.account_id()

print(f"SageMaker role that will be used: {aws_role}")
print(f"AWS Account ID: {account_id}")
print(f"AWS Region you are currently running in: {aws_region}")

## Deploying the inference endpoint for the GPTJ-6B base model

In this section we are deploying the HuggingFace GPTJ-6B base model in order to compare the inference results with the fine-tuned model we will tune later.

The fine-tuned model will be trained on UK Supreme Court case documents.

In [5]:
# Name of model being used
model_id, model_version = "huggingface-textgeneration1-gpt-j-6b", "*"

In [6]:
from sagemaker import image_uris, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

endpoint_name = name_from_base(f"base-model-gptj-6B-{model_id}")

inference_instance_type = "ml.g5.12xlarge"

# Retrieve the inference docker container uri.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

print(f"Container location: {deploy_image_uri}")

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

print(f"HuggingFace model location: {model_uri}")

# Create the SageMaker model instance. Passing predictor_cls=Predictor makes model.deploy() return a
# Predictor object, which can run inference through the SageMaker SDK and is used later to delete the endpoint.
model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# deploy the base model
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    endpoint_name=endpoint_name,
)

print(f"Endpoint name: {endpoint_name}")

Container location: 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.21.0-deepspeed0.8.3-cu117
HuggingFace model location: s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.1.2/infer-prepack-huggingface-textgeneration1-gpt-j-6b.tar.gz
----------------!Endpoint name: base-model-gptj-6B-huggingface-textgene-2023-08-08-00-38-59-767


## Inference Helper functions
Creates two helper functions that will be used when we call the inference endpoint.

In [7]:
import json
import boto3

def query_endpoint_with_payload(encoded_json, endpoint_name):
    """Send a JSON-encoded payload to a SageMaker inference endpoint and return the raw response."""
    client = boto3.client("runtime.sagemaker")
    
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    
    return response


def parse_response_texts(query_response):
    """Decode the endpoint's JSON response body and return the predictions for the first input."""
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions[0]
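The calling code indexes the decoded response as `[0][0]["generated_text"]`, which implies the endpoint returns a nested list: one inner list of candidates per input, each candidate a dictionary with a `generated_text` key. The sketch below makes that assumption explicit and returns every generated sequence, which is useful if you raise `num_return_sequences` above 1. The response shape is inferred from how this notebook indexes it, not from the container's documentation, so verify it against a real response. An `io.BytesIO` object stands in for the streaming body returned by `invoke_endpoint`, since both provide `read()`; `extract_generated_texts` is a hypothetical helper.

```python
import io
import json


def extract_generated_texts(query_response):
    """Return all generated strings from an invoke_endpoint-style response.

    Assumption: the JSON body looks like [[{"generated_text": "..."}, ...]],
    the same shape the notebook indexes with [0][0]["generated_text"].
    """
    model_predictions = json.loads(query_response["Body"].read())
    # Flatten the outer list (one entry per input) and the inner list of candidates
    return [
        candidate["generated_text"]
        for candidates in model_predictions
        for candidate in candidates
    ]


# Offline example using a fake response object instead of a live endpoint
fake_body = json.dumps([[{"generated_text": "first"}, {"generated_text": "second"}]]).encode("utf-8")
fake_response = {"Body": io.BytesIO(fake_body)}
print(extract_generated_texts(fake_response))  # ['first', 'second']
```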

## Call GPTJ-6B without fine-tuning the model
In this section we call the SageMaker inference endpoint that hosts the base model (the model that has not been fine-tuned) and print the results returned by the endpoint.

After the results have been returned, it is recommended that you save them to a text file. This will allow you to compare the results against the fine-tuned model later.

In [51]:
parameters = {
    "max_length": 200,
    "num_return_sequences": 1,
    "top_k": 250,
    "top_p": 0.8,
    "do_sample": True,
    "temperature": 1,
}

res_gpt_before_finetune = []

# Loop through the list of prompts and send each one to the model inference endpoint
for prompt_text in [
    "His Honour Judge Richards",
]:
    
    payload = {"text_inputs": f"{prompt_text}:", **parameters}

    query_response = query_endpoint_with_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    
    generated_texts = parse_response_texts(query_response)[0]["generated_text"]
    res_gpt_before_finetune.append(generated_texts)
    
    print("Base model output")
    print(generated_texts)
    print("\n----------------------------")
    print("End of base model output")

Base model output
His Honour Judge Richards: Thank you, Your Honour. I will direct you to the transcript in the file here, and I'll just put you on the basis of this.





28
SARATHA CHANNAKHANDANI, SR.I, PLAINTIFF: Your Honour, I'm reading this transcript. The man who is alleged to be my uncle.


29.





30.


Suresh Dutt was never in court. He was never taken into custody. He was never found and the

31.


32.

He was never convicted. He was never put on trial. There's no judgment of




It's not even an arrest warrant.



33.




In this case, this


transcript that's been given to us. The record of the trial was never given to us

----------------------------
End of base model output


### Base model results
The output above is what the base model returns before fine-tuning. As you can see, the results are poor: the text jumps between unrelated speakers, stray paragraph numbers, and sentence fragments. The goal is for the model to produce more coherent, domain-relevant text after it has been fine-tuned on the case documents.
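To follow the earlier recommendation to save the base model results to a text file, you can use a small helper like the one below. `save_results`, the file name, and the separator line are arbitrary choices for this sketch.

```python
def save_results(path, prompts, outputs):
    """Write each prompt and its generated output to a UTF-8 text file."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        for prompt, output in zip(prompts, outputs):
            f.write(f"PROMPT: {prompt}\n")
            f.write(f"OUTPUT:\n{output}\n")
            f.write("=" * 40 + "\n")  # separator between results


# In the notebook, after the base model cell has run:
# save_results("base_model_results.txt", ["His Honour Judge Richards"], res_gpt_before_finetune)
```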

## Clean-up

Delete the SageMaker endpoint and its attached resources once you no longer need them. Inference endpoints incur costs for as long as they are running.

In [None]:
base_model_predictor.delete_model()
base_model_predictor.delete_endpoint()

# Fine-Tuning the GPTJ-6B base model via transfer learning

## Retrieve dataset

Download the dataset. This may take several minutes. The zipped dataset is 93 MB.

To save time in the future, note that you only need to run the download and unzip steps once.

In [None]:
!wget https://zenodo.org/record/7152317/files/dataset.zip

In [None]:
# unzipping compressed datasets
# this may take several minutes since we are decompressing all the case files in the dataset

print("Unzipping file. Wait for the dataset files to unzip. This may take several minutes ...")

!unzip -q dataset.zip

print("Finished unzipping file")
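Because the download and unzip only need to run once, you may prefer a Python version that skips work already done. The sketch below uses the standard library (`urllib.request`, `zipfile`) in place of `wget` and `unzip`. It assumes the archive extracts into a top-level `dataset/` directory, which is inferred from the paths used later in this notebook; `download_once` and `extract_once` are hypothetical helpers.

```python
import os
import urllib.request
import zipfile


def download_once(url, dest_path):
    """Download `url` to `dest_path` unless the file already exists."""
    if os.path.exists(dest_path):
        print(f"{dest_path} already exists, skipping download")
        return dest_path
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def extract_once(zip_path, extract_dir=".", marker_dir="dataset"):
    """Extract `zip_path` into `extract_dir` unless `marker_dir` already exists there.

    Assumption: the archive's contents live under a top-level `marker_dir` folder.
    """
    target = os.path.join(extract_dir, marker_dir)
    if os.path.isdir(target):
        print(f"{target} already exists, skipping extraction")
        return target
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
    return target


# Usage with the same URL as the wget cell above:
# download_once("https://zenodo.org/record/7152317/files/dataset.zip", "dataset.zip")
# extract_once("dataset.zip")
```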

## Configure file paths and storage locations

In [9]:
import os

# Local directories containing the text files: case judgements (training) and their summaries (validation)
training_path = 'dataset/UK-Abs/train-data/judgement'
validation_path = 'dataset/UK-Abs/train-data/summary'

# Local files that the combined training and validation text is written to
training_file_path = 'dataset/train.txt'
validation_file_path = 'dataset/validation.txt'
local_training_file = training_file_path

# File names of the datasets once uploaded to S3
s3_training_file = "train.txt"
s3_validation_file = "validation.txt"

model_prefix = "gptj-6b"  # S3 key prefix for this model's data and outputs
# Change this to your bucket name and make sure the bucket exists in S3.
# Alternatively, sagemaker_session.default_bucket() returns the session's default bucket (creating it if needed).
bucket_name = f'sagemaker-{account_id}-{aws_region}'
training_folder = f'{model_prefix}/train'  # the training folder in your bucket
validation_folder = f'{model_prefix}/validation'  # the validation folder in your bucket

s3_training_location = f"s3://{bucket_name}/{training_folder}/"
s3_validation_location = f"s3://{bucket_name}/{validation_folder}/"
s3_training_output_path = f"s3://{bucket_name}/{model_prefix}/output"

training_dataset = f"{s3_training_location}{s3_training_file}"
validation_dataset = f"{s3_validation_location}{s3_validation_file}"

s3_output_location = s3_training_output_path

## Creating Training dataset

In [9]:
# doc_count is the number of documents to include in the fine-tuning dataset
# The higher the doc_count, the larger the training dataset will be
# The maximum document count is 693
doc_count = 25

def create_dataset(new_dataset_file, source_path, docs_in_dataset):
    """Concatenate up to `docs_in_dataset` text files from `source_path` into `new_dataset_file`."""
    docs_added = 0

    with open(new_dataset_file, 'w', encoding='utf-8') as new_file:
        # Sort the file names so the same documents are selected on every run
        for filename in sorted(os.listdir(source_path)):
            if docs_added >= docs_in_dataset:
                break

            # Create the full file path by joining the directory path with the filename
            file_path = os.path.join(source_path, filename)

            # Skip anything that is not a regular file (e.g. a directory)
            if not os.path.isfile(file_path):
                continue

            with open(file_path, 'r', encoding='utf-8') as file:
                text_content = file.read()

            # Write the content of each file to the new file, followed by a separator line
            new_file.write(text_content)
            new_file.write("\n-----------------------------------------------------------------\n")
            docs_added += 1

    print(f"Local dataset {new_dataset_file} has been created")

# creates the training dataset
create_dataset(training_file_path, training_path, doc_count)

# creates the validation dataset
create_dataset(validation_file_path, validation_path, doc_count)

Local dataset dataset/train.txt has been created
Local dataset dataset/validation.txt has been created


## Upload training data to S3
In this section we upload the training and validation datasets created in the previous step.

In [None]:
# Upload the training and validation data to S3 so the model can be fine-tuned on them
sagemaker_session.upload_data(training_file_path,
                              bucket=bucket_name, 
                              key_prefix=training_folder)

sagemaker_session.upload_data(validation_file_path,
                              bucket=bucket_name, 
                              key_prefix=validation_folder)

print(f"S3 Training location: {s3_training_location}")
print(f"S3 Validation location: {s3_validation_location}")
print(f"S3 Training dataset file: {training_dataset}")
print(f"S3 Validation dataset file: {validation_dataset}")
print(f"Model training output location: {s3_output_location}")
print("Training data uploaded to S3")

## Set up the model to be tuned

When selecting your instance type below, make sure your account quota allows you to run at least one instance of that type. For some GPU-based instances you may need to request an increase in the number you can run in your account. The same applies to spot instances, which have a separate quota.

You can request a quota increase [here](https://us-east-1.console.aws.amazon.com/servicequotas/home/services).

In [None]:
model_id, model_version = "huggingface-textgeneration1-gpt-j-6b", "*"

from sagemaker import image_uris, model_uris, script_uris

# The training time estimates above were measured on ml.p3dn.24xlarge (https://aws.amazon.com/ec2/instance-types/p3/).
# This notebook defaults to the smaller ml.g4dn.12xlarge instance.
# training_instance_type = "ml.p3dn.24xlarge" 

training_instance_type = "ml.g4dn.12xlarge" 

# Retrieve the docker image for training
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)

# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)

print(f"Training script location: {train_source_uri}")

# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

print(f"Pre-trained model location: {train_model_uri}")

## Spot Training configuration
If **use_spot_instances** is set to **True** below, training will use spot instances.

Note: If you use spot instances for training, you need to store training checkpoints in case your spot instances are interrupted. Checkpoints let you resume training where you left off if a spot instance is terminated.

In [11]:
from sagemaker.utils import name_from_base
training_job_name = name_from_base(f"{model_id}-transfer-learning")

# Set use_spot_instances to True if you are going to use spot instances for training.
# Ensure you have the proper quota for the instance type set in the training_instance_type variable.
# You can check your quota at https://us-east-1.console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas
# by entering the instance type you plan to use.
use_spot_instances = False
max_run = 36000  # maximum training run time, in seconds
max_wait = 72000 if use_spot_instances else None  # spot only: time waiting for capacity plus run time, in seconds

checkpoint_s3_uri = None

if use_spot_instances:
    # Set the location where training checkpoints are stored when using spot instances
    checkpoint_s3_uri = f'{s3_training_output_path}/checkpoints/{training_job_name}'
    
print(f'Checkpoint storage location: {checkpoint_s3_uri}')

Checkpoint storage location: None
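SageMaker requires the maximum wait time of a managed spot training job to be at least as long as its maximum run time, and a checkpoint location is what allows an interrupted spot job to resume. A small check like the one below can catch a misconfiguration before the job is submitted. It is a sketch: `validate_spot_settings` is a hypothetical helper, and the last rule (leaving `max_wait` unset without spot) mirrors this notebook's convention rather than a SageMaker requirement.

```python
def validate_spot_settings(use_spot_instances, max_run, max_wait, checkpoint_s3_uri):
    """Raise ValueError if the spot-training settings are inconsistent.

    - with spot instances, max_wait must be set and be >= max_run
    - with spot instances, checkpoints should go to an s3:// location so training can resume
    - without spot instances, this notebook leaves max_wait as None
    """
    if use_spot_instances:
        if max_wait is None or max_wait < max_run:
            raise ValueError("max_wait must be set and be >= max_run when using spot instances")
        if not checkpoint_s3_uri or not checkpoint_s3_uri.startswith("s3://"):
            raise ValueError("set checkpoint_s3_uri to an s3:// location when using spot instances")
    elif max_wait is not None:
        raise ValueError("max_wait is only used with spot instances; leave it as None")


# In the notebook, after the cell above:
# validate_spot_settings(use_spot_instances, max_run, max_wait, checkpoint_s3_uri)
```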


## Set hyperparameters
This section retrieves the default hyperparameters for the model and overrides a few of them for this example.

In [12]:
from sagemaker import hyperparameters as sm_hyperparameters

# Retrieve the default hyperparameters for fine-tuning the model
hyperparameters = sm_hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# Override some of the default hyperparameters with custom values below

# To make fine-tuning quicker, we set the number of epochs to 1 (the time estimates above assume 3 epochs)
hyperparameters["epoch"] = "1"
hyperparameters["per_device_train_batch_size"] = "4"

# When training on a domain-specific (not instruction-formatted) dataset, this parameter must be set to False
hyperparameters["instruction_tuned"] = False

print(hyperparameters)

{'epoch': '1', 'learning_rate': '6e-06', 'per_device_train_batch_size': '4', 'per_device_eval_batch_size': '8', 'warmup_ratio': '0.1', 'instruction_tuned': False, 'train_from_scratch': 'False', 'fp16': 'True', 'bf16': 'False', 'evaluation_strategy': 'steps', 'eval_steps': '20', 'gradient_accumulation_steps': '2', 'logging_steps': '10', 'weight_decay': '0.2', 'load_best_model_at_end': 'True', 'max_train_samples': '-1', 'max_val_samples': '-1', 'seed': '10', 'max_input_length': '-1', 'validation_split_ratio': '0.2', 'train_data_split_seed': '0', 'preprocessing_num_workers': 'None', 'max_steps': '-1', 'gradient_checkpointing': 'True', 'early_stopping_patience': '3', 'early_stopping_threshold': '0.0', 'adam_beta1': '0.9', 'adam_beta2': '0.999', 'adam_epsilon': '1e-08', 'max_grad_norm': '1.0', 'label_smoothing_factor': '0', 'logging_first_step': 'False', 'logging_nan_inf_filter': 'True', 'save_strategy': 'steps', 'save_steps': '500', 'save_total_limit': '1', 'dataloader_drop_last': 'False',

## Configure Automatic Model Tuning (AMT)
This section configures Automatic Model Tuning (AMT), which is used only if you change **use_auto_tuning = False** to **use_auto_tuning = True**. AMT is disabled by default in this example, but the code is included if you would like to try it.

In [13]:
from sagemaker.tuner import ContinuousParameter

# Use AMT (Automatic Model Tuning) to tune hyperparameters and select the best model
use_auto_tuning = False

# Define the objective metric used to select the best model
amt_metric_definitions = {
    "metrics": [{"Name": "eval:loss", "Regex": r"'eval_loss': ([0-9]+\.[0-9]+)"}],
    "type": "Minimize",
}

# Select from the hyperparameters supported by the model and configure the ranges of values to search
# (https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-ranges.html)
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(0.00001, 0.0001, scaling_type="Logarithmic")
}

# Increasing the total number of training jobs run by AMT can find better hyperparameters, at the cost of more training time.
max_jobs = 2
# Running more jobs in parallel reduces total tuning time, subject to your account limits.
# If max_jobs equals max_parallel_jobs, Bayesian search effectively becomes random search.
max_parallel_jobs = 2
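SageMaker extracts training metrics by applying each `Regex` in the metric definitions to the training job's log output and capturing the first group as the metric value. The sketch below applies the same patterns with Python's `re` module to sample log lines, which is a quick way to test a pattern before launching a job. The log lines are hypothetical, written in the dictionary-style format the patterns expect, and `extract_metric` is a hypothetical helper. Raw strings (`r"..."`) keep `\.` from being treated as a string escape.

```python
import re

# Same patterns as the notebook's metric definitions, written as raw strings
train_loss_pattern = re.compile(r"'loss': ([0-9]+\.[0-9]+)")
eval_loss_pattern = re.compile(r"'eval_loss': ([0-9]+\.[0-9]+)")

# Hypothetical log lines in the dictionary-style format the patterns expect
log_lines = [
    "{'loss': 2.3456, 'learning_rate': 5e-06, 'epoch': 0.5}",
    "{'eval_loss': 2.1012, 'eval_runtime': 12.34, 'epoch': 0.5}",
]


def extract_metric(pattern, lines):
    """Return every value captured by `pattern` across the log lines, as floats."""
    return [float(m.group(1)) for line in lines for m in pattern.finditer(line)]


print(extract_metric(train_loss_pattern, log_lines))  # [2.3456]
print(extract_metric(eval_loss_pattern, log_lines))   # [2.1012]
```

Note that `'loss'` does not match inside `'eval_loss'`, because the pattern requires a quote immediately before `loss`. The patterns also do not match values written in scientific notation, such as `1e-05`.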

## Start Training
Here we start the SageMaker training job to fine-tune the model. How long it takes depends on the amount of data, the size of your training instance, and the number of instances used for training.

If your training job fails because you exceeded your quota for that instance type, you can request a quota increase for that instance type [here](https://us-east-1.console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas). You can request quota increases for both regular training instances and spot instances.

You may also get an error stating that there is insufficient capacity for your instance type. If so, re-run the cell until the training job starts.



In [None]:
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner

# Define the metrics used to evaluate the model's performance (parsed from the training logs)
metric_definitions = [
    {"Name": "train:loss", "Regex": r"'loss': ([0-9]+\.[0-9]+)"},
    {"Name": "eval:loss", "Regex": r"'eval_loss': ([0-9]+\.[0-9]+)"},
    {"Name": "eval:runtime", "Regex": r"'eval_runtime': ([0-9]+\.[0-9]+)"},
    {"Name": "eval:samples_per_second", "Regex": r"'eval_samples_per_second': ([0-9]+\.[0-9]+)"},
    {"Name": "eval:eval_steps_per_second", "Regex": r"'eval_steps_per_second': ([0-9]+\.[0-9]+)"},
]

# Create SageMaker Estimator instance
tg_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    volume_size=50,
    instance_type=training_instance_type,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
    base_job_name=training_job_name,
    metric_definitions=metric_definitions,
    checkpoint_s3_uri=checkpoint_s3_uri,
    use_spot_instances=use_spot_instances,
    max_run=max_run,
    max_wait=max_wait
)

# checks to see if you are using automated model tuning
if use_auto_tuning:
    hp_tuner = HyperparameterTuner(
        tg_estimator,
        amt_metric_definitions["metrics"][0]["Name"],
        hyperparameter_ranges,
        amt_metric_definitions["metrics"],
        max_jobs=max_jobs,
        max_parallel_jobs=max_parallel_jobs,
        objective_type=amt_metric_definitions["type"],
        base_tuning_job_name=training_job_name
    )
    
    print("Using hyperparameter tuning job")
    # Start a SageMaker Tuning job to search for the best hyperparameters
    hp_tuner.fit({"train": s3_training_location, "validation": s3_validation_location }, logs=True)
else:
    print(f"Training file location is {s3_training_location}")
    print(f"Validation file location is {s3_validation_location}")
    
    # Start a SageMaker Training job by passing s3 path for the training dataset
    tg_estimator.fit({"train": s3_training_location, "validation": s3_validation_location}, logs=True)

## Review Training metrics
Here we output the training metrics returned from the training job.

In [None]:
from sagemaker import TrainingJobAnalytics

if use_auto_tuning:
    print("Getting the best trained model from the hyperparameter tuner")
    training_job_name = hp_tuner.best_training_job()
else:
    training_job_name = tg_estimator.latest_training_job.job_name

df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()
df.head(10)

## Deploy & run Inference on the fine-tuned model
In this section we deploy a model inference endpoint so that we can run inference against the fine-tuned model.

In [16]:
inference_instance_type = "ml.g4dn.12xlarge"

# Retrieve the docker container uri for inference
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

endpoint_name_after_finetune = name_from_base(f"fine-tuned-{model_id}")

print(f"Endpoint name: {endpoint_name_after_finetune}")

# Deploy to SageMaker inference endpoint
finetuned_predictor = (hp_tuner if use_auto_tuning else tg_estimator).deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    image_uri=deploy_image_uri,
    endpoint_name=endpoint_name_after_finetune
)

INFO:sagemaker.image_uris:Ignoring unnecessary Python version: py39.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: ml.g4dn.12xlarge.
INFO:sagemaker:Creating model with name: sagemaker-jumpstart-2023-08-08-01-27-29-320


Endpoint name: fine-tuned-huggingface-textgeneration1--2023-08-08-01-27-29-319


INFO:sagemaker:Creating endpoint-config with name fine-tuned-huggingface-textgeneration1--2023-08-08-01-27-29-319
INFO:sagemaker:Creating endpoint with name fine-tuned-huggingface-textgeneration1--2023-08-08-01-27-29-319


-------------!

## Inference Helper functions
Creates two helper functions that will be used when we call the inference endpoint.

In [19]:
import json
import boto3

def query_endpoint_with_payload(encoded_json, endpoint_name):
    """Send a JSON-encoded payload to a SageMaker inference endpoint and return the raw response."""
    client = boto3.client("runtime.sagemaker")
    
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    
    return response


def parse_response_texts(query_response):
    """Decode the endpoint's JSON response body and return the predictions for the first input."""
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions[0]

## Calling fine-tuned model
Once the results are returned, you should notice that the fine-tuned model produces text that reads much more like a court judgment than the base model's output. We pass the same query that was passed to the base model, which makes it easy to compare the results of the two models. Because sampling is enabled (`do_sample`), the exact text will differ from run to run.

In [52]:
parameters = {
    "max_length": 200,
    "num_return_sequences": 1,
    "top_k": 250,
    "top_p": 0.8,
    "do_sample": True,
    "temperature": 1,
}

res_gpt_finetune = []
    
# Loop through the list of prompts and send each one to the fine-tuned model's inference endpoint
for prompt_text in [
    "His Honour Judge Richards",
]:
    
    payload = {"text_inputs": f"{prompt_text}:", **parameters}
    
    query_response = query_endpoint_with_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name_after_finetune
    )
    
    generated_texts = parse_response_texts(query_response)[0]["generated_text"]
    res_gpt_finetune.append(generated_texts)
    
    print("Fine-tuned model output")
    print(generated_texts)
    print("\n------------------------------------------------------------------")
    print("End of fine-tuned model output")

Fine-tuned model output
His Honour Judge Richards: This is a case which concerns the issue of whether or not it is in the best interests of the child to maintain the parent/child relationship.I will address that question in this judgment.The parties were married on 11 August 2001.There were no children of the marriage, but Ms Parmer and her new partner, Mr Green, have three children.In 2007 Ms Parmer had her first child with Mr Green, and, shortly after the birth of the child, she became pregnant again.This time, she and Mr Green had a daughter.Ms Parmer has an older child with Mr Green's first child, and a younger child with Mr Green's second child.Mr Green lives with his mother.Ms Parmer is divorced from her first husband.On 10 July 2010, Ms Parmer commenced this application for an order under the Family Law Act 1975, section 15, seeking to have Mr Green declared to be the father of her younger child.In his written answer

-------------------------------------------------------------
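The base and fine-tuned cells send each prompt through separate loops. If you keep the base model endpoint running instead of deleting it in the earlier clean-up step, a helper like the one below sends every prompt to both endpoints with identical parameters. It is a sketch: `build_payload` and `compare_endpoints` are hypothetical, the payload mirrors the one the notebook builds (prompt plus `:` and the generation parameters), and `invoke` is any callable you supply that returns the generated text. With sampling enabled, identical inputs still produce different outputs on each call.

```python
import json


def build_payload(prompt, parameters):
    """Build the same JSON payload the notebook sends: the prompt plus ':' and the generation parameters."""
    return json.dumps({"text_inputs": f"{prompt}:", **parameters}).encode("utf-8")


def compare_endpoints(invoke, prompts, endpoint_names, parameters):
    """Send every prompt to every endpoint with identical parameters.

    `invoke(payload_bytes, endpoint_name)` must return the generated text, e.g. a wrapper
    around the notebook's query_endpoint_with_payload and parse_response_texts helpers.
    Returns {prompt: {endpoint_name: generated_text}}.
    """
    results = {}
    for prompt in prompts:
        payload = build_payload(prompt, parameters)
        results[prompt] = {name: invoke(payload, name) for name in endpoint_names}
    return results


# Possible wrapper in the notebook (only if both endpoints are still running):
# def invoke(payload, name):
#     return parse_response_texts(query_endpoint_with_payload(payload, name))[0]["generated_text"]
# compare_endpoints(invoke, ["His Honour Judge Richards"], [endpoint_name, endpoint_name_after_finetune], parameters)
```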

## Clean-Up
Here we are performing clean-up by deleting the fine-tuned model and deleting the inference endpoint that was deployed.

Note: Leaving an inference endpoint running can be costly, depending on the instance type it is deployed to. In this notebook, the base model endpoint uses the ml.g5.12xlarge instance type and the fine-tuned model endpoint uses ml.g4dn.12xlarge; both are GPU-based instances. At the time this notebook was published, ml.g5.12xlarge cost roughly $5.672 per hour to run.
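To see how quickly this adds up, the arithmetic below multiplies an hourly rate by the number of hours an endpoint stays up. It uses the approximate ml.g5.12xlarge rate quoted in this notebook; actual prices vary by Region and change over time, so check current SageMaker pricing. `endpoint_cost` is a hypothetical helper.

```python
def endpoint_cost(hourly_rate_usd, hours, instance_count=1):
    """Approximate on-demand cost (USD) of keeping an endpoint running."""
    return hourly_rate_usd * hours * instance_count


# Approximate ml.g5.12xlarge rate quoted in this notebook; check current pricing for your Region
g5_12xlarge_hourly = 5.672

# One instance left running for a full day
print(round(endpoint_cost(g5_12xlarge_hourly, 24), 2))  # 136.13
```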

In [None]:
# Delete the SageMaker endpoint and the attached resources
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()
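If a cell failed partway through, an endpoint from this notebook may still be running. The sketch below finds endpoints whose names start with a given prefix and, when `dry_run=False`, deletes each endpoint together with its endpoint configuration and models. It takes a boto3 SageMaker client as a parameter (for example `boto3.client("sagemaker")`) and uses the client's `list_endpoints` paginator and its `describe_*`/`delete_*` calls. `delete_endpoints_by_prefix` is a hypothetical helper; the prefixes come from the endpoint names this notebook creates (`base-model-gptj-6B...` and `fine-tuned-...`).

```python
def delete_endpoints_by_prefix(sm_client, prefix, dry_run=True):
    """Find (and optionally delete) SageMaker endpoints whose names start with `prefix`.

    `sm_client` is a boto3 SageMaker client. With dry_run=True the matching
    endpoint names are only returned, not deleted.
    """
    matches = []
    paginator = sm_client.get_paginator("list_endpoints")
    for page in paginator.paginate(NameContains=prefix):
        for endpoint in page["Endpoints"]:
            # NameContains is a substring filter, so re-check that the name starts with the prefix
            if endpoint["EndpointName"].startswith(prefix):
                matches.append(endpoint["EndpointName"])

    if not dry_run:
        for name in matches:
            # Look up the config and models before the endpoint is deleted
            config_name = sm_client.describe_endpoint(EndpointName=name)["EndpointConfigName"]
            config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
            model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

            sm_client.delete_endpoint(EndpointName=name)
            sm_client.delete_endpoint_config(EndpointConfigName=config_name)
            for model_name in model_names:
                sm_client.delete_model(ModelName=model_name)
    return matches


# Example: list first, then delete once you have checked the names
# sm = boto3.client("sagemaker")
# print(delete_endpoints_by_prefix(sm, "fine-tuned-"))
# delete_endpoints_by_prefix(sm, "fine-tuned-", dry_run=False)
```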
