## Bedrock Fine-Tuning Introduction

This example shows how to fine-tune the Cohere Command model with Amazon Bedrock for a summarization use case, using the [SamSum dataset](https://huggingface.co/datasets/samsum).

### Prepare Dataset

We take the SamSum dataset and upload a subset to S3 for fine-tuning.

In [None]:
from datasets import load_dataset

# load the SamSum training split and keep the first 2,000 examples
train_dataset = load_dataset("samsum", split="train")
train_dataset = train_dataset.remove_columns("id")
train_dataset = train_dataset.select(range(2000))

# rename the columns to the prompt/completion fields used for fine-tuning
train_dataset = train_dataset.rename_column("dialogue", "prompt")
train_dataset = train_dataset.rename_column("summary", "completion")
print(train_dataset)

In [None]:
train_dataset.to_json("samsum.jsonl")
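
Before uploading, it can help to confirm that every line of the exported file is a JSON object with non-empty `prompt` and `completion` fields. The following is a minimal sketch, not part of the original notebook: `validate_jsonl` is a hypothetical helper, and the demo records are made up so the block runs without downloading the dataset.

```python
import json
import os
import tempfile


def validate_jsonl(path, required_keys=("prompt", "completion")):
    """Return the number of records; raise ValueError on a missing or empty field."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [key for key in required_keys if not record.get(key)]
            if missing:
                raise ValueError(f"line {line_number}: missing or empty {missing}")
            count += 1
    return count


# Demo on a temporary file with made-up records (assumption: illustrative data only).
sample_records = [
    {"prompt": "A: Lunch at noon?\nB: Sure, see you then.",
     "completion": "A and B will have lunch at noon."},
    {"prompt": "A: Did you send the report?\nB: Yes, this morning.",
     "completion": "B sent the report this morning."},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.jsonl")
    with open(sample_path, "w", encoding="utf-8") as f:
        for record in sample_records:
            f.write(json.dumps(record) + "\n")
    record_count = validate_jsonl(sample_path)

print(record_count)
```

In the notebook itself, the same check would be `validate_jsonl("samsum.jsonl")`.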

In [None]:
import boto3
s3 = boto3.client("s3")

# S3 bucket names are globally unique, so replace this with a name of your own
bucket_name = "bedrock-fine-tuning-cohere-summarization"

# create s3 bucket
s3.create_bucket(Bucket=bucket_name)

# push the training file
training_file_name = "samsum.jsonl"
training_dataset_key = "train/samsum.jsonl"
s3.upload_file(training_file_name, bucket_name, training_dataset_key)

# create a folder to store fine-tuning output results
model_eval_results = "model-output/"
s3.put_object(Bucket=bucket_name, Key=model_eval_results)
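
One caveat with `create_bucket`: called with only `Bucket`, it succeeds when the client's region is `us-east-1`, while other regions require a `CreateBucketConfiguration` with a matching `LocationConstraint`. A small sketch of a region-aware variant follows; `bucket_creation_kwargs` is a hypothetical helper name, not something from the original notebook.

```python
def bucket_creation_kwargs(bucket_name, region):
    """Build keyword arguments for s3.create_bucket that work in any region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


default_region_kwargs = bucket_creation_kwargs("example-bucket", "us-east-1")
other_region_kwargs = bucket_creation_kwargs("example-bucket", "eu-west-1")
print(default_region_kwargs)
print(other_region_kwargs)

# Assumed usage with the client created above:
# s3.create_bucket(**bucket_creation_kwargs(bucket_name, s3.meta.region_name))
```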

In [None]:
s3_path = f"s3://{bucket_name}/"
print(f"S3 Data Location: {s3_path}")

train_dataset_path = f"s3://{bucket_name}/{training_dataset_key}"
print(f"Training Dataset Location: {train_dataset_path}")

model_output_path = f"s3://{bucket_name}/{model_eval_results}"
print(f"Model Outputs Stored: {model_output_path}")

### Model Fine-Tuning

In [None]:
import boto3
bedrock = boto3.client(service_name="bedrock")

# reference: https://aws.amazon.com/blogs/aws/customize-models-in-amazon-bedrock-with-your-own-data-using-fine-tuning-and-continued-pre-training/
# look up the Cohere Command model ID among the models that support fine-tuning
command_model_id = None  # stays None if no matching model is listed
for model in bedrock.list_foundation_models(byCustomizationType="FINE_TUNING")["modelSummaries"]:
    if model["providerName"] == "Cohere" and model["modelName"] == "Command":
        command_model_id = model["modelId"]
print(command_model_id)
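
The lookup above keeps the last matching entry and says nothing if several models match. A sketch that returns every match and fails loudly when there is none might look like the following. `find_model_ids` is a hypothetical helper, and the sample summaries are illustrative stand-ins for the `modelSummaries` list, not real API output (only the first model ID appears elsewhere in this notebook).

```python
def find_model_ids(model_summaries, provider_name, model_name):
    """Return the IDs of all summaries matching the provider and model name."""
    return [
        summary["modelId"]
        for summary in model_summaries
        if summary.get("providerName") == provider_name
        and summary.get("modelName") == model_name
    ]


# Illustrative data shaped like list_foundation_models()["modelSummaries"].
sample_summaries = [
    {"providerName": "Cohere", "modelName": "Command",
     "modelId": "cohere.command-text-v14:7:4k"},
    {"providerName": "Cohere", "modelName": "Command Light",
     "modelId": "example.command-light-id"},   # made-up ID
    {"providerName": "ExampleProvider", "modelName": "Command",
     "modelId": "example.other-id"},           # made-up ID
]

matching_ids = find_model_ids(sample_summaries, "Cohere", "Command")
if not matching_ids:
    raise LookupError("no fine-tunable Cohere Command model found")
selected_id = matching_ids[0]
print(selected_id)
```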

In [None]:
model_name = "customized-cohere-summarization-model-test"
job_name = "fine-tuning-samsum-cohere-summarization-test"
role = "your IAM role"  # replace with your role ARN

bedrock.create_model_customization_job(
    customizationType="FINE_TUNING",
    jobName=job_name,
    customModelName=model_name,
    roleArn=role,
    baseModelIdentifier="cohere.command-text-v14:7:4k",
    hyperParameters={"epochCount": "1"},
    trainingDataConfig={"s3Uri": train_dataset_path},
    outputDataConfig={"s3Uri": model_output_path},
)

In [None]:
import time

# poll every two minutes until the job leaves the "InProgress" state
status = bedrock.get_model_customization_job(jobIdentifier=job_name)["status"]
while status == "InProgress":
    print(status)
    time.sleep(120)
    status = bedrock.get_model_customization_job(jobIdentifier=job_name)["status"]
print(status)
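
The polling loop above runs for as long as the job reports `InProgress`. A sketch of the same idea with an upper bound on the waiting time is shown below. `wait_for_job` and its parameters are hypothetical names chosen for illustration, and the status source is passed in as a callable so the demo can run against a fake instead of a real Bedrock job.

```python
import time


def wait_for_job(get_status, poll_seconds=120, timeout_seconds=6 * 60 * 60,
                 sleep=time.sleep):
    """Poll get_status() until it stops returning "InProgress" or time runs out."""
    waited = 0
    status = get_status()
    while status == "InProgress":
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still in progress after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# Demo with a fake status source and a no-op sleep (assumption: illustrative only).
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_job(
    lambda: next(fake_statuses),
    poll_seconds=0,
    sleep=lambda seconds: None,
)
print(final_status)

# Assumed usage with the client and job name defined above:
# wait_for_job(lambda: bedrock.get_model_customization_job(jobIdentifier=job_name)["status"])
```

The returned status still needs to be checked by the caller, since a job can also end in a state other than `Completed`.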