# Finetune and deploy a custom Generative Command or Command-light model

This sample notebook shows you how to finetune and deploy a custom Command or Command-light model using Amazon SageMaker.

> **Note**: This is a reference notebook and it cannot run unless you make the changes suggested in the notebook.

## Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to the packages for [Cohere Command Finetuning](https://aws.amazon.com/marketplace/pp/prodview-oi7awyyaywtzq) and [Cohere Command-Light Finetuning](https://aws.amazon.com/marketplace/pp/prodview-4kbovmir2j56y). If so, skip step: [Subscribe to the finetune algorithm](#1.-Subscribe-to-the-finetune-algorithm)

## Contents:
1. [Subscribe to the finetune algorithm](#1.-Subscribe-to-the-finetune-algorithm)
2. [Finetune Generation Models](#2.-Finetune-the-model)
   1. [Upload training and evaluation datasets](#A.-Upload-training-and-evaluation-datasets)
   2. [Finetune models on uploaded data](#B.-Finetune-model-on-uploaded-data)
3. [Create an endpoint for inference with the custom model](#3.-Create-an-endpoint-for-inference-with-the-custom-model)
   1. [Create an endpoint](#A.-Create-an-endpoint)
   2. [Perform real-time inference](#B.-Perform-real-time-inference)
4. [Clean-up](#4.-Clean-up)
    1. [Delete the endpoint](#A.-Delete-the-endpoint)
    2. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))
    

## Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

## 1. Subscribe to the finetune algorithm

To subscribe to the model algorithm:
1. Open the algorithm listing page [Cohere Command Finetuning](https://aws.amazon.com/marketplace/pp/prodview-oi7awyyaywtzq) or [Cohere Command-Light Finetuning](https://aws.amazon.com/marketplace/pp/prodview-4kbovmir2j56y)
2. On the AWS Marketplace listing, click on the **Continue to Subscribe** button.
3. On the **Subscribe to this software** page, review and click on **"Accept Offer"** if you and your organization agree with the EULA, pricing, and support terms.
4. Once you click the **Continue to configuration** button and choose a **region**, you will see a **Product Arn** displayed. This is the algorithm ARN that you need to specify when creating a finetune or deploying the finetuned model as an endpoint using boto3. Copy the ARN corresponding to your region and specify it in the following cell.

In [None]:
!pip install -U cohere-aws

from cohere_aws import Client
import boto3
import sagemaker as sage
from sagemaker.s3 import S3Uploader

The algorithm is available in most AWS regions. Add or update the algorithm ARN for your region in the map below.

**Note**: The ARNs in the map below are for Command-light finetuning. You must use the ARN of the algorithm for [Command](https://aws.amazon.com/marketplace/pp/prodview-oi7awyyaywtzq) if you want to finetune the Command model.

In [None]:
region = boto3.Session().region_name

# TODO Add/update the algorithm ARN for your region to this list.
algorithm_map = {
    "us-west-2": "arn:aws:sagemaker:us-west-2:594846645681:algorithm/cohere-command-light-ft-c6aebf3903853cd0885411d7cb58a879",
    "us-west-1": "arn:aws:sagemaker:us-west-1:382657785993:algorithm/cohere-command-light-ft-c6aebf3903853cd0885411d7cb58a879",
    "us-east-1": "arn:aws:sagemaker:us-east-1:865070037744:algorithm/cohere-command-light-ft-c6aebf3903853cd0885411d7cb58a879",
}
if region not in algorithm_map:
    raise Exception(f"Current boto3 session region {region} is not supported.")

arn = algorithm_map[region]
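
When you add entries to the map, it is easy to paste an ARN under the wrong region key. The following is a minimal sketch of a consistency check, assuming the standard ARN layout `arn:partition:service:region:account:resource`. The helper names `arn_region` and `mismatched_regions`, the account ID, and the algorithm name are hypothetical and used only for illustration.

```python
from typing import Dict, List


def arn_region(arn: str) -> str:
    """Return the region field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    return parts[3]


def mismatched_regions(algorithm_map: Dict[str, str]) -> List[str]:
    """List the map keys whose ARN points at a different region than the key."""
    return [region for region, arn in algorithm_map.items() if arn_region(arn) != region]


# Hypothetical map: the account ID and algorithm name are placeholders.
example_map = {
    "us-west-2": "arn:aws:sagemaker:us-west-2:123456789012:algorithm/example-algorithm",
    "us-east-1": "arn:aws:sagemaker:us-west-1:123456789012:algorithm/example-algorithm",
}
print(mismatched_regions(example_map))  # ['us-east-1']
```

Running `mismatched_regions(algorithm_map)` on your own map before looking up `arn` should return an empty list.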

## 2. Finetune the model

### A. Upload training and evaluation datasets

Select a path on S3 to store the training and evaluation datasets and update the **s3_data_dir** below:

In [None]:
s3_data_dir = "s3://..."  # Do not add a trailing slash otherwise the upload will not work
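
Because a trailing slash breaks the upload, it can help to check the path before using it. This is a minimal sketch of such a check; `check_s3_dir` is a hypothetical helper, not part of the Cohere or SageMaker SDKs, and the bucket name is a placeholder.

```python
def check_s3_dir(uri: str) -> str:
    """Return the URI unchanged if it looks like a usable S3 directory, else raise ValueError."""
    if not uri.startswith("s3://"):
        raise ValueError(f"S3 paths must start with 's3://': {uri!r}")
    if uri.endswith("/"):
        raise ValueError(f"Remove the trailing slash: {uri!r}")
    bucket = uri[len("s3://"):].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"Missing bucket name: {uri!r}")
    return uri


# Hypothetical bucket and prefix, for illustration only.
print(check_s3_dir("s3://my-bucket/finetune/data"))
```

The same check applies to the directories used later in this notebook for storing models.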

Upload sample training data to S3:

### Note:

You'll need your data in a .csv or .jsonl file that contains prompt-completion pairs as your examples.


### Example:

JSONL:  `{"prompt": "This is the first prompt", "completion": "This is the first completion"}`

CSV:  `"This is the first prompt" , "This is the first completion"`
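
If your examples live in Python objects, the JSONL format above can be produced and checked with the standard library. This is a sketch under the assumption that each line is a JSON object with exactly the `prompt` and `completion` keys shown above; `write_jsonl` and `validate_jsonl` are hypothetical helper names.

```python
import json
import os
import tempfile


def write_jsonl(pairs, path):
    """Write (prompt, completion) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")


def validate_jsonl(path):
    """Return the number of examples; raise ValueError on a malformed record."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"Line {line_number}: expected a JSON object")
            missing = {"prompt", "completion"} - set(record)
            if missing:
                raise ValueError(f"Line {line_number}: missing keys {sorted(missing)}")
            count += 1
    return count


pairs = [
    ("This is the first prompt", "This is the first completion"),
    ("This is the second prompt", "This is the second completion"),
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.jsonl")
    write_jsonl(pairs, path)
    example_count = validate_jsonl(path)
print(example_count)  # 2
```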

In [None]:
sess = sage.Session()
train_dataset = S3Uploader.upload("../examples/sample_sst5_finetuning_data.jsonl", s3_data_dir, sagemaker_session=sess)

**Note:** Repeat the same for the evaluation dataset if you have one. If it is absent, we will auto-split the training dataset into training and evaluation datasets with a ratio of 90:10.

Remember that the dataset must contain at least 32 examples. If an evaluation dataset is provided, both the training and evaluation datasets must contain at least 16 examples. The above split ratio is overridden if the evaluation split would contain fewer than 16 examples. For example, for a dataset of 50 examples, 16 are used for evaluation and the remaining 34 for training.

We recommend using a dataset that contains at least 100 examples; a larger dataset is likely to yield a higher-quality finetune, but it also takes longer to finetune.
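
The split rules above can be written out as a small calculation. This is only a sketch of the stated rules, not the algorithm's actual code: the function name `split_sizes` is hypothetical, and rounding the 10% share down is an assumption.

```python
MIN_TOTAL_EXAMPLES = 32  # minimum dataset size stated above
MIN_EVAL_EXAMPLES = 16   # minimum evaluation split stated above


def split_sizes(total: int):
    """Return (train_size, eval_size) for an auto-split dataset of `total` examples."""
    if total < MIN_TOTAL_EXAMPLES:
        raise ValueError(f"Need at least {MIN_TOTAL_EXAMPLES} examples, got {total}")
    # 90:10 split, with the evaluation share raised to the minimum when it is too small.
    # Rounding down here is an assumption.
    eval_size = max(MIN_EVAL_EXAMPLES, total // 10)
    return total - eval_size, eval_size


print(split_sizes(50))   # (34, 16)
print(split_sizes(200))  # (180, 20)
```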

### B. Finetune model on uploaded data

Specify a directory on S3 where finetuned models should be stored. Make sure you do not reuse the same directory across multiple runs. 

In [None]:
# TODO update this with a custom S3 path
# DO NOT re-use the same s3 directory for multiple finetunes
# DO NOT add a trailing slash at the end

s3_models_dir = "s3://..."  

Create Cohere client:

In [None]:
co = Client(region_name=region)

Optional: Define hyperparameters

- `train_epochs`: This is the maximum number of training epochs to run for. Defaults to **1**.
- `strategy`: Use either **tfew** or **vanilla**. Defaults to **tfew**, a parameter-efficient finetuning approach. With **vanilla**, the weight updates are applied to the last half of the layers for Command-light and the last quarter of the layers for Command.
- `learning_rate`: The initial learning rate to be used during training. Default differs based on ARN and strategy and is listed below.
- `train_batch_size`: The batch size used during training. Defaults to **16** for Command and **8** for Command-light.
- `early_stopping_patience`: Stop training if the loss metric does not improve by more than `early_stopping_threshold` for this many evaluations. Defaults to **6**.
- `early_stopping_threshold`: How much the loss must improve to prevent early stopping. Defaults to **0.01**.


| Model | Strategy | Learning Rate |
| --- | --- | --- |
| Command-light | vanilla | 6E-07 |
| Command-light | tfew | 0.01 |
| Command | vanilla | 1E-05 |
| Command | tfew | 0.01 |

In [None]:
# Example of how to pass hyperparameters to the fine-tuning job
train_parameters = {
    "train_epochs": 1,
    "strategy": "tfew",
    "early_stopping_patience": 5,
    "early_stopping_threshold": 0.001,
}
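
To build intuition for how `early_stopping_patience` and `early_stopping_threshold` interact, here is a generic early-stopping check. It is an illustration only: the finetuning algorithm's real implementation is not shown in this notebook, and comparing each evaluation against the best loss seen so far is an assumption.

```python
def should_stop(losses, patience: int, threshold: float) -> bool:
    """Return True once `patience` evaluations in a row fail to beat the best loss by more than `threshold`."""
    best = float("inf")
    stale = 0  # evaluations since the last sufficient improvement
    for loss in losses:
        if best - loss > threshold:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return True
    return False


# Hypothetical evaluation losses that flatten out after the second evaluation.
eval_losses = [1.0, 0.9, 0.8995, 0.8993, 0.8992]
print(should_stop(eval_losses, patience=3, threshold=0.001))  # True
print(should_stop(eval_losses, patience=5, threshold=0.001))  # False
```

A smaller threshold or a larger patience lets training run longer before it is stopped.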

Create fine-tuning jobs for the uploaded datasets. Add a field for `eval_data` if you have pre-split your dataset and uploaded both training and evaluation datasets to S3. Remember to use a p4de instance (`ml.p4de.24xlarge`) for Command finetuning; a p4d instance (`ml.p4d.24xlarge`) is sufficient for Command-light finetuning.

In [None]:
# This will take approximately 30 minutes with the example dataset
finetune_name = "sample-finetune"
co.create_finetune(arn=arn,
    name=finetune_name,
    train_data=train_dataset,
    s3_models_dir=s3_models_dir,
    instance_type="ml.p4d.24xlarge",
    training_parameters=train_parameters,
)

The finetuned weights for the above will be stored in a tar file `{s3_models_dir}/sample-finetune.tar.gz`, where the file name is the same as the name used when creating the finetune.

## 3. Create an endpoint for inference with the custom model

### A. Create an endpoint

The Cohere AWS SDK provides a built-in method for creating an endpoint for inference. This will automatically deploy the model you finetuned earlier.

> **Note**: This is equivalent to creating and deploying a `ModelPackage` in SageMaker's SDK.

You can serve multiple t-few finetunes if you store all of them in the same S3 directory and pass the directory as `s3_models_dir`. Please note that you should use dedicated directories for vanilla finetunes.

In [None]:
co.create_endpoint(arn=arn,
    endpoint_name="command-light-finetune-test",
    s3_models_dir=s3_models_dir,
    recreate=True,
    instance_type="ml.p4d.24xlarge",
)

# If the endpoint is already created, you just need to connect to it
# co.connect_to_endpoint(endpoint_name="command-light-finetune-test")

### B. Perform real-time inference

Now, you can access all models deployed on the endpoint for inference:

When serving t-few finetunes, you must additionally pass the `name` in the `model` parameter as specified during the creation of the finetune.

In [None]:
prompt = "Classify the following text as either very negative, negative, neutral, positive or very positive: mr. deeds is , as comedy goes , very silly -- and in the best way."

# vanilla
# result = co.generate(prompt=prompt, max_tokens=50)

# tfew
result = co.generate(model=finetune_name, prompt=prompt, max_tokens=50)
print(result)
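
The difference between the two calls above is only whether `model` is passed. A small helper can make that rule explicit; this is a sketch, and `generate_kwargs` is a hypothetical name rather than part of the Cohere SDK.

```python
from typing import Optional


def generate_kwargs(prompt: str, strategy: str, finetune_name: Optional[str] = None, max_tokens: int = 50) -> dict:
    """Build keyword arguments for co.generate, adding `model` only for t-few finetunes."""
    kwargs = {"prompt": prompt, "max_tokens": max_tokens}
    if strategy == "tfew":
        if not finetune_name:
            raise ValueError("t-few finetunes must be selected by name")
        kwargs["model"] = finetune_name
    return kwargs


print(generate_kwargs("Classify this text.", "tfew", finetune_name="sample-finetune"))
```

With a connected client you would then call `co.generate(**generate_kwargs(...))`.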

## 4. Clean-up

### A. Delete the endpoint

After you've successfully performed inference, you can delete the deployed endpoint to avoid being charged continuously. This can also be done via the Cohere AWS SDK:

In [None]:
co.delete_endpoint()
co.close()
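
Because a running endpoint is billed continuously, you may want the clean-up to run even when an inference call raises. One way to sketch this is a context manager around the client; `endpoint_cleanup` is a hypothetical helper, and it relies only on the `delete_endpoint()` and `close()` methods used above.

```python
from contextlib import contextmanager


@contextmanager
def endpoint_cleanup(client):
    """Yield the client and always delete the endpoint and close the client afterwards."""
    try:
        yield client
    finally:
        try:
            client.delete_endpoint()
        finally:
            client.close()


# Usage with the notebook's client (not executed here):
# with endpoint_cleanup(co) as client:
#     print(client.generate(prompt=prompt, max_tokens=50))
```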

## 5. Stacking multiple T-Few finetunes together

When creating finetunes with the strategy set to `tfew`, the resulting weights are on the order of 1-10 MB. This unlocks the ability to keep multiple T-Few finetunes in GPU VRAM by stacking them one on top of the other. This, combined with some framework optimizations, allows us to perform inference for multiple T-Few finetunes concurrently. To read more about how we do stacked serving of the T-Few finetunes, refer to Cohere's [T-few Finetuning blog post](https://txt.cohere.com/tfew-finetuning/).

It is important to use unique names when creating `tfew` finetunes in the `co.create_finetune` call, so they can be stacked together. Some important notes for stacking T-Few finetunes are:
* The T-Few finetuned weights must be created using the same version of the algorithm.
* You must select the T-Few finetunes to be stacked together and copy the corresponding tar files to an S3 directory of your choosing. This S3 directory should then be passed to `co.create_endpoint()`.
* When using the SDK to create a stacked T-Few model endpoint, your collection of T-Few finetunes will be extracted and re-combined into a single tar file. It will create a `models.tar.gz` file in the same S3 directory, which is then consumed to create the model endpoint on SageMaker.

Let's create a second T-Few finetune so we can stack the two together and see stacked serving in action.

### Create a second finetune

Ideally, use a new training dataset, but remember to set the strategy to `tfew` and to select a different S3 directory. Note that we also give the finetune a different name.

In [None]:
finetune2_name = "sample-finetune-v2"
co.create_finetune(arn=arn,
    name=finetune2_name,
    train_data=train_dataset,
    s3_models_dir=s3_models_dir,
    instance_type="ml.p4de.24xlarge",
    training_parameters=train_parameters,
)

Once the above is complete, you can copy both tar files to a dedicated S3 directory, set the value of `s3_stacked_dir`, and use that to create an endpoint like before.
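
The copy step can be scripted. The sketch below first builds a list of source and destination URIs, assuming each finetune is stored as `{s3_models_dir}/{name}.tar.gz` as described earlier, and then copies them with a boto3 S3 client that you pass in. The helper names, the bucket, and the prefixes are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs a bucket and a key: {uri!r}")
    return bucket, key


def stacking_copy_plan(finetunes: Iterable[Tuple[str, str]], s3_stacked_dir: str) -> List[Tuple[str, str]]:
    """Map each (s3_models_dir, finetune_name) pair to a (source, destination) URI pair."""
    plan, seen = [], set()
    for models_dir, name in finetunes:
        if name in seen:
            raise ValueError(f"Finetune names must be unique to be stacked: {name!r}")
        seen.add(name)
        plan.append((f"{models_dir}/{name}.tar.gz", f"{s3_stacked_dir}/{name}.tar.gz"))
    return plan


def copy_finetunes(s3_client, plan) -> None:
    """Run the plan with a boto3 S3 client, for example boto3.client('s3')."""
    for source, destination in plan:
        src_bucket, src_key = split_s3_uri(source)
        dst_bucket, dst_key = split_s3_uri(destination)
        s3_client.copy(CopySource={"Bucket": src_bucket, "Key": src_key}, Bucket=dst_bucket, Key=dst_key)


# Hypothetical bucket and prefixes, for illustration only.
plan = stacking_copy_plan(
    [
        ("s3://my-bucket/finetunes/run-1", "sample-finetune"),
        ("s3://my-bucket/finetunes/run-2", "sample-finetune-v2"),
    ],
    "s3://my-bucket/finetunes/stacked",
)
for source, destination in plan:
    print(source, "->", destination)
```

Building the plan separately from the copy makes it possible to review the paths before anything is written to S3.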

In [None]:
# TODO update this with the S3 directory that holds the copied tar files (no trailing slash)
s3_stacked_dir = "s3://..."

co.create_endpoint(arn=arn,
    endpoint_name="command-light-stacked-test",
    s3_models_dir=s3_stacked_dir,
    recreate=True,
    instance_type="ml.p4d.24xlarge",
)

# If the endpoint is already created, you just need to connect to it
# co.connect_to_endpoint(endpoint_name="command-light-stacked-test")

To send inference requests, use the above endpoint and select the corresponding finetune by specifying its name in the `model` field. You will see that the endpoint can run inference for both of the t-few finetunes created above. You can use this strategy to serve an arbitrary number of finetunes concurrently on the same hardware.

In [None]:
result = co.generate(model=finetune_name, prompt=prompt, max_tokens=50)
print(result)

result = co.generate(model=finetune2_name, prompt=prompt, max_tokens=50)
print(result)

If you want to update an existing model endpoint, for example to add or remove `tfew` finetunes from the stack, you can use the [update_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/update_endpoint.html#) functionality of SageMaker with a rolling update policy to ensure there is no downtime.
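
As a rough sketch, the request for such an update could be assembled as below and passed to boto3's `update_endpoint`. The helper `rolling_update_request`, the endpoint config name, and the batch size and wait interval are assumptions for illustration. The new endpoint config must already exist and point at a model built from the updated stack, which this sketch does not cover.

```python
def rolling_update_request(endpoint_name: str, new_endpoint_config_name: str,
                           batch_instances: int = 1, wait_seconds: int = 300) -> dict:
    """Build keyword arguments for the SageMaker update_endpoint call with a rolling update policy."""
    return {
        "EndpointName": endpoint_name,
        "EndpointConfigName": new_endpoint_config_name,
        "DeploymentConfig": {
            "RollingUpdatePolicy": {
                # Update this many instances at a time, waiting between batches.
                "MaximumBatchSize": {"Type": "INSTANCE_COUNT", "Value": batch_instances},
                "WaitIntervalInSeconds": wait_seconds,
            },
        },
    }


# Hypothetical endpoint config name, for illustration only.
request = rolling_update_request("command-light-stacked-test", "command-light-stacked-config-v2")
print(request)

# To apply it (not executed here):
# import boto3
# boto3.client("sagemaker").update_endpoint(**request)
```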

## Unsubscribe to the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable models](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. **Note**: You can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__  to cancel the subscription.
