# DP fine-tune and deploy a custom LLM using the Secludy PII-Free Synthetic Text Replicas algorithm from AWS Marketplace

# Prerequisites
Note: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker notebook instance or Amazon SageMaker Studio.

Ensure that the IAM role you use has the AmazonSageMakerFullAccess policy attached.

Some hands-on experience with Amazon SageMaker is assumed.

To use this algorithm successfully, ensure that either:

- your IAM role has the following three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used:
  - aws-marketplace:ViewSubscriptions
  - aws-marketplace:Unsubscribe
  - aws-marketplace:Subscribe
- or your AWS account already has a subscription to Secludy PII-Free Synthetic Text Replicas.

# How SageMaker algorithms work with AWS Marketplace products

![image.png](asset/image.png)

# To subscribe to the algorithm: 
1. Open the algorithm listing page for Secludy PII-Free Synthetic Text Replicas.
2. On the AWS Marketplace listing, choose the "Continue to subscribe" button.
3. On the "Subscribe to this software" page, review the EULA, pricing, and support terms, and choose "Accept Offer" if you agree with them.
4. Choose "Continue to configuration", then choose a Region. You will see a Product ARN: this is the algorithm ARN that you need to specify when training a custom ML model. Copy the ARN for your Region and paste it into the following cell.

In [None]:
algorithm_arn = "<Customer to specify algorithm ARN corresponding to their AWS region>"

# Set up the IAM role and SageMaker session

In [None]:
import sagemaker
role = sagemaker.get_execution_role()

In [None]:
role

In [None]:
import boto3

# Use the same Region as the algorithm ARN copied above
session = boto3.Session(region_name='us-east-1')
sagemaker_session = sagemaker.Session(boto_session=session)
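The algorithm ARN is Region-specific, so the Region inside the ARN pasted earlier should match the Region of the session above. A minimal sketch of that check follows; it assumes only the standard ARN layout `arn:partition:service:region:account-id:resource`. The helpers `arn_region` and `check_same_region` and the example ARN (account ID `123456789012`) are hypothetical.

```python
def arn_region(arn: str) -> str:
    """Return the Region field of an ARN (arn:partition:service:region:account-id:resource)."""
    parts = arn.split(":", 5)  # the resource part may itself contain ':' so split at most 5 times
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def check_same_region(algorithm_arn: str, session_region: str) -> None:
    """Raise if the algorithm ARN and the SageMaker session point at different Regions."""
    region = arn_region(algorithm_arn)
    if region != session_region:
        raise ValueError(
            f"algorithm ARN is in {region}, but the session uses {session_region}"
        )


# Hypothetical ARN, for illustration only
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:algorithm/example-algorithm"
print(arn_region(example_arn))  # → us-east-1
```

In the notebook you could call `check_same_region(algorithm_arn, sagemaker_session.boto_region_name)` before creating the estimator.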

In [None]:
bucket = sagemaker_session.default_bucket()
bucket

# Update the S3 output location

In [None]:
# Replace the prefix with a unique prefix of your choice (the value below is only an example)
prefix = "secludy-dp-llm"
output_location = "s3://{}/{}/{}".format(bucket, prefix, "output")

# Define hyperparameters and update the instruction prompt for your use case

In [None]:

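# SageMaker passes hyperparameter values to the algorithm as strings
hyperparameters = {
    "epochs": "1",             # number of passes over the training data
    "batch_size": "1",         # per-step batch size
    "learning_rate": "0.001",
    "grad_accum_steps": "16",  # gradient accumulation steps
    "epsilon": "8.0",          # differential-privacy budget; smaller values mean stronger privacy
    "max_seq_length": "512",   # maximum sequence length in tokens
    "instruction": "Classify the following email content into its appropriate category based on its content.",
}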
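With gradient accumulation, gradients from several small batches are combined before each optimizer update, so the number of examples per update is `batch_size × grad_accum_steps`. The sketch below shows that arithmetic for the values above. It assumes the algorithm implements `grad_accum_steps` as standard gradient accumulation, which the listing does not spell out; `effective_batch_size` and `example_hyperparameters` are hypothetical names chosen so the cell does not overwrite your `hyperparameters`.

```python
# Hypothetical copy of the relevant hyperparameters above, used only to illustrate the arithmetic
example_hyperparameters = {
    "batch_size": "1",
    "grad_accum_steps": "16",
}


def effective_batch_size(hp: dict) -> int:
    """Examples per optimizer update, assuming standard gradient accumulation:
    gradients from grad_accum_steps batches of batch_size are combined before one update."""
    return int(hp["batch_size"]) * int(hp["grad_accum_steps"])


print(effective_batch_size(example_hyperparameters))  # → 16
```

Calling `effective_batch_size(hyperparameters)` on the dictionary defined above gives the same result, since the string values are converted with `int()`.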

In [None]:
from sagemaker import AlgorithmEstimator

estimator = AlgorithmEstimator(
    algorithm_arn=algorithm_arn,
    role=role,
    instance_count=1,
    instance_type='ml.g5.xlarge',
    sagemaker_session=sagemaker_session,
    output_path=output_location,
    hyperparameters=hyperparameters,
    base_job_name='privacy-protective-synthetic-data-generation',
)

# Prepare the fine-tuning input in JSONL format

For example, each line of the file below is one JSON object with two fields, 'content' and 'category'. Note: 'category' is the label and can be a string of any length.

{"content": "From: Liam O'Connor <liam.oconnor@costco.com>\nTo: Operations Team <operations.team@costco.com>\nSubject: Update on Warehouse Optimization Project\n\nThe new layout plans are finalized and ready for implementation next week. Please review and prepare your teams accordingly.", "category": "Project Updates"}  
{"content": "From: Rajesh Gupta <rajesh.gupta@costco.com>\nTo: Supply Chain Team <supplychain.team@costco.com>\nSubject: Project Update: New Vendor Integration\n\nVendor integration is progressing well, with 75% of the process complete. Expected completion is by end of next week.", "category": "Project Updates"}
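If your data is not already in JSONL, a small sketch like the following could write and re-read it. It assumes the field names `content` and `category` from the example above; the helpers `write_jsonl` and `read_jsonl` and the local file name `train.jsonl` are hypothetical. Each record is serialized with `json.dumps`, which escapes embedded newlines, so every email stays on a single line.

```python
import json

# Example records adapted from the two lines above; replace with your own data
records = [
    {
        "content": "From: Liam O'Connor <liam.oconnor@costco.com>\nTo: Operations Team <operations.team@costco.com>\nSubject: Update on Warehouse Optimization Project\n\nThe new layout plans are finalized and ready for implementation next week.",
        "category": "Project Updates",
    },
    {
        "content": "From: Rajesh Gupta <rajesh.gupta@costco.com>\nTo: Supply Chain Team <supplychain.team@costco.com>\nSubject: Project Update: New Vendor Integration\n\nVendor integration is progressing well, with 75% of the process complete.",
        "category": "Project Updates",
    },
]

REQUIRED_FIELDS = ("content", "category")  # field names taken from the example above


def write_jsonl(records, path):
    """Validate every record first, then write one JSON object per line."""
    for i, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not isinstance(record.get(field), str):
                raise ValueError(f"record {i} has no string field {field!r}")
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Parse a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


write_jsonl(records, "train.jsonl")
print(len(read_jsonl("train.jsonl")))  # → 2
```

You could then upload the file with `sagemaker_session.upload_data("train.jsonl", bucket=bucket, key_prefix="<your-prefix>")`, which returns the S3 URI to use as `train_input`.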

In [None]:
# Sample training data; replace with the S3 URI of your own JSONL file
train_input = 's3://secludy-public-listing/costco_emails_formatted.jsonl'

estimator.fit({'training': train_input})

# Deploy model and verify results

In [None]:
model_name = "DP-LLM-generator"

content_type = "application/json"

real_time_inference_instance_type = "ml.p3.2xlarge"
batch_transform_inference_instance_type = "ml.p3.2xlarge"

In [None]:
# S3 URI of the fine-tuned model weights (after training, estimator.model_data holds this URI)
model_path = "<specify-path-to-finetuned-weights>"

In [None]:
# Create a deployable model from the algorithm ARN and the fine-tuned weights
model = sagemaker.ModelPackage(
    role=role,
    model_data=model_path,
    algorithm_arn=algorithm_arn,
    sagemaker_session=sagemaker_session,
)

endpoint_name = "sythetic-data-gen-endpoint"

model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type,
    endpoint_name=endpoint_name,
    model_data_download_timeout=2400,
    container_startup_health_check_timeout=2400,
)

In [None]:
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name sythetic-data-gen-endpoint \
    --body '{ "categories": ["Project Updates", "HR Communications"],"num_replicas": 1,"max_tokens": 128,"instruction": "write me some corporate email examples in the category of" }' \
    --content-type "application/json" \
    --region us-east-1 \
    output.json
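The same request can be sent from Python with boto3's `sagemaker-runtime` client instead of the CLI. In this sketch the request fields are copied from the CLI call above, `build_request` and `invoke` are hypothetical helpers, and the response is returned as raw text because its structure is not documented here.

```python
import json


def build_request(categories, num_replicas=1, max_tokens=128,
                  instruction="write me some corporate email examples in the category of"):
    """Build the JSON body used in the CLI call above (field names copied from that call)."""
    return json.dumps({
        "categories": list(categories),
        "num_replicas": num_replicas,
        "max_tokens": max_tokens,
        "instruction": instruction,
    })


def invoke(endpoint_name, body, region="us-east-1"):
    """Send the request to the endpoint and return the response body as text."""
    import boto3  # imported here so build_request can be used without AWS dependencies

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=body.encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8")


body = build_request(["Project Updates", "HR Communications"])
print(body)
# In the notebook, with the endpoint running: print(invoke(endpoint_name, body))
```

If the CLI call above fails on AWS CLI v2 with an invalid base64 error, adding `--cli-binary-format raw-in-base64-out` tells the CLI to send `--body` as raw text.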

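# [Optional] Delete the endpoint after use

A real-time endpoint incurs charges for as long as it is running, so delete it when you no longer need it.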

In [None]:
# Specify the endpoint name
endpoint_name = "sythetic-data-gen-endpoint"
sagemaker_client = boto3.client("sagemaker", region_name="us-east-1")
# Delete the endpoint
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

print(f"Endpoint '{endpoint_name}' has been deleted.")