## Deploy Synthetic data generation Algorithm Model Package from AWS Marketplace 

This sample notebook shows you how to train and deploy the Synthetic data generation Algorithm using Amazon SageMaker.

> **Note**: This is a reference notebook and it cannot run unless you make the changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements which render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have the authority to make AWS Marketplace subscriptions in the AWS account used:
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**
    2. or your AWS account already has a subscription to the Synthetic data generation Algorithm. If so, skip step: [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
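
The three AWS Marketplace permissions listed above can be written as an IAM policy document. The following is a minimal sketch, not an official policy: the helper name `build_marketplace_policy` is hypothetical, and the unscoped `"Resource": "*"` is an assumption that your organization may want to tighten.

```python
import json

# The three AWS Marketplace actions listed in the prerequisites.
MARKETPLACE_ACTIONS = [
    "aws-marketplace:ViewSubscriptions",
    "aws-marketplace:Unsubscribe",
    "aws-marketplace:Subscribe",
]


def build_marketplace_policy(actions=MARKETPLACE_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: no resource scoping is applied here.
                "Resource": "*",
            }
        ],
    }


policy = build_marketplace_policy()
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy for the role used by the notebook.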

#### Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Usage Instruction](#2.-Usage-Instruction)
3. [Training](#3.-Training)
4. [Input Data](#4.-Input-Data)
5. [Perform batch inference](#5.-Perform-batch-inference)
6. [Create an endpoint and perform real-time inference](#6.-Create-an-endpoint-and-perform-real-time-inference)
   1. [Create an endpoint](#A.-Create-an-endpoint)
   2. [Create input payload](#B.-Create-input-payload)
   3. [Perform real-time inference](#C.-Perform-real-time-inference)
   4. [Visualize output](#D.-Visualize-output)
7. [Clean-up](#7.-Clean-up)
    1. [Delete the model](#A.-Delete-the-model)
    2. [Unsubscribe to the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))

#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the model package

To subscribe to the model package:
1. Open the algorithm listing page **Synthetic data generation Algorithm**.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **"Accept Offer"** if you and your organization agree with the EULA, pricing, and support terms.
1. Once you click the **Continue to configuration** button and choose a **region**, a **Product Arn** is displayed. This is the ARN that you need to specify when creating the estimator and the deployable model. Copy the ARN corresponding to your region and assign it to `algorithm_arn` in this notebook.
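
Because the ARN must match the region you work in, it can help to check it before starting a job. The sketch below splits an ARN into its parts and compares its region with the session region. The helper names `parse_sagemaker_arn` and `check_arn_region` are hypothetical and not part of any SDK; the example ARN is the one used later in this notebook.

```python
from typing import NamedTuple


class SageMakerArn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource_type: str
    resource_name: str


def parse_sagemaker_arn(arn: str) -> SageMakerArn:
    """Split an ARN of the form arn:partition:service:region:account:type/name."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts
    resource_type, _, resource_name = resource.partition("/")
    return SageMakerArn(partition, service, region, account_id,
                        resource_type, resource_name)


def check_arn_region(arn: str, session_region: str) -> SageMakerArn:
    """Raise ValueError if the ARN is not a SageMaker ARN in session_region."""
    parsed = parse_sagemaker_arn(arn)
    if parsed.service != "sagemaker":
        raise ValueError(f"Expected a sagemaker ARN, got service {parsed.service!r}")
    if parsed.region != session_region:
        raise ValueError(
            f"ARN region {parsed.region!r} does not match session region {session_region!r}"
        )
    return parsed


example_arn = "arn:aws:sagemaker:us-east-1:822940408628:algorithm/synth-data-generation"
print(check_arn_region(example_arn, "us-east-1"))
```

In the notebook, the second argument would typically be `sagemaker_session.boto_region_name`.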

### 2. Usage Instruction

The deployed solution has these **2 steps**: training the algorithm and testing (generating synthetic data).

* The system trains on a user-provided real dataset in a CSV file.
* The input training dataset can have a maximum of 1000 rows.
* The machine learning model is built in the training step; once the model is generated, it can be used to generate synthetic data.
* The testing API takes JSON input with the number of samples, and the output is a CSV file.
* The usage instruction notebook describes the detailed steps to train the algorithm, generate the output, and interpret the output.
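
The 1000-row limit on the training dataset can be checked locally before uploading. This is a minimal sketch with hypothetical helper names; it assumes the first line of the CSV is a header that does not count toward the limit, which the listing does not state explicitly.

```python
import csv
import io

MAX_TRAINING_ROWS = 1000  # limit stated in the usage instructions


def count_data_rows(csv_file) -> int:
    """Count non-empty rows after the header in an open CSV file object."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        raise ValueError("The training CSV is empty")
    return sum(1 for row in reader if row)


def check_training_csv(csv_file, max_rows: int = MAX_TRAINING_ROWS) -> int:
    """Return the row count, or raise ValueError if it exceeds max_rows."""
    rows = count_data_rows(csv_file)
    if rows > max_rows:
        raise ValueError(f"Training CSV has {rows} rows; the limit is {max_rows}")
    return rows


# Small in-memory example; with a real file use open(path, newline="").
sample = io.StringIO("age,income\n39,<=50K\n50,<=50K\n38,>50K\n")
print(check_training_csv(sample))
```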

#### Input:
**The following are the mandatory inputs for both APIs:**

* Supported content type for the Training API: `text/csv`
* Supported content type for the Testing API: `application/json`
* The training dataset (CSV file) can have a maximum of 1000 rows.
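
For the Testing API, the JSON input used later in this notebook carries a `number_of_samples` field. The sketch below builds such a payload; the helper name is hypothetical, and it assumes `number_of_samples` is the only field needed, since this notebook does not show the full input file.

```python
import json


def build_inference_payload(number_of_samples: int) -> str:
    """Return the JSON request body asking for `number_of_samples` synthetic rows."""
    if isinstance(number_of_samples, bool) or not isinstance(number_of_samples, int):
        raise TypeError("number_of_samples must be an integer")
    if number_of_samples < 1:
        raise ValueError("number_of_samples must be at least 1")
    return json.dumps({"number_of_samples": number_of_samples})


payload = build_inference_payload(10)
print(payload)
```

Writing the returned string to a file such as `inference_input.json` gives an input suitable for both batch transform and real-time invocation.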

#### Output:
* Content type: `text/csv`

#### Invoking endpoint
##### AWS CLI Command
If you are using real-time inference, create the endpoint first and then use the following command to invoke it:
``` bash
aws sagemaker-runtime invoke-endpoint --endpoint-name "endpoint-name" --body fileb://$file_name --content-type application/json --accept text/csv output.csv
```
Substitute the following parameters:
* `"endpoint-name"` - name of the inference endpoint where the model is deployed.
* `file_name` - name of the input JSON file.
* `application/json` - content type of the given input file.
* `text/csv` - content type of the response.
* `output.csv` - filename where the inference results are written to.
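
The same invocation can be made from Python. The following is a sketch rather than the vendor's reference client: `invoke_and_save` is a hypothetical helper that expects an object behaving like `boto3.client("sagemaker-runtime")` and uses its `invoke_endpoint` call with the content types listed above.

```python
def invoke_and_save(runtime_client, endpoint_name, payload_json, output_path):
    """Send a JSON payload to the endpoint and write the CSV response to disk.

    Returns the number of bytes written.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # type of the request body
        Accept="text/csv",               # type requested for the response
        Body=payload_json,
    )
    csv_bytes = response["Body"].read()
    with open(output_path, "wb") as out_file:
        out_file.write(csv_bytes)
    return len(csv_bytes)


# Example usage (requires AWS credentials and a running endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   invoke_and_save(runtime, "endpoint-name", '{"number_of_samples": 10}', "output.csv")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before any endpoint exists.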

In [1]:
import base64
import os
import uuid

import boto3
import numpy as np
import pandas as pd
import sagemaker as sage
from sagemaker import ModelPackage, get_execution_role

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [2]:
role = get_execution_role()

sagemaker_session = sage.Session()

bucket=sagemaker_session.default_bucket()
bucket

'sagemaker-us-east-1-822940408628'

In [3]:
# S3 prefixes
common_prefix = "hdts-sagemaker-testing"
training_input_prefix = common_prefix + "/training-input-data"
batch_inference_input_prefix = common_prefix + "/batch-inference-input-data"

In [4]:
sagemaker_session = sage.Session()

In [5]:
TRAINING_WORKDIR = "data/training"

#TRAINING_DATA = TRAINING_WORKDIR + "/train.zip"

In [6]:
# Upload the local training directory to S3; the returned S3 URI is the training input location.
training_input = sagemaker_session.upload_data(TRAINING_WORKDIR, key_prefix=training_input_prefix)

### 3. Training 

In [7]:
import json
import time
from sagemaker.algorithm import AlgorithmEstimator

##### Algorithm ARN

In [8]:
algorithm_arn ='arn:aws:sagemaker:us-east-1:822940408628:algorithm/synth-data-generation'

In [9]:
algo = AlgorithmEstimator(
    algorithm_arn=algorithm_arn,
    role=role,
    instance_count=1,
    instance_type='ml.m5.2xlarge',
    base_job_name='synth-train-marketplace')

In [10]:
print("Now run the training job using algorithm arn %s in region %s" % (algorithm_arn, sagemaker_session.boto_region_name))
# Start the training job; the 'training' channel points at the uploaded CSV data.
algo.fit({'training': training_input})

INFO:sagemaker:Creating training-job with name: synth-train-marketplace-2023-11-08-10-36-24-837


Now run the training job using algorithm arn arn:aws:sagemaker:us-east-1:822940408628:algorithm/synth-data-generation in region us-east-1
2023-11-08 10:36:25 Starting - Starting the training job...
2023-11-08 10:36:41 Starting - Preparing the instances for training......
2023-11-08 10:37:49 Downloading - Downloading input data...
2023-11-08 10:38:14 Training - Downloading the training image.....................
2023-11-08 10:41:35 Training - Training image download completed. Training in progress..
training started
Epoch 1, Loss G:  2.1033,Loss D: -0.0195
Epoch 2, Loss G:  2.1088,Loss D: -0.1069
Epoch 3, Loss G:  2.1661,Loss D: -0.1579
Epoch 4, Loss G:  2.1705,Loss D: -0.2282
Epoch 5, Loss G:  2.1515,Loss D: -0.3554
Epoch 6, Loss G:  2.1723,Loss D: -0.4758
Epoch 7, Loss G:  2.0739,Loss D: -0.5485
Epoch 8, Loss G:  2.0712,Loss D: -0.7154
Epoch 9, Loss G:  2.0739,Loss D: -0.7998
Epoch 10, Loss G

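The training log reports two values per epoch, `Loss G` and `Loss D` (presumably generator and discriminator losses; the listing does not define them). To follow their trend, the lines can be parsed from the captured log text. This is a sketch with a hypothetical helper name, and the regular expression is based only on the log format shown above.

```python
import re

# Matches lines such as "Epoch 1, Loss G:  2.1033,Loss D: -0.0195".
LOSS_LINE = re.compile(
    r"Epoch\s+(\d+),\s*Loss G:\s*(-?\d+(?:\.\d+)?),\s*Loss D:\s*(-?\d+(?:\.\d+)?)"
)


def parse_losses(log_text: str):
    """Return a list of (epoch, loss_g, loss_d) tuples found in the log text."""
    results = []
    for match in LOSS_LINE.finditer(log_text):
        epoch, loss_g, loss_d = match.groups()
        results.append((int(epoch), float(loss_g), float(loss_d)))
    return results


sample_log = """Epoch 1, Loss G:  2.1033,Loss D: -0.0195
Epoch 2, Loss G:  2.1088,Loss D: -0.1069
Epoch 10, Loss G"""

for epoch, loss_g, loss_d in parse_losses(sample_log):
    print(epoch, loss_g, loss_d)
```

Incomplete lines, such as a truncated final epoch, do not match the pattern and are skipped.
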
### 4. Input Data

In [12]:
import os
TRANSFORM_WORKDIR = "data/transform"
filename = os.path.join(TRANSFORM_WORKDIR, "inference_input.json")

In [14]:
# Load the inference request and show how many synthetic rows it asks for.
with open(filename) as f:
    data = json.load(f)
data['number_of_samples']

10

### 5. Perform batch inference

In this section, you will perform batch inference using multiple input payloads together. If you are not familiar with batch transform, and want to learn more, see these links:
1. [How it works](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html)
2. [How to run a batch transform job](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html)

In [15]:
TRANSFORM_WORKDIR = "data/transform"
transform_input = sagemaker_session.upload_data(TRANSFORM_WORKDIR, key_prefix=batch_inference_input_prefix) + "/inference_input.json"
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-1-822940408628/hdts-sagemaker-testing/batch-inference-input-data/inference_input.json


In [16]:
transformer = algo.transformer(1, 'ml.m5.2xlarge')
transformer.transform(transform_input, content_type='application/json')
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)

INFO:sagemaker:Creating model package with name: synth-data-generation-2023-11-08-10-49-11-846

.........

INFO:sagemaker:Creating model with name: synth-data-generation-2023-11-08-10-49--2023-11-08-10-49-57-307

INFO:sagemaker:Creating transform job with name: synth-train-marketplace-2023-11-08-10-50-00-153

..................................................
Starting the inference server with 8 workers.
[2023-11-08 10:58:26 +0000] [10] [INFO] Starting gunicorn 21.2.0
[2023-11-08 10:58:26 +0000] [10] [INFO] Listening at: unix:/tmp/gunicorn.sock (10)
[2023-11-08 10:58:26 +0000] [10] [INFO] Using worker: sync
[2023-11-08 10:58:26 +0000] [13] [INFO] Booting worker with pid: 13
[2023-11-08 10:58:26 +0000] [14] [INFO] Booting worker with pid: 14
[2023-11-08 10:58:26 +0000] [29] [INFO] Booting worker with pid: 29
[2023-11-08 10:58:26 +0000] [30] [INFO] Booting worker with pid: 30
[2023-11-08 10:58:26 +0000] [45] [INFO] Booting worker with pid: 45
[2023-11-08 10:58:26 +0000] [46] [INFO] Booting worker with pid: 46
[2023-11-08 10:58:26 +0000] [47] [INFO] Booting worker with pid: 47
[2023-11-08 10:58:26 +0000] [69] [INFO] Booting worker with pid: 69

169.254.255.130 - - [08/Nov/2023:10:58:

#### Inspect the Batch Transform Output in S3

In [17]:
from urllib.parse import urlparse

# The transform job writes "<input file name>.out" under transformer.output_path.
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
file_key = '{}/{}.out'.format(parsed_url.path[1:], "inference_input.json")

s3_client = sagemaker_session.boto_session.client('s3')

# Read the object from the bucket named in the output path.
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)

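In [18]:
# Key prefix of the transform output (the path component after the bucket name).
bucketFolder = urlparse(transformer.output_path).path.lstrip('/')

In [21]:
import boto3
s3_conn = boto3.client("s3")
# Take the bucket from the output path instead of hard-coding an account-specific name.
bucket_name = urlparse(transformer.output_path).netloc
with open('output.csv', 'wb') as f:
    s3_conn.download_fileobj(bucket_name, bucketFolder+'/' + "inference_input.json" +'.out', f)
    print("Output file loaded from bucket")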

Output file loaded from bucket
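
The cells above rely on the batch transform result being stored under the transformer's output path with `.out` appended to the input file name. A small helper makes that mapping explicit. This is a sketch; `batch_output_location` is a hypothetical name, and the example path reuses the bucket and transform job names printed in this notebook's output.

```python
from urllib.parse import urlparse


def batch_output_location(output_path: str, input_file_name: str):
    """Return (bucket, key) of the transform result for one input file."""
    parsed = urlparse(output_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3:// URI, got {output_path!r}")
    prefix = parsed.path.strip("/")
    file_part = f"{input_file_name}.out"
    key = f"{prefix}/{file_part}" if prefix else file_part
    return parsed.netloc, key


bucket, key = batch_output_location(
    "s3://sagemaker-us-east-1-822940408628/synth-train-marketplace-2023-11-08-10-50-00-153",
    "inference_input.json",
)
print(bucket, key)
```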


#### Visualize output

In [22]:
output_df = pd.read_csv('output.csv')
print("length of output df-->",len(output_df))
output_df.head(2)

length of output df--> 10


Unnamed: 0.1,Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,income
0,353.504031,42.815791,Private,525731.107574,7th-8th,16.181752,Divorced,Other-service,Own-child,White,Male,173.935159,-35.172755,40.479019,United-States,<=50K
1,636.083117,46.254596,Private,213209.037569,HS-grad,9.863858,Married-civ-spouse,Adm-clerical,Not-in-family,White,Male,193.250014,-18.076041,45.312822,United-States,<=50K
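
The generated CSV contains columns named `Unnamed: 0.1` and `Unnamed: 0`. These look like index columns carried over from earlier CSV round-trips, although that is an assumption, not something the listing confirms. If they are not needed, they can be dropped before further analysis. The sketch below uses only the standard library, and the helper name is hypothetical.

```python
import csv
import io


def drop_unnamed_columns(csv_text: str) -> str:
    """Return the CSV text without columns whose header starts with 'Unnamed'."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header = rows[0]
    keep = [i for i, name in enumerate(header) if not name.startswith("Unnamed")]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Skip indexes beyond the row length so short rows do not raise.
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()


sample = "Unnamed: 0.1,Unnamed: 0,age,income\n0,353.504031,42.815791,<=50K\n"
print(drop_unnamed_columns(sample))
```

With pandas, the equivalent is `output_df.loc[:, ~output_df.columns.str.startswith("Unnamed")]`.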


### 6. Create an endpoint and perform real-time inference

If you want to understand how real-time inference with Amazon SageMaker works, see [Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html).

In [25]:
model_name='synth-data-generation'
content_type='application/json'

real_time_inference_instance_type='ml.m5.2xlarge'
batch_transform_inference_instance_type='ml.m5.2xlarge'

##### Algorithm ARN

In [26]:
algorithm_arn ='Put your algorithm ARN'

#### A. Create an endpoint

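In [27]:
from sagemaker.predictor import Predictor

def predict_wrapper(endpoint, session):
    # RealTimePredictor was removed in SageMaker Python SDK v2; Predictor replaces it.
    return Predictor(endpoint, session)

# Create a deployable model object from the ARN (kept from the template; not used by the deployment below).
model = ModelPackage(role=role,
                    model_package_arn=algorithm_arn,
                    sagemaker_session=sagemaker_session,
                    predictor_cls=predict_wrapper)

# Deploy the estimator trained in section 3; this creates the model, endpoint config, and endpoint.
predictor = algo.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)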

INFO:sagemaker:Creating model package with name: synth-train-marketplace-2023-11-08-11-03-48-806


.........

INFO:sagemaker:Creating model with name: synth-train-marketplace-2023-11-08-11-03-48-806





INFO:sagemaker:Creating endpoint-config with name synth-data-generation
INFO:sagemaker:Creating endpoint with name synth-data-generation


-----------!

Once the endpoint has been created, you can perform real-time inference.
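
If you need to confirm from a separate script that the endpoint is ready, its status can be polled. The sketch below assumes an object behaving like `boto3.client("sagemaker")` and uses its `describe_endpoint` call. The helper name, polling interval, and poll limit are illustrative choices, not values taken from this notebook.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30,
                          max_polls=60, sleep=time.sleep):
    """Poll the endpoint status until it is InService.

    Raises RuntimeError if the endpoint fails and TimeoutError if it does
    not become ready within max_polls checks.
    """
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name!r} failed to deploy")
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name!r} not ready after {max_polls} checks")


# Example usage (requires AWS credentials):
#   import boto3
#   wait_until_in_service(boto3.client("sagemaker"), "synth-data-generation")
```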

#### B. Create input payload

In [28]:
import pandas as pd
# The real-time endpoint expects the JSON request file, not a CSV.
file_name = './data/real-time/input/inference_input.json'

In [29]:
with open(file_name) as f:
    data = json.load(f)
data['number_of_samples']

10

#### C. Perform real-time inference

In [30]:
file_name = './data/real-time/input/inference_input.json'

In [31]:
output_file_name = 'output_realtime.csv'

In [33]:
!aws sagemaker-runtime invoke-endpoint \
    --endpoint-name 'synth-data-generation' \
    --body fileb://$file_name \
    --content-type 'application/json' \
    --region us-east-1 \
    $output_file_name

{
    "ContentType": "text/csv; charset=utf-8",
    "InvokedProductionVariant": "AllTraffic"
}


#### D. Visualize output

In [36]:
output_df = pd.read_csv('output_realtime.csv')
print("length of output df --",len(output_df))
output_df.head(2)

length of output df -- 10


Unnamed: 0.1,Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,income
0,779.024269,55.412423,?,137436.461434,Some-college,11.722291,Married-civ-spouse,Craft-repair,Husband,White,Female,-65.733198,-46.26761,38.671572,Canada,<=50K
1,662.27287,63.727675,Private,224632.675097,HS-grad,12.50126,Separated,Other-service,Husband,Black,Male,47.837341,-45.202569,40.31768,United-States,<=50K


### 7. Clean-up

#### A. Delete the model

In [37]:
# Delete the endpoint (and its endpoint configuration) so it stops incurring charges.
predictor.delete_endpoint()

INFO:sagemaker:Deleting endpoint configuration with name: synth-data-generation
INFO:sagemaker:Deleting endpoint with name: synth-data-generation
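
The cell above removes the endpoint and its configuration, while the SageMaker models created during deployment and batch transform may still exist. The sketch below finds and deletes models by name fragment. It assumes an object behaving like `boto3.client("sagemaker")` and uses its `list_models` and `delete_model` calls. The helper name is hypothetical, and the fragment to search for depends on the names your own run generated.

```python
def delete_models_by_name(sm_client, name_contains):
    """Delete every SageMaker model whose name contains `name_contains`.

    Returns the list of deleted model names.
    """
    # Collect all matching names first, following pagination tokens.
    names = []
    kwargs = {"NameContains": name_contains}
    while True:
        response = sm_client.list_models(**kwargs)
        names.extend(model["ModelName"] for model in response.get("Models", []))
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    for name in names:
        sm_client.delete_model(ModelName=name)
    return names


# Example usage (requires AWS credentials):
#   import boto3
#   delete_models_by_name(boto3.client("sagemaker"), "synth-train-marketplace")
```

Review the returned names before relying on this in a shared account, since the match is a plain substring search.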


#### B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing that you want to cancel the subscription for, and then choose __Cancel Subscription__ to cancel the subscription.

