# Comprehend Custom Classification Training & Deployment

This notebook trains and deploys an Amazon Comprehend custom classification model. You can run the same steps from any IDE, such as AWS Cloud9. A SageMaker notebook is used here to give users hands-on experience with SageMaker.


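Get the execution role for the notebook instance. This is the IAM role that you created for your notebook instance. Later cells pass this role to Amazon Comprehend as the data access role, so it must be able to read the training data in S3 and must trust the Comprehend service (`comprehend.amazonaws.com`).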

In [None]:
from sagemaker import get_execution_role
role = get_execution_role()
role

Start the custom classification training job. This creates a training job in Amazon Comprehend that produces the trained classifier model.
Note: You can get the bucket name from the CDK output in the terminal where you ran the deployment.
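In the default multi-class mode, Comprehend expects the classification training file to be a CSV with no header row, where the first column is the label and the second column is the document text. The sketch below writes rows in that layout with Python's `csv` module. Only the `MONEYTRANSFER` label comes from this notebook; the `OTHER` label and both sample texts are hypothetical placeholders, not the contents of `Comprehend_Training_Data.csv`.

```python
import csv
import io

# Hypothetical (label, text) rows; OTHER is a placeholder label.
rows = [
    ("MONEYTRANSFER", "Can you send the status of the transaction id: MTN2780001"),
    ("OTHER", "Please update the mailing address on my account"),
]

def to_training_csv(rows):
    """Serialize (label, text) pairs as header-less, two-column CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for label, text in rows:
        # csv.writer quotes any text containing commas or quotes
        writer.writerow([label, text])
    return buf.getvalue()

print(to_training_csv(rows))
```

Letting `csv.writer` handle quoting keeps an email body that contains commas in a single text column instead of spilling into extra columns.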

In [None]:
import boto3

client = boto3.client('comprehend')

response = client.create_document_classifier(
    DocumentClassifierName='email-classifications-sample',  # Name of the classifier
    DataAccessRoleArn=role,
    InputDataConfig={
        # Replace the placeholder with the bucket created by your CDK/CloudFormation stack
        'S3Uri': 's3://<Bucket Name from CloudFormation stack resource section>/Comprehend_Training_Data.csv'
    },
    LanguageCode='en'
)

Check the status of the training job. Training can take up to 20 minutes; wait until the cell prints "Training Completed".

In [None]:
import time

modelarn = response["DocumentClassifierArn"]
print("Training started")
while True:
    # Re-fetch the classifier on every pass; a stored response never updates its status
    response_des = client.describe_document_classifier(
        DocumentClassifierArn=modelarn
    )
    train_status = response_des['DocumentClassifierProperties']['Status']
    print(train_status)
    if train_status in ('TRAINED', 'TRAINED_WITH_WARNING'):
        break
    if train_status in ('IN_ERROR', 'STOPPED'):
        raise RuntimeError(response_des['DocumentClassifierProperties'].get('Message', train_status))
    time.sleep(30)

print("Training Completed")

Create an endpoint for the trained model

In [None]:
# Create an endpoint for the trained classifier
response_ep = client.create_endpoint(
    EndpointName='email-classifications-endpoint',
    ModelArn=modelarn,
    DesiredInferenceUnits=1,  # Increase to provision more inference units (throughput)
    #ClientRequestToken='string',
    Tags=[
        {
            'Key': 'Name',
            'Value': 'email classification'
        },
    ],
    DataAccessRoleArn=role
)

Check the Endpoint ARN

In [None]:
response_ep
eparn=response_ep["EndpointArn"]
eparn

Now test the trained model by sending a sample sentence. Note: creating the endpoint takes a few minutes; wait until its status is IN_SERVICE before running the next cell.
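Rather than waiting a fixed number of minutes, you can poll the endpoint status until it reaches `IN_SERVICE`. The helper below is a minimal sketch, assuming a zero-argument `describe` callable that returns a response shaped like Comprehend's `DescribeEndpoint` output (`EndpointProperties.Status`). The function name `wait_for_endpoint` and its default poll interval and timeout are hypothetical choices, not part of this notebook.

```python
import time

def wait_for_endpoint(describe, poll_seconds=30, timeout_seconds=1800):
    """Poll until the endpoint is IN_SERVICE; raise on FAILED or timeout.

    `describe` is any zero-argument callable returning a DescribeEndpoint-style
    dict (hypothetical helper, e.g. wrapping client.describe_endpoint).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        props = describe()["EndpointProperties"]
        status = props["Status"]
        if status == "IN_SERVICE":
            return status
        if status == "FAILED":
            raise RuntimeError(props.get("Message", "Endpoint creation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

In the notebook it could be called as `wait_for_endpoint(lambda: client.describe_endpoint(EndpointArn=eparn))`. Passing a callable keeps the helper independent of boto3, so it works for both endpoints created here.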

In [None]:
response_cd = client.classify_document(
    Text='Can you send the status of the transaction id:278960001',
    EndpointArn=eparn
)
response_cd
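A `ClassifyDocument` response lists candidate labels under `Classes`, each with a `Name` and a `Score`. The sketch below picks the highest-scoring label, which is the value a downstream step would compare against `MONEYTRANSFER`. The helper name `top_class` and the scores in `sample` are hypothetical, not real model output.

```python
def top_class(classify_response):
    """Return (name, score) of the highest-scoring class, or None if there are none."""
    classes = classify_response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda c: c["Score"])
    return best["Name"], best["Score"]

# Hypothetical response shape with made-up scores
sample = {"Classes": [{"Name": "OTHER", "Score": 0.1},
                      {"Name": "MONEYTRANSFER", "Score": 0.9}]}
print(top_class(sample))  # prints ('MONEYTRANSFER', 0.9)
```

Against the live endpoint you would call `top_class(response_cd)`.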

This endpoint ARN is used to classify emails that customers send through Amazon WorkMail. You will need it for the next CDK deployment.

## Now let's create the model for entity detection

If the classifier above labels an incoming email as MONEYTRANSFER, we need to fetch the status of the transaction ID given in that email. A money transfer ID starts with the prefix 'MTN' followed by a 7-digit number.
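Because the ID format is fixed, a regular expression can serve as a quick sanity check on sample emails or on what the entity model returns. The sketch below assumes the format is exactly "MTN" followed by 7 digits, as described above. It does not replace the trained recognizer, and the names `MTN_ID` and `find_mtn_ids` are hypothetical.

```python
import re

# Assumed format: "MTN" followed by exactly 7 digits, as a standalone token
MTN_ID = re.compile(r"\bMTN\d{7}\b")

def find_mtn_ids(text):
    """Return every money transfer ID that matches the assumed format."""
    return MTN_ID.findall(text)

print(find_mtn_ids("Can you send the status of the transaction id: MTN2780001"))
# prints ['MTN2780001']
```

The word boundaries reject IDs with too many digits, such as `MTN27800012`, instead of matching their first seven digits.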

### Entity Detection model training

In [None]:
response = client.create_entity_recognizer(
    RecognizerName='email-entity-detection-model',
    DataAccessRoleArn=role,
    InputDataConfig={
        'DataFormat': 'COMPREHEND_CSV',
        'EntityTypes': [
            {
                'Type': 'MTNID'
            },
        ],
        # Point these URIs at the bucket that holds your entity training files
        'Documents': {
            'S3Uri': 's3://shavgs-comprehend-training-bucket/input/Comprehend_raw_Data_entity_detection.csv'
        },
        'EntityList': {
            'S3Uri': 's3://shavgs-comprehend-training-bucket/input/sample_entity_data.csv'
        }
    },
    LanguageCode='en'
)
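For the `COMPREHEND_CSV` format, the `EntityList` file is a CSV with a `Text,Type` header and one known entity per row. The sketch below builds such a list for the `MTNID` type. The IDs are randomly generated placeholders that follow the "MTN + 7 digits" convention and are not the contents of `sample_entity_data.csv`. The helper name `make_entity_list` is hypothetical.

```python
import csv
import io
import random

def make_entity_list(ids, entity_type="MTNID"):
    """Return entity-list CSV text with a Text,Type header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Text", "Type"])
    for entity_id in ids:
        writer.writerow([entity_id, entity_type])
    return buf.getvalue()

# Hypothetical IDs: "MTN" plus a zero-padded 7-digit number
rng = random.Random(0)
sample_ids = [f"MTN{rng.randrange(10**7):07d}" for _ in range(3)]
print(make_entity_list(sample_ids))
```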

In [None]:
# Check whether training has completed
import time

entmodelarn = response["EntityRecognizerArn"]
print("Training started")
while True:
    # Re-fetch the recognizer on every pass; a stored response never updates its status
    ent_response_des = client.describe_entity_recognizer(
        EntityRecognizerArn=entmodelarn
    )
    ent_train_status = ent_response_des['EntityRecognizerProperties']['Status']
    print(ent_train_status)
    if ent_train_status in ('TRAINED', 'TRAINED_WITH_WARNING'):
        break
    if ent_train_status in ('IN_ERROR', 'STOPPED'):
        raise RuntimeError(ent_response_des['EntityRecognizerProperties'].get('Message', ent_train_status))
    time.sleep(30)

print("Training Completed")
print(entmodelarn)
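The classifier and the entity recognizer use the same polling pattern. Only the key that holds the properties differs (`DocumentClassifierProperties` vs `EntityRecognizerProperties`). Below is a minimal sketch of a shared helper, assuming a zero-argument `describe` callable. The name `wait_for_training` and its success and failure status sets are illustrative choices, not part of this notebook.

```python
import time

SUCCESS_STATUSES = {"TRAINED", "TRAINED_WITH_WARNING"}
FAILURE_STATUSES = {"IN_ERROR", "STOPPED"}

def wait_for_training(describe, properties_key, poll_seconds=30):
    """Poll a Describe* callable until training succeeds or fails.

    properties_key: "DocumentClassifierProperties" or "EntityRecognizerProperties".
    """
    while True:
        props = describe()[properties_key]
        status = props["Status"]
        if status in SUCCESS_STATUSES:
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"Training ended with {status}: {props.get('Message', '')}")
        time.sleep(poll_seconds)
```

For example: `wait_for_training(lambda: client.describe_entity_recognizer(EntityRecognizerArn=entmodelarn), "EntityRecognizerProperties")`.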

In [None]:
# Deploy the model by creating endpoint for this entity detection
ent_response_ep = client.create_endpoint(
    EndpointName='email-entity-detection-endpoint',
    ModelArn=entmodelarn,
    DesiredInferenceUnits=1,  # Increase to provision more inference units (throughput)
    #ClientRequestToken='string',
    Tags=[
        {
            'Key': 'Name',
            'Value': 'email entity detection'
        },
    ],
    DataAccessRoleArn=role
)

In [None]:
ent_eparn=ent_response_ep["EndpointArn"]
ent_eparn

Now test the trained entity model by sending a sample sentence. Note: creating the endpoint takes a few minutes; wait until its status is IN_SERVICE before running the next cell.

In [None]:

ent_response = client.detect_entities(
    Text='Can you send the status of the transaction id: MTN2780001',
    LanguageCode='en',
    EndpointArn=ent_eparn
)
print (ent_response)
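A `DetectEntities` response lists matches under `Entities`, each with `Type`, `Text`, and `Score`. The sketch below keeps only `MTNID` entities above a confidence cutoff, which yields the transaction IDs whose status needs to be fetched. The helper name `extract_entities`, the 0.5 threshold, and the scores in `sample` are hypothetical.

```python
def extract_entities(detect_response, entity_type="MTNID", min_score=0.5):
    """Return the text of entities of the given type whose score meets min_score."""
    return [
        e["Text"]
        for e in detect_response.get("Entities", [])
        if e.get("Type") == entity_type and e.get("Score", 0.0) >= min_score
    ]

# Hypothetical response shape with made-up scores
sample = {"Entities": [
    {"Type": "MTNID", "Text": "MTN2780001", "Score": 0.98},
    {"Type": "MTNID", "Text": "MTN27", "Score": 0.20},
]}
print(extract_entities(sample))  # prints ['MTN2780001']
```

Against the live endpoint you would call `extract_entities(ent_response)`. Tune the threshold on your own validation emails.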

Use this entity detection endpoint ARN when you deploy the CDK stack in the next steps.