## Ticket Classification Prediction
The solution addresses ticket misclassification with a multifactor AI/ML-based classification model. The as-is classification scheme leads to a higher MTTR (Mean Time to Resolution) and a lower FCR (First Call Resolution) because tickets are misclassified. This solution addresses both problems by automating ticket classification with classification models. It considers factors such as ticket impact, urgency, and priority, along with the ticket description and ticket categories.

### Contents

1. [Set up the environment](#Set-up-the-environment)
1. [Usage Instructions](#Usage-Instructions)
1. [Upload the data for training](#Upload-the-data-for-training)
1. [Run Training Job](#Run-Training-Job)
1. [Live Inference Endpoint](#Live-Inference)
1. [Batch Transform Job](#Batch-Transform-Job)
1. [Output Interpretation](#Output-Interpretation)



<img src="images/Flow_diagram.JPG">

### Prerequisite

To run this algorithm, you need the following:
- Access to Amazon SageMaker and the model package.
- An S3 bucket for input and output data.
- An IAM role that lets SageMaker read input from and write output to S3.

### Input format
#### Input:
Name of the file: <b>train.csv</b><br>
This file contains historical incidents that have been resolved. The solution uses the following incident-specific inputs to derive productivity measures for incident managers (such as efficiency, experience, and workload management across incident types), which feed the predictions.<br><br>

<ul>
<li>ID: Unique alphanumeric identifier for the request, e.g. INC0001029696</li>
<li>Reported_Day: Day of the week as a number (preferred format: 1-7)</li>
<li>prod_cat: First-level category for the request, e.g. Miscellaneous_Instance_Database_SQL Server Database</li>
<li>Country: Country of origin of the request (preferred format: USA)</li>
<li>Detailed_Description: Free text describing the problem in the user's words</li>
<li>Priority: Priority level of the request, e.g. Low/Medium/High</li>
<li>Impact: High/Medium/Low</li>
</ul><br>
NOTE:
<ul>
<li>Not all fields are mandatory. Optional fields: prod_cat, Detailed_Description</li>
</ul>
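
The field descriptions above can be turned into a quick pre-flight check before a file is uploaded. The following is a minimal sketch using only the standard library, not part of the model package: the function name `validate_rows` is hypothetical, and it assumes that every listed column appears in the header and that only prod_cat and Detailed_Description may be left empty.

```python
import csv
import io

# Columns named in the input description; assumed here to be required headers.
EXPECTED_COLUMNS = ["ID", "Reported_Day", "prod_cat", "Country",
                    "Detailed_Description", "Priority", "Impact"]
# Fields the note lists as optional (their values may be empty).
OPTIONAL_COLUMNS = {"prod_cat", "Detailed_Description"}


def validate_rows(csv_text):
    """Return a list of (row_number, message) problems found in csv_text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        # Row number 0 marks a problem with the header itself.
        return [(0, "missing columns: " + ", ".join(missing))]
    problems = []
    for number, row in enumerate(reader, start=1):
        for column in EXPECTED_COLUMNS:
            if column not in OPTIONAL_COLUMNS and not (row[column] or "").strip():
                problems.append((number, "empty required field: " + column))
        day = (row["Reported_Day"] or "").strip()
        if day and day not in {"1", "2", "3", "4", "5", "6", "7"}:
            problems.append((number, "Reported_Day outside 1-7: " + day))
    return problems
```

An empty result means the file passed these checks. Extra columns are ignored, so the check does not reject files that carry additional fields.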




## Set up the environment
Here we specify the S3 prefix to use and the IAM role that will be used for working with SageMaker.

In [16]:
import os
import re

import boto3
import numpy as np
import pandas as pd
from sagemaker import get_execution_role

# S3 prefix
prefix = 'ticket-classifier'

# Define IAM role
role = get_execution_role()

## Create the session
The session remembers our connection parameters to SageMaker. We'll use it to perform all of our SageMaker operations.

In [17]:
import sagemaker as sage
from time import gmtime, strftime

sess = sage.Session()

## Upload the data for training
When training large models with huge amounts of data, you'll typically use big data tools, such as Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For this example, we use a sample classification dataset, which we have included.

The SageMaker Python SDK provides tools to upload data to a default bucket. Here, the training file is already in S3, so we simply point to its location.

In [19]:
data_location = 's3://aws-marketplace-mphasis-assets/ticket-classifier/train.csv'
data_location

's3://aws-marketplace-mphasis-assets/ticket-classifier/train.csv'
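
If the training file starts out on local disk rather than in S3, the SDK's `Session.upload_data` method can place it in the session's default bucket. The helper below is a small sketch under that assumption. The name `upload_training_data` is hypothetical, and `session` is expected to be a `sagemaker.Session` such as `sess` above.

```python
def upload_training_data(session, local_path, key_prefix="ticket-classifier"):
    """Upload local_path with session.upload_data and return the S3 URI.

    When no bucket is passed, upload_data falls back to the session's
    default bucket, so only the key prefix is supplied here.
    """
    s3_uri = session.upload_data(local_path, key_prefix=key_prefix)
    # Fail early if the returned location is not an S3 URI.
    if not s3_uri.startswith("s3://"):
        raise ValueError("unexpected upload location: " + s3_uri)
    return s3_uri
```

With a live session, `data_location = upload_training_data(sess, 'train.csv', prefix)` would then take the place of the hard-coded location.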

## Create an estimator and fit the model
In order to use SageMaker to fit our algorithm, we'll create an Estimator that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:
- The container name. This is the ECR image URI, built from the account ID and region in the cell below.
- The role. As defined above.
- The instance count, which is the number of machines to use for training.
- The instance type, which is the type of machine to use for training.
- The output path, which determines where the model artifact will be written.
- The session, which is the SageMaker session object that we defined above.

Then we use fit() on the estimator to train against the data that we uploaded above.
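
The container name in the list above follows the usual ECR image URI pattern of account, region, repository, and tag. As a sketch, that pattern can be wrapped in a small function with a basic sanity check. The name `ecr_image_uri` is hypothetical, and the default repository and tag simply mirror the values used in this notebook.

```python
def ecr_image_uri(account, region, repository="sagemaker-ticket-classifier", tag="latest"):
    """Build an ECR image URI of the form used by the training cell."""
    # AWS account IDs are 12-digit strings; catch obvious mistakes early.
    if not (account.isdigit() and len(account) == 12):
        raise ValueError("AWS account IDs are 12 digits: " + repr(account))
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, repository, tag)
```

The sketch covers only the `amazonaws.com` domain pattern shown in this notebook.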

In [20]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-ticket-classifier:latest'.format(account, region)

tree = sage.estimator.Estimator(image_uri=image,
                                role=role,
                                instance_count=3,
                                instance_type='ml.c4.2xlarge',
                                output_path="s3://{}/output".format(sess.default_bucket()),
                                sagemaker_session=sess)

tree.fit(data_location)

2021-03-07 13:43:58 Starting - Starting the training job...
2021-03-07 13:43:59 Starting - Launching requested ML instances
ProfilerReport-1615124638: InProgress
......
2021-03-07 13:45:13 Starting - Preparing the instances for training...
2021-03-07 13:45:54 Downloading - Downloading input data...
2021-03-07 13:46:14 Training - Downloading the training image...
2021-03-07 13:46:54 Training - Training image download completed. Training in progress.
Starting the training.
(99, 10)
Hyperparameter Selection Started
  **self._backend_args)
  n_jobs = min(effective_n_jobs(n_jobs), n_estimators)
  n_jobs = min(effective_

## Hosting your model
You can use a trained model to get real-time predictions through an HTTP endpoint. The following steps walk you through the process.


In [21]:
training_job_name = tree.latest_training_job.name
attached_tree = sage.estimator.Estimator.attach(training_job_name)


2021-03-07 13:55:20 Starting - Preparing the instances for training
2021-03-07 13:55:20 Downloading - Downloading input data
2021-03-07 13:55:20 Training - Training image download completed. Training in progress.
2021-03-07 13:55:20 Uploading - Uploading generated training model
2021-03-07 13:55:20 Completed - Training job completed



### Deploy the model
Deploying the model to SageMaker hosting just requires a deploy call on the fitted model. This call takes an instance count, instance type, and optionally serializer and deserializer functions. These are used when the resulting predictor is created on the endpoint.

In [23]:

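# csv_serializer was renamed in sagemaker>=2; CSVSerializer is its replacement
from sagemaker.serializers import CSVSerializer
predictor = attached_tree.deploy(initial_instance_count=4, instance_type='ml.m4.xlarge',
                                 serializer=CSVSerializer(), endpoint_name='ticket-classifier-2')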

-------------!

## Choose some data and use it for a prediction


In [24]:
test_data = 's3://aws-marketplace-mphasis-assets/ticket-classifier/test.csv'

# header=None keeps the column names as the first data row, as shown in the output below
data = pd.read_csv(test_data, encoding='ISO-8859-1', header=None)
data

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,ID,Reported_Day,prod_cat,Country,Detailed_Description,Priority,Impact,Incident_Type,Reported_Source
1,INC000014022289,3,prod_cat -1,Country-1,hi since recruiter lead permission approve req...,Low,4-Minor,Incident_Type-1,Phone
2,INC000014060316,2,prod_cat -1,Country-1,re annual leave hello please help absence reco...,Low,1-Minor,Incident_Type-2,Phone
3,INC000013880496,3,prod_cat -2,Country-1,monitoring shared mailbox hello please creatio...,Low,2-Minor,Incident_Type-1,Phone


In [25]:
predictions = predictor.predict(data.values).decode('utf-8')



The csv_serializer has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
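
Before the rows reach the endpoint, a CSV serializer turns them into text. The sketch below illustrates that step with the standard `csv` module. It is not the SDK's own implementation, and the name `rows_to_csv_payload` is hypothetical. It shows why free-text fields such as Detailed_Description need quoting when they contain commas.

```python
import csv
import io


def rows_to_csv_payload(rows):
    """Serialize an iterable of rows into CSV text, quoting fields when needed."""
    buffer = io.StringIO()
    # The default dialect quotes only fields that contain delimiters,
    # quote characters, or line breaks.
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

A payload built this way could also be sent without the SDK predictor, for example through the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client with `ContentType='text/csv'`.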


In [26]:
print(predictions)

ID,Reported_Day,prod_cat,Country,Priority,Impact,Incident_Type,Reported_Source,Predicted Group
INC000014022289,3,prod_cat -1,Country-1,Low,4-Minor,Incident_Type-1,Phone,Target-7
INC000014060316,2,prod_cat -1,Country-1,Low,1-Minor,Incident_Type-2,Phone,Target-4
INC000013880496,3,prod_cat -2,Country-1,Low,2-Minor,Incident_Type-1,Phone,Target-4
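
To summarize a response like the one above, the predictions can be parsed as CSV and tallied by predicted group. This is a minimal sketch using only the standard library. The function name `count_predicted_groups` is hypothetical, and the sample text repeats the three rows printed above.

```python
import csv
import io
from collections import Counter


def count_predicted_groups(predictions_csv, column="Predicted Group"):
    """Count how many tickets were assigned to each predicted group."""
    reader = csv.DictReader(io.StringIO(predictions_csv))
    return Counter(row[column] for row in reader)


sample = """ID,Reported_Day,prod_cat,Country,Priority,Impact,Incident_Type,Reported_Source,Predicted Group
INC000014022289,3,prod_cat -1,Country-1,Low,4-Minor,Incident_Type-1,Phone,Target-7
INC000014060316,2,prod_cat -1,Country-1,Low,1-Minor,Incident_Type-2,Phone,Target-4
INC000013880496,3,prod_cat -2,Country-1,Low,2-Minor,Incident_Type-1,Phone,Target-4
"""
print(count_predicted_groups(sample).most_common())
# [('Target-4', 2), ('Target-7', 1)]
```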



### Output

The output contains a column named Predicted Group, which holds the predicted class for each ticket.

In [27]:
transform_output_folder = "batch-transform-output"
output_path="s3://{}/{}".format(sess.default_bucket(), transform_output_folder)

transformer = tree.transformer(instance_count=1,
                               instance_type='ml.m4.xlarge',
                               output_path=output_path)

In [28]:
transformer.transform(test_data, content_type='text/csv')
transformer.wait()
print("Batch Transform output saved to " + transformer.output_path)

...........................
.
2021-03-07T14:10:04.900:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
Starting the inference server with 4 workers.
2021/03/07 14:10:03 [crit] 13#13: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [07/Mar/2021:14:10:03 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
169.254.255.130 - - [07/Mar/2021:14:10:03 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
2021/03/07 14:10:03 [crit] 13#13: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169

#### Inspect the Batch Transform Output in S3

In [29]:
from urllib.parse import urlparse

# Split the transform output path into bucket and key prefix
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
# Batch Transform names each result after its input file plus '.out'
file_key = '{}/{}.out'.format(parsed_url.path[1:], "test.csv")

s3_client = sess.boto_session.client('s3')

# Read the result object from the bucket parsed above
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
response_text = response['Body'].read().decode('utf-8')
print(response_text)

ID,Reported_Day,prod_cat,Country,Priority,Impact,Incident_Type,Reported_Source,Predicted Group
INC000014022289,3,prod_cat -1,Country-1,Low,4-Minor,Incident_Type-1,Phone,Target-7
INC000014060316,2,prod_cat -1,Country-1,Low,1-Minor,Incident_Type-2,Phone,Target-4
INC000013880496,3,prod_cat -2,Country-1,Low,2-Minor,Incident_Type-1,Phone,Target-4
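
The key used above relies on the Batch Transform convention of storing each result under the input file name with `.out` appended. The sketch below derives the bucket and key from the two S3 URIs instead of hard-coding `test.csv`. The function name `transform_output_location` is hypothetical, and it assumes a single input object.

```python
import posixpath
from urllib.parse import urlparse


def transform_output_location(output_path, input_uri):
    """Return (bucket, key) of the '.out' object for one input file."""
    parsed = urlparse(output_path)
    # Take the file name of the input object, e.g. 'test.csv'.
    input_name = posixpath.basename(urlparse(input_uri).path)
    key = posixpath.join(parsed.path.lstrip("/"), input_name + ".out")
    return parsed.netloc, key
```

With the variables in this notebook, `transform_output_location(transformer.output_path, test_data)` would yield the same bucket and key as the cell above.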



### View Output
Let's read the results of the above transform job from S3 and print the output.

In [30]:
s3_client = sess.boto_session.client('s3')
# Download the transform result to a local temporary file
s3_client.download_file(sess.default_bucket(), "{}/test.csv.out".format(transform_output_folder), '/tmp/test.csv.out')
with open('/tmp/test.csv.out') as f:
    string_final = f.read()

print(string_final)

# Keep a local copy of the predictions
with open("Output.txt", "w") as text_file:
    text_file.write(string_final)

ID,Reported_Day,prod_cat,Country,Priority,Impact,Incident_Type,Reported_Source,Predicted Group
INC000014022289,3,prod_cat -1,Country-1,Low,4-Minor,Incident_Type-1,Phone,Target-7
INC000014060316,2,prod_cat -1,Country-1,Low,1-Minor,Incident_Type-2,Phone,Target-4
INC000013880496,3,prod_cat -2,Country-1,Low,2-Minor,Incident_Type-1,Phone,Target-4

