# Training a Model with AutoML

We train a classification model with Vertex AI AutoML on the customer-complaint data stored in BigQuery.

---
## Setup

inputs:

In [100]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'group-24-361920'

In [101]:
REGION = 'us-west1'
DATANAME = 'Customer_Complaints'

# Machine type of the server that hosts the deployed model
DEPLOY_COMPUTE = 'n1-standard-4'


packages:

In [102]:
from google.cloud import aiplatform
from datetime import datetime

from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

clients:

In [103]:
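aiplatform.init(project=PROJECT_ID, location=REGION)
# Note: this rebinds the name `bigquery` from the module to a client instance,
# so later cells call bigquery.query(...) on the client
bigquery = bigquery.Client()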

parameters:

In [104]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
DIR = "temp/assets"
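Every resource created below (dataset, training job, model, endpoint) gets a display name suffixed with `TIMESTAMP`, so reruns of the notebook don't produce colliding names. A minimal sketch of that naming convention; `make_display_name` is a hypothetical helper, not part of the Vertex AI SDK.

```python
from datetime import datetime
from typing import Optional


def make_display_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Append a YYYYMMDDHHMMSS timestamp to a prefix (hypothetical helper)."""
    if now is None:
        now = datetime.now()
    return f"{prefix}_{now.strftime('%Y%m%d%H%M%S')}"


# A fixed datetime keeps the output reproducible
print(make_display_name("CustomerComplaints_AutoML", datetime(2022, 9, 8, 7, 25, 0)))
# CustomerComplaints_AutoML_20220908072500
```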

environment:

In [105]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Creating a Dataset that links to the BigQuery table with split data

In [106]:
Customer_Complaints_dataset = aiplatform.TabularDataset.create(
    display_name=f"Customer_Complaints_{TIMESTAMP}",
    bq_source=['bq://group-24-361920.Customer_Complaints.Customer_Complaints_copy'],
)
print(Customer_Complaints_dataset.column_names)

Creating TabularDataset
Create TabularDataset backing LRO: projects/550877048093/locations/us-west1/datasets/5632314283980226560/operations/7318967320011866112
TabularDataset created. Resource name: projects/550877048093/locations/us-west1/datasets/5632314283980226560
To use this TabularDataset in another session:
ds = aiplatform.TabularDataset('projects/550877048093/locations/us-west1/datasets/5632314283980226560')
['contains_sensitive_data', 'has_media', 'fit_for_for_self_help', 'fit_for_forum', 'tier_level', 'response_id', 'response_text', 'ticket_id', 'response_source', 'client_id', 'row_id', 'ticket_source', 'response_created_at', 'ticket_sent_to', 'ticket_created_at', 'ticket_text', 'ticket_type', 'splits', 'int64_field_0', 'first_reply_time_secs']


---
## Training Model with AutoML

In [109]:
# Columns excluded from training: the target (tier_level), the split column (splits),
# identifiers, timestamps, and metadata that are not used as features
excluded_columns = {
    'has_media', 'response_id', 'ticket_id', 'client_id', 'row_id', 'ticket_source',
    'response_created_at', 'response_text', 'ticket_created_at', 'tier_level', 'splits',
    'int64_field_0', 'first_reply_time_secs', 'contains_sensitive_data',
    'fit_for_for_self_help', 'fit_for_forum', 'response_source', 'ticket_sent_to',
}
# The remaining columns are the features used to train the model
column_specs = list(set(Customer_Complaints_dataset.column_names) - excluded_columns)


In [110]:
# Map each feature column to the 'auto' transformation so AutoML picks one from the data type
column_specs = dict.fromkeys(column_specs, 'auto')
print(column_specs)

{'ticket_text': 'auto', 'ticket_type': 'auto'}
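Subtracting sets works, but `set` iteration order is arbitrary, so the order of the keys in `column_specs` can change between runs. A sketch of an order-preserving alternative; `select_features` and `auto_transformations` are hypothetical helpers, and the column list in the usage line is only a subset of the dataset's columns shown above.

```python
from typing import Dict, Iterable, List


def select_features(all_columns: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return the columns not in `excluded`, keeping their original order."""
    excluded_set = set(excluded)
    return [col for col in all_columns if col not in excluded_set]


def auto_transformations(features: List[str]) -> Dict[str, str]:
    """Map every feature to the 'auto' transformation, as the column_specs cell does."""
    return dict.fromkeys(features, "auto")


columns = ["ticket_id", "ticket_text", "ticket_type", "tier_level", "splits"]
features = select_features(columns, {"ticket_id", "tier_level", "splits"})
print(auto_transformations(features))
# {'ticket_text': 'auto', 'ticket_type': 'auto'}
```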


### Creating AutoML Job

In [111]:
tabular_classification_job = aiplatform.AutoMLTabularTrainingJob(
    display_name = f'CustomerComplaints_AutoML_{TIMESTAMP}',
    optimization_prediction_type = 'classification',
    column_specs = column_specs,
)

In [112]:
# Run the training job; budget_milli_node_hours = 1000 is a budget of one node hour
model = tabular_classification_job.run(
    dataset = Customer_Complaints_dataset,
    target_column = 'tier_level',
    predefined_split_column_name = 'splits',
    budget_milli_node_hours = 1000,
    model_display_name = f'CustomerComplaints_Model_{TIMESTAMP}',
    disable_early_stopping = False,
)

View Training:
https://console.cloud.google.com/ai/platform/locations/us-west1/training/2146330260302462976?project=550877048093
AutoMLTabularTrainingJob projects/550877048093/locations/us-west1/trainingPipelines/2146330260302462976 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLTabularTrainingJob projects/550877048093/locations/us-west1/trainingPipelines/2146330
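`budget_milli_node_hours` is expressed in thousandths of a node hour, so the `1000` used above is a one-node-hour training budget. A small conversion sketch; `to_milli_node_hours` is hypothetical, and the 1,000 to 72,000 bounds reflect the SDK docstring for tabular training as we understand it, so check the current documentation before relying on them.

```python
def to_milli_node_hours(node_hours: float,
                        minimum: int = 1_000,
                        maximum: int = 72_000) -> int:
    """Convert node hours to milli node hours and check the assumed bounds."""
    milli = round(node_hours * 1000)
    if not minimum <= milli <= maximum:
        raise ValueError(f"{milli} milli node hours is outside [{minimum}, {maximum}]")
    return milli


print(to_milli_node_hours(1))    # 1000, the budget used in this notebook
print(to_milli_node_hours(2.5))  # 2500
```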

---
## Evaluation

The resource name of the model created by this notebook:

In [113]:
model.resource_name

'projects/550877048093/locations/us-west1/models/8106004340243693568'

In [None]:
# TODO: show more evaluations of the model here.
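The evaluation cell is still a placeholder. One lightweight option, sketched here under the assumption that you have collected the true `tier_level` labels and the predicted classes for the TEST rows (for example by calling `endpoint.predict` on each row), is to compute accuracy and per-class recall locally. `evaluate` is a hypothetical helper; Vertex AI also keeps its own evaluation metrics for AutoML models, which this sketch does not read.

```python
from collections import Counter
from typing import Dict, List, Tuple


def evaluate(y_true: List[str], y_pred: List[str]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and per-class recall for string class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    support = Counter(y_true)  # number of true examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recall = {label: hits[label] / count for label, count in support.items()}
    return correct / len(y_true), recall
```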

---
## Endpoint and Deployment

In [114]:
#creating an endpoint
endpoint = aiplatform.Endpoint.create(
    display_name = f'CustomerComplaints_endpoint_{TIMESTAMP}',
)

Creating Endpoint
Create Endpoint backing LRO: projects/550877048093/locations/us-west1/endpoints/248559996621553664/operations/5001865311729745920
Endpoint created. Resource name: projects/550877048093/locations/us-west1/endpoints/248559996621553664
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/550877048093/locations/us-west1/endpoints/248559996621553664')


In [115]:
# Deploy the model to the endpoint, routing 100% of the endpoint's traffic to this deployed model
endpoint.deploy(
    model = model,
    deployed_model_display_name = f'CustomerComplaints_model_{TIMESTAMP}',
    traffic_percentage = 100,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)

Deploying Model projects/550877048093/locations/us-west1/models/8106004340243693568 to Endpoint : projects/550877048093/locations/us-west1/endpoints/248559996621553664
Deploy Endpoint model backing LRO: projects/550877048093/locations/us-west1/endpoints/248559996621553664/operations/6159290415963963392
Endpoint model deployed. Resource name: projects/550877048093/locations/us-west1/endpoints/248559996621553664
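`traffic_percentage = 100` sends all of the endpoint's traffic to the newly deployed model. If you later deploy a second model to the same endpoint, `Endpoint.deploy` also accepts a `traffic_split` dict mapping deployed-model IDs to percentages that must add up to 100; the SDK docstring uses the key `"0"` for the model being deployed. A hypothetical validation helper for such a dict, where the second key is a placeholder rather than a real ID:

```python
from typing import Dict


def validate_traffic_split(split: Dict[str, int]) -> Dict[str, int]:
    """Check that each percentage is in [0, 100] and that they sum to 100 (hypothetical helper)."""
    if any(pct < 0 or pct > 100 for pct in split.values()):
        raise ValueError("each percentage must be between 0 and 100")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"percentages sum to {total}, expected 100")
    return split


# Canary rollout: 10% to the model being deployed ("0"), 90% to an existing deployed model
validate_traffic_split({"0": 10, "existing-deployed-model-id": 90})
```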


---
## Making Predictions from the deployed model

We fetch a few rows from the TEST split of the dataset; these rows were held out from training.

In [62]:
sql = """
    SELECT *
    FROM `group-24-361920.Customer_Complaints.Customer_Complaints_copy`
    WHERE splits ='TEST'
    LIMIT 10
    ;
"""

testingData = bigquery.query(query = sql).to_dataframe()
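The query above hard-codes the table path. A sketch that builds the same query from its parts; `build_split_query` is hypothetical, and in the notebook you would pass `PROJECT_ID`, `DATANAME` and `f"{DATANAME}_copy"`. Because the values are interpolated directly into the SQL text, only pass trusted values; the helper rejects quotes in the split name and non-positive limits.

```python
def build_split_query(project: str, dataset: str, table: str,
                      split: str = "TEST", limit: int = 10) -> str:
    """Build a query fetching up to `limit` rows of one split (hypothetical helper)."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    if any(ch in split for ch in "'\\"):
        raise ValueError("split must not contain quotes or backslashes")
    return (
        "SELECT *\n"
        f"FROM `{project}.{dataset}.{table}`\n"
        f"WHERE splits = '{split}'\n"
        f"LIMIT {limit}"
    )


test_sql = build_split_query("group-24-361920", "Customer_Complaints", "Customer_Complaints_copy")
print(test_sql)
```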

In [116]:
testingData.head(4)

Unnamed: 0,int64_field_0,client_id,ticket_id,ticket_sent_to,ticket_type,fit_for_forum,fit_for_for_self_help,tier_level,contains_sensitive_data,ticket_text,has_media,ticket_source,ticket_created_at,first_reply_time_secs,response_id,response_text,response_source,response_created_at,row_id,splits
0,232,224961061,1567776119495053312,MTN,Problem Ticket,False,False,TIER-1,False,"@mtnug what’s happening?, can’t access my MoKa...",False,Twitter for iPhone,0022-09-08 07:25:00+00:00,551.0,1567778431462408192,@alvinagume that is unfortunate. DM your numbe...,Khoros CX,9/8/22 7:34 AM,8f172c25-847e-4f95-9d38-56fbf5715e3b,TEST
1,241,1481163747737710596,1564503612390268928,MTN,Problem Ticket,True,False,TIER-4,True,@mtnug hello yesterday i tried to pay mkopa ac...,True,Twitter for Android,0022-08-30 06:41:00+00:00,112.999999,1564504086208253952,"@UrbanMatsiko Hi Urban, sorry about that. This...",Khoros CX,8/30/22 6:43 AM,41bb9f29-2c4d-480a-9397-16027e59cf1d,TEST
2,85,910999925248454656,1569656156855091200,MTN,Problem Ticket,True,True,TIER-1,False,@mtnug @mtnmomoug My number is not roaming in ...,False,Twitter for Android,0022-09-13 11:56:00+00:00,98.0,1569656570002153472,@EcoHubAfrica apologies. Please connect to AT&...,Khoros CX,9/13/22 11:57 AM,52b5726f-8cd3-403b-8963-2a879c004b38,TEST
3,183,2559814557,1568362569093505024,MTN,Problem Ticket,False,False,TIER-2,False,@mtnug is mobile money down right now?? I can'...,True,Twitter for Android,0022-09-09 22:15:00+00:00,98.0,1568362977119838209,"@Fenatos Hello SAMU, Our apologies for inconve...",Khoros CX,9/9/22 10:17 PM,0edd82b4-300f-4af3-986f-8f1246c623d5,TEST


In [123]:
# Get the first row of the testing data as a dict of column name -> value
first_row=testingData.iloc[0].to_dict()

In [124]:
# Keep only the feature columns the model was trained on (the keys of column_specs).
# This drops the target column (tier_level), the split column, and every unused column.
first_row = {key: value for key, value in first_row.items() if key in column_specs}

print('These are the parameters that we provide to the model to make a prediction')
first_row


These are the parameters that we provide to the model to make a prediction


{'ticket_type': 'Problem Ticket',
 'ticket_text': '@mtnug what’s happening?, can’t access my MoKash savings….'}

In [125]:
# Convert the feature dict into a protobuf Value; each Value is one prediction instance
instances = [json_format.ParseDict(first_row, Value())]
instances 

[struct_value {
   fields {
     key: "ticket_text"
     value {
       string_value: "@mtnug what\342\200\231s happening?, can\342\200\231t access my MoKash savings\342\200\246."
     }
   }
   fields {
     key: "ticket_type"
     value {
       string_value: "Problem Ticket"
     }
   }
 }]
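`endpoint.predict` handles serialization for you. If you call the endpoint over REST instead (a POST to the endpoint's `:predict` URL), the request body is plain JSON with an `instances` list. A minimal sketch of building that body from the same feature dict; `build_predict_body` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def build_predict_body(rows: List[Dict[str, Any]]) -> str:
    """Serialize feature dicts into a JSON predict request body with an `instances` list."""
    # ensure_ascii=False keeps characters such as the curly apostrophe readable
    return json.dumps({"instances": rows}, ensure_ascii=False)


row = {"ticket_type": "Problem Ticket",
       "ticket_text": "@mtnug what’s happening?, can’t access my MoKash savings…."}
print(build_predict_body([row]))
```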

### Getting Predictions Using the Python Client

In [126]:
prediction = endpoint.predict(instances=instances)

In [127]:
prediction

Prediction(predictions=[{'classes': ['TIER-1', 'TIER-0', 'TIER-2', 'TIER-3', 'TIER-4'], 'scores': [0.664622962474823, 0.2379840165376663, 0.08295109122991562, 0.01235676556825638, 0.002085175830870867]}], deployed_model_id='5575825774591606784', model_version_id='1', model_resource_name='projects/550877048093/locations/us-west1/models/8106004340243693568', explanations=None)

In [128]:
# Pick the class with the highest score; 'classes' and 'scores' are parallel lists
prediction.predictions[0]['classes'][np.argmax(prediction.predictions[0]['scores'])]

'TIER-1'
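Because `classes` and `scores` are parallel lists, the label has to be looked up by the index of the largest score rather than by taking `argmax` of the whole prediction dict. A NumPy-free sketch of the same lookup that also returns the confidence; `top_class` is a hypothetical helper, and the input values are copied from the prediction output above.

```python
from typing import Dict, List, Tuple


def top_class(prediction: Dict[str, List]) -> Tuple[str, float]:
    """Return the highest-scoring class and its score from one classification prediction."""
    classes, scores = prediction["classes"], prediction["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)  # index of the largest score
    return classes[best], scores[best]


pred = {'classes': ['TIER-1', 'TIER-0', 'TIER-2', 'TIER-3', 'TIER-4'],
        'scores': [0.664622962474823, 0.2379840165376663, 0.08295109122991562,
                   0.01235676556825638, 0.002085175830870867]}
print(top_class(pred))  # ('TIER-1', 0.664622962474823)
```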

---
## Explanations


In [129]:
explanation = endpoint.explain(instances=instances)

In [130]:
explanation.predictions

[{'classes': ['TIER-1', 'TIER-0', 'TIER-2', 'TIER-3', 'TIER-4'],
  'scores': [0.664622962474823,
   0.2379840165376663,
   0.08295109122991562,
   0.01235676556825638,
   0.002085175830870867]}]

The `predictions` field returned by `explain` contains the same class scores as `predict`, so the model labels this ticket as TIER-1 because TIER-1 has the highest score (about 0.66).
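`explanation.predictions` only repeats the class scores; the per-feature attributions are in `explanation.explanations`. Assuming you have copied one instance's attributions into a plain dict of feature name to attribution value (for example from the `feature_attributions` field of `explanation.explanations[0].attributions[0]`), a hypothetical helper to rank the features by how strongly they influenced the prediction:

```python
from typing import Dict, List, Tuple


def rank_attributions(attributions: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort features by absolute attribution value, largest influence first."""
    return sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
```

Sorting by absolute value treats a feature that pushes the score down as just as influential as one that pushes it up; drop the `abs` if you only care about features supporting the predicted class.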