In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# NLU Evaluation Testing
This notebook shows how to run bulk NLU tests: you provide a large input corpus and receive the predicted Intent and Parameter extraction results from your agent.

## Prerequisites
- Ensure you have a key for a GCP Service Account that has Dialogflow API Admin privileges.

In [None]:
# If you haven't already, make sure you install the `dfcx-scrapi` library

!pip install dfcx-scrapi

# Imports

In [17]:
from dfcx_scrapi.tools.nlu_evals import NluEvals

# (Option 1) Google Sheet as Input
The primary way to run the NLU Eval pipeline uses Google Sheets as the data source.  
This keeps the workflow simple: input data is read from, and results are written back to, a single Google Sheet.

In order to run the full NLU evaluation test, the following inputs are needed:
- `agent_id`, the Dialogflow CX Agent ID.
- `input_google_sheet`, the Display Name of the Google Sheet.
- `input_google_sheet_tab`, the Display Name of the tab on the Google Sheet where your input data lives.
- `output_google_sheet_results`, the Display Name of the tab on the Google Sheet where you want the full output results to be written.
- `output_google_sheet_summary`, the Display Name of the tab on the Google Sheet where you want the report summary to be written.

_**NOTE** - In order for your Service Account to access your Google Sheet (read / write) you need to share the Google Sheet with your Service Account email address._
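
To share the sheet you need the Service Account's email address. As a minimal sketch, assuming your credentials file is a standard Service Account JSON key (which stores the address in its `client_email` field), you could read it straight from the file. The helper name `service_account_email` is hypothetical and is not part of `dfcx-scrapi`.

```python
import json
from pathlib import Path


def service_account_email(key_path: str) -> str:
    """Return the client_email field from a Service Account JSON key file.

    Hypothetical helper for illustration; assumes a standard JSON key file.
    """
    key_data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    return key_data["client_email"]
```

Calling `service_account_email(creds_path)` with the `creds_path` you define in the next cell returns the address to add under the sheet's sharing settings.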

A [sample Google Sheet dataset](https://docs.google.com/spreadsheets/d/e/2PACX-1vREvsZAktNvRr78KjUBlZl2PVUHKJru8hRCgmuDi9kn_oDT_weFKkGmyoQwRPdj0JcxK1kNzgceAPA5/pubhtml#) is available for reference.

In [18]:
agent_id = '<YOUR_AGENT_ID>'
creds_path = '<YOUR_CREDS_FILE>'

# Sample Inputs
input_google_sheet = 'Dialogflow CX SCRAPI - NLU Eval Sample Dataset'
input_google_sheet_tab = 'input_dataset'
output_google_sheet_results = 'results'
output_google_sheet_summary = 'summary'

## Run NLU Evals
The Eval Pipeline runs in 3 main stages:
1. Process and validate the input data.
2. Run the Eval Tests.
3. Write the output summary and details to a report.

In [19]:
nlu = NluEvals(agent_id, creds_path=creds_path)

df = nlu.process_input_google_sheet(input_google_sheet, input_google_sheet_tab)
df = nlu.run_evals(df)
nlu.write_results_to_sheets(df, input_google_sheet, output_google_sheet_results, output_google_sheet_summary)

2023-09-11 11:40:31 INFO     ---------- STARTING Evals ----------
2023-09-11 11:40:41 INFO     Progress(0/15)[>                                                 ] 0.00%
2023-09-11 11:40:51 INFO     Progress(15/15)[------------------------------------------------->] 100.00%
2023-09-11 11:40:51 INFO     ---------- Evals COMPLETE ----------


## Inspect Results Locally
You can also inspect and filter the results of your tests locally as needed.

In [20]:
df.head()

Unnamed: 0,flow_display_name,page_display_name,utterance,expected_intent,expected_parameters,target_page,match_type,confidence,parameters_set,detected_intent,agent_display_name,description,input_source
0,Default Start Flow,START_PAGE,I need to get my order status,head_intent.order_status,,sentiment_router,INTENT,1.0,,head_intent.order_status,[Demo] Multi Demo Extravaganza Part Deux: The Revenge,Demo Tests,input_dataset
1,Default Start Flow,START_PAGE,Trying to check the status of my order,head_intent.order_status,,sentiment_router,INTENT,0.947959,,head_intent.order_status,[Demo] Multi Demo Extravaganza Part Deux: The Revenge,Demo Tests,input_dataset
2,Default Start Flow,START_PAGE,I hate this order status agent!,head_intent.order_status,,sentiment_router,INTENT,0.955709,,head_intent.order_status,[Demo] Multi Demo Extravaganza Part Deux: The Revenge,Demo Tests,input_dataset
3,Default Start Flow,START_PAGE,Wha'ts the point of ordering anything?,NO_MATCH,,sentiment_router,INTENT,0.841712,,head_intent.order_status,[Demo] Multi Demo Extravaganza Part Deux: The Revenge,Demo Tests,input_dataset
4,Default Start Flow,START_PAGE,I was looking at the order of operations yesterday but couldn't figure it out,NO_MATCH,,sentiment_router,INTENT,0.790275,,head_intent.order_status,[Demo] Multi Demo Extravaganza Part Deux: The Revenge,Demo Tests,input_dataset
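
As a rough sketch of the local filtering mentioned above, the block below builds a small stand-in DataFrame with a subset of the columns shown in the output (`utterance`, `expected_intent`, `detected_intent`, `confidence`) and picks out the rows where the detected intent differs from the expected one. The stand-in data is copied from the sample rows; with a real run you would apply the same expressions to `df` directly.

```python
import pandas as pd

# Stand-in for the DataFrame returned by run_evals: a subset of its columns,
# with values copied from the sample output in this notebook.
results = pd.DataFrame({
    "utterance": [
        "I need to get my order status",
        "Trying to check the status of my order",
        "Wha'ts the point of ordering anything?",
    ],
    "expected_intent": [
        "head_intent.order_status",
        "head_intent.order_status",
        "NO_MATCH",
    ],
    "detected_intent": ["head_intent.order_status"] * 3,
    "confidence": [1.0, 0.947959, 0.841712],
})

# Rows where the agent's prediction disagrees with the expected label.
mismatches = results[results["expected_intent"] != results["detected_intent"]]

# Fraction of rows where the prediction and the expectation agree.
accuracy = (results["expected_intent"] == results["detected_intent"]).mean()

print(mismatches[["utterance", "expected_intent", "detected_intent", "confidence"]])
print(f"accuracy: {accuracy:.2%}")
```

Sorting `mismatches` by `confidence` is one way to surface the utterances the agent got wrong most confidently.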


# (Option 2) CSV as Input
This pipeline is similar to the one above, except that the input data is read from a CSV file.

Because the output is written to local files, you need to define 2 output destinations:
1. An output file for the full detailed results
2. An output file for the report summary

The following inputs are needed:
- `agent_id`, the Dialogflow CX Agent ID.
- `input_path`, the local path where your input data lives.
- `output_summary_path`, the local path where you want the report summary written.
- `output_results_path`, the local path where you want the full results written.

A [sample CSV dataset](https://github.com/GoogleCloudPlatform/dfcx-scrapi/blob/main/data/nlu_evals_sample.csv) is available in the `dfcx-scrapi` repository.
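
If you would rather assemble a small input file by hand, the sketch below writes one with Python's standard `csv` module. The column names are copied from the results shown in Option 1; treating exactly these six as the expected input columns is an assumption, so compare against the sample dataset before relying on it. The helper name `write_input_csv` is hypothetical.

```python
import csv
import io

# Column names copied from the sample results earlier in this notebook.
# ASSUMPTION: these are the columns the input file should contain.
INPUT_COLUMNS = [
    "flow_display_name",
    "page_display_name",
    "utterance",
    "expected_intent",
    "expected_parameters",
    "description",
]

rows = [
    {
        "flow_display_name": "Default Start Flow",
        "page_display_name": "START_PAGE",
        "utterance": "I need to get my order status",
        "expected_intent": "head_intent.order_status",
        "expected_parameters": "",
        "description": "Demo Tests",
    },
    {
        "flow_display_name": "Default Start Flow",
        "page_display_name": "START_PAGE",
        "utterance": "Wha'ts the point of ordering anything?",
        "expected_intent": "NO_MATCH",
        "expected_parameters": "",
        "description": "Demo Tests",
    },
]


def write_input_csv(records, handle):
    """Write records as CSV with a header row (hypothetical helper)."""
    writer = csv.DictWriter(handle, fieldnames=INPUT_COLUMNS)
    writer.writeheader()
    writer.writerows(records)


# Written to an in-memory buffer here; to create a real file, pass
# open(input_path, "w", newline="") instead.
buffer = io.StringIO()
write_input_csv(rows, buffer)
print(buffer.getvalue())
```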

In [13]:
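agent_id = '<YOUR_AGENT_ID>'
# Defined again here so this option can run without the Option 1 cells
creds_path = '<YOUR_CREDS_FILE>'

input_path = '/path/to/your/input/data.csv'
output_summary_path = '/path/to/your/output/summary.csv'
output_results_path = '/path/to/your/output/results.csv'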

## Run NLU Evals

The Eval Pipeline runs in 3 main stages:
1. Process and validate the input data.
2. Run the Eval Tests.
3. Write the output summary and details to a report.

In [14]:
nlu = NluEvals(agent_id, creds_path=creds_path)

df = nlu.process_input_csv(input_path)
df = nlu.run_evals(df)
nlu.write_summary_to_file(df, output_summary_path)
nlu.write_results_to_file(df, output_results_path)

2023-09-11 11:31:51 INFO     ---------- STARTING Evals ----------
2023-09-11 11:31:58 INFO     Progress(0/15)[>                                                 ] 0.00%
2023-09-11 11:32:08 INFO     Progress(15/15)[------------------------------------------------->] 100.00%
2023-09-11 11:32:08 INFO     ---------- Evals COMPLETE ----------
