In [None]:
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Introduction
In this notebook, we will show you how to compute the Levenshtein ratio (a normalized edit-distance similarity score) between the training phrases of two intents.
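As a rough illustration of the metric, here is a minimal pure-Python sketch of Levenshtein distance and one common way to normalize it into a ratio between 0 and 1. The function names are hypothetical, and the normalization (dividing by the length of the longer string) is an assumption: `dfcx-scrapi` may compute its ratio differently (for example, the `python-Levenshtein` package's `ratio` weights substitutions differently), so exact values can differ.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as edits grow."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("kitten", "sitting"))         # 3
print(round(levenshtein_ratio("kitten", "sitting"), 3))  # 0.571
```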

## Prerequisites
- Ensure you have a GCP Service Account key with the Dialogflow API Admin privileges assigned to it.
- If you haven't already, install the `dfcx-scrapi` library:

    `pip install dfcx-scrapi`

## Imports

In [None]:
from dfcx_scrapi.core.intents import Intents
from dfcx_scrapi.tools.analysis_util import AnalysisUtil

# Usage

## Prerequisite Information
Getting an Intent from your existing DFCX agent requires the following information:
- `agent_id`, the full resource name of your Dialogflow CX agent, in the form `projects/<PROJECT_ID>/locations/<LOCATION_ID>/agents/<AGENT_ID>`.
- `creds_path`, the path to your service account key file.
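A minimal sketch of setting these two values before running the cells below. The placeholder strings are hypothetical and must be replaced with your own. The `parse_agent_id` helper is also hypothetical (it is not part of `dfcx-scrapi`); it simply catches a malformed agent ID early by checking it against the `projects/.../locations/.../agents/...` resource-name pattern.

```python
import re

# Hypothetical placeholders: replace with your own values.
creds_path = "<PATH_TO_SERVICE_ACCOUNT_KEY_JSON>"
agent_id = "projects/<PROJECT_ID>/locations/<LOCATION_ID>/agents/<AGENT_ID>"

# Dialogflow CX agent resource names follow this pattern.
AGENT_ID_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/agents/(?P<agent>[^/]+)$"
)


def parse_agent_id(resource_name: str) -> dict:
    """Split a CX agent resource name into its parts, or raise ValueError."""
    match = AGENT_ID_PATTERN.match(resource_name)
    if match is None:
        raise ValueError(f"Not a Dialogflow CX agent ID: {resource_name!r}")
    return match.groupdict()


print(parse_agent_id(agent_id))
# {'project': '<PROJECT_ID>', 'location': '<LOCATION_ID>', 'agent': '<AGENT_ID>'}
```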

## Get your agent's Intents
Instantiate an Intents class object using `creds_path`:

In [None]:
i = Intents(creds_path)

You can use an intent's unique ID or its human-readable display name to find and
reference the intents you wish to analyze. There are many ways to create
and reference an Intent object. In this example, I chose to get a map of all intents
associated with my agent, keyed by display name (`reverse=True`), and then use each
intent's display name to pinpoint the specific intents I wish to compare.

In [None]:
# Map display names to intent IDs (reverse=True), then fetch each intent by its ID.
intents_map = i.get_intents_map(agent_id=agent_id, reverse=True)
intent1 = i.get_intent(intents_map["Intent One"])
intent2 = i.get_intent(intents_map["Intent Two"])
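As we understand `get_intents_map`, it returns a dictionary keyed by intent ID with display names as values, and `reverse=True` flips it so display names become the keys. The sketch below imitates that inversion with a hand-made dictionary (the IDs are made up) so it runs without an agent. The `lookup_intent_id` helper is hypothetical; it only turns a mistyped display name into a more helpful error than a bare `KeyError`.

```python
# Made-up IDs standing in for what get_intents_map(agent_id) might return.
intents_by_id = {
    "projects/p/locations/global/agents/a/intents/111": "Intent One",
    "projects/p/locations/global/agents/a/intents/222": "Intent Two",
}

# Equivalent of reverse=True: display name -> intent ID.
ids_by_name = {name: intent_id for intent_id, name in intents_by_id.items()}


def lookup_intent_id(name_map: dict, display_name: str) -> str:
    """Return the intent ID for a display name, listing known names on a typo."""
    try:
        return name_map[display_name]
    except KeyError:
        known = ", ".join(sorted(name_map))
        raise KeyError(f"No intent named {display_name!r}; known: {known}") from None


print(lookup_intent_id(ids_by_name, "Intent One"))
# projects/p/locations/global/agents/a/intents/111
```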

## Invoke the analysis tool

In [None]:
# The threshold sets the minimum similarity a pair of training phrases must reach
# to be included in the output. The default is 0.75, or 75% similar.
th = 0.7

result = AnalysisUtil.calc_tp_distances(intent_key=intent1, intent_comparator=intent2, threshold=th)

The difference between `intent_key` and `intent_comparator` lies in the structure of the output:
- `intent_key` training phrases serve as unique keys in the returned object.
- `intent_comparator` training phrases may appear multiple times, since each key can reference every comparator whose similarity ratio meets the designated threshold.
- In other words, there is a one-to-many relationship between `intent_key` and `intent_comparator`.
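To illustrate that one-to-many shape, here is a hypothetical sketch that compares two plain lists of training phrases and keeps every pair at or above the threshold. It is not the library's implementation: the similarity is a plain Levenshtein distance normalized by the longer phrase's length (an assumed formula), and this sketch leaves out keys with no match, which the library may handle differently.

```python
def levenshtein_ratio(a: str, b: str) -> float:
    """1 - edit distance / length of the longer string (assumed normalization)."""
    if len(a) < len(b):
        a, b = b, a
    if not a:
        return 1.0  # both strings empty
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return 1.0 - previous[-1] / len(a)


def tp_distances(key_tps, comparator_tps, threshold=0.75):
    """Map each key phrase to every comparator phrase at or above `threshold`."""
    distances = {}
    for key in key_tps:
        matches = {}
        for comp in comparator_tps:
            ratio = levenshtein_ratio(key, comp)
            if ratio >= threshold:
                matches[comp] = round(ratio, 3)
        if matches:  # this sketch omits keys with no similar comparator
            distances[key] = matches
    return distances


print(tp_distances(["hello", "goodbye"], ["hello!", "hallo", "see you"], threshold=0.7))
# {'hello': {'hello!': 0.833, 'hallo': 0.8}}
```

Note how the single key `"hello"` maps to two comparators, while `"goodbye"` matches none.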

## That's it!
If you didn't pass `silent=True` (and you can see the terminal associated with the execution), you should see a completion percentage ticking upward. Note that this process may take a while, especially for intents with many training phrases, since every key training phrase is compared with every comparator training phrase.

Once complete, `calc_tp_distances` returns a dictionary of the following format:

In [None]:
{
    'stats': {
        'comparators': {
            'num_overlap': int,
            'percent_overlap': str,
            'total:': int
        },
        'keys': {
            'num_overlap': int,
            'percent_overlap': str,
            'total:': int
        }
    },
    'distances': {
        'key tp 1 from intent 1 as str': {'comparator tp a from intent 2 as str': float, 'comparator tp b from intent 2 as str': float, ...},
        'key tp 2 from intent 1 as str': {'comparator tp c from intent 2 as str': float, ...},
        ...
        'key tp n from intent 1 as str': {'comparator tp x from intent 2 as str': float, ...}
    }
}

- `stats`: basic statistics about the results.
  - `comparators`: statistics about the comparator training phrases:
    - `num_overlap`: Number of comparators whose similarity ratio is at or above the specified threshold. Note that this number can exceed the total number of training phrases in the comparator intent, because each key maps to every comparator training phrase that is similar to it.
    - `percent_overlap`: Percentage of comparators that are similar to the key intent's training phrases. 100% means every single comparator was found to be similar to every single key. Generally speaking, this value should be very low.
    - `total`: Total number of comparators.
  - `keys`: statistics about the key training phrases:
    - `num_overlap`: Number of keys whose similarity ratio to at least one comparator is at or above the specified threshold. Whether a key has 1 similar comparator or 100 does not change this value; it only checks whether a key has any associated comparators.
    - `percent_overlap`: Percentage of keys with at least one associated comparator at or above the designated threshold. This value does not account for the number of comparators, only for whether at least one exists.
    - `total`: Total number of keys.
- `distances`: a dictionary in which each training phrase from the key intent serves as... well, the key. The value associated with each key is a dictionary that maps each similar comparator training phrase to its similarity ratio.
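To make the `stats` definitions concrete, here is a hypothetical reconstruction of how they could be derived from a `distances` dictionary. It follows the descriptions in the list above, not the library's source: the `"50.00%"` string format and the reading of the comparator percentage (matched pairs divided by all possible key/comparator pairs, so 100% only when every comparator matches every key) are assumptions, and the sketch uses the key name `total` as the list describes it.

```python
def summarize(distances: dict, total_keys: int, total_comparators: int) -> dict:
    """Hypothetical reconstruction of the `stats` block from `distances`."""
    # Keys: count each key once if it has at least one similar comparator.
    keys_overlap = sum(1 for matches in distances.values() if matches)
    # Comparators: count every (key, comparator) pair, so this can exceed
    # the number of comparator training phrases.
    comparator_pairs = sum(len(matches) for matches in distances.values())
    possible_pairs = total_keys * total_comparators

    def pct(part: int, whole: int) -> str:
        # Assumed string format, e.g. "33.33%"; avoid dividing by zero.
        return f"{part / whole:.2%}" if whole else "0.00%"

    return {
        "comparators": {
            "num_overlap": comparator_pairs,
            "percent_overlap": pct(comparator_pairs, possible_pairs),
            "total": total_comparators,
        },
        "keys": {
            "num_overlap": keys_overlap,
            "percent_overlap": pct(keys_overlap, total_keys),
            "total": total_keys,
        },
    }


distances = {"hello": {"hello!": 0.833, "hallo": 0.8}}
print(summarize(distances, total_keys=2, total_comparators=3))
```

With 2 keys and 3 comparators, one key matches two comparators, giving `keys.percent_overlap == "50.00%"` and `comparators.percent_overlap == "33.33%"` under these assumptions.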

# Example output: