## Overview

- **Expectation Used:** [expect_column_values_confidence_for_data_label_to_be_greater_than_or_equal_to_threshold](https://github.com/great-expectations/great_expectations/blob/develop/contrib/capitalone_dataprofiler_expectations/capitalone_dataprofiler_expectations/expectations/expect_column_values_confidence_for_data_label_to_be_greater_than_or_equal_to_threshold.py)

- **Expectation Description:** This expectation checks every value in the user-specified column to determine whether the Data Profiler's Data Labeler detects it under the user-specified label with a confidence greater than or equal to the user-specified threshold. The Data Labeler produces the confidence for each value when it processes that record.

- **Use Case:** If a user has sensitive data, such as SSNs, that they expect to be detected under the `SSN` label at or above a certain confidence threshold, they can use this expectation to identify any records that fall below that threshold.

- **Example Details:** In this example, let's assume a data owner has a dataset that holds salary information about individuals in the data science field. The dataset has a `uuid` column that uniquely identifies each record, and this column serves as the join key representing a unique individual. The data owner wants more insight into the Data Labeler's confidence that these `uuid` values are true `UUID` values, to ensure that joins are conducted on high-quality key columns.
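The per-value rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the expectation's actual implementation: the function `confidence_meets_threshold` and the per-value score dictionaries are hypothetical stand-ins for the confidences the Data Labeler produces.

```python
from typing import Dict, List


def confidence_meets_threshold(
    label_confidences: List[Dict[str, float]],
    data_label: str,
    threshold: float,
) -> List[bool]:
    """Return one flag per value: True if its confidence for data_label >= threshold.

    Hypothetical helper; the real expectation obtains confidences from the
    Data Profiler's Data Labeler rather than from precomputed dictionaries.
    """
    # Assumption: a value with no score for the label is treated as confidence 0.0.
    return [scores.get(data_label, 0.0) >= threshold for scores in label_confidences]


# Hypothetical per-value scores, standing in for Data Labeler output.
scores = [
    {"UUID": 0.97, "HASH_OR_KEY": 0.02},
    {"UUID": 0.42, "HASH_OR_KEY": 0.55},
    {"UUID": 0.90},
]
flags = confidence_meets_threshold(scores, "UUID", 0.9)
print(flags)  # [True, False, True]
```

Note that the comparison is inclusive, so a value scored at exactly the threshold (0.90 here) passes.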

### Imports

```python
import os

import pandas as pd
import numpy as np

# Great Expectations imports
import great_expectations as ge
# Importing the contrib expectation class registers it with Great Expectations,
# which makes the matching validator.expect_... method available below.
from capitalone_dataprofiler_expectations.expectations. \
    expect_column_values_confidence_for_data_label_to_be_greater_than_or_equal_to_threshold \
    import ExpectColumnValuesConfidenceForDataLabelToBeGreaterThanOrEqualToThreshold
from great_expectations.self_check.util import build_pandas_validator_with_data

# Data Profiler imports
import dataprofiler as dp
```

### Setup
Below, we import a dataset from the Data Profiler test suite. This CSV file holds salary information for individuals in the data science field from around the world.

```python
context = ge.get_context()
```

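```python
# Path to the CSV in the DataProfiler test data, relative to this notebook
data_path = "../../dataprofiler/tests/data/csv/ds_salaries.csv"
# dp.Data detects the file type; .data returns the loaded pandas DataFrame
data = dp.Data(data_path).data
data
```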

### Running the Expectation
We build the validator by passing in the DataFrame loaded above. We then use the expectation below to check that the values in the `uuid` column are detected by the Data Labeler under the `UUID` label with a confidence greater than or equal to 0.9. Any record that falls below this threshold is reported as a violation in the expectation results, showing the data owner which uuids do not satisfy the expectation.

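```python
# Wrap the DataFrame in a pandas-backed validator
validator = build_pandas_validator_with_data(data)
# Check that every uuid value is labeled UUID with confidence >= 0.9
results = validator.expect_column_values_confidence_for_data_label_to_be_greater_than_or_equal_to_threshold(
    column="uuid",
    data_label="UUID",
    threshold=0.90,
)
```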
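The reporting step described above, where values below the threshold become violations, can be sketched as a small aggregation over per-value pass/fail flags. The helper `summarize_column_check` is hypothetical. Its keys are modeled loosely on fields Great Expectations reports for column map expectations (such as `unexpected_count` and `partial_unexpected_list`), but the exact fields, denominators, and list length depend on the Great Expectations version and the `result_format` in use. It also assumes success requires every value to pass.

```python
from typing import Any, Dict, List


def summarize_column_check(
    values: List[str], passed: List[bool], partial_count: int = 20
) -> Dict[str, Any]:
    """Summarize per-value pass/fail flags into a report-style dictionary.

    Hypothetical sketch; not the Great Expectations implementation.
    """
    unexpected = [v for v, ok in zip(values, passed) if not ok]
    total = len(values)
    return {
        # Assumption: the check passes only if every value met the threshold.
        "success": not unexpected,
        "element_count": total,
        "unexpected_count": len(unexpected),
        "unexpected_percent": 100.0 * len(unexpected) / total if total else 0.0,
        # Only the first `partial_count` failing values are listed.
        "partial_unexpected_list": unexpected[:partial_count],
    }


report = summarize_column_check(
    ["id-a", "id-b", "id-c", "id-d"], [True, False, True, True]
)
print(report["success"], report["unexpected_count"], report["partial_unexpected_list"])
# False 1 ['id-b']
```

Listing only a bounded sample of failing values keeps the report readable on large columns while the counts still describe the full extent of the problem.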

### Results
From the output below, the data owner can see that the expectation passed. This indicates that the Data Labeler detected every value in the `uuid` column as a `UUID` with a confidence greater than or equal to 0.9. This gives the data owner greater confidence that joins on this key column will not introduce data quality issues further down the ETL pipeline.

```python
results
```
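Because the Data Labeler's confidence is a model score rather than a format guarantee, a data owner who wants additional assurance about the join key could pair this expectation with a deterministic format check. The sketch below is an optional complement, not part of the expectation: the helper `non_canonical_uuids` is hypothetical and uses Python's standard `uuid` module to flag values that are not canonical hyphenated UUID strings.

```python
import uuid
from typing import List


def non_canonical_uuids(values: List[str]) -> List[str]:
    """Return values that do not parse as a UUID in canonical hyphenated form.

    Hypothetical helper for a deterministic cross-check of a key column.
    """
    bad = []
    for value in values:
        try:
            parsed = uuid.UUID(value)
        except (ValueError, TypeError, AttributeError):
            # Not parseable as a UUID (or not a string at all, e.g. NaN).
            bad.append(value)
            continue
        # uuid.UUID also accepts braces, a "urn:uuid:" prefix, and unhyphenated
        # hex, so compare against the canonical lowercase form to be strict.
        if str(parsed) != value.lower():
            bad.append(value)
    return bad


sample = [
    "123e4567-e89b-12d3-a456-426614174000",
    "not-a-uuid",
    "123e4567e89b12d3a456426614174000",
]
print(non_canonical_uuids(sample))
# ['not-a-uuid', '123e4567e89b12d3a456426614174000']
```

In this notebook, the check could be applied as `non_canonical_uuids(data["uuid"].astype(str).tolist())`; an empty list would mean every value is a canonical UUID string.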