## Overview

- **Expectation Used:** [expect_column_values_confidence_for_data_label_to_be_greater_than_or_equal_to_threshold](https://github.com/great-expectations/great_expectations/blob/develop/contrib/capitalone_dataprofiler_expectations/capitalone_dataprofiler_expectations/expectations/expect_column_values_confidence_for_data_label_to_be_greater_than_or_equal_to_threshold.py)

- **Expectation Description:** This expectation checks every record in the user-specified column to determine whether each value is detected as the given label with a confidence greater than or equal to the user-specified threshold. The confidence level is generated by the Data Profiler's Data Labeler.

- **Use Case:** If a user has sensitive data, such as SSNs, that they expect to be detected with the SSN label at a confidence level greater than or equal to a certain threshold, they can use this expectation to identify any records that fall below this threshold.

- **Example Details:** In this particular example, we check how much of the data in an IP column is detected with the `IPV4` label at a confidence level of at least 0.85.
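
The per-value comparison described above can be pictured with a minimal sketch. The confidence scores here are made-up numbers rather than output from the Data Labeler, and `values_below_threshold` is a hypothetical helper that belongs to neither library; it only illustrates the "greater than or equal to" rule.

```python
def values_below_threshold(confidences, threshold):
    """Return the values whose label confidence is strictly below the threshold."""
    return [value for value, conf in confidences.items() if conf < threshold]


# Hypothetical IPV4 confidences for three values (illustrative only).
ipv4_confidences = {
    "192.168.1.10": 0.97,
    "10.0.0.1": 0.85,   # exactly at the threshold, so it passes
    "not-an-ip": 0.12,
}

flagged = values_below_threshold(ipv4_confidences, threshold=0.85)
print(flagged)
```

A value whose confidence equals the threshold is treated as passing, which matches the "greater than or equal to" wording in the expectation name.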

### Imports

In [None]:
import os

import pandas as pd
import numpy as np

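# Great Expectations imports. Importing the expectation class below registers it,
# which makes the matching method available on the validator.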
import great_expectations as ge
from capitalone_dataprofiler_expectations.expectations. \
    expect_column_values_confidence_for_data_label_to_be_greater_than_or_equal_to_threshold \
    import ExpectColumnValuesConfidenceForDataLabelToBeGreaterThanOrEqualToThreshold
from great_expectations.self_check.util import build_pandas_validator_with_data

### Setup
Below we import a dataset from the Data Profiler test suite. This CSV file has a column named `srcip` that holds the IP data used in the rest of the example.

In [None]:
context = ge.get_context()

In [None]:
aws_honeypot_data_path = "../../dataprofiler/tests/data/csv/aws_honeypot_marx_geo.csv"
df = pd.read_csv(aws_honeypot_data_path)
df.head()

### Running the Expectation
We build the validator by passing in the dataframe we built above. Then we use the expectation below to find values in the `srcip` column that are labeled as `IPV4` with a confidence of .85 or higher.

In [None]:
validator = build_pandas_validator_with_data(df)
results = validator.expect_column_values_confidence_for_data_label_to_be_greater_than_or_equal_to_threshold(
    column='srcip',
    data_label='IPV4',
    threshold=.85
)

### Results
From the output below, you can see that 9 values in the `srcip` column are detected as `IPV4` with a confidence greater than or equal to .85, and that two rows were missing values.

In [None]:
results
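
The result can also be read programmatically instead of by eye. The sketch below assumes the result has been converted to a plain dictionary (for example with `results.to_json_dict()`) and that it carries a `success` flag plus the `element_count`, `unexpected_count`, and `missing_count` keys that Great Expectations typically reports for column-level checks; the exact keys depend on the result format and library version. `summarize_result` is a hypothetical helper, and the numbers are placeholders, not the output of this notebook.

```python
def summarize_result(result):
    """Condense a validation-result dictionary into a few headline counts."""
    details = result.get("result", {})
    total = details.get("element_count", 0)
    missing = details.get("missing_count", 0)
    unexpected = details.get("unexpected_count", 0)
    return {
        "success": result.get("success", False),
        # Rows that are neither missing nor flagged as unexpected.
        "passing": total - missing - unexpected,
        "unexpected": unexpected,
        "missing": missing,
    }


# Placeholder dictionary shaped like a validation result (values are invented).
example_result = {
    "success": False,
    "result": {"element_count": 100, "missing_count": 5, "unexpected_count": 20},
}

summary = summarize_result(example_result)
print(summary)
```

Using `.get` with defaults keeps the helper from raising if a key is absent in a given result format.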