# Edit Your Expectation Suite
Use this notebook to recreate and modify your expectation suite:

**Expectation Suite Name**: `nnaranov.ods_traffic.warning`

We'd love it if you **reach out to us on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack).

In [1]:
import datetime
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.checkpoint import LegacyCheckpoint
from great_expectations.data_context.types.resource_identifiers import ValidationResultIdentifier

context = ge.data_context.DataContext()

# Feel free to change the name of your suite here. Renaming this will not
# remove the other one.
expectation_suite_name = "nnaranov.ods_traffic.warning"
suite = context.get_expectation_suite(expectation_suite_name)
suite.expectations = []

batch_kwargs = {'data_asset_name': 'nnaranov.ods_traffic', 'datasource': 'gr', 'limit': 1000, 'schema': 'nnaranov', 'table': 'ods_traffic'}
batch = context.get_batch(batch_kwargs, suite)
batch.head()

2021-05-23T01:04:47+0300 - INFO - Great Expectations logging enabled at 20 level by JupyterUX module.
2021-05-23T01:04:48+0300 - INFO - Generating query from table batch_kwargs based on limit and offset
2021-05-23T01:04:50+0300 - INFO - 	0 expectation(s) included in expectation_suite.


| | user_id | traffic_timestamp | device_id | device_ip_addr | bytes_sent | bytes_received |
|---|---|---|---|---|---|---|
| 0 | 10110 | 2013-06-30 18:22:31 | d006 | 190.163.88.42 | 29794 | 41550 |
| 1 | 10830 | 2013-12-26 18:22:31 | d003 | 169.177.10.136 | 46843 | 1849 |
| 2 | 10030 | 2013-12-30 18:22:31 | d003 | 93.17.17.147 | 8796 | 28697 |
| 3 | 10330 | 2013-04-11 18:22:31 | d005 | 176.62.69.50 | 25061 | 22613 |
| 4 | 10400 | 2013-10-11 18:22:31 | d001 | 101.86.191.179 | 28969 | 6470 |


## Create & Edit Expectations

Add expectations by calling specific expectation methods on the `batch` object. They all begin with `.expect_`, which makes them easy to find with tab completion.

You can see all the available expectations in the **[expectation glossary](https://docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html?utm_source=notebook&utm_medium=create_expectations)**.

### Table Expectation(s)

In [2]:
batch.expect_table_row_count_to_be_between(max_value=1100, min_value=900)

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "observed_value": 1000
  },
  "meta": {}
}

In [3]:
batch.expect_table_column_count_to_equal(value=6)

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "observed_value": 6
  },
  "meta": {}
}

In [4]:
batch.expect_table_columns_to_match_ordered_list(column_list=['user_id', 'traffic_timestamp', 'device_id', 'device_ip_addr', 'bytes_sent', 'bytes_received'])

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "observed_value": [
      "user_id",
      "traffic_timestamp",
      "device_id",
      "device_ip_addr",
      "bytes_sent",
      "bytes_received"
    ]
  },
  "meta": {}
}

### Column Expectation(s)

#### `device_id`

In [5]:
batch.expect_column_values_to_not_be_null(column='device_id')

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "element_count": 1000,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {}
}
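
The fields in the result above are related by simple arithmetic. The following plain-Python sketch is not Great Expectations' implementation; the helper `summarize_not_null` is hypothetical, and it assumes `unexpected_percent` is the share of null values expressed as a percentage of `element_count`.

```python
def summarize_not_null(values):
    """Return a dict shaped like the not-null result above (illustrative only)."""
    element_count = len(values)
    # A value counts as "unexpected" for a not-null check when it is None.
    unexpected_count = sum(1 for v in values if v is None)
    # Assumption: the percentage is taken over all elements.
    unexpected_percent = (
        100.0 * unexpected_count / element_count if element_count else 0.0
    )
    return {
        "success": unexpected_count == 0,
        "result": {
            "element_count": element_count,
            "unexpected_count": unexpected_count,
            "unexpected_percent": unexpected_percent,
        },
    }


# The first list reuses device_id values from the sample rows;
# the second is made up to show a failing case.
clean = summarize_not_null(["d006", "d003", "d003", "d005", "d001"])
dirty = summarize_not_null(["d006", None, "d003", None])
```

Here `clean["success"]` is `True`, while `dirty` reports two unexpected values out of four.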

In [None]:
batch.expect_column_distinct_values_to_be_in_set(column='device_id', value_set=['d001', 'd002', 'd003', 'd004', 'd005', 'd006', 'd007', 'd008', 'd009'])

In [None]:
batch.expect_column_kl_divergence_to_be_less_than(column='device_id', partition_object={'values': ['d001', 'd002', 'd003', 'd004', 'd005', 'd006', 'd007', 'd008', 'd009'], 'weights': [0.104, 0.11, 0.118, 0.122, 0.1, 0.109, 0.114, 0.117, 0.106]}, threshold=0.6)
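
The KL divergence expectation compares the observed distribution of a column with the weights in `partition_object` and passes when the divergence is below `threshold`. As a rough sketch of that quantity, assuming the divergence is taken from the observed weights to the expected weights with the natural logarithm (the library's own computation may differ in detail), and using made-up toy values rather than the weights above:

```python
import math
from collections import Counter


def kl_divergence(observed_weights, expected_weights):
    """Discrete KL divergence D(observed || expected) using the natural log."""
    total = 0.0
    for p, q in zip(observed_weights, expected_weights):
        if p == 0:
            continue  # a term with zero observed mass contributes nothing
        if q == 0:
            return math.inf  # observed mass where none is expected
        total += p * math.log(p / q)
    return total


# Toy partition and sample (hypothetical, for illustration only).
values = ["d001", "d002", "d003"]
expected = [0.5, 0.3, 0.2]
sample = ["d001", "d001", "d002", "d003"]

counts = Counter(sample)
observed = [counts[v] / len(sample) for v in values]

divergence = kl_divergence(observed, expected)
passes = divergence < 0.6  # same threshold as the cell above
```

Identical distributions give a divergence of zero, so a small value means the sample is close to the expected partition.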

#### `bytes_sent`

In [6]:
batch.expect_column_values_to_not_be_null(column='bytes_sent')

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "element_count": 1000,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {}
}

In [None]:
batch.expect_column_min_to_be_between(column='bytes_sent', max_value=44, min_value=42)

In [None]:
batch.expect_column_max_to_be_between(column='bytes_sent', max_value=49929, min_value=49927)

In [None]:
batch.expect_column_mean_to_be_between(column='bytes_sent', max_value=24791.961, min_value=24789.961)

In [None]:
batch.expect_column_median_to_be_between(column='bytes_sent', max_value=23826.5, min_value=23824.5)

In [None]:
batch.expect_column_quantile_values_to_be_between(column='bytes_sent', allow_relative_error=True, quantile_ranges={'quantiles': [0.05, 0.25, 0.5, 0.75, 0.95], 'value_ranges': [[2480, 2482], [12462, 12464], [23815, 23817], [36984, 36986], [47654, 47656]]})

#### `device_ip_addr`

In [7]:
batch.expect_column_values_to_not_be_null(column='device_ip_addr')

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "element_count": 1000,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {}
}

In [None]:
batch.expect_column_value_lengths_to_be_between(column='device_ip_addr', min_value=1)

In [9]:
batch.expect_column_values_to_match_regex(column='device_ip_addr', regex=r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "element_count": 1000,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {}
}
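
When matching dotted IPv4 addresses, note that an unescaped `.` in a regular expression matches any character, whereas `\.` matches only a literal dot. The sketch below uses the standard `re` module to show the difference; the second test value is made up. Either pattern checks only the shape of the string, not that each octet is at most 255.

```python
import re

# '.' matches any character, so any separator is accepted.
LOOSE = re.compile(r"^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}$")
# '\.' matches a literal dot only.
STRICT = re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")

good = "190.163.88.42"  # taken from the first sample row
bad = "190-163-88-42"   # hypothetical value with the wrong separator

loose_results = (bool(LOOSE.match(good)), bool(LOOSE.match(bad)))
strict_results = (bool(STRICT.match(good)), bool(STRICT.match(bad)))
```

The loose pattern accepts both strings, while the strict pattern rejects the one that uses hyphens.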

#### `traffic_timestamp`

In [8]:
batch.expect_column_values_to_not_be_null(column='traffic_timestamp')

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "element_count": 1000,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {}
}

In [None]:
batch.expect_column_values_to_be_between(column='traffic_timestamp', max_value='2015-12-31 19:22:31', min_value='2012-01-03 18:22:31', parse_strings_as_datetimes=True)
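
With `parse_strings_as_datetimes=True`, the string bounds are treated as datetimes rather than compared as text. A minimal sketch of that comparison using only the standard library follows; the helper `in_range` and the fixed format string are assumptions for illustration, and the library's own parsing may be more flexible.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed format, matching the sample rows


def in_range(value, min_value, max_value):
    """True when min_value <= value <= max_value after parsing as datetimes."""
    lo = datetime.strptime(min_value, FMT)
    hi = datetime.strptime(max_value, FMT)
    return lo <= datetime.strptime(value, FMT) <= hi


MIN_TS = "2012-01-03 18:22:31"
MAX_TS = "2015-12-31 19:22:31"

inside = in_range("2013-06-30 18:22:31", MIN_TS, MAX_TS)   # first sample row
outside = in_range("2016-01-01 00:00:00", MIN_TS, MAX_TS)  # made-up value
```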

#### `user_id`

In [10]:
batch.expect_column_values_to_not_be_null(column='user_id')

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "element_count": 1000,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {}
}

#### `bytes_received`

In [11]:
batch.expect_column_values_to_not_be_null(column='bytes_received')

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "success": true,
  "result": {
    "element_count": 1000,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {}
}

## Save & Review Your Expectations

Let's save the expectation suite as a JSON file in the `great_expectations/expectations` directory of your project.
If you decide not to save some of the expectations you created, use the [`remove_expectation` method](https://docs.greatexpectations.io/en/latest/autoapi/great_expectations/data_asset/index.html?highlight=remove_expectation&utm_source=notebook&utm_medium=edit_expectations#great_expectations.data_asset.DataAsset.remove_expectation).

Let's now rebuild your Data Docs, which helps you communicate about your data with both machines and humans.

In [12]:
batch.save_expectation_suite(discard_failed_expectations=False)

results = LegacyCheckpoint(
    name="_temp_checkpoint",
    data_context=context,
    batches=[
        {
          "batch_kwargs": batch_kwargs,
          "expectation_suite_names": [expectation_suite_name]
        }
    ]
).run()
validation_result_identifier = results.list_validation_result_identifiers()[0]
context.build_data_docs()
context.open_data_docs(validation_result_identifier)

2021-05-23T01:08:05+0300 - INFO - 	10 expectation(s) included in expectation_suite. result_format settings filtered.
2021-05-23T01:08:05+0300 - INFO - Generating query from table batch_kwargs based on limit and offset
2021-05-23T01:08:06+0300 - INFO - Setting run_name to: 20210522T220806.280332Z
2021-05-23T01:08:06+0300 - INFO - 	10 expectation(s) included in expectation_suite.
