# Validation Playground

**Watch** a [short tutorial video](https://greatexpectations.io/videos/getting_started/integrate_expectations) or **read** [the written tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data)

#### This notebook assumes that you have already created at least one expectation suite in your project.
#### Here you will learn how to validate data in a SQL database against an expectation suite.


We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)

In [207]:
import great_expectations as ge
import great_expectations.jupyter_ux
import datetime
from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler

In [208]:
# Get a DataContext
context = ge.data_context.DataContext()

2023-05-16T22:36:16+0300 - INFO - FileDataContext loading fluent config
2023-05-16T22:36:16+0300 - INFO - Loading 'datasources' ->
[{'assets': [...],
  'connection_string': 'mssql+pymssql://TestUser3:TestUser3@localhost/TRN',
  'name': 'trn_datasource',
  'type': 'sql'}]
2023-05-16T22:36:16+0300 - INFO - Loaded 'datasources' ->
[SQLDatasource(type='sql', name='trn_datasource', id=None, assets=[TableAsset(name='jobs_asset', type='table', id=None, order_by=[], batch_metadata={}, splitter=None, table_name='jobs', schema_name='hr'), TableAsset(name='jobs_profile_asset', type='table', id=None, order_by=[], batch_metadata={}, splitter=None, table_name='jobs', schema_name='hr')], connection_string='mssql+pymssql://TestUser3:TestUser3@localhost/TRN', create_temp_table=True, kwargs={})]
2023-05-16T22:36:16+0300 - INFO - Loaded 'trn_datasource' from fluent config
2023-05-16T22:36:16+0300 - INFO - Saving 1 Fluent Datasources to D:\DQE\DQE_Training_2023\Module_6_TA_for_DQE\dqe_module6_task1_great_
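The `connection_string` in the fluent config follows the SQLAlchemy URL layout `dialect+driver://username:password@host/database`. The sketch below splits it with the standard library's `urllib.parse.urlsplit` to confirm which server and database the notebook talks to. It is plain Python and does not go through Great Expectations.

```python
from urllib.parse import urlsplit

# Connection string copied from the fluent config logged above.
conn_str = "mssql+pymssql://TestUser3:TestUser3@localhost/TRN"

parts = urlsplit(conn_str)
dialect, _, driver = parts.scheme.partition("+")  # SQLAlchemy dialect and DBAPI driver
database = parts.path.lstrip("/")                  # database name follows the host

print("dialect: ", dialect)
print("driver:  ", driver)
print("user:    ", parts.username)
print("host:    ", parts.hostname)
print("database:", database)
```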

In [209]:
# Datasource
datasource_name = 'trn_datasource'
[datasource['name'] for datasource in context.list_datasources()]

['trn_datasource']

In [210]:
# Checkpoints
context.list_checkpoints()

['trn_checkpoint']

In [211]:
# Expectation Suites
context.list_expectation_suite_names()

['jobs.jobs_extended', 'jobs.jobs_tests', 'jobs_profile']
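Assuming the project uses the default filesystem expectations store (suites kept as JSON files under `expectations/`), a dotted suite name such as `jobs.jobs_tests` is stored as nested directories. The hypothetical helper `suite_file_path` below sketches that mapping for the three suites listed above; check the expectations store entry in `great_expectations.yml` if your project is configured differently.

```python
from pathlib import PurePosixPath


def suite_file_path(suite_name: str, root: str = "expectations") -> PurePosixPath:
    """Hypothetical helper: map a dotted suite name to its JSON file in the store."""
    *dirs, leaf = suite_name.split(".")  # every dot becomes a directory level
    return PurePosixPath(root, *dirs, f"{leaf}.json")


for name in ["jobs.jobs_extended", "jobs.jobs_tests", "jobs_profile"]:
    print(name, "->", suite_file_path(name))
```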

In [212]:
# Assets defined on the datasource
[asset.name for asset in context.fluent_datasources[datasource_name].assets]

['jobs_asset', 'jobs_profile_asset']

In [213]:
# Load the expectation suite and build a batch request for the asset to validate
asset_to_validate = 'jobs_asset'
expectation_suite_name = 'jobs.jobs_tests'

datasource = context.get_datasource(datasource_name)
suite = context.get_expectation_suite(expectation_suite_name=expectation_suite_name)
my_asset = datasource.get_asset(asset_to_validate)
batch_request = my_asset.build_batch_request()

# Each batch spec shows which table (and schema) the batch reads from
batches = datasource.get_batch_list_from_batch_request(batch_request)
for batch in batches:
    print(batch.batch_spec)


2023-05-16T22:36:22+0300 - INFO - SQLDatasource.dict() - substituting config values
2023-05-16T22:36:22+0300 - INFO - batch_slice: None was parsed to: slice(0, None, None)
{'type': 'table', 'data_asset_name': 'jobs_asset', 'table_name': 'jobs', 'schema_name': 'hr', 'batch_identifiers': {}}
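The log line `batch_slice: None was parsed to: slice(0, None, None)` means no slice was requested, so every batch returned by the request is kept. Because `jobs_asset` has no splitter (`splitter=None` in the datasource config), the request yields a single batch covering the whole `hr.jobs` table. The sketch below reproduces that selection in plain Python; `parse_batch_slice` is a hypothetical helper, not a Great Expectations function.

```python
from typing import List, Optional


def parse_batch_slice(batch_slice: Optional[slice]) -> slice:
    """Hypothetical helper: a missing batch_slice means 'all batches'."""
    return slice(0, None, None) if batch_slice is None else batch_slice


# One spec per batch; a table asset without a splitter produces exactly one.
batch_specs: List[dict] = [
    {"type": "table", "data_asset_name": "jobs_asset",
     "table_name": "jobs", "schema_name": "hr", "batch_identifiers": {}},
]

selected = batch_specs[parse_batch_slice(None)]
for spec in selected:
    # Fully qualified table the batch is read from.
    print(f"{spec['schema_name']}.{spec['table_name']}")
```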


In [214]:
# List the expectation types contained in the loaded suite
for expectation in suite.expectations:
    print('\t' + expectation['expectation_type'])

	expect_table_columns_to_match_ordered_list
	expect_table_row_count_to_be_between
	expect_column_min_to_be_between
	expect_column_max_to_be_between
	expect_column_mean_to_be_between
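Each expectation in `jobs.jobs_tests` is a check evaluated over the batch. The sketch below spells out in plain Python what the five expectation types listed above test, applied to the five sample rows that `validator.head(5)` prints later in this notebook. The helper names are hypothetical, the bounds are illustrative rather than the ones stored in the suite, and bounds are treated as inclusive with `None` meaning unbounded.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, object]

# The five rows returned by validator.head(5) later in this notebook.
rows: List[Row] = [
    {"job_id": 1, "job_title": "Public Accountant", "min_salary": 4200.0, "max_salary": 9000.0},
    {"job_id": 2, "job_title": "Accounting Manager", "min_salary": 8200.0, "max_salary": 16000.0},
    {"job_id": 3, "job_title": "Administration Assistant", "min_salary": 3000.0, "max_salary": 6000.0},
    {"job_id": 4, "job_title": "President", "min_salary": 20000.0, "max_salary": 40000.0},
    {"job_id": 5, "job_title": "Administration Vice President", "min_salary": 15000.0, "max_salary": 30000.0},
]


def between(value: float, min_value: Optional[float], max_value: Optional[float]) -> bool:
    """Inclusive range check; a bound of None leaves that side open."""
    return (min_value is None or value >= min_value) and (max_value is None or value <= max_value)


def columns_match_ordered_list(rows: List[Row], column_list: List[str]) -> bool:
    # Column names, in order, must equal the expected list.
    return list(rows[0]) == column_list


def row_count_between(rows: List[Row], min_value=None, max_value=None) -> bool:
    return between(len(rows), min_value, max_value)


def column_stat_between(rows: List[Row], column: str, stat, min_value=None, max_value=None) -> bool:
    """stat is min, max or statistics.mean, applied to a single column."""
    return between(stat(r[column] for r in rows), min_value, max_value)


# Illustrative bounds, not the thresholds stored in jobs.jobs_tests.
results = {
    "expect_table_columns_to_match_ordered_list": columns_match_ordered_list(
        rows, ["job_id", "job_title", "min_salary", "max_salary"]),
    "expect_table_row_count_to_be_between": row_count_between(rows, 1, 100),
    "expect_column_min_to_be_between": column_stat_between(rows, "job_id", min, 1, 1),
    "expect_column_max_to_be_between": column_stat_between(rows, "max_salary", max, None, 50000.0),
    "expect_column_mean_to_be_between": column_stat_between(rows, "min_salary", mean, 5000.0, 15000.0),
}
for name, ok in results.items():
    print(f"{name}: {'passed' if ok else 'failed'}")
```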


In [215]:
# Open Data Docs in the browser to review existing validation results
context.open_data_docs()

In [226]:
# Profile each batch and save the generated suite
ignored_columns = ['job_title']
for i, batch in enumerate(batches):
    # Suffix the suite name with the batch number only when there are several batches
    expectation_suite_name = 'jobs_profile{}'.format(i + 1 if len(batches) > 1 else '')
    # Load the whole batch into a legacy PandasDataset for the profiler
    df_ge = ge.dataset.PandasDataset(batch.head(fetch_all=True).data)
    profiler = UserConfigurableProfiler(df_ge, ignored_columns=ignored_columns)
    prof_suite = profiler.build_suite()
    prof_suite.expectation_suite_name = expectation_suite_name
    print('Saving expectation suite as expectations/{}.json'.format(expectation_suite_name.replace('.', '/')))
    context.add_or_update_expectation_suite(expectation_suite=prof_suite)

2023-05-16T23:03:14+0300 - INFO - 	0 expectation(s) included in expectation_suite.
2023-05-16T23:03:14+0300 - INFO - 	28 expectation(s) included in expectation_suite.
Creating an expectation suite with the following expectations:

Table-Level Expectations
expect_table_columns_to_match_ordered_list
expect_table_row_count_to_be_between

Expectations by Column
Column Name: job_id | Column Data Type: INT | Cardinality: UNIQUE
expect_column_max_to_be_between
expect_column_mean_to_be_between
expect_column_median_to_be_between
expect_column_min_to_be_between
expect_column_proportion_of_unique_values_to_be_between
expect_column_quantile_values_to_be_between
expect_column_values_to_be_in_type_list
expect_column_values_to_not_be_null


Column Name: max_salary | Column Data Type: FLOAT | Cardinality: VERY_FEW
expect_column_max_to_be_between
expect_column_mean_to_be_between
expect_column_median_to_be_between
expect_column_min_to_be_between
expect_column_proportion_of_unique_values_to_be_between
expect_column_quantile_values_to_be_between
expect_column_values_to_be_in_set
expect_
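The suite name in the profiling loop gets a numeric suffix only when the request returns several batches. With the single batch here, the suite is saved as `jobs_profile`, which matches the name listed by `context.list_expectation_suite_names()`. A standalone version of that naming rule, with the hypothetical function name `profile_suite_name`:

```python
def profile_suite_name(i: int, n_batches: int, base: str = "jobs_profile") -> str:
    """Same format expression as the profiling loop: suffix i+1 only for multiple batches."""
    return "{}{}".format(base, i + 1 if n_batches > 1 else "")


print([profile_suite_name(i, 1) for i in range(1)])  # ['jobs_profile']
print([profile_suite_name(i, 3) for i in range(3)])  # ['jobs_profile1', 'jobs_profile2', 'jobs_profile3']
```

Because `add_or_update_expectation_suite` overwrites an existing suite of the same name, re-running the loop replaces the earlier profile rather than adding a new one.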

In [225]:
# Create a validator that binds the batch request to the jobs.jobs_tests suite
expectation_suite_name = 'jobs.jobs_tests'

validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)
validator.head(5)

2023-05-16T23:03:03+0300 - INFO - SQLDatasource.dict() - substituting config values
2023-05-16T23:03:03+0300 - INFO - batch_slice: None was parsed to: slice(0, None, None)

   job_id                      job_title  min_salary  max_salary
0       1              Public Accountant      4200.0      9000.0
1       2             Accounting Manager      8200.0     16000.0
2       3       Administration Assistant      3000.0      6000.0
3       4                      President     20000.0     40000.0
4       5  Administration Vice President     15000.0     30000.0
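One simple way a profiler can turn observed data into expectations is to set both bounds to the statistic it observed, which gives kwargs of the `min_value == max_value` shape visible in the JSON output of the profiling cell below. The sketch applies that idea to the numeric columns of the five rows above. It is a simplified stdlib illustration with a hypothetical `pinned_expectations` helper, not the actual implementation of `UserConfigurableProfiler`.

```python
from statistics import mean, median
from typing import Dict, List

# Numeric columns of the five rows above; job_title is left out, mirroring ignored_columns.
columns: Dict[str, List[float]] = {
    "job_id": [1, 2, 3, 4, 5],
    "min_salary": [4200.0, 8200.0, 3000.0, 20000.0, 15000.0],
    "max_salary": [9000.0, 16000.0, 6000.0, 40000.0, 30000.0],
}

# Expectation type -> statistic used to pin its bounds.
STATS = {
    "expect_column_min_to_be_between": min,
    "expect_column_max_to_be_between": max,
    "expect_column_mean_to_be_between": mean,
    "expect_column_median_to_be_between": median,
}


def pinned_expectations(column: str, values: List[float]) -> List[dict]:
    """Hypothetical profiler step: both bounds equal the statistic observed in the sample."""
    expectations = []
    for expectation_type, stat in STATS.items():
        observed = stat(values)
        expectations.append({
            "expectation_type": expectation_type,
            "kwargs": {"column": column, "min_value": observed, "max_value": observed},
        })
    return expectations


suite = [e for column, values in columns.items() for e in pinned_expectations(column, values)]
for e in suite:
    kw = e["kwargs"]
    print(e["expectation_type"], kw["column"], kw["min_value"], kw["max_value"])
```

Bounds pinned this tightly fail as soon as a single row is added or changed, so they usually need widening before the suite is used to validate new data.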


In [224]:
new_expectation_suite_name = 'jobs.jobs_extended'
ignored_columns = ['job_title']
profiler2 = UserConfigurableProfiler(profile_dataset=validator,ignored_columns=ignored_columns)
prof_suite2 = profiler2.build_suite()
prof_suite2.expectation_suite_name=new_expectation_suite_name
print('Expectation saved as Expectations/{}.yml'.format(new_expectation_suite_name.replace('.','/')))
context.add_or_update_expectation_suite(expectation_suite=prof_suite2)

2023-05-16T23:02:52+0300 - INFO - 	5 expectation(s) included in expectation_suite.
2023-05-16T23:02:55+0300 - INFO - 	26 expectation(s) included in expectation_suite.
Creating an expectation suite with the following expectations:

Table-Level Expectations
expect_table_columns_to_match_ordered_list
expect_table_row_count_to_be_between

Expectations by Column
Column Name: job_id | Column Data Type: INT | Cardinality: UNIQUE
expect_column_max_to_be_between
expect_column_mean_to_be_between
expect_column_median_to_be_between
expect_column_min_to_be_between
expect_column_proportion_of_unique_values_to_be_between
expect_column_quantile_values_to_be_between
expect_column_values_to_be_in_type_list
expect_column_values_to_not_be_null


Column Name: max_salary | Column Data Type: FLOAT | Cardinality: VERY_FEW
expect_column_max_to_be_between
expect_column_mean_to_be_between
expect_column_min_to_be_between
expect_column_proportion_of_unique_values_to_be_between
expect_column_quantile_values_to_be_between
expect_column_values_to_be_in_set
expect_column_values_to_be_in_type_list
ex

{
  "data_asset_type": null,
  "expectations": [
    {
      "kwargs": {
        "column_list": [
          "job_id",
          "job_title",
          "min_salary",
          "max_salary"
        ]
      },
      "meta": {},
      "expectation_type": "expect_table_columns_to_match_ordered_list"
    },
    {
      "kwargs": {
        "min_value": 19,
        "max_value": 19
      },
      "meta": {},
      "expectation_type": "expect_table_row_count_to_be_between"
    },
    {
      "kwargs": {
        "min_value": 1,
        "max_value": 1,
        "column": "job_id"
      },
      "meta": {},
      "expectation_type": "expect_column_min_to_be_between"
    },
    {
      "kwargs": {
        "min_value": 19,
        "max_value": 19,
        "column": "job_id"
      },
      "meta": {},
      "expectation_type": "expect_column_max_to_be_between"
    },
    {
      "kwargs": {
        "min_value": 10.0,
        "max_value": 10.0,
        "column": "job_id"
      },
      "meta": {},
     

In [227]:
# Validate data
retrieved_checkpoint = context.get_checkpoint(name="trn_checkpoint")
checkpoint_result = retrieved_checkpoint.run()

2023-05-16T23:03:36+0300 - INFO - SQLDatasource.dict() - substituting config values
2023-05-16T23:03:36+0300 - INFO - batch_slice: None was parsed to: slice(0, None, None)
2023-05-16T23:03:36+0300 - INFO - 	5 expectation(s) included in expectation_suite.

In [220]:
# Uncomment to rebuild Data Docs manually before opening them
#context.build_data_docs()
context.open_data_docs()
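A checkpoint run succeeds only if every expectation in every validation it runs passes; in Great Expectations the overall flag is available as `checkpoint_result.success`, and each validation result carries a `statistics` dictionary with keys including `evaluated_expectations`, `successful_expectations`, `unsuccessful_expectations` and `success_percent`. The sketch below mimics that bookkeeping in plain Python on hypothetical per-expectation outcomes; it does not call Great Expectations, and `summarize` is a hypothetical name.

```python
from typing import Dict, List, Optional


def summarize(results: List[bool]) -> Dict[str, Optional[float]]:
    """Roll per-expectation outcomes up into validation-style statistics."""
    evaluated = len(results)
    successful = sum(results)  # True counts as 1
    return {
        "evaluated_expectations": evaluated,
        "successful_expectations": successful,
        "unsuccessful_expectations": evaluated - successful,
        # No percentage when nothing was evaluated.
        "success_percent": 100.0 * successful / evaluated if evaluated else None,
    }


# Hypothetical outcomes for a five-expectation suite such as jobs.jobs_tests.
outcomes = [True, True, True, True, False]
stats = summarize(outcomes)
overall_success = stats["unsuccessful_expectations"] == 0
print(stats, overall_success)
```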