Most of the core Great Expectations expectations are built using expectation decorators. Applying those decorators to existing logic makes it easy to bring custom integrations into our pipeline tests.
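To make the decorator idea concrete, here is a minimal pure-Python sketch of how a decorator can turn a per-value check into an expectation that returns a result dictionary. The `column_map_expectation` decorator and `expect_values_to_be_positive` below are hypothetical stand-ins written for illustration. They are not the library's implementation, and missing-value handling is left out.

```python
from functools import wraps


def column_map_expectation(check):
    """Hypothetical stand-in: wrap a per-value check into an expectation."""

    @wraps(check)
    def expectation(values, **kwargs):
        # Apply the check to each value independently of the others.
        unexpected = [v for v in values if not check(v, **kwargs)]
        return {
            "success": len(unexpected) == 0,
            "element_count": len(values),
            "unexpected_count": len(unexpected),
            "partial_unexpected_list": unexpected[:20],
        }

    return expectation


@column_map_expectation
def expect_values_to_be_positive(value):
    # Existing business logic, reused unchanged as an expectation.
    return value > 0


result = expect_values_to_be_positive([3, 7, -1, 12])
print(result["success"], result["partial_unexpected_list"])  # False [-1]
```

The point of the pattern is that the check itself stays a one-line function, while the bookkeeping (counting, collecting unexpected values) lives in one place.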

Under the hood, great_expectations evaluates similar kinds of expectations using standard logic, including:

- column_map_expectations, which apply their condition to each value in a column independently of other values
- column_aggregate_expectations, which apply their condition to an aggregate value or values from the column

In general, if a column is empty, a column_map_expectation will return True (vacuously), whereas a column_aggregate_expectation will return False (since no aggregate value could be computed).
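The difference on an empty column can be reproduced in a few lines of plain Python. This is only a sketch of the reasoning, not the library's code; `map_expectation`, `aggregate_expectation`, and `is_positive` are names made up for the example.

```python
from statistics import mean


def map_expectation(values, condition):
    # all() over an empty sequence is True: no value violates the condition.
    return all(condition(v) for v in values)


def aggregate_expectation(values, aggregate, condition):
    # With no values there is nothing to aggregate, so report failure.
    if not values:
        return False
    return condition(aggregate(values))


def is_positive(x):
    return x > 0


empty_column = []
full_column = [4, 8, 15]

print(map_expectation(empty_column, is_positive))              # True
print(aggregate_expectation(empty_column, mean, is_positive))  # False
print(map_expectation(full_column, is_positive))               # True
print(aggregate_expectation(full_column, mean, is_positive))   # True
```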

For this reason, it is usually important to add an expectation about element counts, so that the overall set of expectations captures the full set of constraints we expect.
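A small sketch of why the count check matters: a map-style check alone lets an empty column through, while pairing it with a count check makes the combined result fail. The helper names and the value set `{"a", "b"}` are hypothetical. In great_expectations itself the row-count role is played, as far as I know, by `expect_table_row_count_to_be_between`.

```python
def expect_values_to_be_in_set(values, value_set):
    # Map-style check: vacuously true for an empty column.
    return all(v in value_set for v in values)


def expect_row_count_to_be_at_least(values, min_value):
    # Count check: the only one of the two that notices an empty column.
    return len(values) >= min_value


def run_suite(values):
    results = {
        "values_in_set": expect_values_to_be_in_set(values, {"a", "b"}),
        "row_count": expect_row_count_to_be_at_least(values, 1),
    }
    # The suite succeeds only if every individual check succeeds.
    results["success"] = all(results.values())
    return results


print(run_suite([]))
print(run_suite(["a", "b", "a"]))
```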

# Comes Great Expectations

great_expectations was the first Python package I saw that was perfect for this task.

#### Installation

pip install --user great_expectations

In [1]:
import great_expectations as ge


Steps:

- Loading data
- Setting an expectation on a pandas dataframe
- Exporting and importing expectations

In [2]:
import great_expectations as ge
import pandas as pd


issubclass(ge.dataset.PandasDataset, pd.DataFrame)

True

There are two ways to load a dataframe into great_expectations:

In [6]:
# Read from a csv
df_ge = ge.read_csv('./train.csv')


# Convert from pandas dataframe

df = pd.read_csv("./train.csv")

df_ge = ge.dataset.PandasDataset(df)

## Setting and getting expectations

In [11]:
permit_subset = ['z-30', 'type']
df_excavation_and_wireless = df_ge[df_ge['z-1'].isin(permit_subset)]

In [12]:
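# Expecting this dataframe to contain only one permit type (Excavation) should fail; the result below nevertheless reports success because the filtered dataframe is empty (element_count is 0), so the check passes vacuously.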

fail_type = ['z-30']
df_excavation_and_wireless.expect_column_values_to_be_in_set('z-1', fail_type)

{
  "success": true,
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "meta": {},
  "result": {
    "element_count": 0,
    "missing_count": 0,
    "missing_percent": null,
    "unexpected_count": 0,
    "unexpected_percent": null,
    "unexpected_percent_total": null,
    "unexpected_percent_nonmissing": null,
    "partial_unexpected_list": []
  }
}
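The `null` percentages in this result follow from `element_count` being 0: there is no denominator to divide by. Below is a minimal sketch of how such summary fields could be derived. It assumes percentages on a 0-100 scale and treats `None` as the missing value. The `summarize` helper is hypothetical and only mimics the shape of the output above.

```python
def summarize(values, value_set):
    """Hypothetical helper that mimics the shape of the result above."""
    element_count = len(values)
    missing_count = sum(1 for v in values if v is None)
    nonmissing = [v for v in values if v is not None]
    unexpected = [v for v in nonmissing if v not in value_set]

    def percent(part, whole):
        # No denominator means no percentage, which serializes to null.
        return None if whole == 0 else 100.0 * part / whole

    return {
        "success": len(unexpected) == 0,
        "result": {
            "element_count": element_count,
            "missing_count": missing_count,
            "missing_percent": percent(missing_count, element_count),
            "unexpected_count": len(unexpected),
            "unexpected_percent_total": percent(len(unexpected), element_count),
            "unexpected_percent_nonmissing": percent(len(unexpected), len(nonmissing)),
            "partial_unexpected_list": unexpected[:20],
        },
    }


empty = summarize([], ["z-30"])
print(empty["success"], empty["result"]["unexpected_percent_total"])  # True None

mixed = summarize(["z-30", "type", None, "z-30"], ["z-30"])
print(mixed["success"], mixed["result"]["unexpected_percent_total"])  # False 25.0
```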

In [13]:
df_excavation_and_wireless.get_expectations_config()



{
  "ge_cloud_id": null,
  "data_asset_type": "Dataset",
  "expectation_suite_name": "default",
  "meta": {
    "great_expectations_version": "0.13.38"
  },
  "expectations": [
    {
      "meta": {},
      "kwargs": {
        "column": "z-1",
        "value_set": [
          "z-30"
        ]
      },
      "expectation_type": "expect_column_values_to_be_in_set",
      "ge_cloud_id": null
    }
  ]
}

In [15]:
success_type = ['z-30', 'type']
df_excavation_and_wireless.expect_column_values_to_be_in_set('z-1', success_type)
df_excavation_and_wireless.get_expectations_config()



{
  "ge_cloud_id": null,
  "data_asset_type": "Dataset",
  "expectation_suite_name": "default",
  "meta": {
    "great_expectations_version": "0.13.38"
  },
  "expectations": [
    {
      "meta": {},
      "kwargs": {
        "column": "z-1",
        "value_set": [
          "z-30",
          "type"
        ]
      },
      "expectation_type": "expect_column_values_to_be_in_set",
      "ge_cloud_id": null
    }
  ]
}
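Comparing the two configurations, the second call did not append a new entry: the suite still holds a single `expect_column_values_to_be_in_set` for column `z-1`, now with the wider `value_set`. Here is a rough sketch of that replace-on-redeclare behavior, assuming expectations are matched on type and column. The `Suite` class is hypothetical, not the library's class.

```python
class Suite:
    """Hypothetical container that keeps one expectation per (type, column)."""

    def __init__(self):
        self._expectations = {}

    def add(self, expectation_type, column, **kwargs):
        # Same type and column: overwrite the earlier declaration.
        self._expectations[(expectation_type, column)] = {
            "expectation_type": expectation_type,
            "kwargs": {"column": column, **kwargs},
        }

    def to_config(self):
        return {"expectations": list(self._expectations.values())}


suite = Suite()
suite.add("expect_column_values_to_be_in_set", "z-1", value_set=["z-30"])
suite.add("expect_column_values_to_be_in_set", "z-1", value_set=["z-30", "type"])

config = suite.to_config()
print(len(config["expectations"]))                        # 1
print(config["expectations"][0]["kwargs"]["value_set"])   # ['z-30', 'type']
```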

In [16]:
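# as dictionary
config = df_excavation_and_wireless.get_expectations_config()
# as json file
# Calling save_expectations_config here raises
#   AttributeError: 'PandasDataset' object has no attribute 'save_expectations_config'
# in this version (0.13.38), where the method is named save_expectation_suite.
df_excavation_and_wireless.save_expectation_suite(filepath='ew_config.json')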
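The steps listed earlier mention importing as well as exporting. Since the configuration is a plain dictionary, a round trip through a JSON file can be sketched with the standard library alone. The dictionary below is a trimmed copy of the configuration printed above, and the temporary directory is used only for illustration.

```python
import json
import os
import tempfile

config = {
    "expectation_suite_name": "default",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {"column": "z-1", "value_set": ["z-30", "type"]},
            "meta": {},
        }
    ],
}

path = os.path.join(tempfile.mkdtemp(), "ew_config.json")

# Export: write the configuration dictionary as JSON.
with open(path, "w") as f:
    json.dump(config, f, indent=2)

# Import: read it back into a dictionary.
with open(path) as f:
    loaded = json.load(f)

print(loaded == config)  # True
```

The reloaded dictionary can then be handed back to whatever validation entry point your great_expectations version provides.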