# Great Expectations Basics

This example walks through basic expectations using crop data from the Food and Agriculture Organization of the United Nations (FAO).

Data are available here: http://www.fao.org/faostat/en/#home

In [1]:
import pandas as pd

import great_expectations as gx

In [3]:
df = pd.read_csv("../data/FAO-Rice-Production-Asia.csv")

## Exploratory Data Analysis

We typically need to investigate some properties of our data to understand what we can do with it. Jupyter makes that easy, and we will make extensive use of its features, including autocomplete.

In [4]:
df.head()

Unnamed: 0,Domain Code,Domain,Area Code,Area,Element Code,Element,Item Code,Item,Year Code,Year,Unit,Value,Flag,Flag Description
0,QC,Crops,2,Afghanistan,5419,Yield,27,"Rice, paddy",1961,1961,hg/ha,15190,Fc,Calculated data
1,QC,Crops,2,Afghanistan,5419,Yield,27,"Rice, paddy",1962,1962,hg/ha,15190,Fc,Calculated data
2,QC,Crops,2,Afghanistan,5419,Yield,27,"Rice, paddy",1963,1963,hg/ha,15190,Fc,Calculated data
3,QC,Crops,2,Afghanistan,5419,Yield,27,"Rice, paddy",1964,1964,hg/ha,17273,Fc,Calculated data
4,QC,Crops,2,Afghanistan,5419,Yield,27,"Rice, paddy",1965,1965,hg/ha,17273,Fc,Calculated data


### Reshape data

In [5]:
pivoted = df.pivot(index="Year", columns="Area", values="Value")

In [6]:
pivoted.head()

Year,Afghanistan,Azerbaijan,Bangladesh,Bhutan,Brunei Darussalam,Cambodia,"China, Hong Kong SAR","China, Taiwan Province of","China, mainland",Democratic People's Republic of Korea,...,Saudi Arabia,Sri Lanka,Syrian Arab Republic,Tajikistan,Thailand,Timor-Leste,Turkey,Turkmenistan,Uzbekistan,Viet Nam
1961,15190.0,,17005.0,20000.0,17306.0,10921.0,20168.0,32663.0,20434.0,43071.0,...,28000.0,18626.0,25000.0,,16585.0,19949.0,39542.0,,,18966.0
1962,15190.0,,15302.0,20000.0,18527.0,8920.0,17201.0,33723.0,23408.0,43116.0,...,26500.0,19539.0,26100.0,,17202.0,17000.0,33951.0,,,19937.0
1963,15190.0,,17690.0,20000.0,13711.0,11742.0,18408.0,33583.0,26642.0,46056.0,...,28492.0,19547.0,24473.0,,18725.0,15385.0,39400.0,,,21400.0
1964,17273.0,,17070.0,20000.0,13235.0,11611.0,13962.0,37244.0,28062.0,45335.0,...,25000.0,19962.0,20000.0,,18384.0,13652.0,47629.0,,,19441.0
1965,17273.0,,16827.0,20000.0,11737.0,10666.0,24362.0,38454.0,29441.0,39688.0,...,23333.0,17696.0,19091.0,,17805.0,16599.0,43340.0,,,19414.0
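
To make the reshape step concrete, here is a minimal plain-Python sketch of what a pivot does: each distinct `Year` becomes a row, each distinct `Area` becomes a column, and any (year, area) pair with no record is left missing. That is why columns such as Azerbaijan show empty cells for the early years. The helper `pivot_records` and the toy records are hypothetical and exist only for illustration; they are not part of pandas or of the FAO data.

```python
import math

# Toy long-format records (made-up values, not the FAO data).
records = [
    {"Year": 1961, "Area": "A", "Value": 10.0},
    {"Year": 1961, "Area": "B", "Value": 20.0},
    {"Year": 1962, "Area": "A", "Value": 11.0},
    # There is no 1962 record for "B", so that cell stays missing.
]


def pivot_records(rows, index, columns, values):
    """Return {index_value: {column_value: value}}, with NaN for absent pairs."""
    index_keys = sorted({r[index] for r in rows})
    column_keys = sorted({r[columns] for r in rows})
    table = {i: {c: math.nan for c in column_keys} for i in index_keys}
    seen = set()
    for r in rows:
        key = (r[index], r[columns])
        # A pivot needs exactly one value per (index, column) pair.
        if key in seen:
            raise ValueError(f"duplicate entry for {key}")
        seen.add(key)
        table[r[index]][r[columns]] = r[values]
    return table


wide = pivot_records(records, index="Year", columns="Area", values="Value")
for year, row in wide.items():
    print(year, row)
```

Like this sketch, `DataFrame.pivot` raises a `ValueError` when the same index/column pair occurs more than once, so it only works when each (Year, Area) combination is unique in the input.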


### Initialize the new dataset to work with Great Expectations

In [8]:
df = gx.from_pandas(pivoted)

In [9]:
df.expect_column_mean_to_be_between("Afghanistan", 15000, 25000)

{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "result": {
    "observed_value": 21998.314814814814,
    "element_count": 54,
    "missing_count": null,
    "missing_percent": null
  },
  "success": true,
  "meta": {}
}
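
As a rough illustration of what this expectation checks, the sketch below computes the mean of the non-missing values and compares it with the bounds. The function `mean_between` is a hypothetical stand-in, not the Great Expectations implementation. It assumes inclusive bounds, and its handling of an all-missing column (reported as a failure) is a choice made only for this sketch. The first list holds the five Afghanistan values visible in the `head()` output above, so its mean differs from the full-column `observed_value`.

```python
import math


def mean_between(values, min_value, max_value):
    """Check whether the mean of the non-missing values lies in [min_value, max_value]."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        # Assumption for this sketch only: an all-missing column counts as a failure.
        return {"success": False, "observed_value": None}
    observed = sum(present) / len(present)
    return {"success": min_value <= observed <= max_value, "observed_value": observed}


# The first five Afghanistan values shown by head() (1961-1965).
afghanistan_head = [15190, 15190, 15190, 17273, 17273]
# Made-up column with a missing value, to show that gaps are skipped.
toy_with_gaps = [math.nan, 30000.0, 33000.0]

ok = mean_between(afghanistan_head, 15000, 25000)
bad = mean_between(toy_with_gaps, 15000, 25000)
print(ok)
print(bad)
```

The second call fails because the mean of the two present values, 31500, lies above the upper bound. The missing entry does not pull the mean toward zero.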

In [10]:
### We might want to make expectations about lots of columns

In [11]:
for column in df.columns:
    # Apply the same mean bounds to every column and report only the failures.
    result = df.expect_column_mean_to_be_between(column, 15000, 25000)
    if not result["success"]:
        print(column)
        print(result)

Azerbaijan
{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "result": {
    "observed_value": 31669.82608695652,
    "element_count": 54,
    "missing_count": 31,
    "missing_percent": 57.407407407407405
  },
  "success": false,
  "meta": {},
  "expectation_config": {
    "expectation_type": "expect_column_mean_to_be_between",
    "kwargs": {
      "column": "Azerbaijan",
      "min_value": 15000,
      "max_value": 25000,
      "result_format": "BASIC"
    },
    "meta": {}
  }
}
Bangladesh
{
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "result": {
    "observed_value": 26318.537037037036,
    "element_count": 54,
    "missing_count": null,
    "missing_percent": null
  },
  "success": false,
  "meta": {},
  "expectation_config": {
    "expectation_type": "expect_column_mean_to_be_between",
    "kwargs": {
      "column": "Banglades

### Now, we can view and save the expectations that we have created

In [14]:
print(df.get_expectation_suite())

{
  "data_asset_type": "Dataset",
  "expectation_suite_name": "default",
  "expectations": [
    {
      "expectation_type": "expect_column_mean_to_be_between",
      "kwargs": {
        "column": "Afghanistan",
        "min_value": 15000,
        "max_value": 25000
      },
      "meta": {}
    },
    {
      "expectation_type": "expect_column_mean_to_be_between",
      "kwargs": {
        "column": "Bhutan",
        "min_value": 15000,
        "max_value": 25000
      },
      "meta": {}
    },
    {
      "expectation_type": "expect_column_mean_to_be_between",
      "kwargs": {
        "column": "Cambodia",
        "min_value": 15000,
        "max_value": 25000
      },
      "meta": {}
    },
    {
      "expectation_type": "expect_column_mean_to_be_between",
      "kwargs": {
        "column": "China, Hong Kong SAR",
        "min_value": 15000,
        "max_value": 25000
      },
      "meta": {}
    },
    {
      "expectation_type": "expect_column_mean_to_be_between",
      "kwa