# Great Expectations Task

## 1. Install Great Expectations Library


In [1]:
!pip uninstall pandas
!pip uninstall numpy
!pip uninstall altair
!pip install great_expectations

Found existing installation: pandas 2.2.2
Uninstalling pandas-2.2.2:
  Would remove:
    /usr/local/lib/python3.11/dist-packages/pandas-2.2.2.dist-info/*
    /usr/local/lib/python3.11/dist-packages/pandas/*
Proceed (Y/n)? y
  Successfully uninstalled pandas-2.2.2
Found existing installation: numpy 2.0.2
Uninstalling numpy-2.0.2:
  Would remove:
    /usr/local/bin/f2py
    /usr/local/bin/numpy-config
    /usr/local/lib/python3.11/dist-packages/numpy-2.0.2.dist-info/*
    /usr/local/lib/python3.11/dist-packages/numpy.libs/libgfortran-040039e1-0352e75f.so.5.0.0
    /usr/local/lib/python3.11/dist-packages/numpy.libs/libquadmath-96973f99-934c22de.so.0.0.0
    /usr/local/lib/python3.11/dist-packages/numpy.libs/libscipy_openblas64_-99b71e71.so
    /usr/local/lib/python3.11/dist-packages/numpy/*
Proceed (Y/n)? y
  Successfully uninstalled numpy-2.0.2
Found existing installation: altair 5.5.0
Uninstalling altair-5.5.0:
  Would remove:
    /usr/local/lib/python3.11/dist-packages/altair-5.5.0.dis

## 2. Import Necessary Libraries

In [1]:
import pandas as pd
import great_expectations as gx



## 3. Load Labels.csv

Download [Labels.csv](https://github.com/zubxxr/SOFE3980U-Lab5/blob/main/Labels.csv), upload it into this notebook, and then load the file.

In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/PeterAbe/SOFE-3980-Lab-5/main/Labels.csv",
                 names=["Timestamp", "Car1_Location_X", "Car1_Location_Y", "Car1_Location_Z", "Car2_Location_X", "Car2_Location_Y",
                        "Car2_Location_Z", "Occluded_Image_view", "Occluding_Car_view", "Ground_Truth_View", "pedestrianLocationX_TopLeft", "pedestrianLocationY_TopLeft",
                        "pedestrianLocationX_BottomRight", "pedestrianLocationY_BottomRight"])

## 4. Preview the Dataset

In [3]:
df.head()

| Unnamed: 0 | Timestamp | Car1_Location_X | Car1_Location_Y | Car1_Location_Z | Car2_Location_X | Car2_Location_Y | Car2_Location_Z | Occluded_Image_view | Occluding_Car_view | Ground_Truth_View | pedestrianLocationX_TopLeft | pedestrianLocationY_TopLeft | pedestrianLocationX_BottomRight | pedestrianLocationY_BottomRight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1736796157 | -51.402977 | 143 | 0.596902 | -59.32027 | 140 | 0.596902 | A_001.png | B_001.png | C_001.png | 593 | 361 | 610 | 410 |
| 1 | 1736796167 | -53.819637 | 143 | 0.596902 | -59.196568 | 140 | 0.596902 | A_002.png | B_002.png | C_002.png | 579 | 368 | 594 | 415 |
| 2 | 1736796178 | -50.239144 | 143 | 0.596902 | -56.744479 | 140 | 0.596902 | A_003.png | B_003.png | C_003.png | 854 | 720 | 854 | 720 |
| 3 | 1736796188 | -53.70722 | 143 | 0.596902 | -57.30938 | 140 | 0.596902 | A_004.png | B_004.png | C_004.png | 549 | 368 | 567 | 425 |
| 4 | 1736796198 | -52.053721 | 143 | 0.596902 | -59.545897 | 140 | 0.596902 | A_005.png | B_005.png | C_005.png | 524 | 368 | 537 | 413 |


## 5. Set Up Great Expectations Context and Data Source

In [4]:
context = gx.get_context()
data_source = context.data_sources.add_pandas("pandas")
data_asset = data_source.add_dataframe_asset(name="pd dataframe asset")

INFO:great_expectations.data_context.types.base:Created temporary directory '/tmp/tmp3td_udb5' for ephemeral docs site


## 6. Define and Create a Data Batch

In [5]:
batch_definition = data_asset.add_batch_definition_whole_dataframe("batch definition")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

## 7. Define Three Expectations for Column Values

Using this [link](https://greatexpectations.io/expectations/), choose three expectation functions and apply them to the labels dataset in a relevant manner.

Replace the `ExpectColumnValuesToBeBetween` function shown below with the functions you select from the link.

To check the format and parameters each function requires, click "See more" on that function.

In [None]:
## Original Function
expectation = gx.expectations.ExpectColumnValuesToBeBetween(
    column="column", min_value=0, max_value=20
)

## Example Function

## This function only requires a column parameter, and not a max or min value
expectation = gx.expectations.ExpectColumnValuesToBeUnique(
    column="column"
)

### Expectation 1

In [6]:
expectation_1 = gx.expectations.ExpectColumnValuesToBeBetween(
    column="pedestrianLocationX_TopLeft", min_value=500, max_value=800
)
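To make the intent of this expectation concrete, here is a minimal plain-Python sketch of the same inclusive range check, applied only to the five `pedestrianLocationX_TopLeft` values visible in the preview from section 4. The helper `find_out_of_range` is a hypothetical name used for illustration; it is not part of Great Expectations and does not reproduce how the library computes its metrics.

```python
# Values of pedestrianLocationX_TopLeft from the five previewed rows (section 4).
preview_values = [593, 579, 854, 549, 524]


def find_out_of_range(values, min_value, max_value):
    """Return (index, value) pairs that fall outside the inclusive range.

    Hypothetical helper for illustration only; not a Great Expectations API.
    """
    return [
        (index, value)
        for index, value in enumerate(values)
        if not (min_value <= value <= max_value)
    ]


unexpected = find_out_of_range(preview_values, min_value=500, max_value=800)
print(unexpected)  # [(2, 854)]
```

Only row 2 of the preview falls outside the 500 to 800 range, so the full dataset should be expected to fail this expectation unless such rows are filtered out first.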

### Validate Data Against Expectation 1

In [7]:
validation_result_1 = batch.validate(expectation_1)
print(validation_result_1)

{
  "success": false,
  "expectation_config": {
    "type": "expect_column_values_to_be_between",
    "kwargs": {
      "batch_id": "pandas-pd dataframe asset",
      "column": "pedestrianLocationX_TopLeft",
      "min_value": 500.0,
      "max_value": 800.0
    },
    "meta": {}
  },
  "result": {
    "element_count": 121,
    "unexpected_count": 2,
    "unexpected_percent": 1.6528925619834711,
    "partial_unexpected_list": [
      854,
      854
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 1.6528925619834711,
    "unexpected_percent_nonmissing": 1.6528925619834711,
    "partial_unexpected_counts": [
      {
        "value": 854,
        "count": 2
      }
    ],
    "partial_unexpected_index_list": [
      2,
      8
    ]
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
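The printed result can be condensed into the few fields that matter when deciding what to do next. The sketch below works on a trimmed copy of the JSON shown above, parsed into a plain dictionary, rather than on the Great Expectations result object; the `summarize` helper and the keys of its return value are assumptions made for illustration.

```python
import json

# Trimmed copy of the JSON printed above; this is a plain dict,
# not the Great Expectations result object itself.
printed_result = json.loads("""
{
  "success": false,
  "result": {
    "element_count": 121,
    "unexpected_count": 2,
    "partial_unexpected_index_list": [2, 8]
  }
}
""")


def summarize(result_dict):
    """Condense a validation result dict (hypothetical helper)."""
    stats = result_dict["result"]
    # Share of rows that violated the expectation, as a percentage.
    percent = 100 * stats["unexpected_count"] / stats["element_count"]
    return {
        "passed": result_dict["success"],
        "unexpected_percent": percent,
        "rows_to_inspect": stats["partial_unexpected_index_list"],
    }


summary = summarize(printed_result)
print(summary)
```

The recomputed percentage, 2 out of 121 rows, agrees with the `unexpected_percent` of roughly 1.65 reported in the output, and the row indices 2 and 8 point to the records worth inspecting by hand.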


### Expectation 2

In [8]:
expectation_2 = gx.expectations.ExpectColumnValuesToBeUnique(
    column="Ground_Truth_View"
)

### Validate Data Against Expectation 2

In [9]:
validation_result_2 = batch.validate(expectation_2)
print(validation_result_2)

{
  "success": false,
  "expectation_config": {
    "type": "expect_column_values_to_be_unique",
    "kwargs": {
      "batch_id": "pandas-pd dataframe asset",
      "column": "Ground_Truth_View"
    },
    "meta": {}
  },
  "result": {
    "element_count": 121,
    "unexpected_count": 2,
    "unexpected_percent": 1.6528925619834711,
    "partial_unexpected_list": [
      "C_085.png",
      "C_085.png"
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 1.6528925619834711,
    "unexpected_percent_nonmissing": 1.6528925619834711,
    "partial_unexpected_counts": [
      {
        "value": "C_085.png",
        "count": 2
      }
    ],
    "partial_unexpected_index_list": [
      84,
      85
    ]
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
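The result above reports an `unexpected_count` of 2 even though only one file name is duplicated, because every occurrence of a repeated value is flagged, not just the second one. The plain-Python sketch below illustrates that counting rule; the list `sample_views` is a hypothetical four-row excerpt (only the repeated `C_085.png` comes from the output above), and `find_non_unique` is an illustrative helper, not a Great Expectations function.

```python
from collections import Counter

# Hypothetical excerpt: only the repeated "C_085.png" is taken from the
# validation output; the neighbouring names are made up for illustration.
sample_views = ["C_084.png", "C_085.png", "C_085.png", "C_087.png"]


def find_non_unique(values):
    """Return (index, value) for every occurrence of a repeated value."""
    counts = Counter(values)
    return [
        (index, value)
        for index, value in enumerate(values)
        if counts[value] > 1
    ]


duplicates = find_non_unique(sample_views)
print(duplicates)  # [(1, 'C_085.png'), (2, 'C_085.png')]
```

Both positions holding the repeated name are returned, which mirrors the two adjacent indices (84 and 85) listed in `partial_unexpected_index_list`.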


### Expectation 3

In [10]:
expectation_3 = gx.expectations.ExpectColumnValuesToBeUnique(
    column="Occluding_Car_view"
)

### Validate Data Against Expectation 3

In [11]:
validation_result_3 = batch.validate(expectation_3)
print(validation_result_3)

{
  "success": false,
  "expectation_config": {
    "type": "expect_column_values_to_be_unique",
    "kwargs": {
      "batch_id": "pandas-pd dataframe asset",
      "column": "Occluding_Car_view"
    },
    "meta": {}
  },
  "result": {
    "element_count": 121,
    "unexpected_count": 2,
    "unexpected_percent": 1.6528925619834711,
    "partial_unexpected_list": [
      "B_085.png",
      "B_085.png"
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 1.6528925619834711,
    "unexpected_percent_nonmissing": 1.6528925619834711,
    "partial_unexpected_counts": [
      {
        "value": "B_085.png",
        "count": 2
      }
    ],
    "partial_unexpected_index_list": [
      84,
      85
    ]
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
