# Great Expectations Task

## 1. Install Great Expectations Library


In [1]:
%pip install great_expectations==0.17.14


Collecting great_expectations==0.17.14
  Using cached great_expectations-0.17.14-py3-none-any.whl.metadata (8.6 kB)
Using cached great_expectations-0.17.14-py3-none-any.whl (5.4 MB)
Installing collected packages: great_expectations
  Attempting uninstall: great_expectations
    Found existing installation: great-expectations 1.3.11
    Uninstalling great-expectations-1.3.11:
      Successfully uninstalled great-expectations-1.3.11
Successfully installed great_expectations-0.17.14
Note: you may need to restart the kernel to use updated packages.





## 2. Import Necessary Libraries

In [2]:
import pandas as pd
import great_expectations as gx

## 3. Load Labels.csv

Download [Labels.csv](https://github.com/zubxxr/SOFE3980U-Lab5/blob/main/Labels.csv), upload it to this notebook's working directory, and then load the file.

In [3]:
df = pd.read_csv('Labels.csv')

## 4. Preview the Dataset

In [4]:
df.head()

Unnamed: 0,Timestamp,Car1_Location_X,Car1_Location_Y,Car1_Location_Z,Car2_Location_X,Car2_Location_Y,Car2_Location_Z,Occluded_Image_view,Occluding_Car_view,Ground_Truth_View,pedestrianLocationX_TopLeft,pedestrianLocationY_TopLeft,pedestrianLocationX_BottomRight,pedestrianLocationY_BottomRight
0,1736796157,-51.402977,143,0.596902,-59.32027,140,0.596902,A_001.png,B_001.png,C_001.png,593,361,610,410
1,1736796167,-53.819637,143,0.596902,-59.196568,140,0.596902,A_002.png,B_002.png,C_002.png,579,368,594,415
2,1736796178,-50.239144,143,0.596902,-56.744479,140,0.596902,A_003.png,B_003.png,C_003.png,854,720,854,720
3,1736796188,-53.70722,143,0.596902,-57.30938,140,0.596902,A_004.png,B_004.png,C_004.png,549,368,567,425
4,1736796198,-52.053721,143,0.596902,-59.545897,140,0.596902,A_005.png,B_005.png,C_005.png,524,368,537,413
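
The `Unnamed: 0` column in the preview usually means the CSV was written with its row index as a first column that has no header name. A minimal sketch, assuming that is the case for Labels.csv: passing `index_col=0` makes pandas treat that column as the index. The inline sample is a trimmed stand-in for the file, not the file itself.

```python
import io

import pandas as pd

# Two rows shaped like the preview above (most columns trimmed for brevity).
# Assumption: the header starts with an empty field, which pandas names "Unnamed: 0".
sample_csv = io.StringIO(
    ",Timestamp,Car1_Location_X\n"
    "0,1736796157,-51.402977\n"
    "1,1736796167,-53.819637\n"
)

# index_col=0 uses the first column as the row index instead of a data column
sample_df = pd.read_csv(sample_csv, index_col=0)
print(list(sample_df.columns))
```

The rest of this notebook keeps the column as loaded, so this is optional cleanup.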


## 5. Set Up Great Expectations Context and Data Source

In [5]:
# Step 5: Create a Data Context, the entry point to the Great Expectations API
context = gx.get_context()

# Add a pandas datasource using the fluent API of the pinned 0.17.x release
datasource = context.sources.add_pandas(name="my_pandas_datasource")

## 6. Define and Create a Data Batch

In [10]:
# Step 6: Register a DataFrame asset and build a batch request for the loaded data
asset = datasource.add_dataframe_asset(name="labels_asset_v2")
batch_request = asset.build_batch_request(dataframe=df)

# Get a validator, which runs expectations against the batch
validator = context.get_validator(batch_request=batch_request)


## 7. Define Three Expectations for Column Values

Using the [Expectations Gallery](https://greatexpectations.io/expectations/), choose three expectation functions and apply them to the labels dataset in a relevant manner.

Replace the template's `ExpectColumnValuesToBeBetween` function with the functions you select from the gallery.

Click "See more" on a function to check the format and parameters it requires.

In [11]:
# Original function from the template
expectation_1 = validator.expect_column_values_to_be_between(
    column="Car1_Location_X",
    min_value=0,
    max_value=500
)

expectation_2 = validator.expect_column_values_to_match_regex(
    column="Ground_Truth_View",
    regex=r".*\.png"
)

# This expectation takes only a column parameter; no min or max value is needed
expectation_3 = validator.expect_column_values_to_be_unique(
    column="Timestamp"
)


### Expectation 1

In [12]:
expectation_1 = validator.expect_column_values_to_be_between(
    column="Car1_Location_X",
    min_value=0,
    max_value=500
)


### Validate Data Against Expectation 1

In [13]:
expectation_1

{
  "success": false,
  "result": {
    "element_count": 121,
    "unexpected_count": 121,
    "unexpected_percent": 100.0,
    "partial_unexpected_list": [
      -51.40297655,
      -53.81963722,
      -50.23914439,
      -53.70722021,
      -52.05372109,
      -53.93975603,
      -50.30258412,
      -53.17447194,
      -52.72667437,
      -50.18179353,
      -52.40699613,
      -52.38122971,
      -53.01906414,
      -50.85034015,
      -51.93070037,
      -50.75051989,
      -50.63015195,
      -50.69818291,
      -51.95966168,
      -50.88663347
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 100.0,
    "unexpected_percent_nonmissing": 100.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
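
Every one of the 121 values is reported as unexpected because the `Car1_Location_X` values are negative (roughly -50 to -54), while the expectation asks for values between 0 and 500. The plain-Python sketch below mimics how the count and percentage in the result arise, using the five values from the preview. `summarize_between` is a hypothetical helper written for illustration, not part of Great Expectations, and the second range of -60 to -40 is only an illustrative guess at bounds that would fit this data.

```python
def summarize_between(values, min_value, max_value):
    """Count values that fall outside the inclusive range [min_value, max_value]."""
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    return {
        "success": not unexpected,
        "element_count": len(values),
        "unexpected_count": len(unexpected),
        "unexpected_percent": 100.0 * len(unexpected) / len(values),
    }


# Car1_Location_X values from the first five rows of the preview
car1_x = [-51.402977, -53.819637, -50.239144, -53.70722, -52.053721]

# Range used in the notebook: every value is below the minimum
print(summarize_between(car1_x, 0, 500))

# Hypothetical range chosen to cover the observed values
print(summarize_between(car1_x, -60, -40))
```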

### Expectation 2

In [14]:
expectation_2 = validator.expect_column_values_to_match_regex(
    column="Ground_Truth_View",
    regex=r".*\.png"
)


### Validate Data Against Expectation 2

In [15]:
expectation_2

{
  "success": true,
  "result": {
    "element_count": 121,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
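
The pattern `.*\.png` is not anchored at the end, so it would also accept a name that merely contains `.png`. The sketch below uses Python's `re` module to compare it with an anchored alternative. It assumes the expectation matches anywhere in the string (search semantics). The `.bak` and `.jpg` names are invented for illustration and do not come from the dataset.

```python
import re

unanchored = r".*\.png"  # pattern used in the notebook
anchored = r"\.png$"     # hypothetical stricter alternative: must end in .png

# The first two names follow the dataset's naming; the last two are invented
names = ["C_001.png", "C_002.png", "C_003.png.bak", "C_004.jpg"]

for name in names:
    loose = bool(re.search(unanchored, name))
    strict = bool(re.search(anchored, name))
    print(f"{name}: unanchored={loose}, anchored={strict}")
```

Both patterns accept the real file names shown in the preview; they differ only on names like `C_003.png.bak`.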

### Expectation 3

In [16]:
expectation_3 = validator.expect_column_values_to_be_unique(
    column="Timestamp"
)


### Validate Data Against Expectation 3

In [17]:
expectation_3

{
  "success": true,
  "result": {
    "element_count": 121,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
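
An `unexpected_count` of 0 here means no `Timestamp` value occurs more than once. As a rough illustration of what a uniqueness check looks for, the sketch below flags every occurrence of a repeated value in plain Python. `find_duplicates` is a hypothetical helper, not the library's implementation, and the repeated timestamp in the second list is added artificially.

```python
from collections import Counter


def find_duplicates(values):
    """Return every occurrence of any value that appears more than once."""
    counts = Counter(values)
    return [v for v in values if counts[v] > 1]


# Timestamp values from the first five rows of the preview
timestamps = [1736796157, 1736796167, 1736796178, 1736796188, 1736796198]
print(find_duplicates(timestamps))

# Artificially repeat the first timestamp to show a failing case
with_repeat = timestamps + [1736796157]
print(find_duplicates(with_repeat))
```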