# Great Expectations Task

## 1. Install Great Expectations Library


In [13]:
!pip install great_expectations



## 2. Import Necessary Libraries

In [12]:
import pandas as pd
import great_expectations as gx

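ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

This error typically means a compiled package was built against a different NumPy version than the one currently loaded. Restarting the runtime after the install cell and re-running the imports usually resolves it.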

## 3. Load Labels.csv

Download [Labels.csv](https://github.com/zubxxr/SOFE3980U-Lab5/blob/main/Labels.csv), upload it to this notebook's working directory, and then load the file.

In [None]:
df = pd.read_csv("Labels.csv")

## 4. Preview the Dataset

In [None]:
df.head()

## 5. Set Up Great Expectations Context and Data Source

In [None]:
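# Assumes GX Core 1.x, the API that batch.validate(expectation) in the cells below belongs to.
context = gx.get_context()
datasource = context.data_sources.add_pandas(name="labels_data")

# The DataFrame itself is supplied later, when the batch is requested.
data_asset = datasource.add_dataframe_asset(name="labels_dataframe")

## 6. Define and Create a Data Batch

In [None]:
# "labels_batch" is an arbitrary name chosen for this batch definition.
batch_definition = data_asset.add_batch_definition_whole_dataframe("labels_batch")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})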

## 7. Define Three Expectations for Column Values

Using this [link](https://greatexpectations.io/expectations/), choose three expectations and apply them to the labels dataset in a way that fits the data in each column.

Replace `ExpectColumnValuesToBeBetween` in the template cell that follows with each expectation you select from the link.

To check the parameters an expectation requires, click "See more" on its entry.
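
Before picking expectations, it can help to look at each column's range and number of distinct values, since those hint at which expectation is a sensible fit. The following is a minimal pandas-only sketch, not part of the lab template. The `sample` frame is a made-up stand-in for `df` (its values are not taken from Labels.csv), and `summarize_columns` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration only.
sample = pd.DataFrame({
    "Car1_Location_X": [10, 12, 15],
    "Car1_Location_Y": [143, 143, 143],
    "Car2_Location_X": [-58, -55, -52],
})

def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return min, max, and distinct-value count for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    return pd.DataFrame({
        "min": numeric.min(),
        "max": numeric.max(),
        "n_distinct": numeric.nunique(),
    })

summary = summarize_columns(sample)
print(summary)
```

A column with a single distinct value suggests a set-based expectation, while a column with a narrow numeric range suggests a min/max or between-style expectation. In the notebook, call `summarize_columns(df)` on the real data.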

In [None]:
## Original Function
expectation = gx.expectations.ExpectColumnValuesToBeBetween(
    column="column", min_value=0, max_value=20
)

## Example Function

## This function only requires a column parameter, and not a max or min value
expectation = gx.expectations.ExpectColumnValuesToBeUnique(
    column="column"
)

### Expectation 1

In [None]:
# Expectation 1: The only distinct value in Car1_Location_Y should be 143
expectation_1 = gx.expectations.ExpectColumnDistinctValuesToEqualSet(
    column="Car1_Location_Y", value_set={143}
)

### Validate Data Against Expectation 1

In [None]:
validation_1 = batch.validate(expectation_1)
print(validation_1)
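
To sanity-check the printed result, the same condition can be approximated directly in pandas. This is a rough sketch of the idea behind the expectation, not Great Expectations' implementation. The `sample` rows are invented rather than taken from Labels.csv, `distinct_values_equal` is a hypothetical helper, and dropping nulls before comparing is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical rows for illustration; not the contents of Labels.csv.
sample = pd.DataFrame({"Car1_Location_Y": [143, 143, 143]})

def distinct_values_equal(frame, column, expected):
    """True when the column's non-null distinct values match `expected` exactly."""
    observed = set(frame[column].dropna().unique().tolist())
    return observed == set(expected)

print(distinct_values_equal(sample, "Car1_Location_Y", {143}))
```

If this check and the Great Expectations result disagree on the real `df`, the column likely contains a value other than 143 or the column name is misspelled.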

### Expectation 2

In [None]:
# Expectation 2: Ensuring the maximum value of Car2_Location_X falls within an expected range
expectation_2 = gx.expectations.ExpectColumnMaxToBeBetween(
    column="Car2_Location_X", min_value=-60, max_value=-50
)

### Validate Data Against Expectation 2

In [None]:
validation_2 = batch.validate(expectation_2)
print(validation_2)
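
The same check can be sketched in plain pandas to confirm what the expectation is testing. This is an illustrative approximation only. The `sample` values are invented, `max_is_between` is a hypothetical helper, and the bounds are treated as inclusive, which matches the expectation's default behaviour as far as this sketch assumes.

```python
import pandas as pd

# Hypothetical rows for illustration; not the contents of Labels.csv.
sample = pd.DataFrame({"Car2_Location_X": [-58, -55, -52]})

def max_is_between(frame, column, min_value, max_value):
    """True when the column's maximum lies within [min_value, max_value]."""
    observed_max = frame[column].max()
    return bool(min_value <= observed_max <= max_value)

print(max_is_between(sample, "Car2_Location_X", -60, -50))
```

With negative coordinates, `min_value` must be the more negative bound (-60 here); swapping the two makes the range empty and the check always fails.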

### Expectation 3

In [None]:
# Expectation 3: Ensuring Car1_Location_X is always greater than or equal to Car2_Location_X (or_equal=True)
expectation_3 = gx.expectations.ExpectColumnPairValuesAToBeGreaterThanB(
    column_A="Car1_Location_X", column_B="Car2_Location_X", or_equal=True
)

### Validate Data Against Expectation 3

In [None]:
validation_3 = batch.validate(expectation_3)
print(validation_3)
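
When a pair expectation fails, it is often useful to see which rows break it. The sketch below shows one way to list them with pandas alone. It is an approximation of the comparison, not Great Expectations' own logic. The `sample` rows are invented, and `rows_where_a_below_b` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical rows for illustration; not the contents of Labels.csv.
sample = pd.DataFrame({
    "Car1_Location_X": [10, 12, -60],
    "Car2_Location_X": [-58, 12, -52],
})

def rows_where_a_below_b(frame, column_a, column_b, or_equal=True):
    """Return rows that break A > B (or A >= B when or_equal is True)."""
    if or_equal:
        ok = frame[column_a] >= frame[column_b]
    else:
        ok = frame[column_a] > frame[column_b]
    return frame[~ok]

violations = rows_where_a_below_b(sample, "Car1_Location_X", "Car2_Location_X")
print(violations)
```

Setting `or_equal=False` makes the comparison strict, so rows where both columns hold the same value are also reported as violations.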