---
This notebook contains Great Expectations checks for validating the dataset.

Created by 
- Angger Rizky Firdaus
- Basyira Sabita
- Muhammad Hafidz Adityaswara

---

# Configure Great Expectations

In [None]:
# Create a data context
from great_expectations.data_context import FileDataContext

context = FileDataContext.create(project_root_dir='./')

In [None]:
# Give a name to a Datasource. This name must be unique between Datasources.
datasource_name = 'insurance_dataset'
datasource = context.sources.add_pandas(datasource_name)

# Give a name to a data asset
asset_name = 'car-insurance-table'
path_to_data = 'car_insurance.csv'
asset = datasource.add_csv_asset(asset_name, filepath_or_buffer=path_to_data)

# Build batch request
batch_request = asset.build_batch_request()

In [None]:
# Create an expectation suite
expectation_suite_name = 'expectation-car-insurance-dataset'
context.add_or_update_expectation_suite(expectation_suite_name)

# Create a validator that uses the expectation suite above
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)

# Preview the first rows to confirm the validator can read the data
validator.head()

Unnamed: 0,id,age,gender,driving_experience,education,income,credit_score,vehicle_ownership,vehicle_year,type_of_vehicle,married,children,postal_code,annual_mileage,speeding_violations,duis,past_accidents,issue,outcome
0,816393,40-64,female,20-29y,university,middle class,0.63805,0.0,after 2015,sports car,0.0,0.0,37379,11000.0,0,0,0,crack,0.0
1,251762,26-39,male,20-29y,high school,middle class,0.475741,1.0,before 2015,hatchback,1.0,0.0,10238,9000.0,0,0,0,tire flat,1.0
2,481952,40-64,male,20-29y,none,middle class,0.839817,1.0,before 2015,sedan,1.0,1.0,10238,12000.0,0,0,0,glass shatter,1.0
3,3506,40-64,male,20-29y,high school,upper class,0.682527,1.0,before 2015,sedan,0.0,1.0,92099,6000.0,1,0,0,tire flat,1.0
4,498013,40-64,female,20-29y,none,working class,0.572184,1.0,after 2015,sedan,1.0,1.0,32122,15000.0,0,0,1,crack,0.0


# Expectations 

## expect_column_values_to_be_unique on column **id**

The `expect_column_values_to_be_unique` expectation checks that every value in the 'id' column is unique.

In [None]:
# Expectation 1: expect_column_values_to_be_unique
validator.expect_column_values_to_be_unique('id')

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
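The logic behind this check can be sketched in plain Python. This is an illustration only, not the Great Expectations implementation: the helper name `check_unique` is hypothetical, the sample ids come from the preview rows above, and counting every occurrence of a repeated value as unexpected is an assumption about how duplicates are tallied.

```python
from collections import Counter


def check_unique(values):
    """Return a small summary, loosely shaped like the result above."""
    counts = Counter(values)
    # Assumption: every occurrence of a repeated value counts as unexpected.
    unexpected = [v for v in values if counts[v] > 1]
    return {
        "success": not unexpected,
        "element_count": len(values),
        "unexpected_count": len(unexpected),
    }


ids = [816393, 251762, 481952, 3506, 498013]  # ids from the preview rows
print(check_unique(ids))
print(check_unique(ids + [3506]))  # hypothetical duplicate id appended
```

With the duplicate appended, both rows carrying `3506` are reported as unexpected in this sketch, so the check fails.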

## expect_column_values_to_be_of_type on column **age**

The `expect_column_values_to_be_of_type` expectation checks that the 'age' column has the data type 'object'. In this dataset, 'age' holds age groups (for example, '40-64'), so the column should contain strings rather than numeric values.

In [None]:
# Expectation 2: expect_column_values_to_be_of_type
validator.expect_column_values_to_be_of_type('age', 'object')

{
  "success": true,
  "result": {
    "observed_value": "object_"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
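In pandas, `object` is a column-level dtype, and an `object` column can hold a mix of Python objects. A dtype check alone therefore does not prove that every row is a string. The sketch below shows a stricter row-level check in plain Python; the helper `non_string_values` and the `mixed` sample are hypothetical, and the age groups are the two that appear in the preview rows.

```python
def non_string_values(values):
    """Return the entries that are not Python strings."""
    return [v for v in values if not isinstance(v, str)]


ages = ["40-64", "26-39", "40-64"]   # age groups seen in the preview rows
mixed = ["40-64", 35, "26-39"]       # hypothetical column with a stray number

print(non_string_values(ages))   # []
print(non_string_values(mixed))  # [35]
```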

## expect_column_values_to_be_in_set on column **gender**

The `expect_column_values_to_be_in_set` expectation checks that every value in the 'gender' column is either 'male' or 'female'.

In [None]:
# Expectation 3: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set('gender', ['male', 'female'])

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
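A set-membership check of this kind can be sketched in plain Python as follows. This is a simplified illustration, not the library's code: `check_in_set` and `max_examples` are hypothetical names, missing values are not handled separately, and the `'Male'` entry is an invented bad value used to show a failure.

```python
def check_in_set(values, allowed, max_examples=20):
    """Summarise which values fall outside the allowed set."""
    allowed = set(allowed)
    unexpected = [v for v in values if v not in allowed]
    total = len(values)
    return {
        "success": not unexpected,
        "element_count": total,
        "unexpected_count": len(unexpected),
        "unexpected_percent": 100.0 * len(unexpected) / total if total else 0.0,
        "partial_unexpected_list": unexpected[:max_examples],
    }


genders = ["female", "male", "male", "male", "female"]  # preview rows
print(check_in_set(genders, ["male", "female"]))
print(check_in_set(genders + ["Male"], ["male", "female"]))  # hypothetical typo
```

The comparison is case-sensitive, so a value such as `'Male'` is reported as unexpected.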

## expect_column_values_to_be_of_type on column **driving_experience**

The `expect_column_values_to_be_of_type` expectation checks that the 'driving_experience' column has the data type 'object'. This column holds driving-experience ranges (for example, '20-29y'), so its values should be predefined categories rather than numbers.

In [None]:
# Expectation 4: expect_column_values_to_be_of_type
validator.expect_column_values_to_be_of_type('driving_experience', 'object')

{
  "success": true,
  "result": {
    "observed_value": "object_"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_in_set on column **education**

The `expect_column_values_to_be_in_set` expectation checks that every value in the 'education' column is one of the predefined categories: 'university', 'high school', or 'none'.

In [None]:
# Expectation 5: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set(
    'education',
    ['university', 'high school', 'none'],
)

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_in_set on column **income**

The `expect_column_values_to_be_in_set` expectation checks that every value in the 'income' column is one of the predefined categories: 'middle class', 'upper class', 'working class', or 'poverty'.

In [None]:
# Expectation 6: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set(
    'income',
    ['middle class', 'upper class', 'working class', 'poverty'],
)

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_min_to_be_between on column **credit_score**

The `expect_column_min_to_be_between` expectation checks that the minimum of the 'credit_score' column lies between 0 and 1. Credit scores in this dataset are expected to range from 0 to 1, so a minimum below 0 would indicate bad data. This expectation constrains only the column minimum; it does not verify that every value is at most 1.

In [None]:
# Expectation 7: expect_column_min_to_be_between
validator.expect_column_min_to_be_between('credit_score', min_value=0, max_value=1)

{
  "success": true,
  "result": {
    "observed_value": 0.0668800479291154
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
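The difference between bounding the column minimum and bounding every value can be sketched in plain Python. The helpers `min_between` and `values_between` are hypothetical stand-ins for the two kinds of check, the scores come from the preview rows, and `1.7` is an invented out-of-range value. Great Expectations also provides `expect_column_max_to_be_between` and `expect_column_values_to_be_between` for the stricter checks.

```python
def min_between(values, low, high):
    """True when the smallest value lies in [low, high]."""
    return low <= min(values) <= high


def values_between(values, low, high):
    """True when every value lies in [low, high]."""
    return all(low <= v <= high for v in values)


scores = [0.63805, 0.475741, 0.839817, 0.682527, 0.572184]  # preview rows
with_outlier = scores + [1.7]  # hypothetical out-of-range score

print(min_between(with_outlier, 0, 1))     # True: the minimum is still in range
print(values_between(with_outlier, 0, 1))  # False: 1.7 exceeds the upper bound
```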

## expect_column_values_to_be_in_set on column **vehicle_year**

The `expect_column_values_to_be_in_set` expectation checks that every value in the 'vehicle_year' column is either 'after 2015' or 'before 2015'.

In [None]:
# Expectation 8: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set(
    'vehicle_year',
    ['after 2015', 'before 2015'],
)

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_in_set on column **type_of_vehicle**

The `expect_column_values_to_be_in_set` expectation checks that every value in the 'type_of_vehicle' column is one of the predefined categories: 'sports car', 'hatchback', 'sedan', or 'suv'.

In [None]:
# Expectation 9: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set(
    'type_of_vehicle',
    ['sports car', 'hatchback', 'sedan', 'suv'],
)

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_in_set on column **married**

The `expect_column_values_to_be_in_set` expectation checks that the 'married' column contains only the allowed values. The column is a binary flag, 0 (not married) or 1 (married), so the value set [0, 1] is appropriate.

In [None]:
# Expectation 10: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set('married', [0, 1])

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
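The preview rows show 'married' stored as floats (`0.0`, `1.0`), while the value set is written with integers. The short sketch below shows only Python's own equality semantics, under which `0.0 == 0` and `1.0 == 1`, so float flags still match an integer value set. The helper `all_in_set` is hypothetical and `0.5` is an invented bad value.

```python
def all_in_set(values, allowed):
    """True when every value is a member of the allowed set."""
    return all(v in allowed for v in values)


married = [0.0, 1.0, 1.0, 0.0, 1.0]  # flags from the preview rows
allowed = {0, 1}

print(all_in_set(married, allowed))          # True: 0.0 == 0 and 1.0 == 1
print(all_in_set(married + [0.5], allowed))  # False: 0.5 is not a valid flag
```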

## expect_column_values_to_be_in_set on column **children**

The `expect_column_values_to_be_in_set` expectation checks that the 'children' column contains only the allowed values. The column is a binary flag, 0 (no children) or 1 (has children), so the value set [0, 1] is appropriate.

In [None]:
# Expectation 11: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set('children', [0, 1])

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_of_type on column **postal_code**

The `expect_column_values_to_be_of_type` expectation checks that the 'postal_code' column has an integer data type ('int64').

In [None]:
# Expectation 12: expect_column_values_to_be_of_type
validator.expect_column_values_to_be_of_type('postal_code', 'int64')

{
  "success": true,
  "result": {
    "observed_value": "int64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_of_type on column **annual_mileage**

The `expect_column_values_to_be_of_type` expectation checks that the 'annual_mileage' column has a float data type ('float64').

In [None]:
# Expectation 13: expect_column_values_to_be_of_type
validator.expect_column_values_to_be_of_type('annual_mileage', 'float64')

{
  "success": false,
  "result": {
    "observed_value": "float64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_of_type on column **speeding_violations**

The `expect_column_values_to_be_of_type` expectation checks that the 'speeding_violations' column has an integer data type ('int64').

In [None]:
# Expectation 14: expect_column_values_to_be_of_type
validator.expect_column_values_to_be_of_type('speeding_violations', 'int64')

{
  "success": true,
  "result": {
    "observed_value": "int64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_of_type on column **duis**

The `expect_column_values_to_be_of_type` expectation checks that the 'duis' column has an integer data type ('int64').

In [None]:
# Expectation 15: expect_column_values_to_be_of_type
validator.expect_column_values_to_be_of_type('duis', 'int64')

{
  "success": true,
  "result": {
    "observed_value": "int64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_of_type on column **past_accidents**

The `expect_column_values_to_be_of_type` expectation checks that the 'past_accidents' column has an integer data type ('int64').

In [None]:
# Expectation 16: expect_column_values_to_be_of_type
validator.expect_column_values_to_be_of_type('past_accidents', 'int64')

{
  "success": true,
  "result": {
    "observed_value": "int64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_in_set on column **issue**

The `expect_column_values_to_be_in_set` expectation checks that every value in the 'issue' column is one of the predefined categories: 'crack', 'dent', 'glass shatter', 'lamp broken', 'scratch', or 'tire flat'.

In [None]:
# Expectation 17: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set(
    'issue',
    ['crack', 'dent', 'glass shatter', 'lamp broken', 'scratch', 'tire flat'],
)

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

## expect_column_values_to_be_in_set on column **outcome**

The `expect_column_values_to_be_in_set` expectation checks that the 'outcome' column contains only the allowed values. The column is a binary flag, 0 (negative) or 1 (positive), indicating whether the individual is reimbursed, so the value set [0, 1] is appropriate.

In [None]:
# Expectation 18: expect_column_values_to_be_in_set
validator.expect_column_values_to_be_in_set('outcome', [0, 1])

{
  "success": true,
  "result": {
    "element_count": 98485,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}