# Great Expectations

---

This notebook runs a Great Expectations analysis.

The analysis validates the dataset and safeguards data quality, so that the data is more reliable for downstream use and processing.

---

In [1]:
# Import libraries

import pandas as pd
import great_expectations as ge
from great_expectations.checkpoint import SimpleCheckpoint

In [2]:
# Load GX context
context = ge.get_context()

# Add datasource
datasource = context.sources.add_or_update_pandas(name="smartwatch_csv_datasource")

# Load CSV and make asset
csv_file = "data_product.csv"  
asset = datasource.add_csv_asset(name="smartwatch_asset", filepath_or_buffer=csv_file)

# Batch Request
batch_request = asset.build_batch_request()

# Expectation Suite
suite_name = "smartwatch_suite"
suite = context.add_or_update_expectation_suite(expectation_suite_name=suite_name)

# Create Validator
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=suite_name)

In [3]:
# 1. To not be null

validator.expect_column_values_to_not_be_null('products')

Calculating Metrics:   0%|          | 0/6 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "element_count": 280,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True: there are no missing values in the products column.
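
The fields in the result above relate to each other in a simple way. The sketch below is plain Python, not Great Expectations' own implementation, and the sample values are made up. It only illustrates how element_count, unexpected_count, and unexpected_percent could be derived for a not-null check.

```python
def not_null_summary(values):
    """Summarize a not-null check over a list, mirroring the result fields above."""
    element_count = len(values)
    # For this check, a value counts as "unexpected" when it is missing (None).
    unexpected_count = sum(1 for v in values if v is None)
    unexpected_percent = (
        100.0 * unexpected_count / element_count if element_count else 0.0
    )
    return {
        "success": unexpected_count == 0,
        "element_count": element_count,
        "unexpected_count": unexpected_count,
        "unexpected_percent": unexpected_percent,
    }


# Hypothetical sample data: one of four product names is missing.
sample = ["Fitbit Charge 6", None, "Garmin Venu 3", "Apple Watch SE"]
summary = not_null_summary(sample)
print(summary)
```

A single missing value is enough to flip success to False, which is why the check is useful as a gate after cleaning.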

In [4]:
# 2. To be between min_value and max_value

validator.expect_column_values_to_be_between('rating', min_value=1, max_value=5)

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "element_count": 280,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True: all rating values fall between 1 and 5.
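
The range result reports two unexpected percentages, one over all rows and one over non-missing rows only. The following is a rough plain-Python sketch of that distinction, assuming the percentages are defined as the field names suggest. The function name and the sample ratings are hypothetical.

```python
def between_summary(values, min_value, max_value):
    """Range check over a list; None is treated as missing, not as unexpected."""
    element_count = len(values)
    present = [v for v in values if v is not None]
    missing_count = element_count - len(present)
    # Bounds are inclusive on both ends.
    unexpected = [v for v in present if not (min_value <= v <= max_value)]
    unexpected_count = len(unexpected)

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        "success": unexpected_count == 0,
        "element_count": element_count,
        "missing_count": missing_count,
        "missing_percent": pct(missing_count, element_count),
        "unexpected_count": unexpected_count,
        "unexpected_percent_total": pct(unexpected_count, element_count),
        "unexpected_percent_nonmissing": pct(unexpected_count, len(present)),
        "partial_unexpected_list": unexpected,
    }


# Hypothetical ratings: one missing value and one value above the scale.
ratings = [4.5, 3.0, None, 7.0, 5.0]
report = between_summary(ratings, min_value=1, max_value=5)
print(report)
```

When a column has no missing values, as in the output above, the two percentages coincide.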

In [5]:
# 3. To exist

validator.expect_column_to_exist(column='rating')

Calculating Metrics:   0%|          | 0/2 [00:00<?, ?it/s]

{
  "success": true,
  "result": {},
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True: the rating column must exist because later steps rely on it to generate recommendations for users.

In [6]:
# 4. To be in Set

valid_brand = ['Amazfit', 'Apple Watch', 'Fitbit', 'Garmin', 'Google', 'Huawei', 'Samsung', 'Xiaomi']
validator.expect_column_values_to_be_in_set('brand', valid_brand)

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "element_count": 280,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True: the brand column contains only brands from this program's list of valid brands.
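
Set membership is an exact string comparison, so case or whitespace variants would be flagged as unexpected. A small plain-Python sketch of how such values could be surfaced before validation is shown below. The helper name and the raw brand values are hypothetical, and only the valid_brand list comes from the cell above.

```python
valid_brand = ['Amazfit', 'Apple Watch', 'Fitbit', 'Garmin',
               'Google', 'Huawei', 'Samsung', 'Xiaomi']


def values_outside_set(values, allowed):
    """Return the distinct values not found in `allowed`, sorted for stable output."""
    allowed_set = set(allowed)
    return sorted({v for v in values if v not in allowed_set})


# Hypothetical raw brand values as they might look before cleaning.
raw_brands = ['Garmin', 'apple watch', 'Samsung ', 'Fitbit', 'Xiaomi']
outliers = values_outside_set(raw_brands, valid_brand)
print(outliers)
```

Here the lowercase and trailing-space variants are reported, which points to normalization steps (case, whitespace) that cleaning needs to cover.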

In [7]:
# 5. To be in Type List

validator.expect_column_values_to_be_in_type_list('price', ['integer', 'float'])

Calculating Metrics:   0%|          | 0/1 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "observed_value": "float64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True: the price column is numeric (observed type float64).

In [8]:
# 6. To Match Regex

validator.expect_column_values_to_match_regex(
    'products',
    r'(?i)(Amazfit|Apple Watch|Fitbit|Garmin|Xiaomi|Huawei Watch|Samsung|Google Pixel)'
)

Calculating Metrics:   0%|          | 0/8 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "element_count": 280,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True: every product name matches the regex of expected brand names.
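
To see what this pattern accepts, it can be tried directly with Python's re module. The sketch assumes the expectation matches anywhere in the string, as re.search does. The product names are hypothetical examples, not rows from the dataset.

```python
import re

# Same pattern as in the expectation; (?i) makes it case-insensitive.
pattern = re.compile(
    r'(?i)(Amazfit|Apple Watch|Fitbit|Garmin|Xiaomi|Huawei Watch|Samsung|Google Pixel)'
)

# Hypothetical product names.
names = [
    "APPLE WATCH Series 9",   # matches despite the different case
    "New Garmin Venu 3",      # matches because the brand may appear mid-string
    "Huawei Band 8",          # no match: the pattern requires "Huawei Watch"
    "Google Nest Hub",        # no match: the pattern requires "Google Pixel"
]
matches = {name: bool(pattern.search(name)) for name in names}
for name, ok in matches.items():
    print(f"{name!r}: {ok}")
```

Note that the pattern is stricter than the brand list for Huawei and Google, since it requires "Huawei Watch" and "Google Pixel" rather than the bare brand name.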

In [9]:
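# 7. Max to be between

# Pass the bound by keyword: the second positional argument is min_value,
# which would only assert that the maximum is at least 5.
validator.expect_column_max_to_be_between('rating', max_value=5)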

Calculating Metrics:   0%|          | 0/4 [00:00<?, ?it/s]

{
  "success": true,
  "result": {
    "observed_value": 5.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True: the maximum rating is 5, the highest allowed value.
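
A column-maximum check compares one aggregate value against an optional lower bound and an optional upper bound, and the two bounds assert different things. The plain-Python sketch below illustrates the difference. The function is a hypothetical stand-in, not the library's implementation, and the ratings are made up.

```python
def max_is_between(values, min_value=None, max_value=None):
    """Check that max(values) lies within the optional inclusive bounds."""
    if min_value is None and max_value is None:
        raise ValueError("at least one of min_value or max_value is required")
    observed = max(values)
    above_min = min_value is None or observed >= min_value
    below_max = max_value is None or observed <= max_value
    return {"success": above_min and below_max, "observed_value": observed}


# Hypothetical ratings containing one value above the 1-5 scale.
ratings = [3.5, 4.0, 6.0]
lower_only = max_is_between(ratings, min_value=5)  # asks: is the max at least 5?
upper_only = max_is_between(ratings, max_value=5)  # asks: is the max at most 5?
print(lower_only, upper_only)
```

Only the upper bound catches the out-of-scale value, so a rule like "ratings cannot exceed 5" needs max_value rather than min_value.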

In [10]:
# Save into Expectation Suite
validator.save_expectation_suite(discard_failed_expectations=False)

In [11]:
# Create a Checkpoint
checkpoint = SimpleCheckpoint(
    name="smartwatch_csv_checkpoint",
    data_context=context,
    validator=validator
)

result = checkpoint.run()
print(result)

Calculating Metrics:   0%|          | 0/28 [00:00<?, ?it/s]

{
  "run_id": {
    "run_name": null,
    "run_time": "2025-07-16T20:31:44.254314+08:00"
  },
  "run_results": {
    "ValidationResultIdentifier::smartwatch_suite/__none__/20250716T123144.254314Z/smartwatch_csv_datasource-smartwatch_asset": {
      "validation_result": {
        "success": true,
        "results": [
          {
            "success": true,
            "expectation_config": {
              "expectation_type": "expect_column_values_to_not_be_null",
              "kwargs": {
                "column": "products",
                "batch_id": "smartwatch_csv_datasource-smartwatch_asset"
              },
              "meta": {}
            },
            "result": {
              "element_count": 280,
              "unexpected_count": 0,
              "unexpected_percent": 0.0,
              "partial_unexpected_list": [],
              "partial_unexpected_counts": [],
              "partial_unexpected_index_list": []
            },
            "meta": {},
            "except

Based on the Great Expectations analysis, the cleaned dataset passes all seven expectations, so it is valid and reliable for use in the next steps.
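
A conclusion like this can also be checked programmatically rather than by reading the printed output. The sketch below walks a dictionary shaped like the checkpoint output above (run_results, validation_result, results) and counts passes and failures. The helper name and the abbreviated sample dictionary are hypothetical. Whether the real result object converts to such a dictionary (for example via a to_json_dict() method) depends on the installed version and is an assumption to verify.

```python
def summarize_run(result_dict):
    """Count passed and failed expectations in a checkpoint-result-shaped dict."""
    passed = 0
    failed = []
    for run in result_dict["run_results"].values():
        for item in run["validation_result"]["results"]:
            if item["success"]:
                passed += 1
            else:
                # Record which expectation failed so it can be investigated.
                failed.append(item["expectation_config"]["expectation_type"])
    return {"passed": passed, "failed": failed, "success": not failed}


# Hypothetical, heavily abbreviated result with one failing expectation.
sample_result = {
    "run_results": {
        "example_run": {
            "validation_result": {
                "success": False,
                "results": [
                    {
                        "success": True,
                        "expectation_config": {
                            "expectation_type": "expect_column_values_to_not_be_null"
                        },
                    },
                    {
                        "success": False,
                        "expectation_config": {
                            "expectation_type": "expect_column_max_to_be_between"
                        },
                    },
                ],
            }
        }
    }
}
overview = summarize_run(sample_result)
print(overview)
```

A summary of this kind could gate later pipeline steps, for example by stopping when the failed list is not empty.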