# Great Expectations

---

This notebook runs the Great Expectations analysis.

The analysis validates the dataset and safeguards data quality, so that the data is more reliable for later use and processing.

---

In [118]:
# Import libraries

import pandas as pd
import great_expectations as ge

In [119]:
# Load cleaned data

df = pd.read_csv("data_product.csv")


In [120]:
# Convert to validator

validator = ge.from_pandas(df)
validator.head()

Unnamed: 0,products,rating,price,features,battery,connectivity,gps,screen_size,img_url,brand
0,Amazfit Bip 6,4.5,74.99,"Multisport Tracker, Text Messaging, AI Assista...",340.0,Bluetooth,1.0,1.97,https://m.media-amazon.com/images/I/61UvVTN0IE...,Amazfit
1,Amazfit Active 2,4.5,75.99,"Multisport Tracker, Text Messaging, AI Assista...",270.0,Bluetooth,1.0,1.32,https://m.media-amazon.com/images/I/71mpuO4Lqe...,Amazfit
2,Amazfit T-Rex 3,4.4,189.99,"Maps, Altitude Assistant, Compass, dual-band G...",0.0,"Bluetooth, Wi-Fi",1.0,1.5,https://m.media-amazon.com/images/I/71GtgMbKvK...,Amazfit
3,Amazfit Active 2,4.5,94.99,"Multisport Tracker, Text Messaging, AI Assista...",270.0,Bluetooth,1.0,1.32,https://m.media-amazon.com/images/I/71XpjL4qkP...,Amazfit
4,Amazfit Active 2,4.5,75.99,"Multisport Tracker, Text Messaging, AI Assista...",270.0,Bluetooth,1.0,1.32,https://m.media-amazon.com/images/I/71XpjL4qkP...,Amazfit


In [121]:
# 1. Values must not be null

validator.expect_column_values_to_not_be_null('products')

{
  "success": true,
  "result": {
    "element_count": 311,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True, the products column has no missing values.
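To make the fields in the result block easier to read, here is a minimal plain-Python sketch of how a not-null check can be summarised. The function name `null_check_summary` and the sample values are hypothetical, and the percentage formula is an assumption about how `unexpected_percent` relates to the two counts, not the library's actual implementation.

```python
import math


def null_check_summary(values):
    """Summarise missing entries in the same shape as the result block."""

    def is_missing(v):
        # Treat both None and float NaN as missing.
        return v is None or (isinstance(v, float) and math.isnan(v))

    element_count = len(values)
    unexpected_count = sum(1 for v in values if is_missing(v))
    unexpected_percent = (
        100.0 * unexpected_count / element_count if element_count else 0.0
    )
    return {
        "success": unexpected_count == 0,
        "element_count": element_count,
        "unexpected_count": unexpected_count,
        "unexpected_percent": unexpected_percent,
    }


# Hypothetical sample: two missing product names out of four rows.
sample_products = ["Amazfit Bip 6", None, "Amazfit T-Rex 3", float("nan")]
summary = null_check_summary(sample_products)
print(summary)
```

With a clean column, `unexpected_count` is 0 and `success` is true, which is the situation reported for `products`.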

In [122]:
# 2. To be between min_value and max_value

validator.expect_column_values_to_be_between('rating', min_value=1, max_value=5)

{
  "success": true,
  "result": {
    "element_count": 311,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True, every rating value lies between 1 and 5.
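The result block for this check separates `unexpected_percent_total` from `unexpected_percent_nonmissing`. The sketch below shows one plausible reading of that distinction in plain Python: the first divides by all rows, the second only by rows that have a value. The function `between_check_summary`, the sample ratings, the inclusive bounds, and the 20-item cap on the partial list are all assumptions made for illustration.

```python
def between_check_summary(values, min_value, max_value):
    """Range check that reports missing and out-of-range values separately."""
    non_missing = [v for v in values if v is not None]
    # Bounds are treated as inclusive here (assumed default behaviour).
    unexpected = [v for v in non_missing if not (min_value <= v <= max_value)]

    element_count = len(values)
    return {
        "success": not unexpected,
        "element_count": element_count,
        "missing_count": element_count - len(non_missing),
        "unexpected_count": len(unexpected),
        "unexpected_percent_total": (
            100.0 * len(unexpected) / element_count if element_count else 0.0
        ),
        "unexpected_percent_nonmissing": (
            100.0 * len(unexpected) / len(non_missing) if non_missing else 0.0
        ),
        # Only a capped sample of offending values is kept (cap is assumed).
        "partial_unexpected_list": unexpected[:20],
    }


# Hypothetical ratings: two missing, two outside the 1-5 range.
sample_ratings = [4.5, 4.4, None, 5.0, 0.0, 6.2, 3.9, None, 4.8, 4.1]
summary = between_check_summary(sample_ratings, 1, 5)
print(summary)
```

In the notebook's own result both counts are 0, so all three percentages are 0.0 and the two denominators make no difference.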

In [123]:
# 3. Column must exist

validator.expect_column_to_exist(column='rating')

{
  "success": true,
  "result": {},
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True, the rating column must exist because the later step that recommends products to users depends on it.

In [124]:
# 4. To be in Set

valid_brand = ['Amazfit', 'Apple Watch', 'Fitbit', 'Garmin', 'Google', 'Huawei', 'Samsung', 'Xiaomi']
validator.expect_column_values_to_be_in_set('brand', valid_brand)

{
  "success": true,
  "result": {
    "element_count": 311,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True, the brand column contains only brands from this program's brand list.
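A set-membership check is an exact comparison, so spelling and letter case matter. The plain-Python sketch below reuses the brand list from the cell above and lists which values would be flagged. The helper `values_outside_set` and the sample column are hypothetical, and skipping missing values is an assumption.

```python
valid_brand = {
    "Amazfit", "Apple Watch", "Fitbit", "Garmin",
    "Google", "Huawei", "Samsung", "Xiaomi",
}


def values_outside_set(values, allowed):
    """Return the distinct non-missing values that are not in the allowed set."""
    return sorted({v for v in values if v is not None and v not in allowed})


# Hypothetical column: "garmin" differs in case, "Apple" is not "Apple Watch".
sample_brands = ["Amazfit", "Garmin", "garmin", "Apple", "Samsung", None]
unexpected = values_outside_set(sample_brands, valid_brand)
print(unexpected)  # ['Apple', 'garmin']
```

Listing the distinct offending values this way is a quick method for deciding whether the cleaning step or the allowed set needs adjusting when the check fails.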

In [125]:
# 5. To be in Type List

validator.expect_column_values_to_be_in_type_list('price', ['integer', 'float'])

{
  "success": true,
  "result": {
    "observed_value": "float64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True, price values must be numeric, and the observed type is float64.

In [126]:
# 6. To Match Regex

validator.expect_column_values_to_match_regex(
    'products',
    r'(?i)(Amazfit|Apple Watch|Fitbit|Garmin|Xiaomi|Huawei Watch|Samsung|Google Pixel)'
)

{
  "success": true,
  "result": {
    "element_count": 311,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True, every product name must match the regex of expected brand names.
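The pattern's behaviour can be probed with Python's `re` module. The sketch assumes the check uses search semantics (a match anywhere in the string), which is worth confirming for the backend in use. The product names below are hypothetical samples, not rows from the dataset.

```python
import re

# Same pattern as in the cell above; (?i) makes it case-insensitive.
BRAND_PATTERN = re.compile(
    r'(?i)(Amazfit|Apple Watch|Fitbit|Garmin|Xiaomi|Huawei Watch|Samsung|Google Pixel)'
)

samples = [
    "Amazfit Bip 6",            # brand at the start
    "GARMIN Forerunner 55",     # upper case, still matches because of (?i)
    "Band for Fitbit Charge",   # brand in the middle, pattern is not anchored
    "Huawei Band 9",            # pattern requires the phrase "Huawei Watch"
]

# re.search looks for the pattern anywhere in the string.
matches = {name: bool(BRAND_PATTERN.search(name)) for name in samples}
for name, ok in matches.items():
    print(f"{name!r}: {ok}")
```

Because the pattern is unanchored, it only confirms that a known brand appears somewhere in the name. Prefixing the pattern with `^` would be a stricter variant if names are expected to start with the brand.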

In [127]:
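# 7. Column maximum must not exceed 5
# Pass the bound by keyword: a bare positional 5 would be read as min_value.

validator.expect_column_max_to_be_between('rating', max_value=5)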

{
  "success": true,
  "result": {
    "observed_value": 5.0,
    "element_count": 311,
    "missing_count": null,
    "missing_percent": null
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}

True, the maximum rating is 5, the highest value allowed.
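`expect_column_max_to_be_between` takes a lower and an upper bound for the column's maximum (`min_value` and `max_value`), and a single positional number after the column name fills `min_value`. The plain-Python sketch below shows why the choice of bound matters. The helper `max_is_between` and the sample ratings are hypothetical and only imitate the idea, not the library's code.

```python
def max_is_between(values, min_value=None, max_value=None):
    """Check that max(values) lies within the optional inclusive bounds."""
    observed = max(values)
    success = True
    if min_value is not None and observed < min_value:
        success = False
    if max_value is not None and observed > max_value:
        success = False
    return {"success": success, "observed_value": observed}


# Hypothetical ratings containing an invalid value of 7.0.
ratings_bad = [4.5, 7.0, 5.0]

# Lower bound only: passes, because 7.0 >= 5.
lower_only = max_is_between(ratings_bad, min_value=5)

# Upper bound only: fails, because 7.0 > 5.
upper_only = max_is_between(ratings_bad, max_value=5)

print(lower_only)
print(upper_only)
```

To enforce "ratings can be at most 5", the upper bound is the one that must be set. A lower bound alone only asserts that at least one rating reaches 5.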

In [128]:
# Results

results = validator.validate()
print(results)

{
  "success": true,
  "results": [
    {
      "success": true,
      "expectation_config": {
        "expectation_type": "expect_column_values_to_not_be_null",
        "kwargs": {
          "column": "products",
          "result_format": "BASIC"
        },
        "meta": {}
      },
      "result": {
        "element_count": 311,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "unexpected_percent_total": 0.0,
        "partial_unexpected_list": []
      },
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_message": null,
        "exception_traceback": null
      }
    },
    {
      "success": true,
      "expectation_config": {
        "expectation_type": "expect_column_values_to_match_regex",
        "kwargs": {
          "column": "products",
          "regex": "(?i)(Amazfit|Apple Watch|Fitbit|Garmin|Xiaomi|Huawei Watch|Samsung|Google Pixel)",
          "result_format": "BASIC"
        },
        "meta": {}

Based on the Great Expectations analysis, the cleaned dataset is valid and reliable for use in the next steps.
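Rather than reading the full printed output, the validation result can be reduced to a short summary. The sketch below works on a plain dictionary shaped like the printed output. The function `summarize_validation` and the sample dictionary (including its failing entry) are hypothetical, and obtaining such a dictionary from the result object, for example via a method like `to_json_dict()`, is an assumption to verify against the installed version.

```python
def summarize_validation(results_dict):
    """Collect the overall status and the expectations that failed."""
    failed = [
        (
            r["expectation_config"]["expectation_type"],
            r["expectation_config"]["kwargs"].get("column"),
        )
        for r in results_dict["results"]
        if not r["success"]
    ]
    return {
        "success": results_dict["success"],
        "evaluated": len(results_dict["results"]),
        "failed": failed,
    }


# Hypothetical result in the same shape as the printed output above.
sample_results = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": "products"},
            },
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_be_between",
                "kwargs": {"column": "rating", "min_value": 1, "max_value": 5},
            },
        },
    ],
}

report = summarize_validation(sample_results)
print(report)
```

An empty `failed` list together with `success` set to true corresponds to the outcome reported in this notebook.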