- Schema validation
To maintain data quality, consistency and reliability, a system needs to validate that its data conforms to a predefined structure or format. This is called schema validation, and you'll practice it in the following exercises.

  a) Create a dictionary that looks like this
  b) Validate that the id is an integer, name is a string, is_active is a boolean and age is an integer. It should return true if valid and false if not valid.

  c) The dictionary created can be seen as one row. Now let's create more records and store each record (dictionary) in a list.

In [1]:
import numpy as np

a) Create the dictionary

In [4]:
student = {"id": 101, "name": "Erika", "is_active": True, "age": 45}
print(student)

{'id': 101, 'name': 'Erika', 'is_active': True, 'age': 45}


b) Define a function to validate the schema of a dictionary:

In [5]:
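# Schema validation function

# isinstance() is a built-in Python function that checks whether a value is an instance of the given type or one of its subclasses.
# data is the function's input parameter; it is expected to be a dictionary.
# data.get("id") fetches the value for the key "id" from data. If "id" is missing, it returns None instead of raising an error.
# If "id" is not an integer, the function immediately returns False to signal that validation failed.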

def validate_schema(data):
    if not isinstance(data.get("id"), int):
        return False
    if not isinstance(data.get("name"), str):
        return False
    if not isinstance(data.get("is_active"), bool):
        return False
    if not isinstance(data.get("age"), int):
        return False
    return True        

# Test the function with the initial record
validate_schema(student)


# kokchun
# schema: the expected type for each key; student: the original data to validate
schema = {"id": int, "name": str, "is_active": bool, "age": int}
validation_list = []
for key, data_type in schema.items():
    # print(student[key], data_type)
    validation_list.append(isinstance(student[key], data_type))

all(validation_list)

all([True, False]), any([True, False])

# alternative with list comprehension
all([isinstance(student[key], data_type) for key, data_type in schema.items()])

True
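Two details of the checks above are worth keeping in mind: indexing `student[key]` raises `KeyError` when a key is missing, and `isinstance(True, int)` is `True` because `bool` is a subclass of `int` in Python. The following is a minimal sketch of a schema-driven validator that handles both cases. The function name `validate_against` is hypothetical and not part of the original exercise.

```python
def validate_against(record, schema):
    """Return True if every schema key is present in record with the expected type."""
    for key, expected_type in schema.items():
        if key not in record:
            return False  # a required field is missing
        value = record[key]
        # bool is a subclass of int, so reject booleans where an int is expected
        if expected_type is int and isinstance(value, bool):
            return False
        if not isinstance(value, expected_type):
            return False
    return True


schema = {"id": int, "name": str, "is_active": bool, "age": int}

# A valid record
print(validate_against({"id": 101, "name": "Erika", "is_active": True, "age": 45}, schema))
# id is a boolean, which a plain isinstance(..., int) check would accept
print(validate_against({"id": True, "name": "Erika", "is_active": True, "age": 45}, schema))
# age is missing
print(validate_against({"id": 101, "name": "Erika", "is_active": True}, schema))
```

Whether a boolean should count as an integer depends on the data you expect, so treat the extra check as one possible design choice rather than a requirement.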

In [6]:
# Test a single dictionary
print(f"Single record validation: {'Valid' if validate_schema(student) else 'Invalid'}")


Single record validation: Valid


c) Create additional records and store them in a list:

In [7]:
# Create a list of records

records = [
    {"id": 101, "name": "Erika", "is_active": True, "age": 45},
    {"id": 102, "name": "Marcus", "is_active": True, "age": 34},
    {"id": 103, "name": "David", "is_active": False, "age": 29},
    {"id": 104, "name": "Anna", "is_active": True, "age": 41.5},     # age is a float (invalid)
    {"id": 106, "name": "Ingrid", "is_active": 'NOPE', "age": 8}     # is_active is a string (invalid)
]
records

[{'id': 101, 'name': 'Erika', 'is_active': True, 'age': 45},
 {'id': 102, 'name': 'Marcus', 'is_active': True, 'age': 34},
 {'id': 103, 'name': 'David', 'is_active': False, 'age': 29},
 {'id': 104, 'name': 'Anna', 'is_active': True, 'age': 41.5},
 {'id': 106, 'name': 'Ingrid', 'is_active': 'NOPE', 'age': 8}]

  d) Do schema validation on the JSON array in c).

   A Python list of dictionaries corresponds to a JSON array of objects.
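To make the correspondence between a Python list of dictionaries and a JSON array of objects concrete, here is a small sketch using the standard-library `json` module. The JSON text is an illustrative assumption modelled on two of the records in c).

```python
import json

# A JSON array of objects, written as text (note the lowercase `true`)
json_text = (
    '[{"id": 101, "name": "Erika", "is_active": true, "age": 45}, '
    '{"id": 104, "name": "Anna", "is_active": true, "age": 41.5}]'
)

# json.loads turns the array into a list and each object into a dict
parsed = json.loads(json_text)
print(type(parsed), type(parsed[0]))

# JSON true becomes Python True; 45 becomes an int and 41.5 a float
print(parsed[0]["is_active"], type(parsed[0]["age"]), type(parsed[1]["age"]))

# json.dumps goes the other way, from a dict back to JSON text
print(json.dumps(parsed[0]))
```

Because the parsed result is an ordinary list of dictionaries, the same validation function can be applied to it record by record.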


In [8]:
# Validate all records

for record in records:
    is_valid = validate_schema(record)
    print(f"Record: {record}, Valid: {is_valid}")



Record: {'id': 101, 'name': 'Erika', 'is_active': True, 'age': 45}, Valid: True
Record: {'id': 102, 'name': 'Marcus', 'is_active': True, 'age': 34}, Valid: True
Record: {'id': 103, 'name': 'David', 'is_active': False, 'age': 29}, Valid: True
Record: {'id': 104, 'name': 'Anna', 'is_active': True, 'age': 41.5}, Valid: False
Record: {'id': 106, 'name': 'Ingrid', 'is_active': 'NOPE', 'age': 8}, Valid: False
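A plain True/False result does not say which field caused a record to fail. As a possible extension, the sketch below returns the names of the offending fields. The helper `find_invalid_fields` is a hypothetical name introduced here for illustration.

```python
schema = {"id": int, "name": str, "is_active": bool, "age": int}


def find_invalid_fields(record, schema):
    """Return the keys that are missing from record or hold the wrong type."""
    return [
        key
        for key, expected_type in schema.items()
        if key not in record or not isinstance(record[key], expected_type)
    ]


records = [
    {"id": 101, "name": "Erika", "is_active": True, "age": 45},
    {"id": 104, "name": "Anna", "is_active": True, "age": 41.5},
    {"id": 106, "name": "Ingrid", "is_active": "NOPE", "age": 8},
]

for record in records:
    problems = find_invalid_fields(record, schema)
    status = "Valid" if not problems else f"Invalid fields: {problems}"
    print(f"Record {record['id']}: {status}")
```

An empty list means the record is valid, so the helper can also be used to split the records into valid and invalid groups.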


 e) Make a function for schema validation, try it on the two examples and check that you get the correct answers. Also make other examples and test your function.

In [25]:
test_data_1 = {"id": 107, "name": "Alice", "is_active": True, "age": 25}      # valid
test_data_2 = {"id": "108", "name": "Bob", "is_active": False, "age": "30"}   # invalid

print(f"Test1 record validation: {'Valid' if validate_schema(test_data_1) else 'Invalid'}")
print(f"Test2 record validation: {'Valid' if validate_schema(test_data_2) else 'Invalid'}")

Test1 record validation: Valid
Test2 record validation: Invalid
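For the "other examples" part of the exercise, one option is to pair each input with its expected result and check them all with `assert`. The sketch below restates `validate_schema` in a compact form that is intended to behave the same as the function from b). The records for Carl and Dana, and the extra `city` key, are made-up test data.

```python
def validate_schema(data):
    # Same checks as in b): each key must hold a value of the expected type
    expected_types = {"id": int, "name": str, "is_active": bool, "age": int}
    return all(
        isinstance(data.get(key), expected_type)
        for key, expected_type in expected_types.items()
    )


# Each entry pairs an input record with the result we expect
test_cases = [
    ({"id": 107, "name": "Alice", "is_active": True, "age": 25}, True),
    ({"id": "108", "name": "Bob", "is_active": False, "age": "30"}, False),
    ({"id": 109, "name": "Carl", "is_active": True}, False),            # age is missing
    ({"id": 110, "name": None, "is_active": True, "age": 52}, False),   # name is None
    ({"id": 111, "name": "Dana", "is_active": True, "age": 19, "city": "Oslo"}, True),  # extra key is ignored
]

for data, expected in test_cases:
    assert validate_schema(data) == expected, data

print("All test cases passed")
```

Note that this version, like the original, accepts records with extra keys. Whether that should count as valid is a design decision the exercise leaves open.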
