# 1. Schema validation 

To maintain data quality, consistency, and reliability, a system needs to validate that its data conforms to a predefined structure or format. This is called schema validation, and you'll practice it in the following exercises.

**a)** Create a dictionary that looks like this:

| Key       | Value |
| --------- | ----- |
| id        | 101   |
| name      | Erika |
| is_active | True  |
| age       | 45    |


In [122]:
record = dict(id=101, name="Erika", is_active=True, age=45)
record

{'id': 101, 'name': 'Erika', 'is_active': True, 'age': 45}

**b)** Validate that `id` is an integer, `name` is a string, `is_active` is a boolean, and `age` is an integer. The check should return `True` if the record is valid and `False` otherwise.

In [123]:
record, type(record)

({'id': 101, 'name': 'Erika', 'is_active': True, 'age': 45}, dict)

In [124]:
schema = {"id" : int, "name": str, "is_active": bool, "age": int}



In [125]:
record["id"], schema["id"], record["is_active"], schema["is_active"]

(101, int, True, bool)

In [126]:
type(record["id"]) == schema["id"]

True

In [127]:
for key in record:
    # print each key with whether its value has exactly the schema's type
    print(key, type(record[key]) == schema[key])

id True
name True
is_active True
age True


In [128]:
# type() requires an exact match: type(True) is bool, which is not int
type(record["id"]) == schema["id"], type(record["is_active"]) == schema["id"]

(True, False)

In [129]:
isinstance(record["id"], str)

False
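One subtlety worth knowing before choosing between `type()` and `isinstance()`: in Python, `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`. With the schema above, a record whose `age` is `True` would therefore pass an `isinstance` check. A minimal sketch of a stricter check, assuming one expected type per field and that booleans should be rejected where integers are expected (`strict_isinstance` is a hypothetical helper, not part of the exercise):

```python
def strict_isinstance(value, expected_type):
    """isinstance() that refuses to treat bool as int (hypothetical helper).

    Assumes expected_type is a single type, not a tuple of types.
    """
    # bool is a subclass of int, so reject it explicitly unless bool was asked for
    if isinstance(value, bool) and expected_type is not bool:
        return False
    return isinstance(value, expected_type)


print(isinstance(True, int))          # True: bool subclasses int
print(strict_isinstance(True, int))   # False
print(strict_isinstance(45, int))     # True
print(strict_isinstance(True, bool))  # True
```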

**Solution with a for loop**

In [130]:
validation_list = []
for key, data_type in schema.items():
    # check each value against its expected type
    validation_list.append(isinstance(record[key], data_type))

all(validation_list)

True

In [131]:
all([True, False]), any([True, False])

(False, True)

**Alternative with a list comprehension**

In [132]:
all([isinstance(record[key], data_type) for key, data_type in schema.items()])

True
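`all()` gives a single yes/no answer but hides which field failed, and `record[key]` raises a `KeyError` if a field is missing. A sketch of a variant that returns the names of the offending fields instead, assuming a missing key should count as a failure (`find_invalid_fields` is a hypothetical name):

```python
def find_invalid_fields(record, schema):
    """Return the schema keys that are missing or have the wrong type."""
    invalid = []
    for key, data_type in schema.items():
        # a missing key counts as invalid instead of raising KeyError
        if key not in record or not isinstance(record[key], data_type):
            invalid.append(key)
    return invalid


schema = {"id": int, "name": str, "is_active": bool, "age": int}
print(find_invalid_fields({"id": 101, "name": "Erika", "is_active": True, "age": 45}, schema))
# []
print(find_invalid_fields({"id": 101, "name": "Erika", "is_active": "yes"}, schema))
# ['is_active', 'age']
```

An empty list means the record is valid, so `not find_invalid_fields(record, schema)` recovers the plain `True`/`False` answer.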

**c)** The dictionary created above can be seen as one row. Now let's create more records and store each record (dictionary) in a list.

| id  | name   | is_active | age  |
| --- | ------ | --------- | ---- |
| 102 | Marcus | True      | 34   |
| 103 | David  | False     | 29   |
| 104 | Anna   | True      | 41.5 |
| 106 | Ingrid | NOPE      | 8    |
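The exercise text asks for a list with one dictionary per row, while the cell below stores the same data column by column (a dictionary of lists). A minimal sketch of the row-oriented layout, validating each record with the same `all(isinstance(...))` idea from b) (`records` and `is_valid` are names chosen for illustration):

```python
schema = {"id": int, "name": str, "is_active": bool, "age": int}

# one dictionary per row, as in the table above
records = [
    {"id": 102, "name": "Marcus", "is_active": True, "age": 34},
    {"id": 103, "name": "David", "is_active": False, "age": 29},
    {"id": 104, "name": "Anna", "is_active": True, "age": 41.5},
    {"id": 106, "name": "Ingrid", "is_active": "NOPE", "age": 8},
]


def is_valid(row, schema):
    # True only if every schema field exists and has the expected type
    return all(key in row and isinstance(row[key], data_type)
               for key, data_type in schema.items())


for row in records:
    print(row["id"], is_valid(row, schema))
# 102 True
# 103 True
# 104 False
# 106 False
```

The row view makes it easy to say which record is bad, while the column view used in the next cell makes it easy to say which field is bad.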


In [133]:
record_list = dict(
    id=[102, 103, 104, 106],
    name=["Marcus", "David", "Anna", "Ingrid"],
    is_active=[True, False, True, "Nope"],
    age=[34, 29, 41.5, 8],
)

record_list["age"], type(record_list)

([34, 29, 41.5, 8], dict)

In [134]:
schema = {"id": int, "name": str, "is_active": bool, "age": int}

isinstance(record_list["id"][0], int), isinstance(record_list["age"][2], int)

(True, False)
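`isinstance` also accepts a tuple of types and returns `True` if the value matches any of them. If the schema should allow fractional ages such as 41.5 (an assumption; the exercise schema says `int`), the `age` entry can list both types. `lenient_schema` is a hypothetical name:

```python
# a tuple of types means "any of these"
lenient_schema = {"id": int, "name": str, "is_active": bool, "age": (int, float)}

ages = [34, 29, 41.5, 8]
print([isinstance(age, lenient_schema["age"]) for age in ages])
# [True, True, True, True]
print(isinstance(41.5, int))
# False
```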

In [138]:
validation_list_2 = []

for key, data_type in schema.items():
    # one boolean per value in the column; all() collapses them to one verdict
    validation_result = [isinstance(value, data_type) for value in record_list[key]]
    validation_list_2.append({key: all(validation_result)})

for result in validation_list_2:
    print(result)

{'id': True}
{'name': True}
{'is_active': False}
{'age': False}
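The per-column summary says that `is_active` and `age` fail, but not where. A sketch that keeps the position of each bad value, assuming the column-oriented layout used here (`find_bad_positions` is a hypothetical helper):

```python
def find_bad_positions(columns, schema):
    """Map each column to the list indices whose values have the wrong type."""
    bad = {}
    for key, data_type in schema.items():
        # note: a missing column is treated as empty here, so it is not reported
        positions = [i for i, value in enumerate(columns.get(key, []))
                     if not isinstance(value, data_type)]
        if positions:
            bad[key] = positions
    return bad


schema = {"id": int, "name": str, "is_active": bool, "age": int}
columns = dict(
    id=[102, 103, 104, 106],
    name=["Marcus", "David", "Anna", "Ingrid"],
    is_active=[True, False, True, "Nope"],
    age=[34, 29, 41.5, 8],
)
print(find_bad_positions(columns, schema))
# {'is_active': [3], 'age': [2]}
```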


**d)** Make a function for schema validation, run it on the two examples above, and check that you get the correct answers. Also create other examples and test your function.

In [1]:
def validation_schema(data, schema):
    validation_result = {}

    for key, expected_type in schema.items():
        if key not in data:
            validation_result[key] = "Key missing in data"
        else:
            # one True/False per value in the column
            validation_result[key] = [isinstance(value, expected_type) for value in data[key]]

    return validation_result

record_dict = dict(
    id=[102, 103, 104, 106],
    name=["Marcus", "David", "Anna", "Ingrid"],
    is_active=[True, False, True, "Nope"],
    age=[34, 29, 41.5, 8],
)

test_ = {"id": [1], "name": ["Bob"], "is_active": ["Nope"], "age": [35]}
schema = {"id": int, "name": str, "is_active": bool, "age": int}



In [2]:
print(validation_schema(test_, schema))

{'id': [True], 'name': [True], 'is_active': [False], 'age': [True]}
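Exercise d) asks to run the function on both examples, but `validation_schema` iterates over `data[key]`, so passing the single record from a) (where `data["id"]` is `101`) raises `TypeError: 'int' object is not iterable`. A sketch of one way to accept both shapes, assuming a value that is a list means "a column of values" and anything else means "a single value" (`validate_any` is a hypothetical name):

```python
def validate_any(data, schema):
    """Validate a single record or a dict of columns against the schema."""
    result = {}
    for key, expected_type in schema.items():
        if key not in data:
            result[key] = "Key missing in data"
            continue
        value = data[key]
        # treat a list as a column of values, anything else as one value
        values = value if isinstance(value, list) else [value]
        result[key] = all(isinstance(v, expected_type) for v in values)
    return result


schema = {"id": int, "name": str, "is_active": bool, "age": int}
single = dict(id=101, name="Erika", is_active=True, age=45)
columns = dict(
    id=[102, 103, 104, 106],
    name=["Marcus", "David", "Anna", "Ingrid"],
    is_active=[True, False, True, "Nope"],
    age=[34, 29, 41.5, 8],
)
print(validate_any(single, schema))
# {'id': True, 'name': True, 'is_active': True, 'age': True}
print(validate_any(columns, schema))
# {'id': True, 'name': True, 'is_active': False, 'age': False}
```

The list-or-scalar rule breaks down if a schema field is itself expected to hold a list; in that case the two shapes would need separate functions.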


## Test 

In [213]:
fotball_players = dict(
    nr=[7, 8, 10, 12, 14, 16],
    name=["Kent", "Charlie", 5, "Bob", "John", "Max"],
    is_active=[True, True, "Nope", False, True, True],
    age=[25, 27, 38, 45, 30, 27],
)
fotball_players, type(fotball_players)

schema_ = {"nr": int, "name": str, "is_active": bool, "age": int}

isinstance(fotball_players["nr"][0], schema_["nr"]), isinstance(fotball_players["is_active"][2], schema_["is_active"])

(True, False)

In [214]:
def validation_schema_(data, schema):
    validation_result = {}
    for key, expected_data_type in schema.items():
        if key not in data:
            validation_result[key] = "Key missing in data"
        else:
            # all() collapses the column to a single True/False
            validation_result[key] = all(isinstance(value, expected_data_type) for value in data[key])

    return validation_result

In [215]:
print(validation_schema_(fotball_players, schema_))

{'nr': True, 'name': False, 'is_active': False, 'age': True}
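Both validation functions only look at keys listed in the schema, so extra fields in the data pass silently. A sketch of a check for that case, assuming unknown keys should be reported (`unexpected_keys` is a hypothetical helper and `"position"` is an invented example field); it relies on dictionary key views supporting set difference:

```python
def unexpected_keys(data, schema):
    """Return keys present in the data but not declared in the schema."""
    # dict.keys() views support set operations such as difference
    return sorted(data.keys() - schema.keys())


schema_ = {"nr": int, "name": str, "is_active": bool, "age": int}
player = {"nr": 7, "name": "Kent", "is_active": True, "age": 25, "position": "FW"}
print(unexpected_keys(player, schema_))
# ['position']
```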
