# Python Serialization-Deserialization Type-Safe Automation

## Using ChatGPT and `typing.TypedDict`

### Deserialization from sparse external JSON

```python
import json
from typing import List, Optional, Literal, TypedDict
```

```python
sparse_json = """
{
    "name": "John Smith",
    "age": 32,
    "fan_of_olives": true,
    "codes": ["321", "332", "442", "421"]
}
"""
```

With only a single sample, there is no variation to learn from, so we could assume that every field is non-nullable.

If we cannot make that assumption, and have no access to further samples, we should safe-access every field and implement default or fallback behavior.
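A minimal sketch of that safe-access approach for the sparse sample, using `dict.get`. The fallback values chosen here (an empty name, `None` for age, an empty list for codes) are illustrative assumptions, not requirements:

```python
import json
from typing import Any, Dict, List, Optional

sparse_json = """
{
    "name": "John Smith",
    "age": 32,
    "fan_of_olives": true,
    "codes": ["321", "332", "442", "421"]
}
"""

data: Dict[str, Any] = json.loads(sparse_json)

# dict.get never raises KeyError: it returns None (or the fallback) when the key is absent.
name: str = data.get("name") or ""                         # missing or null -> "" (assumed default)
age: Optional[int] = data.get("age")                       # missing or null -> None
fan_of_olives: Optional[bool] = data.get("fan_of_olives")  # None means "unknown"
codes: List[str] = data.get("codes") or []                 # missing or null -> [] (assumed default)

print(name, age, fan_of_olives, codes)
```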

### Deserialization from plentiful external JSON

In other cases, we can inspect many instances of the object, and the variation across them reveals the different states each field can take (its union type).

```python
plentiful_json = """
[
    {
        "name": "John Smith",
        "age": 32,
        "fan_of_olives": true,
        "codes": ["321", "332", "442", "421"]
    },
    {
        "name": "Jill Smith",
        "age": 30,
        "fan_of_olives": true,
        "codes": null
    },
    {
        "name": "Andrey Smith",
        "fan_of_olives": false,
        "codes": ["321", "332", "888"]
    },
    {
        "name": "Egor Kreed",
        "fan_of_olives": null,
        "codes": null
    },
    {
        "name": "Andrey Smith",
        "fan_of_olives": false,
        "codes": ["321", "332", "888"]
    }
]
"""
```

Providing the above JSON to ChatGPT with the prompt "Generate a Python TypedDict out of this JSON" produces the following:

```python
# class Person(TypedDict):
#     name: str
#     age: int
#     fan_of_olives: Optional[bool]
#     codes: Optional[List[str]]
# GPT answers in the class syntax by default, but the dictionary-based (functional) syntax
# involves fewer representational translations, and is therefore more semantically direct.

Person = TypedDict("Person", {
    "name": str,
    "age": int,
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
})
```

Here, it recognizes two different forms of Optional type:
1. `Optional[List[str]]`, i.e. `List[str] | None` (in the case of `codes`: a value or nothing)
2. `Optional[bool]`, i.e. `bool | None` (in the case of `fan_of_olives`, representing the three states `True`, `False`, and unknown)
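Because `None` and `False` are both falsy, a tri-state field like `fan_of_olives` should be compared explicitly rather than tested for truthiness, or "unknown" silently collapses into "no". A small sketch; the `describe_olive_stance` helper is hypothetical:

```python
from typing import Optional


def describe_olive_stance(fan_of_olives: Optional[bool]) -> str:
    """Hypothetical helper that keeps the three states of Optional[bool] distinct."""
    if fan_of_olives is None:  # explicit identity check, not `if not fan_of_olives`
        return "unknown"
    return "fan" if fan_of_olives else "not a fan"


for value in (True, False, None):
    print(value, "->", describe_olive_stance(value))
```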

It does not, however, notice that the source JSON sometimes *lacks* a key entirely: `'age'` is absent from three of the five objects.

This leaves a runtime error open, and the type checker cannot warn about it:

```python
json_list: List[Person] = json.loads(plentiful_json)

for obj in json_list:
    print("\n", obj["name"])  # Here we get autocomplete
    try:
        print("Codes: ", obj['codes'])
    except KeyError:
        print("❌Missing expected key: codes!")

    try:
        print("Fan of olives: ", obj['fan_of_olives'])
    except KeyError:
        print("❌Missing expected key: fan_of_olives!")

    try:
        print("Age: ", obj['age'])  # Autocomplete offers 'age'
    except KeyError:
        print("❌Missing expected key: age!")
```


```
 John Smith
Codes:  ['321', '332', '442', '421']
Fan of olives:  True
Age:  32

 Jill Smith
Codes:  None
Fan of olives:  True
Age:  30

 Andrey Smith
Codes:  ['321', '332', '888']
Fan of olives:  False
❌Missing expected key: age!

 Egor Kreed
Codes:  None
Fan of olives:  None
❌Missing expected key: age!

 Andrey Smith
Codes:  ['321', '332', '888']
Fan of olives:  False
❌Missing expected key: age!
```
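The try/except loop discovers missing keys one object at a time, and only for the keys it happens to access. A minimal sketch of a more direct check, assuming the samples are representative of the source: survey every object for how often each key appears and which value types it carries. It uses only the standard library and inspects top-level keys only.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, List, Set

# The same five sample objects as plentiful_json, compacted to one line each.
plentiful_json = """
[
    {"name": "John Smith", "age": 32, "fan_of_olives": true, "codes": ["321", "332", "442", "421"]},
    {"name": "Jill Smith", "age": 30, "fan_of_olives": true, "codes": null},
    {"name": "Andrey Smith", "fan_of_olives": false, "codes": ["321", "332", "888"]},
    {"name": "Egor Kreed", "fan_of_olives": null, "codes": null},
    {"name": "Andrey Smith", "fan_of_olives": false, "codes": ["321", "332", "888"]}
]
"""

samples: List[dict] = json.loads(plentiful_json)

# How many objects contain each key (a key present with a null value still counts).
presence: Counter = Counter(key for sample in samples for key in sample)

# Which Python value types were observed for each key (null shows up as NoneType).
value_types: Dict[str, Set[str]] = defaultdict(set)
for sample in samples:
    for key, value in sample.items():
        value_types[key].add(type(value).__name__)

for key, count in presence.items():
    required = "required" if count == len(samples) else "NotRequired candidate"
    types = " | ".join(sorted(value_types[key]))
    print(f"{key}: {types} ({count}/{len(samples)}, {required})")
```

For these samples, `age` appears in only 2 of 5 objects (a `NotRequired` candidate), while `fan_of_olives` and `codes` are always present but sometimes `NoneType` (an `Optional` field). A key that is absent from every sample cannot be detected this way.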


Notice that nothing done at runtime backs the `List[Person]` claim. The annotation merely makes the type checker take my word for it:
```python
json_list: List[Person] = json.loads(plentiful_json)
```
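To make that concrete, a small sketch showing that neither the annotation nor the TypedDict validates anything at runtime. The mismatched object `{"unexpected": 1}` is made up for illustration:

```python
import json
from typing import List, Optional, TypedDict

Person = TypedDict("Person", {
    "name": str,
    "age": int,
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
})

# Accepted without complaint even though the object has none of Person's keys:
# Python does not check variable annotations against runtime values.
json_list: List[Person] = json.loads('[{"unexpected": 1}]')
print(json_list)

# A TypedDict cannot be used for a runtime isinstance check either.
try:
    isinstance(json_list[0], Person)
except TypeError as exc:
    print("TypeError:", exc)
```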



If we build a class for the JSON object, we can express the underlying uncertainty more precisely by separating two expectations:
1. The `TypedDict` states my expectation of the JSON structure
2. The class states my expectation of being able to initialize a well-typed object out of that JSON

Or, put procedurally:
1. Deserialize the JSON string to Python
2. Make sure the resulting Python value matches my expectations, upfront
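Step 2 can be performed upfront, before any class exists. A minimal sketch, assuming a hypothetical `validate_person` helper that compares an object's keys against the TypedDict's `__required_keys__` attribute (available on TypedDict types since Python 3.9). It checks key presence only, not value types:

```python
import json
from typing import List, Optional, TypedDict, cast

Person = TypedDict("Person", {
    "name": str,
    "age": int,
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
})


def validate_person(obj: dict) -> Person:
    """Hypothetical helper: fail fast if any key Person requires is missing."""
    missing = set(Person.__required_keys__) - set(obj)
    if missing:
        raise ValueError(f"{obj.get('name', '<unnamed>')} is missing {sorted(missing)}")
    # Only key presence was checked; value types are still taken on trust.
    return cast(Person, obj)


samples = json.loads("""[
    {"name": "John Smith", "age": 32, "fan_of_olives": true, "codes": ["321", "332", "442", "421"]},
    {"name": "Andrey Smith", "fan_of_olives": false, "codes": ["321", "332", "888"]}
]""")

for obj in samples:
    try:
        person = validate_person(obj)
        print("OK:", person["name"])
    except ValueError as exc:
        print("Rejected:", exc)
```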

Prompting ChatGPT to "Convert that TypedDict into a Python class whose `__init__` method takes only the parameter `json_data: TypedDict`" produces:

```python
Person = TypedDict("Person", {
    "name": str,
    "age": int,  # Inaccurate assumption of required presence
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
})

class PersonClass:
    def __init__(self, json_data: Person):  # Autocomplete from here according to the expectation declared above
        self.name = json_data['name']
        self.age = json_data["age"]
        self.fan_of_olives = json_data['fan_of_olives']
        self.codes = json_data['codes']
```

When my expectation of being able to initialize a known-typed object from the JSON fails, I realize that my expectation of the JSON itself needs updating. Once that is done, type checking propagates the implications for me.


```python
json_list = json.loads(plentiful_json)

for obj in json_list:
    person = PersonClass(obj)

    # dot access completion:
    print("\n", person.name)
    print("Codes: ", person.codes)
    print("Fan of olives: ", person.fan_of_olives)
    print("Age: ", person.age)
```


```
 John Smith
Codes:  ['321', '332', '442', '421']
Fan of olives:  True
Age:  32

 Jill Smith
Codes:  None
Fan of olives:  True
Age:  30

KeyError: 'age'
```

Having automatically generated the TypedDict and its corresponding class in under a minute, we can funnel the whole JSON list through this kind of test and progressively adjust our expectations of the source JSON as keys reveal themselves to be conditionally present.
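Rather than stopping at the first `KeyError`, the test can run over every object and collect all failures in one pass. A minimal sketch, reusing the class from the prompt above; the `failures` list and its message format are assumptions for illustration. Note that each object reports only the first missing key that `__init__` touches.

```python
import json
from typing import List, Optional, TypedDict

Person = TypedDict("Person", {
    "name": str,
    "age": int,  # the assumption under test
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
})


class PersonClass:
    def __init__(self, json_data: Person):
        self.name = json_data["name"]
        self.age = json_data["age"]
        self.fan_of_olives = json_data["fan_of_olives"]
        self.codes = json_data["codes"]


plentiful_json = """
[
    {"name": "John Smith", "age": 32, "fan_of_olives": true, "codes": ["321", "332", "442", "421"]},
    {"name": "Jill Smith", "age": 30, "fan_of_olives": true, "codes": null},
    {"name": "Andrey Smith", "fan_of_olives": false, "codes": ["321", "332", "888"]},
    {"name": "Egor Kreed", "fan_of_olives": null, "codes": null},
    {"name": "Andrey Smith", "fan_of_olives": false, "codes": ["321", "332", "888"]}
]
"""

failures: List[str] = []
for obj in json.loads(plentiful_json):
    try:
        PersonClass(obj)
    except KeyError as exc:
        # exc.args[0] is the missing key; record it and keep going.
        failures.append(f"{obj.get('name', '<unnamed>')}: missing {exc.args[0]!r}")

print("\n".join(failures) or "All objects matched the expectation.")
```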

```python
from typing import NotRequired  # Python 3.11+; on older versions import it from typing_extensions

Person = TypedDict("Person", {
    "name": str,
    "age": NotRequired[int],  # Fixed based on quick test
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
})

class PersonClass:
    def __init__(self, json_data: Person):  # Autocomplete from here according to the expectation declared above
        self.name = json_data['name']
        self.age = json_data["age"]  # Pylance now flags this access as unsafe via the `reportTypedDictNotRequiredAccess` rule
        self.fan_of_olives = json_data['fan_of_olives']
        self.codes = json_data['codes']
```

Now the declaration of expectations for the source JSON explicitly denotes that the key may be absent. This propagates type-check errors into the class, where we can explicitly handle the conditional presence of the value, either with a simple default or with fallback behavior (raising an exception, fetching the value from an API, etc.).

```python
class PersonClass:
    def __init__(self, json_data: Person):  # Autocomplete from here according to the expectation declared above
        self.name = json_data['name']
        self.age = json_data.get('age', None)
        self.fan_of_olives = json_data['fan_of_olives']
        self.codes = json_data['codes']

# JSON lacking the age key
person = PersonClass({
    'codes': None,
    'fan_of_olives': False,
    'name': "Jim Henson"
})

# Type checking pulls from the class's types, which implement the recovery behavior we added to the class
# In this case, `int | None`
age = person.age
```
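The other fallback mentioned above, raising an exception, can look like the following minimal sketch. `StrictPersonClass` and `MissingAgeError` are hypothetical names. The explicit `"age" not in json_data` check is the kind of guard that Pyright (and therefore Pylance) uses to narrow a TypedDict, so the access after it is not reported as an unsafe `NotRequired` access.

```python
from typing import List, Optional

try:
    from typing import NotRequired, TypedDict  # Python 3.11+
except ImportError:  # older versions: the typing_extensions backport
    from typing_extensions import NotRequired, TypedDict

Person = TypedDict("Person", {
    "name": str,
    "age": NotRequired[int],
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
})


class MissingAgeError(ValueError):
    """Hypothetical exception for records that cannot be used without an age."""


class StrictPersonClass:
    """Hypothetical variant of PersonClass that refuses records lacking an age."""

    def __init__(self, json_data: Person):
        self.name = json_data["name"]
        # Explicit presence check: fail loudly instead of inventing a default.
        if "age" not in json_data:
            raise MissingAgeError(f"{self.name} has no age")
        self.age: int = json_data["age"]
        self.fan_of_olives = json_data["fan_of_olives"]
        self.codes = json_data["codes"]


try:
    StrictPersonClass({"name": "Jim Henson", "fan_of_olives": False, "codes": None})
except MissingAgeError as exc:
    print("Rejected:", exc)
```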


This is better than simply marking the whole TypedDict with `total=False`, which globally claims that the dictionary may or may not contain any given key, while maintaining that if a key is present, its value has the stated type. That option would force us to claim that every field may be absent, which is untrue, semantically misleading, and leads to a preponderance of unneeded null checks or default-value confabulation.
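The difference is visible at runtime through the `__required_keys__` and `__optional_keys__` attributes of TypedDict types. A small sketch comparing the two declarations (the names `PersonTotalFalse` and `PersonTargeted` are illustrative):

```python
from typing import List, Optional

try:
    from typing import NotRequired, TypedDict  # Python 3.11+
except ImportError:  # older versions: the typing_extensions backport
    from typing_extensions import NotRequired, TypedDict

# total=False: every key becomes optional, including ones that are always present.
PersonTotalFalse = TypedDict("PersonTotalFalse", {
    "name": str,
    "age": int,
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
}, total=False)

# Targeted NotRequired: only the key observed to be missing is optional.
PersonTargeted = TypedDict("PersonTargeted", {
    "name": str,
    "age": NotRequired[int],
    "fan_of_olives": Optional[bool],
    "codes": Optional[List[str]],
})

for td in (PersonTotalFalse, PersonTargeted):
    print(td.__name__,
          "required:", sorted(td.__required_keys__),
          "optional:", sorted(td.__optional_keys__))
```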

By applying `NotRequired` in a targeted way, and assuming we have a representative sample of the source JSON's variability, we can pinpoint the fields that actually need a fallback without giving up the convenience of straightforward typing and access for truly required fields.

Separating the JSON expectation from the Python translation is necessary anyway, because fields may need post-processing, such as turning a string into a `datetime` object.
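A minimal sketch of such post-processing, assuming a hypothetical `signed_up` field that arrives as an ISO 8601 string. The TypedDict records what the JSON contains (`str`), while the class exposes what Python code should work with (`datetime`):

```python
from datetime import datetime
from typing import TypedDict

# Hypothetical record shape: "signed_up" is a string in the JSON.
Signup = TypedDict("Signup", {
    "name": str,
    "signed_up": str,
})


class SignupClass:
    def __init__(self, json_data: Signup):
        self.name = json_data["name"]
        # Post-processing: the Python-side type differs from the JSON-side type.
        self.signed_up: datetime = datetime.fromisoformat(json_data["signed_up"])


record = SignupClass({"name": "John Smith", "signed_up": "2023-05-01T12:30:00"})
print(record.name, record.signed_up.isoformat())
```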