<a href="https://colab.research.google.com/github/changsin/Medium/blob/main/notebooks/JSON_Serialization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# JSON Serialization
Serialization is the process of turning an object into a form that can be stored or transmitted and reconstructed later, for example to print an object or save it to a file. For basic types like str, int or bool, you don't need to do anything special to print and save them in Python. For custom objects, such as instances of a class you defined, things are not so straightforward. Here is a summary of the issues you will encounter and how to solve them.
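
As a quick illustration of the basic case, built-in types can be passed straight to the standard json module and read back unchanged. A minimal sketch (the dictionary contents are just example values):

```python
import json

# Built-in types (dict, list, str, int, float, bool, None) need no extra work.
data = {"label": "person", "x": 10, "visible": True}

text = json.dumps(data)      # serialize to a JSON string
print(text)                  # {"label": "person", "x": 10, "visible": true}

restored = json.loads(text)  # parse the JSON string back into Python objects
print(restored == data)      # True
```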

# 1. Use `__dict__` for simple types
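If all of a class's attributes are simple types, its state can be serialized as a dictionary: every instance stores its attributes in its `__dict__` attribute.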

In [None]:
class ImageLabelSimple:
    def __init__(self, label, x, y, width, height):
        self.label = label
        self.x = x
        self.y = y
        self.width = width
        self.height = height

image_label = ImageLabelSimple("person", 10, 10, 4, 10)
print(image_label)
print(image_label.__dict__)

<__main__.ImageLabelSimple object at 0x7fbccbf15110>
{'label': 'person', 'x': 10, 'y': 10, 'width': 4, 'height': 10}


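However, this does not help if you want to pass the object itself to json.dumps(), the standard way of serializing an object into a JSON string in Python. The json module only knows how to encode built-in types such as dict, list, str, int, float, bool and None, so it raises a TypeError: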

In [None]:
import json

print(json.dumps(image_label))

TypeError: Object of type ImageLabelSimple is not JSON serializable
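
Since the json module does understand plain dictionaries, one workaround is to hand it the instance's `__dict__` instead of the object. The `default` parameter of json.dumps() can do this automatically: it is called for any object the encoder cannot handle, and passing the built-in vars() returns that object's `__dict__`. A minimal sketch, repeating the ImageLabelSimple class from above so the cell is self-contained:

```python
import json

class ImageLabelSimple:
    def __init__(self, label, x, y, width, height):
        self.label = label
        self.x = x
        self.y = y
        self.width = width
        self.height = height

image_label = ImageLabelSimple("person", 10, 10, 4, 10)

# Option 1: serialize the attribute dictionary explicitly
print(json.dumps(image_label.__dict__))

# Option 2: let json.dumps call vars(obj) for objects it cannot encode itself
print(json.dumps(image_label, default=vars))
```

Both calls print `{"label": "person", "x": 10, "y": 10, "width": 4, "height": 10}`. Note that vars() only works for objects that have a `__dict__`; a class that uses `__slots__`, for example, would need a different approach.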

# 2. Implement `__str__` or `__repr__`
Both `__str__` and `__repr__` return a string representation of the object. So what is the difference? The main difference is the intended audience: `__str__` returns a readable representation for display (it is what print() and str() use), while `__repr__` returns an unambiguous representation aimed at developers and tools, such as repr(), the interactive prompt and the contents of containers [ref](https://www.pythontutorial.net/python-oop/python-__repr__/).
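
To see the difference in practice, here is a small hypothetical `Point` class (not part of this tutorial's example) that defines both methods. print() and str() use `__str__`, whereas a list uses `__repr__` for its elements:

```python
class Point:
    """Hypothetical class used only to contrast __str__ and __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # Human-friendly form, used by print() and str()
        return f"({self.x}, {self.y})"

    def __repr__(self):
        # Unambiguous, developer-facing form, used by repr() and containers
        return f"Point(x={self.x}, y={self.y})"

p = Point(1, 2)
print(p)        # (1, 2)
print(repr(p))  # Point(x=1, y=2)
print([p])      # [Point(x=1, y=2)]
```

If a class defines only `__repr__`, print() falls back to it, because the default `object.__str__` calls `__repr__`.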

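By making either or both methods return the object's JSON string, you can get that string just by printing the object, without calling json.dumps() on it explicitly. In our case, we also implement `__iter__` so that dict(self) produces the key/value pairs to serialize: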

In [None]:
class ImageLabel:
    def __init__(self, label, x, y, width, height):
        self.label = label
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def __iter__(self):
        yield from {
            "label": self.label,
            "x": self.x,
            "y": self.y,
            "width": self.width,
            "height": self.height
        }.items()

    def __str__(self):
        return json.dumps(dict(self), ensure_ascii=False)
        # The commented-out alternative below looks similar, but str() of a dict
        # uses single quotes, so its output would not be valid JSON.
        # return str({
        #     "label": self.label,
        #     "x": self.x,
        #     "y": self.y,
        #     "width": self.width,
        #     "height": self.height
        # })

    def __repr__(self):
        return json.dumps(dict(self), ensure_ascii=False)

image_label = ImageLabel("person", 10, 10, 4, 10)

print(image_label)
# However, json.dumps(image_label) still raises a TypeError: the default
# JSONEncoder ignores __str__, __repr__ and __iter__
# print(json.dumps(image_label))

{"label": "person", "x": 10, "y": 10, "width": 4, "height": 10}
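
Because ImageLabel defines `__iter__` to yield key/value pairs, dict(image_label) already gives a plain dictionary, so json.dumps(dict(image_label)) works even though json.dumps(image_label) does not. A minimal sketch, repeating the class without its string methods:

```python
import json

class ImageLabel:
    def __init__(self, label, x, y, width, height):
        self.label = label
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def __iter__(self):
        # Yielding (key, value) pairs lets dict(self) build a plain dict
        yield from {
            "label": self.label,
            "x": self.x,
            "y": self.y,
            "width": self.width,
            "height": self.height,
        }.items()

image_label = ImageLabel("person", 10, 10, 4, 10)

as_dict = dict(image_label)
print(json.dumps(as_dict))  # {"label": "person", "x": 10, "y": 10, "width": 4, "height": 10}

try:
    json.dumps(image_label)  # the object itself is still not serializable
except TypeError as exc:
    print(exc)
```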


# 3. Subclass Encoder
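To support json.dumps(), one approach is to implement a custom JSONEncoder subclass and override its default() method, which the encoder calls for any object it cannot serialize on its own. The advantage of this approach is that you don't have to implement extra methods like `__str__` or `__repr__`, and a single encoder supports a wide variety of classes. The disadvantage is that you have to write a custom encoder class and pass it in every time; as the last line of output below shows, print() still uses the default representation.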


In [None]:
from json import JSONEncoder

class MyEncoder(JSONEncoder):
    def default(self, obj):
        return obj.__dict__    

image_label_simple = ImageLabelSimple("person", 10, 10, 4, 10)

print(MyEncoder().encode(image_label_simple))

print(json.dumps(image_label_simple, cls=MyEncoder))
print(image_label_simple)

{"label": "person", "x": 10, "y": 10, "width": 4, "height": 10}
{"label": "person", "x": 10, "y": 10, "width": 4, "height": 10}
<__main__.ImageLabelSimple object at 0x7fbccbfafc90>
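
One caveat with MyEncoder: its default() returns `obj.__dict__` for every object, so an unsupported value without a `__dict__` (a set, for instance) fails with an AttributeError rather than the usual TypeError. The pattern shown in the json module documentation is to handle the types you know about and otherwise defer to `JSONEncoder.default`, which raises TypeError. A sketch of that variant (`SafeEncoder` is a hypothetical name):

```python
import json
from json import JSONEncoder

class ImageLabelSimple:
    def __init__(self, label, x, y, width, height):
        self.label = label
        self.x = x
        self.y = y
        self.width = width
        self.height = height

class SafeEncoder(JSONEncoder):
    def default(self, obj):
        # Serialize plain custom objects through their attribute dictionary
        if hasattr(obj, "__dict__"):
            return obj.__dict__
        # Anything else: let the base class raise the standard TypeError
        return super().default(obj)

label = ImageLabelSimple("person", 10, 10, 4, 10)
print(json.dumps(label, cls=SafeEncoder))

try:
    json.dumps({1, 2, 3}, cls=SafeEncoder)
except TypeError as exc:
    print(exc)  # sets have no __dict__, so the base class rejects them
```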


# 4. Handling nested classes
While the solutions above work for simple classes, they do not work as well when a class contains other custom objects. Suppose you have another class, ImageLabelCollection, with a boundingBoxes field: a dictionary that maps each image file name to a list of the bounding boxes in that image. Below is a sample JSON file.

In [None]:
{
  "version": 1,
  "type": "bounding-box-labels",
  "boundingBoxes": {
    "20210715_111300 16.jpg": [
      {
        "label": "StabilityOff",
        "x": 1,
        "y": 1025,
        "width": 553,
        "height": 29
      },
      {
        "label": "StabilityOn",
        "x": 1,
        "y": 964,
        "width": 563,
        "height": 30
      }
    ]
  }
}

{'boundingBoxes': {'20210715_111300 16.jpg': [{'height': 29,
    'label': 'StabilityOff',
    'width': 553,
    'x': 1,
    'y': 1025},
   {'height': 30, 'label': 'StabilityOn', 'width': 563, 'x': 1, 'y': 964}]},
 'type': 'bounding-box-labels',
 'version': 1}
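
Since the goal is usually to write such data to a file, note that json.dump() (without the "s") writes directly to a file object and json.load() reads it back. A minimal sketch using a temporary directory (the file name `labels.json` is just an example), with the sample above as plain Python data:

```python
import json
import os
import tempfile

labels = {
    "version": 1,
    "type": "bounding-box-labels",
    "boundingBoxes": {
        "20210715_111300 16.jpg": [
            {"label": "StabilityOff", "x": 1, "y": 1025, "width": 553, "height": 29},
            {"label": "StabilityOn", "x": 1, "y": 964, "width": 563, "height": 30},
        ]
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "labels.json")

    # json.dump writes to a file object; indent=2 gives readable output
    with open(path, "w", encoding="utf-8") as f:
        json.dump(labels, f, indent=2, ensure_ascii=False)

    # json.load reads the file back into dicts and lists
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == labels)  # True
```

This works only because everything in `labels` is a built-in type; the rest of this section deals with the case where the lists hold custom objects instead.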

Let's see what happens when we try to serialize a collection of ImageLabel objects.

In [None]:
class ImageLabelCollectionBad:
    def __init__(self, bboxes):
        self.version = 1
        self.type = "bounding-box-labels"
        self.bboxes = bboxes

    def __iter__(self):
        yield from {
            "version": self.version,
            "type": self.type,
            "boundingBoxes": self.bboxes
        }.items()

    def __str__(self):
        # return json.dumps(dict(self), cls=MyEncoder, ensure_ascii=False)
        return json.dumps(dict(self), ensure_ascii=False)

    def __repr__(self):
        return self.__str__()


# image_label1 = ImageLabelSimple("person", 10, 10, 4, 10)
# image_label2 = ImageLabelSimple("car", 20, 20, 5, 11)

# image_bboxes = {"image1.jpg": [image_label1, image_label2]}

# image_label_col = ImageLabelCollection(image_bboxes)
# print(image_label_col)
# print(json.dumps(image_label_col, cls=MyEncoder))

image_label1 = ImageLabel("person", 10, 10, 4, 10)
image_label2 = ImageLabel("car", 20, 20, 5, 11)

image_bboxes = {"image1.jpg": [image_label1, image_label2]}

image_label_col_bad = ImageLabelCollectionBad(image_bboxes)
print(image_label_col_bad)
# print(json.dumps(image_label_col, cls=MyEncoder))

TypeError: Object of type ImageLabel is not JSON serializable

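The first attempt failed because, when `__str__` calls json.dumps() on the dictionary built from ImageLabelCollection, the encoder finds lists of ImageLabel objects inside it. Even though ImageLabel defines `__str__` and `__repr__`, the default encoder does not use them, so it raises the error.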
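
It helps to know how the encoder treats nested objects: it walks dicts and lists itself, and whenever it reaches a value it cannot encode, it calls the `default` hook on that value alone and then encodes whatever the hook returns. A small sketch with a hypothetical `Box` class and a hook that records each call:

```python
import json

class Box:
    """Hypothetical stand-in for ImageLabel."""
    def __init__(self, label):
        self.label = label

calls = []

def to_dict(obj):
    # Invoked once for every object the encoder cannot serialize on its own
    calls.append(type(obj).__name__)
    return obj.__dict__

data = {"version": 1, "boundingBoxes": {"image1.jpg": [Box("person"), Box("car")]}}

print(json.dumps(data, default=to_dict))
# {"version": 1, "boundingBoxes": {"image1.jpg": [{"label": "person"}, {"label": "car"}]}}
print(calls)  # ['Box', 'Box']
```

This is why a single encoder can handle arbitrarily nested custom objects, as long as its hook knows how to convert each of them.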

The simplest fix is to pass our custom encoder through the `cls` parameter.

In [None]:
class ImageLabelCollection:
    def __init__(self, bboxes):
        self.version = 1
        self.type = "bounding-box-labels"
        self.bboxes = bboxes

    def __iter__(self):
        yield from {
            "version": self.version,
            "type": self.type,
            "boundingBoxes": self.bboxes
        }.items()

    def __str__(self):
        return json.dumps(dict(self), cls=MyEncoder, ensure_ascii=False)

    def __repr__(self):
        return self.__str__()


image_label1 = ImageLabel("person", 10, 10, 4, 10)
image_label2 = ImageLabel("car", 20, 20, 5, 11)

image_bboxes = {"image1.jpg": [image_label1, image_label2]}

image_label_col = ImageLabelCollection(image_bboxes)
print(image_label_col)
print(json.dumps(image_label_col, cls=MyEncoder))

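{"version": 1, "type": "bounding-box-labels", "boundingBoxes": {"image1.jpg": [{"label": "person", "x": 10, "y": 10, "width": 4, "height": 10}, {"label": "car", "x": 20, "y": 20, "width": 5, "height": 11}]}}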
{"version": 1, "type": "bounding-box-labels", "bboxes": {"image1.jpg": [{"label": "person", "x": 10, "y": 10, "width": 4, "height": 10}, {"label": "car", "x": 20, "y": 20, "width": 5, "height": 11}]}}


This looks a lot better, doesn't it? The same encoder also works with json.dumps() when you pass it through the `cls` parameter.

# 5. Implement a custom to_json() method

One remaining problem is that the two JSON results are slightly different. The string representation uses "boundingBoxes" as the key, but json.dumps() uses "bboxes", which is the name of the instance attribute in ImageLabelCollection. This tells us that, instead of calling the `__str__` method, json.dumps() called MyEncoder's default() method, which simply returns `__dict__`.
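
The mismatch comes from which view of the object the hook returns. A condensed sketch with a hypothetical `Collection` class, contrasting a hook that returns `__dict__` with one that returns dict(obj) built from `__iter__`:

```python
import json

class Collection:
    """Hypothetical class: the attribute is 'bboxes', __iter__ exposes 'boundingBoxes'."""
    def __init__(self, bboxes):
        self.bboxes = bboxes

    def __iter__(self):
        yield from {"boundingBoxes": self.bboxes}.items()

col = Collection({"image1.jpg": []})

# __dict__ exposes the attribute names as stored on the instance
print(json.dumps(col, default=lambda obj: obj.__dict__))  # {"bboxes": {"image1.jpg": []}}

# dict(obj) uses the key/value pairs yielded by __iter__
print(json.dumps(col, default=lambda obj: dict(obj)))     # {"boundingBoxes": {"image1.jpg": []}}
```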

How can we fix this? One solution is to redefine the encoder's default() method so that it calls a to_json() method on the object. Since the same encoder is used for all classes, both classes need a to_json() method.

In [None]:
class MyJSONEncoder(JSONEncoder):
    def default(self, obj):
        return obj.to_json()


class ImageLabel:
    def __init__(self, label, x, y, width, height):
        self.label = label
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def __iter__(self):
        yield from {
            "label": self.label,
            "x": self.x,
            "y": self.y,
            "width": self.width,
            "height": self.height
        }.items()

    def __str__(self):
        return json.dumps(dict(self), ensure_ascii=False)

    def __repr__(self):
        return self.__str__()

    def to_json(self):
        return self.__str__()

class ImageLabelCollection:
    def __init__(self, bboxes):
        self.version = 1
        self.type = "bounding-box-labels"
        self.bboxes = bboxes

    def __iter__(self):
        yield from {
            "version": self.version,
            "type": self.type,
            "boundingBoxes": self.bboxes
        }.items()

    def __str__(self):
        return json.dumps(dict(self), cls=MyJSONEncoder, ensure_ascii=False)

    def __repr__(self):
        return self.__str__()

    def to_json(self):
        return self.__str__()

image_label1 = ImageLabel("person", 10, 10, 4, 10)
image_label2 = ImageLabel("car", 20, 20, 5, 11)

image_bboxes = {"image1.jpg": [image_label1, image_label2]}

image_label_col = ImageLabelCollection(image_bboxes)
print(image_label_col)
print(json.dumps(image_label_col, cls=MyJSONEncoder))


{"version": 1, "type": "bounding-box-labels", "boundingBoxes": {"image1.jpg": ["{\"label\": \"person\", \"x\": 10, \"y\": 10, \"width\": 4, \"height\": 10}", "{\"label\": \"car\", \"x\": 20, \"y\": 20, \"width\": 5, \"height\": 11}"]}}
"{\"version\": 1, \"type\": \"bounding-box-labels\", \"boundingBoxes\": {\"image1.jpg\": [\"{\\\"label\\\": \\\"person\\\", \\\"x\\\": 10, \\\"y\\\": 10, \\\"width\\\": 4, \\\"height\\\": 10}\", \"{\\\"label\\\": \\\"car\\\", \\\"x\\\": 20, \\\"y\\\": 20, \\\"width\\\": 5, \\\"height\\\": 11}\"]}}"


# 6. Fixing double serialization
While we now see "boundingBoxes" in both cases (plain print() and json.dumps()), there is a different problem: the ImageLabel objects contained in ImageLabelCollection are turned into strings. This is because to_json() returns the result of `__str__`, which is already a JSON string. The encoder treats that return value as an ordinary string and encodes it again, escaping every quotation mark; with json.dumps(), the whole collection is double-serialized the same way. Ouch! We traded one problem for another.
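
The effect is easy to reproduce in isolation: encoding a string that already contains JSON produces a quoted string with escaped inner quotes. A minimal sketch:

```python
import json

inner = json.dumps({"label": "person"})  # already a JSON string
print(inner)                             # {"label": "person"}

twice = json.dumps(inner)                # encoding a str yields a quoted, escaped string
print(twice)                             # "{\"label\": \"person\"}"

# Undoing it takes two json.loads() calls, a sign it was encoded twice
print(json.loads(json.loads(twice)))     # {'label': 'person'}
```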

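A proper fix is to make to_json() return a JSON-compatible Python object (a dict) instead of a string, so that encoding happens only once. Below, ImageLabelCollection.to_json() builds that dict itself, serializing the contained ImageLabel objects through their `__dict__`, and a plain default() function passed through the `default` parameter replaces the encoder subclass.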

In [None]:
def default(obj):
    if hasattr(obj, 'to_json'):
        return obj.to_json()
    raise TypeError(f'Object of type {obj.__class__.__name__} is not JSON serializable')


class ImageLabel:
    def __init__(self, label, x, y, width, height):
        self.label = label
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def __iter__(self):
        yield from {
            "label": self.label,
            "x": self.x,
            "y": self.y,
            "width": self.width,
            "height": self.height
        }.items()

    def __str__(self):
        return json.dumps(dict(self), default=default, ensure_ascii=False)

    def __repr__(self):
        return self.__str__()

    def to_json(self):
        return self.__str__()


class ImageLabelCollection:
    def __init__(self, bboxes):
        self.version = 1
        self.type = "bounding-box-labels"
        self.bboxes = bboxes

    def __iter__(self):
        yield from {
            "version": self.version,
            "type": self.type,
            "boundingBoxes": self.bboxes
        }.items()

    def __str__(self):
        return json.dumps(dict(self), default=default, ensure_ascii=False)

    def __repr__(self):
        return self.__str__()

    def to_json(self):
        to_return = {"version": self.version, "type": self.type}
        image_boxes = {}
        for key, boxes in self.bboxes.items():
            jboxes = []
            for box in boxes:
                jboxes.append(box.__dict__)
            image_boxes[key] = jboxes

        to_return["boundingBoxes"] = image_boxes
        return to_return

image_label1 = ImageLabel("person", 10, 10, 4, 10)
image_label2 = ImageLabel("car", 20, 20, 5, 11)

image_bboxes = {"image1.jpg": [image_label1, image_label2]}

image_label_col = ImageLabelCollection(image_bboxes)
print(image_label_col)
print(json.dumps(image_label_col, default=default))


{"version": 1, "type": "bounding-box-labels", "boundingBoxes": {"image1.jpg": ["{\"label\": \"person\", \"x\": 10, \"y\": 10, \"width\": 4, \"height\": 10}", "{\"label\": \"car\", \"x\": 20, \"y\": 20, \"width\": 5, \"height\": 11}"]}}
{"version": 1, "type": "bounding-box-labels", "boundingBoxes": {"image1.jpg": [{"label": "person", "x": 10, "y": 10, "width": 4, "height": 10}, {"label": "car", "x": 20, "y": 20, "width": 5, "height": 11}]}}


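Now we can see the difference between print() and json.dumps(). print() calls `__str__`, which passes the dictionary from `__iter__` to json.dumps(); the contained ImageLabel objects then go through default() and ImageLabel.to_json(), which still returns a string, so they are double-serialized. json.dumps(image_label_col, default=default), on the other hand, calls ImageLabelCollection.to_json(), which returns a plain dict, so it produces the JSON-file-friendly output we want.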
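
One way to make print() agree with json.dumps() as well, sketched here as a possible refinement rather than the approach above: have every to_json() return a plain dict (never a string), and let `__str__` call json.dumps() on the object itself with the same default hook. Encoding then happens exactly once, at the top level:

```python
import json

def default(obj):
    # Delegate to the object's to_json(), which must return JSON-compatible data
    if hasattr(obj, "to_json"):
        return obj.to_json()
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")

class ImageLabel:
    def __init__(self, label, x, y, width, height):
        self.label = label
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def to_json(self):
        # Return a dict, not a string, so nothing gets encoded twice
        return {"label": self.label, "x": self.x, "y": self.y,
                "width": self.width, "height": self.height}

    def __str__(self):
        return json.dumps(self, default=default, ensure_ascii=False)

    __repr__ = __str__

class ImageLabelCollection:
    def __init__(self, bboxes):
        self.version = 1
        self.type = "bounding-box-labels"
        self.bboxes = bboxes

    def to_json(self):
        # Nested ImageLabel objects are left as-is; default() converts them
        return {"version": self.version, "type": self.type,
                "boundingBoxes": self.bboxes}

    def __str__(self):
        return json.dumps(self, default=default, ensure_ascii=False)

    __repr__ = __str__

col = ImageLabelCollection({"image1.jpg": [ImageLabel("person", 10, 10, 4, 10),
                                           ImageLabel("car", 20, 20, 5, 11)]})
print(col)
print(json.dumps(col, default=default))
```

Both lines print the same single-encoded JSON, because the only place a string is produced is the outermost json.dumps() call.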