# The data structure

The data structure is the way we store and organize our data. It matters because it determines how your functions and machine learning systems will access that data. It can be a simple one-liner when you need to test something, or it can be a real mess with 10 functions just to format your data and feed your model the correct way.

## The problem

Let's say we have a great model that detects faces in an image. We want to store its output in a structured way so we can plot our results, store them in a database, write a script that shows all the detected faces to check whether the model is accurate, etc.

## Solutions

In [1]:
from typing import List, Dict, Optional

### Dictionary

The first idea you might have is to create a dictionary with the data in it:

In [2]:
# A dictionary to store your data
my_image: dict = {
    "name": "image 1",
    "height": 800,
    "width": 300,
    "resolution": 800 * 300,
    "face_detected": [
        {"x0": 10, "x1": 60, "y0": 200, "y1": 250},
        {"x0": 10, "x1": 60, "y0": 300, "y1": 350},
    ],
    "confidence_score": 1.0,
}

my_image["resolution"]

240000

It works, but you can't type each property (at least not easily), you need to build a new dictionary by hand each time, you can make typos in the keys, and so on. It's not a good data structure if you're going to use it often.
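To make the typo problem concrete, here is a minimal sketch using a trimmed-down copy of the dictionary above. The misspelled key `"heigth"` is a deliberate, hypothetical mistake: reading it fails only when the line runs, and writing to it silently creates a new key.

```python
# Trimmed-down copy of the dictionary above (illustration only)
my_image = {"name": "image 1", "height": 800, "width": 300}

# Reading a misspelled key only fails when the line actually runs
try:
    my_image["heigth"]
    read_failed = False
except KeyError:
    read_failed = True

# Writing to a misspelled key does not fail at all: it adds a new entry
my_image["heigth"] = 900

print(read_failed)         # True
print(my_image["height"])  # 800, the real key is unchanged
print(sorted(my_image))    # ['height', 'heigth', 'name', 'width']
```

Nothing warns you about the second case, which is why this kind of bug can stay hidden for a long time.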

### Class

That's when you will think about a class! You create a class that contains your data fields, then you just have to instantiate it.
You can even type each property! It looks like the perfect fit for your issue.

In [3]:
class Image:
    """Class that stores the image data."""

    def __init__(
        self,
        name: str,
        height: int,
        width: int,
        score: int,
        face_detected: List[Dict[str, int]],
    ):

        self.name = name
        self.height = height
        self.width = width
        self.score = score
        self.face_detected = face_detected
        self.resolution = self.height * self.width


faces = [
    {"x0": 10, "x1": 60, "y0": 200, "y1": 250},
    {"x0": 10, "x1": 60, "y0": 300, "y1": 350},
]
# Instantiate an Image
my_image = Image(name="image 1", height=800, width=300, score=10, face_detected=faces)

my_image.resolution

240000

On the one hand, the syntax isn't great: it's heavy and verbose. Imagine having to define many of these classes to store different kinds of data; you would end up with a file thousands of lines long.
On the other hand, you keep control of your data. For example, if an `Image` is instantiated without a `height`, it will raise an error.

Classes have another great feature: you can create attributes derived from other attributes, like `resolution` above, which is computed from `height` and `width`.
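As a rough sketch of both points, the class below rejects invalid dimensions and exposes `resolution` as a value derived from `height` and `width`. The name `CheckedImage`, the specific check, and the use of `@property` are assumptions made for illustration; it is one possible way to do it, not the only one.

```python
class CheckedImage:
    """Hypothetical variant of Image with basic validation."""

    def __init__(self, name: str, height: int, width: int):
        # Custom check: refuse dimensions that make no sense
        if height <= 0 or width <= 0:
            raise ValueError("height and width must be positive")
        self.name = name
        self.height = height
        self.width = width

    @property
    def resolution(self) -> int:
        # Recomputed on every access, so it stays in sync with height and width
        return self.height * self.width


image = CheckedImage(name="image 1", height=800, width=300)
print(image.resolution)  # 240000

image.height = 400
print(image.resolution)  # 120000

# Forgetting a required argument raises a TypeError
try:
    CheckedImage(name="image 2", width=300)
except TypeError as error:
    print(type(error).__name__)  # TypeError

# Invalid values hit the custom check
try:
    CheckedImage(name="image 3", height=0, width=300)
except ValueError as error:
    print(error)  # height and width must be positive
```

Compared with assigning `self.resolution` once in `__init__`, the property cannot go stale if `height` or `width` changes later.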

### Dataclass

Fortunately, Python has an answer to this heavy syntax, and it's called `dataclass`. `dataclass` is a decorator that you apply to a class. It lets you create a class with a much shorter and simpler syntax.

In [4]:
from dataclasses import dataclass


@dataclass
class Image:
    """Class that stores the image's data."""

    name: str
    height: int
    width: int
    score: int
    face_detected: List[Dict[str, int]]
    resolution: int


faces = [
    {"x0": 10, "x1": 60, "y0": 200, "y1": 250},
    {"x0": 10, "x1": 60, "y0": 300, "y1": 350},
]
# Instantiate an Image
my_image = Image(
    name="image 1",
    height=800,
    width=300,
    score=10,
    resolution=800 * 300,
    face_detected=faces,
)

my_image.face_detected[0]["x0"]

10

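A dataclass is a great fit when the attributes are independent of each other. Here, `resolution` has to be passed in by hand even though it depends on `height` and `width`, so nothing keeps the three consistent. If we didn't need the `resolution` attribute, a dataclass would be a better choice than a regular class.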
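That said, a dataclass can still hold a derived attribute with a little extra work. The sketch below uses `field(init=False)` and `__post_init__` from the standard `dataclasses` module; the class name `ImageWithResolution` is hypothetical and only there to avoid clashing with the `Image` class above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageWithResolution:
    """Hypothetical dataclass variant with a derived field."""

    name: str
    height: int
    width: int
    score: int
    face_detected: List[Dict[str, int]]
    # init=False: resolution is not a constructor argument
    resolution: int = field(init=False)

    def __post_init__(self) -> None:
        # Runs right after the generated __init__
        self.resolution = self.height * self.width


my_image = ImageWithResolution(
    name="image 1",
    height=800,
    width=300,
    score=10,
    face_detected=[{"x0": 10, "x1": 60, "y0": 200, "y1": 250}],
)

print(my_image.resolution)  # 240000
```

Note that `resolution` is computed once at creation time, so it will not follow later changes to `height` or `width`.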

### Named tuple

The named tuple is clearly a bad fit for the whole image, as it only gives us a tuple whose fields can be accessed by name, like attributes on a class instance. However, it's a good structure to know because it's well suited to storing a small amount of data. It is a perfect fit for our face coordinates!

In [2]:
from collections import namedtuple

Coordinate = namedtuple("Coordinate", ["x0", "x1", "y0", "y1"])

faces = [
    Coordinate(10, 60, 200, 250),
    Coordinate(10, 60, 300, 350),
]

faces[0].x0

10
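If you also want type hints on the fields, the standard library offers `typing.NamedTuple`. The sketch below is one possible typed version of `Coordinate`; the name `TypedCoordinate` is hypothetical. It also shows two properties of named tuples worth knowing: they unpack like regular tuples, and they are immutable.

```python
from typing import NamedTuple


class TypedCoordinate(NamedTuple):
    """Hypothetical typed version of Coordinate."""

    x0: int
    x1: int
    y0: int
    y1: int


face = TypedCoordinate(10, 60, 200, 250)

# Fields can be read by name or unpacked like a regular tuple
x0, x1, y0, y1 = face
print(face.x1 - face.x0, y1 - y0)  # 50 50

# Named tuples are immutable: assigning to a field raises AttributeError
try:
    face.x0 = 0
    is_mutable = True
except AttributeError:
    is_mutable = False

print(is_mutable)  # False
```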

### Merge solutions

We can of course merge multiple data types to fit our needs!

For example, my favorite candidate here is a class mixed with a named tuple.

In [3]:
# List and Optional come from typing; import them here in case the kernel was restarted
from typing import List, Optional

Coordinate = namedtuple("Coordinate", ["x0", "x1", "y0", "y1"])


class Image:
    """Class that stores the image's data."""

    def __init__(
        self,
        name: str,
        height: int,
        width: int,
        score: int,
        face_detected: Optional[List[Coordinate]],
    ):

        self.name = name
        self.height = height
        self.width = width
        self.score = score
        self.face_detected = face_detected
        self.resolution = self.height * self.width


faces = [
    Coordinate(10, 60, 200, 250),
    Coordinate(10, 60, 300, 350),
]

my_image = Image(name="image 1", height=800, width=300, score=10, face_detected=faces)

my_image.face_detected[0].x0

10

## Conclusion

Most of the time, the best solution will be to mix multiple data structures to obtain something easy to read, use, and store. Also, don't forget that the perfect solution doesn't exist! It's all a matter of choices!

A good indicator that you need a different data structure is when you're adding type hints to your code and you end up with something like this:

```python
data: Dict[str, List[Dict[str, Dict[str, Tuple[int]]]]] = ...
```
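As a hedged illustration of the alternative, the sketch below replaces deep nesting with small named structures. The hint above does not say what the data represents, so `Detection`, `box`, and `score` are guesses made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple


class Coordinate(NamedTuple):
    x0: int
    x1: int
    y0: int
    y1: int


@dataclass
class Detection:
    """Hypothetical record for one detected face."""

    box: Coordinate
    score: float


# The hint now reads as "image name -> list of detections"
data: Dict[str, List[Detection]] = {
    "image 1": [
        Detection(box=Coordinate(10, 60, 200, 250), score=0.9),
        Detection(box=Coordinate(10, 60, 300, 350), score=0.8),
    ],
}

print(data["image 1"][0].box.x0)  # 10
```

Each level of the structure now has a name, so both the type hint and the access path are readable.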

## Additional resources

Make sure to have a look at these resources:
* https://docs.python.org/3/library/collections.html
* https://docs.python.org/3/tutorial/datastructures.html
* https://www.edureka.co/blog/data-structures-in-python/