<h1 align="center"><b>AI Lab: Computer Vision and NLP</b></h1>
<h3 align="center">Lesson 20: Dataloaders</h3>

---

How can we create a dataloader? Suppose that we have a dataset of images: how do we load images and labels? Usually datasets of images are organized in the following way:

```text
└ dataset
    ├ imgs
    │   ├ img01.png
    │   ├ img02.png
    │   ├ img03.png
    │   ├ img04.png
    │   ├ img05.png
    │   └ ...
    └ labels.csv
```

We would have an `imgs` folder containing all the images and, next to it, a `labels.csv` file containing all the labels:

```text
image,      label
img01.png,  0
img02.png,  1
img03.png,  2
img04.png,  1
img05.png,  1
...,        ...
```

Here, for instance, `0 = cat`, `1 = dog`, `2 = plane`, and so on. We can set up a dataset for this layout as follows. First, import the required packages:

In [1]:
import os
import pandas as pd 
from torchvision.io import read_image
from torch.utils.data import Dataset
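
Before writing the dataset class, it can help to see what `pd.read_csv` returns for a labels file like the one above. The following is a minimal sketch, not part of the lesson's code: `csv_text` and `class_names` are hypothetical stand-ins for the real `labels.csv` and the class mapping, and `skipinitialspace=True` is an assumption that only matters if the file really pads its columns with spaces, as in the listing above.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for labels.csv, padded like the listing above
csv_text = (
    "image,      label\n"
    "img01.png,  0\n"
    "img02.png,  1\n"
    "img03.png,  2\n"
)

# skipinitialspace=True drops the spaces that follow each comma
labels = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Hypothetical mapping from class index to class name
class_names = {0: "cat", 1: "dog", 2: "plane"}

file_name = labels.iloc[1, 0]    # first column: image file name
label = int(labels.iloc[1, 1])   # second column: integer class label
print(file_name, label, class_names[label])
```

Positional indexing with `iloc` does not depend on the column names, so the same two lookups work regardless of how the header row is spelled.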

Now we can create our dataset class. It keeps only the table of labels in memory; each image is read from disk lazily, when `__getitem__` is called with its index. Reading a large dataset all at once (say, 10,000 images) could exhaust the available RAM and crash the program. With lazy loading, only as many images as the batch size requires are loaded at a time:

In [5]:
class OurDataset(Dataset):
    def __init__(self, labels_path: str, images_dir: str, transform = None) -> None:
        """
        Creates a dataset given the path to the labels and the image directory

        Parameters:
            - `labels_path`: the path to the `csv` file containing the labels;
            - `images_dir`: the path to the directory with the images;
            - `transform`: an optional callable applied to each image after it is loaded (`None` applies no transformation).
        """
        super().__init__()
        self.images_dir = images_dir
        self.transform = transform
        self.labels = pd.read_csv(labels_path)

    def __len__(self) -> int:
        """Returns the length of the dataset
        
        Returns:
            - `length` (`int`): the length of the dataset"""
        return len(self.labels)

    def __getitem__(self, index: int):
        """Get the item at position `index` in the dataset
        
        Parameters:
            - `index`: the index of the item to retrieve.
            
        Returns:
            - `(image, label)`: the image at position `index` and its label."""
        
        # Build the image path from the file name in the first CSV column
        image_path = os.path.join(self.images_dir, self.labels.iloc[index, 0])
        image = read_image(image_path) # uint8 tensor of shape (C, H, W); OpenCV's cv2.imread() is an alternative
        label = self.labels.iloc[index, 1] # The label is in the second CSV column

        # Apply transformations
        if self.transform:
            image = self.transform(image)

        return (image, label)
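
To connect the dataset to batching, here is a minimal sketch of wrapping a dataset in `torch.utils.data.DataLoader`. So that it runs without any files on disk, it uses `FakeImageDataset`, a hypothetical stand-in that returns the same kind of `(image, label)` pair as `OurDataset` but builds the image in memory. The image size, batch size and dataset length are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class FakeImageDataset(Dataset):
    """Hypothetical stand-in for OurDataset: creates images in memory instead of reading files."""

    def __init__(self, n_items: int = 10) -> None:
        super().__init__()
        self.n_items = n_items

    def __len__(self) -> int:
        return self.n_items

    def __getitem__(self, index: int):
        # Same structure as OurDataset: a (C, H, W) uint8 image and an integer label
        image = torch.full((3, 8, 8), index, dtype=torch.uint8)
        label = index % 3
        return image, label


dataset = FakeImageDataset(n_items=10)

# The dataloader calls __getitem__ batch_size times and stacks the results
loader = DataLoader(dataset, batch_size=4, shuffle=False)

batch_shapes = []
for images, labels in loader:
    batch_shapes.append((tuple(images.shape), tuple(labels.shape)))

print(batch_shapes)
# → [((4, 3, 8, 8), (4,)), ((4, 3, 8, 8), (4,)), ((2, 3, 8, 8), (2,))]
```

With 10 items and a batch size of 4, the last batch holds only the 2 remaining items. The default batching stacks images into one tensor, so all images in a batch must have the same shape; for real images of different sizes, a resizing `transform` would typically be passed to the dataset.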