# Exercise

The aim is to create your own FlowersDataset.
First, download the dataset you will work on with the following code:

```
%%bash
mkdir -p ./flowers102/
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat -P ./flowers102/
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat -P ./flowers102/
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz -P ./flowers102/
tar zxvf ./flowers102/102flowers.tgz -C ./flowers102/
rm ./flowers102/102flowers.tgz
```

Use a `DataLoader` to iterate through the samples of the train and test sets and visualize them.
(In this case we omit the validation set for simplicity.)

Remember that the train and test image IDs are stored in `flowers102/setid.mat`.
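`scipy.io.loadmat` returns MATLAB row vectors as 2-D arrays of shape `(1, N)`, which is why the IDs are usually flattened with `[0]`. The sketch below writes small synthetic stand-ins for `setid.mat` and `imagelabels.mat`. The values are made up; only the key names `trnid`, `tstid` and `labels` match the real files. It then shows the 1-indexed lookup and the `image_XXXXX.jpg` naming used in the `jpg/` folder.

```python
import os
import tempfile

import numpy as np
import scipy.io as io

# Synthetic stand-ins for the real .mat files (values are made up for illustration).
with tempfile.TemporaryDirectory() as root:
    io.savemat(os.path.join(root, "setid.mat"),
               {"trnid": np.array([[1, 3, 5]]), "tstid": np.array([[2, 4]])})
    io.savemat(os.path.join(root, "imagelabels.mat"),
               {"labels": np.array([[10, 20, 30, 40, 50]])})

    ids = io.loadmat(os.path.join(root, "setid.mat"))
    all_labels = io.loadmat(os.path.join(root, "imagelabels.mat"))["labels"][0]

print(ids["trnid"].shape)                  # MATLAB row vector -> shape (1, N)
train_ids = ids["trnid"][0]                # flatten to shape (N,)
train_labels = all_labels[train_ids - 1]   # IDs are 1-indexed, NumPy arrays are 0-indexed
file_names = [f"image_{int(i):05d}.jpg" for i in train_ids]  # zero-padded file names
print(train_ids, train_labels, file_names)
```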

You need to apply the following:

| **Operation** | **Train** | **Test** |
| --- | --- | --- |
| Normalize and Resize | ✅ | ✅ |
| Data Augmentation | ✅ | ⛔ |
| Shuffle Dataloader | ✅ | ⛔ |


Useful torchvision transformations can be found at: https://pytorch.org/vision/stable/transforms.html

HINTS:

a) Create a dataset class that can handle both the training and the test partition.

b) Split the data according to the partition (use `setid.mat`).

c) Use `io.loadmat` to load all the IDs. Recall that the IDs in `setid.mat` of the Flowers dataset are 1-indexed, so you need to subtract 1 from them before using them as 0-based array indices (for example, to look up the labels).

d) For training, apply any augmentation you want (for example `RandomHorizontalFlip` or `RandomInvert`).

e) Use a batch size of 8 for the DataLoader.

f) Note that you need separate DataLoaders for train and test.
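Hints e) and f) translate into two `DataLoader` objects with the same batch size, where only the training loader shuffles. The sketch below uses a `TensorDataset` of 20 random tensors as a hypothetical stand-in for the flower datasets. It shows the batch sizes you get and that only the test loader preserves sample order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 20 fake "images" (3x4x4) with labels 0..19, standing in for the flower datasets.
images = torch.rand(20, 3, 4, 4)
labels = torch.arange(20)
train_ds = TensorDataset(images, labels)
test_ds = TensorDataset(images, labels)

# Separate loaders: same batch size, shuffle only for training.
train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=8, shuffle=False)

batch_sizes = [x.shape[0] for x, _ in test_loader]       # last batch is smaller (drop_last=False)
test_order = torch.cat([y for _, y in test_loader])      # original order
train_order = torch.cat([y for _, y in train_loader])    # a random permutation each epoch
print(batch_sizes, test_order.tolist(), train_order.tolist())
```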


## SOLUTION

In [None]:
%%bash
# download the dataset we will work with
mkdir -p ./flowers102/
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat -P ./flowers102/
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat -P ./flowers102/
wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz -P ./flowers102/
tar zxvf ./flowers102/102flowers.tgz -C ./flowers102/
rm ./flowers102/102flowers.tgz

In [None]:
import torchvision.transforms as transforms
import torchvision
import torch
import PIL.Image as Image
import matplotlib.pyplot as plt
import os
import scipy.io as io

In [None]:

# Create the dataset class, which handles the training and test partitions appropriately.

class MyDataset(torch.utils.data.Dataset):

  def __init__(self, root: str, img_size: int, partition: str = "train"):

    assert partition in ["train", "test"], "Partition should be 'train' or 'test'"
    self.root = root

    # === Indices (samples)
    # Split data according to the training and test

    ids = io.loadmat(os.path.join(root, "setid.mat")) # Load all the ids
    if partition == "train":
      self.ids = ids["trnid"][0]
    else:
      self.ids = ids["tstid"][0]

    # === Labels
    # Load the labels corresponding to the images
    all_labels = io.loadmat(os.path.join(root, "imagelabels.mat"))["labels"][0] # Load all the labels
    # Keep only the labels associated with the current IDs.
    # setid.mat is 1-indexed, so subtract 1 from the IDs to index the 0-indexed label array.
    # The labels themselves range over 1..102; subtract 1 so they are valid class indices 0..101.
    self.labels = all_labels[self.ids - 1] - 1

    # === Transformations
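    if partition == "train":
      # Only for training: resize, random augmentation, then normalize
      self.T = transforms.Compose([
          transforms.ToTensor(),  # PIL image -> float tensor in [0, 1], shape (C, H, W)
          transforms.Resize(size=(img_size + 32, img_size + 32)),  # resize slightly larger (margin of 32 px is an arbitrary choice)...
          transforms.RandomCrop(size=(img_size, img_size)),  # ...then crop back to img_size at a random position
          transforms.RandomHorizontalFlip(p=0.5),
          transforms.RandomGrayscale(p=0.2),
          transforms.RandomInvert(p=0.2),  # computes 1 - x on float tensors, so apply it before Normalize
          # mean=0, std=1 is an identity placeholder (ToTensor already scales to [0, 1]);
          # replace it with the dataset's channel statistics for real training
          transforms.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0)),
      ])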
    elif partition == "test":
      # Only for testing
      self.T = transforms.Compose([
                transforms.ToTensor(),
                transforms.Resize(size=(img_size, img_size)),
                transforms.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0)),  # identity placeholder; ToTensor already scales to [0, 1]
            ])


  def __len__(self):
    return self.ids.shape[0]

  def __getitem__(self, idx):

    img_name = f"image_{int(self.ids[idx]):05d}.jpg"  # e.g. image_00001.jpg
    img_path = os.path.join(self.root, "jpg", img_name)
    img = Image.open(img_path).convert("RGB")
    img = self.T(img)

    label = int(self.labels[idx])  # Python int, so the DataLoader collates labels into an int64 tensor
    return img, label
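The labels in `imagelabels.mat` run from 1 to 102, while `nn.CrossEntropyLoss` expects int64 class indices in `[0, num_classes)`. If you train on this dataset, shift each label down by 1 and return it as a Python int from `__getitem__`. The sketch below uses made-up labels and random logits as a stand-in for a model. It shows how Python ints are collated and that the shifted targets are accepted by the loss.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import default_collate

NUM_CLASSES = 102

# Made-up labels in the 1..102 range used by imagelabels.mat.
raw_labels = np.array([1, 57, 102], dtype=np.uint8)

# Shift to 0..101 and convert each to a Python int, as a Dataset would return them.
targets = default_collate([int(lbl) - 1 for lbl in raw_labels])
print(targets, targets.dtype)  # Python ints collate into an int64 tensor

logits = torch.randn(len(targets), NUM_CLASSES)  # stand-in for model outputs
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())
```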

In [None]:
data_root = "flowers102/"
batch_size = 8
image_size = 256


# Dataset and DataLoader for training
train_dataset = MyDataset(root=data_root,
                          img_size=image_size,
                          partition="train")

train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                        batch_size=batch_size,
                                        shuffle=True)

In [None]:

# Dataset and DataLoader for testing
test_dataset = MyDataset(root=data_root,
                         img_size=image_size,
                         partition="test")
test_dataloader = torch.utils.data.DataLoader(test_dataset,
                                        batch_size=batch_size,
                                        shuffle=False)


In [None]:
# Visualize training samples
print(f"Training dataset contains: {len(train_dataset)} samples")
train_batch, _ = next(iter(train_dataloader))
train_img = torchvision.utils.make_grid(train_batch)

f = plt.figure(figsize=(30, 5))
plt.imshow(train_img.permute(1,2,0))
plt.axis("off")
plt.title("Training Images")
plt.show()

In [None]:
# Visualize testing samples
print(f"Test dataset contains: {len(test_dataset)} samples")
test_batch, _ = next(iter(test_dataloader))
test_img = torchvision.utils.make_grid(test_batch)

f = plt.figure(figsize=(30, 5))
plt.imshow(test_img.permute(1,2,0))
plt.axis("off")
plt.title("Test Images")
plt.show()