# Validation Data

The following code creates a validation set by randomly selecting items from a training set. You can run it locally. If your data is on Google Drive, first run the following cell, adjusting the path to the folder in which you saved this notebook.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

%cd "/content/drive/MyDrive/TC3007C/Reto"
!pwd

Mounted at /content/drive
/content/drive/MyDrive/TC3007C/Reto
/content/drive/MyDrive/TC3007C/Reto


The relevant libraries are `os` and `shutil`, together with the `sample` function from `random`.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import os

from shutil import move
from random import sample

The `base_dir` will be the directory in which you have your data. Notice that it is not the same directory in which you have your notebook:

Notebook path: `/content/drive/MyDrive/TC3007C/Reto`

Data path (`base_dir`): `/content/drive/MyDrive/TC3007C/Reto/Data`

Next, specify the path of the training set, which is stored in the `train_dir` variable, and the path of the validation set, `validation_dir`, which is `base_dir` joined with `validation`. The validation folder is created with the `mkdir` function of the `os` module, which raises `FileExistsError` if the folder already exists.

In [3]:
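base_dir = 'Data'  # relative to the notebook folder set with %cd above
train_dir = os.path.join(base_dir, 'train_test')
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)  # fails if the validation folder already exists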

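Because `os.mkdir` raises an error when the folder already exists, rerunning the cell above fails. The following is a minimal sketch of one alternative, `os.makedirs` with `exist_ok=True`, shown on a temporary scratch directory (an assumption made only so the example is self-contained; it is not the notebook's real `Data` folder).

```python
import os
import tempfile

# Hypothetical scratch directory standing in for the notebook folder.
with tempfile.TemporaryDirectory() as root:
    base_dir = os.path.join(root, 'Data')
    validation_dir = os.path.join(base_dir, 'validation')

    # makedirs creates missing parents ('Data') and, with exist_ok=True,
    # does not raise if the folder is already there.
    os.makedirs(validation_dir, exist_ok=True)
    os.makedirs(validation_dir, exist_ok=True)  # second call does nothing

    created = os.path.isdir(validation_dir)

    # os.mkdir, by contrast, raises FileExistsError on an existing folder.
    try:
        os.mkdir(validation_dir)
        mkdir_raised = False
    except FileExistsError:
        mkdir_raised = True

print(created, mkdir_raised)
```

Note that tolerating an existing folder only affects folder creation. Rerunning the sampling step would still move additional images out of the training set, so the error from `os.mkdir` can also act as a useful safeguard.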
The following code goes through every folder of the training set, randomly selects some images from each one, and moves them to the corresponding folder in the validation directory. The number of images selected in each folder is equal to

$$\lfloor\text{number of images in folder}\times\text{proportion}\rfloor.$$

This value can be zero for folders with few images (fewer than 10 when the proportion is 0.1). In that case, the folder is still created in the validation directory, but no images are moved into it. If all runs well, the validation directory will therefore contain the same folders as the training directory, although some of them may be empty.

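To make the formula concrete, here is a small sketch that evaluates it for a few folder sizes. The helper name `validation_count` and the folder sizes are illustrative assumptions, not part of the original notebook.

```python
proportion = 0.1

def validation_count(n_images, proportion):
    """Number of images to move: floor(n_images * proportion)."""
    # int() truncates toward zero, which matches floor for non-negative values.
    return int(proportion * n_images)

# Hypothetical folder sizes, chosen only for illustration.
for n in (5, 9, 10, 25, 100):
    print(n, validation_count(n, proportion))
# 5 0
# 9 0
# 10 1
# 25 2
# 100 10
```

With `proportion = 0.1`, any folder holding fewer than 10 images contributes nothing to the validation set.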
In [4]:
proportion = 0.1  # fraction of each class folder moved to validation

for folder in os.listdir(train_dir):
    path = os.path.join(train_dir, folder)
    # Skip stray files; only class folders are expected here
    if not os.path.isdir(path):
        continue
    new_path = os.path.join(validation_dir, folder)
    os.mkdir(new_path)
    images = os.listdir(path)
    # floor(proportion * number of images); may be 0 for small folders
    sampled_images = sample(images, k=int(proportion * len(images)))
    for sampled_image in sampled_images:
        move(os.path.join(path, sampled_image), new_path)
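
After the split, it can help to check how many images each folder holds, since some validation folders may be empty. The following is a minimal sketch; the helper `count_files` and the tiny demo tree (`class_a`, `class_b`) are hypothetical and exist only to make the example runnable on its own.

```python
import os
import tempfile

def count_files(root):
    """Map each subfolder of `root` to the number of entries it contains."""
    return {
        folder: len(os.listdir(os.path.join(root, folder)))
        for folder in sorted(os.listdir(root))
        if os.path.isdir(os.path.join(root, folder))
    }

# Tiny made-up tree, used only to show the shape of the result.
with tempfile.TemporaryDirectory() as root:
    for folder, n in (('class_a', 3), ('class_b', 0)):
        os.makedirs(os.path.join(root, folder))
        for i in range(n):
            # Create empty placeholder files in place of real images.
            open(os.path.join(root, folder, f'img_{i}.jpg'), 'w').close()
    demo_counts = count_files(root)

print(demo_counts)  # {'class_a': 3, 'class_b': 0}
```

In the notebook, calling `count_files(train_dir)` and `count_files(validation_dir)` would give the per-folder counts for both sets, which makes empty validation folders easy to spot.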