# Working with *image_dataset_from_directory()*.

The first approach in this TFM was to serialize the data into a pickle file and load all of it into memory: https://github.com/albertovpd/viu_tfm-deep_vision_classification/blob/main/src/TFM-serializing_data.ipynb . This is not optimal, and it quickly runs out of RAM.

Then I started using TensorFlow and its *image_dataset_from_directory* method. It loads the images in batches and lets you apply data augmentation to each batch, which needs far less memory, but it is quite tricky:
- By default, *image_dataset_from_directory* sets shuffle = True. That means that when building a classification report, the labels of the validation dataset won't be matched with their right arrays, because the dataset is reshuffled every time it is iterated.
- Shuffle = False means the data is not shuffled at all, so when splitting, train_ds will contain some of the classes and val_ds will contain the others. It is like taking the folders in order, without creating a heterogeneous sample for each split.
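To make the first pitfall concrete, here is a toy simulation in plain Python, not TensorFlow. `ReshufflingDataset`, `predict` and `accuracy` are hypothetical names made up for this sketch; the class only mimics a dataset that yields a new order on every pass. It compares collecting labels and predictions in two separate passes against collecting both in a single pass.

```python
import random


class ReshufflingDataset:
    """Toy stand-in for a shuffled dataset: every pass yields a new order."""

    def __init__(self, samples, seed=0):
        self._samples = list(samples)
        self._rng = random.Random(seed)

    def __iter__(self):
        order = self._samples[:]
        self._rng.shuffle(order)
        return iter(order)


def predict(image):
    # Hypothetical perfect classifier: the "image" string encodes its class.
    return image.split("_")[0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


samples = [
    (f"{cls}_{i}", cls)
    for cls in ("Bedroom", "Kitchen", "Bathroom")
    for i in range(50)
]
val_ds = ReshufflingDataset(samples)

# Two passes: labels and predictions come from two different orders.
labels_pass_1 = [label for _, label in val_ds]
preds_pass_2 = [predict(image) for image, _ in val_ds]
two_pass_accuracy = accuracy(labels_pass_1, preds_pass_2)

# One pass: each prediction is paired with the label of the same sample.
pairs = [(label, predict(image)) for image, label in val_ds]
one_pass_accuracy = accuracy([t for t, _ in pairs], [p for _, p in pairs])

print(f"two passes: {two_pass_accuracy:.2f}, one pass: {one_pass_accuracy:.2f}")
```

Even with a perfect classifier, the two-pass score drops because the orders differ, while the single-pass score stays at 1.0. The same idea should carry over to a real shuffled dataset: gather labels and predictions batch by batch in the same loop, or evaluate on a dataset loaded without shuffling.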

So, with this notebook I'll create the *train, test and validation folders*.
- By the way, *image_dataset_from_directory* doesn't allow you to create 3 subpartitions, just 2 (train and val).
- The created subfolders keep the same class proportions as the original ones.
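The difference between cutting an ordered file list and splitting class by class can be sketched with a toy example. The class sizes and file names below are invented for illustration; only the idea (shuffle inside each class, then cut each class separately) reflects what this notebook does.

```python
import random
from collections import Counter

# Toy file list, ordered by class folder like an unshuffled directory listing.
class_sizes = {"Bathroom": 60, "Bedroom": 120, "Kitchen": 100}
files = [
    (f"{cls}/img_{i}.jpg", cls)
    for cls, n in class_sizes.items()
    for i in range(n)
]

val_ratio = 0.2

# Naive split: take the tail of the ordered list as validation.
cut = int((1 - val_ratio) * len(files))
naive_val = Counter(cls for _, cls in files[cut:])

# Per-class split: shuffle inside each class, then cut each class separately.
rng = random.Random(42)
stratified_val = Counter()
for cls in class_sizes:
    names = [name for name, c in files if c == cls]
    rng.shuffle(names)
    cls_cut = int((1 - val_ratio) * len(names))
    stratified_val[cls] = len(names[cls_cut:])

print("naive val:     ", dict(naive_val))
print("stratified val:", dict(stratified_val))
```

The naive validation set ends up holding images from the last folder only, while the per-class split keeps every class represented in proportion to its size.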

### Final note
You can use this notebook on your PC or in Google Cloud, but be warned that, for me, Google Drive behaved oddly: it did not copy all the images to the right folder. It was not that it copied none of them, nor all of them; it just skipped some.
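One way to catch that kind of silent loss is to compare file names after copying. The helper below, `missing_files`, is a hypothetical name for this sketch and is not part of the notebook; the demo runs on throwaway temporary folders and skips one copy on purpose.

```python
import os
import shutil
import tempfile


def missing_files(src_dir, dst_dir):
    """Return the names present in src_dir but absent from dst_dir, sorted."""
    return sorted(set(os.listdir(src_dir)) - set(os.listdir(dst_dir)))


# Demo on temporary folders: five fake images, the last copy is skipped.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(src)
    os.makedirs(dst)
    for i in range(5):
        with open(os.path.join(src, f"img_{i}.jpg"), "w") as fh:
            fh.write("x")
    for name in sorted(os.listdir(src))[:-1]:
        shutil.copy(os.path.join(src, name), dst)
    not_copied = missing_files(src, dst)

print(not_copied)
```

In this notebook each class is spread over three destination folders, so the comparison would have to be made against the list of names intended for each split rather than against the whole source folder.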

In [1]:
# Google Drive stuff
#from google.colab import drive
#drive.mount('/content/drive/', force_remount=True)

- libraries

In [2]:
# local path
from dotenv import load_dotenv
import os
import numpy as np
import shutil
import random

- paths

In [3]:
# for google drive
# base_folder = "/content/drive/My Drive/2-Estudios/viu-master_ai/tfm-deep_vision/"

# local
load_dotenv()
base_folder = os.environ.get("INPUT_PATH")

# pics located in
root_dir  = base_folder+"House_Room_Dataset-5_rooms/"
# my train/test/val folders will be created in
input_destination = base_folder+"dataset_2_folders/"

# splitting my data into just 2 folders

In [4]:
classes_dir = os.listdir(root_dir)
classes_dir

['Dinning', 'Bedroom', 'Livingroom', 'Kitchen', 'Bathroom']

In [5]:
train_ratio = 0.6
val_ratio  = 0.1

for cls in classes_dir:
    os.makedirs(input_destination +'train_ds/' + cls, exist_ok=True)
    os.makedirs(input_destination +'test_ds/' + cls, exist_ok=True)
    os.makedirs(input_destination +'val_ds/' + cls, exist_ok=True)
    
    # for each class, list its images
    src = root_dir + cls
    allFileNames = os.listdir(src)

    # shuffle them and split into train/test/val:
    # the first 60% goes to train, the last 10% to val, the remaining 30% to test
    np.random.shuffle(allFileNames)
    train_FileNames, test_FileNames, val_FileNames = np.split(np.array(allFileNames),[int(train_ratio * len(allFileNames)), int((1-val_ratio) * len(allFileNames))])
    
    # save their initial path
    train_FileNames = [src+'/'+ name  for name in train_FileNames.tolist()]
    test_FileNames  = [src+'/' + name for name in test_FileNames.tolist()]
    val_FileNames   = [src+'/' + name for name in val_FileNames.tolist()]
    print("\n *****************************",
          "\n Total images: ",cls, len(allFileNames),
          '\n Training: ', len(train_FileNames),
          '\n Testing: ', len(test_FileNames),
          '\n Validation: ', len(val_FileNames),
          '\n *****************************')
    
    # copy files from the initial path to the final folders
    for name in train_FileNames:
        shutil.copy(name, input_destination +'train_ds/' + cls)
    for name in test_FileNames:
        shutil.copy(name, input_destination +'test_ds/' + cls)
    for name in val_FileNames:
        shutil.copy(name, input_destination +'val_ds/' + cls)


 ***************************** 
 Total images:  Dinning 1158 
 Training:  694 
 Testing:  348 
 Validation:  116 
 *****************************

 ***************************** 
 Total images:  Bedroom 1248 
 Training:  748 
 Testing:  375 
 Validation:  125 
 *****************************

 ***************************** 
 Total images:  Livingroom 1273 
 Training:  763 
 Testing:  382 
 Validation:  128 
 *****************************

 ***************************** 
 Total images:  Kitchen 965 
 Training:  579 
 Testing:  289 
 Validation:  97 
 *****************************

 ***************************** 
 Total images:  Bathroom 606 
 Training:  363 
 Testing:  182 
 Validation:  61 
 *****************************
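The counts printed above follow directly from the two cut indices passed to `np.split`. As a quick check, the arithmetic can be reproduced without NumPy; `split_sizes` is a hypothetical helper written for this sketch, and the totals are the ones reported in the output.

```python
def split_sizes(n_images, train_ratio=0.6, val_ratio=0.1):
    """Return (train, test, val) sizes produced by the two cut indices."""
    first_cut = int(train_ratio * n_images)
    second_cut = int((1 - val_ratio) * n_images)
    return first_cut, second_cut - first_cut, n_images - second_cut


totals = {
    "Dinning": 1158,
    "Bedroom": 1248,
    "Livingroom": 1273,
    "Kitchen": 965,
    "Bathroom": 606,
}
for cls, n in totals.items():
    train, test, val = split_sizes(n)
    print(f"{cls}: train={train} test={test} val={val}")
```

Note that the test share is never set explicitly: it is whatever lies between the two cuts, roughly 30% here. Because both cuts are truncated with `int`, the validation folder can end up one image larger than a plain 10% would suggest.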


In [6]:
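# check that every split folder holds the expected number of files
# (dirpath instead of dir, to avoid shadowing the built-in)
paths = ['train_ds/', 'test_ds/','val_ds/']
for p in paths:
    for dirpath, subdirs, files in os.walk(input_destination + p):
        print(dirpath,' ', p, str(len(files)))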

/home/vargas/Documents/data/viu_tfm-deep_vision_classification/input/dataset_2_folders/train_ds/   train_ds/ 0
/home/vargas/Documents/data/viu_tfm-deep_vision_classification/input/dataset_2_folders/train_ds/Dinning   train_ds/ 694
/home/vargas/Documents/data/viu_tfm-deep_vision_classification/input/dataset_2_folders/train_ds/Bedroom   train_ds/ 748
/home/vargas/Documents/data/viu_tfm-deep_vision_classification/input/dataset_2_folders/train_ds/Livingroom   train_ds/ 763
/home/vargas/Documents/data/viu_tfm-deep_vision_classification/input/dataset_2_folders/train_ds/Kitchen   train_ds/ 579
/home/vargas/Documents/data/viu_tfm-deep_vision_classification/input/dataset_2_folders/train_ds/Bathroom   train_ds/ 363
/home/vargas/Documents/data/viu_tfm-deep_vision_classification/input/dataset_2_folders/test_ds/   test_ds/ 0
/home/vargas/Documents/data/viu_tfm-deep_vision_classification/input/dataset_2_folders/test_ds/Dinning   test_ds/ 348
/home/vargas/Documents/data/viu_tfm-deep_vision_classifica