# Image Preprocessing

This notebook prepares the images that will be loaded into the model.

Steps for the image preprocessing:
1. Collect images
2. Set up folders
3. Image labeling

    3.1. Download labelImg by tzutalin  
    3.2. Label all the images using labelImg  
    3.3. Split into train and test

4. Compress images

In [1]:
# Importing Dependencies

# Import Operating System to create folders
import os
import shutil

# Import time
import time

# Use mathematical formulas
import math

## 1. Collect Images

In this step we collect the images used to train and test the model. The images are stored in Tensorflow->workspace->images->collectedimages. The project uses 5 classes of drinks, chosen from the candidates below:
* Redbull
* Coca-cola
* Coca-cola Zero
* Coca-cola light
* Pepsi
* Pepsi Max
* 7-up
* Ice-tea
* TODO: Choose 5 from those

In [None]:
# Labels of the images
labels = ['Redbull', 'CocaCola', 'Pepsi', 'SevenUp', 'IceTeaLemon']
number_classes = len(labels)

## 2. Setup Folders

Create the folder that stores the collected images, with one sub-folder per class.


In [None]:
images_collected_path  = os.path.join('Tensorflow', 'workspace', 'images', 'collectedimages')
images_root_path  = os.path.join('Tensorflow', 'workspace', 'images')

# Training and test Paths
train_path = os.path.join('Tensorflow', 'workspace', 'images', 'train')
test_path = os.path.join('Tensorflow', 'workspace', 'images', 'test')

In [None]:
# Create the collectedimages folder (os.makedirs also creates missing parent
# folders and behaves the same on posix and nt)
os.makedirs(images_collected_path, exist_ok=True)

# Create folders for the different classes
for label in labels:
    os.makedirs(os.path.join(images_collected_path, label), exist_ok=True)

# Create folders for the train and test sets
for split_set in ['test', 'train']:
    os.makedirs(os.path.join(images_root_path, split_set), exist_ok=True)
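
Once images have been copied into the class folders, a quick count per class can confirm that every class has data before labeling starts. The sketch below is only an illustration: `count_files_per_class` is a hypothetical helper, not part of the notebook, and it repeats the path and labels from the cells above so that it runs on its own.

```python
import os

def count_files_per_class(root, labels):
    """Return {label: number of files} for each class sub-folder under root."""
    counts = {}
    for label in labels:
        folder = os.path.join(root, label)
        if os.path.isdir(folder):
            counts[label] = sum(
                os.path.isfile(os.path.join(folder, name))
                for name in os.listdir(folder)
            )
        else:
            counts[label] = 0  # folder has not been created yet
    return counts

# Path and labels copied from the notebook cells above
collected = os.path.join('Tensorflow', 'workspace', 'images', 'collectedimages')
print(count_files_per_class(collected, ['Redbull', 'CocaCola', 'Pepsi', 'SevenUp', 'IceTeaLemon']))
```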

## 3. Image Labeling

### 3.1. Download labelImg

In [None]:
# Install dependencies
!pip install --upgrade pyqt5 lxml

In [None]:
# Create labeling path
LABELIMG_PATH = os.path.join('Tensorflow', 'labelimg')

# Download labelImg if the folder does not exist
if not os.path.exists(LABELIMG_PATH):
    !mkdir {LABELIMG_PATH}
    !git clone https://github.com/tzutalin/labelImg {LABELIMG_PATH}

# Build the Qt resources from inside the labelImg folder
if os.name == 'posix':
    !cd {LABELIMG_PATH} && make qt5py3
if os.name == 'nt':
    !cd {LABELIMG_PATH} && pyrcc5 -o libs/resources.py resources.qrc

### 3.2. Label the Images

In [None]:
# Open the labelImg
!cd {LABELIMG_PATH} && python labelImg.py

### 3.3. Split into train and test

Split the collected data into:
* **Train set** 80%
* **Test set** 20%
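
As a rough illustration of the arithmetic, the sketch below computes how many files would land in each set for a class folder of a given size. It assumes each image comes with one matching labelImg annotation file, which is why the train count is rounded up to an even number. The helper name `split_counts` is made up for this example.

```python
import math

def split_counts(total_files, train_fraction=0.8):
    """Return (train, test) file counts, with train rounded up to an even number."""
    train = math.ceil(total_files * train_fraction / 2) * 2
    train = min(train, total_files)  # never ask for more files than exist
    return train, total_files - train

# 15 images + 15 annotations = 30 files
print(split_counts(30))  # (24, 6)
# 25 images + 25 annotations = 50 files
print(split_counts(50))  # (40, 10)
```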

In [None]:
def split_train_test(images_collected_path=images_collected_path, 
                     train_path=train_path,
                     test_path=test_path,
                     split_train_size=0.8):
    """
    Move the files of each class folder into the train and test folders.

    The first `split_train_size` share of the files (rounded up to an even
    number) goes to train_path; the remaining files go to test_path.
    """

    def round_up_to_even(f):
        return math.ceil(f / 2.) * 2

    for i in os.listdir(images_collected_path):
        counter = 0
        subfolder_path = os.path.join(images_collected_path, i)
        collected_image_size = len(os.listdir(subfolder_path))
        # Never plan to move more files than the folder holds
        train_number = min(round_up_to_even(collected_image_size * split_train_size),
                           collected_image_size)

        for root, dirs, files in os.walk(subfolder_path, topdown=False):
            # Sort so files with the same base name (e.g. an image and its
            # annotation) stay next to each other
            for file in sorted(files):
                counter += 1

                # Define new paths
                old_path = os.path.join(root, file)
                new_path_train = os.path.join(train_path, file)
                new_path_test = os.path.join(test_path, file)

                if counter <= train_number:
                    shutil.move(old_path, new_path_train)
                else:
                    shutil.move(old_path, new_path_test)

        print(f'Folder: {i}\n'
              f'Train number: {train_number}\n'
              f'Test number: {collected_image_size - train_number}\n'
              f'Number of files: {counter}', end='\n\n')

In [None]:
# Split the data into train and test folders
split_train_test()
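
A counter-based split depends on the order in which files are listed, so an image and its annotation can end up in different sets. The sketch below shows one possible alternative, not the notebook's method: it groups files by base name and moves each group as a unit. It assumes labelImg saved a same-named `.xml` file next to each image, and `split_pairs` is a hypothetical helper.

```python
import os
import random
import shutil
from collections import defaultdict

def split_pairs(class_dir, train_dir, test_dir, train_fraction=0.8, seed=0):
    """Move image/annotation groups from class_dir into train_dir and test_dir."""
    # Group files that share a base name, e.g. can1.jpg and can1.xml
    groups = defaultdict(list)
    for name in sorted(os.listdir(class_dir)):
        if os.path.isfile(os.path.join(class_dir, name)):
            groups[os.path.splitext(name)[0]].append(name)

    stems = sorted(groups)
    random.Random(seed).shuffle(stems)  # reproducible shuffle
    n_train = round(len(stems) * train_fraction)

    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    for index, stem in enumerate(stems):
        target = train_dir if index < n_train else test_dir
        for name in groups[stem]:
            shutil.move(os.path.join(class_dir, name), os.path.join(target, name))

    # Number of groups (not files) sent to each set
    return n_train, len(stems) - n_train
```

Calling it once per class folder, for example `split_pairs(os.path.join(images_collected_path, 'Pepsi'), train_path, test_path)`, would keep every image together with its annotation.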

## 4. Compress images

Compress the images so they can be loaded in Google Colab.

In [None]:
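# The archive location is an assumption; the notebook does not define it elsewhere
archive_path = os.path.join('Tensorflow', 'workspace', 'images', 'archive.tar.gz')
!tar -czf {archive_path} {train_path} {test_path}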
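
On systems where the `tar` command is not available, Python's standard `tarfile` module can build an equivalent gzip-compressed archive. This is a minimal sketch under that assumption; `make_archive` is a hypothetical helper and the archive name is up to you.

```python
import os
import tarfile

def make_archive(archive_path, *folders):
    """Write the given folders into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, 'w:gz') as archive:
        for folder in folders:
            archive.add(folder)  # adds the folder and its contents recursively
    return archive_path
```

For example, `make_archive('archive.tar.gz', train_path, test_path)` would pack both sets into a single file ready for upload.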