# Training Script

## This file explains the code flow for training the classifier. To run the training, use the .py file instead.

    The code starts by importing the required libraries. Because this is an image-based
    approach and Keras is the preferred library for the task, cv2 and keras are the imports
    of interest; the remaining imports handle data loading and access.

In [None]:
import glob
import os
import numpy as np
import cv2
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import img_to_array
from keras.models import Sequential, Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD, Adam
from keras.utils import to_categorical
from matplotlib import pyplot as plt
import random
import pdb


    The snippet below restricts GPU memory usage to a fixed fraction (0.3 here). It is wrapped
    in triple quotes, so it does nothing as written; to enable it, remove the quotes and run
    the cell. The effect is visible during training, when tensors are allocated on the GPU.

In [None]:
'''
# This part of the code is used to restrict the GPU usage.
import tensorflow as tf 
from keras.backend.tensorflow_backend import set_session 
config = tf.ConfigProto() 
config.gpu_options.per_process_gpu_memory_fraction = 0.3 
set_session(tf.Session(config=config))
'''

    The function below, data_spinner, is the data processor. It takes the image paths of the
    dataset, reads each image, normalizes it, and makes it ready for the neural network. Once
    all images are processed, it returns the data and labels for training.

In [None]:
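def data_spinner(data_path):
    # Shuffle in place so the two classes are mixed before training.
    random.shuffle(data_path)
    Y = []
    X = []
    for img_path in data_path:
        # The label comes from the folder name in the path (spelled as on disk).
        name = img_path.split("/")
        if 'recepit' in name:
            label = [1, 0]
        elif 'non-recepit' in name:
            label = [0, 1]
        else:
            continue  # skip files outside the two class folders
        img = cv2.imread(img_path)
        if img is None:
            continue  # skip files OpenCV cannot read
        # cv2.resize expects the target size as (width, height).
        img = cv2.resize(img, (img_cols, img_rows))
        # Scale pixel values to [0, 1]; append image and label together
        # so X and Y always stay the same length.
        X.append(np.array(img, dtype='float') / 255.0)
        Y.append(label)
    Y = np.array(Y)
    # pdb.set_trace()
    X = np.array(X)
    return X, Y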
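
    The two core steps of the data processor, scaling pixels and deriving a one-hot label
    from the folder name, can be tried without any image files. The sketch below is only an
    illustration: the helper names label_from_path and normalize, the example paths, and the
    tiny synthetic image are assumptions and are not part of the training script.

```python
import numpy as np


def label_from_path(img_path):
    """Return a one-hot label based on the folder names in the path, or None."""
    parts = img_path.split("/")
    if "recepit" in parts:
        return [1, 0]
    if "non-recepit" in parts:
        return [0, 1]
    return None


def normalize(img):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return np.asarray(img, dtype="float") / 255.0


# Synthetic 2x2 three-channel image standing in for the output of cv2.imread.
fake_img = np.array([[[0, 128, 255], [255, 255, 255]],
                     [[0, 0, 0], [64, 64, 64]]], dtype=np.uint8)
scaled = normalize(fake_img)

print(scaled.min(), scaled.max())                  # 0.0 1.0
print(label_from_path("data/recepit/a.jpg"))       # [1, 0]
print(label_from_path("data/non-recepit/b.jpg"))   # [0, 1]
print(label_from_path("data/other/c.jpg"))         # None
```

    Because the path is split into components before the membership test, the folder name
    'non-recepit' does not also match 'recepit'.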


    The create_model function is the key function: it builds the model architecture. Calling
    it returns the model that is used for training.

In [None]:
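# img_rows and img_cols are defined in the last cell of this file.
input_shape = (img_rows, img_cols, 3)


def create_model():
    # VGG16 convolutional base pretrained on ImageNet, without its classifier head.
    base_model = VGG16(weights='imagenet', include_top=False)
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024, activation='relu')(x)
    # Two-way softmax output, one unit per class.
    predictions = Dense(2, activation='softmax')(x)
    # Keras 2 expects `inputs`/`outputs`; the singular forms are deprecated.
    model = Model(inputs=base_model.input, outputs=predictions)
    return model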
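
    To make the shape of the classifier head concrete, here is a NumPy-only sketch of what
    global average pooling and the final softmax do. It is not the Keras implementation:
    the random arrays are stand-ins for real activations, the helper names are assumptions,
    and the intermediate Dense(1024) layer is left out. The 7x7x512 feature map is the size
    VGG16 produces for a 224x224 input when include_top=False.

```python
import numpy as np


def global_average_pool(features):
    """Average each channel over the spatial axes: (N, H, W, C) -> (N, C)."""
    return features.mean(axis=(1, 2))


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)

# Random stand-in for the VGG16 convolutional output on two 224x224 images.
features = rng.random((2, 7, 7, 512))
pooled = global_average_pool(features)

# Random stand-in for the pre-activation values of the two-unit output layer.
logits = rng.normal(size=(2, 2))
probs = softmax(logits)

print(pooled.shape)        # (2, 512)
print(probs.sum(axis=1))   # each row sums to 1, up to rounding
```

    Pooling reduces every image to a 512-value vector regardless of spatial size, which is
    why the base model can be created without a fixed input shape.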

    The train function is where the training happens. The model is compiled with the Adam
    optimizer and minimizes the 'categorical_crossentropy' loss, while 'accuracy' is reported
    as the metric. Calling fit on the compiled model then starts training. Because training is
    iterative, fit takes batch_size, the number of images processed at a time, and epochs, the
    number of passes over the training data. The validation split is also created here through
    validation_split; if required, a validation set can instead be prepared separately and
    passed in. Finally, once all epochs are complete, the model weights are saved and can then
    be used for running inference.

In [None]:
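def train():
    model = create_model()
    model.compile(optimizer=Adam(lr=0.0001, beta_1=0.9, beta_2=0.999),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    # The last 30% of X and Y is held out for validation.
    model.fit(X, Y, batch_size=16, epochs=30, verbose=1,
              shuffle=True, validation_split=0.3)
    # Make sure the output folder exists before writing the weights.
    os.makedirs('./model', exist_ok=True)
    model.save_weights('./model/recpit.h5')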
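
    The arithmetic behind the fit arguments and the loss can be sketched in plain NumPy.
    This is an approximation for illustration, not Keras code: split_sizes and
    categorical_crossentropy below are hypothetical helpers, and the sample count of 1000 is
    an assumed figure, not the size of the actual dataset. Keras takes the validation
    samples from the end of the arrays, before any shuffling.

```python
import math

import numpy as np


def split_sizes(n_samples, validation_split, batch_size):
    """Return (training samples, validation samples, batches per epoch)."""
    n_train = int(n_samples * (1.0 - validation_split))
    n_val = n_samples - n_train
    steps_per_epoch = math.ceil(n_train / batch_size)
    return n_train, n_val, steps_per_epoch


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


# Assumed dataset of 1000 images with the settings used in train().
print(split_sizes(1000, 0.3, 16))  # (700, 300, 44)

y_true = np.array([[1, 0], [0, 1]])
confident = np.array([[0.9, 0.1], [0.2, 0.8]])
unsure = np.array([[0.5, 0.5], [0.5, 0.5]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident < loss_unsure)  # True
```

    The loss is lower when the predicted probability of the correct class is higher, which
    is what the optimizer pushes towards over the 30 epochs.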

    All the function calls needed to run the pipeline are made here: collect the image paths,
    build the data and labels, and train the model.

In [None]:
paths = "/home/akshay/Recpit_Classification/data/*/*"
img_rows, img_cols = 224, 224
data_path = glob.glob(paths)
print(len(data_path))
X, Y = data_spinner(data_path)
# Print the array shapes rather than the full pixel array.
print(X.shape, Y.shape)

train()
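
    The glob pattern above ends in "*/*", so it matches files exactly one folder below the
    data directory, and that folder name is what data_spinner uses as the label. The sketch
    below shows the matching on a temporary directory; the file names a.jpg and b.jpg are
    made up for the example, and the folder names are the ones the script checks for.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny stand-in for the data directory: one empty file per class folder.
    for folder, fname in [("recepit", "a.jpg"), ("non-recepit", "b.jpg")]:
        os.makedirs(os.path.join(root, folder), exist_ok=True)
        open(os.path.join(root, folder, fname), "w").close()

    # Same pattern shape as in the script: <data dir>/<class folder>/<file>.
    matches = sorted(glob.glob(os.path.join(root, "*", "*")))
    class_folders = [os.path.basename(os.path.dirname(p)) for p in matches]

print(class_folders)  # ['non-recepit', 'recepit']
```

    Files placed directly in the data directory, or nested more than one folder deep, are
    not matched by this pattern.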