# Flipkart GRID challenge

### Round 2 : Object localization challenge
**Team Name** : undergradients <br>
**Round 2 Score** : 79.93% accuracy <br>
**Participants** : <br>
Ashutosh Sathe, Yash Jakhotiya, Prasad Rathod

### Data access and preprocessing :
The data was provided as a zipped archive of images along with a `training.csv` of ~14000 lines, each describing the bounding box for one image.<br>
We did the preprocessing and EDA on Google Colab instead of a local machine because we did not have a steady internet connection at the time of writing.

In [None]:
## Mount Google Drive in Google Colab
from google.colab import drive
drive.mount('/content/drive')

In [None]:
## A Python function for downloading files from Google Drive
import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

In [None]:
download_file_from_google_drive('1Q-ZY19lGrlvvzkYQvgNIP7H-__g3-W61','images.zip')

In [None]:
## Confirm the download by listing all the files
!ls -al
## Unzip images.zip. Since the archive contains ~54000 images, the
## output of this command can be too large, so we just print a
## dot for every 50 lines of unzip output
!unzip images.zip | awk 'BEGIN {ORS=" "}{if(NR%50==0)print "."}'

In [None]:
## Count the number of images extracted
!ls
!ls images -l | wc -l

In [None]:
## `training.csv` has been uploaded to Google Drive with the name `training`
import pandas as pd
csv_train = pd.read_csv('/content/drive/My Drive/training')
csv_train.head()

Since we do not have enough memory to load all ~14000 images at once, we divide them into chunks of 1000 images. We read each chunk of 1000 images into a NumPy array and store every array on disk using `np.save()`.
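
As a rough check on why the full set does not fit in memory, the sketch below estimates the size of the decoded images. It assumes every image is 640x480 with 3 channels stored as `uint8` (the default for `cv2.imread`); the helper `raw_bytes` is made up for illustration and is not part of the notebook.

```python
# Rough memory estimate for the decoded training images.
# Assumption (not measured): 640x480 pixels, 3 channels, 1 byte per value.
WIDTH, HEIGHT, CHANNELS = 640, 480, 3
BYTES_PER_VALUE = 1  # uint8


def raw_bytes(num_images):
    """Bytes needed to hold `num_images` decoded images in one array."""
    return num_images * WIDTH * HEIGHT * CHANNELS * BYTES_PER_VALUE


full_set = raw_bytes(14000)  # the whole training set at once
one_chunk = raw_bytes(1000)  # a single chunk of 1000 images

print("Full set : {:.2f} GiB".format(full_set / 2**30))
print("One chunk: {:.2f} GiB".format(one_chunk / 2**30))
```

Under these assumptions the full set needs about 12 GiB, while a chunk of 1000 images stays under 1 GiB.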

In [None]:
import cv2
import numpy as np
X_train_data = []
i = 0
files = csv_train['image_name'].values
for file in files:
    image = cv2.imread('images/' + file)
    i = i + 1
    X_train_data.append(image)
    if(i % 500 == 0):
        print("Read {} images".format(i))
    if(i % 1000 == 0):
        X_train_data = np.array(X_train_data)
        print(X_train_data.shape)
        print("Saving...")
        # i/1000 is a float in Python 3, so files are named X_train_batch_1.0.npy, ...
        np.save("X_train_batch_{}.npy".format(i/1000), X_train_data)
        del X_train_data
        X_train_data = []

In [None]:
## Copy all the batches on Google Drive
## Now these `.npy` files can be easily downloaded from Google Drive
## The size will be much smaller than original since Google Drive compresses them as well
!ls -al
!cp X_train_batch_*.npy '/content/drive/My Drive/'

### Resizing 
Images of size 640x480 are too heavy for our laptops to handle, so we resize them to arrays of shape `(192, 256, 3)`, i.e. 192 rows by 256 columns.
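
A quick sketch of what this resize does to the array sizes. It assumes the resized images are stored as 8-byte floats (`skimage.transform.resize` returns floating-point output by default); `chunk_bytes` is a hypothetical helper, not part of the notebook.

```python
# Array sizes before and after resizing, for one chunk of 1000 images.
ORIG_SHAPE = (480, 640, 3)     # (rows, cols, channels), assumed uint8
RESIZED_SHAPE = (192, 256, 3)  # target shape passed to resize()


def chunk_bytes(shape, bytes_per_value, num_images=1000):
    """Bytes needed for `num_images` arrays of the given shape."""
    total = num_images * bytes_per_value
    for dim in shape:
        total *= dim
    return total


scale_y = RESIZED_SHAPE[0] / ORIG_SHAPE[0]  # row scale factor
scale_x = RESIZED_SHAPE[1] / ORIG_SHAPE[1]  # column scale factor

orig_chunk = chunk_bytes(ORIG_SHAPE, 1)        # uint8 originals
resized_f64 = chunk_bytes(RESIZED_SHAPE, 8)    # assumed float64 output
resized_f32 = chunk_bytes(RESIZED_SHAPE, 4)    # if cast to float32

print("Scale factors (y, x):", scale_y, scale_x)
print("Original chunk   : {} bytes".format(orig_chunk))
print("Resized (float64): {} bytes".format(resized_f64))
print("Resized (float32): {} bytes".format(resized_f32))
```

Both axes shrink by the same factor, so the aspect ratio is preserved and the pixel count drops by 6.25x. If the resized arrays really are 8-byte floats, a resized chunk is still larger in memory than a `uint8` chunk of originals, so casting to `float32` may be worth considering.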

In [None]:
import numpy as np 
from skimage.transform import resize

ORIG_PREFIX = './data/X_train_batch_'
ORIG_SUFFIX = '.0.npy'

RESIZE_PREFIX = './data/X_resized_train_batch_'
RESIZE_SUFFIX = '.npz'
for i in range(1, 15):
    X_resized_train = []
    X_train = np.load(ORIG_PREFIX + str(i) + ORIG_SUFFIX)
    for img in X_train:
        img_resized = resize(img, (192, 256))
        X_resized_train.append(img_resized)
    X_resized_train = np.asarray(X_resized_train)
    print("Resized shape for batch {} is {}".format(i, X_resized_train.shape))
    np.savez_compressed(RESIZE_PREFIX + str(i) + RESIZE_SUFFIX, X_resized_train)
    del X_train
    del X_resized_train
    print("Iteration complete")
    print("-------------------------------------------------------------")

### Training Pipeline and Model
The next section describes our training pipeline and model. The training pipeline needed to be resource efficient since we had only 2 GPUs: the laptop versions of the NVIDIA GeForce 940MX (4GB) and GeForce GTX 1050 Ti (4GB).

In [None]:
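from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, LeakyReLU
from keras import regularizers
from keras.optimizers import Adam
from keras.callbacks import Callback
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline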

In [None]:
## Read the training csv file
## We optimize our bounding box prediction model for (x1, y1, width, height)
## where (x1, y1) is the top-left corner of the bounding box in image
## coordinates (y grows downward), assuming x1 < x2 and y1 < y2.

Y_train = pd.read_csv('./training')
Y_train = Y_train.drop('image_name', axis=1)
Y_train['width'] = Y_train['x2'] - Y_train['x1']
Y_train['height'] = Y_train['y2'] - Y_train['y1']

Y_train = Y_train.drop('x2', axis=1)
Y_train = Y_train.drop('y2', axis=1)
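
The conversion above, and its inverse (useful when turning predictions back into corner coordinates), can be sketched without pandas. The function names and the `(x1, x2, y1, y2)` argument order are illustrative assumptions, not part of the notebook.

```python
def corners_to_xywh(x1, x2, y1, y2):
    """Convert corner coordinates to (x1, y1, width, height)."""
    return (x1, y1, x2 - x1, y2 - y1)


def xywh_to_corners(x1, y1, width, height):
    """Inverse of corners_to_xywh: back to (x1, x2, y1, y2)."""
    return (x1, x1 + width, y1, y1 + height)


# A made-up box spanning x in [100, 300] and y in [50, 200]
box = corners_to_xywh(100, 300, 50, 200)
print(box)
print(xywh_to_corners(*box))
```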

In [None]:
def get_model():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=3, 
                     input_shape=(192, 256, 3), data_format='channels_last'))
    model.add(LeakyReLU(0.2))
    model.add(Conv2D(64, kernel_size=5,
                      data_format='channels_last'))
    model.add(LeakyReLU(0.2))
    # Note: a 1x1 pool with stride 1 leaves the feature map unchanged
    model.add(MaxPooling2D(pool_size=(1, 1), strides=(1, 1),
                           data_format='channels_last'))
    model.add(Conv2D(64, kernel_size=7, strides=(2, 2),
                      data_format='channels_last'))
    model.add(LeakyReLU(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_last'))
    model.add(Conv2D(128, kernel_size=9, strides=(2, 2),
                      data_format='channels_last'))
    model.add(LeakyReLU(0.2))
    model.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_last'))
    model.add(Conv2D(256, kernel_size=7, strides=(3, 3), 
                      data_format='channels_last',kernel_regularizer=regularizers.l2(0.001)))
    model.add(LeakyReLU(0.2))
    model.add(Flatten())
    model.add(Dense(512, kernel_regularizer=regularizers.l2(0.001)))
    model.add(LeakyReLU(0.2))
    model.add(Dense(256, kernel_regularizer=regularizers.l2(0.001)))
    model.add(LeakyReLU(0.1))
    model.add(Dense(128, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dense(64, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dense(16, kernel_regularizer=regularizers.l2(0.001), activation='relu'))
    model.add(Dense(4))
    return model

In [None]:
## Building model
print("----------------------------------------------------------")
adam = Adam(lr=0.0002)
model = get_model()
with open('./model_summary.txt', 'w') as model_summary:
    model.summary(print_fn = lambda x: model_summary.write(x + '\n'))
model.compile(optimizer=adam, loss='mse')

print("Model compiled")
print("----------------------------------------------------------")

In [None]:
## Global parameters for getting data and storing valuable information
## This loads batches of 1000 resized images from DATADIR ./data
DATADIR_PREFIX = './data/X_resized_train_batch_'
DATADIR_SUFFIX = '.npz'
## Saves model after each batch of 1000 samples
MODEL_PREFIX = './saved_weights/batch_'
MODEL_SUFFIX = '.h5'
## Stores the loss plot for each batch of 1000 samples
LOSS_PREFIX = './losses/loss_batch_'
LOSS_SUFFIX = '.png'

In [None]:
## We want to store losses after each mini-batch is over
class LossHistory(Callback):
    def on_train_begin(self, logs={}):
        self.losses = []
    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

### Training Loop 
Our main training loop<br>
Here's how it works :
1. Loads a batch of 1000 resized images from the data directory into memory
1. Fits the model on those 1000 samples using a mini-batch size of 16 (the most we can fit in our memory)
1. Records the loss after each mini-batch through our LossHistory callback
1. Saves the model into the proper directory, named after the batch and epoch
1. Predicts the values of 4 samples from the training batch itself and writes the predicted values and a summary to a file called `training_summary.txt`. We constantly monitor this file; if any of the predictions look abnormal, we can stop the training and investigate.
1. Finally, removes the currently loaded batch from memory, making room for the next batch
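
The bookkeeping behind these steps can be sketched with plain arithmetic. The constants come from the text above (14 batches of 1000 samples, mini-batches of 16, 18 epochs); the names `steps_per_chunk` and `label_range` are illustrative only.

```python
import math

CHUNK_SIZE = 1000      # images per batch loaded from disk
MINIBATCH_SIZE = 16    # samples per gradient step
NUM_CHUNKS = 14        # batches per epoch
NUM_EPOCHS = 18        # epochs trained in total

# Gradient steps per batch; the last mini-batch is smaller than 16
steps_per_chunk = math.ceil(CHUNK_SIZE / MINIBATCH_SIZE)
last_minibatch = CHUNK_SIZE - (steps_per_chunk - 1) * MINIBATCH_SIZE

steps_per_epoch = steps_per_chunk * NUM_CHUNKS
total_steps = steps_per_epoch * NUM_EPOCHS


def label_range(chunk_index):
    """Rows of the label table matching the 0-based batch `chunk_index`."""
    return chunk_index * CHUNK_SIZE, (chunk_index + 1) * CHUNK_SIZE


print("Steps per batch :", steps_per_chunk)
print("Last mini-batch :", last_minibatch, "samples")
print("Steps per epoch :", steps_per_epoch)
print("Total steps     :", total_steps)
print("Labels, batch 3 :", label_range(2))
```

Each loss plot therefore holds one point per gradient step within a single batch of 1000 samples.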

In [None]:
## Let the training begin !
print("----------------------------------------------------------")
for i in range(14):
    X_train = np.load(DATADIR_PREFIX + str(i + 1) + DATADIR_SUFFIX)
    X_train = X_train['arr_0']
    print("Dataset loaded for batch {}, shape {}".format(i + 1, X_train.shape))
    y_low = i * 1000
    y_high = (i + 1) * 1000
    losshistory = LossHistory()
    hist = model.fit(X_train, Y_train[y_low:y_high], epochs=1, batch_size=16, callbacks=[losshistory])
    print(hist)
    print("Batch {} completed".format(i + 1))
    print("Saving weights...")
    model.save(MODEL_PREFIX + str(i + 1) + MODEL_SUFFIX)
    print("Saving loss diagram...")
    plt.figure()  # fresh figure so earlier batches' curves are not redrawn
    plt.plot(losshistory.losses)
    plt.savefig(LOSS_PREFIX + str(i + 1) + LOSS_SUFFIX)
    plt.close()
    Y_pred = model.predict(X_train[0:4])
    with open('training_summary.txt', "a") as ts:
        ts.write("Batch: {}\n".format(i + 1))
        ts.write("Starting Loss: {}\n".format(losshistory.losses[0]))
        ts.write("Ending Loss: {}\n".format(losshistory.losses[-1]))
        ts.write("Predictions : \n {} \n".format(Y_pred))
    del X_train
    print("Batch completed !")
    print("----------------------------------------------------------")

### Results and Visualizations 
We trained our model for 18 epochs on our laptops over 5 days. We could not train for more epochs because of our mid-semester exams. This gave us a score of 79.93% on the leaderboard. The following sections describe the visualization methodology used.

In [None]:
import matplotlib.pyplot as plt 
from keras.models import load_model
import pandas as pd 
import numpy as np 
import time 
import matplotlib.patches as patches 

In [None]:
MODEL_PATH = './saved_weights/epoch_18/batch_14.h5'
DATA_RESIZED_PATH = './data/X_resized_train_batch_{}.npz'
DATA_ORIG_PATH = './data/X_train_batch_{}.0.npy'

In [None]:
## Getting IoU from bounding boxes
def get_bouding_box_and_iou(y_pred, y_orig):
    """
    Calculate the Intersection over Union of 2 bounding boxes

    Parameters
    ----------
    y_pred : array : [x1, y1, width, height]
    y_orig : array : [x1, y1, width, height]

    Returns
    -------
    float iou in [0., 1.]
    """
    bb1 = {
        'x1': y_pred[0],
        'x2': y_pred[0] + y_pred[2],
        'y1': y_pred[1],
        'y2': y_pred[1] + y_pred[3]
    }
    bb2 = {
        'x1': y_orig[0],
        'x2': y_orig[0] + y_orig[2],
        'y1': y_orig[1],
        'y2': y_orig[1] + y_orig[3]
    }
    print(bb1)
    print(bb2)
    # determine the coordinates of the intersection rectangle
    x_left = max(bb1['x1'], bb2['x1'])
    y_top = max(bb1['y1'], bb2['y1'])
    x_right = min(bb1['x2'], bb2['x2'])
    y_bottom = min(bb1['y2'], bb2['y2'])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    # The intersection of two axis-aligned bounding boxes is always an
    # axis-aligned bounding box
    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # compute the area of both AABBs
    bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
    bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
    assert iou >= 0.0
    assert iou <= 1.0
    return iou
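
A small worked example of the IoU computation above, as a self-contained sketch. `iou_xywh` is a hypothetical compact variant of the notebook's function and assumes boxes with positive width and height; the box values are made up.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as (x1, y1, width, height)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    # Width and height of the overlap rectangle (non-positive if disjoint)
    inter_w = min(ax1 + aw, bx1 + bw) - max(ax1, bx1)
    inter_h = min(ay1 + ah, by1 + bh) - max(ay1, by1)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union


# Two 10x10 boxes overlapping in a 5x5 square:
# intersection = 25, union = 100 + 100 - 25 = 175
partial = iou_xywh((0, 0, 10, 10), (5, 5, 10, 10))
identical = iou_xywh((0, 0, 10, 10), (0, 0, 10, 10))
disjoint = iou_xywh((0, 0, 10, 10), (20, 20, 5, 5))
print(partial, identical, disjoint)
```

Since the network's last layer is a plain `Dense(4)`, a predicted width or height could in principle be negative, in which case this sketch returns 0.0 rather than a meaningful overlap.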

In [None]:
Y_train = pd.read_csv('./training')
Y_train = Y_train.drop('image_name', axis=1)
Y_train['width'] = Y_train['x2'] - Y_train['x1']
Y_train['height'] = Y_train['y2'] - Y_train['y1']
Y_train = Y_train.drop('x2', axis=1)
Y_train = Y_train.drop('y2', axis=1)
timestr = time.strftime("%Y%m%d-%H%M%S")
    
model = load_model(MODEL_PATH)

fact = 10
cols = 4
fig, ax = plt.subplots(4, cols, figsize=(4 * fact, cols * fact))
for i in range(cols):
    X_resized_train = np.load(DATA_RESIZED_PATH.format(i + 1))
    X_resized_train = X_resized_train['arr_0']
    X_train = np.load(DATA_ORIG_PATH.format(i + 1))
    y_low = i * 1000
    Y_pred = model.predict(X_resized_train[0:4])
    for j in range(4):
        # .values gives a plain array, so y_orig[0] inside the function is positional
        iou = get_bouding_box_and_iou(Y_pred[j], Y_train.iloc[y_low + j].values)
        # Showing the original image
        ax[j][i].imshow(X_train[j])
        x1, y1, width, height = Y_train.iloc[y_low + j]
        x1d, y1d, widthd, heightd = Y_pred[j]
        rect = patches.Rectangle((x1, y1), width, height, linewidth=1, edgecolor='g', facecolor='none')
        rectd = patches.Rectangle((x1d, y1d), widthd, heightd, linewidth=1, edgecolor='r', facecolor='none')
        ax[j][i].add_patch(rect)
        ax[j][i].add_patch(rectd)
        ax[j][i].set_title('IoU = {}'.format(iou))
        ax[j][i].axis('off')
    del X_resized_train
    del X_train 
fig.tight_layout()
plt.show()

Our loss decreased to ~10000 after the first epoch. The table below shows the loss for each mini-batch during the 1st epoch (left) and during the last epoch (right). The graphs are unlabeled: the X axis denotes the mini-batch number and the Y axis denotes the loss value for that mini-batch.<br>
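
To put a loss of ~10000 in perspective: the model is compiled with `loss='mse'`, so the square root of the loss gives a rough per-coordinate error in pixels. This is only a rough sketch, because the loss Keras reports also includes the L2 regularization penalties, so the actual coordinate error should be somewhat lower. The 640x480 image size is taken from the resizing section.

```python
import math

mse = 10000.0              # approximate loss after the first epoch
rmse = math.sqrt(mse)      # rough error per predicted value, in pixels

IMG_WIDTH, IMG_HEIGHT = 640, 480  # original image size
print("RMSE: {:.0f} px".format(rmse))
print("As a fraction of width : {:.3f}".format(rmse / IMG_WIDTH))
print("As a fraction of height: {:.3f}".format(rmse / IMG_HEIGHT))
```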

| Losses in 1st epoch | Losses in last(18th) epoch |
| -- | -- |
| ![Losses in 1st epoch graph](https://raw.githubusercontent.com/ashutoshbsathe/DLNotebooks/b457b73e697daecdf2ef2b00b743912935417a7c/images/flipkart_grid/loss_batch_1.png) | ![Losses in last(18th) epoch graph](https://raw.githubusercontent.com/ashutoshbsathe/DLNotebooks/b457b73e697daecdf2ef2b00b743912935417a7c/images/flipkart_grid/loss_batch_14.png) |

**Visualizations of predicted bounding box**
![Visualizations of IoU combined image](https://raw.githubusercontent.com/ashutoshbsathe/DLNotebooks/master/images/flipkart_grid/flipkart_grid_visualizations.png)

### Final Words 
Our model performs reasonably well for a small, shallow model. As the visualizations show, it tends to predict the wrong bounding box for images with humans in them. This could be for multiple reasons, such as

1. Shallow model
1. Limited training time
1. Only a fraction of images contain humans in them

With more data, some of these issues may get solved. <br><br>
For some of the images (specifically 1st column 4th image and last column 4th image), our model does not seem to recognize the white part correctly. Even we found it hard to make out the white soles of the shoes in these images. On the other hand, the white shoes in other images (3rd column, 2nd image from top and 4th column, bottommost image) seem to have been localized more accurately. This may mean that, given more training time, the model may perform better in the future.