## Download the images


We can use **GoogleDriveDownloader** from the **google_drive_downloader** library in Python to download the shared file from this Google Drive link: https://drive.google.com/file/d/1f7uslI-ZHidriQFZR966_aILjlkgDN76/view?usp=sharing

The file id in the above link is: **1f7uslI-ZHidriQFZR966_aILjlkgDN76**
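
The file ID is the path segment that follows `/file/d/` in a link of this shape. Below is a minimal sketch of extracting it with a regular expression. The helper name `extract_file_id` is hypothetical, and the pattern only assumes the `/file/d/<id>/` layout seen in the link above.

```python
import re

def extract_file_id(share_url):
    """Return the ID segment of a '/file/d/<id>/' style link, or None if absent."""
    match = re.search(r"/file/d/([^/]+)", share_url)
    return match.group(1) if match else None

url = "https://drive.google.com/file/d/1f7uslI-ZHidriQFZR966_aILjlkgDN76/view?usp=sharing"
file_id = extract_file_id(url)
print(file_id)
```

The returned string is the value that would be passed as `file_id` to the downloader.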

In [None]:
# from google_drive_downloader import GoogleDriveDownloader as gdd

# gdd.download_file_from_google_drive(file_id='1f7uslI-ZHidriQFZR966_aILjlkgDN76',
#                                     dest_path='content/eye_gender_data.zip',
#                                     unzip=True)

In [None]:
# File upload browser in colab
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn]))) 

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

In [None]:
# Unzip our data
!unzip /content/eye_gender_data.zip

All the files from the shared Google Drive link are now available in the Colab environment.

## Loading Libraries
Not all Python capabilities are loaded into the working environment by default, even if they are already installed on your system. So we import every library that we want to use.

We use aliases for some libraries for convenience (numpy --> np, pandas --> pd, tensorflow --> tf).

Note: You can import all the libraries you expect to need up front, or import them as you go along.

In [None]:
import pandas as pd                                     # Data analysis and manipulation tool
import numpy as np                                      # Fundamental package for linear algebra and multidimensional arrays
import tensorflow as tf                                 # Deep learning framework
import os                                               # Access to operating-system-dependent functionality
import cv2                                              # Library for image processing
from sklearn.model_selection import train_test_split    # For splitting the data into train and validation sets
from sklearn.metrics import f1_score                    # F1 metric for classification
import matplotlib.pyplot as plt                         # Plotting

## Loading and preparing training data


In [None]:
trainingSet = pd.read_csv("/content/eye_gender_data/Training_set.csv")
trainingSet.head()

In [None]:
trainingSet.shape # rows and columns

In [None]:
labels = pd.read_csv("/content/eye_gender_data/Training_set.csv")   # loading the labels
file_paths = [[fname, '/content/eye_gender_data/train/' + fname] for fname in labels['filename']] # Getting images file path
images = pd.DataFrame(file_paths, columns=['filename', 'filepaths'])
train_data = pd.merge(images, labels, how = 'inner', on = 'filename')
train_data.head()

## Data Pre-processing
All images must be brought to the same shape and size and converted to their pixel values, because machine learning and deep learning models accept only numerical data. The labels also need to be converted from categorical to numerical values.
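
To make the label conversion concrete, here is a small pure-Python sketch of the idea behind label encoding: each distinct class name is assigned an integer according to its sorted position. The function `encode_labels` and the sample label values are assumptions for illustration only; the notebook itself uses scikit-learn's `LabelEncoder`.

```python
def encode_labels(labels):
    """Map each distinct label to an integer based on sorted class order."""
    classes = sorted(set(labels))
    class_to_index = {name: index for index, name in enumerate(classes)}
    encoded = [class_to_index[name] for name in labels]
    return encoded, classes

# Hypothetical label values, not taken from the dataset.
sample_labels = ["male", "female", "female", "male"]
encoded, classes = encode_labels(sample_labels)
print(encoded)
print(classes)

# Decoding is a lookup back into the sorted class list.
decoded = [classes[index] for index in encoded]
print(decoded)
```

Keeping the sorted class list around is what makes it possible to turn numeric predictions back into class names later.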

In [None]:
# Label Encoding on Target column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
train_data['label'] = le.fit_transform(train_data['label'])
train_data.head()

In [None]:
data = []     # initialize an empty list
image_size = 100      # images are resized to 100x100 here; other sizes work too
for i in range(len(train_data)):

  img_array = cv2.imread(train_data['filepaths'][i], cv2.IMREAD_GRAYSCALE)   # read the image as grayscale

  new_img_array = cv2.resize(img_array, (image_size, image_size))      # resize to image_size x image_size
  data.append([new_img_array, train_data['label'][i]])

In [None]:
data[0]

In [None]:
np.random.shuffle(data) #Shuffle the data

**Separating the images and labels**

In [None]:
x = []
y = []
for image in data:
  x.append(image[0])
  y.append(image[1])

# convert x and y from lists to NumPy arrays
x = np.array(x)
y = np.array(y)
print(x[0])
print(y[0])

In [None]:
np.unique(y, return_counts=True)

In [None]:
x = x.reshape(-1, 100, 100, 1)   # add a channel dimension: (samples, height, width, channels)
# split the data: 70% for training, 30% held out
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

In [None]:
X_train.shape

In [None]:
# split the held-out 30% equally into test and validation sets
X_test, X_val, y_test, y_val = train_test_split(X_test,y_test,test_size=0.5, random_state = 42)
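
The two splits above leave roughly 70% of the samples for training and 15% each for validation and testing. The sketch below illustrates those proportions on plain index lists. It is not scikit-learn's implementation; the helper `three_way_split` and the sample count of 1000 are assumptions for illustration.

```python
import random

def three_way_split(n_samples, holdout_fraction=0.3, seed=42):
    """Shuffle indices, hold out a fraction, then halve the holdout."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_holdout = round(n_samples * holdout_fraction)
    holdout, train = indices[:n_holdout], indices[n_holdout:]

    # Second split: half of the holdout for testing, the rest for validation.
    n_test = len(holdout) // 2
    test, val = holdout[:n_test], holdout[n_test:]
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three index lists are disjoint, so no sample is used for more than one purpose.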

## Building Model & Hyperparameter tuning
Now we are finally ready, and we can train the model.


In [None]:
# device_name = tf.test.gpu_device_name()
# print('Found GPU at: {}'.format(device_name))

In [None]:
# CNN Model creation

# with tf.device('/device:GPU:0'):
cnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3), padding='same', activation='relu', input_shape=(100, 100, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(filters=150, kernel_size=(3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(filters=100, kernel_size=(3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.BatchNormalization(),
    # tf.keras.layers.Flatten(input_shape=(100, 100, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])

opt = tf.keras.optimizers.Adam(learning_rate=0.01)
cnn.compile(optimizer=opt,
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])


In [None]:
cnn.summary()
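
As a cross-check on the summary, the feature-map sizes can be worked out by hand. The sketch below assumes what the layer definitions state: 3x3 convolutions with `'same'` padding (height and width unchanged) followed by 2x2 max pooling (height and width halved, rounded down). Dropout and batch normalization do not change shapes. The helper `trace_feature_maps` is hypothetical.

```python
def trace_feature_maps(input_size, conv_filters, pool=2):
    """Return (height, width, channels) after each conv + pooling stage."""
    size = input_size
    shapes = []
    for filters in conv_filters:
        # 'same' convolution keeps the spatial size; pooling floors size / pool.
        size = size // pool
        shapes.append((size, size, filters))
    return shapes

shapes = trace_feature_maps(100, [256, 150, 100])
height, width, channels = shapes[-1]
flattened = height * width * channels

# Parameters of the first dense layer: one weight per input per unit, plus biases.
first_dense_params = flattened * 512 + 512

print(shapes)
print(flattened)
print(first_dense_params)
```

Most of the model's parameters sit in the first dense layer, so the size of the flattened vector largely determines the model size.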

In [None]:
X_train[0]

In [None]:
X_val[0]

In [None]:
# Model training
# with tf.device('/device:GPU:0'):
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)   # stop once the training loss has not improved for 3 epochs
history = cnn.fit(X_train, y_train, epochs=30, batch_size=250, validation_data=(X_val, y_val), callbacks=[callback])
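
To illustrate what `patience=3` means, here is a simplified pure-Python sketch of the stopping rule: training halts after the monitored value has failed to improve for three consecutive epochs. This is an approximation for intuition only; it ignores options of the real callback such as `min_delta`, and the loss values are made up.

```python
def early_stopping_epoch(losses, patience=3):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(losses)    # never triggered: all epochs run

# Hypothetical per-epoch losses: the best value is reached at epoch 3.
stop_epoch = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5])
print(stop_epoch)
```

In this made-up run the later improvement at epoch 7 is never reached, which is the trade-off a small patience value makes.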

## Validate the model


In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

In [None]:
cnn.evaluate(X_val, y_val) # loss and accuracy on the validation set
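
Accuracy is the only metric reported here, although `f1_score` is imported earlier. As a reminder of what the F1 score measures for binary labels, here is a hand-rolled sketch. The function `binary_f1`, the choice of class 1 as the positive class, and the sample labels are all assumptions for illustration.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0   # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
score = binary_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(round(score, 4))
```

The same computation could be applied to predicted classes on the held-out `X_test` split, which is otherwise unused in this notebook.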

## Predict The Output For Testing Dataset 😅
We have trained and evaluated our model. Now we will predict the output/target for the testing data (i.e. Testing_set.csv).

#### Load Test Set
Load the test data on which final submission is to be made.

In [None]:
# Load the provided order of the test image filenames
test_image_order = pd.read_csv("/content/eye_gender_data/Testing_set.csv")
test_image_order.head()

## Data Pre-processing on test_data


In [None]:
# Getting images file path
file_paths = [[fname, '/content/eye_gender_data/test/' + fname] for fname in test_image_order['filename']]
file_paths[0]

In [None]:
# Confirm that the number of image names matches the number of file paths
if len(test_image_order) == len(file_paths):
  print('Number of image names i.e. ', len(test_image_order), 'matches the number of file paths i.e. ', len(file_paths))
else:
  print('Number of image names does not match the number of filepaths')

In [None]:
# Converting the file_paths to dataframe
test_images = pd.DataFrame(file_paths, columns=['filename', 'filepaths'])
test_images.head()

In [None]:
test_images.shape

In [None]:
test_pixel_data = [] # initialize an empty list
image_size = 100 # must match the size used for the training images
for i in range(len(test_images)):

  img_array = cv2.imread(test_images['filepaths'][i], cv2.IMREAD_GRAYSCALE) # read the image as grayscale

  new_img_array = cv2.resize(img_array, (image_size, image_size)) # resize to image_size x image_size

  test_pixel_data.append(new_img_array)

In [None]:
# Convert the list of images to a NumPy array
test_pixel_data = np.array(test_pixel_data)

In [None]:
# Reshape
test_pixel_data = test_pixel_data.reshape(-1, 100, 100, 1)

### Make Prediction on Test Dataset
Time to make a submission!!!

In [None]:
# Make Prediction on Test Dataset
pred = cnn.predict(test_pixel_data)
pred

In [None]:
prediction = []
for value in pred:
  prediction.append(np.argmax(value))

In [None]:
prediction

In [None]:
predictions = le.inverse_transform(prediction)
predictions
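
The steps above, taking the index of the largest probability in each row and mapping it back to a class name, can be sketched in plain Python as follows. The probabilities and class names are made up for illustration and `decode_predictions` is a hypothetical helper. On a NumPy array, `np.argmax(pred, axis=1)` performs the index step for all rows at once.

```python
def decode_predictions(probabilities, classes):
    """Pick the most probable class index per row and map it to its name."""
    decoded = []
    for row in probabilities:
        best_index = max(range(len(row)), key=lambda i: row[i])
        decoded.append(classes[best_index])
    return decoded

# Hypothetical softmax outputs for three images and hypothetical class names.
probs = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]]
class_names = ["female", "male"]
decoded_labels = decode_predictions(probs, class_names)
print(decoded_labels)
```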

In [None]:
len(predictions)

## **How to save prediction results locally via Jupyter notebook?**
If you are working in a Jupyter notebook, run the code below. A file named 'submission.csv' will be created in your current working directory.

In [None]:
test_images

In [None]:
test_images['filename']

In [None]:
res = pd.DataFrame({'filename': test_images['filename'], 'label': predictions})  # predictions holds the model's final predictions on the unseen test data
res.to_csv("submission.csv", index = False)      # the CSV file is saved in the current working directory
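
Before submitting, it can be worth reading the file back to confirm its layout. The sketch below uses only the standard library and checks for the two columns written above, `filename` and `label`, plus an expected row count. The helper `check_submission` and the demo rows are hypothetical; the demo writes to a temporary directory so it does not touch the real submission file.

```python
import csv
import os
import tempfile

def check_submission(path, expected_rows):
    """Return True if the CSV has the expected header and number of rows."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = reader.fieldnames
    return columns == ["filename", "label"] and len(rows) == expected_rows

# Demo on a small made-up file.
demo_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
with open(demo_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["filename", "label"])
    writer.writerows([["Image_1.jpg", "male"], ["Image_2.jpg", "female"]])

is_valid = check_submission(demo_path, expected_rows=2)
print(is_valid)
```

For the real file, the expected row count would be the number of rows in the test set DataFrame.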

# **OR,**
**If you are working on Google Colab, use the code below to save the prediction results locally.**

## **How to save prediction results locally via Colab notebook?**
If you are working in a Google Colab notebook, run the code below. A file named 'submission.csv' will be downloaded to your system.

In [None]:
res = pd.DataFrame({'filename': test_images['filename'], 'label': predictions})  # predictions holds the model's final predictions on the unseen test data
res.to_csv("submission.csv", index = False)

# To download the csv file locally
from google.colab import files        
files.download('submission.csv')

# **Well Done! 👍**
You are all set to make a submission. Let's head to the **[challenge page](https://dphi.tech/challenges/4-week-deep-learning-online-bootcamp-final-assignment-sex-determination-by-morphometry-of-eyes/144/submit)** to make the submission.