<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Humpback Whale Identification</p>

<a href="https://ibb.co/8gr5HYB"><img src="https://i.ibb.co/HTtzjh2/happy-whale.jpg" alt="happy-whale" border="0"></a>

Story about dataset

1. After centuries of intense whaling, recovering whale populations still have a hard time adapting to warming oceans and struggle to compete every day with the industrial fishing industry for food.

2. To aid whale conservation efforts, scientists use photo surveillance systems to monitor ocean activity. They use the shape of whales’ tails and unique markings found in footage to identify what species of whale they’re analyzing and meticulously log whale pod dynamics and movements. For the past 40 years, most of this work has been done manually by individual scientists, leaving a huge trove of data untapped and underutilized.

3. We'd like to thank Happywhale for providing this data and problem. [Happywhale](https://happywhale.com/) is a platform that uses image processing algorithms to let anyone submit their whale photo and have it automatically identified.

📌 **Algorithms used:**  
  * CNN
    
    
📌**Tools used:-** 
* Google Colab

📌**Libraries used:-** 
* Numpy
* pandas
* Matplotlib
* flask
* scikit-learn
* Keras



<a id="1"></a> <br>
<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Introduction</p>
<br>
In this kernel, we will work with the Humpback Whale Identification dataset and implement a CNN with Keras.

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Importing the libraries </p>

1. NumPy is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices.
2. Pandas is mainly used for data analysis. Pandas allows importing data from various file formats such as comma-separated values, JSON, SQL database tables or queries, and Microsoft Excel.
3. matplotlib.pyplot is a collection of functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.
4. The OS module in Python provides functions for interacting with the operating system. OS comes under Python's standard utility modules.
5. The warnings module issues warning messages, which are typically used where it is useful to alert the user of some condition in a program, where that condition (normally) doesn't warrant raising an exception and terminating the program.


In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

import os
print(os.listdir("../input"))

# import warnings
import warnings
# filter warnings
warnings.filterwarnings('ignore')

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Importing the required dataset</p>

In [None]:
#Reading the csv file for train dataset
train = pd.read_csv("../input/train.csv")

In [None]:
# The info() method prints information about the DataFrame.
# The information contains the number of columns, column labels, column data types,
# memory usage, range index, and the number of cells in each column (non-null values).
train.info()

In [None]:
# The describe() method is used for calculating some statistical data like percentile,
# mean and std of the numerical values of the Series or DataFrame.
# It analyzes both numeric and object series and also the DataFrame column sets of mixed data types.
train.describe()

# **There are 5005 different classes in our dataset**

In [None]:
# shape gives number of rows and columns in a tuple
train.shape

In [None]:
# the head function gives the first 5 rows of the dataset
train.head()

In [None]:
# the tail function gives the last 5 rows of the dataset
train.tail()

**We need to identify the Id of each whale, so our output variable will be Id. We need to separate the feature columns from the output column.**

In [None]:
# put labels into y_train variable
y_train = train["Id"]
# Drop the 'Id' column
xtrain = train.drop(labels = ["Id"], axis = 1)
y_train.head()

In [None]:
# Count the missing (null) values in each column
train.isnull().sum()

**No values are missing in any row or column, so we don't need to handle missing data.**

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Image Preprocessing</p>

In [None]:
# importing the libraries for image preprocessing
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input

In [None]:
def prepareImages(train, shape, path):
    
    x_train = np.zeros((shape, 100, 100, 3))
    count = 0
    
    for fig in train['Image']:
        
        # load each image resized to 100x100; img_to_array then yields a 100x100x3 array
        img = image.load_img("../input/"+path+"/"+fig, target_size=(100, 100))
        x = image.img_to_array(img)
        x = preprocess_input(x)

        x_train[count] = x
        if (count%500 == 0):
            print("Processing image: ", count+1, ", ", fig)
        count += 1
    
    return x_train

In [None]:
x_train = prepareImages(train, train.shape[0], "train")

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Normalize the Data</p>

In [None]:
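x_train = x_train / 255.0 
# rescaling the dataset 
# dividing by 255 scales the pixel values down by a factor of 255 and keeps them as floats. 
# Note: preprocess_input has already reordered the channels and subtracted the ImageNet
# channel means, so the result is roughly zero-centered rather than strictly in 0-1. 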
print("xtrain shape: ",x_train.shape)

In [None]:
# Checking example input image
plt.imshow(x_train[0][:,:,0], cmap="gray")
plt.title(train.iloc[0,0])
plt.axis("off")
plt.show()

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Label Encoding</p>

1. Encode target labels with values between 0 and n_classes-1.
2. This transformer should be used to encode target values, i.e. y, and not the input X.
3. Note: Label encoding converts labels into a machine-readable form by assigning a unique integer (starting from 0) to each class. A model could misread these integers as an ordering and treat a label with a higher value as having higher priority than a label with a lower value. One-hot encoding the integer labels, as done below with to_categorical, avoids this.
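
As a rough illustration of the two encoding steps, here is a minimal NumPy sketch. The whale Ids are made up for the example, and `np.unique` stands in for `LabelEncoder` (both assign integers to the sorted class names); it is not the code this kernel runs.

```python
import numpy as np

# Hypothetical whale Ids, used only for illustration
ids = np.array(["w_b", "new_whale", "w_a", "w_b"])

# np.unique returns the sorted class names and, with return_inverse=True,
# the integer code of every label
classes, codes = np.unique(ids, return_inverse=True)

# One-hot encoding: row i of the identity matrix is the vector for class i,
# so no class is numerically "larger" than another
one_hot = np.eye(len(classes))[codes]

print(classes)  # ['new_whale' 'w_a' 'w_b']
print(codes)    # [2 0 1 2]
print(one_hot.shape)  # (4, 3)
```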

In [None]:
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()

In [None]:
#Fit label encoder and return encoded labels.
y_train = label_encoder.fit_transform(y_train)

In [None]:
# let's look at first 10 values
y_train[0:10]  

In [None]:
# finding shape of y_train data
y_train.shape

In [None]:
# convert to one-hot-encoding
# we got 5005 classes from the function train.Id.describe()

from keras.utils import to_categorical
y_train = to_categorical(y_train, num_classes = 5005)

In [None]:
y_train 

<a id="9"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Implementation with Keras using CNN(Convolutional Neural Network)</p>

**Convolutional layer**

1. This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.

2. When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers or None, does not include the sample axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last". You can use None when a dimension has variable size.

3. filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).

4. kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.

5. strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

6. padding: one of "valid" or "same" (case-insensitive). "valid" means no padding. "same" results in padding with zeros evenly to the left/right or up/down of the input. When padding="same" and strides=1, the output has the same size as the input.

7. kernel_initializer: Initializer for the kernel weights matrix (see keras.initializers). Defaults to 'glorot_uniform'.
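
To make the sliding-window idea concrete, the following is a naive single-channel sketch in NumPy. The function name `conv2d_single_channel` is hypothetical, it assumes stride 1 and an odd kernel size, and it is far simpler than what Conv2D does internally (no filters axis, bias, or activation).

```python
import numpy as np

def conv2d_single_channel(image, kernel, padding="valid"):
    """Naive stride-1 2D convolution (cross-correlation) of one channel."""
    kh, kw = kernel.shape
    if padding == "same":
        # Zero-pad so the output keeps the input's height and width
        # (assumes odd kernel sizes such as 3x3)
        ph, pw = kh // 2, kw // 2
        image = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window under the kernel element-wise and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.ones((5, 5))
kernel = np.ones((3, 3))
valid = conv2d_single_channel(image, kernel, padding="valid")
same = conv2d_single_channel(image, kernel, padding="same")
print(valid.shape)  # (3, 3)
print(same.shape)   # (5, 5)
```

With "same" padding the corner outputs are smaller than the center ones here, because part of each corner window covers the zero padding.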

<a href="https://ibb.co/YhhQ8pD"><img src="https://i.ibb.co/XttLpSy/uwHol.gif" alt="uwHol" border="0"></a>

**Padding**

1. When padding == "VALID", the input image is not padded, so information at the borders can be lost and the output is smaller than the input (Input Size != Output Size).

2. When padding == "SAME", the output size is the same as the input size (when stride=1). "SAME" padding is commonly used because keeping the size fixed is mathematically convenient for further computation.
3. Size of each feature map = $\left\lfloor \frac{N - f + 2P}{S} \right\rfloor + 1$, where $N$ is the input size, $f$ the filter size, $P$ the padding, and $S$ the stride.
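
The feature-map size formula can be checked with a small helper. This is a sketch: `conv_output_size` is a hypothetical name, with `n` the input size, `f` the filter size, `p` the padding on each side, and `s` the stride.

```python
def conv_output_size(n, f, p, s):
    """Feature-map size along one dimension: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1

# 100x100 input with a 3x3 filter and stride 1
print(conv_output_size(100, 3, 0, 1))  # 98  ("valid": no padding)
print(conv_output_size(100, 3, 1, 1))  # 100 ("same": one pixel of padding per side)
```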


**Visualize how padding works**
<br>

<a href="https://ibb.co/NsmXb1k"><img src="https://i.ibb.co/yXYD7hM/padding.gif" alt="padding" border="0"></a>

**Maxpooling2D**

1. Downsamples the input along its spatial dimensions (height and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension.

2. The resulting output, when using the "valid" padding option, has a spatial shape (number of rows or columns) of: output_shape = math.floor((input_shape - pool_size) / strides) + 1 (when input_shape >= pool_size)

3. The resulting output shape when using the "same" padding option is: output_shape = math.floor((input_shape - 1) / strides) + 1
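
A minimal NumPy sketch of 2x2 max pooling with stride 2 and "valid" padding follows. The helper names are hypothetical and only one channel is handled. It also traces the spatial size through three such pooling layers starting from a 100x100 input, which matches the model defined later in this kernel as long as the convolutions use "same" padding.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single (H, W) channel."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    x = x[:h2 * 2, :w2 * 2]  # drop a trailing odd row/column ("valid")
    # Split into 2x2 blocks and take the maximum of each block
    return x.reshape(h2, 2, w2, 2).max(axis=(1, 3))

def pool_output_size(input_size, pool_size=2, stride=2):
    """Output size for "valid" pooling along one dimension."""
    return (input_size - pool_size) // stride + 1

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))  # [[ 5  7] [13 15]]

size = 100
sizes = []
for _ in range(3):
    size = pool_output_size(size)
    sizes.append(size)
print(sizes)  # [50, 25, 12]
```

With 64 filters in the last convolution block, a 12x12 feature map flattens to 12 * 12 * 64 = 9216 values.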

**Dropout layer**

1. The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.

2. Note that the Dropout layer only applies when training is set to True such that no values are dropped during inference. When using model.fit, training will be appropriately set to True automatically, and in other contexts, you can set the kwarg explicitly to True when calling the layer.
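
The training and inference behavior can be sketched in NumPy as follows. The `dropout` function here is a hypothetical stand-in written for illustration, not the Keras layer itself.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Zero each unit with probability `rate` and rescale the survivors."""
    if not training:
        # At inference time nothing is dropped
        return x
    keep_mask = rng.random(x.shape) >= rate
    # Kept units are scaled by 1 / (1 - rate) so the expected sum is unchanged
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(10)
train_out = dropout(x, rate=0.25, training=True, rng=rng)
infer_out = dropout(x, rate=0.25, training=False, rng=rng)
print(train_out)  # each entry is either 0 or 1 / 0.75
print(infer_out)  # identical to x
```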



**Flatten layer**

1. Flattens the input.
2. If inputs are shaped (batch,) without a feature axis, then flattening adds an extra channel dimension and output shape is (batch, 1).

In [None]:
from keras.utils import to_categorical # convert to one-hot-encoding
from keras.models import Sequential # to create a cnn model
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, BatchNormalization
from keras.optimizers import RMSprop,Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau

model = Sequential()

model.add(Conv2D(filters = 16, kernel_size = (3,3), padding = 'Same', activation = 'relu', input_shape = (100,100,3)))
model.add(Conv2D(filters = 16, kernel_size = (3,3), padding = 'Same', activation = 'relu'))
model.add(MaxPool2D(pool_size = (2,2)))
model.add(BatchNormalization())

model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same', activation = 'relu'))
model.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'Same', activation = 'relu'))
model.add(MaxPool2D(pool_size = (2,2), strides=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same', activation = 'relu'))
model.add(Conv2D(filters = 64, kernel_size = (3,3), padding = 'Same', activation = 'relu'))
model.add(MaxPool2D(pool_size = (2,2), strides=(2,2)))
model.add(BatchNormalization())

# fully connected
model.add(Flatten())
model.add(Dense(256, activation = 'relu'))
model.add(BatchNormalization())
model.add(Dense(y_train.shape[1], activation = "softmax"))

In [None]:
#Provides the summary of model we created
model.summary()


<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Compile the Model</p>

 **Benefits of Adam:**
* Straightforward to implement.
* Computationally efficient.
* Little memory requirements.
* Invariant to diagonal rescale of the gradients.
* Well suited for problems that are large in terms of data and/or parameters.
* Appropriate for non-stationary objectives.
* Appropriate for problems with very noisy/or sparse gradients.
* Hyper-parameters have intuitive interpretation and typically require little tuning.


[Read the Adam paper here](https://arxiv.org/pdf/1412.6980.pdf/) 
<br>
<a href="https://ibb.co/fSVKJyY"><img src="https://i.ibb.co/7zxThZJ/adam-algo.jpg" alt="adam-algo" border="0"></a>
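
The update rule from the paper can be sketched in a few lines of NumPy. This is an illustrative single-parameter version (the `adam_step` name and the toy objective f(x) = x^2 are assumptions for the example), not the Keras optimizer.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad        # first moment estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta_1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta_2 ** t)               # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2, whose gradient is 2x, starting from x = 1
theta = np.array([1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
first_step = None
for t in range(1, 1001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
    if t == 1:
        first_step = theta.copy()

print(first_step)  # close to 0.999: the first step has size of about lr
print(theta)       # closer to the minimum at 0 than the starting point
```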

In [None]:
# Define the optimizer
optimizer = Adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999)

**Learning rate scheduler**


At the beginning of every epoch, the LearningRateScheduler callback gets the updated learning rate value from the schedule function provided at `__init__`, passing it the current epoch and current learning rate, and applies the updated learning rate to the optimizer.

1. **schedule:** a function that takes an epoch index (integer, indexed from 0) and current learning rate (float) as inputs and returns a new learning rate as output (float).

2. **verbose:** int value 0: quiet , 1: update messages.
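
As an example of what a schedule function might look like, the sketch below halves the learning rate every 10 epochs. `step_decay` is a hypothetical function written for illustration; this kernel uses ReduceLROnPlateau instead. A function with this signature could be passed to Keras as `LearningRateScheduler(step_decay, verbose=1)`.

```python
def step_decay(epoch, lr):
    """Hypothetical schedule: halve the learning rate every 10 epochs."""
    if epoch > 0 and epoch % 10 == 0:
        return lr * 0.5
    return lr

# Simulate what the callback would do at the start of each epoch
lr = 0.001
for epoch in range(25):
    lr = step_decay(epoch, lr)
print(lr)  # halved at epochs 10 and 20
```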

**ReduceLROnPlateau terms**

1. monitor: quantity to be monitored.
2. factor: factor by which the learning rate will be reduced. new_lr = lr * factor.
3. patience: number of epochs with no improvement after which learning rate will be reduced.
4. verbose: int. 0: quiet, 1: update messages.
5. min_delta: threshold for measuring the new optimum, to only focus on significant changes.
6. min_lr: lower bound on the learning rate.
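
How these terms interact can be seen in a simplified pure-Python simulation. The function `simulate_reduce_lr` and the loss values are hypothetical; the sketch assumes a monitored quantity that should decrease and leaves out details of the real callback such as cooldown.

```python
def simulate_reduce_lr(losses, lr=1e-3, factor=0.5, patience=3,
                       min_delta=1e-4, min_lr=1e-5):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0
    lr_per_epoch = []
    for loss in losses:
        if loss < best - min_delta:
            # Significant improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: reduce the rate, but not below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        lr_per_epoch.append(lr)
    return lr_per_epoch

# Hypothetical losses: improvement stalls for three epochs in a row
lr_history = simulate_reduce_lr([1.0, 0.8, 0.8, 0.8, 0.8, 0.7])
print(lr_history)
```

In this run the rate stays at 0.001 for the first four epochs and drops to 0.0005 in the fifth, after three consecutive epochs without improvement.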

<a href="https://ibb.co/NKWPRkX"><img src="https://i.ibb.co/tDYtRNS/lr-on-plateau.png" alt="lr-on-plateau" border="0"></a>

In [None]:
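# Set a learning rate scheduler
# No validation data is passed to training below, so 'val_acc' would never be
# available to the callback; monitor the training loss instead.
learning_rate_reduction = ReduceLROnPlateau(monitor='loss', 
                                            patience=3, 
                                            verbose=1, 
                                            factor=0.5, 
                                            min_lr=0.00001)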

**Compiler terms**

1. optimizer: String (name of optimizer) or optimizer instance.

2. loss: A loss function is any callable with the signature loss = fn(y_true, y_pred), where y_true are the ground truth values, and y_pred are the model's predictions. y_true should have shape (batch_size, d0, .. dN) (except in the case of sparse loss functions such as sparse categorical crossentropy which expects integer arrays of shape (batch_size, d0, .. dN-1)). y_pred should have shape (batch_size, d0, .. dN).

3. metrics: List of metrics to be evaluated by the model during training and testing.

4. loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. 

5. run_eagerly: Bool. Defaults to False.

6. steps_per_execution: Int. Defaults to 1. The number of batches to run during each tf.function call. Running multiple batches inside a single tf.function call can greatly improve performance on small models. At most, one full epoch will be run each execution. If a number larger than the size of the epoch is passed, the execution will be truncated to the size of the epoch.

7. **kwargs: Arguments supported for backwards compatibility only.
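
To illustrate the loss signature described above, here is a minimal NumPy sketch of categorical cross-entropy for one-hot targets. The function is written for this example (the clipping constant `eps` is an assumption), and it is not the Keras implementation. It also shows the loss of a model that assigns equal probability to all 5005 classes, which gives a rough reference point for reading the training loss.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss for one-hot y_true and predicted probabilities y_pred."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

num_classes = 5005
y_true = np.zeros((1, num_classes))
y_true[0, 42] = 1.0  # arbitrary true class for the example

# A model that guesses uniformly over all classes
y_pred = np.full((1, num_classes), 1.0 / num_classes)
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # ln(5005), about 8.52
```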


In [None]:
model.compile(optimizer = optimizer, loss = "categorical_crossentropy", metrics=["accuracy"])

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Data Augmentation</p>

Let's use data augmentation:
1. Data augmentation refers to techniques used to increase the amount of data by adding slightly modified copies of already existing data or newly created synthetic data from existing data. 
2. It acts as a regularizer and helps reduce overfitting when training a machine learning model.
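
As a small illustration of a "slightly modified copy", the sketch below shifts an image horizontally by a random number of pixels. The function `random_horizontal_shift` is hypothetical and deliberately simplified: it fills the uncovered strip with zeros, whereas ImageDataGenerator fills with the nearest pixel values by default.

```python
import numpy as np

def random_horizontal_shift(img, max_fraction, rng):
    """Shift an (H, W, C) image left or right, filling the gap with zeros."""
    width = img.shape[1]
    max_px = int(max_fraction * width)
    shift = int(rng.integers(-max_px, max_px + 1))
    out = np.zeros_like(img)
    if shift > 0:
        out[:, shift:] = img[:, :-shift]   # move content to the right
    elif shift < 0:
        out[:, :shift] = img[:, -shift:]   # move content to the left
    else:
        out[:] = img
    return out

rng = np.random.default_rng(0)
img = np.ones((100, 100, 3))
augmented = random_horizontal_shift(img, max_fraction=0.1, rng=rng)
print(augmented.shape)  # (100, 100, 3)
```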

In [None]:
# With data augmentation to prevent overfitting

datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images horizontally
        vertical_flip=False)  # randomly flip images vertically


datagen.fit(x_train)

**Epochs and Batch Size**

1. Stochastic gradient descent is an iterative learning algorithm that uses a training dataset to update a model.
2. The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
3. The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.
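
The relationship between dataset size, batch size, and updates per epoch comes down to simple arithmetic, sketched below. The dataset size of 1000 is a made-up number for illustration.

```python
import math

num_samples = 1000  # hypothetical dataset size
batch_size = 64

# Floor division counts only full batches
steps_per_epoch = num_samples // batch_size
leftover = num_samples - steps_per_epoch * batch_size

# Rounding up instead would also cover the final partial batch
steps_covering_all = math.ceil(num_samples / batch_size)

print(steps_per_epoch)     # 15 weight updates per epoch
print(leftover)            # 40 samples not covered by full batches
print(steps_covering_all)  # 16
```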

In [None]:
epochs = 50  # for better result increase the number of epochs
batch_size = 64

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Fit the Model</p>

When we call the .fit_generator() function, the following happens:
1. Keras first calls the generator (here, the data augmentation generator returned by datagen.flow).
2. The generator yields one batch of batch_size images (64 in this kernel) to our .fit_generator() function.
3. Our .fit_generator() function accepts the batch, performs backpropagation on it, and then updates the weights in our model.
4. This process is repeated for the specified number of epochs.
5. fit_generator is used when the dataset is too large to fit into memory or when data augmentation needs to be applied.

In [None]:
history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                              epochs=epochs, verbose = 2, 
                              steps_per_epoch=x_train.shape[0] // batch_size,
                              callbacks=[learning_rate_reduction]) 

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">Evaluate the model</p>



In [None]:
# Plot the loss curve for training
plt.plot(history.history['loss'], color='r', label="Train Loss")
plt.title("Train Loss")
plt.xlabel("Number of Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

In [None]:
# Plot the accuracy curve for training
plt.plot(history.history['acc'], color='g', label="Train Accuracy")
plt.title("Train Accuracy")
plt.xlabel("Number of Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()

In [None]:
# finding the training accuracy 
print('Train accuracy of the model: ',history.history['acc'][-1])

In [None]:
# finding the training loss 
print('Train loss of the model: ',history.history['loss'][-1])

<a id="6"></a>
<p style="background-color:#615154;font-family:newtimeroman;color:#CABFC1;font-size:250%;text-align:center;border-radius:40px 40px;">The End</p>

