## Introduction
The goal of this project is to classify the Intel landscape image dataset. The dataset consists of 6 landscape categories, namely **buildings, forests, glaciers, mountains, seas and streets**, and I use **Convolutional Neural Networks (ConvNets)** to classify these images **as quickly and as accurately as possible.**

A Convolutional Neural Network is a **special type of Artificial Neural Network (ANN)**.
What separates ConvNets from ordinary fully connected ANNs is a structure **designed specifically for image classification and related tasks.** Instead of relying only on fully connected layers, **a stack of convolutional layers is the core of a ConvNet**, and it is the main engine that condenses images into a manageable size and structure. Because convolutional layers share their weights across the image, they need far fewer parameters than fully connected layers on the same input, which makes ConvNets more efficient than plain ANNs on image classification tasks.


* **Dataset**: The Intel image dataset contains landscape images from 6 categories, each 150x150 pixels.


* **Inspiration**: Accurately classify as many images as possible with a robust machine learning model.


* **Problem Definition**: Build a Convolutional Neural Network model that reaches high accuracy.


* **Link**: https://www.kaggle.com/puneet6060/intel-image-classification

## Approach
* **Exploratory Data Analysis**: Understanding the dataset and checking for class imbalance.


* **Convolutional Neural Network**: Creating a **ConvNet model** for the problem.


* **Hyperparameter Tuning**: Optimizing the **hyperparameters** of the ConvNet model to achieve better results.

## Models
* **ConvNets**: Variants of ConvNet models.


**Note:** I used a Kaggle GPU for this project to speed up training, so I ran this code in a Kaggle kernel rather than in a local Jupyter notebook.

## Data Exploration


In [None]:
# Import necessary libraries and packages
import os
import timeit
from random import randint

import cv2
import matplotlib.pyplot as plt
import numpy as np  # linear algebra
import pandas as pd  # data processing
import seaborn as sn
import tensorflow as tf
# Use tensorflow.keras consistently instead of mixing it with standalone keras.
import tensorflow.keras.layers as Layers
import tensorflow.keras.models as Models
import tensorflow.keras.optimizers as Optimizer
from sklearn.utils import shuffle
from tensorflow.keras.regularizers import l2

TRAIN_PATH = "../input/seg_train/seg_train/"
TEST_PATH = "../input/seg_test/seg_test/"

Next, let's **explore the image categories** and their percentage distributions.

In [None]:
# Exploratory analysis
CATEGORIES = ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']


def explore_categories(path):
    """Count the images in each landscape category folder and plot their shares."""

    # Count the images in each category folder.
    counts = [len(os.listdir(os.path.join(path, category))) for category in CATEGORIES]

    # Sum all images.
    total_images = sum(counts)

    # Pie chart labels and the share of each category.
    labels = [category.capitalize() for category in CATEGORIES]
    percentages = [count / total_images for count in counts]

    if path == TEST_PATH:
        pie_chart_generate(percentages, labels, "Test Data")
    elif path == TRAIN_PATH:
        pie_chart_generate(percentages, labels, "Training Data")
    return total_images


def pie_chart_generate(percentages, labels, title):
    """This function generates a pie chart for the given class labels."""
    # Defining color map for pie chart.
    cmap = plt.get_cmap("tab20c")
    outer_colors = inner_colors = cmap(np.array([1, 2, 5, 6, 9, 10]))

    fig, ax = plt.subplots()
    ax.set_title(title)
    ax.pie(percentages, labels=labels, autopct='%1.1f%%',
        shadow=False, startangle=90, colors=outer_colors)
    ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
    plt.show()


# Training data pie chart
number_training_images = explore_categories(TRAIN_PATH)
# Testing data pie chart
number_testing_images = explore_categories(TEST_PATH)

# Pie chart of the ratio of training and testing data
training_testing_ratio = [number_training_images/(number_training_images + number_testing_images), number_testing_images/(number_training_images + number_testing_images)]
pie_chart_generate(training_testing_ratio, ['Training Data', 'Test data'], 'Training-Test Ratio')

print("Number of training images: " + str(number_training_images))
print("Number of testing images: " + str(number_testing_images))
print("Number of images for prediction: " + str(len(os.listdir("../input/seg_pred/seg_pred/"))))


![title](images/pie1.png)
![title](images/pie2.png)
![title](images/pie3.png)

Clearly, there is **no significant class imbalance in either the training or the test images**, which is good news. We also have **far more training images than test images**, so I need to be careful about **overfitting of the model.**
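
The balance check read off the pie charts can also be expressed as a number. The sketch below is only an illustration: `count_images_per_class` and `imbalance_ratio` are hypothetical helper names, not part of the notebook, and the tiny temporary folder tree is made up so the snippet runs without the dataset.

```python
import os
import tempfile


def count_images_per_class(root):
    """Return {class_name: number_of_files} for each sub-folder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            counts[name] = len(os.listdir(folder))
    return counts


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())


# Made-up folder tree, used only to demonstrate the two calls.
with tempfile.TemporaryDirectory() as root:
    for name, n_files in [("forest", 3), ("sea", 2)]:
        os.makedirs(os.path.join(root, name))
        for i in range(n_files):
            open(os.path.join(root, name, f"{i}.jpg"), "w").close()
    demo_counts = count_images_per_class(root)
    demo_ratio = imbalance_ratio(demo_counts)

print(demo_counts, demo_ratio)
```

On the real data, the same two calls could be pointed at `TRAIN_PATH` and `TEST_PATH`; a ratio close to 1 would confirm what the pie charts suggest.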

Next, let's load the data from the paths defined above and shuffle it.

In [None]:
#Pre-process data
def pre_process(path, image_size=100):
    """This function loads and resizes all images, scales pixel values to [0, 1] and shuffles the data."""
    data = []
    labels = []
    for category in os.listdir(path):
        if(category == "buildings"):
            label = 0
        elif(category == "forest"):
            label = 1
        elif(category == "glacier"):
            label = 2  
        elif(category == "mountain"):
            label = 3  
        elif(category == "sea"):
            label = 4   
        elif(category == "street"):
            label = 5

        training_subfolder_path = path + "/" + category

        for file in os.listdir(training_subfolder_path):
            image_path = training_subfolder_path + "/" + file
            image = cv2.imread(image_path)

            #Resize all images so they all have the same size
            image = cv2.resize(image,(image_size, image_size))
            image = np.array(image)

            #Scale pixel values to the [0, 1] range by dividing by 255
            image = image.astype('float32')/255.0
            data.append(image)
            labels.append(label)

    #Shuffle data
    data, labels = shuffle(data, labels)
    data = np.array(data)
    labels = np.array(labels)
    return data, labels

In [None]:
# Loading data
train_data, labels = pre_process(TRAIN_PATH, image_size=100)

Let's map each numeric label to its class name and plot some images to check the assigned labels.

In [None]:
def get_classlabel(class_code):
    """Return the class name that belongs to a numeric class code."""
    labels = {0:'buildings', 1:'forest', 2:'glacier', 3:'mountain', 4:'sea', 5:'street'}
    return labels[class_code]

# Plotting images with class labels.
f,ax = plt.subplots(3,3)
f.subplots_adjust(0,0,3,3)
for i in range(3):
    for j in range(3):
        # randint includes both endpoints, so the upper bound must be len - 1.
        rnd_number = randint(0, len(train_data) - 1)
        ax[i,j].imshow(train_data[rnd_number])
        ax[i,j].set_title(get_classlabel(labels[rnd_number]))
        ax[i,j].axis('off')

![title](images/cl_images.png)

Alright, let's start building our first Convolutional Neural Network. Before constructing the model, I would like to introduce the core elements of the ConvNet structure.
* **Convolutional Layer:** The **fundamental component** of ConvNets. These layers **filter the input image and capture certain features** of it by sliding small learned filters (kernels) over the image. Essentially, a convolutional layer's role is to extract useful information from the input image.

![image-center](images/cnn.gif)
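
The filtering operation of a convolutional layer can be sketched in plain NumPy. This is a minimal single-channel illustration, not how Keras implements `Conv2D`: there is no padding, the stride is 1, and the kernel is a fixed, hand-picked vertical-edge filter rather than a learned one. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The function name `conv2d_valid` is made up for this example.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over a 2-D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the current patch and the kernel, summed up.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 5x5 "image" whose values grow by 1 from left to right within each row.
image = np.arange(25, dtype=np.float32).reshape(5, 5)
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]], dtype=np.float32)

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map.shape)  # (3, 3): a 3x3 kernel shrinks each side by 2
print(feature_map)
```

The output is two pixels smaller on each side than the input, which is the same shrinkage a 3x3 `Conv2D` layer with default settings produces.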

* **Pooling Layers:** These layers **reduce the spatial size** of the feature map produced by a convolutional layer, which in turn reduces the number of parameters and computations in the layers that follow. They work by **sliding a window (kernel) over the feature map** and **applying a function** to each region. Although there are different **types** such as **Max, Average and Sum pooling,** I used **Max Pooling**, in which the window moves over the rectified feature map and **keeps the largest element of each region.**

![image-center](images/max_pool.gif)
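
Max pooling with non-overlapping windows can be sketched in NumPy as follows. This is an illustrative single-channel version with the stride equal to the pool size and leftover border pixels dropped; `max_pool2d` is a hypothetical helper name.

```python
import numpy as np


def max_pool2d(feature_map, pool_size):
    """Non-overlapping max pooling on a 2-D feature map."""
    h, w = feature_map.shape
    out_h, out_w = h // pool_size, w // pool_size
    # Drop border rows/columns that do not fill a complete window.
    trimmed = feature_map[:out_h * pool_size, :out_w * pool_size]
    # Split both axes into (window index, position inside the window).
    windows = trimmed.reshape(out_h, pool_size, out_w, pool_size)
    # Keep the largest element of every window.
    return windows.max(axis=(1, 3))


feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool2d(feature_map, 2)
print(pooled)
# [[ 5  7]
#  [13 15]]
```

Each 2x2 region of the input is replaced by its largest value, so a 4x4 map becomes 2x2.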


* **Activation Functions:** They introduce **non-linearity** into the neural network. Their role is to **transform the input signal of a node into its output signal.** Non-linearity is **crucial for learning complex non-linear relations between input and output.** The most common activation functions are **Sigmoid** (Logistic), **Tanh** (Hyperbolic Tangent) and **ReLU** (Rectified Linear Unit).

![image-center](images/acv_fun2.png)
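
The three activation functions named above are short enough to write out directly. The snippet is a plain NumPy sketch for illustration; the sample inputs are arbitrary.

```python
import numpy as np


def sigmoid(x):
    """Squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    """Squashes any real value into the range (-1, 1)."""
    return np.tanh(x)


def relu(x):
    """Keeps positive values and replaces negative values with 0."""
    return np.maximum(0.0, x)


x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))
```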

* **Dropout:** This **layer randomly drops some of the nodes (units)** of the network with a **given probability** during **training**, in both the forward and the backward pass. Dropout is included in the model to **reduce overfitting**, because **strongly inter-dependent nodes tend to overfit** the training data.

![image-center](images/dropout.png)

Dropout image: Srivastava, Nitish, et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JMLR 2014.
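
A dropout layer can be sketched in a few lines of NumPy. The version below is the common "inverted dropout" variant, in which the surviving activations are scaled up during training so that nothing needs to change at inference time. The function name and the toy input are made up for this example.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Zero each unit with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged.
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation the same.
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
activations = np.ones((4, 5))
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```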

* **Adam Optimizer:** Adam is one of the **most popular optimization methods** for training deep neural networks. Fundamentally, it combines **RMSprop and Stochastic Gradient Descent with momentum.** It is an **adaptive learning rate method** in which **individual learning rates** are computed for different parameters. It keeps running estimates of the first and second moments of the gradients and uses them to adapt the learning rate.
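
To make the role of the first and second moments concrete, here is a sketch of a single Adam update in NumPy, following the textbook form of the algorithm. Library implementations may differ in small details such as where epsilon is added. The toy objective f(w) = w^2 and the name `adam_step` are assumptions made for this example only.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta_1 * m + (1.0 - beta_1) * grad        # first moment (momentum-like)
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2   # second moment (RMSprop-like)
    m_hat = m / (1.0 - beta_1 ** t)               # bias correction
    v_hat = v / (1.0 - beta_2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return param, m, v


# Toy problem: minimise f(w) = w^2, whose gradient is 2w.
w = np.array([1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)

first_w, _, _ = adam_step(w, 2.0 * w, m, v, t=1, lr=0.01)
print(first_w)  # the first step moves by roughly lr, regardless of gradient size

for t in range(1, 501):
    w, m, v = adam_step(w, 2.0 * w, m, v, t, lr=0.01)
print(w)  # much closer to the minimum at 0 than the starting value 1.0
```

Because the step is divided by the square root of the second moment, each parameter effectively gets its own learning rate.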


Based on the ConvNet structure discussed above, I build the neural network below to train my model and make predictions.

In [None]:
# Constructing Convolutional Neural Network Model
def cnn_model():
    """First Convolutional Neural Network model."""
    model = Models.Sequential()

    model.add(Layers.Conv2D(128,kernel_size=(3,3),activation='relu',input_shape=(100,100,3)))
    model.add(Layers.Conv2D(128,kernel_size=(3,3),activation='relu'))
    model.add(Layers.MaxPool2D(pool_size=(3,3)))

    model.add(Layers.Conv2D(256,kernel_size=(3,3),activation='relu'))
    model.add(Layers.Conv2D(256,kernel_size=(3,3),activation='relu'))
    model.add(Layers.MaxPool2D(pool_size=(3,3)))

    model.add(Layers.Flatten())
    model.add(Layers.Dense(256,activation='relu'))
    model.add(Layers.Dropout(0.5))

    model.add(Layers.Dense(256,activation='relu'))
    model.add(Layers.Dropout(0.5))

    model.add(Layers.Dense(6,activation='softmax'))
    model.compile(optimizer=Optimizer.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0, amsgrad=False),
                  loss='sparse_categorical_crossentropy',metrics=['accuracy'])
    return model
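
To see how the layers of `cnn_model` shrink the image, the spatial sizes can be traced by hand. The sketch below assumes the Keras defaults used in the model: `Conv2D` without padding and with stride 1, and `MaxPool2D` with a stride equal to its pool size. The helper names are made up for this example.

```python
def conv_out(size, kernel=3):
    """Output side length of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=3):
    """Output side length of non-overlapping pooling (incomplete windows dropped)."""
    return size // pool


size = 100                        # input images are 100x100x3
size = conv_out(conv_out(size))   # two 3x3 convolutions: 100 -> 98 -> 96
size = pool_out(size)             # 3x3 max pooling: 96 -> 32
size = conv_out(conv_out(size))   # two 3x3 convolutions: 32 -> 30 -> 28
size = pool_out(size)             # 3x3 max pooling: 28 -> 9

flat_units = size * size * 256            # Flatten: 9 * 9 * 256 feature values
dense_params = flat_units * 256 + 256     # weights + biases of the first Dense(256)

print(size, flat_units, dense_params)  # 9 20736 5308672
```

Most of the trainable parameters therefore sit in the first dense layer, which is one reason dropout is placed right after it.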

So far so good. I built my Convolutional Neural Network with the Adam optimizer and a learning rate of 0.001. Next, I define a model fit function that trains the neural network and makes predictions. The function also plots the accuracy and loss curves along with a confusion matrix.

In [4]:
# Let's define model fit function.
def model_fit(my_model, number_epochs, batch_size):
    """Train the given network with the given number of epochs and batch size, then plot and evaluate it."""
    start_time = timeit.default_timer()
    # Fit model
    model = my_model
    trained = model.fit(train_data,labels,epochs=number_epochs,validation_split=0.25,batch_size=batch_size)
    # Elapsed training time in seconds.
    elapsed = timeit.default_timer() - start_time
    print('Runtime (s):', elapsed)

    # The history key is 'acc' in older Keras versions and 'accuracy' in newer ones.
    acc_key = 'acc' if 'acc' in trained.history else 'accuracy'

    # Plotting accuracy and validation accuracy.
    plt.plot(trained.history[acc_key])
    plt.plot(trained.history['val_' + acc_key])
    plt.title('Model Accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper left')
    plt.show()

    # Plotting loss and validation loss.
    plt.plot(trained.history['loss'])
    plt.plot(trained.history['val_loss'])
    plt.title('Model Loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper left')
    plt.show()

    # Prediction on test set, loaded with the same pre-processing as the training data.
    test_images, test_labels = pre_process(TEST_PATH, image_size=100)
    model.evaluate(test_images,test_labels, verbose=1)

    # Plotting Confusion Matrix.
    predictions = model.predict(test_images)
    pred_labels = np.argmax(predictions, axis = 1)
    frame={'y':test_labels,'y_predicted':pred_labels}
    df = pd.DataFrame(frame, columns=['y','y_predicted'])
    cm = pd.crosstab(df['y'], df['y_predicted'],rownames=['True Label'], colnames=['Predicted Label'], margins = False)
    sn.heatmap(cm,annot=True,fmt="d",cmap="Blues",linecolor="blue", vmin=0,vmax=500)
    plt.title('Confusion Matrix', fontsize=16)
    plt.show()

Now, let's run the first prediction with the defined neural network. I train my model for 15 epochs with a batch size of 32.

In [None]:
# First Prediction
model=cnn_model()
number_epochs=15
batch_size=32
model_fit(model, number_epochs,batch_size)

At the moment, we have **10525 training images and 3509 validation images.**
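
These two numbers follow from `validation_split=0.25`. Keras holds out the last fraction of the samples it is given, before any per-epoch shuffling, and the split point is rounded down. The short sketch below reproduces the arithmetic; 14034 is the training-set size implied by the counts in this document, and `split_sizes` is a hypothetical helper name.

```python
def split_sizes(n_samples, validation_split):
    """Return (training size, validation size) for a trailing validation split."""
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at


n_train, n_val = split_sizes(14034, 0.25)
print(n_train, n_val)  # 10525 3509
```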

Let's analyze the model outcomes. **Clearly**, my model starts **overfitting from the 5th epoch**: the training and validation curves **cross** each other and then **diverge** over the following epochs. The model is therefore **overfitting the training set** and performs poorly on the validation set.


![title](images/m1_acc.png)
![title](images/m1_loss.png)

Overall, I obtain **80% accuracy from the first prediction.** As a baseline score it is not bad, but it needs improvement.


![title](images/cm_1.png)

From the confusion matrix, one can observe that the model **performs poorly** at recognizing **building images** (label 0) and mountain images (label 3). It misclassifies **mountains as glaciers** and **buildings as streets**, and vice versa.

As the next step, let's increase the number of images processed in each training step. I **increase the batch size from 32 to 128.**

In [9]:
#Second Prediction
model=cnn_model()
number_epochs=15
batch_size=128
model_fit(model, number_epochs,batch_size)

At this stage, I did not change the structure of my model.

![title](images/m2_acc.png)
![title](images/m2_loss.png)

Alright, the **overfitting** problem is **still evident** from the 7th epoch onward.


3000/3000 [==============================] - 2s 504us/sample - loss: 0.5735 - acc: 0.8433

Yet the model manages to decrease the **loss from 0.77 to 0.57** and to **increase accuracy by almost 4%.** This is great! Now, the model correctly **classifies 84% of images.**

![title](images/cm_2.png)

The accuracy increase is reflected in the confusion matrix as well: the number of correct classifications of buildings (label 0) and mountains (label 3) increased.

## Data Augmentation

To **increase my model's accuracy,** I apply **Data Augmentation** to enlarge my training and validation data. Data Augmentation is a method of enlarging the available dataset by altering existing images. **Alterations** may involve:
* Horizontal or vertical flip,
* Gamma adjustment,
* Rotation of image,
* Adding Gaussian noise,
* Cropping, zooming and stretching.
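
Several of the alterations listed above can be sketched with plain NumPy on an image stored as a float array in [0, 1]. The random image, the noise level and the gamma value below are arbitrary choices for illustration, and the function names are made up for this example.

```python
import numpy as np


def flip_horizontal(image):
    """Mirror the image left to right."""
    return image[:, ::-1]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return image[::-1, :]


def rotate_180(image):
    """Rotate the image by 180 degrees (turn it upside down)."""
    return np.rot90(image, 2)


def add_gaussian_noise(image, rng, sigma=0.05):
    """Add zero-mean Gaussian noise and clip back to the valid range."""
    noisy = image + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0).astype(image.dtype)


def gamma_adjust(image, gamma):
    """Darken (gamma > 1) or brighten (gamma < 1) an image with values in [0, 1]."""
    return image ** gamma


rng = np.random.default_rng(0)
image = rng.random((100, 100, 3)).astype(np.float32)  # stand-in for a real photo

noisy = add_gaussian_noise(image, rng)
darker = gamma_adjust(image, 2.0)

# A 180-degree rotation is the same as flipping both axes.
print(np.array_equal(rotate_180(image), flip_vertical(flip_horizontal(image))))  # True
```

All of these keep the image shape unchanged, so augmented copies can be appended to the training list next to the originals.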

In my model, I only flip images horizontally or turn them upside down (a 180-degree rotation). I observed a **decrease in accuracy** when I applied **gamma adjustment, zooming and shearing.**

In [7]:
# Data Augmentation Section
import random

#Defining augmentation operations.
def horizontal_flip(image):
    """Flips the given image horizontally"""
    return image[:, ::-1]

def up_side_down(image):
    """Rotates the given image by 180 degrees (turns it upside down)"""
    return np.rot90(image, 2)

# Defining augmentation methods.    
methods={'h_flip':horizontal_flip,'u_s_d':up_side_down}
# Defining data and label lists to append images into.
data = []
labels = []
# Setting the path of data.
path = "../input/seg_train/seg_train/"
for category in os.listdir(path):
    if(category == "buildings"):
        label = 0
    elif(category == "forest"):
        label = 1
    elif(category == "glacier"):
        label = 2  
    elif(category == "mountain"):
        label = 3  
    elif(category == "sea"):
        label = 4   
    elif(category == "street"):
        label = 5

    training_subfolder_path = path + "/" + category        
    for file in os.listdir(training_subfolder_path):
        image_path = training_subfolder_path + "/" + file
        image = cv2.imread(image_path)

        #Resize all images so they all have the same size
        image = cv2.resize(image,(100,100))
        image = np.array(image)

        #Scale pixel values to the [0, 1] range by dividing by 255
        image = image.astype('float32')/255.0
        data.append(image)
        labels.append(label)

        # Randomly choosing an augmentation operation.
        key = random.choice(list(methods))
        image=methods[key](image)
        data.append(image)
        labels.append(label)

# Generating training dataset.
print("Training data", len(data))

#Shuffle data
data, labels = shuffle(data, labels)
data = np.array(data)
labels = np.array(labels)
train_data=data

Training data 28068


After the data augmentation process, my training data doubled **from 14k to 28k images.** Let's try my model with data augmentation.

In [None]:
# Third Prediction
model=cnn_model()
number_epochs=20
batch_size=128
model_fit(model, number_epochs,batch_size)

The **overfitting** problem **still exists** in the model.

![title](images/m3_acc.png)
![title](images/m3_loss.png)

There is **no substantial improvement** in model accuracy.

![title](images/cm_2.png)

In [None]:
# Construct model
def cnn_model2():
    """Second Convolutional Neural Network model with L2 regularization, AMSGrad and a lower learning rate."""
    model = Models.Sequential()

    model.add(Layers.Conv2D(128,kernel_size=(3,3),activation='relu',input_shape=(100,100,3)))
    model.add(Layers.Conv2D(128,kernel_size=(3,3),activation='relu'))
    model.add(Layers.MaxPool2D(pool_size=(3,3)))

    model.add(Layers.Conv2D(256,kernel_size=(3,3),activation='relu',kernel_regularizer=l2(0.001)))
    model.add(Layers.Conv2D(256,kernel_size=(3,3),activation='relu',kernel_regularizer=l2(0.001)))
    model.add(Layers.MaxPool2D(pool_size=(3,3)))

    model.add(Layers.Flatten())
    model.add(Layers.Dense(256,activation='relu',kernel_regularizer=l2(0.001)))
    model.add(Layers.Dropout(0.5))

    model.add(Layers.Dense(256,activation='relu',kernel_regularizer=l2(0.001)))
    model.add(Layers.Dropout(0.5))

    model.add(Layers.Dense(6,activation='softmax'))

    model.compile(optimizer=Optimizer.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0, amsgrad=True),
                  loss='sparse_categorical_crossentropy',metrics=['accuracy'])  
    return model

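So, in the 2nd model I added **L2 regularization** to the second pair of convolutional layers and to the dense layers, enabled AMSGrad, and lowered the **learning rate from 0.001 to 0.0001.**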
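
The `kernel_regularizer=l2(0.001)` arguments in `cnn_model2` add a penalty to the loss that grows with the squared size of the layer weights. The NumPy sketch below shows that penalty for a single weight matrix; the weights and the data loss value are made-up placeholders, not numbers from the trained model.

```python
import numpy as np


def l2_penalty(weights, factor=0.001):
    """L2 penalty: factor times the sum of squared weights."""
    return factor * np.sum(np.square(weights))


def l2_penalty_grad(weights, factor=0.001):
    """Gradient of the penalty; it pulls every weight toward zero."""
    return 2.0 * factor * weights


weights = np.array([[0.5, -1.0],
                    [2.0, 0.0]])   # hypothetical layer weights
data_loss = 1.0                    # hypothetical cross-entropy value

penalty = l2_penalty(weights)
total_loss = data_loss + penalty
print(penalty, total_loss)
```

Large weights are penalized more strongly than small ones, which discourages the network from fitting noise in the training set.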

In [None]:
# Fourth Prediction
model=cnn_model2()
number_epochs=60
batch_size=128

model_fit(model, number_epochs,batch_size)

![title](images/m4_acc.png)
![title](images/m4_loss.png)

![title](images/cm_4.png)

## Results

In this project, I tried different approaches to classify the Intel landscape image dataset as accurately as possible. I used **ConvNets** to tackle this problem. To **avoid overfitting** I utilized **L2 Regularization (Gaussian Prior/Ridge) along with dropout and data augmentation.**

Finally, the resulting deep learning model is able to **classify landscape images with a success rate of around 87%.**

As a next step, one can change the network structure by making the model deeper or shallower, or by using average or sum pooling. In addition, one can try different optimizers such as Stochastic Gradient Descent or Adagrad while tuning the learning rate and the number of epochs.