# Landscape Image Classification Using TensorFlow


In this project we will look at images belonging to 6 different labels, train a model on those images, and try to predict the label of unseen images with as high an accuracy as possible.
1. First, we'll import useful packages.
1. Then, we'll load the data, visualize it and preprocess it.
1. We'll try a simple CNN model and evaluate its performance.
1. We will then use a pre-trained model to address this challenge as well.

# Import required Packages 

In [2]:
import numpy as np
import os
from sklearn.metrics import confusion_matrix
import seaborn as sn; sn.set(font_scale=1.4)
from sklearn.utils import shuffle           
import matplotlib.pyplot as plt             
import cv2                                 
import tensorflow as tf                
from tqdm import tqdm

In [3]:
class_names = ['mountain', 'street', 'glacier', 'buildings', 'sea', 'forest']
class_names_label = {class_name:i for i, class_name in enumerate(class_names)}

nb_classes = len(class_names)

IMAGE_SIZE = (150, 150)

In [4]:
print(class_names_label)

# Loading the Data
We write a `data_loader` function that loads all the images along with their labels. In total we will be using around 17000 images, with around 14000 used for training and the rest for testing.

In [5]:
def data_loader():
    datasets = ['../input/images/new_data/seg_train/seg_train', '../input/images/new_data/seg_test/seg_test']
    output = []
    
    # Iterating through both the training and testing sets
    for dataset in datasets:
        
        images = []
        labels = []
        
        print("Loading {}".format(dataset))
        
        for folder in os.listdir(dataset):
            label = class_names_label[folder]
            
            # Iterating through each image in the folder
            for file in tqdm(os.listdir(os.path.join(dataset, folder))):
                
                # Building the full path of the image
                img_path = os.path.join(dataset, folder, file)
                
                # Using OpenCV to read the image, convert BGR to RGB and resize it
                image = cv2.imread(img_path)
                if image is None:
                    # cv2.imread returns None for unreadable or non-image files; skip them
                    continue
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                image = cv2.resize(image, IMAGE_SIZE) 
                
                # Appending the image and its corresponding label
                images.append(image)
                labels.append(label)
                
        images = np.array(images, dtype = 'float32')
        labels = np.array(labels, dtype = 'int32')   
        
        output.append((images, labels))

    return output

In [6]:
(train_images, train_labels), (test_images, test_labels) = data_loader()

In [7]:
train_images, train_labels = shuffle(train_images, train_labels, random_state=25)

# Exploring and Analyzing the dataset


In [8]:
n_train = train_images.shape[0]
n_test = test_images.shape[0]

print("Number of training examples: {}".format(n_train))
print("Number of testing examples: {}".format(n_test))
print("Each image is of size: {}".format(IMAGE_SIZE))

In [9]:
import pandas as pd

_, train_counts = np.unique(train_labels, return_counts=True)
_, test_counts = np.unique(test_labels, return_counts=True)
pd.DataFrame({'train': train_counts, 'test': test_counts},
             index=class_names).plot.bar()
plt.show()

In [10]:
print(train_counts)

In [11]:
plt.pie(train_counts,
        explode=(0, 0, 0, 0, 0, 0) , 
        labels=class_names,
        autopct='%1.1f%%')
plt.axis('equal')
plt.title('Proportion of each observed category')
plt.show()

## Scaling the data

In [12]:
train_images = train_images / 255.0 
test_images = test_images / 255.0
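
As a quick sanity check on the scaling step, the sketch below applies the same division to a tiny stand-in array and confirms that the values land in the range [0, 1]. The array `pixels` is hypothetical and not part of the dataset.

```python
import numpy as np

# Hypothetical stand-in for image data: 8-bit pixel values stored as float32
pixels = np.array([[0, 64, 128], [191, 254, 255]], dtype="float32")

# Same operation as in the notebook: map [0, 255] to [0.0, 1.0]
scaled = pixels / 255.0

print(scaled.min(), scaled.max())
print(scaled.dtype)
```

Dividing by a Python float keeps the `float32` dtype chosen in the loader, so the scaled arrays take no more memory than the originals.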

## Visualizing images from the data
We can display a random image from the training set.

In [13]:
def show_random_image(class_names, images, labels):
  
    index = np.random.randint(images.shape[0])
    plt.figure()
    plt.imshow(images[index])
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.title('Image #{} : '.format(index) + class_names[labels[index]])
    plt.show()

In [14]:
show_random_image(class_names, train_images, train_labels)

We can also display more images to get a better view of the dataset.

In [15]:
def show_examples(class_names, images, labels):
        
    fig = plt.figure(figsize=(10,10))
    fig.suptitle("Some examples of images of the dataset", fontsize=16)
    for i in range(20):
        plt.subplot(5,4,i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[labels[i]])
    plt.show()

In [16]:
show_examples(class_names, train_images, train_labels)

# Model Creation

The workflow has four steps:

1. Build the model.
1. Compile the model.
1. Train (fit) the model on the training data.
1. Evaluate the model on the test set.

We will build a model from Conv2D, MaxPooling2D and Flatten layers, with ReLU and softmax as activation functions.

In [17]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation = 'relu', input_shape = (150, 150, 3)), 
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3, 3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(6, activation=tf.nn.softmax)
])
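
To see how the image shrinks as it passes through the layers, the arithmetic can be sketched in plain Python. This assumes the Keras defaults used above ('valid' padding and stride 1 for Conv2D, stride 2 for the 2x2 pooling); the helper names `conv_out` and `pool_out` are made up for illustration.

```python
def conv_out(size, kernel=3):
    # 'valid' padding with stride 1 (the Keras Conv2D defaults)
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2 and 'valid' padding
    return (size - pool) // pool + 1

size = 150
size = pool_out(conv_out(size))   # after the first Conv2D + MaxPooling2D
size = pool_out(conv_out(size))   # after the second Conv2D + MaxPooling2D

flat_units = size * size * 32          # 32 filters in the last Conv2D
dense_params = flat_units * 128 + 128  # weights + biases of Dense(128)

print(size, flat_units, dense_params)
```

Under these assumptions almost all of the parameters sit in the first Dense layer, which `model.summary()` should confirm.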

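We compile the model with '*adam*' as the **optimizer** and '*sparse categorical crossentropy*' as the **loss function**. The sparse variant accepts integer class labels directly, so the labels do not need to be one-hot encoded.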

In [18]:
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics=['accuracy'])
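
To make the loss concrete, here is a minimal NumPy sketch of what sparse categorical cross-entropy computes for integer labels. The probabilities and labels below are invented for illustration and are not outputs of the model.

```python
import numpy as np

# Hypothetical softmax outputs for 2 images over the 6 classes
probs = np.array([
    [0.70, 0.05, 0.10, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.20, 0.40, 0.10, 0.10],
])
# Integer labels, in the same format the data loader produces
labels = np.array([0, 2])

# Pick the probability assigned to the true class of each image
true_class_probs = probs[np.arange(len(labels)), labels]

# Per-sample loss is -log(p_true); the reported loss is the mean over the batch
per_sample_loss = -np.log(true_class_probs)
mean_loss = per_sample_loss.mean()

print(per_sample_loss, mean_loss)
```

The second image is penalized more heavily because the model gave its true class a low probability.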

Fitting the model to the training dataset.

In [19]:
model.fit(train_images, train_labels, batch_size=128, epochs=20, validation_split = 0.2)
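
`validation_split=0.2` holds out part of the training arrays for validation. According to the Keras documentation, the held-out samples are taken from the end of the arrays, before any per-epoch shuffling, which is one reason the training data was shuffled beforehand. The sketch below mimics that split on indices only; the sample count is an assumed round number, not the exact size of the dataset.

```python
import numpy as np

n_samples = 14000        # assumed; the real count is train_images.shape[0]
validation_split = 0.2

indices = np.arange(n_samples)
split_at = int(n_samples * (1 - validation_split))

train_idx = indices[:split_at]   # used for the gradient updates
val_idx = indices[split_at:]     # last 20%, only used for val_loss / val_accuracy

print(len(train_idx), len(val_idx))
```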

Evaluating the model performance on the test dataset.

In [20]:
# evaluate returns the loss followed by each compiled metric (here, accuracy)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)

We have achieved an accuracy of 0.76 on the test set.

Let's pick a random test image and see which label the classifier predicts for it.

In [21]:
predictions = model.predict(test_images)     # Vector of probabilities
pred_labels = np.argmax(predictions, axis = 1) # We take the highest probability

show_random_image(class_names, test_images, pred_labels)
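
The step from probabilities to labels can be illustrated on a small made-up array. The rows below are hypothetical model outputs, not real predictions.

```python
import numpy as np

class_names = ['mountain', 'street', 'glacier', 'buildings', 'sea', 'forest']

# Hypothetical model outputs for 3 images (each row sums to 1)
predictions = np.array([
    [0.05, 0.05, 0.10, 0.05, 0.05, 0.70],
    [0.10, 0.55, 0.05, 0.20, 0.05, 0.05],
    [0.30, 0.05, 0.40, 0.05, 0.15, 0.05],
])

pred_labels = np.argmax(predictions, axis=1)   # index of the largest probability per row
confidence = predictions.max(axis=1)           # the probability behind each prediction

for label, conf in zip(pred_labels, confidence):
    print(class_names[label], round(float(conf), 2))
```

Looking at the confidence next to the label helps separate confident mistakes from borderline ones, such as the third row here.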

## Error analysis

Let's understand where the classifier is having trouble.

In [22]:
def show_mislabeled_images(class_names, test_images, test_labels, pred_labels):
    # Boolean mask of the test images whose predicted label differs from the true label
    mislabeled = (test_labels != pred_labels)
    mislabeled_images = test_images[mislabeled]
    mislabeled_labels = pred_labels[mislabeled]

    # The label displayed under each image is the (wrong) predicted label
    show_examples(class_names, mislabeled_images, mislabeled_labels)


In [23]:
show_mislabeled_images(class_names, test_images, test_labels, pred_labels)

In [24]:
CM = confusion_matrix(test_labels, pred_labels)
ax = plt.axes()
sn.heatmap(CM, annot=True, 
           annot_kws={"size": 10}, 
           xticklabels=class_names, 
           yticklabels=class_names, ax = ax)
ax.set_title('Confusion matrix')
plt.show()
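
The heatmap can be complemented with numbers. The sketch below computes per-class recall and the most frequent confusion from a confusion matrix in the scikit-learn layout (rows are true classes, columns are predicted classes). The matrix `cm` is made up for illustration and does not reproduce the results of this notebook.

```python
import numpy as np

class_names = ['mountain', 'street', 'glacier', 'buildings', 'sea', 'forest']

# Made-up confusion matrix (rows = true class, columns = predicted class)
cm = np.array([
    [80,  0, 12,  1,  7,  0],
    [ 0, 75,  0, 22,  1,  2],
    [15,  0, 70,  0, 14,  1],
    [ 1, 20,  0, 76,  2,  1],
    [ 9,  1, 13,  2, 75,  0],
    [ 0,  1,  0,  1,  0, 98],
])

# Recall per class: correct predictions divided by the number of true samples
recall = cm.diagonal() / cm.sum(axis=1)

# Largest off-diagonal entry: the most frequent (true, predicted) confusion
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
true_idx, pred_idx = np.unravel_index(off_diag.argmax(), off_diag.shape)

for name, r in zip(class_names, recall):
    print(f"{name:10s} {r:.2f}")
print("most confused:", class_names[true_idx], "->", class_names[pred_idx])
```

Replacing `cm` with the `CM` computed above would give the corresponding figures for the trained model.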

## Conclusion
The classifier has trouble with two groups of similar images.
It confuses streets with buildings, which is not surprising given that both typically appear in the same urban scenes.
It also confuses sea, glacier and mountain images, which share similar colors and contrasts. Forests are predicted accurately, probably because their dominant green makes them look very different from the other landscapes.