# CNN for Intel Image Classification

In this project we build a CNN model for image classification that distinguishes six kinds of places: buildings, forest, glacier, mountain, sea, and street. A model like this can serve applications that use landscape pictures as features, for instance to recommend places that look similar to a picture supplied by the user.

# Import Packages

First, we import several packages. We mostly need packages for data manipulation and for building the deep learning architecture. Because we only want to build a simple CNN model, we use Keras, which is user friendly.

In [2]:
library(keras)
library(tidyverse)
library(stringr)
library(imager)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──

✔ ggplot2 3.2.1.9000     ✔ purrr   0.3.2
✔ tibble  2.1.3          ✔ dplyr   0.8.3
✔ tidyr   0.8.3          ✔ stringr 1.4.0
✔ readr   1.3.1          ✔ forcats 0.4.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Loading required package: magrittr


Attaching package: ‘magrittr’


The following object is masked from ‘package:purrr’:

    set_names


The following object is masked from ‘package:tidyr’:

    extract



Attaching package: ‘imager’


The following object is masked from ‘package:magrittr’:

    add



## Set GPU Environment

The dataset is too large to process quickly on a CPU, so we use the GPU that Kaggle provides. In this section we also list the devices that are available to the session.

In [3]:
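# The device list below shows TensorFlow devices, so TensorFlow is the backend in use
Sys.setenv(KERAS_BACKEND = "tensorflow")
# THEANO_FLAGS only matters when the Theano backend is used
Sys.setenv(THEANO_FLAGS = "device=gpu,floatX=float32")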
k = backend()
sess = k$get_session()
sess$list_devices()
system("ls ../input")

[[1]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 6645422030548007246)

[[2]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 11911082925581861493)

[[3]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 6225206460837553550)

[[4]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 15956161332, 7707700862761205860)


# Dataset Preparation

Before we fit the data to a CNN model, we have to represent it in array form. Each image is converted into a 3-dimensional array (height, width, and channel). The dataset folder has 3 subfolders containing the train, test, and prediction datasets. We use only 2 of them: seg_train to build the training and validation sets, and seg_test to evaluate the model.

In this section we apply 3 main processes to our images:
- Data Transformation
- Data Augmentation
- Data Generator

## Data Transformation

In this section we transform all of the images in the dataset folders into multidimensional arrays. First, we list the image files of each class with list.files.

In [None]:
buildings <- list.files(path = "../input/intel-image-classification/seg_train/seg_train/buildings")
forest <- list.files(path = "../input/intel-image-classification/seg_train/seg_train/forest")
glacier <- list.files(path = "../input/intel-image-classification/seg_train/seg_train/glacier")
mountain <- list.files(path = "../input/intel-image-classification/seg_train/seg_train/mountain")
sea <- list.files(path = "../input/intel-image-classification/seg_train/seg_train/sea")
street <- list.files(path = "../input/intel-image-classification/seg_train/seg_train/street")

buildings_test <- list.files(path = "../input/intel-image-classification/seg_test/seg_test/buildings")
forest_test <- list.files(path = "../input/intel-image-classification/seg_test/seg_test/forest")
glacier_test <- list.files(path = "../input/intel-image-classification/seg_test/seg_test/glacier")
mountain_test <- list.files(path = "../input/intel-image-classification/seg_test/seg_test/mountain")
sea_test <- list.files(path = "../input/intel-image-classification/seg_test/seg_test/sea")
street_test <- list.files(path = "../input/intel-image-classification/seg_test/seg_test/street")
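
The twelve calls above follow one pattern, so they could also be written as a loop over the class names. The following is only a minimal sketch, not the notebook's actual code: `list_class_files` is a hypothetical helper, and the demo builds a throwaway directory tree under `tempdir()` as a stand-in for the Kaggle input folder so that it runs anywhere.

```r
# Hypothetical helper: list the image files of every class folder under `base`.
list_class_files <- function(base, classes) {
  files <- lapply(classes, function(cl) list.files(path = file.path(base, cl)))
  names(files) <- classes
  files
}

# Demo on a temporary directory tree (stand-in for the Kaggle input folder).
classes <- c("buildings", "forest", "glacier", "mountain", "sea", "street")
demo_base <- file.path(tempdir(), "seg_train_demo")
for (cl in classes) {
  dir.create(file.path(demo_base, cl), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(demo_base, cl, paste0(cl, "_1.jpg")))
}

demo_files <- list_class_files(demo_base, classes)
str(demo_files)
```

The result is a named list with one character vector of file names per class, which is the same shape as the `list_img_train` and `list_img_test` lists built later.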

After listing all of the images, we build the train, evaluation, and test sets from them. Note that fitting the model on every image is computationally expensive and does not guarantee a better score, so we use only a subset: 1,500 images per class for training and 250 per class for evaluation. We will also apply data augmentation to get more variety in the data.

In [4]:
size = 150
channels = 3

train <- c(
    buildings[1:1500], 
    forest[1:1500],
    glacier[1:1500],
    mountain[1:1500],
    sea[1:1500],
    street[1:1500]
)
train <- sample(train)
evaluation <- c(
    buildings[1501:1750], 
    forest[1501:1750],
    glacier[1501:1750],
    mountain[1501:1750],
    sea[1501:1750],
    street[1501:1750]
)
evaluation <- sample(evaluation)

test <- c(buildings_test, forest_test, glacier_test, mountain_test, sea_test, street_test)
test <- sample(test)



After sampling the image lists, we convert the data into array form for both X and y (the label). For the target, we build a matrix that one-hot encodes the class of each image. For X, we prepare a 4D tensor with shape (number of images, height, width, channels), using 3 channels because the images are RGB. Once the data is in array form, we remove the variables that are no longer needed to clean up the environment, because keeping them costs memory.

In [None]:
data_prep <- function(images, size, channels, path, list_img) {
  # Class folder names, in the same order as the entries of list_img
  folder_list <- c("buildings", "forest", "glacier", "mountain", "sea", "street")
  count <- length(images)
  # 4D tensor: (samples, height, width, channels)
  master_array <- array(NA, dim = c(count, size, size, channels))

  for (i in seq_along(images)) {
    # Find the class folder that contains this file name
    for (j in seq_along(folder_list)) {
      if (images[i] %in% list_img[[j]]) {
        img_path <- paste0(path, folder_list[j], "/", images[i])
        break
      }
    }
    # Load the image, resize it to size x size, and convert it to an array
    img <- image_load(path = img_path, target_size = c(size, size))
    img_arr <- image_to_array(img)
    img_arr <- array_reshape(img_arr, c(1, size, size, channels))
    master_array[i, , , ] <- img_arr
  }
  return(master_array)
}

label_prep <- function(images, list_img) {
  # Integer class labels, 0-based (0 = buildings, ..., 5 = street)
  y <- c()
  for (i in seq_along(images)) {
    # The label is the index of the class list that contains this file name
    for (j in seq_along(list_img)) {
      if (images[i] %in% list_img[[j]]) {
        y <- append(y, j - 1)
        break
      }
    }
  }
  return(y)
}

list_img_train <- list(buildings, forest, glacier, mountain, sea, street)
list_img_test <- list(buildings_test, forest_test, glacier_test, mountain_test, sea_test, street_test)

In [None]:
X_train <- data_prep(train, size, channels, "../input/intel-image-classification/seg_train/seg_train/", list_img_train)
X_evaluation <- data_prep(evaluation, size, channels, "../input/intel-image-classification/seg_train/seg_train/", list_img_train)
X_test <- data_prep(test, size, channels, "../input/intel-image-classification/seg_test/seg_test/", list_img_test)

In [None]:
y_train <- to_categorical(label_prep(train, list_img_train))
y_evaluation <- to_categorical(label_prep(evaluation, list_img_train))
y_test <- to_categorical(label_prep(test, list_img_test))
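
To make the two array shapes concrete, here is a small base R sketch of what one-hot encoding produces and what the dimensions of the image tensor look like. It builds the one-hot matrix by hand instead of calling `to_categorical`, and the four labels are made-up values for illustration only.

```r
# Integer class labels, 0-based as produced by label_prep()
labels <- c(0, 2, 5, 1)
num_classes <- 6

# Build a one-hot matrix by hand: one row per image, one column per class.
one_hot <- matrix(0, nrow = length(labels), ncol = num_classes)
one_hot[cbind(seq_along(labels), labels + 1)] <- 1
print(one_hot)

# Shape of the image tensor for 4 images of 150 x 150 pixels with 3 channels.
x <- array(0, dim = c(length(labels), 150, 150, 3))
dim(x)
```

Each row of `one_hot` contains a single 1 in the column of its class, and `dim(x)` reports the (images, height, width, channels) layout described above.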

In [None]:
rm(list_img_train, list_img_test, buildings, forest, glacier, mountain, sea, street, buildings_test, forest_test, glacier_test, mountain_test, sea_test, street_test)

## Data Augmentation

One of the most common problems in image classification is a lack of data. Data augmentation gives us more images by manipulating each existing image, for example by rotating, flipping, or scaling it. Which augmentations are appropriate depends on the case. For instance, with medical images, rotating or flipping the base images may be inappropriate. For landscape pictures, we can apply a horizontal flip, shifts, shear, and zoom, and we also rescale the pixel values to the range 0 to 1. This gives the model more varied inputs without adding files to our folders.

In [None]:
train_datagen <- image_data_generator(rescale = 1/255,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE)   

validation_datagen <- image_data_generator(rescale = 1/255)   
test_datagen <- image_data_generator(rescale = 1/255)
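
As a rough illustration of two of these settings, the base R sketch below applies a rescale and a horizontal flip to a tiny made-up array. It only mimics the idea; it is not how Keras implements `rescale` or `horizontal_flip` internally, and the 2 x 3 "image" is an assumption chosen to keep the output readable.

```r
# A tiny 2 x 3 single-channel "image" with pixel values in 0..255
# (rows = height, columns = width, third dimension = channel).
img <- array(c(0, 255, 51, 102, 153, 204), dim = c(2, 3, 1))

# Rescaling maps pixel values from [0, 255] into [0, 1].
scaled <- img / 255

# A horizontal flip reverses the column (width) order.
flipped <- scaled[, ncol(scaled):1, , drop = FALSE]

scaled[, , 1]
flipped[, , 1]
```

The flipped array has the same shape as the original, which is why augmented images can be fed to the model without changing its input size.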

## Data Generator

In the step above we only defined the data generators; no new images have been produced yet. Now we wrap our data with the generators using a batch size of 32. After the cell below runs, the data is ready to be fit to the CNN model. To save memory, we then remove the variables we no longer need.

In [None]:
train_generator <- flow_images_from_data(
  x = X_train, 
  y = y_train,
  generator = train_datagen,                                                                                       
  batch_size = 32
)

validation_generator <- flow_images_from_data(
  x = X_evaluation, 
  y = y_evaluation,
  generator = validation_datagen,                                                                                       
  batch_size = 32
)

test_generator <- flow_images_from_data(
    x = X_test,
    y = y_test,
    generator = test_datagen,
    batch_size = 32
)

In [None]:
rm(X_train, y_train, X_evaluation, y_evaluation)
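
A quick calculation shows how many batches of 32 one full pass over each set takes. This is a small sketch that uses the split sizes chosen earlier (1,500 training and 250 validation images per class).

```r
batch_size <- 32
n_train <- 6 * 1500       # 1,500 training images for each of the 6 classes
n_validation <- 6 * 250   # 250 validation images for each of the 6 classes

# Number of batches needed to see every image once (last batch may be partial).
train_batches <- ceiling(n_train / batch_size)
validation_batches <- ceiling(n_validation / batch_size)

c(train = train_batches, validation = validation_batches)
```

With these sizes a full pass is 282 training batches and 47 validation batches, so a setting such as `steps_per_epoch = 100` draws about 3,200 augmented images per epoch rather than the whole training set.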

# CNN Model Architecture

Now that the dataset is prepared, we move on to modelling. We classify the images with a convolutional neural network (CNN), one of the most popular neural network architectures for image data. The main difference between a plain neural network and a CNN is the convolutional layer. A dense layer learns global patterns in its input feature space, whereas a convolutional layer learns local patterns. In a cat classifier, for instance, a CNN can find local patterns such as edges and textures, and its deeper layers may learn parts such as ears and eyes. Another advantage is that the patterns convolutional layers learn are translation invariant: because the model learns local patterns, it can recognize an object whether it sits in the bottom-left corner, at the center, or anywhere else.

Our model architecture consists mainly of 4 kinds of layers:
- convolutional layers
- pooling layers (for downsampling features)
- dense layers (for output classifier)
- dropout (prevent model overfitting)

Our problem is multiclass classification, so the last dense layer, which produces the output, uses the softmax activation function.

In [None]:
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(0.5)  %>% 
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 6, activation = "softmax")
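
To see how the feature maps shrink through this stack, the sketch below traces the spatial size layer by layer in base R. It assumes the Keras defaults used above: 3x3 convolutions with "valid" padding and stride 1, and 2x2 max pooling with stride 2. The helper names `conv_out` and `pool_out` are made up for this illustration.

```r
# Spatial size after a 3x3 convolution with "valid" padding and stride 1.
conv_out <- function(n, kernel = 3) n - kernel + 1
# Spatial size after 2x2 max pooling with stride 2 (partial windows are dropped).
pool_out <- function(n, pool = 2) floor(n / pool)

n <- 150
filters <- c(32, 64, 128, 128)
for (f in filters) {
  n <- conv_out(n)
  cat(sprintf("conv (%3d filters): %3d x %3d\n", f, n, n))
  n <- pool_out(n)
  cat(sprintf("max pooling       : %3d x %3d\n", n, n))
}

# Number of values handed to the dense layers after flattening.
flat_units <- n * n * filters[length(filters)]
flat_units
```

The 150 x 150 input ends up as a 7 x 7 map with 128 filters, so the flatten layer passes 6,272 values to the dense layers.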

# Model Compiler

Next, we choose the loss function and the optimizer used for backpropagation. We use categorical cross-entropy as the loss function because our problem is multiclass classification, and we use the Adam optimizer.

In [None]:
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_adam(lr = 1e-3),
  metrics = c("accuracy")
)
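
For intuition, softmax and categorical cross-entropy can be written in a few lines of base R. This is a hand-rolled sketch for a single sample, not the Keras implementation, and the logits are made-up numbers.

```r
# Softmax turns raw scores (logits) into probabilities that sum to 1.
softmax <- function(z) {
  e <- exp(z - max(z))  # subtract the max for numerical stability
  e / sum(e)
}

# Categorical cross-entropy for one sample with a one-hot target.
cross_entropy <- function(y_true, y_prob) -sum(y_true * log(y_prob))

logits <- c(2.0, 0.5, 0.1, 1.2, -0.3, 0.0)  # made-up scores for the 6 classes
probs <- softmax(logits)
y_true <- c(1, 0, 0, 0, 0, 0)                # one-hot target: class "buildings"

round(probs, 3)
cross_entropy(y_true, probs)
```

Because the target is one-hot, the loss reduces to the negative log of the probability assigned to the true class, so it shrinks as the model grows more confident in the right answer.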

# Training Phase 

Finally, we are ready to train the model. We use the train generator for the training set and the validation generator for the validation set, both prepared earlier. Choosing the number of epochs can be tricky because more epochs do not guarantee a better model. If the accuracy curve stops rising and stays at about the same value, we can stop training, which saves time and computation. The early stopping callback does this for us. In the cell below, training stops once the training accuracy has not improved for 5 consecutive epochs, and the model keeps the weights from the last epoch that ran.

In [None]:
history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 30,
  validation_data = validation_generator,
  validation_steps = 50,
  callbacks = list(
      callback_early_stopping(patience = 5, monitor = "accuracy", mode = "max")
  )
)
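
The patience rule can be illustrated with a short base R function. This is a simplified sketch of the idea, not the code Keras runs, and both `early_stop_epoch` and the accuracy values are invented for the example.

```r
# Return the epoch at which training would stop, or NA if it runs to the end.
early_stop_epoch <- function(metric, patience) {
  best <- -Inf
  wait <- 0
  for (epoch in seq_along(metric)) {
    if (metric[epoch] > best) {
      best <- metric[epoch]   # improvement: remember it and reset the counter
      wait <- 0
    } else {
      wait <- wait + 1        # no improvement this epoch
      if (wait >= patience) return(epoch)
    }
  }
  NA
}

# Made-up accuracy curve that plateaus after the third epoch.
acc <- c(0.50, 0.62, 0.70, 0.70, 0.69, 0.70, 0.70, 0.68, 0.71)
early_stop_epoch(acc, patience = 5)
```

In this made-up curve the best value is reached at epoch 3, the next five epochs bring no improvement, and training stops at epoch 8, before the slightly better epoch 9 is ever reached. That is the trade-off a patience setting makes.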

After fitting, we can plot the history to check the training result. According to the plot, the model reaches approximately 86% accuracy on both the training and the validation data, so it does not appear to be overfitting. We could train for more epochs, but based on the graph the accuracy would probably not improve significantly and the run would only use more memory.

In [None]:
plot(history)

# Save Model

Training a CNN model takes time, and the notebook sometimes crashes because of limited cores or memory. To avoid losing the result, we save the model. When we want to predict on other data, we can load the saved Keras model instead of training a new one.

In [None]:
dir.create("model", showWarnings = FALSE)
model %>% save_model_hdf5("./model/my_model.h5")

In [None]:
load_model <- load_model_hdf5("./model/my_model.h5")
load_model %>% summary()

# Model Evaluation

With the CNN model built, we evaluate how well it distinguishes the input images (buildings, forest, glacier, mountain, sea, and street) with the evaluate generator function. The history plot above suggests that the model fits both the training and the validation set properly and does not tend to overfit. Running the cell below gives an accuracy of about 87%, which is in line with the training and validation accuracy observed during fitting.

In [None]:
load_model %>% evaluate_generator(test_generator, steps = 32)
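
The accuracy reported by the evaluation is the share of images whose most probable class matches the true class. The base R sketch below shows that calculation on a made-up probability matrix; the numbers are invented and are not results from this model.

```r
# Made-up predicted probabilities: one row per image, one column per class.
classes <- c("buildings", "forest", "glacier", "mountain", "sea", "street")
probs <- rbind(
  c(0.70, 0.05, 0.05, 0.05, 0.05, 0.10),
  c(0.10, 0.60, 0.10, 0.10, 0.05, 0.05),
  c(0.05, 0.05, 0.30, 0.50, 0.05, 0.05),
  c(0.05, 0.05, 0.05, 0.05, 0.70, 0.10)
)
true_class <- c("buildings", "forest", "glacier", "sea")

# The predicted class is the column with the highest probability.
pred_class <- classes[max.col(probs, ties.method = "first")]

accuracy <- mean(pred_class == true_class)
table(actual = true_class, predicted = pred_class)
accuracy
```

The table is a small confusion matrix. Here it shows that the one mistake is a glacier image predicted as mountain, the kind of per-class detail that a single accuracy number hides.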