<a href="https://colab.research.google.com/github/agungsugiarto/cnn-batik-classification/blob/master/cnn_batik_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implementation of Deep Learning Based on TensorFlow and a Convolutional Neural Network (CNN) for Recognizing the Region of Origin of Indonesian Batik Motifs

---

by [Agung Sugiarto](https://agungsugiarto.github.io)
/ [GitHub](https://github.com/agungsugiarto/cnn-batik-classification.git)

## Abstract

Indonesia is a nation made up of many ethnic groups with diverse cultural backgrounds. One product of Indonesian culture is batik, which comes in a wide variety of motifs. These patterns and motifs are inseparable from elements tied to the region where the batik is made. Knowledge of how to recognize batik motifs may be held only by people with relevant expertise, such as batik makers, and not everyone can identify a given motif. As times change and the demand for information grows, people are driven to develop new technology so that information can be processed easily and quickly. An approach is therefore needed to address this problem. One approach to image recognition is Deep Learning, a family of neural network models that has been developed actively in recent years and has shown good results in improving accuracy for Image Classification and other tasks. This study aims to examine how a Deep Learning method, namely the Convolutional Neural Network (CNN), performs classification to recognize the region of origin of batik motifs.

**Keywords**: *Deep Learning*, *Convolutional Neural Network*, *Image Classification*, *Batik*.


## Flowchart ##

The following chart shows roughly how the data flows in the Convolutional Neural Network that is implemented below.

![Flowchart](https://github.com/agungsugiarto/cnn-batik-classification/blob/master/images/arsitektur%20cnn.jpeg?raw=1)

The input image is processed in the first convolutional layer using the filter-weights. This results in 16 new images, one for each filter in the convolutional layer. The images are also down-sampled so the image resolution is decreased from 28x28 to 14x14.

These 16 smaller images are then processed in the second convolutional layer. We need filter-weights for each of these 16 channels, and we need filter-weights for each output channel of this layer. There are 36 output channels so there are a total of 16 x 36 = 576 filters in the second convolutional layer. The resulting images are down-sampled again to 7x7 pixels.

The output of the second convolutional layer is 36 images of 7x7 pixels each. These are then flattened to a single vector of length 7 x 7 x 36 = 1764, which is used as the input to a fully-connected layer with 128 neurons (or elements). This feeds into another fully-connected layer with 10 neurons, one for each of the classes, which is used to determine the class of the image, that is, which number is depicted in the image.
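
The size arithmetic described above can be checked with a few lines of plain Python. This is only a sketch: the function name `trace_shapes` is hypothetical, and it assumes each convolutional layer keeps the image size and is followed by a 2x2 down-sampling step, as the flowchart description suggests.

```python
def trace_shapes(img_size=28, num_filters1=16, num_filters2=36,
                 fc_size=128, num_classes=10):
    """Follow the data through the network described in the flowchart.

    Returns the (height, width, channels) after each convolutional stage,
    then the flattened length and the sizes of the two fully-connected layers.
    """
    # Each stage halves the height and width and sets the channel count.
    after_conv1 = (img_size // 2, img_size // 2, num_filters1)
    after_conv2 = (after_conv1[0] // 2, after_conv1[1] // 2, num_filters2)
    # Flattening multiplies height, width and channels together.
    flat = after_conv2[0] * after_conv2[1] * after_conv2[2]
    return [after_conv1, after_conv2, flat, fc_size, num_classes]


print(trace_shapes())
# [(14, 14, 16), (7, 7, 36), 1764, 128, 10]
```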

The convolutional filters are initially chosen at random, so the classification is at first essentially random. The error between the predicted and true class of the input image is measured as the so-called cross-entropy. The optimizer then automatically propagates this error back through the Convolutional Network using the chain-rule of differentiation and updates the filter-weights so as to reduce the classification error. This is done iteratively thousands of times until the classification error is sufficiently low.
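
To make the cross-entropy measure more concrete, here is a minimal sketch in plain Python. It assumes the raw outputs of the last layer are converted to probabilities with a softmax, which is a common choice but is not stated in the text above. The helper names `softmax` and `cross_entropy` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[true_class])


# An undecided network over 10 classes gives each class probability 0.1,
# so the loss is ln(10).
undecided = cross_entropy([0.0] * 10, true_class=3)

# Raising the output for the true class lowers the loss.
confident = cross_entropy([0.0, 0.0, 0.0, 5.0] + [0.0] * 6, true_class=3)
print(undecided > confident)  # True
```

The optimizer's job is to move the weights so that this value shrinks, averaged over many training images.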

These particular filter-weights and intermediate images are the results of one optimization run and may look different if you re-run this Notebook.

Note that the computation in TensorFlow is actually done on a batch of images instead of a single image, which makes the computation more efficient. This means the flowchart actually has one more data-dimension when implemented in TensorFlow.

## Convolutional Layer ##

The following chart shows the basic idea of processing an image in the first convolutional layer. The input image depicts the number 7 and four copies of the image are shown here, so we can see more clearly how the filter is being moved to different positions of the image. For each position of the filter, the dot-product is being calculated between the filter and the image pixels under the filter, which results in a single pixel in the output image. So moving the filter across the entire input image results in a new image being generated.

Red filter-weights mean that the filter has a positive reaction to black pixels in the input image, while blue filter-weights mean that the filter has a negative reaction to black pixels.

In this case it appears that the filter recognizes the horizontal line of the 7-digit, as can be seen from its stronger reaction to that line in the output image.

![Convolution example](https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/master/images/02_convolution.png)

The step-size for moving the filter across the input is called the stride. There is a stride for moving the filter horizontally (x-axis) and another stride for moving vertically (y-axis).

In the source-code below, the stride is set to 1 in both directions, which means the filter starts in the upper left corner of the input image and is being moved 1 pixel to the right in each step. When the filter reaches the end of the image to the right, then the filter is moved back to the left side and 1 pixel down the image. This continues until the filter has reached the lower right corner of the input image and the entire output image has been generated.

When the filter extends past the right side or the bottom of the input image, the image can be padded with zeroes (white pixels). This causes the output image to have exactly the same dimensions as the input image.
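
The sliding-filter procedure with stride 1 and zero padding can be sketched in plain Python as follows. This is an illustration rather than the TensorFlow implementation: the function name `conv2d_same` is hypothetical, the filter is assumed to have an odd side length, and it is assumed to be centred on each pixel so that the padding is spread evenly around the image.

```python
def conv2d_same(image, kernel):
    """Slide `kernel` over `image` with stride 1 and zero padding.

    `image` and `kernel` are lists of rows. Positions outside the image
    count as zero, so the output has the same size as the input.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2  # how far the filter reaches beyond its centre pixel
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for ky in range(k):
                for kx in range(k):
                    iy, ix = y + ky - pad, x + kx - pad
                    # Skipping out-of-range positions equals adding zeros.
                    if 0 <= iy < height and 0 <= ix < width:
                        total += image[iy][ix] * kernel[ky][kx]
            output[y][x] = total  # dot-product for this filter position
    return output


ones = [[1.0] * 3 for _ in range(3)]
print(conv2d_same(ones, ones))
# [[4.0, 6.0, 4.0], [6.0, 9.0, 6.0], [4.0, 6.0, 4.0]]
```

The corner values are smaller than the centre value because part of the filter lies over the zero padding there.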

Furthermore, the output of the convolution may be passed through a so-called Rectified Linear Unit (ReLU), which merely ensures that the output is non-negative, because negative values are set to zero. The output may also be down-sampled by so-called max-pooling, which considers small windows of 2x2 pixels and only keeps the largest of those pixels. This halves the resolution of the image in each dimension, e.g. from 28x28 to 14x14 pixels.
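
Both operations are simple enough to sketch in plain Python. The helper names `relu` and `max_pool_2x2` are hypothetical, and the pooling sketch assumes non-overlapping 2x2 windows, that is, a pooling stride of 2.

```python
def relu(image):
    """Set negative values to zero and leave the rest unchanged."""
    return [[max(0.0, value) for value in row] for row in image]


def max_pool_2x2(image):
    """Keep the largest value of each non-overlapping 2x2 window."""
    height, width = len(image), len(image[0])
    return [
        [
            max(image[y][x], image[y][x + 1],
                image[y + 1][x], image[y + 1][x + 1])
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


feature_map = [
    [-1.0, 2.0, 0.0, -3.0],
    [4.0, -5.0, 1.0, 1.0],
    [0.0, 0.0, -2.0, -2.0],
    [-1.0, 3.0, -4.0, 5.0],
]
print(max_pool_2x2(relu(feature_map)))
# [[4.0, 1.0], [3.0, 5.0]]
```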

Note that the second convolutional layer is more complicated because it takes 16 input channels. We want a separate filter for each input channel, so we need 16 filters instead of just one. Furthermore, we want 36 output channels from the second convolutional layer, so in total we need 16 x 36 = 576 filters for the second convolutional layer. It can be a bit challenging to understand how this works.
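
One way to picture this: in a standard convolutional layer, each output channel has its own set of filters, one per input channel, and the per-channel dot-products are summed into a single output pixel. The sketch below shows that for a deliberately tiny case with 2 input channels and 2x2 patches. The function name `output_pixel` and the example numbers are made up for illustration.

```python
def output_pixel(patches, filters):
    """Compute one output pixel of one output channel.

    patches[i] is the patch of input channel i under the filter, and
    filters[i] is the filter this output channel applies to channel i.
    The dot-products of all input channels are added together.
    """
    return sum(
        p * f
        for patch, filt in zip(patches, filters)
        for patch_row, filt_row in zip(patch, filt)
        for p, f in zip(patch_row, filt_row)
    )


# Filter count for the second layer described above.
num_input_channels = 16
num_output_channels = 36
print(num_input_channels * num_output_channels)  # 576

# Tiny example: channel 0 contributes 5, channel 1 contributes 4.
patches = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
filters = [[[1, 0], [0, 1]], [[2, 2], [2, 2]]]
print(output_pixel(patches, filters))  # 9
```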

## Import Dependencies ##

In [0]:
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math

This was developed using Python 3.6 (Anaconda) and TensorFlow version:

In [0]:
tf.__version__

'1.13.1'

## Configuration of Neural Network

The configuration of the Convolutional Neural Network is defined here for convenience, so you can easily find and change these numbers and re-run the Notebook.

In [0]:
# Convolutional Layer 1.
filter_size1 = 5          # Convolution filters are 5 x 5 pixels.
num_filters1 = 16         # There are 16 of these filters.

# Convolutional Layer 2.
filter_size2 = 5          # Convolution filters are 5 x 5 pixels.
num_filters2 = 36         # There are 36 of these filters.

# Fully-connected layer.
fc_size = 128             # Number of neurons in fully-connected layer.
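
To get a feel for how these settings affect model size, the following sketch counts the trainable parameters they imply. Several values are assumptions and not part of the configuration: a single input channel, the 28x28 image size and 10 classes mentioned in the flowchart description, and one bias per filter or neuron. The helper names are hypothetical, and the actual batik images may well differ.

```python
# Values copied from the configuration cell above.
filter_size1, num_filters1 = 5, 16
filter_size2, num_filters2 = 5, 36
fc_size = 128

# Assumptions for illustration only (not defined in the configuration).
num_channels = 1   # assumed grayscale input
img_size = 28      # image size used in the flowchart description
num_classes = 10   # class count used in the flowchart description


def conv_params(filter_size, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return filter_size * filter_size * in_channels * out_channels + out_channels


def fc_params(num_inputs, num_outputs):
    """Weights plus one bias per neuron."""
    return num_inputs * num_outputs + num_outputs


# Two 2x2 max-pooling steps reduce each side by a factor of 4.
flat_size = (img_size // 4) ** 2 * num_filters2

params = {
    "conv1": conv_params(filter_size1, num_channels, num_filters1),
    "conv2": conv_params(filter_size2, num_filters1, num_filters2),
    "fc1": fc_params(flat_size, fc_size),
    "fc2": fc_params(fc_size, num_classes),
}
print(params)
# {'conv1': 416, 'conv2': 14436, 'fc1': 225920, 'fc2': 1290}
print(sum(params.values()))  # 242062
```

Under these assumptions most parameters sit in the first fully-connected layer, so changing `fc_size` or the image size affects the model size far more than changing the filter counts.
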

## Load Dataset ##