**Bark Texture Images Classification**

The dataset used in this notebook can be downloaded from https://www.kaggle.com/datasets/saurabhshahane/barkvn50

According to the dataset description, there are 5578 images across 50 categories of bark texture. The goal is to build a model that classifies each image into the appropriate category.

The download is a zip file. Unzipping it gives a folder named "BarkVN-50", which contains a folder named "BarkVN-50_mendeley". That folder holds 50 sub-folders, one per bark-texture category, each containing the images for that category. Move the "BarkVN-50_mendeley" folder to your current working directory.

Now let's first import all the required libraries

In [1]:
##Importing Required Libraries

import os 
import shutil
import numpy as np 
import matplotlib.pyplot as plt 
import random 
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras.preprocessing.image import ImageDataGenerator

The "BarkVN-50_mendeley" folder has been renamed to "bark_dataset".

Now let's look at how many images each category has.

In [None]:
for dirpath, dirnames, filenames in os.walk("bark_dataset/"):
    print(f"there are {len(dirnames)} directories and {len(filenames)} files in '{dirpath}'.")
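The same counts can also be collected into a dictionary, which makes it easier to find the smallest and largest categories later. This is a minimal sketch that assumes the `bark_dataset/` layout described above; `count_images_per_category` is a hypothetical helper name, not part of the original notebook.

```python
import os

def count_images_per_category(root):
    """Return {category: number of files} for each sub-folder of root."""
    counts = {}
    for category in sorted(os.listdir(root)):
        category_dir = os.path.join(root, category)
        if os.path.isdir(category_dir):
            counts[category] = sum(
                os.path.isfile(os.path.join(category_dir, name))
                for name in os.listdir(category_dir)
            )
    return counts

# Only runs when the dataset folder is present in the working directory.
if os.path.isdir("bark_dataset"):
    counts = count_images_per_category("bark_dataset")
    print(f"{len(counts)} categories, {sum(counts.values())} images in total")
```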

The images in each category are renamed to category1.JPG, category2.JPG, and so on. For example, the images in the Acacia folder are named Acacia1.JPG, Acacia2.JPG, etc.

    The original files are named in the format IMG_3587.JPG, so images in different categories can share the same name.
    That could cause problems when moving the images into different folders.

In [None]:
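all_categories = os.listdir("bark_dataset/")

# Rename every image to <category><index>.JPG so names are unique across categories.
# Run this cell only once: a second pass may collide with names it already created.
for category in sorted(all_categories):
    category_dir = os.path.join("bark_dataset", category)
    for i, image in enumerate(sorted(os.listdir(category_dir)), start=1):
        os.rename(os.path.join(category_dir, image),
                  os.path.join(category_dir, category + str(i) + ".JPG"))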

## Train-Test Splitting

For the split, a train folder and a test folder are created in the current working directory. Inside the train folder, 50 folders are created, one per image category, with the same names as in the downloaded data. A random 80% of each category's images is moved into the matching folder. For example, the training data for the Acacia category is kept in "train/Acacia/".

The same procedure is followed for the test folder, which receives the remaining images of each category. For example, the testing data for the Acacia category is kept in "test/Acacia/".

In [None]:
os.makedirs("train")
for category in sorted(all_categories):
    os.makedirs("train/" + category )
    all_images = os.listdir("bark_dataset/" + category + "/")
    for image in random.sample(all_images, int(0.8 * len(all_images))):
        shutil.move("bark_dataset/" + category + "/" + image, "train/" + category + "/")

        
os.makedirs("test")
for category in sorted(all_categories):
    os.makedirs('test/' + category)
    all_images = os.listdir("bark_dataset/" + category + "/")
    for image in all_images:
        shutil.move("bark_dataset/" + category + "/" + image, "test/" + category + "/")
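Note that `random.sample` is not seeded in the cell above, so each run produces a different split. The sketch below shows one way to make the 80/20 split reproducible. It works on a list of file names only and moves nothing; `split_filenames`, its parameters, and the seed value are assumptions made for illustration.

```python
import random

def split_filenames(filenames, train_fraction=0.8, seed=42):
    """Split file names into (train, test) lists, reproducibly for a given seed."""
    rng = random.Random(seed)      # private generator; global random state is untouched
    ordered = sorted(filenames)    # fixed order so the seed fully determines the sample
    train = set(rng.sample(ordered, int(train_fraction * len(ordered))))
    test = [name for name in ordered if name not in train]
    return sorted(train), test

names = [f"Acacia{i}.JPG" for i in range(1, 11)]
train_names, test_names = split_filenames(names)
print(len(train_names), len(test_names))
```

The returned lists could then be passed to `shutil.move` in the same way as in the cell above.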

## Creating train and test data using ImageDataGenerator

The file structure was rearranged as above so that batches of training and testing data can be created for the TensorFlow model.
Here, the training and testing data are created using ImageDataGenerator.


    Note: normalized pixel values (between 0 and 1) are used for each image instead of raw values between 0 and 255.
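As a small illustration of what this normalization does, the sketch below applies the same multiplication by 1/255 to a few example pixel values with NumPy. The three intensities are arbitrary examples, not values taken from the dataset.

```python
import numpy as np

# Three example 8-bit pixel intensities (illustrative values only).
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Same operation as rescale=1./255: multiply every pixel by 1/255.
scaled = pixels * (1.0 / 255)

print(scaled.min(), scaled.max())
```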

In [None]:
# Set the seed
tf.random.set_seed(42)

# Preprocess data (scale all pixel values to between 0 and 1, also called scaling/normalization)
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# Setup the train and test directories
train_dir = "train/"
test_dir = "test/"

# Import data from directories and turn it into batches
train_data = train_datagen.flow_from_directory(train_dir,
                                               batch_size=32, # number of images to process at a time 
                                               target_size=(224, 224), # convert all images to be 224 x 224
                                               class_mode="categorical", 
                                               seed=42)

test_data = test_datagen.flow_from_directory(test_dir,
                                               batch_size=32,
                                               target_size=(224, 224),
                                               class_mode="categorical",
                                               seed=42)


## Creating a VGG-style base model and checking its accuracy

The number of images differs between categories, but the data is not highly
imbalanced, so accuracy is used as the metric for the base model.
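One way to see why class balance matters for accuracy is to compute the accuracy of a trivial model that always predicts the largest class. The sketch below does this for two sets of counts. Both sets are hypothetical numbers chosen for illustration, not the real BarkVN-50 counts, and `majority_baseline_accuracy` is a hypothetical helper.

```python
def majority_baseline_accuracy(counts):
    """Accuracy of a model that always predicts the largest class."""
    return max(counts.values()) / sum(counts.values())

# Hypothetical counts for illustration, not the real BarkVN-50 numbers.
balanced = {f"class_{i}": 100 for i in range(50)}
skewed = {"class_0": 900, "class_1": 100}

print(majority_baseline_accuracy(balanced))  # 100 / 5000
print(majority_baseline_accuracy(skewed))    # 900 / 1000
```

When the classes are roughly balanced, this baseline is low, so a high accuracy reflects real learning. With heavily skewed classes the baseline itself is high and accuracy becomes misleading.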

    Note: the base model is trained for only 5 epochs, just to check how it performs.

In [None]:
#defining the model
model_1 = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(filters=10, 
                         kernel_size=3, 
                         activation="relu", 
                         input_shape=(224, 224, 3)), 
  tf.keras.layers.Conv2D(10, 3, activation="relu"),
  tf.keras.layers.MaxPool2D(pool_size=2, 
                            padding="valid"),
  tf.keras.layers.Conv2D(10, 3, activation="relu"),
  tf.keras.layers.Conv2D(10, 3, activation="relu"), 
  tf.keras.layers.MaxPool2D(2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(50, activation="softmax") 
])

# Compile the model
model_1.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit the model
history_1 = model_1.fit(train_data,
                        epochs=5,
                        steps_per_epoch=len(train_data),
                        validation_data=test_data,
                        validation_steps=len(test_data))
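The layer sizes of `model_1` can be checked by hand. The sketch below repeats the arithmetic for the default Keras settings used above ('valid' padding, stride 1 for convolutions, pool stride equal to the pool size). The helper names are hypothetical, and the result can be compared against `model_1.summary()`.

```python
def conv_out(size, kernel=3):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of a max-pool with 'valid' padding and stride equal to pool size."""
    return size // pool

def conv_params(in_channels, filters=10, kernel=3):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size = 224
size = conv_out(conv_out(size))  # first two Conv2D layers
size = pool_out(size)            # first MaxPool2D
size = conv_out(conv_out(size))  # last two Conv2D layers
size = pool_out(size)            # second MaxPool2D

flattened = size * size * 10          # 10 feature maps after the last conv block
dense_params = (flattened + 1) * 50   # weights plus biases for the 50 output classes
total_params = conv_params(3) + 3 * conv_params(10) + dense_params
print(size, flattened, total_params)
```

Almost all of the parameters sit in the final Dense layer, because the flattened feature map is large.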

## Data Visualization  

The History object returned by the fit function is used to visualize the model's performance.

In [None]:
def plot_loss_curves(history):
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(history.history['loss']))
    
    plt.figure()
    plt.plot(epochs, loss, label = 'training_loss')
    plt.plot(epochs, val_loss, label = 'validation_loss')
    plt.title("loss_curves")
    plt.legend()
    plt.show()
    
    accuracy = history.history['accuracy']
    val_accuracy = history.history['val_accuracy']
    
    plt.figure()
    plt.plot(epochs, accuracy, label = 'training_accuracy')
    plt.plot(epochs, val_accuracy, label = 'validation_accuracy')
    plt.title("accuracy_curves")
    plt.legend()
    plt.show()

### Visualize the performance of the first base model

In [None]:
plot_loss_curves(history_1)

## Reducing Overfitting in the Base Model

To prevent overfitting, data augmentation and dropout are used together in the model.


In [None]:
train_datagen_aug = ImageDataGenerator(rescale = 1./255, 
                                   horizontal_flip = True, 
                                   vertical_flip = True,
                                   height_shift_range = 0.2, 
                                   width_shift_range = 0.2)


train_data_aug = train_datagen_aug.flow_from_directory(train_dir, 
                                                  batch_size = 32, 
                                                  target_size = (224, 224), 
                                                  class_mode = 'categorical', 
                                                  seed = 42)
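To make the flip options concrete, the sketch below mirrors a tiny array by hand with NumPy. It only illustrates what a horizontal and a vertical flip do to the pixel grid. ImageDataGenerator applies such transforms randomly per image, which this sketch does not reproduce.

```python
import numpy as np

# A tiny 2x2 single-channel "image" with shape (height, width, channels).
image = np.arange(4).reshape(2, 2, 1)

horizontal = image[:, ::-1, :]  # reverse the width axis (left-right mirror)
vertical = image[::-1, :, :]    # reverse the height axis (top-bottom mirror)

print(image[..., 0])
print(horizontal[..., 0])
print(vertical[..., 0])
```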

## New Model 

This model is the same as the base model, but it has a dropout layer before each max-pooling layer and is trained on augmented data for more epochs (15 instead of 5).

In [None]:
model_2 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 10, kernel_size = 3, input_shape = (224, 224, 3), activation = 'relu'), 
    tf.keras.layers.Conv2D(filters = 10, kernel_size = 3, activation = 'relu'), 
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.MaxPool2D(2, padding = 'valid'),
    tf.keras.layers.Conv2D(filters = 10, kernel_size = 3, activation = 'relu'), 
    tf.keras.layers.Conv2D(filters = 10, kernel_size = 3, activation = 'relu'),
    tf.keras.layers.Dropout(0.4), 
    tf.keras.layers.MaxPool2D(2, padding = 'valid'),
    tf.keras.layers.Flatten(), 
    tf.keras.layers.Dense(50, activation = 'softmax')
])


model_2.compile(loss = 'categorical_crossentropy', 
               optimizer = tf.keras.optimizers.Adam(), 
               metrics = ["accuracy"])

history_2 = model_2.fit(train_data_aug, 
                       epochs = 15, 
                       steps_per_epoch = len(train_data_aug),
                       validation_data = test_data, 
                       validation_steps  = len(test_data))

## Visualise the performance of the new model

In [None]:
plot_loss_curves(history_2)
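The curves can also be summarized numerically by comparing the final training and validation accuracy. In the sketch below, `summarize_history` is a hypothetical helper and the example numbers are made up for illustration. They are not results from this notebook.

```python
def summarize_history(history_dict):
    """Return final train/validation accuracy and the gap between them."""
    train_acc = history_dict["accuracy"][-1]
    val_acc = history_dict["val_accuracy"][-1]
    return {"train": train_acc, "val": val_acc, "gap": train_acc - val_acc}

# Made-up values, used only to show the shape of the input and output.
example = {"accuracy": [0.05, 0.12, 0.20], "val_accuracy": [0.05, 0.11, 0.18]}
summary = summarize_history(example)
print(summary)
```

With a real run, `history_2.history` can be passed in directly. A small gap with low accuracy on both sides points to underfitting, while a large gap points to overfitting.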

## Using Transfer Learning

The model now has low variance, but it still does not perform well because it has very high bias (it underfits).

    So a ResNet model pretrained on ImageNet is used for transfer learning, since this is an image classification problem.




In [None]:
def create_model(model_url, num_classes = 50):
    # model_url selects the architecture to use for transfer learning.
    # Any architecture can be plugged in by providing a suitable URL.
    feature_extractor_layer = hub.KerasLayer(model_url, 
                                            trainable = False, 
                                            input_shape = (224, 224, 3))
    model = tf.keras.Sequential([
        feature_extractor_layer, 
        tf.keras.layers.Dropout(0.4),
        tf.keras.layers.Dense(num_classes, activation = 'softmax')
    ])
    return model

A dropout of 40 percent is applied before the prediction layer, and the model is trained for 10 epochs.


In [None]:
resnet_v2_50_url = "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/5"

resnet_model = create_model(resnet_v2_50_url)

resnet_model.compile(loss = 'categorical_crossentropy', 
                    optimizer = tf.keras.optimizers.Adam(), 
                    metrics = ['accuracy'])

history_resnet_model = resnet_model.fit(train_data, 
                                   epochs = 10, 
                                   steps_per_epoch = len(train_data), 
                                   validation_data = test_data, 
                                   validation_steps = len(test_data))

In [None]:
plot_loss_curves(history_resnet_model)

    The current resnet_model gives a training accuracy of 98% and a validation accuracy of 92%.
    So it is a good model to use for predicting on unseen data.
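When predicting on unseen data, the softmax output has to be mapped back to a category name. A minimal sketch of that step follows. `decode_prediction` is a hypothetical helper, and the class names other than Acacia, as well as the probabilities, are placeholders.

```python
import numpy as np

def decode_prediction(probabilities, class_indices):
    """Map one softmax vector to (class name, probability of that class)."""
    # class_indices maps name -> index; invert it to look up names by index.
    index_to_class = {index: name for name, index in class_indices.items()}
    best = int(np.argmax(probabilities))
    return index_to_class[best], float(probabilities[best])

# Placeholder mapping and probabilities, for illustration only.
example_indices = {"Acacia": 0, "class_b": 1, "class_c": 2}
example_probs = np.array([0.1, 0.7, 0.2])
print(decode_prediction(example_probs, example_indices))
```

With the real model, the mapping is available as `train_data.class_indices`, and each row of `resnet_model.predict(...)` is one probability vector.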