**This notebook is a cleaned version of my previous work which can be found here [Link to exploratory](https://www.kaggle.com/rohan9889/deep-learning-damage-type-to-the-car/notebook)**

Below are the imports that we will need to perform our analysis

In [2]:
import numpy as np
import pandas as pd
import os
import xml.etree.ElementTree as ET
from tensorflow import keras
from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.vgg19 import VGG19
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, GlobalAveragePooling2D
print(os.listdir("Internship/"))

['annot_chem4', 'images_chem4']


Storing the names of all the XML files, which contain metadata about the labels, in a list

In [3]:
annotations = os.listdir("Internship/annot_chem4/")

Relative path to annotations

In [4]:
path_annot = "Internship/annot_chem4/"

Relative path to images

In [5]:
path_images = "Internship/images_chem4/"

This dictionary will help us create a DataFrame from our data

In [6]:
helping_dict = {}

In [7]:
for i in range(len(annotations)):
    helping_dict[i] = 0

Extracting and storing all the information in the dictionary

In [8]:
for i in range(len(annotations)):
    temp_dict = {}
    et = ET.parse(os.path.join(path_annot,annotations[i]))
    for j in et.iter():
        if j.tag == 'filename':
            temp_dict['FileLoc'] = os.path.join(path_images,j.text)
            temp_dict['FileName'] = j.text
            continue
        if j.tag == 'name':
            temp_dict['DamageType'] = j.text
    helping_dict[i] = temp_dict
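
As a minimal sketch of what the loop above extracts, the snippet below parses a single annotation from an inline string. The XML layout (a Pascal VOC-style file with a `filename` element and an `object/name` element), the sample values and the helper name `parse_annotation` are assumptions for illustration; the real files may contain more fields.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical annotation; the layout is assumed, not taken from the dataset.
SAMPLE_XML = """
<annotation>
    <filename>car_001.jpg</filename>
    <object>
        <name>Dent</name>
    </object>
</annotation>
""".strip()

def parse_annotation(xml_text, image_dir):
    """Return the file location, file name and damage type of one annotation."""
    root = ET.fromstring(xml_text)
    record = {}
    for element in root.iter():
        if element.tag == 'filename':
            record['FileLoc'] = os.path.join(image_dir, element.text)
            record['FileName'] = element.text
        elif element.tag == 'name':
            # If several <name> tags exist, the last one wins, as in the loop above.
            record['DamageType'] = element.text
    return record

record = parse_annotation(SAMPLE_XML, "Internship/images_chem4/")
print(record)
```

An annotation without a `name` element produces a record with no 'DamageType' key, which later shows up as a missing value in the DataFrame.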

Converting the dictionary into a DataFrame

In [9]:
data = pd.DataFrame(helping_dict)
data = data.T
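
The two steps above can also be written as a single call. The sketch below uses a small hypothetical dictionary (the file names are made up) to show that `pd.DataFrame.from_dict(..., orient='index')` gives one row per annotation, just like building the frame and transposing it.

```python
import pandas as pd

# Hypothetical records keyed by annotation index, mimicking helping_dict.
records = {
    0: {'FileLoc': 'images/a.jpg', 'FileName': 'a.jpg', 'DamageType': 'Dent'},
    1: {'FileLoc': 'images/b.jpg', 'FileName': 'b.jpg', 'DamageType': 'Tear'},
    2: {'FileLoc': 'images/c.jpg', 'FileName': 'c.jpg'},  # no label found
}

transposed = pd.DataFrame(records).T
by_index = pd.DataFrame.from_dict(records, orient='index')

# Both frames hold one row per annotation; the unlabeled row gets NaN.
print(by_index)
```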

In [10]:
unique_damages = data['DamageType'].unique()

Using the value_counts method to see the different DamageType categories and, by comparing the total with the number of rows, to find out about missing values

In [11]:
data['DamageType'].value_counts()

Scratch_or_spot         164
Dent                    113
Dislocation              87
Large_tear_or_damage     73
Tear                     68
Shatter                  53
Large_dent                2
Name: DamageType, dtype: int64
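
Note that `value_counts` skips missing values by default, so the listing above only covers labeled images. A small sketch on a hypothetical Series (the values are made up) shows how `dropna=False` makes the unlabeled rows visible:

```python
import numpy as np
import pandas as pd

# Hypothetical labels; NaN marks an image whose annotation had no label.
labels = pd.Series(['Dent', 'Tear', np.nan, 'Dent', np.nan, 'Shatter'])

counts_default = labels.value_counts()               # NaN rows are ignored
counts_with_nan = labels.value_counts(dropna=False)  # NaN gets its own row
n_missing = labels.isna().sum()

print(counts_with_nan)
print("missing labels:", n_missing)
```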

Removing rows where we don't have a valid label for our image data

In [12]:
data.dropna(inplace=True,axis=0)
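
With `axis=0`, `dropna` removes every row that has a missing value in any column. If only the label matters, one hedged alternative is to restrict the check to that column with `subset`. The frame below, including its `Note` column, is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 lacks a label, row 2 lacks an unrelated field.
frame = pd.DataFrame({
    'FileName': ['a.jpg', 'b.jpg', 'c.jpg'],
    'DamageType': ['Dent', np.nan, 'Tear'],
    'Note': ['ok', 'ok', np.nan],
})

any_missing = frame.dropna(axis=0)                # drops rows 1 and 2
label_only = frame.dropna(subset=['DamageType'])  # drops row 1 only

print(len(any_missing), len(label_only))
```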

Saving length of unique categorical labels

In [13]:
length_of_unique_damages = len(data['DamageType'].unique())

We set both the number of rows and the number of columns to 224 because VGG19 takes images of size (224, 224) by default

In [14]:
img_rows = 224
img_cols = 224

The following function reads the images, resizes them to the required dimensions and applies VGG19's preprocess_input function to them

In [15]:
def read_and_prep_images(img_paths, img_height=img_rows, img_width=img_cols):
    imgs = [load_img(img_path, target_size=(img_height, img_width)) for img_path in img_paths]
    img_array = np.array([img_to_array(img) for img in imgs])
    output = preprocess_input(img_array)
    return output
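
For intuition, VGG19's `preprocess_input` converts images from RGB to BGR and subtracts the ImageNet per-channel mean, without scaling. The NumPy sketch below imitates that behaviour; the function name `vgg_style_preprocess` is hypothetical and the mean values are the commonly quoted ImageNet BGR means, so treat it as an illustration rather than a replacement for the Keras function.

```python
import numpy as np

# Commonly quoted ImageNet channel means in BGR order (assumed here).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg_style_preprocess(batch_rgb):
    """Imitate VGG-style preprocessing on a (N, H, W, 3) RGB batch in 0-255."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]          # reverse the channel axis: RGB -> BGR
    return batch_bgr - IMAGENET_BGR_MEAN  # zero-center each channel

# One hypothetical 2x2 image that is pure red (R=255, G=0, B=0).
red = np.zeros((1, 2, 2, 3), dtype=np.float32)
red[..., 0] = 255.0
out = vgg_style_preprocess(red)
print(out[0, 0, 0])
```

After the channel flip the red value sits in the last position, so it is the only channel that ends up positive.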

In [16]:
image_paths = data['FileLoc']

Using the function defined above to convert our image data

In [17]:
train_samples = read_and_prep_images(image_paths)

Storing labels in a different variable

In [18]:
temp_labels = data['DamageType']

In [19]:
unique_damages = temp_labels.unique()

To feed categorical labels to our model we need to convert them to a numeric format that the model understands

In [20]:
labels_dicct = {}

In [21]:
for i in range(length_of_unique_damages):
    labels_dicct[unique_damages[i]] = i

In [22]:
def prep_data(raw):
    return keras.utils.to_categorical(raw, length_of_unique_damages)


In [23]:
def change_labels(blob):
    return labels_dicct[blob]

In [24]:
temp_labels = temp_labels.apply(change_labels)

In [25]:
train_labels = prep_data(temp_labels)

In [39]:
train_labels

array([[1., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

We have now successfully converted the string labels to a format that can be fed to the model
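
The conversion above can be sketched end to end on a few hypothetical labels. `np.eye` is used here as a stand-in for `to_categorical`, and the inverse mapping (not part of the original notebook) shows how a predicted row could be turned back into a label name.

```python
import numpy as np

# Hypothetical labels in the order they appear in the data.
raw_labels = ['Dent', 'Tear', 'Dent', 'Shatter']

# Map each distinct label to an integer, in order of first appearance.
label_to_index = {}
for label in raw_labels:
    label_to_index.setdefault(label, len(label_to_index))

indices = np.array([label_to_index[label] for label in raw_labels])

# One-hot encode: row i of the identity matrix has a single 1 at column i.
one_hot = np.eye(len(label_to_index), dtype=np.float32)[indices]

# Invert the mapping to decode a one-hot (or probability) row with argmax.
index_to_label = {index: label for label, index in label_to_index.items()}
decoded = [index_to_label[i] for i in one_hot.argmax(axis=1)]

print(one_hot)
print(decoded)
```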

Building our custom model on top of a pretrained VGG19 base

In [26]:
model = Sequential()

In [27]:
model.add(VGG19(include_top=False,pooling='avg',weights='imagenet'))

In [28]:
model.add(Dense(units=length_of_unique_damages, activation='softmax'))

In [29]:
model.layers[0].trainable = False

In [30]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
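
Because the VGG19 base is frozen, only the final `Dense` layer is trained. As a rough check of how small that layer is: VGG19 with `pooling='avg'` outputs a 512-dimensional vector per image, and the value counts above list 7 damage types, so the head has one weight per feature per class plus one bias per class. The helper name below is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    """Number of trainable parameters in a fully connected layer with biases."""
    return n_inputs * n_units + n_units

# 512 pooled VGG19 features, 7 damage types from the value counts.
head_params = dense_param_count(512, 7)
print(head_params)  # 3591
```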

In [31]:
"""model.fit(
    x= train_samples ,y= train_labels,
    batch_size = 3,
    epochs = 8,
    validation_split = 0.1,
)"""

Train on 504 samples, validate on 56 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<tensorflow.python.keras.callbacks.History at 0x218600ef7b8>

Our model has finished training on 504 samples, with 56 held out for validation, and is predicting at a satisfactory level of accuracy
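
Keras takes the `validation_split` fraction from the end of the arrays, before any shuffling, which is how 560 images become 504 training and 56 validation samples. If the annotation files happen to be ordered by damage type, that tail may not be representative, so shuffling beforehand is one option. The sketch below reproduces the arithmetic and a seeded shuffle on index lists only; the seed value is an arbitrary assumption.

```python
import random

n_samples = 560          # 504 + 56, as reported in the training log
validation_split = 0.1

# Keras-style split point: the last 10% of samples form the validation set.
split_at = int(n_samples * (1.0 - validation_split))
indices = list(range(n_samples))
train_idx, val_idx = indices[:split_at], indices[split_at:]
print(len(train_idx), len(val_idx))  # 504 56

# Optional: shuffle once with a fixed seed so the tail is a random sample.
shuffled = indices[:]
random.Random(42).shuffle(shuffled)
shuffled_val_idx = shuffled[split_at:]
```

The shuffled indices would then be used to reorder both the image array and the label array in the same way before calling `fit`.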

In [32]:
"""# serialize model to JSON
model_json = model.to_json()
with open("model_VGG19.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model_VGG19.h5")
print("Saved model to disk")"""

Saved model to disk


In [None]:
"""# load json and create model
from tensorflow.keras.models import model_from_json
with open('model_VGG19.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model (file names match the ones saved above)
loaded_model.load_weights("model_VGG19.h5")
print("Loaded model from disk")"""

"""# evaluate loaded model on test data
# X and Y are placeholders for held-out images and their one-hot labels
# use the same loss as in training so that accuracy is computed per class
loaded_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
score = loaded_model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))"""