# Steel Defects Classification

In this section we build a multi-class classifier to identify the type of steel defect shown in an image. To mitigate the class imbalance, we augment the images during training and, for every augmented batch, oversample the minority classes so that they are better represented. To evaluate the performance of the model, we expect the precision, recall, and F1 scores to be similar.

In [1]:
# main libraries

# dataset
from steel_defects import steel_defects

# model
from defects_classifier import defects_classifier

# batch generator over sampler
from BalancedDataGenerator import BalancedDataGenerator

# image generator
from keras.preprocessing.image import ImageDataGenerator

# model training helpers
from keras.callbacks import EarlyStopping
from keras.callbacks import ReduceLROnPlateau


Using TensorFlow backend.


## Load Steel Defects Dataset

The images are preprocessed to 150 by 150 pixels and converted to grayscale to reduce the number of features. Notice that when each dataset is loaded into memory, the loader also shows the number of images in each class. Class number 3 accounts for approximately 75 percent of the whole dataset.

In [2]:
steel = steel_defects()

In [3]:
trn_dir = '/Users/carlostavarez/Desktop/imgs_multiClass/train'
tst_dir = '/Users/carlostavarez/Desktop/imgs_multiClass/test'
val_dir = '/Users/carlostavarez/Desktop/imgs_multiClass/valid'

In [4]:
# training data
tr_imgs, tr_lbls = steel.load_defects(trn_dir)

100%|██████████| 630/630 [00:02<00:00, 261.01it/s]
100%|██████████| 157/157 [00:00<00:00, 279.19it/s]
100%|██████████| 3850/3850 [00:14<00:00, 266.88it/s]
100%|██████████| 422/422 [00:01<00:00, 245.16it/s]
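
To make the imbalance concrete, the short sketch below recomputes each class's share from the counts in the progress bars above. The class numbering (1 to 4, in the order the loader printed them) is an assumption; the notebook does not show how `steel_defects` maps folders to labels.

```python
# Per-class image counts copied from the training progress bars above,
# in the order the loader printed them (class numbering is assumed).
train_counts = [630, 157, 3850, 422]

total = sum(train_counts)
shares = [count / total for count in train_counts]

for idx, (count, share) in enumerate(zip(train_counts, shares), start=1):
    print('class {}: {:>4} images ({:.1%})'.format(idx, count, share))

# Ratio between the largest and the smallest class
imbalance_ratio = max(train_counts) / min(train_counts)
print('largest / smallest class: {:.1f}x'.format(imbalance_ratio))
```

The third class holds roughly 76 percent of the 5,059 training images, which is why plain accuracy alone would be a misleading metric here.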


In [5]:
# validation data
vl_imgs, vl_lbls = steel.load_defects(val_dir)

100%|██████████| 62/62 [00:00<00:00, 256.77it/s]
100%|██████████| 19/19 [00:00<00:00, 265.21it/s]
100%|██████████| 433/433 [00:01<00:00, 265.37it/s]
100%|██████████| 42/42 [00:00<00:00, 249.80it/s]


In [6]:
# testing data
ts_imgs, ts_lbls = steel.load_defects(tst_dir)

100%|██████████| 77/77 [00:00<00:00, 233.16it/s]
100%|██████████| 19/19 [00:00<00:00, 257.61it/s]
100%|██████████| 476/476 [00:01<00:00, 248.38it/s]
100%|██████████| 52/52 [00:00<00:00, 243.61it/s]


## Data Preparation

To reduce the memory footprint, we train the model using an image generator from Keras. As mentioned above, an additional piece of code balances the classes in each batch generated during training.

In [7]:
# training images: rescale to [0, 1] and augment with brightness changes and flips
tr_generator = ImageDataGenerator(rescale=1.0/255,
                                  brightness_range=(0.2, 0.7),
                                  horizontal_flip=True,
                                  vertical_flip=True)

# validation and test images are only rescaled, never augmented
ts_generator = ImageDataGenerator(rescale=1.0/255)

ts_gen = ts_generator.flow(ts_imgs, ts_lbls, batch_size=32, seed=42)
vl_gen = ts_generator.flow(vl_imgs, vl_lbls, batch_size=32, seed=42)

# wraps the training generator so that minority classes are oversampled
bgen = BalancedDataGenerator(tr_imgs, tr_lbls, tr_generator, batch_size=32)

print('Number of steps per epoch {}'.format(bgen.steps_per_epoch))

Number of steps per epoch 481
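
The source of `BalancedDataGenerator` is not shown in this notebook, so the following is only a minimal sketch of random oversampling, not that class's actual implementation. It uses toy integer labels with the same class counts as the training set; `oversample_indices` is a hypothetical helper introduced here for illustration.

```python
import numpy as np

def oversample_indices(labels, seed=42):
    """Return shuffled sample indices in which every class appears as
    often as the largest class (random oversampling with replacement)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    picked = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(labels == cls)
        # keep every original sample, then draw the shortfall with replacement
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        picked.append(np.concatenate([cls_idx, extra]))
    picked = np.concatenate(picked)
    rng.shuffle(picked)
    return picked

# Toy labels with the training-set class counts (assumed integer-encoded)
toy_labels = np.repeat([0, 1, 2, 3], [630, 157, 3850, 422])
balanced_idx = oversample_indices(toy_labels)
balanced_counts = np.bincount(toy_labels[balanced_idx])
print(balanced_counts)

# Number of full batches of 32 in the balanced set
steps = len(balanced_idx) // 32
print('steps per epoch:', steps)
```

Bringing all four classes up to 3,850 samples gives 15,400 images, or 481 full batches of 32. That agrees with the step count printed above, although it does not by itself confirm how the generator balances its batches.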


## Model Training and Evaluation

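In this part we build the classifier, inspect its architecture, and train it with the balanced generator while monitoring performance on the validation set.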

In [8]:
model_classifier = defects_classifier.make_model()

In [9]:
model_classifier.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 148, 148, 32)      320       
_________________________________________________________________
batch_normalization_1 (Batch (None, 148, 148, 32)      128       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 146, 146, 64)      18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 73, 73, 64)        0         
_________________________________________________________________
spatial_dropout2d_1 (Spatial (None, 73, 73, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 71, 71, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 35, 35, 128)      
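
The shapes and parameter counts in the summary can be checked by hand. The sketch below assumes 3x3 kernels, stride 1, no padding, 2x2 pooling, and a 150x150x1 grayscale input; these settings are inferred from the printed shapes, not read from `defects_classifier`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable weights of a Conv2D layer, plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def valid_conv_size(size, kernel=3):
    """Output width/height of a stride-1 convolution without padding."""
    return size - kernel + 1

def pooled_size(size, pool=2):
    """Output width/height of non-overlapping max pooling."""
    return size // pool

# Parameter counts (assumed 3x3 kernels)
print(conv2d_params(3, 3, 1, 32))     # conv2d_1 -> 320
print(4 * 32)                         # batch_normalization_1 -> 128
print(conv2d_params(3, 3, 32, 64))    # conv2d_2 -> 18496
print(conv2d_params(3, 3, 64, 128))   # conv2d_3 -> 73856

# Spatial sizes, starting from the 150x150 input
size = 150
size = valid_conv_size(size)   # conv2d_1 -> 148
size = valid_conv_size(size)   # conv2d_2 -> 146
size = pooled_size(size)       # max_pooling2d_1 -> 73
size = valid_conv_size(size)   # conv2d_3 -> 71
size = pooled_size(size)       # max_pooling2d_2 -> 35
print(size)
```

The batch normalization layer has four values per channel (scale, offset, moving mean, moving variance), which accounts for its 128 parameters.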

In [10]:
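call_list = [
    # multiply the learning rate by 0.04 when val_loss has not improved for 2 epochs
    ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=0, factor=0.04, min_lr=0.001),
    # stop training when val_accuracy has not improved for 2 epochs
    EarlyStopping(monitor='val_accuracy', patience=2)
]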
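
To see what these settings do to the learning rate, here is a small sketch of the update `ReduceLROnPlateau` applies on a plateau: the current rate is multiplied by `factor` and clipped at `min_lr`. The starting rates are hypothetical, because the optimizer is configured inside `defects_classifier.make_model`, which is not shown here.

```python
def reduced_lr(current_lr, factor=0.04, min_lr=0.001):
    """Learning rate after one plateau-triggered reduction."""
    return max(current_lr * factor, min_lr)

# Hypothetical starting rates; the notebook does not show the real one
for start_lr in (0.01, 0.001):
    print(start_lr, '->', reduced_lr(start_lr))
```

With `min_lr=0.001`, a model that already starts at 0.001 (the Keras default for Adam) would see no change, so it may be worth checking the optimizer's initial rate against this floor.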

In [None]:
# take the step count from the generator instead of hard-coding it
history = model_classifier.fit_generator(bgen,
                                         steps_per_epoch=bgen.steps_per_epoch,
                                         validation_data=vl_gen,
                                         validation_steps=len(vl_imgs)//32,
                                         epochs=50,
                                         callbacks=call_list)

Epoch 1/50
 54/481 [==>...........................] - ETA: 9:15 - loss: 1.3071 - accuracy: 0.3328

In [None]:
print('Now it works')
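
The introduction expects precision, recall, and F1 to be similar, but the notebook stops before computing them. The sketch below shows one way to derive the per-class scores from a confusion matrix using NumPy only. The labels are made up for illustration and are not results from this model; `per_class_scores` is a hypothetical helper.

```python
import numpy as np

def per_class_scores(y_true, y_pred, n_classes):
    """Precision, recall and F1 per class from integer class labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)      # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)              # how often each class was predicted
    actual = cm.sum(axis=1)                 # how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Hypothetical labels for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 3])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 3])

precision, recall, f1 = per_class_scores(y_true, y_pred, n_classes=4)
for cls in range(4):
    print('class {}: precision={:.2f} recall={:.2f} f1={:.2f}'.format(
        cls, precision[cls], recall[cls], f1[cls]))
```

With the trained model, `y_pred` would typically come from taking `np.argmax` over the predicted class probabilities on the rescaled test images. Whether `ts_lbls` is one-hot or integer encoded is not shown here, so the conversion to integer labels is left open.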