# Evaluation of binary classification

The model is trained and ready to be evaluated. For this purpose, I have 6588 images that were not used in training. I deliberately kept the test set unbalanced so that the metrics reflect the real, production distribution of the data. During evaluation, I will try to tune the decision threshold so that the most dangerous error, a defective seam classified as good, is kept to a minimum. I will start by importing the data and the necessary libraries.

## STEP 1. Import data and libraries

In [1]:
import numpy as np 
import pandas as pd 
import zipfile
from matplotlib import pyplot as plt
import shutil 
from tqdm import tqdm
import torch
import torchvision
import time
import copy
from torchvision import transforms, models
import os
from datetime import datetime

In [2]:
data_root = r"data\data_root\al5083\test"
print(os.listdir(data_root))

['170904-150144-Al 2mm-part2', '170904-152301-Al 2mm-part2', '170904-154202-Al 2mm-part1', '170904-155610-Al 2mm', '170905-112213-Al 2mm', '170906-104925-Al 2mm', '170906-113317-Al 2mm-part1', '170906-143512-Al 2mm-part1', '170906-143512-Al 2mm-part2', '170906-151724-Al 2mm-part1', '170906-153326-Al 2mm-part3', '170906-155007-Al 2mm-part1', '170906-155007-Al 2mm-part2', '170913-154448-Al 2mm', 'test.json']


## STEP 2. Data labeling

In [3]:
js = os.path.join(data_root, r"test.json")
# test.json maps each image path to a class code; 0 is a good weld, any other code is a defect
labels = pd.read_json(js, typ='series')
labels = labels.to_frame().reset_index()
labels = labels.rename(columns={'index': 'path', 0: 'class'})
labels['class'] = labels['class'].astype(object)

def create_binary_label(row):
    # Collapse the multi-class code into a binary label
    if row['class'] == 0:
        return 'good_weld'
    else:
        return 'bad_weld'

labels['binary'] = labels.apply(create_binary_label, axis=1)
labels = labels.sort_values(by='class').reset_index(drop=True)
classes = labels['binary'].unique()  # good_weld first, because rows are sorted by class
test_labels = labels

In [4]:
test_labels

|   | path | class | binary |
|---|---|---|---|
| 0 | 170913-154448-Al 2mm/frame_00162.png | 0 | good_weld |
| 1 | 170906-113317-Al 2mm-part1/frame_00336.png | 0 | good_weld |
| 2 | 170906-113317-Al 2mm-part1/frame_00544.png | 0 | good_weld |
| 3 | 170906-113317-Al 2mm-part1/frame_00391.png | 0 | good_weld |
| 4 | 170906-113317-Al 2mm-part1/frame_00454.png | 0 | good_weld |
| ... | ... | ... | ... |
| 6583 | 170904-152301-Al 2mm-part2/frame_00626.png | 5 | bad_weld |
| 6584 | 170904-152301-Al 2mm-part2/frame_00425.png | 5 | bad_weld |
| 6585 | 170904-152301-Al 2mm-part2/frame_00482.png | 5 | bad_weld |
| 6586 | 170904-152301-Al 2mm-part2/frame_00606.png | 5 | bad_weld |
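The row-wise `apply` in `create_binary_label` can also be written as one vectorized call, and the same frame can report how unbalanced the two classes are. A minimal sketch, assuming a `class` column where 0 means a good weld and any other code is a defect (the same rule as above); `add_binary_labels` and `class_shares` are hypothetical helpers, and the toy rows are made up rather than read from `test.json`.

```python
import numpy as np
import pandas as pd


def add_binary_labels(labels: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `labels` with a 'binary' column: class 0 -> good_weld, anything else -> bad_weld."""
    out = labels.copy()
    out["binary"] = np.where(out["class"] == 0, "good_weld", "bad_weld")
    return out


def class_shares(labels: pd.DataFrame) -> pd.Series:
    """Fraction of rows in each binary class; the values sum to 1.0."""
    return labels["binary"].value_counts(normalize=True)


# Toy rows for illustration only (not taken from test.json)
toy = pd.DataFrame({"path": ["a.png", "b.png", "c.png", "d.png"],
                    "class": [0, 5, 1, 0]})
toy = add_binary_labels(toy)
print(toy["binary"].tolist())          # ['good_weld', 'bad_weld', 'bad_weld', 'good_weld']
print(class_shares(toy)["good_weld"])  # 0.5
```

On the real test set, the copy step in STEP 3 reports 2189 good and 4399 defective images, so roughly one third of the test images are good welds and two thirds are defects.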


## STEP 3. Load images

In [5]:
for class_name in classes: 
    os.makedirs(os.path.join('data/binary_test', class_name), exist_ok=True)

In [8]:
test_dir = 'data/binary_test'
# Copy every test image into data/binary_test/<class_name>/ under a new name
# (<class_name>_<i>.png) so that torchvision's ImageFolder can read the folder
for class_name in classes:
    class_paths = labels.loc[labels['binary'] == class_name, 'path'].tolist()
    for i, file_name in enumerate(tqdm(class_paths)):
        pic_name = f'{class_name}_{i}.png'
        shutil.copy(os.path.join(data_root, file_name), os.path.join(test_dir, class_name, pic_name))
print('Test images loaded!')

100%|██████████| 2189/2189 [00:15<00:00, 145.89it/s]
100%|██████████| 4399/4399 [00:31<00:00, 140.97it/s]

Test images loaded!

In [13]:
data = []
# Walk the folders in sorted order: ImageFolder also sorts class folders and file
# names, and the predictions are later matched to these rows purely by position
for root, dirs, files in sorted(os.walk(test_dir)):
    for file in sorted(files):
        data.append((file, os.path.basename(root)))

dfw = pd.DataFrame(data, columns=['path', 'test_class'])
dfw

|   | path | test_class |
|---|---|---|
| 0 | bad_weld_0.png | bad_weld |
| 1 | bad_weld_1.png | bad_weld |
| 2 | bad_weld_10.png | bad_weld |
| 3 | bad_weld_100.png | bad_weld |
| 4 | bad_weld_1000.png | bad_weld |
| ... | ... | ... |
| 6583 | good_weld_995.png | good_weld |
| 6584 | good_weld_996.png | good_weld |
| 6585 | good_weld_997.png | good_weld |
| 6586 | good_weld_998.png | good_weld |
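The metrics in STEP 7 pair each prediction with a row of `dfw` purely by position, so this file list has to come out in the same order in which `ImageFolder` loads the samples. As far as I know from torchvision's source, `ImageFolder` sorts the class folders and then sorts the file names inside each folder with plain string order, which is why `bad_weld_10.png` comes before `bad_weld_2.png` in the table above. The sketch below mirrors that ordering with the standard library only; `list_in_imagefolder_order` is a hypothetical helper (not a torchvision function) and it keeps only `.png` files. Once `test_dataset` exists, its `samples` attribute, a list of `(path, class_index)` pairs, gives the loader's order directly.

```python
import os
import tempfile


def list_in_imagefolder_order(root: str, ext: str = ".png"):
    """Return (file_name, class_name) pairs in ImageFolder-style order:
    class folders sorted by name, then file names sorted within each folder."""
    pairs = []
    class_names = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    for class_name in class_names:
        class_dir = os.path.join(root, class_name)
        for _, _, fnames in sorted(os.walk(class_dir)):
            for fname in sorted(fnames):
                if fname.lower().endswith(ext):
                    pairs.append((fname, class_name))
    return pairs


# A tiny throwaway folder tree, with files created in a shuffled order on purpose
with tempfile.TemporaryDirectory() as root:
    layout = {"good_weld": ["good_weld_1.png", "good_weld_0.png"],
              "bad_weld": ["bad_weld_2.png", "bad_weld_10.png", "bad_weld_1.png"]}
    for class_name, names in layout.items():
        os.makedirs(os.path.join(root, class_name))
        for name in names:
            open(os.path.join(root, class_name, name), "wb").close()
    order = list_in_imagefolder_order(root)

print(order)
```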


## STEP 4. Transform

I apply the same transformations for testing as I did for training.

In [9]:
test_transforms = transforms.Compose([ 
    transforms.Resize((224, 224)), 
    transforms.ToTensor(), 
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
test_dataset = torchvision.datasets.ImageFolder(test_dir, test_transforms)
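The two tensor transforms above are simple per-channel arithmetic: `ToTensor()` scales 8-bit pixel values to [0, 1], and `Normalize(mean, std)` then computes `(x - mean) / std` for each channel. The mean and std used here are the usual ImageNet channel statistics. A NumPy sketch of the same arithmetic for a single RGB pixel; `normalize_pixel` is a hypothetical helper and it ignores the resize step.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def normalize_pixel(rgb_uint8):
    """Mimic ToTensor + Normalize for one pixel given as 0-255 RGB values."""
    x = np.asarray(rgb_uint8, dtype=np.float64) / 255.0  # ToTensor: scale to [0, 1]
    return (x - IMAGENET_MEAN) / IMAGENET_STD            # Normalize: per-channel (x - mean) / std


print(normalize_pixel([124, 116, 104]))  # close to zero in every channel
print(normalize_pixel([255, 255, 255]))  # a white pixel, roughly 2.2 to 2.6 per channel
```

Because the test images go through exactly the same arithmetic as the training images, the network sees inputs on the same scale it was trained on.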

In [10]:
batch_size = 8

In [11]:
test_dataloader = torch.utils.data.DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=0)  # New dataloader over the test images; shuffle=False keeps the file order

In [12]:
len(test_dataset)

6588
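With `batch_size = 8` and `drop_last` left at its default of `False`, the loader yields `ceil(N / 8)` batches and the last one may be partial. Comparing this number with the total on the tqdm bar during testing is a cheap way to confirm which files were actually evaluated. A sketch with a hypothetical helper `expected_batches`:

```python
import math


def expected_batches(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields over a map-style dataset of n_samples."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


print(expected_batches(6588, 8))  # 824
print(expected_batches(8777, 8))  # 1098
```

For the 6588 images counted here this gives 824 batches; a progress bar with a different total (for example 1098, which corresponds to 8777 to 8784 images) means a different set of files was loaded.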

## STEP 5. Load the model

In [22]:
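device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# The .pth file stores the whole pickled model, not just a state_dict.
# Since PyTorch 2.6, torch.load defaults to weights_only=True, which fails on
# full models, so pass weights_only=False (only for files you trust).
model = torch.load(r"vgg13_adagrad_binary.pth", map_location=device, weights_only=False)

In [23]:
model = model.to(device)
model.eval()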

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (15): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    ...

## STEP 6. TESTING

In [30]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
start = datetime.now()
predictions = []
model.eval()
with torch.no_grad():
    for images, _ in tqdm(test_dataloader):
        images = images.to(device)
        outputs = model(images)
        # Outputs >= 0.001 become 1 (good_weld), the rest 0 (bad_weld);
        # ImageFolder numbers the classes alphabetically: bad_weld = 0, good_weld = 1
        predicted = (outputs >= 0.001).int()
        predictions.extend(predicted.cpu().numpy())
finish = datetime.now() - start
print('Testing finished')

100%|██████████| 1098/1098 [09:09<00:00,  2.00it/s]

Testing finished
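The loop above turns each model output into a class index with a fixed cut-off: outputs at or above 0.001 become 1 (`good_weld`) and everything else 0 (`bad_weld`). Lowering the cut-off pushes more seams into `good_weld`, so the threshold directly trades missed defects against false alarms. A minimal sketch of a threshold sweep, assuming the raw outputs are collected into an array (the loop above keeps only the thresholded values) and that index 0 is `bad_weld` as in this notebook; `sweep_thresholds` is a hypothetical helper and the scores are made up.

```python
import numpy as np


def sweep_thresholds(scores, y_true, thresholds):
    """For each threshold t, predict 1 (good_weld) when score >= t, else 0 (bad_weld),
    and count missed defects (true 0, predicted 1) and false alarms (true 1, predicted 0)."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=int)
    rows = []
    for t in thresholds:
        y_pred = (scores >= t).astype(int)
        missed_defects = int(np.sum((y_true == 0) & (y_pred == 1)))
        false_alarms = int(np.sum((y_true == 1) & (y_pred == 0)))
        rows.append((t, missed_defects, false_alarms))
    return rows


# Made-up scores: higher means "more likely good_weld"
scores = [-2.0, -0.5, 0.3, 0.8, 1.5, 2.5]
y_true = [0, 0, 0, 1, 1, 1]  # 0 = bad_weld, 1 = good_weld
for t, missed, alarms in sweep_thresholds(scores, y_true, [0.001, 0.5, 1.0]):
    print(f"threshold={t}: missed defects={missed}, false alarms={alarms}")
```

Choosing the threshold then means fixing how many missed defects are acceptable and accepting the false alarms that come with it. Whether the outputs are probabilities (a final sigmoid) or raw logits changes the sensible range of thresholds, so that is worth checking against the training code first.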





## STEP 7. METRICS

In [32]:
df = dfw.copy()
# Each entry of `predictions` is a length-1 array holding 0 or 1;
# ImageFolder numbers the classes alphabetically: 0 = bad_weld, 1 = good_weld
df['predicted_classes'] = [int(p[0]) for p in predictions]
df['predicted_classes'] = df['predicted_classes'].map({0: 'bad_weld', 1: 'good_weld'})
df = df.rename(columns={'path': 'right_name'})
# The true class is encoded in the file name prefix, e.g. 'bad_weld_12.png' -> 'bad_weld';
# files named 'flipped_...' are counted as good welds
df['true_class'] = df['right_name'].apply(lambda x: x.split('_')[0]) + '_weld'
df['true_class'] = df['true_class'].replace('flipped_weld', 'good_weld')
def apply_conditions(row):
    # A defect (bad_weld) is the positive class
    if row.true_class == 'bad_weld':
        return 'TP' if row.predicted_classes == 'bad_weld' else 'FN'
    return 'FP' if row.predicted_classes == 'bad_weld' else 'TN'

df['answer'] = df.apply(apply_conditions, axis=1)
answer_group = df.groupby(df.answer).agg({'answer': 'count'})
answer_group = answer_group.rename(columns={'answer': 'Count'})
answer_group

| answer | Count |
|---|---|
| FN | 984 |
| FP | 62 |
| TN | 4316 |
| TP | 3415 |


TP: the model correctly flagged a defective seam as bad.

TN: the model correctly recognized a good seam as good.

FP: the model decided that a good seam has a defect. This is not a big deal; it only adds work for quality control.

FN: the model decided that a defective seam is OK. This is the BIGGEST FAULT.

In [34]:
print('FOR MODEL:')
counts = answer_group['Count']
TP = int(counts.get('TP', 0))
TN = int(counts.get('TN', 0))
FP = int(counts.get('FP', 0))
FN = int(counts.get('FN', 0))
accuracy = (TP + TN) / (TP + TN + FP + FN)
print('Accuracy: ', accuracy)
recall = TP / (TP + FN)  # share of defective seams that were caught
print('Recall: ', recall)
precision = TP / (TP + FP)  # share of "defect" verdicts that really were defects
print('Precision: ', precision)
f1_score = (2 * recall * precision) / (recall + precision)
print('F1-score: ', f1_score)
print('Image testing time: ', finish)

For the AlexNet architecture:
Accuracy:  0.8808248832175003
Recall:  0.7763127983632644
Precision:  0.9821685360943342
F1-score:  0.867191467750127
Image testing time:  0:09:09.027533
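To avoid mixing up which error is which, the counts and metrics can be computed straight from the two label columns with the positive class named explicitly. A minimal pure-Python sketch, taking `bad_weld` (a defect) as the positive class; `binary_metrics` is a hypothetical helper. It also checks that the four counts add up to the number of evaluated images: the counts in the table above sum to 8777, while `len(test_dataset)` printed 6588, so a check like this is worth running before reading the metrics. With scikit-learn installed, `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=['bad_weld', 'good_weld'])` together with `precision_score` / `recall_score` and `pos_label='bad_weld'` should give the same numbers.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="bad_weld", expected_total=None):
    """Confusion counts and metrics with `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if t == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    total = sum(counts.values())
    if expected_total is not None and total != expected_total:
        raise ValueError(f"counts sum to {total}, expected {expected_total}")
    tp, tn, fp, fn = (counts[k] for k in ("TP", "TN", "FP", "FN"))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "accuracy": (tp + tn) / total if total else 0.0,
            "recall": recall, "precision": precision, "f1": f1}


y_true = ["bad_weld", "bad_weld", "bad_weld", "good_weld", "good_weld"]
y_pred = ["bad_weld", "good_weld", "bad_weld", "bad_weld", "good_weld"]
print(binary_metrics(y_true, y_pred, expected_total=5))
```

In this notebook it would be called as `binary_metrics(df['true_class'], df['predicted_classes'], expected_total=len(test_dataset))`.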
