# COMBINED CLASSIFIER

This notebook contains the code for the final classifier, which reports whether each defect is present in each image. The classifier covers the following classes:

- FrontGridInterruption, NearSolderPad (Combined) 
- Closed
- Isolated 
- BrightSpot
- Corrosion
- Resistive

The rough operation of the classifier is as follows. 

1. Datasets are created by calling the ImageLoader and DefectViewer classes.
2. The best pipeline of transformations for each class is stored in the [model_features.py](https://github.com/atox120/w281_finalproject_solascan/blob/main/app/model_features.py) file.
3. The parameters for each individual classifier pipeline are then passed to the model.
4. A loop then loads each dataset, filters it for the relevant classes and applies the transformations. The scores for each individual classifier are recorded.
5. The scores are then concatenated and evaluated using the [VectorClassifier](https://github.com/atox120/w281_finalproject_solascan/blob/f54d9cb7b62f3449f2d393d351ab9cbaf2e0e5fb/app/models.py#L723) class, with balanced accuracy as the metric.
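Balanced accuracy is the mean of the recall obtained on each class, so a larger comparison set (here up to twice the size of the defect set) cannot dominate the score the way it would with plain accuracy. For a binary task it is the average of sensitivity and specificity. The minimal pure-Python sketch below illustrates the metric; the function name is illustrative and not part of the project code, and `sklearn.metrics.balanced_accuracy_score` computes the same quantity for binary labels.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        # Samples whose true label is c, and how many of them were predicted as c
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# 4 defect images (label 1) and 8 comparison images (label 0, i.e. 2X the defect set)
y_true = [1] * 4 + [0] * 8
y_pred = [1, 1, 1, 0] + [0] * 7 + [1]
print(balanced_accuracy(y_true, y_pred))  # (3/4 + 7/8) / 2 = 0.8125
```

Plain accuracy on the same predictions would be 10/12 (about 0.833), slightly flattering the classifier because the larger class is easier to get right.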

For each pipeline we report performance in two settings, evaluated separately: the defect class against clean (defect-free) images, and the defect class against all other classes combined.

In contrast to the EDA notebooks, the scores presented here are calculated on the TEST set.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import sys
import copy
import time
import tabulate
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, ExtraTreesClassifier
sys.path.append(os.path.join(os.path.abspath(""), ".."))

In [3]:
from app import model_features
from app.models import Classifier
from app.model_features import get_samples, get_data_handler
from app.imager import ImageLoader, DefectViewer, Show, Exposure

## Setup

In [4]:
# complimentary:
# If True: split the data as the defect category vs. all other classes
# If False: split the data as the defect category vs. the 'None' (defect-free) class
complimentary = True

# Maximum number of samples to choose for defects
# The other class is 2X this number
num_samples = 2000

# Seed for plotting
seed = 1234
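
`get_samples` is defined in `app/model_features.py` and is not shown in this notebook. The sketch below only illustrates the two sampling modes described by the comments in the cell above, under stated assumptions: each image is a `(filename, label)` record with a single label, the label `'None'` marks a defect-free image, the defect set is capped at `num_samples` and the comparison set at twice that. `sample_split` and its record format are hypothetical stand-ins, not the project's implementation.

```python
import random


def sample_split(records, target, num_samples, complimentary, seed=1234):
    """Hypothetical stand-in for get_samples.

    records: list of (filename, label) pairs; label 'None' marks a clean image.
    target: one class name, or a list of class names treated as one combined class.
    """
    targets = {target} if isinstance(target, str) else set(target)
    rng = random.Random(seed)

    defect = [r for r in records if r[1] in targets]
    if complimentary:
        # Defect category vs. every other class (clean images included)
        other = [r for r in records if r[1] not in targets]
    else:
        # Defect category vs. clean images only
        other = [r for r in records if r[1] == 'None']

    # Cap the defect set at num_samples and the comparison set at 2X that
    defect = rng.sample(defect, min(len(defect), num_samples))
    other = rng.sample(other, min(len(other), 2 * num_samples))
    return defect, other


labels = ['Closed'] * 5 + ['Isolated'] * 5 + ['None'] * 10
records = [(f'img{i}.png', lab) for i, lab in enumerate(labels)]

d, o = sample_split(records, 'Closed', num_samples=3, complimentary=True)
print(len(d), len(o))             # 3 6
d, o = sample_split(records, 'Closed', num_samples=3, complimentary=False)
print({lab for _, lab in o})      # {'None'}
```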

In [8]:
# Defect classes to model; the tuple trains a single model for two combined classes
model_defect_classes = ['Closed', 'Isolated', 'BrightSpot', 'Corrosion', 'Resistive',
                        ('FrontGridInterruption', 'NearSolderPad')]

# Estimator class and hyperparameters for each defect class
model_params = {'Closed': {'class': GradientBoostingClassifier, 'n_estimators': 300, 'max_depth': 4,
                           'learning_rate': 0.1, 'pca_dims': min(160, num_samples)},
                'Isolated': {'class': GradientBoostingClassifier, 'n_estimators': 300, 'max_depth': 4,
                             'learning_rate': 0.1, 'pca_dims': min(160, num_samples)},
                'BrightSpot': {'class': LogisticRegression, 'penalty': 'l2', 'pca_dims': None},
                'Corrosion': {'class': LogisticRegression, 'penalty': 'l2', 'pca_dims': None},
                'Resistive': {'class': ExtraTreesClassifier, 'max_features': 0.1, 'min_samples_split': 8,
                              'random_state': 32},
                ('FrontGridInterruption', 'NearSolderPad'):
                    {'class': GradientBoostingClassifier, 'n_estimators': 600, 'max_depth': 4,
                     'learning_rate': 0.05, 'pca_dims': min(250, num_samples)}}
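
Each entry in `model_params` mixes the estimator class, the estimator's constructor arguments and a `pca_dims` setting for a dimensionality-reduction step. The training loop below removes the `'class'` key and passes the remainder to `Classifier.fit`, whose handling of `pca_dims` is not shown here. The helper below is a hypothetical sketch of one way to separate such a dict, assuming `pca_dims` is consumed by the pipeline rather than by the estimator; a dummy estimator stands in for the scikit-learn classes so the example runs on its own.

```python
import copy


def split_model_params(params):
    """Hypothetical helper: separate estimator class, PCA setting and estimator kwargs."""
    kwargs = copy.deepcopy(params)            # never mutate the shared config dict
    estimator_cls = kwargs.pop('class')
    pca_dims = kwargs.pop('pca_dims', None)   # missing (e.g. 'Resistive') or None: no PCA
    return estimator_cls, pca_dims, kwargs


class DummyEstimator:
    """Stand-in for a scikit-learn estimator; it only records its arguments."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs


params = {'class': DummyEstimator, 'n_estimators': 300, 'max_depth': 4, 'pca_dims': 160}
cls, pca_dims, kwargs = split_model_params(params)
estimator = cls(**kwargs)
print(pca_dims, estimator.kwargs)   # 160 {'n_estimators': 300, 'max_depth': 4}
print('pca_dims' in params)         # True: the original dict is untouched
```

Working on a deep copy mirrors the `copy.deepcopy(model_param)` call in the loop, so re-running a cell never finds a dict that has already lost its `'class'` key.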

## Main loop for creating models and running evaluations

First we use complimentary = True (set in the Setup section), which sets up the dataset for each pipeline as a binary classification task distinguishing the defect class from all other classes combined; the comparison set is reported as the category 'Others' in the output below.

In [9]:
# Containers for the trained models, their defect classes and their data handlers
model_objects = []
model_classes = []
model_data_handlers = []

# For each defect class, create the DataSet
for cnt, defect_classes in enumerate(model_defect_classes):
    # Skip classes that are already trained, so the cell can be re-run after an interruption
    if len(model_objects) >= cnt + 1:
        continue
        
    print(f'Working on {defect_classes}')
    start = time.perf_counter()
    model_param = model_params[defect_classes]
    
    # Combined (tuple) classes are passed to get_samples as a list
    if isinstance(defect_classes, tuple):
        classes = list(defect_classes)
    else:
        classes = defect_classes
    
    # Get the data for modeling
    defect, not_defect = get_samples(classes, num_samples, complimentary=complimentary)
    
    # Get the data handler 
    data_handler = get_data_handler(defect_classes)
    
    # Apply the class-specific preprocessing pipeline to both sample sets
    defect_ = data_handler(defect, num_jobs=20)
    not_defect_ = data_handler(not_defect, num_jobs=20)
    print(not_defect_.category)
    
    # Show the pre and post processed images
    # _ = Show(num_images=2, seed=seed) << (defect, defect_) + (not_defect, not_defect_)
    
    # Get the parameter for this classifier
    this_param = copy.deepcopy(model_param)
    model_class = this_param['class']
    del this_param['class']
    
#     # Train the classifier 
#     print(defect_classes)
#     cla = Classifier(defect_, not_defect_, model_class, None)
#     score = cla.fit_cv(**this_param)
    
#     # Misclassified
#     print(score)
#     conf, out = cla.misclassified()
#     print(tabulate.tabulate([['True 0', conf[0, 0], conf[0, 1]], ['True 1', conf[1, 0], conf[1, 1]]], headers=['', 'Pred 0', 'Pred 1']))
    
    # Train the classifier 
    cla = Classifier(defect_, not_defect_, model_class, None)
    model = cla.fit(**this_param)
    
    model_objects.append(model)
    model_classes.append(defect_classes)
    model_data_handlers.append(data_handler)
    
    print(f'Completed {defect_classes} in {time.perf_counter()-start}s')

Working on Closed
model_features.closed
0 images were rejected


  out_img = (in_imgs - all_min) / (all_max - all_min)


Failed on count 367
Failed on count 1614
2 images were rejected
Others
 Closed - Preprocessed
Completed Closed in 114.68839525801013s
Working on Isolated
model_features.isolated
Others
 Isolated - Preprocessed
Completed Isolated in 30.336396701983176s
Working on BrightSpot
model_features.brightspots
0 were rejected
0 were rejected
Others
 Brightspots - Gaussian Blur - Fourier Transform


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Completed BrightSpot in 15.498162627976853s
Working on Corrosion
model_features.generic_return
Others
 processed


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Completed Corrosion in 5.011916688992642s
Working on Resistive
model_features.resistive


  magnitude = np.log10(np.abs(transformed))


Others
 ResistiveCrack - Preprocessed
Completed Resistive in 450.1302660870133s
Working on ('FrontGridInterruption', 'NearSolderPad')
model_features.grid_interruption
Others
 GridInterruption - Preprocessed
Completed ('FrontGridInterruption', 'NearSolderPad') in 245.0291437790147s
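
The two `RuntimeWarning` fragments in the output above point at expressions that are undefined for some inputs: `(in_imgs - all_min) / (all_max - all_min)` divides by zero when every pixel has the same value, and `np.log10(np.abs(transformed))` is presumably hit by Fourier coefficients that are exactly zero. The project code that emits them is not shown, so the sketch below only illustrates common guarded forms of both operations with NumPy, assuming a small floor `eps` is acceptable for the log magnitude. Both function names are illustrative.

```python
import numpy as np


def safe_minmax(imgs):
    """Scale to [0, 1] using the global min/max; a constant input maps to zeros, not NaN."""
    imgs = np.asarray(imgs, dtype=float)
    lo, hi = imgs.min(), imgs.max()
    span = hi - lo
    if span == 0:
        return np.zeros_like(imgs)
    return (imgs - lo) / span


def log_magnitude(img, eps=1e-12):
    """Log10 magnitude of the centred 2-D FFT, floored at eps to avoid log10(0)."""
    transformed = np.fft.fftshift(np.fft.fft2(img))
    return np.log10(np.maximum(np.abs(transformed), eps))


flat = np.full((4, 4), 7.0)
print(safe_minmax(flat).max())                  # 0.0, and no RuntimeWarning
print(np.isfinite(log_magnitude(flat)).all())   # True, although the non-DC bins are zero
```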


### Combine the models together and calculate score

In this step, the individual classifier scores are combined to obtain the overall multiclass classification score.
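
`VectorClassifier` is defined in `app/models.py`, and its scoring logic is not reproduced in this notebook. As a hedged illustration of the general idea only, the sketch below lines up the 0/1 predictions of each per-class model with a ground-truth vector per image, where a combined model such as `('FrontGridInterruption', 'NearSolderPad')` counts as positive when the image carries either label, and then computes balanced accuracy per model. The data structures and function names are hypothetical, and the sketch does not attempt to reproduce the 'Overall' figure reported below.

```python
def truth_vector(image_labels, model_classes):
    """1 for each model whose class tuple shares a label with the image, else 0."""
    return [int(any(c in image_labels for c in group)) for group in model_classes]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def per_model_scores(labels_per_image, predictions, model_classes):
    """predictions[i][j] is model j's 0/1 output for image i."""
    truth = [truth_vector(labels, model_classes) for labels in labels_per_image]
    scores = {}
    for j, group in enumerate(model_classes):
        y_true = [row[j] for row in truth]
        y_pred = [row[j] for row in predictions]
        scores[group] = balanced_accuracy(y_true, y_pred)
    return scores


model_classes = [('Closed',), ('FrontGridInterruption', 'NearSolderPad')]
labels_per_image = [{'Closed'}, {'NearSolderPad'}, set(), {'FrontGridInterruption'}]
predictions = [[1, 0], [0, 1], [0, 0], [1, 0]]
print(per_model_scores(labels_per_image, predictions, model_classes))
```

Keying the scores by the class tuples matches the way the per-class results are labelled in the output of this section.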

In [10]:
# Load images from the test split (do_train=False) for scoring
img = ImageLoader(defect_class=None, do_train=False)
filename_df = img.get(n=1000)
filename_df = DefectViewer(row_chop=15, col_chop=15).get(filename_df)

In [11]:
from app.models import VectorClassifier

# Instantiate vector classifier object. 
vc = VectorClassifier(model_objects=model_objects, model_classes=model_classes, 
                      model_data_handlers=model_data_handlers, defect_classes=img.defect_classes.tolist())

[('Closed',), ('Isolated',), ('BrightSpot',), ('Corrosion',), ('Resistive',), ('FrontGridInterruption', 'NearSolderPad')]


In [12]:
# Evaluate on the test set. 
results = vc.test(filename_df)

0 images were rejected
0 were rejected
('Closed',) 0.7786319232590668 0.8281329561527582
('Isolated',) 0.8280570289916085 0.8874903474903475
('BrightSpot',) 0.966900702106319 0.9942857142857142
('Corrosion',) 0.9954864593781344 1.0
('Resistive',) 0.726774322169059 0.8064610389610389
('FrontGridInterruption', 'NearSolderPad') 0.6425420046109701 0.7015270935960591
{'Overall': (0.7691355866605709, 0.8072675026123302), ('Closed',): (0.7786319232590668, 0.8281329561527582), ('Isolated',): (0.8280570289916085, 0.8874903474903475), ('BrightSpot',): (0.966900702106319, 0.9942857142857142), ('Corrosion',): (0.9954864593781344, 1.0), ('Resistive',): (0.726774322169059, 0.8064610389610389), ('FrontGridInterruption', 'NearSolderPad'): (0.6425420046109701, 0.7015270935960591)}


In [13]:
for key, value in results.items():
    print(f'{key}, {value[0]}, {value[1]}')

Overall, 0.7691355866605709, 0.8072675026123302
('Closed',), 0.7786319232590668, 0.8281329561527582
('Isolated',), 0.8280570289916085, 0.8874903474903475
('BrightSpot',), 0.966900702106319, 0.9942857142857142
('Corrosion',), 0.9954864593781344, 1.0
('Resistive',), 0.726774322169059, 0.8064610389610389
('FrontGridInterruption', 'NearSolderPad'), 0.6425420046109701, 0.7015270935960591


## Vs Clean Images

Now we set complimentary = False, so that each classifier distinguishes its defect class from the non-defective 'clean' (None) class only.

In [18]:
complimentary = False

In [19]:
# Containers for the trained models, their defect classes and their data handlers
model_objects = []
model_classes = []
model_data_handlers = []

# For each defect class, create the DataSet
for cnt, defect_classes in enumerate(model_defect_classes):
    # Skip classes that are already trained, so the cell can be re-run after an interruption
    if len(model_objects) >= cnt + 1:
        continue
        
    print(f'Working on {defect_classes}')
    start = time.perf_counter()
    model_param = model_params[defect_classes]
    
    # Combined (tuple) classes are passed to get_samples as a list
    if isinstance(defect_classes, tuple):
        classes = list(defect_classes)
    else:
        classes = defect_classes
    
    # Get the data for modeling
    defect, not_defect = get_samples(classes, num_samples, complimentary=complimentary)
    
    # Get the data handler 
    data_handler = get_data_handler(defect_classes)
    
    # Apply the class-specific preprocessing pipeline to both sample sets
    defect_ = data_handler(defect, num_jobs=20)
    not_defect_ = data_handler(not_defect, num_jobs=20)
    print(not_defect_.category)
    
    # Show the pre and post processed images
    # _ = Show(num_images=2, seed=seed) << (defect, defect_) + (not_defect, not_defect_)
    
    # Get the parameter for this classifier
    this_param = copy.deepcopy(model_param)
    model_class = this_param['class']
    del this_param['class']
    
#     # Train the classifier 
#     print(defect_classes)
#     cla = Classifier(defect_, not_defect_, model_class, None)
#     score = cla.fit_cv(**this_param)
    
#     # Misclassified
#     print(score)
#     conf, out = cla.misclassified()
#     print(tabulate.tabulate([['True 0', conf[0, 0], conf[0, 1]], ['True 1', conf[1, 0], conf[1, 1]]], headers=['', 'Pred 0', 'Pred 1']))
    
    # Train the classifier 
    cla = Classifier(defect_, not_defect_, model_class, None)
    model = cla.fit(**this_param)
    
    model_objects.append(model)
    model_classes.append(defect_classes)
    model_data_handlers.append(data_handler)
    
    print(f'Completed {defect_classes} in {time.perf_counter()-start}s')

Working on Closed
model_features.closed
0 images were rejected
0 images were rejected
None
 Closed - Preprocessed
Completed Closed in 114.54465924098622s
Working on Isolated
model_features.isolated
None
 Isolated - Preprocessed
Completed Isolated in 29.737355699006002s
Working on BrightSpot
model_features.brightspots
0 were rejected
0 were rejected
None
 Brightspots - Gaussian Blur - Fourier Transform


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Completed BrightSpot in 15.741276284999913s
Working on Corrosion
model_features.generic_return
None
 processed


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Completed Corrosion in 5.0324290769931395s
Working on Resistive
model_features.resistive
None
 ResistiveCrack - Preprocessed
Completed Resistive in 508.93710093200207s
Working on ('FrontGridInterruption', 'NearSolderPad')
model_features.grid_interruption
None
 GridInterruption - Preprocessed
Completed ('FrontGridInterruption', 'NearSolderPad') in 247.02336497200304s
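
The training cell in this section repeats the earlier one line for line; only the value of complimentary differs. One way to avoid the duplication is to wrap the loop in a function and call it once per setting. The sketch below is hypothetical: it receives `get_samples`, `get_data_handler` and the classifier wrapper as arguments (in the notebook these would be `app.model_features.get_samples`, `app.model_features.get_data_handler` and `app.models.Classifier`), so it can be exercised here with stand-ins, and the `done` argument keeps the notebook's skip-if-already-trained behaviour.

```python
import copy
import time


def train_models(defect_classes_list, params_by_class, num_samples, complimentary,
                 get_samples, get_data_handler, classifier_cls, num_jobs=20, done=None):
    """Train one binary model per defect class; returns (models, classes, handlers).

    `done` resumes a partially completed run, like the `continue` check in the notebook loop.
    """
    models, classes, handlers = done if done is not None else ([], [], [])
    for cnt, defect_classes in enumerate(defect_classes_list):
        if len(models) > cnt:
            continue  # already trained in an earlier call
        start = time.perf_counter()

        # Combined (tuple) classes are requested as a list
        requested = list(defect_classes) if isinstance(defect_classes, tuple) else defect_classes
        defect, not_defect = get_samples(requested, num_samples, complimentary=complimentary)

        # Preprocess both sets with the class-specific pipeline
        handler = get_data_handler(defect_classes)
        defect_ = handler(defect, num_jobs=num_jobs)
        not_defect_ = handler(not_defect, num_jobs=num_jobs)

        # Strip the estimator class; the remaining entries are fit() keyword arguments
        kwargs = copy.deepcopy(params_by_class[defect_classes])
        model_class = kwargs.pop('class')
        models.append(classifier_cls(defect_, not_defect_, model_class, None).fit(**kwargs))
        classes.append(defect_classes)
        handlers.append(handler)
        print(f'Completed {defect_classes} in {time.perf_counter() - start:.1f}s')
    return models, classes, handlers


# Stand-ins so the sketch runs without the project package
def fake_get_samples(classes, n, complimentary):
    return [classes] * n, ['other'] * (2 * n)


def fake_get_data_handler(defect_classes):
    return lambda data, num_jobs: data


class FakeClassifier:
    def __init__(self, pos, neg, model_class, _):
        self.sizes = (len(pos), len(neg))

    def fit(self, **kwargs):
        return (self.sizes, kwargs)


params = {'Closed': {'class': object, 'max_depth': 4},
          ('A', 'B'): {'class': object, 'max_depth': 2}}
models, classes, handlers = train_models(['Closed', ('A', 'B')], params, 3, True,
                                         fake_get_samples, fake_get_data_handler, FakeClassifier)
```

With the real dependencies, the two sections would reduce to two calls that differ only in complimentary=True and complimentary=False.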


In [20]:
# Load the test set used for scoring
img = ImageLoader(defect_class=None, do_train=False)
filename_df = img.get(n=1000)
filename_df = DefectViewer(row_chop=15, col_chop=15).get(filename_df)

# Instantiate a new vector classifier from the newly trained models
vc = VectorClassifier(model_objects=model_objects, model_classes=model_classes, 
                      model_data_handlers=model_data_handlers, defect_classes=img.defect_classes.tolist())

[('Closed',), ('Isolated',), ('BrightSpot',), ('Corrosion',), ('Resistive',), ('FrontGridInterruption', 'NearSolderPad')]


In [21]:
# Score the newly trained models on the test set before printing;
# without this call, `results` still holds the scores from the previous section.
results = vc.test(filename_df)

for key, value in results.items():
    print(f'{key}, {value[0]}, {value[1]}')

When this cell was run without the vc.test call, it printed a verbatim copy of the previous section's scores (Overall 0.7691355866605709, 0.8072675026123302), so no vs-clean scores are recorded in this notebook; running the cell as written above produces them.
