<a href="https://colab.research.google.com/github/demolakstate/AdeNet-Deep-Learning-Architecture/blob/main/insulator_classification_using_Random_Forest_v6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Image classification

**Author:** [@AdemolaOkerinde](https://twitter.com/AdemolaOkerinde)<br>
**Date created:** 2022/04/05<br>
**Last modified:** 2022/04/05<br>
**Description:** Training a traditional machine learning classifier, Random Forest, on insulator features extracted with the Udat statistical analysis tool.

## Introduction

Our goal is to classify cropped images of power-line insulators as either damaged or undamaged. We first extract features from the images with `Udat`, a statistical analysis tool that computes descriptors such as Chebyshev and HOG features, and export them in `csv` format. We then train traditional machine learning classifiers on these features using the scikit-learn package.


## Setup


#### Packages for data loading, data analysis, and data preparation

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot

from pandas import read_csv, set_option
from pandas.plotting import scatter_matrix
from sklearn.preprocessing import StandardScaler


#### Packages for model evaluation and classification models

In [2]:
from sklearn.model_selection import train_test_split, KFold,\
 cross_val_score, GridSearchCV, RepeatedKFold, LeavePOut, ShuffleSplit

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import classification_report, confusion_matrix,\
  accuracy_score

#### Packages for saving the model

In [3]:
from pickle import dump
from pickle import load
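
As a minimal sketch of how `dump` and `load` are typically used to persist a fitted classifier: the toy data, the in-memory buffer, and the variable names below are assumptions for illustration and are not part of this notebook's pipeline.

```python
import io
from pickle import dump, load

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the real feature matrix (assumption for illustration).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 5))
y_toy = (X_toy[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_toy, y_toy)

# Serialize the fitted model. An in-memory buffer is used here;
# a real workflow would open a file in binary mode instead.
buffer = io.BytesIO()
dump(model, buffer)

# Restore the model and check that it predicts exactly like the original.
buffer.seek(0)
restored = load(buffer)
same_predictions = bool((restored.predict(X_toy) == model.predict(X_toy)).all())
print(same_predictions)
```

A pickled model should only be loaded with a compatible scikit-learn version and from a trusted source, since unpickling can execute arbitrary code.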

## 2.2. Loading the Data


###### We load the data in this step.

In [4]:
# load dataset
dataset = read_csv('dataset.csv')

In [5]:
# Disable the warnings
import warnings
warnings.filterwarnings('ignore')

## 3. Exploratory Data Analysis

#### 3.1. Descriptive Statistics

In [6]:
# shape
dataset.shape

(4253, 1027)

In [7]:
# peek at data
set_option('display.width', 100)
dataset.head(5)

Unnamed: 0,Path,Class,ChebyshevFourierCoefficientHistogram Bin00,ChebyshevFourierCoefficientHistogram Bin01,ChebyshevFourierCoefficientHistogram Bin02,ChebyshevFourierCoefficientHistogram Bin03,ChebyshevFourierCoefficientHistogram Bin04,ChebyshevFourierCoefficientHistogram Bin05,ChebyshevFourierCoefficientHistogram Bin06,ChebyshevFourierCoefficientHistogram Bin07,...,ZernikeMoments_FFT Z_14_12,ZernikeMoments_FFT Z_14_14,ZernikeMoments_FFT Z_15_01,ZernikeMoments_FFT Z_15_03,ZernikeMoments_FFT Z_15_05,ZernikeMoments_FFT Z_15_07,ZernikeMoments_FFT Z_15_09,ZernikeMoments_FFT Z_15_11,ZernikeMoments_FFT Z_15_13,ZernikeMoments_FFT Z_15_15
0,/homes/okerinde/all_dataset_tiff/All_damaged_t...,All_damaged_tiff,321,155,27,9,9,0,0,2,...,0,0,0,0,0,0,0,0,0,0
1,/homes/okerinde/all_dataset_tiff/All_damaged_t...,All_damaged_tiff,262,173,58,16,5,6,3,2,...,0,0,0,0,0,0,0,0,0,0
2,/homes/okerinde/all_dataset_tiff/All_damaged_t...,All_damaged_tiff,305,158,44,7,2,6,2,1,...,0,0,0,0,0,0,0,0,0,0
3,/homes/okerinde/all_dataset_tiff/All_damaged_t...,All_damaged_tiff,411,85,11,8,6,0,2,2,...,0,0,0,0,0,0,0,0,0,0
4,/homes/okerinde/all_dataset_tiff/All_damaged_t...,All_damaged_tiff,354,141,18,6,4,0,2,0,...,0,0,0,0,0,0,0,0,0,0


In [8]:
dataset.tail(5)

Unnamed: 0,Path,Class,ChebyshevFourierCoefficientHistogram Bin00,ChebyshevFourierCoefficientHistogram Bin01,ChebyshevFourierCoefficientHistogram Bin02,ChebyshevFourierCoefficientHistogram Bin03,ChebyshevFourierCoefficientHistogram Bin04,ChebyshevFourierCoefficientHistogram Bin05,ChebyshevFourierCoefficientHistogram Bin06,ChebyshevFourierCoefficientHistogram Bin07,...,ZernikeMoments_FFT Z_14_12,ZernikeMoments_FFT Z_14_14,ZernikeMoments_FFT Z_15_01,ZernikeMoments_FFT Z_15_03,ZernikeMoments_FFT Z_15_05,ZernikeMoments_FFT Z_15_07,ZernikeMoments_FFT Z_15_09,ZernikeMoments_FFT Z_15_11,ZernikeMoments_FFT Z_15_13,ZernikeMoments_FFT Z_15_15
4248,/homes/okerinde/all_dataset_tiff/All_undamaged...,All_undamaged_tiff,270,180,48,14,9,2,0,2,...,0,0,0,0,0,0,0,0,0,0
4249,/homes/okerinde/all_dataset_tiff/All_undamaged...,All_undamaged_tiff,359,95,44,22,3,0,0,2,...,0,0,0,0,0,0,0,0,0,0
4250,/homes/okerinde/all_dataset_tiff/All_undamaged...,All_undamaged_tiff,344,101,41,12,6,6,2,7,...,0,0,0,0,0,0,0,0,0,0
4251,/homes/okerinde/all_dataset_tiff/All_undamaged...,All_undamaged_tiff,399,92,21,6,2,5,0,0,...,0,0,0,0,0,0,0,0,0,0
4252,/homes/okerinde/all_dataset_tiff/All_undamaged...,All_undamaged_tiff,450,65,8,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [9]:
# types
set_option('display.max_rows', 500)
dataset.dtypes

Path                                          object
Class                                         object
ChebyshevFourierCoefficientHistogram Bin00     int64
ChebyshevFourierCoefficientHistogram Bin01     int64
ChebyshevFourierCoefficientHistogram Bin02     int64
                                               ...  
ZernikeMoments_FFT Z_15_07                     int64
ZernikeMoments_FFT Z_15_09                     int64
ZernikeMoments_FFT Z_15_11                     int64
ZernikeMoments_FFT Z_15_13                     int64
ZernikeMoments_FFT Z_15_15                     int64
Length: 1027, dtype: object

In [10]:
# describe data
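# use the fully qualified option name; the short form 'precision' is ambiguous in recent pandas versions
set_option('display.precision', 3)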
dataset.describe()

Unnamed: 0,ChebyshevFourierCoefficientHistogram Bin00,ChebyshevFourierCoefficientHistogram Bin01,ChebyshevFourierCoefficientHistogram Bin02,ChebyshevFourierCoefficientHistogram Bin03,ChebyshevFourierCoefficientHistogram Bin04,ChebyshevFourierCoefficientHistogram Bin05,ChebyshevFourierCoefficientHistogram Bin06,ChebyshevFourierCoefficientHistogram Bin07,ChebyshevFourierCoefficientHistogram Bin08,ChebyshevFourierCoefficientHistogram Bin09,...,ZernikeMoments_FFT Z_14_12,ZernikeMoments_FFT Z_14_14,ZernikeMoments_FFT Z_15_01,ZernikeMoments_FFT Z_15_03,ZernikeMoments_FFT Z_15_05,ZernikeMoments_FFT Z_15_07,ZernikeMoments_FFT Z_15_09,ZernikeMoments_FFT Z_15_11,ZernikeMoments_FFT Z_15_13,ZernikeMoments_FFT Z_15_15
count,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,...,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0,4253.0
mean,319.543,134.24,40.959,14.987,6.45,3.203,1.922,1.187,0.768,0.511,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
std,63.356,35.458,20.45,10.375,5.614,3.403,2.329,1.746,1.349,1.205,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
min,27.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,279.0,115.0,26.0,7.0,2.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,318.0,138.0,38.0,13.0,5.0,2.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,359.0,158.0,53.0,20.0,9.0,4.0,3.0,2.0,2.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,513.0,250.0,120.0,104.0,50.0,60.0,33.0,36.0,12.0,30.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
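
The `ZernikeMoments_FFT` columns visible in the summary above have a mean and standard deviation of 0, so at least those columns are constant and cannot help a classifier separate the classes. One possible way to detect and drop such columns is sketched below on a toy frame. The column names and the `drop_constant_columns` helper are hypothetical and are not part of the notebook.

```python
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that hold a single unique value."""
    keep = df.nunique(dropna=False) > 1
    return df.loc[:, keep].copy()


# Toy frame (assumption): feat_b mimics an all-zero feature column.
toy = pd.DataFrame({
    "feat_a": [321, 262, 305, 411],
    "feat_b": [0, 0, 0, 0],
    "feat_c": [2, 0, 0, 0],
})

reduced = drop_constant_columns(toy)
print(list(reduced.columns))
```

Tree-based models such as Random Forest simply never split on constant features, so this step mainly reduces memory use and training time rather than changing predictions.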


In [11]:
# Check the number of damaged vs. undamaged insulators in the dataset
class_names = {'All_undamaged_tiff' : 'Not damaged', 'All_damaged_tiff' : 'damaged'}
print(dataset.Class.value_counts().rename(index = class_names))

Not damaged    2836
damaged        1417
Name: Class, dtype: int64


In [13]:
# The dataset is imbalanced: undamaged samples outnumber damaged samples roughly two to one.
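
To make the imbalance concrete, the short calculation below uses the class counts printed by `value_counts()` earlier in this section. It also derives the accuracy of a trivial classifier that always predicts the majority class, which is a useful reference point when reading accuracy figures later on. The variable names are chosen for illustration only.

```python
# Class counts as printed by value_counts() in this notebook.
counts = {"Not damaged": 2836, "damaged": 1417}
total = sum(counts.values())

# Ratio of majority to minority class.
imbalance_ratio = counts["Not damaged"] / counts["damaged"]

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(counts.values()) / total

print(f"total samples: {total}")
print(f"imbalance ratio: {imbalance_ratio:.2f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any model that reports an accuracy close to 0.667 on this data is therefore doing little better than always answering "undamaged".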

#### 3.2. Data Visualization
We skip this step.

## 4. Data Preparation

In [14]:
# Check for any null values
print('Null Values =',dataset.isnull().values.any())

Null Values = False


In [15]:
# Drop the non-informative attribute: Path
dataset.drop('Path', axis=1, inplace=True)

In [16]:
# Let's check if everything is as expected
dataset.head()

Unnamed: 0,Class,ChebyshevFourierCoefficientHistogram Bin00,ChebyshevFourierCoefficientHistogram Bin01,ChebyshevFourierCoefficientHistogram Bin02,ChebyshevFourierCoefficientHistogram Bin03,ChebyshevFourierCoefficientHistogram Bin04,ChebyshevFourierCoefficientHistogram Bin05,ChebyshevFourierCoefficientHistogram Bin06,ChebyshevFourierCoefficientHistogram Bin07,ChebyshevFourierCoefficientHistogram Bin08,...,ZernikeMoments_FFT Z_14_12,ZernikeMoments_FFT Z_14_14,ZernikeMoments_FFT Z_15_01,ZernikeMoments_FFT Z_15_03,ZernikeMoments_FFT Z_15_05,ZernikeMoments_FFT Z_15_07,ZernikeMoments_FFT Z_15_09,ZernikeMoments_FFT Z_15_11,ZernikeMoments_FFT Z_15_13,ZernikeMoments_FFT Z_15_15
0,All_damaged_tiff,321,155,27,9,9,0,0,2,2,...,0,0,0,0,0,0,0,0,0,0
1,All_damaged_tiff,262,173,58,16,5,6,3,2,0,...,0,0,0,0,0,0,0,0,0,0
2,All_damaged_tiff,305,158,44,7,2,6,2,1,0,...,0,0,0,0,0,0,0,0,0,0
3,All_damaged_tiff,411,85,11,8,6,0,2,2,0,...,0,0,0,0,0,0,0,0,0,0
4,All_damaged_tiff,354,141,18,6,4,0,2,0,0,...,0,0,0,0,0,0,0,0,0,0


#### 4.2. Feature Selection

In [17]:

# separate the target labels (Class) from the feature columns
Y = dataset["Class"]
X = dataset.loc[:, dataset.columns != 'Class']


## 5. Evaluate Algorithms and Models

#### 5.1. Train Validation Split and Evaluation Metrics

In [18]:
# separate the target labels (Y) from the feature columns (X)
Y = dataset["Class"]
X = dataset.loc[:, dataset.columns != 'Class']

In [19]:
# convert pandas dataframe to numpy
Y_numpy = Y.to_numpy()
X_numpy = X.to_numpy()

In [20]:
Y_numpy.size

4253

In [21]:
X_numpy.shape

(4253, 1025)
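
The cells above separate labels from features but do not hold out a validation set; the notebook relies on K-fold cross-validation instead. If a single hold-out split were wanted, a stratified split could look like the sketch below. The toy arrays stand in for `X_numpy` and `Y_numpy`, and the `test_size` value is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for X_numpy / Y_numpy with a 2:1 class ratio (assumption).
X_toy = np.arange(60).reshape(30, 2)
y_toy = np.array(["All_undamaged_tiff"] * 20 + ["All_damaged_tiff"] * 10)

X_train, X_val, y_train, y_val = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=7, stratify=y_toy
)

# With stratify, the validation set keeps the 2:1 class ratio.
n_damaged_val = int((y_val == "All_damaged_tiff").sum())
print(X_train.shape, X_val.shape, n_damaged_val)
```

Passing `stratify` keeps the class proportions of the full data in both parts, which matters when one class is about twice as frequent as the other.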

#### 5.2. Checking Models and Algorithms

In [22]:
# test options for classification
num_folds = 5
seed = 7

In [23]:
kf = KFold(n_splits=num_folds, random_state=seed, shuffle=True)

In [24]:
kf.get_n_splits(X)

5
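
Plain `KFold` ignores the labels when it assigns samples to folds, so the share of damaged insulators can differ from fold to fold. A possible alternative, sketched here on toy labels, is `StratifiedKFold`, which keeps the class proportions the same in every fold. The toy data and variable names are assumptions; the notebook itself uses `KFold`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels with a 2:1 class ratio (assumption for illustration).
y_toy = np.array([0] * 20 + [1] * 10)
X_toy = np.zeros((len(y_toy), 1))  # feature values do not affect the split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

# Count minority-class samples in each validation fold.
minority_per_fold = [
    int(y_toy[val_idx].sum()) for _, val_idx in skf.split(X_toy, y_toy)
]
print(minority_per_fold)
```

Each validation fold receives the same number of minority samples, which makes per-fold precision and recall easier to compare.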

In [25]:
# # prepare Random Forest
# model_RF = RandomForestClassifier()

In [26]:
for i, (train_index, test_index) in enumerate(kf.split(X_numpy)):
    # fit a fresh Random Forest on each fold
    model_RF = RandomForestClassifier()

    # split features and labels into this fold's training and validation parts
    X_train, X_test = X_numpy[train_index], X_numpy[test_index]
    y_train, y_test = Y_numpy[train_index], Y_numpy[test_index]

    print(f'===============FOLD {i+1}================= \n')

    print('#training samples: ', len(X_train))
    print('#validation samples: ', len(X_test), '\n')

    # train the classifier
    model_RF.fit(X_train, y_train)

    # evaluate on the held-out fold (features are used unscaled)
    predictions = model_RF.predict(X_test)
    print('Confusion Matrix: \n', confusion_matrix(y_test, predictions))
    print('Classification Report: \n', classification_report(y_test, predictions))

    


#training samples:  3402
#validation samples:  851 

Confusion Matrix: 
 [[106 199]
 [ 19 527]]
Classification Report: 
                     precision    recall  f1-score   support

  All_damaged_tiff       0.85      0.35      0.49       305
All_undamaged_tiff       0.73      0.97      0.83       546

          accuracy                           0.74       851
         macro avg       0.79      0.66      0.66       851
      weighted avg       0.77      0.74      0.71       851
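
The per-class figures in this report follow directly from the confusion matrix printed for the same fold. The sketch below recomputes them with NumPy; it assumes the scikit-learn convention that rows are true classes and columns are predicted classes, in the order `All_damaged_tiff`, `All_undamaged_tiff`.

```python
import numpy as np

# Confusion matrix of the first fold
# (rows: true class, columns: predicted class).
cm = np.array([[106, 199],
               [19, 527]])

# Precision: correct predictions divided by all predictions of that class.
precision = cm.diagonal() / cm.sum(axis=0)
# Recall: correct predictions divided by all true samples of that class.
recall = cm.diagonal() / cm.sum(axis=1)
# Accuracy: all correct predictions divided by all samples.
accuracy = cm.diagonal().sum() / cm.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(round(float(accuracy), 2))
```

The recall of about 0.35 for the damaged class means that 199 of the 305 damaged insulators in this fold were classified as undamaged, even though the overall accuracy is 0.74.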


#training samples:  3402
#validation samples:  851 

Confusion Matrix: 
 [[ 92 187]
 [ 27 545]]
Classification Report: 
                     precision    recall  f1-score   support

  All_damaged_tiff       0.77      0.33      0.46       279
All_undamaged_tiff       0.74      0.95      0.84       572

          accuracy                           0.75       851
         macro avg       0.76      0.64      0.65       851
      weighted avg       0.75      0.75      0.71       851


#training samples:  3402
#val