# Group A3 - Classification using feature vectors

In this project we extract features from the shapes in the images and use them to classify the characters of the EMNIST dataset.
 
The features to be extracted are:

TODO: choose features
- area of the shape (foreground pixels, value 1 after binarization)
- area of the background (pixels with value 0 after binarization)
- perimeter of the shape
- bounding-box area
- major axis length
- solidity
- extent
- equivalent diameter
- centroid (x and y coordinates)
- convex area
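
To make some of these definitions concrete, the sketch below computes a few of the simpler features directly with NumPy on a toy image. The helper name `basic_shape_features` and the toy image are illustrative assumptions, not part of the project code; perimeter, major axis length, solidity and convex area need more geometry and are left out here.

```python
import numpy as np


def basic_shape_features(binary_image):
    """Compute a few of the listed features with plain NumPy.

    Assumes a 2-D array in which nonzero pixels belong to the shape
    and at least one such pixel exists.
    """
    mask = np.asarray(binary_image) > 0
    rows, cols = np.nonzero(mask)

    area = int(mask.sum())                    # number of shape pixels
    background_area = int(mask.size - area)   # every remaining pixel

    # Bounding box: smallest axis-aligned rectangle containing the shape.
    height = int(rows.max() - rows.min() + 1)
    width = int(cols.max() - cols.min() + 1)
    bbox_area = height * width

    return {
        "area": area,
        "background_area": background_area,
        "bbox_area": bbox_area,
        # Fraction of the bounding box covered by the shape.
        "extent": area / bbox_area,
        # Diameter of a circle with the same area as the shape.
        "equivalent_diameter": float(np.sqrt(4.0 * area / np.pi)),
        # Mean position of the shape pixels, as (row, column).
        "centroid": (float(rows.mean()), float(cols.mean())),
    }


# Toy example: a filled 3x3 square inside a 5x5 image.
toy = np.zeros((5, 5), dtype=int)
toy[1:4, 1:4] = 1
toy_features = basic_shape_features(toy)
print(toy_features)
```

For the filled square the extent is 1.0, because the shape fills its bounding box completely.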

The classifiers that will be used are: 
- Nearest Neighbors
- SVC (C-Support Vector Classification, an SVM classifier)
- Gaussian Process
- Decision Tree
- Random Forest
- MLP (Multi-Layer Perceptron)
- AdaBoost
- Gaussian Naive Bayes
- Quadratic Discriminant Analysis
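
One way the classifiers listed above could be instantiated with scikit-learn is sketched below. The helper name `build_classifiers`, the dictionary keys and the use of default hyperparameters are assumptions made for illustration; the project may well settle on different settings.

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(random_state=0):
    """Return one estimator per classifier in the list, keyed by name.

    All hyperparameters are scikit-learn defaults (an assumption);
    `random_state` is fixed only to make runs repeatable.
    """
    return {
        "Nearest Neighbors": KNeighborsClassifier(),
        "SVC": SVC(random_state=random_state),
        "Gaussian Process": GaussianProcessClassifier(random_state=random_state),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
        "MLP": MLPClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        "Gaussian NB": GaussianNB(),
        "QDA": QuadraticDiscriminantAnalysis(),
    }


classifiers = build_classifiers()
for name in classifiers:
    print(name)
```

Keeping the estimators in a dictionary makes it easy to loop over them with the same `fit` and `score` calls later on.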

We will use the classifiers from scikit-learn and follow the scheme present in 

# Loading Data

## Imports

In [1]:
import sys
sys.path.append('../../')

from sarpy.datasets import load_emnist
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

## Loading data set and binarizing the images

In [2]:
X_train, y_train, X_test, y_test, X_valid, y_valid, mapping, nb_classes = load_emnist('balanced', validation=True)

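# Binarize each split: pixels > 0 become 1 (shape), all others 0 (background),
# then drop axes of length 1 with np.squeeze.
X_train = X_train > 0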
X_train = X_train.astype(int)
X_train = np.squeeze(X_train)

X_test = X_test > 0
X_test = X_test.astype(int)
X_test = np.squeeze(X_test)

X_valid = X_valid > 0
X_valid = X_valid.astype(int)
X_valid = np.squeeze(X_valid)

reshape training: 100%|██████████| 112800/112800 [00:03<00:00, 32992.10it/s]
reshape testing: 100%|██████████| 18800/18800 [00:00<00:00, 34694.48it/s]


Train size: 94000
Test size: 18800
Validation size: 18800
# classes: 47
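
The same three steps (threshold, cast, squeeze) are applied to every split, so they could be wrapped in one helper. This is a minimal sketch: the name `binarize` and the toy batch with a trailing axis of length 1 are assumptions for illustration, not the shapes actually returned by `load_emnist`.

```python
import numpy as np


def binarize(images):
    """Map nonzero pixels to 1 and zero pixels to 0, then drop length-1 axes."""
    return np.squeeze((np.asarray(images) > 0).astype(int))


# Toy batch of two 2x2 grayscale images with a trailing axis of length 1.
batch = np.zeros((2, 2, 2, 1), dtype=np.uint8)
batch[0, 0, 1, 0] = 17
batch[1, 1, 0, 0] = 255

binary_batch = binarize(batch)
print(binary_batch.shape)   # trailing axis removed

# Caveat: np.squeeze removes every axis of length 1, so a batch holding
# a single image loses its batch axis as well.
single_image = binarize(batch[:1])
print(single_image.shape)
```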


# Extracting Features

## Imports

In [3]:
from skimage.measure import regionprops

## Feature extraction

In [4]:
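def feature_extraction(image):
    """Return the 11 shape features of one binarized image as a flat list."""
    # The binarized image is used as a label image: all pixels equal to 1
    # form a single region, so props[0] describes the whole shape.
    # props is empty (props[0] raises IndexError) if the image has no shape pixel.
    # Newer scikit-image releases deprecate and later drop the `coordinates`
    # argument (row/column order is their default), so it may need removing there.
    props = regionprops(image, coordinates='rc', cache=True)
    region = props[0]

    features = []
    features.append(region.area)                                  # shape area
    features.append(len(image) * len(image[0]) - region.area)     # background area
    features.append(region.perimeter)
    features.append(region.bbox_area)
    features.append(region.major_axis_length)
    features.append(region.solidity)
    features.append(region.extent)
    features.append(region.equivalent_diameter)
    features.append(region.centroid[0])                           # centroid row
    features.append(region.centroid[1])                           # centroid column
    features.append(region.convex_area)
    return features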
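
`regionprops` expects a label image, so passing the binarized image directly treats every pixel with value 1 as one region, even when the strokes are not connected. The sketch below illustrates this on a toy image with two separate blobs; the toy image and variable names are assumptions for illustration only.

```python
import numpy as np
from skimage.measure import label, regionprops

# Toy image with two disconnected 2x2 blobs.
toy = np.zeros((6, 6), dtype=int)
toy[0:2, 0:2] = 1
toy[4:6, 4:6] = 1

# Binary image used as a label image: both blobs share label 1.
merged = regionprops(toy)

# Connected-component labelling first: each blob gets its own label.
separate = regionprops(label(toy))

print(len(merged), [region.area for region in merged])
print(len(separate), [region.area for region in separate])
```

For whole-character features such as the ones extracted here, the merged behaviour is likely what is wanted, since a character made of several strokes is still described as one shape.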


In [None]:
f_train = []
f_test = []
f_valid = []

# this is taking quite a while
for i in range(0, len(X_train)):
    f_train.append(feature_extraction(X_train[i]))
    
print("Created train feature vector")

for i in range(0, len(X_test)):
    f_test.append(feature_extraction(X_test[i]))
    
print("Created test feature vector")

for i in range(0, len(X_valid)):
    f_valid.append(feature_extraction(X_valid[i]))
    
print("Created valid feature vector")



# for i in range(0, len(f)):
#     print(f[i])
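
scikit-learn estimators expect a 2-D array of shape `(n_samples, n_features)`, so the Python lists built above would typically be stacked into NumPy arrays before fitting. A minimal sketch, in which the helper name `to_feature_matrix` and the toy rows are assumptions for illustration:

```python
import numpy as np


def to_feature_matrix(feature_lists):
    """Stack per-image feature lists into a float array (n_samples, n_features)."""
    matrix = np.asarray(feature_lists, dtype=float)
    if matrix.ndim != 2:
        raise ValueError("expected one equal-length feature list per image")
    return matrix


# Two made-up feature vectors with 11 entries each, standing in for f_train.
toy_lists = [
    [9, 16, 8.0, 9, 3.2, 1.0, 1.0, 3.4, 2.0, 2.0, 9],
    [4, 21, 4.0, 4, 2.1, 1.0, 1.0, 2.3, 0.5, 0.5, 4],
]
toy_matrix = to_feature_matrix(toy_lists)
print(toy_matrix.shape)
```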


# Fitting Classifiers
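
This section is still empty in the notebook. As a rough sketch of what fitting one of the planned classifiers might look like, the example below trains a nearest-neighbors model on synthetic data with 11 features and scores it on a held-out split. The synthetic data, the `StandardScaler` step and `n_neighbors=3` are all assumptions made for illustration; scaling is included because the features have very different ranges (pixel counts versus ratios between 0 and 1), which tends to matter for distance-based classifiers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix: 2 classes, 11 features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (50, 11)),
    rng.normal(5.0, 1.0, (50, 11)),
])
y = np.array([0] * 50 + [1] * 50)

# Hold out 20% of the samples for validation, keeping class balance.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardize the features, then classify with k-nearest neighbors.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_fit, y_fit)

val_accuracy = model.score(X_val, y_val)
print(f"validation accuracy: {val_accuracy:.3f}")
```

With the real data, `X_fit` and `X_val` would be replaced by the extracted training and validation feature vectors, and the same pattern could be repeated for each classifier.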

# Results