In [1]:
import sys
sys.path.append('./helper_scripts/')  # make the helper scripts in ml-ex3 importable

In [2]:
from data_loaders import CIFAR10Loader

In [3]:
X_train, X_val, X_test, y_train, y_val, y_test = CIFAR10Loader('./data/CIFAR-10').get_processed_imgs(
    target_width=32, target_height=32, normalize=False
)

CIFAR-10 dataset already downloaded, loading files from memory
loading training images and labels
loading test images and labels
done
processing training images
processing test images
done processing, creating train/val/test split


For traditional feature extraction we do not need a separate validation set, since the classifiers are tuned with cross-validation on the training data. We therefore append the validation set to the training set:

In [4]:
import numpy as np

X_train = np.concatenate((X_train, X_val))
y_train = np.concatenate((y_train, y_val))

To evaluate a given feature, we use a self-written function that tests several classifiers and reports their performance on the passed feature:

In [5]:
from traditional_ml import evaluate_feature
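
The source of `evaluate_feature` is not shown in this notebook. Judging from the parameter names in its output (for example `clf__n_neighbors`), it probably wraps each classifier in a scikit-learn `Pipeline` with a step called `clf` and tunes it with a cross-validated grid search. The following is a minimal sketch under that assumption, not the actual helper: the function name `evaluate_feature_sketch`, the scaling step, the two candidate classifiers and their grids are all hypothetical, and it runs on synthetic data instead of CIFAR-10.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_feature_sketch(X_train, y_train, X_test, y_test, cv=3):
    """Hypothetical stand-in for evaluate_feature: grid-search a few classifiers."""
    # Assumed candidates and grids; the real helper may use different ones.
    candidates = [
        (KNeighborsClassifier(),
         {"clf__n_neighbors": [5, 25], "clf__weights": ["uniform", "distance"]}),
        (RandomForestClassifier(random_state=123),
         {"clf__n_estimators": [20, 50]}),
    ]
    best = None
    for clf, grid in candidates:
        # The scaling step is an assumption; only the "clf" step name is
        # suggested by the notebook output.
        pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
        search = GridSearchCV(pipe, grid, cv=cv)
        search.fit(X_train, y_train)
        # GridSearchCV refits the best configuration on all training data,
        # so score() evaluates that refitted pipeline on the test set.
        test_acc = search.score(X_test, y_test)
        print(f"{clf}: best {search.best_params_}, "
              f"cv score {search.best_score_:.4f}, test accuracy {test_acc:.4f}")
        # Select by cross-validation score so the test set is only used for reporting.
        if best is None or search.best_score_ > best[2]:
            best = (clf, search.best_params_, search.best_score_, test_acc)
    return best


# Synthetic stand-in data so the sketch runs without CIFAR-10.
X_demo, y_demo = make_classification(n_samples=300, n_features=20, n_informative=10,
                                     n_classes=3, random_state=0)
best_clf, best_params, best_cv, best_test = evaluate_feature_sketch(
    X_demo[:200], y_demo[:200], X_demo[200:], y_demo[200:])
```

Selecting the winner by cross-validation score rather than test accuracy is a design choice of this sketch; it keeps the reported test accuracy an unbiased estimate.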

# Color Histograms

In [6]:
from PIL import Image

def get_color_hist_from_image(image):
    """Return the concatenated 256-bin R, G and B histograms (768 values) of an image array."""
    res = Image.fromarray(image).convert('RGB')
    return res.histogram()
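
For an RGB image, PIL's `histogram()` returns one flat list of 768 counts: 256 bins for red, then 256 for green, then 256 for blue. To make that feature layout explicit, here is a small NumPy-only sketch that builds the same kind of vector. The function name `color_hist_numpy` and the random demo image are illustrative assumptions, and the sketch assumes an `(H, W, 3)` array of `uint8` values.

```python
import numpy as np


def color_hist_numpy(image):
    """Per-channel 256-bin histogram of an (H, W, 3) uint8 image, ordered R, G, B."""
    image = np.asarray(image, dtype=np.uint8)
    # One bincount per channel; minlength=256 keeps intensity bins that are empty.
    hists = [np.bincount(image[..., c].ravel(), minlength=256) for c in range(3)]
    return np.concatenate(hists)


# Random 32x32 RGB image as a stand-in for a CIFAR-10 sample.
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
demo_hist = color_hist_numpy(demo_image)

print(demo_hist.shape)        # (768,)
print(demo_hist[:256].sum())  # 1024, one count per pixel in the red channel
```

Note that the histogram discards all spatial information: two images with the same pixel values in a different arrangement produce identical feature vectors.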

In [7]:
X_train = [get_color_hist_from_image(i) for i in X_train]
X_test = [get_color_hist_from_image(i) for i in X_test]

In [8]:
evaluate_feature(X_train, y_train, X_test, y_test)

Trying KNeighborsClassifier()
Fitted, best are {'clf__n_neighbors': 25, 'clf__weights': 'distance'} with cross val score of 0.29606000000000005.
Accuracy on Test Set is 0.2998
Trying MLPClassifier()
Fitted, best are {'clf__activation': 'tanh', 'clf__hidden_layer_sizes': (50,), 'clf__learning_rate': 'adaptive', 'clf__learning_rate_init': 0.01} with cross val score of 0.25926.
Accuracy on Test Set is 0.2702
Trying GaussianNB()
Fitted, best are {} with cross val score of 0.24009999999999998.
Accuracy on Test Set is 0.2481
Trying DecisionTreeClassifier()
Fitted, best are {'clf__criterion': 'gini', 'clf__splitter': 'best'} with cross val score of 0.19316.
Accuracy on Test Set is 0.1957
Trying RandomForestClassifier(random_state=123)
Fitted, best are {'clf__criterion': 'gini', 'clf__n_estimators': 100} with cross val score of 0.3256.
Accuracy on Test Set is 0.3296
Best Classifier is (RandomForestClassifier(random_state=123), {'clf__criterion': 'gini', 'clf__n_estimators': 100}) with an accuracy of 0.3296, predicting took 1.210282325744629 seconds and this whole process took 00:29:03.70


Using color histograms, the best classifier (a random forest) reaches an accuracy of 32.96% on the test set. How long does the random forest model take to train?

In [10]:
from sklearn import ensemble
import time

# Refit the best configuration found above and time the training step
cl = ensemble.RandomForestClassifier(random_state=123, criterion='gini', n_estimators=100)
start = time.perf_counter()
cl = cl.fit(X_train, y_train)
print(f"Fit Time: {time.perf_counter() - start} seconds")

Fit Time: 47.48492121696472 seconds
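
The fit above uses the default `n_jobs=None`, which trains the trees on a single core. `RandomForestClassifier` accepts `n_jobs=-1` to build trees in parallel on all available cores. The sketch below compares both settings on synthetic data; the helper `time_fit` and the demo data are assumptions for illustration, and any speedup depends on the machine, so no timings are claimed here.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def time_fit(clf, X, y):
    """Fit clf on (X, y) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    clf.fit(X, y)
    return time.perf_counter() - start


# Synthetic stand-in data; the notebook itself trains on CIFAR-10 histograms.
X_timing, y_timing = make_classification(n_samples=2000, n_features=50, n_informative=20,
                                         n_classes=5, random_state=0)

fit_times = {}
for n_jobs in (1, -1):
    clf = RandomForestClassifier(n_estimators=100, criterion="gini",
                                 random_state=123, n_jobs=n_jobs)
    fit_times[n_jobs] = time_fit(clf, X_timing, y_timing)
    print(f"n_jobs={n_jobs}: fit time {fit_times[n_jobs]:.2f} seconds")
```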
