    Instructions:
    1. Generate the vector database if not already generated.
    2. Press Run All (or restart kernel and run all cells).
    3. You will be prompted to provide input values.


    Task 3: Implement a program which,
    – given even-numbered Caltech101 images,
    ∗ creates an m-NN classifier (for a user-specified m),
    ∗ creates a decision-tree classifier,
    ∗ creates a PPR-based classifier.
    For this task, you can use a feature space of your choice.
    – for the odd-numbered images, predicts the most likely labels using the user-selected classifier.
    The system should also output per-label precision, recall, and F1-score values, as well as an overall accuracy
    value.

In [13]:
FEATURE_SPACE = input("Provide a feature space [color, hog, avgpool, layer3, fc, resnet]: ").strip()

CLASSIFIER_TYPE = input("Provide the type of classifier you want to use [knn, ppn, decision_tree]: ").strip()

In [14]:
def filter_data(feature_vectors, image_ids):
    """Return only the entries of feature_vectors whose key is in image_ids."""
    wanted = set(image_ids)  # set membership avoids a linear scan per lookup
    return {i: vec for i, vec in feature_vectors.items() if i in wanted}
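
The helper above is defined but not called in the cells shown here. As a rough usage sketch, assuming feature vectors are stored in a dict keyed by integer image id (the toy `features` dict and the `even_ids` / `odd_ids` names are made up for illustration, and `filter_data` is restated compactly so the snippet runs on its own), an even/odd split could look like this:

```python
def filter_data(feature_vectors, image_ids):
    # Compact equivalent of the notebook's helper, repeated for self-containment.
    wanted = set(image_ids)
    return {i: vec for i, vec in feature_vectors.items() if i in wanted}


# Hypothetical toy data: image id -> feature vector.
features = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6], 3: [0.7, 0.8]}

even_ids = [i for i in features if i % 2 == 0]  # training images
odd_ids = [i for i in features if i % 2 == 1]   # test images

train_subset = filter_data(features, even_ids)
test_subset = filter_data(features, odd_ids)

print(sorted(train_subset))  # [0, 2]
print(sorted(test_subset))   # [1, 3]
```

In the notebook itself the split appears to be precomputed instead: the next cell loads separate `{FEATURE_SPACE}.pt` and `odd{FEATURE_SPACE}.pt` files.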

In [15]:
from utils.database_utils import retrieve, store, exists
train_data = retrieve(f'{FEATURE_SPACE}.pt')
test_data = retrieve(f'odd{FEATURE_SPACE}.pt')

In [16]:
if CLASSIFIER_TYPE == "knn":
    from classifiers.knn import KNNClassifier
    K = int(input("Provide a value for k to use in the k-NN classifier"))
    classifier = KNNClassifier(K, train_data)
    out_statement = "k-NN Classifier for k = " + str(K)

elif CLASSIFIER_TYPE == "ppn":
    N = 5
    M = 4
    damping_factor = float(input("Provide a value for PPR Random Jump Probability"))
    from classifiers.ppn import PPNClassifier
    classifier = PPNClassifier(N, M, train_data, feature_space=FEATURE_SPACE, damping_factor=damping_factor)
    out_statement = "Personalized Page Rank Classifier with M = {}, N = {}, and Random Jump Probability = {}".format(M, N, damping_factor)

elif CLASSIFIER_TYPE == 'decision_tree':
    if exists(f'{FEATURE_SPACE}_decision_tree_classifier.pt'):
        classifier = retrieve(f'{FEATURE_SPACE}_decision_tree_classifier.pt')
        IG_MODE = classifier.mode
        MIN_SAMPLE = classifier.mimimum_samples_leaf
        MAX_DEPTH = 0 if classifier.max_depth == float('inf') else classifier.max_depth
    else:
        print("Training Decision Tree Classifier for ", FEATURE_SPACE)
        from classifiers.decision_tree import DecisionTreeClassifier
        IG_MODE = input("Provide information gain measurement mode - gini or entropy")
        MIN_SAMPLE = int(input("Provide minimum sample size for a node to split"))
        MAX_DEPTH = int(input("Provide maximum depth of the tree. For unbounded depth enter 0"))
        classifier = DecisionTreeClassifier(mode=IG_MODE, mimimum_samples_leaf=MIN_SAMPLE, max_depth=MAX_DEPTH)
        classifier.fit(train_data)
        store(classifier, f'{FEATURE_SPACE}_decision_tree_classifier.pt')
    out_statement = f"Decision Tree Classifier with Mode = {IG_MODE}, Min-Sample-Size= {MIN_SAMPLE}, and Max-Depth = {MAX_DEPTH}"


print("Running Predictions for " + out_statement)
# Reuse cached PPR predictions when available; otherwise classify the test set.
if CLASSIFIER_TYPE == "ppn" and exists(f'{FEATURE_SPACE}_ppn_predictions.pt'):
    output_labels = retrieve(f'{FEATURE_SPACE}_ppn_predictions.pt')
else:
    output_labels = classifier.classify(test_data)
    if CLASSIFIER_TYPE == "ppn":
        store(output_labels, f'{FEATURE_SPACE}_ppn_predictions.pt')

# Print the true and predicted label for every test image, for any classifier.
for opl in output_labels:
    print(f"IMG_ID: {opl['img_id']}\tTrue label: {opl['test_label']}\tPred label: {opl['predicted_label']}")

Running Predictions for Decision Tree Classifier with Mode = gini, Min-Sample-Size= 25, and Max-Depth = 0
IMG_ID: 1	True label: Faces	Pred label: Faces
IMG_ID: 3	True label: Faces	Pred label: Faces_easy
IMG_ID: 5	True label: Faces	Pred label: Faces
IMG_ID: 7	True label: Faces	Pred label: Faces
IMG_ID: 9	True label: Faces	Pred label: Faces
IMG_ID: 11	True label: Faces	Pred label: Faces
IMG_ID: 13	True label: Faces	Pred label: Faces_easy
IMG_ID: 15	True label: Faces	Pred label: Faces
IMG_ID: 17	True label: Faces	Pred label: Faces
IMG_ID: 19	True label: Faces	Pred label: Faces
IMG_ID: 21	True label: Faces	Pred label: Faces
IMG_ID: 23	True label: Faces	Pred label: Faces
IMG_ID: 25	True label: Faces	Pred label: Faces
IMG_ID: 27	True label: Faces	Pred label: Faces
IMG_ID: 29	True label: Faces	Pred label: Faces
IMG_ID: 31	True label: Faces	Pred label: Faces
IMG_ID: 33	True label: Faces	Pred label: Faces
IMG_ID: 35	True label: Faces	Pred label: Faces
IMG_ID: 37	True label: Faces	Pred label: Fa
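
The decision-tree branch asks the user to choose between "gini" and "entropy", but the notebook does not show how `DecisionTreeClassifier` uses that choice. The sketch below gives the standard textbook definitions of both impurity measures and of the impurity reduction for a binary split. The function names (`gini`, `entropy`, `split_gain`) are hypothetical and are not taken from `classifiers.decision_tree`, whose internals may differ.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions p."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Toy node with two classes, split perfectly into two pure children.
parent = ["Faces", "Faces", "Faces_easy", "Faces_easy"]
left, right = ["Faces", "Faces"], ["Faces_easy", "Faces_easy"]

print(gini(parent))                           # 0.5
print(entropy(parent))                        # 1.0
print(split_gain(parent, left, right, gini))  # 0.5
```

A tree builder would typically evaluate `split_gain` for each candidate threshold and keep the split with the largest gain, stopping when a node has fewer samples than the configured minimum or the maximum depth is reached.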

In [17]:
from utils.perf_utils import PerLabelPerf
from utils.dataset_utils import initialize_dataset
all_labels = initialize_dataset().categories
perf_calc = PerLabelPerf(all_labels)

# Avoid shadowing the built-in `tuple` with the loop variable.
for result in output_labels:
    perf_calc.process_perf(y=result["test_label"], y_pred=result["predicted_label"])

Downloading dataset if not present.
Files already downloaded and verified
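
The implementation of `PerLabelPerf` lives in `utils.perf_utils` and is not shown in this notebook. As a minimal sketch of how per-label precision, recall, F1 score and overall accuracy are commonly derived from (true label, predicted label) pairs, the hypothetical `LabelMetrics` class below keeps true-positive, false-positive and false-negative counts per label. Its name and methods are illustrative assumptions, not the actual `PerLabelPerf` API.

```python
from collections import Counter


class LabelMetrics:
    """Hypothetical stand-in for a per-label performance tracker."""

    def __init__(self):
        self.tp = Counter()  # label predicted and correct
        self.fp = Counter()  # label predicted, but the true label differs
        self.fn = Counter()  # label was true, but another label was predicted
        self.total = 0

    def update(self, y, y_pred):
        self.total += 1
        if y == y_pred:
            self.tp[y] += 1
        else:
            self.fp[y_pred] += 1
            self.fn[y] += 1

    def precision(self, label):
        denom = self.tp[label] + self.fp[label]
        return self.tp[label] / denom if denom else 0.0

    def recall(self, label):
        denom = self.tp[label] + self.fn[label]
        return self.tp[label] / denom if denom else 0.0

    def f1(self, label):
        p, r = self.precision(label), self.recall(label)
        return 2 * p * r / (p + r) if p + r else 0.0

    def accuracy(self):
        return sum(self.tp.values()) / self.total if self.total else 0.0


metrics = LabelMetrics()
pairs = [("Faces", "Faces"), ("Faces", "Faces_easy"),
         ("Faces_easy", "Faces_easy"), ("Faces", "Faces")]
for y, y_pred in pairs:
    metrics.update(y, y_pred)

print(metrics.precision("Faces"))         # 1.0
print(round(metrics.recall("Faces"), 4))  # 0.6667
print(metrics.accuracy())                 # 0.75
```

Returning 0.0 when a denominator is zero is one possible convention for labels that are never predicted or never occur; other implementations report such cases as undefined.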


In [18]:
from utils.query_input_processor import align_print

print("Per-label Precision, Recall, F1 Scores for", out_statement + ":\n")
max_label_len = max([len(label) for label in all_labels])
decimals = 4

for label in all_labels:
    precision = str(round(perf_calc.get_precision(label), decimals))
    recall = str(round(perf_calc.get_recall(label), decimals))
    f1_score = str(round(perf_calc.get_f1_score(label), decimals))
    print(f"Label: {align_print(label, max_label_len)}\tPrecision: {align_print(precision, decimals+2)}\tRecall: {align_print(recall, decimals+2)} \t F1 score:{align_print(f1_score, decimals+2)}\t")

print("\n\nOverall accuracy for " + out_statement + f": {perf_calc.get_overall_accuracy() * 100}%")

Per-label Precision, Recall, F1 Scores for Decision Tree Classifier with Mode = gini, Min-Sample-Size= 25, and Max-Depth = 0:

Label: Faces          	Precision: 0.7802	Recall: 0.9816 	 F1 score:0.8694	
Label: Faces_easy     	Precision: 0.9954	Recall: 0.9862 	 F1 score:0.9908	
Label: Leopards       	Precision: 0.9057	Recall: 0.96   	 F1 score:0.932 	
Label: Motorbikes     	Precision: 0.9872	Recall: 0.9699 	 F1 score:0.9785	
Label: accordion      	Precision: 0.9259	Recall: 0.9259 	 F1 score:0.9259	
Label: airplanes      	Precision: 0.9632	Recall: 0.9825 	 F1 score:0.9728	
Label: anchor         	Precision: 0.0526	Recall: 0.0476 	 F1 score:0.05  	
Label: ant            	Precision: 0.8333	Recall: 0.7143 	 F1 score:0.7692	
Label: barrel         	Precision: 0.9524	Recall: 0.8333 	 F1 score:0.8889	
Label: bass           	Precision: 0.4   	Recall: 0.5185 	 F1 score:0.4516	
Label: beaver         	Precision: 0.3415	Recall: 0.6087 	 F1 score:0.4375	
Label: binocular      	Precision: 0.9412	Recall: