# 1.4
We first need to establish a vocabulary of visual words. We will form this vocabulary by sampling many local features from our training set and then clustering them with k-means.
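Here is a minimal sketch of the sampling-then-clustering step. It assumes each image's descriptors are already loaded as an (n, 128) NumPy array. Random numbers stand in for real SIFT descriptors. The helper `sample_and_cluster` and its 10,000-descriptor budget are illustrative choices, not part of the starter code.

```python
import numpy as np
from sklearn.cluster import KMeans


def sample_and_cluster(per_image_descriptors, vocab_size, total=10000, seed=0):
    """Sample up to `total` descriptors spread across images, then fit k-means.

    per_image_descriptors: list of (n_i, 128) arrays, one per image (hypothetical input).
    Returns the fitted KMeans model; its cluster_centers_ are the visual words.
    """
    rng = np.random.default_rng(seed)
    n_each = int(np.ceil(total / len(per_image_descriptors)))
    sampled = []
    for desc in per_image_descriptors:
        # Take at most n_each descriptors from this image, without replacement
        take = min(n_each, desc.shape[0])
        idx = rng.choice(desc.shape[0], size=take, replace=False)
        sampled.append(desc[idx])
    descriptors = np.vstack(sampled)
    return KMeans(n_clusters=vocab_size, n_init=10, random_state=seed).fit(descriptors)


# Synthetic stand-in for SIFT descriptors: 20 "images" with 200 descriptors each
rng = np.random.default_rng(1)
images = [rng.random((200, 128)) for _ in range(20)]
kmeans = sample_and_cluster(images, vocab_size=50)
print(kmeans.cluster_centers_.shape)  # (50, 128): one 128-D centroid per visual word
```

Once the centroids are saved, `kmeans.predict` can assign any new descriptor to its region of the 128-dimensional space.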
1. You WILL want to use the KMeans class available in the scikit-learn library. The number of k-means clusters is both the size of our vocabulary and the dimensionality of our bag-of-words features. For example, you might start by clustering many SIFT descriptors into k=50 clusters. This partitions the continuous, 128-dimensional SIFT feature space into 50 regions. For any new SIFT feature we observe, we can figure out which region it belongs to as long as we save the centroids of our original clusters.
2. We simply count how many SIFT descriptors fall into each cluster in our visual word vocabulary. This is done by finding the nearest k-means centroid for every SIFT feature. Thus, if we have a vocabulary of 50 visual words and we detect 220 SIFT features in an image, our bag of SIFT representation will be a 50-dimensional histogram in which each bin counts how many SIFT descriptors were assigned to that cluster, so the bins sum to 220. The histogram should be normalized so that image size does not dramatically change the bag-of-features magnitude. To do this, fill in the get_bags_of_sifts function in util.py, following the suggestions in the starter code.

3. Plot the average histogram for every scene category. An average histogram is obtained by averaging the histograms of all training images in that category. You should end up visualizing 15 average histograms, which you should also submit as part of the writeup. Write a few sentences describing how the histograms differ across classes. Looking at these histograms, which classes do you believe will be hardest to separate (i.e., which would you expect to be most confused)?
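Here is a minimal sketch of steps 2 and 3. As before, random data stands in for SIFT descriptors. The helper names `bag_of_words` and `class_averages` are hypothetical. A single batched `kmeans.predict` call assigns every descriptor to its nearest centroid. `np.bincount` turns those assignments into counts, and dividing by the total makes each histogram sum to 1.

```python
import numpy as np
from sklearn.cluster import KMeans


def bag_of_words(descriptors, kmeans):
    """Normalized visual-word histogram for one image's (n, 128) descriptor array."""
    vocab_size = kmeans.cluster_centers_.shape[0]
    words = kmeans.predict(descriptors)  # index of the nearest centroid for every descriptor
    counts = np.bincount(words, minlength=vocab_size)
    return counts / counts.sum()  # bins sum to 1, so image size does not change the magnitude


def class_averages(feats, labels, n_classes):
    """Row c is the mean histogram over all images whose label is c."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])


# Synthetic stand-ins: a 50-word vocabulary and 30 images (3 classes) of 220 descriptors each
rng = np.random.default_rng(0)
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(rng.random((2000, 128)))
labels = np.repeat(np.arange(3), 10)
feats = np.stack([bag_of_words(rng.random((220, 128)), kmeans) for _ in labels])
avg = class_averages(feats, labels, n_classes=3)
print(feats.shape, avg.shape)  # (30, 50) (3, 50)
```

Each row of `avg` can then be drawn with `plt.bar(np.arange(50), avg[c])` to produce one plot per category.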

```python
import glob
import itertools
import os

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix


def build_vocabulary(image_paths, vocab_size):
    """ Sample SIFT descriptors, cluster them using k-means, and return the fitted k-means model.
    NOTE: We don't necessarily need to use the entire training dataset. You can use the function
    sample_images() to sample a subset of images, and pass them into this function.

    Parameters
    ----------
    image_paths: an (n_image,) array of image paths.
    vocab_size: the number of clusters desired.

    Returns
    -------
    kmeans: the fitted k-means clustering model.
    """
    n_image = len(image_paths)

    # We want to sample about ten thousand SIFT descriptors spread over different images,
    # so calculate how many descriptors to sample from each image.
    n_each = int(np.ceil(10000 / n_image))

    sampled = []
    for path in image_paths:
        # Each row holds the (x, y) keypoint location followed by a 128-D SIFT descriptor
        features = np.loadtxt(path, delimiter=',', dtype=float, ndmin=2)
        sift_descriptors = features[:, 2:]
        if sift_descriptors.shape[0] == 0:
            continue  # skip images without descriptors

        # Randomly sample up to n_each descriptors without replacement. Collecting them in a
        # list (instead of a preallocated array) avoids clustering all-zero placeholder rows
        # when an image has fewer than n_each descriptors.
        size = min(n_each, sift_descriptors.shape[0])
        random_index = np.random.choice(sift_descriptors.shape[0], size, replace=False)
        sampled.append(sift_descriptors[random_index])

    descriptors = np.vstack(sampled)

    # Cluster the sampled descriptors into vocab_size regions; the centroids are the visual words.
    # Reference: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
    kmeans = KMeans(n_clusters=vocab_size).fit(descriptors)
    return kmeans


def get_bags_of_sifts(image_paths, kmeans):
    """ Represent each image as bags of SIFT features histogram.

    Parameters
    ----------
    image_paths: an (n_image,) array of image paths.
    kmeans: k-means clustering model with vocab_size centroids.

    Returns
    -------
    image_feats: an (n_image, vocab_size) matrix, where each row is a histogram.
    """
    n_image = len(image_paths)
    vocab_size = kmeans.cluster_centers_.shape[0]

    image_feats = np.zeros((n_image, vocab_size))

    for i, path in enumerate(image_paths):
        # Each feature is the (x, y) location followed by the 128-dimensional SIFT descriptor
        features = np.loadtxt(path, delimiter=',', dtype=float, ndmin=2)
        sift_descriptors = features[:, 2:]
        if sift_descriptors.shape[0] == 0:
            continue  # leave an all-zero histogram for images without descriptors

        # Assign every descriptor to its closest cluster center in a single call;
        # calling kmeans.predict once per descriptor is much slower.
        words = kmeans.predict(sift_descriptors)
        hist = np.bincount(words, minlength=vocab_size)

        # Normalize by the number of descriptors so image size does not change the magnitude
        image_feats[i] = hist / hist.sum()

    return image_feats


def avg_histogram(imgfeats, labels, category, out_dir='avghistogramFor'):
    """ Plot and save the average bag-of-SIFT histogram for every scene category.

    imgfeats: (n_image, vocab_size) matrix of histograms (use the training features).
    labels: class index of each row of imgfeats.
    category: category names, where category[c] is the name of class index c.
    """
    os.makedirs(out_dir, exist_ok=True)
    vocab_size = imgfeats.shape[1]
    avg_y = np.zeros((len(category), vocab_size))

    for c, name in enumerate(category):
        # Average the histograms of all images labelled with class c
        avg_y[c] = imgfeats[labels == c].mean(axis=0)

        plt.figure()
        plt.bar(np.arange(vocab_size), avg_y[c], color='orange')
        plt.title("Average histogram for: " + name)
        plt.xlabel("Visual word index")
        plt.ylabel("Average normalized frequency")
        plt.savefig(os.path.join(out_dir, name + '.png'))
        plt.show()

    return avg_y


def get_category(ds_path):
    """ Return the category (folder) names in the same sorted order that load() uses for labels. """
    classes = sorted(glob.glob(os.path.join(ds_path, "*")))
    return [os.path.basename(c) for c in classes]


def plot_confusion_matrix(y_true, y_pred, title, category=None, normalize=False):
    """ Plot, save and return the confusion matrix and accuracy of a set of predictions.
    Refs: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_confusion_matrix.html#sklearn.metrics.plot_confusion_matrix
          https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html?highlight=confusion_matrix#sklearn.metrics.confusion_matrix
    """
    labels = np.arange(len(category)) if category is not None else None
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    accuracy = np.trace(cm) / cm.sum()
    if normalize:
        # Row-normalize: entry (i, j) is the fraction of class-i images predicted as class j
        cm = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)
    print(cm)

    plt.figure(figsize=(8, 8))
    plt.imshow(cm, interpolation="nearest", cmap=plt.cm.Blues)
    plt.title("{} accuracy: {:.3f}".format(title, accuracy))
    plt.colorbar()
    if category is not None:
        ticks = np.arange(len(category))
        plt.xticks(ticks, category, rotation=90)
        plt.yticks(ticks, category)

    # Write each count (or fraction) in its cell, in white on dark cells
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
    # Keep only filename-safe characters (e.g. ':' is not allowed in Windows file names)
    safe_title = "".join(ch if ch.isalnum() else "_" for ch in title)
    plt.savefig("confusion_matrix_{}.png".format(safe_title))
    plt.show()
    return cm, accuracy


def load(ds_path):
    """ Load from the training/testing dataset.

    Parameters
    ----------
    ds_path: path to the training/testing dataset.
             e.g., sift/train or sift/test

    Returns
    -------
    image_paths: an (n_sample,) array that contains the paths to the descriptor files.
    labels: class index corresponding to each image
    """
    # Grab a list of paths that matches the pathname
    files = glob.glob(os.path.join(ds_path, "*", "*.txt"))
    n_files = len(files)
    image_paths = np.asarray(files)

    # Class index = position of the image's folder in the sorted list of class folders
    # (sorted so that get_category() returns the names in the same order)
    classes = sorted(glob.glob(os.path.join(ds_path, "*")))
    labels = np.zeros(n_files, dtype=int)

    for i, path in enumerate(image_paths):
        folder, fn = os.path.split(path)
        labels[i] = classes.index(folder)

    # Randomize the order
    idx = np.random.permutation(n_files)
    image_paths = image_paths[idx]
    labels = labels[idx]

    return image_paths, labels


if __name__ == "__main__":
    paths, labels = load("sift/train")
    # build_vocabulary(paths, 10)
```


# 4.5
You should now measure how well your bag of SIFT representation works when paired with a nearest neighbor classifier. There are many design decisions and free parameters for the bag of SIFT representation, so accuracy might vary from 30% to 40%. To implement this part, follow the instructions in classifiers.py. Again, you will want to use the scikit-learn library instead of coding this from scratch yourself.

To measure the performance of the nearest neighbor classifier, report the classification accuracy and plot the confusion matrix (which should be of size 15x15). You will be required to hand in both of these measures as part of your PDF writeup. Experiment with the value of k: how does it affect the performance of the classifier?
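One way to run the k experiment is a small sweep that records the accuracy and the 15x15 confusion matrix for several values of k. The sketch below uses synthetic histograms, so the printed accuracies say nothing about the real dataset. In the assignment, `train_x`/`test_x` would be the bag-of-SIFT features and `train_y`/`test_y` the class indices. `knn_sweep` and `sample` are hypothetical helpers.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier


def knn_sweep(train_x, train_y, test_x, test_y, ks):
    """Return {k: (accuracy, confusion matrix)} for each candidate number of neighbors k."""
    results = {}
    for k in ks:
        knn = KNeighborsClassifier(n_neighbors=k).fit(train_x, train_y)
        pred = knn.predict(test_x)
        results[k] = (accuracy_score(test_y, pred), confusion_matrix(test_y, pred))
    return results


# Synthetic stand-in for bag-of-SIFT features: 15 classes, 50-bin normalized histograms
# scattered around one random prototype histogram per class
rng = np.random.default_rng(0)
n_classes, n_bins = 15, 50
protos = rng.dirichlet(np.ones(n_bins), size=n_classes)


def sample(n_per_class):
    y = np.repeat(np.arange(n_classes), n_per_class)
    x = np.abs(protos[y] + rng.normal(scale=0.02, size=(len(y), n_bins)))
    return x / x.sum(axis=1, keepdims=True), y


train_x, train_y = sample(20)
test_x, test_y = sample(20)
results = knn_sweep(train_x, train_y, test_x, test_y, ks=[1, 3, 5, 10, 20])
for k, (acc, cm) in results.items():
    print("k=%2d  accuracy=%.3f  confusion matrix %s" % (k, acc, cm.shape))
```

A small k follows individual, possibly noisy, training images closely. A large k averages over more neighbors and can blur together classes whose histograms overlap.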

```python
# Starter code prepared by Borna Ghotbi for computer vision
# based on MATLAB code by James Hays

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier


def nearest_neighbor_classify(k, train_image_feats, train_labels, test_image_feats):
    '''This function will predict the category for every test image by finding
    the training image with most similar features. Instead of 1 nearest
    neighbor, you can vote based on k nearest neighbors which will increase
    performance (although you need to pick a reasonable value for k).

    Parameters
    ----------
    k: the number of nearest neighbors that vote on each test image.
    train_image_feats: an N x d matrix, where d is the dimensionality of the feature representation.
    train_labels: an N-element array with the ground-truth class index of each training image.
    test_image_feats: an M x d matrix, where d is the dimensionality of the
                      feature representation. You can assume M = N unless you've modified the starter code.

    Returns
    -------
    predicted_labels: an M-element array with the predicted class index of each test image.

    Reference: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
    '''
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_image_feats, train_labels)
    predicted_labels = knn.predict(test_image_feats)
    return predicted_labels


def svm_classify(train_image_feats, train_labels, test_image_feats, C=150):
    '''This function will train a linear SVM for every category (i.e. one vs all)
    and then use the learned linear classifiers to predict the category of
    every test image. Every test feature will be evaluated with all 15 SVMs
    and the most confident SVM will "win". Confidence, or distance from the
    margin, is W*X + B where '*' is the inner product or dot product and W and
    B are the learned hyperplane parameters.

    Parameters
    ----------
    train_image_feats: an N x d matrix, where d is the dimensionality of the feature representation.
    train_labels: an N-element array with the ground-truth class index of each training image.
    test_image_feats: an M x d matrix, where d is the dimensionality of the
                      feature representation. You can assume M = N unless you've modified the starter code.
    C: regularization parameter of each SVM (smaller C means stronger regularization).

    Returns
    -------
    predicted_labels: an M-element array with the predicted class index of each test image.

    Reference: https://scikit-learn.org/stable/modules/svm.html
    '''
    svm_classifier = OneVsRestClassifier(svm.SVC(kernel="linear", C=C)).fit(train_image_feats, train_labels)
    predicted_labels = svm_classifier.predict(test_image_feats)
    return predicted_labels
```



# 4.6

1. There are numerous methods to learn linear classifiers, but we will find linear decision boundaries with a support vector machine. You do not have to implement the support vector machine. However, linear classifiers are inherently binary and we have a 15-way classification problem. To decide which of 15 categories a test case belongs to, you will train 15 binary, 1-vs-all SVMs. 1-vs-all means that each classifier will be trained to recognize 'forest' vs 'non-forest', 'kitchen' vs 'non-kitchen', etc. All 15 classifiers will be evaluated on each test case, and the classifier which is most confidently positive "wins". E.g., if the 'kitchen' classifier returns a score of -0.2 (where 0 is on the decision boundary), the 'forest' classifier returns a score of -0.3, and all of the other classifiers are even more negative, the test case would be classified as a kitchen even though none of the classifiers put the test case on the positive side of the decision boundary. When learning an SVM, you have a free parameter 'C' which controls how strongly regularized the model is. Your accuracy will be very sensitive to C, so be sure to test different values. See classifiers.py for more details.

2. Now you can evaluate the bag of SIFT representation paired with 1-vs-all linear SVMs. Accuracy should be between 40% and 50%, depending on the parameters. To measure the performance of the SVM classifier, again report the classification accuracy and plot the confusion matrix (which should be of size 15x15). You will be required to hand in both of these measures as part of your PDF writeup. Experiment with the parameter 'C' and report your experience.
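Both points above can be sketched with scikit-learn. First, the 1-vs-all rule takes the argmax over the per-class scores, so the least negative score wins. Second, a C experiment sweeps a log-spaced grid and reuses the `OneVsRestClassifier(SVC(kernel="linear"))` setup from `svm_classify`, applying the argmax to `decision_function` explicitly. `predict` should apply the same rule. The data are synthetic stand-ins, the grid is an arbitrary choice, and `c_sweep`/`sample` are hypothetical helpers. For a final report, choose C on a held-out validation split rather than on the test set. Otherwise the reported accuracy is optimistic.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# 1-vs-all decision rule: the class with the largest (least negative) score wins.
# Scores for the example in the text: kitchen -0.2, forest -0.3, others more negative.
names = np.array(["kitchen", "forest", "office"])
example_scores = np.array([[-0.2, -0.3, -0.9]])
print(names[np.argmax(example_scores, axis=1)])  # ['kitchen']


def c_sweep(train_x, train_y, test_x, test_y, Cs):
    """Return {C: accuracy} for 1-vs-all linear SVMs trained with each value of C."""
    results = {}
    for C in Cs:
        ovr = OneVsRestClassifier(SVC(kernel="linear", C=C)).fit(train_x, train_y)
        # decision_function returns one score column per class; take the most confident one
        scores = ovr.decision_function(test_x)
        pred = ovr.classes_[np.argmax(scores, axis=1)]
        results[C] = accuracy_score(test_y, pred)
    return results


# Synthetic normalized histograms (15 classes, 50 bins) standing in for bag-of-SIFT features
rng = np.random.default_rng(0)
protos = rng.dirichlet(np.ones(50), size=15)


def sample(n_per_class):
    y = np.repeat(np.arange(15), n_per_class)
    x = np.abs(protos[y] + rng.normal(scale=0.02, size=(len(y), 50)))
    return x / x.sum(axis=1, keepdims=True), y


train_x, train_y = sample(20)
test_x, test_y = sample(20)
results = c_sweep(train_x, train_y, test_x, test_y, Cs=[0.1, 1, 10, 100, 1000])
for C, acc in results.items():
    print("C=%-6g accuracy=%.3f" % (C, acc))
```

Normalized histograms have small entries. Separating them can therefore need a large weight vector, so comparatively large C values (weak regularization) may work best. The grid spans several orders of magnitude for that reason.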

```python
# Starter code prepared by Borna Ghotbi, Polina Zablotskaia, and Ariel Shann for Computer Vision
# based on a MATLAB code by James Hays and Sam Birch

import numpy as np
# load, build_vocabulary, get_bags_of_sifts, get_category, avg_histogram, plot_confusion_matrix,
# nearest_neighbor_classify and svm_classify are defined in the cells above. When running this
# as a script instead, import them from util.py and classifiers.py:
# from util import load, build_vocabulary, get_bags_of_sifts
# from classifiers import nearest_neighbor_classify, svm_classify

# For this assignment, you will need to report performance for sift features on two different classifiers:
# 1) Bag of sift features and nearest neighbor classifier
# 2) Bag of sift features and linear SVM classifier

# For simplicity you can define a "num_train_per_cat" variable, limiting the number of
# examples per category. num_train_per_cat = 100 for instance.

# Sample images from the training/testing dataset.
# You can limit number of samples by using the n_sample parameter.

print('Getting paths and labels for all train and test data\n')
train_image_paths, train_labels = load("sift/train")
test_image_paths, test_labels = load("sift/test")

''' Step 1: Represent each image with the appropriate feature
 Each function to construct features should return an N x d matrix, where
 N is the number of paths passed to the function and d is the
 dimensionality of each image representation. See the starter code for
 each function for more details. '''

print('Extracting SIFT features\n')
kmeans = build_vocabulary(train_image_paths, vocab_size=200)
train_image_feats = get_bags_of_sifts(train_image_paths, kmeans)
test_image_feats = get_bags_of_sifts(test_image_paths, kmeans)

# If you want to avoid recomputing the features while debugging the
# classifiers, you can save the extracted features to a file and load them back.
np.save("train_image_feats", train_image_feats)
np.save("test_image_feats", test_image_feats)
print("Done save")
train_image_feats = np.load("train_image_feats.npy")
test_image_feats = np.load("test_image_feats.npy")

print("Plotting the average histogram of each category")
category = get_category("sift/train")
# Average histograms are computed over the training images, one per category
avg_histogram(train_image_feats, train_labels, category)
print("Done histogram")

# Step 2: Classify each test image by training and using the appropriate classifier.
#  Each classification function returns an N-element array, where N is the number
#  of test cases and each entry is the predicted class index of that test image.
#  See the code for each function for more details.

print('Using nearest neighbor classifier to predict test set categories\n')
pred_labels_knn = nearest_neighbor_classify(4, train_image_feats, train_labels, test_image_feats)

print('Using support vector machine to predict test set categories\n')
pred_labels_svm = svm_classify(train_image_feats, train_labels, test_image_feats)

print('---Evaluation---\n')
# Step 3: Build a confusion matrix and score the recognition system for
#         each of the classifiers.
# 1) Calculate the total accuracy of each model: the fraction of test images
#    whose predicted class matches the true class.
# 2) Build a confusion matrix and visualize it, using the category names
#    as axis labels.
knn_accuracy = np.mean(pred_labels_knn == test_labels)
svm_accuracy = np.mean(pred_labels_svm == test_labels)

print("KNN accuracy:", knn_accuracy)
print("SVM accuracy:", svm_accuracy)

plot_confusion_matrix(test_labels, pred_labels_knn, "KNN", category)
plot_confusion_matrix(test_labels, pred_labels_svm, "SVM", category)

# Interpreting your performance with 100 training examples per category:
#  accuracy  =   0 -> Your code is broken (probably not the classifier's
#                     fault! A classifier would have to be amazing to
#                     perform this badly).
#  accuracy ~= .10 -> Your performance is chance. Something is broken or
#                     you ran the starter code unchanged.
#  accuracy ~= .40 -> Rough performance with bag of SIFT and nearest
#                     neighbor classifier.
#  accuracy ~= .50 -> You've gotten things roughly correct with bag of
#                     SIFT and a linear SVM classifier.
#  accuracy >= .60 -> You've added in spatial information somehow or you've
#                     added additional, complementary image features. This
#                     represents state of the art in Lazebnik et al 2006.
#  accuracy >= .85 -> You've done extremely well. This is the state of the
#                     art in the 2010 SUN database paper from fusing many
#                     features. Don't trust this number unless you actually
#                     measure many random splits.
#  accuracy >= .90 -> You used modern deep features trained on much larger
#                     image databases.
#  accuracy >= .96 -> You can beat a human at this task. This isn't a
#                     realistic number. Some accuracy calculation is broken
#                     or your classifier is cheating and seeing the test
#                     labels.
```


Getting paths and labels for all train and test data



KeyboardInterrupt: 

I ran it and waited about an hour without any response; it appeared to be frozen. I also tried running just one function, but it still did not respond, so I am submitting the implemented functions without the result graphs.