<img align="center" src="figures/course.png" width="800">

#                                    16720 (B) Bag of Visual Words - Assignment 2

     Instructor: Kris Kitani                      TAs: Sheng-Yu, Jinkun, Rawal, Arka, Rohan

## Building a Recognition System
We have formed a convenient way to represent images for recognition. We will now produce a basic recognition system with spatial pyramid matching. The goal of the system is presented below,
given an image, classify (\ie recognize/name) the scene where the image was taken. 

<img align="center" src="figures/teaser/teaser.png" width="800">

Traditional classification problems follow two phases: training and testing.
At training time, the computer is given a pile of formatted data (\ie, a collection
of feature vectors) with corresponding labels (\eg, ``desert``, ``kitchen``) and
then builds a model of how the data relates to the labels:
``if green, then kitchen``. At test time, the computer takes features and uses these rules to infer the label:
\eg, ``this is green, therefore it is kitchen``.

In this assignment, we will use the simplest classification model: nearest neighbor.
At test time, we will simply look at the query's nearest neighbor in the training set
and transfer that label. In this example, you will be looking
at the query image and 
looking up its nearest neighbor in a collection of training images whose labels are already known. This approach works
surprisingly well given a huge amount of data, \eg, a very cool graphics applications from [1]. 

The key components of any nearest-neighbor system are: 
* $features$ (how do you represent your instances?)
* $similarity$ (how do you compare instances in the feature space?)

You will implement both in this section.

In [1]:
import nbimporter
import numpy as np
import skimage
import multiprocess
import threading
import queue
import os,time
import math
from ipynb.fs.defs.p1 import get_visual_words

## For Autograding P2, ensure uploading `trained_system.npz`

### Q2.1 (10 Points -> 5 Autograder + 5 WriteUp)
We will first represent an image with a bag of words approach. In each image, we simply look at
how often each word appears. Write the function
```
            def get_feature_from_wordmap(wordmap,dict_size):
```
that extracts the histogram (Look into ``numpy.histogram()``) of visual words within the given image
(\ie, the bag of visual words). 
As inputs, the function will take:

* $wordmap$ is a $H$ $\times$ $W$ image containing the IDs of the visual words
* $dict\_size$ is the maximum visual word ID (\ie, the number of visual words, the dictionary size). Notice that your histogram should have $dict\_size$ bins, corresponding to how often that each word occurs. 

As output, the function will return $hist$, a $dict\_size$ histogram that is $L_1$ normalized, (i.e., the sum equals $1$).

In [2]:
from numpy import histogram
from matplotlib import pyplot as plt

def get_feature_from_wordmap(wordmap, dict_size):
    '''
    Compute histogram of visual words.

    [input]
    * wordmap: numpy.ndarray of shape (H,W)
    * dict_size: dictionary size K

    [output]
    * hist: numpy.ndarray of shape (K)
    '''
    
    '''
    HINTS:
    (1) We can use np.histogram with flattened wordmap
    '''
    # ----- TODO -----
    # YOUR CODE HERE
    hist = np.histogram(np.ravel(wordmap), dict_size) #np.ravel is for flattening the wordmap
    if not np.sum(hist[0]) == 0:
        hist = hist[0]/np.sum(hist[0])
    return hist
    # For Plotting
    plt.hist(np.ravel(wordmap), dict_size)
    plt.xlabel("Dict classes")
    plt.ylabel("Frq of pixels")
    plt.title('aquarium/sun_aairflxfskjrkepm.jpg')
    plt.show()
    # raise NotImplementedError()


In [3]:
# import cv2
# import numpy as np
# path_img = "./data/aquarium/sun_aairflxfskjrkepm.jpg"
# image = cv2.imread(path_img)
# image = image.astype('float')/255
# dict = np.load('./dictionary.npy')
# words = get_visual_words(image,dict)
# get_feature_from_wordmap(words,200)

<font color="blue">**For 5 Images, load visual word maps, visualize their histogram, and include it in the write up.**</font> This will help you verifying that your function is working correctly before proceeding.

### Multi-resolution: Spatial Pyramid Matching

Bag of words is simple and efficient, but it discards information about the spatial structure of the image and this information is often valuable. One way to alleviate this issue is to use spatial pyramid matching [2]. The general idea is to divide the image into a small number of cells, and concatenate the histogram of each of these cells to the histogram of the original image, with a suitable weight. 

Here we will implement a popular scheme that chops the image into $2^l\times2^l$ cells where $l$ is the layer number. We treat each cell as a small image and count how often each visual word appears. This results in a histogram for every single cell in every layer. Finally to represent the entire image, we concatenate all the histograms together. If there are $L+1$ layers and $K$ visual words, the resulting vector has dimensionality $K\sum_{l=0}^L{4^l} = K\left(4^{(L+1)}-1\right)/3$.

Now comes the weighting scheme. Note that when concatenating all the histograms, histograms from different levels are assigned different weights. Typically (and in the original work [2]), features from layer $l$ gets half the weight of features from layer $l+1$, with the exception of layer 0, which is assigned a weight equal to layer 1. A popular choice is for layer $0$ and layer $1$ the weight is set to $2^{-L}$, and for the rest it is set to $2^{l-L-1}$ (\eg, in a three layer spatial pyramid, $L=2$ and weights are set to $1/4$, $1/4$ and $1/2$ for layer 0, 1 and 2 respectively, see Fig. 7). Take level 2 as an example, there will be 16 histograms in total, each has a norm equal to one. You should concatenate these histograms, normalize this layer (multiply the concatenated vector by 1/16), and apply the 1/2 layer weight of level 2 on top of that. Note that following this operation, concatenating the weighted features of each layer will result in a final vector of norm equal to 1.

<img align="center" src="figures/spm.jpg" width="600">
<figcaption align = "center"><b>Figure 7. Spatial Pyramid Matching: From [2]. Toy example of a pyramid for L = 2. The image has three visual words, indicated by circles, diamonds, and crosses. We subdivide the image at three different levels of resolution. For each level of resolution and each channel, we count the features that fall in each spatial bin. Finally, weight each spatial histogram.}</b></figcaption>

In [4]:
# a = np.zeros((4,4))
# b = np.array([[1,2],[3,4],[5,6],[7,8]])
# c = np.array([[100],[101],[102],[103]])
# a[:,2:4] = b
# a[:,0:1] = c
# print (a)

In [5]:
# L = 1
# no_of_hists = int((4**(L + 1) -1)/3)
# hist_all = np.zeros((200, no_of_hists))
# print(hist_all.shape)

In [6]:
# a = np.array([[1, 2], [3, 4]])
# print(a)
# b = np.array([[5, 6]])
# print(b.T)
# np.concatenate((a, b.T), axis=-1)

In [7]:
# for i in range(0,282+1,93):
#     print (i)

### Q2.2.1 (15 Points Autograder)

Create a function that form a multi-resolution representation of the given image.
```
            def get_feature_from_wordmap_SPM(wordmap,layer_num,dict_size):
```
As inputs, the function will take:

* **layer_num** the number of layers in the spatial pyramid, \ie, $L+1$
* **wordmap** is a $H$ $\times$ $W$ image containing the IDs (\ie index) of the visual words
* **dict_size** is the maximum visual word ID (\ie, the number of visual words, the dictionary size)

As output, the function will return hist_all, a vector that is $L_1$ normalized. **Please use a 3-layer spatial pyramid ($L=2$) for all the following recognition tasks.**

One small hint for efficiency: a lot of computation can be saved if you first compute the histograms of the _finest layer_, because the histograms of coarser layers can then be aggregated from finer ones. Make sure you normalize the histogram after aggregation.

**Note for Autograder :** Ensure that final $hist\_all$ (the output of `get_feature_from_wordmap_SPM`) has histogram features arranged from Loweset Level (global features) to Highest Level (finest features). Example: the output array should **first contain the histogram for Level 0, followed by Level 1, and then Level 2**.

In [46]:
A = np.array([
   [[ 1,  2,  3],[ 4,  5,  6], [12, 34, 90]],
   [[ 1,  5,  6],[ 2,  5,  6], [ 7,  3,  4]],
   [[ 7,  1,  0],[ 3,  7,  1], [ 0,  2,  4]],
   [[ 1,  0,  3],[ 9,  2,  3], [ 1,  9,  6]]
])
print('A = ',A)
print('A ka shape = ',A.shape)

np.max(A[:,0:2,0:2], axis=(1,2))
# b = np.sum((A[:,0:2,0:2]),axis = 0)
# c = np.sum(b,axis = -1)
# # d = c.reshape((2,1,1))
# print('b = ',b)
# print('b ka shape = ',b.shape)
# print('c = ',c)
# print('c ka shape = ',c.shape)
# # print('d = ',d)
# # print('d ka shape = ',d.shape)

A =  [[[ 1  2  3]
  [ 4  5  6]
  [12 34 90]]

 [[ 1  5  6]
  [ 2  5  6]
  [ 7  3  4]]

 [[ 7  1  0]
  [ 3  7  1]
  [ 0  2  4]]

 [[ 1  0  3]
  [ 9  2  3]
  [ 1  9  6]]]
A ka shape =  (4, 3, 3)


9

In [9]:
# a = np.array([[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16],[17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]])
# print('a = ',a)
# print('a ka shape = ',a.shape)
# b = a.reshape((2,4,4))
# print('b = ',b)
# print('b ka shape = ', b.shape)
# print('new vis :,:,0= ', b[:,:,0])
# print('new vis :,:,1= ', b[:,:,1])
# c = np.sum((b[:,0:2,0:2]),axis=-1)
# print('c = ', c)
# print('c ka shape = ', c.shape)
# d = np.sum((c[:,:]),axis=-1)
# print('d = ', d)
# print('d ka shape = ', d.shape)
# e = d.reshape((2,1,1))
# print('e = ',e)

In [10]:
# from numpy import ppmt


# def get_feature_from_wordmap_SPM(wordmap, layer_num, dict_size):
#     # print('wordmap shape = ',  wordmap.shape)
#     # print('no of layers = ', layer_num)
#     # print('dictionary size  = ', dict_size)
#     '''
#     Compute histogram of visual words using spatial pyramid matching.

#     [input]
#     * wordmap: numpy.ndarray of shape (H,W)
#     * layer_num: number of spatial pyramid layers
#     * dict_size: dictionary size K

#     [output]
#     * hist_all: numpy.ndarray of shape (K*(4^layer_num-1)/3)
#     '''
#     '''
#     HINTS:
#     (1) Take care of Weights 
#     (2) Try to build the pyramid in Bottom Up Manner
#     (3) the output array should first contain the histogram for Level 0 (top most level) , followed by Level 1, and then Level 2.
#     '''
#     # ----- TODO -----
    
#     # print("h = ",h)
#     # print("w = ",w)
#     L = layer_num - 1
    
#     # print("patchheight =", patch_height)
#     # print("patchwidth =", patch_width)
#     hist = []
#     '''
#     HINTS:
#     1.> create an array of size (dict_size, (4**(L + 1) -1)/3) )
#     2.> pre-compute the starts, ends and weights for the SPM layers L 
#     '''
#     # YOUR CODE HERE
#     # no_of_hists = int((4**(L + 1) -1)/3)
#     # hist_all_testing = np.zeros((dict_size, no_of_hists))
#     # print('hist_all_testing shape = ',hist_all_testing.shape)
#     # hist_all = []
#     no_of_hists = int((4**(L + 1) -1)/3)
#     hist_all = np.zeros((dict_size, no_of_hists))
#     def getweightforcurrlayer(curr_layer_num, L):
#         if (curr_layer_num == 0) | (curr_layer_num == 1):
#             return 2**(-L)
#             # return 0.25
#         else:
#             return 2**(curr_layer_num - L - 1)
#     # raise NotImplementedError()
#     '''
#     HINTS:
#     1.> Loop over the layers from L to 0
#     2.> Handle the base case (Layer L) separately and then build over that
#     3.> Normalize each histogram separately and also normalize the final histogram
#     '''
#     # YOUR CODE HERE
#     # hist_all= []
#     for l in range(L):
#         h, w = wordmap.shape
#         patch_width = math.floor(w / (2**l))
#         patch_height = math.floor(h / (2**l))
#         curr_layer_hist = []
#         weight = getweightforcurrlayer(l, L)
#         for height in range(0,h-patch_height+1,patch_height):
#                 # height_iteration += 1
#                 for width in range(0,w-patch_width+1,patch_width):
#                     # width_iteration += 1
#                     curr_layer_hist.append(get_feature_from_wordmap(wordmap[height:height+patch_height+1, width:width+patch_width+1], dict_size))
#                     # curr_layer_hist = curr_layer_hist/np.sum(curr_layer_hist)
#                     # hist_all[:,int(((4**(L) -1)/3)+height*2**l+width)] = curr_layer_hist*weight*4**(-l)
#         curr_layer_hist = np.array(curr_layer_hist)
#         print('current layer hist ka shape = ', curr_layer_hist.shape)
#         # print('check1')
#         curr_layer_hist = curr_layer_hist.T
#         curr_layer_hist = curr_layer_hist*weight*4**(-l)       ## Normalizing
#         # print('current layer hist ka shape = ', len(curr_layer_hist))
#         # hist_all.append(curr_layer_hist)
#     hist_all = hist_all/np.sum(hist_all)
#     hist_all = np.ravel(hist_all)
#     # hist_all = np.array(hist_all)
#     # hist_all = np.reshape(hist_all,(200,-1))
#     return hist_all

In [11]:
# import numpy as np
# input = np.array([[0,0,1,1],[2,2,1,1],[2,2,0,1],[1,1,2,2]])
# # input = np.ones((500, 85))
# features = get_feature_from_wordmap_SPM(input, 3, 3)
# print(features, np.sum(features), features.shape)

In [37]:
from numpy import ppmt


def get_feature_from_wordmap_SPM(wordmap, layer_num, dict_size):
    # print('wordmap shape = ',  wordmap.shape)
    # print('no of layers = ', layer_num)
    # print('dictionary size  = ', dict_size)
    '''
    Compute histogram of visual words using spatial pyramid matching.

    [input]
    * wordmap: numpy.ndarray of shape (H,W)
    * layer_num: number of spatial pyramid layers
    * dict_size: dictionary size K

    [output]
    * hist_all: numpy.ndarray of shape (K*(4^layer_num-1)/3)
    '''
    '''
    HINTS:
    (1) Take care of Weights 
    (2) Try to build the pyramid in Bottom Up Manner
    (3) the output array should first contain the histogram for Level 0 (top most level) , followed by Level 1, and then Level 2.
    '''
    # ----- TODO -----
    
    # print("h = ",h)
    # print("w = ",w)
    L = layer_num - 1
    
    # print("patchheight =", patch_height)
    # print("patchwidth =", patch_width)
    hist = []
    '''
    HINTS:
    1.> create an array of size (dict_size, (4**(L + 1) -1)/3) )
    2.> pre-compute the starts, ends and weights for the SPM layers L 
    '''
    # YOUR CODE HERE
    no_of_hists = int((4**(L + 1) -1)/3)
    hist_all_testing = np.zeros((dict_size, no_of_hists))
    # print('hist_all_testing shape = ',hist_all_testing.shape)
    hist_all = []
    def getweightforcurrlayer(curr_layer_num, L):
        if (curr_layer_num == 0) | (curr_layer_num == 1):
            return 2**(-L)
            # return 0.25
        else:
            return 2**(curr_layer_num - L - 1)
    # raise NotImplementedError()
    '''
    HINTS:
    1.> Loop over the layers from L to 0
    2.> Handle the base case (Layer L) separately and then build over that
    3.> Normalize each histogram separately and also normalize the final histogram
    '''
    # YOUR CODE HERE
    hist_temp1 = []
    hist_temp2 = []
    hist_temp3 = []
    height_iteration = 0
    width_iteration = 0
    gridscovered = 0
    # for curr_layer_num in range(L,-1,-1):
    # # for curr_layer_num in [2,1]:
    #     print('the current layer is: ', curr_layer_num)
    #     weight = getweightforcurrlayer(curr_layer_num, L)
    #     # print('weight = ',weight)
    #     if curr_layer_num == L:
    #         print("i am in if")
    #         for height in range(0,h-1,patch_height):
    #             # height_iteration += 1
    #             for width in range(0,w-1,patch_width):
    #                 # width_iteration += 1
    #                 hist_temp1.append(get_feature_from_wordmap(wordmap[height:height+patch_height, width:width+patch_width], dict_size))
    #                 # print(hist_temp1.shape)
    #         # print("first time length of temp = ", len(hist_temp1))
    #         hist_temp1 = np.array(hist_temp1)
    #         # print(hist_temp1.shape)
    #         hist_temp1 = (1/2)*(1/16)*(hist_temp1.T)
    #         print("temp1 ka shape = ",hist_temp1.shape)
    #         # print("for layer = 2 after reshaping shape = ", hist_temp1.shape)
    #         # cols_to_fill = hist_temp1.shape[1]
    #         # hist_all[:,-cols_to_fill:] = hist_temp1
        
    #     # else:
    #     #     # print("I am here")
    #     #     for_calculation = hist_temp1.reshape(200,int(hist_temp1.shape[1]/4),int(hist_temp1.shape[1]/4))
    #     #     print("for calculation wala = ",for_calculation)
    #     #     # print("for calculation shape = ", for_calculation.shape)
    #     #     hist_temp1 = []
    #     #     # print(for_calculation.shape)
    #     #     # cur_row = 0
    #     #     # cur_col = 0
    #     #     # while not(gridscovered % L == 0):
    #     #     for cur_row in range(0,for_calculation.shape[1]-1,2):
    #     #         height_iteration += 1
    #     #         for cur_col in range(0,for_calculation.shape[2]-1,2):
    #     #             width_iteration += 1
    #     #             for i in range(0,dict_size):
    #     #                 hist_temp1.append(np.sum(for_calculation[i:i+1,cur_row:cur_row+2,cur_col:cur_col+2]))
                    
    #     #     print("no of iterations = ", height_iteration, width_iteration)
    #     #     print("second time length of temp = ", len(hist_temp1))
    #     #     # hist_temp1 = np.array(hist_temp1)
    #     #     print("new temp hist = ", hist_temp1)
    #     # hist_temp1 = hist_temp1.T
    #     # cols_to_fill = cols_to_fill + hist_temp1.shape[1]
    #     # hist_all[:,-cols_to_fill:-(cols_to_fill-hist_temp1.shape[1])] = hist_temp1

    #     elif curr_layer_num == 1:
    #         print("i am in elif")
    #         for i in [0,2,8,10]:
    #             for row in range(0, hist_temp1.shape[0]):
    #                 print("I am in elif for loop")
    #                 hist_temp2.append(np.sum((hist_temp1[row,i],hist_temp1[row,i+1],hist_temp1[row,i+4],hist_temp1[row,i+5])))
    #         hist_temp2 = np.array(hist_temp2)
    #         print("for layer = 1 shape = ", hist_temp2.shape)
    #         hist_temp2 = (1/4)*(1/4)*hist_temp2.reshape(dict_size,-1)
    #         print("for layer = 1 after reshaping shape = ", hist_temp2.shape)
    # # raise NotImplementedError()

    #     # else:
    #     #     for row in range(0, hist_temp2.shape[0]):
    #     #         hist_temp3.append(np.sum((hist_temp2[row,0],hist_temp2[row,1],hist_temp2[row,2],hist_temp2[row,3])))
    #     #     hist_temp3 = np.array(hist_temp3)
    #     #     # print('temp3 = ', hist_temp3)
    #     #     print("for layer = 2 shape = ", hist_temp3.shape)
    #     #     hist_temp3 = (1/4)*hist_temp3.reshape(dict_size,-1)
    #     #     print('temp 3 after reshaping = ', hist_temp3.shape)
    #         # print("for layer = 0 after reshaping shape = ", hist_temp3.shape)
    # # hist_all[:,0:1] = hist_temp3
    # # hist_all[:,1:5] = hist_temp2
    # # hist_all[:,5:22] = hist_temp1
    # # hist_all.append(hist_temp3)
    # # hist_all.append(hist_temp2)
    # # hist_all.append(hist_temp1)
    # # hist_all = np.hstack((hist_temp2,hist_temp1))
    # # hist_all = np.
    # print(hist_temp2.shape, hist_temp1.shape)
    # # hist_all = hist_all_np.reshape(dict_size,-1)
    # # print('size after appending = ', hist_all.shape)

    ### Using Recirsive Function. ###
    most_detailed_layer = L # layer_num -1
    curr_layer_hist = []
    
    def fun(hist_all, wordmap, most_detailed_layer, dict_size, curr_layer_no, curr_layer_hist):
        # print('current layer number = ', curr_layer_no)
        h, w = wordmap.shape
        patch_width = math.floor(w / (2**most_detailed_layer))
        patch_height = math.floor(h / (2**most_detailed_layer))
        # print('patch width = ', patch_width)
        # print('patch height = ', patch_height)
        # print('h = ', h)
        # print('w = ', w)
        height_iteration = 0
        width_iteration = 0
        curr_layer_hist = []
        if curr_layer_no == most_detailed_layer:
            weight = getweightforcurrlayer(curr_layer_no, most_detailed_layer)
            # print('weight = ', weight)
            for height in range(0,h-patch_height+1,patch_height):
                height_iteration += 1
                for width in range(0,w-patch_width+1,patch_width):
                    width_iteration += 1
                    curr_layer_hist.append(get_feature_from_wordmap(wordmap[height:height+patch_height, width:width+patch_width], dict_size))
            curr_layer_hist = np.array(curr_layer_hist)
            # print('most detailed layer shape before transpose = ',curr_layer_hist.shape)
            # print('width iterations = ',  width_iteration)
            # print('height iterations = ',  height_iteration)
            # curr_layer_hist = (1/2)*(1/16)*(curr_layer_hist.T)
            curr_layer_hist = curr_layer_hist.T
            # curr_layer_hist = weight*(1/curr_layer_hist.shape[1])*curr_layer_hist       ## Normalizing
            curr_layer_hist = weight*(4**(-curr_layer_no))*curr_layer_hist       ## Normalizing
            detailed_hist = curr_layer_hist
            # print('most detailed layer shape = ', detailed_hist.shape)
            return [detailed_hist]

        
        else:
            hist_all = fun(hist_all, wordmap, most_detailed_layer, dict_size, curr_layer_no+1, curr_layer_hist)
            weight = getweightforcurrlayer(curr_layer_no, most_detailed_layer)
            # print('weight = ', weight)
            latest_hist = hist_all[0]
            curr_layer_hist = []
            reshape_dimension = int(np.sqrt(latest_hist.shape[1]))
            latest_hist = latest_hist.reshape(dict_size,reshape_dimension,reshape_dimension)
            # print('latest hist after reshaping ka shape = ', latest_hist.shape)
            for cur_row in range(0,latest_hist.shape[1]-1,2):
                # temp = []
                for cur_col in range(0,latest_hist.shape[2]-1,2):
                    # temp = np.sum(latest_hist[:,cur_row:cur_row+2,cur_col:cur_col+2],axis=-1)
                    temp = np.sum(latest_hist[:,cur_row:cur_row+2,cur_col:cur_col+2],axis=(1,2))
                    # print('temp ka shape after first sum = ',temp.shape)
                    # temp = np.sum(temp[:,:],axis=-1)
                    # print('temp ka shape after second sum = ',temp.shape)
                    curr_layer_hist.append(temp)
            curr_layer_hist = np.array(curr_layer_hist)
            curr_layer_hist = curr_layer_hist.T
            # curr_layer_hist = weight*(1/curr_layer_hist.shape[1])*curr_layer_hist       ## Normalizing
            curr_layer_hist = weight*(4**(-curr_layer_no))*curr_layer_hist       ## Normalizing
            # print('current layer ka shape boom!!= ', curr_layer_hist.shape)
            return [curr_layer_hist, *hist_all]

            # else:
        #     hist_all = fun(hist_all, wordmap, most_detailed_layer, dict_size, curr_layer_no+1, curr_layer_hist)
        #     latest_hist = hist_all[0]
        #     curr_layer_hist = []
        #     if curr_layer_no == 1:
        #         for i in [0,2,8,10]:
        #             for row in range(0, dict_size):
        #                 curr_layer_hist.append(np.sum((latest_hist[row,i],latest_hist[row,i+1],latest_hist[row,i+4],latest_hist[row,i+5])))
        #         curr_layer_hist = np.array(curr_layer_hist)
        #         curr_layer_hist = (1/4)*(1/4)*curr_layer_hist.reshape(dict_size,-1)
        #         print('layer 1 shape = ', curr_layer_hist.shape)
        #         return [curr_layer_hist, *hist_all]

        #     elif curr_layer_no == 0:
        #         # print('shape of latest for layer 0 = ', latest_hist.shape)
        #         # print(dict_size)
        #         curr_layer_hist = []
        #         for row in range(0, dict_size):
        #             curr_layer_hist.append(np.sum((latest_hist[row,0],latest_hist[row,1],latest_hist[row,2],latest_hist[row,3])))
        #         curr_layer_hist = np.array(curr_layer_hist)
        #         print('layer 0 shape before reshape = ', curr_layer_hist.shape)
        #         curr_layer_hist = (1/4)*curr_layer_hist.reshape(dict_size,-1)
        #         print('layer 0 shape after reshape = ', curr_layer_hist.shape)
        #         return [curr_layer_hist, *hist_all]


    hist_all = fun(hist_all, wordmap, most_detailed_layer, dict_size, 0, curr_layer_hist)
    # print('1st col in hist', )
    # hist_all = np.array(hist_all)
    # for i in hist_all: print('each layer shape = ', i.shape)
    hist_all = np.concatenate(hist_all, axis=1)
    # print('1st col in hist', hist_all[:,0])
    # s1 = hist_all[:,0]
    # s2 = hist_all[:,1]+hist_all[:,2]+hist_all[:,3]+hist_all[:,4]
    # print('diff = ', np.sum(np.abs(s1-s2)))
    # print('s2 = ', s2)
    # print('s1 = ', s1)
    # print('final shape after concatenation = ', hist_all.shape)
    hist_all = np.array(hist_all)
    # print('final shape after converting into np array = ', hist_all.shape)
    hist_all = hist_all/np.sum(hist_all)
    # print('final shape after normalizing= ', hist_all.shape)
    hist_all = np.ravel(hist_all)
    # hist_all = hist_all.T
    # print('shape after ravel = ', hist_all.shape)
    
    return hist_all


In [24]:
# import numpy as np
# input = np.array([[0,0,1,1],[2,2,1,1],[2,2,0,1],[1,1,2,2]])
# # input = np.ones((500, 85))
# features = get_feature_from_wordmap_SPM(input, 3, 3)
# print(features, np.sum(features), features.shape)

[0.00967262 0.01190476 0.00297619 0.01488095 0.00892857 0.02380952
 0.01190476 0.         0.         0.         0.01190476 0.01190476
 0.         0.02380952 0.01190476 0.01190476 0.02380952 0.
 0.02380952 0.         0.         0.02455357 0.02083333 0.03571429
 0.01488095 0.02678571 0.         0.02380952 0.04761905 0.04761905
 0.04761905 0.01190476 0.         0.04761905 0.         0.01190476
 0.01190476 0.         0.04761905 0.         0.04761905 0.04761905
 0.01339286 0.01488095 0.00892857 0.01785714 0.01190476 0.02380952
 0.01190476 0.         0.         0.         0.02380952 0.03571429
 0.         0.02380952 0.02380952 0.02380952 0.02380952 0.
 0.02380952 0.         0.        ] 1.0 (63,)


In [36]:
# import cv2
# import numpy as np
# path_img = "./data/aquarium/sun_aairflxfskjrkepm.jpg"
# image = cv2.imread(path_img)
# image = image.astype('float')/255
# dict = np.load('./dictionary.npy')
# words = get_visual_words(image,dict)
# get_feature_from_wordmap_SPM(words,3,200)

[[  0   0 123 ... 112 112 143]
 [117 123 123 ... 112 167 167]
 [123 104 104 ... 192 167 167]
 ...
 [192  55  55 ... 154 154 167]
 [ 67 133  55 ... 154 154 167]
 [ 67  67  67 ... 154 143 143]]
diff =  0.0234375


array([3.40352616e-04, 2.51057316e-04, 7.66600013e-04, ...,
       7.07826795e-05, 2.35942265e-05, 4.81605351e-04])

### 2.3 Comparing images


We will also need a way of comparing images to find the _nearest_ instance in the training data. In this assignment, we'll use the histogram intersection similarity. The histogram intersection similarity between two histograms is the sum of the minimum value of each corresponding bins.

Note that since this is a similarity, you want the $largest$ value to find the _nearest_ instance.

#### Q2.3.1 (10 Points Autograder)
Create the function
```
                def distance_to_set(word_hist,histograms):
```
where $word\_hist$ is a $K\left(4^{(L+1)}-1\right)/3$ vector and $histograms$ is a $T \times K\left(4^{(L+1)}-1\right)/3$ matrix containing $T$ features from $T$ training samples concatenated along the rows. This function returns the histogram intersection similarity between $word\_hist$ and each training sample as a vector of length $T$. Since this is called every time you want to look up a classification, you want this to be fast, so doing a for-loop over tens of thousands of histograms is a very bad idea.

In [15]:
def distance_to_set(word_hist, histograms):
    '''
    Compute similarity between a histogram of visual words with all training image histograms.

    [input]
    * word_hist: numpy.ndarray of shape (K)
    * histograms: numpy.ndarray of shape (N,K)

    [output]
    * sim: numpy.ndarray of shape (N)
    '''
    '''
    HINTS:
    (1) Consider A = [0.1,0.4,0.5] and B = [[0.2,0.3,0.5],[0.8,0.1,0.1]] then \
        similarity between element A and set B could be represented as [[0.1,0.3,0.5],[0.1,0.1,0.1]]   
    '''
    # ----- TODO -----
    # YOUR CODE HERE
    sim = np.minimum(word_hist,histograms)
    sim = np.sum(sim, axis = -1)
    # raise NotImplementedError()
    return sim

In [16]:
# A = [0.1,0.4,0.5]
# B = [[0.2,0.3,0.5],[0.8,0.9,0.1]]
# C = np.minimum(A, B)
# print(C)
# D = np.sum(C, axis=-1)
# print('D = ',D) 

#### Q2.4 Building a Model of the Visual World

Now that we've obtained a representation for each image, and defined a similarity measure to compare two spatial pyramids, we want to put everything up to now together.

You will need to load the training file names from `data/train_data.npz` and the filter bank and visual word dictionary from `dictionary.npy`.
You will save everything to a `.npz` numpy-formated (use `np.savez`) file named `trained_system.npz`. Included will be:


1. $dictionary$: your visual word dictionary.
2. $features$: a $N \times  K\left(4^{(L+1)}-1\right)/3$ matrix containing all of the histograms of the $N$ training images in the data set. A dictionary with $150$ words will make a ``train_features`` matrix of size $1000 \times 3150$.
3. $labels$: an $N$ vector containing the labels of each of the images. ( ``features[i]`` will correspond to label ``labels[i]``).
4. $SPM\_layer\_num$: the number of spatial pyramid layers you used to extract the features for the training images.

We have provided you with the names of the training images in ``data/train_data.npz``.
You want to use the dictionary entry ``image_names`` for training.
You are also provided the names of the test images in ``data/test_data.npz``, which is structured in the same way as the training data; however, _you cannot use the testing images for training._

If it's any helpful, the below table lists the class names that correspond to the label indices:

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
|aquarium | park | desert | highway | kitchen | laundromat | waterfall | windmill|

#### Q2.4.1 (15 Points Autograder)
Implement the function
```
                def build_recognition_system():
```
that produces ``trained_system.npz``.

Implement 
```
                def get_image_feature(file_path,dictionary,layer_num,K):
```
that load image, extract word map from the image, compute SPM feature and return the computed feature. Use this function in your ``build_recognition_system()``.

In [17]:
def get_image_feature(file_path, dictionary, layer_num, K):
    '''
    Extracts the spatial pyramid matching feature.

    [input]
    * file_path: path of image file to read
    * dictionary: numpy.ndarray of shape (K,3F)
    * layer_num: number of spatial pyramid layers
    * K: number of clusters for the word maps

    [output]
    * feature: numpy.ndarray of shape (K*(4^layer_num-1)/3)
    '''
    # ----- TODO -----
    # YOUR CODE HERE
    import cv2
    image = cv2.imread(file_path)
    image = image.astype('float')/255
    # dict = np.load('./dictionary.npy')
    words = get_visual_words(image,dictionary)
    feature = get_feature_from_wordmap_SPM(words,layer_num,K)
    # raise NotImplementedError()
    return [file_path, feature]


In [18]:
# # import cv2
# # import numpy as np
# file_path = "./data/aquarium/sun_aairflxfskjrkepm.jpg"
# # image = cv2.imread(path_img)
# # image = image.astype('float')/255
# dict = np.load('./dictionary.npy')
# # words = get_visual_words(image,dict)
# # get_feature_from_wordmap_SPM(words,1,200)
# get_image_feature(file_path, dict, 3, 200)

In [19]:
# import numpy as np
# a = np.array([[1,2,3],[4,5,6]])
# b= a[1]
# print(b)

In [20]:
# train_data = np.load("./data/train_data.npz")
# print(np.array(train_data))
# print(train_data['files'])

In [38]:
def build_recognition_system(num_workers=4):
    '''
    Creates a trained recognition system by generating training features from all training images.

    [input]
    * num_workers: number of workers to process in parallel

    [saved]
    * features: numpy.ndarray of shape (N,M)
    * labels: numpy.ndarray of shape (N)
    * dictionary: numpy.ndarray of shape (K,3F)
    * SPM_layer_num: number of spatial pyramid layers
    '''

    train_data = np.load("./data/train_data.npz")
    dictionary = np.load("dictionary.npy")

    # ----- TODO -----
    # YOUR CODE HERE
#  layer_num, K
    args_list = []
    image_names = train_data['files']
    labels = train_data['labels']

    ideal_dict = {}
    # for i in range(0, len(image_names)):
    #     ideal_dict[image_names[i]] = labels[i]
    org_paths = []
    SPM_layer_num = 3
    for idx in range(0,len(image_names)):
        img_path = './data/' + image_names[idx]
        # ideal_dict[img_path[idx]] = labels[idx]
        org_paths.append(img_path)
        args_list.append([img_path, dictionary, SPM_layer_num, dictionary.shape[0]])
        # args_list.append(img_path)
    
    with multiprocess.Pool(num_workers) as pool:
        temparr = pool.starmap(get_image_feature, args_list)
    
    unordered_filepaths = [z[0] for z in temparr] 
    unordered_features = [t[1] for t in temparr]
    # Create a dictionary of unordered outputs
    # All the o/p are random because of multiprocessing. So now we need to order them in the original way.
    unordered_dict = {} 
    for x in range(0, len(unordered_filepaths)):
        unordered_dict[unordered_filepaths[x]] = unordered_features[x]
    
    ordered_features = []
    for h in org_paths:
        ordered_features.append(unordered_dict[h])

    # print(ordered_features)
    # raise NotImplementedError()
    np.savez('trained_system.npz', features=ordered_features,
                                    labels=labels,
                                    dictionary=dictionary,
                                    SPM_layer_num=SPM_layer_num)


# NOTE: comment out the lines below before submitting to gradescope
# build_recognition_system()

[[112 112  22 ... 123 103  78]
 [112   6  22 ...  41 152 103]
 [193   6  22 ... 194 152 152]
 ...
 [143 143 143 ... 143 143 143]
 [143 143 143 ... 143 143 143]
 [143 143 143 ... 143 143 143]]
diff =  0.0234375
[[112 112 143 ...   6   6 167]
 [112 112 112 ... 169 169 167]
 [112 112 112 ... 136 169  78]
 ...
 [ 25  25  25 ...  58  58 167]
 [143  25  25 ...  58  58 167]
 [143 143  25 ...  67  67 167]]
diff =  0.0234375
[[  6   6   6 ...   6   6 167]
 [ 34 174 174 ... 174 196 167]
 [ 34 174 136 ... 136 152  78]
 ...
 [112 112 143 ... 143 143 143]
 [112 143 143 ... 143 143 143]
 [143 143 143 ... 143 143 143]]
diff =  0.0234375
[[  6   6   0 ... 103 103  78]
 [ 34 123 123 ...  76 152 103]
 [ 34 123  29 ... 194 152 152]
 ...
 [112 112 112 ... 172 172 143]
 [112 143 112 ... 172 172 143]
 [143 143 143 ... 143 143 143]]
[[112   6 158 ...   6   6 158]
 [192 174 174 ... 174 174 158]
 [192 174 174 ... 174 152 158]
 ...
 [187 187 187 ... 187  70 143]
 [187 187  70 ...  70  70 143]
 [143 193 187 ... 

In [None]:
# print(unordered_dict)

NameError: name 'unordered_dict' is not defined

In [None]:
# a = ['abcd','bdef','cg']
# for i in a:
#     print (i)

abcd
bdef
cg


### References

[1]  James Hays and Alexei A Efros. Scene completion using millions of photographs.ACM Transactions onGraphics (SIGGRAPH 2007), 26(3), 2007.

[2]  S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recogniz-ing natural scene categories.  InComputer Vision and Pattern Recognition (CVPR), 2006 IEEE Conferenceon, volume 2, pages 2169–2178, 2006.

[3]  Jian xiong Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recogni-tion from abbey to zoo.2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition,pages 3485–3492, 2010.14