<div align="center">
 <img src="http://www.di.uoa.gr/themes/corporate_lite/logo_en.png" title="Department of Informatics and Telecommunications - University of Athens" align="center" /> 
</div>

<br>

---

<div align="center"> 
  <font size="3"><b>Machine Learning</b> </font>
</div>

<div align="center"> 
  <font size="6">
      <b>Winner Takes All Hash</b><br>
    </font>
     <font size="4">
        A brief study of the Winner Takes All Hash algorithm, inspired by Jay Yagnik. 
    </font><br><br>
    <font size="3">
        Original paper: 
        <a href="http://www.cs.toronto.edu/~dross/YagnikStrelowRossLin_ICCV2011.pdf">
            The Power of Comparative Reasoning <br>Jay Yagnik, Dennis Strelow, David A. Ross, Ruei-sung Lin - 2011
        </a>
    </font>
</div>

---

<div align="center"> 
    <font size="4">
         <b>Konstantinos Nikoletos, BS Student</b>
     </font>
</div>
<div align="center"> 
    <font size="2">Athens 2021</font>
</div>


---

# Basic idea

## Algorithm's main functionality

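The algorithm takes as input a set of vectors and, for each vector:
- Permutes its components with a random permutation θ (the same θ is used for every vector)
- Takes the first K components of the permuted vector
- Outputs the index (0 to K-1) of the maximum among those K components

The procedure is repeated for several random permutations, and the indices obtained for a vector form its hash code.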
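The steps above can be sketched for a single vector as follows. This is a minimal illustration, not the notebook's implementation: the helper names `wta_code` and `wta_hash` are hypothetical, and θ is assumed to be given as a NumPy index array (drawn here with `np.random.default_rng`).

```python
import numpy as np

def wta_code(x, theta, K):
    """Hypothetical helper: WTA code of one vector x for one permutation theta."""
    x = np.asarray(x)
    permuted = x[theta]            # step 1: permute the components with theta
    window = permuted[:K]          # step 2: keep the first K permuted components
    return int(np.argmax(window))  # step 3: position (0..K-1) of the largest one

def wta_hash(x, thetas, K):
    """Hypothetical helper: one code per permutation; together they form the hash."""
    return [wta_code(x, theta, K) for theta in thetas]

# Usage: 4 random permutations of a 6-dimensional vector
rng = np.random.default_rng(0)
thetas = [rng.permutation(6) for _ in range(4)]
x = [10, 5, 2, 6, 12, 3]
codes = wta_hash(x, thetas, K=3)
print(codes)  # four integers, each in {0, 1, 2}
```

Since only the first K entries of θ are ever used, steps 1 and 2 together amount to indexing the vector with `theta[:K]`.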



### A simple pairwise-order measure

The algorithm's main idea is a feature-space transformation that depends only on the relative ordering of the feature values, not on their exact magnitudes. In effect, the similarity between two points is defined by the degree to which their feature-dimension rankings agree. As an example, Yagnik proposes a measure like the one below:

__Equation 1__:
$$
    PO(X,Y) = \sum_{i} \sum_{j<i} T\big((x_i - x_j)(y_i - y_j)\big)
$$

where:
- $x_i$ and $y_i$ are the $i$-th feature dimensions of $X$ and $Y$ respectively
- $X, Y$ are the two feature vectors being compared
- $T$ is a threshold (step) function: $T(z) = 1$ if $z > 0$ and $T(z) = 0$ otherwise

Equation 1 counts the pairs of feature dimensions on which $X$ and $Y$ agree in ordering, i.e. the pairs $(i, j)$ with either $x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$.



Python implementation of the PO measure (Equation 1) and of the threshold function $T$:

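```python
def WTA_similarity(vector1, vector2):
    '''Pairwise-order similarity PO(X, Y) of Equation 1.'''
    PO = 0
    for i in range(len(vector1)):
        for j in range(i):  # all pairs with j < i
            ij_1 = vector1[i] - vector1[j]
            ij_2 = vector2[i] - vector2[j]
            # the pair counts when both vectors order dimensions i and j the same way
            PO += WTA_Threshold(ij_1 * ij_2)

    return PO


def WTA_Threshold(x):
    '''Threshold function T: 1 if x > 0, else 0.'''
    if x > 0:
        return 1
    else:
        return 0
```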
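The double loop above runs in pure Python over all n(n-1)/2 pairs. A vectorized NumPy sketch of the same count is shown below; the name `po_similarity` is hypothetical, and the four vectors are the ones used in the example later in this notebook.

```python
import numpy as np

def po_similarity(x, y):
    """Hypothetical vectorized version of Equation 1: number of index pairs
    (i, j) with j < i on which x and y agree in order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x[:, None] - x[None, :]   # dx[i, j] = x_i - x_j
    dy = y[:, None] - y[None, :]   # dy[i, j] = y_i - y_j
    agree = (dx * dy) > 0          # threshold T applied elementwise
    # keep only the strictly lower triangle (j < i), as in the double sum
    return int(np.tril(agree, k=-1).sum())

a = [10, 5, 2, 6, 12, 3]
b = [4, 5, 10, 3, 2, 1]
c = [22, 12, 6, 14, 26, 8]
d = [11, 4, 3, 7, 13, 2]
print(po_similarity(a, b), po_similarity(a, c), po_similarity(a, d))  # 5 15 14
```

Broadcasting builds the full n×n table of differences, so memory grows quadratically with the dimension; for the small vectors here that is not a concern.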

# __Implementation code in Python__

__Import of libraries__

```python
import warnings  # used by WTA() to warn when K exceeds the vector dimension

import numpy as np
```

__Main code__

```python
def WTA(vectors, K, number_of_permutations):
    '''
      Winner Take All hash - Yagnik
      .............................

      vectors: initial vectors, a 2-D array with one vector per row
      K: window size
      number_of_permutations: number of times each vector will be permuted
                              (= length of each vector's hash code)

      Returns:
        C: array of shape (numOfVectors, number_of_permutations) with the WTA codes
        buckets: dict mapping each code (as a string) to the indices of the vectors sharing it
        newVectors: the vectors permuted by the last permutation
    '''
    vectors = np.asarray(vectors)
    numOfVectors, vectorDim = vectors.shape

    if vectorDim < K:
        warnings.warn("Window size greater than vector dimension")
        K = vectorDim

    C = np.zeros([numOfVectors, number_of_permutations], dtype=int)
    newVectors = list(vectors)

    for permutation_index in range(number_of_permutations):
        # A fresh random permutation over ALL dimensions, shared by every vector
        theta = np.random.permutation(vectorDim)
        for v_index in range(numOfVectors):
            X_new = permuted(vectors[v_index], theta)
            newVectors[v_index] = X_new  # only the last permutation is kept
            # Code = position of the maximum among the first K permuted components
            C[v_index][permutation_index] = np.argmax(X_new[:K])

    # Vectors with identical codes fall into the same bucket
    buckets = dict()
    for i, c in enumerate(C):
        buckets = bucketInsert(buckets, str(c), i)

    # Keep the input dtype, so float vectors are not truncated to integers
    return C, buckets, np.array(newVectors)


def permuted(vector, permutation):
    '''Return the components of vector reordered by permutation.'''
    permuted_vector = [vector[x] for x in permutation]

    return permuted_vector


def bucketInsert(buckets, bucket_id, item):
    '''Append item to bucket bucket_id, creating the bucket if needed.'''
    buckets.setdefault(bucket_id, []).append(item)

    return buckets
```
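Besides exact-match buckets, two hash codes can be compared by counting the positions where they agree (a Hamming similarity). The sketch below is an illustration under that assumption, not part of the original notebook: `wta_codes` and `code_similarity` are hypothetical helpers, `wta_codes` mirrors what `WTA` computes for the codes `C`, and 50 seeded permutations with K = 4 are an arbitrary choice.

```python
import numpy as np

def wta_codes(X, thetas, K):
    """Hypothetical helper: one WTA code per permutation for every row of X."""
    X = np.asarray(X)
    # X[:, theta[:K]] keeps the first K permuted columns of every row
    return np.stack([np.argmax(X[:, theta[:K]], axis=1) for theta in thetas], axis=1)

def code_similarity(C, i, j):
    """Hypothetical helper: number of code positions on which rows i and j agree."""
    return int(np.sum(C[i] == C[j]))

X = np.array([[10, 5, 2, 6, 12, 3],
              [4, 5, 10, 3, 2, 1],
              [22, 12, 6, 14, 26, 8],
              [11, 4, 3, 7, 13, 2]])
rng = np.random.default_rng(42)
M = 50
thetas = [rng.permutation(X.shape[1]) for _ in range(M)]
C = wta_codes(X, thetas, K=4)
print(C.shape)                   # (4, 50)
print(code_similarity(C, 0, 2))  # 50: c is a scaled and offset copy of a
print(code_similarity(C, 0, 3))  # 50: with K = 4 the perturbation in d never changes the window maximum
print(code_similarity(C, 0, 1))  # at most 50
```

Because every code depends only on which component wins inside a window, any transformation that preserves the ordering of the components (such as c = 2a + 2) yields identical codes for every permutation.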

# Examples and Evaluation

## 1st Example (same as in the paper)

An example with 6-dimensional input vectors, 
- K = 4, and 
- θ = (1, 4, 2, 5, 0, 3). 

$X$ in (a) and (b) are unrelated and result in different output codes, 1 and 2 respectively.
$X$ in (c) is a scaled and offset version of (a) (here c = 2a + 2) and results in
the same code as (a). $X$ in (d) has each element perturbed
by ±1, which changes the ranking of the elements,
but the maximum of the first K elements stays the same, again
resulting in the same code.

Note that `WTA()` draws θ at random on every call, and the cells below use K = 2, so the codes they produce need not match the paper's.
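To reproduce the paper's example exactly, the fixed θ = (1, 4, 2, 5, 0, 3) and K = 4 can be applied directly with NumPy indexing, bypassing the random permutation drawn inside `WTA()`. This is a small standalone sketch; the variable names are illustrative only.

```python
import numpy as np

X = np.array([[10, 5, 2, 6, 12, 3],     # (a)
              [4, 5, 10, 3, 2, 1],      # (b)
              [22, 12, 6, 14, 26, 8],   # (c) = 2 * (a) + 2
              [11, 4, 3, 7, 13, 2]])    # (d) = (a) with each element perturbed by ±1
theta = np.array([1, 4, 2, 5, 0, 3])    # the fixed permutation from the paper
K = 4

permuted = X[:, theta]                  # apply theta to every vector
codes = np.argmax(permuted[:, :K], axis=1)
print(permuted[:, :K])
print(codes)                            # [1 2 1 1]
```

The codes 1, 2, 1, 1 match the description above: (a), (c) and (d) share a code, while (b) gets a different one.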


```python
a = [10,5,2,6,12,3]
b = [4,5,10,3,2,1]
c = [22,12,6,14,26,8]
d = [11,4,3,7,13,2]

vectors = np.array([a,b,c,d])
vectors
```

```
array([[10,  5,  2,  6, 12,  3],
       [ 4,  5, 10,  3,  2,  1],
       [22, 12,  6, 14, 26,  8],
       [11,  4,  3,  7, 13,  2]])
```

```python
K = 2
number_of_permutations = 1

C, buckets, newVectors = WTA(vectors, K, number_of_permutations)
```

```python
C
```

```
array([[1],
       [0],
       [1],
       [1]])
```

```python
buckets
```

```
{'[1]': [0, 2, 3], '[2]': [1]}
```

```python
newVectors
```

```
array([[10, 12,  5,  2,  6,  3],
       [ 4,  2,  5, 10,  3,  1],
       [22, 26, 12,  6, 14,  8],
       [11, 13,  4,  3,  7,  2]], dtype=int64)
```

### Pairwise order agreement (Equation 1) between a and b, c, d

```python
print("WTA_similarity(a,b) = ",WTA_similarity(a,b))
print("WTA_similarity(a,c) = ",WTA_similarity(a,c))
print("WTA_similarity(a,d) = ",WTA_similarity(a,d))
```

```
WTA_similarity(a,b) =  5
WTA_similarity(a,c) =  15
WTA_similarity(a,d) =  14
```


### Kendall tau similarity

```python
from scipy.stats import kendalltau
```

Kendall tau similarity of the initial vectors ($X$)

```python
similarity_prob, p_value = kendalltau(a,b)
print("kendalltau(a,b) = ", similarity_prob)
similarity_prob, p_value = kendalltau(a,c)
print("kendalltau(a,c) = ", similarity_prob)
similarity_prob, p_value = kendalltau(a,d)
print("kendalltau(a,d) = ", similarity_prob)
```

```
kendalltau(a,b) =  -0.3333333333333333
kendalltau(a,c) =  0.9999999999999999
kendalltau(a,d) =  0.8666666666666666
```
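These values follow directly from the PO counts above. When a vector pair has no ties, each of the n(n-1)/2 index pairs is either concordant or discordant, and PO counts the concordant ones, so Kendall's tau = (concordant - discordant) / (n(n-1)/2) = 4·PO / (n(n-1)) - 1. With n = 6 this gives 4·5/30 - 1 = -1/3, 4·15/30 - 1 = 1 and 4·14/30 - 1 = 13/15 ≈ 0.8667. The sketch below checks this; `po_count` and `tau_from_po` are hypothetical names, and the formula assumes no ties (scipy's `kendalltau` computes tau-b by default, which coincides with it in that case).

```python
def po_count(x, y):
    """Equation 1: number of pairs (i, j), j < i, on which x and y agree in order."""
    n = len(x)
    return sum(1 for i in range(n) for j in range(i)
               if (x[i] - x[j]) * (y[i] - y[j]) > 0)

def tau_from_po(x, y):
    """Kendall's tau derived from PO; valid only when x and y contain no ties."""
    n = len(x)
    pairs = n * (n - 1) / 2
    concordant = po_count(x, y)
    discordant = pairs - concordant   # no ties: every other pair is discordant
    return (concordant - discordant) / pairs

a = [10, 5, 2, 6, 12, 3]
b = [4, 5, 10, 3, 2, 1]
c = [22, 12, 6, 14, 26, 8]
d = [11, 4, 3, 7, 13, 2]
for name, v in (("b", b), ("c", c), ("d", d)):
    print(f"tau(a,{name}) =", tau_from_po(a, v))
```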


Kendall tau similarity of the permuted vectors ($X'$), i.e. the `newVectors` returned by `WTA`

```python
similarity_prob, p_value = kendalltau(newVectors[0],newVectors[1])
print("a,b similarity: ", similarity_prob)
similarity_prob, p_value = kendalltau(newVectors[0],newVectors[2])
print("a,c similarity: ", similarity_prob)
similarity_prob, p_value = kendalltau(newVectors[0],newVectors[3])
print("a,d similarity: ", similarity_prob)
```

```
a,b similarity:  -0.3333333333333333
a,c similarity:  0.9999999999999999
a,d similarity:  0.8666666666666666
```
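The permuted vectors give exactly the same Kendall tau values as the originals. This is expected: tau depends only on which index pairs are concordant or discordant, and applying the same permutation θ to both vectors merely relabels the pairs without changing their counts. The sketch below checks this with a small pure-Python tau-a; `kendall_tau_a` is a hypothetical helper that assumes tie-free vectors.

```python
import random

def kendall_tau_a(x, y):
    """Hypothetical helper: tau-a = (concordant - discordant) / number of pairs.
    Assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)

a = [10, 5, 2, 6, 12, 3]
b = [4, 5, 10, 3, 2, 1]
theta = list(range(len(a)))
random.shuffle(theta)                     # one random permutation theta
a_p = [a[k] for k in theta]               # the same theta applied to both vectors
b_p = [b[k] for k in theta]
print(kendall_tau_a(a, b), kendall_tau_a(a_p, b_p))  # the two values are equal
```

This is also why the comparison above cannot show any effect of the permutation itself: the information WTA extracts lies in the codes `C`, not in the permuted vectors.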

