<div align="center">
 <img src="http://www.di.uoa.gr/themes/corporate_lite/logo_en.png" title="Department of Informatics and Telecommunications - University of Athens" align="center" /> 
</div>

<br>

---

<div align="center"> 
  <font size="3"><b>Machine Learning</b> </font>
</div>
<br>
<div align="center"> 
  <font size="6">
      <b>Winner Takes All Hash<br></b>
    </font>
    <br>
    <font size="4">
        A brief study of the Winner Takes All hash algorithm, inspired by the work of Jay Yagnik et al.
    </font>
    <br>
    <br>
    <font size="3">
        Original paper: 
        <a href="http://www.cs.toronto.edu/~dross/YagnikStrelowRossLin_ICCV2011.pdf">
            The Power of Comparative Reasoning <br>Jay Yagnik, Dennis Strelow, David A. Ross, Ruei-sung Lin - 2011
        </a>
    </font>
</div>

---

<div align="center"> 
    <font size="4">
         <b>Konstantinos Nikoletos, BS Student</b>
     </font>
</div>
<div align="center"> 
    <font size="2">Athens 2021</font>
</div>


---

# WTA: Basic idea

## Rank Correlation Spaces

This algorithm exploits the rank order of the given embeddings, which must be rank ordered (partially or fully). What do we mean by a rank-ordered embedding or vector? Let's take an example. Assume $Y$ is our initial vector.
$$
Y = [4, -1.5, 2.1, 5.5, 0]
$$
Then the rank-ordered vector of $Y$, in which every value is replaced by its 0-based rank, will be:
$$
Y_{rankedOrdered} = [3, 0, 2, 4, 1]
$$
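
A rank-ordered vector like this one can be computed with a double `argsort`. The snippet below is a minimal sketch that assumes NumPy and a vector without tied values; the helper name `rank_order` is introduced here for illustration only.

```python
import numpy as np

def rank_order(vector):
    """Return the 0-based rank of every component (assumes no ties)."""
    order = np.argsort(vector)   # indexes that would sort the vector
    return np.argsort(order)     # position of each component in the sorted order

Y = np.array([4, -1.5, 2.1, 5.5, 0])
print(rank_order(Y))             # [3 0 2 4 1]
```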

And what's a __rank correlation__?

Rank correlation is any of several statistics that measure an ordinal association, that is, the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels 1st, 2nd, 3rd, etc. to different observations of a particular variable. A __rank correlation coefficient__ measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them. Rank correlation measures are known for their stability to perturbations in numeric values while giving a good indication of the inherent similarity or agreement between the items or vectors being considered.


__Partial order rankings__

There is no need for the embeddings or vectors to be full rankings. For example, the rank-ordered $Y$ above is a full ranking, as it contains every index from 0 to 4. A partial ranking is missing some of those indexes.

## Goal

The goal of the WTA algorithm is a feature space transformation whose result is sensitive not to the absolute values of the feature dimensions but to the implicit ordering defined by those values. In effect, the similarity between two points is defined by the degree to which their feature dimension rankings agree.
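
To make the difference between absolute values and ordering concrete, here is a small sketch. It assumes NumPy and uses a hypothetical vector `x` together with a scaled and offset copy of it; `rank_order` is an illustrative helper, not part of the original algorithm.

```python
import numpy as np

def rank_order(vector):
    """0-based rank of every component (assumes no ties)."""
    return np.argsort(np.argsort(vector))

x = np.array([10, 5, 2, 6, 12, 3])
x_scaled = 2 * x + 2                 # [22 12  6 14 26  8]

# The absolute values differ, so a distance such as the Euclidean one is non-zero...
euclidean = np.linalg.norm(x - x_scaled)
# ...but the implicit ordering of the dimensions is identical.
same_ranking = np.array_equal(rank_order(x), rank_order(x_scaled))

print(euclidean > 0)    # True
print(same_ranking)     # True
```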

## The algorithm's main functionality

The algorithm takes as input a set of vectors (embeddings) and, for each vector:
- Permutes it with a random permutation (the same permutation is used for every vector)
- Takes the first K components of the permuted vector
- Finds and outputs the index of the maximum value among those components
- Repeats for every vector in the set

The index of the maximum is the hash code. 
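
The steps above can be walked through for a single vector as follows. This is only a sketch under stated assumptions: the vector `x`, the window size `K = 3` and the seed are hypothetical values chosen for illustration, and NumPy's `default_rng` is used as the source of randomness.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded only to make the sketch repeatable

x = np.array([0.3, 1.7, -0.4, 2.2, 0.9, 1.1])   # hypothetical embedding
K = 3                                            # hypothetical window size

theta = rng.permutation(len(x))   # 1. random permutation of the dimensions
x_permuted = x[theta]             #    apply it to the vector
window = x_permuted[:K]           # 2. keep the first K components
code = int(np.argmax(window))     # 3. the index of the maximum is the hash code

print("theta :", theta)
print("window:", window)
print("code  :", code)
```

The same `theta` would be reused for every other vector in the set, so that equal codes can be compared across vectors.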


### A simple pairwise-order measure

The main point of the algorithm is a feature space transformation that exploits the ordering of the vector components as much as possible. As an example, Yagnik et al. propose the following pairwise-order measure:

__Equation 1__:
$$
    PO(X,Y) = \sum_{i} \sum_{j<i}  T((x_i - x_j)(y_i - y_j)) 
$$

where:
- $x_i$ and $y_i$ are the i-th feature dimensions of $X$ and $Y$, respectively
- $X, Y$ are rank-ordered vectors
- $T$ is a simple threshold function: $T(x) = 1$ if $x > 0$, and $0$ otherwise

Equation 1 simply measures the number of pairs of feature dimensions in X and Y that agree in ordering.



The PO function above and its threshold function, in Python: 

In [1]:
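def WTA_similarity(vector1, vector2):
    """Count the index pairs (i, j), j < i, on which both vectors agree in ordering (Equation 1)."""
    PO = 0
    for i in range(len(vector1)):
        for j in range(i):
            ij_1 = vector1[i] - vector1[j]
            ij_2 = vector2[i] - vector2[j]
            # The product is positive only when both differences have the same sign
            PO += WTA_Threshold(ij_1 * ij_2)

    return PO


def WTA_Threshold(x):
    """Threshold function T: 1 for a positive argument, 0 otherwise."""
    return 1 if x > 0 else 0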

### Comparison with MinHash

WTA is a generalization of the well-known MinHash and should enjoy the theoretical benefits of LSH schemes. MinHash is the special case of WTA applied to binary features.
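
The correspondence can be checked numerically. The sketch below assumes a binary vector with at least one non-zero entry, a window K equal to the full dimension, and ties resolved in favor of the first index (which is what `np.argmax` does). Under those assumptions the WTA code is the position of the first 1 in the permuted vector, which is a min-hash of the set of non-zero dimensions. The function names are illustrative.

```python
import numpy as np

def wta_code(x, theta, K):
    """Index of the largest of the first K permuted components (first index wins ties)."""
    window = np.asarray(x)[theta[:K]]
    return int(np.argmax(window))

def minhash_code(x, theta):
    """Smallest permuted position occupied by a non-zero component of the binary vector x."""
    position = np.argsort(theta)      # position[e] = where dimension e lands under theta
    members = np.flatnonzero(x)       # dimensions that belong to the set
    return int(position[members].min())

rng = np.random.default_rng(0)
x = np.array([0, 1, 0, 0, 1, 1, 0, 0])   # hypothetical binary feature vector

agree = all(
    wta_code(x, theta, K=len(x)) == minhash_code(x, theta)
    for theta in (rng.permutation(len(x)) for _ in range(100))
)
print(agree)   # True
```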


## __Implementation code in Python__

__Import of libraries__

In [1]:
import numpy as np

__Main code__

In [26]:
'''
@author: Konstantinos Nikoletos, 2021
'''
import warnings

def WTA(vectors, K, numOfPermutations):
    '''
      Winner Take All hash - Yagnik
      .............................
      
      vectors: initial vectors (2-D NumPy array, one vector per row)
      K: window size
      numOfPermutations: number of times each vector will be permuted  
    '''
    
    newVectors = []
    buckets = dict()
    numOfVectors = vectors.shape[0]
    vectorDim    = vectors.shape[1]

    if vectorDim < K:
        K = vectorDim
        warnings.warn("Window size greater than vector dimension")

    # C[p][v] holds the code of vector v under permutation p
    C = np.zeros((numOfPermutations, numOfVectors), dtype=int)
    for permutation_index in range(numOfPermutations):
        
        # randomization is without replacement and has to be consistent 
        # across all samples and hence the notion of permutations
        theta = np.random.permutation(vectorDim)

        for v_index in range(numOfVectors):
            X_new = permuted(vectors[v_index], theta)
            # newVectors keeps only the most recent permutation of each vector
            if permutation_index == 0:
                newVectors.append(X_new)
            else:
                newVectors[v_index] = X_new

            # Even permutations: index of the maximum of the first K components.
            # Odd permutations: index of the minimum of the last K components
            # (a variation on the max-only algorithm described above).
            if permutation_index % 2 == 0: 
                C[permutation_index][v_index] = np.argmax(X_new[:K])
            else:
                C[permutation_index][v_index] = np.argmin(X_new[-K:])
                
            # The bucket key joins the permutation index and the vector index
            bucketInsert(buckets, str(permutation_index) + str(v_index), v_index)

    return C, buckets, np.array(newVectors, dtype=np.intp)


def permuted(vector,permutation):
    permuted_vector = [vector[x] for x in permutation]
    return permuted_vector


def bucketInsert(buckets, hashCode, value):
    if hashCode not in buckets.keys():
        buckets[hashCode] = set()
    buckets[hashCode].add(value)
    return buckets

## Example and Remarks

### Example (same as in the paper)

An example with 6-dimensional input vectors, 
- K = 4, and 
- θ = (1, 4, 2, 5, 0, 3). 

X in (a) and (b) are unrelated and result in different output codes, 1 and 2 respectively.
X in (c) is a scaled and offset version of (a) and results in
the same code as (a). X in (d) has each element perturbed
by 1 which results in a different ranking of the elements,
but the maximum of the first K elements is the same, again
resulting in the same code.
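
The codes quoted above can be reproduced with the fixed permutation θ instead of a random one. The sketch below assumes NumPy and reads θ as `permuted[i] = x[θ[i]]`, which is the convention of the `permuted` helper in this notebook.

```python
import numpy as np

theta = np.array([1, 4, 2, 5, 0, 3])   # the fixed permutation of the example
K = 4

X = np.array([
    [10,  5,  2,  6, 12,  3],   # (a)
    [ 4,  5, 10,  3,  2,  1],   # (b) unrelated to (a)
    [22, 12,  6, 14, 26,  8],   # (c) scaled and offset version of (a)
    [11,  4,  3,  7, 13,  2],   # (d) each element of (a) perturbed by 1
])

windows = X[:, theta[:K]]            # first K components of every permuted vector
codes = np.argmax(windows, axis=1)   # one code per vector
print(codes)                         # [1 2 1 1]
```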


In [18]:
a = [10,5,2,6,12,3]
b = [4,5,10,3,2,1]
c = [22,12,6,14,26,8]
d = [11,4,3,7,13,2]

vectors = np.array([a,b,c,d])
vectors

array([[10,  5,  2,  6, 12,  3],
       [ 4,  5, 10,  3,  2,  1],
       [22, 12,  6, 14, 26,  8],
       [11,  4,  3,  7, 13,  2]])

In [27]:
K = 4
number_of_permutations = 2

C, buckets, newVectors = WTA(vectors, K, number_of_permutations)

In [28]:
C

array([[1, 0, 1, 1],
       [0, 3, 0, 0]])
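
One simple way to turn such a code matrix into a similarity score is to count how often two vectors received the same code. The sketch below is an assumption-labeled illustration, not part of the implementation above: it copies the matrix `C` printed in this cell, and `code_agreement` is a hypothetical helper name.

```python
import numpy as np

# Code matrix copied from the output above: one row per permutation,
# one column per vector (a, b, c, d).
C = np.array([[1, 0, 1, 1],
              [0, 3, 0, 0]])

def code_agreement(C, u, v):
    """Fraction of permutations on which vectors u and v received the same code."""
    return float(np.mean(C[:, u] == C[:, v]))

print(code_agreement(C, 0, 1))   # 0.0  (a vs b)
print(code_agreement(C, 0, 2))   # 1.0  (a vs c)
print(code_agreement(C, 0, 3))   # 1.0  (a vs d)
```

With only two permutations the estimate is very coarse; more permutations give a finer-grained score.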

In [25]:
buckets

{'00': {0},
 '01': {1},
 '02': {2},
 '03': {3},
 '10': {0},
 '11': {1},
 '12': {2},
 '13': {3}}
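
In the output above every bucket holds a single vector, because the key is built from the permutation index and the vector index. If the aim is to group vectors that share a hash code, one possible alternative is to key the buckets by the pair (permutation index, code). The sketch below assumes the code matrix `C` printed earlier; `buckets_by_code` is a name introduced here for illustration.

```python
from collections import defaultdict

import numpy as np

# Code matrix copied from the output above (rows: permutations, columns: vectors)
C = np.array([[1, 0, 1, 1],
              [0, 3, 0, 0]])

buckets_by_code = defaultdict(set)
for perm_idx, row in enumerate(C):
    for vec_idx, code in enumerate(row):
        # Vectors with the same code under the same permutation share a bucket
        buckets_by_code[(perm_idx, int(code))].add(vec_idx)

print(dict(buckets_by_code))
# {(0, 1): {0, 2, 3}, (0, 0): {1}, (1, 0): {0, 2, 3}, (1, 3): {1}}
```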

In [8]:
newVectors

array([[ 6, 12,  2, 10,  3,  5],
       [ 3,  2, 10,  4,  1,  5],
       [14, 26,  6, 22,  8, 12],
       [ 7, 13,  3, 11,  2,  4]], dtype=int64)

### Pairwise agreement of a,b,c,d

In [9]:
print("Similarity on X: ")
print("WTA_similarity(a,b) = ",WTA_similarity(a,b))
print("WTA_similarity(a,c) = ",WTA_similarity(a,c))
print("WTA_similarity(a,d) = ",WTA_similarity(a,d))
print("\n\n")
print("Similarity on X': ")
print("WTA_similarity(a,b) = ",WTA_similarity(newVectors[0],newVectors[1]))
print("WTA_similarity(a,c) = ",WTA_similarity(newVectors[0],newVectors[2]))
print("WTA_similarity(a,d) = ",WTA_similarity(newVectors[0],newVectors[3]))
print("\n\n")
print("Similarity on X'[:K]: ")
print("WTA_similarity(a,b) = ",WTA_similarity(newVectors[0][:K],newVectors[1][:K]))
print("WTA_similarity(a,c) = ",WTA_similarity(newVectors[0][:K],newVectors[2][:K]))
print("WTA_similarity(a,d) = ",WTA_similarity(newVectors[0][:K],newVectors[3][:K]))

Similarity on X: 
WTA_similarity(a,b) =  5
WTA_similarity(a,c) =  15
WTA_similarity(a,d) =  14



Similarity on X': 
WTA_similarity(a,b) =  5
WTA_similarity(a,c) =  15
WTA_similarity(a,d) =  14



Similarity on X'[:K]: 
WTA_similarity(a,b) =  2
WTA_similarity(a,c) =  6
WTA_similarity(a,d) =  5


### Kendall tau similarity

In [10]:
from scipy.stats import kendalltau

Similarity based on Kendall's tau for the initial vectors ($X$)

In [11]:
similarity_prob, p_value = kendalltau(a,b)
print("kendalltau(a,b) = ", similarity_prob)
similarity_prob, p_value = kendalltau(a,c)
print("kendalltau(a,c) = ", similarity_prob)
similarity_prob, p_value = kendalltau(a,d)
print("kendalltau(a,d) = ", similarity_prob)

kendalltau(a,b) =  -0.3333333333333333
kendalltau(a,c) =  0.9999999999999999
kendalltau(a,d) =  0.8666666666666666
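
These values are consistent with the pairwise-order counts reported earlier. Assuming vectors of length $n$ without tied values (which holds for a, b, c and d), every pair of dimensions is either concordant or discordant, so with $PO$ concordant pairs out of $n(n-1)/2$:

$$
\tau = \frac{PO - \left(\frac{n(n-1)}{2} - PO\right)}{\frac{n(n-1)}{2}} = \frac{4 \, PO}{n(n-1)} - 1
$$

For $n = 6$ this gives:

$$
PO = 5 \Rightarrow \tau = \tfrac{20}{30} - 1 = -\tfrac{1}{3}, \qquad
PO = 15 \Rightarrow \tau = \tfrac{60}{30} - 1 = 1, \qquad
PO = 14 \Rightarrow \tau = \tfrac{56}{30} - 1 = \tfrac{13}{15} \approx 0.867
$$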


Similarity based on Kendall's tau for the permuted vectors ($X'$)

In [12]:
similarity_prob, p_value = kendalltau(newVectors[0],newVectors[1])
print("a,b similarity: ", similarity_prob)
similarity_prob, p_value = kendalltau(newVectors[0],newVectors[2])
print("a,c similarity: ", similarity_prob)
similarity_prob, p_value = kendalltau(newVectors[0],newVectors[3])
print("a,d similarity: ", similarity_prob)

a,b similarity:  -0.3333333333333333
a,c similarity:  0.9999999999999999
a,d similarity:  0.8666666666666666

Similarity based on Kendall's tau for the first K components of the permuted vectors ($X'[:K]$)


In [13]:
similarity_prob, p_value = kendalltau(newVectors[0][:K],newVectors[1][:K])
print("a,b similarity: ", similarity_prob)
similarity_prob, p_value = kendalltau(newVectors[0][:K],newVectors[2][:K])
print("a,c similarity: ", similarity_prob)
similarity_prob, p_value = kendalltau(newVectors[0][:K],newVectors[3][:K])
print("a,d similarity: ", similarity_prob)

a,b similarity:  -0.3333333333333334
a,c similarity:  1.0
a,d similarity:  0.6666666666666669


# m-WTA: Not one Winner but m-Winners

Instead of taking only the index of the largest value, I propose taking the indexes of the m largest values as a set. The bucket hash codes are then sets of indexes, which are not affected by the relative order of the winners within the set. For m=1 this reduces to the original WTA algorithm.


In [97]:
import warnings

class mWTA:
    
    def __init__(self, K, number_of_permutations, m):
        
        self.K = K
        self.number_of_permutations = number_of_permutations
        self.m = m
        
    
    def fit(self, vectors):
        '''
          m-Winner Take All hash, based on the WTA hash of Yagnik et al.
          .............................

          vectors: initial vectors (2-D NumPy array, one vector per row)

          Uses the K (window size), number_of_permutations and m (number of
          winners) given to the constructor.
        '''

        newVectors = []
        buckets = dict()

        numOfVectors = vectors.shape[0]
        vectorDim    = vectors.shape[1]

        if vectorDim < self.K:
            self.K = vectorDim
            warnings.warn("Window size greater than vector dimension")

        # C[v][p] holds the list of m winner indexes of vector v under permutation p
        C = np.empty([numOfVectors, self.number_of_permutations], dtype=object)

        for permutation_index in range(self.number_of_permutations):

            # randomization is without replacement and has to be consistent 
            # across all samples and hence the notion of permutations
            theta = np.random.permutation(vectorDim) 

            for v_index in range(numOfVectors):
                X_new = self.permuted(vectors[v_index], theta)
                # newVectors keeps only the most recent permutation of each vector
                if permutation_index == 0:
                    newVectors.append(X_new)
                else:
                    newVectors[v_index] = X_new

                # Indexes of the m largest values among the first K components
                C[v_index][permutation_index] = np.argsort(X_new[:self.K])[-self.m:].tolist()

        # Buckets are keyed by the winner set of the first permutation only
        for c, i in zip(C, range(numOfVectors)):
            buckets = self.bucketInsert(buckets, frozenset(c[0]), i)

        return C, buckets, np.array(newVectors, dtype=np.intp)

    def permuted(self, vector, permutation):
        permuted_vector = [vector[x] for x in permutation]

        return permuted_vector

    def bucketInsert(self, buckets, bucket_id, item):
        if bucket_id not in buckets.keys():
            buckets[bucket_id] = []

        buckets[bucket_id].append(item)

        return buckets

In [98]:
K = 4
number_of_permutations = 1
m = 2

mWta = mWTA(K, number_of_permutations, m)
C,buckets,newVectors = mWta.fit(vectors)

In [95]:
C

array([[list([1, 0])],
       [list([2, 3])],
       [list([1, 0])],
       [list([1, 0])]], dtype=object)
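
Because the winners are stored as a set, two vectors land in the same bucket whenever they share the same m winners, whatever the order of those winners. A minimal sketch, assuming the winner lists printed above (`winners` and `set_buckets` are illustrative names):

```python
# Winner lists copied from the output above, one per vector (a, b, c, d)
winners = [[1, 0], [2, 3], [1, 0], [1, 0]]

set_buckets = {}
for vec_idx, winner_list in enumerate(winners):
    # frozenset is hashable and ignores the order of the winners
    set_buckets.setdefault(frozenset(winner_list), []).append(vec_idx)

print(frozenset([1, 0]) == frozenset([0, 1]))   # True
print(set_buckets[frozenset({0, 1})])           # [0, 2, 3]
print(set_buckets[frozenset({2, 3})])           # [1]
```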

# MinHash

# Overall Remarks


## Advantages

1. A very fast hashing method, which makes it a candidate for tasks such as classification.
2. No training is needed

## Disadvantages

1. Applies only to rank-ordered vectors


# References

1.   Jay Yagnik, Dennis Strelow, David A. Ross, Ruei-sung Lin. [The Power of Comparative Reasoning](http://www.cs.toronto.edu/~dross/YagnikStrelowRossLin_ICCV2011.pdf), 2011.
2.   [Rank correlation](https://en.wikipedia.org/wiki/Rank_correlation)