In [1]:
import random

## Algorithm Design 2019-20 @ Computer Science - Università di Pisa

### Scribes: Chiara Boni, Eleonora Di Gregorio 
### Lecturer: Roberto Grossi 

# Hashing

## Multiplicative Universal Hashing

### Definitions and goals


Recall that a hash function $h: U \rightarrow [m]$, chosen at random from a family, maps a universe $U$ of keys to the set of hash values $[m] = \{0,\dots,m-1\}$. It is *universal* if, for any two distinct keys $x,y \in U$, the probability of a collision is low: <br>
$\Pr_{h}[h(x) = h(y)] \le 1/m$. <br>
$h$ is called $c$-universal when, for some constant $c \ge 1$ (that is, $c = O(1)$), $\Pr_{h}[h(x) = h(y)] \le c/m$. <p>
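To make the definition concrete, here is a small sketch (the helpers `collision_probability` and `all_functions` are hypothetical names, not part of the lecture) that computes the exact collision probability when $h$ is drawn uniformly from a finite family. It is applied to the family of *all* functions from a three-key universe to $[2]$, which is universal with collision probability exactly $1/m$.

```python
from fractions import Fraction
from itertools import product


def collision_probability(family, x, y):
    """Exact Pr_h[h(x) = h(y)] when h is drawn uniformly from a finite family."""
    family = list(family)
    hits = sum(1 for h in family if h(x) == h(y))
    return Fraction(hits, len(family))


def all_functions(universe, m):
    """Every function from `universe` to [m], each one backed by a lookup table."""
    for values in product(range(m), repeat=len(universe)):
        table = dict(zip(universe, values))
        yield lambda key, table=table: table[key]


U = [0, 1, 2]
m = 2
worst = max(collision_probability(all_functions(U, m), x, y)
            for x in U for y in U if x != y)
print(worst)  # 1/2, i.e. exactly 1/m
```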

### Designing multiplicative universal hashing

This scheme, called multiply-shift, was proposed by Dietzfelbinger and maps $w$-bit integers to $l$-bit integers.<br>

In [2]:
w = 64
l = 12
print("h:["+str(pow(2, w))+"]->["+str(pow(2, l))+"]")

h:[18446744073709551616]->[4096]


Pick an odd $w$-bit integer $a$ uniformly at random and define $h_{a}: [2^{w}] \rightarrow [2^{l}]$ as $h_{a}(x) = \lfloor (ax \bmod 2^{w}) / 2^{w-l} \rfloor$.<br>
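A minimal sketch of the formula, with toy parameters $w = 8$ and $l = 3$ chosen only so the arithmetic can be checked by hand; `multiply_shift` and `a_toy` are hypothetical names, not part of the notebook. Floor division `//` keeps everything in exact integer arithmetic.

```python
def multiply_shift(a, x, w, l):
    """h_a(x) = floor((a*x mod 2^w) / 2^(w-l)), for odd a and 0 <= x < 2^w."""
    assert a % 2 == 1, "the multiplier a must be odd"
    return ((a * x) % 2**w) // 2**(w - l)


# toy parameters: 8-bit keys hashed to 3-bit values
a_toy = 0b10110101  # 181, an odd 8-bit multiplier
print(multiply_shift(a_toy, 5, w=8, l=3))  # (905 mod 256) // 32 = 137 // 32 = 4
print(multiply_shift(a_toy, 7, w=8, l=3))  # (1267 mod 256) // 32 = 243 // 32 = 7
```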

In [3]:
def getOdd(m):  # return a uniformly random odd value in range [0, m-1] (m even)
    return random.randrange(1, m, 2)
a = getOdd(pow(2, w))
print("a="+str(a))

a=8641261826262442449


In this scheme, numbers are viewed as bit strings with the least significant bit on the right, so integer division by a power of two is simply a right shift: dividing by $2^{w-l}$ shifts right by $w-l$ positions. <br>
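The sketch below (the helper `high_bits` is a hypothetical name) checks that the right shift agrees with integer floor division, and shows why an expression like `int(v / 2**k)` is risky for 64-bit values: true division produces a 53-bit float, which can round up and even yield a value outside $[2^{l}]$.

```python
def high_bits(v, w, l):
    """Top l bits of the w-bit value v, i.e. floor(v / 2^(w-l)), as a right shift."""
    return v >> (w - l)


v = 2**64 - 1  # the largest 64-bit value, all bits set
# Integer floor division and the right shift agree exactly:
print(v // 2**52, high_bits(v, 64, 12))  # 4095 4095
# True division goes through a float with a 53-bit mantissa and rounds up:
print(int(v / 2**52))  # 4096, which is outside [2^12]
```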

In [4]:
x = 42
ax = x * a
h_x = (ax % pow(2, w)) >> (w - l)  # exact integer version of (ax mod 2^w) / 2^(w-l)
print("h(x)=" + str(h_x) + "=" + bin(h_x)[2:])

h(x)=2763=101011001011


In other words, $h_{a}(x)$ consists of bits $w-l,\dots,w-1$ of the product $ax$, as illustrated: <br>
<img src="./files/bitVector.png" /> <p>

In [5]:
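bits = bin(ax)[2:].zfill(w)  # pad on the left so bit w-1 exists even if ax < 2^(w-1)
shift = bits[-w:-(w - l)]    # bits w-1 down to w-l of ax
print(shift)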

101011001011


*Claim* <br>
Multiply-shift is 2-universal: for any $x \ne y$, <br>
$$\Pr_{a \in [2^{w}],\ a \text{ odd}}[h_{a}(x) = h_{a}(y)] \le \frac{2}{2^{l}} = \frac{2}{m}$$ <p>
    
*Proof* <br>
A collision $h_{a}(x) = h_{a}(y)$ occurs only if $ax$ and $ay = ax + a(y-x)$ agree on bits $w-l,\dots,w-1$ (all arithmetic is mod $2^{w}$). <br>
This requires bits $w-l,\dots,w-1$ of $a(y-x)$ to be either all 0s or all 1s. <br>
Indeed, when $a(y-x)$ is added to $ax$, two cases arise: if no carry enters bit $w-l$ from the lower bits, bits $w-l,\dots,w-1$ of $ax$ stay unchanged only when those bits of $a(y-x)$ are all 0s; if a carry enters, they stay unchanged only when those bits of $a(y-x)$ are all 1s, so that adding them plus the carry wraps around to the same value.<br>
Therefore, to prove the claim it suffices to show that the probability that bits $w-l,\dots,w-1$ of $a(y-x)$ are all 0s or all 1s is at most $2/2^{l}$. <p>
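The reduction only states that a collision *implies* the all-0s/all-1s pattern. A brute-force sketch can confirm this for tiny word sizes ($w = 6$, $l = 2$ and $w = 5$, $l = 3$ are assumptions chosen so the loops finish instantly; the helper names are hypothetical). It ranges over every odd $a$ and every pair $x < y$, so $y - x$ is positive.

```python
def window(v, w, l):
    """Bits w-l, ..., w-1 of v (the ones multiply-shift keeps), as an integer in [2^l]."""
    return (v % 2**w) >> (w - l)


def reduction_holds(w, l):
    """Exhaustively check: a collision forces bits w-l..w-1 of a(y-x) to be all 0s or all 1s."""
    all_ones = 2**l - 1
    for a in range(1, 2**w, 2):          # every odd w-bit multiplier
        for x in range(2**w):
            for y in range(x + 1, 2**w):
                if window(a * x, w, l) == window(a * y, w, l):
                    if window(a * (y - x), w, l) not in (0, all_ones):
                        return False
    return True


print(reduction_holds(w=6, l=2))  # True
```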

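*Fact 1.* If $\alpha$ is odd and $\beta \in [2^{q}] \setminus \{0\}$, then $\alpha\beta \not\equiv 0 \pmod{2^{q}}$. <br>
This holds because any odd number is relatively prime to any power of two: if $2^{q}$ divided $\alpha\beta$, it would have to divide $\beta$, which is impossible for $0 < \beta < 2^{q}$.<p>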
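Fact 1 can be spot-checked numerically. The sketch below (hypothetical helper `fact_holds`; the bound `max_alpha` on the odd multipliers tried is an arbitrary assumption, since $\alpha$ ranges over all odd integers) tests every nonzero $\beta \in [2^{q}]$ for small $q$, and shows that oddness is essential.

```python
def fact_holds(q, max_alpha=200):
    """True if alpha*beta is never 0 mod 2^q for odd alpha < max_alpha and 0 < beta < 2^q."""
    for alpha in range(1, max_alpha, 2):
        for beta in range(1, 2**q):
            if (alpha * beta) % 2**q == 0:
                return False
    return True


print(all(fact_holds(q) for q in range(1, 9)))  # True
# Oddness matters: alpha = 2 and beta = 2^(q-1) give 0 mod 2^q.
print((2 * 2**7) % 2**8)  # 0
```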
    
Write $a = 1 + 2b$; since $a$ is a uniformly random odd number in $[2^{w}]$, $b$ is uniformly distributed in $[2^{w-1}]$. <br>
Let $i$ be the index of the least significant 1-bit of $y-x$, and let $z$ be the odd number satisfying $y-x = z2^{i}$. Then $a(y-x) = (1+2b)z2^{i} = z2^{i} + bz2^{i+1}$.<br>
<img src="./files/claim.png" /> <p>
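A worked instance of this decomposition, as a sketch for a positive difference $y - x$ (the helper `decompose` and the toy values $a = 181$, $x = 3$, $y = 15$ are illustrative assumptions). The lowest set bit is isolated with the two's-complement trick `d & -d`.

```python
def decompose(a, diff):
    """Write a = 1 + 2b and diff = z * 2^i with z odd; return (b, z, i)."""
    assert a % 2 == 1 and diff > 0
    b = (a - 1) // 2
    i = (diff & -diff).bit_length() - 1  # index of the lowest set bit of diff
    z = diff >> i                        # what remains is odd
    return b, z, i


a_toy, x_toy, y_toy = 181, 3, 15         # y - x = 12 = 3 * 2^2
b, z, i = decompose(a_toy, y_toy - x_toy)
print(b, z, i)                                                     # 90 3 2
print(a_toy * (y_toy - x_toy) == z * 2**i + b * z * 2**(i + 1))    # True
```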

It remains to prove that $bz \bmod 2^{w-1}$ is uniformly distributed in $[2^{w-1}]$. <br>
The map $b \mapsto bz \bmod 2^{w-1}$ is a bijection on $[2^{w-1}]$: if there were another $b^{'} \in [2^{w-1}]$, $b^{'} \ne b$, with $b^{'}z \equiv bz \pmod{2^{w-1}}$, then $z(b^{'} - b) \equiv 0 \pmod{2^{w-1}}$ with $0 < |b^{'} - b| < 2^{w-1}$, contradicting Fact 1 since $z$ is odd. <br>
Since $b$ is uniformly distributed, so is $bz \bmod 2^{w-1}$. <br>
Consequently, $a(y-x) = z2^{i} + bz2^{i+1}$ has bits $0,\dots,i-1$ set to $0$, bit $i$ set to $1$, and a uniform distribution on bits $i+1,\dots,i+w-1$. <p>
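Both steps can be checked exhaustively for a toy word size ($w = 8$ and the difference $y - x = 12 = 3 \cdot 2^{2}$ are assumptions; `is_bijection` is a hypothetical helper). The sketch confirms the bijection for odd $z$, its failure for an even $z$, and the bit pattern of $a(y-x)$ over all odd $a$.

```python
from collections import Counter


def is_bijection(z, w):
    """True if b -> b*z mod 2^(w-1) permutes [2^(w-1)]."""
    modulus = 2**(w - 1)
    return sorted((b * z) % modulus for b in range(modulus)) == list(range(modulus))


w_toy = 8
print(all(is_bijection(z, w_toy) for z in range(1, 64, 2)))  # True: every odd z works
print(is_bijection(6, w_toy))                                # False: an even z fails

# y - x = 12 = 3 * 2^2, so i = 2: for every odd a, bits 0..1 of a*12 are 0 and bit 2 is 1.
for a_odd in range(1, 2**w_toy, 2):
    assert (a_odd * 12) % 4 == 0 and ((a_odd * 12) >> 2) & 1 == 1
# Bits 3..7 (the part of i+1..i+w-1 below bit w) take each of their 32 values equally often.
counts = Counter(((a_odd * 12) % 2**w_toy) >> 3 for a_odd in range(1, 2**w_toy, 2))
print(len(counts), set(counts.values()))  # 32 {4}
```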
    
The collision $h_{a}(x) = h_{a}(y)$ happens only if $ax$ and $ay = ax + a(y-x)$ are identical on bits $w-l,\dots,w-1$. <br>
Adding $a(y-x)$ to $ax$ leaves bits $0,\dots,i-1$ unchanged and always flips bit $i$, so the two products differ in bit $i$; hence if $i \ge w-l$, there is no collision, regardless of $a$.<br>
If instead $i < w-l$, then because of carries there can be a collision, but only if bits $w-l,\dots,w-1$ of $a(y-x)$ are all 0s or all 1s. <br>
These bits lie within the uniformly distributed range $i+1,\dots,i+w-1$, so each of the two events happens with probability exactly $1/2^{l}$; by the union bound the collision probability is at most $2/2^{l}$, which completes the proof.
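For tiny word sizes the claim itself can be verified exactly. The sketch below (hypothetical helper `worst_collision_probability`; the parameter pairs are assumptions chosen to keep the brute force fast) takes the maximum over all pairs $x < y$ of the exact collision probability over all odd $a$, and compares it with $2/2^{l}$.

```python
from fractions import Fraction


def worst_collision_probability(w, l):
    """max over x != y of Pr_a[h_a(x) = h_a(y)], with a uniform over the odd numbers in [2^w]."""
    odd_multipliers = range(1, 2**w, 2)
    worst = Fraction(0)
    for x in range(2**w):
        for y in range(x + 1, 2**w):
            hits = sum(1 for a in odd_multipliers
                       if ((a * x) % 2**w) >> (w - l) == ((a * y) % 2**w) >> (w - l))
            worst = max(worst, Fraction(hits, len(odd_multipliers)))
    return worst


for w_toy, l_toy in [(6, 2), (7, 3)]:
    p = worst_collision_probability(w_toy, l_toy)
    print(w_toy, l_toy, p, p <= Fraction(2, 2**l_toy))  # last column: True
```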

### Code

In [6]:
def getOdd(m):  # return a uniformly random odd value in range [0, m-1] (m even)
    return random.randrange(1, m, 2)


class MultiplicativeHashFamily(object):
    def __init__(self, w=64, l=12):
        self.w = w
        self.l = l
        self.a = 0

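    def randomChoose(self):
        # draw a fresh random odd multiplier a in [2^w] (uses self.w, not the global w)
        self.a = a = getOdd(pow(2, self.w))
        # h_a(x) = (a*x mod 2^w) >> (w - l), computed with exact integer arithmetic
        return lambda x: ((a * x) % pow(2, self.w)) >> (self.w - self.l)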

    def __str__(self):
        return "h(x) = (%d*x %% 2^%d) / 2^(%d-%d)" % (self.a, self.w, self.w, self.l)


def buildMultiplicativeHash(S, w=64, l=12):
    H = MultiplicativeHashFamily(w, l)
    h = H.randomChoose()
    print(H)
    for elem in S:
        ax = elem * H.a
        print(str(elem) + "*a=" + bin(ax)[2:])
        # bits w-1 .. w-l of ax, read off the zero-padded bit string
        test = bin(ax)[2:].zfill(H.w)[-H.w:-(H.w - H.l)]
        print("x*a[w-1,w-l]=" + test)
        hx = h(elem)  # avoid shadowing the built-in hash()
        print("h(" + str(elem) + ")=" + str(hx) + "=" + bin(hx)[2:])


# test the multiplicative hash
S = [11, 25, 36, 41, 57, 65, 13, 29, 49]
print("S =", S)
buildMultiplicativeHash(S, 64, 12)

S = [11, 25, 36, 41, 57, 65, 13, 29, 49]
h(x) = (12518956011447531325*x % 2^64) / 2^(64-12)
11*a=1110111011100010110111011011100111000110111110010000101111110011111
x*a[w-1,w-l]=011101110001
h(11)=1905=11101110001
25*a=100001111011101100010101010000001101001111110110001110110010011110101
x*a[w-1,w-l]=111101110110
h(25)=3958=111101110110
36*a=110000110111001111001100101011110100010110110100011111100010010010100
x*a[w-1,w-l]=011011100111
h(36)=1767=11011100111
41*a=110111101001100100110111010101011101011001111111010101101100011000101
x*a[w-1,w-l]=110100110010
h(41)=3378=110100110010
57*a=1001101010111011101011001011010101101100100001000011100100110010010101
x*a[w-1,w-l]=101011101110
h(57)=2798=101011101110
65*a=1011000001110011001101010011101010101101001001101000000000011001111101
x*a[w-1,w-l]=000111001100
h(65)=460=111001100
13*a=10001101001010001111011101100010001001000001111011001100111000011001
x*a[w-1,w-l]=110100101000
h(13)=3368=110100101000
29*a=10011101011100101001110111000110000
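The exhaustive checks above only scale to tiny words. For $w = 64$, a Monte Carlo estimate is a reasonable sketch: draw many odd multipliers and count how often two fixed keys collide. The choice $l = 4$ (so that collisions are frequent enough to measure), the keys 11 and 25 taken from `S`, the fixed seed and the name `estimate_collision_rate` are all illustrative assumptions.

```python
import random


def estimate_collision_rate(x, y, w=64, l=4, trials=20000, seed=1):
    """Fraction of random odd multipliers a for which h_a(x) == h_a(y)."""
    rng = random.Random(seed)                  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        a = 2 * rng.randrange(2**(w - 1)) + 1  # uniform odd a in [2^w]
        if ((a * x) % 2**w) >> (w - l) == ((a * y) % 2**w) >> (w - l):
            hits += 1
    return hits / trials


rate = estimate_collision_rate(11, 25)
print(rate, "bound:", 2 / 2**4)
```

The estimate should stay below the bound $2/2^{l} = 0.125$ up to sampling noise.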

### References

Mikkel Thorup, *High Speed Hashing for Integers and Strings*, CoRR abs/1504.06804, 2015. http://arxiv.org/abs/1504.06804