In [4]:
!pip install bhv==0.3.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bhv==0.3.0
  Downloading bhv-0.3.0-py3-none-any.whl (34 kB)
Installing collected packages: bhv
Successfully installed bhv-0.3.0


In [5]:
from bhv.vanilla import VanillaBHV as BHV, DIMENSION

Let's start with some random vectors:


In [6]:
a, b, c = BHV.nrand(3)

`active` counts the number of on-bits.
For `rand` vectors (sampled with `Bernoulli(0.5)`) we expect this to be around `DIMENSION/2`.

In [7]:
print(a.active())
print(DIMENSION//2)

4102
4096
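As a rough cross-check without `bhv`, here is a minimal sketch that models a hypervector as the bits of a plain Python integer (an assumption for illustration; `bhv` uses its own representation) with `DIMENSION = 8192`, as implied by the printed `DIMENSION//2`. The on-bit count of a `Bernoulli(0.5)` vector is binomial with mean `DIMENSION/2` and standard deviation `sqrt(DIMENSION)/2`, about 45 for 8192 bits, so the 4102 above is well within one standard deviation.

```python
import math
import random

DIMENSION = 8192  # assumed to match the bhv DIMENSION printed above


def rand_vector(rng: random.Random) -> int:
    """Sample a Bernoulli(0.5) hypervector, stored as the bits of an int."""
    return rng.getrandbits(DIMENSION)


def active(x: int) -> int:
    """Count the on-bits (popcount)."""
    return bin(x).count("1")


rng = random.Random(0)
a = rand_vector(rng)
mean = DIMENSION / 2
std = math.sqrt(DIMENSION * 0.5 * 0.5)  # binomial standard deviation
print(active(a), mean, round(std, 1))
```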


We get the number of shared bits (those that are on in both vectors) by counting the active bits in their conjunction.
For independent `rand` vectors we expect this to be around `DIMENSION/4`.

In [8]:
ab = a & b
print(ab.active())
print(DIMENSION//4)

2104
2048
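Continuing the integer-as-bitset sketch (same assumptions: plain Python ints, `DIMENSION = 8192`), the conjunction is a bitwise AND. A bit is on in both of two independent `Bernoulli(0.5)` vectors with probability 1/2 · 1/2 = 1/4, hence the `DIMENSION/4` target.

```python
import random

DIMENSION = 8192  # assumed, as above

rng = random.Random(1)
a = rng.getrandbits(DIMENSION)
b = rng.getrandbits(DIMENSION)

ab = a & b  # bitwise AND: on only where both a and b are on
shared = bin(ab).count("1")
print(shared, DIMENSION // 4)
```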


Using XOR (exclusive or), which is 1 where its inputs differ, we can count the differences between two vectors.
This count is also called the Hamming distance.

In [10]:
diff = ab ^ c
print(diff.active())
print(ab.hamming(c))

3969
3969
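In the integer sketch (plain Python ints, `DIMENSION = 8192` assumed), the Hamming distance is the popcount of the XOR. Note that `ab` is not `rand`-like (about a quarter of its bits are on), yet its expected distance to an independent `Bernoulli(0.5)` vector `c` is still `DIMENSION/2`: a bit differs with probability 1/4 · 1/2 + 3/4 · 1/2 = 1/2.

```python
import random

DIMENSION = 8192  # assumed, as above


def hamming(x: int, y: int) -> int:
    """Number of positions where x and y differ: popcount of XOR."""
    return bin(x ^ y).count("1")


rng = random.Random(2)
a, b, c = (rng.getrandbits(DIMENSION) for _ in range(3))
ab = a & b  # about a quarter of the bits are on
print(hamming(ab, c), DIMENSION // 2)
```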


To avoid calculating in our heads how close this is to `DIMENSION/2`, we can use `bit_error_rate`, which is the normalized Hamming distance.

In [15]:
print(ab.bit_error_rate(c))
print(ab.hamming(c)/DIMENSION)
print((ab ^ c).active_fraction())  # active fraction is x.active()/DIMENSION

0.4844970703125
0.4844970703125
0.4844970703125
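In the integer sketch, the bit error rate is the Hamming distance divided by `DIMENSION` (8192 assumed). For unrelated `Bernoulli(0.5)` vectors its standard deviation is `0.5/sqrt(DIMENSION)`, roughly 0.0055 at 8192 bits, which places the 0.4845 above about 2.8σ below 0.5.

```python
import math

DIMENSION = 8192  # assumed, as above


def bit_error_rate(x: int, y: int) -> float:
    """Normalized Hamming distance: fraction of positions where x and y differ."""
    return bin(x ^ y).count("1") / DIMENSION


# Standard deviation of the bit error rate between unrelated Bernoulli(0.5) vectors
sigma = math.sqrt(0.5 * 0.5 / DIMENSION)
observed = 0.4844970703125  # value printed above
print(round(sigma, 4), round((0.5 - observed) / sigma, 2))
```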


While very useful for `rand`-like vectors, this measure can be misleading when the two vectors have different fractions of on-bits. The cosine distance takes the on-rates of its arguments into account.

In [14]:
print(ab.cosine(c, distance=True))
print(1 - (ab & c).active()/(ab.active()*c.active())**.5)

0.6367207566879403
0.6367207566879403
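To see why the bit error rate can mislead, here is a hedged sketch (same integer-as-bitset assumption, with a hypothetical `sparse_vector` helper that is not part of `bhv`) comparing two *independent* vectors whose bits are on with probability 0.1. Their expected bit error rate is only 2 · 0.1 · 0.9 = 0.18, far below the 0.5 we would read as "unrelated", while their cosine similarity is about 0.1, roughly what chance overlap predicts for that on-rate.

```python
import math
import random

DIMENSION = 8192  # assumed, as above


def sparse_vector(rng: random.Random, p: float) -> int:
    """Hypothetical helper: sample a vector whose bits are independently on with probability p."""
    bits = "".join("1" if rng.random() < p else "0" for _ in range(DIMENSION))
    return int(bits, 2)


def popcount(x: int) -> int:
    return bin(x).count("1")


def cosine(x: int, y: int) -> float:
    """Cosine (Otsuka-Ochiai) similarity of two non-empty bit vectors."""
    return popcount(x & y) / math.sqrt(popcount(x) * popcount(y))


rng = random.Random(3)
x = sparse_vector(rng, 0.1)
y = sparse_vector(rng, 0.1)  # independent of x, so unrelated

ber = popcount(x ^ y) / DIMENSION
print(round(ber, 3), round(cosine(x, y), 3))
```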


The Jaccard distance is a slightly different way to accomplish this: it divides the number of shared active bits by the number of bits active in either vector (their union).

In [16]:
print(ab.jaccard(c, distance=True))
print(1 - (ab & c).active()/(ab | c).active())

0.7909525707453169
0.7909525707453169
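A small sketch of both similarities on hand-picked 4-bit integers (plain Python ints, not `bhv` vectors). Because the union is at least as large as the geometric mean of the two on-bit counts, the Jaccard similarity never exceeds the cosine similarity for bit vectors.

```python
import math


def popcount(x: int) -> int:
    """Number of on-bits."""
    return bin(x).count("1")


def jaccard(x: int, y: int) -> float:
    """Jaccard similarity: shared on-bits over bits on in either vector."""
    union = popcount(x | y)
    # Assumption: two all-zero vectors count as identical (bhv may differ).
    return popcount(x & y) / union if union else 1.0


def cosine(x: int, y: int) -> float:
    """Cosine similarity; assumes both vectors have at least one on-bit."""
    return popcount(x & y) / math.sqrt(popcount(x) * popcount(y))


x, y = 0b1100, 0b1010  # one shared on-bit; three bits on in either
print(jaccard(x, y), cosine(x, y))  # 0.3333333333333333 0.5
```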


Note that for both metrics the complement, the "similarity" (1 minus the distance), is more commonly used than the distance itself.

In [17]:
print(ab.cosine(c))  # also called Otsuka–Ochiai coefficient
print(ab.jaccard(c))  # also called Jaccard Index

0.3632792433120598
0.20904742925468314
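These values can be sanity-checked from the on-rates alone. Assuming `a`, `b`, and `c` are independent `Bernoulli(0.5)` vectors, `ab` has on-rate 1/4, `c` has on-rate 1/2, and a bit is on in both with probability 1/8. The expected similarities below are close to the 0.363 and 0.209 printed above.

```python
import math

p_ab = 0.25  # on-rate of a & b for independent Bernoulli(0.5) a, b
p_c = 0.5    # on-rate of c
p_both = p_ab * p_c  # ab and c are independent

expected_cosine = p_both / math.sqrt(p_ab * p_c)
expected_jaccard = p_both / (p_ab + p_c - p_both)  # intersection over union
print(round(expected_cosine, 3), round(expected_jaccard, 3))  # 0.354 0.2
```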


Getting back to `rand`-distributed vectors, let's compute a vector that is similar to several others using bundling, implemented by [majority](https://en.wikipedia.org/wiki/Majority_function).

In [18]:
mabc = BHV.majority([a, b, c])

print(a.bit_error_rate(mabc), b.bit_error_rate(mabc), c.bit_error_rate(mabc))

0.2452392578125 0.246826171875 0.2490234375
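For three inputs, bitwise majority has a closed form, sketched below on the integer representation (not necessarily how `bhv` implements `majority`, which accepts arbitrary lists). The result disagrees with `a` only where `b` and `c` both disagree with `a`, which happens with probability 1/2 · 1/2 = 1/4, matching the ~0.25 bit error rates above.

```python
import random

DIMENSION = 8192  # assumed, as above


def majority3(x: int, y: int, z: int) -> int:
    """Bitwise majority of three vectors: a bit is on if at least two inputs have it on."""
    return (x & y) | (y & z) | (x & z)


def bit_error_rate(x: int, y: int) -> float:
    return bin(x ^ y).count("1") / DIMENSION


rng = random.Random(4)
a, b, c = (rng.getrandbits(DIMENSION) for _ in range(3))
m = majority3(a, b, c)
print([round(bit_error_rate(v, m), 3) for v in (a, b, c)])
```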


Recall that `a`, `b`, and `c` are pairwise unrelated: they have a `bit_error_rate` of about `0.5`.

In [19]:
for x, y in zip([a, b, c], [b, c, a]):
    assert abs(.5 - x.bit_error_rate(y)) < .02

If you're like me, you totally despise the `.02` right there. Luckily there's a more principled way to talk about this distance in terms of probability: how likely is it to see two vectors this far apart if both were independently sampled from `Bernoulli(0.5)`?
The function for checking this is `unrelated`, which uses `std_apart` to express the "distance" in standard deviations and compares it against a 6σ threshold.
Of course you can set the threshold to your desired level of certainty; a one-sided 6σ threshold would let through roughly one in a billion vectors (the popular "six sigma" figure of 3.4 per million assumes an extra 1.5σ shift).

In [23]:
for x, y in zip([a, b, c], [b, c, a]):
    assert x.unrelated(y)
    assert x.std_apart(y, invert=True) <= 6

LEVEL = 7  # only let one in a trillion through

for x, y in zip([a, b, c], [b, c, a]):
    assert x.unrelated(y, stdvs=LEVEL)
    assert x.std_apart(y, invert=True) <= LEVEL
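The probabilities behind these thresholds follow from the normal approximation to the binomial Hamming distance. The sketch below is not `bhv`'s `std_apart`, whose exact definition isn't reproduced here; `sigmas_below_half` is a hypothetical helper that measures how far a Hamming distance lies below the `DIMENSION/2` expected for unrelated vectors, and `one_sided_tail` gives the standard normal tail: about 1e-9 at 6σ and about 1.3e-12 at 7σ.

```python
import math

DIMENSION = 8192  # assumed, as above


def sigmas_below_half(hamming_distance: int) -> float:
    """Hypothetical helper: how many standard deviations a Hamming distance lies
    below DIMENSION/2, the mean for two independent Bernoulli(0.5) vectors."""
    std = math.sqrt(DIMENSION) / 2  # binomial std with p = 0.5
    return (DIMENSION / 2 - hamming_distance) / std


def one_sided_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


print(round(sigmas_below_half(3969), 2))  # the ab vs c distance from earlier
print(one_sided_tail(6), one_sided_tail(7))
```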

Another way to get vectors that mix existing ones is `select` (also known as multiplexer, mux, or if-then-else). The vector it is called on acts as the condition: where a bit is on, it is taken from the first argument, and where it is off, from the second.

In [49]:
cond = BHV.rand()

a_if_cond_else_b = cond.select(a, b)

print(a_if_cond_else_b.active_fraction())
print(a_if_cond_else_b.bit_error_rate(a))
print(a_if_cond_else_b.bit_error_rate(b))
print(a_if_cond_else_b.bit_error_rate(cond))

0.5030517578125
0.2476806640625
0.244384765625
0.4970703125
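On the integer representation, a select is the classic multiplexer formula `(cond & a) | (~cond & b)`; the free function `select(cond, when_on, when_off)` below is a hypothetical stand-in, not `bhv`'s method. A bit of the result can differ from `a` only where `cond` is off and `a` and `b` differ, probability 1/2 · 1/2 = 1/4, matching the ~0.25 above, while the result is independent of `cond` itself (~0.5).

```python
import random

DIMENSION = 8192  # assumed, as above
MASK = (1 << DIMENSION) - 1  # keeps bitwise NOT within DIMENSION bits


def select(cond: int, when_on: int, when_off: int) -> int:
    """Bitwise if-then-else: take `when_on` where cond is 1, `when_off` where it is 0."""
    return (cond & when_on) | (~cond & MASK & when_off)


def bit_error_rate(x: int, y: int) -> float:
    return bin(x ^ y).count("1") / DIMENSION


rng = random.Random(5)
cond, a, b = (rng.getrandbits(DIMENSION) for _ in range(3))
mixed = select(cond, a, b)
print(round(bit_error_rate(mixed, a), 3), round(bit_error_rate(mixed, cond), 3))
```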


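Two selections that share both the condition `cond` and the first branch `a` agree wherever `cond` is on; they can only differ where `cond` is off and `b` and `c` differ, so we expect a `bit_error_rate` of about `0.25` between them.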

In [150]:
a_if_cond_else_c = cond.select(a, c)

# Both selections take bits from `a` wherever `cond` is on,
# so they can only differ where `cond` is off.
print(a_if_cond_else_b.bit_error_rate(a_if_cond_else_c))
print(a_if_cond_else_b.bias_rel(a_if_cond_else_c, cond))
print(a_if_cond_else_b.bias_rel(a_if_cond_else_c, ~cond))

0.249267578125
0.5
0.5123031496062992
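To see where the differences between the two selections come from, the sketch below splits the bit error rate by the condition using a hypothetical `within` mask. This restricted rate is an illustration only and is not `bias_rel`, whose definition isn't shown here. Where `cond` is on both selections copy `a`, so they agree exactly; where it is off they compare independent `b` and `c`.

```python
import random

DIMENSION = 8192  # assumed, as above
MASK = (1 << DIMENSION) - 1


def select(cond: int, when_on: int, when_off: int) -> int:
    """Bitwise if-then-else (hypothetical stand-in for bhv's select)."""
    return (cond & when_on) | (~cond & MASK & when_off)


def bit_error_rate(x: int, y: int, within: int = MASK) -> float:
    """Fraction of differing bits, counted only at positions set in `within`."""
    return bin((x ^ y) & within).count("1") / bin(within).count("1")


rng = random.Random(6)
cond, a, b, c = (rng.getrandbits(DIMENSION) for _ in range(4))
x = select(cond, a, b)
y = select(cond, a, c)

print(bit_error_rate(x, y))                # overall, expected around 0.25
print(bit_error_rate(x, y, cond))          # where cond is on: exactly 0.0
print(bit_error_rate(x, y, ~cond & MASK))  # where cond is off: expected around 0.5
```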
