# Bloom filters in Ethereum

### Requirements

Python3
    
    $ pip install ethereum
    
    $ pip install rlp
    
    $ pip install jupyter
    
    $ pip install eth_abi


In [1]:
# Install a pip package in the current Jupyter kernel
import sys

!{sys.executable} -m pip install ethereum
!{sys.executable} -m pip install eth_abi

!{sys.executable} -m pip install rlp


web3 3.16.5 has requirement requests>=2.12.4, but you'll have requests 2.9.1 which is incompatible.


In [12]:
# Import the bloom and utility helpers from pyethereum, plus the ABI and RLP libraries
from ethereum import bloom, utils
import eth_abi
import rlp


A few helper functions from `ethereum.utils` will be useful throughout: `decode_hex` turns a hexadecimal string into bytes, and `encode_hex` turns bytes back into a hexadecimal string.

We can visualize an "empty" bloom filter as a bitmap of all 0s. Ethereum's log bloom is 2048 bits (256 bytes) long, so it prints as 512 hexadecimal characters.

In [15]:
# An empty bloom filter: 256 zero bytes, shown as a hex string
log_bloom = utils.encode_hex(utils.zpad(utils.int_to_big_endian(0), 256))
print(log_bloom)
print(len(log_bloom))



00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
512
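The three sizes involved (2048 bits, 256 bytes, 512 hex characters) are easy to mix up. As a minimal sketch that uses only the standard library instead of the `ethereum` helpers, the relationship can be checked like this; the constant names are illustrative and not part of any library:

```python
# Ethereum's log bloom is 2048 bits wide.
BLOOM_BITS = 2048
BLOOM_BYTES = BLOOM_BITS // 8        # 8 bits per byte

empty_bloom = bytes(BLOOM_BYTES)     # a run of zero bytes
empty_hex = empty_bloom.hex()        # two hex characters per byte

print(BLOOM_BYTES, len(empty_hex))
```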


Now we add an element to the bloom filter. In this case, the element is an address.

In [16]:
encoded_address = utils.decode_hex(utils.remove_0x_head('0x864Be2775d392787D5fa37ee1DB45FE0b1B3D1FC'))
# Add the address to an empty bloom filter (0 is the empty bloom)
b = bloom.bloom_insert(0, encoded_address)
# Show the bloom as 256 raw bytes
print(bloom.b64(b))
# Turn it into a hex string
print(utils.encode_hex(bloom.b64(b)))


b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\

We see that 3 bits are set in the log bloom. How are they determined?

We take the keccak256 hash of the address, split the first six bytes of the digest into three byte pairs (4 hex digits each), and keep the low-order 11 bits of each pair. Because 2^11 = 2048, each resulting number is the position of one bit in the 2048-bit bloom.

Example:

bloom(0f572e5295c57f15886f9b263e2f6d2d6c7b5ec6)
sha3: bd2b01afcd27800b54d2179edc49e2bffde5078bb6d0b204694169b1643fb108
first byte pairs: bd2b, 01af, cd27 -- which leads to bits in bloom --> 1323, 431, 1319
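The step from digest to bit positions can be sketched without any Ethereum library. This sketch assumes the 2048-bit bloom implied by the 256-byte output above, which takes 11 bits per index. The function name `bloom_bit_positions` is hypothetical, and the digest is copied from the example rather than computed, because keccak256 is not in the Python standard library:

```python
BLOOM_BITS = 2048  # assumed bloom width; 2048 == 2**11

def bloom_bit_positions(digest_hex):
    """Return the three bloom bit indices encoded in a hash digest."""
    digest = bytes.fromhex(digest_hex)
    positions = []
    for i in (0, 2, 4):                        # first three byte pairs
        pair = int.from_bytes(digest[i:i + 2], "big")
        positions.append(pair % BLOOM_BITS)    # keep the low-order 11 bits
    return positions

# Digest taken from the example above (not recomputed here)
digest = "bd2b01afcd27800b54d2179edc49e2bffde5078bb6d0b204694169b1643fb108"
print(bloom_bit_positions(digest))  # [1323, 431, 1319]
```

Taking the value modulo 2048 is the same as masking with `0x7ff`, so `0xbd2b` (48427) becomes 1323.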


We can check very easily if the address that we passed in is in the set


In [17]:
print(bloom.bloom_query(b, encoded_address))

True


Let's try another one that is not in the set

In [19]:
other_address = utils.decode_hex(utils.remove_0x_head('0x0f572e5295c57f15886f9b263e2f6d2d6c7b5ec6'))
print(bloom.bloom_query(b, other_address))

False


In [23]:
new_b = bloom.bloom_insert(b, '1')

print(bloom.bloom_query(new_b, '1'))
print(bloom.bloom_query(new_b, '2'))

True
False


What does `bloom_query` do? It inserts the value into a new, empty bloom and checks whether every bit set in that bloom is also set in the bloom being queried.

In [24]:
bloom2 = bloom.bloom_insert(0, '2')
print((new_b & bloom2) == bloom2)
bloom3 = bloom.bloom_insert(0, '1')
print((new_b & bloom3) == bloom3)

False
True
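The insert-then-compare logic can also be written out with only the standard library. This is a rough sketch, not the library's implementation. The names `sketch_insert` and `sketch_query` are made up for illustration, and `hashlib.sha3_256` is a stand-in: Ethereum uses keccak256, which produces different digests, so the bit positions here will not match real Ethereum blooms.

```python
import hashlib

BLOOM_BITS = 2048

def _stand_in_hash(value):
    # ASSUMPTION: NIST SHA3-256 stands in for keccak256 (they differ).
    return hashlib.sha3_256(value).digest()

def sketch_insert(bloom, value):
    """Set three bits in the integer bloom, one per leading byte pair."""
    h = _stand_in_hash(value)
    for i in (0, 2, 4):
        bloom |= 1 << (int.from_bytes(h[i:i + 2], "big") % BLOOM_BITS)
    return bloom

def sketch_query(bloom, value):
    """True if every bit the value would set is already set in bloom."""
    probe = sketch_insert(0, value)
    return (bloom & probe) == probe

b = sketch_insert(0, b"1")
print(sketch_query(b, b"1"))   # True: inserted values are always found
print(sketch_query(0, b"1"))   # False: the empty bloom contains nothing
```

A query can return `True` for a value that was never inserted if other values happened to set the same bits, but it never returns `False` for a value that was inserted.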


Now let's look at a bloom filter in action for receipts in Ethereum. We first need the ABI of the event so that we can decode the values in the log. A log entry consists of a list of topics and a data field. In the `Transfer` event below, 2 inputs are indexed (the `from` and `to` addresses), while `value` is not.

In [25]:

abi = {
  "anonymous": False,
  "inputs": [
    {
      "indexed": True,
      "name": "from",
      "type": "address"
    },
    {
      "indexed": True,
      "name": "to",
      "type": "address"
    },
    {
      "indexed": False,
      "name": "value",
      "type": "uint256"
    }
  ],
  "name": "Transfer",
  "type": "event"
}

We add a real-world example of a transaction receipt, one where a user transferred tokens from one address to another.

In [29]:
example_receipt = {
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "blockHash": "0xc57ff9020f066420198584aafc2944c8abaac1038e56a3f5a347bbd199111956",
    "blockNumber": "0x54c0fb",
    "contractAddress": None,
    "cumulativeGasUsed": "0x684a0c",
    "from": "0xcc56dcc36d43341c074f0fc06aec3211cd8f8f44",
    "gasUsed": "0x92c3",
    "logs": [{
      "address": "0xea38eaa3c86c8f9b751533ba2e562deb9acded40",
      "topics": ["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef", "0x000000000000000000000000cc56dcc36d43341c074f0fc06aec3211cd8f8f44", "0x0000000000000000000000001c4b70a3968436b9a0a9cf5205c787eb81bb558c"],
      "data": "0x000000000000000000000000000000000000000000000cb28be99bb554a80000",
      "blockNumber": "0x54c0fb",
      "transactionHash": "0x865edf70c0e4b9860a6fe3af62f095ad7f9d3d881ab5ab4dfe3cf8fcead8c843",
      "transactionIndex": "0x3b",
      "blockHash": "0xc57ff9020f066420198584aafc2944c8abaac1038e56a3f5a347bbd199111956",
      "logIndex": "0x9e",
      "removed": False
    }],
    "logsBloom": "0x00000000000000000000000000000000000000000000000000000000000000100000008000000000000000000100000000000000000000000000000000000000000000000000000000000008001000000000000000000000000000000000000000000000000000000000000000000000200000000000000000000010000000000000000000002000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000002000000000000000000000000000000000000000000000000004000000000000000000000000080000000000000000000000000000000000000000000",
    "status": "0x1",
    "to": "0xea38eaa3c86c8f9b751533ba2e562deb9acded40",
    "transactionHash": "0x865edf70c0e4b9860a6fe3af62f095ad7f9d3d881ab5ab4dfe3cf8fcead8c843",
    "transactionIndex": "0x3b"
  }
}
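Before decoding the log, it can help to look at the receipt's `logsBloom` itself. Per the Ethereum specification, a receipt bloom combines the blooms of each log's address and each of its topics, while the data field is left out. With one log carrying one address and three topics, at most 4 × 3 = 12 bits can be set. The sketch below only counts set bits using the standard library. It does not test membership, since that would need keccak256. The bloom string is copied from the receipt above.

```python
# logsBloom copied from the example receipt
logs_bloom = "0x00000000000000000000000000000000000000000000000000000000000000100000008000000000000000000100000000000000000000000000000000000000000000000000000000000008001000000000000000000000000000000000000000000000000000000000000000000000200000000000000000000010000000000000000000002000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000002000000000000000000000000000000000000000000000000004000000000000000000000000080000000000000000000000000000000000000000000"

bloom_int = int(logs_bloom, 16)

# Bit n counts from the least significant end of the integer
set_bits = [n for n in range(bloom_int.bit_length()) if (bloom_int >> n) & 1]

print(len(set_bits))
```

Fewer than 12 bits would mean that two hashed items happened to map to the same position.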

Under "topics", we see an array with 3 elements: the first is a hash, and the other two appear to be encoded data.

In [33]:
topics = example_receipt["result"]["logs"][0]["topics"]
print(topics)

['0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef', '0x000000000000000000000000cc56dcc36d43341c074f0fc06aec3211cd8f8f44', '0x0000000000000000000000001c4b70a3968436b9a0a9cf5205c787eb81bb558c']


The first topic is the hash of the event signature.

In [38]:
event_hash = '0x' + utils.encode_hex(utils.sha3('Transfer(address,address,uint256)'))
print(event_hash == topics[0])


True
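The signature string that gets hashed does not have to be typed by hand. As a small sketch, it can be assembled from an event ABI entry: the event name followed by the comma-separated input types, indexed or not. The helper name `event_signature` is hypothetical, and the ABI is repeated in compact form so the block stands alone. Hashing the result still needs keccak256 (for example `utils.sha3` as used above), which this sketch leaves out.

```python
def event_signature(event_abi):
    """Build 'Name(type1,type2,...)' from an event ABI entry."""
    types = ",".join(inp["type"] for inp in event_abi["inputs"])
    return "{}({})".format(event_abi["name"], types)

transfer_abi = {
    "name": "Transfer",
    "type": "event",
    "inputs": [
        {"indexed": True, "name": "from", "type": "address"},
        {"indexed": True, "name": "to", "type": "address"},
        {"indexed": False, "name": "value", "type": "uint256"},
    ],
}

print(event_signature(transfer_abi))  # Transfer(address,address,uint256)
```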


Now let's retrieve the types of the inputs that are not indexed from the ABI.

In [42]:
types = [i['type'] for i in abi['inputs'] if not i['indexed']]
print(types)

['uint256']


The only non-indexed input is `value`, and its actual value is located in the data field of the log.

In [44]:
logs = example_receipt["result"]["logs"]
values = eth_abi.decode_abi(types, logs[0]["data"])
print(values)

(59962000000000000000000,)
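For a single `uint256`, the ABI encoding is just the number as a 32-byte big-endian word, so the result can be cross-checked without `eth_abi`. This is a minimal sketch for this one case only. It does not generalize to dynamic types or to data fields holding several values.

```python
# Data field copied from the log above: one 32-byte big-endian word
data = "0x000000000000000000000000000000000000000000000cb28be99bb554a80000"

value = int(data, 16)   # int() accepts the "0x" prefix when base is 16
print(value)            # 59962000000000000000000
```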


Now we do the same with the indexed inputs. Their values are stored in the topics array that we saw earlier, after the event hash.

In [48]:
# Types, names and decoded values of the indexed inputs
indexed_types = [i['type'] for i in abi['inputs'] if i['indexed']]
indexed_names = [i['name'] for i in abi['inputs'] if i['indexed']]
# topics[0] is the event hash, so the indexed values start at topics[1]
indexed_values = [eth_abi.decode_single(t, v) for t, v in zip(indexed_types, logs[0]['topics'][1:])]

print(indexed_names, indexed_values)

['from', 'to'] ['0xcc56dcc36d43341c074f0fc06aec3211cd8f8f44', '0x1c4b70a3968436b9a0a9cf5205c787eb81bb558c']
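An indexed `address` is stored in a topic as a 32-byte word: 12 zero bytes of padding followed by the 20-byte address. As a sketch under that assumption, the address can be recovered with plain slicing. The helper name `topic_to_address` is made up for illustration, and the topics are rebuilt from the padding plus the addresses shown in the receipt.

```python
def topic_to_address(topic_hex):
    """Extract the 20-byte address from a 32-byte, left-padded topic."""
    raw = bytes.fromhex(topic_hex[2:])   # drop the "0x" prefix
    if len(raw) != 32:
        raise ValueError("a topic must be exactly 32 bytes")
    return "0x" + raw[-20:].hex()

# Same values as topics[1] and topics[2]: 12 zero bytes, then the address
padding = "00" * 12
topics = [
    "0x" + padding + "cc56dcc36d43341c074f0fc06aec3211cd8f8f44",
    "0x" + padding + "1c4b70a3968436b9a0a9cf5205c787eb81bb558c",
]

print([topic_to_address(t) for t in topics])
```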


We then bring it all together into the original event that happened:

In [49]:
event_info = {
    "from": indexed_values[0],
    "to": indexed_values[1],
    "value": values[0]
}

print(event_info)


# Bloom filter calculator: https://hur.st/bloomfilter/?n=3&p=1.0E-9&m=256&k=


{'from': '0xcc56dcc36d43341c074f0fc06aec3211cd8f8f44', 'value': 59962000000000000000000, 'to': '0x1c4b70a3968436b9a0a9cf5205c787eb81bb558c'}
