# On-Chain Clustering
In this notebook, we cluster BTC addresses and entities based on their interaction with the Lightning Network (LN). The result is a mapping between BTC entities and "components" (stars, snakes, collectors, or proxies), which the linking heuristics need. The sections are:

1. Create on-chain Clusters
2. Sort and Verify Mapping

# 1. Create on-chain Clusters [Bernhard]
Here we show how we obtain on-chain clusters of BTC entities from the addresses involved in opening and closing their LN channels.

In [None]:
# inputs: funding_addresses_csv_file, settlement_addresses_csv_file
# outputs: star_file, snake_file, collector_file, proxy_file, funding_address_entity_file, settlement_address_entity_file

# 2. Sort and Verify Mapping

In [1]:
import sys
sys.path.append("..")

from utils import df_to_two_dicts, patterns_list

# inputs
from utils import patterns_files
# star_file, snake_file, collector_file, proxy_file, funding_address_entity_file, settlement_address_entity_file

# outputs
from utils import patterns_sorted_files
# star_sorted_file, snake_sorted_file, collector_sorted_file, proxy_sorted_file

import pandas as pd

In [2]:
pattern_double_mapping = dict()
for pattern in patterns_list:
    # per pattern, a pair of dicts: (entity -> component, component -> entity)
    pattern_double_mapping[pattern] = df_to_two_dicts(pd.read_csv(patterns_files[pattern]))
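
The helper `df_to_two_dicts` lives in `utils` and is not shown in this notebook. As a rough sketch of the data structure it appears to return, the hypothetical function `pairs_to_two_dicts` below builds a forward and a reverse mapping from (entity, component) pairs. The function name, the pair-based input, and the choice to collect several entities per component in a list are all assumptions, not the actual implementation.

```python
def pairs_to_two_dicts(pairs):
    """Hypothetical stand-in for utils.df_to_two_dicts (assumed behaviour).

    `pairs` is any iterable of (entity, component) tuples; with a two-column
    DataFrame one could pass df.itertuples(index=False, name=None).
    """
    entity_to_component = {}
    component_to_entities = {}
    for entity, component in pairs:
        entity_to_component[entity] = component
        # several entities may belong to the same component
        component_to_entities.setdefault(component, []).append(entity)
    return entity_to_component, component_to_entities


# Toy data: entities 10 and 11 belong to star 0, entity 12 to star 1.
toy_pairs = [(10, 0), (11, 0), (12, 1)]
entity_to_star, star_to_entities = pairs_to_two_dicts(toy_pairs)
```

Index `[0]` of the returned pair is then keyed by entity and index `[1]` by component, which matches how `pattern_double_mapping[pattern][0]` and `[1]` are used in the cells that follow.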

In [4]:
# check that no entity appears in more than one pattern; four of the six pattern pairs
# are compared here (stars-proxies and stars-collectors are not)
print('overlap of entities snakes-stars:')
print(len(set(pattern_double_mapping['snakes'][0]).intersection(set(pattern_double_mapping['stars'][0]))))
print('overlap of entities snakes-proxies:')
print(len(set(pattern_double_mapping['snakes'][0]).intersection(set(pattern_double_mapping['proxies'][0]))))
print('overlap of entities snakes-collectors:')
print(len(set(pattern_double_mapping['snakes'][0]).intersection(set(pattern_double_mapping['collectors'][0]))))
print('overlap of entities proxies-collectors:')
print(len(set(pattern_double_mapping['proxies'][0]).intersection(set(pattern_double_mapping['collectors'][0]))))

overlap of entities snakes-stars:
0
overlap of entities snakes-proxies:
0
overlap of entities snakes-collectors:
0
overlap of entities proxies-collectors:
0
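
The check above compares four pattern pairs by hand. A minimal sketch of how every pair could be covered uses `itertools.combinations`. The dictionary `entity_maps` is toy data standing in for `pattern_double_mapping[pattern][0]`, and the helper name `pairwise_overlaps` is hypothetical.

```python
from itertools import combinations

# Toy stand-in for pattern_double_mapping[pattern][0] (entity -> component);
# the entity and component IDs are made up for illustration.
entity_maps = {
    "stars": {1: 0, 2: 0},
    "snakes": {3: 0, 4: 1},
    "collectors": {5: 0},
    "proxies": {6: 0, 7: 0},
}


def pairwise_overlaps(maps):
    """Return {(pattern_a, pattern_b): number of shared entities} for every pair."""
    return {
        (a, b): len(set(maps[a]) & set(maps[b]))
        for a, b in combinations(maps, 2)
    }


overlaps = pairwise_overlaps(entity_maps)
for (a, b), shared in overlaps.items():
    print(f"overlap of entities {a}-{b}: {shared}")
```

With four patterns this yields six pairs, so the two pairs not printed above (stars-proxies and stars-collectors) would be checked as well.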


In [5]:
i = 1  # start IDs at 1 rather than 0, to avoid negative zero
component_sorted_mapping_dict = dict()
for pattern in patterns_list:
    component_sorted_mapping_dict[pattern] = dict()
    # number components consecutively; the counter keeps running across patterns
    for component in pattern_double_mapping[pattern][1]:
        component_sorted_mapping_dict[pattern][component] = i
        i += 1
    print(pattern, 'till', i)  # i is the next unused ID

stars till 53
snakes till 5691
collectors till 7167
proxies till 8156
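
Because the counter is incremented after each assignment, the printed value is the next unused ID: "stars till 53" means the stars received IDs 1 to 52, and the snakes start at 53. The same numbering scheme can be sketched on toy data with `itertools.count`. The function name `assign_component_ids` and the component labels are hypothetical.

```python
from itertools import count


def assign_component_ids(components_by_pattern, start=1):
    """Give every component a unique ID, counting on across patterns."""
    next_id = count(start)  # one shared counter, so IDs never repeat
    return {
        pattern: {component: next(next_id) for component in components}
        for pattern, components in components_by_pattern.items()
    }


# Toy input: two stars followed by three snakes (labels are made up).
toy_components = {"stars": ["s0", "s1"], "snakes": ["k0", "k1", "k2"]}
component_ids = assign_component_ids(toy_components)
```

Here the stars get IDs 1 and 2 and the snakes 3 to 5, so an ID alone identifies both the component and its pattern.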


In [None]:
from utils import write_json  # assumption: write_json is defined in utils, like the helpers above

for pattern in patterns_list:
    write_json(component_sorted_mapping_dict[pattern], patterns_sorted_files[pattern], True)
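
The helper `write_json` is not defined in this notebook, and the meaning of its third argument is not documented here. The sketch below shows one plausible shape built on the standard `json` module. The name `write_json_sketch` is hypothetical, and treating the flag as an indentation switch is a guess. It also illustrates a property of JSON worth keeping in mind when reloading the sorted mappings: object keys are always strings, so integer keys come back as strings.

```python
import json
import os
import tempfile


def write_json_sketch(data, path, indent=False):
    """Hypothetical stand-in for write_json; the `indent` flag is a guess."""
    with open(path, "w") as fp:
        json.dump(data, fp, indent=2 if indent else None)


# Toy mapping from an original component ID to its new sorted ID.
toy_mapping = {101: 1, 102: 2}
toy_path = os.path.join(tempfile.mkdtemp(), "stars_sorted.json")
write_json_sketch(toy_mapping, toy_path, True)

with open(toy_path) as fp:
    reloaded = json.load(fp)  # keys are now the strings "101" and "102"
```

If the original component identifiers are integers, a reader of these files would need to convert the keys back with `int` after loading.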