# Block attribuiton customized

This notebook updates the `miners.json` with manual edits to `miners_customized.json`
and adds different attributions to `blocks_attributions_0-$(current_blockheight).json`



### Types of attributions:
- `custom_addr`
- `custom_marker`
- `cusrtom`
- `graphsense_cluster`
- `graphsense_tag`


### Later:
The address history is not done anymore in this notebook.
The `addresses_0-$(current_blockheight).json` file contains the tx histroy of all coinbase output addresses. 
- `address_history`

## Imports

In [1]:
# python3.5
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import re
import binascii
import sys
import random # draw samples

import csv
import json

import os.path # os.path.isfile; 
from collections import defaultdict

import pprint
pp = pprint.PrettyPrinter(indent=4)

In [2]:
# custom imports 
import util
from importlib import reload
reload(util)

<module 'util' from '/home/matteo/deep_dive/util.py'>

## Global varibales and functions

In [3]:
# data up to blockheight 
current_blockheight = util.CURRENT_BLOCKHEIGHT
print(current_blockheight)

556400


### Input

#### `miners.json`
The mining entity data from:
* blockchain.info 
* blocktrail.com

Again this data is in our custom JSON format defined above.

In [4]:
miners_json_file = './dataset/miners.json'
assert( os.path.isfile(miners_json_file) )

#### `blocks_attribution_0-$(current_blockheight).json`
already performed block attributions to link same miners and pools together:
- `blockchain_info_tag`
- `blockchain_info_address`
- `blockchain_info`
- `blocktrail`

In [5]:
blocks_attribution_json_file = './dataset/blocks_attribution_0-' + str(current_blockheight) + '.json'
assert( os.path.isfile(blocks_attribution_json_file) )

In [6]:
!du -sh $blocks_attribution_json_file

1,3G	./dataset/blocks_attribution_0-556400.json


#### `address_cluster.json`
Graphsense address cluster created by multiple-input clustering

* `address`: address that gets mapped to a cluster
* `cluster`: graphsense cluster

```
address,cluster
1qp3SJMVVEDGMVwdKqFnMzGa2xEPxMdjh,234254928
1bp7rn5kNqVUMDwzNgpGDcK2AmQXsx5y7,217034837
```

In [8]:
address_cluster_csv_file = './dataset/address_cluster.csv'
assert( os.path.isfile(address_cluster_csv_file) )

In [9]:
!du -sh $address_cluster_csv_file

109M	./dataset/address_cluster.csv


In [12]:
# load all address->cluster mappings into a dict

address_to_cluster = dict()
i = 0
limit = 1000000
#limit = 100

with open(address_cluster_csv_file) as fp:
    reader = csv.DictReader(fp)
    for row in reader:
        address_to_cluster[ row["address"] ] = int(row["cluster"])
        #i += 1
        #if i > limit:
        #    break

len(address_to_cluster)

2554886

#### `cluster_tags.json`
Graphsense address cluster tags for clusters

* `cluster`: graphsense cluster id
* `address`: address that was mapped to this cluster
* `tag`: tag of this cluster 
* `source`: source where this tag came from
* `actor_category`: see below
* `source_uri`: source of the tag uri

```
cluster,address,tag,source,actor_category,source_uri
104828426,19LyARU1iNQ6zDJknAm3hsakUHy5Fa2C2M,Saalbacher Hof,blockchain.info,,https://blockchain.info/de/tags
100416883,1HYE1CcnTxUdFuvF4mit4hqhUxaLe6eft3,P2P ND Haarlem 3,blockchain.info,,https://blockchain.info/de/tags
```

In [14]:
address_cluster_tags_csv_file = './dataset/cluster_tags.csv'
assert( os.path.isfile(address_cluster_tags_csv_file) )

In [15]:
!du -sh $address_cluster_tags_csv_file

196K	./dataset/cluster_tags.csv


### Output

#### `miners_custom.json`
The mining entity data from:
* blockchain.info 
* blocktrail.com

Plus:
* **custom edits and manually identified markers**

Again this data is in our custom JSON miner format.

In [17]:
miners_custom_json_file = './dataset/miners_custom.json'
if os.path.isfile(miners_custom_json_file):
    print("Output file " + miners_custom_json_file +  "exists, will be overwritten.")

#### `blocks_attribution_0-$(current_blockheight).json`
already performed block attributions to link same miners and pools together:
- `blockchain_info_tag`
- `blockchain_info_address`
- `blockchain_info`
- `blocktrail`

Plus:
- `coinbase_marker`
- `graphsense_cluster`
- `graphsense_tag`

In [18]:
blocks_attribution_json_file = './dataset/blocks_attribution_0-' + str(current_blockheight) + '.json'
if os.path.isfile(blocks_attribution_json_file):
    print("Output file " + blocks_attribution_json_file +  "exists, will be overwritten.")

Output file ./dataset/blocks_attribution_0-556400.jsonexists, will be overwritten.


#### `address_conflicts_0-$(current_blockheight).json`
~~The history of every coinbase output address~~
Only **conflicting**:

The address history of every coinbase output address that was attributed to more than one mining pool

In [19]:
addresses_json_file = './dataset/address_conflicts_0-' + str(current_blockheight) + '.json'
if os.path.isfile(addresses_json_file):
    print("Output file " + addresses_json_file +  "exists, will be overwritten.")

#### `attribution_conflicts_0-$(current_blockheight).json`
Attribution conflicts

In [20]:
conflicts_json_file = './dataset/conflicts_0-' + str(current_blockheight) + '.json'
if os.path.isfile(conflicts_json_file):
    print("Output file " + conflicts_json_file +  "exists, will be overwritten.")

#### `cluster_to_miners_conflicts_0-$(current_blockheight).json`
Collisions of clusters that where attributed to more than one miner

In [21]:
cluster_to_miners_collisions_json_file = './dataset/cluster_to_miners_conflicts_0-' + str(current_blockheight) + '.json'
if os.path.isfile(cluster_to_miners_collisions_json_file):
    print("Output file " + cluster_to_miners_collisions_json_file +  "exists, will be overwritten.")

## Custom addr/marker attribution

In [22]:
with open(miners_json_file, 'r') as fp:
    miners = json.load(fp)

Manually add markers that have been identified by us to the *miners* dict:

In [23]:
# F2Pool:
#util.add_marker("F2Pool",miners,b"\xf0\x9f\x90\x9f","manual")  # the fish of DiscusFish (🐟 ) as bystes, works but has a problem when persited to json file
util.add_marker("F2Pool",miners,"🐟","manual")  # the fish of DiscusFish (🐟 )

# BTCC:
#util.add_marker("BTCC Pool",miners,"btcc","manual") # <- false positive: 'Mined by user shbtcc'
#util.add_marker("BTCC Pool",miners,"BTCC","manual") # already in: '/BTCC/'

# AntPool:
#util.add_marker("AntPool",miners,"AntPool","manual") # already in: 'Mined by AntPool' '/AntPool/'

# ViaBTC:
#util.add_marker("ViaBTC",miners,"viabtc.com","manual") # already in: 'viabtc.com deploy'

# CANOE
util.add_marker("CANOE",miners,"/canoepool/","manual")

# Bixin / HaoBTC
util.add_marker("Bixin",miners,"Bixin","manual")
util.add_marker("Bixin",miners,"HAOBTC","manual") # already in: '/HaoBTC/'
#util.add_marker("Bixin",miners,"haobtc","manual") # <- this produces a lot of false positives with F2Pool 'Mined by haobtc'

# BTC.TOP
#util.add_marker("BTC.TOP",miners,"/E2M & BTC.TOP/","manual") # This pool is now called 'WAYI.CN' 

# BitClub Network
# util.add_marker("BitClub Network",miners,"BitClub Network","manual") # already in: '/BitClub Network/'

util.add_marker("Poolin",miners,"/poolin.com/","manual")

# Add manually identified pool names by searching for coinbase strings:

util.add_name("digitalBTC",miners,"MINTSY","manual")
util.add_name("Mt Red",miners,"MTRED","manual")
util.add_name("Waterhole",miners,"WATERHOLE.IO","manual")
util.add_name("Bravo Mining",miners,"BRAVO","manual")
util.add_name("BitcoinRussia",miners,"BITCOIN-RUSSIA.RU","manual")
util.add_name("Bitcoin India",miners,"BITCOIN-INDIA","manual")
util.add_name("xbtc.exx.com&bw.com",miners,"EXX","manual")
util.add_name("PHash.IO",miners,"PHASH.CN","manual")

# Add manually idenified miners by searching for coinbase strings:

# btcpoolman (only mined one block)
util.add_miner("btcpoolman",
               miners,
               names_dict= { "BTCPOOLMAN": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               addresses_dict= { "1J4UEn8dauHCpaU8zfuAqkZJ7U1556gmEf": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },
               markers_dict={ "/btcpoolman/": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)
# b2pool
util.add_miner("b2pool",
               miners,
               names_dict= { "B2POOL": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               addresses_dict= { "3Nz8KABPqaHWYRfmrizpk8KKJJtNANMNtP": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },
               markers_dict={ "b2pool": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

# migpool
util.add_miner("migpool",
               miners,
               names_dict= { "MIGPOOL": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               addresses_dict= { "3JM1Uu9vFjSPrVve8S6v7hkHNVGzV3dW98": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },
               markers_dict={ "/MiGPool/": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

# btpool
util.add_miner("BTPOOL",
               miners,
               names_dict= { "btpool": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               addresses_dict= { "1134jVdutTZpDfsPjLu4U1vE3UURdWeFui": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },
               markers_dict={ "/BTPOOL/": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

bt_pool_addresses = {
 '134AnrmSLjjCPP6b8ZjT4YKLX6i2RJBqf9',
 '14gXJ8DXTTwbwrnjBmVG9Q6MGEFgu1h3GJ',
 '157BdGfshRYJt38M3wAomB7vPvLV8xMGqc',
 '1618ZtpkSNQ2VHhbTcsZTyTCJAH8WToJav',
 '16wpxZavanjK3excfeP3Z2hrxhgNR2ZQEK',
 '18BjFdiThEtu7D8hF3yURRPyPh9gNkRcBB',
 '19NuSmWEaCzvrFipZPWEQgZBKspbirsUmh',
 '19eC3KhGNKoAij7tTX41cvS3wL6K4w3kBC',
 '1A9fXzM5YhygvTPpRuv3hASUVvQrdzrdid',
 '1BjQJ9qmkmxHCUgj2pfjgfX9xEykqjMoXA',
 '1CFQ6a9yk2Buj3hCvXCcy7fVHJip2bCfa8',
 '1HFBo8KGsx6GufonyMw3WKwLxckdvYatPF',
 '1Hbu4aBg4ngy2J5vdgb7fagYaNXTryy8gA',
 '1LJVjXVRamik5N2cZ11B4bbhChmvbzZoYm',
 '1NQNUt6dZYvYaovYPL5KRCc8abpSAuNThP',
 '1PHhNs6de2qQfpjBEVAz5ujGKZrTHQNhnJ',
 '1Q7MES3Ww6cUHBNTyNBgf9kZHHDFAq6Asp',
 '1iP4ZL65gaSVDqg7VVDJniopX2cDcRKHi'    
}
for address in bt_pool_addresses:
    util.add_addr("btpool",miners,address,source="manual",currencies=["BTC",])

# coinpool
util.add_miner("coinpool",
               miners,
               names_dict= { "COINPOOL": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               addresses_dict= { "1FuAnHgWpHjviGAXBQGvJrdUfwwmtSoTUG": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },
               markers_dict={ "ecoinpool": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

# mkalinin.ru
util.add_miner("mkalinin.ru",
               miners,
               names_dict= { "pool.mkalinin.ru": { util.DD_URL:"http://dribbble.com",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               addresses_dict= { "13vYtrrm69Aba27Ezxhc71jx4pcyoLWdvL": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },
               markers_dict={ "mmpool.mkalinin.ru": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

util.add_addr("mkalinin.ru",miners,'1EoAmuLBeLC2YpRAFwRnzM6PrbaGUNFgW8',source="manual",currencies=["BTC",])

# coinumserv
util.add_miner("coiniumserv",
               miners,
               names_dict= { "COINIUMSERV": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               addresses_dict= { "1KVcqebRwgwRK6PMCrn34KoSRbm7gfXv8B": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },
               markers_dict={ "/CoiniumServ/": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

# Tigerpool
util.add_miner("tigerpool.net",
               miners,
               names_dict= { "TigerPool": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               markers_dict={ "tigerpool.net": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

# Masterpool 
util.add_miner("Masterpool",
               miners,
               names_dict= { "MasterPool": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               markers_dict={ "/MasterPool/": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

# BITFARMS 
util.add_miner("Bitfarms",
               miners,
               names_dict= { "BITFARMS": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               markers_dict={ "BITFARMS": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

# okminer
util.add_miner("okminer",
               miners,
               names_dict= { "OKminer": { util.DD_URL:"",
                                            util.DD_CURRENCIES: ["BTC",],
                                            util.DD_FULLNAME: "",
                                            util.DD_FIRSTUSED: 0,
                                            util.DD_LASTUSED: 0,
                                            util.DD_SOURCES:["manual",] } },
               markers_dict={ "okminer": { util.DD_CURRENCIES: ["BTC",],
                                              util.DD_FIRSTUSED: 0,
                                              util.DD_LASTUSED: 0,
                                              util.DD_SOURCES:["manual",] } },)

print()




Perform custom attributions based on updated *miner* information:

In [26]:
with open(blocks_attribution_json_file, 'r') as fp:
    blocks = json.load(fp)

In [27]:
def attribute_blocks(blocks,
                     miners_dict,
                     addr_attr,
                     marker_attr,
                     both_attr,
                     source,
                     override=False,
                     update=False):
    """ Attribute given blocks based on given miners_dict json
    
    Takes names for the different attribution per address, marker and both as well as a source 
    from where the miners_dict information comes from. Overrides existing attributions with given
    names if override flag is set. 
    Returns tuple of (blocks,miners_dict,conflicts) and does change miners_dict in the process.
    """
    i = 0
    conflicts = list()
    conflicts.clear()

    for blknum in blocks:
        match = list()
        addr_match = list()
        cb_match = list()

        try:
            # first always test if not already attributed 
            if ( addr_attr not in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys() ) or override:
                # match address
                if len( blocks[ blknum ][ util.D_ADDRESSES ] ) == 1:
                    # only match if there is just one output address in the coinbase 
                    address = blocks[ blknum ][ util.D_ADDRESSES ][0]
                    match = util.match_address_to_miner( address, miners_dict, strict=False, blknum=int(blknum) )

                    if len( match ) >= 1:
                        # if multiple coinbase markers match we can get more than one match
                        matched_miners = defaultdict(list)
                        for ma in match:
                            matched_miners[ ma[0] ].append( ma[1] )
                        j = 0
                        attr = ""
                        for mi in matched_miners:
                            blocks[ blknum ][ util.D_ATTRIBUTIONS ][ addr_attr + attr ] = { util.DDD_MINER:mi,
                                                                                               "matches":matched_miners[mi],
                                                                                               util.DDD_SRC:source }
                            j += 1
                            attr = str(j)

            if ( marker_attr not in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys() ) or override:
                # match coinbase
                coinbase = blocks[ blknum ][ util.D_CB ]
                match = util.match_coinbase_to_miner( coinbase, miners_dict, strict=False, blknum=int(blknum) )

                if len( match ) >= 1:
                    # if multiple coinbase markers match we can get more than one match
                    matched_miners = defaultdict(list)
                    for ma in match:
                        matched_miners[ ma[0] ].append( ma[1] )
                    j = 0
                    attr = ""
                    for mi in matched_miners:
                        blocks[ blknum ][ util.D_ATTRIBUTIONS ][ marker_attr + attr ] = { util.DDD_MINER:mi,
                                                                                           "matches":matched_miners[mi],
                                                                                           util.DDD_SRC:source }
                        j += 1
                        attr = str(j)

            if ( both_attr not in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys() ) or override:
                # match both and update miners
                coinbase = blocks[ blknum ][ util.D_CB ]
                if len( blocks[ blknum ][ util.D_ADDRESSES ] ) == 1:
                    address = blocks[ blknum ][ util.D_ADDRESSES ][0]
                    match = util.match_miner(miners_dict,address,coinbase,update=update, blknum=int(blknum) )
                else:
                    match = util.match_miner(miners=miners_dict,coinbase=coinbase, blknum=int(blknum) )

                if len( match ) > 0:
                    # There could be more than one marker of the same pool that matches simultaniously
                    matches = list()
                    #print(match)
                    for m in match:
                        matches.append( m[1] )
                    blocks[ blknum ][ util.D_ATTRIBUTIONS ][ both_attr ] = { util.DDD_MINER:match[0][0],
                                                                                    "matches":matches,
                                                                                    util.DDD_SRC:source }
        except util.ConflictingMinerData as e:
            print()
            print("Message    = ",e.message)
            print("Blockheight= ",blknum)
            print("Miner1     = ",e.miner1)
            print("Miner2     = ",e.miner2)
            print("Coinbase   = ",e.coinbase)
            print("CoinbaseStr= ",repr(binascii.unhexlify(e.coinbase)))
            print("Addesses   = ",e.address)
            print("addr_match = ",e.addr_match)
            print("cb_match   = ",e.cb_match)
            conflicts.append( { "message":e.message,
                                util.DDD_MINER + "1":e.miner1,
                                util.DDD_MINER + "2":e.miner2,
                                util.D_CB + "1":e.coinbase,
                                "address": e.address,
                                "addr_match": e.addr_match,
                                "cb_match": e.cb_match,
                                util.DDD_SRC:source } )

        # progress bar     
        i+=1
        if i % 1000 == 0:
            print(i,end="\r")
            sys.stdout.flush()
    return (blocks,miners_dict,conflicts)

In [28]:
i = 0
conflicts = list()

override_custom = True

for blknum in blocks:
    match = list()
    addr_match = list()
    cb_match = list()
    
    
    try:
        if ( util.DD_CUSTOM_ADDR_ATTR not in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys() ) or override_custom:
            if len( blocks[ blknum ][ util.D_ADDRESSES ] ) == 1:
                address = blocks[ blknum ][ util.D_ADDRESSES ][0]
                match = util.match_address_to_miner(address,miners,strict=False,blknum=int(blknum))
                #if len( match ) == 1:
                #    blocks[ blknum ][ util.D_ATTRIBUTIONS ][ util.DD_CUSTOM_ADDR_ATTR ] = { util.DDD_MINER:match[0][0],
                #                                                                       "matches":[ match[0][1], ],
                #                                                                       util.DDD_SRC:"custom" }
                
                if len( match ) >= 1:
                        matched_miners = defaultdict(list)
                        for ma in match:
                            matched_miners[ ma[0] ].append( ma[1] )
                        j = 0
                        attr = ""
                        for mi in matched_miners:
                            blocks[ blknum ][ util.D_ATTRIBUTIONS ][ util.DD_CUSTOM_ADDR_ATTR + attr ] = { util.DDD_MINER:mi,
                                                                                       "matches":matched_miners[mi],
                                                                                       util.DDD_SRC:"custom" }
                            j += 1
                            attr = str(j)
                
        if ( util.DD_CUSTOM_MARKER_ATTR not in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys() ) or override_custom:
            coinbase = blocks[ blknum ][ util.D_CB ]
            match = util.match_coinbase_to_miner(coinbase,miners,strict=False,blknum=int(blknum))
            if len( match ) >= 1:
                # if multiple coinbase markers match we can get more than one match
                matched_miners = defaultdict(list)
                for ma in match:
                    matched_miners[ ma[0] ].append( ma[1] )
                j = 0
                attr = ""
                for mi in matched_miners:
                    blocks[ blknum ][ util.D_ATTRIBUTIONS ][ util.DD_CUSTOM_MARKER_ATTR + attr ] = { util.DDD_MINER:mi,
                                                                                       "matches":matched_miners[mi],
                                                                                       util.DDD_SRC:"custom" }
                    j += 1
                    attr = str(j)
                    
        # match with updated coinbase markers
        if ( util.DD_CUSTOM_ATTR not in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys() ) or override_custom:
            # match if not already attributed 
            coinbase = blocks[ blknum ][ util.D_CB ]
            if len( blocks[ blknum ][ util.D_ADDRESSES ] ) == 1:
                # match via address if there is only one coinbase output address
                address = blocks[ blknum ][ util.D_ADDRESSES ][0]
                match = util.match_miner(miners,address,coinbase,update=True,blknum=int(blknum))
            else:
                # match via coinbase markers in any case
                match = util.match_miner(miners=miners,coinbase=coinbase,blknum=int(blknum))
            if len( match ) == 1:
                blocks[ blknum ][ util.D_ATTRIBUTIONS ][ util.DD_CUSTOM_ATTR ] = { util.DDD_MINER:match[0][0],
                                                                                 "matches":[ match[0][1], ],
                                                                                 util.DDD_SRC:"custom" }

            if len( match ) > 1:
                # if multiple coinbase markers match we can get more than one match
                matches = list()
                #print(match)
                for m in match:
                    matches.append( m[1] )
                blocks[ blknum ][ util.D_ATTRIBUTIONS ][ util.DD_CUSTOM_ATTR ] = { util.DDD_MINER:match[0][0],
                                                                                 "matches":matches,
                                                                                 util.DDD_SRC:"manual" }  
                
    except util.ConflictingMinerData as e:
        #print()
        #print("Message    = ",e.message)
        #print("Blockheight= ",blknum)
        #print("Miner1     = ",e.miner1)
        #print("Miner2     = ",e.miner2)
        #print("Coinbase   = ",e.coinbase)
        #print("CoinbaseStr= ",repr(binascii.unhexlify(e.coinbase)))
        #print("Addesses   = ",e.address)
        #print("addr_match = ",e.addr_match)
        #print("cb_match   = ",e.cb_match)
        conflicts.append( { "message":e.message,
                            util.DDD_MINER + "1":e.miner1,
                            "blockheight":blknum,
                            util.DDD_MINER + "2":e.miner2,
                            util.D_CB + "1":e.coinbase,
                            "address": e.address,
                            "addr_match": e.addr_match,
                            "cb_match": e.cb_match,
                            util.DDD_SRC:"manual" }  )
        
        
    i+=1
    if i > 10000:
        i = 0
        sys.stdout.write('.')
        sys.stdout.flush()

.......................................................

In [30]:
(blocks,miners,conflicts) = util.attribute_blocks(blocks=blocks,
                                   miners_dict=miners,
                                   addr_attr=util.DD_CUSTOM_ADDR_ATTR,
                                   marker_attr=util.DD_CUSTOM_MARKER_ATTR,
                                   both_attr=util.DD_CUSTOM_ATTR,
                                   source="custom",
                                   override=True,
                                   update=True)

In [32]:
len(conflicts) # 163 # 415 # 596 # 601 # 602

602

In [35]:
(blocks,miners,conflicts) = util.attribute_blocks(blocks=blocks,
                                   miners_dict=miners,
                                   addr_attr=util.DD_CUSTOM_ADDR_ATTR,
                                   marker_attr=util.DD_CUSTOM_MARKER_ATTR,
                                   both_attr=util.DD_CUSTOM_ATTR,
                                   source="custom",
                                   override=True,
                                   update=True)

In [34]:
len(conflicts) # 163 # 415 # 596 # 601 # 602

602

In [36]:
i = 0
for conflict in conflicts:
    if conflict["message"] == 'Addr and Cb match differ':
        i += 1
        #print(conflict)
print(i) # 145

146


In [37]:
i = 0
for conflict in conflicts:
    if conflict["message"] == 'Multiple addresses match':
        i += 1
        #pprint.pprint(conflict)
print(i) # 270

451


In [39]:
util.check_for_miner_addresses_from_markers(miners)

75

In [40]:
util.check_for_obvious_address_collisions(miners)

147SwRQdpCfj5p8PnfsXV2SsVVpVcz3aPq : 2
1FLH1SoLv4U68yUERhDiWzrJn5TggMqkaZ : 2
19RE4mz2UbDxDVougc6GGdoT4x5yXxwFq2 : 2
197miJmttpCt2ubVs6DDtGBYFDroxHmvVB : 2


True

In [41]:
with open(blocks_attribution_json_file, 'w') as fp: 
    json.dump(blocks, fp)

In [42]:
with open(miners_custom_json_file, 'w') as fp:
    json.dump(miners, fp)

In [43]:
with open(conflicts_json_file, 'w') as fp:
    json.dump(conflicts, fp)

## Address history 

In [408]:
#if blocks is None:
#    with open(blocks_attribution_json_file, 'r') as fp:
#        blocks = json.load(fp)

In [409]:
len(blocks)

556401

Add all coinbase marker matches 

In [44]:
addresses = dict()

for blknum in blocks:
    # iterate over all blocks
    # and check found coinbase markers
    matches = list()
    if util.DD_CUSTOM_MARKER_ATTR in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys():
        # only check if there is a coinbase marker attribution
        coinbase = blocks[ blknum ][ util.D_CB ]
        if len( blocks[ blknum ][ util.D_ADDRESSES ] ) == 1:
            # only check if there is just one coinbase output address - otherwise we are not sure to which address the marker applies
            addr = blocks[ blknum ][ util.D_ADDRESSES ][0]
            
            for attr in blocks[ blknum ][ util.D_ATTRIBUTIONS ]:
                if attr.startswith( util.DD_CUSTOM_MARKER_ATTR ):
                    matches.extend( blocks[ blknum ][ util.D_ATTRIBUTIONS ][ attr ][ "matches" ] ) 
            
            for m in matches:
                # go over all matches and add them to a set for the address 
                #print(m)
                if "cb_match" in m.keys():  
                    if addr in addresses.keys():
                        addresses[ addr ].add(m["cb_match"])
                    else:
                        addresses[ addr ] = set() 
                        addresses[ addr ].add(m["cb_match"])
                    
        i+=1
        if i > 10000:
            i = 0
            sys.stdout.write('.')
            sys.stdout.flush()


..............................

In [45]:
len(addresses) # 6039 # 6100 # 6108 # 6114

6114

In [46]:
i = 0
address_collisions = dict()

for addr in addresses:
    if len(addresses[ addr ]) > 1:
        address_collisions[ addr ] = addresses[ addr ]
        print(addr,":",addresses[ addr ])
        #break
        i += 1
print(i)

1GG9HQZchCRxPSBV5SwZ9GoYEVq9vVLGqU : {'yourbtc.net', 'ozco.in'}
1KFHE7w8BhaENAswwryaoccDb6qcT6DbYY : {'🐟', '七彩神仙鱼'}
152f1muMCNa7goXYhYAQC61hxEgGacmncB : {'BTCChina.com', 'btcchina.com', '/BTCC/', 'BTCChina Pool'}
15urYnyeJe3gwbGJ74wcX89Tz7ZtsFDVew : {'/AntPool/', 'Mined by AntPool'}
1N2H8sDjwK7xM1RDZ6o5SVUuoDsynCKfCM : {'bypmneU', 'by polmine.pl'}
1DrK44np3gMKuvcGeFVv9Jk67zodP52eMu : {'/Bitfury/', '/BitFury/'}
18cBEMRxXHqzWWCxZNtU91F5sbUNKhL5PX : {'/ViaBTC/', 'viabtc.com deploy', 'okminer'}
1KsFhYKLs8qb1GHqrPxHoywNQpet2CtP9t : {'Bixin', '/HaoBTC/'}
1FLH1SoLv4U68yUERhDiWzrJn5TggMqkaZ : {'/BTC.COM/', '/WATERHOLE.IO/'}
147SwRQdpCfj5p8PnfsXV2SsVVpVcz3aPq : {'/BTC.TOP/', '/canoepool/'}
1Nh7uHdvY6fNwtQtM1G5EZAFPLC33B59rB : {'Mined By AntPool', 'Mined by AntPool'}
3BidxLnZUwkgrnKAdWN4freEzBTn2ganx8 : {'/DCEX/', '/DCExploration/'}
165GCEAx81wce33FWEnPCRhdjcXCrBJdKn : {'/Bitcoin-Russia.ru/', '/Bitcoin-Ukraine.com.ua/'}
1RtUKxMRGBrz7Qt3YPZJb988PddKCNEFk : {'BW Pool', 'BWPool'}
39pLhpmYEhYaz6Mrxv

In [47]:
# Since the pools are sets we have to write our own JSON encoder
class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return json.JSONEncoder.default(self, obj)

with open(addresses_json_file, 'w') as fp:
    json.dump(address_collisions, fp, cls=SetEncoder)

## Gaphsense cluster

### Manually add miner names from walletexplorer.com
These names are amongst the Graphsense tags.


In [416]:
#with open(miners_custom_json_file, 'r') as fp:
#    miners = json.load(fp)

In [48]:
# manually add pool names to miners that have been fetched from walletexplorer.com
# and are also tags in the Graphsense clusters
# https://www.walletexplorer.com/
source = "walletexplorer.com"

# BTCC:
util.add_name("BTCC Pool",miners,"BTCCPool",source) 

# SlushPool
util.add_name("SlushPool",miners,"SlushPool.com",source)
util.add_name("SlushPool",miners,"SlushPool.com-old",source)
util.add_name("SlushPool",miners,"SlushPool.com-old2",source)

# GHash.IO
util.add_name("GHash.IO",miners,"GHash.io",source)

# AntPool:
util.add_name("AntPool",miners,"AntPool.com",source) 
util.add_name("AntPool",miners,"AntPool.com-old",source)
util.add_name("AntPool",miners,"AntPool.com-old2",source)

# BitMinter
util.add_name("BitMinter",miners,"BitMinter.com",source)

# EclipseMC
util.add_name("EclipseMC",miners,"EclipseMC.com",source)
util.add_name("EclipseMC",miners,"EclipseMC.com-old",source)
util.add_name("EclipseMC",miners,"EclipseMC.com-old2",source)
util.add_name("EclipseMC",miners,"EclipseMC.com-old3",source)

# KnCMiner
util.add_name("KnCMiner",miners,"KnCMiner.com",source)

# BitFury
util.add_name("BitFury",miners,"Bitfury.org",source)

# BW Pool
util.add_name("BW Pool",miners,"BW.com",source)

# Eligius
util.add_name("Eligius",miners,"Eligius.st",source)

# KanoPool
util.add_name("KanoPool",miners,"Kano.is",source)
util.add_name("KanoPool",miners,"Kano.is-old",source)

# Telco 214
util.add_name("Telco 214",miners,"Telco214",source)

# 58COIN
util.add_name("58COIN",miners,"58coin.com",source)



print()




In [419]:
#with open(miners_custom_json_file, 'w') as fp:
#    json.dump(miners, fp)

### Attribute blocks to clusters `graphsense_cluster`  i.e., only set miner if tag is from walletexplorer.com
____

Find all tags that correspond to miner names and assign a unique miner id from our miners json.
In a first iteration **only look for walletexplorer.com** miner names

In [49]:
name_to_miner = dict()

for m in miners:
    for n in miners[ m ][ util.D_NAMES ]:
        if "walletexplorer.com" in miners[ m ][ util.D_NAMES ][ n ][ util.DD_SOURCES ]:
            # only add walletexplor.com names
            name_to_miner[ n ] = m   

In [50]:
len(name_to_miner)

20

In [51]:
name_to_miner

{'BitMinter.com': 'BitMinter',
 'Eligius.st': 'Eligius',
 'EclipseMC.com': 'EclipseMC',
 'EclipseMC.com-old': 'EclipseMC',
 'EclipseMC.com-old2': 'EclipseMC',
 'EclipseMC.com-old3': 'EclipseMC',
 'GHash.io': 'GHash.IO',
 'KnCMiner.com': 'KnCMiner',
 'SlushPool.com': 'SlushPool',
 'SlushPool.com-old': 'SlushPool',
 'SlushPool.com-old2': 'SlushPool',
 'AntPool.com': 'AntPool',
 'AntPool.com-old': 'AntPool',
 'AntPool.com-old2': 'AntPool',
 'Kano.is': 'KanoPool',
 'Kano.is-old': 'KanoPool',
 'BTCCPool': 'BTCC Pool',
 'Bitfury.org': 'BitFury',
 '58coin.com': '58COIN',
 'Telco214': 'Telco 214'}

Try to attribute a mining pools to a cluster based on the cluster tags of walletexplorer.com
The end result is a **mapping of cluster id to mining pool** in the miners json

In [52]:
cluster_to_miner = dict()

with open(address_cluster_tags_csv_file) as fp:
    reader = csv.DictReader(fp)
    for row in reader:     
        for n in name_to_miner.keys():
            if n in row[ "tag" ]:
                cluster_to_miner[ row["cluster"] ] = name_to_miner[ n ]

In [53]:
len(cluster_to_miner)

16

In [54]:
cluster_to_miner

{'2457517': 'BitMinter',
 '9967943': 'Eligius',
 '12567755': 'SlushPool',
 '17184660': 'GHash.IO',
 '23012884': 'AntPool',
 '27183019': 'KnCMiner',
 '39061298': 'EclipseMC',
 '48240376': 'SlushPool',
 '49332647': 'BitFury',
 '55843419': 'AntPool',
 '56989427': 'EclipseMC',
 '58788964': 'EclipseMC',
 '88831558': 'KanoPool',
 '97589177': 'BTCC Pool',
 '196188226': 'Telco 214',
 '210140240': 'AntPool'}

#### Load all cluster addresses and attribute clusters to each miner

In [56]:
# load clusters using pandas dataframe

df = pd.read_csv(address_cluster_csv_file)
print(len(df))
print(df.columns)
address_cluster = df.to_dict('index')

# load clusters using pandas dataframe in chunks

chunksize = 100
df = pd.read_csv(address_cluster_csv_file, nrows=chunksize)
address_cluster = df.to_dict('index')
address_cluster = pd.read_csv(address_cluster_csv_file).to_dict('index')
len(address_cluster)

2554886
Index(['address', 'cluster'], dtype='object')


2554886

In [57]:
# load all address->cluster mappings into a dict

address_to_cluster = dict()
i = 0
limit = 1000000
#limit = 100

with open(address_cluster_csv_file) as fp:
    reader = csv.DictReader(fp)
    for row in reader:
        address_to_cluster[ row["address"] ] = int(row["cluster"])
        #i += 1
        #if i > limit:
        #    break

len(address_to_cluster)


2554886

Find clusters that belong to currently known mining pools i.e., attribute clusters to our mining pools

In [58]:
miner_to_clusters = dict()

for m in miners:
    clusters = None
    for addr in miners[ m ][ util.D_ADDRESSES ].keys():
        if addr in address_to_cluster.keys():
            if clusters is not None and address_to_cluster[ addr ] not in clusters:
                miner_to_clusters[ m ].append( (addr,address_to_cluster[ addr ]) )
                clusters.append( address_to_cluster[ addr ] )
            else:
                miner_to_clusters[ m ] = [ (addr,address_to_cluster[ addr ]), ] 
                clusters = [ address_to_cluster[ addr ], ]

In [59]:
len(miner_to_clusters) # 83 # 86 # 94

70

#### Check for collision where certain clusters belong to more than one miner

In [60]:
i = 0
for m in miner_to_clusters:
    if len(miner_to_clusters[ m ]) > 1:
        i += 1
        #print(m,":",miner_to_clusters[ m ])
print(i," Miners have more than one cluster") # 30 # 34

25  Miners have more than one cluster


In [61]:
cluster_to_miners = dict()

for m in miner_to_clusters:
    for cluster_tuple in miner_to_clusters[ m ]:
        miner_cluster = cluster_tuple[1]
        if miner_cluster in cluster_to_miners.keys():
            cluster_to_miners[ miner_cluster ].add( m )
        else:
            cluster_to_miners[ miner_cluster ] = set()
            cluster_to_miners[ miner_cluster ].add(m)

In [62]:
len(cluster_to_miners) # 247 # 360 # 364 # 396

115

In [63]:
util.get_sample(cluster_to_miners,True,False)

key   =  51458790
value = 
{'BTC Pool Party'}


These are the collisions, i.e., a cluster that belongs to more than one miner.
**These are interesting cases!**

In [64]:
cluster_to_miners_collisions = dict()

for c in cluster_to_miners:
    if len(cluster_to_miners[ c ]) > 1:
        cluster_to_miners_collisions[ c ] = cluster_to_miners[ c ]
        print(c,":",cluster_to_miners[ c ])

226924763 : {'Bitcoin-Ukraine', 'BitcoinRussia'}
254056608 : {'BTC.com', 'Bixin', '7pool'}
235587358 : {'BTC.TOP', 'CANOE'}
259819906 : {'BTC.com', 'Waterhole'}


In [434]:
#cluster_to_miners_collisions = dict()
#
#for c in cluster_to_miners:
#    if len(cluster_to_miners[ c ]) > 1:
#        cluster_to_miners_collisions[ c ] = cluster_to_miners[ c ]
#        print(c,":",cluster_to_miners[ c ])

In [65]:
# Since the pools are sets we have to write our own JSON encoder
class SetEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, set):
            return list(obj)
        return json.JSONEncoder.default(self, obj)

with open(cluster_to_miners_collisions_json_file, 'w') as fp:
    json.dump(cluster_to_miners_collisions, fp, cls=SetEncoder)

#### **Attribute blocks** to clusters `graphsense_cluster`  i.e., only set miner if tag is from **walletexplorer.com**

In [436]:
#with open(blocks_attribution_json_file, 'r') as fp:
#    blocks = json.load(fp)

In [66]:
override_cluster = True

for blknum in blocks:
    # iterate over all blocks
    if ( util.DD_GS_CLUSTER not in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys() ) or override_cluster:
        # if not already attributed with graphsense cluster
        if len( blocks[ blknum ][ util.D_ADDRESSES ] ) == 1:
            addr = blocks[ blknum ][ util.D_ADDRESSES ][ 0 ]
            if addr not in address_to_cluster.keys():
                continue
            
            cluster = address_to_cluster[ addr ]
            if str(cluster) in cluster_to_miner.keys():
                cluster_miner = cluster_to_miner[ str(cluster) ]
                blocks[ blknum ][ util.D_ATTRIBUTIONS ][ util.DD_GS_CLUSTER ] = { util.DDD_MINER:cluster_miner,
                                                                              util.DDD_CLUSTER:cluster,
                                                                              util.DDD_SRC:"graphsense" }                 
        i+=1
        if i > 10000:
            i = 0
            sys.stdout.write('.')
            sys.stdout.flush()

...............................

In [67]:
for blknum in blocks:
    if util.DD_GS_CLUSTER in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys():
        break
blocks[blknum]

{'time': 1320673487,
 'cb': '094269744d696e74657204b652020001032cfabe6d6d100f88209fd48034f16dc2599b664826c0f96e459d63056118f10d314f8112f90100000000000000',
 'addresses': ['19PkHafEN18mquJ9ChwZt5YEFoCdPP5vYB'],
 'miner': '',
 'conflicts': 0,
 'attribution': '',
 'attributions': {'blockchain_info_address': {'miner': 'BitMinter',
   'matches': [{'addr_match': '19PkHafEN18mquJ9ChwZt5YEFoCdPP5vYB'}],
   'src': 'blockchain.info'},
  'blockchain_info_marker': {'miner': 'BitMinter',
   'matches': [{'cb_match': 'BitMinter'}],
   'src': 'blockchain.info'},
  'blockchain_info': {'miner': 'BitMinter',
   'matches': [{'cb_match': 'BitMinter'},
    {'addr_match': '19PkHafEN18mquJ9ChwZt5YEFoCdPP5vYB'}],
   'src': 'blockchain.info'},
  'blockchain_info_address_update': {'miner': 'BitMinter',
   'matches': [{'addr_match': '19PkHafEN18mquJ9ChwZt5YEFoCdPP5vYB'}],
   'src': 'blockchain.info'},
  'blockchain_info_marker_update': {'miner': 'BitMinter',
   'matches': [{'cb_match': 'BitMinter'}],
   'src': 'b

In [68]:
i = 0
for blknum in blocks:
    if util.DD_GS_CLUSTER in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys():
        i += 1
print("Number of successful cluster_tag attributions:",i,"=",(i/current_blockheight)*100 ) 
# 92567 = 18.000738954573738
# 97252 = 17.47879223580158

Number of successful cluster_tag attributions: 92697 = 16.660136592379583


### Attribute blocks to clusters `graphsense_tag`  i.e., set miner if a miner name was found somewhere in the tags (fuzzy matching of miner)
___

Find all tags that correspond to **any** miner names and assign a unique miner id from our miners json.
In a first iteration only looked only for `walletexplorer.com` miner names, but now we look for any tag that matches a currently known miner name.

In [69]:
name_to_miner_fuzzy = dict()

for m in miners:
    for n in miners[ m ][ util.D_NAMES ]:
        # try to attribute any name form the miners if it occures in the graphsense tags
        # This might cause false positives!
        name_to_miner_fuzzy[ n ] = m 


In [70]:
len(name_to_miner_fuzzy) # 167 # 171 # 197

197

Try to attribute a mining pools to a cluster based on the fuzzy matching of cluster tags.
The end result is a mapping of **cluster id to mining pool** in the miners json

In [72]:
cluster_to_miner_fuzzy = dict()

with open(address_cluster_tags_csv_file) as fp:
    reader = csv.DictReader(fp)
    for row in reader:     
        for n in name_to_miner_fuzzy.keys():
            if n in row[ "tag" ]:
                cluster_to_miner_fuzzy[ row["cluster"] ] = name_to_miner_fuzzy[ n ]


In [73]:
len(cluster_to_miner_fuzzy) # 92 # 84 # 120

92

#### Attribute blocks to clusters `graphsense_tag`  i.e., set miner if a miner name was found somewhere in the tags (fuzzy matching of miner)

In [74]:
override_cluster = True

for blknum in blocks:
    # iterate over all blocks
    if ( util.DD_GS_TAG not in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys() ) or override_cluster:
        # if not already attributed with graphsense tag
        coinbase = blocks[ blknum ][ util.D_CB ]
        if len( blocks[ blknum ][ util.D_ADDRESSES ] ) == 1:
            addr = blocks[ blknum ][ util.D_ADDRESSES ][0]
            if addr not in address_to_cluster.keys():
                continue
                
            cluster = address_to_cluster[ addr ]
            if str(cluster) in cluster_to_miner_fuzzy.keys():
                cluster_miner = cluster_to_miner_fuzzy[ str(cluster) ]
                blocks[ blknum ][ util.D_ATTRIBUTIONS ][ util.DD_GS_TAG ] = { util.DDD_MINER:cluster_miner,
                                                                              util.DDD_CLUSTER:cluster,
                                                                              util.DDD_SRC:"graphsense" }                 
        i+=1
        if i > 10000:
            i = 0
            sys.stdout.write('.')
            sys.stdout.flush()

................................

In [75]:
for blknum in blocks:
    if util.DD_GS_TAG in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys():
        break
blocks[blknum]

{'time': 1320673487,
 'cb': '094269744d696e74657204b652020001032cfabe6d6d100f88209fd48034f16dc2599b664826c0f96e459d63056118f10d314f8112f90100000000000000',
 'addresses': ['19PkHafEN18mquJ9ChwZt5YEFoCdPP5vYB'],
 'miner': '',
 'conflicts': 0,
 'attribution': '',
 'attributions': {'blockchain_info_address': {'miner': 'BitMinter',
   'matches': [{'addr_match': '19PkHafEN18mquJ9ChwZt5YEFoCdPP5vYB'}],
   'src': 'blockchain.info'},
  'blockchain_info_marker': {'miner': 'BitMinter',
   'matches': [{'cb_match': 'BitMinter'}],
   'src': 'blockchain.info'},
  'blockchain_info': {'miner': 'BitMinter',
   'matches': [{'cb_match': 'BitMinter'},
    {'addr_match': '19PkHafEN18mquJ9ChwZt5YEFoCdPP5vYB'}],
   'src': 'blockchain.info'},
  'blockchain_info_address_update': {'miner': 'BitMinter',
   'matches': [{'addr_match': '19PkHafEN18mquJ9ChwZt5YEFoCdPP5vYB'}],
   'src': 'blockchain.info'},
  'blockchain_info_marker_update': {'miner': 'BitMinter',
   'matches': [{'cb_match': 'BitMinter'}],
   'src': 'b

In [76]:
i = 0
for blknum in blocks:
    if util.DD_GS_TAG in blocks[ blknum ][ util.D_ATTRIBUTIONS ].keys():
        i += 1
print("Number of successful cluster_tag attributions:",i,"=",(i/current_blockheight)*100 ) 
# 167270 = 32.527613565650284
# 191845 = 34.479690869877786

Number of successful cluster_tag attributions: 178687 = 32.11484543493889


## Persist files 

In [77]:
with open(blocks_attribution_json_file, 'w') as fp: 
    json.dump(blocks, fp)

In [78]:
with open(miners_custom_json_file, 'w') as fp:
    json.dump(miners, fp)