# Analysis of the Hops during the GetClosestPeers lookups
This script covers all the metrics that we can extract from the publication of the CIDs (for the publisher):
1. Successful PR Holders CDF, PDF
2. Total publication time distribution: CDF, PDF, Quartile Distributions
3. Client distribution from the whole set of PR Holders
4. Client distribution for the PR Holders of each CID    

In [None]:
## Import dependencies
import sqlalchemy as sa
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

## DB Credentials
HOST="localhost"
PORT="5432"
DB="hoarder_test"
USER="hoarder"
PASSWD="password"

# Connect to the DB
engine = sa.create_engine(f'postgresql://{USER}:{PASSWD}@{HOST}:{PORT}/{DB}')

## plotting style
fig_size= (7,4)
sns.set_context("talk", font_scale=1)


## Number of total hops
If we think about a peer tree built while walking the DHT looking for the closest peers to a CID:
```
Peer Tree:
	peer 0 	-- peer 2	-- peer 6	-- peer 7
 		  `		   `-- peer 7	   `-- peer 6
 		   `-- peer 3	-- peer 8	-- peer 7
 								   `-- peer 6
 					   `-- peer 6

 	peer 1 	-- peer 4	-- peer 7
 		   `-- peer 5	-- peer 9 	-- peer 7
					 `		   	   `-- peer 6
 		    		   `-- peer 10 	-- peer 11	-- peer 6
```
The total number of hops is the maximum number of hops performed during the lookup, i.e. the maximum depth of the tree.
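
To make the definition concrete, here is a minimal sketch that computes the maximum depth of the example tree. The nested `(peer, children)` encoding, the function name `max_hops`, and the convention that the root peers sit at hop 0 are all assumptions made for illustration; the tool that fills `fetch_results` may count hops differently.

```python
# Hypothetical encoding of the example peer tree: each node is (peer, [children]).
tree = [
    ("peer 0", [
        ("peer 2", [
            ("peer 6", [("peer 7", [])]),
            ("peer 7", [("peer 6", [])]),
        ]),
        ("peer 3", [
            ("peer 8", [("peer 7", []), ("peer 6", [])]),
            ("peer 6", []),
        ]),
    ]),
    ("peer 1", [
        ("peer 4", [("peer 7", [])]),
        ("peer 5", [
            ("peer 9", [("peer 7", []), ("peer 6", [])]),
            ("peer 10", [("peer 11", [("peer 6", [])])]),
        ]),
    ]),
]


def max_hops(nodes, depth=0):
    """Return the deepest level reached, counting the root peers as hop 0 (assumption)."""
    deepest = depth
    for _peer, children in nodes:
        if children:
            deepest = max(deepest, max_hops(children, depth + 1))
    return deepest


total_hops = max_hops(tree)
print(total_hops)
```

Under these assumptions the longest branch is peer 1, peer 5, peer 10, peer 11, peer 6, which is four hops away from the root.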

In [None]:
## Get the total number of hops needed to find the closest peers, over the entire study

hops_obj = pd.read_sql_query("""
    SELECT 
        total_hops, 
        count(total_hops) 
    FROM fetch_results 
    GROUP BY total_hops 
    ORDER BY total_hops ASC;
""", engine)

# calculate the distributions
tot_hops = hops_obj["count"].sum()
hops_obj["%"] = (hops_obj["count"]/tot_hops)*100

fig, ax = plt.subplots()
box_dict = ax.bar(hops_obj["total_hops"], hops_obj["%"])
ax.set_ylabel("K Closest Calculations (%)", fontsize=18)
ax.set_xlabel("Number of Hops", fontsize=18)
ax.set_ylim(bottom=0)
ax.set_xlim(0, 10)

plt.grid(axis='y')
plt.tick_params(axis='x', which='major', labelsize=16)
plt.tick_params(axis='y', which='major', labelsize=16)
plt.show()

## Number of min hops to discover the closest peers for the first time
If we think about a peer tree built while walking the DHT looking for the closest peers to a CID:
```
Peer Tree:
	peer 0 	-- peer 2	-- peer 6	-- peer 7
 		  `		   `-- peer 7	   `-- peer 6
 		   `-- peer 3	-- peer 8	-- peer 7
 								   `-- peer 6
 					   `-- peer 6

 	peer 1 	-- peer 4	-- peer 7
 		   `-- peer 5	-- peer 9 	-- peer 7
					 `		   	   `-- peer 6
 		    		   `-- peer 10 	-- peer 11	-- peer 6
```
The minimum number of hops to discover the closest peers is the minimum depth in the tree at which all the closest peers are already known.
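
As an illustration of this definition, the sketch below walks the example tree level by level and stops at the first depth where every closest peer has been seen. The tree encoding, the function name `hops_for_closest_peers`, the hop-0 convention for the root peers, and the choice of peer 6 and peer 7 as the closest set are assumptions for the example, not values taken from the database.

```python
# Hypothetical encoding of the example peer tree: each node is (peer, [children]).
tree = [
    ("peer 0", [
        ("peer 2", [
            ("peer 6", [("peer 7", [])]),
            ("peer 7", [("peer 6", [])]),
        ]),
        ("peer 3", [
            ("peer 8", [("peer 7", []), ("peer 6", [])]),
            ("peer 6", []),
        ]),
    ]),
    ("peer 1", [
        ("peer 4", [("peer 7", [])]),
        ("peer 5", [
            ("peer 9", [("peer 7", []), ("peer 6", [])]),
            ("peer 10", [("peer 11", [("peer 6", [])])]),
        ]),
    ]),
]


def hops_for_closest_peers(roots, closest):
    """Return the first depth at which all peers in `closest` have been seen.

    Root peers count as hop 0 (assumption). Returns None if some closest
    peer never appears in the tree.
    """
    seen = set()
    level = roots
    depth = 0
    while level:
        # Record every peer discovered at this depth.
        seen.update(peer for peer, _children in level)
        if closest <= seen:
            return depth
        # Move one hop further: gather the children of the current level.
        level = [child for _peer, children in level for child in children]
        depth += 1
    return None


# Assumed closest set for the example.
min_hops = hops_for_closest_peers(tree, {"peer 6", "peer 7"})
print(min_hops)
```

With this closest set, both peers are first seen two hops from the roots, even though the lookup itself continues down to the tree's maximum depth.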

In [None]:
# Get the number of hops needed to know all the closest peers, over the entire study
hops_obj = pd.read_sql_query("""
    SELECT
        hops_for_closest, 
        count(hops_for_closest) 
    FROM fetch_results 
    GROUP BY hops_for_closest 
    ORDER BY hops_for_closest ASC;
""", engine) 

# get the distributions
tot_hops = hops_obj["count"].sum()
hops_obj["%"] = (hops_obj["count"]/tot_hops)*100

fig, ax = plt.subplots()
box_dict = ax.bar(hops_obj["hops_for_closest"], hops_obj["%"])
ax.set_ylabel("K Closest Calculations (%)", fontsize=18)
ax.set_xlabel("Number of Hops", fontsize=18)
ax.set_ylim(bottom=0)
ax.set_xlim(0, 8)

plt.grid(axis='y')
plt.tick_params(axis='x', which='major', labelsize=16)
plt.tick_params(axis='y', which='major', labelsize=16)
plt.show()

In [None]:
engine.dispose()