<h1 align=center><font size = 5>Bitcoin OTC trust weighted signed network
</font></h1>

<h0 align=left><font size = 3>#bitcoin-otc is an over-the-counter marketplace for trading with bitcoin. The marketplace is located in the #bitcoin-otc channel on the freenode IRC network. </font></h0>

https://www.bitcoin-otc.com/

<h0 align=left><font size = 3>To complement the OTC marketplace, they offer a web of trust service. Due to the p2p nature of OTC transactions, people are exposed to counterparty risk. To mitigate this risk, they need to have access to their counterparty's reputation and trade history. This is precisely the kind of information that the OTC web of trust provides. </font></h0>

## Data Set information 

<h0 align=left><font size = 3>This is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin OTC. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputation to prevent transactions with fraudulent and risky users. Members of Bitcoin OTC rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research. </font></h0>

## Data format 

<h0 align=left><font size = 3>Each line has one rating, sorted by time, with the following format:

SOURCE, TARGET, RATING, TIME</font></h0>
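
As a minimal sketch of reading this format, the cell below parses a few sample rows with pandas. It assumes a headerless CSV and that TIME is in seconds since the Unix epoch; the inline rows are illustrative, and the column names simply mirror the ones used later in this notebook.

```python
import io
import pandas as pd

# Illustrative sample rows in the SOURCE, TARGET, RATING, TIME layout
raw = io.StringIO(
    "6,2,4,1289242000.0\n"
    "6,5,2,1289242000.0\n"
    "1,15,1,1289243000.0\n"
)

# Assumption: the file has no header row, so the column names are supplied here
ratings = pd.read_csv(raw, header=None, names=["Source", "Target", "Score", "Time"])

# Every rating should lie on the -10 .. +10 scale
in_range = ratings["Score"].between(-10, 10).all()

# Assumption: Time is seconds since the Unix epoch
ratings["Time"] = pd.to_datetime(ratings["Time"], unit="s")
print(ratings)
```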


## Necessary libraries 

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import networkx as nx
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

## Drawing functions 

In [2]:
def draw_graph (G_sample):
    # draw the network G_sample with a spring layout
    plt.figure(figsize=(20,15))
    nx.draw_spring(G_sample,node_color = 'g',with_labels=False,node_size=150,edge_color='Gray',alpha=0.7)

## Data preparations

In [3]:
data = pd.read_csv('data.csv')

In [4]:
data.head()

Unnamed: 0,Source,Target,Score,Time
0,6,2,4,1289242000.0
1,6,5,2,1289242000.0
2,1,15,1,1289243000.0
3,4,3,7,1289245000.0
4,13,16,8,1289254000.0


<h0 align=left><font size = 3>For now, I won't use the Time information: I will treat all ratings as if they were given at the same time for everyone. </font></h0>

In [5]:
data.columns

Index(['Source', 'Target', 'Score', 'Time'], dtype='object')

In [6]:
data.drop('Time', axis = 1,inplace = True)

In [7]:
print("The data size is : {} ".format(data.shape))

The data size is : (35592, 3) 


## Graph construction 

In [8]:
import networkx as nx

In [9]:
data_array=np.array(data)

In [10]:
G=nx.DiGraph()

In [11]:
data_array.shape[0]

35592

In [12]:
for i in range (data_array.shape[0]): 
    G.add_edge(data_array[i][0],data_array[i][1],score=data_array[i][2])

In [13]:
G.is_directed()

True

In [14]:
G.number_of_nodes()

5881

In [15]:
G.number_of_edges()

35592
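
The same kind of directed graph can also be built without an explicit loop. The sketch below uses `nx.from_pandas_edgelist` on a small illustrative frame (`toy` and `G_toy` are hypothetical names, not part of the notebook); the `Score` column is renamed so the edge attribute is called `score`, as in the loop above.

```python
import networkx as nx
import pandas as pd

# Small illustrative frame with the same columns as `data` after dropping Time
toy = pd.DataFrame({"Source": [6, 6, 1], "Target": [2, 5, 15], "Score": [4, 2, 1]})

# Rename so the edge attribute is stored under the key 'score'
toy = toy.rename(columns={"Score": "score"})

# One directed edge per row, carrying the rating as an edge attribute
G_toy = nx.from_pandas_edgelist(
    toy, source="Source", target="Target", edge_attr="score", create_using=nx.DiGraph
)

print(G_toy.number_of_nodes(), G_toy.number_of_edges())
print(G_toy[6][2]["score"])
```

In a `DiGraph`, a repeated (source, target) pair overwrites the earlier edge, so this matches the loop only when each pair appears once, which the equal row and edge counts above suggest.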

## ScoringRank Algorithm 

In [16]:
def top_percent_page_rank(G , per = 0.02):
    # PageRank on the directed graph (weight=None, so the edge scores are ignored)
    page_rank = nx.pagerank(G, alpha=0.85, personalization=None, max_iter=100, tol=1e-06, nstart=None, weight=None, dangling=None)

    # number of nodes in the top `per` fraction of the graph
    n = int(per * G.number_of_nodes())

    # nodes sorted by decreasing PageRank; keep the first n
    ranked = sorted(page_rank, key=page_rank.get, reverse=True)
    return ranked[:n]

In [23]:
def top_percent_hub (G , per = 0.02):
    # HITS returns (hubs, authorities); only the hub scores are used here
    hub, _ = nx.hits(G, max_iter=100, tol=1e-08, nstart=None, normalized=True)

    # number of nodes in the top `per` fraction of the graph
    n = int(per * G.number_of_nodes())

    # nodes sorted by decreasing hub score; keep the first n
    # (sorting also handles nodes whose hub score is 0, which the
    #  previous "find the maximum and delete it" loop could not)
    ranked = sorted(hub, key=hub.get, reverse=True)
    return ranked[:n]
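
Both functions above share the same selection step: keep the nodes with the highest centrality values. As a minimal sketch, that step can be factored into one helper; `top_fraction` and the toy scores below are hypothetical and only illustrate the idea.

```python
import heapq

def top_fraction(scores, per=0.02):
    """Return the keys of `scores` with the largest values (top `per` fraction)."""
    # Assumption: `scores` has one entry per node, so len(scores) is the node count
    n = int(per * len(scores))
    # nlargest sorts by the dictionary value and returns the keys, largest first
    return heapq.nlargest(n, scores, key=scores.get)

toy_scores = {"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.1}
print(top_fraction(toy_scores, per=0.5))  # ['a', 'c']
```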

In [4]:
print('scoringRank (G,alpha=0.85,betta=3,gamma=0.8,num_iter=3,sigma=0,zetta=5)')

scoringRank (G,alpha=0.85,betta=3,gamma=0.8,num_iter=3,sigma=0,zetta=5)


In [17]:
def scoringRank (G,alpha=0.85,betta=3,gamma=0.8,num_iter=3,sigma=0,zetta=5) : 
    # alpha    : weight applied to scores received from non-popular nodes (popular nodes get 2 - alpha)
    # betta    : minimum in-degree a node needs for its trust to be calculated
    # gamma    : fraction of nodes that are considered especially popular
    # num_iter : number of iterations of the recursive part of the second step of the algorithm
    # sigma    : centrality used to pick the popular nodes, 0 = PageRank, 1 = hub
    # zetta    : number of digits kept after the decimal point
    
    
    ###  Step 1 : Mean of the received scores
    # initialise the trust score to 0 
    for node in G.nodes():
        G.nodes[node]['trust']=0

    # making the trust score the sum of all the scores 
    for edge in G.edges(data=True) : 
        G.nodes[edge[1]]['trust']=G.nodes[edge[1]]['trust']+edge[2]['score']

    # considering the mean not the sum 
    for node in G.nodes():
        if (G.in_degree(node) >= betta):
            G.nodes[node]['trust']= G.nodes[node]['trust'] / (G.in_degree(node))
        else :
            G.nodes[node]['trust']= 0

    
    
    ###  Step 2 : Considering the initial score of the person who scored
    i=1
    while i<=num_iter :
        # initialise the trust_2 scores to 0 
        dic = {}
        for node in G.nodes():
            dic[node]=0
        # calculating trust_2 (considering trust_1 and the received scores )
        for edge in G.edges(data=True) : 
            if G.nodes[edge[0]]['trust'] > 0 : 
                if G.in_degree(edge[1]) >= betta :  
                    dic[edge[1]]=dic[edge[1]]+(edge[2]['score']/(G.in_degree(edge[1])))
        # trust receives trust_2
        for node in G.nodes() : 
            G.nodes[node]['trust']=dic[node]
        i=i+1 

    
    
    ### Step 3 : Giving greater importance to the nodes who are most popular 
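    # select the popular nodes (top gamma fraction by centrality)
    if sigma == 0 :
        list_popular_nodes= set(top_percent_page_rank(G , per = gamma))
    elif sigma ==1 : 
        list_popular_nodes= set(top_percent_hub (G , per = gamma))
    else :
        raise ValueError("sigma must be 0 (PageRank) or 1 (hub)")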

    dic = {}
    for node in G.nodes():
        dic[node]=0

    # calculating the final trust: scores from popular raters are weighted by (2-alpha), the others by alpha
    for edge in G.edges(data=True) : 
        if G.nodes[edge[0]]['trust'] > 0 : 
            if G.in_degree(edge[1]) >= betta :  
                if edge[0] in list_popular_nodes : 
                    dic[edge[1]]=dic[edge[1]]+(edge[2]['score']*(2-alpha)/(G.in_degree(edge[1])))
                else : 
                    dic[edge[1]]=dic[edge[1]]+((edge[2]['score']*alpha)/(G.in_degree(edge[1])))

    # trust receives dic values 
    for node in G.nodes() : 
        if zetta > 0 :
            G.nodes[node]['trust']=round(dic[node],zetta)
        elif zetta == 0 : 
            G.nodes[node]['trust']=int(round(dic[node],zetta))
        
    return dict(G.nodes(data=True))

<h0 align=left><font size = 3> 
    * alpha    (0.85)  : weight applied to scores received from non-popular nodes; scores from popular nodes are weighted by 2 - alpha
    * betta    (3)    : minimum in-degree a node needs for its trust to be calculated 
    * gamma    (0.8)  : fraction of nodes that are considered especially popular 
    * num_iter (3)    : number of iterations of the recursive part of the second step of the algorithm
    * sigma    (0/1)  : centrality algorithm used to pick the popular nodes, PageRank = 0 , hub = 1
    * zetta    (5)    : number of digits kept after the decimal point (0 returns integers)
   </font></h0>
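
The weighting in the third step can be illustrated on a single node. This is a minimal sketch, not the notebook's function: `weighted_trust` and the toy ratings are hypothetical, every rater is assumed to already have a positive trust, and the rated node is assumed to meet the `betta` threshold.

```python
def weighted_trust(incoming, popular, alpha=0.85):
    """Weighted mean of the scores one node received.

    incoming: list of (rater, score) pairs for the rated node
    popular:  set of raters treated as popular
    """
    total = 0.0
    for rater, score in incoming:
        # popular raters count slightly more than 1, the others slightly less
        weight = (2 - alpha) if rater in popular else alpha
        total += weight * score
    # divide by the number of received ratings (the in-degree of the node)
    return total / len(incoming)

# One popular rater (u1) and two ordinary raters
incoming = [("u1", 10), ("u2", 4), ("u3", -2)]
value = weighted_trust(incoming, popular={"u1"})
print(round(value, 5))  # (1.15*10 + 0.85*4 + 0.85*(-2)) / 3 = 4.4
```

With `alpha = 0.85` the popular rater's +10 counts as 11.5 while the other two ratings are scaled down, so the result (4.4) sits above the plain mean of 4.0.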