### 2.1.2 Network setup

With the data we collected, we can now set up a network with the following properties:
* **Nodes** can either be **users** or **currencies**. This is specified by a node attribute "Type".
* **Edges** are either of the type **gazes** (user-to-currency) or **follows** (user-to-user).

In [5]:
from github import Github
import networkx as nx
import pickle
import random
import json
import numpy as np
import matplotlib.pyplot as plt 
import re
import itertools as it
import operator
import time
import datetime
from pathlib import Path
import community

In [6]:
# Load data
with open('./crypto_stargazers_dict.pickle', 'rb') as handle:
    crypto_stargazers_dict = pickle.load(handle)
    
Stargaze_Network = nx.DiGraph()

# Setup network
for crypto_name, stargazers_list in crypto_stargazers_dict.items():
    # Add node for the currency (if not already there)
    if crypto_name not in Stargaze_Network.nodes():
        Stargaze_Network.add_node(crypto_name, Type="Currency")
    for user in stargazers_list:
        # add nodes for all stargazers (if not already there)
        if user.login not in Stargaze_Network.nodes():
            Stargaze_Network.add_node(user.login, Type="User")
            
        # add edge from user to currency
        Stargaze_Network.add_edge(user.login, crypto_name, Type="gazes")

In [7]:
# Quick summary of graph so far
print(nx.info(Stargaze_Network)[6:])


Type: DiGraph
Number of nodes: 41211
Number of edges: 57462
Average in degree:   1.3943
Average out degree:   1.3943


In [8]:
# Save the graph
with open('./Stargaze_Network.pickle', 'wb') as handle:
    pickle.dump(Stargaze_Network, handle, protocol=pickle.HIGHEST_PROTOCOL)

**Add links between users**
An edge points from one GitHub user to another if the first user follows the second on GitHub.<br>
Hence,
* edges representing a user's "following" point away from the user (from the user to the users they follow)
* edges representing a user's "followers" point to the user (from their followers to the user)

By getting the list of users that each user in the network follows and intersecting it with the users in the network (most of the users someone follows will not have starred any of the crypto-currency repositories), we can create the user-to-user edges.
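The filtering step described above can be sketched on toy data. This is only an illustration under stated assumptions, not the notebook's API code: the names `raw_following`, `users_in_network` and `follow_edges` are hypothetical, and all logins are made up.

```python
# Toy data (all logins are made up for illustration)
users_in_network = {"alice", "bob", "carol"}

# Hypothetical result of asking the API whom each user follows
raw_following = {
    "alice": ["bob", "dave"],   # dave has not starred any repository
    "bob": ["alice", "carol"],
    "carol": [],
}

def follow_edges(raw_following, users_in_network):
    """Return directed (follower, followed) pairs restricted to the network."""
    edges = []
    for follower, followed_list in raw_following.items():
        for followed in followed_list:
            # Keep only followed users who are themselves part of the network
            if followed in users_in_network:
                edges.append((follower, followed))
    return edges

edges = follow_edges(raw_following, users_in_network)
print(edges)
```

Each pair points from the follower to the followed user, which matches the edge direction used when the follow graph is built further down.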

In [9]:
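# Use a Personal Access Token, read from an environment variable instead of being hardcoded
# (the variable name GITHUB_TOKEN is an assumption; use whichever variable holds your token)
import os
gh = Github(os.environ['GITHUB_TOKEN'])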

with open('./crypto_stargazers_dict.pickle', 'rb') as handle:
    crypto_stargazers_dict = pickle.load(handle)
    
with open('./Stargaze_Network.pickle', 'rb') as handle:
    Stargaze_Network = pickle.load(handle)
users_set = set(Stargaze_Network.nodes)

In [10]:
# List of lists of stargazers of each crypto-currency
user_list_of_lists = [stargazers_list for crypto_name, stargazers_list in crypto_stargazers_dict.items()]

# Convert lists of lists to one long list (will contain duplicates)
flat_user_list = [user_object for user_list in user_list_of_lists for user_object in user_list]

# Replace PyGitHub user objects with the username of the user-object 
login_userobject_list = [(user.login,user) for user in flat_user_list]

# Remove duplicates
login_userobject_list = list(dict(login_userobject_list).items()) 

print('Number of unique users: ',len(login_userobject_list))

Number of unique users:  40825


The section below was run in chunks and took over 70 hours of runtime in total because requests to the GitHub API are slow.
We could not rerun this code block in the explainer notebook, as it would take about three days; hence this cell has no output.
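For a job of this length it helps if an interrupted run can be resumed. The sketch below shows one possible way to write one file per chunk and skip chunks that already exist. It is not the code that was actually run: `fetch_following` is a hypothetical stand-in for the slow API call, and `collect_in_chunks` and the file naming are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

def fetch_following(login):
    """Hypothetical stand-in for the slow GitHub API call."""
    return [login + "_friend"]

def collect_in_chunks(logins, out_dir, chunk_size=1000):
    """Fetch data for each login, writing one pickle file per chunk.

    Chunks whose file already exists are skipped, so an interrupted
    run can be restarted without repeating finished work.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(logins), chunk_size):
        chunk = logins[start:start + chunk_size]
        last = start + len(chunk) - 1
        path = out_dir / f"following_dict__{start}_TO_{last}.pickle"
        if path.exists():
            continue  # this chunk was finished in an earlier run
        result = {login: fetch_following(login) for login in chunk}
        with open(path, "wb") as handle:
            pickle.dump(result, handle, protocol=pickle.HIGHEST_PROTOCOL)
        written.append(path.name)
    return written

with tempfile.TemporaryDirectory() as tmp:
    logins = [f"user{i}" for i in range(2500)]
    first_run = collect_in_chunks(logins, tmp)
    second_run = collect_in_chunks(logins, tmp)  # nothing left to do
    print(len(first_run), len(second_run))
```

Naming each file by the index range it actually contains also makes it easy to see which users a failed chunk covered.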

In [None]:
t_start = time.time()
following_dict = {}

for i,(login, user_object) in enumerate(login_userobject_list):

    try:
        # Call the GitHub API to get the users this user follows and keep only
        # those who are also in the network. The key is the user's login.
        following_dict[login] = [followed.login for followed in gh.get_user(user_object.login).get_following() if followed.login in users_set]
    except Exception:
        # Print the index of the user whose request failed
        print(i)

    # Pickle "following_dict" every 1000 users (we have 40825 users, indexed 0 to 40824, so this will result in 41 files)
    if i in range(1000,40825,1000) or i == 40824:
        filename = './' + 'following_dict' + '__' + str(i-1000) + '_TO_' + str(i) + '.pickle'
        with open(filename, 'wb') as handle:
            pickle.dump(following_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
        print(filename)
        following_dict = {}
        now = time.time()
        print('i: ', i, '  | Runtime: ', now-t_start)

In [11]:
pathlist = Path('./following/').glob('**/*.pickle')
Following_Network = nx.DiGraph()

# Iterate over the 41 pickled files
for i,path in enumerate(pathlist):
    with open(str(path), 'rb') as handle:
        following = pickle.load(handle)

        for user_login in following:
            for followed_user in following[user_login]:
                # Add the followed users as edges from a given user
                Following_Network.add_edge(user_login, followed_user, Type="following")

In [12]:
# Save the graph
with open('./Following_Network.pickle', 'wb') as handle:
    pickle.dump(Following_Network, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [13]:
print(nx.info(Following_Network)[6:])


Type: DiGraph
Number of nodes: 24404
Number of edges: 77117
Average in degree:   3.1600
Average out degree:   3.1600


Comparing this with the earlier summary of the Stargaze_Network graph, taken before the "following" edges were added, we can see that we have the same number of nodes, 41,211.

However, we now have more edges: 134,556 compared to 57,462 previously.

Hence, of the 134,556 edges in the Stargaze_Network graph:
* 57,462 are "gazes" edges
* 77,094 are "following" edges
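The cells shown above build `Following_Network` as a separate graph; the step that merges its edges into `Stargaze_Network` is not shown. The sketch below illustrates one way such a merge could be done with NetworkX. The toy graphs and the names `stargaze`, `following` and `combined` are assumptions made for illustration, not the notebook's data.

```python
import networkx as nx

# Toy stand-ins for the two graphs (node names are made up)
stargaze = nx.DiGraph()
stargaze.add_node("bitcoin", Type="Currency")
stargaze.add_node("alice", Type="User")
stargaze.add_node("bob", Type="User")
stargaze.add_edge("alice", "bitcoin", Type="gazes")
stargaze.add_edge("bob", "bitcoin", Type="gazes")

following = nx.DiGraph()
following.add_edge("alice", "bob", Type="following")

# Copy the follow edges, with their attributes, into a copy of the stargaze graph
combined = stargaze.copy()
combined.add_edges_from(following.edges(data=True))

# Count edges by type to check the result
edge_counts = {}
for _, _, data in combined.edges(data=True):
    edge_counts[data["Type"]] = edge_counts.get(data["Type"], 0) + 1
print(combined.number_of_nodes(), edge_counts)
```

Because every endpoint of a follow edge already exists in the stargaze graph, the node count stays the same and only the edge count grows.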

In [14]:
# Load the graph
with open('./Following_Network.pickle', 'rb') as handle:
    Following_Network = pickle.load(handle)

**Community Partition**

In [15]:
# For the python-louvain community partition we need an undirected graph
Following_Network_undirected = Following_Network.to_undirected()
partition = community.best_partition(Following_Network_undirected)

In [16]:
# Check how many communities were found
size = float(len(set(partition.values())))
print('The best partition results in', size, 'communities')

The best partition results in 100.0 communities


In [None]:
# Draw the partition for the best partition found (of 100 communities)
pos = nx.spring_layout(Following_Network_undirected)
count = 0.
plt.figure(figsize=(15,10))
# Use "community_id" as the loop variable so the imported "community" module is not shadowed
for community_id in set(partition.values()):
    count += 1.
    list_nodes = [node for node in partition.keys()
                                if partition[node] == community_id]
    # A string such as '0.25' is interpreted by matplotlib as a grey level
    nx.draw_networkx_nodes(Following_Network_undirected, pos, list_nodes, node_size = 20,
                                node_color = str(count / size))
nx.draw_networkx_edges(Following_Network_undirected, pos, alpha=0.5)
# Save before plt.show(), which may clear the current figure
plt.savefig('community_partition.png', bbox_inches='tight')
plt.show()

In [33]:
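# Draw a dendrogram of the GitHub communities
# Re-import the python-louvain module in case the name "community" was rebound by a loop variable.
# The original run of this cell failed with the name bound to a list:
# AttributeError: 'list' object has no attribute 'generate_dendrogram'
import community
dendrogram = community.generate_dendrogram(Following_Network_undirected, part_init=partition)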

**Interpreting the communities**

First, let's view the number of users (the size) of each community:

In [56]:
community_list = [[] for _ in range(int(size))]
for k,v in partition.items():
    community_list[v].append(k)
print('%22s  | %22s  | %22s  | %22s' % ('Communities 0-24','Communities 25-49','Communities 50-74','Communities 75-99'))
print('---------------------------------------------------------------------------------------------------------')
print('%10s %10s   | %10s %10s   | %10s %10s   | %10s %10s' % ('No.','Size', 'No.','Size', 'No.','Size', 'No.','Size'))
print('---------------------------------------------------------------------------------------------------------')
for i in range(int(size)//4):
    print('%10s %10s   | %10s %10s   | %10s %10s   | %10s %10s' 
          % (str(i), str(len(community_list[i])), str(i+25), str(len(community_list[i+25])),
            str(i+50), str(len(community_list[i+50])), str(i+75), str(len(community_list[i+75]))))

      Communities 0-24  |      Communities 25-49  |      Communities 50-74  |      Communities 75-99
---------------------------------------------------------------------------------------------------------
       No.       Size   |        No.       Size   |        No.       Size   |        No.       Size
---------------------------------------------------------------------------------------------------------
         0       3033   |         25          2   |         50          2   |         75          2
         1        920   |         26          2   |         51          4   |         76          2
         2       3043   |         27          4   |         52          2   |         77          2
         3        810   |         28         11   |         53          2   |         78          2
         4       2424   |         29          2   |         54          3   |         79          2
         5        543   |         30          3   |         55          2   |         8

In [221]:
small_communities = [i for i,community in enumerate(community_list) if len(community) in set([2,3])]
print('There are', len(small_communities), 'communities with only two or three users')

There are 76 communities with only two or three users


From this table we can see that the majority of users are grouped into a few large communities. Now we need to understand how the communities differ.

This will be carried out by:
1. For each community, counting how many users in the community starred each crypto-currency.
2. Counting how many users in the whole network starred each crypto-currency.
3. Dividing a crypto-currency's share of the stars within a community by its share of the stars in the whole network, so that each count is normalised by the respective total.
4. The result is a value for each crypto-currency in each community that indicates the skew of the community. For example, a value of '2.0' for community 1 and crypto-currency 'Bitcoin' means that Bitcoin's share of the stars in that community is twice its share in the whole network.

This means we can detect the skew of a community towards certain crypto-currencies.
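The ratio described in the steps above can be sketched on toy numbers. This is a minimal illustration: the function name `community_skew` and all counts are made up, and unlike the notebook cells it applies no market-cap filter.

```python
def community_skew(community_counts, network_counts):
    """Relative over-representation of each currency in one community.

    skew = (share of the community's stars going to a currency)
           / (share of the whole network's stars going to that currency)
    """
    community_total = sum(community_counts.values())
    network_total = sum(network_counts.values())
    skew = {}
    for currency, count in community_counts.items():
        community_share = count / community_total
        network_share = network_counts[currency] / network_total
        skew[currency] = community_share / network_share
    return skew

# Toy star counts, made up for illustration
network_counts = {"bitcoin": 50, "ethereum": 30, "monacoin": 20}
community_counts = {"bitcoin": 2, "monacoin": 8}

skew = community_skew(community_counts, network_counts)
print(skew)
```

A value above 1 means the currency is over-represented in the community relative to the whole network, and a value below 1 means it is under-represented.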

In [211]:
Stargaze_Network_undirected = Stargaze_Network.to_undirected()
user_neighbors_dict = {}
currencies_degree_dict = {}

for node, data in Stargaze_Network.nodes(data=True):
    if data["Type"]=="User":
        # Neighbours of this user in the undirected graph
        user_neighbors_dict[node] = list(Stargaze_Network_undirected.neighbors(node))
    elif data["Type"]=="Currency":
        # Degree of the currency node
        currencies_degree_dict[node] = Stargaze_Network.degree(node)

# Convert the degrees to each currency's share of the total
total_number_of_stars = sum(currencies_degree_dict.values())
currencies_degree_dict = {k:v/total_number_of_stars for k,v in currencies_degree_dict.items()}

In [212]:
community_stars_compilation_dict = {i:{} for i in range(int(size))}

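In [215]:
# Load the market capitalisations (needed by the filter in the next cell)
with open('./name_marketcap_dict.pickle', 'rb') as handle:
    name_marketcap_dict = pickle.load(handle)

In [213]:
# Count, per community, how many members starred each currency.
# Only currencies with a market cap above 5000000 are counted.
# "members" is used as the loop variable so the "community" module is not shadowed.
for i,members in enumerate(community_list):
    for member in members:
        for currency in user_neighbors_dict.get(member, []):
            try:
                if name_marketcap_dict[currency] > 5000000:
                    counts = community_stars_compilation_dict[i]
                    counts[currency] = counts.get(currency, 0) + 1
            except (KeyError, TypeError):
                # No usable market-cap entry for this neighbour
                pass

In [214]:
# Total number of counted stars per community
community_star_total = [sum(currency_star_dict.values()) for currency_star_dict in community_stars_compilation_dict.values()]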

In [216]:
for i in community_stars_compilation_dict.keys():
    for currency in community_stars_compilation_dict[i].keys():
        try:
            # Currency's share of the community's stars divided by its share of all stars in the network
            community_stars_compilation_dict[i][currency] = (community_stars_compilation_dict[i][currency]/community_star_total[i])/currencies_degree_dict[currency]
        except (KeyError, ZeroDivisionError):
            community_stars_compilation_dict[i][currency] = None

In [220]:
for i in range(len(community_stars_compilation_dict)):
    if i not in set(small_communities):
        
        temp = []
        for k,v in community_stars_compilation_dict[i].items():
            if v is not None and float(v) > 5:
                temp.append((k,float(v)))
        if len(temp) > 0 and len(community_list[i]) > 100:
            print('Community', i, '('+str(len(community_list[i]))+')')
            for k,v in sorted(temp,key=lambda x: x[1],  reverse=True)[:10]:
                print(k,v)
            print('--------------------------------------------------')

Community 1 (920)
bitcoindark 9.007846829880728
bitcoin-plus 8.107062146892655
centurion 8.107062146892655
exclusivecoin 5.790758676351897
sibcoin 5.404708097928436
bitbay 5.06691384180791
casinocoin 5.06691384180791
--------------------------------------------------
Community 2 (3043)
salus 10.347575265909501
energycoin 5.173787632954751
bridgecoin 5.173787632954751
pinkcoin 5.173787632954751
masternodecoin 5.173787632954751
--------------------------------------------------
Community 3 (810)
posw-coin 13.49905926622766
cloakcoin 6.74952963311383
riecoin 5.999581896101181
--------------------------------------------------
Community 4 (2424)
rise 7.733495014820803
bridgecoin 7.733495014820803
--------------------------------------------------
Community 5 (543)
monacoin 17.19418186659566
centurion 14.137438423645321
sexcoin 7.854132457580733
--------------------------------------------------
Community 6 (218)
goldblocks 48.56006768189509
karbowanec 29.136040609137055
prizm 24.2800338409

## 2.2 Summary statistics and basic properties of the network

After collecting all our data, we can start looking into it. First, we need an overview of how big our network is:

In [60]:
# Number of users and currencies
users = []
currencies = []
for node, data in Stargaze_Network.nodes(data=True):
    if data["Type"]=="User":
        users.append(node)
    elif data["Type"]=="Currency":
        currencies.append(node)
        
# Number of edges of each type
gazes = []
following = []
for user, destination, data in Stargaze_Network.edges(data=True):
    if data["Type"] == "gazes":
        gazes.append((user, destination))
    elif data["Type"] == "following":
        following.append((user, destination))

print("Total number of currencies:", len(currencies))
print("Total number of users:", len(users))
print("Total number of stars:", len(gazes))
print("Total number of follows:", len(following))

Total number of currencies: 390
Total number of users: 40821
Total number of stars: 57462
Total number of follows: 0


The number of users is (obviously) much higher than the number of currencies. 

We now want to talk about degrees in the network. Therefore we look at the node types separately. We will start with the users:

In [None]:
# Out-degree of users
users_out = Stargaze_Network.out_degree(users)

users_deg = [d[1] for d in users_out]

# Mean stargazing degree
print("Mean stargazing degree:", np.mean(users_deg))

# Plot stargazing distribution
fig = plt.figure(figsize=(16,9))
plt.hist(users_deg, bins=max(users_deg))
plt.title("User stargazing degree distribution")
plt.xlabel("Degree")
plt.ylabel("Count")
plt.show()

# Top stargazers:
top_stargazers = sorted(users_out, key=lambda x: x[1], reverse=True)[:5]
print("Top stargazers:")
for x in top_stargazers:
    print(str(x[0])+": "+str(x[1])+ " starred currencies")

Most users star only one currency, but some star more than one, resulting in a heavy-tailed distribution that resembles a power law. On average, a user has starred 1.4 currencies. The maximum number of currencies starred by a single user is 43 ("followtheart").

Next, we do the same for the in-degree of the currencies:

In [None]:
# In-degrees of currencies
curr_in = Stargaze_Network.in_degree(currencies)

# Degrees only
curr_deg = [d[1] for d in curr_in]

# Mean currency degree
print("Mean currency degree:", np.mean(curr_deg))
print("Median currency degree:", np.median(curr_deg))

# Make Histogram
fig = plt.figure(figsize=(16,9))
plt.hist(curr_deg, bins=50)
plt.title("Currency degree distribution")
plt.xlabel("Degree")
plt.ylabel("Count")
plt.show()

# Top currencies:
top_currencies = sorted(curr_in, key=lambda x: x[1], reverse=True)[:5]
print("Top currencies:")
for x in top_currencies:
    print(str(x[0])+": "+str(x[1])+ " stargazers")

For currencies, the degree distribution is even steeper. Most currencies have only very few stargazers, and only a few pass the 1,000-stargazer mark. Unsurprisingly, bitcoin leads with nearly 20,000 stargazers, followed by ethereum as the second most popular currency. On average, a currency has about 147 stargazers. This is heavily influenced by the big currencies, though: the median is only 6 stargazers.
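The gap between mean and median is typical of such skewed data. The small sketch below uses a made-up degree list (not the notebook's data) to show how a couple of very large values pull the mean far above the median.

```python
import statistics

# Made-up degree list: many small currencies and two very large ones
degrees = [2, 3, 4, 5, 6, 6, 7, 9, 1500, 20000]

mean_degree = statistics.mean(degrees)
median_degree = statistics.median(degrees)
print("mean:", mean_degree, "median:", median_degree)
```

For this reason the median is the more informative summary of how many stargazers a typical currency has.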

After this first analysis, we can dig deeper into certain aspects of the cryptocurrency GitHub community: