# 1. Motivation

## 1.1 How and why we want to analyze the cryptocurrency developer community

Cryptocurrencies are arguably one of the most interesting economic phenomena of recent years. The prospect of generating money simply by running computer algorithms, with very little work after the initial setup, is tempting for almost everybody. And even though the possibilities to actually buy real-world goods with bitcoin and co. are still limited, more and more people have jumped on the train of mining and trading cryptocurrencies, resulting in huge value increases. In 2017 alone, the value of bitcoin, the oldest, most widespread and most valuable cryptocurrency, increased more than tenfold, [eclipsing the $10,000 mark on November 28th](http://markets.businessinsider.com/currencies/news/bitcoin-price-clears-10000-2017-11-1009817597). This has led to some interesting side effects, such as the prices of some computer hardware, especially powerful graphics cards, [almost doubling in the last year](https://www.digitaltrends.com/computing/cryptocurrency-mining-graphics-card-prices/).

But just as more and more people have joined the virtual gold rush and tried their luck at mining bitcoins with their home equipment, other groups of programming enthusiasts and cryptology specialists have started to create alternatives to the almighty bitcoin, resulting in hundreds of more or less valuable virtual currencies that are available and ready to be mined. To mine one of them, one has to download its source code from the internet, which in most cases is hosted on GitHub.

GitHub itself, in turn, is not only a way to share and collaborate on programming projects, but also has features that make it a social platform, such as the ability to follow another user or to "star" (similar to "liking") a code repository. Users who star a repository are called its "stargazers" in GitHub terms. This is where data science and social graphs come into play: we want to find out how the stargazer communities of different cryptocurrencies are shaped. To do so, we need to collect the necessary information about the code repositories via the GitHub API and then analyze it with the tools of network science.

We also want to take a look at the Wikipedia entries of some cryptocurrencies and look for differences in word choice and similar features. Because most digital currencies are very new or small, only a limited number of them have a Wikipedia entry.

> *TL;DR:* **We want to analyze the community of cryptocurrency enthusiasts on GitHub and the corresponding Wikipedia pages. We want to do this because it is a booming, highly discussed industry that could change the face of the financial world.**

## 1.2 Dataset description

The dataset that we collected and analyzed contains the GitHub repositories of over 300 cryptocurrencies and their stargazers. It also contains information on which of the users who starred one of the repositories follow each other. The second part of the data consists of the Wikipedia pages for some of the aforementioned currencies.

## 1.3 Goals for user experience

With the website, we want to provide insight into several questions about the cryptocurrency community:

* How is a currency's GitHub relevance connected to its market value?
* What are the similarities between the follower bases of different cryptocurrencies?
* Are there communities within the group of users starring cryptocurrency repositories?
* What can we gain from analyzing the Wikipedia pages of certain currencies?

# 2. The data

## 2.1 Dataset collection and setup

### 2.1.1 Network data collection

To find out which cryptocurrencies exist and where to find their source code, we first need a source for this information. [coinmarketcap.com](https://coinmarketcap.com/) lists the names of all cryptocurrencies:

```python
from bs4 import BeautifulSoup
import urllib.request
from lxml import etree
import pickle
import os
import glob
import re
import sys
```

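```python
def getNames():
    """Scrape the names of all listed cryptocurrencies from coinmarketcap.com."""
    url = r'https://coinmarketcap.com/coins/views/all/'
    with urllib.request.urlopen(url) as response:
        soup = BeautifulSoup(response, 'lxml')
    # Each currency name is the text of a link with this CSS class
    names_divs = soup.find_all('a', {'class': 'currency-name-container'})
    names_text = [div.text for div in names_divs]
    return names_text
```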

Next, we clean up the currency names so that they all follow the same standard: lowercase, with spaces replaced by hyphens.

```python
crypto_currency_names = [name.replace(' ', '-').lower() for name in getNames()]
N = len(crypto_currency_names)

print('There are ' + str(N) + ' crypto-currencies.   -source coinmarketcap.com')
print(crypto_currency_names[0:5])  # (These are the top five)
```

```
There are 903 crypto-currencies.   -source coinmarketcap.com
['bitcoin', 'ethereum', 'bitcoin-cash', 'ripple', 'litecoin']
```


As a next step, we need to find the actual GitHub repositories of the currencies. The site [coingecko.com](https://www.coingecko.com/en) provides this information. Note that not all cryptocurrencies have GitHub repositories, so we end up with a shorter list than before.
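
As a rough sketch of what this step involves, assume the coingecko.com pages have already been scraped into a dictionary mapping each normalized currency name to a repository URL. The snippet below reduces each URL to GitHub's `owner/repo` form and fetches the stargazers of that repository. The helper names `repo_full_name` and `collect_stargazers` and the input dictionary format are hypothetical; `client` is assumed to behave like a PyGithub `Github` instance, whose `get_repo("owner/repo")` returns a repository object with a `get_stargazers()` method.

```python
import re
from typing import Dict, Optional

# Matches https://github.com/<owner>/<repo>, ignoring any trailing path such as /tree/master.
# Organization URLs like https://github.com/<owner> (no repository) do not match.
_GITHUB_REPO_RE = re.compile(r'^https?://(?:www\.)?github\.com/([^/\s]+)/([^/\s#?]+)')


def repo_full_name(url: str) -> Optional[str]:
    """Return 'owner/repo' for a GitHub repository URL, or None if it is not one."""
    match = _GITHUB_REPO_RE.match(url.strip())
    if match is None:
        return None
    owner, repo = match.groups()
    if repo.endswith('.git'):
        repo = repo[:-len('.git')]
    return f'{owner}/{repo}'


def collect_stargazers(client, repo_urls: Dict[str, str]) -> Dict[str, list]:
    """Map each currency name to the list of stargazers of its repository.

    Assumption: client.get_repo('owner/repo').get_stargazers() yields user
    objects with a .login attribute (as PyGithub does). Currencies whose URL
    does not point to a single repository are skipped.
    """
    result = {}
    for currency, url in repo_urls.items():
        full_name = repo_full_name(url)
        if full_name is None:
            continue
        repo = client.get_repo(full_name)
        result[currency] = list(repo.get_stargazers())
    return result
```

In practice `client` would be an authenticated PyGithub `Github` instance, since anonymous GitHub API requests are heavily rate-limited. The result has the same shape as the `crypto_stargazers_dict` that is loaded in the next step.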

### 2.1.2 Network setup

With the collected data, we can now set up a network with the following properties:
* **Nodes** are either **users** or **currencies**. This is specified by the node attribute "Type".
* **Edges** are either of the type **gazes** (user-to-currency) or **follows** (user-to-user).

```python
from github import Github
import networkx as nx
import pickle
import random
import json
import numpy as np
import matplotlib.pyplot as plt
import re
import itertools as it
import operator
```

```python
# Load data: maps each currency name to the list of its stargazers
# (user objects with a .login attribute, e.g. PyGithub NamedUser objects)
with open('./crypto_stargazers_dict.pickle', 'rb') as handle:
    crypto_stargazers_dict = pickle.load(handle)

Stargaze_Network = nx.DiGraph()

# Setup network
for crypto_name, stargazers_list in crypto_stargazers_dict.items():
    # Add node for the currency (if not already there)
    if crypto_name not in Stargaze_Network.nodes():
        Stargaze_Network.add_node(crypto_name, Type="Currency")
    for user in stargazers_list:
        # Add nodes for all stargazers (if not already there)
        if user.login not in Stargaze_Network.nodes():
            Stargaze_Network.add_node(user.login, Type="User")

        # Add a "gazes" edge from the user to the currency
        Stargaze_Network.add_edge(user.login, crypto_name, type="gazes")

# Save the graph
with open('./Stargaze_Network.pickle', 'wb') as handle:
    pickle.dump(Stargaze_Network, handle, protocol=pickle.HIGHEST_PROTOCOL)
```
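
The code above only creates **gazes** edges. The **follows** edges between users could be added in a second pass. The sketch below assumes a hypothetical dictionary `follows` that maps each user login to the logins that user follows (for example collected with PyGithub's `NamedUser.get_following()`). It only links users that are already in the network, so accounts outside the stargazer community do not become new nodes. The helper names are illustrative and not part of the original notebook.

```python
from collections import Counter
from typing import Dict, Iterable

import networkx as nx


def _is_user(graph: nx.DiGraph, login: str) -> bool:
    """True if `login` is a node of the graph whose attribute Type is "User"."""
    return login in graph and graph.nodes[login].get("Type") == "User"


def add_follow_edges(graph: nx.DiGraph, follows: Dict[str, Iterable[str]]) -> int:
    """Add user-to-user 'follows' edges between existing User nodes.

    `follows` maps a follower's login to the logins they follow (hypothetical
    input format). Returns the number of newly added edges.
    """
    added = 0
    for follower, followed_logins in follows.items():
        if not _is_user(graph, follower):
            continue
        for followed in followed_logins:
            # Ignore currencies, accounts outside the network and self-loops
            if followed == follower or not _is_user(graph, followed):
                continue
            if not graph.has_edge(follower, followed):
                graph.add_edge(follower, followed, type="follows")
                added += 1
    return added


def edge_type_counts(graph: nx.DiGraph) -> Dict[str, int]:
    """Count edges per value of the edge attribute 'type' ('gazes' / 'follows')."""
    return dict(Counter(data.get("type") for _, _, data in graph.edges(data=True)))


def node_type_counts(graph: nx.DiGraph) -> Dict[str, int]:
    """Count nodes per value of the node attribute 'Type' ('User' / 'Currency')."""
    return dict(Counter(node_type for _, node_type in graph.nodes(data="Type")))
```

Calling `add_follow_edges(Stargaze_Network, follows)` before pickling the graph would keep both edge types in a single `DiGraph`, distinguished only by the `type` edge attribute; the two counting helpers give a quick sanity check of the result.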

### 2.1.3 Wikipedia entry collection

