In [2]:
import networkx as nx
import pandas as pd  # needed for pd.read_csv below

## 1. GitHub Social Network

### Understanding the Data

This is a large social network of GitHub developers, collected from the public API in June 2019. Nodes are developers who have starred at least 10 repositories, and edges are mutual follower relationships between them. The vertex features are extracted from each user's location, starred repositories, employer and e-mail address. The task associated with the graph is binary node classification: predicting whether a GitHub user is a web developer or a machine learning developer. This target feature *(here called "ml_target")* was derived from the job title of each user.

### Importing Data Into a Graph

In [3]:
# Importing data
git_df = pd.read_csv('git_web_ml/musae_git_edges.csv')
git_node_df = pd.read_csv('git_web_ml/musae_git_target.csv')

# Creating graph
edgelist = [row[1:] for row in git_df.itertuples()]
git_graph = nx.from_edgelist(edgelist)

# Adding node attributes for assortativity analysis
# (itertuples() puts the index at position 0, so row[1] is the first column and row[3] the third)
node_attributes = {row[1]: row[3] for row in git_node_df.itertuples()}
nx.set_node_attributes(git_graph, node_attributes, 'ml_target')
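
As a hedged alternative to indexing rows by position, the same graph can be built by column name. The sketch below runs on tiny stand-in tables rather than the real CSV files, and the column names `id_1`, `id_2`, `id`, `name` and `ml_target` are assumptions about the file layout, not something confirmed in this notebook.

```python
import networkx as nx
import pandas as pd

# Tiny stand-in tables; the column names are assumed, not read from the real files.
edges_df = pd.DataFrame({"id_1": [0, 0, 1], "id_2": [1, 2, 2]})
nodes_df = pd.DataFrame(
    {"id": [0, 1, 2], "name": ["a", "b", "c"], "ml_target": [0, 1, 0]}
)

# Build an undirected graph directly from the two endpoint columns.
toy_graph = nx.from_pandas_edgelist(edges_df, source="id_1", target="id_2")

# Map node id -> label by column name instead of tuple position.
labels = nodes_df.set_index("id")["ml_target"].to_dict()
nx.set_node_attributes(toy_graph, labels, "ml_target")

print(toy_graph.nodes(data=True))
```

Selecting columns by name keeps the code working if the column order in the file changes, at the cost of depending on the exact header names.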

### Analysing the Graph

For the GitHub social network graph, the average clustering coefficient and the much lower transitivity suggest that it follows a star-like topological structure, where a user's followers don't usually follow one another. The positive assortativity coefficient shows that users in the same group (in this case, machine learning developers or web developers) connect to each other more often than random mixing would produce. Something else of note is that the entire graph is a single connected component, meaning there's a way to reach any user from any other user in the network. The graph is undirected even though following is a directed relationship, so that may have affected the results.

In [9]:
print("Average Clustering Coefficient:", nx.average_clustering(git_graph))
print("Graph Transitivity:", nx.transitivity(git_graph))
print("Assortativity Coefficient:", nx.attribute_assortativity_coefficient(git_graph, 'ml_target'))
print("Number of Connected Components:", nx.number_connected_components(git_graph))
print("Whether Graph is Connected:", nx.is_connected(git_graph))

Average Clustering Coefficient: 0.16753704480107323
Graph Transitivity: 0.012357188884259466
Assortativity Coefficient: 0.3778215022223345
Number of Connected Components: 1
Whether Graph is Connected: True
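
The gap between the average clustering coefficient and the transitivity reported above can be reproduced on a toy graph. The sketch below is only an illustration and makes no claim about the real network's structure. It builds a hypothetical hub that closes `n_triangles` triangles with pairs of low-degree nodes. Average clustering weights every node equally, so the many low-degree nodes dominate it, while transitivity counts connected triples, most of which are centred on the hub.

```python
import networkx as nx

n_triangles = 50  # hypothetical size, chosen only for illustration
hub = 0

toy = nx.Graph()
for i in range(n_triangles):
    a, b = 2 * i + 1, 2 * i + 2
    # Each pair (a, b) forms one triangle with the hub.
    toy.add_edges_from([(hub, a), (hub, b), (a, b)])

avg_clustering = nx.average_clustering(toy)  # mean of per-node clustering
transitivity = nx.transitivity(toy)          # 3 * triangles / connected triples

print("Average Clustering Coefficient:", avg_clustering)
print("Graph Transitivity:", transitivity)
```

In this construction every non-hub node has clustering 1 while the hub's clustering is close to 0, so the average stays near 1 but the transitivity falls to 3 / (2 * n_triangles + 1). A low transitivity next to a noticeably higher average clustering is therefore consistent with a few high-degree hubs whose neighbours are mostly not connected to each other.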
