# Network Analysis
### This section analyzes the network of programmers who interact with other programmers on Stack Overflow over problems with a given programming language.
#### The following steps are taken in this part of the Stack Overflow investigation:

1. Create individual networks for each programming language with authors as nodes and interactions (answering questions or commenting on answers) as links.

2. Perform basic network analysis (counts, degree distribution, etc.) on the language-specific networks.

3. Create one big StackOverflow-network including all 16 programming languages with the same types of nodes and links.

4. Perform basic network analysis on the StackOverflow-network.

5. Visualize the StackOverflow-network.

6. Perform advanced network analysis on the StackOverflow-network by investigating the different subnetworks, communities, modularity etc.

7. Use the Louvain algorithm to partition the network into communities and compare the result to the StackOverflow-network.
 

In [1]:
### Imports
import pandas as pd
import numpy as np
import networkx as nx
from scipy import stats 
from operator import itemgetter 
from collections import Counter
import re
from glob import glob
from tqdm import tqdm
from pelutils import Table, thousand_seps
import itertools

import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

def setup_mpl():
    mpl.rcParams['font.family'] = "Liberation Serif"
    mpl.rcParams['font.size'] = 11
    mpl.rcParams['figure.figsize'] = (7,2.5)
    mpl.rcParams['figure.dpi'] = 200
    #mpl.rcParams['lines.linewidth'] = 1
setup_mpl()


In [4]:
### First step: load the data, reusing the loading function from the earlier part of the investigation

loved_languages = {
    "rust":        86.1,
    "typescript":  67.1,
    "python":      66.7,
    "kotlin":      62.9,
    "go":          62.3,
    "julia":       62.2,
    "dart":        62.1,
    "c#":          59.7,
    # "swift":       59.5,
    # "javascript":  58.3,
    # "scala":       53.2,
    # "haskell":     51.7,
    # "r":           44.5,
    # "java":        44.1,
    # "c++":         43.4,
    # "ruby":        42.9,
    # "php":         37.3,
    # "c":           33.1,
    # "assembly":    29.4,
    # "perl":        28.6,
    "objective-c": 23.4,
    "vba":         19.6,
}

# Collect all data into single dataframe

def lang(fpath: str) -> str:
    # Get programming language from a filepath
    return "-".join(fpath[fpath.index("/")+1:].split("-")[:-1])

qdfs = list()
adfs = list()
cdfs = list()

for qf, af, cf in tqdm(zip(glob("data/*-questions.pkl"), glob("data/*-answers.pkl"), glob("data/*-comments.pkl")), total=len(loved_languages)):
    qdfs.append(pd.read_pickle(qf))
    qdfs[-1]["language"] = lang(qf)
    qdfs[-1]["type"] = "q"
    adfs.append(pd.read_pickle(af))
    adfs[-1]["language"] = lang(af)
    adfs[-1]["type"] = "a"
    cdfs.append(pd.read_pickle(cf))
    cdfs[-1]["language"] = lang(cf)
    cdfs[-1]["type"] = "c"

# Concatenate in file order. The shuffled variant on the next line (meant to prevent
# systematic biases in contiguous subsets of the dataframe) is currently disabled.
so = pd.concat(qdfs + adfs + cdfs, ignore_index=True)
# so = pd.concat(qdfs + adfs + cdfs, ignore_index=True).sample(frac=1)
del qdfs, adfs, cdfs


# Data summary
t = Table()
t.add_row(["Language", "Questions", "Answers", "Comments", "Total"])
for language in loved_languages:  # named `language` so the lang() helper above is not overwritten
    t.add_row([
        language.capitalize(),
        *[thousand_seps(sum((so["language"] == language) & (so["type"] == post_type))) for post_type in ("q", "a", "c")],
        thousand_seps(sum(so["language"] == language)),
    ], [1, 0, 0, 0, 0])
t.add_row([
    "",
    thousand_seps(sum(so["type"] == "q")),
    thousand_seps(sum(so["type"] == "a")),
    thousand_seps(sum(so["type"] == "c")),
    thousand_seps(len(so)),
], [1, 0, 0, 0, 0])
print(t)
so

100%|██████████| 10/10 [00:02<00:00,  4.91it/s]
Language    | Questions | Answers | Comments | Total  
Rust        |    12,968 |  16,526 |   25,362 |  54,856
Typescript  |    15,953 |  25,381 |   26,942 |  68,276
Python      |    26,100 |  47,627 |   53,072 | 126,799
Kotlin      |     9,387 |  15,443 |   17,034 |  41,864
Go          |    19,106 |  30,043 |   37,714 |  86,863
Julia       |     6,175 |   8,259 |   10,143 |  24,577
Dart        |    11,922 |  19,245 |   16,830 |  47,997
C#          |    26,100 |  43,625 |   65,802 | 135,527
Objective-c |    26,046 |  40,216 |   51,188 | 117,450
Vba         |    25,718 |  38,393 |   58,196 | 122,307
            |   179,475 | 284,758 |  362,283 | 826,516


Unnamed: 0,language,title,creation_date,score,owner/user_id,link,question_id,body,owner/reputation,view_count,type
0,rust,What is typestate?,1278652929,52,42323,https://stackoverflow.com/questions/3210025/wh...,3210025,<p>What does TypeState refer to in respect to ...,16194,8629,q
1,rust,Sockets in Rust,1327395685,11,149482,https://stackoverflow.com/questions/8984174/so...,8984174,<p>Are there any socket or net libraries for R...,99545,7049,q
2,rust,How do you access enum values in Rust?,1328174894,46,947301,https://stackoverflow.com/questions/9109872/ho...,9109872,"<pre><code>struct Point {\n x: f64,\n y:...",3732,29429,q
3,rust,Can&#39;t compile Rust,1329327923,7,820736,https://stackoverflow.com/questions/9298459/ca...,9298459,<p>I'm on Debian and following the compile ins...,281,2086,q
4,rust,Rust pattern matching over a vector,1329247015,15,454274,https://stackoverflow.com/questions/9282805/ru...,9282805,"<p>The <a href=""http://doc.rust-lang.org/doc/t...",301,13107,q
...,...,...,...,...,...,...,...,...,...,...,...
826511,kotlin,,1577635185,0,115145,,59520360,I am uncertain why you have that <code>Binding...,904515,,c
826512,kotlin,,1577644424,0,448037,,59520360,"I tried by using extension fncn, still not abl...",3120,,c
826513,kotlin,,1577644520,0,115145,,59520360,Perhaps there is an issue with the URL or with...,904515,,c
826514,kotlin,,1577624257,0,2637449,,59519303,"What is the problem, Can you mention here? Can...",12996,,c


1. Create individual networks for each programming language with authors as nodes and interactions (answering questions or commenting on answers) as links.

In [None]:


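# Map each post's question_id to its author. python_questions, python_answers and
# python_comments are assumed to be the Python rows of `so`, split by post type.
# Note: a dict keeps only the last author per question_id for answers and comments.
question_authors = dict(zip(python_questions['question_id'], python_questions['owner/user_id']))
answer_authors = dict(zip(python_answers['question_id'], python_answers['owner/user_id']))
comment_authors = dict(zip(python_comments['question_id'], python_comments['owner/user_id']))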
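
One way to build a language-specific network from these columns is sketched below. It is only a sketch under stated assumptions: the helper name `build_language_network` and the toy dataframe are hypothetical, and linking every answer or comment to the asker of the same `question_id` is a simplification, since the dataframe shown above does not reveal which answer a comment belongs to.

```python
import networkx as nx
import pandas as pd


def build_language_network(posts: pd.DataFrame, language: str) -> nx.Graph:
    """Build an undirected, weighted author network for one language (hypothetical helper)."""
    df = posts[posts["language"] == language]
    questions = df[df["type"] == "q"]
    # question_id -> author of the question
    askers = dict(zip(questions["question_id"], questions["owner/user_id"]))

    G = nx.Graph()
    replies = df[df["type"].isin(["a", "c"])]
    for qid, author in zip(replies["question_id"], replies["owner/user_id"]):
        asker = askers.get(qid)
        # Skip replies to questions we do not have, and self-replies
        if asker is None or asker == author:
            continue
        if G.has_edge(author, asker):
            G[author][asker]["weight"] += 1  # repeated interaction
        else:
            G.add_edge(author, asker, weight=1)
    return G


# Toy data standing in for `so` (values are made up for illustration)
toy = pd.DataFrame({
    "language":      ["python", "python", "python", "python", "rust"],
    "type":          ["q",      "a",      "c",      "a",      "q"],
    "question_id":   [1,        1,        1,        1,        2],
    "owner/user_id": [10,       20,       30,       20,       40],
})
G_python = build_language_network(toy, "python")
print(G_python.number_of_nodes(), G_python.number_of_edges())
```

Storing repeated interactions as an edge weight rather than as parallel edges keeps the graph simple while preserving how often two authors interacted.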

2. Perform basic network analysis (counts, degree distribution etc.) on language-specific networks.
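
The counts and the degree distribution mentioned in this step could be computed roughly as follows. The small graph is a made-up stand-in for one language-specific network, and the variable names are illustrative only.

```python
from collections import Counter

import networkx as nx

# Toy graph standing in for one language-specific network
G = nx.Graph()
G.add_edges_from([(10, 20), (10, 30), (10, 40), (20, 30)])

n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()

# Degree of every node, and the average degree 2E / N
degrees = dict(G.degree())
avg_degree = 2 * n_edges / n_nodes

# Degree distribution: degree -> number of nodes with that degree
degree_counts = Counter(degrees.values())

print(f"nodes={n_nodes}, edges={n_edges}, average degree={avg_degree:.2f}")
for degree, count in sorted(degree_counts.items()):
    print(f"degree {degree}: {count} node(s)")
```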

3. Create one big StackOverflow-network including all 16 programming languages with the same types of nodes and links.
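
Assuming the per-language networks are available as a dictionary of `networkx` graphs, they could be merged along these lines. `merge_networks` and the `languages` node attribute are hypothetical names chosen for this sketch. Weights are summed by hand because `nx.compose_all` would let later graphs overwrite the attributes of shared edges.

```python
import networkx as nx


def merge_networks(networks: dict) -> nx.Graph:
    """Merge language-specific graphs into one graph (hypothetical helper)."""
    merged = nx.Graph()
    for language, G in networks.items():
        # Sum edge weights when the same pair of authors appears in several languages
        for u, v, data in G.edges(data=True):
            w = data.get("weight", 1)
            if merged.has_edge(u, v):
                merged[u][v]["weight"] += w
            else:
                merged.add_edge(u, v, weight=w)
        # Remember in which languages each author is active
        for node in G.nodes():
            merged.add_node(node)
            merged.nodes[node].setdefault("languages", set()).add(language)
    return merged


# Toy per-language networks (made-up user ids)
g_python = nx.Graph()
g_python.add_edge(1, 2, weight=2)
g_rust = nx.Graph()
g_rust.add_edge(1, 2, weight=1)
g_rust.add_edge(2, 3, weight=1)

so_network = merge_networks({"python": g_python, "rust": g_rust})
print(so_network.number_of_nodes(), so_network.number_of_edges())
```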

4. Perform basic network analysis on the StackOverflow-network.
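
For the combined network, density and connected components are natural first measures alongside the counts. The following is a minimal sketch on a made-up graph, not the actual StackOverflow-network.

```python
import networkx as nx

# Toy graph standing in for the combined network
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (4, 5)])
G.add_node(6)  # an author without any interaction

# Share of possible links that actually exist
density = nx.density(G)

# Connected components, largest first
components = sorted(nx.connected_components(G), key=len, reverse=True)
giant = G.subgraph(components[0])
share_in_giant = giant.number_of_nodes() / G.number_of_nodes()

print(f"density={density:.3f}, components={len(components)}, "
      f"share of nodes in largest component={share_in_giant:.2f}")
```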

5. Visualize the StackOverflow-network.
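
A simple way to draw such a network is a force-directed layout with node sizes scaled by degree. The sketch below uses the karate club graph bundled with `networkx` as a stand-in, and the sizing factor and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Stand-in graph; replace with the StackOverflow-network
G = nx.karate_club_graph()

# Force-directed layout; the seed makes the picture reproducible
pos = nx.spring_layout(G, seed=42)

# Scale node size with degree so that active authors stand out
node_sizes = [20 * G.degree(n) for n in G.nodes()]

fig, ax = plt.subplots(figsize=(7, 4))
nx.draw_networkx_nodes(G, pos, node_size=node_sizes, ax=ax)
nx.draw_networkx_edges(G, pos, alpha=0.3, ax=ax)
ax.set_axis_off()
```

For a network with hundreds of thousands of nodes, drawing only the largest component or a sample of nodes is likely to be more practical than drawing everything.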

6. Perform advanced network analysis on the StackOverflow-network by investigating the different subnetworks, communities, modularity etc.
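
Communities and their modularity can be explored with the community functions in `networkx`. This sketch uses greedy modularity maximization purely as an example method on a made-up graph of two triangles joined by one link.

```python
import networkx as nx
from networkx.algorithms import community

# Toy graph: two triangles connected by a single link
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)])

# Detect communities by greedy modularity maximization
communities = community.greedy_modularity_communities(G)

# Modularity of the detected partition
Q = community.modularity(G, communities)

for i, members in enumerate(communities):
    print(f"community {i}: {sorted(members)}")
print(f"modularity: {Q:.3f}")
```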

7. Use the Louvain algorithm to partition the network into communities and compare the result to the StackOverflow-network.
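
One possible reading of this step is to compare a Louvain partition with a partition by programming language. The sketch below assumes a recent `networkx` version that provides `louvain_communities`. The `main_language` mapping is a hypothetical node label invented for the toy graph, not a column in the data.

```python
import networkx as nx
from networkx.algorithms import community

# Toy graph: two triangles connected by a single link
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)])

# Hypothetical label: the language each author is most active in
main_language = {1: "python", 2: "python", 3: "python",
                 4: "rust", 5: "rust", 6: "rust"}

# Partition found by the Louvain algorithm (seeded for reproducibility)
louvain_parts = community.louvain_communities(G, seed=0)
q_louvain = community.modularity(G, louvain_parts)

# Partition given by the language labels
by_language = {}
for node, language in main_language.items():
    by_language.setdefault(language, set()).add(node)
q_language = community.modularity(G, list(by_language.values()))

print(f"Louvain modularity:  {q_louvain:.3f}")
print(f"Language modularity: {q_language:.3f}")
```

If the language partition scores close to the Louvain partition, the languages themselves explain most of the community structure. A large gap would suggest that communities cut across languages.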
