# Network Analysis Project

## Assignment Description

You will deliver your project at the end of the course (deadline to be determined) by handing in a presentation (in PDF or PowerPoint format). The presentation must include the following information:

- Basic network description of your data (what type of network it is, what it represents, whether it is real or synthetically generated, etc.). In practice, this is the result of project phase #1 (finding data).
- Basic network statistics of your data (number of nodes, number of edges, clustering, degree distribution, etc.). In practice, this is the result of project phase #2 (exploratory data analysis).
- A clear statement of your research question, the result of project phase #3.
- The analysis, results, and interpretation that allow you to answer your research question, the result of project phase #4.

You are free to present these in any order and to add any further information you deem necessary, but these components are mandatory.
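
As a rough sketch of the basic statistics listed above, the hypothetical function `basic_stats` below computes the node count, edge count, degree distribution and average clustering of a simple undirected graph given as an edge list, using only the standard library. It assumes every node appears in at least one edge (isolated nodes cannot be expressed in an edge list) and, like `nx.average_clustering`, assigns clustering 0 to nodes with degree below 2. In the notebook itself, networkx provides the same quantities through `G.number_of_nodes()`, `G.number_of_edges()`, `nx.degree_histogram(G)` and `nx.average_clustering(G)`.

```python
from collections import Counter
from itertools import combinations


def basic_stats(edges):
    """Return (n_nodes, n_edges, degree_distribution, avg_clustering)
    for a simple undirected graph given as an iterable of (u, v) pairs."""
    # Build an adjacency map; sets silently merge duplicate edges.
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n_nodes = len(adj)
    # Each undirected edge is stored twice, once per endpoint.
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    # Degree distribution: degree -> number of nodes with that degree.
    degree_distribution = Counter(len(nbrs) for nbrs in adj.values())

    # Local clustering: fraction of neighbour pairs that are themselves linked.
    clustering = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            clustering[node] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering[node] = 2 * links / (k * (k - 1))

    avg_clustering = sum(clustering.values()) / n_nodes if n_nodes else 0.0
    return n_nodes, n_edges, degree_distribution, avg_clustering


# Example: a triangle (1, 2, 3) with a pendant node 4 attached to node 3.
n, m, degrees, avg_c = basic_stats([(1, 2), (2, 3), (1, 3), (3, 4)])
print(n, m)  # 4 4
print(degrees, avg_c)
```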

The oral exam is structured as follows: the group gives a joint presentation, followed by questions to the whole group. Each student then has an individual examination with additional questions, while the rest of the group waits outside the room. The oral lasts 15 minutes × (number of group members + 1); for instance, a group of 6 will have 105 minutes ((6 + 1) × 15). In other words, there are 15 minutes for the group part plus 15 minutes of individual examination per student.
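
The timing rule above is simple arithmetic. As a minimal sketch, the hypothetical helper `oral_length_minutes` computes it, assuming one shared 15-minute group slot plus one 15-minute individual slot per member.

```python
def oral_length_minutes(group_size: int, slot_minutes: int = 15) -> int:
    """Total oral length: one group slot plus one individual slot per member."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return (group_size + 1) * slot_minutes


print(oral_length_minutes(6))  # 105
```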

## Environment Setup

This project relies on several Python libraries for the analysis. Make sure the dependencies listed in `requirements.txt` are installed locally with pip, the Python package installer.

```python
%%capture
%pip install -r requirements.txt
```
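
Because `%%capture` hides pip's output, installation errors are easy to miss. One way to double-check is to confirm that every top-level module imported by this notebook can be found, as in the sketch below, which uses only the standard library's `importlib.util.find_spec`. The helper name `missing_packages` is hypothetical, and the module list is an assumption taken from the import cell in the Packages section, not from `requirements.txt`; note that import names can differ from pip package names (for example, `IPython` is installed as `ipython`).

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Top-level modules imported by this notebook (assumed list; IPython provides `display`).
REQUIRED = ["networkx", "pyvis", "powerlaw", "IPython", "matplotlib",
            "seaborn", "numpy", "scipy", "pandas"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```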

### Packages

```python
# network representation and algorithms
import networkx as nx
from networkx.algorithms import bipartite                        # bipartite graph algorithms
from pyvis.network import Network                                # interactive network visualisation
from networkx import linalg as nxla                              # graph matrices (adjacency, Laplacian, ...)
import powerlaw as pl                                            # power-law fits for the degree distribution
from IPython.display import display, Image, Markdown             # display images and Markdown in Jupyter

# general data science libraries
from matplotlib import pyplot as plt                             # basic plotting
import seaborn as sns                                            # advanced plotting
import numpy as np                                               # n-dimensional arrays
import scipy as sp                                               # numerical computation
import pandas as pd                                              # dataframes

# python standard library
from time import time                                            # timing execution
from datetime import date, datetime                              # current date and time
import json                                                      # read/write JSON
import re                                                        # regex search
import os                                                        # OS operations
import random                                                    # randomness
from collections import Counter                                  # efficient counting
import contextlib                                                # context-manager utilities

# custom imports (local cscripts package)
from cscripts import metrics
from cscripts import plotting
from cscripts import summarise
from cscripts import backboning # michele
#from cscripts import github_api
from cscripts import projections
```

### Set global style of plots

The cell below sets the global style for all plots, along with any other plot-related settings.

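```python
sns.set_style("darkgrid")                                  # dark background with grid lines
# sns.set() also resets the theme to seaborn defaults (darkgrid style);
# the rc overrides force tick marks on the x and y axes
sns.set(rc={"xtick.bottom": True, "ytick.left": True})
```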

### Flags

Flags control which sections run when the whole notebook is executed at once. This prevents operations whose results only need to be produced once (for example, transforming the raw data or computing projections) from being rerun every time.

```python
# section flags
LOAD_DATA = True                      # Loads raw data for initial inspection
TRANSFORM_DATA = False                # Transforms raw data into a more suitable format (requires LOAD_DATA = True)
COMPUTE_PROJECTIONS = False           # Computes the network projections
GENERATE_SUMMARY_PROJ = False         # Summary of the projections only
GET_ONE_CC = True                     # Extract one connected component from each projection?
DO_BACKBONING = False                 # Applies backboning to the projected graphs
GENERATE_SUMMARY_PROJ_BACKB = False   # Summary of the projected AND backboned graphs
SAVE_FIG = False                      # Save all generated figures?
RANDOM_SAMPLE = False
```
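
In practice, a guarded cell can simply start with `if TRANSFORM_DATA:`. The sketch below makes the same idea explicit under stated assumptions: the `FLAGS` dictionary and the `check_flag_dependencies` and `run_sections` helpers are hypothetical names, not part of this notebook or of `cscripts`, and the only dependency enforced is the one noted in the flag comments (`TRANSFORM_DATA` needs `LOAD_DATA`).

```python
# Hypothetical flag values, mirroring part of the flags cell.
FLAGS = {"LOAD_DATA": True, "TRANSFORM_DATA": False, "COMPUTE_PROJECTIONS": False}


def check_flag_dependencies(flags):
    """Fail early if a flag is on while a flag it depends on is off."""
    if flags.get("TRANSFORM_DATA") and not flags.get("LOAD_DATA"):
        raise ValueError("TRANSFORM_DATA requires LOAD_DATA = True")


def run_sections(flags, sections):
    """Run each (flag_name, step) pair whose flag is True, in order.

    Returns the names of the flags whose sections actually ran."""
    check_flag_dependencies(flags)
    ran = []
    for flag_name, step in sections:
        if flags.get(flag_name, False):
            step()
            ran.append(flag_name)
    return ran


# Placeholder steps; in the notebook these would be the real cells.
sections = [
    ("LOAD_DATA", lambda: print("loading raw data")),
    ("TRANSFORM_DATA", lambda: print("transforming data")),
    ("COMPUTE_PROJECTIONS", lambda: print("computing projections")),
]

print(run_sections(FLAGS, sections))  # prints "loading raw data", then ['LOAD_DATA']
```

Checking dependencies before anything runs means a misconfigured flag combination fails immediately instead of halfway through a long notebook run.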