## NetworkPlot

### Set environment variables

To load this notebook's modules when running it from outside the repository folder, set **PATH** below to the repository location, e.g. `C:/NetworkPlot`:

In [None]:
PATH = "/path/to/NetworkPlot" # <-- optional if running from native path

In [None]:
import importlib.util, os

if not os.path.isdir(PATH):
    PATH = os.getcwd()
PATH = os.path.realpath(PATH)

spec = importlib.util.spec_from_file_location("__init__", os.path.join(PATH, '__init__.py'))
init = importlib.util.module_from_spec(spec)
spec.loader.exec_module(init)

%matplotlib inline
%load_ext autoreload
%autoreload 2
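
For readers unfamiliar with `importlib.util`, the following is a minimal, self-contained sketch of the same loading technique. It writes a throwaway `__init__.py` into a temporary folder (a stand-in for the repository, not the real NetworkPlot package) and executes it as a module. The names `demo_dir`, `demo_spec`, `demo_init` and `GREETING` are hypothetical.

```python
import importlib.util
import os
import tempfile

# Create a temporary folder that stands in for the repository root.
demo_dir = tempfile.mkdtemp()
init_path = os.path.join(demo_dir, "__init__.py")
with open(init_path, "w") as handle:
    handle.write("GREETING = 'loaded from ' + __file__\n")

# Build a module spec from the file location, create the module, then run it.
demo_spec = importlib.util.spec_from_file_location("demo_init", init_path)
demo_init = importlib.util.module_from_spec(demo_spec)
demo_spec.loader.exec_module(demo_init)

print(demo_init.GREETING)
```

Executing the module this way runs its top-level code once, which is why the loader cell above can be used to set up import paths before the `import` statements that follow.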

### Import functions

In [None]:
import plotly.offline as py

from dflib import df_load, df_describe
from NetworkPlot import NetworkPlot
from nxlib import nx_readgraph, nx_readfile

py.init_notebook_mode(connected=True)

### Load network data

Choose and import a network graph object to analyze below.

#### 1) Graph file

Reads graph file data and returns a `networkx.Graph()` object. Accepts GDF, GEXF, GML, Pickle, GraphML, LEDA, YAML, PAJEK and SHP formats.

In [None]:
graph_file = ""

G = nx_readgraph(graph_file)

#### 2) Text file

Reads content from file and builds graph object, e.g. from `TwitterCollector` data. Accepts CSV, TAB, TXT, XLS and XLSX formats.

In [None]:
input_file = "" # "tweets.csv"
source     = "" # "original_tweet_screen_name"
target     = "" # "retweet_screen_name"

G, dataset = nx_readfile(input_file, source, target)
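
The internals of `nx_readfile` are not shown in this notebook, so the following is only a sketch of the general idea, assuming each row contributes one directed edge from the `source` column to the `target` column. It uses the standard library alone, with an in-memory CSV and hypothetical sample values.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data in the shape of the commented example above.
sample_csv = io.StringIO(
    "original_tweet_screen_name,retweet_screen_name\n"
    "alice,bob\n"
    "alice,carol\n"
    "bob,carol\n"
    ",dave\n"  # row with a missing source, skipped below
)

source_col = "original_tweet_screen_name"
target_col = "retweet_screen_name"

# Adjacency mapping: node -> set of nodes it points to.
adjacency = defaultdict(set)
for row in csv.DictReader(sample_csv):
    src, dst = row[source_col].strip(), row[target_col].strip()
    if src and dst:  # ignore rows lacking either endpoint
        adjacency[src].add(dst)

# Flatten the mapping into a sorted list of (source, target) pairs.
edges = sorted((s, t) for s, targets in adjacency.items() for t in targets)
print(edges)
```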

#### 3) Citation data

Reads content from a folder using `metaknowledge.RecordCollection`, splits it by year and assigns the resulting citation network to a graph object.

In [None]:
import metaknowledge as mk

In [None]:
raw_data   = ""
start_year = 2017
end_year   = 2018

if raw_data:
    RC = mk.RecordCollection(raw_data, cached=True)
    RC = RC.yearSplit(start_year, end_year)
    G = RC.networkCitation()
    print(len(RC),'total records\n')
    for line in RC.glimpse().split('\n')[3:]:
        print(line)

### Network analysis

Calls NetworkPlot in order to compute centrality measures, identify modules (communities) and render graphs.

#### Advanced settings

* Layouts: `circular`, `kamada_kawai`, `fruchterman_reingold`, `spring` (F-R), `spectral`, `forceatlas2` or `random`.

* Renderers: `networkx`, `datashader`, `ds_bundled` or `plotly`.

In [None]:
k = 0                     # value for k-core filter
it = 500                  # maximum number of iterations
df_centrality = ""        # saved centrality data frame

layout = 'forceatlas2'    # from networkx or datashader
renderer = 'networkx'     # library to plot graph
sort_by = 'degree'        # centrality measure to sort by

max_nodes = None          # number of top nodes by degree
max_modules = None        # number of top modules to output
max_r_nodes = None        # number of top nodes to render
max_labels = None         # number of top node labels to render

include_nodes = ""        # list of nodes to include (comma separated)
exclude_nodes = ""        # list of nodes to exclude (comma separated)

deg = True                # degree centrality measure
clu = False               # clustering coefficient
clo = False               # closeness centrality
eig = False               # eigenvector centrality
bet = False               # betweenness centrality
bri = False               # bridgeness centrality
bro = False               # brokeness centrality
den = False               # network density
dia = False               # network diameter
mod = False               # Louvain method for modularity

group_modules = False     # group nodes by their communities
normalized = False        # all except degree centrality
show_labels = False       # write node names (all by default)

plot_network = False      # set as True to plot network graph
plot_modules = False      # set as True to plot modules graphs
inline = True             # display rendered graphs inline

name = 'Network graph'    # graph name for whole network
output_folder = 'NETWORK' # optionally set the output folder name
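
The `k` setting refers to a k-core filter: nodes with degree below `k` are removed repeatedly until every remaining node has at least `k` neighbours. How NetworkPlot applies it internally is not shown in this notebook, so the sketch below only illustrates the definition on a small undirected toy graph, using plain dictionaries. `k_core` is a hypothetical helper.

```python
def k_core(adjacency, k):
    """Return the k-core of an undirected graph given as {node: set(neighbours)}."""
    # Work on a copy so the caller's graph is left untouched.
    core = {node: set(neigh) for node, neigh in adjacency.items()}
    while True:
        # Nodes whose current degree is below k must be pruned.
        low = [node for node, neigh in core.items() if len(neigh) < k]
        if not low:
            return core
        for node in low:
            for neigh in core.pop(node):
                # A neighbour may already have been removed in this pass.
                if neigh in core:
                    core[neigh].discard(node)


# Toy graph: a triangle (a, b, c) with a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(sorted(k_core(toy, 2)))
```

With `k = 0`, as in the default settings above, no node is removed.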

#### Analyze network

Calls the NetworkPlot main function with the settings above. **Note:** the returned `df` can be passed back in as `df_centrality` to skip recomputing the centrality measures.

In [None]:
df, dfm = NetworkPlot(G,
                      k=k,
                      it=it,
                      df=df_centrality,
                      layout=layout,
                      renderer=renderer,
                      sort_by=sort_by,
                      name=name,
                      include=include_nodes,
                      exclude=exclude_nodes,
                      deg=deg,
                      clu=clu,
                      clo=clo,
                      eig=eig,
                      bet=bet,
                      bri=bri,
                      bro=bro,
                      den=den,
                      dia=dia,
                      mod=mod,
                      group_modules=group_modules,
                      normalized=normalized,
                      show_labels=show_labels,
                      max_nodes=max_nodes,
                      max_modules=max_modules,
                      max_r_nodes=max_r_nodes,
                      max_labels=max_labels,
                      plot_network=plot_network,
                      plot_modules=plot_modules,
                      inline=inline,
                      output_folder=output_folder)
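
To make the `deg` and `den` options more concrete, here is a small standard-library sketch of in-degree, out-degree, total degree and density on a directed toy graph. The edge list is hypothetical, and whether NetworkPlot computes these values in exactly this way is an assumption. The column names mirror those used later in this notebook.

```python
from collections import Counter

# Hypothetical directed edge list (source, target).
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("dave", "alice")]

out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)
nodes = sorted(set(out_degree) | set(in_degree))

# One row per node; a Counter returns 0 for nodes it has not seen.
rows = [
    {
        "node": node,
        "in_degree": in_degree[node],
        "out_degree": out_degree[node],
        "degree": in_degree[node] + out_degree[node],
    }
    for node in nodes
]

# Density of a directed graph without self-loops: edges / (n * (n - 1)).
n = len(nodes)
density = len(edges) / (n * (n - 1))

# Print rows from highest to lowest total degree, then the density.
for row in sorted(rows, key=lambda r: r["degree"], reverse=True):
    print(row)
print("density:", round(density, 3))
```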

### Data frame

Display data frame of centrality measures from the nodes.

In [None]:
df

#### Data frame from modules

Display data frame of centrality measures from identified modules.

In [None]:
dfm

#### Data frame from nodes in a module

Display data frame of centrality measures from the nodes in a specific module **m**. By default, shows objects from the first module (`m=0`).

In [None]:
m = 0

df_ = df[df['module'] == m]; df_

### Statistics from data frame

Display statistics from nodes in a data frame.

In [None]:
df_describe(df)

#### Statistics from modules

Displays statistics from all modules in a data frame.

In [None]:
df_describe(dfm)

#### Statistics from nodes in a module

Display statistics from the nodes in a specific module **m**. By default, shows objects from the first module (`m=0`).

In [None]:
m = 0

df_ = df[df['module'] == m]; df_describe(df_)

### Correlate centralities

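Displays a chart correlating centrality measures. **Note:** requires Plotly Express, available as `plotly.express` in `plotly >= 4` or as the standalone `plotly_express` package for older versions.

In [None]:
try:
    import plotly.express as px # bundled with plotly >= 4
except ImportError:
    import plotly_express as px # standalone package for older plotly versions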

In [None]:
x = 'in_degree'
y = 'out_degree'
size = 'degree'
color = None # 'module'

#### Correlate nodes centralities

Returns chart correlating centrality from nodes (`df`).

In [None]:
fig = px.scatter(df, x=x, y=y, size=size, color=color); fig
#fig.write_html('%s/network-centrality.html' % output_folder) # <-- uncomment to save as HTML file

#### Correlate modules centralities

Returns chart correlating centrality from modules (`dfm`).

In [None]:
fig = px.scatter(dfm, x=x, y=y, size=size, color=color); fig
#fig.write_html('%s/modules-centrality.html' % output_folder) # <-- uncomment to save as HTML file
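
A scatter chart shows the relationship between two measures visually; a single number such as the Pearson correlation coefficient can complement it. The sketch below computes it by hand on hypothetical values, without pandas. With a real data frame, `df[x].corr(df[y])` would be the usual pandas equivalent.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical in-degree and out-degree values for five nodes.
in_values = [1, 2, 3, 4, 5]
out_values = [2, 4, 6, 8, 10]

print(pearson(in_values, out_values))
```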

### Filter data set by module

Returns data frame only for a specific module **m**. By default, shows objects from the first module (`m=0`). **Note:** requires a loaded `dataset` from file.

In [None]:
m = 0

module_nodes = list(df[df['module'] == m].index) # nodes in module m (avoids overwriting the `mod` setting)
df_ = dataset[dataset[source].isin(module_nodes)]; df_

#df_.to_csv('%s/module_%s.csv' % (output_folder, m)) # <-- uncomment to save as CSV file

#### Filter data set for all modules (!)

Outputs all data filtered by top modules as CSV files, one file per module. **Note:** requires a loaded `dataset` from file.

In [None]:
for m in sorted(dfm.index[:max_modules]):
    module_nodes = list(df[df['module'] == m].index) # same 'module' column as in the cells above
    df_ = dataset[dataset[source].isin(module_nodes)]
    df_.to_csv('%s/module_%s.csv' % (output_folder, m))
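
The loop above relies on pandas. As a rough illustration of the same idea without it, the sketch below groups the rows of a hypothetical data set by the module of their source node and writes one CSV file per module into a temporary folder. The `node_module` mapping and the sample rows are invented for the example.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Hypothetical node -> module assignment (e.g. from community detection).
node_module = {"alice": 0, "bob": 0, "carol": 1}

# Hypothetical data set rows; "source" plays the role of the source column.
rows = [
    {"source": "alice", "target": "bob"},
    {"source": "bob", "target": "carol"},
    {"source": "carol", "target": "alice"},
    {"source": "erin", "target": "alice"},  # not in any module, skipped
]

# Group rows by the module of their source node.
by_module = defaultdict(list)
for row in rows:
    module = node_module.get(row["source"])
    if module is not None:
        by_module[module].append(row)

# Write one file per module into a temporary output folder.
out_dir = tempfile.mkdtemp()
written = []
for module, module_rows in sorted(by_module.items()):
    path = os.path.join(out_dir, "module_%s.csv" % module)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["source", "target"])
        writer.writeheader()
        writer.writerows(module_rows)
    written.append(path)

print([os.path.basename(p) for p in written])
```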

#### Compress output →  `output.zip`

In [None]:
!zip -r output.zip {output_folder}
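
The `zip` command may not be available on every system (for example, a default Windows installation). A portable alternative, sketched here with a temporary folder standing in for the output folder, is `shutil.make_archive` from the standard library. The folder and file names in the sketch are placeholders.

```python
import os
import shutil
import tempfile
import zipfile

# Temporary folder standing in for the notebook's output folder.
work_dir = tempfile.mkdtemp()
demo_output = os.path.join(work_dir, "NETWORK")
os.makedirs(demo_output)
with open(os.path.join(demo_output, "module_0.csv"), "w") as handle:
    handle.write("source,target\nalice,bob\n")

# Create <work_dir>/output.zip containing the NETWORK folder and its files.
archive = shutil.make_archive(
    os.path.join(work_dir, "output"),  # archive path without extension
    "zip",
    root_dir=work_dir,
    base_dir="NETWORK",
)

# List the archive contents to confirm what was stored.
with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())
```

In the notebook itself, the equivalent call would be along the lines of `shutil.make_archive('output', 'zip', base_dir=output_folder)`, run from the folder that contains the output folder.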

### [Download output files](output.zip)

_____
### References

* NetworkX: https://networkx.github.io

* Datashader: http://datashader.org/

* HoloViews: http://holoviews.org/

* Plotly: https://plot.ly