## NetworkPlot

### Set the script path

To load this notebook's modules when running it from outside the repository folder, set the script **PATH** below to the NetworkPlot folder, e.g. ```C:/NetworkPlot```:

In [None]:
PATH = "/media/data/scripts/chn@git/chn-tools/tools/NetworkPlot" # <-- optional when running from the repository folder

In [None]:
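import importlib.util, os

# Fall back to the current working directory if PATH is not a valid folder
if not os.path.isdir(PATH):
    PATH = os.getcwd()
PATH = os.path.realpath(PATH)

# Execute the repository's __init__.py found in PATH
spec = importlib.util.spec_from_file_location("__init__", os.path.join(PATH, "__init__.py"))
init = importlib.util.module_from_spec(spec)
spec.loader.exec_module(init)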

%matplotlib inline
%load_ext autoreload
%autoreload 2

### Import functions

In [6]:
#import metaknowledge as mk # <-- uncomment for the citation network (option 3)
#import networkx as nx # <-- uncomment for the random test graph (option 4)

from NetworkPlot import NetworkPlot
from nxlib import nx_readgraph, nx_readfile

### Load network data

Choose one of the options below to build the graph object `G` to analyze.

#### 1) Graph data file

Reads a common graph file format into a ```networkx.Graph()``` object. Accepts GDF, GEXF, GML and GraphML, among others; spreadsheet files such as CSV belong to option 2.

In [9]:
graph_file = "" # <-- path to a GDF/GEXF/GML/GraphML file; leave empty to skip

if graph_file:
    G = nx_readgraph(graph_file)
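GraphML is plain XML, so the structure a graph-file reader has to load can be illustrated with the standard library alone. The sketch below uses a hypothetical `read_graphml_edges` helper; it is not the repository's `nx_readgraph`, and it ignores node and edge attributes. For real files, `networkx.read_graphml` does this job.

```python
import xml.etree.ElementTree as ET

# Namespace used by GraphML documents
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def read_graphml_edges(text):
    """Return (nodes, edges) parsed from a GraphML string.

    Hypothetical helper for illustration: ignores <key>/<data> attributes.
    """
    root = ET.fromstring(text)
    nodes = [n.get("id") for n in root.iter(GRAPHML_NS + "node")]
    edges = [(e.get("source"), e.get("target")) for e in root.iter(GRAPHML_NS + "edge")]
    return nodes, edges


sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="a"/><node id="b"/><node id="c"/>
    <edge source="a" target="b"/>
    <edge source="b" target="c"/>
  </graph>
</graphml>"""

nodes, edges = read_graphml_edges(sample)
print(nodes)  # ['a', 'b', 'c']
print(edges)  # [('a', 'b'), ('b', 'c')]
```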

#### 2) Spreadsheet file

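Reads an edge list from a spreadsheet file and builds the graph object, using the `source` and `target` columns as the endpoints of each edge. Accepts CSV, TAB, XLS and XLSX formats.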

In [2]:
input_file = "/home/neo/workspace/kmeans_tweets/TheExpanse_250.csv"
source     = "from_user"
target     = "rt_user"

G = nx_readfile(input_file, source=source, target=target)

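To illustrate what building a graph from two spreadsheet columns amounts to, here is a standard-library sketch that reads the `source` and `target` columns of a CSV and counts repeated pairs as edge weights. `read_weighted_edges` is hypothetical, and whether `nx_readfile` weights repeated pairs this way is an assumption.

```python
import csv
import io
from collections import Counter


def read_weighted_edges(fileobj, source, target):
    """Count (source, target) pairs in a CSV; repeated pairs raise the weight.

    Hypothetical helper, not nx_readfile itself.
    """
    edges = Counter()
    for row in csv.DictReader(fileobj):
        u, v = row.get(source), row.get(target)
        # Assumption: rows without a retweeted user leave the target empty
        if u and v:
            edges[(u, v)] += 1
    return edges


data = io.StringIO(
    "from_user,rt_user\n"
    "alice,bob\n"
    "carol,bob\n"
    "alice,bob\n"
    "dave,\n"
)
edges = read_weighted_edges(data, source="from_user", target="rt_user")
for (u, v), weight in edges.items():
    print(u, "->", v, "weight", weight)
```

Here the two `alice,bob` rows collapse into one edge of weight 2, and the `dave` row is skipped because it has no target.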

#### 3) Citation Network

Reads records from a folder using ```metaknowledge.RecordCollection```, keeps those published within the chosen year range and builds a citation network graph object from them.

In [1]:
raw_data   = ""   # <-- path to the raw records folder; leave empty to skip
start_year = 2017
end_year   = 2018

if raw_data:
    RC = mk.RecordCollection(raw_data, cached=True)
    RC = RC.yearSplit(start_year, end_year)
    G = RC.networkCitation()
    print(len(RC),'total records\n')
    for line in RC.glimpse().split('\n')[3:]:
        print(line)
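The cell above keeps only the records from `start_year` to `end_year` and links each record to the works it cites. The sketch below shows that idea on plain dictionaries: the record structure and the `citation_edges` helper are hypothetical, metaknowledge's `Record` objects are much richer, and treating the year range as inclusive is an assumption.

```python
def citation_edges(records, start_year, end_year):
    """Return (selected records, citing -> cited edges) for an inclusive year range.

    Hypothetical helper: each record is a dict with 'id', 'year' and 'cites'.
    Records without a year are skipped.
    """
    selected = [
        r for r in records
        if r.get("year") is not None and start_year <= r["year"] <= end_year
    ]
    edges = [(r["id"], cited) for r in selected for cited in r["cites"]]
    return selected, edges


records = [
    {"id": "A", "year": 2016, "cites": ["X"]},
    {"id": "B", "year": 2017, "cites": ["A", "X"]},
    {"id": "C", "year": 2018, "cites": ["B"]},
    {"id": "D", "year": None, "cites": ["A"]},
]
selected, edges = citation_edges(records, 2017, 2018)
print(len(selected), "total records")  # 2 total records
print(edges)  # [('B', 'A'), ('B', 'X'), ('C', 'B')]
```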

#### 4) Random test data

Loads a random geometric graph for testing purposes.

In [None]:
G = nx.random_geometric_graph(100, 0.1) # <-- 100 nodes, radius 0.1 (requires networkx)
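`nx.random_geometric_graph(100, 0.1)` places 100 nodes uniformly at random in the unit square and joins every pair whose distance is at most 0.1. A brute-force, standard-library sketch of the same construction follows; `random_geometric_edges` is a hypothetical helper, not the networkx implementation.

```python
import math
import random


def random_geometric_edges(n, radius, seed=None):
    """Hypothetical helper: n uniform points in the unit square, with an
    edge between every pair at Euclidean distance <= radius."""
    rng = random.Random(seed)
    pos = {i: (rng.random(), rng.random()) for i in range(n)}
    # O(n^2) pair check: fine for a 100-node test graph
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(pos[i], pos[j]) <= radius
    ]
    return pos, edges


pos, edges = random_geometric_edges(100, 0.1, seed=42)
print(len(pos), "nodes,", len(edges), "edges")
```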

### Generate network analysis

Calls NetworkPlot to compute centrality measures, identify modules (clusters) and render the graph.
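Two of the measures toggled by `deg` and `clo`, plus the min-max scaling behind `normalized=True`, can be sketched with the standard library on an adjacency dict. The definitions follow networkx's `degree_centrality` and `closeness_centrality` (including its default Wasserman-Faust scaling for partly disconnected graphs); whether NetworkPlot computes them exactly this way, and how it scales constant scores, are assumptions.

```python
from collections import deque


def degree_centrality(adj):
    """Degree divided by n - 1, the largest possible degree."""
    scale = 1 / (len(adj) - 1) if len(adj) > 1 else 0.0
    return {u: len(nbrs) * scale for u, nbrs in adj.items()}


def closeness_centrality(adj):
    """BFS closeness; scaled by the reachable fraction (Wasserman-Faust)."""
    n = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reached = len(dist) - 1  # other nodes reachable from source
        total = sum(dist.values())
        if total > 0 and n > 1:
            result[source] = (reached / total) * (reached / (n - 1))
        else:
            result[source] = 0.0
    return result


def min_max(scores):
    """Rescale scores to [0, 1]; assumption: constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


# Path graph a - b - c - d as an undirected adjacency dict
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
deg = degree_centrality(adj)
clo = closeness_centrality(adj)
print(clo)           # {'a': 0.5, 'b': 0.75, 'c': 0.75, 'd': 0.5}
print(min_max(clo))  # {'a': 0.0, 'b': 1.0, 'c': 1.0, 'd': 0.0}
```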

Accepted layouts: `circular`, `kamada_kawai`, `fruchterman_reingold`, `spectral`, `spring`, `forceatlas2` and `random`.
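Each layout is simply a rule for assigning 2-D coordinates to nodes. The simplest, `circular`, spaces nodes at equal angles on a circle. The sketch below mirrors that idea with a hypothetical `circle_positions` helper; it is not the networkx or datashader source.

```python
import math


def circle_positions(nodes, radius=1.0):
    """Hypothetical helper: place nodes at equal angles on a circle centred at the origin."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


pos = circle_positions(["a", "b", "c", "d"])
for node, (x, y) in pos.items():
    print(node, round(x, 3), round(y, 3))
```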

In [None]:
NetworkPlot(G,                     # <-- input graph (required)
            k=1,                   # <-- k-value for k-core filter
            labels=True,           # <-- write names above nodes
            layout='kamada_kawai', # <-- from networkx or datashader
            it=500,                # <-- maximum number of iterations
            deg=True,              # <-- degree centrality
            clu=True,              # <-- clustering coefficient
            clo=True,              # <-- closeness centrality
            eig=True,              # <-- eigenvector centrality
            bet=False,             # <-- betweenness centrality
            bri=False,             # <-- bridgeness centrality
            bro=False,             # <-- brokeness centrality
            mod=True,              # <-- identify modules with the Louvain method
            normalized=False,      # <-- set to True for min-max scaling
            max_nodes=1000,        # <-- number of top nodes to output
            max_modules=22,        # <-- number of top modules to output
            centrality_file=None,  # <-- previously output centrality measures
            plot_days=False,       # <-- set to True to plot daily graphs
            plot_modules=False,    # <-- set to True to plot each module's graph
            output='network')      # <-- output folder name (optional)
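The `k` argument applies a k-core filter: the k-core is the largest subgraph in which every node has at least `k` neighbours, found by repeatedly pruning nodes of lower degree. With `k=1` only isolated nodes are dropped. A standard-library sketch of the pruning follows; `k_core` here is a hypothetical helper on a dict of neighbour sets (networkx provides `nx.k_core` for real use).

```python
def k_core(adj, k):
    """Hypothetical helper: return the node set of the k-core of a simple
    undirected graph given as {node: set_of_neighbours}."""
    alive = set(adj)
    degree = {u: len(adj[u]) for u in adj}
    stack = [u for u in alive if degree[u] < k]
    while stack:
        u = stack.pop()
        if u not in alive:  # already pruned via another neighbour
            continue
        alive.remove(u)
        # Removing u lowers its surviving neighbours' degrees
        for v in adj[u]:
            if v in alive:
                degree[v] -= 1
                if degree[v] < k:
                    stack.append(v)
    return alive


# Triangle a-b-c with a pendant node d attached to a
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(sorted(k_core(adj, 2)))  # ['a', 'b', 'c']
```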

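#### Sample citation data

Alternatively, extract the bundled sample records and build a citation network from them. Run this cell before the network analysis step; it requires ```metaknowledge``` (imported as ```mk```) and ```zipper```, neither of which is imported above.

In [None]:
raw_data = PATH+"/sample/raw_data.tar.bz2"

if os.path.isfile(raw_data):
    zipper.untar(raw_data, PATH+'/sample/') # <-- extract data
    RC = mk.RecordCollection(PATH+'/sample/raw_data', cached=True)
    RC = RC.yearSplit(2017, 2018) # <-- start, end
    G = RC.networkCitation() # <-- citation network
    print(len(RC),'total records\n')
    for line in RC.glimpse().split('\n')[3:]:
        print(line)

#### Compress output → `output.zip`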

In [None]:
!zip -r output.zip network/
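The `!zip` shell escape depends on the `zip` command-line tool, which is not present on every system (a default Windows install, for example). A portable sketch using Python's `shutil.make_archive` follows; the `zip_folder` helper is hypothetical, and the demo runs on a throwaway folder instead of the real `network/` output.

```python
import os
import shutil
import tempfile
import zipfile


def zip_folder(folder, archive_base="output"):
    """Hypothetical helper: zip `folder` recursively into <archive_base>.zip.

    Entries keep the folder name as a prefix (e.g. network/...), like `zip -r`.
    """
    folder = os.path.abspath(folder)
    return shutil.make_archive(
        archive_base,
        "zip",
        root_dir=os.path.dirname(folder),
        base_dir=os.path.basename(folder),
    )


# Demo on a temporary folder; in the notebook this would be zip_folder("network")
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "network"))
    with open(os.path.join(tmp, "network", "nodes.csv"), "w") as f:
        f.write("id\n")
    archive = zip_folder(os.path.join(tmp, "network"), os.path.join(tmp, "output"))
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
print(names)
```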

### [Download output files](output.zip)

_____
### References

* NetworkX: https://networkx.github.io

* Datashader: http://datashader.org/

* HoloViews: http://holoviews.org/

* plotly: https://plot.ly