# Community Detection Using the `NETWORK` Actionset in SAS Viya and Python

In this example, we load a small undirected sample graph into CAS and detect communities using the `network` actionset. We also demonstrate how the resolution setting and fixed node groups affect the detected communities.

----------------

The basic flow of this notebook is as follows:
1. Load the sample graph into Pandas DataFrames: a set of links that represents the whole graph, and a set of nodes that carries the fixed-group labels.
2. Connect to our CAS server and load the actionsets we require.
3. Upload our sample graph to our CAS server.
4. Execute the community detection without fixed nodes using two resolutions (0.5 and 1.0).
5. Execute the community detection with fixed nodes.
6. Prepare and display the network plots showing the communities.

----------------
__Prepared by:__
Damian Herrick (GitHub: [dtherrick](https://www.github.com/dtherrick))

## Imports

Our imports are broken out as follows:

| Module           | Method            | Description                                                                        |
|:-----------------|:-----------------:|:----------------------------------------------------------------------------------:|
| `os`             | all               | Allows access to environment variables.                                            |
| `sys`            | all               | Used to update our system path so Python can import our custom utility functions.  |
| `swat`           | all               | SAS Python module that orchestrates communication with a CAS server.               |
| `pandas`         | all               | Data management module we use for preparation of local data.                       |
| `networkx`       | all               | Used to manage graph data structures when plotting.                                |
| `bokeh.io`       | `output_notebook` | Utility function that allows rendering of Bokeh plots in Jupyter.                  |
| `bokeh.io`       | `show`            | Utility function that displays Bokeh plots.                                        |
| `bokeh.layouts`  | `gridplot`        | Utility function that arranges Bokeh plots in a multi-plot grid.                   |
| `bokeh.palettes` | `Spectral8`       | Eight-color palette used to differentiate node types.                              |
| `bokehvis`       | all               | Custom module written to simplify plot rendering with Bokeh.                       |

In [1]:
import os
import sys

import swat
import pandas as pd
import networkx as nx
from bokeh.io import output_notebook, show
from bokeh.layouts import gridplot
from bokeh.palettes import Spectral8

sys.path.append(os.path.join(os.path.dirname(os.getcwd()),r"../../common/python"))
import bokehvis as vis

# tell our notebook we want to output with Bokeh
output_notebook()

## Prepare the sample graph
* We pass a set of links and a set of nodes. The node table is needed this time because it defines the fixed groups (`fixGroup`) that a later calculation uses.

In [3]:
colNames = ["from", "to"]
links = [
    ("A", "B"),
    ("A", "F"),
    ("A", "G"),
    ("B", "C"),
    ("B", "D"),
    ("B", "E"),
    ("C", "D"),
    ("E", "F"),
    ("G", "I"),
    ("G", "H"),
    ("H", "I"),
]

dfLinkSetIn = pd.DataFrame(links, columns=colNames)

colNames = ["node", "fixGroup"]
nodes = [("A", 1), ("B", 1), ("C", 2), ("D", 2), ("H", 3), ("I", 3)]

dfNodeSetIn = pd.DataFrame(nodes, columns=colNames)
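
The node table above lists only six nodes, while the links mention nine. As a minimal sketch in plain Python (no CAS or pandas required), the snippet below derives the full node list and the degree of each node from the links, and identifies the nodes that have no `fixGroup`. The names `fix_groups`, `all_nodes`, `free_nodes`, and `degree` are illustrative, and treating nodes that are absent from the node table as unconstrained is an assumption.

```python
# Links and fixed-group assignments copied from the cell above.
links = [
    ("A", "B"), ("A", "F"), ("A", "G"), ("B", "C"), ("B", "D"), ("B", "E"),
    ("C", "D"), ("E", "F"), ("G", "I"), ("G", "H"), ("H", "I"),
]
fix_groups = {"A": 1, "B": 1, "C": 2, "D": 2, "H": 3, "I": 3}

# Every node that appears as an endpoint of at least one link.
all_nodes = sorted({node for link in links for node in link})

# Nodes without a fixGroup entry (assumed to be free to join any community).
free_nodes = [node for node in all_nodes if node not in fix_groups]

# Degree of each node in the undirected graph.
degree = {node: 0 for node in all_nodes}
for source, target in links:
    degree[source] += 1
    degree[target] += 1

print(all_nodes)
print(free_nodes)
print(degree)
```

The degrees sum to twice the number of links (22 for 11 links), which is a quick sanity check on the link list.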

Let's start by looking at the basic network itself.

We create a `networkx` graph and pass it to our `bokeh` helper function to create the initial plot.

In [4]:
G_comm = nx.from_pandas_edgelist(dfLinkSetIn, 'from', 'to')

title = 'Sample Undirected Graph for Community Detection'
hover = [('Node', '@index')]
nodeSize = 25

plot = vis.render_plot(graph=G_comm, title=title, hover_tooltips=hover, node_size=nodeSize)
show(plot)

## Connect to CAS, load the actionsets we'll need, and upload our graph to the CAS server.

In [5]:
host = os.environ['CAS_HOST_ORGRD']
port = int(os.environ['CAS_PORT'])

conn = swat.CAS(host, port)

_ = conn.loadactionset("network")

NOTE: Added action set 'network'.
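
If either environment variable is unset, the cell above fails with a bare `KeyError`. One possible way to get a more readable failure is sketched below. The `read_cas_settings` helper is hypothetical (it is not part of `swat`), and the host name and port in the example are placeholders.

```python
import os


def read_cas_settings(env=os.environ):
    """Return (host, port) from the environment, with a readable error if unset."""
    required = ("CAS_HOST_ORGRD", "CAS_PORT")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CAS_HOST_ORGRD"], int(env["CAS_PORT"])


# Example with an explicit mapping; these values are placeholders.
host, port = read_cas_settings(
    {"CAS_HOST_ORGRD": "cas.example.com", "CAS_PORT": "5570"}
)
print(host, port)
```

Called with no arguments, the helper reads `os.environ`, and the resulting pair can be passed to `swat.CAS(host, port)` as in the cell above.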


### Upload the local dataframes into CAS

In [6]:
conn.setsessopt(messageLevel="ERROR")
_ = conn.upload(dfLinkSetIn, casout=dict(name='LinkSetIn'))
_ = conn.upload(dfNodeSetIn, casout=dict(name='NodeSetIn'))
conn.setsessopt(messageLevel="DEFAULT")
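
The set-then-restore pattern above leaves the session at the `ERROR` message level if an upload raises. A minimal sketch of a context manager that always restores the level follows. `quiet_messages` and `FakeConnection` are hypothetical names: the fake connection only records calls so that the sketch runs without a CAS server.

```python
from contextlib import contextmanager


@contextmanager
def quiet_messages(conn, level="ERROR", restore="DEFAULT"):
    """Temporarily change the session message level, then restore it."""
    conn.setsessopt(messageLevel=level)
    try:
        yield conn
    finally:
        # Runs even if the body raises, so the session is not left quiet.
        conn.setsessopt(messageLevel=restore)


class FakeConnection:
    """Stand-in for a CAS connection that records setsessopt calls."""

    def __init__(self):
        self.calls = []

    def setsessopt(self, **options):
        self.calls.append(options)


fake = FakeConnection()
with quiet_messages(fake):
    pass  # the conn.upload(...) calls would go here

print(fake.calls)
```

With a real connection, the two `conn.upload` calls from the cell above would sit inside the `with` block.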

### Calculate the communities (without fixed groups) in our graph using the `network` actionset.

Since we've loaded our actionset, we can reference it using dot notation from our connection object.

We expect that resolution of 0.5 will detect two communities; resolution of 1.0 will detect three communities.
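
To illustrate why a lower resolution tends to merge communities, the sketch below scores two candidate partitions with a resolution-weighted modularity: for each community, the fraction of links inside it minus the resolution times the squared fraction of total degree it holds. This is one common formulation, and whether the `community` action optimizes exactly this objective is an assumption. The `modularity` helper and the partitions `three` and `two` are illustrative guesses, not output from CAS.

```python
links = [
    ("A", "B"), ("A", "F"), ("A", "G"), ("B", "C"), ("B", "D"), ("B", "E"),
    ("C", "D"), ("E", "F"), ("G", "I"), ("G", "H"), ("H", "I"),
]


def modularity(links, communities, resolution=1.0):
    """Resolution-weighted modularity of a partition of an undirected graph."""
    m = len(links)
    member_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # links with both ends in community i
    degree_sum = [0] * len(communities)  # total degree of community i
    for source, target in links:
        degree_sum[member_of[source]] += 1
        degree_sum[member_of[target]] += 1
        if member_of[source] == member_of[target]:
            internal[member_of[source]] += 1
    return sum(
        internal[i] / m - resolution * (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


# Candidate partitions (assumed for illustration).
three = [{"A", "E", "F"}, {"B", "C", "D"}, {"G", "H", "I"}]
two = [{"A", "B", "C", "D", "E", "F"}, {"G", "H", "I"}]

for resolution in (1.0, 0.5):
    q3 = modularity(links, three, resolution)
    q2 = modularity(links, two, resolution)
    print(f"resolution={resolution}: three={q3:.6f}, two={q2:.6f}")
```

At resolution 1.0 the three-community split scores higher (about 0.392562 versus 0.342975); at resolution 0.5 the ordering flips in favor of the two-community split. The two resolution-1.0 scores agree with the modularity values that the action logs for its three- and two-community results, which suggests, but does not prove, that these are the partitions it finds.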

Note that the `conn.network.community` call in the next cell is equivalent to this SAS `PROC NETWORK` code:
```
proc network
   links              = mycas.LinkSetIn
   outNodes           = mycas.NodeSetOut;
   community
      resolutionList  = 1.0 0.5
      outLevel        = mycas.CommLevelOut
      outCommunity    = mycas.CommOut
      outOverlap      = mycas.CommOverlapOut
      outCommLinks    = mycas.CommLinksOut;
run;
```

In [7]:
conn.network.community(links=dict(name='LinkSetIn'),
                       outnodes=dict(name='nodeSetOutA'),
                       outLevel=dict(name='CommLevelOut'),
                       outCommunity=dict(name='CommOut'),
                       outOverlap=dict(name='CommOverlapOut'),
                       outCommLinks=dict(name='CommLinksOut'),
                       resolutionList=[0.5, 1])

NOTE: The number of nodes in the input graph is 9.
NOTE: The number of links in the input graph is 11.
NOTE: Processing community detection using 1 threads across 1 machines.
NOTE: At resolution=1, the community algorithm found 3 communities with modularity=0.392562.
NOTE: At resolution=0.5, the community algorithm found 2 communities with modularity=0.342975.
NOTE: Processing community detection used 0.00 (cpu: 0.00) seconds.


|   | casLib              | Name           | Label | Rows | Columns | casTable                                           |
|---|---------------------|----------------|-------|------|---------|----------------------------------------------------|
| 0 | CASUSERHDFS(daherr) | nodeSetOutA    |       | 9    | 3       | CASTable('nodeSetOutA', caslib='CASUSERHDFS(da...  |
| 1 | CASUSERHDFS(daherr) | CommLinksOut   |       | 3    | 5       | CASTable('CommLinksOut', caslib='CASUSERHDFS(d...  |
| 2 | CASUSERHDFS(daherr) | CommOut        |       | 5    | 9       | CASTable('CommOut', caslib='CASUSERHDFS(daherr)')  |
| 3 | CASUSERHDFS(daherr) | CommLevelOut   |       | 2    | 4       | CASTable('CommLevelOut', caslib='CASUSERHDFS(d...  |
| 4 | CASUSERHDFS(daherr) | CommOverlapOut |       | 11   | 3       | CASTable('CommOverlapOut', caslib='CASUSERHDFS...  |

|   | Name1          | Label1          | cValue1    | nValue1 |
|---|----------------|-----------------|------------|---------|
| 0 | numNodes       | Number of Nodes | 9          | 9.0     |
| 1 | numLinks       | Number of Links | 11         | 11.0    |
| 2 | graphDirection | Graph Direction | Undirected |         |

|   | Name1       | Label1          | cValue1             | nValue1 |
|---|-------------|-----------------|---------------------|---------|
| 0 | problemType | Problem Type    | Community Detection |         |
| 1 | status      | Solution Status | OK                  |         |
| 2 | cpuTime     | CPU Time        | 0.00                | 0.0     |
| 3 | realTime    | Real Time       | 0.00                | 0.00016 |


### Calculate the communities (with fixed groups) in our graph using the `network` actionset.

Using fixed node groups, we expect to find three communities.

The Python code in the next cell is equivalent to this SAS `PROC NETWORK` code:
```
proc network
   nodes             = mycas.NodeSetIn
   links             = mycas.LinkSetIn
   outNodes          = mycas.NodeSetOut;
   community
      resolutionList = 1.0
      fix            = fixGroup;
run;
```

In [8]:
conn.network.community(nodes=dict(name='NodeSetIn'),
                       links=dict(name='LinkSetIn'),
                       outnodes=dict(name='NodeSetOutB'),
                       resolutionList=[1.0],
                       fix='fixGroup')

NOTE: The number of nodes in the input graph is 9.
NOTE: The number of links in the input graph is 11.
NOTE: Processing community detection using 1 threads across 1 machines.
NOTE: At resolution=1, the community algorithm found 3 communities with modularity=0.342975.
NOTE: Processing community detection used 0.00 (cpu: 0.00) seconds.


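|   | casLib              | Name        | Label | Rows | Columns | casTable                                          |
|---|---------------------|-------------|-------|------|---------|---------------------------------------------------|
| 0 | CASUSERHDFS(daherr) | NodeSetOutB |       | 9    | 2       | CASTable('NodeSetOutB', caslib='CASUSERHDFS(da... |

|   | Name1          | Label1          | cValue1    | nValue1 |
|---|----------------|-----------------|------------|---------|
| 0 | numNodes       | Number of Nodes | 9          | 9.0     |
| 1 | numLinks       | Number of Links | 11         | 11.0    |
| 2 | graphDirection | Graph Direction | Undirected |         |

|   | Name1       | Label1          | cValue1             | nValue1  |
|---|-------------|-----------------|---------------------|----------|
| 0 | problemType | Problem Type    | Community Detection |          |
| 1 | status      | Solution Status | OK                  |          |
| 2 | cpuTime     | CPU Time        | 0.00                | 0.0      |
| 3 | realTime    | Real Time       | 0.00                | 0.000122 |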
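
To make the effect of the fixed groups concrete, the sketch below checks whether a partition keeps every pair of nodes that share a `fixGroup` value in the same community. That reading of the `fix` parameter is an assumption here, as are the two candidate partitions, which are illustrative guesses rather than output from CAS. `respects_fix_groups` is a hypothetical helper.

```python
# Fixed-group assignments from the node table loaded earlier.
fix_groups = {"A": 1, "B": 1, "C": 2, "D": 2, "H": 3, "I": 3}


def respects_fix_groups(communities, fix_groups):
    """Return True if all nodes sharing a fix group sit in one community."""
    member_of = {node: i for i, group in enumerate(communities) for node in group}
    community_of_group = {}
    for node, group in fix_groups.items():
        # The first node seen in a fix group sets the expected community.
        expected = community_of_group.setdefault(group, member_of[node])
        if expected != member_of[node]:
            return False
    return True


# Candidate partitions (assumed for illustration).
unconstrained = [{"A", "E", "F"}, {"B", "C", "D"}, {"G", "H", "I"}]
constrained = [{"A", "B", "E", "F"}, {"C", "D"}, {"G", "H", "I"}]

print(respects_fix_groups(unconstrained, fix_groups))
print(respects_fix_groups(constrained, fix_groups))
```

The first partition separates A from B and so violates fix group 1, while the second satisfies all three groups. Worked by hand, the standard modularity of the second partition is 166/484, or about 0.342975, which matches the value logged for the fixed-group run.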


### Get the community results from CAS and prepare data for plotting

------
In this step we fetch the node results from CAS, then add community assignments and node fill color as node attributes in our `networkx` graph.
Since we do the same thing for each of the three community assignments, we combine the steps into a single cell. In production code we would probably write a helper function.

| Table         | Description                                                                                 |
|---------------|---------------------------------------------------------------------------------------------|
| `nodeSetOutA` | Results and community labels for the non-fixed group calculations, resolutions 0.5 and 1.0. |
| `NodeSetOutB` | Results and community labels for the fixed group calculation at resolution 1.0.             |

| Attribute Label   | Description                                               |
|-------------------|-----------------------------------------------------------|
| `community_0`     | Community assignment for non-fixed groups, resolution 1.0 |
| `community_1`     | Community assignment for non-fixed groups, resolution 0.5 |
| `community_fixed` | Community assignment for fixed groups, resolution 1.0     |

In [9]:
# pull the node set locally so we can plot
comm_nodes_cas = conn.CASTable('NodeSetOutA').to_dict(orient='index')
comm_fixed_nodes_cas = conn.CASTable('NodeSetOutB').to_dict(orient='index')

# make our mapping dictionaries that allow us to assign attributes
comm_nodes_0 = {v['node']:v['community_0'] for v in comm_nodes_cas.values()}
comm_nodes_1 = {v['node']:v['community_1'] for v in comm_nodes_cas.values()}
comm_fixed_nodes = {v['node']:v['community_0'] for v in comm_fixed_nodes_cas.values()}

# set the attributes
nx.set_node_attributes(G_comm, comm_nodes_0, 'community_0')
nx.set_node_attributes(G_comm, comm_nodes_1, 'community_1')
nx.set_node_attributes(G_comm, comm_fixed_nodes, 'community_fixed')

# Assign the fill colors for the nodes.
for node in G_comm.nodes:
    G_comm.nodes[node]['highlight_0'] = Spectral8[int(G_comm.nodes[node]['community_0'])]
    G_comm.nodes[node]['highlight_1'] = Spectral8[int(G_comm.nodes[node]['community_1'])]
    G_comm.nodes[node]['highlight_fixed'] = Spectral8[int(G_comm.nodes[node]['community_fixed'])]
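
The repeated mapping steps in the cell above could be folded into a helper function along the following lines. This is a sketch under stated assumptions: `community_attributes` is a hypothetical name, the sample rows stand in for rows fetched from the CAS output tables, and the three-color palette is a placeholder rather than `Spectral8`.

```python
def community_attributes(rows, column, palette):
    """Build node -> community and node -> fill color mappings from node rows."""
    communities = {row["node"]: int(row[column]) for row in rows}
    colors = {node: palette[community] for node, community in communities.items()}
    return communities, colors


# Placeholder rows and colors; real rows would come from the CAS output tables.
sample_rows = [
    {"node": "A", "community_0": 0.0},
    {"node": "B", "community_0": 1.0},
    {"node": "G", "community_0": 2.0},
]
sample_palette = ["#1b9e77", "#d95f02", "#7570b3"]

communities, colors = community_attributes(sample_rows, "community_0", sample_palette)
print(communities)
print(colors)
```

Each returned dictionary could then be passed to `nx.set_node_attributes`, once for the community label and once for the fill color, replacing the per-result-set repetition.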

### Create the three plots and display them

In [10]:
title_0 = 'Community Detection Example 1: Resolution 1'
hover_0 = [('Node', '@index'), ('Community', '@community_0')]

title_1 = 'Community Detection Example 2: Resolution 0.5'
hover_1 = [('Node', '@index'), ('Community', '@community_1')]

title_fixed = 'Community Detection Example 3: Fixed Nodes'
hover_fixed = [('Node', '@index'), ('Community', '@community_fixed')]

# render the plots.
# reminder - we set nodeSize earlier in the notebook. Its value is 25.
plot_0 = vis.render_plot(graph=G_comm, title=title_0, hover_tooltips=hover_0, node_size=nodeSize, node_color='highlight_0')
plot_1 = vis.render_plot(graph=G_comm, title=title_1, hover_tooltips=hover_1, node_size=nodeSize, node_color='highlight_1')
plot_fixed = vis.render_plot(graph=G_comm, title=title_fixed, hover_tooltips=hover_fixed, node_size=nodeSize, node_color='highlight_fixed')

grid = gridplot([plot_0, plot_1, plot_fixed], ncols=2)
show(grid)

## Clean up everything

We list the tables we created, drop them, and close our connection.
(This is probably overkill, since everything in this session is ephemeral anyway, but it is good practice nonetheless.)

In [11]:
table_list = conn.tableinfo()["TableInfo"]["Name"].to_list()

for table in table_list:
    conn.droptable(name=table, quiet=True)

conn.close()