# Zachary's Karate Club Community Detection Using the `NETWORK` Actionset in SAS Viya and Python

In this example, we load Zachary's Karate Club graph into CAS and show how to detect communities using the `network` actionset.

This example uses Zachary’s Karate Club data (<a href="https://go.documentation.sas.com/?docsetId=casmlnetwork&docsetTarget=casmlnetwork_network_references.htm&docsetVersion=8.5&locale=en&showBanner=walkup#casmlnetwork_networkzach_w77">Zachary 1977</a>), which describes social network friendships between 34 members of a karate club at a US university in the 1970s. This is one of the standard publicly available data tables for testing community detection algorithms. It contains 34 nodes and 78 links. The graph is shown below.

----------------

The basic flow of this notebook is as follows:
1. Load the sample graph into a Pandas DataFrame as a set of links that represent the total graph. 
2. Connect to our CAS server and load the actionsets we require.
3. Upload our sample graph to our CAS server.
4. Execute the community detection without fixed nodes using two resolutions (0.5 and 1.0).
5. Prepare and display the network plots showing the detected communities.

----------------
__Prepared by:__
Damian Herrick (<i class="fa fa-github" aria-hidden="true"></i>: [dtherrick](www.github.com/dtherrick))

## Imports

Our imports are broken out as follows:

| Module           | Method            | Description                                                                        |
|:-----------------|:-----------------:|:----------------------------------------------------------------------------------:|
| `os`             | all               | Allows access to environment variables.                                            |
| `sys`            | all               | Used to update our system path so Python can import our custom utility functions.  |
| `swat`           | all               | SAS Python module that orchestrates communication with a CAS server.               |
| `pandas`         | all               | Data management module we use for preparation of local data.                       |
| `networkx`       | all               | Used to manage graph data structures when plotting.                                |
| `bokeh.io`       | `output_notebook` | Utility function that allows rendering of Bokeh plots in Jupyter                   |
| `bokeh.io`       | `show`            | Utility function that displays Bokeh plots                                         |
| `bokeh.layouts`  | `gridplot`        | Utility function that arranges Bokeh plots in a multi-plot grid                    |
| `bokeh.palettes` | `Spectral8`       | Eight-color palette used to differentiate node types.                              |
| `bokehvis`       | all               | Custom module written to simplify plot rendering with Bokeh                        |

In [1]:
import os
import sys

import swat
import pandas as pd
import networkx as nx
from bokeh.io import output_notebook, show
from bokeh.layouts import gridplot
from bokeh.palettes import Spectral8

sys.path.append(os.path.join(os.path.dirname(os.getcwd()),r"../../common/python"))
import bokehvis as vis

# tell our notebook we want to output with Bokeh
output_notebook()

## Prepare the sample graph. 
* We pass only a set of links. No separate node set is needed, because this example runs community detection without fixed groups.

In [2]:
colNames = ["from", "to"]
links = [
    (0, 9), (0, 10), (0, 14), (0, 15), (0, 16), (0, 19), (0, 20), (0, 21), (0, 33),
    (0, 23), (0, 24), (0, 27), (0, 28), (0, 29), (0, 30), (0, 31), (0, 32),
    (2, 1),
    (3, 1), (3, 2),
    (4, 1), (4, 2), (4, 3),
    (5, 1),
    (6, 1),
    (7, 1), (7, 5), (7, 6),
    (8, 1), (8, 2), (8, 3), (8, 4),
    (9, 1), (9, 3),
    (10, 3),
    (11, 1), (11, 5), (11, 6),
    (12, 1),
    (13, 1), (13, 4),
    (14, 1), (14, 2), (14, 3), (14, 4),
    (17, 6), (17, 7),
    (18, 1), (18, 2),
    (20, 1), (20, 2),
    (22, 1), (22, 2),
    (26, 24), (26, 25),
    (28, 3), (28, 24), (28, 25),
    (29, 3),
    (30, 24), (30, 27),
    (31, 2), (31, 9),
    (32, 1), (32, 25), (32, 26), (32, 29),
    (33, 3), (33, 9), (33, 15), (33, 16), (33, 19), (33, 21), (33, 23), (33, 24), (33, 30), (33, 31), (33, 32),
]
dfLinkSetIn = pd.DataFrame(links, columns=colNames)
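
Before uploading, it can help to sanity-check the link list against the counts quoted in the introduction (34 nodes, 78 links). The following is a minimal sketch: `summarize_links` is a hypothetical helper, not part of `swat` or `networkx`, and it is demonstrated on a small made-up link list rather than the full one.

```python
def summarize_links(links):
    """Return basic counts for an undirected link list of (from, to) pairs."""
    nodes = set()
    seen = set()
    duplicates = 0
    self_loops = 0
    for u, v in links:
        nodes.update((u, v))
        if u == v:
            self_loops += 1
        key = frozenset((u, v))  # (1, 2) and (2, 1) are the same undirected link
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {
        "nodes": len(nodes),
        "links": len(links),
        "duplicates": duplicates,
        "self_loops": self_loops,
    }


# Hypothetical four-link graph used only to illustrate the helper.
sample_links = [(0, 1), (1, 2), (2, 0), (2, 3)]
summary = summarize_links(sample_links)
print(summary)
```

Applied to the `links` list above, the same call should report 34 nodes and 78 links, matching the counts that CAS prints later in the notebook.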

Let's start by looking at the basic network itself.

We create a `networkx` graph and pass it to our `bokeh` helper function to create the initial plot.

In [3]:
G_comm = nx.from_pandas_edgelist(dfLinkSetIn, 'from', 'to')

title = "Zachary's Karate Club"
hover = [('Node', '@index')]
nodeSize = 25

plot = vis.render_plot(graph=G_comm, 
                       title=title, 
                       hover_tooltips=hover, 
                       node_size=nodeSize, 
                       width=1200, 
                       label_font_size="10px",
                       label_x_offset=-3)

show(plot)

## Connect to CAS, load the actionsets we'll need, and upload our graph to the CAS server.

In [4]:
host = os.environ['CAS_HOST_ORGRD']
port = int(os.environ['CAS_PORT'])

conn = swat.CAS(host, port)

conn.loadactionset("network")

NOTE: Added action set 'network'.
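
The connection cell reads `CAS_HOST_ORGRD` and `CAS_PORT` directly from the environment, so a missing variable surfaces as a bare `KeyError`. A minimal sketch of a friendlier lookup follows. The helper name `cas_settings` is hypothetical, and the fallback port of 5570 is an assumption about a typical CAS deployment rather than something this notebook states.

```python
import os


def cas_settings(env=os.environ, default_port=5570):
    """Read the CAS host and port from a mapping of environment variables."""
    host = env.get("CAS_HOST_ORGRD")
    if not host:
        raise RuntimeError("Set CAS_HOST_ORGRD to the CAS server host name.")
    # default_port is an assumed fallback; adjust it for your deployment.
    port = int(env.get("CAS_PORT", default_port))
    return host, port


# Demonstration with a plain dict instead of the real environment.
host, port = cas_settings({"CAS_HOST_ORGRD": "cas.example.com"})
print(host, port)
```

The returned pair can then be passed to `swat.CAS(host, port)` exactly as in the cell above.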


### Upload the local dataframe into CAS

In [5]:
conn.setsessopt(messageLevel="ERROR")
_ = conn.upload(dfLinkSetIn, casout='LinkSetIn')
conn.setsessopt(messageLevel="DEFAULT")
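
The upload cell lowers the message level, runs one call, and then resets the level. If the upload raises, the reset is skipped. One way to make that pattern more robust is a small context manager, sketched below. `quiet_messages` is a hypothetical helper, and `FakeConnection` is a stand-in used only so the sketch runs without a CAS server; the only connection method it relies on is `setsessopt`, as used above.

```python
from contextlib import contextmanager


@contextmanager
def quiet_messages(conn, level="ERROR", restore="DEFAULT"):
    """Temporarily change the session message level, then reset it."""
    conn.setsessopt(messageLevel=level)
    try:
        yield conn
    finally:
        # Runs even if the body raises, so the session is not left quiet.
        conn.setsessopt(messageLevel=restore)


class FakeConnection:
    """Stand-in for a swat.CAS connection; it only records setsessopt calls."""

    def __init__(self):
        self.calls = []

    def setsessopt(self, **options):
        self.calls.append(options)


fake = FakeConnection()
with quiet_messages(fake) as quiet_conn:
    pass  # a real session would call quiet_conn.upload(...) here
print(fake.calls)
```

Like the original cell, this resets the level to `DEFAULT` rather than to whatever level was active before.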

### Calculate the communities (without fixed groups) in our graph using the `network` actionset.

Since we've loaded our actionset, we can reference it using dot notation from our connection object.

We run detection at two resolutions: 0.5 and 1.0.

Note that the Python code below is equivalent to this block of SAS `PROC NETWORK` code:
```
proc network
   links              = mycas.LinkSetIn
   outNodes           = mycas.NodeSetOut;
   community
      resolutionList  = 1.0 0.5
      outLevel        = mycas.CommLevelOut
      outCommunity    = mycas.CommOut
      outOverlap      = mycas.CommOverlapOut
      outCommLinks    = mycas.CommLinksOut;
run;
```

In [6]:
conn.network.community(links          = {'name':'LinkSetIn'},
                       outnodes       = {'name':'nodeSetOut', 'replace':True},
                       outLevel       = {'name':'CommLevelOut', 'replace':True},
                       outCommunity   = {'name':'CommOut', 'replace':True},   
                       outOverlap     = {'name':'CommOverlapOut', 'replace':True},     
                       outCommLinks   = {'name':'CommLinksOut', 'replace':True},
                       resolutionList = [0.5, 1]
 )

NOTE: The number of nodes in the input graph is 34.
NOTE: The number of links in the input graph is 78.
NOTE: Processing community detection using 1 threads across 1 machines.
NOTE: At resolution=1, the community algorithm found 4 communities with modularity=0.418803.
NOTE: At resolution=0.5, the community algorithm found 2 communities with modularity=0.371795.
NOTE: Processing community detection used 0.00 (cpu: 0.00) seconds.


|   | casLib              | Name           | Label | Rows | Columns | casTable                                           |
|:--|:--------------------|:---------------|:------|-----:|--------:|:---------------------------------------------------|
| 0 | CASUSERHDFS(daherr) | nodeSetOut     |       | 34   | 3       | CASTable('nodeSetOut', caslib='CASUSERHDFS(dah...  |
| 1 | CASUSERHDFS(daherr) | CommLinksOut   |       | 5    | 5       | CASTable('CommLinksOut', caslib='CASUSERHDFS(d...  |
| 2 | CASUSERHDFS(daherr) | CommOut        |       | 6    | 9       | CASTable('CommOut', caslib='CASUSERHDFS(daherr)')  |
| 3 | CASUSERHDFS(daherr) | CommLevelOut   |       | 2    | 4       | CASTable('CommLevelOut', caslib='CASUSERHDFS(d...  |
| 4 | CASUSERHDFS(daherr) | CommOverlapOut |       | 47   | 3       | CASTable('CommOverlapOut', caslib='CASUSERHDFS...  |

|   | Name1          | Label1          | cValue1    | nValue1 |
|:--|:---------------|:----------------|:-----------|--------:|
| 0 | numNodes       | Number of Nodes | 34         | 34.0    |
| 1 | numLinks       | Number of Links | 78         | 78.0    |
| 2 | graphDirection | Graph Direction | Undirected |         |

|   | Name1       | Label1          | cValue1             | nValue1  |
|:--|:------------|:----------------|:--------------------|---------:|
| 0 | problemType | Problem Type    | Community Detection |          |
| 1 | status      | Solution Status | OK                  |          |
| 2 | cpuTime     | CPU Time        | 0.00                | 0.0      |
| 3 | realTime    | Real Time       | 0.00                | 0.000195 |
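
The log above reports a modularity value for each resolution. As a rough illustration of how a resolution parameter can enter a modularity score, the sketch below uses a common generalized form, `Q = sum over communities of (L_c / m - resolution * (d_c / (2 * m)) ** 2)`, where `L_c` is the number of links inside community `c`, `d_c` is its total degree, and `m` is the number of links. Whether the `network` actionset uses exactly this form is an assumption, and the six-node graph is made up for the example.

```python
from collections import defaultdict


def modularity(links, community, resolution=1.0):
    """Generalized modularity of a partition of an undirected, unweighted graph."""
    m = len(links)
    internal = defaultdict(int)    # links with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of each community
    for u, v in links:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single link.
toy_links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {node: 0 for node in range(6)}

for resolution in (1.0, 0.5):
    split = modularity(toy_links, two_groups, resolution)
    merged = modularity(toy_links, one_group, resolution)
    print(resolution, round(split, 4), round(merged, 4))
```

On this toy graph, the margin by which the two-community split beats the single community shrinks from about 0.36 at resolution 1.0 to about 0.11 at resolution 0.5, which is consistent with lower resolutions favoring fewer, larger communities.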


### Get the community results from CAS and prepare data for plotting

------
In this step we fetch the node results from CAS, then add community assignments and node fill color as node attributes in our `networkx` graph.

| Table        | Description                                               |
|--------------|-----------------------------------------------------------|
| `NodeSetOut` | Results and community labels for resolutions 0.5 and 1.0. |

| Attribute Label   | Description                          |
|-------------------|--------------------------------------|
| `community_0`     | Community assignment, resolution 1.0 |
| `community_1`     | Community assignment, resolution 0.5 |

In [7]:
# pull the node set locally so we can plot
comm_nodes_cas = conn.CASTable('NodeSetOut').to_dict(orient='index')

# make our mapping dictionaries that allow us to assign attributes
comm_nodes_0 = {v['node']:v['community_0'] for v in comm_nodes_cas.values()}
comm_nodes_1 = {v['node']:v['community_1'] for v in comm_nodes_cas.values()}

# set the attributes
nx.set_node_attributes(G_comm, comm_nodes_0, 'community_0')
nx.set_node_attributes(G_comm, comm_nodes_1, 'community_1')

# Assign the fill colors for the nodes.
for node in G_comm.nodes:
    G_comm.nodes[node]['highlight_0'] = Spectral8[int(G_comm.nodes[node]['community_0'])]
    G_comm.nodes[node]['highlight_1'] = Spectral8[int(G_comm.nodes[node]['community_1'])]
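
The cell above indexes `Spectral8` directly with the community number, which works here because at most four communities are found. With eight or more community identifiers the lookup would raise an `IndexError`. The sketch below shows one way to wrap the index and to list the members of each community. `color_for`, `group_by_community`, the three-color `PALETTE`, and the `assignments` mapping are all hypothetical placeholders, not values returned by CAS.

```python
from collections import defaultdict

# Placeholder palette; in the notebook this role is played by Spectral8.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3"]


def color_for(community, palette=PALETTE):
    """Pick a color for a community id, wrapping around when ids exceed the palette."""
    return palette[int(community) % len(palette)]


def group_by_community(assignments):
    """Turn a node -> community mapping into community -> sorted node list."""
    groups = defaultdict(list)
    for node, community in assignments.items():
        groups[int(community)].append(node)
    return {c: sorted(nodes) for c, nodes in sorted(groups.items())}


# Hypothetical assignments; community ids are floats to mimic values that need int().
assignments = {0: 1.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 3.0}
groups = group_by_community(assignments)
colors = {node: color_for(c) for node, c in assignments.items()}
print(groups)
print(colors)
```

Wrapping reuses colors once the palette is exhausted, so distinct communities can share a color; a larger palette avoids that when many communities are expected.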

### Create and display the plots

In [8]:
title_0 = 'Community Detection Example 1: Resolution 1'
hover_0 = [('Node', '@index'), ('Community', '@community_0')]

title_1 = 'Community Detection Example 2: Resolution 0.5'
hover_1 = [('Node', '@index'), ('Community', '@community_1')]

# render the plots.
# reminder - we set nodeSize earlier in the notebook. Its value is 25.
plot_0 = vis.render_plot(graph=G_comm, title=title_0, hover_tooltips=hover_0, node_size=nodeSize, node_color='highlight_0', width=1200)
plot_1 = vis.render_plot(graph=G_comm, title=title_1, hover_tooltips=hover_1, node_size=nodeSize, node_color='highlight_1', width=1200)

In [9]:
grid = gridplot([plot_0, plot_1], ncols=1)
show(grid)

## Clean up everything. 

Make sure we know what tables we created, drop them, and close our connection.
(This is probably overkill, since everything in this session is ephemeral anyway, but it is good practice nonetheless.)

In [10]:
table_list = conn.tableinfo()["TableInfo"]["Name"].to_list()

for table in table_list:
    conn.droptable(name=table, quiet=True)

conn.close()