# Zachary's Karate Club Community Detection Using the `NETWORK` Actionset in SAS Viya and Python

In this example, we load Zachary's Karate Club graph into CAS and show how to detect communities using the `network` actionset.

This example uses Zachary’s Karate Club data (<a href="https://go.documentation.sas.com/?docsetId=casmlnetwork&docsetTarget=casmlnetwork_network_references.htm&docsetVersion=8.5&locale=en&showBanner=walkup#casmlnetwork_networkzach_w77">Zachary 1977</a>), which describes social network friendships between 34 members of a karate club at a US university in the 1970s. It is one of the standard publicly available data tables for testing community detection algorithms. The graph contains 34 nodes and 78 links and is shown below.

----------------

The basic flow of this notebook is as follows:
1. Load the sample graph into a Pandas DataFrame as a set of links that represent the total graph. 
2. Connect to our CAS server and load the actionsets we require.
3. Upload our sample graph to our CAS server.
4. Execute the community detection without fixed nodes using two resolutions (0.5 and 1.0).
5. Prepare and display the network plots showing the communities.

----------------
__Prepared by:__
Damian Herrick (damian.herrick@sas.com)

#### Imports

Our imports are broken out as follows:

| Module        | Description                                                                        |
|:--------------|:----------------------------------------------------------------------------------:|
| `os`          | Allows access to environment variables.                                            |
| `swat`        | SAS Python module that orchestrates communication with a CAS server.               |
| `pandas`      | Data management module we use for preparation of local data.                       |
| `networkx`    | Used to manage graph data structures when plotting.                                |
| `bokeh`       | Module used to generate interactive plots of graphs.                               |
| `python_demo` | Custom module written for these examples that handles datasets and visualizations. |

In [1]:
import os
import swat
import pandas as pd

import networkx as nx

from bokeh.io import output_notebook, show
from bokeh.layouts import gridplot
from bokeh.palettes import Spectral8

from python_demo.datasets.examples import community_karate_club_links
from python_demo.visualization.bokeh_graphs import render_plot

The call to `output_notebook` is required by `bokeh` to render plots inside Jupyter Notebooks.

In [2]:
output_notebook()

### Step 1: Prepare the sample graph. 
* We pass only a set of links. No node set is needed here because this example detects communities without fixed groups.

In [3]:
dfLinkSetIn = community_karate_club_links()
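
As a quick sanity check, the link set can be compared against the sizes quoted in the introduction (34 nodes, 78 links). The sketch below is illustrative only: `summarize_links` is a hypothetical helper, not part of `python_demo`, and it assumes the links arrive as `(from, to)` pairs, for example from `zip(dfLinkSetIn['from'], dfLinkSetIn['to'])`.

```python
def summarize_links(links):
    """Count distinct nodes and distinct undirected links in (from, to) pairs."""
    nodes = set()
    undirected = set()
    for src, dst in links:
        nodes.update((src, dst))
        # A frozenset makes (a, b) and (b, a) count as the same undirected link.
        undirected.add(frozenset((src, dst)))
    return {"nodes": len(nodes), "links": len(undirected)}


# Tiny made-up example: a triangle with one link listed in both directions.
sample = [(1, 2), (2, 3), (3, 1), (2, 1)]
print(summarize_links(sample))  # {'nodes': 3, 'links': 3}
```

For the karate club data, the same helper would be expected to report 34 nodes and 78 links if the link set matches the description above.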

Let's start by looking at the basic network itself.

We create a `networkx` graph and pass it to our `bokeh` helper function to create the initial plot.

In [4]:
G_comm = nx.from_pandas_edgelist(dfLinkSetIn, 'from', 'to')

title = "Zachary's Karate Club"
hover = [('Node', '@index')]
nodeSize = 40

plot = render_plot(G_comm, title, hover, nodeSize, width=1200)
show(plot)

### Step 2: Connect to CAS, load the actionsets we'll need, and upload our graph to the CAS server.

In [5]:
host = os.environ['CAS_HOST']
port = int(os.environ['CAS_PORT'])
print(f"{host}:{port}")

rdcgrd113.unx.sas.com:23404
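
Reading `os.environ[...]` directly raises a bare `KeyError` when a variable is missing. A slightly more defensive sketch is shown here; `read_cas_endpoint` is a hypothetical helper, and the fallback port of 5570 is an assumption that should be replaced with whatever your site uses.

```python
import os


def read_cas_endpoint(env=os.environ, default_port=5570):
    """Return (host, port) from CAS_HOST / CAS_PORT with clearer error messages."""
    host = env.get("CAS_HOST")
    if not host:
        raise RuntimeError("CAS_HOST is not set")
    raw_port = env.get("CAS_PORT", str(default_port))  # default_port is an assumption
    try:
        port = int(raw_port)
    except ValueError:
        raise RuntimeError(f"CAS_PORT must be an integer, got {raw_port!r}") from None
    return host, port


# Demonstrated with a plain dict so it runs without any environment setup.
print(read_cas_endpoint({"CAS_HOST": "cas.example.com", "CAS_PORT": "5570"}))
```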


In [6]:
conn = swat.CAS(host, port)

_ = conn.loadactionset("network")

NOTE: Added action set 'network'.


Before we load the data, we should verify which caslib is active. Since we just connected and have not specified one, the active caslib should be our personal caslib, which is named after our user ID.

Only one caslib can be active at a time. As long as we are happy with the active caslib, we do not need to reference the caslib in subsequent calls to CAS through `swat` methods. Note that this is slightly different from the corresponding CASL calls we reference.

In [7]:
conn.caslibinfo()

Unnamed: 0,Name,Type,Description,Path,Definition,Subdirs,Local,Active,Personal,Hidden,Transient
0,CASTestTmp,PATH,castest's test files,/bigdisk/lax/castest/,,1.0,0.0,0.0,0.0,0.0,0.0
1,CASUSER(daherr),PATH,Personal File System Caslib,/u/daherr/,,1.0,0.0,1.0,1.0,0.0,1.0
2,Formats,PATH,Format Caslib,/bigdisk/lax/formats/,,1.0,0.0,0.0,0.0,0.0,0.0
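
In the output above, the active caslib is the row whose `Active` flag is 1. The check can also be done in code. The sketch below works on plain dictionaries copied from that output so it runs without a server; `active_caslib` is a hypothetical helper. With `swat`, the rows would likely come from the `CASLibInfo` table in the action result, which is an assumption worth confirming against your `swat` version.

```python
def active_caslib(rows):
    """Return the name of the single caslib whose Active flag is set."""
    active = [row["Name"] for row in rows if row["Active"] == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active caslib, found {len(active)}")
    return active[0]


# Name and Active values copied from the caslibinfo output above.
rows = [
    {"Name": "CASTestTmp", "Active": 0.0},
    {"Name": "CASUSER(daherr)", "Active": 1.0},
    {"Name": "Formats", "Active": 0.0},
]
print(active_caslib(rows))  # CASUSER(daherr)
```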


#### Upload the local dataframe into CAS

In [8]:
_ = conn.upload(dfLinkSetIn, casout=dict(name='LinkSetIn'))

NOTE: Cloud Analytic Services made the uploaded file available as table LINKSETIN in caslib CASUSER(daherr).
NOTE: The table LINKSETIN has been created in caslib CASUSER(daherr) from binary data uploaded to Cloud Analytic Services.


### Step 3: Calculate the communities (without fixed groups) in our graph using the `network` actionset.

Since we've loaded our actionset, we can reference it using dot notation from our connection object.

We run community detection at two resolutions: 0.5 and 1.0.

Note that the Python code below is equivalent to this `PROC NETWORK` step in SAS:
```
proc network
   links              = mycas.LinkSetIn
   outNodes           = mycas.NodeSetOut;
   community
      resolutionList  = 1.0 0.5
      outLevel        = mycas.CommLevelOut
      outCommunity    = mycas.CommOut
      outOverlap      = mycas.CommOverlapOut
      outCommLinks    = mycas.CommLinksOut;
run;
```

In [9]:
conn.network.community(links=dict(name='LinkSetIn'),
                       outnodes=dict(name='nodeSetOut'),
                       outLevel=dict(name='CommLevelOut'),
                       outCommunity=dict(name='CommOut'),   
                       outOverlap=dict(name='CommOverlapOut'),     
                       outCommLinks=dict(name='CommLinksOut'),
                       resolutionList=[0.5, 1]
 )

NOTE: The number of nodes in the input graph is 34.
NOTE: The number of links in the input graph is 78.
NOTE: Processing community detection using 1 threads across 1 machines.
NOTE: At resolution=1, the community algorithm found 4 communities with modularity=0.418803.
NOTE: At resolution=0.5, the community algorithm found 2 communities with modularity=0.371795.
NOTE: Processing community detection used 0.00 (cpu: 0.00) seconds.


Unnamed: 0,casLib,Name,Label,Rows,Columns,casTable
0,CASUSER(daherr),nodeSetOut,,34,3,"CASTable('nodeSetOut', caslib='CASUSER(daherr)')"
1,CASUSER(daherr),CommLinksOut,,5,5,"CASTable('CommLinksOut', caslib='CASUSER(daher..."
2,CASUSER(daherr),CommOut,,6,9,"CASTable('CommOut', caslib='CASUSER(daherr)')"
3,CASUSER(daherr),CommLevelOut,,2,4,"CASTable('CommLevelOut', caslib='CASUSER(daher..."
4,CASUSER(daherr),CommOverlapOut,,47,3,"CASTable('CommOverlapOut', caslib='CASUSER(dah..."

Unnamed: 0,Name1,Label1,cValue1,nValue1
0,numNodes,Number of Nodes,34,34.0
1,numLinks,Number of Links,78,78.0
2,graphDirection,Graph Direction,Undirected,

Unnamed: 0,Name1,Label1,cValue1,nValue1
0,problemType,Problem Type,Community Detection,
1,status,Solution Status,OK,
2,cpuTime,CPU Time,0.00,0.0
3,realTime,Real Time,0.00,0.000185
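
The log reports a modularity value for each resolution. To make that number less opaque, here is a minimal sketch of the standard modularity definition for an undirected, unweighted graph. The `resolution` argument follows a common generalization that scales the expected-links term; that weighting is an assumption for illustration and is not necessarily how the `network` actionset applies `resolutionList`. The toy graph is made up.

```python
from collections import defaultdict


def modularity(links, community_of, resolution=1.0):
    """Modularity of an undirected, unweighted graph for a given partition.

    Q = sum over communities c of  L_c / m  -  resolution * (d_c / (2 m))**2
    where m is the number of links, L_c the number of links inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(links)
    inside = defaultdict(int)  # links with both endpoints in the community
    degree = defaultdict(int)  # sum of node degrees per community
    for a, b in links:
        ca, cb = community_of[a], community_of[b]
        degree[ca] += 1
        degree[cb] += 1
        if ca == cb:
            inside[ca] += 1
    return sum(inside[c] / m - resolution * (degree[c] / (2 * m)) ** 2
               for c in degree)


# Toy graph: two triangles joined by a single bridge link.
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(links, two_groups), 6))  # 0.357143
```

Putting every node in one community gives a modularity of 0 under this definition, which is why higher values indicate a partition with more internal links than expected by chance.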


### Step 4: Get the community results from CAS and prepare data for plotting

------
In this step we fetch the node results from CAS, then add community assignments and node fill color as node attributes in our `networkx` graph.

| Table        | Description                                               |
|--------------|-----------------------------------------------------------|
| `NodeSetOut` | Results and community labels for resolutions 0.5 and 1.0. |

| Attribute Label   | Description                          |
|-------------------|--------------------------------------|
| `community_0`     | Community assignment, resolution 1.0 |
| `community_1`     | Community assignment, resolution 0.5 |

In [10]:
# pull the node set locally so we can plot
comm_nodes_cas = conn.CASTable('NodeSetOut').to_dict(orient='index')

# make our mapping dictionaries that allow us to assign attributes
comm_nodes_0 = {v['node']:v['community_0'] for v in comm_nodes_cas.values()}
comm_nodes_1 = {v['node']:v['community_1'] for v in comm_nodes_cas.values()}

# set the attributes
nx.set_node_attributes(G_comm, comm_nodes_0, 'community_0')
nx.set_node_attributes(G_comm, comm_nodes_1, 'community_1')

# Assign the fill colors for the nodes.
for node in G_comm.nodes:
    G_comm.nodes[node]['highlight_0'] = Spectral8[int(G_comm.nodes[node]['community_0'])]
    G_comm.nodes[node]['highlight_1'] = Spectral8[int(G_comm.nodes[node]['community_1'])]
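
Indexing `Spectral8` directly works here because at most four communities were found, but the palette has only eight entries, so a community index of 8 or more would raise an `IndexError`. A hedged sketch of a wrap-around lookup follows; `PALETTE` is a placeholder list of eight colors (not the real `Spectral8` values), `color_for` is a hypothetical helper, and the records are made up to mimic the shape returned by `to_dict(orient='index')`.

```python
# Placeholder palette with eight entries; stands in for bokeh's Spectral8.
PALETTE = ["#000000", "#242424", "#484848", "#6c6c6c",
           "#909090", "#b4b4b4", "#d8d8d8", "#ffffff"]


def color_for(community, palette=PALETTE):
    """Map a community index to a color, wrapping around if the palette is too short."""
    return palette[int(community) % len(palette)]


# Made-up records keyed by row index, each holding a node and its community.
records = {
    0: {"node": 1, "community_0": 0.0},
    1: {"node": 2, "community_0": 9.0},
}
by_node = {rec["node"]: rec["community_0"] for rec in records.values()}
colors = {node: color_for(comm) for node, comm in by_node.items()}
print(colors)  # {1: '#000000', 2: '#242424'}
```

Wrapping reuses colors once there are more communities than palette entries, so a larger palette is the better choice when communities must stay visually distinct.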

### Step 5: Create and display the plots

In [11]:
title_0 = 'Community Detection Example 1: Resolution 1'
hover_0 = [('Node', '@index'), ('Community', '@community_0')]

title_1 = 'Community Detection Example 2: Resolution 0.5'
hover_1 = [('Node', '@index'), ('Community', '@community_1')]

# render the plots.
# reminder - we set nodeSize earlier in the notebook. Its value is 40.
plot_0 = render_plot(G_comm, title_0, hover_0, node_size=nodeSize, node_color='highlight_0', width=1200)
plot_1 = render_plot(G_comm, title_1, hover_1, node_size=nodeSize, node_color='highlight_1', width=1200)

In [12]:
grid = gridplot([plot_0, plot_1], ncols=1)
show(grid)
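
The two plots can also be compared numerically by counting how many nodes fall into each pair of labels across the two resolutions. The sketch below uses made-up assignments; `cross_tab` is a hypothetical helper, and in this notebook the `comm_nodes_0` and `comm_nodes_1` dictionaries built in Step 4 have the same node-to-community shape.

```python
from collections import Counter


def cross_tab(first, second):
    """Count nodes for each (label in first, label in second) pair."""
    return Counter((first[node], second[node]) for node in first)


# Made-up assignments for five nodes at two resolutions.
res_1_0 = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
res_0_5 = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}

for (fine, coarse), count in sorted(cross_tab(res_1_0, res_0_5).items()):
    print(f"resolution 1.0 community {fine} -> resolution 0.5 community {coarse}: {count}")
```

A fine community that maps to a single coarse community was merged whole; one that is split across several coarse communities was reassigned rather than merged.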

### Step 6: Clean up everything.

Make sure we know what tables we created, drop them, and close our connection.
(This is probably overkill, since everything in this session is ephemeral anyway, but it is good practice nonetheless.)

In [13]:
conn.tableinfo()

Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,Repeated,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime
0,LINKSETIN,78,2,0,utf-8,2021-07-16T15:45:24-04:00,2021-07-16T15:45:24-04:00,2021-07-16T15:45:28-04:00,UTF8,1942084000.0,0,0,0,,,0,daherr,,2021-07-16T15:45:24-04:00,1942084000.0
1,NODESETOUT,34,3,0,utf-8,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,2021-07-16T15:45:44-04:00,UTF8,1942084000.0,0,0,0,,,0,daherr,,,
2,COMMLEVELOUT,2,4,0,utf-8,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,UTF8,1942084000.0,0,0,0,,,0,daherr,,,
3,COMMOUT,6,9,0,utf-8,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,UTF8,1942084000.0,0,0,0,,,0,daherr,,,
4,COMMLINKSOUT,5,5,0,utf-8,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,UTF8,1942084000.0,0,0,0,,,0,daherr,,,
5,COMMOVERLAPOUT,47,3,0,utf-8,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,2021-07-16T15:45:28-04:00,UTF8,1942084000.0,0,0,0,,,0,daherr,,,


In [14]:
conn.droptable(name='LinkSetIn', quiet=True)
conn.droptable(name='NodeSetOut', quiet=True)
conn.droptable(name='CommOut', quiet=True)
conn.droptable(name='CommLevelOut', quiet=True)
conn.droptable(name='CommLinksOut', quiet=True)
conn.droptable(name='CommOverlapOut', quiet=True)
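
The repeated calls can be collapsed into a loop. In this sketch, `drop_tables` is a hypothetical helper and `FakeConnection` is a stand-in that only records calls, so the example runs without a CAS server; with a real session you would pass `conn` instead.

```python
def drop_tables(conn, names):
    """Call conn.droptable(name=..., quiet=True) for each table name."""
    for name in names:
        conn.droptable(name=name, quiet=True)


class FakeConnection:
    """Stand-in that records dropped names; not part of swat."""

    def __init__(self):
        self.dropped = []

    def droptable(self, name, quiet=False):
        self.dropped.append(name)


fake = FakeConnection()
drop_tables(fake, ["LinkSetIn", "NodeSetOut", "CommOut"])
print(fake.dropped)  # ['LinkSetIn', 'NodeSetOut', 'CommOut']
```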

In [15]:
conn.close()