# Clique Detection Using the `NETWORK` Actionset in SAS Viya and Python

In this demonstration, we load a small representative graph and use our CAS server to calculate its maximal cliques.

Recall that a clique of graph G is an induced subgraph that is a complete graph. Every node in a clique is connected to every other node in that clique. A maximal clique is a clique that is not a subset of the nodes of any larger clique. That is, it is a set C of nodes such that every pair of nodes in C is connected by a link and every node not in C is missing a link to at least one node in C.
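
To make the two definitions concrete, here is a minimal sketch in plain Python. The helper names `is_clique` and `is_maximal_clique` and the toy graph are our own illustrations, not part of the `network` actionset.

```python
from itertools import combinations


def is_clique(adj, nodes):
    """Return True if every pair of distinct nodes in `nodes` is linked."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def is_maximal_clique(adj, nodes):
    """Return True if `nodes` is a clique that no outside node can extend."""
    nodes = set(nodes)
    if not is_clique(adj, nodes):
        return False
    # Maximal: every node outside the set misses a link to at least one member.
    return all(not nodes <= adj[w] for w in adj if w not in nodes)


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant link 2-3.
toy_links = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = {}
for u, v in toy_links:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(is_clique(adj, [0, 1]))             # True: 0 and 1 are linked
print(is_maximal_clique(adj, [0, 1]))     # False: node 2 extends it
print(is_maximal_clique(adj, [0, 1, 2]))  # True
print(is_maximal_clique(adj, [2, 3]))     # True
```

The pair `{0, 1}` is a clique but not a maximal one, because node 2 is linked to both of its members.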

In the graph G below, we expect to find *four* maximal cliques.

----------------

The basic flow of this notebook is as follows:
1. Load the sample graph into a Pandas DataFrame as a set of links that represent the total graph. 
2. Connect to our CAS server and load the actionsets we require.
3. Upload our sample graph to our CAS server.
4. Execute the clique detection and output results into CAS tables.
5. Do required data manipulation to render the network graphs that show each clique.
6. Prepare and display the network plots showing the cliques.

----------------
__Prepared by:__
Damian Herrick (<i class="fa fa-github" aria-hidden="true"></i>: [dtherrick](www.github.com/dtherrick))

## Imports

Our imports are broken out as follows:

| Module           | Method            | Description                                                                        |
|:-----------------|:-----------------:|:----------------------------------------------------------------------------------:|
| `os`             | all               | Allows access to environment variables.                                            |
| `sys`            | all               | Used to update our system path so Python can import our custom utility functions.  |
| `swat`           | all               | SAS Python module that orchestrates communication with a CAS server.               |
| `pandas`         | all               | Data management module we use for preparation of local data.                       |
| `networkx`       | all               | Used to manage graph data structures when plotting.                                |
| `bokeh.io`       | `output_notebook` | Utility function that allows rendering of Bokeh plots in Jupyter.                  |
| `bokeh.io`       | `show`            | Utility function that displays Bokeh plots.                                        |
| `bokeh.layouts`  | `gridplot`        | Utility function that arranges Bokeh plots in a multi-plot grid.                   |
| `bokeh.palettes` | `Spectral8`       | Eight-color palette used to differentiate node types.                              |
| `visualization`  | all               | Custom module written to simplify plot rendering with Bokeh.                       |

In [None]:
import os
import sys

import swat
import pandas as pd
import networkx as nx
from bokeh.io import output_notebook, show
from bokeh.layouts import gridplot
from bokeh.palettes import Spectral8

sys.path.append(os.path.join(os.path.dirname(os.getcwd()),r"../../common/python"))
import visualization as vis

# tell our notebook we want to output with Bokeh
output_notebook()
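
The `sys.path.append` call above builds the utility directory relative to the current working directory. The sketch below mirrors that path arithmetic with `posixpath` so the result can be inspected without touching `sys.path`. The working directory shown is hypothetical, and `resolve_common_dir` is our own helper name.

```python
import posixpath


def resolve_common_dir(cwd, relative="../../common/python"):
    """Mirror the notebook's path arithmetic using POSIX-style paths."""
    parent = posixpath.dirname(cwd)
    return posixpath.normpath(posixpath.join(parent, relative))


# Hypothetical working directory, used only to show how the path resolves.
print(resolve_common_dir("/repo/applications/cliques/python"))
# /repo/common/python
```

In other words, the import only works when the notebook runs from a directory three levels below the one that contains `common/python`.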

## Get the edge list for the sample graph and make an initial plot

In [None]:
colNames = ["from", "to"]
links = [
    (0, 1),
    (0, 2),
    (0, 3),
    (0, 4),
    (0, 5),
    (0, 6),
    (1, 2),
    (1, 3),
    (1, 4),
    (2, 3),
    (2, 4),
    (2, 5),
    (2, 6),
    (2, 7),
    (2, 8),
    (3, 4),
    (5, 6),
    (7, 8),
    (8, 9),
]

dfLinkSetIn = pd.DataFrame(links, columns=colNames)

G = nx.from_pandas_edgelist(dfLinkSetIn, 'from', 'to')

# Choose a title
title = 'Sample Graph Used for Maximal Clique Detection'

# Define the label for the nodes.
hover = [("Node", "@index")]

# How big are the nodes?
nodeSize = 25

# send to the utility function to generate the plot.
plot = vis.render_plot(graph=G, title=title, hover_tooltips=hover, node_size=nodeSize, center_x=2, center_y=0)

# display the graph
show(plot)

## Connect to CAS, load the actionsets we'll need, and upload our graph to the CAS server.

In [None]:
host = os.environ['CAS_HOST_ORGRD']
port = int(os.environ['CAS_PORT'])

# Connect to the server
conn = swat.CAS(host, port)

# Load the actionsets we need
_ = conn.loadactionset('network')
_ = conn.loadactionset('fedsql')
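
Reading `os.environ` directly raises a bare `KeyError` when a variable is unset. As a minimal sketch, assuming the same two variable names used above, a small helper can report everything that is missing in one message. `read_cas_settings` is a hypothetical name, and the host and port values are placeholders.

```python
def read_cas_settings(env):
    """Return (host, port) from a mapping such as os.environ."""
    required = ("CAS_HOST_ORGRD", "CAS_PORT")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CAS_HOST_ORGRD"], int(env["CAS_PORT"])


# Placeholder values for illustration; in the notebook we would pass os.environ.
host, port = read_cas_settings({"CAS_HOST_ORGRD": "cas.example.com", "CAS_PORT": "5570"})
print(host, port)
```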

In [None]:
# Upload the local data into CAS.
_ = conn.upload(dfLinkSetIn, casout=dict(name='LinkSetIn'))

## Calculate the maximal cliques in our graph using the `network` actionset.

Since we've loaded our actionset, we can reference it using dot notation from our connection object, `conn`.

Note that the Python code below is equivalent to this block of SAS code, which calls `PROC NETWORK`:
```
proc network
   links         = mycas.LinkSetIn
   outNodes      = mycas.NodeSetOut;
   clique
      out        = mycas.Cliques
      maxCliques = all;
run;
```

In [None]:
conn.network.clique(links=dict(name='LinkSetIn'),
                    outnodes=dict(name='nodeSetOut'),
                    out=dict(name='cliques',),
                    maxcliques='ALL')
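
As a local cross-check on what we expect CAS to return, here is a minimal sketch that enumerates the maximal cliques of the same link list in plain Python. It uses the Bron-Kerbosch algorithm with pivoting, which is our own choice for illustration and not necessarily what the `network` actionset does internally. `maximal_cliques` is a hypothetical helper name.

```python
def maximal_cliques(links):
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: nodes already explored.
        if not p and not x:
            found.append(sorted(r))
            return
        # Pivot on the node with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return sorted(found)


# The same links as the sample graph in this notebook.
links = [
    (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6),
    (1, 2), (1, 3), (1, 4),
    (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8),
    (3, 4), (5, 6), (7, 8), (8, 9),
]

for clique in maximal_cliques(links):
    print(clique)
# [0, 1, 2, 3, 4]
# [0, 2, 5, 6]
# [2, 7, 8]
# [8, 9]
```

The four node sets match the count we expect for this graph; the clique numbering that CAS assigns may differ from the order printed here.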

## Gather the results from CAS and prepare the data for plotting.

* Initially, we want to get a summary count of the cliques. The `clique` action doesn't provide a summary of the total number of cliques and the nodes associated with each, so we use a simple `fedsql` group-by query to generate that table.

In [None]:
conn.fedsql.execdirect(
    query="create table cliqueSizes as select clique, count(*) from cliques group by clique")

If we want to see the contents of that table, we simply use the CAS `fetch` action, called through our `swat` connection, to pull the data locally and display it in our notebook.

The output table confirms that the `network.clique` action found four cliques in our graph.

In [None]:
conn.fetch(table=dict(name='cliqueSizes'),
           sortby=[{'name': 'clique',
                    'order': 'ascending'}
                   ]
           )
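
The summary table is just a group-by count over rows of (clique, node) pairs. The sketch below reproduces that logic locally with `collections.Counter`. The rows are hypothetical stand-ins shaped like the `cliques` table, and the clique numbering is illustrative rather than actual CAS output.

```python
from collections import Counter

# Hypothetical membership, one entry per clique (numbering is illustrative).
membership = {
    1: [0, 1, 2, 3, 4],
    2: [0, 2, 5, 6],
    3: [2, 7, 8],
    4: [8, 9],
}

# One row per (clique, node) pair, like the `cliques` results table.
rows = [{"clique": c, "node": n} for c, nodes in membership.items() for n in nodes]

# Equivalent of: select clique, count(*) from cliques group by clique
clique_sizes = Counter(row["clique"] for row in rows)

for clique, size in sorted(clique_sizes.items()):
    print(clique, size)
# 1 5
# 2 4
# 3 3
# 4 2
```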

CAS has calculated all of the information we need, so let's get anything remote back down to our local machine.

Once we have it all we can prepare our plots.

Our end product is a 2x2 grid of network plots, each one highlighting a particular clique. We have the following tables in CAS:

| Table | Description |
|-------|-------------|
| `LinkSetIn` | The initial link set that defines our graph. |
| `NodeSetOut` | A simple table of nodes. |
| `cliques` | A results table that highlights which nodes belong to which cliques. Note this only shows nodes that __belong__ to a clique; nodes outside the clique are not shown. |
| `cliqueSizes` | The summary table that shows the count of nodes in a particular clique. Useful to get a list of total cliques. |

For this exercise, we will use the `networkx` `Graph` that we created earlier. Thus we do not need to fetch the `LinkSetIn` table, and we use `NodeSetOut` only for its list of node labels.

We need to fetch the `cliques` and `cliqueSizes` tables from CAS, and we'll manipulate them so we can add both clique membership and preferred coloring as node attributes to our graph.

We fetch the `nodeSetOut` table and convert it to a list of nodes for simplicity.

All of this will be used to create mapping dictionaries in which each key is a node and each value is an attribute of that node. Once the mapping dictionaries are ready, they are easily added to our `Graph` with the `nx.set_node_attributes` function.

----------------------------
`clique_count`: We need the list of clique identifiers later, when we iterate over the cliques to highlight each one in a separate plot. We generate it from a `CASTable` object that we convert to a dictionary. That dictionary has two keys, `clique` and `COUNT`, and each maps row labels to column values, so our list is the sorted set of values in the `clique` entry (its keys are only row positions).

In [None]:
# Use the column values (clique identifiers), not the dictionary keys (row labels).
clique_count = sorted({int(c) for c in conn.CASTable('cliqueSizes').to_dict()['clique'].values()})

`clique_list`: Now we do the same for the clique list, but we orient the resulting dictionary as `records` per Pandas. This gives us a JSON-like list of dictionaries. This list will then be used later to create our mapping dictionaries.

In [None]:
clique_list = conn.CASTable('cliques').to_dict(orient='records')
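
To make the two dictionary shapes concrete, here is a plain-Python sketch with made-up rows (hypothetical values, not actual CAS output). The default Pandas orientation maps each column to a `{row label: value}` dictionary, while `records` yields one dictionary per row. `to_records` is our own helper that converts one shape into the other.

```python
def to_records(columns):
    """Convert {column: {row_label: value}} into a list of per-row dictionaries."""
    row_labels = next(iter(columns.values())).keys()
    return [
        {name: values[label] for name, values in columns.items()}
        for label in row_labels
    ]


# Hypothetical rows in the default (column-oriented) shape.
column_oriented = {
    "clique": {0: 1, 1: 1, 2: 2},
    "node": {0: 8, 1: 9, 2: 2},
}

records = to_records(column_oriented)

print(list(column_oriented["clique"].keys()))    # [0, 1, 2] -> row labels
print(list(column_oriented["clique"].values()))  # [1, 1, 2] -> clique identifiers
print(records[0])                                # {'clique': 1, 'node': 8}
```

The `records` shape is convenient here because each row can be filtered on its `clique` value directly.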

## Create base maps used to guide the individual plot generation.

We'll leverage a few Python conveniences over the next few cells - particularly comprehensions and easy list creation.

The results are:
* `clique_map`: a dictionary keyed by node label, with all `False` values.
* `highlight_map`: a dictionary keyed by node label, with every value set to `Spectral8[-1]`, the last color in the `Spectral8` palette.

When we create the attributes for a given clique, its member nodes are assigned `True` in the clique map and `Spectral8[-4]`, a yellow from the `Spectral8` palette, in the highlight map.

In [None]:
node_list = [int(x) for x in list(conn.CASTable('nodeSetOut')['node'])]
base_clique = [False]*len(node_list)
base_highlight = [Spectral8[-1]]*len(node_list)

clique_map = dict(zip(node_list, base_clique))
highlight_map = dict(zip(node_list, base_highlight))
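
As a possible alternative to the `zip` approach, `dict.fromkeys` builds the same kind of base map in one step. This is a minimal sketch with a hypothetical node list and a placeholder color string standing in for the palette entry.

```python
# Hypothetical inputs for illustration only.
node_list = [0, 1, 2, 3]
default_color = "#cccccc"  # placeholder, stands in for a palette color

# Every key gets the same default value.
clique_map = dict.fromkeys(node_list, False)
highlight_map = dict.fromkeys(node_list, default_color)

print(clique_map)
# {0: False, 1: False, 2: False, 3: False}
print(highlight_map == dict(zip(node_list, [default_color] * len(node_list))))
# True
```

Note that `dict.fromkeys` shares one value object across all keys. That is harmless for booleans and strings, but it would matter for mutable defaults such as lists.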

## Assemble and Display the Plots

* We iterate over clique count, assigning attributes to the network that guide hover labeling and node shading.

In [None]:
# node size should be common over all plots.
nodeSize = 40

# initialize our plot container.
plot_list = []

# iterate over each of the cliques, and highlight its member nodes
for i in clique_count:
    # create a list of only the nodes that are members of the clique.
    subnode_list = [item['node']
                    for item in clique_list if item['clique'] == i]
    # create a clique and highlight map
    this_clique_map = {
        k: True if k in subnode_list else v for k, v in clique_map.items()}
    this_highlight_map = {
        k: Spectral8[-4] if k in subnode_list else v for k, v in highlight_map.items()}
    # set our node attributes.
    nx.set_node_attributes(G, this_highlight_map, 'highlight')
    nx.set_node_attributes(G, this_clique_map, 'clique')
    # set the title and hover plot values.
    title = f"Clique {i}"
    hover = [("Node", "@index"), ("Clique", "@clique")]
    # render the plot and add it to the list
    plot_list.append(vis.render_plot(
        graph=G, title=title, hover_tooltips=hover, node_size=nodeSize, node_color='highlight', center_x=2, center_y=0))

# now we know the length of the list, define how many columns we want, make our grid, and finally show it.
columns = max(1, len(plot_list) // 2)
grid = gridplot(plot_list, ncols=columns)
show(grid)
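
The per-clique mapping step in the loop can be isolated and checked without CAS or Bokeh. Below is a minimal sketch, assuming a hypothetical helper name `clique_attributes` and placeholder color strings. It builds both maps for one clique from the full node list and the clique's members.

```python
def clique_attributes(nodes, members, base_color, member_color):
    """Build the clique flag map and the highlight color map for one clique."""
    members = set(members)
    clique_map = {n: n in members for n in nodes}
    highlight_map = {n: member_color if n in members else base_color for n in nodes}
    return clique_map, highlight_map


# Hypothetical inputs: ten nodes, with the clique {2, 7, 8} highlighted.
nodes = list(range(10))
flags, colors = clique_attributes(nodes, [2, 7, 8], "#base", "#member")

print([n for n, is_member in flags.items() if is_member])  # [2, 7, 8]
print(colors[7], colors[9])                                # #member #base
```

Converting `members` to a set keeps each membership test constant time, which matters little for ten nodes but helps on larger graphs.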

## Clean up everything. 

* Make sure we know what tables we created, drop them, and close our connection.
(This is probably overkill, since everything in this session is ephemeral anyway, but good practice nonetheless).

In [None]:
table_list = conn.tableinfo()["TableInfo"]["Name"].to_list()

for table in table_list:
    conn.droptable(name=table, quiet=True)

conn.close()