## The Node in the Machine: Software Architecture as Network


### Bobby Norton
### SCNA 2016
### September 17, 2016
### Los Angeles, CA


### Exercise: Who's Here?

## A design / refactoring experiment...

Take a code base you know well. Put everything in one namespace / package / your language's equivalent of organization.

The tests still pass.

Open the code. Do you like this better?

There are fewer things, right? Fewer files. Fewer directories. 

_What's not to love?_

There are fewer _boundaries_, not just 'things'. 

Everything is exposed to us every time we look at this file.

What about our test cases that defined boundaries between the sub-systems?

Do we open up some of the functions that we want to test in isolation?

## Visualizing code with Cytoscape

In [44]:
from scripts.io import *

network_data = csv_to_edgelist('./data/lein-topology-335a129.csv')

In [45]:
from scripts.graph import *

g = edgelist_to_igraph(network_data)

In [46]:
# Many thanks to Kei Ono for his work on py2cytoscape
# https://github.com/idekerlab/tsri-lecture/blob/master/lecture2/Lesson_2_reproducible_workflow.ipynb

from py2cytoscape.data.cynetwork import CyNetwork
from py2cytoscape.data.cyrest_client import CyRestClient
from py2cytoscape.data.style import StyleUtil
import py2cytoscape.util.cytoscapejs as cyjs
import py2cytoscape.cytoscapejs as renderer

In [47]:
# Be sure Cytoscape is running, then...

# Step 1: Create py2cytoscape client
cy = CyRestClient()

In [48]:
# Reset (delete all existing networks)
cy.session.delete()

In [49]:
# Step 2: Load network
cy_network = cy.network.create_from_igraph(g, name="335a129", collection="lein-topology")

In [50]:
# Step 3: Apply layout
cy.layout.apply(name='force-directed', network=cy_network)

In [51]:
# Step 4: Create Visual Style as code
custom_style = cy.style.create('B Sides')

basic_settings = {
    # You can set default values as key-value pairs.
    
    'NODE_FILL_COLOR': '#6AACB8',
    'NODE_SIZE': 55,
    'NODE_BORDER_WIDTH': 0,
    'NODE_LABEL_COLOR': '#555555',
    
    'EDGE_WIDTH': 2,
    'EDGE_TRANSPARENCY': 100,
    'EDGE_STROKE_UNSELECTED_PAINT': '#333333'
}

custom_style.update_defaults(basic_settings)
cy.style.apply(custom_style, cy_network)

In [52]:
# Map the label property in the igraph data to Cytoscape's NODE_LABEL visual property
custom_style.create_passthrough_mapping(column='label', vp='NODE_LABEL', col_type='String')
cy.style.apply(custom_style, cy_network)

In [60]:
# Step 5: (Optional) Embed as interactive Cytoscape.js widget
view = cy_network.get_first_view()
style_for_widget = cy.style.get(custom_style.get_name(), data_format='cytoscapejs')
renderer.render(view, style=style_for_widget['style'], background='radial-gradient(#FFFFFF 15%, #DDDDDD 105%)')

Now switch to the visual editor and compare the workflow by hand.

* Tools -> Network Analyzer -> Network Analysis -> Analyze Network

* Size nodes by their outdegree. This highlights which functions act as hubs.

* Remove all vertices and edges from namespaces outside of the library.

* Layout -> yFiles Hierarchical Layout

* Layout -> Clear All Edge Bends

* Move test paths to the right-hand side. Color test vertices yellow.

* Line up production vertices so that the arrow of time points in one direction.
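Two of the manual steps above (dropping vertices outside the library and sizing by outdegree) could also be scripted before the data ever reaches Cytoscape. This is a minimal sketch, assuming the same `source` / `target` / `weight` dictionaries used elsewhere in this notebook. The helpers `namespace`, `is_library`, `internal_edges`, and `out_degree`, and the `LIBRARY_PREFIXES` tuple, are made up for illustration and are not part of `scripts` or py2cytoscape. The last two sample edges are invented as well.

```python
from collections import Counter

# The first three edges appear in the sample data; the last two are invented.
edges = [
    {'source': 'topology.core/print-weighted-edges', 'target': 'clojure.core/println', 'weight': '1'},
    {'source': 'topology.core/print-weighted-edges', 'target': 'clojure.string/join', 'weight': '1'},
    {'source': 'topology.dependencies/filtered', 'target': 'clojure.core/filter', 'weight': '1'},
    {'source': 'leiningen.topology/topology', 'target': 'topology.core/print-weighted-edges', 'weight': '1'},
    {'source': 'leiningen.topology/topology', 'target': 'topology.dependencies/filtered', 'weight': '1'},
]

# Assumption: these prefixes identify the library's own namespaces.
LIBRARY_PREFIXES = ('topology', 'leiningen.topology')


def namespace(vertex):
    """Return the part of 'ns/name' before the first slash."""
    return vertex.split('/', 1)[0]


def is_library(vertex, prefixes=LIBRARY_PREFIXES):
    """True when the vertex lives in one of the library's namespaces."""
    ns = namespace(vertex)
    return any(ns == p or ns.startswith(p + '.') for p in prefixes)


def internal_edges(edges):
    """Keep only edges whose source and target are both inside the library."""
    return [e for e in edges if is_library(e['source']) and is_library(e['target'])]


def out_degree(edges):
    """Count outgoing edges per source vertex."""
    return Counter(e['source'] for e in edges)


internal = internal_edges(edges)
degrees = out_degree(internal)
```

The counts in `degrees` could then be attached to the vertices as an attribute and mapped to node size in the visual style.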

![](./img/lein-topology-faad435.png)

Start at the lower left, in this case at leiningen.topology/topology. The flow of control starts here and moves in a depth-first search from the lower left to the upper right. At the end of each path, control returns to the caller and proceeds across the next outgoing edge.

Notice how the five namespaces in this library are arranged to be in close proximity.

The program is a tree...or, more precisely, a directed acyclic graph.

Test coverage can be seen at a glance by following edges from the test vertices on the right.
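Both the depth-first walk and the at-a-glance coverage check can be sketched as graph traversals. This is an illustration under assumptions: the call edges are invented, and treating any namespace ending in `-test` as test code is a convention assumed here, not something the tooling enforces. `adjacency`, `depth_first`, `is_test`, and `covered_by_tests` are hypothetical helper names.

```python
from collections import defaultdict

# Invented (caller, callee) edges, for illustration only.
edges = [
    ('leiningen.topology/topology', 'topology.core/print-weighted-edges'),
    ('leiningen.topology/topology', 'topology.dependencies/dependencies'),
    ('topology.dependencies/dependencies', 'topology.dependencies/filtered'),
    ('topology.dependencies-test/should-compute-fn-calls-in-namespace',
     'topology.dependencies/dependencies'),
]


def adjacency(edges):
    """Map each caller to the list of functions it calls, in edge order."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def depth_first(graph, start):
    """Vertices in the order a depth-first walk from `start` first visits them."""
    seen, order, stack = set(), [], [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        order.append(vertex)
        # Push in reverse so the first outgoing edge is explored first.
        stack.extend(reversed(graph.get(vertex, [])))
    return order


def is_test(vertex):
    """Assumed convention: test namespaces end in '-test'."""
    return vertex.split('/', 1)[0].endswith('-test')


def covered_by_tests(graph):
    """Production vertices reachable from at least one test vertex."""
    covered = set()
    for vertex in list(graph):
        if is_test(vertex):
            covered.update(v for v in depth_first(graph, vertex) if not is_test(v))
    return covered


graph = adjacency(edges)
order = depth_first(graph, 'leiningen.topology/topology')
covered = covered_by_tests(graph)
production = {v for edge in edges for v in edge if not is_test(v)}
uncovered = sorted(production - covered)
```

In this toy data, `uncovered` holds the entry point and the printing function: the two vertices no test path reaches.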

If this diagram were static, it would be an infographic. Informative, perhaps, but ultimately prone to error as the system changes. What we'd like is the ability to _generate_ this diagram from underlying data. In fact, this diagram was mostly generated automatically, and it could be generated entirely that way.

### We like architecture diagrams because they provide a compact visual description of complicated engineered systems.

There is no substitute for quick exploratory analysis and pattern recognition.

### We don't like pushing pixels.

Laying out by hand isn't going to happen on every commit.

This is why diagrams get stale.

End-to-end automation creates some interesting possibilities:

* Generate edge data.
* Generate a visualization based on the previous commit.
* Save the coordinates of existing vertices.
* Generate a visualization of the latest code based on recent changes, laying out by hand only the things that have changed.
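The coordinate-saving step can be sketched in a few lines. This assumes positions are available as a plain dictionary of vertex name to `(x, y)`; how they are exported from a saved view is left open. `carry_over_positions` is a hypothetical helper, and the vertex names are invented.

```python
def carry_over_positions(previous_positions, current_vertices):
    """Reuse saved coordinates and report which vertices still need layout.

    Returns (kept, needs_layout): `kept` maps surviving vertices to their old
    coordinates; `needs_layout` lists vertices that are new in this commit.
    """
    current = set(current_vertices)
    kept = {v: xy for v, xy in previous_positions.items() if v in current}
    needs_layout = sorted(current - set(kept))
    return kept, needs_layout


# Positions saved from the previous commit's diagram (invented values).
previous = {'a/f': (0.0, 0.0), 'a/g': (120.0, 40.0), 'a/old': (240.0, 80.0)}

# Vertices found in the latest commit: 'a/old' was deleted, 'b/new' was added.
current = ['a/f', 'a/g', 'b/new']

kept, needs_layout = carry_over_positions(previous, current)
```

Only the vertices in `needs_layout` would have to be placed by hand, which keeps the rest of the picture stable from one commit to the next.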

### The problems...

* People capture architecture as marketecture chartjunk.

* People create their own vocabularies to describe architecture, then try to build a business off of these ideas. UML is the most infamous of these vocabularies. Plenty of money and time has been wasted on that effort.

### The alternative: Architecture as Network

A dependency network can be represented as an edge list of the form "source,target,weight", e.g.:

```
topology.core/print-weighted-edges,clojure.core/defn,1
topology.core/print-weighted-edges,clojure.core/doseq,1
topology.core/print-weighted-edges,clojure.core/println,1
topology.core/print-weighted-edges,clojure.string/join,1
```

This raw data can be imported into visualization tools, organized as a graph, and treated like a database.

Conversely, network diagrams created in tools like Cytoscape can be saved as network data.
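To make the "treated like a database" idea concrete, here is a small sketch using only the standard library. It parses the four sample lines above into `source` / `target` / `weight` dictionaries, answers two simple queries, and writes the data back out unchanged. The function names (`parse_edges`, `dump_edges`, `targets_of`, `sources_of`) are illustrative and are not the helpers in `scripts.io`.

```python
import csv
import io

FIELDS = ('source', 'target', 'weight')

raw = """\
topology.core/print-weighted-edges,clojure.core/defn,1
topology.core/print-weighted-edges,clojure.core/doseq,1
topology.core/print-weighted-edges,clojure.core/println,1
topology.core/print-weighted-edges,clojure.string/join,1
"""


def parse_edges(text):
    """Parse 'source,target,weight' lines into a list of dicts (values stay strings)."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return [dict(row) for row in reader]


def dump_edges(edges):
    """Serialize edge dicts back to 'source,target,weight' lines."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator='\n')
    writer.writerows(edges)
    return buffer.getvalue()


def targets_of(edges, source):
    """What does `source` depend on?"""
    return sorted(e['target'] for e in edges if e['source'] == source)


def sources_of(edges, target):
    """Who depends on `target`?"""
    return sorted(e['source'] for e in edges if e['target'] == target)


edges = parse_edges(raw)
print(targets_of(edges, 'topology.core/print-weighted-edges'))
print(sources_of(edges, 'clojure.core/println'))
```

Because the round trip through `dump_edges` reproduces the input text, the edge list can serve as the single source of truth that both the analysis code and the visualization tool read from.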

### Descriptive architecture based on observations...

### ...over prescriptive architecture based on prognostications.

## Dependency Graphing Exercise

Take a library you'd like to better understand...probably one you want to change.

Start with the output, then walk the dependency graph. 

**You can start with just package / namespace / class level dependencies**

Create the network manually in Cytoscape. We can export the data as node and edge lists when we're done.

You might even find it easier to work in text, building out the dependencies edgewise, then importing into Cytoscape when you're done.
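If function-level edges are already available, the namespace-level view suggested above can be derived rather than typed by hand. A minimal sketch, assuming vertices are named `namespace/function` and that summing weights is an acceptable way to combine edges; `namespace` and `namespace_edges` are hypothetical helpers.

```python
from collections import Counter

# Function-level edges taken from the sample data: (source, target, weight).
function_edges = [
    ('topology.core/print-weighted-edges', 'clojure.core/defn', 1),
    ('topology.core/print-weighted-edges', 'clojure.core/doseq', 1),
    ('topology.core/print-weighted-edges', 'clojure.core/println', 1),
    ('topology.core/print-weighted-edges', 'clojure.string/join', 1),
    ('topology.dependencies/filtered', 'clojure.core/filter', 1),
]


def namespace(vertex):
    """Return the part of 'ns/name' before the first slash."""
    return vertex.split('/', 1)[0]


def namespace_edges(function_edges):
    """Collapse function-level edges into weighted namespace-level edges."""
    weights = Counter()
    for source, target, weight in function_edges:
        pair = (namespace(source), namespace(target))
        if pair[0] != pair[1]:  # ignore calls within a single namespace
            weights[pair] += weight
    return weights


ns_edges = namespace_edges(function_edges)

# Emit the result in the same "source,target,weight" text form.
lines = [f'{s},{t},{w}' for (s, t), w in sorted(ns_edges.items())]
print('\n'.join(lines))
```

The printed lines can be pasted into a text file and imported like any other edge list.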

### Collect the Dots (Artifacts aren't just functions...)

You've seen topology create a function dependency graph from a Clojure repo. The same approach is generic: sources of data to mine "edgewise" include git repos, Jenkins / CI, AWS infrastructure like Route 53, and hub.docker.com.

### Analyze...Connect the Dots

In-memory graph analysis is appropriate for N < 1M.

yourdatafitsinram

### Visualize

    "A fundamental challenge in moving from the static to the dynamic is the need to respect, in the case of the latter, what is referred to as the user’s mental map. This term is used to describe the result of the process by which, upon studying a given static network map, a user becomes familiar with it, interprets it, and navigates about it. Simply put, we would expect a certain amount of ‘stability’ across visualizations." - Statistical Analysis of Network Data with R

### Act: Refactor...Restructure

    "With the adoption of a graph-based framework for representing relational data in network analysis we inherit a rich vocabulary for discussing various important concepts related to graphs."

    "questions of interest can often be re-phrased in a useful manner as questions regarding some aspect of the structure or characteristics of a corresponding network graph."

### Applications of networks for code

* Orientation...where are things in this system?
    * Collections of vertices: Communities/Clusters => Packages/Namespaces
    * The experiment we did at the beginning is becoming how I think about program organization. Put it all in one place, at least as a thought experiment. Where are the communities that result from the network structure? How do we describe the flow of control through the program? What if we simply organize the package structure to reflect that flow of control? **Form follows function.**
 
* Which of the containers are "hidden", visible to only a small number of consumers? `topology.symbols`, for example, is hidden behind `topology.dependencies`. The entire implementation could be swapped out without impacting the rest of the program if the contract with `topology.dependencies` is maintained.

* Root cause analysis: Find paths from a temporary root node to the node where a problem is observed. Pathfinding + changelogs.

* SRP enforcement: Which namespaces are the consumers of any other given namespace? Does the provider expose a consistent interface to consumers?

* YAGNI: Find all nodes with no incoming edges that aren't in a certain namespace (like the one with the main method).

* Change propagators: High in-degree and out-degree centrality. "Change agents make systems brittle because they increase the likelihood that the effect of a change will propagate to a disproportionately large portion of the system." These seem like they would also be picked up by other ranking algorithms and centrality measures like PageRank.
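Several of the questions in this list reduce to short queries over an edge list. The sketch below is illustrative only: the `app.*` vertices are invented, the degree thresholds for a "change propagator" are arbitrary assumptions, and all function names are hypothetical.

```python
from collections import Counter, defaultdict, deque

# Invented (caller, callee) edges for a hypothetical application.
edges = [
    ('app.main/run', 'app.core/process'),
    ('app.main/run', 'app.report/render'),
    ('app.report/render', 'app.core/process'),
    ('app.core/process', 'app.parse/read'),
    ('app.core/process', 'app.parse/clean'),
    ('app.legacy/export', 'app.parse/read'),
]


def namespace(vertex):
    return vertex.split('/', 1)[0]


def vertices(edges):
    return {v for edge in edges for v in edge}


def consumers(edges, provider_ns):
    """Hidden containers / SRP: which namespaces call into `provider_ns`?"""
    return sorted({namespace(s) for s, t in edges
                   if namespace(t) == provider_ns and namespace(s) != provider_ns})


def unused_candidates(edges, entry_namespaces):
    """YAGNI: vertices with no incoming edges, outside the allowed entry namespaces."""
    called = {t for _, t in edges}
    return sorted(v for v in vertices(edges) - called
                  if namespace(v) not in entry_namespaces)


def change_propagators(edges, min_in=2, min_out=2):
    """Vertices with both high in-degree and high out-degree (thresholds assumed)."""
    indeg = Counter(t for _, t in edges)
    outdeg = Counter(s for s, _ in edges)
    return sorted(v for v in vertices(edges)
                  if indeg[v] >= min_in and outdeg[v] >= min_out)


def find_path(edges, start, goal):
    """Root cause analysis: a shortest call path from `start` to `goal`, or None."""
    graph = defaultdict(list)
    for s, t in edges:
        graph[s].append(t)
    parents = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            path = []
            while v is not None:
                path.append(v)
                v = parents[v]
            return path[::-1]
        for w in graph.get(v, []):
            if w not in parents:
                parents[w] = v
                queue.append(w)
    return None


print(consumers(edges, 'app.parse'))
print(unused_candidates(edges, {'app.main'}))
print(change_propagators(edges))
print(find_path(edges, 'app.main/run', 'app.parse/clean'))
```

For larger graphs, a library such as `networkx` or `igraph` offers richer centrality measures, but plain dictionaries are enough to start asking these questions.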

## Visualizing code with a Dependency Structure Matrix (DSM)

`'s','t',1` => `[{'source': 's','target': 't','weight': '1'}]`

In [53]:
from scripts.io import *

network_data = csv_to_edgelist('./data/lein-topology-faad435.csv')

In [54]:
len(network_data)

204

In [55]:
list(network_data)[:5]

[{'source': 'topology.dependencies/dependencies',
  'target': 'clojure.core/defn-',
  'weight': '1'},
 {'source': 'topology.dependencies/filtered',
  'target': 'clojure.core/filter',
  'weight': '1'},
 {'source': 'topology.dependencies-test/should-compute-fn-calls-in-namespace',
  'target': 'clojure.core/defn',
  'weight': '1'},
 {'source': 'example/test-when', 'target': 'clojure.core/cons', 'weight': '1'},
 {'source': 'leiningen.topology/topology',
  'target': 'org.clojure/clojure',
  'weight': '1'}]

Let's try visualizing as a Dependency Structure Matrix. 

["A DSM scales better than a network visualization"](http://www.ndepend.com/docs/dependency-structure-matrix-dsm).

In [56]:
from scripts.graph import *

len(edgelist_to_nodes(network_data))

106

We will be looking at an $N \times N$ matrix...the dependency structure matrix.

`row -depends-> column`

Try sorting the entries by outdegree. Notice how most of the lower half of the matrix is empty? A matrix in which most cells are empty is known as a sparse matrix.

In [57]:
from scripts.layouts import *

matrix(network_data, 800)
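To see what sits behind a picture like this, here is a standalone sketch that builds a small DSM from five of the sample edges, with rows and columns sorted by outdegree. It does not reproduce `scripts.layouts.matrix`; `dsm` and `density` are hypothetical helpers. As a rough check on sparsity for the full data set: if the 204 rows are distinct edges over 106 vertices, at most 204 of the 106 × 106 = 11,236 cells are filled, which is under 2%.

```python
from collections import Counter

# (source, target) pairs taken from the sample data shown earlier.
edges = [
    ('topology.core/print-weighted-edges', 'clojure.core/defn'),
    ('topology.core/print-weighted-edges', 'clojure.core/doseq'),
    ('topology.core/print-weighted-edges', 'clojure.core/println'),
    ('topology.core/print-weighted-edges', 'clojure.string/join'),
    ('topology.dependencies/filtered', 'clojure.core/filter'),
]


def dsm(edges):
    """Return (labels, matrix); matrix[i][j] == 1 means labels[i] depends on labels[j].

    Rows and columns share one ordering: highest outdegree first, then by name.
    """
    outdeg = Counter(s for s, _ in edges)
    labels = sorted({v for e in edges for v in e}, key=lambda v: (-outdeg[v], v))
    index = {v: i for i, v in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for s, t in edges:
        matrix[index[s]][index[t]] = 1
    return labels, matrix


def density(matrix):
    """Fraction of cells that are filled."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)


labels, matrix = dsm(edges)
for label, row in zip(labels, matrix):
    print(''.join('#' if cell else '.' for cell in row), label)
print(f'density: {density(matrix):.3f}')
```

With this ordering, every filled cell lands in the top rows, so everything below them stays empty.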

### DSM Visualization Exercise Ideas

* We would ideally like to order this by group. In this case, namespace is a reasonable way to group. There are many potential options.

* Coloring cells could be done in a more interesting way.

* Abbreviating columns would be useful.
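As a starting point for the first and last ideas, here is one possible way to group vertices by namespace and shorten their labels. The abbreviation rule (keep the last namespace segment, reduce earlier segments to their first letter) is just one assumed convention, and `grouped_order` and `abbreviate` are hypothetical helpers.

```python
from itertools import groupby


def namespace(vertex):
    return vertex.split('/', 1)[0]


def grouped_order(vertices):
    """Order vertices so members of one namespace are adjacent; return {namespace: members}."""
    ordered = sorted(set(vertices), key=lambda v: (namespace(v), v))
    return {ns: list(members) for ns, members in groupby(ordered, key=namespace)}


def abbreviate(vertex):
    """Shorten 'clojure.core/defn' to 'c.core/defn'; leave unqualified names alone."""
    ns, sep, name = vertex.partition('/')
    if not sep:
        return vertex
    parts = ns.split('.')
    short = [p[:1] for p in parts[:-1]] + parts[-1:]
    return '.'.join(short) + '/' + name


# Vertices taken from the sample rows shown earlier.
vertices = [
    'topology.dependencies/filtered',
    'clojure.core/filter',
    'topology.dependencies/dependencies',
    'clojure.core/defn-',
    'example/test-when',
    'clojure.core/cons',
]

groups = grouped_order(vertices)
column_order = [v for members in groups.values() for v in members]
column_labels = [abbreviate(v) for v in column_order]
```

`column_order` keeps each namespace contiguous, so group boundaries could be drawn between blocks, and `column_labels` gives shorter column headers.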

### This doesn't mean BDUF is back in style...

...but equally, NDUF (No Design Up Front) and NDE (No Design Ever) aren't cool anymore now that you have a powerful architecture model. 

* Start with the simplest structure that can possibly work.

* Descriptive over prescriptive architecture. Given the level of complication, it's tough to know a priori what you are about to create. Reverse engineer the structure of an existing system, then bring in structural analysis to your red-green-refactor cycle.

* Once desirable structural patterns are known amongst the team, you can start to write tests that express these rules.

* Techniques like TDD and BDD are design techniques, not only verification steps. The structural modeling allows you to visualize and navigate the structure of your code, however it was produced. Given the assertion that TDD / BDD result in "better" designs, network modeling may provide a means for objective evidence.
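Structural rules like these can be written as ordinary assertions over the edge list. The sketch below checks two example rules: production code never calls test code, and namespace-level dependencies contain no cycles. The rules, the `-test` naming convention, the `app.*` edges, and the helper names are all assumptions made for illustration.

```python
from collections import defaultdict

# Invented (caller, callee) edges for a hypothetical application.
edges = [
    ('app.main/run', 'app.core/process'),
    ('app.core/process', 'app.parse/read'),
    ('app.core-test/process-handles-empty-input', 'app.core/process'),
]


def namespace(vertex):
    return vertex.split('/', 1)[0]


def is_test(vertex):
    """Assumed convention: test namespaces end in '-test'."""
    return namespace(vertex).endswith('-test')


def production_calls_tests(edges):
    """Edges that violate the rule 'production code never depends on test code'."""
    return [(s, t) for s, t in edges if not is_test(s) and is_test(t)]


def namespace_pairs(edges):
    """Distinct cross-namespace dependencies."""
    return {(namespace(s), namespace(t)) for s, t in edges
            if namespace(s) != namespace(t)}


def has_cycle(pairs):
    """Depth-first search with three colors; a grey-to-grey edge closes a cycle."""
    graph = defaultdict(set)
    for s, t in pairs:
        graph[s].add(t)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(v):
        color[v] = GREY
        for w in graph.get(v, ()):
            if color[w] == GREY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in list(graph))


# The structural rules, expressed as assertions a test runner could execute.
assert production_calls_tests(edges) == []
assert not has_cycle(namespace_pairs(edges))
```

Run against a freshly generated edge list on every commit, assertions like these fail as soon as a change introduces a dependency the team has agreed to avoid.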

### Don't mix networks in the same data set (unless you know what you're doing)

* We've been exploring tools for network analysis. Property graph db vendors will encourage you to put everything in a graph, then query what you need. The downside is more complicated schema management.

While you _can_ use a graph database in lieu of an RDBMS, it's not entirely clear that you _should_.

Be clear about what your vertices and edges represent.

### Graph Databases?

At no point until now have I said anything about 'graph databases' like Neo4j and Titan. These are persistence stores. They offer a query language, scalability, transactional support, and security along with other concerns found in an RDBMS / NoSQL persistence tier.

Use the simplest structure that can possibly work. Given that yourdatafitsinram, you can likely go very far with an in-memory approach that reads in all the data upon system startup. If you're at a point where you _know_ you need to solve the concerns a graph database can handle, then everything we've seen today still applies to the analysis steps.

The graph db vendors don't often spend much time on network analysis ideas beyond some of the basics. Most of what you'll find from them involves using the query language, or converting their particular graph representation into others like `networkx` or `igraph`.


### Getting Practice

* Network analysis koans...practicing simple techniques that allow one to easily use these methods improvisationally to explore new data sets.

* Continuing to learn: Pick up Gabor's book and work through the exercises.

### Docs

Cytoscape Manual: http://manual.cytoscape.org/en/stable/

Cytoscape JS Tutorials: http://blog.js.cytoscape.org/2016/05/24/getting-started/


### Books

* Design Rules for modularity

* TAOUP

* Software Tools

* Visual Complexity

* Book of Trees

* Dependency Structure Matrix methods

* SANDr is the best introduction to network analysis I've found. Read the chapters on describing networks first. We might be able to model processes on networks, but that will take some time.

* Barabási for an overview of Network Science

### Community

Cytoscape has a thriving app community. 

http://www.slideshare.net/keiono/introduction-to-biological-network-analysis-and-visualization-with-cytoscape-part1

cytoscape-discuss@googlegroups.com

cytoscape-help@googlegroups.com

[SOCNET](https://insna.org/socnet.html)

Where should we continue the conversation about software architecture networks?

I think we need a Google Group. Name?

In the meantime, you can find me @bobbynorton and bobby@testedminds.com.