# 3. Graph Analysis and Visualization

# Contents

* Network datasets
* The graph visualization
    - Installing the GraphStream and BreezeViz libraries
    - Visualizing the graph data
    - Plotting the degree distribution
* The analysis of network connectedness
    - Finding the connected components
    - Counting triangles and computing clustering coefficients
* The network centrality and PageRank
    - How PageRank works
    - Ranking web pages
* Scala Build Tool revisited
    - Organizing build definitions
    - Managing library dependencies
        - A preview of the steps
        - Running tasks with SBT commands
* Summary

After doing the activities in this chapter, you will have learned the tools and concepts needed to:
* Visualize large-scale graph data
* Compute the connected components of a network
* Use the PageRank algorithm to rank the importance of nodes in a network
* Use SBT to build Spark applications that depend on third-party libraries

# Network datasets

We will be using the same datasets introduced in Chapter 2, Building and Exploring Graphs, including the social ego network, email graph, and food-compound network.

# The graph visualization
* Installing the GraphStream and BreezeViz libraries
* Visualizing the graph data
* Plotting the degree distribution

## Installing the GraphStream and BreezeViz libraries

Download the following JAR files and place them in a lib folder inside $SPARKHOME. The Spark shell command in the next section loads them from there.

GraphStream (gs-core and gs-ui, version 1.2):

* https://oss.sonatype.org/content/repositories/releases/org/graphstream/gs-core/1.2/
* https://oss.sonatype.org/content/repositories/releases/org/graphstream/gs-ui/1.2/

BreezeViz (breeze and breeze-viz for Scala 2.10, version 0.9):

* http://repo.spring.io/libs-release-remote/org/scalanlp/breeze_2.10/0.9/
* http://repo1.maven.org/maven2/org/scalanlp/breeze-viz_2.10/0.9/

JFreeChart (jcommon 1.0.16 and jfreechart 1.0.13):

* https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/jfree/jcommon/1.0.16/
* http://repo1.maven.org/maven2/jfree/jfreechart/1.0.13/
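If you build with SBT instead of downloading the JARs by hand, the same libraries can be declared as managed dependencies. The sketch below is an assumption-laden build.sbt fragment: the group, artifact and version names are read off the repository paths above, the exact scalaVersion (2.10.4) is only a guess consistent with the _2.10 suffix, and the resolver simply points at the JBoss repository listed for jcommon.

```scala
// build.sbt (sketch): same artifacts as the manual downloads above.
// Assumption: any Scala 2.10.x works; %% appends the _2.10 suffix for Breeze.
scalaVersion := "2.10.4"

// JBoss third-party repository, listed above as the source of jcommon 1.0.16
resolvers += "JBoss third-party releases" at
  "https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/"

libraryDependencies ++= Seq(
  "org.graphstream" %  "gs-core"    % "1.2",
  "org.graphstream" %  "gs-ui"      % "1.2",
  "org.scalanlp"    %% "breeze"     % "0.9",
  "org.scalanlp"    %% "breeze-viz" % "0.9",
  "jfree"           %  "jcommon"    % "1.0.16",
  "jfree"           %  "jfreechart" % "1.0.13"
)
```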

## Visualizing the graph data

Open a terminal with the current directory set to $SPARKHOME and launch the Spark shell. This time, you need to pass the third-party JAR files to the --jars option as a comma-separated list:

In [None]:
$ ./bin/spark-shell --jars \
lib/breeze-viz_2.10-0.9.jar,\
lib/breeze_2.10-0.9.jar,\
lib/gs-core-1.2.jar,\
lib/gs-ui-1.2.jar,\
lib/jcommon-1.0.16.jar,\
lib/jfreechart-1.0.13.jar

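Alternatively, you can let find build the comma-separated list for you. Note that this picks up every JAR file below the current directory, not only the ones in lib:

In [None]:
$ ./bin/spark-shell --jars \
$(find "." -name '*.jar' | xargs echo | tr ' ' ',')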

As a first example, we will visualize the social ego network that we saw in the previous chapter.

First, we need to import the GraphStream classes with the following:

In [None]:
import org.graphstream.graph.{Graph => GraphStream}

In [None]:
import org.graphstream.graph.implementations._

In [None]:
// Create a SingleGraph object for GraphStream visualization
val graph: SingleGraph = new SingleGraph("EgoSocial")

GraphStream controls the appearance of nodes and edges through a CSS-like style sheet. Create a file named stylesheet in a new ./style/ folder and insert the following lines:

In [None]:
node {
       fill-color: #a1d99b;
       size: 20px;
       text-size: 12;
       text-alignment: at-right;
       text-padding: 2;
       text-background-color: #fff7bc;
}
edge {
       shape: cubic-curve;
       fill-color: #dd1c77;
       z-index: 0;
       text-background-mode: rounded-box;
       text-background-color: #fff7bc;
       text-alignment: above;
       text-padding: 2;
}

With the style sheet now ready, we can connect it to the SingleGraph object graph:

In [None]:
// Set up the visual attributes for graph visualization
graph.addAttribute("ui.stylesheet","url(file:.//style/stylesheet)")
graph.addAttribute("ui.quality")
graph.addAttribute("ui.antialias")
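GraphStream also accepts the style sheet text itself as the value of ui.stylesheet, instead of a url(...) pointing to a file. The following sketch shows that variant as a standalone program; the object name InlineStyleDemo, the helper styledGraph and the shortened style rules are illustrative choices, and it assumes the GraphStream 1.2 JARs are on the classpath.

```scala
import org.graphstream.graph.implementations.SingleGraph

object InlineStyleDemo {
  // A shortened version of the style sheet above, passed as a plain string
  val css: String =
    """node { fill-color: #a1d99b; size: 20px; text-size: 12; text-alignment: at-right; }
      |edge { fill-color: #dd1c77; }""".stripMargin

  // Hypothetical helper: create a graph carrying the same visual attributes as above
  def styledGraph(id: String): SingleGraph = {
    val g = new SingleGraph(id)
    g.addAttribute("ui.stylesheet", css) // style sheet text instead of url(...)
    g.addAttribute("ui.quality")         // attributes without values are just flags
    g.addAttribute("ui.antialias")
    g
  }

  def main(args: Array[String]): Unit = {
    val g = styledGraph("EgoSocial")
    println(g.hasAttribute("ui.stylesheet")) // true
  }
}
```

Keeping the style sheet in a separate file, as above, makes it easier to tweak the look without recompiling; the inline form is handy for quick experiments.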

Next, we need to reload the egoNetwork graph that we built in the previous chapter; to avoid repetition, the graph-building code is omitted here. We then load the VertexRDD and EdgeRDD of the social network into the GraphStream graph object with the following code:

In [None]:
// Edge is GraphX's edge class (already in scope if org.apache.spark.graphx._ was imported)
import org.apache.spark.graphx.Edge

// Given the egoNetwork, load the GraphX vertices into GraphStream nodes
for ((id, _) <- egoNetwork.vertices.collect()) {
  graph.addNode[SingleNode](id.toString)
}
// Load the GraphX edges into GraphStream edges. The "-" separator keeps
// edge IDs unique: plain concatenation maps both (1, 23) and (12, 3) to "123".
for (Edge(x, y, _) <- egoNetwork.edges.collect()) {
  graph.addEdge[AbstractEdge](s"$x-$y", x.toString, y.toString, true)
}
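Each GraphStream edge needs a unique string ID, and GraphStream refuses to add a second edge under an ID that is already in use. The plain-Scala sketch below (no Spark or GraphStream required; concatId, separatedId and collisions are hypothetical helper names) shows why gluing the two vertex IDs together without a separator is risky: distinct edges can end up with the same ID.

```scala
object EdgeIdDemo {
  // Two ways to derive an edge ID from (source, destination) vertex IDs
  def concatId(src: Long, dst: Long): String = src.toString ++ dst.toString
  def separatedId(src: Long, dst: Long): String = s"$src-$dst"

  // Return every ID that is shared by more than one edge
  def collisions(edges: Seq[(Long, Long)], id: (Long, Long) => String): Set[String] =
    edges
      .groupBy { case (s, d) => id(s, d) }
      .collect { case (k, es) if es.size > 1 => k }
      .toSet

  def main(args: Array[String]): Unit = {
    val edges = Seq((1L, 23L), (12L, 3L), (4L, 5L))
    println(collisions(edges, concatId))    // Set(123)
    println(collisions(edges, separatedId)) // Set()
  }
}
```

Any separator that cannot occur inside a numeric vertex ID works equally well; "-" is just a readable choice.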

Now all that remains is to display the social ego network; just call the display method on graph:

In [None]:
graph.display()
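To try the whole pipeline without Spark, the sketch below replaces egoNetwork with a tiny hard-coded vertex and edge list (the data, the object name MiniEgoDisplay and the build helper are all hypothetical). It also sets the ui.label attribute on each node: GraphStream draws node text from that attribute, and the text-* properties in the style sheet only affect nodes that carry one. It assumes the GraphStream 1.2 JARs are on the classpath.

```scala
import org.graphstream.graph.implementations.{AbstractEdge, SingleGraph, SingleNode}

object MiniEgoDisplay {
  // Hypothetical stand-ins for egoNetwork.vertices and egoNetwork.edges
  val vertices: Seq[Long] = Seq(1L, 2L, 3L, 4L)
  val edges: Seq[(Long, Long)] = Seq((1L, 2L), (1L, 3L), (2L, 4L))

  def build(): SingleGraph = {
    val graph = new SingleGraph("MiniEgo")
    graph.addAttribute("ui.stylesheet", "url(file:.//style/stylesheet)")
    for (id <- vertices) {
      val node = graph.addNode[SingleNode](id.toString)
      node.addAttribute("ui.label", id.toString) // text shown next to the node
    }
    for ((x, y) <- edges)
      graph.addEdge[AbstractEdge](s"$x-$y", x.toString, y.toString, true)
    graph
  }

  def main(args: Array[String]): Unit = {
    val graph = build()
    println(s"${graph.getNodeCount()} nodes, ${graph.getEdgeCount()} edges") // 4 nodes, 3 edges
    graph.display() // opens a Swing window, so it needs a graphical environment
  }
}
```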

<img src="figures/cap3.1.png" width=600 />

## Plotting the degree distribution

# The analysis of network connectedness
* Finding the connected components
* Counting triangles and computing clustering coefficients

## Finding the connected components

## Counting triangles and computing clustering coefficients

# The network centrality and PageRank
* How PageRank works
* Ranking web pages

## How PageRank works

## Ranking web pages

# Scala Build Tool revisited
* Organizing build definitions
* Managing library dependencies
    - A preview of the steps
    - Running tasks with SBT commands

## Organizing build definitions

## Managing library dependencies

### A preview of the steps

### Running tasks with SBT commands

# Summary