# <center>Exploring GraphX</center>
## <center>Introduction to Graph-Parallel</center>
### <center>July 29, 2016</center>

<img src="http://spark.apache.org/docs/latest/img/graphx_logo.png" width="600" align="center">

## Welcome to the third lab in the course, Exploring GraphX.

### GraphX is Apache Spark's API for graph and graph-parallel computations.

In this lab exercise, you will look at how to modify an existing graph and see how Property and Structural Operators work. Then you will combine a few of the operators you have learned for visualizing and modifying a graph and put them into action!

### Some Notebook Commands
#### In case you haven't dealt with a Jupyter Notebook before, here are some quick, useful commands that may be handy to get started.
<ul>
    <li>Run a cell: CTRL + ENTER</li>
    <li>Create a cell above a cell: a</li>
    <li>Create a cell below a cell: b</li>
    <li>Change a cell to Markdown: m</li>
    
    <li>Change a cell to code: y</li>
</ul>

<b> If you are interested in more keyboard shortcuts, go to Help -> Keyboard Shortcuts </b>

In the last exercise we visualized the graph as best we could with GraphX and worked with a few useful classes along the way. Now let's try modifying our 'facebook'.

Once again we will start by importing our usual libraries from before:

- org.apache.spark._ 
- org.apache.spark.graphx._
- org.apache.spark.rdd.RDD 

In [2]:
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">import org.apache.spark.&#95;<br>
import org.apache.spark.graphx.&#95;<br>
import org.apache.spark.rdd.RDD</font>
</td>
</table>

Now let's create the vertices of our 'facebook' graph, which includes the following People:

- Billy Bill -> VertexId = 1
- Jacob Johnson -> VertexId = 2
- Andrew Smith -> VertexId = 3

and 2 Pages:

- Iron Man Fan Page -> VertexId = 4
- Captain America Fan Page -> VertexId = 5

Make this in one step again and store it as vertexRDD.

In [3]:
val vertexRDD: RDD[(Long, (String, String))] = sc.parallelize(Array((1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")), (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")), (5L, ("Captain America Fan Page", "Page"))))

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val vertexRDD: RDD[(Long, (String, String))] = sc.parallelize(Array((1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")), (3L, ("Andrew Smith", "Person")), (4L, ("Iron Man Fan Page", "Page")), (5L, ("Captain America Fan Page", "Page"))))</font>
</td>
</table>

Now let's create the relationships of our 'facebook' graph again:

- Billy is Friends with Jacob
- Billy is Friends with Andrew
- Jacob is a Follower of the Iron Man Fan Page
- Jacob is a Follower of the Captain America Fan Page
- Andrew is a Follower of the Captain America Fan Page

Make this in one step again and store it as edgeRDD.

In [4]:
val edgeRDD: RDD[Edge[String]] = sc.parallelize(Array(Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"), Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower")))

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val edgeRDD: RDD[Edge[String]] = sc.parallelize(Array(Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(2L, 4L, "Follower"), Edge(2L, 5L, "Follower"), Edge(3L, 5L, "Follower")))</font>
</td>
</table>

Then we will go ahead and create our default "fallback" vertex called defaultvertex, which is a tuple of "Self" and "Missing".

In [5]:
var defaultvertex = ("Self", "Missing")

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">var defaultvertex = ("Self", "Missing")</font>
</td>
</table>

Alright! Once again let's create our graph called facebook.

In [6]:
var facebook = Graph(vertexRDD, edgeRDD, defaultvertex)

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">var facebook = Graph(vertexRDD, edgeRDD, defaultvertex)</font>
</td>
</table>

Perfect! Here's a reminder of the visualized Graph:

<img src = "http://i.imgur.com/rhkiopM.png">

Alright, so we have the graph we made before. Hopefully by now you feel comfortable dealing with this Graph :). Now we will start looking at a few Property Operators.

The first function we will look at is the mapVertices function of the Graph. One good question: since a Graph is built on RDDs, which are immutable, how does it apply the modification?

It makes a new Graph. But it's not as simple as just creating a new Graph. GraphX is smart about its modifications: these operators allow GraphX to reuse the parts of the Graph that are unaffected by the modification. So if you change one small thing about the graph, it won't take the same computational time as creating a new one from scratch.

The mapVertices function takes a function of the vertex ID and attribute, similar to how you used a filter function previously. Here's an example.

In [7]:
val facebook_temp = facebook.mapVertices((id, user_type) => if (id == 1) ("Billy D. Bill", "Person") else user_type)

In [8]:
for (vertex <- facebook_temp.vertices.collect) {
    println(vertex)
}

(1,(Billy D. Bill,Person))
(2,(Jacob Johnson,Person))
(3,(Andrew Smith,Person))
(4,(Iron Man Fan Page,Page))
(5,(Captain America Fan Page,Page))


Alright, so in this example we used mapVertices and bound each vertex's ID and attribute to id and user_type (like we did before!). Since this map function runs on every vertex in the graph, we specified a condition so that only the vertex we want to modify changes. In this case, Billy Bill wanted to change his name on our 'facebook', so we made the update using mapVertices and stored the resulting graph in another variable called facebook_temp.
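Since graphs are immutable, it can help to confirm that mapVertices leaves the source graph alone. The following is a minimal sketch, not part of the lab's answers: the names `ctx`, `people`, `links`, `original`, and `renamed` are made up for illustration, and it assumes `SparkContext.getOrCreate` will hand back the notebook's existing context (or start a local one if none exists).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: reuse the notebook's SparkContext if there is one, else start a local one.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setAppName("immutability-sketch").setMaster("local[*]"))

// A tiny two-person graph (hypothetical data in the style of the lab).
val people: RDD[(VertexId, (String, String))] = ctx.parallelize(Seq(
  (1L, ("Billy Bill", "Person")),
  (2L, ("Jacob Johnson", "Person"))))
val links: RDD[Edge[String]] = ctx.parallelize(Seq(Edge(1L, 2L, "Friends")))
val original = Graph(people, links, ("Self", "Missing"))

// mapVertices returns a new graph; `original` itself is not modified.
val renamed = original.mapVertices((id, attr) =>
  if (id == 1L) ("Billy D. Bill", attr._2) else attr)

val before = original.vertices.collect.toMap
val after = renamed.vertices.collect.toMap
println(before(1L)) // still the old name
println(after(1L))  // the updated name
```

Printing both graphs side by side is a quick way to see that only the returned graph carries the change.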

Now let's try that again, but it's your turn! Modify the original graph (facebook) and store the resulting graph in a variable called facebook_temp2. This time, change the "Captain America Fan Page" to "Captain America Fan Page is the Best!".

Hint: Try to use the ID to identify the page!

In [9]:
val facebook_temp2 = facebook.mapVertices((id, user_type) => if (id == 5) ("Captain America Fan Page is the Best!", "Page") else user_type)

Print to confirm it!!

In [10]:
for (vertex <- facebook_temp2.vertices.collect) {
    println(vertex)
}

(1,(Billy Bill,Person))
(2,(Jacob Johnson,Person))
(3,(Andrew Smith,Person))
(4,(Iron Man Fan Page,Page))
(5,(Captain America Fan Page is the Best!,Page))


Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val facebook_temp2 = facebook.mapVertices((id, user_type) => if (id == 5) ("Captain America Fan Page is the Best!", "Page") else user_type)</font>
</td>
</table>

Awesome! Looks like you're getting the hang of things! Now let's move on to a similar concept called mapEdges. As you may be able to tell, we will be modifying Edge attributes!

Now we will reuse facebook_temp as the variable for a new graph built from our original graph facebook. Let's try modifying the relationship of Andrew to a "Supreme Follower" of the Captain America Fan Page. (Remember, it's the original graph, so none of the changes we made before are on this graph.)

Hint: Whatever value your if expression returns will be assigned to the .attr of the edge. (The srcId is helpful.)

In [11]:
val facebook_temp = facebook.mapEdges((edge) => if (edge.srcId == 3) "Supreme Follower" else edge.attr)

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val facebook_temp = facebook.mapEdges((edge) => if (edge.srcId == 3) "Supreme Follower" else edge.attr)</font>
</td>
</table>

Print to confirm!

In [12]:
for (edges <- facebook_temp.edges.collect) {
    println(edges)
}

Edge(1,2,Friends)
Edge(1,3,Friends)
Edge(2,4,Follower)
Edge(2,5,Follower)
Edge(3,5,Supreme Follower)


Now let's try something a little different. We will change the plural "Friends" to "Friend" on every edge relation. Save it as facebook_temp2. Then print and confirm the results!

In [14]:
val facebook_temp2 = facebook.mapEdges((edge) => if (edge.attr == "Friends") "Friend" else edge.attr)
for (edges <- facebook_temp2.edges.collect) {
    println(edges)
}

Edge(1,2,Friend)
Edge(1,3,Friend)
Edge(2,4,Follower)
Edge(2,5,Follower)
Edge(3,5,Follower)


Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val facebook_temp2 = facebook.mapEdges((edge) => if (edge.attr == "Friends") "Friend" else edge.attr)<br>
for (edges <- facebook_temp2.edges.collect) {<br>
    println(edges)<br>
}</font>
</td>
</table>

Awesome! Now, we will want to keep this graph since it is grammatically correct, so once you have the correct graph, we will save it as the new facebook. 

In [15]:
val facebook = facebook_temp2

Now we will look at the mapTriplets function! It works in a similar manner to the last two functions (mapVertices and mapEdges), but it deals with Triplets instead of Vertices or Edges: the map function is applied to each EdgeTriplet.

Like the mapEdges function, this function modifies the Edge attribute .attr (but on the triplet this time). One key difference is that with the mapTriplets function you have access to all the attributes of the Triplet class, including srcAttr and dstAttr. Therefore, you have more options for where you can specify the conditions.
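To make the difference concrete, here is a small sketch that relabels the same edge two ways: once with mapEdges, which can only look at the edge's IDs and attribute, and once with mapTriplets, which can also look at the destination vertex's attribute. The graph data, the "Page Follower" label, and the names `ctx`, `nodes`, `rels`, `g`, `byId`, and `byType` are all hypothetical; the sketch assumes `SparkContext.getOrCreate` returns the notebook's context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: reuse the notebook's SparkContext if there is one, else start a local one.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setAppName("triplets-sketch").setMaster("local[*]"))

val nodes: RDD[(VertexId, (String, String))] = ctx.parallelize(Seq(
  (1L, ("Billy Bill", "Person")),
  (2L, ("Jacob Johnson", "Person")),
  (4L, ("Iron Man Fan Page", "Page"))))
val rels: RDD[Edge[String]] = ctx.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(2L, 4L, "Follower")))
val g = Graph(nodes, rels, ("Self", "Missing"))

// mapEdges only sees the Edge fields: srcId, dstId, and attr.
val byId = g.mapEdges(e => if (e.dstId == 4L) "Page Follower" else e.attr)

// mapTriplets also sees srcAttr and dstAttr, so the condition can use the vertex type.
val byType = g.mapTriplets(t => if (t.dstAttr._2 == "Page") "Page Follower" else t.attr)

val labelsById = byId.edges.collect.map(_.attr).toSet
val labelsByType = byType.edges.collect.map(_.attr).toSet
println(labelsById == labelsByType) // both relabel only the edge that points at the Page
```

The triplet version keeps working if more Pages are added later, whereas the ID-based version would need every Page ID listed.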

Now let's try making a new variable called facebook_temp3, which will be a modified version of facebook, by using the mapTriplets function to change the relationship of Billy Bill and Jacob Johnson to "Best-Friend". It may be easy to specify just one condition, so let's pretend Jacob has more friends; make sure your condition specifies both Billy Bill and Jacob Johnson.

Also, do not use vertex IDs as the identifier here, so that you have to use the attributes of the triplet class.

In [16]:
val facebook_temp3 = facebook_temp2.mapTriplets((triplet) => if (triplet.srcAttr._1 == "Billy Bill" && triplet.dstAttr._1 == "Jacob Johnson") "Best-Friend" else triplet.attr)

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val facebook_temp3 = facebook_temp2.mapTriplets((triplet) => if (triplet.srcAttr._1 == "Billy Bill" && triplet.dstAttr._1 == "Jacob Johnson") "Best-Friend" else triplet.attr)</font>
</td>
</table>

Use the following print function to confirm your results

In [17]:
for (triplet <- facebook_temp3.triplets.collect) {
    print(triplet.srcAttr._1)
    print(" is a ")
    print(triplet.attr)
    print(" of ")
    println(triplet.dstAttr._1)
}

Billy Bill is a Best-Friend of Jacob Johnson
Billy Bill is a Friend of Andrew Smith
Jacob Johnson is a Follower of Iron Man Fan Page
Jacob Johnson is a Follower of Captain America Fan Page
Andrew Smith is a Follower of Captain America Fan Page


So as you might have noticed, each of the map functions changes the following:

mapVertices -> Vertex Attribute <br>
mapEdges -> Edge Attribute .attr <br>
mapTriplets -> Edge Attribute .attr <br>

Since these are property operators, these functions only modify the property values of the graph. You might have been wondering: what if I wanted to change other values such as srcId, dstId, or even a vertex ID? These are good questions; however, modifying these values would change the structure of the graph, and these functions are meant for modifying property values only. We'll take a look at how to handle that later on.
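One way to convince yourself that property operators leave the structure alone is to compare the (srcId, dstId) pairs before and after. The sketch below is illustrative only: the helper `structure`, the graph data, and the names `ctx`, `nodes`, `rels`, `g`, `nameLengths`, and `unitWeights` are hypothetical, and it assumes `SparkContext.getOrCreate` returns the notebook's context. It also shows that a property operator may change the attribute type, not just its value.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: reuse the notebook's SparkContext if there is one, else start a local one.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setAppName("property-ops-sketch").setMaster("local[*]"))

val nodes: RDD[(VertexId, (String, String))] = ctx.parallelize(Seq(
  (1L, ("Billy Bill", "Person")),
  (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person"))))
val rels: RDD[Edge[String]] = ctx.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends")))
val g = Graph(nodes, rels, ("Self", "Missing"))

// Hypothetical helper for this sketch: the set of (srcId, dstId) pairs of a graph.
def structure[V, E](x: Graph[V, E]): Set[(VertexId, VertexId)] =
  x.edges.map(e => (e.srcId, e.dstId)).collect.toSet

// Vertex attributes become Int (name length); edge attributes stay String.
val nameLengths: Graph[Int, String] = g.mapVertices((id, attr) => attr._1.length)
// Edge attributes become Int; vertex attributes stay as they were.
val unitWeights: Graph[(String, String), Int] = g.mapEdges(e => 1)

println(structure(g) == structure(nameLengths)) // true: same edges, new vertex values
println(structure(g) == structure(unitWeights)) // true: same edges, new edge values
```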

Now let's take a look at a few Structural Operators. The first operator we will look at is the reverse function. The reverse function takes the current graph and reverses the direction of every edge. For example, our facebook graph shows that Billy Bill is a Friend of Jacob Johnson. After running the reverse function, the relationship should show that Jacob Johnson is a Friend of Billy Bill. Why is this important? It is handy in special use cases, most commonly when we want to find the inverse PageRank of a graph.

Save the reversed facebook as facebook_temp, print out a view of the graph, and compute facebook's inverse PageRank with a tolerance of 0.1.

In [None]:
val facebook_temp = facebook.reverse

In [18]:
for (triplet <- facebook_temp.triplets.collect) {
    print(triplet.srcAttr._1)
    print(" is a ")
    print(triplet.attr)
    print(" of ")
    println(triplet.dstAttr._1)
}
println("Page Rank values --------------")
for (rankee <- facebook_temp.pageRank(0.1).vertices.collect) {
    println(rankee)
}

Billy Bill is a Friends of Jacob Johnson
Billy Bill is a Friends of Andrew Smith
Jacob Johnson is a Follower of Iron Man Fan Page
Jacob Johnson is a Follower of Captain America Fan Page
Andrew Smith is a Supreme Follower of Captain America Fan Page
Page Rank values --------------
(1,0.15)
(2,0.21375)
(3,0.21375)
(4,0.21375)
(5,0.34124999999999994)

(Note: the output above was captured without running the reverse cell first, so it still shows the earlier facebook_temp from the mapEdges exercise. Once facebook_temp = facebook.reverse has been run, the source and destination of every relationship are swapped and the PageRank values change accordingly.)



Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val facebook_temp = facebook.reverse<br>
for (triplet <- facebook_temp.triplets.collect) {<br>
print(triplet.srcAttr._1)<br>
print(" is a ")<br>
print(triplet.attr)<br>
print(" of ")<br>
println(triplet.dstAttr._1)<br>
}<br>
println("Page Rank values --------------")<br>
for (rankee <- facebook_temp.pageRank(0.1).vertices.collect) {<br>
    println(rankee)<br>
}</font>
</td>
</table>
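If you want to check what reverse did without reading the printed sentences, you can compare the edge endpoints directly. This is a minimal sketch on a made-up three-person graph; the names `ctx`, `nodes`, `rels`, `g`, `reversed`, `forward`, and `backward` are hypothetical, and it assumes `SparkContext.getOrCreate` returns the notebook's context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: reuse the notebook's SparkContext if there is one, else start a local one.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setAppName("reverse-sketch").setMaster("local[*]"))

val nodes: RDD[(VertexId, (String, String))] = ctx.parallelize(Seq(
  (1L, ("Billy Bill", "Person")),
  (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person"))))
val rels: RDD[Edge[String]] = ctx.parallelize(Seq(
  Edge(1L, 2L, "Friend"), Edge(1L, 3L, "Friend")))
val g = Graph(nodes, rels, ("Self", "Missing"))

val reversed = g.reverse

// Endpoints of every edge, before and after.
val forward = g.edges.collect.map(e => (e.srcId, e.dstId)).toSet
val backward = reversed.edges.collect.map(e => (e.srcId, e.dstId)).toSet
println(forward)  // edges 1 -> 2 and 1 -> 3
println(backward) // edges 2 -> 1 and 3 -> 1

// The in-degrees of the original are the out-degrees of the reversed graph.
println(g.inDegrees.collect.toMap == reversed.outDegrees.collect.toMap) // true
```

The edge attributes are untouched; only the direction flips, which is why PageRank on the reversed graph ranks vertices by what they point to rather than by what points to them.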

Awesome! Interesting function, right? Now let's look at the next one: subgraph.

subgraph is an Operator that takes in an edge predicate and a vertex predicate and returns a graph that contains only the vertices that satisfy the vertex predicate and the edges that both satisfy the edge predicate and connect vertices that satisfy the vertex predicate. The returned graph is a "Subgraph" of our original graph, containing only the edges and vertices that pass our predicate filters.

When calling the subgraph function, you define vpred (vertex predicate) and epred (edge triplet predicate) as parameters. If you do not define a predicate, it defaults to true, so nothing is filtered out by it.

When defining a predicate, we can format it similarly to how we defined cases: name the variables (the vertex ID and attribute for vpred, or the edge triplet for epred) and then give a boolean expression. This makes defining a boolean condition very simple!
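As a rough illustration of how the two predicates interact, the sketch below filters a small made-up graph three ways: by edge only, by vertex only, and by both. The data and the names `ctx`, `nodes`, `rels`, `g`, `friendsOnly`, `withoutJacob`, and `both` are hypothetical, and the sketch assumes `SparkContext.getOrCreate` returns the notebook's context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: reuse the notebook's SparkContext if there is one, else start a local one.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setAppName("subgraph-sketch").setMaster("local[*]"))

val nodes: RDD[(VertexId, (String, String))] = ctx.parallelize(Seq(
  (1L, ("Billy Bill", "Person")),
  (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person")),
  (4L, ("Iron Man Fan Page", "Page"))))
val rels: RDD[Edge[String]] = ctx.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(3L, 4L, "Follower")))
val g = Graph(nodes, rels, ("Self", "Missing"))

// epred only: edges are filtered, but every vertex is kept (vpred defaults to true).
val friendsOnly = g.subgraph(epred = t => t.attr == "Friends")

// vpred only: dropping a vertex also drops every edge that touches it.
val withoutJacob = g.subgraph(vpred = (id, attr) => attr._1 != "Jacob Johnson")

// Both: an edge must pass epred AND connect two vertices that pass vpred.
val both = g.subgraph(
  epred = t => t.attr == "Friends",
  vpred = (id, attr) => attr._1 != "Jacob Johnson")

println((friendsOnly.vertices.count, friendsOnly.edges.count))   // (4,2)
println((withoutJacob.vertices.count, withoutJacob.edges.count)) // (3,2)
println((both.vertices.count, both.edges.count))                 // (3,1)
```

Note that an edge predicate alone never removes vertices, so isolated vertices can remain in the result.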

Now let's use our facebook variable and try to create a subgraph called facebook_subgraph that contains only People, i.e. no Pages!

In [19]:
val facebook_subgraph = facebook.subgraph(vpred = (id, user_type) => user_type._2 == "Person")

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val facebook_subgraph = facebook.subgraph(vpred = (id, user_type) => user_type._2 == "Person")</font>
</td>
</table>

Use the following to confirm your results!

In [20]:
for (triplet <- facebook_subgraph.triplets.collect) {
    print(triplet.srcAttr._1)
    print(" is a ")
    print(triplet.attr)
    print(" of ")
    println(triplet.dstAttr._1)
}

Billy Bill is a Friend of Jacob Johnson
Billy Bill is a Friend of Andrew Smith


Awesome! Note how we only have "People" in our graph, and any relationship (or edge) that touched a "Page" (anything that is not a "Person") was not included. This is important to note, since it might not be obvious whether an edge would remain in the new graph.

Now let's try using the subgraph function again, but keep only relationships (or edges) that are "Follower". This time, use the subgraph function and confirm your results without defining a variable!

In [21]:
for (triplet <- facebook.subgraph(epred = (edgetriplet) => edgetriplet.attr == "Follower").triplets.collect) {
    print(triplet.srcAttr._1)
    print(" is a ")
    print(triplet.attr)
    print(" of ")
    println(triplet.dstAttr._1)
}

Jacob Johnson is a Follower of Iron Man Fan Page
Jacob Johnson is a Follower of Captain America Fan Page
Andrew Smith is a Follower of Captain America Fan Page


Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">for (triplet <- facebook.subgraph(epred = (edgetriplet) => edgetriplet.attr == "Follower").triplets.collect) {<br>
    print(triplet.srcAttr._1)<br>
    print(" is a ")<br>
    print(triplet.attr)<br>
    print(" of ")<br>
    println(triplet.dstAttr._1)<br>
}</font>
</td>
</table>

Awesome! The subgraph function can be very useful! You can use it to eliminate any unwanted relationships in your graph, and it can be very helpful in concentrating your information. You can also vary how specific your search is by using vpred, epred, or both.

The next Operator we will look at is the mask function. The mask function takes in a graph and returns a subgraph of the current graph containing only the vertices and edges that are also found in the input graph. This is a peculiar function, used in special situations such as restricting a graph to the part it shares with another graph. So we'll just touch on this operator briefly.

A quick example is to take our facebook graph and run the mask function with facebook_subgraph as the parameter. What do you think will happen? Print and confirm the results without saving a variable for it!

In [22]:
for (triplet <- facebook.mask(facebook_subgraph).triplets.collect) {
    print(triplet.srcAttr._1)
    print(" is a ")
    print(triplet.attr)
    print(" of ")
    println(triplet.dstAttr._1)
}

Billy Bill is a Friend of Jacob Johnson
Billy Bill is a Friend of Andrew Smith


Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">for (triplet <- facebook.mask(facebook_subgraph).triplets.collect) {<br>
print(triplet.srcAttr._1)<br>
print(" is a ")<br>
print(triplet.attr)<br>
print(" of ")<br>
println(triplet.dstAttr._1)<br>
}</font>
</td>
</table>

It's the exact same as facebook_subgraph! That's to be expected since we derived facebook_subgraph from facebook. With the graphs we have modified so far, we shouldn't expect any surprising results, since they were all derived from facebook without any major changes.

So for now, just note that the mask function is meant for situations where you have two premade graphs you want to compare. If you only have one, the subgraph function is the way to go! But don't worry, we'll visit the mask function again once we make another graph!
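One place where mask earns its keep is restricting the result of a computation to a cleaned-up version of the same graph. The sketch below follows that pattern: it runs connected components on a graph that contains a "Missing" fallback vertex, then masks the answer with the subgraph of valid vertices. The data and the names `ctx`, `nodes`, `rels`, `g`, `ccGraph`, `validGraph`, and `validCC` are hypothetical, and the sketch assumes `SparkContext.getOrCreate` returns the notebook's context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: reuse the notebook's SparkContext if there is one, else start a local one.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setAppName("mask-sketch").setMaster("local[*]"))

val nodes: RDD[(VertexId, (String, String))] = ctx.parallelize(Seq(
  (1L, ("Billy Bill", "Person")),
  (2L, ("Jacob Johnson", "Person")),
  (3L, ("Andrew Smith", "Person"))))
// Vertex 4 has no entry in `nodes`, so it gets the default ("Self", "Missing").
val rels: RDD[Edge[String]] = ctx.parallelize(Seq(
  Edge(1L, 2L, "Friends"), Edge(2L, 3L, "Friends"), Edge(3L, 4L, "Friends")))
val g = Graph(nodes, rels, ("Self", "Missing"))

// Each vertex is labelled with the lowest vertex ID in its connected component.
val ccGraph = g.connectedComponents()

// Drop the fallback vertex (and the edge that touches it).
val validGraph = g.subgraph(vpred = (id, attr) => attr._2 != "Missing")

// Keep the component labels, but only for vertices and edges present in validGraph.
val validCC = ccGraph.mask(validGraph)

println(validCC.vertices.collect.toMap) // vertices 1, 2, 3, each labelled with component 1
println(validCC.edges.count)            // 2
```

The masked graph keeps the attributes of the graph mask is called on (here the component labels), while the argument graph only decides which vertices and edges survive.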

The next Structural Operator we will look at is the groupEdges function. This function is particularly useful for graphs that have repeated relationships (parallel edges) between the same pair of vertices. It merges those duplicate edges into one!

For example, given the following graph:

In [23]:
val vertexRDD2: RDD[(Long, (String, String))] = sc.parallelize(Array((1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")), (3L, ("Andrew Smith", "Person"))))
val edgeRDD2: RDD[Edge[String]] = sc.parallelize(Array(Edge(1L, 2L, "Friends"), Edge(1L, 3L, "Friends"), Edge(1L, 3L, "Friends")))
var simple_facebook = Graph(vertexRDD2, edgeRDD2, defaultvertex)

for (triplet <- simple_facebook.triplets.collect) {
    print(triplet.srcAttr._1)
    print(" is ")
    print(triplet.attr)
    print(" with ")
    println(triplet.dstAttr._1)
}

Billy Bill is Friends with Jacob Johnson
Billy Bill is Friends with Andrew Smith
Billy Bill is Friends with Andrew Smith


This simple_facebook is made up of just the three friends we saw earlier:

-Billy Bill<br>
-Jacob Johnson<br>
-Andrew Smith<br>

However, there is an additional relationship present: the relationship between Billy Bill and Andrew Smith appears twice. In our new graph, this is a repetitive relationship that we want to avoid, since it takes additional space and is not needed.

Before we run the groupEdges function on simple_facebook, we need to use the partitionBy function on simple_facebook with the parameter PartitionStrategy.EdgePartition1D. This repartitions the edges so that duplicate edges end up in the same partition. groupEdges only merges edges that sit together in one partition, so this step is needed for it to produce the correct result.

Let's run the partitionBy function with the above details and save it as simple_facebook_partitioned.

In [24]:
val simple_facebook_partitioned = simple_facebook.partitionBy(PartitionStrategy.EdgePartition1D)

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val simple_facebook_partitioned = simple_facebook.partitionBy(PartitionStrategy.EdgePartition1D)</font>
</td>
</table>

Alright! Just for additional info, here are the available PartitionStrategies:

- CanonicalRandomVertexCut
- EdgePartition1D
- EdgePartition2D
- RandomVertexCut

We used EdgePartition1D because it is simple: it assigns edges to partitions using only the source vertex ID. For more information please check out http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.PartitionStrategy
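You can ask a strategy directly where it would place an edge by calling its getPartition method, which needs no SparkContext. The sketch below is only an illustration; `numParts` and the result names are made up, and the partition count of 4 is an arbitrary choice.

```scala
import org.apache.spark.graphx.PartitionStrategy

val numParts = 4 // hypothetical number of edge partitions

// EdgePartition1D looks only at the source ID, so edges 1 -> 2 and 1 -> 3 share a partition.
val p12 = PartitionStrategy.EdgePartition1D.getPartition(1L, 2L, numParts)
val p13 = PartitionStrategy.EdgePartition1D.getPartition(1L, 3L, numParts)
println(p12 == p13) // true

// CanonicalRandomVertexCut ignores direction, so 1 -> 2 and 2 -> 1 share a partition.
val c12 = PartitionStrategy.CanonicalRandomVertexCut.getPartition(1L, 2L, numParts)
val c21 = PartitionStrategy.CanonicalRandomVertexCut.getPartition(2L, 1L, numParts)
println(c12 == c21) // true
```

Because each strategy computes the partition from the source and destination IDs alone, two edges with identical endpoints always land together, which is what groupEdges relies on.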

Now let's create a new variable called new_simple_facebook by using the groupEdges function on simple_facebook_partitioned. groupEdges takes a parameter called merge. It is similar to the functions we looked at before, but not exactly.

As before, we define a function for merge, this time with two variables in a tuple, much like the variables you defined when you used "case". The two variables (let's call them edge1 and edge2) represent the attributes of two edges that connect the same pair of vertices in the same direction. After the => we return the merged attribute; in this case we simply pick one of the variables (let's say edge1).

That may be a lot to take in. Let's quickly review what you need to do. Create a new variable called new_simple_facebook that is the result of the groupEdges function run on simple_facebook_partitioned, with the parameter merge taking edge1 and edge2 (representing the edges' attributes) and selecting only edge1.

In [26]:
val new_simple_facebook = simple_facebook_partitioned.groupEdges(merge = (edge1, edge2) => edge1)

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val new_simple_facebook = simple_facebook_partitioned.groupEdges(merge = (edge1, edge2) => edge1)</font>
</td>
</table>

Now create a view of new_simple_facebook and see what you observe.

In [27]:
for (triplet <- new_simple_facebook.triplets.collect) {
    print(triplet.srcAttr._1)
    print(" is ")
    print(triplet.attr)
    print(" with ")
    println(triplet.dstAttr._1)
}

Billy Bill is Friends with Jacob Johnson
Billy Bill is Friends with Andrew Smith


Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">for (triplet <- new_simple_facebook.triplets.collect) {<br>
print(triplet.srcAttr._1)<br>
print(" is ")<br>
print(triplet.attr)<br>
print(" with ")<br>
println(triplet.dstAttr._1)<br>
}</font>
</td>
</table>

So we only have two edges left! It eliminated the extra relationship in our graph. So groupEdges can be important for eliminating repetitive relationships like the one in our graph. It can also be used for other purposes: if the relationship is numeric, we can define merge as a commutative and associative function such as sum or max. groupEdges is very important, but make sure you run partitionBy beforehand to obtain the correct results!
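For the numeric case, here is a minimal sketch in which each edge attribute is a count (say, messages sent, a made-up interpretation) and duplicate edges are merged by adding their counts. The data and the names `ctx`, `names`, `counts`, `messages`, `merged`, and `totals` are hypothetical, and the sketch assumes `SparkContext.getOrCreate` returns the notebook's context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Assumption: reuse the notebook's SparkContext if there is one, else start a local one.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setAppName("groupedges-sketch").setMaster("local[*]"))

val names: RDD[(VertexId, String)] = ctx.parallelize(Seq(
  (1L, "Billy Bill"), (2L, "Jacob Johnson"), (3L, "Andrew Smith")))
// Two parallel edges from 1 to 3, with counts 2 and 5.
val counts: RDD[Edge[Int]] = ctx.parallelize(Seq(
  Edge(1L, 2L, 1), Edge(1L, 3L, 2), Edge(1L, 3L, 5)))
val messages = Graph(names, counts, "Missing")

// Partition first so parallel edges are colocated, then merge them by summing.
val merged = messages
  .partitionBy(PartitionStrategy.EdgePartition1D)
  .groupEdges((a, b) => a + b)

val totals = merged.edges.collect.map(e => ((e.srcId, e.dstId), e.attr)).toMap
println(totals) // 1 -> 2 keeps its count of 1; the two 1 -> 3 edges become one edge with 7
```

Addition works here because the order in which duplicates are combined does not matter; an operation like subtraction or division would give order-dependent results.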

Alright! So we have looked at a few new operators. Let's create a new graph and modify it with some of the new skills that you have learned!

The new graph will have the following Vertices: (Person)

- Billy Bill -> VertexId = 1
- Jacob Johnson -> VertexId = 2
- Stan Smith -> VertexId = 3
- Homer Simpson -> VertexId = 4
- Clark Kent -> VertexId = 5
- James Smith -> VertexId = 6

and the following Edges:

- Jacob Johnson is Friends with Billy Bill
- Jacob Johnson is Friends with Clark Kent
- Stan Smith is Friends with Billy Bill
- Stan Smith is Friends with Homer Simpson
- Stan Smith is Friends with Clark Kent
- Stan Smith is Friends with James Smith
- Homer Simpson is Friends with James Smith
- Clark Kent is Friends with Billy Bill
- Clark Kent is Friends with Billy Bill
- Clark Kent is Friends with James Smith
- James Smith is Friends with Bruce Lee (VertexId = 7)

and with the same default vertex of "Self", "Missing".

Note: We will be using the same variables as our first graph, however if there are any changes, please add a number 2 so the original variable is not overwritten. For example: vertexRDD would become vertexRDD2.

In [29]:
val vertexRDD2: RDD[(Long, (String, String))] = sc.parallelize(Array((1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")), (3L, ("Stan Smith", "Person")), (4L, ("Homer Simpson", "Person")), (5L, ("Clark Kent", "Person")), (6L, ("James Smith", "Person"))))
val edgeRDD2: RDD[Edge[String]] = sc.parallelize(Array(Edge(2L, 1L, "Friends"), Edge(2L, 5L, "Friends"), Edge(3L, 1L, "Friends"), Edge(3L, 4L, "Friends"), Edge(3L, 5L, "Friends"), Edge(3L, 6L, "Friends"), Edge(4L, 6L, "Friends"), Edge(5L, 1L, "Friends"), Edge(5L, 1L, "Friends"), Edge(5L, 6L, "Friends"), Edge(6L, 7L, "Friends")))
var defaultvertex = ("Self", "Missing")
var facebook2 = Graph(vertexRDD2, edgeRDD2, defaultvertex)

Highlight over the box below for the answer
<table width="100%" cellspacing="0" cellpadding="0" border="0" align="center" bgcolor="#ff6600">
<td> <font color = "white">val vertexRDD2: RDD[(Long, (String, String))] = sc.parallelize(Array((1L, ("Billy Bill", "Person")), (2L, ("Jacob Johnson", "Person")), (3L, ("Stan Smith", "Person")), (4L, ("Homer Simpson", "Person")), (5L, ("Clark Kent", "Person")), (6L, ("James Smith", "Person"))))<br>
val edgeRDD2: RDD[Edge[String]] = sc.parallelize(Array(Edge(2L, 1L, "Friends"), Edge(2L, 5L, "Friends"), Edge(3L, 1L, "Friends"), Edge(3L, 4L, "Friends"), Edge(3L, 5L, "Friends"), Edge(3L, 6L, "Friends"), Edge(4L, 6L, "Friends"), Edge(5L, 1L, "Friends"), Edge(5L, 1L, "Friends"), Edge(5L, 6L, "Friends"), Edge(6L, 7L, "Friends")))<br>
var defaultvertex = ("Self", "Missing")<br>
var facebook2 = Graph(vertexRDD2, edgeRDD2, defaultvertex)</font>
</td>
</table>

Print to confirm!

In [30]:
for (triplet <- facebook2.triplets.collect) {
    print(triplet.srcAttr._1)
    print(" is ")
    print(triplet.attr)
    print(" with ")
    println(triplet.dstAttr._1)
}

Jacob Johnson is Friends with Billy Bill
Jacob Johnson is Friends with Clark Kent
Stan Smith is Friends with Billy Bill
Stan Smith is Friends with Homer Simpson
Stan Smith is Friends with Clark Kent
Stan Smith is Friends with James Smith
Homer Simpson is Friends with James Smith
Clark Kent is Friends with Billy Bill
Clark Kent is Friends with Billy Bill
Clark Kent is Friends with James Smith
James Smith is Friends with Self


Now I'm sure you have noticed some flaws in our new graph. We will fix them in the next lab as a quick review of the Operators we have learned here! For now, just make sure all the relationships are present and try to figure out how we can fix this graph with the Operators you have learned!

(Note: James Smith will be Friends with Self instead of Bruce Lee).