## Getting to Know igraph

**`igraph`** is a library for analyzing and visualizing networks, including social networks. 
The example below uses igraph to visualize question elicitation in an online discussion forum. 

In [None]:
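# Load libraries: igraph and dplyr.
# library() stops with an error if a package is missing;
# require() would only return FALSE and let later code fail.
library(igraph)
library(dplyr)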

## Read the data

In this file, we have removed all the nodes from our prior example that do not involve question elicitation. 
We then calculate a synthetic variable we call "socratic rank", after Socrates, whose method of instruction was question-focused.

To obtain this list of question-focused interactions, we used computational linguistic strategies. 
Computational linguistics is covered in other parts of the Data Science curriculum. 

In [None]:
mydata = read.csv("/dsa/data/all_datasets/netdata/SocraticRank/mdl_forum_posts_scrubbed_snadata_question_info.csv")
mydata$isQuestion = mydata$message_q_mark == 1;

graphData = mydata[,c("userid_from", "userid_to", "isQuestion")]
colnames(graphData) = c("from", "to", "question")

# filter the data to only include replies that have questions
graphData = graphData[graphData$question, ]

### Generate Graphs
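
The next cell builds a weighted, directed graph from the reply counts, scores each user with PageRank, and standardizes the scores as z-scores. Here is a minimal sketch of those steps on a toy edge list. The vertex names and counts are invented for illustration, and `graph_from_data_frame` and `page_rank` are the current igraph names for `graph.data.frame` and `page.rank`.

```r
library(igraph)

# Hypothetical toy edge list: "from" replied to "to" with a question, "count" times
edges <- data.frame(
  from  = c("a", "b", "c", "d", "d"),
  to    = c("b", "c", "a", "a", "c"),
  count = c(3, 1, 2, 5, 1)
)

# Columns beyond the first two become edge attributes (here: count)
g <- graph_from_data_frame(edges, directed = TRUE)

# Weighted PageRank; the scores sum to 1 across all vertices
pr <- page_rank(g, weights = E(g)$count)$vector

# Standardize as z-scores: mean 0, standard deviation 1
z <- (pr - mean(pr)) / sd(pr)

print(round(pr, 3))
print(round(z, 3))
```

In this sketch, vertex `d` receives no incoming edges, so it ends up with the lowest score. Rank flows along edge direction, from the person replying to the person being replied to.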


In [None]:
# find all the unique interaction counts
# you can learn more about how these functions work by
# looking into dplyr 
# https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html 

graphData = graphData %>% group_by(from, to) %>% summarise(count=n())

# remove all replies to oneself
# (filtering directly also works when there are no self-replies)
graphData = graphData[graphData$from != graphData$to, ]

# build the igraph object; the count column becomes an edge attribute
mygraph = graph_from_data_frame(graphData)

# Use PageRank, the algorithm Google originally
# used to rank search results.
# Roughly, the more incoming edges you have, 
# especially from highly ranked nodes, 
# the higher your rank.
pranks = page_rank(mygraph, weights=E(mygraph)$count)$vector

# People in different subnetworks or courses
# are likely to have radically different numbers of 
# edges, deriving in part from the size of the 
# course. 
# Therefore, we normalize PageRank as a z-score.
V(mygraph)$rank = (pranks - mean(pranks)) / sd(pranks) 

# get a subgraph of the top thought leaders
# (nodes more than 5 standard deviations above the mean rank)
highRankGraph = induced_subgraph(mygraph, which(V(mygraph)$rank > 5))

# plot the graph
layout = layout_in_circle(highRankGraph)
plot(highRankGraph, 
     main="Top Question Elicitors on Moodle (z-scores of SocraticRank > 5)", 
     sub="Node Size = SocraticRank (PageRank); Edge Value = # of Reply Posts", 
     edge.label = E(highRankGraph)$count, 
     edge.arrow.size = .25, 
     edge.label.cex = .5, 
     vertex.size = 5 * (V(highRankGraph)$rank / max(V(highRankGraph)$rank)), 
     vertex.label = V(highRankGraph)$name, 
     vertex.label.cex = .5, 
     vertex.label.dist = .25, 
     layout=layout)

## Interpreting The Resulting Graph
 - Can you easily identify the individuals with the highest socratic rank?
 - How does the graph compare to other possible layouts? [Layout List](http://igraph.org/r/doc/#L)
     - What does a Fruchterman-Reingold layout show you that the circle layout does not?
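
To make the layout question concrete, here is a minimal sketch on a toy graph (two triangles joined by one bridge edge, invented for illustration). It assumes the current igraph function names `layout_in_circle` and `layout_with_fr`. A layout is just a matrix of coordinates, so the two can be compared directly.

```r
library(igraph)

set.seed(42)  # force-directed layouts start from random positions

# Hypothetical toy graph: two triangles joined by a single bridge edge
g <- graph_from_literal(a - b - c - a, d - e - f - d, c - d)

circle <- layout_in_circle(g)  # vertices evenly spaced on a circle
fr     <- layout_with_fr(g)    # Fruchterman-Reingold force-directed layout

# Both are n x 2 matrices of (x, y) coordinates, one row per vertex
print(dim(circle))
print(dim(fr))

# Every vertex in the circle layout is the same distance from the origin,
# so position alone says nothing about who is connected to whom
print(sqrt(rowSums(circle^2)))
```

Passing either matrix as `layout = ...` to `plot(g, ...)` draws the same graph both ways. A force-directed layout tends to pull connected vertices together, which can make clusters easier to spot.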

In [None]:
layout = layout_with_fr(highRankGraph)
plot(highRankGraph, 
     main="Top Question Elicitors on Moodle (z-scores of SocraticRank > 5)", 
     sub="Node Size = SocraticRank (PageRank); Edge Value = # of Reply Posts", 
     edge.label = E(highRankGraph)$count, 
     edge.arrow.size = .25, 
     edge.label.cex = .5, 
     vertex.size = 5 * (V(highRankGraph)$rank / max(V(highRankGraph)$rank)), 
     vertex.label = V(highRankGraph)$name, 
     vertex.label.cex = .5, 
     vertex.label.dist = .25, 
     layout=layout)

## <span style="background:yellow">Your Turn</span>

Create your own graph from the socratic rank data using an alternative layout.  
Experiment with a couple, then leave your favorite in place for submission.

In [None]:
# Add your code below this comment
# ----------------------------------








# SAVE YOUR NOTEBOOK, then File > Close and Halt