Latest commit f5b7a01 May 30, 2013 Aron Lindberg finished

Social Network Analysis in R

The purpose of this repository is to learn how to run SNA in R during the summer of 2012.


Here's a great example of SNA on GitHub using R:

Characteristics of the Network Data at GitHub

The network data consists of the following edges:

  • Coders following other coders
  • Coders watching repositories
  • Repositories being forked off of other repositories
  • Coders connected to repositories through commits, comments etc.
  • Coders connected to coders through committing or commenting on the same repository, branch, or the same file/commit/issue etc.
  • Coders who are on the same team/organization
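As a rough illustration of how coder-to-repository ties can be turned into coder-to-coder ties, here is a minimal base R sketch. The `commits` data frame and all names in it are made-up toy data, not anything pulled from GitHub; the idea is only that multiplying a coder-by-repository incidence matrix by its transpose counts the repositories two coders share.

```r
# Hypothetical toy data: which coder committed to which repository.
commits <- data.frame(
  coder = c("ann", "bob", "bob", "cat", "cat"),
  repo  = c("repo1", "repo1", "repo2", "repo2", "repo3"),
  stringsAsFactors = FALSE
)

coders <- sort(unique(commits$coder))
repos  <- sort(unique(commits$repo))

# Coder-by-repository incidence matrix (1 = at least one commit).
incidence <- matrix(0, nrow = length(coders), ncol = length(repos),
                    dimnames = list(coders, repos))
incidence[cbind(commits$coder, commits$repo)] <- 1

# Coder-by-coder matrix: entry [i, j] counts the repositories that
# coders i and j have both committed to.
co_commit <- incidence %*% t(incidence)
diag(co_commit) <- 0  # drop self-ties

print(co_commit)
```

The resulting matrix is symmetric, so it describes an undirected, weighted coder network that could be handed to whichever SNA package we end up using.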

Hence, the data contain both directed and undirected graphs, and the networks are multidimensional, since they have both human (coders) and non-human (repositories and their constituent parts) vertices.

N.B. The graph package cannot mix directed and undirected graphs in the same model. Can we work around this within the package or do we need a different package?
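One possible workaround, sketched here under the assumption that the package in question is igraph: keep everything in a single directed graph, enter each symmetric tie in both directions, and tag every edge with a `relation` attribute. The toy `ties` data and the relation labels are hypothetical, and this is only one way to do it, not necessarily the best one.

```r
library(igraph)

# Hypothetical toy ties; `relation` records what kind of tie each row is.
ties <- data.frame(
  from     = c("ann", "bob", "ann"),
  to       = c("bob", "cat", "cat"),
  relation = c("follows", "follows", "same_team"),
  stringsAsFactors = FALSE
)

# Symmetric relations are stored as a pair of reciprocal directed edges.
symmetric <- ties$relation == "same_team"
mirrored <- data.frame(
  from     = ties$to[symmetric],
  to       = ties$from[symmetric],
  relation = ties$relation[symmetric],
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(rbind(ties, mirrored), directed = TRUE)

# Pull out one relation at a time by deleting the other edges.
follow_net <- delete_edges(g, E(g)[relation != "follows"])
team_only  <- delete_edges(g, E(g)[relation != "same_team"])

# The symmetric relation can be collapsed back to an undirected graph.
team_net <- as.undirected(team_only, mode = "collapse")
```

With this layout the directed measures can be run on `follow_net` and the undirected ones on `team_net`, while `g` still holds all ties in one object.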

These are whole networks, not ego networks, meaning that they are not centered on any given individual.

Anything else?

Strong ties? Weak ties?

Potentially Interesting Measures That We Could Correlate With Various Sequence Characteristics

  • The density (actual ties/possible ties) of a repository network
  • Size of network
  • Cohesion/Geodesics, i.e. the lengths of the shortest paths between actors in the network
  • How many other repositories do the coders contribute to?
  • Dispersion of code contribution
  • Are there cliques in the repository network?
  • Centrality of actors in relation to sequences or sequence aspects
  • Network centralization (a measure of inequality of contribution in the network as a whole)
  • Graph hierarchy (do coders follow each other mutually, or do only certain coders get followed?)
  • Least upper boundedness
  • Efficiency
  • Changes in any of these measures over time
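To make a few of these measures concrete, here is a minimal sketch that assumes the igraph package and a made-up "follows" network among five coders. The data and the choice of in-degree as the centrality measure are illustrative assumptions only; hierarchy, least upper boundedness, and efficiency are not covered here.

```r
library(igraph)

# Hypothetical "follows" ties among five coders in one repository network.
follows <- data.frame(
  from = c("ann", "bob", "cat", "dan", "eve", "bob"),
  to   = c("bob", "ann", "ann", "ann", "ann", "cat"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(follows, directed = TRUE)

measures <- list(
  # Size of network: number of actors.
  size = vcount(g),
  # Density: actual ties / possible ties.
  density = edge_density(g),
  # Cohesion: average geodesic (shortest path) length.
  mean_geodesic = mean_distance(g, directed = TRUE),
  # Actor centrality: how many coders follow each coder.
  in_degree = degree(g, mode = "in"),
  # Network centralization based on in-degree (0 = equal, 1 = star-like).
  centralization = centr_degree(g, mode = "in", loops = FALSE,
                                normalized = TRUE)$centralization,
  # Share of ties that are reciprocated (mutual following).
  reciprocity = reciprocity(g),
  # Size of the largest clique, ignoring tie direction.
  largest_clique = clique_num(as.undirected(g, mode = "collapse"))
)

print(measures)
```

Running the same code on snapshots of a network taken at different dates would give a simple way to track changes in these measures over time.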

Thoughts on How to Manage the Summer Course

  • Maybe we could divide the 10 workshops from the Stanford SNA in R/SoNIA among ourselves, and then go through them one by one?



Here are some resources to get us started:


Crowston, K., & Howison, J. 2006. Hierarchy and centralization in free and open source software team communications. Knowledge, Technology & Policy, 18(4): 65-85.