vers 0.2-4 upload

commit 1802430c46adcaea9787110865045eb4656f52d7 1 parent 9e277a7
Bryan Hanson authored
4 DESCRIPTION
... ... @@ -1,8 +1,8 @@
1 1 Package: HiveR
2 2 Type: Package
3 3 Title: 2D and 3D Hive Plots for R
4   -Version: 0.2-3
5   -Date: 2012-6-11
  4 +Version: 0.2-4
  5 +Date: 2012-6-25
6 6 Author: Bryan A. Hanson DePauw University, Greencastle Indiana USA
7 7 Maintainer: Bryan A. Hanson <hanson@depauw.edu>
8 8 Description: HiveR is an R package for creating and plotting 2D and 3D hive plots. Hive plots are a unique method of displaying networks of many types in which node properties are mapped to axes using meaningful properties rather than being arbitrarily positioned. The hive plot concept was invented by Martin Krzywinski at the Genome Science Center (www.hiveplot.net/). Keywords: networks, food webs, linnet, systems biology, bioinformatics.
4 NEWS
@@ -11,9 +11,11 @@ doi: 10.1093/bib/bbr069
11 11 Bryan A. Hanson DePauw University, Greencastle Indiana USA
12 12
13 13 ###### Version: 0.2-4
14   -Date:
  14 +Date: 25 June 2012
15 15 News:
16 16 >> Improved vignette, now built using knitr outside of the build/check process
  17 +>> Added additional files for the E coli data set. See inst/extdata/E_coli/README for details.
  18 +>> Revised animateHive to permit different hives to be animated using different sets of arguments. Gives maximum flexibility. See the help page for a silly example.
17 19
18 20 ###### Version: 0.2-3
19 21 Date: 11 June 2012
15 R/animateHive.R
... ... @@ -1,14 +1,12 @@
1   -
2   -
3   -animateHive <- function(hives = list(), ...) {
  1 +animateHive <- function(hives = list(), cmds = list(), xy = 400, ...) {
4 2
5   - # Function to create 1 or more coordinated
6   - # rgl animations
  3 + # Function to create coordinated rgl animations
  4 + # using different plotting arguments for each hive plot
7 5
8 6 nh <- length(hives)
9 7 if (nh == 0) stop("No hives specified")
10 8
11   - # Draw each hive in its own window
  9 + # Draw each hive in its own window w/its own parameters
12 10
13 11 win.list <- c()
14 12
@@ -19,11 +17,12 @@ animateHive <- function(hives = list(), ...) {
19 17 warning(msg)
20 18 next
21 19 }
22   - open3d()
  20 + open3d(windowRect = c(0, 0, xy, xy))
23 21 win.name <- paste("window", rgl.cur(), sep = "")
24 22 win.list <- c(win.list, win.name)
25 23 rgl.bringtotop(TRUE)
26   - plot3dHive(hives[[n]])
  24 + do.call(plot3dHive, args = c(hives[n], cmds[[n]]))
  25 + # Since hives is a list of lists, you must unlist it one level
27 26 }
28 27
29 28 # Set up a controller
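The new call `do.call(plot3dHive, args = c(hives[n], cmds[[n]]))` relies on the difference between `[` and `[[` on lists. The following is a minimal sketch of that idiom, not the package code: `fake_plot` is a hypothetical stand-in for `plot3dHive` so the example runs without `rgl`, and the fallback to an empty argument list when `cmds` is shorter than `hives` is an assumption of this sketch rather than behaviour shown in the diff.

```r
# Hypothetical stand-in for plot3dHive: reports the arguments it receives
fake_plot <- function(HPD, method = "abs", ch = 1, ...) {
  paste0(HPD$desc, ": method=", method, ", ch=", ch)
}

# Two toy "hives" (plain lists here) and one argument list per hive
hives <- list(list(desc = "A", type = "3D"), list(desc = "B", type = "3D"))
cmds  <- list(list(method = "rank", ch = 2), list(method = "norm", ch = 0.1))

out <- character(0)
for (n in seq_along(hives)) {
  # hives[n] is a length-1 list holding the whole hive, so c() yields
  # list(<hive>, method = ..., ch = ...); hives[[n]] would instead splice
  # the hive's own elements (desc, type) in as separate arguments
  extra <- if (n <= length(cmds)) cmds[[n]] else list()  # assumed fallback
  out[n] <- do.call(fake_plot, args = c(hives[n], extra))
}
print(out)
```

Each hive is matched positionally to the first formal argument, while the named entries from the corresponding `cmds` element override the defaults for that call only.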
2  README.asciidoc
@@ -13,7 +13,7 @@ library("HiveR")
13 13
14 14 vignette("HiveR") # To see the user guide
15 15
16   -If you use branch = "devel" you can get the development branch if it is available. Devel versions would be ahead of what's on CRAN.
  16 +If you use branch = "devel" you can get the development branch if it is available. devel versions would be ahead of what's on CRAN. devel versions may be buggy in certain functions/actions.
17 17
18 18 From CRAN using R:
19 19 ------------------
BIN  inst/doc/HiveR.pdf
Binary file not shown
0  inst/extdata/E_coli/ecoli.dot → inst/extdata/E_coli/E_coli_P.dot
File renamed without changes
5,492 inst/extdata/E_coli/E_coli_TF.dot
5,492 additions, 0 deletions not shown
0  inst/extdata/E_coli/EdgeInst.csv → inst/extdata/E_coli/EdgeInst_P.csv
File renamed without changes
1  inst/extdata/E_coli/EdgeInst_TF.csv
... ... @@ -0,0 +1 @@
  1 +dot.tag,dot.val,hive.tag,hive.val interaction,repressor,color,red interaction,activator,color,green interaction,dual,color,orange
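The edge instruction file maps a value found in the .dot file (`dot.val`) to a hive plot property (`hive.tag`, `hive.val`). A minimal sketch of how such a table could be used to colour edges follows; it assumes the four comma-separated records shown above sit on separate lines in the file, uses a made-up vector of edge types, and is not meant to reproduce what `dot2HPD` does internally.

```r
# Mapping table, written inline so the sketch is self-contained
txt <- "dot.tag,dot.val,hive.tag,hive.val
interaction,repressor,color,red
interaction,activator,color,green
interaction,dual,color,orange"
ei <- read.csv(text = txt, stringsAsFactors = FALSE)

# Hypothetical interaction types read from the edges of a .dot file
edge_types <- c("activator", "dual", "repressor", "activator")

# Look up the colour assigned to each interaction type
edge_cols <- ei$hive.val[match(edge_types, ei$dot.val)]
print(edge_cols)
```

Keeping the mapping in a separate file means the same graph can be re-coloured by editing the CSV rather than the .dot file.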
0  inst/extdata/E_coli/NodeInst.csv → inst/extdata/E_coli/NodeInst_P.csv
File renamed without changes
0  inst/extdata/E_coli/NodeLabels.csv → inst/extdata/E_coli/NodeLabels_P.csv
File renamed without changes
19 inst/extdata/E_coli/README
... ... @@ -1,8 +1,10 @@
1   -The file ecoli.dot contains the gene regulatory network of E. coli as discussed in:
  1 +Summary of E coli files in inst/extdata/E_coli
2 2
3   -Yan KK, Fang G, Bhardwaj N, Alexander RP, Gerstein M. 2010. Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks. Proc Natl Acad Sci U S A 107(20): 9186-9191.
  3 +Part of HiveR package by Bryan Hanson. Files provided by Martin Krzywinski of the Genome Sciences Center and used with permission.
  4 +
  5 +*****
4 6
5   -However, the original publication of the regulatory network traces to:
  7 +The main source of data for the regulatory network is:
6 8
7 9 Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muniz-Rascado
8 10 L, Solano-Lira H et al (2011). RegulonDB version 7.0: transcriptional
@@ -11,7 +13,13 @@ response units (Gensor Units). Nucleic Acids Research 39: D98-D105.
11 13
12 14 http://www.ncbi.nlm.nih.gov/pubmed/21051347?dopt=Abstract
13 15
14   -This data set has been extended by Martin Krzywinski of the Genome Science Center by the addition of persistence and edge classifiers as described below.
  16 +*****
  17 +
  18 +The files E_coli_P.dot, EdgeInst_P.csv, NodeInst_P.csv and NodeLabels_P.csv pertain to the gene regulatory network of E. coli as discussed in:
  19 +
  20 +Yan KK, Fang G, Bhardwaj N, Alexander RP, Gerstein M. 2010. Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks. Proc Natl Acad Sci U S A 107(20): 9186-9191.
  21 +
  22 +This data set has been extended by Martin Krzywinski by the addition of persistence and edge classifiers as described below.
15 23
16 24 Nodes are classified as 'persistent' or 'nonpersistent' according to the definition in the original paper (Yan et al). Edges are classified using a type=N label where N=0,1,2,3 defined as follows. For E. coli
17 25
@@ -20,3 +28,6 @@ type=1 - E. coli gene names share 1 common start characters (arca acee)
20 28 type=2 - E. coli gene names share 2 common start characters (argr arti)
21 29 type=3 - E. coli gene names share 3 common start characters (acrr acrb)
22 30
  31 +*****
  32 +
  33 +The files E_coli_TF.dot and EdgeInst_TF.csv are from a more recent version of RegulonDB; the edges are coded according to whether the transcription factor is an activator, repressor, or dual function protein. There are no node instructions for the transcription factor data set.
4 man/HiveR-package.Rd
@@ -16,8 +16,8 @@ HiveR is an R package for creating and plotting 2D and 3D hive plots. Hive plots
16 16 \tabular{ll}{
17 17 Package: \tab HiveR\cr
18 18 Type: \tab Package\cr
19   -Version: \tab 0.2-3\cr
20   -Date: \tab 2012-06-11\cr
  19 +Version: \tab 0.2-4\cr
  20 +Date: \tab 2012-06-25\cr
21 21 License: \tab GPL-3\cr
22 22 }
23 23 }
29 man/animateHive.Rd
@@ -4,15 +4,21 @@
4 4 Animate one or more 3D hive plots with a handy controller
5 5 }
6 6 \description{
7   -This function takes a list of \code{HivePlotData} objects of \code{type = "3D"} and plots each in its own \code{rgl} window, then adds a controller which handles rotation and scaling.
  7 +This function takes a list of \code{HivePlotData} objects of \code{type = "3D"} and plots each in its own \code{rgl} window using its own arguments, then adds a controller which handles rotation and scaling.
8 8 }
9 9 \usage{
10   -animateHive(hives = list(), ...)
  10 +animateHive(hives = list(), cmds = list(), xy = 400, ...)
11 11 }
12 12 \arguments{
13 13 \item{hives}{
14 14 A list of \code{HivePlotData} objects.
15 15 }
  16 + \item{cmds}{
  17 +A list of arguments corresponding to how you want each hive plotted.
  18 +}
  19 + \item{xy}{
  20 +An integer giving the size of the \code{rgl} window in pixels.
  21 +}
16 22 \item{\dots}{
17 23 Other parameters to be passed downstream to \code{rgl}.
18 24 }
@@ -31,10 +37,21 @@ Bryan A. Hanson, DePauw University. \email{hanson@depauw.edu}
31 37
32 38 \examples{
33 39 \dontrun{
34   -tA <- ranHiveData(type = "3D", nx = 4)
35   -tB <- ranHiveData(type = "3D", nx = 5)
36   -tC <- ranHiveData(type = "3D", nx = 6)
37   -animateHive(hives = list(tA, tB, tC))
  40 +require("rgl")
  41 +require("tkrgl")
  42 +# Silliness: let's draw different hives with different settings
  43 +# List of hives
  44 +t4 <- ranHiveData(type = "3D", nx = 4)
  45 +t5 <- ranHiveData(type = "3D", nx = 5)
  46 +t6 <- ranHiveData(type = "3D", nx = 6)
  47 +myhives <- list(t4, t5, t6)
  48 +# List of arguments to plot in different coordinate systems
  49 +cmd1 <- list(method = "abs", LA = TRUE, dr.nodes = FALSE, ch = 10)
  50 +cmd2 <- list(method = "rank", LA = TRUE, dr.nodes = FALSE, ch = 2)
  51 +cmd3 <- list(method = "norm", LA = TRUE, dr.nodes = FALSE, ch = 0.1)
  52 +mycmds <- list(cmd1, cmd2, cmd3)
  53 +#
  54 +animateHive(hives = myhives, cmds = mycmds)
38 55 }
39 56 }
40 57 \keyword{ interactive }
697 vignettes/HiveR.Rnw
... ... @@ -1,697 +0,0 @@
1   -%\VignetteIndexEntry{HiveR: 2D and 3D Hive Plots for R}
2   -%\VignetteDepends{RColorBrewer, grid, rgl, RFOC, bipartite, sna, xtable, mvbutils, FuncMap, lattice, reshape}
3   -%\VignettePackage{HiveR}
4   -
5   -\documentclass[10pt]{article}
6   -
7   -\SweaveOpts{echo = T, pdf = T, eps = F, eval = T, keep.source = T, prefix.string = graphics/vig}
8   -\usepackage{Sweave}
9   -\setkeys{Gin}{width = 0.6\textwidth} % part of Sweave I believe; 0.8 is the default
10   -
11   -\graphicspath{{./graphics/}}
12   -
13   -\usepackage{mathpazo}
14   -\usepackage{color}
15   -\usepackage{graphicx}
16   -\usepackage[margin = 2.0cm]{geometry}
17   -\geometry{letterpaper}
18   -\usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent
19   -%\usepackage{pdflscape} % Use to turn the whole document or individual pages
20   -\usepackage{fancyhdr}
21   -\usepackage{xtab}
22   -
23   -\usepackage[square, comma, numbers, sort&compress]{natbib} % allows grouping of references [1-4, 8]
24   -
25   -\usepackage{ccaption} % Stuff to change the format of a figure caption
26   -\captionnamefont{\bfseries\large}
27   -\captiontitlefont{\bfseries\large}
28   -
29   -\usepackage{hyperref, url}
30   -
31   -\setlength{\belowcaptionskip}{10pt} % not part of ccaption
32   -
33   -\renewcommand*\familydefault{\sfdefault} % Use if the base font of the document is to be sans serif
34   -
35   -%%%%% End of Configuration Stuff %%%%%
36   -
37   -<< getVersion, echo = FALSE >>=
38   -desc <- packageDescription("HiveR")
39   -vers <- paste("Version", desc$Version)
40   -@
41   -
42   -\title{The \texttt{HiveR} Package\\
43   -\Sexpr{vers}\\}
44   -\author{Bryan A. Hanson\\
45   -\\
46   -DePauw University\\
47   -Department of Chemistry \& Biochemistry\\
48   -Greencastle Indiana USA\\
49   -\\
50   -e-mail: \href{mailto:hanson@depauw.edu}{hanson@depauw.edu}\\
51   -\\
52   -\href{http://github.com/bryanhanson/HiveR}{github.com/bryanhanson/HiveR}\\
53   -\href{http://CRAN.R-project.org/package=HiveR}{CRAN.R-project.org/package=HiveR}\\
54   -}
55   -\date{\today}
56   -
57   -%%%%%%%%%
58   -\begin{document}
59   -
60   -\maketitle
61   -
62   -This document describes some features of the \texttt{HiveR} package including current capabilities and future plans. The current release contains a core set of functions for creating and drawing hive plots. Additional features are contemplated. There may well be bugs and features that can be improved. Your comments are always welcome.
63   -
64   -As with any \texttt{R} package, details on functions discussed below can be found by typing \texttt{?function\_name} in the \texttt{R} console after installing \texttt{HiveR}. A complete list of functions available can be had by typing \texttt{?HiveR} and then at the bottom of the page that opens, click on the "index" link.
65   -
66   -\section{Background, Inspiration and Motivation} %%%%%
67   -
68   -\texttt{HiveR} was inspired by the concept of hive plots as developed by Martin Krzywinski at the Genome Science Center (\href{http://www.hiveplot.com/}{www.hiveplot.com}). Hive plots are a reaction to "hair ball" style networks in which the layout of the network is arbitrary and hypersensitive to even small changes in the underlying network. Hive plots are particularly useful for the discovery of emergent properties of networks.
69   -
70   -The key innovation in a hive plot, compared to other means of graphically displaying network structure, is in how node information is handled. Nodes are assigned to axes based upon qualitative or quantitative characteristics of the node, for instance membership in a certain category, and the position of the node along the axis is based upon some quantitative characteristic of the node. In a hive plot, edges are handled in a fairly standard way, but may be colored or have a width or weight which encodes an interesting value. In creating a hive plot, one maps network parameters to the hive plot, and thus the process can be readily tuned to meet one's needs. The mappable parameters are listed in Table~\ref{Mapping}, and the mapping is limited only by one's creativity and the particular knowledge domain. Thus ecologists have their own measures of food webs, social network analysts have various measures describing interconnectedness etc. An essential point is that mapping network parameters in this way results in a reproducible plot which is particularly well-suited for comparing related networks. Comparison of "hair balls" is notoriously fraught with problems.
71   -
72   -Krzywinski has an excellent paper detailing the features and virtues of hive plots and is a must-read.\cite{Krzywinski2011} He notes the following virtues of hive plots:
73   -
74   -\begin{itemize}
75   - \item Hive plots are rational in that only the structural properties of the network determine the layout.
76   - \item Hive plots are flexible and can be tuned to show interesting features.
77   - \item Hive plots are predictable since they arise from rules that map network features to plot features.
78   - \item Hive plots are robust to changes in the underlying network.
79   - \item Hive plots of different networks can be compared.
80   - \item Hive plots are transparent and practical.
81   - \item Plots of networks are generally complex and require some investment to understand. Complexity scales well in a hive plot and details can be inspected.
82   -\end{itemize}
83   -
84   -For comparison, Suderman and Hallett have published a nice review of a wide range of other programs for visualizing biological networks though it is now slightly out of date.\cite{Suderman2007}
85   -
86   -\begin{table}
87   -\begin{center}
88   -\begin{tabular}{|l|}
89   -\hline
90   -mappable hive plot parameters\\
91   -\hline
92   -Axis to which a node is assigned\\
93   -Radius of a node\\
94   -Color of a node\\
95   -Size of a node\\
96   -Color of an edge\\
97   -Width or weight of an edge\\
98   -\hline
99   -\end{tabular}
100   -\end{center}
101   -\caption{Hive plot features that can be mapped to network parameters\label{Mapping}}
102   -\end{table}
103   -
104   -Inspired by the examples given by Krzywinski in his materials on the web, I created the \texttt{R} package \texttt{FuncMap} in December 2010. This single function package maps the function calls made by an \texttt{R} package into 3 types: sources, which are functions that make only outgoing calls, sinks, which take only incoming calls, and managers, which do both. Figure~\ref{FuncMap} shows an example of a plot made by \texttt{FuncMap}; this is a true hive plot. In this plot, functions in a package are assigned to an axis by their role, and the radius is determined by the number of calls made or received by a function (which is the number of edges or degree of the node). This is also the basis for the width of the edges. In this plot, calls (edges) originating on the source axis are shown in green, while those originating on the manager axis are in blue. By definition, the sink axis only receives calls.
105   -
106   -\texttt{HiveR} takes things quite a bit further. \texttt{HiveR} is intended as an implementation of hive plots in \texttt{R}, not a port of linnet \emph{per se} (Krzywinski's program that draws hive plots, written in Perl). As such, it does some things differently, and not all features are implemented (and they may or may not be in the future). \texttt{HiveR} will draw 2D hive plots with 2-6 axes in a style close to that created by linnet. However, \texttt{HiveR} adds value by making 3D, interactive plots possible when there are 4-6 axes. These 3D plots were inspired by the ideas of VSEPR theory in chemistry: the axes of these 3D plots are arranged with tetrahedral, trigonal bipyramidal or octahedral geometries for 4-6 axes respectively (see Figure~\ref{VSEPR} and \href{https://secure.wikimedia.org/wikipedia/en/wiki/Vsepr}{wikipedia/VSEPR}). Other differences are discussed below.
107   -
108   -\begin{figure}
109   -\begin{center}
110   -\includegraphics[scale = 2]{VSEPR.pdf}
111   -\end{center}
112   -\caption{Idealized geometries according to VSEPR theory\label{VSEPR}}
113   -\end{figure}
114   -
115   -<< SetUp, echo = F, results = hide, eval = TRUE>>=
116   -set.seed(123)
117   -library(lattice) # these are only needed for the automatic vignette build, which occurs
118   -library(mvbutils) # in a clean environment
119   -library(grid)
120   -library(FuncMap)
121   -library(HiveR)
122   -library(sna)
123   -library(xtable)
124   -library(bipartite)
125   -library(reshape)
126   -if (!file.exists("graphics")) dir.create("graphics")
127   -@
128   -
129   -\begin{figure}
130   -\begin{center}
131   -<< FuncMapExample, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
132   -fw <- foodweb(where = "package:lattice", plotting = FALSE)
133   -ans <- FuncMap(fwb = fw, pkg = "lattice", method = "abs")
134   -@
135   -\end{center}
136   -\caption{FuncMap for package lattice\label{FuncMap}}
137   -\end{figure}
138   -
139   -\section{\texttt{HiveR} Features} %%%%%
140   -
141   -\subsection{Internal Storage}
142   -
143   -\texttt{HiveR} stores the information needed to create a hive plot in a \texttt{HivePlotData} object which is an S3 class. As an S3 class, this structure can be easily extended by the user to store additional information (though using that information as part of a hive plot would require more work). Utilities are provided to summarize the contents of these objects and to check their integrity (functions \texttt{sumHPD} and \texttt{chkHPD} respectively). The structure and content of a \texttt{HivePlotData} object is shown in Table~\ref{Struc}.
144   -
145   -
146   -\begin{table}
147   -\begin{center}
148   -
149   -\begin{tabular}{|l|l|l|l|}
150   -\hline
151   -\emph{element} & \emph{(element)} & \emph{type} & \emph{description}\cr
152   -\hline
153   -\$nodes & & data frame & Data frame of node properties \\
154   -& \$id & int & Node identifier \\
155   -& \$lab & chr & Node label \\
156   -& \$axis & int & Axis to which node is assigned \\
157   -& \$radius & num & Radius (position) of node along the axis \\
158   -& \$size & num & Node size in pixels \\
159   -& \$color & chr & Node color \\
160   -\hline
161   -\$edges & & data frame & Data frame of edge properties \\
162   -& \$id1 & int & Starting node id \\
163   -& \$id2 & int & Ending node id \\
164   -& \$weight & num & Width of edge in pixels \\
165   -& \$color & chr & Edge color \\
166   -\hline
167   -\$type & & chr & Type of hive (2D or 3D) \\
168   -\hline
169   -\$desc & & chr & Description of data \\
170   -\hline
171   -\$axis.cols & & chr & Colors for axes \\
172   -\hline
173   -- attr & & chr "HivePlotData" & The S3 class designation\\
174   -\hline
175   -\end{tabular}
176   -\end{center}
177   -\caption{The structure of a HivePlotData object\label{Struc}}
178   -\end{table}
179   -
180   -
181   -
182   -\subsection{Generation of Random Network Data Sets}
183   -
184   -\texttt{HiveR} has the ability to generate random network data sets with between 2 and 6 axes, using function \texttt{ranHiveData}. These are useful for testing and demonstration purposes and will be used in the examples below. A data set has a type, either 2D or 3D. Type 2D may have 2-6 axes and is plotted in a 2D window using \texttt{grid} graphics which are extremely fast. Type 3D applies to 4-6 axes only and these hive plots are drawn in 3D using \texttt{rgl} and are interactive. When using \texttt{ranHiveData} you can specify which type you desire.
185   -
186   -\subsection{Built-in Data Sets}
187   -
188   -\texttt{HiveR} contains two related 2D type data sets, \texttt{Safari} and \texttt{Arroyo}. These plant-pollinator data sets give the number of visits for each plant-pollinator pair. The \emph{E. coli} gene regulatory network is also included as a .dot file. This data is discussed in Yan \emph{et al.}\cite{Yan2010} but is based upon data in the RegulonDB.\cite{Gama2010} The version here was extended by Krzywinski and provided in the linnet package. This .dot file can be processed into either a 2D or 3D type hive plot. Each of these data sets is used in the examples below.
189   -
190   -\subsection{Importing Real Data Sets}
191   -
192   -The function \texttt{dot2HPD} will import files in .dot format and convert them to \texttt{HivePlotData} objects (see \href{https://secure.wikimedia.org/wikipedia/en/wiki/DOT\_language}{wikipedia/DOT\_language}). This is done with the aid of two external files. One contains information about how to map node labels to \texttt{HivePlotData} properties. The other contains information about mapping edge properties. This approach gives one a lot of flexibility to process the same graph into various hive plots. This process is demonstrated later for the \emph{E. coli} data set. Currently, only a very small set of the .dot standard is implemented and one should not expect any particular .dot file to process correctly.
193   -
194   -\subsection{Modifying \texttt{HivePlotData} Sets}
195   -
196   -Function \texttt{mineHPD} has several options for extracting information within an existing \texttt{HivePlotData} object and converting it to a modified \texttt{HivePlotData} object. Currently, there are three options, but more are easily added. One option assigns the radius of a node based upon the number of edges connected to it (the degree). Another assigns axes based upon whether a given node is a source node, manager node or sink node. This latter option is designed to create hive plots similar to those featured by Krzywinski for the \emph{E. coli} data set, and is demonstrated later. The final option removes any orphaned nodes (these have no edges). In addition, function \texttt{manipAxis} can also be used to modify a \texttt{HivePlotData} object by scaling or inverting axes.
197   -
198   -\subsection{Making Hive Plots}
199   -
200   -In a hive plot, because the position of the node along an axis (the radius) is quantitative, the nodes can be plotted at their absolute value (native units), normalized to run between 0\ldots1, plotted by rank or by a combination of ranking and norming. Some aspects of the plot that depend upon these options are shown in Table~\ref{Method}. These different ways of plotting the same data often look dramatically different, and for a particular data set, some methods of plotting nodes may provide more insight. Functions \texttt{plotHive} and \texttt{plot3dHive} have an argument \texttt{method} which controls node plotting on the fly; function \texttt{manipAxis} is used in the background and can be called independently if desired.
201   -
202   -\bottomcaption{Comparison of methods for plotting node radii\label{Method}}
203   -
204   -\begin{center}
205   -\begin{xtabular}{| p{0.25\textwidth} | p{0.25\textwidth} | p{0.1\textwidth} | p{0.3\textwidth} |}
206   -\hline
207   -\emph{method} & \emph{axis length} & \emph{center hole} & \emph{other} \\
208   -\hline
209   -\hline
210   -native units (abs) & varies ($\propto no.\ nodes$) & asymmetric & nodes may overlap\\
211   -\hline
212   -ranked units (rank) & varies ($\propto rank(no.\ nodes)$) & circular & nodes evenly spaced (1, 2, 3 \dots) and don't overlap \\
213   -\hline
214   -normed units (norm) & all equal & circular & nodes may overlap\\
215   -\hline
216   -ranked \& normed (ranknorm) & all equal & circular & nodes evenly spaced (1, 2, 3 \dots) and don't overlap \\
217   -\hline
218   -\end{xtabular}
219   -\end{center}
220   -
221   -\subsubsection{Type 2D Hive Plots}
222   -
223   -Figure~\ref{HP2} shows a 2 axis hive plot using randomly generated data and the function \texttt{plotHive}. Figure~\ref{HP3a} shows a hive plot of a random 3 axis network using absolute scaling; Figure~\ref{HP3r} shows the 3 axis example with the nodes displayed by rank and Figure~\ref{HP3n} the same data normed. Figure~\ref{HP5} shows a 5 axis example. \texttt{plotHive} places axis number 1 at the top (vertical) except in the 2 axis case where it is on the right. Nodes are drawn in these examples, however, drawing nodes is optional and the more nodes there are, the less likely you will want to draw them. As these plots show, depending upon their size and radii, nodes may overlap. The nodes "on top" will be those drawn last (also true of edges). In some cases users may wish to sort the nodes and edges so that certain nodes and edges are drawn last and thus "show". Nodes and edges with various characteristics can also be subsetted and recombined if simple sorting won't do the job. This method is used in some of the examples which follow.
224   -
225   -\begin{figure}
226   -
227   -\begin{center}
228   -<< HP2, fig = TRUE, echo = FALSE, width = 5, height = 2.5 >>=
229   -hp2 <- ranHiveData(nx = 2)
230   -plotHive(hp2, ch = 10)
231   -@
232   -\end{center}
233   -\caption{A randomly generated hive plot with 2 axes (native units)\label{HP2}}
234   -\end{figure}
235   -
236   -
237   -\begin{figure}
238   -
239   -\begin{center}
240   -<< HP3a, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
241   -hp3 <- ranHiveData(nx = 3)
242   -plotHive(hp3, ch = 10)
243   -@
244   -\end{center}
245   -\caption{A randomly generated hive plot with 3 axes (native units)\label{HP3a}}
246   -\end{figure}
247   -
248   -
249   -\begin{figure}
250   -
251   -\begin{center}
252   -<< HP3r, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
253   -plotHive(hp3, method = "rank", ch = 1)
254   -@
255   -\end{center}
256   -\caption{A randomly generated hive plot with 3 axes (nodes by rank)\label{HP3r}}
257   -\end{figure}
258   -
259   -\begin{figure}
260   -
261   -\begin{center}
262   -<< HP3n, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
263   -plotHive(hp3, method = "norm", ch = 0.1)
264   -@
265   -\end{center}
266   -\caption{A randomly generated hive plot with 3 axes (nodes normed)\label{HP3n}}
267   -\end{figure}
268   -
269   -\begin{figure}
270   -
271   -\begin{center}
272   -<< HP5, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
273   -hp5 <- ranHiveData(nx = 5, allow.same = TRUE)
274   -plotHive(hp5, ch = 10)
275   -@
276   -\end{center}
277   -\caption{A randomly generated hive plot with 5 axes (native units; edges along the same axis permitted)\label{HP5}}
278   -\end{figure}
279   -
280   -\subsubsection{Type 3D Hive Plots}
281   -
282   -With type 3D and 4 to 6 axes, plots are interactive and cannot be shown here. See the help page for \texttt{plot3dHive} for an example you can run when you have the package installed (\texttt{?plot3dHive}). Note that \texttt{plot3dHive} has an argument \texttt{LA} which controls whether antialiasing is used when drawing the edges. \texttt{LA} defaults to \texttt{FALSE} which plots quickly. Further testing and optimization is needed, but \texttt{LA = TRUE} should probably be reserved for making final plots, as it is at least 20 times slower.
283   -
284   -\subsubsection{Performance}
285   -
286   -\texttt{HiveR} draws hive plots very quickly when using either \texttt{plotHive} or \texttt{plot3dHive}. As of version 0.1-5, the bottlenecks holding \texttt{plot3dHive} back have been eliminated. Figure~\ref{perf} shows the performance of this function on a MacBook Pro running OSX 10.6.8 using 8 Mb RAM and an Intel i7 chip at 2 GHz. As of version 0.1-6, speed improvements have been made to \texttt{plotHive} and Figure~\ref{perf2D} shows the performance on the same hardware. These benchmarks were determined before byte compiling was turned on and so the performance is likely even better.
287   -
288   -\begin{figure}
289   -
290   -\begin{center}
291   -\includegraphics{plot3dHive_performance.pdf}
292   -\end{center}
293   -\caption{Performance of plot3dHive\label{perf}}
294   -\end{figure}
295   -
296   -\begin{figure}
297   -
298   -\begin{center}
299   -\includegraphics{plotHive_performance.pdf}
300   -\end{center}
301   -\caption{Performance of plotHive\label{perf2D}}
302   -\end{figure}
303   -
304   -\subsection{Some Things to Keep in Mind} %%%%%
305   -
306   -\begin{enumerate}
307   - \item As currently implemented in \texttt{HiveR}, hive plots are agnostic graphs in that they are not necessarily directed or undirected. However, some of the functions actually do draw edges in a way that could readily be converted into a directed graph in the future. For example, \texttt{plotHive} draws edges between axes 1 and 2 in a separate step from those starting on 2 and ending on 1. This is so that the correct curvature of the splines is used, but it could be used to encode directionality. Further, some options in \texttt{mineHPD} assume that the \texttt{HivePlotData} object represents a directed graph, and while \texttt{dot2HPD} currently doesn't distinguish between directed and non-directed graphs, it could in the future.
308   -
309   - \item linnet creates hive plots that are essentially parallel coordinate plots\cite{Wegman1990} that have been wrapped into a radial arrangement. \texttt{HiveR} plots of type 2D are essentially the same thing. As with any parallel coordinate plot, the order of the axes affects what you see. With 2 or 3 axes this isn't a problem. For 4-6 axes and type 2D, the user has to give some thought as to how to assign the axes. One should assign the axes in a way that avoids edges jumping over or crossing an axis when using type 2D. Edges should be arranged 1 $\rightarrow$ 2, 2 $\rightarrow$ 3, \ldots 5 $\rightarrow$ 6 but not 1 $\rightarrow$ 4 for example. Function \texttt{sumHPD} with \texttt{chk.ax.jump = TRUE} will tell you if any edges cross. For type 3D, one doesn't have to worry about this, but must guard against edges that start and end on the same axis or start and end on colinear axes. \texttt{ranHiveData} takes care of these exceptions automatically. By the way, these conditions don't cause errors, but they overdraw the axes and it doesn't look good.
310   -
311   - \item On the other hand, \texttt{HiveR} plots using type 3D are not parallel coordinate plots. For 4 axes plotted as a tetrahedron, any pair of axes are intrinsically next to each other and it is not possible for an edge to cross another axis. For 5 and 6 axes, crossings are a potential problem but generally it is possible to connect axes in more combinations than for type 2D. For instance, with 5 axes and type 2D, any one axis is between only 2 other axes, and hence can be connected to at most 2 other axes. But for type 3D and 5 axes, an axis in the apical position can be connected to 3 other axes, and an axis in the equatorial position can be connected to 4 other axes (could use a diagram showing this).
312   -
313   - \item Some ideas for network parameters that might be mapped to node radii (see Table~\ref{Mapping}):
314   - \begin{enumerate}
315   - \item Ecology: see various species descriptors computed by function \texttt{specieslevel} in package \texttt{bipartite}.
316   - \item Social networks: see the section ``Node-level indices'' in the article describing package \texttt{sna}.\cite{Butts2008} Briefly, degree, betweenness and closeness are the key ideas.
317   - \item See Table 1 in the article by Krzywinski.\cite{Krzywinski2011}
318   - \end{enumerate}
319   -\end{enumerate}
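
The adjacency rule described in the first note can be illustrated with a few lines of base R. This is only a sketch: \texttt{edge\_jumps} is a hypothetical helper, not part of \texttt{HiveR}, and it assumes that the axes are numbered consecutively around the circle so that axis 1 and axis \texttt{nx} are neighbours. For real data, \texttt{sumHPD} with \texttt{chk.ax.jump = TRUE} is the tool to use.

```r
# Hypothetical helper (not part of HiveR): flag edges that join
# non-adjacent axes in a radial layout with nx axes.
# Assumes axes are numbered consecutively around the circle,
# so axis 1 and axis nx count as neighbours.
edge_jumps <- function(ax1, ax2, nx) {
  d <- abs(ax1 - ax2)
  d <- pmin(d, nx - d)   # circular distance between the two axes
  d > 1                  # TRUE when at least one axis lies in between
}

# Toy edges for a 6-axis plot: 1 -> 2, 5 -> 6, 6 -> 1 and 1 -> 4
ax_start <- c(1, 5, 6, 1)
ax_end   <- c(2, 6, 1, 4)
jumps <- edge_jumps(ax_start, ax_end, nx = 6)
print(jumps)
```

Only the last edge (1 $\rightarrow$ 4) is flagged, which matches the example given in the text.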
320   -
321   -\section{A Simple Example Using a Plant-Pollinator Network} %%%%%
322   -
323   -\texttt{HiveR} currently contains the built-in data sets \texttt{Safari} and \texttt{Arroyo}, which provide a useful demonstration of \texttt{HiveR}.\footnote{Be warned: I am not an ecologist and these data sets and plots are merely a demonstration of \texttt{HiveR}.} These are plant-pollinator data sets which were derived from Vazquez and Simberloff, 2003 \cite{Vazquez2003}. These describe two-trophic level systems that consist of almost exactly the same suite of plants and pollinators. \texttt{Safari} is based upon observations of an undisturbed area, while \texttt{Arroyo} is from a nearby location grazed by cattle. The original data is composed of plant-pollinator pairs and a count of visits for each pair.
324   -
325   -Figures~\ref{PPN1} and \ref{PPN4} show two means of plotting \texttt{Safari} using package \texttt{bipartite}.\footnote{Note that we are using the data set \texttt{Safariland} from package \texttt{bipartite}; \texttt{Safari} was derived from \texttt{Safariland}.} Figure~\ref{PPN1} is a simple diagram giving plant-pollinator visits as a gray scale heat map. There are two parameters encoded here: the pairings and the number of visits (arguably, the dimensions of the matrix give the number of species involved as well). Figure~\ref{PPN4} displays plants across the bottom and pollinators across the top. The width of the connecting bands in the middle encodes the number of visits for a given plant-pollinator pair. The width of the top or bottom panel for a species is the total number of visits in which that species participates. Thus there are three parameters shown in this figure: the pairings, the total visits for a single species, and visits between a given pair. This second plot makes it pretty clear that four plant-pollinator pairs have by far the largest number of visits.
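
To make the three parameters concrete, here is a minimal base R sketch using a made-up visitation matrix (the numbers are invented for illustration and are not the \texttt{Safariland} data). Rows are assumed to be plants and columns pollinators.

```r
# Toy visitation matrix (made-up numbers, not the Safariland data):
# rows are plants, columns are pollinators, entries are visit counts.
visits <- matrix(c(10, 0, 2,
                    0, 5, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("plantA", "plantB"),
                                 c("poll1", "poll2", "poll3")))

pairings  <- which(visits > 0, arr.ind = TRUE)   # which pairs interact at all
plant_tot <- rowSums(visits)                     # total visits per plant
poll_tot  <- colSums(visits)                     # total visits per pollinator
busiest   <- which(visits == max(visits), arr.ind = TRUE)  # most visited pair

print(pairings)
print(plant_tot)
print(poll_tot)
```

The pairings and the per-pair counts are what the heat map encodes; the row and column totals are the extra information carried by the panel widths in the second plot.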
326   -
327   -\begin{figure}
328   -
329   -\begin{center}
330   -<< PPN1, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
331   -require(bipartite)
332   -data(Safariland)
333   -visweb(Safariland)
334   -@
335   -\end{center}
336   -\caption{Safariland data set using visweb\label{PPN1}}
337   -\end{figure}
338   -
339   -
340   -\begin{figure}
341   -
342   -\begin{center}
343   -<< PPN4, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
344   -plotweb(Safariland)
345   -@
346   -\end{center}
347   -\caption{Safariland data set using plotweb\label{PPN4}}
348   -\end{figure}
349   -
350   -Another approach to presenting this network graphically would be to use function \texttt{gplot} in the very powerful social network analysis package \texttt{sna}. \texttt{gplot} is flexible and has many options. Figure~\ref{PPN5} shows one possible display of \texttt{Safari} (actually, \texttt{Safariland}). In this plot, plant nodes are colored green and insect nodes red. The width of the edges is proportional to the number of visits between a pair of species. Figure~\ref{PPN6} shows the same data using a different layout algorithm, one which shows that there are actually two networks present (and which is not apparent from the hive plots below). Edge width here is the same as before, but because high traffic pair nodes are close to each other, the connecting, wide edge looks a bit odd (clearly, one could experiment to improve this detail).
351   -
352   -\begin{figure}
353   -
354   -\begin{center}
355   -<< PPN5, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
356   -gplot(Safariland, gmode = "graph", edge.lwd = 0.05,
357   - vertex.col = c(rep("green", 9), rep("red", 27)),
358   - mode = "circle")
359   -@
360   -\end{center}
361   -\caption{Safariland data set using gplot (mode = circle)\label{PPN5}}
362   -\end{figure}
363   -
364   -\begin{figure}
365   -
366   -\begin{center}
367   -<< PPN6, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
368   -gplot(Safariland, gmode = "graph", edge.lwd = 0.05,
369   - vertex.col = c(rep("green", 9), rep("red", 27)))
370   -@
371   -\end{center}
372   -\caption{Safariland data set using gplot (mode = Fruchterman-Reingold)\label{PPN6}}
373   -\end{figure}
374   -
375   -Figures~\ref{PPN2} and \ref{PPN3} show \texttt{Safari} and \texttt{Arroyo} respectively, using \texttt{plotHive} (intrinsically type 2D since there are only 2 axes in the data set). In these plots, plants are on one axis, and pollinators are on the other. Each organism was assigned a radius on its axis by calculating $d'$ using function \texttt{dfun} in package \texttt{bipartite}. $d'$ is an index of specialization; higher values mean the plant or pollinator is more specialized.\footnote{These plots use the absolute value of $d'$ for the node radii.} Edge weights were assigned proportional to the square root of the normalized number of visits of a pollinator to a plant. Thus the width of the edge drawn is an indication of the visitation rate. The transformed number of visits was divided manually into 4 groups and used to assign edge colors ranging from white to red. The redder colors represent greater numbers of visits, and the color-coding is comparable for each figure. Thus both the edge color and the edge weight encode the same information. It would of course be possible to encode additional variables by changing either edge color or weight, or node size. These plots show a rich amount of information not available from the more standard plots and show that the networks are fundamentally different:
376   -
377   -\begin{itemize}
378   - \item The degree of specialization with each network is different. A greater number of visits (wider, redder edges) occur between more specialized species (nodes at larger radii) in \texttt{Safari} than \texttt{Arroyo}.
379   - \item There are more plant species in \texttt{Arroyo}: the plant axis is longer.
380   - \item The huge number of visits encoded in red in \texttt{Safari} (the ungrazed site) is missing in \texttt{Arroyo}, which was an interesting aspect of the study.
381   -\end{itemize}
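
The edge encoding described above (square-root weights, four color classes from white to red) can be sketched in base R as follows. The visit counts are invented, normalizing to the largest count is an assumption, and the break points are arbitrary, since the manual breaks used for the figures are not given here.

```r
# Made-up visit counts for five plant-pollinator pairs
visits <- c(1, 4, 16, 64, 100)

# Normalize to the largest count (an assumed normalization),
# then take the square root to get the edge weight
w <- sqrt(visits / max(visits))

# Four color classes from white to red; the break points are
# arbitrary stand-ins for the manually chosen ones
cols <- colorRampPalette(c("white", "red"))(4)
grp  <- cut(w, breaks = c(0, 0.25, 0.5, 0.75, 1), labels = FALSE)
edge_col <- cols[grp]

print(data.frame(visits, weight = round(w, 2), color = edge_col))
```

Because both the weight and the color are derived from the same transformed count, they carry the same information, as noted in the text.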
382   -
383   -\begin{figure}
384   -
385   -\begin{center}
386   -<< PPN2, fig = TRUE, echo = FALSE, width = 5, height = 2.5 >>=
387   -data(Safari)
388   -Safari$nodes$size <- 0.5
389   -plotHive(Safari, ch = 0.1, axLabs = c("plants", "pollinators"), axLab.pos = c(0.15, 0.15), rot = c(-90, 90))
390   -@
391   -\end{center}
392   -\caption{Safari data set using plotHive\label{PPN2}}
393   -\end{figure}
394   -
395   -\begin{figure}
396   -
397   -\begin{center}
398   -<< PPN3, fig = TRUE, echo = FALSE, width = 5, height = 2.5 >>=
399   -data(Arroyo)
400   -Arroyo$nodes$size <- 0.5
401   -plotHive(Arroyo, ch = 0.1, axLabs = c("plants", "pollinators"), axLab.pos = c(0.15, 0.15), rot = c(-90, 90))
402   -@
403   -\end{center}
404   -\caption{Arroyo data set using plotHive\label{PPN3}}
405   -\end{figure}
406   -
407   -\section{The \emph{E. coli} Gene Regulatory Network}
408   -
409   -\texttt{HiveR} includes the \emph{E. coli} gene regulatory network, discussed in Yan \emph{et al.}\cite{Yan2010} and based upon the RegulonDB\cite{Gama2010} and extended by Krzywinski. It is contained in a file called \texttt{ecoli.dot} in the \texttt{extdata/E\_coli} directory. It can be read in with \texttt{dot2HPD} and further processed with \texttt{mineHPD} as shown below. \texttt{dot2HPD} relies on two external .csv files which tell the function how to map node and edge information in the .dot file to the \texttt{HivePlotData} object. Tables~\ref{NI} and \ref{EI} show the contents of the files used in this case. If you choose to draw the nodes, persistent nodes will be red and non-persistent nodes grey. The type of edge (1\dots4) is also encoded by color. Gene pairs (edges) that are closer physically and genetically are colored gray $\rightarrow$ yellow $\rightarrow$ orange $\rightarrow$ red with red being the most related pairs.
410   -
411   -<< NI, results = tex, echo = FALSE >>=
412   -tab <- read.csv(file = system.file( "extdata", "E_coli", "NodeInst.csv", package = "HiveR"))
413   -NI <- xtable(tab)
414   -caption(NI) <- "Contents of NodeInst.csv"
415   -label(NI) <- "NI"
416   -print(NI, include.rownames = FALSE)
417   -@
418   -
419   -<< EI, results = tex, echo = FALSE >>=
420   -tab <- read.csv(file = system.file( "extdata", "E_coli", "EdgeInst.csv", package = "HiveR"))
421   -EI <- xtable(tab)
422   -caption(EI) <- "Contents of EdgeInst.csv"
423   -label(EI) <- "EI"
424   -print(EI, include.rownames = FALSE)
425   -@
426   -
427   -First, read in the data set and process it using the two external files (this assumes your working directory is set to the folder with the relevant files).
428   -
429   -<< E_coli_1a, echo = FALSE >>=
430   -EC1 <- dot2HPD(file = system.file("extdata", "E_coli", "ecoli.dot", package = "HiveR"),
431   - node.inst = system.file("extdata", "E_coli", "NodeInst.csv", package = "HiveR"),
432   - edge.inst = system.file( "extdata", "E_coli", "EdgeInst.csv", package = "HiveR"),
433   - desc = "E coli gene regulatory network (Yan et al PNAS vol 107 pg 9186 (2010)) ",
434   - axis.cols = rep("grey", 3))
435   -@
436   -
437   -<< E_coli_1b, eval = FALSE >>=
438   -EC1 <- dot2HPD(file = "ecoli.dot",
439   - node.inst = "NodeInst.csv",
440   - edge.inst = "EdgeInst.csv",
441   - desc = "E coli gene regulatory network (Yan et al PNAS vol 107 pg 9186 (2010)) ",
442   - axis.cols = rep("grey", 3))
443   -@
444   -
445   -Next, assign the node radius based upon the edge degree. Then assign the nodes to axes based upon their role as source, manager or sink. Finally, let's remove any orphaned nodes (nodes that have no edges). Note that if desired, \texttt{> sumHPD(EC3, chk.orphan.node = TRUE)} could be used to preview the list of orphans.
446   -
447   -<< E_coli_1c >>=
448   -EC2 <- mineHPD(EC1, option = "rad <- tot.edge.count")
449   -
450   -EC3 <- mineHPD(EC2, option = "axis <- source.man.sink")
451   -
452   -EC4 <- mineHPD(EC3, option = "remove orphans")
453   -@
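
For readers who want to see what counting edges per node amounts to, here is a base R sketch on toy data. It is not the code inside \texttt{mineHPD}; the edge column names \texttt{id1} and \texttt{id2} are an assumption made for the illustration.

```r
# Toy node and edge tables. The column names id1/id2 for the two
# ends of an edge are an assumption made for this sketch.
nodes <- data.frame(id = 1:4, radius = 1)
edges <- data.frame(id1 = c(1, 1, 2), id2 = c(2, 3, 3))

# Total edge count per node: how often each id appears at either end.
# Using a factor with all node ids keeps nodes that never appear.
deg <- table(factor(c(edges$id1, edges$id2), levels = nodes$id))
nodes$radius <- as.numeric(deg)

# A node with a count of zero has no edges, i.e. it is an orphan
orphans <- nodes$id[nodes$radius == 0]
print(nodes)
print(orphans)
```

In this toy network node 4 takes part in no edge, so it is the one that an orphan-removal step would drop.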
454   -
455   -If you try to plot this now (\texttt{> plotHive(EC4)}), you encounter an error because two edges start and end on the same node (so they are on the same axis with the same radius). This would result in an edge length of zero, which is not possible (see \texttt{?sumHPD} for more details). We can use \texttt{sumHPD} to find out where the problem is. It turns out that two nodes are common to both problem edges. To avoid this problem, we'll nudge one node to a different value.
456   -
457   -<< E_coli_1d >>=
458   -sumHPD(EC4, chk.sm.pt = TRUE)
459   -
460   -EC4$nodes$radius[1149] <- 9.5
461   -@
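
The condition being checked can be written out in base R. The sketch below uses toy data and assumed edge column names (\texttt{id1}, \texttt{id2}); it only illustrates the idea of an edge whose two ends fall on the same axis at the same radius and is not the code used by \texttt{sumHPD}.

```r
# Toy data; the column names id1/id2 for the edge ends are assumed.
nodes <- data.frame(id = 1:3, axis = c(1L, 1L, 2L), radius = c(5, 5, 3))
edges <- data.frame(id1 = c(1, 1), id2 = c(2, 3))

# Look up axis and radius for both ends of every edge
a <- nodes[match(edges$id1, nodes$id), c("axis", "radius")]
b <- nodes[match(edges$id2, nodes$id), c("axis", "radius")]

# An edge has zero length when both ends share axis and radius
zero_len <- a$axis == b$axis & a$radius == b$radius
print(edges[zero_len, ])

# Nudging one of the two nodes removes the coincidence
nodes$radius[2] <- 5.5
```

After the nudge, repeating the check on the toy data would flag nothing.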
462   -
463   -Finally, we'll need to organize the edge list so that the reddest edges are drawn last, which will make the plots a bit easier to interpret (see later for another approach).
464   -
465   -<< E_coli_1e >>=
466   -edges <- EC4$edges
467   -gray_edges <- subset(edges, color == "gray")
468   -yel_edges <- subset(edges, color == "yellow")
469   -or_edges <- subset(edges, color == "orange")
470   -red_edges <- subset(edges, color == "red")
471   -edges <- rbind(gray_edges, yel_edges, or_edges, red_edges)
472   -EC4$edges <- edges
473   -@
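
The same re-ordering can be done in one step by ranking the colors in the desired drawing order. This is a sketch on a toy edge table; apart from \texttt{color}, the column names are placeholders.

```r
# Toy edge table; only the color column matters for this sketch
edges <- data.frame(id1 = 1:5, id2 = 6:10,
                    color = c("red", "gray", "orange", "gray", "yellow"),
                    stringsAsFactors = FALSE)

# Desired drawing order: gray first, red last
draw_order <- c("gray", "yellow", "orange", "red")

# match() turns each color into its position in draw_order,
# and order() sorts the rows by that position
edges <- edges[order(match(edges$color, draw_order)), ]
rownames(edges) <- NULL
print(edges$color)
```

One difference from the \texttt{subset} and \texttt{rbind} approach is worth noting: an edge whose color is not listed in \texttt{draw\_order} is kept and sorted to the end, rather than silently dropped.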
474   -
475   -Now we're ready to plot!
476   -
477   -Figures~\ref{E_coli_2}, \ref{E_coli_3}, and \ref{E_coli_4} show the hive plots of this network using methods \texttt{absolute}, \texttt{rank} and \texttt{norm} respectively. Each plot takes about 10 seconds to draw. Figure~\ref{E_coli_5} is the same as Figure~\ref{E_coli_3} but adds the nodes: red nodes are persistent, meaning they are common to a group of about 200 bacterial species. When plotting with \texttt{method = "rank"} (as here) each gene gets a unique node (the other two methods overlap nodes if more than one is present, and thus the last node plotted determines the color). With this many nodes, overplotting is a problem, so we shrank the node size and sorted the nodes so that the red nodes were drawn last (a strategy documented in more detail in an upcoming example). Another approach might be to expand the axis length, but that's probably not realistic: there are 1,274 nodes on this axis. Note that the manager axis nodes all appear to be persistent (red).
478   -
479   -\begin{figure}
480   -\begin{center}
481   -<< E_coli_2, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
482   -plotHive(EC4, dr.nodes = FALSE, ch = 20, axLabs = c("source", "sink", "manager"),
483   -axLab.pos = c(40, 75, 35), axLab.gpar = gpar(fontsize = 10, col = "white", lwd = 2),
484   -arrow = c("degree", 30, 60, 120, 50))
485   -@
486   -\end{center}
487   -\caption{Hive plot of \emph{E. coli} gene regulatory network (native node units)\label{E_coli_2}}
488   -\end{figure}
489   -
490   -\begin{figure}
491   -\begin{center}
492   -<< E_coli_3, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
493   -plotHive(EC4, dr.nodes = FALSE, method = "rank", ch = 100, axLabs = c("source", "sink", "manager"),
494   -axLab.pos = c(100, 125, 180), axLab.gpar = gpar(fontsize = 10, col = "white"))
495   -@
496   -\end{center}
497   -\caption{Hive plot of \emph{E. coli} gene regulatory network (nodes ranked)\label{E_coli_3}}
498   -\end{figure}
499   -
500   -\begin{figure}
501   -\begin{center}
502   -<< E_coli_4, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
503   -plotHive(EC4, dr.nodes = FALSE, method = "norm", ch = 0.1, axLabs = c("source", "sink", "manager"),
504   -axLab.pos = c(0.1, 0.2, 0.2), axLab.gpar = gpar(fontsize = 10, col = "white"))
505   -@
506   -\end{center}
507   -\caption{Hive plot of \emph{E. coli} gene regulatory network (nodes normed)\label{E_coli_4}}
508   -\end{figure}
509   -
510   -\begin{figure}
511   -\begin{center}
512   -<< E_coli_5, fig = TRUE, echo = FALSE, width = 5, height = 5 >>=
513   -EC4a <- EC4
514   -EC4a$nodes$size <- EC4a$nodes$size * 0.1
515   -nodes <- EC4a$nodes
516   -nodes <- sort_df(nodes, vars = "color")
517   -EC4a$nodes <- nodes
518   -plotHive(EC4a, dr.nodes = TRUE, method = "rank", ch = 100, axLabs = c("source", "sink", "manager"),
519   -axLab.pos = c(100, 125, 180), axLab.gpar = gpar(fontsize = 10, col = "white"))
520   -@
521   -\end{center}
522   -\caption{Hive plot of \emph{E. coli} gene regulatory network (nodes ranked)\label{E_coli_5}}
523   -\end{figure}
524   -
525   -
526   -\section{Further Explorations of the \emph{E. coli} Network}
527   -
528   -In this section we'll demonstrate some slightly more advanced manipulations of the \emph{E. coli} network data, including how one can make \emph{hive panels} which are useful in comparing multiple hive plots. In some of the manipulations below, data types are coerced away from the definition found in a \texttt{HivePlotData} object and must be restored. It might be helpful to study the description of the required structure at \texttt{?HPD}.
529   -
530   -First, we are going to re-code some of the information in the network. In the original publication, nodes were classified as either persistent or non-persistent. This classification was based upon comparison of the \emph{E. coli} genome to roughly 200 other bacterial genomes. A gene was considered persistent if it was present in these other genomes, otherwise it is non-persistent and unique to \emph{E. coli}. In our processing above, genes (nodes) that are persistent are red, while non-persistent nodes are black. We can use the existing axis assignments, based upon role as source, manager or sink, along with the persistence information, to display the network taking this information into account. In principle, there are six possible combinations: (persistent, non-persistent) x (source, manager, sink). However, it turns out that one of these combinations doesn't exist (persistent sources), so we'll re-code this information into a five axis hive plot.\footnote{Not only am I not an ecologist, I am not a molecular biologist. I have no idea if this analysis is actually worthwhile, I just thought it would be interesting to see these relationships. Plus, it also permits further manipulations to be demonstrated.} Here's the first step, starting from where we left off above:
531   -
532   -<< E_coli_5 >>=
533   -EC5 <- EC4
534   -nodes2 <- nodes <- EC5$nodes
535   -nn <- length(nodes$axis)
536   -#
537   -for (n in 1:nn) {
538   - if ((nodes$axis[n] == 1) & (nodes$color[n] == "black")) nodes2$axis[n] <- 1
539   - if ((nodes$axis[n] == 2) & (nodes$color[n] == "black")) nodes2$axis[n] <- 2
540   - if ((nodes$axis[n] == 3) & (nodes$color[n] == "black")) nodes2$axis[n] <- 3
541   - if ((nodes$axis[n] == 2) & (nodes$color[n] == "red")) nodes2$axis[n] <- 4
542   - if ((nodes$axis[n] == 3) & (nodes$color[n] == "red")) nodes2$axis[n] <- 5
543   - }
544   -#
545   -# Final assembly & checking...
546   -#
547   -nodes2$axis <- as.integer(nodes2$axis)
548   -EC5$nodes <- nodes2
549   -EC5$axis.cols <- rep("gray", 5) # we added 2 more axes!
550   -#
551   -sumHPD(EC5)
552   -# sumHPD(EC5, chk.all = TRUE) # not run, the output is long
553   -@
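
The loop above can also be written with logical indexing, which avoids visiting the nodes one at a time. The sketch below runs on a small toy table rather than the real network, and uses the same color coding as the loop (black nodes stay put, red nodes on axes 2 and 3 move to axes 4 and 5).

```r
# Toy node table with the two columns used by the loop above
nodes <- data.frame(axis  = c(1L, 2L, 3L, 2L, 3L),
                    color = c("black", "black", "black", "red", "red"),
                    stringsAsFactors = FALSE)

new_axis <- nodes$axis
red <- nodes$color == "red"
new_axis[red & nodes$axis == 2L] <- 4L   # red nodes on axis 2 move to axis 4
new_axis[red & nodes$axis == 3L] <- 5L   # red nodes on axis 3 move to axis 5
nodes$axis <- new_axis
print(nodes)
```

Because the replacement values are integer literals, the column stays of type integer in this version and no later coercion is needed.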
554   -
555   -With \texttt{sumHPD}, one can use \texttt{chk.all = TRUE} which runs some additional checks on the data (see \texttt{?sumHPD} for full details). Had we done so in this case, we would find that some edges start and stop on the manager axis; perhaps you noticed this earlier. These are managers that call other managers. Also, somewhat miraculously, there are no edges crossing axes in this particular partitioning of nodes (\texttt{chk.all} also looks for this condition). In the basic summary that we did run, axis five has only one node on it. This will plot fine except for the case where one uses \texttt{method = "norm"} which will fail because to normalize the node radii, there has to be more than one radius value. To fix this, we'll add in a phantom, invisible node to anchor axis five as follows:
556   -
557   -<< E_coli_6>>=
558   -EC6 <- EC5
559   -tmp <- data.frame(id = 1379, lab = "axis_5_anchor",
560   - axis = 5, radius = 1, size = 1, color = "grey")
561   -EC6$nodes <- rbind(EC6$nodes, tmp)
562   -#
563   -# Clean up, re-size nodes, sort nodes so
564   -# persistent (red) ones are drawn last & check:
565   -#
566   -EC6$nodes$axis <- as.integer(EC6$nodes$axis)
567   -EC6$nodes$id <- as.integer(EC6$nodes$id)
568   -EC6$nodes$size <- EC6$nodes$size * 0.1
569   -#
570   -nodes <- EC6$nodes
571   -nodes <- sort_df(nodes, vars = "color")
572   -EC6$nodes <- nodes
573   -#
574   -sumHPD(EC6)
575   -@
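
Why a single node breaks normalization is easy to see with a plain min-max rescaling. The function below is an assumed form used only for illustration; the normalization inside \texttt{HiveR} may be written differently.

```r
# Assumed min-max normalization (HiveR's own code may differ)
norm01 <- function(r) (r - min(r)) / (max(r) - min(r))

one_node  <- norm01(c(7))      # range is zero, so 0/0 gives NaN
two_nodes <- norm01(c(1, 7))   # an anchor at radius 1 restores a range

print(one_node)
print(two_nodes)
```

With only one radius value the denominator is zero; adding the anchor node gives the axis a non-zero range, which is the purpose of the phantom node.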
576   -
577   -Next, we are going to copy the current version of the network (EC6) and scale the axes of the copy, because the summary above shows that the axis lengths are quite different and the shorter axes will be nearly invisible if we don't scale them up at least a bit.
578   -
579   -<< E_coli_7 >>=
580   -EC7 <- manipAxis(EC6, method = "scale", action = c(1, 10, 1, 10, 10))
581   -sumHPD(EC7)
582   -@
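
As a rough mental model, scaling with one factor per axis can be pictured as multiplying each node's radius by the factor for its axis. The sketch below is an assumed reading of that idea on toy data, not the code of \texttt{manipAxis}.

```r
# Toy nodes on five axes; the multiplication below is an assumed
# reading of what per-axis scaling does, not HiveR's actual code
nodes  <- data.frame(axis   = c(1L, 2L, 2L, 4L, 5L),
                     radius = c(100, 3, 8, 5, 2))
action <- c(1, 10, 1, 10, 10)   # one factor per axis, as in the call above

# Index the factors by each node's axis and rescale the radii
nodes$radius <- nodes$radius * action[nodes$axis]

# Longest radius per axis after scaling
print(tapply(nodes$radius, nodes$axis, max))
```

In the toy data the short axes (2, 4 and 5) are stretched tenfold, which brings them closer in length to axis 1.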
583   -
584   -Now we'll make a hive panel showing this same network displayed using different methods. The code follows; it uses the \texttt{grid} graphics systems and associated viewport concepts to create a 2 x 2 hive panel. The resulting hive panel is Figure~\ref{E_coli_8}.
585   -
586   -\setkeys{Gin}{width = 0.7\textwidth}
587   -
588   -\begin{figure}
589   -\begin{center}
590   -<< E_coli_8, fig = TRUE, width = 8, height = 8, echo = TRUE >>=
591   -vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)
592   -#
593   -grid.newpage()
594   -pushViewport(viewport(layout = grid.layout(2, 2)))
595   -#
596   -pushViewport(vplayout(1, 1)) # upper left plot
597   -plotHive(EC6, ch = 20, np = FALSE)
598   -popViewport(2)
599   -#
600   -pushViewport(vplayout(1, 2)) # upper right plot
601   -plotHive(EC7, ch = 0.1, method = "norm", np = FALSE,
602   - axLabs = c("non-persistent\nsource", "non-persistent\nsink",
603   - "non-persistent\nmanager", "persistent\nmanager", "persistent \nsink"),
604   - axLab.pos = rep(0.2, 5), axLab.gpar = gpar(fontsize = 10, col = "white"),
605   - rot = c(0, 72, 0, 0, -72), anNode.gpar = gpar(fontsize = 10, col = "pink", lwd = 0.5),
606   - anNodes = system.file("extdata", "E_coli", "NodeLabels.csv", package = "HiveR"))
607   -popViewport(2)
608   -#
609   -pushViewport(vplayout(2,1)) # lower left plot
610   -plotHive(EC7, ch = 100, method = "rank", np = FALSE)
611   -popViewport(2)
612   -#
613   -pushViewport(vplayout(2,2)) # lower right plot
614   -plotHive(EC7, ch = 20, np = FALSE)
615   -@
616   -\end{center}
617   -\caption{Hive panel showing \emph{E. coli} regulatory network with different display options \label{E_coli_8}}
618   -\end{figure}
619   -
620   -Finally, the lower right hive plot in Figure~\ref{E_coli_8} can serve as a starting point for teasing out even more information. Instead of drawing all the edges in one hive plot, we'll make a hive panel showing each edge category in a different panel. The steps are given below; the resulting panel is Figure~\ref{E_coli_9} (the steps to produce the panel are not shown here, but are the same as before).
621   -
622   -<< E_coli_9 >>=
623   -EC11 <- EC10 <- EC9 <- EC8 <- EC7
624   -
625   -edges <- EC7$edges
626   -gray_edges <- subset(edges, color == "gray")
627   -yel_edges <- subset(edges, color == "yellow")
628   -or_edges <- subset(edges, color == "orange")
629   -red_edges <- subset(edges, color == "red")
630   -
631   -EC8$edges <- gray_edges
632   -EC9$edges <- yel_edges
633   -EC10$edges <- or_edges
634   -EC11$edges <- red_edges
635   -@
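
When there are many categories, \texttt{split} gives all the subsets in a single call. This is a sketch on a toy edge table; apart from \texttt{color}, the column names are placeholders.

```r
# Toy edge table standing in for the edges of a hive plot object
edges <- data.frame(id1 = 1:4, id2 = 5:8,
                    color = c("gray", "red", "gray", "yellow"),
                    stringsAsFactors = FALSE)

# Named list with one data frame per color
by_color <- split(edges, edges$color)

print(names(by_color))
print(nrow(by_color[["gray"]]))
```

Each element of the list could then be assigned to the edge slot of its own copy of the plot object, in the same way as the four explicit assignments above.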
636   -
637   -\begin{figure}
638   -\begin{center}
639   -<< E_coli_10, fig = TRUE, width = 8, height = 8, echo = FALSE >>=
640   -grid.newpage()
641   -pushViewport(viewport(layout = grid.layout(2, 2)))
642   -#
643   -pushViewport(vplayout(1, 1))
644   -plotHive(EC8, ch = 20, np = FALSE)
645   -popViewport(2)
646   -#
647   -pushViewport(vplayout(1, 2))
648   -plotHive(EC9, ch = 20, np = FALSE)
649   -popViewport(2)
650   -#
651   -pushViewport(vplayout(2,1))
652   -plotHive(EC10, ch = 20, np = FALSE)
653   -popViewport(2)
654   -#
655   -pushViewport(vplayout(2,2))
656   -plotHive(EC11, ch = 20, np = FALSE)
657   -@
658   -\end{center}
659   -\caption{Hive panel showing \emph{E. coli} regulatory network with edges encoded by genetic distance (Red edges are the closest; each set of edges plotted separately) \label{E_coli_9}}
660   -\end{figure}
661   -
662   -\section{Comparison to linnet} %%%%%
663   -
664   -linnet (for linear networks) is the Perl program written by Krzywinski that draws hive plots. Here are some notes about how \texttt{HiveR} compares to linnet.
665   -
666   -\begin{enumerate}
667   - \item To show more information, in linnet one can clone an axis to specifically show connections that would start and end on the same axis (if it isn't cloned). Cloned axes appear a bit on either side of where the original axis would have been. In \texttt{HiveR}, the same notion can be implemented, but rather than clone an existing axis, one can simply add a new axis based upon some property of the system. Alternatively, for 2D hive plots, \texttt{HiveR} is able to show edges that start and end on the same axis (linnet does not do this).
668   - \item No segmentation of an axis is currently possible with \texttt{HiveR}.
669   - \item linnet uses Bezier curves to create the edges; \texttt{HiveR} uses splines with a single control point.
670   -\end{enumerate}
671   -
672   -\section{Features Planned and Under Consideration} %%%%%
673   -
674   -\begin{enumerate}
675   - \item Add the ability to subtract 2 hive plots and display the result.
676   - \item Set up animations for the 3D mode. Perhaps include the possibility of running two animations of related hives side by side.
677   - \item Set up a mechanism to automatically permute the axes in 3D mode when nx = 5 or 6 so that the best option can be selected. Might also be worth doing in 2D mode for 4-6 axes, except in this case it's not a question of how you display but how you import the data. Wegman\cite{Wegman1990} has a formula describing all possible combinations that would be needed.
678   - \item Set up mouse controls in 3D mode.
679   - \item Smallish items
680   - \begin{enumerate}
681   - \item The current 3D spline calculation produces an asymmetric spline. It could be made symmetric.
682   - \item The current splines could be converted to Bezier curves.
683   - \item Could add line type as an edge parameter. This might be simple, or not.
684   - \end{enumerate}
685   -\end{enumerate}
686   -
687   -\section{Acknowledgements} %%%%%
688   -
689   -Naturally, I thank Martin Krzywinski for numerous helpful communications. I also appreciate helpful discussions on gene ontology concepts with my colleague Professor Chet Fornari.
690   -
691   -\begin{flushleft}
692   -\bibliographystyle{ieeetr} % Cause refs to be numbered and collected in order used
693   -\addcontentsline{toc}{section}{References}
694   -\bibliography{HiveR}
695   -\end{flushleft}
696   -
697   -\end{document}
145 vignettes/HiveR.bib
... ... @@ -1,145 +0,0 @@
1   -
2   -
3   -@article{Krzywinski2011,
4   -author = {Krzywinski, Martin and Birol, Inanc and Jones, Steven JM and Marra, Marco A},
5   -title = {Hive plots -- rational approach to visualizing networks},
6   -year = {2011},
7   -doi = {10.1093/bib/bbr069},
8   -abstract ={Networks are typically visualized with force-based or spectral layouts. These algorithms lack reproducibility and perceptual uniformity because they do not use a node coordinate system. The layouts can be difficult to interpret and are unsuitable for assessing differences in networks. To address these issues, we introduce hive plots (http://www.hiveplot.com) for generating informative, quantitative and comparable network layouts. Hive plots depict network structure transparently, are simple to understand and can be easily tuned to identify patterns of interest. The method is computationally straightforward, scales well and is amenable to a plugin for existing tools.},
9   -URL = {http://bib.oxfordjournals.org/content/early/2011/12/09/bib.bbr069.abstract},
10   -eprint = {http://bib.oxfordjournals.org/content/early/2011/12/09/bib.bbr069.full.pdf+html},
11   -journal = {Briefings in Bioinformatics}
12   -}
13   -
14   -@article{ Gama2010,
15   -Author = {Gama-Castro, Socorro and Salgado, Heladia and Peralta-Gil, Martin and
16   - Santos-Zavaleta, Alberto and Muniz-Rascado, Luis and Solano-Lira, Hilda
17   - and Jimenez-Jacinto, Veronica and Weiss, Verena and Garcia-Sotelo, Jair
18   - S. and Lopez-Fuentes, Alejandra and Porron-Sotelo, Liliana and
19   - Alquicira-Hernandez, Shirley and Medina-Rivera, Alejandra and
20   - Martinez-Flores, Irma and Alquicira-Hernandez, Kevin and Martinez-Adame,
21   - Ruth and Bonavides-Martinez, Cesar and Miranda-Rios, Juan and Huerta,
22   - Araceli M. and Mendoza-Vargas, Alfredo and Collado-Torres, Leonardo and
23   - Taboada, Blanca and Vega-Alvarado, Leticia and Olvera, Maricela and
24   - Olvera, Leticia and Grande, Ricardo and Morett, Enrique and
25   - Collado-Vides, Julio},
26   -Title = {{RegulonDB version 7.0: transcriptional regulation of Escherichia coli
27   - K-12 integrated within genetic sensory response units (Gensor Units)}},
28   -Journal = {Nucleic Acids Research},
29   -Year = {{2011}},
30   -Volume = {{39}},
31   -Number = {{1}},
32   -Pages = {{D98-D105}},
33   -Month = {{January}},
34   -Abstract = {{RegulonDB (http://regulondb.ccg.unam.mx/) is the primary reference
35   - database of the best-known regulatory network of any free-living
36   - organism, that of Escherichia coli K-12. The major conceptual change
37   - since 3 years ago is an expanded biological context so that
38   - transcriptional regulation is now part of a unit that initiates with the
39   - signal and continues with the signal transduction to the core of
40   - regulation, modifying expression of the affected target genes
41   - responsible for the response. We call these genetic sensory response
42   - units, or Gensor Units. We have initiated their high-level curation,
43   - with graphic maps and superreactions with links to other databases.
44   - Additional connectivity uses expandable submaps. RegulonDB has summaries
45   - for every transcription factor (TF) and TF-binding sites with internal
46   - symmetry. Several DNA-binding motifs and their sizes have been redefined
47   - and relocated. In addition to data from the literature, we have
48   - incorporated our own information on transcription start sites (TSSs) and
49   - transcriptional units (TUs), obtained by using high-throughput
50   - whole-genome sequencing technologies. A new portable drawing tool for
51   - genomic features is also now available, as well as new ways to download
52   - the data, including web services, files for several relational database
53   - manager systems and text files including BioPAX format.}},
54   -DOI = {{10.1093/nar/gkq1110}},
55   -ISSN = {{0305-1048}},
56   -}
57   -
58   -@article{ Yan2010,
59   -Author = {Yan, Koon-Kiu and Fang, Gang and Bhardwaj, Nitin and Alexander, Roger P.
60   - and Gerstein, Mark},
61   -Title = {{Comparing genomes to computer operating systems in terms of the topology
62   - and evolution of their regulatory control networks}},
63   -Journal = {Proceedings of the National Academy of Sciences of the United States of America},
64   -Year = {{2010}},
65   -Volume = {{107}},
66   -Number = {{20}},
67   -Pages = {{9186-9191}},
68   -Month = {{May 18}},
69   -Abstract = {{The genome has often been called the operating system (OS) for a living
70   - organism. A computer OS is described by a regulatory control network
71   - termed the call graph, which is analogous to the transcriptional
72   - regulatory network in a cell. To apply our firsthand knowledge of the
73   - architecture of software systems to understand cellular design
74   - principles, we present a comparison between the transcriptional
75   - regulatory network of a well-studied bacterium (Escherichia coli) and
76   - the call graph of a canonical OS (Linux) in terms of topology and
77   - evolution. We show that both networks have a fundamentally hierarchical
78   - layout, but there is a key difference: The transcriptional regulatory
79   - network possesses a few global regulators at the top and many targets at
80   - the bottom; conversely, the call graph has many regulators controlling a
81   - small set of generic functions. This top-heavy organization leads to
82   - highly overlapping functional modules in the call graph, in contrast to
83   - the relatively independent modules in the regulatory network. We further
84   - develop a way to measure evolutionary rates comparably between the two
85   - networks and explain this difference in terms of network evolution. The
86   - process of biological evolution via random mutation and subsequent
87   - selection tightly constrains the evolution of regulatory network hubs.
88   - The call graph, however, exhibits rapid evolution of its highly
89   - connected generic components, made possible by designers' continual
90   - fine-tuning. These findings stem from the design principles of the two
91   - systems: robustness for biological systems and cost effectiveness
92   - (reuse) for software systems.}},
93   -DOI = {{10.1073/pnas.0914771107}},
94   -ISSN = {{0027-8424}},
95   -}
96   -
97   -
98   -@article{ Vazquez2003,
99   -Author = {D. P. Vazquez and D. Simberloff},
100   -Title = {{Changes in interaction biodiversity induced by an introduced ungulate}},
101   -Journal = {{Ecology Letters}},
102   -Year = {{2003}},
103   -Volume = {{6}},
104   -Pages = {{1077-1083}}
105   -}
106   -
107   -@article{ Suderman2007,
108   -Author = {Suderman, Matthew and Hallett, Michael},
109   -Title = {{Tools for visually exploring biological networks}},
110   -Journal = {Bioinformatics},
111   -Year = {{2007}},
112   -Volume = {{23}},
113   -Number = {{20}},
114   -Pages = {{2651-2659}},
115   -Month = {{Oct 15}},
116   -DOI = {{10.1093/bioinformatics/btm401}}
117   -}
118   -
119   -@article{ Wegman1990,
120   -Author = {Edward J. Wegman},
121   -Title = {Hyperdimensional data-analysis using parallel coordinates},
122   -Journal = {Journal of the American Statistical Association},
123   -Year = {{1990}},
124   -Volume = {{85}},
125   -Number = {{411}},
126   -Pages = {{664-675}},
127   -Month = {{Sep}},
128   -DOI = {{10.2307/2290001}}
129   -}
130   -
131   -@article{Butts2008,
132   - author = "Carter T. Butts",
133   - title = "Social Network Analysis with sna",
134   - journal = "Journal of Statistical Software",
135   - volume = "24",
136   - number = "6",
137   - pages = "1--51",
138   - day = "8",
139   - month = "5",
140   - year = "2008",
141   - CODEN = "JSSOBK",
142   - ISSN = "1548-7660",
143   - URL = "http://www.jstatsoft.org/v24/i06",
144   -}
145   -
BIN  vignettes/VSEPR.pdf
Binary file not shown
BIN  vignettes/plot3dHive_performance.pdf
Binary file not shown
BIN  vignettes/plotHive_performance.pdf
Binary file not shown
