GitHub - cirtwill/rgraph

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 278 Commits
bipartmod		bipartmod
conflict		conflict
frontend		frontend
lib		lib
m4		m4
netcarto		netcarto
recommender		recommender
reliability		reliability
src		src
test		test
utils		utils
AUTHORS		AUTHORS
COPYING		COPYING
ChangeLog		ChangeLog
INSTALL		INSTALL
Makefile.am		Makefile.am
NEWS		NEWS
README		README
acinclude.m4		acinclude.m4
arg-nonnull.h		arg-nonnull.h
autogen.sh		autogen.sh
c++defs.h		c++defs.h
configure.ac		configure.ac
depcomp		depcomp
install-sh		install-sh
missing		missing
warn-on-use.h		warn-on-use.h

Repository files navigation

----------------------------------------------------------------------
----------------------------------------------------------------------
RGraph README

$LastChangedDate$

$Revision$
----------------------------------------------------------------------
----------------------------------------------------------------------

This package includes the RGraph libraries, the C libraries for
complex network analysis developed by Roger Guimera. Some executables,
built from the libraries, are also included.

For the libraries to compile, you will NEED TO INSTALL, first:

1) The GNU Scientific Libraries (GSL), available from
http://www.gnu.org/software/gsl/

---- Unix ----

In a Unix-like system, you can install the RGraph libraries and the
executables by uncompressing the tarball (tar -xzvf
rgraph-version.tar.gz) and running the usual stuff from the
rgraph-version directory:

cd rgraph-version

./configure

./make

./make install

(In a Windows system, you will first need to install some sort of
"Unix emulation." I have successfully compiled the libraries using
either Cygwin or MinGW. See below for Windows
installation steps).

This will install the libraries in your_default_lib_directory/rgraph
and the executables in your_default_bin_directory. To install in a
different directory run

./configure --prefix=path_to_install_directory

instead of just "./configure". For other configure options run:

./configure -h

You can uninstall the whole thing by running

./make uninstall

from the installation directory.

You can also test that everything is working by running

./make check

from the installation directory.

---- Windows (MinGW) ----

1) First of all, you have to download and install MinGW (http://sourceforce.net/projects/mingw/files/latest/download?source=files)

During the installation, when it prompts you the packages to install, select gcc, msys and mingw base. The other default options are OK.

2) Download GNU Scientific Libraries (GSL), available from
http://www.gnu.org/software/gsl/. In my installation, I've used version 1.15.

3) Launch MinGW console (Programs -> MinGW -> MinGW Shell or C:\MinGW\msys\1.0\msys).

4) Unzip the contents of the GSL downloaded file under your msys home which is at C:\MinGW\msys\1.0\home\user\ (it's important to perform step 3 or you won't have the home directory).

5) In your msys console, cd into the gsl-15 folder and type the following:

./configure --prefix=/MinGW

make

make install

All this steps may take a while.

6) Untar the contents of rgraph under your msys home and type the following:

./configure

make

[make install]

7) To check that it's working, use make check command or try to execute any of the executables generated by the make command (for example ./netcarto/netcarto).

----------------------------------------------------------------------
librgraph
----------------------------------------------------------------------

librgraph is the library itself. You can use it to build your own
network analysis programs. Sorry, as of now no documentation is
available, but you may want to take a look at the header files and try
to figure things out.

----------------------------------------------------------------------
netcarto
----------------------------------------------------------------------

Given a network, the program netcarto identifies
modules---i.e. densely connected groups of nodes in the network---and
classifies nodes according to their roles, as defined in Refs. [1, 2].

In case you use the results of the program in a publication, please
cite the following papers:

1. Guimera, R. & Amaral, L.A.N., Functional cartography of complex
metabolic networks, Nature 433, 895-900 (2005).

2. Guimera, R. & Amaral, L.A.N., Cartography of complex networks:
modules and universal roles, J. Stat. Mech.-Theory Exp.,
art. no. P02001 (2005).

For the comparison to randomized networks, please cite:

3. Guimera, R., Sales-Pardo, M. & Amaral, L.A.N., Modularity from
fluctuations in random graphs and complex networks, Phys. Rev. E
70, art. no. 025101 (2004).

Input parameters
----------------------------------------------------------------------

To run the program, type netcarto.exe in the command line or
double-click on the icon in Windows. The program will prompt for the
following parameters:

- Seed for the random number generator: Must be a positive
integer. Since the module identification algorithm is stochastic,
different runs will yield, in general, slightly different different
modules. Two runs with the same seed, though, should give the exact
same results.

- Name of the network file: Name of the file that contains the
network. The file must be a list of links with the format:

n1 n2
n3 n4
. .
. .
. .

This represents a network with a link between nodes n1 and n2,
another between nodes n3 and n4, and so on. Nodes must be separated
by spaces.

- Iteration factor (f): At each temperature of the simulated annealing
(SA), the program performs fN^2 individual-node updates (involving
the movement of a single node from one module to another) and fN
collective updates (involving the merging of two modules and the
split of a module). The number "f" is the iteration factor. Large
values of f (1 or larger) will result, in general, in better results
(higher modularities) and longer execution times. The recommended
range for f is [0.1, 1], although smaller values may be needed for
large and/or dense networks. Note, also, that a minimum number of
iterations is imposed at each temperature, so that when f is very
small, the minimum number will be used instead of fN^2 or fN.

- Cooling factor (c): After the desired number of updates is done at a
certain temperature T, the system is cooled down to a new
temperature T'=cT, where c is the cooling factor. the cooling factor
must be strictly larger than 0 and strictly smaller than 1. In
general, values close to one will result in better results and
longer execution times. Recommended values of the cooling factor f
are [0.990, 0.999], although smaller values (0.95 or even 0.9) may
be needed for large and/or dense networks.

- Number of randomizations: After modules and roles are identified in
the original network, there is the option to calculate the value of
the modularity in a random graph with the same degree (connectivity)
distribution as the original network. As discussed in [3], this test
is necessary to establish whether the modular structure of the
original network is significant or not. Calculation of the
modularity for each random network will take approximately the same
time as for the original network. If you do not want to run any
randomization, just enter 0 here.

Program output
----------------------------------------------------------------------

After entering these parameters, the algorithm will start to identify
the modules in the network. As the SA proceeds, the program prints
three columns on the screen, which indicate the inverse of the
temperature, the modularity at that temperature, and the temperature
itself, respectively. This provides you with a fast way to check if
the process is too slow or, conversely, if it is fast and the accuracy
can be increased. The program will stop when the modularity remains
unchanged during 25 different temperatures. In general, finding a good
partition requires decreasing the temperature several orders of
magnitude. Thus, as a rule of thumb, if the time between one
temperature and the next is larger than a second, the program will
likely take days to complete (this, however, will also depend on the
cooling factor). Of course, there is nothing wrong with long runs, as
long as you are willing to wait!

When the SA for the original network finishes, the program calculates
the role of each node almost instantly and outputs the following
files:

- network.net: a Pajek file containing the giant component of the
network (for information on Pajek, visit
http://vlado.fmf.uni-lj.si/pub/networks/pajek/).

- modules.clu: a Pajek partition containing the modules as identified
by the algorithm.

- roles.clu: a Pajek partition containing the roles as identified
by the algorithm.

- modules.dat: A text file containing some basic information about the
modules (can be edited with any text editor such as NotePad, or
imported in Excel as a csv file). The format of the file is as
follows. Each line corresponds to a different module. The first
number is an ID number for the module, mostly irrelevant. The second
is the number of nodes in the module. The third is the total number
of links in the module, the fourth the number of within-module
links, and the fifth the number of links from this module to other
modules (of course, the third column must be equal to the sum of the
fourth and fifth columns). Then there is a "---" and the next
columns correspond to the list of nodes in the module. The last line
of the file contains the value of the modularity for this partition.

- roles.dat: A text file containing some basic information about the
roles (can be edited with any text editor such as NodePad, or
imported in Excel as a csv file). The format of the file is as
follows. Each line corresponds to a different role. The first number
is the role number, as defined in [1, 2]. The second is the number
of nodes with that role. The third is the total number of links of
nodes with that role, the fourth the number of within-role links,
and the fifth the number of links from this role to other roles (of
course, the third column must be equal to the sum of the fourth and
fifth columns). Then there is a "---" and the next columns
correspond to the list of nodes with that role.

- node_prop.dat: A text file with four columns. The first one is the
number of the node. The second is the degree (number of links) of
the node. The third is the participation coefficient as defined in
[1, 2]. The fourth one is the within-module relative degree, as
defined in [1, 2].

After all of this, the program starts to calculate the modularity for
as many randomizations as you have selected, printing on the screen,
as before, the inverse of the temperature, the modularity, and the
temperature. Once all randomizations are done, the program creates one
last file that contains the modularity of the original network, the
average modularity of the randomized networks, and the standard
deviation of the modularity of the randomized networks.

----------------------------------------------------------------------
netcarto_cl
----------------------------------------------------------------------

netcarto_cl is equivalent to netcarto, but arguments are passed
directly from the command line (so that it is easier to script around
the executable). Additionally, netcarto_cl allows you to fix the
initial temperature. Please read the netcarto documentation to learn
more about netcarto_cl.

The arguments need to be passed in the following order:

> netcarto_cl net_file_name seed T_ini iteration_factor cooling_factor
#_randomizations

T_ini, iteration_factor, and cooling_factor can be set to -1 to use
the defaults (2/size_of_network, 1.0, and 0.995, respectively).

----------------------------------------------------------------------
bipartmod
----------------------------------------------------------------------

Given a bipartite network, the program returns a partition of the
nodes in one of the sets of nodes (the 'actors' or the 'teams')
obtained according to the algorithm described in:

4. Guimera, R., Sales-Pardo, M. & Amaral, L.A.N., Module
identification in bipartite and directed networks, Phys. Rev. E 76,
036102 (2007)

In case you use the results of the program in a publication, please
cite the paper above.

Input parameters
----------------------------------------------------------------------

The program will prompt for the following parameters:

- Seed for the random number generator: Must be a positive
integer. Since the module identification algorithm is stochastic,
different runs will yield, in general, slightly different different
modules. Two runs with the same seed, though, should give the exact
same results.

- Name of the network file: Name of the file that contains the
network. The file must be a list of links with the format:

n1 n2
n3 n4
. .
. .
. .

This represents a network with a link between nodes n1 and n2,
another between nodes n3 and n4, and so on. Nodes must be separated
by spaces.

- Iteration factor (f): At each temperature of the simulated annealing
(SA), the program performs fN^2 individual-node updates (involving
the movement of a single node from one module to another) and fN
collective updates (involving the merging of two modules and the
split of a module). The number "f" is the iteration factor. Large
values of f (1 or larger) will result, in general, in better results
(higher modularities) and longer execution times. The recommended
range for f is [0.1, 1], although smaller values may be needed for
large and/or dense networks. Note, also, that a minimum number of
iterations is imposed at each temperature, so that when f is very
small, the minimum number will be used instead of fN^2 or fN.

- Cooling factor (c): After the desired number of updates is done at a
certain temperature T, the system is cooled down to a new
temperature T'=cT, where c is the cooling factor. the cooling factor
must be strictly larger than 0 and strictly smaller than 1. In
general, values close to one will result in better results and
longer execution times. Recommended values of the cooling factor f
are [0.990, 0.999], although smaller values (0.95 or even 0.9) may
be needed for large and/or dense networks.

- Invert (0 or 1): If 0 (conversely, 1), the program will identify
modules in the first (second) set of nodes, that is, the first
(second) column in the input file.

Program output
----------------------------------------------------------------------

The program returns a file modules_bipart.dat, which is formally
identical to the modules.dat described above for netcarto.

----------------------------------------------------------------------
reliability
----------------------------------------------------------------------
Given a network observation, the programs in reliability:

1) reliability_links: evaluate the reliability of links

2) reliability_reconstruct: reconstruct the network

In case you use the results of the program in a publication, please
cite the following papers:

5. Guimera, R. & Sales-Pardo, M., Missing and spurious interactions
and the reconstruction of complex networks,
Proc. Natl. Acad. Sci. USA ?????? (2009).

Input parameters
----------------------------------------------------------------------
The programs take two arguments:

- Name of the network file: Name of the file that contains the
network. The file must be a list of links with the format:

n1 n2
n3 n4
. .
. .
. .

This represents a network with a link between nodes n1 and n2,
another between nodes n3 and n4, and so on. Nodes must be separated
by spaces.

- Seed for the random number generator: Must be a positive
integer. Since the reliability algorithms are stochastic, different
runs will yield, in general, slightly different different
results. Two runs with the same seed, though, should give the exact
same results.

Program output
----------------------------------------------------------------------

The "links" program generates two files: missing.dat and
spurious.dat. Each of these files has the format:

score12 n1 n2
score13 n1 n3
...

missing.dat contains all scores for links that are not observed in the
network. High scores in missing.dat correspond to links that are
likely to be missing.

spurious.dat contains all scores for links that are observed in the
network. Low scores in spurious.dat correspond to links that are
likely to be spurious.

The "reconstruct" program returns a file net_reconstructed.dat with
the reconstructed network.

----------------------------------------------------------------------
Utils
----------------------------------------------------------------------

Additionally, a few utility programs are also compiled and installed.

*countlinks netA: count the number of links in a network.

*netcompare netA netA: compares two networks.

*netprop netA: print a number of properties of a network.

*netrandomize netA: randomize an undirected unweighted network and
print result to standard output.

----------------------------------------------------------------------
CONTACT
----------------------------------------------------------------------

roger.guimera@urv.cat

License

cirtwill/rgraph

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Languages