Graph compression with FCA and ASP. New versions at:

Aluriak/PowerGrASP-1

PowerGrASP

This repository is inactive. It was the source of the powergrasp package up to version 0.7.0. For 0.8.0 and higher, see the currently active repository.

A Python tool for Power Graph analysis, based on Answer Set Programming (ASP) solving and formal concept analysis (FCA).

More technical documentation about PowerGrASP can be found in the documentation file. PowerGrASP results, compared to the existing BIOTEC tool, can be found in the results file.

Installation & Requirements

pip install powergrasp

Just make sure that a clingo binary is available in your $PATH.
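The PATH requirement can be checked from Python before running a compression. The following is a minimal sketch using only the standard library; the helper name `find_clingo` is hypothetical and is not part of the powergrasp package.

```python
import shutil


def find_clingo(binary_name="clingo"):
    """Return the absolute path of the binary if it is in $PATH, else None.

    NOTE: hypothetical helper for illustration, not a powergrasp function.
    """
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_clingo()
    if path is None:
        print("clingo not found in $PATH; install it before using powergrasp")
    else:
        print("clingo found at", path)
```

If the lookup returns None, the solver is not reachable from the current environment and the compression is unlikely to start.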

Basic use

PowerGrASP can be used as a script:

python3 -m powergrasp --graph-data=human_proteom.lp --output-file=for_cytoscape.bbl

Or it can be embedded in any Python program:

import powergrasp
powergrasp.recipes.powergraph('human_proteom.lp', 'for_cytoscape.bbl')

Other recipes

Some other recipes are implemented, including oriented graph compression, and alternative scoring methods.

General overview

The compression is configurable through command line arguments or through the parameters of the compress function. The ASP source code in use can be changed, interactive mode can be enabled,… Please look at the help and the docstrings:

# in terminal
python3 -m powergrasp --help
# in python
>>> help(powergrasp)
# or, in the PowerGrASP git directory, with make
make help

I/O

NB: Support for another input or output format can be added by implementing a new Converter class (see powergrasp/converter/).

Input file

PowerGrASP doesn't generate logging for pleasure: it actually processes input data, if provided. The currently supported input file formats are:

  • ASP: atoms edge/2, with edge(X,Y) describing a link between nodes X and Y.
  • SBML: a regular SBML file, where species and reactions are treated as nodes.
  • GML: a regular Graph Modeling Language file, readable by the networkx Python module.
  • GraphML: a regular GraphML file, readable by the networkx Python module.
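To illustrate the ASP input format, here is a minimal sketch that turns a list of node pairs into edge/2 facts. The function name `edges_to_asp` is hypothetical, and emitting node names as double-quoted ASP strings is an assumption made so that arbitrary labels stay valid terms; labels that themselves contain a double quote are not handled.

```python
def edges_to_asp(edges):
    """Return one ASP fact edge(X,Y). per line for an iterable of (x, y) pairs.

    NOTE: hypothetical helper for illustration, not part of powergrasp.
    Node names are written as quoted strings (an assumption of this sketch).
    """
    return "".join('edge("{}","{}").\n'.format(x, y) for x, y in edges)


if __name__ == "__main__":
    # A small star graph: node "hub" linked to three leaves.
    print(edges_to_asp([("hub", "a"), ("hub", "b"), ("hub", "c")]), end="")
```

The resulting text can be saved with an .lp extension and passed through the --graph-data option.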

Examples of input files can be found in the powergrasp/test/ directory. Some of these are simple and used for unit testing. Take a look at ddiam, coli or star.

Other formats will be supported in the future.

Output

Bubble formatted file

The output of PowerGrASP is a Bubble formatted file. This file can be used to get a visualization of the compressed graph. Bubble is a dedicated format designed by Royer et al. for describing power graphs.

Bubble files can be used in several ways:

Cytoscape

Displaying a power graph in Cytoscape is made possible by the CyOog plugin, which handles the Bubble format. To use the CyOog plugin, Cytoscape must be in version 2.x.

Caveats

CyOog does not handle edges that are printed multiple times, nor oriented edges. Thus, oriented compression is not efficiently represented in Cytoscape.

Oog Command line tool

The BIOTEC team also released a command line tool for power graph analysis. This tool can render a bubble file without using Cytoscape, with something like:

java -jar Oog.jar -inputfiles=path/to/bubble.bbl -img -f=png

Add &>/dev/null to prevent any logging output.
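The same call can be scripted from Python. This is a minimal sketch that only reuses the flags shown in the command above; the function names `oog_command` and `render_bubble` are hypothetical, and the default location of Oog.jar is an assumption.

```python
import subprocess


def oog_command(bubble_path, jar_path="Oog.jar"):
    """Build the argument list for the Oog call shown above.

    NOTE: hypothetical helper; jar_path is an assumed location of Oog.jar.
    """
    return [
        "java", "-jar", jar_path,
        "-inputfiles={}".format(bubble_path),
        "-img",
        "-f=png",
    ]


def render_bubble(bubble_path, jar_path="Oog.jar"):
    """Run Oog and discard its output, like appending &>/dev/null.

    Returns the exit code of the java process.
    """
    completed = subprocess.run(
        oog_command(bubble_path, jar_path),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when the bubble file path contains spaces.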

Bubbletools

This Python library can help generate any type of output from bubble files. It provides some examples, like GEXF for Gephi (which has stopped handling it since its last release). It is used by PowerGrASP itself to convert bubble output to other formats.

Standard output management

By default, PowerGrASP writes a lot of output to stdout, essentially for debugging and compression tracking. The loglevel option makes it possible to control this behavior:

python3 -m powergrasp --graph-data=tests/proteome_yeast_2.lp --loglevel=warning

This will block all output with a priority strictly lower than warning. The available levels come from the Python logging API:

log level        | PowerGrASP
-----------------|-------------------
critical         | totally silent
error            | very rarely disturbing
warning          | rarely disturbing
info             | trackable
debug            | trackable with high verbosity
notset           | kraken released

Please note that some options (notably count-models and count-cc) are completely independent of this logging management, as they write directly to the standard output.
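The filtering rule can be illustrated with the numeric values of the standard logging module: a message is emitted only if its level is greater than or equal to the chosen threshold. The sketch below is not PowerGrASP code; the function name `emitted_levels` is hypothetical.

```python
import logging

# Named message levels of the standard logging module, highest priority first.
LEVELS = ["critical", "error", "warning", "info", "debug"]


def emitted_levels(loglevel):
    """Return the message levels that pass the given threshold.

    NOTE: hypothetical helper for illustration, not a powergrasp function.
    """
    threshold = getattr(logging, loglevel.upper())
    return [name for name in LEVELS
            if getattr(logging, name.upper()) >= threshold]


if __name__ == "__main__":
    for level in LEVELS + ["notset"]:
        print(level, "->", ", ".join(emitted_levels(level)))
```

With the warning threshold, only critical, error and warning messages pass; with notset, whose numeric value is 0, every level passes.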

Statistics

The compression computes some statistics about itself, and prints the final results to the standard output at the end of the compression. With some arguments, you can also display a colored plot:

python3 -m powergrasp --graph-data=tests/proteome_yeast_2.lp --stats-file=data/statistics.csv --plot-stats

Instead of displaying it, powergrasp can save it as a PNG file (note that the --plot-stats flag is not necessary when the --plot-file option is given):

python3 -m powergrasp --graph-data=tests/proteome_yeast_2.lp --stats-file=data/statistics.csv --plot-file=data/statistics.png

Answer Set Programming

ASP is a declarative logic programming language, designed for solving combinatorial problems (like graph compression). The implementation used in this project is the Potsdam Answer Set Solving Collection (Potassco).

All the ASP source code needed by PowerGrASP can be found in the powergrasp/ASPsources/ directory.

Interests & References

The Power Graph approach to graph compression allows a lossless compression with an emphasis on biological meaning. Indeed, the formal concepts used by Power Graph analysis have a meaning in biology, especially in the case of proteomes.

Any graph can be compressed through Power Graph analysis, and will be more readable once compressed, but interactomes, at least, also gain in interpretability.

The main inspiration of PowerGrASP, Power Graph Analysis:

Loïc Royer, Matthias Reimann, Bill Andreopoulos, and Michael Schroeder.
Unraveling Protein Networks with Power Graph Analysis.
PLoS Comput Biol, 4(7):e1000108, July 2008.

Usage of Power Graph Analysis:

Loïc Royer, Matthias Reimann, A. Francis Stewart, and Michael Schroeder.
Network Compression as a Quality Measure for Protein Interaction Networks.
PLoS ONE, 7(6):e35729, June 2012.

Yun Zhang, Charles A Phillips, Gary L Rogers, Erich J Baker, Elissa J Chesler, and Michael A Langston.
On finding bicliques in bipartite graphs: a novel algorithm and
its application to the integration of diverse biological data types.
BMC Bioinformatics, 15(1):110, 2014.

ASP through the Potassco implementation:

M. Gebser, R. Kaminski, B. Kaufmann, M. Ostrowski, T. Schaub, and M. Schneider.
Potassco: The Potsdam answer set solving collection.
AI Communications, 24(2):107–124, 2011.