Compress graphs
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
out
powergrasp
test
.gitignore
MANIFEST.in
Makefile
README.mkd
setup.cfg
setup.py

README.mkd

PowerGrASP

Graph compression.

Note that this is a full reimplementation of PowerGrASP, taking advantage of ASP and Python lifting and simplifications. For the version published in 2017, see this repository.

CLI

python -m powergrasp mygraph.gml -o compressed.bbl

help !

python -m powergrasp --help

API

import powergrasp
with open('compressed.bbl', 'w') as fd:
    for line in powergrasp.compress_by_cc('mygraph.gml'):
        fd.write(line + '\n')

help !

Sorry, no technical doc for the moment.

Bubble files usage

The bubble file is the main output of PowerGrASP, and describes a power graph. The bubble is a format handled by the CyOog plugin for Cytoscape 2, allowing one to load a bubble formatted file and to visualize the corresponding graph. As a consequence, you have to install Cytoscape 2, and put the CyOog.jar file into path/to/cytoscape/install/dir/plugins/.

Another way to go is to convert the bubble to other formats handling hierarchical graphs, like gexf, dot or the API of cytoscape.js. This is a task implemented by bubbletools.

Configuration

PowerGrASP has some configuration values, that can be overwritten by a powergrasp.cfg file placed in the working directory.

The configuration may be printed in stdout using:

python -m powergrasp --config

The config file may be either in ini or json format, as shown in test/powergrasp.oneshot.cfg and test/powergrasp.manyoptions.cfg.

Configuration allow user to define how much text will be outputed by powergrasp, and to tune the core compression and related optimizations.

See complete list in the options section.

installation

pip install powergrasp

On random error, try adding --no-cache-dir at the end of the command, and check that you are using python 3.

You must have clingo in your path. Depending on your OS, it might be done with a system installation, or through downloading and (compilation and) manual installation.

Changelog

  • 8.17
  • 8.11
    • slight bounds improvements for non-star-biclique motif
    • statistics/timers now provides time needed to search for each motif
    • support for recipes, that small file that allows to define the compressions to do
  • 8.10
    • option parallel cc compression to enable parallel compression of connected components
    • option bubble embeds cc, to put each cc in a dedicated powernode
    • option bubble with simple edges, to keep or discard the simple edges from output
    • statistics on time and compression metrics are computed by cc and for all graph. See new options.
    • option cc statistic file, to retrieve stats and metrics for connected components.
    • option global statistics, to compute statistics/metrics over all connected components.
    • option bubble with statistics, to include stats and metrics in output bubble file
    • motif type order option, allowing user to tune the order in which motifs are searched.
    • parallel motif search option, making motifs search in different threads, and the overhaul compression much slower.
    • CLI: --config flag to print the full configuration in stdout
    • handling spaces instead of _ in fields names for INI format
    • do not filter out nodes alone in their connected component (cf keep-single-nodes option)
    • perf gain: the first motif to reach the best score is compressed, instead of the last
    • improve handling of statistic files, that now contains cc idx and motif bounds
    • expose __version__ attribute in the package
    • add a list of available options in the README
    • graph filtering: optimization leading to general performance gain when enabled
    • improvements on {low,upp}erbounds computations
  • 8.9
    • many bugfixes
    • CLINGO_OPTIONS can be a dict, mapping motif name to arbitrary parameters
  • 8.8
    • bugfix: lowerbound can't go under 2 (3 for cliques)
    • clique search: lowerbound is initially computed as the maximal degree of node of clustering coefficient equals to 1
    • use new versions of clyngor and bubbletools
  • 8.7
    • raise error when an invalid key is found in config file
    • StarSearch is dedicated to find stars, simplifying BicliqueSearch's job (enabled by default, user may use Biclique instead of both Star and NonStarBiclique)
    • specify a prefix for all (power) nodes with OUTPUT_NODE_PREFIX config field
    • arbitrary options for clingo using CLINGO_OPTIONS config field
    • bugfix on string options given in ini file, when unecessary quotes are kept
  • 8.6
    • bugfixes for INI format
  • 8.5
    • config may now be given in INI format
    • improve logging
    • config allow optimization target between memory or CPU (currently define a constraint implementation in ASP)
    • bugfix about edges yielded by ASP and multithreading parameter
    • no statistics file to write by default
    • timer for output writing
    • config allow user to define the number of CPU for clingo to use
    • simplify ASP code by removing useless arg in block/4 atoms
  • 8.4
    • replace ASP code constraint to achieve much more efficient grounding
    • configuration allow for improved lowerbound computation of bicliques
    • generated bubble allow client to give heading comments
    • handle keyboard interruption during search with grace
    • implement timers and statistics recording
    • enable multishot motif search by default
    • various bugfixes

Recipes

A recipe is a description of the motifs to compress. For instance, the data/recipe-test.txt:

biclique	c	g h i
biclique	a b	d e
biclique	a b	f
biclique	c	d e

Using the --recipe option in CLI, user can provide a recipe for its own graph, allowing to specify prior motifs to search. The order of second and third columns does not have any influence.

The first column usually contains the motif types separated by comma (altough it is not necessary), and allows to provide some options (separated by comma):

  • primer: allows the program to extend the given motif if possible
  • breakable: allows the program to compress the line with multiple motifs
  • optional: if not found, ignore instead of stopping
  • last: stop compression here (useful to avoid the compression process to take over)

The recipe file will be read line by line. Once all lines have been applied, the normal compression compression takes over (unless last option is used).

For a living example, see data/recipe-option-test.txt.

Options

Description of all powergrasp options. All values example are given for INI format.

test integrity

Run some integrity tests during runtime to ensure that compression is working well. May slow the compression a lot, and is mainly a debugging tool.

Default value:

test_integrity = true

show story

Print in stdout the main steps of compression. Default value:

show_story = true

show motif handling

Print in stdout the motif transformation. Default value:

show_motif_handling = false

timers

Maintain and print in stdout some timers. Default value:

timers = true

statistics file

A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if timers option is enabled). Default value:

statistics_file = None

Example value:

statistics_file = statistics.csv

cc statistic file

Filename in which the statistics about each connected component will be written. Data will contain one row for each connected component, with the following data:

  • connected component index
  • the convertion rate
  • the edge reduction
  • the compression rate
  • time needed to compress the connected component (if TIMERS is enabled)

Default value stands for no file:

cc_statistic_file = None

Example value:

cc_statistic_file = statistics-cc.csv

global statistics

When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression. Default value:

global_statistics = True

bubble with statistics

When enabled, output bubble will contain statistics about each connected component and the overall graph (if global_statistics is enabled). Default value:

bubble_with_statistics = True

bubble for each step

Generate and save a bubble representation of the graph at each step. Mainly used for debugging. Default value:

bubble_for_each_step = false

output node prefix

Prefix to add to all (power)nodes names in output. Default value:

output_node_prefix = ''

show debug

Show full trace of the compression. Useful for debugging. Default value:

show_debug = False

covered edges from ASP

Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker. Default value:

covered_edges_from_asp = False

bubble with nodes

Nodes and sets are optional in the output bubble. Default value:

bubble_with_nodes = True
bubble_with_sets = True

bubble poweredge factor

Edges in bubble are associated to a factor. Default value:

bubble_poweredge_factor = '1.0'
bubble_edge_factor = '1.0'

bubble embeds cc

If enabled, put each connected component in a dedicated powernode. Default value:

bubble_embeds_cc = no

bubble simplify quotes

When possible, delete the quotes around identifiers in the bubble. May lead to node name collision. Default value:

bubble_simplify_quotes = True

bubble with simple edges

If disabled, will discard simple (i.e. non-power) edges of output. Default value:

bubble_with_simple_edges = True

config file

Load options found in given config file, if it exists. Default value:

config_file = 'powergrasp.cfg'

multishot motif search

Search for multiple motif in a single search. Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable. Default value:

multishot_motif_search = True

biclique lowerbound maxnei

Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered. Default value:

biclique_lowerbound_maxnei = 2

clingo options

Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options). Default value:

clingo_options = {}

Give a particular configuration for bicliques search:

clingo_options = {'no-star-biclique': '--configuration=handy'}

clingo multithreading

Number of CPU available to clingo, or 0 for autodetect number of CPU. Default value:

clingo_multithreading = 1

Use as many thread there are available CPU:

clingo_multithreading = 0

Specify a competing search between 4 threads:

clingo_multithreading = '4,compete'

parallel cc compression

To use to compress connected components in different processes. Default value (optimized in memory):

parallel_cc_compression = 1

Compress with 4 processes :

parallel_cc_compression = 4

Compress with one process for each connected component :

parallel_cc_compression = 0

use star motif

Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs. Default value:

use_star_motif = True

optimize for memory

When possible, prefer memory over CPU. Default value:

optimize_for_memory = False

graph filtering

Ignore edges dynamically determined as impossible to compress. Default value:

graph_filtering = True

keep single nodes

If a node is found to be alone in its connected component, it will be kept nonetheless. Default value:

keep_single_nodes = True

parallel motif search

Use multithreading to search for all motifs in the same time, instead of sequentially. Interestingly, this yields very poor results, and it's even worse with multiprocessing.

parallel_motif_search = False

motif type order

Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.

motif_type_order = star,clique,non-star-biclique,biclique

Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:

motif_type_order = greatest-upperbound-first

Or any variation with worst, lowerbound and last.