Benchmark for the Correlation Clustering Resolution methods
- Copyright 2020-21 Nejat Arınık
BenchmarkCC is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information see the file LICENCE
- GitHub repo: https://github.com/arinik9/BenchmarkCC
- Contact: Nejat Arınık arinik9@gmail.com
BenchmarkCC aims to regroup exact and heuristic methods in the same repository in order to solve the Correlation Clustering (CC) problem. Exact methods can also enumerate all optimal solutions. Moreover, we can also evaluate heuristic methods based on how they explore the space of optimal solutions. See Chapters 2 and 5 in [Arınık'21] for more details.
Exact methods rely on two different ILP formulation types: 1) decision variables defined on vertex-pair (Fv: "vertex" formulation type) or 2) edge (Fe: "edge" formulation type). If we denote "n" by the number of vertices in the graph and "m" by the number of edges, there are (n(n-1)/2) variables in Fv, whereas there are m variables in Fe.
The formulation Fv contains n*(n-1)*(n-2)/2 triangle inequalities. In the literature, the authors of [Miyauchi'18] show that we do not need to consider all these triangle inequalities, when searching for a single optimal solution. We call F*v this reduced form of formulation. Finally, we call F*e the formulation based on Fe and defined only on cycle inequalities with a single negative edge. See Chapter 2 in [Arınık'21] for more details.
- Finding a single optimal solution of a given instance
- C&B(Fv): Cut&Branch based on the Fv formulation (source code)
- C&B(F*v): Cut&Branch based on the F*v formulation (source code)
- B&C(Fe): Branch&Cut based on the Fe formulation (source code)
- BDCC(F*e): Benders Decomposition based on the F*e formulation (available upon request from the authors)
- Note that BDCC fails to complete the execution process for some signed graphs, it is not very stable.
- Enumerating all optimal solutions of a given instance
- OneTreeCC (source code)
- OneTreeCC(Fv): OneTreeCC based on the Fv formulation
- OneTreeCC(F*v): OneTreeCC based on the F*v formulation
- OneTreeCC(Fe): OneTreeCC based on the Fe formulation
- EnumCC (source code)
- EnumCC(v, k): EnumCC based on the Fv formulation
- EnumCC(F*v, k): EnumCC based on the F*v formulation
- EnumCC(Fe, k): EnumCC based on the Fe formulation
- OneTreeCC (source code)
- (stochastic) Iterated Local Search: ILS-CC_l1_a_g_p3_t<TIME_LIMIT>_i<ITER_NUMBER> (available upon request from the authors)
- (stochastic) Greedy Randomized Adaptive Search Procedure: GRASP-CC_l1_a_g_t<TIME_LIMIT>_i<ITER_NUMBER>_n1 (available upon request from the authors)
- (stochastic) Variable neighborhood search: Brusco-VNS-CC_t<TIME_LIMIT> (source code)
- (stochastic) Tabu Search: TS-CC_t<TIME_LIMIT> (source code)
- (stochastic) Simulated Annealing: SA-CC_t<TIME_LIMIT> (source code)
- (stochastic) Memetic: MLMSB-CC_r1_t<TIME_LIMIT> (available upon request from the authors)
- (deterministic) Message Passing, followed by Greedy Additive Edge Contraction and Kernighan-Lin with joins: MP-GAEC-KLj-CC_t<TIME_LIMIT> (source code)
- (deterministic) Iterative Cycle Packing, followed by Greedy Additive Edge Contraction and Kernighan-Lin with joins: ICP-GAEC-KLj-CC_t<TIME_LIMIT> (source code)
- (deterministic) Greedy Additive Edge Contraction, followed by Kernighan-Lin with joins: GAEC-KLj-CC_t<TIME_LIMIT> (source code)
Input signed networks are generated through this random signed network generator. You can also download already obtained partition results from Datasets 2.2 and 5.1 on Figshare, and place them into the out
folder.
- Install the
R
language - Install the following R packages:
- Install
IBM CPlex
. Tested with the version 12.8 and 20.1. Set correctly the variableCPLEX.BIN.PATH
indefine-algos.R
(e.g./opt/ibm/ILOG/CPLEX_Studio128/cplex/bin/x86-64_linux/
). Note that BDCC runs only with Cplex 12.8.- For ubuntu, type the following command:
sudo ./cplex_studio<YOUR_VERSION>.linux-x86-64.bin
- The default installation location for education version is:
/opt/ibm/ILOG/CPLEX_Studio<YOUR_VERSION
. - The default installation location for trial version is:
/opt/ibm/ILOG/CPLEX_Studio_Community<YOUR_VERSION/cplex/bin/x86-64_linux/
.
- The default installation location for education version is:
- For ubuntu, type the following command:
- Download exact and heuristic methods, as indicated above, and place them into the
lib
folder.
- Set correctly the variable
CPLEX.BIN.PATH
in the filesrc/define-algos.R
. - Set correctly the MATLAB installation path in
get.MLMSB.command()
,get.BDCC.command()
andget.ZONOCC.command()
in the filesrc/define-algos.R
- Open the
R
console. - Set the current directory as the working directory, using
setwd("<my directory>")
. - Run the main script
src/main.R
.
-
[Arınık'21] N. Arınık, Multiplicity in the Partitioning of Signed Graphs. PhD thesis in Avignon Université (2021).
-
[Miyauchi'18] A. Miyauchi, T. Sonobe, and N. Sukegawa, Exact Clustering via Integer Programming and Maximum Satisfiability, in: AAAI Conference on Artificial Intelligence 32.1 (2018).