This is a Python module implementing the algorithms developed in the paper Convex optimization for the densest subgraph and densest submatrix problems.
See also R-package and Matlab package.
The problem of identifying a dense submatrix is a fundamental problem in the analysis of matrix structure and complex networks. This code provides tools for identifying the densest submatrix of the fixed size in a given graph/matrix using first-order optimization methods.
See the tutorial below to get started.
The densub module contains the following:
-
plantedsubmatrix: function that generates binary matrix sampled from dense submatrix of particular size. -
mat_shrink: function for evaluating the matrix soft-threholding operator applied to vector of singular values (used in the$X$ -update step ofdensub.m) -
DenSub: class of ADMM algorithms for our relaxation of the densest subgraph and submatrix problems.
Like other python modules, densub can be used after moving it to your file path and loading it:
from densub import *
Note we can also load densub with a prefix using one of the following.
import densubimport densub as dsWe illustrate use of this package on three different types of data:
-
Using random matrices sampled from the planted dense
$m \times n$ submatrix model; -
Using random adjacency matrices of random graphs sampled from the planted dense subgraph model;
-
Using a real-world collaboration network.
We generate a random matrix with noise obscuring the planted submatrix using the function plantedsubmatrix and then use the solver from the DenSub class to recover the planted submatrix.
# Dimensions of the full matrix.
M, N = 100, 200
# Dimensions of the dense block.
m, n = 50, 40
# In-group and noise density.
q, p = 0.85, 0.25
# Make binary matrix with planted mn-submatrix
[A, X0, Y0] = plantedsubmatrix(M,N,m,n,p,q)After generating the random matrix with desired planted structure, we can visually represent the matrix and planted submatrix as two-tone images, where dark pixels correspond to nonzero entries, and light pixels correspond to zero entries.
The vizualization of the randomly generated matrix
We call the ADMM solver with default settings and visualize the output:
ds = DenSub(verbose=True)
[X,Y,Q,its] = ds.solve(A, m, n, 6/(np.sqrt(m*n)*(q-p)))The ADMM solver returns the optimal solutions
It must be noted that matrices
DenSub can also be used to solve the densest subgraph problem without modification.
To illustrate this process, we first make a random graph containing stochastic_block_model function from the Networkx package.
# Size of random matrix.
M, N = 60, 60
# Size of planted block.
m = 40
# Connectivity probabilities.
p = [[0.85, 0.25], [0.25, 0.25]]
# Sample the graph.
G = nx.stochastic_block_model(sizes=[m, M - m], p=p, seed=42)
# Generate the adjacency matrix.
AG = nx.adjacency_matrix(G).toarray()We can find this subgraph by calling DenSub. Here, we change the optimization parameters from their defaults for illustration purposes.
tau = 0.5
tol = 1e-5
maxits = 1000
m = 40
ds = DenSub(tau=tau, opt_tol=tol, maxiter=maxits, verbose=True)
[X,Y,Q,its] = ds.solve(AG, m, m, 8/m)As before, the DenSub solver converges to the matrix representation of the planted dense subgraph highlighted in the figures below.
Our implementation of DenSub as a class of optimization algorithms, distinguished only by iteration parameters, allows us to solve different problem instances with these same settings.
For example, we can solve the original instance of the planted submatrix problem with the settings we used to recover the planted subgraph using the same solver.
[X,Y,Q,its] = ds.solve(A, m=50, n=40, gamma=6/np.sqrt(200*0.6))Here, we converge to the same solution but in more iterations when using this choice of Augmented Lagrangian parameter
The following is an example on how one could use the package to analyze the collaboration network found in the JAZZ dataset (see Community Structure in Jazz. Gleiser and Danon. 2003). This is a social/collaboration network where each pair of musicians are linked if they have performed together. The maximum clique in this network contains 30 musicians.
# Read the graph from adjacency list file.
Gjazz = nx.read_adjlist("DEMO/jazz.txt")
Gjazz = nx.convert_node_labels_to_integers(Gjazz, first_label=0, ordering='default', label_attribute=None)
# Generate the adjacency matrix.
Ajazz = nx.adjacency_matrix(Gjazz).toarray()We are now ready to try to identify the densest submatrix of size 30 in this adjacency matrix. Here, we set DenSub.
m = 30
ds = DenSub(tau=0.35, opt_tol=1e-4, maxiter=2000, verbose=True)
[X,Y,Q,its] = ds.solve(Ajazz, m, m, 12/m)Our algorithm finds the maximum clique, corresponding to the group of musicians indexed by nonzero entries of
- Fork, clone, edit, commit, push, create pull request.
If you encounter a clear bug, please file a minimal reproducible example on github.





