# Algorithms for the Maximum Weight Connected $k$-Induced Subgraph Problem

E. Althaus, M. Blumenstock, A. Disterhoft, A. Hildebrandt and M. Krupp (2014)

Article pdf: https://pdfs.semanticscholar.org/b308/faada274c0f2a5c56eeb7d72608f1729765d.pdf

The paper discusses methods for finding connected subgraphs with specific characteristics in biochemical networks. It focuses on finding a maximum-weight connected subgraph induced by $k$ vertices. One of the methods discussed is based on integer linear programming (ILP).

## Motivation

The motivation for this paper is also rooted in bioinformatics, specifically the detection of differentially regulated pathways or subgraphs in biochemical networks, where the goal is to determine which parts of the network are most sensitive to environmental changes. The premise is very similar to our problem, in which the input is a vertex-weighted graph and each vertex is assigned a set of numerical values for various characteristics.

The paper builds on several earlier publications that focus on finding deregulated networks. ILP-based approaches to the problem are discussed in:
1. Backes et al.: An integer linear programming approach for finding deregulated subgraphs in regulatory networks. (2011)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315310/
2. Dittrich et al.: Identifying functional modules in protein–protein interaction networks: an integrated exact approach. (2008)
https://academic.oup.com/bioinformatics/article/24/13/i223/231653
3. Zhao et al.: Uncovering signal transduction networks from high-throughput data by integer linear programming. (2008)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2396433/

I have read the first paper, and the idea below is inspired by it.

## Maximum weight subgraph problems

Given a simple graph $G = (V,E)$ and edge weights $w : E \rightarrow \mathbb{R}$, find a subset $V' \subseteq V$ such that the subgraph induced by $V'$ is connected and has maximum total edge weight (i.e., $\sum_{e \in E \cap (V' \times V')} w(e)$ is maximized). Our problem is slightly different: it has vertex weights instead of edge weights. Combined with degree constraints (every vertex has degree at most 2), the connected subgraph selected on $V'$ is either a path or a cycle. Since the problem asks for one subgraph at a time, the relevant constraints (those forcing the subgraph to be connected) should fit well into the iterative plasmid MILP approach.
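To make the objective concrete, here is a brute-force sketch of the edge-weighted problem as defined above. It enumerates every vertex subset, so it is only usable on toy graphs, and it is not the method of the paper. The representation (a dict mapping `frozenset({u, v})` edges to weights) and all function names are hypothetical, chosen for illustration.

```python
from itertools import combinations


def induced_weight(subset, weights):
    """Sum of w(e) over edges e with both endpoints in `subset`."""
    return sum(w for edge, w in weights.items() if edge <= subset)


def is_connected(subset, weights):
    """Return True if the subgraph induced by `subset` is connected (iterative DFS)."""
    if not subset:
        return False
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for edge in weights:
            # Follow only edges of the induced subgraph that touch u.
            if u in edge and edge <= subset:
                (v,) = edge - {u}
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == subset


def brute_force_max_weight_connected(vertices, weights):
    """Try every non-empty subset; exponential, so only for toy instances."""
    best_weight, best_set = float("-inf"), None
    for r in range(1, len(vertices) + 1):
        for combo in combinations(vertices, r):
            subset = frozenset(combo)
            if is_connected(subset, weights):
                w = induced_weight(subset, weights)
                if w > best_weight:
                    best_weight, best_set = w, subset
    return best_weight, best_set


# Toy instance: the negative edge b-c makes it better to leave b out.
W = {
    frozenset({"a", "b"}): 3,
    frozenset({"b", "c"}): -5,
    frozenset({"c", "d"}): 4,
    frozenset({"a", "c"}): 1,
}
print(brute_force_max_weight_connected(["a", "b", "c", "d"], W))
```

On this instance the best connected subset is $\{a, c, d\}$ with weight 5. A vertex-weighted variant, as in our problem, would only change `induced_weight` to sum vertex weights instead.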

## Approach by Backes et al

The ILP formulation used in the paper is taken from Backes et al. Both papers are interested in finding a connected subgraph on a vertex set of exactly $k$ vertices.

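The constraints we would like to adapt are the connectivity constraints: for every set $C \subseteq V$ with $|C| < k$, if a vertex of $C$ is selected, then at least one vertex adjacent to $C$ (but outside it) must also be selected. Formally,
$$\sum_{w \in In(C)} y_w \geq y_v \quad \text{for all } v \in C \text{ and all } C \subseteq V \text{ with } |C| < k,$$
where $y_v \in \{0,1\}$ indicates whether vertex $v$ is selected and $In(C)$ is the set of vertices in $V \setminus C$ with at least one edge incident to $C$. These constraints hold for every feasible solution because a set with fewer than $k$ vertices cannot contain an entire $k$-vertex solution, so a connected solution that contains a vertex of $C$ must also contain some vertex of $In(C)$.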
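A direct check of this constraint family for one set $C$ and a 0/1 selection $y$ might look as follows. The adjacency-dict graph representation and the function names (`in_set`, `violated_for_set`) are hypothetical; the sketch only evaluates the inequality for a given $C$ and is not a separation algorithm.

```python
def in_set(C, adj):
    """In(C): vertices outside C that have at least one edge into C."""
    C = set(C)
    return {w for v in C for w in adj[v]} - C


def violated_for_set(C, y, adj, k):
    """Vertices v in C for which sum_{w in In(C)} y_w >= y_v fails.

    Only sets with |C| < k are constrained, so larger sets return [].
    """
    C = set(C)
    if len(C) >= k:
        return []
    lhs = sum(y[w] for w in in_set(C, adj))
    return [v for v in C if lhs < y[v]]


# Path graph 1-2-3-4-5 with k = 3.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
disconnected = {1: 1, 2: 1, 3: 0, 4: 1, 5: 0}  # selects {1, 2} and {4}
connected = {1: 0, 2: 1, 3: 1, 4: 1, 5: 0}     # selects the path 2-3-4

print(violated_for_set({1, 2}, disconnected, adj, 3))  # both 1 and 2 violate
print(violated_for_set({2, 3}, connected, adj, 3))     # no violation
```

For the disconnected selection, $In(\{1,2\}) = \{3\}$ is unselected, so the inequality fails; for the connected path, vertex 4 in $In(\{2,3\})$ is selected and the inequality holds.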

However, the number of such constraints is exponential in $|V|$. They therefore use a branch-and-cut procedure that adds a constraint only when it is needed, i.e., when the current solution graph has more than one connected component.
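For integer solutions, this separation step reduces to computing connected components: if the selected vertices split into several components, every component $C$ with $|C| < k$ yields a violated inequality, because no vertex of $In(C)$ can be selected (otherwise it would belong to the same component). The sketch below assumes the same hypothetical adjacency-dict representation and hypothetical function names; separating fractional LP solutions, which a full branch-and-cut also needs, is a harder problem and is not shown.

```python
def components(selected, adj):
    """Connected components of the subgraph induced by `selected`."""
    selected = set(selected)
    seen, comps = set(), []
    for s in selected:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in selected and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def separate_integer_solution(selected, adj, k):
    """Return cuts (v, C, In(C)), each encoding sum_{w in In(C)} y_w >= y_v,
    that the disconnected integer solution `selected` violates."""
    comps = components(selected, adj)
    if len(comps) <= 1:
        return []  # connected (or empty): no cut needed
    cuts = []
    for C in comps:
        if len(C) < k:
            boundary = {w for u in C for w in adj[u]} - C
            v = next(iter(C))  # any vertex of C has y_v = 1 while the left side is 0
            cuts.append((v, frozenset(C), frozenset(boundary)))
    return cuts


adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
for v, C, boundary in separate_integer_solution({1, 2, 4}, adj, 3):
    print(f"add cut: sum of y over {sorted(boundary)} >= y_{v}  (C = {sorted(C)})")
```

In a solver, each returned tuple would be added as a lazy constraint and the model re-solved; a connected solution produces no cuts.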

## Takeaway: How can we use this idea?

We cannot adopt exactly the same constraints, because the parameter $k$ is not well defined for us. We can, however, use the underlying principle. The price is that we must give up the cyclic structure that plasmids typically display; in return, we are guaranteed that each plasmid is a single path rather than a path together with a collection of cycles.

The current formulation for plasmid assembly already has:
1. Degree constraints forcing every vertex to have degree at most 2.
2. A constraint forcing exactly two vertices to have degree 1.

Note that in our case, the vertices are contig extremities.
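The two existing constraints can be checked on a set of selected edges as below. The edge-list representation, the function name, and the vertex labels are hypothetical placeholders (real vertices would be contig extremities). The example shows why these constraints alone are not enough: a path plus a disjoint cycle also passes the check.

```python
from collections import Counter


def satisfies_current_constraints(selected_edges):
    """Degree <= 2 for every vertex and exactly two vertices of degree 1."""
    degree = Counter()
    for u, v in selected_edges:
        degree[u] += 1
        degree[v] += 1
    max_ok = all(d <= 2 for d in degree.values())
    ends = sum(1 for d in degree.values() if d == 1)
    return max_ok and ends == 2


# Placeholder labels, not real contig extremities.
path = [("a", "b"), ("b", "c")]
cycle = [("x", "y"), ("y", "z"), ("z", "x")]

print(satisfies_current_constraints(path))          # True: a single path
print(satisfies_current_constraints(path + cycle))  # True: path plus a cycle also passes
```

Every vertex on a cycle has degree exactly 2, so cycles never add degree-1 vertices and are invisible to both constraints.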

Our remaining problem is that a single plasmid solution can contain cyclic components: the degree constraints above admit one path together with any number of disjoint cycles. If we keep adding constraints that eliminate each cycle found in the current solution, we eventually reach a solution that is a single path. This is exactly the idea that we used to remove circular chromosomes in the SPP.
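One possible way to realise this loop is sketched below: after each MILP solve, split the selected edges into components and, for each cycle component $C$, add a cut that forbids selecting a cycle on exactly those vertices. We assume the standard subtour-elimination form $\sum_{e \in E(C)} x_e \leq |C| - 1$, where $x_e$ is an (assumed) 0/1 edge variable and $E(C)$ contains all candidate edges with both endpoints in $C$; this is a swapped-in textbook choice, not necessarily the constraint used in the SPP. It is valid for single-path solutions because any acyclic edge set on the vertices of $C$ has at most $|C| - 1$ edges, and it is violated by the current cycle, which uses $|C|$ such edges. Function names are hypothetical and the solver call itself is omitted.

```python
from collections import defaultdict


def split_paths_and_cycles(selected_edges):
    """Split an edge selection with max degree 2 into path and cycle components.

    A connected component in which every vertex has degree 2 is a cycle;
    any other component is a path.
    """
    adj = defaultdict(set)
    for u, v in selected_edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, paths, cycles = set(), [], []
    for s in list(adj):
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        is_cycle = all(len(adj[x]) == 2 for x in comp)
        (cycles if is_cycle else paths).append(comp)
    return paths, cycles


def cycle_elimination_cuts(selected_edges, candidate_edges):
    """For each cycle component C, return (E(C), |C| - 1) encoding
    sum_{e in E(C)} x_e <= |C| - 1 over all candidate edges inside C."""
    _, cycles = split_paths_and_cycles(selected_edges)
    cuts = []
    for C in cycles:
        inside = [e for e in candidate_edges if e[0] in C and e[1] in C]
        cuts.append((inside, len(C) - 1))
    return cuts


candidates = [("a", "b"), ("b", "c"), ("c", "x"), ("x", "y"), ("y", "z"), ("z", "x")]
selected = [("a", "b"), ("b", "c"), ("x", "y"), ("y", "z"), ("z", "x")]
for edges, rhs in cycle_elimination_cuts(selected, candidates):
    print(" + ".join(f"x[{u},{v}]" for u, v in edges), "<=", rhs)
# prints: x[x,y] + x[y,z] + x[z,x] <= 2
```

In the iterative MILP, these cuts would be added and the model re-solved until `split_paths_and_cycles` reports no cycles. Once a cut for a vertex set $C$ is in the model, no cycle on exactly $C$ can reappear, and there are finitely many vertex sets, so the loop terminates.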