Skip to content

[Model] PrizeCollectingSteinerForest #1026

@zhangjieqingithub

Description

@zhangjieqingithub

Motivation

The catalog already has Steiner-tree models, but it does not have the biology-facing prize-collecting Steiner forest variant used to recover multiple pathway components at once.

This model is useful because it:

  • generalizes prize-collecting Steiner tree from one connected tree to a forest,
  • keeps the paper's node-prize / edge-cost objective directly in the model,
  • separates the biological model from the artificial-root trick, which should live in a reduction rule rather than in the base problem definition.

Associated rule:

Definition

Name: PrizeCollectingSteinerForest
Reference:

  • Nurcan Tuncbag et al., "Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem," Journal of Computational Biology 20(2):124-136, 2013. https://doi.org/10.1089/cmb.2012.0092
  • Nurcan Tuncbag et al., "Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem," RECOMB 2012, LNBI 7262, pp. 287-301, 2012. https://doi.org/10.1007/978-3-642-29627-7_31

For a network
G = (V, E, c, p)
with nonnegative edge costs c(e) >= 0, nonnegative vertex prizes p(v) >= 0, and parameters beta >= 0, omega >= 0, find a forest subgraph
F = (V_F, E_F)
minimizing
beta * sum_{v notin V_F} p(v) + sum_{e in E_F} c(e) + omega * kappa(F),
where kappa(F) is the number of tree components of F.

Important conventions for this proposal:

  • singleton vertices are allowed as one-vertex tree components,
  • the empty forest is feasible,
  • at the paper level, the input network may be directed or undirected,
  • in the directed case, "forest" means acyclic after forgetting edge directions.

So this is a minimization problem.

Variables

Let n = |V| and m = |E|.

  • Count: n + m binary variables
  • Per-variable domain: {0,1}
  • Meaning:
    • x_v = 1 iff vertex v belongs to V_F
    • y_e = 1 iff edge or arc e belongs to E_F

A configuration is feasible iff the selected edges are incident only to selected vertices and the selected subgraph is a forest. Singleton selected vertices are allowed.

Schema (data type)

Type name: PrizeCollectingSteinerForest

Paper-level variants:

  • undirected networks
  • directed networks

Recommended first concrete repo variant:

  • SimpleGraph initially
  • weight types i32 and f64
  • directed-network support can be added later as a variant extension
Field Type Description
graph G The underlying network
vertex_prizes Vec<W> Vertex prizes p : V -> R_{>= 0}
edge_costs Vec<W> Edge or arc costs c : E -> R_{>= 0}
beta W Tradeoff coefficient on omitted prizes
omega W Cost per tree component

Complexity

A conservative exact baseline is brute-force over vertex selections and edge selections, checking whether the chosen subgraph is a forest and evaluating the objective.

This gives a conservative worst-case bound
O(2^(num_vertices + num_edges) * poly(num_vertices + num_edges))

so the registry complexity string should be
2^(num_vertices + num_edges).

I have not verified a sharper exact worst-case bound for this specific paper-style PCSF variant, so the proposal should use the conservative baseline above.

References for the model and its practical solver context:

Extra Remark

This is not the standard demand-pair penalty version of PCSF.

The intended model is the biology-paper variant with:

  • node prizes,
  • edge costs,
  • an explicit component penalty omega * kappa(F).

The artificial root is not part of the base model. It belongs in the companion reduction rule.

Reduction Rule Crossref

How to solve

Example Instance

Take the undirected path
0 - 1 - 2
with:

  • edge costs
    • c(0,1) = 1
    • c(1,2) = 6
  • vertex prizes
    • p(0) = 5
    • p(1) = 2
    • p(2) = 5
  • beta = 1
  • omega = 2

Expected Outcome

An optimal forest is:

  • V_F = {0,1,2}
  • E_F = {(0,1)}

So the forest has two components:

  • the tree on {0,1}
  • the singleton tree {2}

Its objective value is:

  • omitted-prize term: 0
  • edge-cost term: 1
  • component term: 2 * 2 = 4

Total:
1 + 4 = 5

This is better than:

  • the full path {(0,1),(1,2)}, which costs 1 + 6 + 2 = 9
  • three singleton vertices, which cost 0 + 0 + 3 * 2 = 6
  • the empty forest, which costs 5 + 2 + 5 = 12

So the optimum is 5.

BibTeX

@article{TuncbagEtAl2013PCSF,
  author  = {Nurcan Tuncbag and Alfredo Braunstein and Andrea Pagnani and Shao-Shan Carol Huang and Jennifer Chayes and Christian Borgs and Riccardo Zecchina and Ernest Fraenkel},
  title   = {Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem},
  journal = {Journal of Computational Biology},
  volume  = {20},
  number  = {2},
  pages   = {124--136},
  year    = {2013},
  doi     = {10.1089/cmb.2012.0092},
  url     = {https://doi.org/10.1089/cmb.2012.0092}
}

@inproceedings{TuncbagEtAl2012RECOMB,
  author    = {Nurcan Tuncbag and Alfredo Braunstein and Andrea Pagnani and Shao-Shan Carol Huang and Jennifer Chayes and Christian Borgs and Riccardo Zecchina and Ernest Fraenkel},
  title     = {Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem},
  booktitle = {Research in Computational Molecular Biology (RECOMB 2012)},
  series    = {Lecture Notes in Bioinformatics},
  volume    = {7262},
  pages     = {287--301},
  year      = {2012},
  publisher = {Springer},
  doi       = {10.1007/978-3-642-29627-7_31},
  url       = {https://doi.org/10.1007/978-3-642-29627-7_31}
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions