Motivation
EXPECTED RETRIEVAL COST (P152) from Garey & Johnson, A4 SR4. An NP-complete problem (in the strong sense) from the Storage and Retrieval category. It models the optimization of record placement on drum-like storage devices (rotating storage media with fixed-position read heads), where the goal is to distribute records across m sectors to minimize the expected rotational latency. The problem captures the trade-off between access probability and physical sector distance on a circular layout. NP-complete even for fixed m >= 2 (though solvable in pseudo-polynomial time per fixed m). The main reduction is from PARTITION / 3-PARTITION.
Associated rules:
- R098: Partition / 3-Partition → Expected Retrieval Cost (as target)
Definition
Name: ExpectedRetrievalCost
Canonical name: EXPECTED RETRIEVAL COST
Reference: Garey & Johnson, Computers and Intractability, A4 SR4, p.227
Mathematical definition:
INSTANCE: Set R of records, rational probability p(r) in [0,1] for each r in R, with sum_{r in R} p(r) = 1, number m of sectors, and a positive integer K.
QUESTION: Is there a partition of R into disjoint subsets R_1, R_2, ..., R_m such that, if p(R_i) = sum_{r in R_i} p(r) and the "latency cost" d(i,j) is defined to be j - i - 1 if 1 <= i < j <= m and to be m - i + j - 1 if 1 <= j <= i <= m, then the sum over all ordered pairs i,j, 1 <= i,j <= m, of p(R_i) * p(R_j) * d(i,j) is at most K?
Variables
- Count: n = |R| variables. Each variable x_r in {1, 2, ..., m} assigns record r to a sector.
- Per-variable domain: {1, 2, ..., m} — the sector index to which each record is assigned.
- dims():
vec![m; n] — each of the n records can be assigned to one of m sectors.
- Meaning: The variable assignment encodes a partition of records into m sectors. Record r is placed in sector x_r. The objective is to find an assignment such that the weighted latency cost sum_{i,j} p(R_i)*p(R_j)*d(i,j) <= K, where the latency d(i,j) measures the circular distance from sector i to sector j on the drum.
Schema (data type)
Type name: ExpectedRetrievalCost
Variants: none (no type parameters)
| Field |
Type |
Description |
probabilities |
Vec<f64> |
Access probability p(r) for each record r in R; must sum to 1.0 |
num_sectors |
usize |
Number of sectors m on the drum-like device |
bound |
f64 |
Maximum allowable expected retrieval cost K |
Size field getters:
num_records() → probabilities.len() (n = |R|)
num_sectors() → num_sectors (m)
Notes:
- This is a satisfaction (decision) problem:
Metric = bool, implementing SatisfactionProblem.
- The latency function d(i,j) is implicit and computed from the circular sector layout: d(i,j) = (j - i - 1) mod m for the "forward" direction, specifically d(i,j) = j - i - 1 if i < j, and d(i,j) = m - i + j - 1 if j <= i. Note d(i,i) = m - 1 (full rotation when head is already at sector i and next request is also sector i).
- The probabilities are rational numbers in [0,1], but for implementation purposes we use f64.
- The original GJ formulation uses "positive integer K", but the cost function computes products of rational probabilities × integer latencies, yielding rational (not necessarily integer) values. We use
f64 for both probabilities and the bound to avoid the complexity of exact rational arithmetic, which is standard practice for this type of problem.
Complexity
- Best known exact algorithm: For fixed m, the problem is solvable in pseudo-polynomial time via dynamic programming. For general m, it is strongly NP-complete. Brute-force: enumerate all m^n assignments of records to sectors, compute cost for each; time O(m^n * m^2). No sub-exponential exact algorithm is known for general m.
- Concrete complexity string:
"num_sectors ^ num_records" (brute-force enumeration of all sector assignments)
- NP-completeness: NP-complete in the strong sense [Cody and Coffman, 1976], via reduction from 3-PARTITION.
- Approximation: Cody and Coffman showed that highest-access-frequency-first assignment heuristics provide near-optimal solutions with provable worst-case bounds.
- References:
- R. A. Cody and E. G. Coffman, Jr. (1976). "Record allocation for minimizing expected retrieval costs on drum-like storage devices." Journal of the ACM 23(1):103-115.
Extra Remark
Full book text:
INSTANCE: Set R of records, rational probability p(r) in [0,1] for each r in R, with sum_{r in R} p(r) = 1, number m of sectors, and a positive integer K.
QUESTION: Is there a partition of R into disjoint subsets R_1, R_2, ..., R_m such that, if p(R_i) = sum_{r in R_i} p(r) and the "latency cost" d(i,j) is defined to be j - i - 1 if 1 <= i < j <= m and to be m - i + j - 1 if 1 <= j <= i <= m, then the sum over all ordered pairs i,j, 1 <= i,j <= m, of p(R_i) * p(R_j) * d(i,j) is at most K?
Reference: [Cody and Coffman, 1976]. Transformation from PARTITION, 3-PARTITION.
Comment: NP-complete in the strong sense. NP-complete and solvable in pseudo-polynomial time for each fixed m >= 2.
How to solve
Example Instance
Instance 1 (YES, balanced allocation possible):
Records R = {r_1, r_2, r_3, r_4, r_5, r_6} with m = 3 sectors.
Probabilities: p(r_1) = 0.2, p(r_2) = 0.15, p(r_3) = 0.15, p(r_4) = 0.2, p(r_5) = 0.1, p(r_6) = 0.2.
Sum = 1.0 ✓.
Latency matrix for m = 3:
- d(1,1) = 2, d(1,2) = 0, d(1,3) = 1
- d(2,1) = 1, d(2,2) = 2, d(2,3) = 0
- d(3,1) = 0, d(3,2) = 1, d(3,3) = 2
Allocation: R_1 = {r_1, r_5} (p = 0.3), R_2 = {r_2, r_4} (p = 0.35), R_3 = {r_3, r_6} (p = 0.35).
Cost = sum_{i,j} p(R_i)p(R_j)d(i,j):
= p1p1d(1,1) + p1p2d(1,2) + p1p3d(1,3) + p2p1d(2,1) + p2p2d(2,2) + p2p3d(2,3) + p3p1d(3,1) + p3p2d(3,2) + p3p3d(3,3)
= 0.30.32 + 0.30.350 + 0.30.351 + 0.350.31 + 0.350.352 + 0.350.350 + 0.350.30 + 0.350.351 + 0.350.352
= 0.18 + 0 + 0.105 + 0.105 + 0.245 + 0 + 0 + 0.1225 + 0.245
= 1.0025
Set K = 1.01. Cost = 1.0025 <= 1.01. Answer: YES ✓
Instance 2 (NO, impossible to achieve low cost):
Records R = {r_1, r_2, r_3, r_4, r_5, r_6} with m = 3 sectors.
Probabilities: p(r_1) = 0.5, p(r_2) = 0.1, p(r_3) = 0.1, p(r_4) = 0.1, p(r_5) = 0.1, p(r_6) = 0.1.
Sum = 1.0 ✓.
The highly skewed distribution (r_1 has probability 0.5) makes it impossible to balance sectors evenly. Brute-force enumeration over all 3^6 = 729 assignments yields:
Optimal allocation: R_1 = {r_1} (p = 0.5), R_2 = {r_2, r_3, r_4} (p = 0.3), R_3 = {r_5, r_6} (p = 0.2).
Minimum achievable cost:
= 0.50.52 + 0.50.30 + 0.50.21 + 0.30.51 + 0.30.32 + 0.30.20 + 0.20.50 + 0.20.31 + 0.20.22
= 0.50 + 0 + 0.10 + 0.15 + 0.18 + 0 + 0 + 0.06 + 0.08
= 1.07
Set K = 0.5. Minimum cost = 1.07 > 0.5. Answer: NO ✓
Motivation
EXPECTED RETRIEVAL COST (P152) from Garey & Johnson, A4 SR4. An NP-complete problem (in the strong sense) from the Storage and Retrieval category. It models the optimization of record placement on drum-like storage devices (rotating storage media with fixed-position read heads), where the goal is to distribute records across m sectors to minimize the expected rotational latency. The problem captures the trade-off between access probability and physical sector distance on a circular layout. NP-complete even for fixed m >= 2 (though solvable in pseudo-polynomial time per fixed m). The main reduction is from PARTITION / 3-PARTITION.
Associated rules:
Definition
Name:
ExpectedRetrievalCostCanonical name: EXPECTED RETRIEVAL COST
Reference: Garey & Johnson, Computers and Intractability, A4 SR4, p.227
Mathematical definition:
INSTANCE: Set R of records, rational probability p(r) in [0,1] for each r in R, with sum_{r in R} p(r) = 1, number m of sectors, and a positive integer K.
QUESTION: Is there a partition of R into disjoint subsets R_1, R_2, ..., R_m such that, if p(R_i) = sum_{r in R_i} p(r) and the "latency cost" d(i,j) is defined to be j - i - 1 if 1 <= i < j <= m and to be m - i + j - 1 if 1 <= j <= i <= m, then the sum over all ordered pairs i,j, 1 <= i,j <= m, of p(R_i) * p(R_j) * d(i,j) is at most K?
Variables
vec![m; n]— each of the n records can be assigned to one of m sectors.Schema (data type)
Type name:
ExpectedRetrievalCostVariants: none (no type parameters)
probabilitiesVec<f64>num_sectorsusizeboundf64Size field getters:
num_records()→probabilities.len()(n = |R|)num_sectors()→num_sectors(m)Notes:
Metric = bool, implementingSatisfactionProblem.f64for both probabilities and the bound to avoid the complexity of exact rational arithmetic, which is standard practice for this type of problem.Complexity
"num_sectors ^ num_records"(brute-force enumeration of all sector assignments)Extra Remark
Full book text:
INSTANCE: Set R of records, rational probability p(r) in [0,1] for each r in R, with sum_{r in R} p(r) = 1, number m of sectors, and a positive integer K.
QUESTION: Is there a partition of R into disjoint subsets R_1, R_2, ..., R_m such that, if p(R_i) = sum_{r in R_i} p(r) and the "latency cost" d(i,j) is defined to be j - i - 1 if 1 <= i < j <= m and to be m - i + j - 1 if 1 <= j <= i <= m, then the sum over all ordered pairs i,j, 1 <= i,j <= m, of p(R_i) * p(R_j) * d(i,j) is at most K?
Reference: [Cody and Coffman, 1976]. Transformation from PARTITION, 3-PARTITION.
Comment: NP-complete in the strong sense. NP-complete and solvable in pseudo-polynomial time for each fixed m >= 2.
How to solve
Example Instance
Instance 1 (YES, balanced allocation possible):
Records R = {r_1, r_2, r_3, r_4, r_5, r_6} with m = 3 sectors.
Probabilities: p(r_1) = 0.2, p(r_2) = 0.15, p(r_3) = 0.15, p(r_4) = 0.2, p(r_5) = 0.1, p(r_6) = 0.2.
Sum = 1.0 ✓.
Latency matrix for m = 3:
Allocation: R_1 = {r_1, r_5} (p = 0.3), R_2 = {r_2, r_4} (p = 0.35), R_3 = {r_3, r_6} (p = 0.35).
Cost = sum_{i,j} p(R_i)p(R_j)d(i,j):
= p1p1d(1,1) + p1p2d(1,2) + p1p3d(1,3) + p2p1d(2,1) + p2p2d(2,2) + p2p3d(2,3) + p3p1d(3,1) + p3p2d(3,2) + p3p3d(3,3)
= 0.30.32 + 0.30.350 + 0.30.351 + 0.350.31 + 0.350.352 + 0.350.350 + 0.350.30 + 0.350.351 + 0.350.352
= 0.18 + 0 + 0.105 + 0.105 + 0.245 + 0 + 0 + 0.1225 + 0.245
= 1.0025
Set K = 1.01. Cost = 1.0025 <= 1.01. Answer: YES ✓
Instance 2 (NO, impossible to achieve low cost):
Records R = {r_1, r_2, r_3, r_4, r_5, r_6} with m = 3 sectors.
Probabilities: p(r_1) = 0.5, p(r_2) = 0.1, p(r_3) = 0.1, p(r_4) = 0.1, p(r_5) = 0.1, p(r_6) = 0.1.
Sum = 1.0 ✓.
The highly skewed distribution (r_1 has probability 0.5) makes it impossible to balance sectors evenly. Brute-force enumeration over all 3^6 = 729 assignments yields:
Optimal allocation: R_1 = {r_1} (p = 0.5), R_2 = {r_2, r_3, r_4} (p = 0.3), R_3 = {r_5, r_6} (p = 0.2).
Minimum achievable cost:
= 0.50.52 + 0.50.30 + 0.50.21 + 0.30.51 + 0.30.32 + 0.30.20 + 0.20.50 + 0.20.31 + 0.20.22
= 0.50 + 0 + 0.10 + 0.15 + 0.18 + 0 + 0 + 0.06 + 0.08
= 1.07
Set K = 0.5. Minimum cost = 1.07 > 0.5. Answer: NO ✓