## 26.5 Theory and Practice

I close this chapter with some considerations about
the theory and practice of algorithmic complexity.

### 26.5.1 Theory

Problem reduction is very important in helping to understand the nature of problems.
If A reduces to B in polynomial time, then A can't be harder than B:

- if B can be solved in polynomial time, so can A, and
- equivalently, if A can't be solved in polynomial time, neither can B.

Polynomial-time reductions allow us to make general statements, like
all NP-complete problems being equally hard and
at least as hard as any other NP problem.

Starting from SAT, computer scientists used polynomial-time reductions to
classify hundreds of problems.
I haven't shown any polynomial algorithms for the
[travelling salesman](../22_Backtracking/22_5_tsp.ipynb#22.5-Back-to-the-TSP),
[0/1 knapsack](../22_Backtracking/22_7_knapsack.ipynb#22.7-Back-to-the-knapsack),
[maximum independent set](../11_Search/11_5_subsets.ipynb#Exercise-11.5.1) or
[subset sum](../11_Search/11_6_practice.ipynb#11.6.2-Subset-sum) problem for one simple reason:
they have all been proven to be NP-hard.

One of the most fascinating aspects of complexity theory for me is that
seemingly similar problems fall into different complexity classes.
Here are some examples.

The **fractional knapsack** problem allows us to put a fraction of each item
in the knapsack, e.g. when each item is in liquid or powder form.
There's a simple greedy log-linear algorithm that solves this problem:
go through the items from most to least profitable (value-to-weight ratio) and
take as much as possible from each one until the knapsack is full.
This means that the fractional knapsack problem is in P, whereas
the 0/1 knapsack problem is NP-hard:
if we can put only 0% or 100% of each item in the knapsack,
the problem becomes much harder.
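The greedy algorithm described above can be sketched as follows. This is a minimal illustration, not the chapter's own code: the function name and the representation of items as (value, weight) pairs with positive weights are my assumptions.

```python
def fractional_knapsack(items: list, capacity: float) -> float:
    """Return the highest total value that fits in the knapsack.

    Assumption: each item is a (value, weight) pair with weight > 0.
    """
    # Sorting by value-to-weight ratio dominates the run-time: O(n log n).
    by_ratio = sorted(items, key=lambda item: item[0] / item[1], reverse=True)
    total = 0.0
    for value, weight in by_ratio:
        if capacity <= 0:
            break                           # the knapsack is full
        taken = min(weight, capacity)       # take as much as still fits
        total += value * taken / weight     # value of the fraction taken
        capacity -= taken
    return total
```

For the items (60, 10), (100, 20) and (120, 30) and a capacity of 50, the function takes the first two items whole and two thirds of the last one, for a total value of 60 + 100 + 80 = 240.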

Given a graph, the **all-pairs longest path** problem asks for
a non-cyclic path with the most edges (if the graph is unweighted) or
with the highest sum of weights (if it's weighted), among all pairs of nodes.
This problem is NP-hard whereas the **all-pairs shortest path** problem is in P
because we can use breadth-first search (for unweighted graphs) or
Dijkstra's algorithm (for non-negative weights) to solve
the [single-source shortest paths problem](../18_Greed/18_4_shortest_path.ipynb#18.4-Shortest-paths)
for each graph node, and then take the shortest path of them all.
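For the unweighted case, repeating a breadth-first search from every node might look like the sketch below. The function names and the representation of the graph as a dictionary that maps each node to a list of its neighbours are assumptions made for this illustration.

```python
from collections import deque

def bfs_distances(graph: dict, start) -> dict:
    """Return the fewest edges from start to each node reachable from it."""
    distance = {start: 0}
    unprocessed = deque([start])
    while unprocessed:
        node = unprocessed.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:       # first visit is the shortest
                distance[neighbour] = distance[node] + 1
                unprocessed.append(neighbour)
    return distance

def all_pairs_distances(graph: dict) -> dict:
    """Return a table such that table[a][b] is the distance from a to b."""
    return {node: bfs_distances(graph, node) for node in graph}
```

Each search takes time linear in the number of nodes and edges, so running it once per node remains polynomial.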

[Exercise 11.2.1](../11_Search/11_2_factorisation.ipynb#Exercise-11.2.1) implicitly
showed that the **primality** problem ('is a given positive integer prime?')
can be reduced in polynomial time to the **factorisation** problem
('list the positive divisors of a given positive integer') because
a number is prime if and only if it has exactly two factors.
There's an [exponential algorithm](../11_Search/11_2_factorisation.ipynb#11.2.3-Sort-candidates)
for the factorisation problem and therefore for the primality problem.
This suggests that both problems are intractable, but we know that
reducing A to B isn't necessarily the most efficient way of solving problem A.
In fact, in 2002 three Indian scientists (Agrawal, Kayal and Saxena) found
a polynomial algorithm to check
if a number is prime, proving that primality is in P.
However, to this day it's not known whether factorisation is NP-hard, so
finding a polynomial algorithm for it wouldn't by itself prove P = NP.
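The reduction from primality to factorisation can be sketched as follows. The brute-force `divisors` function is only a stand-in that I assume for illustration; it isn't necessarily the algorithm of the linked exercise.

```python
def divisors(n: int) -> list:
    """Return the positive divisors of n, by trying every candidate."""
    return [candidate for candidate in range(1, n + 1) if n % candidate == 0]

def is_prime(n: int) -> bool:
    """Decide primality by reducing it to factorisation."""
    # Input transformation: none needed, n is passed on as it is.
    # Output transformation: a prime has exactly two divisors, 1 and itself.
    return len(divisors(n)) == 2
```

The loop runs *n* times, which is exponential in the number of digits of *n*, so `is_prime` inherits the inefficiency of the algorithm it reduces to.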

The **2-SAT** decision problem is a specialised version of SAT in which the
input Boolean expression is a conjunction of disjunctions,
where each disjunction has exactly two variables or their negations, e.g.
'(A or B) and (not B or C)'. As you probably guessed,
although SAT is NP-complete, 2-SAT is in P.
The more specific problem can be solved much more efficiently than
with the exponential brute-force search used for the general problem.
Not all Boolean expressions can be written as a conjunction of
two-variable disjunctions, so the general SAT problem remains important.
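One well-known polynomial approach to 2-SAT, which I sketch here as an assumption since the text doesn't name an algorithm, turns each disjunction into two implications and checks that no variable ends up in the same strongly connected component as its negation. The numbering of variables from 1 and the use of negative integers for negations are conventions chosen for this sketch, which uses recursion and so is only meant for small expressions.

```python
def solve_2sat(num_vars: int, clauses: list):
    """Return a satisfying list of Booleans (index 0 is variable 1), or None.

    Assumptions: variables are numbered 1 to num_vars, the literal -k means
    'not k', and each clause is a pair of literals read as (first or second).
    """
    def node(literal: int) -> int:
        # Variable k is node 2(k-1); its negation is the next node.
        return 2 * (abs(literal) - 1) + (1 if literal < 0 else 0)

    size = 2 * num_vars
    graph = [[] for _ in range(size)]
    reverse = [[] for _ in range(size)]
    for first, second in clauses:
        # (first or second) means: not first implies second, and vice versa.
        for source, target in ((node(first) ^ 1, node(second)),
                               (node(second) ^ 1, node(first))):
            graph[source].append(target)
            reverse[target].append(source)

    # First pass: record the nodes in order of depth-first completion.
    visited = [False] * size
    finished = []
    def visit(current: int) -> None:
        visited[current] = True
        for successor in graph[current]:
            if not visited[successor]:
                visit(successor)
        finished.append(current)
    for start in range(size):
        if not visited[start]:
            visit(start)

    # Second pass: label components on the reversed graph, which numbers
    # them in topological order of the implication graph.
    component = [-1] * size
    def label(current: int, number: int) -> None:
        component[current] = number
        for predecessor in reverse[current]:
            if component[predecessor] == -1:
                label(predecessor, number)
    number = 0
    for start in reversed(finished):
        if component[start] == -1:
            label(start, number)
            number += 1

    interpretation = []
    for variable in range(num_vars):
        positive, negative = 2 * variable, 2 * variable + 1
        if component[positive] == component[negative]:
            return None         # the variable implies its own negation and back
        interpretation.append(component[positive] > component[negative])
    return interpretation
```

With A, B and C numbered 1, 2 and 3, the expression '(A or B) and (not B or C)' becomes `solve_2sat(3, [(1, 2), (-2, 3)])`. Both passes visit each node and edge once, so the run-time is linear in the number of variables and disjunctions.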

### 26.5.2 Practice

Reductions also have some practical value:
if problem A reduces to problem B then *any* algorithm that
solves B also solves A, via input and output transformations.
Usually the obtained algorithm won't be the most efficient to solve A,
but at least it's a start.

If somebody ever finds *one* polynomial algorithm for *one* NP-complete problem,
then polynomial algorithms for the hundreds of known NP-complete problems can be
immediately implemented, using the polynomial-time reductions that
prove that those problems are NP-complete.

Polynomial algorithms are often characterised as being efficient, because
they are being contrasted with the non-polynomial exponential and factorial
algorithms. In practice, an exponential algorithm may be efficient enough and
a polynomial algorithm may be inefficient, because there are factors at play
that are often ignored by the theory, to simplify the analysis.

Complexity analysis and the classification of problems are mostly based on
worst-case scenarios, but in real life many inputs aren't worst cases, so
theoretically inefficient algorithms may in practice
solve most problem instances in a reasonable time. For example,
backtracking can prune the search space effectively for many inputs even though
there may be inputs for which all candidates have to be generated.

A further example is SAT solvers: they can routinely handle Boolean expressions
with hundreds of variables and thousands of operators.
Even though the worst case is exponential, it rarely occurs.
In addition, SAT solvers return an interpretation that satisfies the expression,
if there is one, i.e. they also return the certificate.

Complexity analysis predicts how the run-time grows for ever-increasing
input sizes, but sometimes we're only interested in small problem instances.
In such cases, an inefficient algorithm may be sufficient in practice.
For example, supermarket delivery vans typically visit 15–20 customers
before returning to the warehouse. A dynamic programming algorithm for the TSP
can compute the best tour for graphs with that many nodes in a few seconds.
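One such dynamic programming algorithm is usually called Held–Karp; I'm assuming it here for illustration, as the text doesn't say which algorithm is meant. The sketch below takes a hypothetical distance matrix and returns only the length of a shortest tour, not the tour itself.

```python
from itertools import combinations

def shortest_tour_length(distance: list) -> float:
    """Return the length of a shortest tour that starts and ends at node 0.

    Assumption: distance[i][j] is the length of the edge from node i to node j.
    """
    n = len(distance)
    if n == 1:
        return 0
    # best[(visited, last)] is the length of a shortest path that starts at
    # node 0, visits exactly the nodes in the bit set 'visited', ends at 'last'.
    best = {(1 << node, node): distance[0][node] for node in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            visited = sum(1 << node for node in subset)
            for last in subset:
                before = visited & ~(1 << last)     # the subset without 'last'
                best[(visited, last)] = min(
                    best[(before, previous)] + distance[previous][last]
                    for previous in subset if previous != last
                )
    everything = (1 << n) - 2       # bit set of all nodes except node 0
    return min(best[(everything, last)] + distance[last][0]
               for last in range(1, n))
```

The table has an entry for each subset of nodes and each last node, so the complexity is O($n^2 2^n$): still exponential, but far better than trying all permutations. Pure Python adds a large constant factor, so this sketch is likely to be much slower than a compiled implementation for 20 nodes.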

On the other hand, problems in P may actually have no practical algorithm.
A polynomial algorithm with complexity O($n^c$) is inefficient if the exponent
*c* is high or the constant factor (ignored by complexity analysis) is high.
What exactly 'high' means depends again on the input size.
If the input size *n* is in the thousands,
then a cubic algorithm (*c* = 3) may be very slow.

Another example is the polynomial algorithm to test primality of a number *n*:
it has complexity O($(\log n)^{12}$) and is therefore hardly used in practice.
Primality testing is required for cryptography,
where numbers can have more than 100 digits, i.e. log *n* > 100,
which makes this algorithm impractically slow.
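To put a rough number on this, here is a back-of-the-envelope estimate that takes the bound at face value, ignores constant factors and uses the estimate log *n* > 100 from above:

$$(\log n)^{12} > 100^{12} = 10^{24}$$

Even at an assumed billion ($10^9$) steps per second, $10^{24}$ steps would take $10^{15}$ seconds, which is about 30 million years.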

When no exact algorithm is fast enough for large real-world problem instances,
practitioners use heuristic algorithms that only provide approximate or
probabilistic results. For example, primality is usually decided with an
algorithm that gives the right decision with high (but not 100%) probability.
And for large graphs, the TSP is solved with algorithms that return
an approximate 'good enough' result.
You have seen a greedy heuristic algorithm for the TSP in
[Exercise 18.3.1](../18_Greed/18_3_mst.ipynb#Exercise-18.3.1).
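As an illustration of a probabilistic primality decision, here is a sketch of the Miller–Rabin test. The text doesn't name the algorithm used in practice, so this choice, the function name and the default number of rounds are my assumptions.

```python
import random

def is_probably_prime(n: int, rounds: int = 20) -> bool:
    """Return False if n is certainly composite, True if n is probably prime."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7):
        if n % small == 0:
            return n == small
    # Write n - 1 as 2**exponent * odd.
    exponent, odd = 0, n - 1
    while odd % 2 == 0:
        exponent += 1
        odd //= 2
    for _ in range(rounds):
        base = random.randrange(2, n - 1)
        power = pow(base, odd, n)           # modular exponentiation
        if power in (1, n - 1):
            continue                        # this base doesn't expose n
        for _ in range(exponent - 1):
            power = pow(power, 2, n)
            if power == n - 1:
                break                       # this base doesn't expose n
        else:
            return False                    # this base proves n is composite
    return True
```

A prime always passes, whereas a composite number passes a single round with probability at most 1/4, so the chance of a wrong 'prime' decision shrinks exponentially with the number of rounds. Each round uses a number of modular multiplications that is polynomial in the number of digits of *n*.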

So if you have an NP-complete or NP-hard problem at hand,
first consider what input sizes you will need to cope with.
If they are small, an exact exponential algorithm might do the job. Otherwise,
look for a heuristic that gives a good approximation of the result or
the right result with high probability.

If your problem can be reduced in polynomial time to SAT or the TSP, then implementing the
input and output transformations yourself and using a good SAT or TSP solver
might be your best bet.
