-
Notifications
You must be signed in to change notification settings - Fork 1
Home
The shortest driving route through all 48 contigous US states (Takes 113 hours)
Source: http://t.co/eQpXlo4v8L
- pic.twitter.com/sKt2zlPpl3
— Amazing Maps (@Amazing_Maps) June 8, 2014
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
Shortest Hamiltonian path problem?
A musician is expected to make a round trip at all the venues she'll be performing at, before returning home. This musician likes to be efficient, and wants to minimize her time on the road. A short route would help a lot...
TSP is an NP-hard problem in combinatorial optimization. That's a term academics use to classify its complexity. When a problem is NP-hard, we cannot (yet) prove that a polynomial time solution exists.
A problem is said to be solvable in polynomial time if the number of steps required to complete the algorithm for a given input is O(nk). Polynomial-time algorithms are said to be "fast."
BUT WHAT DOES IT MEAN‽
In layman's terms, when a problem is described as "NP-hard", that means it belongs to a class of problems that very quickly become computationally expensive to solve as the input size grows. These problems are in fact rather easy to solve by enumerating all possible solutions and testing them, but solving them takes a long time on larger instances of the problems.
TSP is one of the most intensively studied problems in optimization. It's used as a benchmark for many optimization methods. Even though the problem is computationally difficult, a large number of heuristics and exact methods are known, so that some instances with tens of thousands of cities can be solved completely and even problems with millions of cities can be approximated within a small fraction of error.
TSP can be modelled as an undirected weighted graph.
- vertices = cities
- edges = paths
- edge weight = path distance
It's a minimization problem starting and finishing at a specified vertex after having visited each other vertex exactly once. Often, the model is a complete graph (i.e. each pair of vertices is connected). If no path exists between two cities, adding an arbitrarily long edge will complete the graph.
Note that the symmetry which undirected graphs provide is often an oversimplification. Traffic collisions, construction, one-way streets, and paid routes are examples of how this symmetry could break down. In such cases a directed graph will be more accurate.
A genetic algorithm is a search technique inspired by Darwin's Theory of Evolution and is used in computing to find exact or approximate solutions to optimization and search problems.
GAs are mainly used for problems where satisficing is expected. A good solution is satisfactory, and there is little benefit to be gained by going for an expensive optimal solution.
BUT WHAT DOES IT MEAN‽
Think of it as a guided random search. You create a bunch of random solutions and evaluate each of them. Then you repeatedly choose some good-ish solutions, and modify them. Some are big modifications (crossovers), some are tiny (mutation). For the big modifications, you "combine" the two solutions.
If you were to graph the average fitness of all the solutions you have, you'll see it gradually improve, until at some point it converges. At that point you can modify some parameters, retry, and hope the next run goes better.
Often the most difficult part can be data representation. How do you represent a real world problem so that your GA can work with it? In terms of efficiency, the fitness evaluation can usually be the bottleneck, especially if fitness is "subjective".
There are 6 different (user specified) GA parameters used in this program:
- Population Size
- Maximum Generations
- Crossover Rate or Crossover Probability
- Mutation Rate or Mutation Probability
- Number of Elites
- K value for Tournament Selection
Optimal value ranges for these GA parameters:
- 100 ≤ popSize ≤ 500
- 100 ≤ maxGen ≤ 300
- 0.7 ≤ Pc ≤ 0.9 (lower from 1.0)
- 0.1 ≤ Pm ≤ 0.3 (raise from 0.0)
- 1 ≤ numElites ≤ 3
- 2 ≤ k ≤ 4

