Project #1 Genetic Algorithm

1 Implementation

1.1 Structure

The main component of this program is the Genetic class, derived from the Algorithm base class; this structure lets me swap in the prior algorithms (the HillClimber and SimulatedAnnealing classes) with ease. When instantiated, the Genetic class requires a Problem object, a base class representing the interface to the various test functions we were given. Potential solutions to a problem are represented by an Individual class.
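
As a rough illustration, the relationship between these classes looks something like the sketch below; the signatures are simplified assumptions, not the project’s exact interface.

// Illustrative sketch of the class structure (simplified assumptions only).
class Individual;   // potential solution (section 1.2)
class Problem;      // interface to a test function (section 1.3)
class Algorithm {   // common driver shared by all three solvers
public:
  explicit Algorithm(const Problem & problem) : problem(problem) {}
  virtual ~Algorithm() {}
  virtual Individual solve() const = 0;  // run until the goal or limit is hit
protected:
  const Problem & problem;               // the test function being solved
};
// The solvers are interchangeable behind the Algorithm interface.
class HillClimber : public Algorithm { /* ... */ };
class SimulatedAnnealing : public Algorithm { /* ... */ };
class Genetic : public Algorithm { /* ... */ };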

1.2 Individual

The Individual class’s primary data structure is a C++11 array container, a template similar to a vector but more efficient and fixed in size (the given dimension of thirty). Its genome (potential solution) is thus an array of thirty parameter values, where parameter is a typedef for double (“parameter” could honestly be “chromosome”, except that the type is also used elsewhere for values that are not chromosomes). An Individual additionally has a fitness value, set when it is created by the Problem class and updated when it is mutated by the Algorithm class. The Individual class has a mutate(gene, gene_i) function whose purpose is to bound-check a mutation of any single given gene, clipping it to the domain’s minimum and maximum values as needed. Through templated operator overloads and begin() and end() member functions, an Individual object can be iterated over like its genome array, compared by fitness in a way that respects the minimization flag (that is, when minimizing, a fitness of zero is “greater” than any positive, worse fitness), and added together by fitness. This makes the implementation of the algorithms very clean.
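
The following is a minimal sketch of such an Individual; the member names and exact signatures are assumptions based on the description above, not the real class.

// Sketch only: member names and signatures are assumptions.
#include <algorithm>
#include <array>
#include <cstddef>
typedef double parameter;
const std::size_t dimension = 30;
struct Individual {
  std::array<parameter, dimension> solution;  // the genome
  parameter fitness = 0;     // set by Problem, updated on mutation
  parameter minimum = 0;     // lower bound of the problem's domain
  parameter maximum = 0;     // upper bound of the problem's domain
  bool minimize = true;      // whether a lower raw fitness is better
  // Bound-check a mutation of a single gene, clipping it to the domain.
  void mutate(parameter gene, std::size_t gene_i) {
    solution[gene_i] = std::min(std::max(gene, minimum), maximum);
  }
  // begin()/end() let an Individual be iterated like its genome array.
  parameter * begin() { return solution.data(); }
  parameter * end() { return solution.data() + solution.size(); }
  const parameter * begin() const { return solution.data(); }
  const parameter * end() const { return solution.data() + solution.size(); }
  // "Greater" means fitter: when minimizing, a lower raw fitness wins.
  bool operator>(const Individual & other) const {
    return minimize ? fitness < other.fitness : fitness > other.fitness;
  }
  // "Adding" two Individuals sums their fitness scores.
  parameter operator+(const Individual & other) const {
    return fitness + other.fitness;
  }
};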

1.3 Problem

The Problem class holds the following data: the interval of the domain, the interval of the range (used for normalization), a minimization flag (that is, whether the goal is to minimize the problem’s value), the goal itself, and the maximum number of iterations/generations the algorithm should run for that problem. The Problem class provides several member functions:

  • fitness(Individual), a virtual function implemented by the derived classes, which calculates the fitness score using the definition of the problem’s function;
  • potential(), which returns an Individual object instantiated with the data it needs (the domain interval, the minimization flag, a random potential solution generated with a uniform_real_distribution object, and the fitness value of that potential solution);
  • normal(Individual), which normalizes an Individual object’s fitness score onto the interval [0, 1] using the problem’s range (where 1 is the maximum fitness, with minimization taken into account);
  • worst(), which finds the worst fitness via an in-place search over as many random potential solutions as there are iterations (used to find an adequate range);
  • represent(), which returns a friendly string representation of the problem, including its name and other parameters.
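
A hedged sketch of this interface follows, reusing the parameter and Individual types from the sketch in section 1.2; worst() and represent() are omitted, and all names are assumptions.

// Sketch only: names and signatures are assumptions.
#include <random>
class Problem {
public:
  Problem(parameter domain_min, parameter domain_max,
          parameter range_min, parameter range_max,
          bool minimize, parameter goal, unsigned long iterations)
    : domain_min(domain_min), domain_max(domain_max),
      range_min(range_min), range_max(range_max),
      minimize(minimize), goal(goal), iterations(iterations) {}
  virtual ~Problem() {}
  // Implemented by each derived test function (Ackley, Schwefel, ...).
  virtual parameter fitness(const Individual & subject) const = 0;
  // Random candidate drawn uniformly from the domain, with its fitness set.
  Individual potential() const {
    static std::mt19937 engine{std::random_device{}()};
    std::uniform_real_distribution<parameter> uniform(domain_min, domain_max);
    Individual subject;
    subject.minimum = domain_min;
    subject.maximum = domain_max;
    subject.minimize = minimize;
    for (parameter & gene : subject) gene = uniform(engine);
    subject.fitness = fitness(subject);
    return subject;
  }
  // Normalize a raw fitness onto [0, 1], where 1 is the best possible.
  parameter normal(const Individual & subject) const {
    parameter scaled = (subject.fitness - range_min) / (range_max - range_min);
    return minimize ? 1 - scaled : scaled;
  }
protected:
  parameter domain_min, domain_max;  // interval of the domain
  parameter range_min, range_max;    // interval of the range (for normal())
  bool minimize;                     // minimization flag
  parameter goal;                    // terminating fitness
  unsigned long iterations;          // iteration/generation limit
};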

We were asked to implement the following problems defined at:

https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume24/ortizboyer05a-html/node6.html

The source of the fitness functions is included here:

1.3.1 Ackley

  • Domain: [-30, 30]
  • Approximate range: [0, 22]
parameter Ackley::fitness(const Individual & subject) const {
  parameter p_inverse = 1. / subject.size();
  parameter sum_pow = 0;
  parameter sum_cos = 0;
  for (const parameter & gene : subject) {
    sum_pow += std::pow(gene, 2);
    sum_cos += std::cos(2 * M_PI * gene);
  }
  parameter exp_0 = std::exp(-0.2 * std::sqrt(p_inverse * sum_pow));
  parameter exp_1 = std::exp(p_inverse * sum_cos);
  return 20 + M_E - 20 * exp_0 - exp_1;
}

1.3.2 Griewangk

  • Domain: [-600, 600]
  • Approximate range: [0, 1700]
parameter Griewangk::fitness(const Individual & subject) const {
  parameter sum = 0;
  parameter product = 1;
  for (unsigned long i = 0; i < subject.size(); ++i) {
    sum += std::pow(subject[i], 2) / 4000.;
    // i from 1 to p
    product *= std::cos(subject[i] / std::sqrt(i + 1));
  }
  return 1 + sum - product;
}

1.3.3 Rastrigin

  • Domain: [-5.12, 5.12]
  • Approximate range: [0, 900]
parameter Rastrigin::fitness(const Individual & subject) const {
  parameter sum = 0;
  for (const parameter & gene : subject)
    sum += std::pow(gene, 2) - 10 * std::cos(2 * M_PI * gene);
  return sum + 10 * subject.size();
}

1.3.4 Rosenbrock

  • Domain: [-2.048, 2.048]
  • Approximate range: [0, 46000]
parameter Rosenbrock::fitness(const Individual & subject) const {
  parameter sum = 0;
  for (unsigned long i = 0; i < subject.size() - 1; ++i)
    sum += 100 * std::pow(subject[i + 1] - std::pow(subject[i], 2), 2)
      + std::pow(subject[i] - 1, 2);
  return sum;
}

1.3.5 Schwefel

Note that the first Schwefel function on the web page is Schwefel’s double sum; the actual Schwefel function is defined after the Rastrigin function.

  • Domain: [-512.03, 511.97]
  • Approximate range: [0, 21000]
parameter Schwefel::fitness(const Individual & subject) const {
  parameter sum = 0;
  for (const parameter & gene : subject)
    sum += gene * std::sin(std::sqrt(std::abs(gene)));
  return 418.9829 * subject.size() + sum;
}

1.3.6 Spherical

  • Domain: [-5.12, 5.12]
  • Approximate range: [0, 500]
parameter Spherical::fitness(const Individual & subject) const {
  parameter sum = 0;
  for (const parameter value : subject) sum += std::pow(value, 2);
  return sum;
}

1.4 Genetic Algorithm

1.4.1 Population

This implementation of the genetic algorithm uses a generational population model, where a population is a vector composed of 512 Individual objects. The first generation’s members are populated with random values in the problem domain’s interval. To create a new generation, an empty offspring vector is made, which is then populated until it reaches the population size. This is done in four stages: selection, crossover, mutation, and elitism.
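
The loop that builds one generation from the previous one might look like the following sketch; the helper functions are sketched in the subsections below, and their exact wiring here is an assumption.

// Sketch of the generational loop (wiring is an assumption).
#include <cstddef>
#include <vector>
// Operators sketched in the subsections that follow.
Individual select_parent(const std::vector<Individual> & population);
void crossover(Individual & a, Individual & b);
void mutate(const Problem & problem, Individual & subject);
void elitism(const std::vector<Individual> & population,
             std::vector<Individual> & offspring);
const std::size_t population_size = 512;
// Build the next generation from the previous one.
std::vector<Individual> new_generation(const Problem & problem,
                                       const std::vector<Individual> & population) {
  std::vector<Individual> offspring;
  offspring.reserve(population_size);
  while (offspring.size() < population_size) {
    // Selection: two parents chosen by tournament (1.4.2).
    Individual a = select_parent(population);
    Individual b = select_parent(population);
    // Crossover: 80 percent chance of two-point crossover (1.4.3).
    crossover(a, b);
    // Mutation: per-gene jumping mutation (1.4.4).
    mutate(problem, a);
    mutate(problem, b);
    offspring.push_back(a);
    offspring.push_back(b);
  }
  // Elitism: reinsert the previous generation's best member (1.4.5).
  elitism(population, offspring);
  return offspring;
}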

1.4.2 Selection

This implementation of the algorithm uses tournament selection. To create a new parent, the best member is selected through a tournament among 4 randomly chosen members of the previous generation. Tournament selection suffers from fewer problems (such as sensitivity to fitness scaling) than the roulette wheel selection used previously, and was about as easy to implement.
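
A minimal sketch of this tournament, assuming the Individual comparison operator from section 1.2 and a tournament size of four:

// Sketch only: helper names are assumptions.
#include <random>
#include <vector>
const std::size_t tournament_size = 4;
// Pick the fittest of four randomly chosen members of the population.
Individual select_parent(const std::vector<Individual> & population) {
  static std::mt19937 engine{std::random_device{}()};
  std::uniform_int_distribution<std::size_t> index(0, population.size() - 1);
  Individual best = population[index(engine)];
  for (std::size_t i = 1; i < tournament_size; ++i) {
    const Individual & contestant = population[index(engine)];
    if (contestant > best) best = contestant;  // operator> means "fitter" (1.2)
  }
  return best;
}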

1.4.3 Crossover

For every two parents selected in the previous stage, a binary two-point crossover operation is performed to produce new children; the crossover happens with only an 80 percent chance each time. It is implemented by choosing a random start point and a random length, both within the size of the genome (that is, less than the given dimension of 30). Using the rotate() function, the parents’ genomes are rotated to the left so that the chosen start point becomes the start of the genome. For up to the chosen length, the corresponding genes of the two parents are swapped. The now recombined parents are returned as a pair of children.

Arithmetic and uniform crossover techniques were also tried, but they fared on par with or worse than two-point crossover and were significantly slower.
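
A sketch of the rotate-based two-point crossover described above, reusing the types from section 1.2; the in-place recombination and the final rotation back to the original gene order are my assumptions.

// Sketch only: in-place recombination and the rotate-back are assumptions.
#include <algorithm>
#include <random>
const double crossover_chance = 0.8;
// Two-point crossover via rotation, performed in place on the two parents.
void crossover(Individual & a, Individual & b) {
  static std::mt19937 engine{std::random_device{}()};
  std::bernoulli_distribution happens(crossover_chance);
  if (!happens(engine)) return;  // crossover happens only 80% of the time
  std::uniform_int_distribution<std::size_t> index(0, dimension - 1);
  std::size_t start = index(engine);
  std::size_t length = index(engine);
  // Rotate left so the chosen start point becomes the front of the genome.
  std::rotate(a.begin(), a.begin() + start, a.end());
  std::rotate(b.begin(), b.begin() + start, b.end());
  // Swap up to the chosen length of genes between the two parents.
  std::swap_ranges(a.begin(), a.begin() + length, b.begin());
  // Rotate back so gene positions match the original ordering (assumption).
  std::rotate(a.begin(), a.end() - start, a.end());
  std::rotate(b.begin(), b.end() - start, b.end());
}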

1.4.4 Mutation

The prior Gaussian mutation sequence performed too poorly for my liking on functions with more complex fitness landscapes (such as the Schwefel problem). Shea Newton’s suggestion of a “jumping” mutation, however, has proved to work much better.

This jumping mutation is an example of “change a little by a lot”. For each gene in a member’s genome, there is a 2.5 percent chance that the gene is mutated to some new random value in the problem’s domain. This amounts to, on average, 0.75 genes per member being mutated.
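
A sketch of this jumping mutation, assuming a 2.5 percent per-gene probability and a fresh uniform draw from the problem’s domain; the fitness is refreshed afterwards, as noted in section 1.2.

// Sketch only: helper names are assumptions.
#include <random>
const double mutation_chance = 0.025;
// Each gene jumps to a new random value in the domain with 2.5% probability.
void mutate(const Problem & problem, Individual & subject) {
  static std::mt19937 engine{std::random_device{}()};
  std::bernoulli_distribution happens(mutation_chance);
  std::uniform_real_distribution<parameter>
    uniform(subject.minimum, subject.maximum);
  for (parameter & gene : subject)
    if (happens(engine)) gene = uniform(engine);
  // The fitness is updated after mutation, as noted in section 1.2.
  subject.fitness = problem.fitness(subject);
}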

1.4.5 Elitism

Because this is a generational algorithm, it is best to introduce some elitism. After the new offspring generation has been created (its members having already undergone the crossover and mutation sequences), two randomly chosen members are replaced with copies of the best member from the previous population.
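
A sketch of this elitism step, assuming the comparison operator from section 1.2 is used to find the previous generation’s best member:

// Sketch only: helper names are assumptions.
#include <algorithm>
#include <random>
#include <vector>
// Replace two random offspring with the previous generation's best member.
void elitism(const std::vector<Individual> & population,
             std::vector<Individual> & offspring) {
  static std::mt19937 engine{std::random_device{}()};
  // operator> compares by fitness with the minimization flag (see 1.2).
  const Individual & best =
    *std::max_element(population.begin(), population.end(),
                      [](const Individual & a, const Individual & b) {
                        return b > a;  // "less than" when b is fitter
                      });
  std::uniform_int_distribution<std::size_t> index(0, offspring.size() - 1);
  // The two indices may coincide; the real code may guard against that.
  offspring[index(engine)] = best;
  offspring[index(engine)] = best;
}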

2 Results

The goal for each of these problems is minimization, that is, reducing the problem value to zero.

2.1 Ackley

  • Generations: 140
  • Running time: 0.25 seconds
  • Fitness: 0.04

./logs/Ackley.png

Solution:

(0.006996) (0.006996) (0.006996) (0.006996) (0.006996) (-0.012439)
(0.006996) (0.006996) (0.006996) (0.006996) (-0.012439) (0.006996)
(-0.012439) (0.006996) (-0.012439) (0.006996) (0.006996) (-0.012439)
(0.006996) (0.006996) (0.006996) (0.006996) (0.006996) (-0.012439)
(0.006996) (-0.012439) (-0.012439) (0.006996) (0.006996) (0.006996)

Raw fitness: 0.0392386
Normalized fitness: 0.998216
./search  0.24s user 0.01s system 99% cpu 0.250 total

2.2 Griewangk

  • Generations: 100
  • Running time: 0.25 seconds
  • Fitness: 0.5

./logs/Griewangk.png

Solution:

(0.252414) (0.252414) (0.252414) (0.252414) (0.790291) (0.252414)
(0.252414) (0.252414) (0.252414) (0.252414) (0.252414) (0.252414)
(-1.247154) (-1.247154) (-1.247154) (-1.247154) (-1.247154) (0.252414)
(0.252414) (-1.247154) (0.252414) (0.252414) (-1.247154) (0.252414)
(0.252414) (-1.247154) (-1.247154) (-1.247154) (-1.247154) (-1.247154)

Raw fitness: 0.481103
Normalized fitness: 0.999717
./search  0.23s user 0.01s system 99% cpu 0.247 total

2.3 Rastrigin

  • Generations: 80
  • Running time: 0.19 seconds
  • Fitness: 0.13

./logs/Rastrigin.png

Solution:

(0.000239) (0.000239) (0.000239) (0.000239) (0.000239) (0.000239)
(0.011444) (0.000239) (0.010782) (0.000239) (0.000239) (0.000239)
(0.000239) (-0.004172) (0.000239) (-0.001574) (0.000239) (0.000239)
(0.000239) (0.011444) (0.000239) (0.000239) (0.000239) (0.000239)
(0.011444) (0.000239) (0.010782) (0.000239) (0.000239) (0.000239)

Raw fitness: 0.128233
Normalized fitness: 0.999858
./search  0.16s user 0.01s system 88% cpu 0.192 total  

2.4 Rosenbrock

  • Generations: 70
  • Running time: 0.13 seconds
  • Fitness: 28.95

This is the only function that was not minimized close to zero; however, given its large range and difficult valley, this fitness score is still relatively good.

./logs/Rosenbrock.png

Solution:

(0.023596) (0.023596) (0.012501) (-0.000837) (0.012501) (0.012501)
(0.023596) (-0.000837) (0.012501) (0.012501) (0.023596) (0.012501)
(0.023596) (0.023596) (0.012501) (0.023596) (0.012501) (0.023596)
(0.012501) (0.023596) (0.023596) (0.012501) (-0.008473) (0.012501)
(0.023596) (0.012501) (0.023596) (0.012501) (0.023596) (-0.008473)

Raw fitness: 28.952
Normalized fitness: 0.999371
./search  0.11s user 0.01s system 95% cpu 0.129 total    

2.5 Schwefel

  • Generations: 100
  • Running time: 0.33 seconds
  • Fitness: 0.16

./logs/Schwefel.png

Solution:

(-420.765987) (-420.765987) (-420.765987) (-420.765987) (-420.765987)
(-420.765987) (-420.765987) (-420.765987) (-420.765987) (-420.765987)
(-420.765987) (-420.765987) (-420.765987) (-420.765987) (-420.765987)
(-420.765987) (-420.765987) (-420.765987) (-420.765987) (-420.765987)
(-420.765987) (-420.765987) (-420.765987) (-420.765987) (-420.765987)
(-420.765987) (-420.765987) (-420.765987) (-420.765987) (-420.765987)

Raw fitness: 0.155996
Normalized fitness: 0.999993
./search  0.29s user 0.01s system 93% cpu 0.326 total

2.6 Spherical

  • Generations: 50
  • Running time: 0.08 seconds
  • Fitness: 0.068

./logs/Spherical.png

Solution:

(-0.064686) (0.052518) (0.006137) (0.052518) (0.006137) (0.006137)
(0.006137) (0.110435) (0.057621) (0.006137) (-0.064686) (0.025083)
(-0.064686) (0.025083) (-0.064686) (0.052518) (0.052518) (0.006137)
(0.052518) (0.006137) (0.006137) (0.006137) (0.057621) (0.006137)
(-0.064686) (0.025083) (-0.064686) (0.025083) (-0.064686) (0.052518)

Raw fitness: 0.0675684
Normalized fitness: 0.999865
./search  0.06s user 0.01s system 85% cpu 0.083 total

3 Conclusion

In general, this genetic algorithm performed exceptionally well. With the same parameters for population size, tournament size, and crossover and mutation probability, and using the same mutation and crossover sequences (jumping mutation and two-point crossover, respectively), the algorithm solves every problem except Rosenbrock to a raw fitness below 1. The Rosenbrock fitness plateaus around 28, which is still quite good. The notoriously difficult Schwefel problem is handled easily thanks to the jumping mutation sequence.

Although a terminating condition exists, for these tests the goal was set high enough that all generations would be exhausted before the algorithm exited. Every run took less than a second to exhaust its set number of generations (at most 140; more detail is in the results section), with most completing in a quarter of a second or less. Running more generations generally improves the fitness further, and most problems can be brought much closer to zero, but those longer runs are difficult to present visually.

I was very happy with how this algorithm turned out. For the sake of improving my C++ skills, I have a list of ideas I want to implement, most of which are just refactoring:

  • proper namespaces for algorithm, problem, individual, and random, which would allow me to uncouple many member functions that do not require their class’s member variables, and make it easier to reorganize my files;
  • mutator and crossover delegator objects, to make swapping out the various mutation and crossover sequences cleaner;
  • a command-line interface using the Boost program options library, which would make running my program a bit easier;
  • a more automatic Makefile using autoconf/automake/makedepends, which would require being more explicit with my dependencies inside my files, rather than relying on header inheritance;
  • signal handling for killing a run early and still saving the data;
  • unit testing to verify correctness;
  • an evolving mutation rate and individual ranges for each gene;
  • threads to parallelize “slower” parts of the algorithm.

None of this is necessary, but all of it will be fun.
