Skip to content
NicMcPhee edited this page Feb 13, 2013 · 10 revisions

The intent is to use this page to discuss and document our shared style guidelines for this project. Thanks to Kris for the suggestion!

We should consider creating an Eclipse style file so that everyone's code will share similar indentation, etc.

Keep declarations as local as possible

There have been some contradictory suggestions on the first set of pull requests that come down to whether one prefers

def i, j
for (i in things) {
  j = stuff
  //...
}

or

for (def i in things) {
  def j = stuff
  //...
}

I (Nic) strongly prefer the latter. I think it's almost always a Very Good Thing to keep declarations as local as possible. It reduces the scope of variables and their likelihood of interacting (badly), and generally makes code more readable. Someone had been suggesting the opposite, which I think may have been coming from a desire to avoid creating numerous objects inside a loop. That is indeed a desirable goal (or at least a thing to watch out for), but is rarely helped by putting the declarations (the def lines) outside the for loop. The def statement just brings the name into the world, but doesn't actually cause any object to be created. The objects in the examples above are the objects inside things (for i) and whatever object stuff is or returns (for j). defing the variables before the loop has no effect on object creation in either case.

So, unless we've got a good reason to declare things early, I strongly recommend declaring things as close to the usage as possible.

Make For loops as GROOVY as possible

One common thing I (Adrian) noticed when reviewing code is that many people still write their loops in a java-like fashion;

thePopulation = aSetOfCandidates
for(int i = 0; i< thePopulation .size();i++) {
  sum += problem.quality(thePopulation[i])
  //...
}

The above can be refactored as follows:

thePopulation = aSetOfCandidates
for (candidate in thePopulation) {
  sum += problem.quality(candidate)
  //...
}

One of the nice things about this looping style is that it works for several different data structures; 'thePopulation' can be a string, map, array, etc. There are also other methods of looping such as .each and .times, all of which are worth learning because of their convenience and functionality.

The times function is particularly useful and has a lot of potential applications in our code. Instead of:

population = []
for (int i=0; i<numIndividuals; ++i) {
   population.add(problem.create())
}

try

population = []
numIndividuals.times {
   population.add(problem.create())
}

Use assertions to check for important pre-conditions

If you're writing something where there are important pre-conditions, such as numChildren > 0 or firstParent.size() == secondParent.size(), include them in nifty Groovy assertions. For example:

assert firstParent.size() == secondParent.size(), 'Parents must have the same length'

Assertions like this will be checked at run time and an exception will be thrown if they aren't satisfied. You can include assertions at key points "inside" your algorithms if, for example, you want to check that an array has the right number of elements.

Make sure your code runs, i.e., at least write smoke tests

If you think that you have code that is ready to be integrated into the system, at least make sure you've run it before you indicate that you'd like it merged. (Note that this doesn't mean that you can't issue a pull request on "unfinished" code. That might be a totally reasonably way to start a conversation and get feedback; you just need to make it clear that you're not yet expecting to have this merged in.)

Tests would be best since they'll be run repeatedly in the future, providing continuing evidence that your code hasn't been broken by other changes. Writing tests for these highly stochastic algorithms is certainly tough, but with a little creativity you can often test at least some aspects of the system, showing on the way that the code does actually run.

In these tests on crossover for example, the ability to include default arguments allows for the tester to control the random number generation in the test "Crossover happens properly", and the test "Output strings are mirror images of each other" checks an important property of crossover that should hold regardless of the random number generation. There are similar tests for GaussianConvolution.

For searchers it's harder to write good tests since it's really difficult to control their behavior in ways that make it predictable in tests. This test for RandomSearch, for example, really is almost nothing more than a smoke test. I creates the searcher, runs it on a small, simple problem (so the test runs pretty quickly), and then confirms that it got an answer of the right length. That obviously doesn't test very much, but it does ensure that the code runs, terminates, and doesn't throw an exception, which is better than nothing. We could presumably extend that test to confirm that all the elements of the result were 0 or 1. We could also probably include some sort of check that the result had more 1's than 0's, although that would fail on rare occasions when we just got super unlikely, and we should think twice before including a test like that.