ComparingSolutions/ComparingSolutions.pier

! Comparing Solutions: Benchmarking and Charting at Work

Written by Natalia Tymchuk

Often you have several solutions to a given problem and you would like to understand and evaluate which ones are the best.
To get such an assessment, you will measure your solutions, however it is important to be able to perform your measures
in a controlled and repeatable way. Once you are able to repeat your measuring process, it is also important to put the measures in perspective and charting them is a nice and powerful way to do so. 

In this chapter we will show you how to use SMark - a benchmarking frameworks and Graph-ET - a charting libraries to analyze and compare different solutions - we will take SciSmalltalk as an example. SciSmalltalk is a new Smalltalk project, similar to existing scientific libraries like NumPy, SciPy for Python or SciRuby for Ruby. Within such project we will use the ==Math-ODE== package which is a library of  Ordinary Differential Equations (ODE) solvers. You can read more about ODE in Wikipedia *http://en.wikipedia.org/wiki/Ordinary_differential_equation*.

The main idea of this chapter is that you (the reader) can easily use Graph-ET and SMark for your own project.

					
!! A small Introduction to Math-ODE

To understand how you can work with the ODE package, let's look at the example with ''Adams-Bashforth'' second order method.

@@comment May be you should give one example of an ODE and an example of 

Let's start. First of all you need to understand how this method works.  The Adams–Bashforth methods are multistep explicit methods. Numerical methods for ordinary differential equations approximate solutions to initial value problems of the form
 ''x' = f( t,x )'',  ''x( t@@0@@ ) = x@@0@@''. 

The result is approximations for the value of  ''x(t)''  at discrete times '' t@@i@@'': ''x@@i@@ {{{html: ≈}}} {{{markdown: ≈}}} {{{latex: $\approx$}}} x( t@@i@@ )'' where ''t@@i@@ = t@@0@@ + i*h'', where h is the time step (sometimes referred to as delta t or dt ). 

Multistep methods use information from the previous ''s'' steps to calculate the next value.
 If you want to solve some ODE using this method you need to do the following  steps:

#Check whether time ''t@@1@@'' is equal to ''lastT'' - time value in which we want find the value of our function, if not then
#Find the first state ''x@@1@@'' using first order method from Adams-Bashforth family of methods (this is the Euler method: '' x@@n + 1@@ = x@@n@@ \+ dt \* f( t@@n@@, x@@n@@ )'' ) and initial data: initial state ''x@@0@@'', initial time ''t@@0@@'' and step size ''dt''. Check whether ''t@@2@@'' is equal to ''lastT'', if not then
#Find the second state ''x@@2@@'' using second order Adams-Bashforth method (''x@@n + 2@@ = x@@ n + 1@@ \+ dt \* (3/2) \* f( t@@n + 1@@, x@@n + 1@@ ) - dt \* (1/2) \* f( t@@n@@, x@@n@@ )'' ) and previous datas: ''x@@1@@'', ''x@@0@@'', ''t@@1@@'', ''t@@0@@'', ''dt''. Check whether ''t@@3@@'' is equal to ''lastT'', if not then
#Find the ''n'' next states ''x@@3@@'', ..., ''x@@n@@'' using second order Adams-Bashforth method and previous datas. Here ''n'' is equal to ''(t - t@@0@@) / dt''. If on some ''(k + 1)'' step you have ''t@@k@@ < lastT < t@@ k + 1@@'' then find ''x@@ k + 1 @@'' using ''k + 1'' order Adams-Bashforth method but with ''new_dt = lastT - t@@k@@''. If on some ''k'' step ''t@@k@@ = lastT'' then find ''x@@k@@'' and it's all.

Now, we have the code, which does all previous steps, so if you are not interested, you can skip it.

@@comment may be you should split the methods with an initialize phase
@@comment I simplified the code. If you want I can do a code review of Math-ODE


Fig *FigAB2Example* shows a typical test case for an Adams-Bashforth solver.

+A typical testcase for an Adams-Bashforth solver example>file://pictures/AB2.png|width=80|label=FigAB2Example+


[[[
WHATCLASS>>solve: aSystem startState: initialState startTime: initialTime endTime: endTime
	"add a comment here: what is a result for example"
	
    | prevState statesPair |
    self system: aSystem.
    self stepper: (self firstStepperClass onSystem: self system).
    state := initialState.
    lastTime := initialTime.

    "announce initial conditions"
    self announceState: state time: initialTime.

    (lastTime+dt > endTime and: lastTime < endTime)
    ifTrue:[
        state := self lastStepState: state endTime: endTime].

    lastTime+dt <= endTime 
    ifTrue:[
        prevState := initialState.

        state := self firstStepStartTime: initialTime.
        "announce first step"
        self announceState: state time: (initialTime + dt).

        self stepper: (self stepperClass onSystem: self system).

        "step until the end"
        statesPair := self mainStepsPrevState: prevState startTime: ( initialTime + dt) endTime: endTime.
        state := statesPair second.
        prevState := statesPair first.

        "sanity check"
        self assert: [(lastTime between: initialTime and: endTime) 
            or: [lastTime between: endTime and: initialTime]].

        "take another step if needed"
        state := self lastStepPrevState: prevState endTime: endTime.].

    ^ state]]]


In previous code-block we used the following methods:

[[[
WHATCLASS>>firstStepStartTime: t
  state := stepper doStep: state time: t stepSize: self dt. 
  self announceState: state time: t + self dt.
  lastTime := t + self dt.
  ^ state]]]

which sends a message to ==MidpointStepper==. This method does the second step.

[[[
WHATCLASS>>mainStepsPrevState: prevState startTime: initialTime endTime: endTime
	| previousState |
    previousState := prevState. 
    "don't go to end time to avoid overrunning"
    (initialTime to: endTime - self dt  by: self dt) do:
    	[:time | | tempState|
        tempState := state.
        state := stepper
            doStep: state 
            prevState: previousState
            time:  time 
            stepSize: self dt.
        previousState := tempState.
        "announce step results"
        self announceState: state time: time + self dt.
        lastTime := time + self dt].        
    ^ {previousState . state}]]]
	
This method does the third and forth steps. It sends the messages to the method ==doStep: prevState:== from ==
AB2Stepper==. @@not clear to me@@

[[[
WHATCLASS>>doStep: aState prevState: prevState time: t
    self stepSize isNil
        ifTrue: [ self error: 'step size required by stepper' ].
    ^ (self stepSize / 2) * (3 * (system x: aState t: t) - (system x: prevState t: t - self stepSize)) + aState]]]
	
[[[
WHATCLASS>>lastStepPrevState: prevState endTime: endTime
	"catch partial or full step at end"
    (lastTime equalsTo: endTime ) ifFalse:
    	[state := stepper 
        lastStep: state 
        prevState: prevState
        time: lastTime 
        stepSize: endTime - lastTime
        deltaT: dt.
        self announceState: state time: endTime].           
    ^ state]]]

This method works in this case ''t@@k@@ < lastT < t@@ k + 1 @@''. It also sends a message to the method ==lastStep: prevState: time: deltaT:== from ==AB2Stepper==.

[[[
WHATCLASS>>lastStep: aState prevState: prevState time: t deltaT: incrementOfTime
    self stepSize isNil
    	ifTrue: [ self error: 'step size required by stepper' ].
    ^ self stepSize / 2 * (3 * (system x: aState t: t) - (system x: prevState t: t - incrementOfTime)) + aState]]]
	

Now that you get an idea of the process to solve an ordinary differential equation we can proceed.

!!SMark: A benchmarking frameworks

When it comes to numerical methods it's not enough to have correctly working methods. When you use numerical calculations another important criterion is execution time. To test our code execution time we can write benchmarks for it. Now what is key is to be able to write reproducible benchmarks. Similarly to SUnit supporting the definition and execution of unit tests, SMark supports the definition and execution of benchmarks.

!!!Loading SMark and using it
	
SMark is a framework developed by Stefan Marr for writing benchmarks. To load this package in your image execute in workspace the following expression:

[[[
Gofer new
    url: 'http://smalltalkhub.com/mc/StefanMarr/SMark/main';
    package: 'ConfigurationOfSMark';
    load.
(Smalltalk at: #ConfigurationOfSMark) load	
]]]

@@note we can use the configuration form to load. To verify

[[[
Gofer new
    smalltalkuser: 'StefanMarr' project: 'SMark';
    configuration';
    load.
(Smalltalk at: #ConfigurationOfSMark) load	
]]]


+Download SMark>file://pictures/Download-SMark.png|width=75|label=Download SMark+


!!!Writing a benchmark
Now we can get started and write a first benchmark. 
To do so, you need to create the new class, where your future benchmarks will be defined. Pay attention: this class must be a subclass of ==SMarkSuit==. Let's call it ==SomeBenchmark==.

+Writing benchmark>file://pictures/Writing-benchmark1.png|width=70+

It's time to write your first benchmark. To follow SMark's convention you have to prefix names of your benchmark methods with the prefix ==bench"==. Then write your code. For example:

[[[
SomeBenchmark>>benchFactorial
  1000 factorial
]]]

But not all benchmarks look so simple. For example in the context of ODE project:

WRITE IT HERE as Pharo code !!!

+Midpoint test and benchmark>file://pictures/Midpoint-benchmark.png|width=100+

As you can see on this image benchmark is very similar to a test. So if you are writing tests for your code, adding some benchmarks will be a piece of cake.


!!!Running benchmarks

One of the most important reasons why we write the benchmark is the result of it's execution.
To see this result in workspace you should execute ==SomeBenchmark run== or better ==SomeBenchmark run: iterationNumber== and pass a number of iterations.

+Running benchmarks 1>file://pictures/Runing-benchmark1.png|width=70+

Same as the tests, benchmarks have ==setUp== method, which can be used to initialise all the shared data for benchmarks. In my case it looked like that:

+Running benchmarks 2>file://pictures/Runing-benchmark2.png|width=80+

And then our benchmark was DRYed to this: (SD what does it mean DRYed?)

+Running benchmarks 3>file://pictures/Runing-benchmark3.png|width=80+


""Nota bene"": Don't forget to write ==super setUp== in the ==setUp== methods of subclasses.


!!!Metacello configuration
You can extend your configuration to automatically load SMark. To do that go to the configuration of your project (e.g. ==ConfigurationOfSciSmalltalk==) open last baseline and add:

[[[
spec for: #common do: [
  spec
    ...
    package: 'Math-Benchmarks-ODE' with: [spec requires: #( 'Math-ODE' 'SMark' )];
    ...
  spec project: 'SMark' with: [
    spec
      className: 'ConfigurationOfSMark';
      repository: 'http://smalltalkhub.com/mc/StefanMarr/SMark/main' ].
]]]


!!!Improving SMark

If you have SomeBenchmark's subclasses of benchmarks and want to run them all with one message, then add the next method to the class side of your SomeBenchmark.
[[[
SomeBenchmark class>>runAll:  numOfIterations
  ^ (self withAllSubclasses collect:[ :each |
        each run: numOfIterations ]) joinUsing: Character cr
]]]

+Improving SMark>file://pictures/Improving-Smark1.png|width=80+


!!Quality criteria for Math-ODE

Once we have defined all the benchmarks, we can compare them. Obviously, all methods give us a different execution time, and according to our previous quality criteria the best method is the one that is passing all unit tests and has the fastest execution time. In fact we forgot about the accuracy, because all the numerical method's solutions differ from the actual value that can be found by solving equation analytically.


!!!Accuracy

For Math-ODE project, we wrote new package Math-Accuracy-ODE. There we have methods which compute the accuracies:

[[[
Put some code here too
]]]

+Accuracy for AB2 method>file://pictures/Accuracy-for-AB2-method.png|width=80+

+Accuracy for AB3 method>file://pictures/Accuracy-for-AB3-method.png|width=80+

+Accuracy's initialise>file://pictures/Accuracy-initialise.png|width=80+

and methods for running them:

[[[
WHATCLASS>>run
	| instance checkSelectors |
	instance := self new.
	checkSelectors := self selectors select: [ :each | each beginsWith: 'check' ].
	^ (checkSelectors collect: [ :selector |
		(selector copyFrom: 6 to: selector size) ->
		((instance perform: selector) - instance standard) abs ]) asDictionary]]]
		
We can get the results in XML.	
[[[
WHATCLASS>>runToXML
	| writer |
	writer := XMLWriter new.
	writer
		enablePrettyPrinting;
		xml.
	writer tag: 'accuracy' with: [ 
		self run keysAndValuesDo:  [ :key :value |
			writer tag: key with: value asString]].

	^ writer ]]]

We have again something similar to SUnit, but this time calculating the deviation of a numerical method from the actual value. So using these automated tests helps you to stay updated with a quality changes of your code. You will spot when the time of execution becomes twice as big because of an architecture refactoring, or the method's accuracy decreases dramatically because of performance optimizations.


!! Graph-ET 
Depending on your data, you may want to have live visual feedback to understand the results. In such a case you can use Graph-ET. It is a graph charting library written by Daniel Aviv Notario. You can use that Graph-ET  to draw the graphs, experiment with size, position, - everything that you have on your image and what you would like to add.

!!!Loading Graph-ET and using it

To download Graph-ET in your Pharo image you should execute the following code in the workspace: (SD this does not load the code)

[[[
MCHttpRepository
 location: 'http://smalltalkhub.com/mc/DanielAvivNotario/Graph-ET/main'
 user: ''
 password: '']]]


!!!Benchmarks in Graph-ET

We get our benchmarks data as an ordered collection shown in Figure *grresults*


+Benchmarks results>file://pictures/Benchmarks-results.png|width=80|label=grresults+

However it is not suitable for Graph-ET and this is why we transform it to the dictionary as follows

[[[
benchmarksData
^((((ODEBenchmark runAll: 100) collect: [ :suite |
		suite results ]) fold: [ :result :current |
			result addAll: current; yourself ]) collect: [ :value |
				(value inject: 0 into: [ :subTotal :each |
					subTotal + each total ]) / value size ] )
]]]

SD: It is not clear to me that we get a dictionnary!


+benchmarksData>file://pictures/benchmarksData.png|width=80|label=benchmarksData+

Then the method which draws the diagram has the following definition: 
[[[
graphBenchmarks
	| diagram model | 
	model := self benchmarksData associations.
	
	diagram := GETDiagramBuilder new.
	diagram verticalBarDiagram
		models: model;
		y: #value;
		spacing: 15;
		regularAxis.
		
	diagram interaction
		popUpText.
		
	 diagram open.
		
	model do: [ :value | 
    	  | bar label |
		      "We define a label, and add it to the view"
		      label := ROLabel elementOn:( value key joinUsing: Character cr) .
		      diagram rawView add: label.

		      "We get the bar, the gray element that grows up"
			  bar := diagram rawView elementFromModel: value.

             "Move the label below its corresponding bar"
             ROConstraint move: label below: bar ].
]]]

When you execute the following expression in a workspace ==SciSmalltalkMetrificator graphBenchmarks==
you will get the graph shown in Figure *graphBenchmarks* and displayed with Graph-ET as Figure *graphBenchmarksIn*.

+graphBenchmarks result>file://pictures/graphBenchmarks-result.png|width=50|label=graphBenchmarks+


+Benchmarks in Graph-ET>file://pictures/4BenchmarksinGraph-ET.png|width=50|label=graphBenchmarksIn+

 
""Nota bene:"" in the case that your benchmark results are close to zero, it is necessary to increase the ''problemSize'', which multiplies the results (as shown in Figure *ProblemSize*). Make the smallest your results greater or equal to 10.

%+ProblemSize equal 100.>file://pictures/5ProblemSize-equal-100.png|width=50+ 


+ProblemSize equal 1>file://pictures/6ProblemSize-equal-1.png|width=50|label=ProblemSize+

   
As you can see the results are similar, but not identical.


!!!Accuracies in Graph-ET

In addition to benchmarks we get our accuracies data as a dictionary as shown in Figure *AccRes*.
This means that we can chart them immediately. 

+Accuracies results>file://pictures/Accuracies-results.png|width=80|label=AccRes+


[[[
accuracyData
   ^ ODEAccuracy run ]]]

The method looks similar to ==graphBenchmarks==

[[[
WAHTClass>>graphAccuracy
	| diagram model |
	model := self accuracyData associations.
	
	diagram := GETDiagramBuilder new.
	diagram verticalBarDiagram
		models: model;
		spacing: 25;
		"barWidth: 25;"
		y: #value;
		regularAxis.
		
	diagram interaction	popUpText.
	diagram open.
	
    model do: [ :value | 
        | bar label |
        "We define a label, and add it to the view"
        label := ROLabel elementOn:( value key joinUsing: Character cr) .
        diagram rawView add: label.

        "We get the bar, the gray element that grows up"
        bar := diagram rawView elementFromModel: value.

        "Move the label below its corresponding bar"
        ROConstraint move: label below: bar ].
]]]

 ""Nota bene:"" you can get an image similar to the Figure *smallStep*.
 
+Accuracies with the small step size>file://pictures/7.png|width=50|label=smallStep+

The problem is that step size, that I used, was so small, that every method returned almost the same result.
  The graph of accuracies with the larger step looks different :

+Accuracies with the larger step size>file://pictures/8Accuracies-in-Graph-ET.png|width=50|label=8Accuracies-in-Graph-ET+
 
You can experiment a lot with Graph-ET. For example, in Figure *8Accuracies-in-Graph-ET* you can see that the biggest deviation comes from the Euler and Backward Euler methods, which are the first order methods. And the family of Adam-Moulton multistep methods (Trapezoid, AM3, AM4) returns the best results.


!!!A new question
Now we ask ourselves an important question: How does the deviation vary depending on the step size?

On the image the darker colors  correspond to the smaller steps. We can see that the colors are distributed differently. It depends on how each method's result is close to the actual point. For example, the Euler and BackwardEuler methods are approaching to the point from one side, so the colors change smoothly.

+Example 1>file://pictures/9.png|width=50+

An even more impressive thing is the dependence of accuracy on the initial state.

+Example 2>file://pictures/10.png|width=50+

On the ''y''-axis we have deviation between given and actual results. The intensity of the color displays the shift of initial value both positive or negative.
The shades of a blue color indicate the negative deviation, and the shades of red - the positive deviation.
The best methods are those, that at zero point have black colour. Because, the closer we are to the initial point (it is indicated with a black colour close to ''x''-axis ), the closer we are to the actual value.

+Deviation scale>file://pictures/11.jpg|width=50+

 As with previous images, the graphics of Euler and BackwardEuler methods are the simplest, with the colors that are changing gradually. However, exactly these methods draw the most attention, and from their graphs we can understand that they are not the best methods. This happens because, if we take the point ''(A@@0@@ - 1)'' as the initial one, than we will get a value less than ''A@@4@@'' and closer to the function which is drawn with blue colour. 

+Euler method>file://pictures/12.png|width=50+


For the BackwardEuler method it's the same situation with initial point ''(A@@0@@ + 1)''. 

Those last two bar graphs are very exciting, because we can see live how the numerical methods work.


!!!Benchmarks with Accuracies

In Graph-ET you even can merge the graphs of the benchmarks and the accuracies.

Benchmarks are measured in milliseconds, and the accuracies are just the numbers. Therefore, to merge it correctly, we converted all values to percentages.

+Benchmark's and accuracy's values in percentages>file://pictures/13.png|width=50+

In this image accuracy is red and benchmark's time is blue.

It's important and interesting, because we can see both the most accurate and the quickest methods. As a result we can determine which method is the best for our ODE.