@LiekevdHeide you can use this branch to add the statistics tracking stuff!
hgs_vrptw/include/Statistics.h (outdated):

```cpp
using clock = std::chrono::system_clock;
using timedDatapoints = std::vector<std::pair<clock::time_point, double>>;

// TODO measure and store population diversity statistic?
```
@LiekevdHeide (and/or @leonlan): do you have any idea on how to measure population diversity?
Are you asking how it is currently measured (broken pairs distance), or are you asking whether we have other suggestions for measuring diversity?
The latter; the former I already know about. I'm looking for some alternatives that we can quickly compute for the whole population, rather than using the broken pairs distance, which is currently only computed for the neighbourhood defined by nbGranular.
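For context, the broken pairs distance mentioned here counts the edges (consecutive-client pairs) of one giant tour that do not appear in the other. A minimal Python sketch, with a hypothetical helper name, assuming both tours are permutations of the same client indices and ignoring wrap-around edges (the actual C++ implementation in `Individual` may handle these details differently):

```python
def broken_pairs_distance(tour_a, tour_b):
    """Number of undirected edges of tour_a that do not occur in tour_b.

    Tours are giant-tour chromosomes: permutations of the same client
    indices. Edges are taken between consecutive clients only.
    """
    def edges(tour):
        # Undirected edges as frozensets so (u, v) == (v, u).
        return {frozenset(pair) for pair in zip(tour, tour[1:])}

    return len(edges(tour_a) - edges(tour_b))


# Example: the tours share edges {3,4} and {1,2}, differ in one edge.
print(broken_pairs_distance([1, 2, 3, 4], [2, 1, 3, 4]))  # → 1
```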
You could calculate broken pairs over the entire population if you ignore nbClose in Individual::avgBrokenPairDistanceClosest, no?
Prins (p. 36) also outlines three other measures for the giant tour chromosome: 1) Hamming, 2) Broken Pairs, and 3) Levenshtein distance. I think the Hamming distance might be interesting to try out. It's easy to compute and cheap (same time complexity as broken pairs), and for the VRPTW it does not have the drawback of "circular shifts" as stated in the slides.
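A rough Python sketch of the Hamming-based idea (all names hypothetical; the solver itself is C++, and whether to average over all pairs or only over close individuals is an open design choice): the Hamming distance counts positions at which two equal-length giant tours differ, and averaging it over all pairs gives a simple whole-population diversity proxy.

```python
from itertools import combinations


def hamming(tour_a, tour_b):
    """Number of positions at which two equal-length giant tours differ."""
    return sum(a != b for a, b in zip(tour_a, tour_b))


def avg_pairwise_hamming(population):
    """Mean Hamming distance over all pairs of individuals.

    O(p^2 * n) for p individuals with tours of length n, so fine for
    typical population sizes, but restricting to nearest individuals
    (as avgBrokenPairDistanceClosest does) would also work.
    """
    pairs = list(combinations(population, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


pop = [[1, 2, 3, 4], [1, 3, 2, 4], [4, 3, 2, 1]]
print(avg_pairwise_hamming(pop))  # mean over the three pairs
```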
I'll add something like this and then I think this PR is done.
Defaults to false, since we do not usually want to collect statistics. Also fix a rounding issue with the seconds/runtime calculation in the plots.
I think this is done. @LiekevdHeide @leonlan can you review this (hopefully today)?
I will review it today!
benchmark.py (outdated):

```
@@ -8,6 +8,7 @@
from tqdm.contrib.concurrent import process_map

import tools
from python.classes import Measures


def parse_args():
```
Suggestion: add an argument --collect_statistics to make it optional.
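A minimal sketch of what the suggested flag could look like in parse_args (only the flag name comes from this thread; the surrounding parser setup is hypothetical, and any existing arguments in benchmark.py would sit alongside it):

```python
import argparse


def make_parser():
    parser = argparse.ArgumentParser()
    # store_true makes the flag default to False: statistics are only
    # collected when --collect_statistics is passed explicitly.
    parser.add_argument(
        "--collect_statistics",
        action="store_true",
        help="collect per-iteration solver statistics (small overhead)",
    )
    return parser


# Off by default; enabled only when the flag is given.
print(make_parser().parse_args([]).collect_statistics)  # False
print(make_parser().parse_args(["--collect_statistics"]).collect_statistics)  # True
```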
Question: how should we evaluate a benchmark without statistics?
If we want to benchmark the solver on the average costs for the given competition time limits, then I don't see the need for collecting statistics. (Assuming that collecting statistics adds non-negligible computational overhead.)
It's small, as in, I don't see a difference in solution quality with stats on/off ATM.
Two issues:
- It's a hassle to make this optional, and I want to sleep. :-)
- I already use these stats to make decisions on how well ideas work, in addition to just the average cost. E.g., the number of iterations tells me whether the local search went faster or slower than previously, and similarly the number of improving moves is useful in checking that the algorithm actually does something for each instance.
Since the performance hit is minuscule, I do not yet see the need to remove collection from the benchmark script.
If the performance hit is minuscule, then I don't see any issues here.
I didn't realize that measures/stats were coupled, so when reviewing I thought it would be an almost trivial addition. But I now understand your first point so I think it's bedtime for me :-)
Closes #21.