
Refactor/cvrp speedup #146

Merged: 11 commits, Aug 17, 2018

Conversation

@krypt-n (Contributor) commented Aug 13, 2018

Issue

Fixes #144

Implements expression templates for amount to reduce heap allocations.

Tasks

  • Update CHANGELOG.md (remove if irrelevant)
  • review

Benchmarks

All CVRP instances, 8 threads, exploration level = 1

origin/master:

Total jobs: 44993
Total jobs assigned: 44908 (99.81%)

Instances: 189
All jobs solutions: 165 (87.3%)
Optimal solutions: 10 (5.29%)

                Gaps   Computing times
Min             0.0    1
First decile    0.62   6
Lower quartile  1.82   13
Median          3.46   109
Upper quartile  6.17   988
Ninth decile    9.11   4614
Max             19.47  24572

refactor/cvrp-speedup:

Total jobs: 44993
Total jobs assigned: 44908 (99.81%)

Instances: 189
All jobs solutions: 165 (87.3%)
Optimal solutions: 10 (5.29%)

                Gaps   Computing times
Min             0.0    0
First decile    0.62   4
Lower quartile  1.82   7
Median          3.46   52
Upper quartile  6.17   423
Ninth decile    9.11   2247
Max             19.47  17857

Solutions do not change, everything gets a bit faster.

@krypt-n (Contributor, Author) commented Aug 13, 2018

A similar speedup can be observed with x=3:

origin/master:

                Gaps   Computing times
Min             0.0    2
First decile    0.0    22
Lower quartile  0.64   48
Median          2.06   375
Upper quartile  3.78   4540
Ninth decile    6.28   23736
Max             19.47  116124

refactor/cvrp-speedup:

                Gaps   Computing times
Min             0.0    2
First decile    0.0    13
Lower quartile  0.64   26
Median          2.06   192
Upper quartile  3.78   2139
Ninth decile    6.28   15662
Max             19.47  90277

How did you calculate the 35% number for my last PR? I'd like to add a similarly calculated number to the changelog for this PR.

@jcoupey (Collaborator) commented Aug 14, 2018

Thanks for the implementation; it's great to see you managed to keep the amounts usage unchanged. I'll probably come up with a few questions once I have time to look into the details.

How did you calculate the 35% number for my last PR?

As I recall, this was simply based on an overall computing time comparison, summing values in the "computing times" column of the files produced by compare_to_BKS.py. The -35% value is (PR_time / master_time - 1) * 100. This should be the same as comparing the total durations reported when running time ./run.sh [...].
The gain for the last PR is actually a bit higher than that, thanks to commit 8cb214f you pushed after my test.

I'll also run the same comparisons for the current PR with various x values on my usual benchmarking machine.

@jcoupey jcoupey added this to the v1.3.0 milestone Aug 14, 2018
@jcoupey (Collaborator) commented Aug 14, 2018

Running on all CVRP instances with 8 threads, I'm getting the following total computing times comparisons:

x   master      this PR     Gain (%)
0   2m23.429s   0m47.793s   -66.7
1   4m21.055s   1m51.477s   -57.3
2   10m15.670s  4m28.015s   -56.5
3   14m31.113s  6m19.620s   -56.4
4   27m20.504s  12m2.998s   -55.9
5   53m57.172s  25m37.835s  -52.5

I confirm that all solutions are exactly the same. This is great! 🎉

@krypt-n (Contributor, Author) commented Aug 15, 2018

Apologies for the unorganized work, but I added two commits that improve computing times for instances with a lot of vehicles, i.e. those with a bottleneck in cvrp_local_search::try_job_additions. This does not affect most instances, but for the ones with many vehicles the improvement is quite noticeable.

@jcoupey (Collaborator) commented Aug 16, 2018

Did you spot the modifications from dc640a2 and aca17de because some overhead was showing up when profiling? By the way (out of curiosity), what profiling tool(s) did you use?

@krypt-n (Contributor, Author) commented Aug 16, 2018

I'm using callgrind with kcachegrind as a UI. My workflow:

  1. Add -ggdb as a compilation flag in the makefile so the profiler can read line numbers, etc.
  2. Find something I want to profile that runs in 1-5 s: long enough that improvements don't get lost in noise, and short enough that I don't have to wait for results. For the first commits in this PR this was X-n344-k43.
  3. Get a baseline measurement with time vroom -i X-n344-k43.json -t 1 (I'm using a single thread since I think that could be less susceptible to noise).
  4. Run callgrind: valgrind --tool=callgrind --dump-instrs=yes vroom -i X-n344-k43.json -t 1. --dump-instrs=yes makes profiling information available at the assembly-instruction level, which can be useful to find out where calls to inlined functions come from.
  5. View the profiling result in kcachegrind and try to find something that looks like a hotspot (e.g. allocation of amounts).
  6. Try to make performance improvements to the code and check with time vroom -i X-n344-k43.json -t 1 whether something changed.

For every change with a big impact like this PR, there are many changes that don't yield a noticeable improvement.

Did you spot the modifications from dc640a2 and aca17de because some overhead was showing up when profiling?

I did not spot it with X-n344-k43, since that instance does not spend much time in cvrp_local_search::try_job_additions. However, when running the first commits on all CVRP instances, I noticed that the computing time of instances with a lot of vehicles did not improve as much as expected (less than 50%). So I repeated the above workflow with X-n219-k73, which spent around 40% of its computing time in cvrp_local_search::try_job_additions. The hot code there was again the allocation of amounts (dc640a2), and std::min, since the loops were pretty deeply nested and did not scale well with input size (aca17de).

@jcoupey (Collaborator) commented Aug 16, 2018

Thanks for sharing. I've been really focused on the big-O complexity (based on number of jobs/vehicles) while designing the solving approach, but this is a great (and complementary) way to spot improvements. What I find really interesting is that (except for aca17de) the overall algorithmic complexity is untouched, yet the gains are stunning.

@jcoupey jcoupey self-requested a review August 16, 2018 14:40
CHANGELOG.md Outdated
@@ -6,6 +6,7 @@

- Update `clang-format` to 6.0 for automatic code formatting (#143)
- Improve performance of the TSP heuristic by ~35% on all TSPLIB instances (#142)
- Improve performance of the CVRP heuristic by ~55% on all CVRPLIB instances (#146)
Collaborator:

I think the term "performance" for the solving approach is expected to relate to solution quality. What about using "speed-up" instead? (The same applies to the entry for your latest PR on the TSP heuristic.)

Also you might want to replace "heuristic" with "approach" for the CVRP entry as your fix is useful throughout the whole solving phase (heuristic + local search).

Contributor Author:

I replaced "heuristic" with "solving", since I don't think that "approach" fits very well.

@@ -348,7 +349,7 @@ void cvrp_local_search::update_amounts(index_t v) {
std::transform(_sol_state.fwd_amounts[v].cbegin(),
_sol_state.fwd_amounts[v].cend(),
_sol_state.bwd_amounts[v].begin(),
[&](const auto& a) {
[&](const auto& a) -> amount_t {
Collaborator:

I guess this is necessary so the compiler can resolve the return type, due to the template change for amount_t?

Contributor Author:

No, it's actually a bit worse. With this annotation the return type of the lambda is amount_t, and an implicit conversion at the return statement turns the amount_diff_t into an amount_t by actually copying the values, so everything is fine.

Without it, the return type is deduced as amount_diff_t and the return statement performs no implicit conversion. This is a problem since amount_diff_t holds references to the lhs and rhs subexpressions, so undefined behaviour occurs and the calculation is not carried out correctly. Now that I think about it, I understand the exact reason: the auto total_amount = line copies into a local amount object. The lifetime of this local object ends after the return, but the amount_diff_t still holds a reference to it. I pushed a commit that fixes this issue a bit more elegantly.

This is the part I don't like about this PR: it makes it a bit harder to use amount_t correctly.

@@ -494,6 +495,20 @@ void cvrp_local_search::try_job_additions(const std::vector<index_t>& routes,
}
}

auto smallest = std::numeric_limits<gain_t>::max();
Collaborator:

Well done for using smallest and second smallest, that's a clever way to get rid of the inner loop!

@@ -494,6 +495,20 @@ void cvrp_local_search::try_job_additions(const std::vector<index_t>& routes,
}
}

auto smallest = std::numeric_limits<gain_t>::max();
auto second_smallest = std::numeric_limits<gain_t>::max();
size_t smallest_idx = std::numeric_limits<gain_t>::max();
Collaborator:

Should be std::size_t for consistency with other occurrences.

Contributor Author:

Done

capacity_t operator[](std::size_t i) const {
return lhs[i] - rhs[i];
};

Collaborator:

The formatting script complains about spaces here.

Contributor Author:

Done

@@ -31,51 +25,3 @@ amount_t& amount_t::operator-=(const amount_t& rhs) {
return *this;
}

Collaborator:

The formatting script complains about newlines here.

Contributor Author:

Done

using parent::push_back;
std::size_t size() const {
return elems.size();
};

amount_t& operator+=(const amount_t& rhs);
Collaborator:

I guess the reason for having all implementations for expression templates in the .h file is the same as for my question in the previous PR (allows the compiler to have the details from other compilation units)?

If so, wouldn't that make sense for amount_t::operator+= and amount_t::operator-= too?

Contributor Author:

I moved them to the header and deleted the .cpp file, since it is now empty.

@jcoupey jcoupey merged commit d9ed6bb into VROOM-Project:master Aug 17, 2018
jcoupey added a commit that referenced this pull request Aug 17, 2018