
Improve performance #3

Open
meisterT opened this issue May 31, 2016 · 34 comments

@meisterT
Member

For some checktestdata programs, performance is terrible. Are there low-hanging fruit that would improve the general performance of checktestdata?

@eldering
Member

Some issues to look into:

  • FLOAT is slow: changing from REGEX("[0-9.]+") to FLOAT(0,6.2832) made reading 10^5 floats take 44 seconds longer (total runtime including reading other stuff went from 11 to 55 seconds)
  • INARRAY is even slower: 10^5 checks of an INT inside an array of 10^5 INTs increased the runtime of the same testcase to 7.5 minutes (a sketch of both patterns follows below)
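
For concreteness, a minimal sketch of the two patterns above in checktestdata syntax (the counts match the numbers quoted; the variable names n, a, x, i and the 10^9 bounds are illustrative, not taken from the actual testcase):

    # sketch only: 10^5 FLOATs, then 10^5 INARRAY lookups in a 10^5-element array
    SET(n = 10^5)
    REP(n) FLOAT(0,6.2832) NEWLINE END
    REPI(i,n) INT(1,10^9,a[i]) NEWLINE END
    REP(n) INT(1,10^9,x) ASSERT(INARRAY(x,a)) NEWLINE END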

eldering added a commit that referenced this issue Aug 13, 2016
@eldering
Member

These combined fixes improved checktestdata runtime on this testcase from ~8 min to ~2 sec.

@austrin
Contributor

austrin commented Oct 17, 2016

An example where I gave up on using checktestdata and switched to Python due to speed: problem I (Interception) from NCPC 2016. The problem package is available here: http://ncpc.idi.ntnu.no/ncpc2016/ncpc2016-packages.tar.bz2 (there is an unused checktestdata script included).

There is a lot of data, so maybe there is not much to be done, but one feature that would have been helpful here is a version of the UNIQUE function that compares entries as unordered (multi)sets rather than ordered tuples. Then one could skip the annoying if-swap in the main loop, which takes a bit of the time (and pollutes the code). (That feature alone wouldn't have made the script fast enough, but it would at least have made it a bit less slow.)
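
To illustrate, a minimal sketch of the swap dance in question (not the actual NCPC script; n, m, u, v, a, b are illustrative names): each pair is read, canonicalized with IF/SET so the smaller value comes first, and only then checked with UNIQUE. An unordered UNIQUE would make the whole IF/ELSE unnecessary.

    # sketch: read m unordered pairs, canonicalize, check uniqueness
    INT(1,10^5,n) SPACE INT(1,10^5,m) NEWLINE
    REPI(i,m)
        INT(1,n,u) SPACE INT(1,n,v) NEWLINE
        IF (u < v)
            SET(a[i] = u, b[i] = v)
        ELSE
            SET(a[i] = v, b[i] = u)
        END
    END
    ASSERT(UNIQUE(a,b))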

@eldering
Member

I've tested this on my computer. The check for test case 09.max_ans.in runs in 31 seconds; without the swap it takes 20 seconds. That's a decent bit less, but I'm not sure it warrants introducing a completely new command.
I also analyzed the call graph with callgrind, and a quick glance did not show any clear targets for optimization. A fair amount of time (14%) is spent in GMP functions, another 10% in boost::variant, and 8% just in std::__lexicographical_compare_impl over GMP integer types.

@austrin
Contributor

austrin commented Oct 21, 2016

That's a decent bit less, but I'm not sure it warrants introducing a completely new command.

Right, I agree that this speed improvement on its own is not sufficient reason to add such a command, but there may be other small reasons which together make it worth it. One is that this type of constraint is fairly common, and the swap dance is a bit annoying and makes the scripts harder to read, so such a command could be nice to have regardless of performance. I don't have a strong opinion on this, but I at least wanted to bring up the possibility for you to consider.

@austrin
Contributor

austrin commented Apr 22, 2018

@eldering having been frustrated by a few slow ctd scripts in the past weeks, I was wondering a bit more about potential speed improvements. One thought: would it be possible to add some sort of pragma-ish directive to force 64-bit integer arithmetic instead of arbitrary precision (since GMP takes a lot of the time)? I guess one would then want to do overflow checks every time one does an operation (and raise an error on overflow), but maybe it would still be faster, in particular for the common "read a million integers between 1 and 1000" type of format.
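
For reference, the format in question looks something like this (a sketch; the count and bounds are illustrative). Every value and bound here provably fits in 64 bits, so under such a directive the whole loop could run on native integers, raising an error on overflow as suggested:

    # sketch: the common "read a million small integers" shape
    SET(n = 10^6)
    REP(n, SPACE) INT(1,1000) END
    NEWLINE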

@eldering
Member

[Funny, I was just looking into a few minor checktestdata issues, including this one.]

Yes, I think that would be a good avenue to explore, and it would make sense if it improves performance significantly. I currently don't have the WF2018 problemset available. Is it (publicly) available somewhere? Otherwise, I'd have to wait a little until our backups are online.

@austrin
Contributor

austrin commented Apr 22, 2018

I don't think so, but one set you could look at is NWERC 2017, in particular the problems ascendingphoto and factorfree. The validators look almost the same, except that one stores the values in an array (without using them for anything), and that one is about 3x slower than the other. But even the faster one is a bit sluggish: on my machine it takes around 1.5 s just to read a million numbers, which with a lot of test files gets a bit annoying.

eldering added a commit that referenced this issue Apr 27, 2018
Improves performance #3.

On the problems ascendingphoto and factorfree from NWERC 2017
this speeds things up about 10% and 15% respectively.
eldering added a commit that referenced this issue Apr 27, 2018
Improves performance #3.

On the problems ascendingphoto and factorfree from NWERC 2017
this speeds things up about 10% and 40% respectively. This cache
may especially help if the checktestdata script contains constant
expressions like "INT(1,5*10^6)" where the 5*10^6 would previously
incur performing exponentiation and multiplication every time the
command was executed.
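
In script terms, the cache makes a bound like INT(1,5*10^6) behave as if the constant had been hoisted manually, roughly like this hypothetical workaround (mx and n are illustrative names, not from the commit):

    # sketch: evaluate the constant bound once instead of per iteration
    SET(mx = 5*10^6, n = 10^6)
    REP(n) INT(1,mx) NEWLINE END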
@TPolzer
Contributor

TPolzer commented Jul 12, 2018

I have an experimental branch of checktestdata in antlr_rewrite that seems to be a lot better performance-wise. On the other hand, it is not nearly as well tested, and it currently doesn't have helpful error messages for non-conforming input.

@thijskh
Member

thijskh commented Jul 13, 2018

Well, the testsuite passes, so that's already quite something?

@meisterT
Member Author

meisterT commented Oct 3, 2018

What's the plan for @TPolzer's rewrite? Can we merge that?

@TPolzer
Contributor

TPolzer commented Oct 3, 2018

The one area where it would be a clear regression is that it cannot generate inputs. Is that a feature people actually use?

@TPolzer
Contributor

TPolzer commented Oct 3, 2018

It also lacks an option to be generous with whitespace, but that would probably be easy to implement.

@eldering
Member

eldering commented Oct 6, 2018

Summary from a short discussion on #domjudge on the antlr rewrite: I'd prefer to keep using Make instead of Bazel since Make is available everywhere. If we can improve a few of the missing features (and aim for feature parity in the longer run), then I'm fine with replacing the current code with this rewrite.

eldering added a commit that referenced this issue Oct 21, 2018
@meisterT
Member Author

@TPolzer can you please describe how to build the current version of the antlr branch, i.e. which dependencies have to be installed?

@TPolzer
Contributor

TPolzer commented Oct 28, 2018

Yes, .travis.yml isn't too bad a guide; the necessary build dependencies are:

  • boost development headers
  • gmp development headers
  • a recent(ish) g++
  • bazel

Then:

  • 'bazel test test' runs the testsuite
  • 'bazel build checktestdata' builds the main executable (which will end up at bazel-bin/checktestdata)

@TPolzer
Contributor

TPolzer commented Oct 28, 2018

I'm not a big fan of make, and I actually think that having the build depend on Bazel and hosting prebuilt static binaries as GitHub releases might be a nicer experience for everybody involved.

@eldering
Member

Can you put your version in a separate directory? I don't like replacing the current version completely, for a number of reasons. First, the current version still has features that your code does not (yet) support, such as generating test data. Secondly, Bazel is in neither Debian nor Ubuntu, which means that make is, at this point, still more of an off-the-shelf build system.
I agree we can release static binaries, but that can be done either way, and I don't think that should be the only form of release. For example, being able to create Debian packages is also a nice feature, and that doesn't work with Bazel at the moment.

@meisterT
Member Author

meisterT commented Nov 14, 2018

I was thinking about this issue a bit more.
We have to separate the group of contributors to checktestdata from the group of users of checktestdata to come to a qualified decision on how to proceed.

For users, the benefit of the rewrite is a much faster checktestdata than the old version: on average it's close to a 10x improvement. Users typically don't care how we achieve that speedup, and the normal user also doesn't need to build the software from source.
We haven't updated the checktestdata Debian package in a while and haven't been distributing static binary releases, but we should probably do both in any case. We should probably also look into distributing it via Homebrew.

For contributors, on the other hand, the build system matters more, but they are also a much smaller group of people. Overall, we have had contributions from 8 people in total.

For Bazel there's a debian package which can be installed like this:

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install bazel

So it's not hard to install (and keep it up to date).

By the way, Bazel supports building Debian packages itself: https://docs.bazel.build/versions/master/be/pkg.html#pkg_deb
(disclaimer: I've never used the packaging rules)

Note that I also tried to add a Makefile to the new branch while making up my mind. However, the antlr version in Debian stable cannot generate C++ code: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=902798

Regarding the idea of putting the new version in a separate directory: I don't really like this, because it increases the maintenance overhead. We have the original version, the Haskell implementation, and now the antlr_rewrite. Any change we make has to be implemented in every version, either slowing down development or letting the implementations diverge.
We saw how that didn't work for the branches in the DOMjudge repository, so I would prefer not to make the same mistake here.

To summarize: users benefit greatly from the new version, while it gets a little bit harder for us as contributors. I think this is a fair tradeoff to make.

I propose to do the following:

  • investigate if the dependency on g++-7 is necessary or whether we can make it work with g++-6
  • add more test data to make sure that the new version is actually correct
  • improve error messages
  • add option to generate test data in the new version
  • switch to the new version with static releases
  • look into homebrew

@nickygerritsen
Member

I propose we also look into making a Docker image with the checktestdata binary in it when we do static releases.

I agree with the rest of your ideas.

@austrin
Contributor

austrin commented Nov 14, 2018

We have to separate the group of contributors to checktestdata from the group of users of checktestdata to come up with a qualified decision how to proceed.

A further comment/question on this. While I am not a contributor, from the viewpoint of problemtools (which lives downstream of checktestdata and maybe contributes a non-negligible fraction of checktestdata's users?), it would be desirable if checktestdata continued to be easily buildable using standard packages. But that 10x speedup would be so sweet to get.

If I understand correctly, the bazel-vs-make-vs-whatever-floats-your-boat issue is in principle independent of the rewrite, and there is no inherent reason why it would have to use bazel?

Is the bazel part only needed to build the parser, or is it also needed for building the binary from the parsers (i.e., as on the current release branch)?

@meisterT
Member Author

Well, currently bazel is used for everything, and it's probably cumbersome to have two build systems for one project.

Just to rule out that possibility: would it work for you to distribute the released checktestdata binary instead? Or you could require installing checktestdata separately, like you require installing other packages (https://github.com/Kattis/problemtools#requirements-and-compatibility) and compilers.

@eldering
Member

eldering commented Nov 15, 2018 via email

@meisterT
Member Author

Jaap, what's your concrete suggestion then? Waiting until the next Debian version is released?

@eldering
Member

I'm not sure I follow. I'm ok with depending on antlr4 from Debian Buster (since that's at least going to be in the upcoming release, and thus also in Ubuntu). But I'm proposing to keep/rewrite the build system in make instead of Bazel. Is there any reason this cannot be done?

@meisterT
Member Author

Ah ok, I was implicitly assuming that you insist on debian stable. If that's not the case, I'll try to cook something up that works on buster without bazel.

@austrin mentioned this issue Nov 16, 2018
@austrin
Contributor

austrin commented Nov 16, 2018

In case anyone else is interested in how much faster the antlr rewrite is, I made the following plot:

[Plot: ctd_antlr_rewrite_speed]

Each dot is a checktestdata script for a problem. Its x coordinate is the sum of runtimes (in seconds) of the script on all test cases using checktestdata version 20180411, and its y coordinate is the sum of runtimes using the antlr rewrite. The dotted line is the y=x line. Note that the plot uses log-log axes. (The plot excludes all scripts where the two checktestdata versions currently produce different results, cf #7.)

There's really only one script where the antlr-rewrite does significantly worse than the current checktestdata, and I've made it available here if you want to investigate it further: https://gist.github.com/austrin/65576baa9aa8601e3b06da2cc8cad26d (yes, it's a funny one)

@TPolzer
Contributor

TPolzer commented Nov 16, 2018 via email

@meisterT
Member Author

Thanks Per. Do you mind sharing the raw data points?

@austrin
Contributor

austrin commented Nov 17, 2018

@TPolzer re the cluster around 100 ms: I think it is also just the startup overhead you mention. Below is a similar plot for individual test cases (i.e., each dot is a run of checktestdata on a single test case). As you can see, the cases where the rewrite is slower are almost entirely runs in the sub-10 ms range (and many problems consist of around 20 or so such test cases, which causes the cluster in the previous graph).

[Plot: ctd-runtimes]

@TPolzer
Contributor

TPolzer commented Nov 17, 2018

I've looked a bit into the gcc 6 issue, and it's not trivial, because I used "constexpr if" liberally (variant and optional can be taken from boost, string_view from absl).
Postponing that.

meisterT added a commit that referenced this issue Oct 20, 2019
@meisterT
Member Author

If anyone wants to build this branch themselves, I have updated installation instructions (see https://github.com/DOMjudge/checktestdata/blob/antlr_rewrite/README.md).

I also added this branch to our gitlab CI, which now uploads the resulting binary, so you don't have to build it yourself: https://gitlab.com/DOMjudge/checktestdata/-/jobs

@TPolzer
Contributor

TPolzer commented Oct 20, 2019 via email

@meisterT
Member Author

I wondered the same yesterday but didn't have time to investigate.

Today I did some test cleanups (cherry-picked changes from master), disabled the tests that do not work on the antlr branch, and ran with --runs_per_test=1000; all iterations passed.

Then I realized that the non-determinism comes from the part we don't support (yet?) in the antlr rewrite: the testdata generation.
