
Setting :random-seed has practically no effect, Renaming functions has huge effect #909

Closed
cliffordwolf opened this issue Feb 22, 2017 · 8 comments


@cliffordwolf

I am trying to evaluate the performance of different SMT2 encoding schemes. To this end, I create test cases using the encodings I'm trying to compare and run each test case many times with different :random-seed settings. However, it turns out that setting :random-seed has minimal to no effect. Randomly renaming the functions I declare/define, without changing anything else, on the other hand, changes the performance of my test cases by more than 2x!

See this SO question for more information. The test cases test1.smt2, test2.smt2, and test3.smt2 used in the analysis below can be found here.

On my workstation, and with latest z3 (c67cf16), the test cases take about 2 minutes to execute.

I ran each of them eight times with :random-seed set to 1 .. 8. The runtime for each of the test cases is for the most part not affected by the :random-seed setting:

[results: plot of run times for test1, test2, and test3 across :random-seed values 1 .. 8]

As mentioned above, the three test cases are practically identical! The only difference is the names I use for the identifiers I declare/define.

I am now considering writing a preprocessor that randomly renames the functions in an smt2 file so I can use it in addition to setting :random-seed.

There is an argument to be had that a solver should preserve the order in which statements are processed, because quite often related clauses show up in the SMT2 file close to each other. But I have never heard such an argument with regard to the identifiers declared/defined in the SMT2 input. I think whatever random changes I inject into the solver by renaming my functions, the same kind of random changes should be triggered by changing the value of :random-seed.

@cliffordwolf
Author

I've now written this program to run all identifiers in my SMT2 code through a randomly generated substitution alphabet. It is a hack, but good enough to demonstrate my point. I've re-run my tests and added a third category "shuffle": the test1.smt file but run through my smtshuffle program with different seed values. This gives me a span of over 3x between best and worst case run time for seed values 1 .. 8:

[image: plot of run times for test1 .. test3 plus the "shuffle" category across seed values 1 .. 8]

Using a substitution alphabet also has the advantage that identifiers that share a prefix still share a prefix after the transformation.
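A minimal Python sketch of that substitution-alphabet transformation (illustrative only, not the actual smtshuffle code; the keyword set here is a deliberately incomplete assumption, and a real tool would need the full SMT-LIB keyword and theory-symbol list):

```python
import random
import re
import string

# Illustrative only: a real tool needs the complete SMT-LIB keyword and
# theory-symbol list; this small set just covers the constructs below.
KEYWORDS = {"set-option", "set-logic", "declare-fun", "define-fun",
            "assert", "check-sat", "exit", "Bool", "BitVec", "true", "false"}
TOKEN = re.compile(r":?[A-Za-z][A-Za-z0-9_\-]*")

def shuffle_identifiers(smt2_text: str, seed: int) -> str:
    """Map every user identifier through a seed-derived substitution
    alphabet, so identifiers sharing a prefix still share one afterwards."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_letters)
    rng.shuffle(shuffled)
    table = str.maketrans(dict(zip(string.ascii_letters, shuffled)))

    def rename(match: re.Match) -> str:
        word = match.group(0)
        # Keep :keywords (e.g. :random-seed) and reserved words intact.
        if word.startswith(":") or word in KEYWORDS:
            return word
        return word.translate(table)

    return TOKEN.sub(rename, smt2_text)
```

Because the substitution is per-character, two identifiers with a common prefix map to two identifiers with a common prefix, as noted above.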

I am not familiar with the z3 code base. But I would assume that there are places where containers (like std::map<>) that use identifiers as keys are iterated in sort order. Simply using a custom compare function for those containers that uses a substitution alphabet that is randomly generated from :random-seed should improve the situation.
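To make that suggestion concrete, here is a small Python sketch (not z3 code; `make_order_key` is a made-up name) of iterating identifier keys in a seed-derived order rather than plain lexicographic order:

```python
import random
import string

def make_order_key(seed: int):
    """Return a sort key that compares identifiers through a
    seed-derived substitution alphabet instead of plain ASCII order."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_letters)
    rng.shuffle(shuffled)
    table = str.maketrans(dict(zip(string.ascii_letters, shuffled)))
    return lambda ident: ident.translate(table)

idents = ["mem_state", "clk", "reset_n", "addr_bus"]
for seed in (1, 2):
    # Same identifiers, but a (likely) different iteration order per seed.
    print(seed, sorted(idents, key=make_order_key(seed)))
```

In C++ the equivalent would be a custom compare functor passed to those std::map<> instances, seeded from :random-seed.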

(It looks like :random-seed literally does nothing, and all the variation I got before was because other load on the machine influenced the results. This time I let it run completely undisturbed and the curves for test1 .. test3 show even less variation than before.)

@wintersteiger
Contributor

Depending on what appears in your benchmarks, e.g., what theories/logics they use, and what options you chose, there will be more or less randomness. For instance, the default settings for smt.phase_selection and sat.phase are to use phase-caching (without randomness), but you can set them to 5 and random respectively to enable random phase selection in the SAT and SMT solvers. (See also smt.random_freq.) Note that auto_config may have to be disabled, otherwise the automatic configuration may overwrite some of your options. Most users don't want randomness, so by default all those options are set conservatively.
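For reference, those options would look like this at the top of an SMT2 file (option spellings taken from this comment; the smt.random_freq value is an illustrative guess, and exact option syntax may vary across z3 versions):

```smt2
(set-option :auto_config false)      ; keep auto configuration from overwriting the options below
(set-option :smt.phase_selection 5)  ; random phase selection in the SMT core
(set-option :sat.phase random)       ; random phase selection in the SAT solver
(set-option :smt.random_freq 0.05)   ; illustrative value
(set-option :random-seed 3)
```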

Regarding the renaming of identifiers: We should at the very least look at it briefly. Do you have a small, human-readable file that exhibits a large difference in performance upon renaming?

@cliffordwolf
Author

Do you have a small, human-readable file that exhibits a large difference in performance upon renaming?

I don't have any small, human-readable files that run for long enough to reasonably measure the runtime.

Do you have any small, human-readable files (QF_AUFBV / QF_ABV / or something similar) that run for at least a few seconds that I could use? I'd be happy to run some experiments with them.

(You can find my test[123].smt files that I used for my analysis at the link I posted above.)

Most users don't want randomness, so by default all those options are set conservatively.

I guess that most users do not run (set-option :random-seed <seed>) as part of their SMT input. I would suggest that as soon as a user sets this option, it is pretty safe to assume that they want randomness.

@delcypher
Contributor

@cliffordwolf Have you tried running with ASLR turned off? Some parts of Z3's code base are sensitive to the addresses of pointers, which will vary from run to run unless ASLR is switched off.

@cliffordwolf
Author

cliffordwolf commented Feb 23, 2017

@delcypher I did not (I don't even know how), but what you are describing would be the opposite of what I have observed: I see no variation in behavior from one run to the next, even when I'm modifying the (set-option :random-seed ...) setting. (I have not tried it, but I would not expect more variation when re-running z3 with exactly the same input file. :)

I want variation so I can take the median run time of several runs to get a performance metric that is not affected (very much) by random variation but instead reflects the intrinsic quality of the encoding used, so I can make a reasonable comparison of different encoding schemes for the same problem.

Edit: Ah! ASLR as in Address Space Layout Randomization. :) I thought you were speaking about a Z3 feature... ASLR is enabled on my machine, but that does not seem to have a significant effect on the run time I measure.

@NikolajBjorner
Contributor

There is a lot of useful data that I hope to look at when I get a chance.
One small comment, perhaps it is smt.random_seed or sat.random_seed.
Sorry, these names are not streamlined.

@cliffordwolf
Author

jfyi: I've now re-run my benchmarks. (10 hours on a 36 core machine on AWS.) With the increased variation in runtime I get from my smtshuffle approach I've now switched to the median of 9 runs instead of 5 runs:

(Hover with your mouse cursor over a table cell to see the raw data points before the median.)

As you can see, yices and boolector are consistently >10x faster than z3 on these benchmarks. For some cases it's even close to 100x. That's why I'm using test cases that are so long-running: I want 10-100 seconds run time on the fastest solver so I can still do reasonable comparisons. The same test cases happen to run for 400-4000 seconds on z3.

If there is interest in these test cases as a benchmark for z3, I can create versions that run in the neighborhood of 2 minutes, like the test[123].smt examples I posted above.

@NikolajBjorner
Contributor

If there is interest in these test cases as a benchmark for z3, I can create versions that run in the neighborhood of 2 minutes, like the test[123].smt examples I posted above.

test1/2/3 actually exposed a regression in 4.8.12.

The more interesting question to me seems to be what is in the benchmarks at http://scratch.clifford.at/compact_smt2_enc_r1108.html where z3 is 10-100x off.
It is typically something very blatant. For example, I found a long-running inefficiency in how we bit-blast not too long ago.
I am not sure I have access to the files mentioned on the web page.
