Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(continuously updating) List of benchmarks on which z3str3 return an incorrect result #2758

Closed
numairmansur opened this issue Nov 30, 2019 · 14 comments
Assignees

Comments

@numairmansur
Copy link

numairmansur commented Nov 30, 2019

Hi,
For the following list of benchmarks, z3str3 return incorrect result:

All the files listed below are sat.
z3str3 returns unsat for all of them.
CVC4 returns sat for all of them.
Z3 (without z3str3) retrurns sat for all of them. For some it hangs. Those files are marked with (z3 hangs).

I am using the z3 nightly build
Z3 version 4.8.8 - 64 bit

@numairmansur numairmansur changed the title List of benchmarks on which z3str3 return an incorrect result (continuously updating) List of benchmarks on which z3str3 return an incorrect result Nov 30, 2019
@NikolajBjorner
Copy link
Contributor

strings4.smt2.zip

@NikolajBjorner
Copy link
Contributor

strings3.smt2.zip

@NikolajBjorner
Copy link
Contributor

strings.smt2.zip

@numairmansur
Copy link
Author

@mtrberzi
Copy link
Collaborator

mtrberzi commented Jan 2, 2020

Taking a look at these cases. Some are now resolved (z3str3 returns sat), others are actually segfaults and a few are timing out. Is there a way we would prefer to track progress on these (maybe editing the list in the comment)?

@numairmansur
Copy link
Author

I can rearrange them.
What timeout are you using ?

@mtrberzi
Copy link
Collaborator

mtrberzi commented Jan 3, 2020

20 seconds. I'll post the full list tomorrow (I will need to update the segfaults list as well as I tested those on the latest build).

@mtrberzi
Copy link
Collaborator

mtrberzi commented Jan 3, 2020

Actually, what timeout did you use? I could re-test the "timeout" cases if you used a longer interval.

@numairmansur
Copy link
Author

The timeout I used was 120 seconds

@mtrberzi
Copy link
Collaborator

mtrberzi commented Jan 3, 2020

I re-ran the timeout cases for 120 seconds. Here are the latest results for commit 0b486d26daea05f918643a9d277f12027f0bc2f6 (release build):

Cases which are still UNSAT:

  • 28
  • 58
  • 62

Cases which timeout:

  • 2
  • 17
  • 18
  • 23
  • 25
  • 26
  • 27
  • 29
  • 39
  • 41
  • 42
  • 43
  • 44
  • 55
  • 63
  • strings3

Cases which segfault:

  • 9
  • 10
  • 47
  • 57

And all 42 other cases return SAT.

@numairmansur
Copy link
Author

@mtrberzi Here are some more examples :
15.smt2
13.smt2
12.smt2
11.smt2
10.smt2
9.smt2
8.smt2
7.smt2
6.smt2
3.smt2
2.smt2
1.smt2

All the formulas at each level in the assertion stack are satisfiable. But z3str3 gives unsat at some level. For example for the file 1.smt2, z3str3 gives:
sat
sat
unsat

while z3-seq gives:
sat
sat
sat

OS:
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"

$ git log -1
commit 321bad2
Author: Christoph M. Wintersteiger cwinter@microsoft.com
Date: Mon Jan 20 17:06:35 2020 +0000
Fix for implicit consts in FPA models. Fixes #2865.

@mtrberzi
Copy link
Collaborator

OK, thanks for the new cases. Is that commit the one you tested on?

I'm actually in the middle of replacing the length and value testing algorithm with an improved procedure that should deal with many of these correctness issues. I tracked down several wrong-answer cases to bad behaviour in the legacy algorithm and the best way is to implement this new technique (which should also be more efficient in many cases). I'll let you know when it's ready to be tested, I would really appreciate the cross-check from your tool.

@numairmansur
Copy link
Author

numairmansur commented Jan 23, 2020

That sounds good. Just let me know when it's ready :)
and yes that is the commit on which I tested on.

@mtrberzi
Copy link
Collaborator

OK, all of these cases now return SAT, or time out. I'll close this issue for now, and look at addressing the slow cases after some of the bigger old issues are resolved. Please do keep reporting any wrong answer cases or crashes you find, I appreciate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants