Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNF conversion refactoring #5547

Merged
merged 2 commits into from Sep 20, 2021
Merged

Conversation

jameysharp
Copy link
Contributor

While digging into cut-set enumeration, I wanted to experiment with how Z3 converts goals to CNF. But I got confused when I found that the tseitin-cnf tactic isn't used by the sat tactic, which instead has an entirely separate goal2sat implementation of Tseitin encoding. Both implementations have a lot of similarities but it looks to me like each one has some optimizations that the other is missing.

So it seemed to me that it would be worthwhile trying to make tseitin-cnf use goal2sat for all the hard work, perhaps after making goal2sat handle the cases that tseitin-cnf currently does.

I have not actually tried to unify them yet. The first challenge I ran into is that goal2sat uses the abstract base class sat::solver_core, which requires subclasses to implement a lot of methods that don't make sense in something like the tseitin-cnf tactic. I think there's a subset of solver_core which makes sense as a target for writing CNF into, primarily the add_clause and add_var methods, which the tseitin-cnf tactic could implement easily enough.

So this pull request just involves removing most of solver_core's methods, making the corresponding methods in solver not be override, and ensuring that sat_tactic and sat2goal use the solver subclass instead of the solver_core base class so they can still call all the other methods. Then goal2sat is the only user of the abstract solver_core class.

There should not be any behavioral change from the commits in this pull request. There might be an imperceptible performance improvement from making some formerly-virtual method calls into concrete calls, I guess? But my goal was for this to be purely a code refactoring.

Am I missing anything about why tseitin-cnf and goal2sat are separate? Any reason not to go down this route?

These two classes need different things out of the sat::solver class,
and separating them makes it easier to fiddle with their dependencies
independently.

I also fiddled with some headers to make it possible to include
sat_solver_core.h instead of sat_solver.h.
And switch sat2goal and sat_tactic over to relying on the derived
sat::solver class instead. There were no other uses of solver_core.

I'm hoping this makes it feasible to reuse goal2sat's CNF conversion
from places like the tseitin-cnf tactic, so they can be unified into a
single implementation.
@@ -29,7 +30,7 @@ class sat_tactic : public tactic {
ast_manager & m;
goal2sat m_goal2sat;
sat2goal m_sat2goal;
scoped_ptr<sat::solver_core> m_solver;
scoped_ptr<sat::solver> m_solver;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was introduced at a point when Maate Soos was considering integrating CMS as a secondary backend. Having an abstraction layer for the solver would tell what features of a SAT solver to support. This abstraction layer would support minimal features and therefore not extensions.
Since the CMS experiment never happened, it is very possible we should just abandon sat_solver_core.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the CMS experiment still “interesting”? If so, I’d be happy to pick-up where Mate left.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maate didn't "go" anywhere. There was a collaborator who considered it, but it is about +2 years ago.

Note that it will be essentially impossible to use CMS for the SMT extensions (and I don't see why it would be useful).

Also, the starting point is not to release z3 with CMS dependencies. It will tax development time and hurt more users who don't need this.

Something to consider is also that plugging in CMS just at the level of the SAT solver might also be the wrong design: having a self-contained non-contaminating pipeline or even a self-contains cms_tactic could be the way to go. Then it would have to do its own CNF transformation but have the benefit of being "one-shot" so no complications around incrementality. Of course there can be a use case for incrementality so the tactic route is painting into a corner.

It is really a question of what would it be used for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the starting point is not to release z3 with CMS dependencies.
It will tax development time and hurt more users who don't need
this.

100% agree -- if I were to pick this up, it would be an CMake-only "option" (e.g., -DENABLE_CMS:BOOL=ON) that would only work if CMake could find CMS on the system.

If CMake can't find CMS, or you don't ask for CMS, then Z3 would be "unchanged".

that it will be essentially impossible to use CMS for the SMT
extensions

Interesting; I guess I don't understand enough of the Z3 archictecture here; I was thinking that CMS could be "plugged in" as part of a typical DPLL(T)-loop.

plugging in CMS just at the level of the SAT solver might also be
the wrong design

Would it be at all interesting in doing this as a "Step 1" just to show viability? It seems that this would be the lowest-hanging fruit, even if it wasn't the end-goal in itself.

Basically, taking something too ambitious to start with ends-up with stagnation in my experience.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SMT integration creates and deletes variables during search. So "impossible" means "a lot of extra work".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be an CMake-only "option"

probably, it would also probably bit-rot at some time and then be kicked out when not maintained.
z3 currently has no "links" to other github projects (commonly used elsewhere). It is also a feature based on (1) my ignorance of using features (2) keeping the build process as friction less as possible (it is still not without friction).
Of course, there are various plugin dependencies for API languages (OCaml, .Net, Java), and the main grief is that not everything builds on all variants. There are many complaints about OCaml issues. Nevertheless OCaml is being used.
Just yesterday I learned about a rather interesting and useful application: https://engineering.fb.com/2021/09/13/core-data/superpack/

@NikolajBjorner
Copy link
Contributor

Lots of changes. Before doing too much more, would it be possible to split them into self-contained PRs so the indisputable changes make it quicker?

  • splint goal2sat into goal2sat and sat2goal
  • separate smt_internalize file
  • revert sat::solver_core (see comments)

@NikolajBjorner
Copy link
Contributor

Comment regarding sat2goal not part of cnf-tseitin.
Clausification has to be close to the solver. CNF transformation is mainly useful for quantified formulas, but the solvers really should be able to deal with arbitrary formulas, not just CNF. The core reason is that quantifier instantiation and unfolding of recursive definitions may create new formulas that don't satisfy any of the invariants ensured by cnf-tseitin. Overall, relying n "long-distance" invariants over expressions is one of the issues that keeps biting with corner case bugs. The question is of course how to deal with special case expressions in depth. The sat::solver core does not assume anything about normalization. The smt (theory solver extensions, not relevant for bit-vector blasting experiments) extensions for the sat core has to invoke clausification on the fly.

@jameysharp
Copy link
Contributor Author

Comment regarding sat2goal not part of cnf-tseitin.
Clausification has to be close to the solver. CNF transformation is mainly useful for quantified formulas, but the solvers really should be able to deal with arbitrary formulas, not just CNF. The core reason is that quantifier instantiation and unfolding of recursive definitions may create new formulas that don't satisfy any of the invariants ensured by cnf-tseitin. Overall, relying n "long-distance" invariants over expressions is one of the issues that keeps biting with corner case bugs. The question is of course how to deal with special case expressions in depth. The sat::solver core does not assume anything about normalization. The smt (theory solver extensions, not relevant for bit-vector blasting experiments) extensions for the sat core has to invoke clausification on the fly.

I don't yet understand details like how quantifiers are handled, but this sounds like why I concluded it would be best to have tseitin-cnf use goal2sat, rather than having the sat tactic use tseitin-cnf as I first assumed it would.

Lots of changes. Before doing too much more, would it be possible to split them into self-contained PRs so the indisputable changes make it quicker?

* splint goal2sat into goal2sat and sat2goal

* separate smt_internalize file

Sure, i can do these two easily, although I also don't care strongly about them—they just made it much easier for me to identify which methods were being used by which classes without doing repeated cycles of "try deleting it and see what compiler errors come out".

* revert sat::solver_core (see comments)

I don't actually want to do that, because my plan was to have tseitin-cnf behave like a second implementation of sat::solver, only instead of solving anything it would just dump the clauses out as new goals.

It wouldn't be too hard to go the other direction, and introduce a second abstract base class for the functionality needed by sat2goal, and maybe even a third if there's anything additional that sat_tactic needs. But all I currently need, I think, is to limit the interface to only those methods that goal2sat uses to push clauses into a solver.

So maybe it doesn't make sense to call it solver_core and it should be renamed to, I don't know, sat::clause_set or something. But I think it's still useful.

Of course I'm also open to other ways to unify the common parts of tseitin-cnf and goal2sat, or reasons why they should be separate. They just look so similar.

@NikolajBjorner
Copy link
Contributor

I owe some more substantive comments on this thread. In the interest of providing comments now instead of later here are some:

  • I was under the impression that the goal was to eliminate the unused abstraction layer solver_core, but the converse seems to be at stake: you want a clausifier consumer. sat::solver could implement it. This is fine too, except you run into an issue with the non-propositional formulas where we send the interpreted atom to a theory component (euf::solver) that knows how to dispatch it. Not sure what your plans would be for this.
  • clausification isn't the most complicated piece. The differences that you see are perhaps about how to handle quantifiers and other non-propositional constructs. The smt_internalizer.cpp code also clausifies. There are 3 clausifiers! It is all very similar. The sat::solver allows a backend where xor formulas could be represented as xors instead of clauses and for PB constraints it offers some special purpose backend. The smt::context does not. When clausifying (ite cond th el) it is possible to add also the clause (or th el) besides (or (not cond) th) and (or cond el). Otherwise, clausification is simply supplying names to formulas. Since I don't understand the background motivation for touching this code, maybe I am barking up the wrong tree so bear with me.

@jameysharp
Copy link
Contributor Author

The experiment I want to try is to generate clauses from truth tables, because I'm toying with ideas around tracking bounded-size truth tables during cut-set enumeration.

I've written a Python prototype of a Tseitin-style clausifier for truth tables that's O(n) in space and time with respect to the size of the truth table. (Hiding exponential growth in big-O notation is always fun.) My prototype produces the clauses which aren't implied by any other valid single clause, but for example always produces the six-clause form for if-then-else rather than the minimal four-clause form.

As a first step I thought I'd try making z3 generate clauses by way of a small truth table for each individual operator, just to see how that would compare with the current way it does it. I hoped the output might be indistinguishable and the implementation might be simpler, in which case it could be a nice standalone pull request.

But I had to pick a place to try doing that experiment, and when I found two (out of three!) choices I got sidetracked into wondering why they're separate. For experimentation purposes it'd be nice to be able to use (apply tseitin-cnf) to see what the clausifier is producing, but also be able to (check-sat-using sat) and see whether clausifier changes affect SAT-solving performance, so being able to change both implementations in one place would be nice. That's where this pull request came from.

As for how to handle non-boolean pieces, I had vague ideas of handing anything the clausifier doesn't recognize to the solver and asking for a variable to use to represent it, and letting the solver keep track of whether it's using the PB or EUF solvers or SMT theories or something else. I don't know if that works, but at least in the case of the tseitin-cnf tactic I think it could just pass those pieces through to the goal unchanged, maybe.

Also I've been enjoying reading up on how cryptominisat incorporates XOR into the solver, and I'm wondering if I can get my truth table clausifier to generate XOR clauses as well. But if I'm not mistaken, z3 doesn't use XOR solving yet, right? So that's not top of my list of things to tackle.

@NikolajBjorner NikolajBjorner merged commit 4263063 into Z3Prover:master Sep 20, 2021
@NikolajBjorner
Copy link
Contributor

I am merging the pull request. It is overall an improvement.
It is still very unclear to me if this is useful for your stated goal and whether the cnf transformation is useful for what you want to accomplish in the first place. When you start with Boolean expression networks, a CNF format is isomorphic. Smart compilations to CNF have come and gone over years and maybe less important overall.

Also I've been enjoying reading up on how cryptominisat incorporates XOR into the solver, and I'm wondering if I can get my truth table clausifier to generate XOR clauses as well. But if I'm not mistaken, z3 doesn't use XOR solving yet, right? So that's not top of my list of things to tackle.

There was a naive xor solver, but I removed it because it contained unfixed bugs and wasn't useful.
Since then it was understood how to better support xor for relatively dense constraints using bit-tables in Soos & Mehl's recent CAV paper. This would be the more appropriate starting point for an xor solver.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants