Approximate the entropy of a CNF formula's solutions (as defined in the paper referenced below) using the hashing-and-optimization technique implemented in the STS SAT model counter (see references below).
A supplement to the book "Perturbations, Optimization, and Statistics" (edited by Tamir Hazan, George Papandreou and Daniel Tarlow), Chapter 9: "Probabilistic Inference by Hashing and Optimization" by Stefano Ermon.
A talk about Chapter 9 is also available in Seminar.pdf.
Notice: the tools attached here (STS, WISH) are copyrighted by Stefano Ermon; his READMEs are inside their directories.
For the latest versions you can check his page: http://www.cs.cornell.edu/~ermonste
In my blog post I describe a patch to Ermon's SampleTreeSearch (STS) tool that computes the approximate entropy of a CNF formula (the average degree of freedom of its variables).
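Concretely, the entropy used here is the average, over all variables, of the binary Shannon entropy of each variable's marginal — the fraction of solutions in which that variable is True. A minimal sketch of that definition (illustrative Python, not the patched C++ code):

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a biased coin with P(True) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def formula_entropy(marginals):
    """Average per-variable entropy; marginals[v] is the fraction of
    solutions in which variable v is assigned True."""
    return sum(binary_entropy(p) for p in marginals) / len(marginals)

# A variable fixed in every solution contributes 0 bits of freedom;
# a variable free in half of the solutions contributes a full bit.
print(formula_entropy([1.0, 0.5]))  # 0.5
```

A variable's entropy is thus between 0 (forced) and 1 (completely free), and the formula's entropy is the per-variable average.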
The patch can be applied as follows (tested on Ubuntu Linux):
(Alternatively, skip this and download and compile STS-Entropy instead.)
patch Main.cc -i entropy.patch -o Main.cc
cd <STS-DIR>
export MROOT=$PWD
cd core
make
./STS ../test/t2307.cnf
Different : 457
Chi-square : 39.260000
Estimated log-z: 7.666618
Estimated Z: 2.135845e+03
Estimated entropy: 0.472847
Output file: t2307.cnf.entropy.out
As described in STS --help:
-nsamples = <int32> [ 0 .. 300000000] (default: 10)
-k = <int32> [ 0 .. 100000000] (default: 50)
nsamples - Number of sampling iterations
k - Number of samples per level (the higher the value, the more uniformly the solutions are sampled)
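The effect of nsamples can be illustrated by emulating an ideal, perfectly uniform sampler — which is what STS approximates via hashing; this sketch does not reproduce the hashing itself, it only shows how per-variable frequencies from uniform samples turn into an entropy estimate:

```python
import math, random

# Solutions of the 3-variable example below, as (v1, v2, v3) tuples of 0/1.
solutions = [(1,0,0), (1,0,1), (1,1,0), (1,1,1), (0,1,1)]

def h(p):
    """Binary Shannon entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def estimate_entropy(nsamples, rng=random.Random(0)):
    # Draw nsamples solutions uniformly, count how often each variable
    # is True, and average the per-variable entropies.
    samples = [rng.choice(solutions) for _ in range(nsamples)]
    n_vars = len(solutions[0])
    marginals = [sum(s[v] for s in samples) / nsamples for v in range(n_vars)]
    return sum(h(p) for p in marginals) / n_vars

# More samples -> the estimate stabilizes around the exact value (~0.8879).
for n in (10, 100, 10000):
    print(n, round(estimate_entropy(n), 4))
```

With STS the samples are only approximately uniform, which is where k comes in: more samples per level pushes the sampler closer to the uniform ideal emulated above.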
Let's examine a simple formula with 3 variables:
p cnf 3 3
1 2 3 0
1 -2 3 0
1 2 -3 0
We can enumerate the assignments by hand and see that there are a total of 5 solutions:
{ (1,0,0), (1,0,1), (1,1,0), (1,1,1), (0,1,1) }
Ratios of each literal (the fraction of solutions in which it holds):
r(1) = 4/5, r(-1) = 1/5, r(2) = 3/5, r(-2) = 2/5, r(3) = 3/5, r(-3) = 2/5
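The enumeration and ratios above can be verified with a quick brute-force sketch (pure Python, independent of STS):

```python
import itertools, math

clauses = [[1, 2, 3], [1, -2, 3], [1, 2, -3]]  # the DIMACS clauses above
n_vars = 3

def satisfies(assignment, clauses):
    # assignment[v-1] is the 0/1 value of variable v; a clause is satisfied
    # if at least one of its literals agrees with the assignment.
    return all(any((assignment[abs(lit)-1] == 1) == (lit > 0) for lit in clause)
               for clause in clauses)

solutions = [a for a in itertools.product((0, 1), repeat=n_vars)
             if satisfies(a, clauses)]
print(len(solutions))  # 5 solutions, matching the list above

def h(p):
    """Binary Shannon entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

ratios = [sum(s[v] for s in solutions) / len(solutions) for v in range(n_vars)]
print(ratios)  # [0.8, 0.6, 0.6] = r(1), r(2), r(3)
print(round(sum(h(p) for p in ratios) / n_vars, 6))  # exact entropy: 0.887943
```

So the exact entropy of this formula is (H(4/5) + H(3/5) + H(3/5)) / 3 = 0.887943 — the value the estimator should converge to.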
For this tiny formula the approximator drew 50 samples. Some of the samples are necessarily identical here, which is usually not the case for larger formulas. What matters is that the samples are drawn (approximately) uniformly at random from the solution set. Let's take a look at its output:
Var,TotalSols,PosLitSols,NegLitSols,EntropyShan
1,50,0.800000,0.200000,0.721928
2,50,0.600000,0.400000,0.970951
3,50,0.600000,0.400000,0.970951
#Estimated entropy: 0.887943
The ratios (PosLitSols/NegLitSols) converged exactly to the correct values computed above.
Stefano Ermon, Carla Gomes, and Bart Selman. Uniform Solution Sampling Using a Constraint Solver as an Oracle. In Proc. 28th Conference on Uncertainty in Artificial Intelligence (UAI-12), August 2012.
Dor Cohen and Ofer Strichman. The Impact of Entropy and Solution Density on Selected SAT Heuristics. arXiv preprint abs/1706.05637, June 2017.