- Dor Zarka 316495357
- Avishay Tsur 316508431
One of the major unsolved issue of computer science is P versus NP.
This issue asks if every problem whose solution can be quickly (aka: polynomial time) verified, can also be quickly solved.
Definitions:
P contains all decision problems that can be solved by a deterministic Turing Machine using a polynomial time.
NP contains all decision problems that can be solved by a non-deterministic Turing Machine using a polynomial time.
NP-Complete contains all decision problems in NP that can be simulate by every other problem in NP. It other words, problem L is NP-Complete if L is NP, and from every problem L' that is NP there is a polynomial reduction from L' to L.
In 1971, Stephen Cook and Leonid Levin showed that if there is a NP-Complete problem that can be solved in polynomial time, then every NP problem can be solved in polynomial time also. It means that P = NP.
Some of the well-known NP-Complete problems are:
- Knapsack problem
- Hamiltonian path problem
- Travelling salesman problem (decision version)
- Subgraph isomorphism problem
- Subset sum problem
- Clique problem
- Vertex cover problem
- Independent set problem
- Dominating set problem
- Graph coloring problem
One of the famous NP-Complete problems is Boolean Satisfiability Problem.
Boolean Satistiability Problem (SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula.
In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such way that the formula evaluates to TRUE. If this is the case, the formula is satisfiable, otherwise the formula is unsatisfiable.
Every SAT formula can be converted to Conjunctive Normal Form.
Definitions:
Literal is an atomic formula. It is a variable or its negation. Examples:
Clause is a disjunction of literals. Example:
Conjunctive Normal Form (CNF) is a conjunction of clauses. Example:
CNF-Solver_EA is an evolutionary algorithm which get a CNF formula and finds satisfiable assignment.
The problem is to find satisfiable assignment to a given CNF formula as described above and to find the optimal parameters for the evolutionay algorithm.
The naive solution is to run over all the
This approach is very not efficient and when
In this project we will show a solution that finds the optimal the parameters for the evolutionary algorithm, and solve CNF formula.
It is important to find optimal parameters because there are cases which the algorithm does not provide a satisfiable assignment and fails to solve the CNF formula.
Another reason is that we want to use optimal parameters that finds solution and solve the CNF formula with the least amount of time, i.e the optimal parameters that solve the formula most efficiently.
We will solve the CNF formula and find a satisfiable assignment by Evolutionary Algorithm.
First, we will find the optimal parameters by Parameters Search algorithm that will be described below.
When we find the optimal parameters, we will run the evolutionary algorithm with them.
CNF formula represents as list of sub-lists of integers, such that each sub-list represents a clause, and each integer represents literal.
The integer number represents the index of the variable. For example: 2 represent
If the integer if positive, it represents the positive form of the variable, and if the integer is negative it represents and negative form of the variable. For example: 4 represents
Entire CNF formula example:
Each individual will be represented as bit string vector, such that the length of the vector is the count of the variables.
For example: in
For example:
The population consists of M individuals, such that each individual is bit string vector as describes above.
assignment_clause_count(assignment)
: Recieves an assignment and count all the satisfiable clauses from the assignment.
Higher fitness means higher satisfiable clauses.
The maximum fitness is when all the clauses satistiable, means that the CNF formula is satisfiable.
Given a CNF formula, we would like to find satisfiable assignment with the shortest time. Therefore, we need to find optimal parameters for the evolutionary algorithm.
The parameters that we would like to get their optimal values for the algorithm are:
- Population size
- Elitism rate
- Crossover probability for individual
- Mutation probability for individual
- Mutation probability for each bit in individual
- Tournament size
parameter_search(t_time, params_ranges, loop_num, indicator, son, dynamic_search)
:
Non-Dynamic Search:
This function tries to find the optimal parameters with the constants N and M. The searching is done in a random search, i.e., each iteration the function chooses randomly an optional value for each global parameter and runs the evolutionary algorithm with the random parameters.
This was done intentionally in order to reduce significantly the runtime. Only if the random parameters achieved 'better' runtime and were no more less than 'delta' from the optimal fitness (i.e., M) they will be set globally.
Dynamic Search:
if we assume that rather than the idea of a singular suitable set of global parameters, there are many ‘local’s maxima’ we can get by different sets of global parameters, we may wish to keep on searching ‘locally’ after finding better new set of a global parameters. Nevertheless, we also wish to keep on searching for different ‘local’s maxima’.
This idea is exactly what dynamic search is doing. If a new random set of parameters got better results than the last best set, dynamic search will search for loop_num/2 iterations on relatively small ranges around the new parameters which was found. If a new better local set was found, then it will be set as the best new set and the function will search locally recursively. After the local search ends, the search will keep on searching randomly on the full ranges.
Conclusion
- When we want to solve a CNF formula, we can search for parameters once, and then use those parameters for further runs when the CNF-formula is with the same size order.
- We can see that for different number of variables, there is different value to the parameter that optimize the algorithm.
After we found the optimal parameters, we want to compare it to other algorithms that solve CNF formulas:
- Naive Algorithm
- EA With default parameters
- Pysat Algorithm
-
collect_data()
: This function collects data on all the different runtime of the different algorithms (naïve algorithm, EC-KitY, EC-KitY after parameters search and pysat). This function generates a random CNF with N variables and M clauses at each iteration, and averages the sum of the runtimes obtained after 'experiment_loop' iterations. The function will measure the runtime for a range of different N (in a ratio of 1:4 with M). The range can be set by the variable 'experiment_range'. -
assignment_clause_count(assignment)
: Counts the number of the satisfied clauses. -
gen_cnf(n)
: Generates a random CNF clause. -
naive_solver()
: Solves the CNF clause with naive algorithm. -
by_pysat()
: Solves the CNF clause with pysat algorithm. -
run()
: Run the evolutionary algorithm.
We ran all 4 algorithms with different values, and messures their runtime until they find a solution to the CNF formula.
Note: If the naive algorithm run for more than 10 seconds, we stopped it.
Comparison between different algorithms and different parameters:
we increased the number of iterations, in order to get better results:
Conclusion
- We can see that the runtime of the impoved run (with the new parameters) decreased in comparison to the default parameters and also found better results.
The improved run is better than the default not only in time, but also in success! - The naive algorithm's runtime is very high and can't solve any CNF formula with decent number of variables.
- There are algorithms that are very efficient such as pysat.
In order to use the CNF-Solver_EA to solve NP-Complete problem, the user must define the reduction from the problem to CNF formula.
It means that the user need to define encoder and decoder for the problem.
Then the user might want to search for optimal parameters as described above. We highly recommend to use this because without using optimization it might not solve the CNF formula, and if it does solve, we want it to be fast as it can be.
The CNF-Solver uses Evolutionary Algorithm by EC-KitY library.
The algorithm uses:
- Population and Representation:
GABitStringVectorCreator(length=N)
. Each individual is bit string vector with size N (number of variables in formula). - Evaluator and Fitness:
CNFSolverEvaluator()
. Returns the count of the satisfiable clauses by the individual's assignment vector. Higher is better. - Cross-over:
VectorKPointsCrossover(probability=CROSSOVER_PROBABILITY, k=1)
. 1-point cross-over, with the optimal probability that found in parameters search. - Mutation:
BitStringVectorNFlipMutation(probability=MUTATION_PROBABILITY, probability_for_each=MUTATION_PROBABILITY_FOR_EACH, n=N)
. For each individual it makes mutation with probability MUTATION_PROBABILITY, and flips each bit with probability MUTATION_PROBABILITY_FOR_EACH. The parameters are the ones that found in parameters search. - Selection:
TournamentSelection(tournament_size=TOURNAMENT_SIZE, higher_is_better=True)
. Selection is done by tournament with TOURNAMENT_SIZE parameter that found in parameters search.
Sudoku problem is NP-Complete and has reduction to SAT problem.
The reduction process is:
- Encoder: Takes Sudoku board with size
$n^2 * n^2$ and generates CNF formula according the board. - CNF-Solver: Takes the generated CNF formula and finds satisfiabilty assigment to it.
- Decoder: Takes the assignment from the CNF-Solver and fills the rest of the Sudoku board from it.
-
create_CNF(n, board)
: Gets a sudoku board with size$n^2 * n^2$ , and generates from it CNF fromula. -
num_of_variables(cnf)
: Gets a CNF formula and returns the number of different variables. -
fill_board(n, board, assignment)
: Gets a sudoku board, and fills it cells according the assignment. -
print_board(board)
: Gets a sudoku board and prints it. -
v(i, j, d, n)
: Returns an integer according the cell and the value of the board. The calculation is:$pow(n, 4) * (i - 1) + n * n * (j - 1) + d$ .
The idea behind it is that for each cell$i,j$ , we need to create$n*n$ variables. -
map_to_index(literal)
: Gets a literal and returns its corresponding index of the individual.
Reduction Encoder: The CNF formula that generated from the board is:
This formula has 20 variables and 41 clauses.
Graph of #Generations/#Unsasifiable clauses:
Result: Before the evolutionary algroithm started to run, the number of unsatifiable clauses of the best individual's assigment (from the initial population) was 4.
After 7 generations, there was an individual in the population that its assignment satisfied the CNF formula.
Reduction Encoder: The CNF formula that generated from the board is:
This formula has 87 variables and 292 clauses.
Graph of #Generations/#Unsasifiable clauses:
Result: Before the evolutionary algroithm started to run, the number of unsatifiable clauses of the best individual's assigment (from the initial population) was more than 50 (out of 292).
After less than 550 generations, there was an individual in the population that its assignment satisfied the CNF formula.
Evolutionary Algorithm is a way to solve NP-Complete problem:
- Find reduction to CNF problem.
- Run encoder and recieve CNF formula.
- Run parameters search algorithm on the formula in order to find satisfiable assignment (effieciently).
- Run the CNF-Solver Evolutionary Algorithm with the optimal parameters that found in parameters search.
- Run decoder on the assignment and solve the NP-complete problem.
- The naive algorithm is very unefficient and fails to solve many CNF formulas.
- Parameter Search is essential because without optimal parameters, there are cases that the algorithm will not find satisfiable assignment. Also, with the optimal parameters the algorithm will solve it quicker.
- There are CNF-Solver algorithms that are better than evolutionary algorithm, such as Pysat.
- Larger the population, better the results and higher runtime.
- There is a tradeoff between number of satisfiable clauses and runtime.
- Most of the searches found that Mutation probability (not probability for each) and Cross-over probability should be 1.
- We can search for parameters once, and use those parameters to solve other CNF formulas that with the same size order (variables and clauses count).
- With default parameters, the evolutionary algorithm didn't manage to solve Soduko 9x9 board (with logical amount of iterations), but after parameters search it managed to solve it quickly and efficiently.
- Parameters search is important.