Skip to content

Implement non-empty clause counting whi…#129

Merged
eladrion merged 6 commits intomainfrom
fix_119_empty_clause_creation
Sep 29, 2025
Merged

Implement non-empty clause counting whi…#129
eladrion merged 6 commits intomainfrom
fix_119_empty_clause_creation

Conversation

@eladrion
Copy link
Contributor

@eladrion eladrion commented Sep 25, 2025

…ch fixes an empty clause error when having only one tool in the workflow

Pull Request Overview

Fixes #119 which is introduced since APE generates an empty clause which is not accepted by minisat and thus not counted as clause. Hence, the computed count of clauses by APE does not match the ones recognised by minisat.

Related Issue

#119

Changes Introduced

Since (in the case of a workflow of length 1) the output of op 1 cannot be reused, connectedModules seemingly yields an empty list of CNF literals. Thus, we check whether the constraints are empty and only set a clause delimiter (0) if there are constraints.

How Has This Been Tested?

Successfully tested locally with config_bug119.json

Checklist

  • I have referenced a related issue.
  • I have followed the project’s style guidelines.
  • My changes include tests, if applicable.
  • All tests pass locally.

…ch fixes an empty clause error when having only one tool in the workflow
@eladrion eladrion self-assigned this Sep 25, 2025
@eladrion
Copy link
Contributor Author

@vedran-kasalica I am currently not sure whether some comparable can happen in other situations since both useModuleInput and useModuleOutput also append a 0 outside of any loop. If necessary, I can add that.

@eladrion
Copy link
Contributor Author

candidate_workflow_1 By the way, the (first 3) generated workflows candidate_workflow_2 candidate_workflow_3

@eladrion
Copy link
Contributor Author

@vedran-kasalica I now added a clause counting function based on DIMACS format that counts the non-empty clauses and warns if one was found. Using this function, minisat is more forgiving and ignores the empty clauses since the correct count is provided.

@eladrion
Copy link
Contributor Author

Woop, still working on that one. Something is still wrong

…mat, warn if empty clauses are found and dump the respective clause number.
@eladrion
Copy link
Contributor Author

@vedran-kasalica: behaviour stabilised and checks are passing. Skipped one line in the CNF file which made the count wrong. Ready for review

@eladrion eladrion changed the title [FIX] Omit trailing 0 for empty constraint in connected modules whi… Implement non-empty clause counting whi… Sep 25, 2025
* @param cnfEncoding the CNF encoding
* @return the count of non-empty clauses
*/
public static int countCNFClauses(InputStream cnfEncoding) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you compared the performance of this method against countLines? I’m concerned that using hasNextInt may slow down solving when dealing with 1M+ clauses, since it relies on regex matching (e.g., nextInt). In countLines I used byte-wise comparison for efficiency. Could we reuse countLines and extend it for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did that:

I tested the above JSON config with maximum workflow length 20 and in workflow length 4 and 11 we get

for the countLines variant

Total problem setup time: 3.019 sec (1258226 clauses).
...
Total problem setup time: 9.667 sec (4030028 clauses).

and for the countCNFClauses variant

Total problem setup time: 4.999 sec (1258203 clauses).
...
Total problem setup time: 17.289 sec (4030004 clauses)

so it definitely is slower to use countCNFClauses

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for testing. Do you think it's doable to update countLines to fix the bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a short look...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if I get that correctly, countLines actually does more or less nothing different (concerning complexity class) since it regexes for \n. In order to recognise in a stable way that a clause has ended, we need the pattern " " >> "0" >> _space where space may, but does not have to be a newline (according to DIMACS CNF spec on sat4J). To implement the desired behaviour in countLines, we would have to scan line by line. But a line could have n clause endings with n being arbitrary. So the implementation would be more complex (but still doable)

What we can do is to assume that if a line ends with 0, it is a clause and if it's length is 2, it's an empty clause which is not counted. I can implement that easily and test it. But I would propose to not call this function countLines, then.

I will implement and test that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small correction:

Usually each clause is listed on a separate line, using spaces between each of the literals and the final zero.
Sometimes long clauses use multiple lines.

So we can assume that a clause end always at the end of the line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, the new variant has the following timings for depth 4 and 11:

Total problem setup time: 3.06 sec (1258226 clauses).
Total problem setup time: 9.072 sec (4030007 clauses).

This is comparable to countLines but there is no guarantee that the count result is really a count of clauses conforming to the format specification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to make sure that it is a list of numbers, we have to introduce more checks and will have more or less the performance of countCNFClauses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you prefer, @vedran-kasalica?

@eladrion
Copy link
Contributor Author

Hi @vedran-kasalica, after thinking a bit about a "good" solution, I have the following proposal. We have countCNFClauses in two variants. The one parameter variant always checks the syntax and is thus slower. In the two parameter variant, the user can decide by the second parameter (boolean) whether the syntax shall be checked and internally, we use the non-checking variant since we know, that the syntax is correct if we did not do some foo. How does that sound?

@vedran-kasalica
Copy link
Member

vedran-kasalica commented Sep 29, 2025

Hi @vedran-kasalica, after thinking a bit about a "good" solution, I have the following proposal. We have countCNFClauses in two variants. The one parameter variant always checks the syntax and is thus slower. In the two parameter variant, the user can decide by the second parameter (boolean) whether the syntax shall be checked and internally, we use the non-checking variant since we know, that the syntax is correct if we did not do some foo. How does that sound?

Hi @eladrion, thanks for checking this. If the new 2-parameter countCNFClauses works well, I’d make it the default. I’m not sure overloading is the best fit here, since I’d normally expect the version with more parameters to be the specialised one, and not the default. But this is a minor comment, and we can merge it as it is, not to drag the PRs further.

The following comment is more of a discussion-, rather than an action-point.
If we look at the 1-param version in detail, it should be faster to implement it as you mentioned before, by extending the existing countCNFClauses 2-param version with a clause if len() == 2 log a warning or if the first char is ? That is quite fast to compute, while parsing every int per clause (nextInt) takes time as it parses the whole row. In addition, it doesn't provide real syntax checking, because we are not checking whether the ints are higher than the number of variables (that would be a syntax error), and nextInt reads ints, so it would skip syntax errors such as using non-numeric characters. Those are rare and captured by MiniSAT error handling, so it might not be needed to check anyway.

@eladrion
Copy link
Contributor Author

Hi @vedran-kasalica IIRC, the countLinesand also the new countCNFClauses are used only on CNF encodings without the p ... preamble. So the count of variables is in that case not really fixed. If we wanted to support scanning the SAT encodings, then we would have to check the preamble, too. But you have a point that there are no constraints on the numbers themselves. The speed of coundCNFClauses without syntax check is practically the same as countLines. And you have a point that it is counterintuitive to use the two-parameter function to not check. I think this can be resolved quite easily and the warning for empty clauses should also be quite easy to implement.

@eladrion
Copy link
Contributor Author

I separated the functionality a bit now to resolve the counter-intuition and we now have countCNFClauses and checkCNFClauses with both having only the CNF encoding as parameter. I think we can leave it as is, now.

@vedran-kasalica
Copy link
Member

I separated the functionality a bit now to resolve the counter-intuition and we now have countCNFClauses and checkCNFClauses with both having only the CNF encoding as parameter. I think we can leave it as is, now.

@eladrion thank you, looks good! Feel free to merge the PR. Should we make it part of the 2.5.3 release?

@eladrion
Copy link
Contributor Author

Hi @vedran-kasalica, yes, we should introduce this in 2.5.3 since it is a functionality flaw and resolved. I will merge.

@eladrion eladrion merged commit 112b831 into main Sep 29, 2025
1 check passed
@vedran-kasalica vedran-kasalica deleted the fix_119_empty_clause_creation branch September 29, 2025 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] No workflows generated for trivial input/output

2 participants