Skip to content

Proof Scripts in JML#3657

Merged
wadoon merged 104 commits into
KeYProject:mainfrom
mattulbrich:jmlScripts
Jun 29, 2026
Merged

Proof Scripts in JML#3657
wadoon merged 104 commits into
KeYProject:mainfrom
mattulbrich:jmlScripts

Conversation

@mattulbrich

@mattulbrich mattulbrich commented Sep 8, 2025

Copy link
Copy Markdown
Member

Related Issue

For quite some time, we have a prototype for JML proof scripts which we finally want to bring to the mater branch.

Intended Change

Proof scripts can be written into JML comments and can be obeyed during automatic verification.
The major benefit is that the localisation of the proof node to which a proof script should be applied is natural.
The other benefit is that one can use JML expressions and thus does not have to leave the realm of JML to do script-guided proofs.

This relies on #3587 and needs #3654 merged.

Plan

  • Refine the syntax
  • Adapt lexer and parser
  • Reimport the implementation that was available on an older branch
  • Make it more convenient
  • Add examples
  • Code cleanup
  • Document the changes

Type of pull request

  • New feature (non-breaking change which adds functionality)
  • There are changes to the (Java) code
  • There are changes to the taclet rule base (possibly small adaptations)

Ensuring quality

WIP:

  • I made sure that introduced/changed code is well documented (javadoc and inline comments). ✔️
  • I made sure that new/changed end-user features are well documented (https://github.com/KeYProject/key-docs). ✔️
  • I added new test case(s) for new functionality. ✔️
  • I have tested the feature as follows: ... examples and test cases
  • I have checked that runtime performance has not deteriorated. not for non-script proofs.

Additional information and contact(s)

Planned to be done Mid 09/25

The contributions within this pull request are licensed under GPLv2 (only) for inclusion in KeY.

Comment thread key.core/src/main/antlr4/JmlParser.g4
@wadoon

wadoon commented Sep 9, 2025

Copy link
Copy Markdown
Member

I have written an integer split command.

An induction command is on the way with support for int, user-data types, bsum(?).

Commit: 4644331

@mattulbrich

Copy link
Copy Markdown
Member Author

The history of this PR was totally changed while I was working on it.
I will have to force-push my changes of the last few days.
In case it is still needed: e841235

@mattulbrich mattulbrich force-pushed the jmlScripts branch 2 times, most recently from 96a6a98 to 2382655 Compare October 4, 2025 01:01
@mattulbrich mattulbrich force-pushed the jmlScripts branch 4 times, most recently from 2066973 to ae8b141 Compare October 11, 2025 00:01
@mattulbrich mattulbrich marked this pull request as ready for review October 11, 2025 00:11
@mattulbrich mattulbrich added the Review Request Waiting for review label Oct 11, 2025
@WolframPfeifer WolframPfeifer modified the milestones: v2.13.0, v3.0.0 Jan 7, 2026
@wadoon

wadoon commented Feb 3, 2026

Copy link
Copy Markdown
Member

@WolframPfeifer Could you please revert the merge of main into this PR (and do a rebase instead, or just let to @mattulbrich for the sake of confusion avoidance)?

This was a merge-free PR, which results in an easier-to-read git history.

@WolframPfeifer

Copy link
Copy Markdown
Member

@WolframPfeifer Could you please revert the merge of main into this PR (and do a rebase instead, or just let to @mattulbrich for the sake of confusion avoidance)?

This was a merge-free PR, which results in an easier-to-read git history.

Ok, I reverted the merge. I see the point about a clean history. However, I am not sure what a good workflow would be. Should we do a merge with main only at the end of the feature development, before the merge of the PR?

@wadoon

wadoon commented Feb 23, 2026

Copy link
Copy Markdown
Member
image
35160      DEBUG Test worker     d.u.i.k.s.ProofScriptEngine Commands: macro  "autopilot-prep";
tryclose  ;
macro  "simp-upd";
rule  seqPermFromSwap;
rule  andRight;
auto  ;
instantiate with: (i_0) var: "iv" hide;
instantiate with: j_0 var: jv hide;
auto  ;
macro  "simp-upd";
rule  seqPermFromSwap;
rule  andRight;
auto  ;
instantiate with: (i_0) var: "iv" hide;
instantiate with: (to) var: "jv" hide;
auto  ; 

Error while executing script: Could not convert value 'j_0' from type 'ProofScriptExpressionContext' to type 'JTerm'

Command: instantiate with: j_0 var: jv hide;
Position: [file:/home/weigl/work/key/key.core/../key.ui/examples/heap/quicksort/split.key:30/2]

de.uka.ilkd.key.scripts.ScriptException: Error while executing script: Could not convert value 'j_0' from type 'ProofScriptExpressionContext' to type 'JTerm'

Command: instantiate with: j_0 var: jv hide;
Position: [file:/home/weigl/work/key/key.core/../key.ui/examples/heap/quicksort/split.key:30/2]

	at de.uka.ilkd.key.scripts.ProofScriptEngine.execute(ProofScriptEngine.java:162)
	at de.uka.ilkd.key.scripts.ProofScriptEngine.execute(ProofScriptEngine.java:80)
	at de.uka.ilkd.key.proof.runallproofs.ProveTest.autoMode(ProveTest.java:194)

image
5028      INFO  Test worker     d.u.i.k.p.r.ProveTest     (parallelGcd.proof|1412243253) Proof will be saved to: /home/weigl/work/key/key.core/../key.ui/examples/heap/verifyThis15_2_ParallelGcd/parallelGcd.proof.proof 
25028      INFO  Test worker     d.u.i.k.p.r.ProveTest     (parallelGcd.proof|1412243253) Start proving 
25028      INFO  Test worker     d.u.i.k.p.i.AbstractProblemLoader Loading environment from /home/weigl/work/key/key.core/../key.ui/examples/heap/verifyThis15_2_ParallelGcd/parallelGcd.proof 
25028      DEBUG Test worker     d.u.i.k.p.i.AbstractProblemLoader Environment load took 150.53ns 
25030      INFO  Test worker     d.u.i.k.p.i.AbstractProblemLoader Creating init config 
26664      DEBUG Test worker     d.u.i.k.p.i.AbstractProblemLoader Init config took 1.63s 
26895      DEBUG Test worker     d.u.i.k.p.i.IntermediateProofReplayer Proof replay took 194.78ms 
26895      INFO  Test worker     d.u.i.k.p.i.AbstractProblemLoader Replay result: Proof replayed successfully.Errors while reading the proof. Not all branches could be load successfully. 
26896      INFO  Test worker     d.u.i.k.p.r.ProveTest     (parallelGcd.proof|1412243253) Proving done 

Loading problem file failed
Expected :false
Actual   :true
<Click to see difference>

image
Macro 'script-auto' raised an exception: Error while executing script: Multiple or no formulas modified or added in last step, cannot identify sequent formula to skolemize.

Command: __obtain from_goal: true var: x ;
Position: [file:/home/weigl/work/key/key.core/../key.ui/examples/heap/BoyerMoore/src/BoyerMoore.java:74/18]

de.uka.ilkd.key.scripts.ScriptException: Macro 'script-auto' raised an exception: Error while executing script: Multiple or no formulas modified or added in last step, cannot identify sequent formula to skolemize.

Command: __obtain from_goal: true var: x ;
Position: [file:/home/weigl/work/key/key.core/../key.ui/examples/heap/BoyerMoore/src/BoyerMoore.java:74/18]

	at app//de.uka.ilkd.key.scripts.MacroCommand.execute(MacroCommand.java:117)
	at app//de.uka.ilkd.key.scripts.AbstractCommand.execute(AbstractCommand.java:79)
	at app//de.uka.ilkd.key.scripts.ProofScriptEngine.execute(ProofScriptEngine.java:129)
	at app//de.uka.ilkd.key.scripts.ProofScriptEngine.execute(ProofScriptEngine.java:80)
	at app//de.uka.ilkd.key.proof.runallproofs.ProveTest.autoMode(ProveTest.java:194)
	at app//de.uka.ilkd.key.proof.runallproofs.ProveTest.runKey(ProveTest.java:135)
	at app//de.uka.ilkd.key.proof.runallproofs.ProveTest.assertProvability(ProveTest.java:63)
	at app//de.uka.ilkd.key.proof.runallproofs.gen._example_algos.testBM_bm(_example_algos.java:36)
	at java.base@25.0.2/java.lang.reflect.Method.invoke(Method.java:565)
	at java.base@25.0.2/java.util.ArrayList.forEach(ArrayList.java:1604)
	at java.base@25.0.2/java.util.ArrayList.forEach(ArrayList.java:1604)
Caused by: de.uka.ilkd.key.scripts.ScriptException: Error while executing script: Multiple or no formulas modified or added in last step, cannot identify sequent formula to skolemize.

Command: __obtain from_goal: true var: x ;
Position: [file:/home/weigl/work/key/key.core/../key.ui/examples/heap/BoyerMoore/src/BoyerMoore.java:74/18]

	at app//de.uka.ilkd.key.scripts.ProofScriptEngine.execute(ProofScriptEngine.java:162)
	at app//de.uka.ilkd.key.macros.ApplyScriptsMacro.applyTo(ApplyScriptsMacro.java:210)
	at app//de.uka.ilkd.key.macros.AbstractProofMacro.applyTo(AbstractProofMacro.java:72)
	at app//de.uka.ilkd.key.macros.SequentialProofMacro.applyTo(SequentialProofMacro.java:99)
	at app//de.uka.ilkd.key.macros.AbstractProofMacro.applyTo(AbstractProofMacro.java:72)
	at app//de.uka.ilkd.key.scripts.MacroCommand.execute(MacroCommand.java:113)
	... 10 more
Caused by: java.lang.IllegalStateException: Multiple or no formulas modified or added in last step, cannot identify sequent formula to skolemize.
	at de.uka.ilkd.key.scripts.ObtainCommand.identifySequentFormula(ObtainCommand.java:139)
	at de.uka.ilkd.key.scripts.ObtainCommand.executeFromGoal(ObtainCommand.java:91)
	at de.uka.ilkd.key.scripts.ObtainCommand.execute(ObtainCommand.java:77)
	at de.uka.ilkd.key.scripts.AbstractCommand.execute(AbstractCommand.java:79)
	at de.uka.ilkd.key.scripts.ProofScriptEngine.execute(ProofScriptEngine.java:129)
	... 15 more

@mattulbrich mattulbrich marked this pull request as draft June 27, 2026 06:01
@mattulbrich

Copy link
Copy Markdown
Member Author

I have no idea why the Windows test cases fail. I even used my Windows PC at home to run the tests: And they succeed. ...

Thanks for investigating, Richard @unp1

Apparently (at least C's analysis) line endings on Windows. It suggests two fixes:

Who is C? Or is it Claude?

This branch is on a reduced github workflow. It seems that the parameters are parsed identically. (see details below). In the current version, all \r are filtered out anyways.

What definitely is different is the order in which the files are investigated. I added a sort routine to make this order deterministic.

However, for some reason there are now again 8 merge conflicts even though I merged only last week; that is why the tests do not run. I do not know if I will find the time to do this merge in the next days. :(

Grep results showing that parameters are identically parsed. AutoOnly2 fails on Windows, not on ubuntu.
linux:2026-06-27T06:25:25.3047846Z     !!! Parameters for /home/runner/work/key/key/key.core/build/resources/test/de/uka/ilkd/key/scripts/jml/AutoOnly2.java: Parameters{shouldClose=false, method='null', exception='null', deleteTmpDir=true, settings=null}
windows:2026-06-27T06:26:33.9999167Z     !!! Parameters for D:\a\key\key\key.core\build\resources\test\de\uka\ilkd\key\scripts\jml\AutoOnly2.java: Parameters{shouldClose=false, method='null', exception='null', deleteTmpDir=true, settings=null}

# Conflicts:
#	key.core/src/main/java/de/uka/ilkd/key/java/loader/JP2KeYConverter.java
#	key.core/src/main/java/de/uka/ilkd/key/scripts/ProofScriptEngine.java
#	key.ncore/src/main/antlr/KeYCommonParser.g4
# Conflicts:
#	key.core/src/main/java/de/uka/ilkd/key/scripts/ProofScriptEngine.java
@unp1

unp1 commented Jun 29, 2026

Copy link
Copy Markdown
Member

I have no idea why the Windows test cases fail. I even used my Windows PC at home to run the tests: And they succeed. ...

Thanks for investigating, Richard @unp1

Apparently (at least C's analysis) line endings on Windows. It suggests two fixes:

Who is C? Or is it Claude?
yupp, at the moment I have still the license (not for long anzmore :-)

This branch is on a reduced github workflow. It seems that the parameters are parsed identically. (see details below). In the current version, all \r are filtered out anyways.

What definitely is different is the order in which the files are investigated. I added a sort routine to make this order deterministic.

However, for some reason there are now again 8 merge conflicts even though I merged only last week; that is why the tests do not run. I do not know if I will find the time to do this merge in the next days. :(

Grep results showing that parameters are identically parsed.

If you want I can try to get Claude doing the resolution and we both check afterwards that everything is correct?
Only problem is I don't have access to a Windows machine, so tests will run on CI.

@mattulbrich

Copy link
Copy Markdown
Member Author

If C could do the merge and you review it, that would be splendid.
The history on this branch is a bit broken since I modified the github workflows for quicker testing. So history rewrite would be required here. I was able to partially merge intermediate masters into the branch, but it is rather painful.

My bet and hope is that c0a7ae8 will solve the problem. ..

@mattulbrich

Copy link
Copy Markdown
Member Author

Only problem is I don't have access to a Windows machine, so tests will run on CI.

I do have a Windows machine: And it goes through there 🫢 . But not on the CI machine. Therefore, I assume the indeterministic file order may be the problem.

This of course raises another issue: Why does this order matter? At least on could investigate then. ... if that is the reason.

unp1 added 2 commits June 29, 2026 16:49
Resolve conflicts from the source-location move to key.ncore (KeYProject#3870):
Position/Location now live in org.key_project.util.parsing. Updated the
stale imports (JP2KeYConverter, KeyAst, ScriptCommandAst, JmlFacade,
JMLTransformer, ApplyScriptsMacro) and moved the branch's
Location.fromPositionInfo helper into key.core's JavaSourceLocations (it
uses PositionInfo, so it cannot live in the now JavaParser-free ncore
Location).
…ripts

The character loop's `case '\r' -> { }` did nothing, so a carriage return fell
through to the `cmdBuilder.append(...)` below and ended up inside the command --
e.g. in a quoted value that spans a CRLF line break. Script files are marked
`text` in .gitattributes and are therefore checked out with CRLF on Windows (and
depending on the CI runner's eol config), so scripts parsed there carried stray
'\r's and could fail to match. Skip the carriage return with `continue`.

Regression test (ScriptLineParser-level) added in ProofScriptParserTest; it fails
without the fix and passes with it.

Created with AI tooling
@wadoon

wadoon commented Jun 29, 2026

Copy link
Copy Markdown
Member

I do have a Windows machine: And it goes through there 🫢 .

Windows != Windows. On Github, it is a Windows Server (2023 or 2025, I think.)
Behavior on these machines can slightly differ.

In the old days, e.g., JVM decided for -server mode on Windows Server resulting into different behavior on volatile fields.

unp1 added a commit to mattulbrich/key that referenced this pull request Jun 29, 2026
… -- REVERT

Logs the applied-rule sequence + proof stats + line separator per case so the
Windows-vs-ubuntu CI output can be diffed to find where automatic proof search
diverges. To be reverted once the cause is found.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
unp1 added a commit to mattulbrich/key that referenced this pull request Jun 29, 2026
…ect#3657) -- REVERT

Prints per-goal whether a JML assert/script was found, the rendered command
lines (with \r/\n visualized), and any exception thrown by pse.execute, to
locate the Windows-only failure. To be reverted with the JmlScriptTest one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@unp1

unp1 commented Jun 29, 2026

Copy link
Copy Markdown
Member

I do have a Windows machine: And it goes through there 🫢 . But not on the CI machine. Therefore, I assume the indeterministic file order may be the problem.

This of course raises another issue: Why does this order matter? At least on could investigate then. ... if that is the reason.

The reason is actually quite interesting as a reminder of the subtleties between the platforms:

The problem was in SymbolicExecutionMacro:

ADMITTED_RULES = Arrays.asList(
               Streams.toString(SymbolicExecutionOnlyMacro.class
                       .getResourceAsStream("SymbolicExecutionOnlyMacro.admittedRules.txt"))
                       .split("\n"));
ADMITTED_RULE_SETS = Arrays.asList(
               Streams.toString(SymbolicExecutionOnlyMacro.class
                       .getResourceAsStream("SymbolicExecutionOnlyMacro.admittedRuleSets.txt"))
                       .split("\n"));

depending on the line endings "simplify\n" --> "simplify" and a later equals check to "simplify" when deciding whether to apply the taclet or not succeeded. While when the line endings in the txt file were
"simplify\r\n" --> "simplify\r" and the later equals check with "simplify" became false and no rules were applied.

The difference between the CI and your machine is most likely the git configuration that converts / does not convert the line endings in the files. Even so, not necessary, the fix for the deprecated ScriptLineParser is also in as additional hardening.

The merge and the commit that fixes the test are now on the branch.

SymbolicExecutionOnlyMacro read its admitted rule / rule-set lists from the
bundled .txt resources and split them on "\n". On a Windows checkout these
resources have CRLF line endings (.gitattributes marks them as text), so every
entry kept a trailing '\r' ("assignment\r", "simplify_prog\r", ...).

isAdmittedRule compares the actual rule name ("assignment") against those
entries via List.contains, which then never matched. As a result no symbolic-
execution rule was admitted, the macro applied nothing, ApplyScriptsMacro never
reached a goal carrying a JmlAssert, and all goals fell through to the
TryCloseMacro fallback -- making JmlScriptTest cases close (AutoOnly2) or fail
to close (NestedAssert, ObtainFromGoal) only on Windows.

Read the lists via String.lines() and strip each entry so the names match
regardless of the checkout's line endings. Reproduced locally by converting the
two resources to CRLF (3 cases fail), fixed by this change (all 7 pass).

with AI tooling support
@wadoon wadoon marked this pull request as ready for review June 29, 2026 21:55
@wadoon wadoon added this pull request to the merge queue Jun 29, 2026
Merged via the queue into KeYProject:main with commit 993820e Jun 29, 2026
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants