
OSS: extremely bad replaceKnown performance #3248

Open · FliegendeWurst opened this issue Aug 19, 2023 · 11 comments · May be fixed by #3272

@FliegendeWurst (Member) commented Aug 19, 2023

Description

During automated proof search, replaceKnown accounts for 43% of all CPU time (the profiler view below is zoomed in):

[profiler screenshot]

As far as I can tell, almost no replace_known_left / replace_known_right rules were actually applied.
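For context, replace_known_left/right rewrites occurrences of a formula already known from the sequent to true or false. A rough sketch of why the naive candidate search is expensive, assuming KeY's Term interface (op(), arity(), sub(int)); the helper name is hypothetical, this is not KeY's actual code:

```java
import java.util.Set;
import de.uka.ilkd.key.logic.Term;

// Sketch of the cost model: every subterm of every formula has to be
// checked against the set of formulas known from the sequent. Even with
// a hash set this walk is linear in the sequent's term size, and it is
// repeated after every rule application, which is where the time goes.
final class ReplaceKnownSketch {
    static boolean hasKnownSubterm(Term t, Set<Term> known) {
        if (known.contains(t)) {   // one lookup per subterm
            return true;
        }
        for (int i = 0; i < t.arity(); i++) {
            if (hasKnownSubterm(t.sub(i), known)) {
                return true;
            }
        }
        return false;
    }
}
```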

Reproducible

always

Steps to reproduce

  1. Load a Java file with a hard-to-prove specification
  2. Run one of the strategies

I can provide the Java file if needed.

Expected behaviour: acceptable performance; KeY spends its CPU time on useful work.
Actual behaviour: performance degrades as the proof grows; KeY wastes half its CPU time.

Additional information

I should mention that I modified the Simplifier to consider more rulesets.

@mattulbrich (Member) commented

Thanks for investigating. A number of years back, we investigated this issue, and I can confirm your observation: the majority of time is spent (wasted?) on equality treatment. There were a few suggestions back then to change the treatment.

One plan was to set up a set of the left-hand sides of all known equations, check each newly introduced term for membership in this set, and trigger a rule application on a hit. For some reason the implementation turned out not to be faster, but I still believe it should have been.

I'd be happy if we could revisit this issue.
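A minimal sketch of that plan, assuming KeY's Term type; EquationIndex and its methods are hypothetical names, not existing KeY API:

```java
import java.util.HashMap;
import java.util.Map;
import de.uka.ilkd.key.logic.Term;

// Hypothetical index from left-hand sides of known equations to their
// right-hand sides. Only terms that a rule application newly introduces
// are checked, so each check is a single hash lookup instead of matching
// every known equation against every subterm of the sequent.
final class EquationIndex {
    private final Map<Term, Term> lhsToRhs = new HashMap<>();

    // Called when an equation lhs = rhs becomes known in the sequent.
    void addEquation(Term lhs, Term rhs) {
        lhsToRhs.put(lhs, rhs);
    }

    // Called only for newly introduced terms; returns null if no
    // equation applies, otherwise the replacement to trigger.
    Term rewriteCandidate(Term newlyIntroduced) {
        return lhsToRhs.get(newlyIntroduced);
    }
}
```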

@FliegendeWurst (Member, Author) commented Aug 20, 2023

This is the hash code used for the map:

```java
@Override
public int hashCode() {
    // Allow more conflicts to ensure that naming and term labels are ignored.
    return term.op().hashCode();
}
```

I think performance can be improved by adding more data to the hash code. For example, the term depth must also be equal for two terms to be equal (I believe):

```java
@Override
public int hashCode() {
    // java.util.Objects: combine the operator with the term depth.
    return Objects.hash(term.op(), term.depth());
}
```
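One could go further in the same spirit. A hedged variant (an illustration, not a tested patch) that also mixes in arity and sort, both of which, like the operator and the depth, are unaffected by bound-variable renaming and term labels:

```java
import java.util.Objects;
import de.uka.ilkd.key.logic.Term;

final class TermHashSketch {
    // op, arity, depth and sort are all stable under renaming and term
    // labels, so the label-ignoring equals/hashCode contract should
    // still hold while hash buckets get smaller.
    static int structuralHash(Term term) {
        return Objects.hash(term.op(), term.arity(), term.depth(), term.sort());
    }
}
```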

@mattulbrich (Member) commented

I don't think that this coarse hash code scheme is the problem.

I replaced it with the hashing function in https://gist.github.com/mattulbrich/28c0f53f5a9f608c8c86f2ab1da39546, and it produced proofs of almost the same runtime (and same number of rule apps).

I am not fully sure how it works, but Richard's compiled taclet matcher may be an efficient alternative to checking all subterms with a hashtable ...

/cc @unp1

@FliegendeWurst (Member, Author) commented Aug 20, 2023

> I replaced it with the hashing function in gist.github.com/mattulbrich/28c0f53f5a9f608c8c86f2ab1da39546, and it produced proofs of almost the same runtime (and same number of rule apps).

I do think it has a big impact, at least in some situations. Your proposed hash is much slower (mine: 282 seconds for 100% of the replay; yours: 360 seconds for 19% of the replay).

@mattulbrich (Member) commented

Ok, that obviously depends on the program to load. I apparently looked at a smaller program. Which example do you use for reference?

@mattulbrich (Member) commented

While you are at it: the treatment of equalities (applyEq) was also found to be very slow and inefficient, and turned out to be the major time waster in many proofs. Feel free to fix it if you stumble across it. :-P

@FliegendeWurst (Member, Author) commented

> Ok, that obviously depends on the program to load. I apparently looked at a smaller program. Which example do you use for reference?

I use https://gist.github.com/FliegendeWurst/9be15e20e3bfd0722fcdd7f5c4afebcd. It requires some of my changes in https://github.com/FliegendeWurst/key/tree/testing, most notably the support for \dl_seqGet__int (although by now I think it could simply be written as (int) \dl_seqGet).

@FliegendeWurst (Member, Author) commented Aug 21, 2023

With some more optimizations it is possible to eliminate this particular slowdown:
[profiler screenshot]
Now the other bit of logic in the Simplifier takes approx. 90% of the CPU time. This is actually expected, because I added some more rules to the OSS for this experiment (selectOfStore, selectOfCreate, selectOfAnon, selectOfMemset, dismissNonSelectedField, selectCreatedOfAnon, castDel, sortsDisjointModuloNull, ifthenelse_negated, elementOfUnion, elementOfArrayRange).

In fact, almost all CPU time is spent in refactorLabelsRecursive.
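If refactorLabelsRecursive really dominates, one hedged idea (a sketch with hypothetical names, not a patch against the actual TermLabelManager) is to memoize refactoring results per term object: KeY terms are immutable, so a shared subterm only ever needs to be label-refactored once per pass.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import de.uka.ilkd.key.logic.Term;

// Hypothetical memoization layer around a single refactoring pass:
// identical (shared) subterm objects are refactored exactly once.
final class LabelRefactoringCache {
    private final Map<Term, Term> cache = new IdentityHashMap<>();

    Term refactor(Term t, UnaryOperator<Term> refactorOnce) {
        return cache.computeIfAbsent(t, refactorOnce);
    }
}
```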

@mattulbrich (Member) commented Aug 21, 2023 via email

@FliegendeWurst (Member, Author) commented Aug 21, 2023

> Regarding selectOfStore etc.: Christoph Scheben implemented a set of taclets which pull out common heap terms so that select-store chains need not be reduced more than once. Adding taclets to the OSS may render these techniques useless.

That's of course even better. I will look into it.

In any case, I have done another profiling session (this time with origin tracking disabled):
[profiler screenshot]
It appears CPU time is spent on:

  • finding the taclet (~75%)
  • rewriting the term (~20%)
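Given that roughly 75% goes to finding the taclet, the obvious lever is narrowing the candidate set before full matching. KeY already maintains a taclet index; the sketch below (hypothetical types, not the real TacletIndex) only illustrates the general idea of bucketing rewrite rules by the top operator of their find pattern:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import de.uka.ilkd.key.logic.Term;
import de.uka.ilkd.key.logic.op.Operator;

// Illustrative only: rules are bucketed by the top operator of their
// find pattern, so rule lookup is one map access plus matching a short
// candidate list, instead of trying every rule at every position.
final class TopOpRuleIndex<R> {
    private final Map<Operator, List<R>> byTopOp = new HashMap<>();

    void add(Operator findTopOp, R rule) {
        byTopOp.computeIfAbsent(findTopOp, k -> new ArrayList<>()).add(rule);
    }

    List<R> candidatesFor(Term target) {
        return byTopOp.getOrDefault(target.op(), List.of());
    }
}
```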

@FliegendeWurst (Member, Author) commented

I have done some more tests on ShortestPath.java:

| Settings | Proof size | Rule apps (after each macro) | Time (after each macro) |
| --- | --- | --- | --- |
| No origin tracking, OSS with additional rules | 3008 nodes | 226,182 / 462,239 | 71.2 / 92.2 s |
| Origin tracking, OSS with additional rules | 3008 nodes | 226,182 / 462,239 | 110.2 / 136.6 s |
| No origin tracking, OSS without additional rules | >53k nodes | 16,877 / >124k | <25 s / >9 h |

I ran the macros Finish Symbolic Execution and Heap Simplification in each configuration (the rule app counts above are after executing each macro).
The last run did not finish within 9 hours; after stopping the macro the UI became unresponsive, so those numbers are lower bounds.
