-
Notifications
You must be signed in to change notification settings - Fork 469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REST API: dryrun scratch slot type fix #3736
Conversation
daemon/algod/api/server/v2/dryrun.go
Outdated
@@ -136,7 +136,7 @@ func (ddr *dryrunDebugReceiver) updateScratch() { | |||
|
|||
if any { | |||
if ddr.scratchActive == nil { | |||
ddr.scratchActive = make([]bool, maxActive+1, 256) | |||
ddr.scratchActive = make([]bool, maxActive, 256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right. If scratch slot 1 is active, maxActive will be 1, and you need a 2 element slice to hold info on slots 0 and 1, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is only called when its the first time scratch space is used and immediately after this we iterate from len(scratchActive) to maxActive inclusive using append to set the values.
I think we could probably just create it with 0 len and capacity of 256
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just make it with len of maxActive+1 and iterate from 0 to maxActive inclusive and set the values with index instead of using append
daemon/algod/api/server/v2/dryrun.go
Outdated
} | ||
for i := len(ddr.scratchActive); i <= maxActive; i++ { | ||
ddr.scratchActive = make([]bool, maxActive+1, 256) | ||
for i := 0; i <= maxActive; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creating a new slice and iterating starting from 0 is less efficient but seems much easier for me to understand
Completely refactored this to make it make more sense to me, I'm still not sure why we need to zero anything at the bottom of the function. When is that relevant? General question on these, should I push to my own fork and pr from there? or is it ok to push a branch here? |
daemon/algod/api/server/v2/dryrun.go
Outdated
if !ddr.scratchActive[i] { | ||
(*ddr.history[lasti].Scratch)[i].Type = 0 | ||
(*ddr.history[lasti].Scratch)[i] = generated.TealValue{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're zero'ing the type, presumably we dont want anything in the uint/bytes either so set this to empty struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we do it to ensure that nothing appears in the response. Though I'm unclear why it would have been previously populated. Is that your question as well? I'm not sure how ddr.history[lasti]
is set.
Codecov Report
@@ Coverage Diff @@
## master #3736 +/- ##
==========================================
+ Coverage 49.65% 49.68% +0.03%
==========================================
Files 392 392
Lines 68588 68578 -10
==========================================
+ Hits 34059 34075 +16
+ Misses 30781 30761 -20
+ Partials 3748 3742 -6
Continue to review full report at Codecov.
|
@@ -393,6 +393,25 @@ func checkAppCallPass(t *testing.T, response *generated.DryrunResponse) { | |||
} | |||
} | |||
|
|||
func checkAppCallScratchType(t *testing.T, response *generated.DryrunResponse, slot int, tt basics.TealType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather see this check function take the txn index, that way it can assert that txn in question has an AppCallTrace. And then it should assert that traceline.Scratch is not nil, and that the slot has the right type. As written, it's too lenient. It allows the scratch to not be set, or all of the txns to be of the wrong type, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be useful to have a way to assert that a scratch slot is not included in the response. If I understand correctly, though point of all this maxActive
stuff is to keep the responses small, and not contain slots that were never touched, right? We should test that is done right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I'll make both changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you're dealing with the fact that at the start of the trace, not type is set, since it hasn't been used yet. In that case, I'd still take the txn index, and confirm AppCallTrace, but I'd leave the other continue
s. But, we want to be sure that line 408 happens at least once, or we're not really checking anything. So I'd set a flag there, and assert it at the end of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, please consider a small optimization suggested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I finally see what the old bug was, and I think you've fixed it, but you've also introduced a change that I am ok with, but I want to say it outloud.
I think the old bug was not an off-by-one error. Instead, scratchActive[i]
was set once, the first time a slot i
or higher was set, and never reconsidered. So, if the first line of a program set slot 10 to "hello", then scratchActive[9]
(and lower) was set to false, and stayed there forever. So I bet that in your buggy example, slot 6 was set before slot 5.
In your new code, the entire scratchActive array is computed every time, so I think you fixed this bug. (But if I'm right, your unit test doesn't explicitly test that situation.)
But, this also means that a later frame can have fewer active slots than an earlier frame, if the highest active scratch space is set to a uint64 0 in TEAL. I'm ok with that. (not just shorter - the whole array can become nil after previously having slots)
On the other hand, I think you said your unit test did uncover the error. So, that makes my explanation suspect. The trouble is that I just don't see an off-by-one error in the old code.
Our usual pattern is to have your own fork, make a branch there, then PR it
to algorand's master branch. No need to change what you did on this one.
…On Thu, Mar 10, 2022 at 6:37 AM Ben Guidarelli ***@***.***> wrote:
Completely refactored this to make it make more sense to me, I'm still not
sure why we need to zero anything at the bottom of the function. When is
that relevant?
General question on these, should I push to my own fork and pr from there?
or is it ok to push a branch here?
—
Reply to this email directly, view it on GitHub
<#3736 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADL7TZ573RAMDEGEEPX44DU7HNHLANCNFSM5QKVOVYQ>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
@jannotti I think the original bug was that on the first use of populated scratch we'd initialize the slice of the correct length but then immediately append to it rather then setting at the index. So a scratch var at idx 5 would have it's bool populated at idx 6 and so on |
Simplify handling of "active slots" to fix possible obo error. Unit tests added to confirm typing.
Summary
When performing a dryrun, if the program uses scratch space there is some logic to determine which slots are active and if inactive the type is set to 0.
There seems to be an off-by-one bug where the types are offset by one element in the scratch slot array. This was causing output like:
Note that the N-1 element has its type set to 0.
Test Plan
Modified existing test, without the change in the dryrun.go file, the test fails.