REST API: dryrun scratch slot type fix #3736

barnjamin · 2022-03-09T20:45:40Z

Summary

When performing a dryrun, if the program uses scratch space, some logic determines which slots are active; for inactive slots, the type is set to 0.
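A minimal sketch of that bookkeeping, assuming a slot counts as active when it holds anything other than the default uint64 zero value. The `slot` type, the `scratchActive`/`reportedType` helpers, and the type codes are hypothetical stand-ins, not the actual code in dryrun.go:

```go
package main

import "fmt"

// Response type codes. Assumption: 1 and 2 mirror basics.TealBytesType and
// basics.TealUintType; 0 is what the response reports for an inactive slot.
const (
	typeNone  = 0
	typeBytes = 1
	typeUint  = 2
)

// slot is a hypothetical, simplified scratch slot.
type slot struct {
	IsBytes bool
	Bytes   []byte
	Uint    uint64
}

// active assumes the default slot value is uint64 0, so any bytes value or
// non-zero uint counts as active.
func (s slot) active() bool { return s.IsBytes || s.Uint != 0 }

// scratchActive returns one flag per slot up to and including the highest
// active slot, or nil if no slot is active.
func scratchActive(scratch []slot) []bool {
	maxActive := -1
	for i, s := range scratch {
		if s.active() {
			maxActive = i
		}
	}
	if maxActive < 0 {
		return nil
	}
	flags := make([]bool, maxActive+1)
	for i := range flags {
		flags[i] = scratch[i].active()
	}
	return flags
}

// reportedType is the type written to the response for one slot.
func reportedType(s slot) int {
	switch {
	case !s.active():
		return typeNone
	case s.IsBytes:
		return typeBytes
	default:
		return typeUint
	}
}

func main() {
	scratch := make([]slot, 8)
	scratch[5] = slot{IsBytes: true, Bytes: []byte("hello")}
	scratch[6] = slot{Uint: 7}
	flags := scratchActive(scratch)
	fmt.Println(len(flags), flags[4], flags[5], flags[6])                                     // 7 false true true
	fmt.Println(reportedType(scratch[4]), reportedType(scratch[5]), reportedType(scratch[6])) // 0 1 2
}
```

Slots past the highest active one are simply not reported, which keeps the response small.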

There seems to be an off-by-one bug where the types are offset by one element in the scratch slot array. This was causing output like:

  "scratch": [
    {
      "bytes": "",
      "type": 0,
      "uint": 0
    },
    {
      "bytes": "",
      "type": 0,
      "uint": 0
    },
    {
      "bytes": "",
      "type": 0,
      "uint": 0
    },
    {
      "bytes": "",
      "type": 0,
      "uint": 0
    },
    {
      "bytes": "SW4gYXBwIGlkIDIzIHdoaWNoIHdhcyBjYWxsZWQgd2l0aCBhc3NldCBuYW1lIG9rYXk=",
      "type": 0,
      "uint": 0
    },
    {
      "bytes": "SW4gYXBwIGlkIDIzIHdoaWNoIHdhcyBjYWxsZWQgd2l0aCBhc3NldCBuYW1lIG9rYXk=",
      "type": 1,
      "uint": 0
    }
  ],

Note that the N-1 (second-to-last) element has its type set to 0 even though it holds a bytes value.

Test Plan

Modified an existing test; without the change to dryrun.go, the test fails.

jannotti · 2022-03-09T21:10:48Z

daemon/algod/api/server/v2/dryrun.go

@@ -136,7 +136,7 @@ func (ddr *dryrunDebugReceiver) updateScratch() {

 	if any {
 		if ddr.scratchActive == nil {
-			ddr.scratchActive = make([]bool, maxActive+1, 256)
+			ddr.scratchActive = make([]bool, maxActive, 256)


This doesn't seem right. If scratch slot 1 is active, maxActive will be 1, and you need a 2 element slice to hold info on slots 0 and 1, right?

This is only called the first time scratch space is used, and immediately afterwards we iterate from len(scratchActive) to maxActive inclusive, using append to set the values.

I think we could probably just create it with length 0 and capacity 256.

We could also make it with length maxActive+1, iterate from 0 to maxActive inclusive, and set the values by index instead of using append.

barnjamin · 2022-03-09T21:51:32Z

daemon/algod/api/server/v2/dryrun.go

-		}
-		for i := len(ddr.scratchActive); i <= maxActive; i++ {
+		ddr.scratchActive = make([]bool, maxActive+1, 256)
+		for i := 0; i <= maxActive; i++ {


Creating a new slice and iterating from 0 is less efficient, but it seems much easier for me to understand.

barnjamin · 2022-03-10T11:37:14Z

Completely refactored this so it makes more sense to me. I'm still not sure why we need to zero anything at the bottom of the function; when is that relevant?

General question: should I push to my own fork and open the PR from there, or is it OK to push a branch here?

barnjamin · 2022-03-10T11:42:04Z

daemon/algod/api/server/v2/dryrun.go

 		if !ddr.scratchActive[i] {
-			(*ddr.history[lasti].Scratch)[i].Type = 0
+			(*ddr.history[lasti].Scratch)[i] = generated.TealValue{}


If we're zeroing the type, presumably we don't want anything in uint/bytes either, so set this to an empty struct.

I suspect we do it to ensure that nothing appears in the response, though I'm unclear why it would have been populated previously. Is that your question as well? I'm not sure how ddr.history[lasti] is set.
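The difference between the old and new lines in the diff can be shown with a hypothetical `tealValue` stand-in for generated.TealValue (real field types may differ): clearing only `Type` leaves stale data in the entry, while assigning the zero struct clears every field.

```go
package main

import "fmt"

// tealValue is a hypothetical stand-in for generated.TealValue.
type tealValue struct {
	Bytes string
	Type  uint64
	Uint  uint64
}

// clearTypeOnly mirrors the old line: only Type is reset, stale data stays.
func clearTypeOnly(v tealValue) tealValue {
	v.Type = 0
	return v
}

// clearAll mirrors the new line: the whole entry becomes the zero value.
func clearAll(_ tealValue) tealValue {
	return tealValue{}
}

func main() {
	v := tealValue{Bytes: "aGk=", Type: 1}
	fmt.Printf("%+v\n", clearTypeOnly(v)) // {Bytes:aGk= Type:0 Uint:0}
	fmt.Printf("%+v\n", clearAll(v))      // {Bytes: Type:0 Uint:0}
}
```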

codecov-commenter · 2022-03-10T12:32:54Z

Codecov Report

Merging #3736 (2fe8d90) into master (6c52126) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3736      +/-   ##
==========================================
+ Coverage   49.65%   49.68%   +0.03%     
==========================================
  Files         392      392              
  Lines       68588    68578      -10     
==========================================
+ Hits        34059    34075      +16     
+ Misses      30781    30761      -20     
+ Partials     3748     3742       -6

Impacted Files	Coverage Δ
daemon/algod/api/server/v2/dryrun.go	`70.12% <100.00%> (+1.52%)`	⬆️
data/transactions/verify/txn.go	`44.15% <0.00%> (-0.87%)`	⬇️
network/requestTracker.go	`70.25% <0.00%> (-0.87%)`	⬇️
ledger/tracker.go	`74.67% <0.00%> (-0.86%)`	⬇️
catchup/service.go	`70.12% <0.00%> (+0.74%)`	⬆️
ledger/acctupdates.go	`69.22% <0.00%> (+0.79%)`	⬆️
cmd/tealdbg/debugger.go	`72.41% <0.00%> (+0.98%)`	⬆️
ledger/blockqueue.go	`83.90% <0.00%> (+1.72%)`	⬆️
network/wsPeer.go	`68.05% <0.00%> (+2.22%)`	⬆️
cmd/algoh/blockWatcher.go	`80.95% <0.00%> (+3.17%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6c52126...2fe8d90.

jannotti · 2022-03-10T16:40:28Z

daemon/algod/api/server/v2/dryrun_test.go

@@ -393,6 +393,25 @@ func checkAppCallPass(t *testing.T, response *generated.DryrunResponse) {
 	}
 }

+func checkAppCallScratchType(t *testing.T, response *generated.DryrunResponse, slot int, tt basics.TealType) {


I'd rather see this check function take the txn index; that way it can assert that the txn in question has an AppCallTrace. It should then assert that traceline.Scratch is not nil and that the slot has the right type. As written, it's too lenient: it allows the scratch to be unset, all of the txns to be of the wrong type, etc.

It might also be useful to have a way to assert that a scratch slot is not included in the response. If I understand correctly, the point of all this maxActive logic is to keep responses small and omit slots that were never touched, right? We should test that this is done correctly.
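One way to express the "slot is not included" assertion, sketched with hypothetical simplified stand-ins for the generated response types. It returns an error instead of taking `*testing.T` so it runs standalone. Since slots below the highest active one still appear as type-0 placeholders, this sketch treats "not included" as "the scratch array never reaches that index":

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the generated dryrun response types.
type tealValue struct {
	Bytes string
	Type  uint64
	Uint  uint64
}

type dryrunState struct {
	Line    int
	Scratch *[]tealValue
}

type dryrunTxnResult struct {
	AppCallTrace *[]dryrunState
}

// checkScratchSlotAbsent returns an error if txn txnIdx lacks an AppCallTrace
// or if any of its trace lines reports an entry for slot.
func checkScratchSlotAbsent(txns []dryrunTxnResult, txnIdx, slot int) error {
	if txnIdx < 0 || txnIdx >= len(txns) {
		return fmt.Errorf("no txn at index %d", txnIdx)
	}
	trace := txns[txnIdx].AppCallTrace
	if trace == nil {
		return fmt.Errorf("txn %d has no AppCallTrace", txnIdx)
	}
	for _, line := range *trace {
		if line.Scratch != nil && len(*line.Scratch) > slot {
			return fmt.Errorf("line %d includes scratch slot %d", line.Line, slot)
		}
	}
	return nil
}

func main() {
	scratch := []tealValue{{}, {Type: 2, Uint: 1}}
	trace := []dryrunState{{Line: 1}, {Line: 2, Scratch: &scratch}}
	txns := []dryrunTxnResult{{AppCallTrace: &trace}}
	fmt.Println(checkScratchSlotAbsent(txns, 0, 2)) // <nil>
	fmt.Println(checkScratchSlotAbsent(txns, 0, 1)) // line 2 includes scratch slot 1
}
```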

Correct, I'll make both changes

Perhaps you're dealing with the fact that at the start of the trace, no type is set, since the slot hasn't been used yet. In that case, I'd still take the txn index and confirm AppCallTrace, but I'd leave the other continues. However, we want to be sure that line 408 executes at least once, or we're not really checking anything. So I'd set a flag there and assert it at the end of the function.
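A sketch of the stricter check described above, using hypothetical simplified response types and returning an error instead of calling `t.Fatalf`. Lines where the slot is not populated yet are skipped, and a `found` flag ensures the type comparison ran at least once:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the generated dryrun response types.
type tealValue struct {
	Bytes string
	Type  uint64
	Uint  uint64
}

type dryrunState struct {
	Line    int
	Scratch *[]tealValue
}

type dryrunTxnResult struct {
	AppCallTrace *[]dryrunState
}

// checkScratchType requires an AppCallTrace on txn txnIdx and that every
// trace line reaching slot reports type want.
func checkScratchType(txns []dryrunTxnResult, txnIdx, slot int, want uint64) error {
	if txnIdx < 0 || txnIdx >= len(txns) {
		return fmt.Errorf("no txn at index %d", txnIdx)
	}
	trace := txns[txnIdx].AppCallTrace
	if trace == nil {
		return fmt.Errorf("txn %d has no AppCallTrace", txnIdx)
	}
	found := false
	for _, line := range *trace {
		if line.Scratch == nil || len(*line.Scratch) <= slot {
			continue // slot not populated on this line (e.g. start of the trace)
		}
		if got := (*line.Scratch)[slot].Type; got != want {
			return fmt.Errorf("line %d: slot %d has type %d, want %d", line.Line, slot, got, want)
		}
		found = true
	}
	if !found {
		return fmt.Errorf("slot %d never appeared in txn %d's trace", slot, txnIdx)
	}
	return nil
}

func main() {
	scratch := []tealValue{{}, {Bytes: "aGk=", Type: 1}}
	trace := []dryrunState{{Line: 1}, {Line: 2, Scratch: &scratch}}
	txns := []dryrunTxnResult{{AppCallTrace: &trace}}
	fmt.Println(checkScratchType(txns, 0, 1, 1)) // <nil>
	fmt.Println(checkScratchType(txns, 0, 3, 1)) // slot 3 never appeared in txn 0's trace
}
```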

algorandskiy

Looks good; please consider the small optimization I suggested.

daemon/algod/api/server/v2/dryrun.go

jannotti

OK, I think I finally see what the old bug was, and I think you've fixed it, but you've also introduced a change that I'm OK with, though I want to say it out loud.

I think the old bug was not an off-by-one error. Instead, scratchActive[i] was set once, the first time a slot i or higher was set, and never reconsidered. So, if the first line of a program set slot 10 to "hello", then scratchActive[9] (and lower) was set to false, and stayed there forever. So I bet that in your buggy example, slot 6 was set before slot 5.

In your new code, the entire scratchActive array is computed every time, so I think you fixed this bug. (But if I'm right, your unit test doesn't explicitly test that situation.)

But this also means that a later frame can have fewer active slots than an earlier frame, if the highest active scratch slot is set to a uint64 0 in TEAL. I'm OK with that. (Not just shorter: the whole array can become nil after previously having slots.)
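Both effects described above can be reproduced with a small model. This is hypothetical code, not the original dryrun.go: a slot is simplified to a `uint64` that counts as active when non-zero. `staleTracker` computes each flag once, when the slice first grows past it, while `recompute` rebuilds all flags each time and can therefore shrink or return nil:

```go
package main

import "fmt"

// highestActive returns the index of the highest non-zero slot, or -1.
func highestActive(scratch []uint64) int {
	for i := len(scratch) - 1; i >= 0; i-- {
		if scratch[i] != 0 {
			return i
		}
	}
	return -1
}

// staleTracker never revisits a flag once it has been appended.
type staleTracker struct{ flags []bool }

func (s *staleTracker) update(scratch []uint64) {
	maxActive := highestActive(scratch)
	for i := len(s.flags); i <= maxActive; i++ {
		s.flags = append(s.flags, scratch[i] != 0)
	}
}

// recompute rebuilds every flag from the current scratch contents.
func recompute(scratch []uint64) []bool {
	maxActive := highestActive(scratch)
	if maxActive < 0 {
		return nil
	}
	flags := make([]bool, maxActive+1)
	for i := range flags {
		flags[i] = scratch[i] != 0
	}
	return flags
}

func main() {
	scratch := make([]uint64, 16)
	tracker := &staleTracker{}

	scratch[10] = 1 // first line of the program sets slot 10
	tracker.update(scratch)
	scratch[5] = 1 // a later line sets slot 5
	tracker.update(scratch)
	fmt.Println(tracker.flags[5], recompute(scratch)[5]) // false true

	scratch[10] = 0 // the highest active slot is set to uint64 0
	fmt.Println(len(recompute(scratch))) // 6
	scratch[5] = 0
	fmt.Println(recompute(scratch) == nil) // true
}
```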

On the other hand, I think you said your unit test did uncover the error. So, that makes my explanation suspect. The trouble is that I just don't see an off-by-one error in the old code.

jannotti · 2022-03-12T18:00:53Z

Our usual pattern is to have your own fork, make a branch there, then PR it to algorand's master branch. No need to change what you did on this one.


barnjamin · 2022-03-12T20:56:03Z

@jannotti I think the original bug was that on the first use of populated scratch, we'd initialize the slice with the correct length but then immediately append to it rather than setting by index. So a scratch var at idx 5 would have its bool populated at idx 6, and so on.
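The mechanism described here is a general Go slice pitfall, illustrated below with a hypothetical helper rather than the original code: if a slice already has `prefilled` zero entries, `append` writes after them, so slot i's flag lands at index i+prefilled. With one extra entry, slot 5 shows up at index 6, matching the shifted types in the example output.

```go
package main

import "fmt"

// appendAfterMake creates a slice with `prefilled` zero entries and then
// appends one flag per slot after them.
func appendAfterMake(active []bool, prefilled int) []bool {
	flags := make([]bool, prefilled, 256)
	for _, a := range active {
		flags = append(flags, a)
	}
	return flags
}

func main() {
	active := []bool{false, false, false, false, false, true} // only slot 5 active
	fmt.Println(appendAfterMake(active, 0)[5])                 // true
	shifted := appendAfterMake(active, 1)
	fmt.Println(shifted[5], shifted[6]) // false true
}
```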

Simplify handling of "active slots" to fix possible obo error. Unit tests added to confirm typing.

tweaking slice length to fix off by 1 error when checking active scratch slots

9134507

barnjamin requested review from algorandskiy and jannotti March 9, 2022 20:45

barnjamin added the Bug-Fix label Mar 9, 2022

barnjamin changed the title from "Dryrun Scratch Slot Type Fix" to "Algod Server: Dryrun Scratch Slot Type Fix" Mar 9, 2022

jannotti reviewed Mar 9, 2022

adding more test checks, changing active slot initialization

baa8248

barnjamin commented Mar 9, 2022

completely refactor scratch updater

3fb8879

barnjamin commented Mar 10, 2022

barnjamin requested review from jannotti, tzaffi and tsachiherman and removed request for algorandskiy March 10, 2022 16:31

jannotti reviewed Mar 10, 2022

barnjamin added 2 commits March 10, 2022 12:29

more stringent testing

bb04b21

fmt, comments

cc43abf

algorandskiy reviewed Mar 10, 2022

daemon/algod/api/server/v2/dryrun.go (outdated, resolved)

dont alloc new slice every time updateScratch is called

2fe8d90
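The commit above avoids allocating a new slice on every `updateScratch` call. The actual change is not shown here; a common Go idiom for this is to truncate with `flags[:0]` and append, which reuses the existing backing array while it has enough capacity. `updateFlags` below is a hypothetical helper using the same simplified uint64 slot model (non-zero means active):

```go
package main

import "fmt"

// updateFlags refills flags in place, reusing its backing array when the
// capacity suffices. An empty result means no slot is active.
func updateFlags(flags []bool, scratch []uint64) []bool {
	maxActive := -1
	for i, v := range scratch {
		if v != 0 {
			maxActive = i
		}
	}
	flags = flags[:0] // keep the backing array, drop the old contents
	for i := 0; i <= maxActive; i++ {
		flags = append(flags, scratch[i] != 0)
	}
	return flags
}

func main() {
	buf := make([]bool, 0, 256)
	scratch := make([]uint64, 256)
	scratch[2] = 1
	buf = updateFlags(buf, scratch)
	first := &buf[0]
	scratch[4] = 7
	buf = updateFlags(buf, scratch)
	fmt.Println(len(buf), cap(buf), first == &buf[0]) // 5 256 true
}
```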

algorandskiy changed the title from "Algod Server: Dryrun Scratch Slot Type Fix" to "REST API: dryrun scratch slot type fix" Mar 10, 2022

algorandskiy approved these changes Mar 10, 2022

barnjamin requested review from jannotti and removed request for tsachiherman March 11, 2022 13:21

jannotti reviewed Mar 12, 2022

jannotti merged commit f6569a5 into master Mar 13, 2022

jannotti deleted the dryrun-scratch-type-fix branch March 13, 2022 23:15

jannotti pushed a commit to jannotti/go-algorand that referenced this pull request Mar 13, 2022

REST API: dryrun scratch slot type fix (algorand#3736)

38f885d

Simplify handling of "active slots" to fix possible obo error. Unit tests added to confirm typing.

algojack mentioned this pull request Mar 15, 2022

go-algorand 3.5.1-beta #3774

Merged
