Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REST API: dryrun scratch slot type fix #3736

Merged
merged 6 commits into from
Mar 13, 2022
Merged

Conversation

barnjamin
Copy link
Contributor

Summary

When performing a dryrun, if the program uses scratch space there is some logic to determine which slots are active and if inactive the type is set to 0.

There seems to be an off-by-one bug where the types are offset by one element in the scratch slot array. This was causing output like:

  "scratch": [
                        {
                            "bytes": "",
                            "type": 0,
                            "uint": 0
                        },
                        {
                            "bytes": "",
                            "type": 0,
                            "uint": 0
                        },
                        {
                            "bytes": "",
                            "type": 0,
                            "uint": 0
                        },
                        {
                            "bytes": "",
                            "type": 0,
                            "uint": 0
                        },
                        {
                            "bytes": "SW4gYXBwIGlkIDIzIHdoaWNoIHdhcyBjYWxsZWQgd2l0aCBhc3NldCBuYW1lIG9rYXk=",
                            "type": 0,
                            "uint": 0
                        },
                        {
                            "bytes": "SW4gYXBwIGlkIDIzIHdoaWNoIHdhcyBjYWxsZWQgd2l0aCBhc3NldCBuYW1lIG9rYXk=",
                            "type": 1,
                            "uint": 0
                        }
                    ],

Note that the N-1 element has its type set to 0.

Test Plan

Modified existing test, without the change in the dryrun.go file, the test fails.

@barnjamin barnjamin changed the title Dryrun Scratch Slot Type Fix Algod Server: Dryrun Scratch Slot Type Fix Mar 9, 2022
@@ -136,7 +136,7 @@ func (ddr *dryrunDebugReceiver) updateScratch() {

if any {
if ddr.scratchActive == nil {
ddr.scratchActive = make([]bool, maxActive+1, 256)
ddr.scratchActive = make([]bool, maxActive, 256)
Copy link
Contributor

@jannotti jannotti Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. If scratch slot 1 is active, maxActive will be 1, and you need a 2 element slice to hold info on slots 0 and 1, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only called when its the first time scratch space is used and immediately after this we iterate from len(scratchActive) to maxActive inclusive using append to set the values.

I think we could probably just create it with 0 len and capacity of 256

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just make it with len of maxActive+1 and iterate from 0 to maxActive inclusive and set the values with index instead of using append

}
for i := len(ddr.scratchActive); i <= maxActive; i++ {
ddr.scratchActive = make([]bool, maxActive+1, 256)
for i := 0; i <= maxActive; i++ {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating a new slice and iterating starting from 0 is less efficient but seems much easier for me to understand

@barnjamin
Copy link
Contributor Author

Completely refactored this to make it make more sense to me, I'm still not sure why we need to zero anything at the bottom of the function. When is that relevant?

General question on these, should I push to my own fork and pr from there? or is it ok to push a branch here?

if !ddr.scratchActive[i] {
(*ddr.history[lasti].Scratch)[i].Type = 0
(*ddr.history[lasti].Scratch)[i] = generated.TealValue{}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're zero'ing the type, presumably we dont want anything in the uint/bytes either so set this to empty struct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we do it to ensure that nothing appears in the response. Though I'm unclear why it would have been previously populated. Is that your question as well? I'm not sure how ddr.history[lasti] is set.

@codecov-commenter
Copy link

codecov-commenter commented Mar 10, 2022

Codecov Report

Merging #3736 (2fe8d90) into master (6c52126) will increase coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3736      +/-   ##
==========================================
+ Coverage   49.65%   49.68%   +0.03%     
==========================================
  Files         392      392              
  Lines       68588    68578      -10     
==========================================
+ Hits        34059    34075      +16     
+ Misses      30781    30761      -20     
+ Partials     3748     3742       -6     
Impacted Files Coverage Δ
daemon/algod/api/server/v2/dryrun.go 70.12% <100.00%> (+1.52%) ⬆️
data/transactions/verify/txn.go 44.15% <0.00%> (-0.87%) ⬇️
network/requestTracker.go 70.25% <0.00%> (-0.87%) ⬇️
ledger/tracker.go 74.67% <0.00%> (-0.86%) ⬇️
catchup/service.go 70.12% <0.00%> (+0.74%) ⬆️
ledger/acctupdates.go 69.22% <0.00%> (+0.79%) ⬆️
cmd/tealdbg/debugger.go 72.41% <0.00%> (+0.98%) ⬆️
ledger/blockqueue.go 83.90% <0.00%> (+1.72%) ⬆️
network/wsPeer.go 68.05% <0.00%> (+2.22%) ⬆️
cmd/algoh/blockWatcher.go 80.95% <0.00%> (+3.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c52126...2fe8d90. Read the comment docs.

@barnjamin barnjamin requested review from jannotti, tzaffi and tsachiherman and removed request for algorandskiy March 10, 2022 16:31
@@ -393,6 +393,25 @@ func checkAppCallPass(t *testing.T, response *generated.DryrunResponse) {
}
}

func checkAppCallScratchType(t *testing.T, response *generated.DryrunResponse, slot int, tt basics.TealType) {
Copy link
Contributor

@jannotti jannotti Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather see this check function take the txn index, that way it can assert that txn in question has an AppCallTrace. And then it should assert that traceline.Scratch is not nil, and that the slot has the right type. As written, it's too lenient. It allows the scratch to not be set, or all of the txns to be of the wrong type, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be useful to have a way to assert that a scratch slot is not included in the response. If I understand correctly, though point of all this maxActive stuff is to keep the responses small, and not contain slots that were never touched, right? We should test that is done right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I'll make both changes

Copy link
Contributor

@jannotti jannotti Mar 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you're dealing with the fact that at the start of the trace, not type is set, since it hasn't been used yet. In that case, I'd still take the txn index, and confirm AppCallTrace, but I'd leave the other continues. But, we want to be sure that line 408 happens at least once, or we're not really checking anything. So I'd set a flag there, and assert it at the end of the function.

Copy link
Contributor

@algorandskiy algorandskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, please consider a small optimization suggested

daemon/algod/api/server/v2/dryrun.go Outdated Show resolved Hide resolved
@algorandskiy algorandskiy changed the title Algod Server: Dryrun Scratch Slot Type Fix REST API: dryrun scratch slot type fix Mar 10, 2022
@barnjamin barnjamin requested review from jannotti and removed request for tsachiherman March 11, 2022 13:21
Copy link
Contributor

@jannotti jannotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I finally see what the old bug was, and I think you've fixed it, but you've also introduced a change that I am ok with, but I want to say it outloud.

I think the old bug was not an off-by-one error. Instead, scratchActive[i] was set once, the first time a slot i or higher was set, and never reconsidered. So, if the first line of a program set slot 10 to "hello", then scratchActive[9] (and lower) was set to false, and stayed there forever. So I bet that in your buggy example, slot 6 was set before slot 5.

In your new code, the entire scratchActive array is computed every time, so I think you fixed this bug. (But if I'm right, your unit test doesn't explicitly test that situation.)

But, this also means that a later frame can have fewer active slots than an earlier frame, if the highest active scratch space is set to a uint64 0 in TEAL. I'm ok with that. (not just shorter - the whole array can become nil after previously having slots)

On the other hand, I think you said your unit test did uncover the error. So, that makes my explanation suspect. The trouble is that I just don't see an off-by-one error in the old code.

@jannotti
Copy link
Contributor

jannotti commented Mar 12, 2022 via email

@barnjamin
Copy link
Contributor Author

@jannotti I think the original bug was that on the first use of populated scratch we'd initialize the slice of the correct length but then immediately append to it rather then setting at the index. So a scratch var at idx 5 would have it's bool populated at idx 6 and so on

@jannotti jannotti merged commit f6569a5 into master Mar 13, 2022
@jannotti jannotti deleted the dryrun-scratch-type-fix branch March 13, 2022 23:15
jannotti pushed a commit to jannotti/go-algorand that referenced this pull request Mar 13, 2022
Simplify handling of "active slots" to fix possible obo error. Unit tests added to confirm typing.
@algojack algojack mentioned this pull request Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants