Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider reformat of pushw assembly instruction #44

Closed
grjte opened this issue Jan 9, 2022 · 8 comments
Closed

Consider reformat of pushw assembly instruction #44

grjte opened this issue Jan 9, 2022 · 8 comments

Comments

@grjte
Copy link
Contributor

grjte commented Jan 9, 2022

As discussed in the PR implementing parsing of pushw, it may make sense to allow the pushw instruction to take a single 32-byte hex value either instead of or in addition to the current approach of taking 4 inputs of 1 element each.

This is inspired by this contrived but strange example with the current implementation:
let op = Token::new("pushw.1.23.0x1C8.0", 0);

From Bobbin:
"now that I'm seeing how this actually looks, I wonder if we should change the format of pushw instruction to always take a single hexadecimal value of 32 bytes. Or maybe making this as an additional allowed format. So, for example, possible formats could be:

pushw.1.2.3.4
pushw.0x13244... (32 bytes)"

@bobbinth
Copy link
Contributor

bobbinth commented Jan 9, 2022

Another option worth considering is removing separation between push and pushwinstruction from the assembly. We could have just a singlepush` instruction, and it could take a variable number of arguments. For example, all of these could be valid:

  • push.1 and push.0x1 (8 bytes)
  • push.1.2 and push.0x0... (16 bytes)
  • push.1.2.3 and push.0x0... (24 bytes)
  • push.1.2.3.4 and push.0x0... (32 bytes)

@bobbinth
Copy link
Contributor

Another, more radical, idea that came to mind: maybe we should re-factor all assembly i/o instructions to be grouped around four verbs: push, pop, load, save. Then, our current set of instructions could be mapped as follows:

push instructions push value(s) from the specified source onto the stack:

  • push.const.xyz - similar to the current push/pushw (we could also omit the const keyword).
  • push.sdepth - same as current env.sdepth.
  • push.mem.addr - same as current mem.push.addr.
  • push.adv.n - pushes n values from the advice tape onto the stack (similar to the current read).

pop instructions remove values from the stack and save them into the specified destination:

  • pop.n - would be similar to the current drop but for n items.
  • pop.mem.addr - same as current mem.pop.addr.

load instructions would overwrite values at the top of the stack:

  • load.mem.addr - same as current mem.load.addr.
  • load.adv - would overwrite top 4 stack values with the next 4 values from the advice tape (similar to current readw)

save would save the values from the stack into the specified destination without removing them from the stack:

  • save.mem.addr - same as current mem.store.

At the same time, I'm not sure if this provides more or less intuitive understanding about the instruction semantics.

@grjte
Copy link
Contributor Author

grjte commented Jan 11, 2022

My take on these, in reverse order -

I prefer the last option. I think it's clear, and it groups everything semantically by the effect on stack. My notes are:

  • push.const.xyz: I would lean towards dropping the const. This op provides immediate values, whereas the other push ops are pushing values from another (specified) source, so I think having just push + immediate values makes sense, since it's sort of the "default" push case
  • push.sdepth: I would do push.env.sdepth. It matches with mem/adv and will be clearer if other env values ever come along
  • pop.n: I'm not sure about this one. Would it replace drop? If not, and drop/dropw still exist, then drop.n would be more consistent & it should be a stack op. If it's replacing drop/dropw, how much is gained by allowing variable length drops? Also, if the values are not being used, then it seems more appropriate as a stack manipulation op (as now) than as an io op.
  • why switch store to save? I thought store was clear, and it matches the machine op as well as being the usual partner to load
  • this doesn't resolve the question of the format of inputs to push/pushw, so we'd still need to settle that

Regarding removing separation between push and pushw (variable length inputs)

I'm trying to think about how push will actually be most useful. It's really a question of where we want to put the complexity. Most importantly: when we're parsing from higher level languages, would we like to be able to push 8/16/24/32 bytes directly either element-wise or as chunks? How much extra processing would we have to do on these while processing Solidity/etc? If there's significant overhead anyway, maybe it makes sense to let the compiler break it down at the time and just stick to simple push/pushw. If there isn't, then it might make the compilers easier to implement if we put the complexity into the push op now and allow the variable amount of arguments as suggested in the second update. I don't have a confident opinion here because I'm not 100% sure how these things will be affected by STARK setting.

single hexadecimal input value of 32 bytes

I think at a minimum it makes sense to accept a word input in addition to the 4 element option. This was what I expected when I first saw the instruction name pushw. Whether we keep the 4 elements or not is related to my comments above

@bobbinth
Copy link
Contributor

bobbinth commented Jan 21, 2022

I agree that organizing input/output instructions around four verbs: push, pop, load, and store (or save) is cleaner. My only hesitation is future extensibility of this scheme. For example, at some point, we may want to introduce an instruction which reads a word from an advice tape or memory and instead of replacing top 4 stack elements with it, adds values to the top stack elements (this would be useful for linear hashing).

With the current scheme, this can be accomplished like adv.addw or mem.addw (though the direction of what's being added where is not clear). But I'm not sure how this would work with the scheme proposed in the original post (adding add.mem or add.adv would be too confusing). Maybe we could introduce a new verb though - like ladd (load-add) or addfrom or something like that.

@grjte
Copy link
Contributor Author

grjte commented Jan 25, 2022

I think in that case adding a new verb would make sense, and that doesn't really seem problematic to me. In fact - it could be a useful signal that a new and different thing is going on. But the central question is really just whether you'd rather organize first by "what" is happening or by "where" it's happening. I like the "what" approach, because it seems like it gives a more natural and consistent organization. However, as long as things are consistent it's just a choice & I think both styles are defensible

Are there many (any) other possible future instructions you have in mind at this point?

@bobbinth
Copy link
Contributor

bobbinth commented Jan 26, 2022

I like the "what" approach better too. So, at this point, it seems like we've settled on what.where and the operations we have so far are:

Pushing value onto the stack:

  • push.xyz - also, maybe we should retain pushw.0x... format.
  • push.sdepth
  • push.mem.addr - or should this be pushw.mem.addr?
  • push.adv.n

Removing values from the stack:

  • pop.mem.addr - or should this be popw.mem.addr?

Overwrite the values on the stack:

  • load.mem.addr - or should this be loadw.mem.addr?
  • load.adv - similarly, should this be loadw.mem.addr?

Save the stack value:

  • store.mem.addr - or should this be storew.mem.addr? The reason I was thinking about save originally is because we'll need to have "storage" as a destination at some point. But thinking about it more, we can pick something like pst (persistent storage) or something else for the "where" part.

One other thing to consider: depending on what we decide to do in #73, we will need to have another destination (e.g., fmp) and also another verb or two (for "adding in" and "adding out"). Any thoughts on what these should be?

@grjte
Copy link
Contributor Author

grjte commented Jan 26, 2022

I think it might be clearer to keep the w in all cases where we're operating by word rather than by element. (That's all of the ops for mem and loadw.adv.) I could be convinced otherwise, but it's my gut feeling, and adding the w doesn't really break the verb paradigm.

I will reiterate my vote from above for push.env.sdepth rather than push.sdepth. It matches with the what.where idea we're using for mem and adv and will be clearer if other env values ever come along. This means that anything without a specified where is always referencing the stack

@bobbinth
Copy link
Contributor

Supersede by #84

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants