Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triton Generation #16

Open
2 of 5 tasks
mariari opened this issue Oct 11, 2022 · 12 comments
Open
2 of 5 tasks

Triton Generation #16

mariari opened this issue Oct 11, 2022 · 12 comments

Comments

@mariari
Copy link
Member

mariari commented Oct 11, 2022

Currently my code generator for Triton is quite primitive. I would like the following to be made

  1. Add a notion of blocks
    • blocks are a label + code inside
    • nested blocks are not allowed!
  2. Add a notion of collected labels
    • These are all the labels in a function
    • This also serves as an abstraction mechanism
      • Since functions like if create new labels to continue at, it is important to be able to compose this with code that comes next
      • Meaning that if we end off with a continuation address, and some code after it in it's own block, we should unify the jump points and give the block two names or unify the names
      • This means that we can effectively compose any abstractions, even those which create branches
  3. Triton: Preserving Expected Block Order #17
  4. Add a notion of functions
    • Functions will collect all labels, and compose an entry point
  5. Make Program encompass this
    • This means that a program will now be some some code that also keeps the relevant functions in mind and generates out the proper calls.
    • Further this would be the entry point to the circuit
@mariari
Copy link
Member Author

mariari commented Oct 11, 2022

API Design for Blocks

I believe the predominant way to design the API for block is to just expose a tagbody like interface.

Examples from CL

(defun king-of-confusion (w)
  "Take a cons of two lists and make a list of conses.
    Think of this function as being like a zipper."
  (prog (x y z)                         ;Initialize x, y, z to NIL
     (setq y (car w) z (cdr w))
   loop
     (cond ((null y) (return x))
           ((null z) (go err)))
   rejoin
     (setq x (cons (cons (car y) (car z)) x))
     (setq y (cdr y) z (cdr z))
     (go loop)
   err
     (cerror "Will self-pair extraneous items"
             "Mismatch - gleep!  ~S" y)
     (setq z y)
     (go rejoin)))

Basically we just scan the list for any symbols. If we find a keywordl, construct a block over it.

Then at the end form the collected labels.

we shall call this tagbody shadowing the CL meaning.

API Design for Functions

Since we want functions to closely resemble tritonVM, I propose the following.

Construct a defproc like interface over triton, and cover the body in an implicit tagbody. We then add it to the hashtable. Namely it should look something like

(defproc sudoku (-- verfied?) ; the ( -- verified?) is the stack effect
    (entry-code-1) ... (entry-code-n)
    (call-label)
  :label-name-1 
     (label-name-1-code-1) ...  (label-name-1-code-n)
  :label-name-n 
     (label-name-n-code-1) ...  (label-name-n-code-n))

@jan-ferdinand
Copy link
Contributor

Hey there! Designer of Triton VM here. Amazing that you're working on a compiler to Triton assembly! 🎉

First off, let me know if I can help in any way. That certainly includes questions, but also suggestions. For example, if a slight change in the VM's design makes writing compilers tremendously easier, I'm happy to discuss them.

Secondly, a heads up: because Triton VM is not currently sound, achieving which is currently both our top priority and almost complete, we have a few changes to the instruction set planned.

@sshine
Copy link

sshine commented Oct 12, 2022

I can say that this list of wanted features is the exact top of my wish list of improvements to the assembler interface.

Edit: Also, working on assembler stored in text files with the ability to add comments.

@mariari
Copy link
Member Author

mariari commented Oct 12, 2022

Hello @jan-ferdinand Thank you for your interest. I would have to spend more time with the system but I do have a few questions

  1. Is there a notion of a goto at all? from the instructions sheet, the closest I can find is call which also adds the current address + 2 to the jump back stack.
    • If there is none, then I'd suggest to add one, as someone can compile a function as a series of labels in which there exists an entry point that one would recurse on, but internally it runs a goto to the internal labels.
  2. Is there an example list of programs anywhere, I've been using the vm source code to see examples of programs.
  3. Also echoing what @sshine mentioned it would be nice to see a way to call the vm directly.

I will probably come up with more questions and suggestions as I start to get more familiar with the system.

@mariari
Copy link
Member Author

mariari commented Oct 12, 2022

Following up on my previous comment,

I've currently made an untested triton program

However, if I had an a jump which left the the call stack as it were, I could easily write the same program like this

;; the (n -- n) is the stack effect, really just a comment, in honor of forth
(defproc fib (n -- n)
  (push 1) (push 0) (call fib-general) return)

;; defproc will just place an implicit label to the whole block, so
;; the return goes to it
(defproc fib-general (a b n -- n)
  (dup 2)
  (if (begin (dup 1) add (swap 2) (push -1) add rot recruse)
      return))

as if could be defined simply as

(defun if (then else)
  (let ((name-then (intern (symbol-name (gensym)) :keyword))
        (name-else (intern (symbol-name (gensym)) :keyword))
        (name-cont (intern (symbol-name (gensym)) :keyword)))
    (begin
     skiz
     (goto name-then)
     (goto name-else)
     (add-label name-then then)
     (goto name-cont)
     (add-label name-else else)
     (goto name-cont)
     (make-label :name name-cont))))

where we simply jump to the then if the value on the top of the stack is 0, else we jump to else. At the end of the computation, we give it a continuation address to continue at with name-cont which the rest of the block will be in

@sshine
Copy link

sshine commented Oct 12, 2022

  1. The jumpstack semantics of call ties to being able to completely arithmetize the instruction. Labels are free, though, so if you don't care about jumping back, call can work as goto; the memory use of the jumpstack is doubtfully going to be the biggest bottleneck for longer running programs (execution trace size is).
  2. The only current programs are inlined in the Rust source code. Some others are in instruction.rs; the most interesting one is MT_AP_VERIFY which verifies a Merkle tree authentication path.
  3. Besides having a text-file interface, the compiler and STARK engine also aren't exposed as a system binary yet.

@jan-ferdinand
Copy link
Contributor

  1. Is there a notion of a goto at all?

No. As you say, call comes closes. Would the argument of the goto live on the operational stack? If so, I'm afraid it is a non-goal of Triton VM to have such an instruction; we want the call graph to be static. However, I don't think this is the behavior you're describing.

If the argument of the goto lives in program memory (which is how I understand you), then the only difference between goto and call is the growth of the jump stack – which makes above defun if invalid. I'll think about what we can do.

@mariari
Copy link
Member Author

mariari commented Oct 12, 2022

  1. Is there a notion of a goto at all?

No. As you say, call comes closes. Would the argument of the goto live on the operational stack? If so, I'm afraid it is a non-goal of Triton VM to have such an instruction; we want the call graph to be static. However, I don't think this is the behavior you're describing.

If the argument of the goto lives in program memory (which is how I understand you), then the only difference between goto and call is the growth of the jump stack – which makes above defun if invalid. I'll think about what we can do.

Yeah it would live in program memory. I think there is a way of emulating it with call, where the label that is jumped to has a certain calling convention to denote where it came from.

I was thinking of implementing something like:

  • adding a value to the stack denoting the then branch was taken, so we move it to the top of the stack and use skiz to avoid the else call, with it initially being set such that the else call will happen
  • The value can be stored either on the stack if the number of stack elements are known statically (this isn't hard to generate around), or by using read_mem and write_mem to store branching flags (idk if this is a valid strategy however).
  • The if abstraction sets up this value either on the stack or at a specified memory address
  • The then changes this value right before it returns to denote that the branch was taken and so the skiz call else can fail
  • This then preserves the call stack past conditionals, however one still needs to
    • Lift any recursive calls in the branches to the top level, I would need to think more on if an algorithm like lambda lifting would work
    • If it doesn't, then I could just define a standalone loop that can take the place for simple computations like fib. I know this would work, and would encapsulate the general idea of branching for simple programs

All these are my rough ideas, I haven't attempted to hack around the issue yet. I'll likely start on a more complicated example (sudoku) before trying to get around this.

  1. The jumpstack semantics of call ties to being able to completely arithmetize the instruction. Labels are free, though, so if you don't care about jumping back, call can work as goto; the memory use of the jumpstack is doubtfully going to be the biggest bottleneck for longer running programs (execution trace size is).

    1. The only current programs are inlined in the Rust source code. Some others are in instruction.rs; the most interesting one is MT_AP_VERIFY which verifies a Merkle tree authentication path.

    2. Besides having a text-file interface, the compiler and STARK engine also aren't exposed as a system binary yet.

Thank you for the link to some programs! And duly noted, I'll at some point hook up some rust code to load the code I've generated so I can get a quick way to test my code.

@jan-ferdinand
Copy link
Contributor

Even though your original question 1 was about an instruction goto, I interpreted that the underlying issue was getting “if (…) then {…} else {…}” to work. Following that line of thought, Alan has come up with a very elegant solution that works immediately.

Pending further investigation into the corresponding associated costs, we might drop instruction skiz in favor of instruction if_then_call.

@mariari
Copy link
Member Author

mariari commented Oct 14, 2022

Even though your original question 1 was about an instruction goto, I interpreted that the underlying issue was getting “if (…) then {…} else {…}” to work. Following that line of thought, Alan has come up with a very elegant solution that works immediately.

Pending further investigation into the corresponding associated costs, we might drop instruction skiz in favor of instruction if_then_call.

Yeah that solution would work to denote an if (albeit one where the recursion happens on the branch), however I was also interested in it from an easy/simple control graph perspective. What I mean is as follows

in

current-label:
dup0
skiz
call if-wrapper
call else-wrapper
;; Code after the if would go here

if-wrapper:
pop
call if-branch
push 1
return

else-wrapper:
push 0 eq # logical not
skiz
call else-branch
return

where one would want to add more code is at the end of current-label:, as that is the canonical continuation point. This pattern comes up quite a bit. I was thinking this morning of writing loop

(defun loop (loop-body)
  (tagbody 
       (call body)
     :body 
       loop-body (push -1) add (dup 0) skiz recurse return))

where the block before the label body: is the continuation point.

To see more examples I recommend checking out

In the issue I lay out two solutions to how one might solve this issue. It took me quite a few hours of thought but solution 1 is elegant enough and is not too much burden upon those writing code generators.

If I could create a label after jumping, then none of these considerations would have to be made and the control flow graph would more closely resemble more typical control graphs. This means that if would look like if and loop would look like loop forms. However in the current form they resemble the control graph of a function call instead. This may make code harder to optimize as it may be easier to optimize and order control graphs that preserve the if and loop flow rather than everything looking like a function call. I would have to look into that issue more.

LLVM has a good graph on what a loop Control Flow Graph looks like. The code would continue in exit: if any more code was had.

For a primitive like jump-to I'm not sure if it would affect the arithmetization, it's hard to say, as I think there is a way to derive the behavior with the current instructions at hand, one just has to be smart, however I'm not entirely sure as the jumps-to/goto may be too general and thus affect it.

Either way on Monday I'll just implement the change needed to make the current strategy work.

@jan-ferdinand
Copy link
Contributor

I'm afraid I don't understand all the technical considerations involved in optimizing control flow graphs – or whether that string of words even makes sense. Would it help your efforts if you could compile to a higher-level language on top of Triton assembly?

@mariari
Copy link
Member Author

mariari commented Nov 11, 2022

I'm afraid I don't understand all the technical considerations involved in optimizing control flow graphs – or whether that string of words even makes sense. Would it help your efforts if you could compile to a higher-level language on top of Triton assembly?

Depends on what is given I guess? For known patterns like if it doesn't help too much given if a strategy is know how to desugar it then I can easily provide it. The issue mainly came in with the combination of control flow like if combined with wanting to say write any kind of looping or recursion ability. I've mostly stopped writing the generator, as I was giving the project time to chose some primitives it may want to explore. I originally started the generator as I find maintaining and writing a generator much easier than writing in an assembly language.

For example I wrote a very similar model for Miden in the span of a day, and programs that followed it were in a few days from that. These compilers are kind of throw away compilers as they are just hacking and extending ontop of the VM features as I need them along with allowing me to mix the VM with meta level logic to handle the tedious bits of work, rather than implementing a more proper layer above it. Which works very well for trying to generate benchmarks of the given platform as I have full control of the generated code and its easy to maintain. For more serious projects like my Alucard, what matters tends to be the expression of the system when building ontop of a model, higher level features if they compose nicely can take work out of compilers like these, however if they are just known minor sugar ontop of the system then it would be about the same amount of work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants