<br/>
<div style="text-align: center">
<span style="">
    <a href="0_Table_of_Contents.ipynb">Table Of Contents 🏠</a>
</span>
<span style="float: right">
    <a href="2_2_Toy_IR.ipynb">Next Chapter &gt;</a>
</span>
</div>

# Chapter 1: Toy Language and AST

This is an xDSL version of the Toy compiler, as described in the 
[MLIR tutorial](https://mlir.llvm.org/docs/Tutorials/Toy/). This notebook and the ones
that follow are taken almost verbatim from the MLIR tutorials,
as the xDSL project mirrors the MLIR structure very closely. We hope that by using these
tutorials you will get a better idea of both how to use xDSL and how MLIR works.

## The Language

This tutorial will be illustrated with a toy language that we’ll call “Toy”
(naming is hard...). Toy is a tensor-based language that allows you to define
functions, perform some math computation, and print results.

Given that we want to keep things simple, the codegen will be limited to tensors
of rank <= 2, and the only datatype in Toy is a 64-bit floating point type (aka
‘double’ in C parlance). As such, all values are implicitly double precision.
Values are immutable (i.e. every operation returns a newly allocated value),
and deallocation is automatically managed. But enough with the long description;
nothing is better than walking through an example to get a better understanding:


In [1]:
example_0 = """
def main() {
  var a<2, 3> = [[1, 2, 3], [4, 5, 6]];
  var b<6> = [1, 2, 3, 4, 5, 6];
  var c<2, 3> = b;
  var d = a + c;
  print(d);
}
"""
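
To make the semantics of this example concrete, here is a plain-Python sketch of the same computation. It is not part of xDSL or the Toy compiler; the helper names `reshape` and `add` are hypothetical, and the sketch assumes that assigning `b` to a variable of shape `<2, 3>` reshapes it in row-major order and that `+` is element-wise.

```python
# Plain-Python sketch of what the Toy example computes (not xDSL code).
# `reshape` and `add` are hypothetical helpers introduced for illustration.

def reshape(flat, rows, cols):
    """Reinterpret a flat list of rows * cols values as a row-major 2-D tensor."""
    if len(flat) != rows * cols:
        raise ValueError("element count does not match the target shape")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def add(lhs, rhs):
    """Element-wise addition; returns a new value, mirroring Toy's immutability."""
    return [[x + y for x, y in zip(lrow, rrow)] for lrow, rrow in zip(lhs, rhs)]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # var a<2, 3> = [[1, 2, 3], [4, 5, 6]];
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # var b<6> = [1, 2, 3, 4, 5, 6];
c = reshape(b, 2, 3)                    # var c<2, 3> = b;
d = add(a, c)                           # var d = a + c;
print(d)                                # prints [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]]
```

Neither helper mutates its inputs, which matches the rule that every Toy operation returns a newly allocated value.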

Type checking is statically performed through type inference; the language only
requires type declarations to specify tensor shapes when needed. Functions are
generic: their parameters are unranked (in other words, we know these are
tensors, but we don't know their dimensions). They are specialized for every
newly discovered signature at call sites. The following cell compiles a variant of
the previous example (it prints `a` rather than `d`) and runs the result in a RISC-V emulator:

In [1]:
from compiler import compile, emulate_riscv

program = """
def main() {
  var a<2, 3> = [[1, 2, 3], [4, 5, 6]];
  var b<6> = [1, 2, 3, 4, 5, 6];
  var c<2, 3> = b;
  var d = a + c;
  print(a);
}
"""

code = compile(program)

print(code)
print()

emulate_riscv(code)

.bss 
heap:
.space 1024
.data 
main.tensor_shape.0:
.word 0x2, 0x2, 0x3
main.tensor_data.0:
.word 0x6, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6
.text 
main:
	li	%0, heap
	li	%1, main.tensor_shape.0
	li	%2, main.tensor_data.0
	li	%3, 2
	buffer.alloc	%4, %3
	sw	%1, %4, 0		# Set tensor shape
	lw	%5, %4, 0
	sw	%2, %4, 4		# Set tensor data
	toy.print	%4
	li	a7, 93
	scall


[34m[1m[CPU] Started running from example.asm:.text at heap (0x100) + 0x428[0m
Program(name=example.asm,sections=set(),base=['.bss', '.data', '.text'])
[34m[1m   Running 0x00000528:[0m li %0, heap
[34m[1m   Running 0x0000052C:[0m li %1, main.tensor_shape.0
[34m[1m   Running 0x00000530:[0m li %2, main.tensor_data.0
[34m[1m   Running 0x00000534:[0m li %3, 2
[34m[1m   Running 0x00000538:[0m buffer.alloc %4, %3
[34m[1m   Running 0x0000053C:[0m sw %1, %4, 0
[34m[1m   Running 0x00000540:[0m lw %5, %4, 0
[34m[1m   Running 0x00000544:[0m sw %2, %4, 4
[34m[1m   Running 0x00000548:[0m toy.print %4
[[1, 2, 3], [4, 
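
The idea that generic functions are specialized for every newly discovered signature at call sites can be illustrated with a small Python sketch. This is only an illustration of the concept, not how xDSL or MLIR implements it: the registry `specializations`, the helpers `shape_of` and `specialize`, the function name `my_func`, and the naming scheme for specialized functions are all hypothetical.

```python
# Hypothetical sketch: create one specialization per argument-shape signature.

specializations = {}  # maps (function name, argument shapes) -> specialized name


def shape_of(tensor):
    """Return the shape of a nested-list tensor, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        if not tensor:
            break
        tensor = tensor[0]
    return tuple(shape)


def specialize(name, *args):
    """Look up, or create on first use, the specialization for this call site."""
    signature = (name, tuple(shape_of(arg) for arg in args))
    if signature not in specializations:
        dims = "_".join("x".join(map(str, shape)) for shape in signature[1])
        specializations[signature] = f"{name}_{dims}"
    return specializations[signature]


a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 2], [3, 4], [5, 6]]
print(specialize("my_func", a, a))  # first <2, 3> call creates a specialization
print(specialize("my_func", b, b))  # a new signature creates a second one
print(specialize("my_func", a, a))  # a repeated signature reuses the first
```

Calling the same function with two different shape signatures yields two entries in the registry, while a repeated signature reuses the existing one.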

The code for the lexer is fairly straightforward; it is all in a single file:
`toy/lexer.py`. The parser can be found in `toy/parser.py`; it is a recursive 
descent parser. If you are not familiar with this kind of lexer and parser, they are very similar 
to the LLVM Kaleidoscope equivalents, which are detailed in the first two chapters of the
[LLVM Kaleidoscope Tutorial](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl02.html).
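
For readers new to the technique, the following is a minimal sketch of a lexer and recursive descent parser that handles only Toy-style tensor literals such as `[[1, 2, 3], [4, 5, 6]]`. It is not the code in `toy/lexer.py` or `toy/parser.py`; the names `tokenize`, `Parser`, and `parse_literal`, as well as the tiny grammar in the comments, are assumptions made for illustration.

```python
import re

# Hypothetical mini-lexer: numbers plus the punctuation used in tensor literals.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[\[\],])")


def tokenize(text):
    """Split the input into number and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_value(self):
        # value ::= tensor | number
        if self.peek() == "[":
            return self.parse_tensor()
        return self.parse_number()

    def parse_number(self):
        token = self.peek()
        if token is None or token in ("[", "]", ","):
            raise SyntaxError(f"expected a number, got {token!r}")
        self.pos += 1
        return float(token)

    def parse_tensor(self):
        # tensor ::= '[' value (',' value)* ']'
        self.expect("[")
        items = [self.parse_value()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.parse_value())
        self.expect("]")
        return items


def parse_literal(text):
    """Parse a complete tensor literal and reject trailing tokens."""
    parser = Parser(tokenize(text))
    value = parser.parse_value()
    if parser.peek() is not None:
        raise SyntaxError("trailing tokens after literal")
    return value


print(parse_literal("[[1, 2, 3], [4, 5, 6]]"))  # prints [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Each grammar rule maps to one method, and nesting in the input is handled by `parse_tensor` calling back into `parse_value`, which is the defining trait of recursive descent.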

The next chapter will demonstrate how to convert this AST into MLIR.
