/
spec.txt
335 lines (283 loc) · 11.9 KB
/
spec.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
READ ME FIRST:
The whole beginning of this document has been moved into LaTeX, and is only
maintained there. The ABIs are still maintained only in this document.
Fythe is a design for virtual machines primarily targeting dynamic languages.
It is not precisely a language or an interpreter, but a framework for creating
interpreters for languages.
The execution of a program by Fythe proceeds as follows:
* The program source is parsed (one statement or declaration at a time) by the
Fythe parser. The productions used are defined at runtime, and may be
changed at any time. The result of parsing is a tree, the form of which is
also defined at runtime.
* The tree produced by parsing is run through a number of transformations.
These transformations are defined at runtime, and may be changed at any
time. The result of these transformations is a tree which must fit a
prescribed grammar, the Fythe Intermediate Representation (Fythe IR).
* The Fythe IR tree is reduced into executable code, generally machine code,
by an implementation-specific means, and run. The behavior of the generated
code is dictated by the behavioral specification of Fythe IR herein.
The first section of this specification describes the Fythe IR in isolation.
Later sections (yet to be written) will describe the parser and transform
engine.
Every IR expression is reduced at runtime to a value. A value in Fythe is an
object reference associated optionally with a primitive value. A primitive
value is a string, function, integer, floating-point number, or
implementation-specific primitive. All primitives are immutable. An object is
an array of values; that is, a set of values indexed by keys which range from 0
to the size of the object, exclusively. There are several special objects,
notably Null, which is a particular object of size 0 used as a nonce.
The IR is a tree, formed by Fythe values. Although there is no canonical syntax
for Fythe values, this specification uses LISP-like list expressions,
conforming to the following BNF grammar (using regular expressions as
terminals):
WhiteN = /[ \t\r\n]*/
Expression = /\(/ WhiteN List /\)/ WhiteN
| /\(/ WhiteN /\)/ WhiteN
| /[^\(\)\{\}0-9 \t\r\n][^\(\)\{\} \t\r\n]*/ WhiteN
| /[0-9]+/ WhiteN
List = Expression
| Expression List
An expression of the first two forms represents an object associated with no
primitive. An expression of the third form represents a string associated with
Null. An expression of the fourth form represents an integer associated with
Null. This syntax has no means of expressing objects other than Null associated
with primitive values, as these have no behavior in the Fythe IR.
A Fythe IR expression is an object in which the first element is a string
literal, the value of which is the type of expression, and the remaining
elements are arguments. For instance,
(Null)
is a Fythe IR expression which (at runtime) reduces to the value Null
(associated with no primitive),
(New 2)
is a Fythe IR expression which reduces to a newly-created object of size 2, and
(Concat (New 2) (New 2))
is a Fythe IR expression which (rather pointlessly) reduces to the
concatenation of two newly-produced objects of size 2, to produce a new object
of size 4.
The following is a list of every Fythe IR expression, its argument count, the
types its arguments must have (when reduced at runtime), and any additional
notes. The actual behavior of each expression is not yet documented, for the
moment it is hoped that their name will be sufficient to describe their
behavior.
Name AC Types* Notes
Meta:
Function 0 →c (1-code**) (declares that the IR within forms a function literal)
Objects:
This 0
ThisSet 1
Arg 0
Null 0
Global 0
Version 0 (various platform-specific version info)
New 1 i
Length 1 →i
Member 2 oi
MemberSet 3 oio
Equal 2 →i
Object variable
Concat 2
ConcatT 2 (transform-time)
Slice 3 aii→a
Temporaries***:
Temp 1 i (argument must be a literal)
TempSet 2 io (first argument must be a literal)
Type detection****:
ValidString 1 →i
ValidFunction 1 →i
ValidInteger 1 →i
ValidFloat 1 →i
ValidTable 2 oi→i (object and index)
ValidThread*****1 →i
ValidLock***** 1 →i
Function/stack:
Call 2 co
Throw 1 (no value)
Catch 0 (2-code)
Caught 0
Behavioral:
If 1 (2-code)
While 0 (2-code)
Seq variable
Primitives:
Associate 2 po→p
Dissociate 1 p→o
Strings:
SConcat 2 ss→s
SLength 1 s→i
SSlice 3 sii→s
SEqual 2 ss→i
Integers:
IWidth 0 →i
IMul 2 ii→i
IDiv 2 ii→i
IMod 2 ii→i
IAdd 2 ii→i
ISub 2 ii→i
ISl 2 ii→i
ISr 2 ii→i
INot 1 i→i
IOr 2 ii→i
IAnd 2 ii→i
IByte 1 i→s
IEqual 2 ii→i
INe 2 ii→i
ILt 2 ii→i
ILte 2 ii→i
IGt 2 ii→i
IGte 2 ii→i
IToFloat 1 i→f
Floats:
FMul 2 ff→f
FDiv 2 ff→f
FMod 2 ff→f
FAdd 2 ff→f
FSub 2 ff→f
FEqual 2 ff→i
FNe 2 ff→i
FLt 2 ff→i
FLte 2 ff→i
FGt 2 ff→i
FGte 2 ff→i
FToInteger 1 f→i
Tables:
MInit 2 oi (always returns (Null))
MGet 3 ois (table must be in an array index, uninitialized index is fine)
MSet 4 oiso
MList 2 oi
MContains 3 ois→i
Threading****:
Spawn 2 co→t
Join 1 t (FIXME: what's the return?)
Yield 0 (always returns (Null))
LMutexNew 0 →l
LMutexLock 1 l→l
LMutexTryLock 1 l→i
LMutexUnlock 1 l→l
LMutexDestroy 1 l→o (alwas returns (Null))
LSemNew 0 →l
LSemPost 1 l→l
LSemWait 1 l→l
LSemDestroy 1 l→o (alwas returns (Null))
LRWNew 0 →l
LRWRLock 1 l→l
LRWRTryLock 1 l→i
LRWRUnlock 1 l→l
LRWRLock 1 l→l
LRWRTryLock 1 l→i
LRWRUnlock 1 l→l
LRWDestroy 1 l→o (always returns (Null))
Cas 4 oioo→i (only compares and swaps the object, not the primitive)
PCas 4 oipp→i (only compares and swaps the primitive, not the object)
Runtime:
Include 1 s→s (or a throw for failure)
Parse 2 ss
Transform 1 s (should always return code, but this is a property
of the transforms and so not guaranteed)
Grammar:
GAdd 3 soo→s
GRem 1 s→s
GCommit 0 (always returns Null)
Transform:
TAddPhase 2 ss→i
TAddTree 3 soo→i
TAddFunction 3 soc→i
TRem 1 s→i
*
Types are in the form input→output. Either input or output may be disregarded
if any sort of object is OK. 'input' is a string of letters, each of which is
one of:
o: Heap object (array)
s: String
c: Code (function)
i: Integer
f: Float
t: Thread identifier
l: Lock
p: Any primitive (s, c, i, f, t or l)
**
n-code means that as well as taking the arguments specified, the given
operation has transform-time (as opposed to runtime) behavior over n further
expressions. For instance, these expressions may be evaluated conditionally, or
in the case of Function this expression is declared to be the body of a
function literal.
***
There are infinitely many temporaries, and the job of translating temporaries
into registers or what-have-you is done by the implementation.
****
Type detection returns an integer (0 for false, 1 for true). A positive
response does NOT indicate that there is actually an associated primitive of
that type, it only indicates that attempting to use the value as that type
will not cause the interpreter to crash; this is to make ambiguity in how
primitives are stored acceptable such that implementations don't need too
much tagging.
*****
Threading is an optional feature. If it is supported, the string "threads"
should appear as an element of (Version). Everything under the "Threading"
subsection as well as the "ValidThread" and "ValidLock" instructions only
exist if threading is supported.
Note that associating a primitive with an object will NOT cause all instances
of that object to be associated with that primitive, it's purely a local
change. However, object-primitive pairs move together, and will not become
dissociated unless you explicitly dissociate.
There are three globally-nameable objects in Fythe: Null, Global and Version.
Null is a size-0 object and as such is usable only as a nonce. Global is
size-1, with the first (and only) element being a table. Version has variable
size, and stores implementation-specific identifiers allowing code to identify
the implementation of Fythe being used, and features it supports.
Common ABI:
On all real machine targets, the following ABI properties are true:
* The caller pushes any used machine registers (since very rarely are any
used) as well as Fythe registers
* The three Fythe-specific machine registers must be set up properly whenever
Fythe code is running; so, they must be set before entering Fythe code from
C, and not clobbered by C code called from Fythe.
x86 ABI:
ESI = Fythe stack pointer
EDI = Pointer to array of major constants, e.g. (Global), (Version) and
major implementation functions, plus a pointer to the exception stack.
Arrangement of the Fythe stack:
EDI should be pushed to the C stack immediately after EBP during the normal
function prologue, and popped before. That is, the function prologue is now:
<
pushl %ebp
pushl %edi
movl %esp, %ebp
movl %esi, %edi
>
The Fythe stack grows UP, unlike the C stack. Each value on the Fythe stack is
two words (as with all Fythe values). The top two words on the Fythe stack when
a Fythe function is called should be the function object called and its
argument. When a Fythe function is finished, these should be on the stack, as
well as its return value. It is free to modify the two slots carrying the
function object and argument, but the rest of the stack should be preserved.
The function epilogue is reverse of the prologue, with the exception that %esi
is not restored (as the stack height at this point MUST be exactly two words
greater than the starting height):
<
movl %ebp, %esp
pop %edi
pop %ebp
ret
>
Note that the C stack is used only for the call stack and preservation of Fythe
base pointers.
Arrangement of EDI array:
(See CFythe implementation)
Calls:
Before calling a Fythe function, you must push any Fythe registers that are in
use, as the next function is free to clobber them. Before calling a C function,
you must push all processor registers, but no Fythe registers.
x86_64 ABI:
R12 = Fythe constants bank
R13 = Fythe stack pointer
ARM ABI:
R6 = Fythe constants bank
R7 = Fythe stack pointer
JavaScript ABI:
Heap objects are arrays, with every other value being a primitive (object,
primitive, etc). Fythe functions take four arguments, the function itself (as
object, function) and its argument (also as object, primitive). There is no
global Fythe stack, so Fythe functions also return their result as a size-2
array. Integers, floats, strings and functions are stored as such, objects are
stored as arrays. Tables are just standard JavaScript objects initialized to
{}, with each entry mapped to a size-2 array of the value.