Description: Rubinius, the Ruby VM
Homepage: http://rubini.us
Clone URL: git://github.com/evanphx/rubinius.git
benstiglitz (author)
Wed Apr 16 13:31:31 -0700 2008
febuiles (committer)
Fri Apr 18 21:20:37 -0700 2008
commit  6e27619990054e2596c432722b1399ddc76c0c5f
tree    ebc1e11f502c29753663d6cffe973791b2848b05
parent  2b3a44158ae93ab5883da22e5f36df92485f3ad4
rubinius / README-DEVELOPERS
100644 498 lines (384 sloc) 19.195 kb
# vim: tw=65
 
General help and instructions on writing code for Rubinius.
 
 
0. Further Reading
==================
At some point, you should read everything in doc/. It is not
necessary to understand or memorise everything but it will
help with the big picture at least!
 
 
1. Files and Directories
========================
Get to know your way around the place!
 
* .load_order.txt
  Explains the dependencies between files so the VM can load them
  in the correct order.
 
* kernel/
  The Ruby half of the implementation. The classes, methods etc.
  that make up the Ruby language environment are defined here.
  Further divided into:
 
* kernel/platform.conf
  kernel/platform/
  Platform-dependent code wrappers that can then be used in other
  kernel code. platform.conf is an autogenerated file that defines
  various platform-dependent constants, offsets etc.
 
* kernel/bootstrap/
  Minimal set of incomplete core classes that is used to load up
  the rest of the system. Any code that requires Rubinius' special
  abilities needs to be here too.
 
* kernel/core/
  Complete implementation of the core classes. Builds on and/or
  overrides bootstrap/. Theoretically this code should be portable
  so all Rubinius-dependent stuff such as primitives goes in
  bootstrap/ also.
 
* runtime/
  Contains run-time compiled files for Rubinius. You'll use these
  files when running shotgun/rubinius.
 
* runtime/stable/*
  Known-good versions of the Ruby libraries that are used by the
  compiler to make sure you can recompile in case you break one
  of the core classes.
 
* shotgun/
  The C parts. This top-level directory contains most of the build
  process configuration as well as the very short main.c.
 
* shotgun/lib/
  All of the C code that implements the VM as well as the extremely
  bare-bones versions of some Ruby constructs.
 
* shotgun/external_libs/
  Libraries required by Rubinius, bundled for convenience.
 
* lib/
  All Ruby Stdlib libraries that are verified to work as well as
  any Rubinius-specific standard libraries. Of special interest
  here are three subdirectories:
 
* lib/bin/
  Some utility programs such as lib/bin/compile.rb which is used
  to compile files during the build process.
 
* lib/ext/
  C extensions that use Subtend.
 
* lib/compiler/
  This is the compiler (implemented completely in Ruby.)
 
* stdlib/
  This is the Ruby Stdlib, copied straight from the distribution.
  These libraries do not yet work on Rubinius (or have not been
  tried.) When a library is verified to work, it is copied to
  lib/ instead.
 
* bin/
  Various utility programs like bin/mspec and bin/ci.
 
* benchmark/
  All benchmarks live here. The rubinius/ subdirectory is not in
  any way Rubinius-only; those benchmarks were simply written
  as part of this project (the rest are from somewhere else.)
 
* spec/ and test/
  These contain the behaviour specification and verification files.
  See section 3 for information about specs. The test/ directory is
  deprecated but some old test code lives here.
 
 
Notes: Occasionally, when working with kernel/, you may see classes
       that are not completely defined or look strange. Remember
       that some classes are set up in the VM and we are basically
       just reopening those classes.
 
 
2. Working with Kernel classes
==============================
 
Any time you make a change here -- or anywhere else for that
matter -- make sure you do a full rebuild to pick up the changes,
then run the related specs, and then run bin/ci to make sure
that the *unrelated* specs still pass too (minimal-seeming
changes may have broad consequences.)
 
There are a few special forms that are used in bootstrap/ as well
as core/ such as @ivar_as_index@ (see 2.2) which maps instance
variable names to internal fields. These impose special restrictions
on their usage so it is best to follow the example of existing
code when dealing with these. Broadly speaking, if something looks
"unrubyish", there is probably a good reason for it so make sure
to ask before doing any "cosmetic" changes -- and to run CI after.
 
If you modify a kernel class, you need to `rake build` after to
have the changes picked up. With some exceptions, you should not
regenerate the stable files. They will in most cases work just fine
even without the newest code. `rake build:stable` is the command
for that.
 
If you create a new file in one of the kernel subdirectories, it
will be necessary to regenerate the .load_order.txt file in the
equivalent runtime subdirectory in order to get your class loaded
when Rubinius starts up. Use the rake task build:load_order to
regenerate the .load_order.txt files.
 
Due to the dependencies inherent in writing the Core in Ruby, there
is one idiom used that may confuse on first sight. Many methods are
called #some_method_cv and the _cv stands for 'core version,' not
one of the other things you thought it might be. The idea is that
a simple version of a given method is used until everything is
safely loaded, at which point it is replaced by the real version.
This happens in WhateverClass.after_loaded (and it is NOT automated.)
 
 
2.1 Safe Math Compiler Plugin
-----------------------------
 
Since the core libraries are built of the same blocks as any other
Ruby code and since Ruby is a dynamic language with open classes and
late binding, it is possible to change fundamental classes like
Fixnum in ways that violate the semantics that other classes depend
on. For example, imagine we did the following:
 
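    class Fixnum
      # Keep the original addition reachable; calling + inside the
      # new + would recurse forever.
      alias_method :plain_plus, :+

      def +(other)
        plain_plus(other) % 5
      end
    end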
 
While it is certainly possible to redefine Fixnum addition to be
modulo 5, doing so will almost surely leave some class like Array
unable to calculate the correct length when it needs to. The dynamic
nature of Ruby is one of its cherished features but it is also truly a
double-edged sword in some respects.
 
In Stdlib, the 'mathn' library redefines Fixnum#/ in an unsafe and
incompatible manner. The library aliases Fixnum#/ to Fixnum#quo,
which returns a Float by default.
 
Because of this there is a special compiler plugin that emits a different
method name when it encounters the #/ method. The compiler emits #divide
instead of #/. The numeric classes Fixnum, Bignum, Float, and Numeric all
define this method.
 
The `-frbx-safe-math` switch is used during the compilation of the Core
libraries to enable the plugin. During regular 'user code' compilation,
the plugin is not enabled. This enables us to support mathn without
breaking the core libraries or forcing inconvenient practices.
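
The effect can be sketched in plain Ruby without the plugin. The
class Num and the method midpoint below are hypothetical stand-ins
(Num plays the role of Fixnum so that no real core class has to be
patched); only the name #divide comes from the description above.

```ruby
# Hypothetical numeric wrapper standing in for Fixnum.
class Num
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Integer division, as core code expects it to behave.
  def /(other)
    Num.new(@value / other.value)
  end

  # Same behaviour under a name user code is unlikely to redefine.
  def divide(other)
    Num.new(@value / other.value)
  end
end

# 'Core' code written against #divide, as the plugin would emit it.
def midpoint(lo, hi)
  Num.new(lo.value + hi.value).divide(Num.new(2))
end

# User code later changes #/ to return a Float, mathn-style.
class Num
  def /(other)
    @value.to_f / other.value
  end
end

mid  = midpoint(Num.new(0), Num.new(7)).value  # still integer division
user = Num.new(7) / Num.new(2)                 # user-visible #/ is a Float
```

Because the 'core' code never calls #/ directly, the user-level
redefinition cannot change its results.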
 
 
2.2 ivar_as_index
-----------------
 
As described above, you'll see calls to @ivar_as_index@ in kernel
code. This maps the class's numbered fields to ivar names, but ONLY
for that file.
 
You can NOT access those names using the @name syntax outside of that
file. (Doing so will cause maddeningly odd behavior and errors.)
 
For instance, if you make a subclass of IO, you can NOT access @descriptor
directly in your subclass. You must access it only through methods.
Notably, you can NOT just use the @#attr_*@ methods for this. The methods
must be completely written out so that the instance variable label can
be picked up to be translated.
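
A minimal sketch of the accessor pattern, in plain Ruby. The classes
Resource and LoggedResource are hypothetical; the @ivar_as_index@
declaration itself is left out because it only exists inside
Rubinius, so this shows the shape of the code and not the mapping.

```ruby
# Hypothetical stand-in for a kernel class such as IO. In real kernel
# code an ivar_as_index declaration would sit in this file.
class Resource
  def initialize(descriptor)
    @descriptor = descriptor
  end

  # Written out in full instead of generated with attr_reader, so
  # the @descriptor label appears literally in this file.
  def descriptor
    @descriptor
  end
end

# Imagine this subclass living in a different file.
class LoggedResource < Resource
  # Goes through the accessor instead of touching @descriptor.
  def describe
    "fd #{descriptor}"
  end
end

text = LoggedResource.new(3).describe
```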
 
 
2.3 Kernel- and user-land
-------------------------
 
Rubinius is in many ways architected like an operating system, so OS
terminology is the easiest way to describe the two modes that Rubinius
operates under:
 
'Kernel-land' describes how code in kernel/ is executed. Everything else
is 'user-land.'
 
Kernel-land has a number of restrictions to keep things sane and simple:
 
* #public, #private, #protected, #module_function require method names
  as arguments. The 0-argument version that allows toggling visibility
  in a class or module body is not available.
 
* Restricted use of executable code in class, module and script (file)
  bodies. @SOME_CONSTANT = :foo@ is perfectly fine, of course, but for
  example different 'memoizations' or other calculation should not be
  present. Code inside methods has no restrictions, broadly speaking,
  but keep dependency issues in mind for methods that may get called
  during the instantiation of the rest of the kernel code.
 
* @#after_loaded@ hooks can be used to perform more complex/extended
  setup or calculations for kernel classes. The @_cv@ methods mentioned
  above, for example, are replaced over the simpler bootstrap versions
  in the @#after_loaded@ hooks of the respective classes. @#after_loaded@
  is not magic, and will not be automatically called. If adding a new
  one, have kernel/loader.rb call it (at this point the system is
  fully up.)
 
* Kernel-land code does not define methods through
  @Module#__add_method__@ or @MetaClass#attach_method@. It adds
  and attaches methods directly in the VM. This is necessary for
  bootstrapping.
 
* Any use of string-based eval in the kernel must go through discussion.
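
The first three restrictions can be sketched together. The class
Example below is hypothetical and runs as ordinary Ruby; it only
illustrates the style, and the explicit .after_loaded call at the
end stands in for the call that kernel/loader.rb would make.

```ruby
# Hypothetical class, illustrating kernel-land conventions only.
class Example
  SOME_CONSTANT = :foo   # simple constant assignment is fine

  def lookup(key)
    self.class.table[normalize(key)]
  end

  def normalize(key)
    key.to_s.downcase.to_sym
  end
  # Name the method explicitly; the 0-argument `private` toggle is
  # not available in kernel-land.
  private :normalize

  def self.table
    @table
  end

  # Heavier setup is deferred to the hook instead of running in the
  # class body. It is not called automatically.
  def self.after_loaded
    @table = { :foo => 1, :bar => 2 }
  end
end

Example.after_loaded               # stands in for the loader's call
found  = Example.new.lookup("FOO")
hidden = !Example.new.respond_to?(:normalize)
```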
 
 
3. Specs (Specifications)
=========================
 
Probably the first or second thing you hear about Rubinius when
speaking to any of the developers is a mention of The Specs. They
are a crucial part of Rubinius.
 
Rubinius itself is being developed using the Behaviour-Driven
Design approach (a refinement of Test-Driven Design) where each
aspect of the behaviour of the code is first specified using
the spec format and only then implemented to pass those specs.
 
In addition to this, we have undertaken the ambitious task of
specifying the entirety of the Ruby language as well as its
Core and Stdlib libraries in this format which both allows us
to ensure our implementation is conformant with the Ruby standard
and, more importantly, to actually *define* that standard since
there currently is no formal specification of Ruby.
 
The de facto standard of BDD is set by "RSpec":http://rspec.info,
the project conceived to implement the then-new way of coding.
Their website is fairly useful as a tutorial as well, although
the spec syntax (particularly as used in Rubinius) is not very
complex at all.
 
Currently we actually use a compatible but vastly simpler
implementation specifically developed as a part of Rubinius
called MSpec (for mini-RSpec, as it was originally needed
because the code in RSpec was too complex to be run on our
not-yet-complete Ruby implementation.)
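
To give a feel for the format, here is a small sketch of two specs
written in the describe/it/should style. The definitions at the top
(ShouldProxy, Object#should, describe, it, RESULTS) are minimal
stand-ins written for this example so that it runs on its own; they
are NOT the MSpec or RSpec implementation.

```ruby
# Stand-ins only: just enough machinery to show the spec shape.
RESULTS = []

class ShouldProxy
  def initialize(actual)
    @actual = actual
  end

  # `x.should == y` raises unless x == y.
  def ==(expected)
    unless @actual == expected
      raise "expected #{expected.inspect}, got #{@actual.inspect}"
    end
    true
  end
end

class Object
  def should
    ShouldProxy.new(self)
  end
end

def describe(subject)
  @subject = subject
  yield
end

def it(behaviour)
  yield
  RESULTS << [@subject, behaviour, :passed]
rescue RuntimeError
  RESULTS << [@subject, behaviour, :failed]
end

# The part that resembles an actual spec file:
describe "Array#first" do
  it "returns the first element" do
    [1, 2, 3].first.should == 1
  end

  it "returns nil for an empty array" do
    [].first.should == nil
  end
end
```

Each @it@ block states one behaviour in a sentence and checks it
with one or more expectations.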
 
Specs live in the spec/ directory. spec/ruby/ specifies our
current target implementation, Ruby 1.8.6-p111, and it is
further split into various subdirectories such as language/
for language-level constructs such as, for example, the
@if@ statement and core/ for Core library code such as
@Array@.
 
Parallel to this the top-level spec/ directory itself has the
subdirectories for Rubinius-specific specs: additions and/or
deviations from the standard, Rubinius language constructs
etc. For example, the standard @String@ specs live under the
spec/ruby/1.8/core/string/ directory and if Rubinius implements
an additional method @String#to_morse@, the specs for it can
be found in spec/core/string/. Completely new classes such as
@CompiledMethod@ find their specs here as well.
 
The way to run the specs is contained in two small programs:
bin/mspec and bin/ci. The former is the "full" version that
allows a wider range of options and the latter is a streamlined
way of running Continuous Integration (CI) testing. CI is a
set of "known-good" specs picked out from the entirety of
them (which is what bin/mspec works with) using an automatic
exclusion mechanism. CI is very important for any Rubinius
developer: before each commit, bin/ci should be run and found
to finish without error. It makes it very easy to ensure that
your change did not break other, seemingly unrelated things
because it exercises all areas of specs. A clean bin/ci run
gives confidence that your code is correct.
 
For a deeper overview, tutorials, help and other information
about Rubinius' specs, start here:
 
http://rubinius.lighthouseapp.com/projects/5089/specs-overview
 
 
4. Libraries and C: Primitives vs. FFI
======================================
 
There are two ways to "drop to C" in Rubinius. Firstly, primitives
are special instructions that are specifically defined in the VM.
In general they are operations that are impossible to do in the
Ruby layer such as opening a file. Primitives should be used to
access the functionality of the VM from inside Ruby.
 
FFI or Foreign Function Interface, on the other hand, is meant as
a generalised method of accessing system libraries. FFI is able to
automatically generate the bridge code needed to call out to some
library and get the result back into Ruby. FFI functions at runtime
as real machine code generation so that it is not necessary to have
anything compiled beforehand. FFI should be used to access the code
outside of Rubinius, whether it is system libraries or some type of
extension code, for example.
 
There is also a specific Rubinius extension layer called Subtend.
It emulates the extension interface of Ruby to allow old Ruby
extensions to work with Rubinius.
 
 
4.1 Primitives
--------------
Using the above rationale, if you need to implement a primitive:
 
* Give the primitive a sane name
* Implement the primitive in shotgun/lib/primitives.rb using the
  name you chose as the method name.
* Enter the primitive name as a symbol at the BOTTOM of the Array
  in shotgun/lib/primitive_names.rb.
* `rake build`
 
This makes your primitive available in the Ruby layer using the
special form @Ruby.primitive :primitive_name@. Primitives have a
few rules and chief among them is that a primitive must be the
first instruction in the method that it appears in. Partially for
this reason all primitives should reside in a wrapper method in
bootstrap/ (the other part is that core/ should be implementation
independent and primitives are not.)
 
In addition to this, primitives have another property that may
seem unintuitive: anything that appears below the primitive form
in the wrapper method is executed if the primitive FAILS and only
if it fails. There is no exception handling syntax involved. So
this is a typical pattern:
 
    # kernel/bootstrap/whatever.rb
    def self.prim_primitive_name()
      Ruby.primitive :primitive_name
      raise SomeError, "Whatever I was doing just failed."
    end
 
    # kernel/core/whatever.rb
    def self.primitive_name()
      self.prim_primitive_name
      ...
    end
 
To have a primitive fail, the primitive body (in primitives.rb)
should return FALSE; this will cause the code following the
Ruby.primitive line to be run. This provides a fallback so that
the operation can be retried in Ruby.
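
The control flow can be simulated in plain Ruby. In the sketch
below, vm_add is a hypothetical stand-in for a VM primitive and the
FAILED marker stands in for the primitive returning FALSE; neither
is part of Rubinius, and a real wrapper would use the
@Ruby.primitive@ form shown above.

```ruby
# Marker object playing the role of 'the primitive failed'.
FAILED = Object.new

# Hypothetical stand-in for a VM primitive: it only handles Integers.
def vm_add(a, b)
  return FAILED unless a.is_a?(Integer) && b.is_a?(Integer)
  a + b
end

# Wrapper in the style of a bootstrap/ method.
def prim_add(a, b)
  result = vm_add(a, b)              # plays the role of Ruby.primitive
  return result unless result.equal?(FAILED)
  # Reached only when the 'primitive' failed: retry in Ruby.
  Integer(a) + Integer(b)
end

fast = prim_add(2, 3)      # handled by the 'primitive'
slow = prim_add("2", 3)    # 'primitive' fails, Ruby fallback runs
```

When the fallback cannot recover either, raising from that point
(as in the bootstrap example above) is the usual alternative.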
 
If a primitive cannot be retried in Ruby or if there is some
additional information that needs to be passed along to create
the exception, it may raise an exception using a couple of macros:
 
* RAISE(exc_class, msg) will raise an exception of type exc_class
  and with a message of msg, e.g.
 
    RAISE("ArgumentError", "Invalid argument");
 
* RAISE_FROM_ERRNO(msg) will raise an Errno exception with the
  specified msg.
 
If you need to change the signature of a primitive, follow this
procedure:
  1. change the signature of the kernel method that calls the
     VM primitive
  2. change any calls to the kernel method in the kernel/**
     code to use the new signature, then recompile
  3. run rake build:stable
  4. change the actual primitive in the VM and recompile again
  5. run bin/ci
 
4.2 FFI
-------
 
Module#attach_function allows a C function to be called from Ruby
code using FFI.
 
Module#attach_function takes the C function name, the Ruby module
function to bind it to, the C argument types, and the C return type.
For a list of C argument types, see kernel/platform/ffi.rb.
 
Currently, FFI does not support C functions with more than 6
arguments.
 
When the C function will be filling in a String, be sure the Ruby
String is large enough. For the C function rbx_Digest_MD5_Finish,
the digest string is allocated with a 16 character length. The
string is passed to md5_finish which calls rbx_Digest_MD5_Finish
which fills in the string with the digest.
 
  class Digest::MD5
    attach_function nil, 'rbx_Digest_MD5_Finish', :md5_finish,
                    [:pointer, :string], :void
 
    def finish
      digest = ' ' * 16
      self.class.md5_finish @context, digest
      digest
    end
  end
 
For a complete additional example, see digest/md5.rb.
 
 
5. Debugging: debugger, GDB, valgrind
=====================================
 
With Rubinius, there are two distinct things that may need
debugging (sometimes at the same time.) There is the Ruby
code, for which 'debugger' exists. debugger is a full-speed
debugger, which means that there is no extra compilation or
flags to enable it but at the same time, code normally does
not suffer a performance penalty from the infrastructure.
This is achieved using a combination of bytecode substitution
and Rubinius' Channel IO interface. Multithreaded debugging
is supported (credit for the debugger goes to Adam Gardiner.)
 
On the C side, the trusty workhorse is the GNU Debugger or
GDB. In addition there is support built in for Valgrind, a
memory checker/lint/debugger/analyzer hybrid.
 
 
5.1 debugger
------------
The nonchalantly named debugger is specifically the debugger
for Ruby code, although it does also allow examining the VM
as it runs. The easiest way to start it is to insert either
a @breakpoint@ or @debugger@ method call anywhere in your
source code. Upon running this method, the debugger starts
up and awaits your command at the instruction where the
@breakpoint@ or @debugger@ method used to be. For a full
explanation of the debugger, refer to [currently the source
but hopefully docs shortly.] You will see this prompt and
there is a trusty command you can try to get started:
 
    rbx:debug> help
 
 
5.2 GDB
-------
To really be able to use GDB, make sure that you build Rubinius
with DEV=1 set. This disables optimisations and adds debugging
symbols.
 
There are two ways to access GDB for Rubinius. You can simply
run shotgun/rubinius with gdb (use the builtin support so you
do not need to worry about linking etc.):
 
* Run `shotgun/rubinius --gdb`, place a breakpoint (break main,
  for example) and then r(un.)
* Alternatively, you can run and then hit ^C to interrupt.
 
You can also drop into GDB from Ruby code with @Kernel#yield_gdb@
which uses a rather rude but very effective method of stopping
execution to start up GDB. To continue past the @yield_gdb@,
j(ump) to one line after the line that you have stopped on.
 
Useful gdb commands and functions (remember, using the p(rint)
command in GDB you can access pretty much any C function in
Rubinius):
 
* rbt
  Prints the backtrace of the Ruby side of things. Use this in
  conjunction with gdb's own bt which shows the C backtrace.
 
* p _inspect(OBJECT)
  Useful information about a given Ruby object.
 
 
5.3 Valgrind
------------
Valgrind is a program for debugging, profiling and memory-checking
programs. The invocation is just `shotgun/rubinius --valgrind`.
See http://valgrind.org for usage information.
 
5.4 Tracing
-----------
 
Tracing is very verbose and can rapidly fill your screen. To enable it:
 
  RBX=rbx.debug.trace shotgun/rubinius ...
 
=== END ===