# vim: tw=65
General help and instructions on writing code for Rubinius.
== Further Reading
At some point, you should read everything in doc/. It is not
necessary to understand or memorise everything but it will
help with the big picture at least!
== Files and Directories
Get to know your way around the place!
.load_order.txt::
Explains the dependencies between files so the VM can load them
in the correct order.
kernel/::
The Ruby half of the implementation. The classes, methods etc.
that make up the Ruby language environment are defined here.
Further divided into:
kernel/platform/::
Platform-dependent code wrappers that can then be used in other
kernel code. kernel/platform.conf is an autogenerated file that
defines various platform-dependent constants, offsets etc.
kernel/bootstrap/::
Minimal set of incomplete core classes that is used to load up
the rest of the system. Any code that requires Rubinius' special
abilities needs to be here too.
kernel/common/::
Complete implementation of the core classes. Builds on and/or
overrides bootstrap/. Theoretically this code should be portable
so all Rubinius-dependent stuff such as primitives goes in
bootstrap/ also.
kernel/delta/::
Some methods have proto-implementations to enable loading
kernel/common. Redefine these methods in kernel/delta to provide
the complete implementation. Also the place to put any code that
needs to run after kernel/common has completed loading.
runtime/::
Contains run-time compiled files for Rubinius. You'll use these
files when running shotgun/rubinius.
runtime/stable/::
Known-good versions of the Ruby libraries that are used by the
compiler to make sure you can recompile in case you break one
of the core classes.
vm/::
All of the C code that implements the VM as well as the
extremely bare-bones versions of some Ruby constructs.
vm/external_libs/::
Libraries required by Rubinius, bundled for convenience.
lib/::
All Ruby standard libraries that are verified to work as well
as any Rubinius-specific standard libraries. Of special
interest here are three subdirectories:
lib/bin/::
Some utility programs such as lib/bin/compile.rb which is used
to compile files during the build process.
lib/ext/::
C extensions that use Subtend.
lib/compiler/::
This is the compiler (implemented completely in Ruby.)
stdlib/::
This is the Ruby Stdlib, copied straight from the distribution.
These libraries may not yet work on Rubinius (at least, have
not been tried.) When a library is verified to work, it is
copied to lib/ instead.
bin/::
Various utility programs like bin/mspec and bin/ci.
benchmark/::
All benchmarks live here. The rubinius/ subdirectory is not in
any way Rubinius-only, all those benchmarks were just written
as part of this project (the rest are from somewhere else.)
spec/ and test/::
These contain the behaviour specification and verification
files. See the specs section for information about specs. The
test/ directory is deprecated but some old test code lives
here.
Note: When working with kernel/ you may occasionally see classes
that are not completely defined or look strange. Remember that
some classes are set up in the VM and we are basically just
reopening those classes.
== Getting the C++ branch
The C++ branch is now master. There is nothing else you need to do.
== VM
=== Building the VM
To build the codebase:
rake build
Build and run the tests:
rake vm:test
To run only one test suite, use:
rake vm:test[SomeClass]
If you want to run a single test suite under gdb, use:
SUITE="SomeClass" gdb vm/test/runner
You can use `git pull` or `rake git:pull` to get updates.
Use `git push` or `rake git:push` to send back any of your
commits to the C++ branch.
You can use `git push origin cpp` to specify that you only want
to push your local 'cpp' branch changes to the remote 'cpp'
branch. By default, git will try to push 'master' and 'cpp'.
=== Compiling to .rbc with MRI
MRI can be used to compile Ruby files to bytecode, which can then be run by
the VM. To compile 'file.rb' to 'file.rbc':
# in the root of the cpp directory
$ rake compile_ruby[file.rb]
# Depending on your shell, you may need to escape the brackets
=== Running code with the VM
A Ruby file can be run by the VM with the following command:
rake run_ruby[file.rb]
Add -t to enable printing of probes in the vm to see various information on
your screen. Alternately, run the vm by hand:
rake -t run_ruby[file.rb]
PROBE=1 vm/vm file.rbc
=== Examining C++ exceptions
If the vm exits with a C++ exception, you may not get much useful
information. Re-run with gdb to examine the cause of the
failure. To start:
$ gdb vm/vm
(gdb) break rubinius::ExceptionClass::raise
(gdb) run file.rbc
[normal gdb usage here]
See below for some functions rubinius provides for making gdb
debugging easier.
=== Slots
"Slots" are the name given to members of C++ classes that are
available as pseudo instance variables in the Ruby classes. For
example, vm/builtin/exception.hpp has the following member:
private:
String* message_; // slot
In the Ruby Exception class, getting or setting @message will read
or change the value of message_ in the C++ class.
The // slot comment annotates the member as a slot. The annotation
is processed by vm/codegen/field_extract.rb to generate the glue
code to make this work.
In C++, all of the "slot" members are private. There are three
macros used to output getter and setter functions for the members.
These macros are named after their Ruby analogues:
attr_reader(name, Class);
attr_writer(name, Class);
attr_accessor(name, Class);
As in Ruby, attr_accessor is a combination of both attr_reader and
attr_writer. Again, looking in vm/builtin/exception.hpp, we see:
public:
/* accessors */
attr_accessor(message, String);
The macro causes the following functions to be written:
void message(STATE, String* obj) {
  message_ = obj;
  this->write_barrier(state, (OBJECT)obj);
}
String* message() { return message_; }
See vm/builtin/object.hpp for the macro definitions.
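For comparison, here is how the Ruby analogues that the macros
are named after behave. This is plain Ruby rather than Rubinius
kernel code, and the SlotDemo class and its attribute names are
invented purely for illustration.

```ruby
# Plain-Ruby illustration of the accessors the C++ macros are named after.
# SlotDemo is a hypothetical class, not part of Rubinius.
class SlotDemo
  attr_reader   :read_only    # defines #read_only only
  attr_writer   :write_only   # defines #write_only= only
  attr_accessor :message      # defines both #message and #message=

  def initialize
    @read_only = :fixed
  end
end

demo = SlotDemo.new
demo.message = "boom"   # writer half of attr_accessor
demo.write_only = 1     # allowed, but there is no matching reader
```

The C++ macros differ in that the setter also takes the VM state
and runs the write barrier, as the generated code above shows.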
Note that the name of the slot has a trailing `_' character.
External to the C++ class, use the accessor functions to set and get
the value of the "slot" member. Internal to the class, you can use
the member (e.g. message_) directly.
If you set a slot declared as other than type OBJECT to Qnil, you
will need to cast the value you are passing. For example, to set
message_ to Qnil, you need to do the following:
Exception* exc = ...
exc->message(state, static_cast<String*>(Qnil));
(The cast requirement will probably go away soon.)
=== Exceptions
Invariably, things go wrong. When those things are inside the VM,
it would be nice to handle them from Ruby code rather than being
unceremoniously dumped at the shell prompt. While C++ has
structured exception handling, there is a special way of throwing
an exception that will be propagated into Ruby-land.
The C++ exception is named RubyException, but rather than
throwing it directly, use one of the static methods on the
builtin Exception class. For example, to raise an ArgumentError:
Exception::argument_error(state, "this failed for some reason");
See also vm/builtin/exception.hpp.
=== Primitives
Primitives are normal methods on C++ classes. Comment annotation
links the C++ method to a symbol with which the primitive is
accessed in Ruby code.
For example, consider the Ruby Fixnum class:
class Fixnum
  def -@
    Ruby.primitive :fixnum_neg
    raise PrimitiveFailure, "Fixnum#-@ primitive failed"
  end
end
In the C++ file vm/builtin/fixnum.hpp, the primitive is annotated:
namespace rubinius {
  class Fixnum : public Integer {
    // ...
    // Ruby.primitive :fixnum_neg
    INTEGER neg(STATE) {
      // ...
    }
  };
}
The magic for this happens in vm/codegen/field_extract.rb and the
output goes to vm/gen/primitives_declare.hpp and
vm/gen/primitives_glue.gen.cpp.
There are two ways to annotate the C++ methods as primitives. If
there is a single C++ method, use 'Ruby.primitive
:name_of_primitive'.
If there are multiple C++ methods (i.e. overloaded methods), use
'Ruby.primitive! :name_of_primitive'. The '!' annotation is used
for each overloaded method and uses the argument types to
determine which implementation method to call for the primitive.
The resulting glue code for overloaded methods looks something
like the following, but please do not assume this sample is
up-to-date:
bool Primitives::float_mul(STATE, Executable* exec, Task* task, Message& msg) {
  OBJECT ret;
  if(msg.args() != 1)
    goto fail;
  Float* recv;
  if((recv = as<Float>(msg.recv)) == NULL)
    goto fail;
  if(Float* arg = try_as<Float>(msg.get_argument(0))) {
    ret = recv->mul(state, arg);
  } else if(Integer* arg = try_as<Integer>(msg.get_argument(0))) {
    ret = recv->mul(state, arg);
  } else goto fail;
  if(ret == reinterpret_cast<Object*>(kPrimitiveFailed))
    goto fail;
  task->primitive_return(ret, msg);
  return false;
fail:
  return VMMethod::execute(state, exec, task, msg);
}
To have a primitive fail, the primitive body should return
Primitives::failure(). This will cause the code following the
Ruby.primitive line in the kernel to be run. This provides a
fallback so that the operation can be retried in Ruby.
If a primitive cannot be retried in Ruby or if there is some
additional information that needs to be passed along to create
the exception, it may raise other exceptions.
See the section above on Exceptions.
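The failure-and-fallback flow can be modelled in plain Ruby. This
is only a sketch that runs without Rubinius: PRIMITIVE_FAILED,
native_add and add are hypothetical names, with native_add
standing in for the C++ primitive and the rest of add standing in
for the code that follows the Ruby.primitive line.

```ruby
# Sketch only: models "try the primitive, fall back to Ruby".
# PRIMITIVE_FAILED, native_add and add are hypothetical names.
PRIMITIVE_FAILED = Object.new.freeze

# Stands in for the C++ primitive: it handles Integer arguments
# only and signals failure for anything else.
def native_add(a, b)
  return PRIMITIVE_FAILED unless a.is_a?(Integer) && b.is_a?(Integer)
  a + b
end

# Stands in for the kernel method. The code after the call to
# native_add plays the role of the lines after Ruby.primitive.
def add(a, b)
  result = native_add(a, b)
  return result unless result.equal?(PRIMITIVE_FAILED)

  # Fallback path: retry in Ruby after coercing the argument.
  unless a.is_a?(Integer) && b.respond_to?(:to_int)
    raise TypeError, "cannot add #{b.inspect} to #{a.inspect}"
  end
  a + b.to_int
end
```

Calling add(2, 3) succeeds on the fast path, add(2, 3.7) falls
back and coerces the Float, and add(2, "x") raises from the
fallback because no retry is possible.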
==== Primitive tests
Since the "primitives" are ordinary C++ methods, tests for them
are written along with the other VM tests. Each builtin_xxx.cpp
file has a corresponding vm/test/test_xxx.hpp file. See the
existing files for more examples.
== Working with Kernel classes
Any time you make a change here -- or anywhere else for that
matter -- make sure you do a full rebuild (rake kernel:build) to
pick up the changes, then run the related specs, and then run
bin/ci to make sure that the *unrelated* specs still work as well
(minimal-seeming changes may have broad consequences.)
If you modify a kernel class, you need to run `rake build`
afterwards to have the changes picked up. With some exceptions,
you should not regenerate the stable files; in most cases they
will work just fine even without the newest code. If you do need
to regenerate them, the command is `rake build:stable`.
If you create a new file in one of the kernel subdirectories, it
will be necessary to regenerate the .load_order.txt file in the
equivalent runtime subdirectory in order to get your class loaded
when Rubinius starts up. Use the rake task build:load_order to
regenerate the .load_order.txt files.
=== Safe Math Compiler Plugin
Since the core libraries are built of the same blocks as any
other Ruby code and since Ruby is a dynamic language with open
classes and late binding, it is possible to change fundamental
classes like Fixnum in ways that violate the semantics that other
classes depend on. For example, imagine we did the following:
class Fixnum
  alias_method :original_plus, :+
  def +(other)
    original_plus(other) % 5
  end
end
While it is possible to redefine Fixnum addition to be modulo 5,
doing so will almost certainly cause some class like Array to be
unable to calculate the correct length when it needs to. The
dynamic nature of Ruby is one of its cherished features but it is
also truly a double-edged sword in some respects.
In the standard library the 'mathn' library redefines Fixnum#/ in
an unsafe and incompatible manner. The library aliases Fixnum#/
to Fixnum#quo, which returns a Float by default.
Because of this there is a special compiler plugin that emits a
different method name when it encounters the #/ method. The
compiler emits #divide instead of #/. The numeric classes Fixnum,
Bignum, Float, and Numeric all define this method.
The `-frbx-safe-math` switch is used during the compilation of
the Core libraries to enable the plugin. During regular 'user
code' compilation, the plugin is not enabled. This enables us to
support mathn without breaking the core libraries or forcing
inconvenient practices.
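The effect of emitting a different method name can be sketched in
plain Ruby. To avoid disturbing real core classes, the example
uses a hypothetical Meters class in place of Fixnum, and halve
stands in for core library code compiled with the plugin enabled.

```ruby
# Sketch only: Meters is a hypothetical numeric-like class that
# stands in for Fixnum; halve stands in for core library code.
class Meters
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # The original, "core" definition of division.
  def /(other)
    Meters.new(value / other)
  end

  # Stable name that core code calls instead of #/. alias_method
  # captures the current body, so later changes to #/ do not
  # affect #divide.
  alias_method :divide, :/
end

# "Core" code, written against #divide as the plugin would emit.
def halve(meters)
  meters.divide(2)
end

# "User" code later redefines #/ incompatibly, much as mathn
# does for Fixnum#/.
class Meters
  def /(other)
    Meters.new(value.to_f / other)
  end
end
```

After the redefinition, Meters.new(7) / 2 yields 3.5, while
halve(Meters.new(7)) still performs integer division and yields 3.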
=== Kernel-land and user-land
Rubinius is in many ways architected like an operating system, so
some OS world terms may be easiest to describe the two modes that
Rubinius operates under:
'Kernel-land' describes how code in kernel/ is executed.
Everything else is 'user-land.'
Kernel-land has a number of restrictions to keep things sane and
simple:
* #public, #private, #protected, #module_function require method
names as arguments. The 0-argument version that allows toggling
visibility in a class or module body is not available.
* Restricted use of executable code in class, module and script
(file) bodies. <tt>SOME_CONSTANT = :foo</tt> is perfectly fine,
of course, but for example different 'memoizations' or other
calculation should not be present. Code inside methods has no
restrictions, broadly speaking, but keep dependency issues in
mind for methods that may get called during the instantiation
of the rest of the kernel code.
* Kernel-land code does not handle defining methods through
Module#__add_method__ or MetaClass#attach_method. It adds
and attaches methods directly in the VM. This is necessary for
bootstrapping.
* Any use of string-based eval in the kernel must go through
discussion.
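The first restriction is easiest to see in code. The following is
ordinary Ruby that also respects the kernel-land rule; Greeter
and Util are hypothetical names used only for this illustration.

```ruby
# Illustration in plain Ruby; Greeter and Util are hypothetical.
class Greeter
  def greet
    "hello, #{name}"
  end

  def name
    "world"
  end

  # Kernel-land style: the method name is passed explicitly.
  private :name

  # Not available in kernel-land: a bare `private` line that
  # toggles the visibility of every method defined after it.
end

module Util
  def double(x)
    x * 2
  end

  # Likewise, module_function is given the method name.
  module_function :double
end
```

Both forms behave the same in user-land; only the explicit form
is usable inside kernel/.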
== Specs (Specifications)
Probably the first or second thing you hear about Rubinius when
speaking to any of the developers is a mention of The Specs. It
is a crucial part of Rubinius.
Rubinius itself is being developed using the Behaviour-Driven
Development approach (a refinement of Test-Driven Development)
where each aspect of the behaviour of the code is first specified
using the spec format and only then implemented to pass those
specs.
In addition to this, we have undertaken the ambitious task of
specifying the entirety of the Ruby language as well as its Core
and Stdlib libraries in this format which both allows us to
ensure our implementation is conformant with the Ruby standard
and, more importantly, to actually *define* that standard since
there currently is no formal specification of Ruby.
The de facto standard of BDD is set by RSpec[http://rspec.info],
the project conceived to implement the then-new way of coding.
Their website is fairly useful as a tutorial as well, although
the spec syntax (particularly as used in Rubinius) is not very
complex at all.
Currently we actually use a compatible but vastly simpler
implementation specifically developed as a part of Rubinius
called MSpec (for mini-RSpec, as it was originally needed because
the code in RSpec was too complex to be run on our
not-yet-complete Ruby implementation.)
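To give a feel for the spec format, here is a small spec in the
describe/it/should style. So that it runs without MSpec or RSpec,
it is preceded by a tiny stand-in: the MiniSpec module and
everything in it are hypothetical and far simpler than the real
tools. Only the final describe block resembles an actual spec.

```ruby
# Minimal stand-in for the describe/it/should vocabulary.
# MiniSpec is hypothetical; it is neither MSpec nor RSpec.
module MiniSpec
  RESULTS = []

  class Expectation
    def initialize(actual)
      @actual = actual
    end

    # `x.should == y` parses as `x.should.==(y)`.
    def ==(expected)
      unless @actual == expected
        raise "expected #{expected.inspect}, got #{@actual.inspect}"
      end
      true
    end
  end

  def self.describe(subject)
    @subject = subject
    yield
  end

  def self.it(behaviour)
    yield
    RESULTS << [@subject, behaviour, :passed]
  rescue RuntimeError => e
    RESULTS << [@subject, behaviour, :failed, e.message]
  end
end

class Object
  def should
    MiniSpec::Expectation.new(self)
  end
end

def describe(subject, &block)
  MiniSpec.describe(subject, &block)
end

def it(behaviour, &block)
  MiniSpec.it(behaviour, &block)
end

# The spec itself: one describe block per method, one it block
# per aspect of its behaviour.
describe "Array#length" do
  it "returns the number of elements" do
    [1, 2, 3].length.should == 3
  end

  it "returns 0 for an empty array" do
    [].length.should == 0
  end
end
```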
Specs live in the spec/ directory. spec/ruby/ specifies our
current target implementation, Ruby 1.8.6-p111, and it is further
split into various subdirectories such as language/ for
language-level constructs such as, for example, the +if+
statement and core/ for Core library code such as +Array+.
Parallel to this the top-level spec/ directory itself has the
subdirectories for Rubinius-specific specs: additions and/or
deviations from the standard, Rubinius language constructs etc.
For example, the standard +String+ specs live under the
spec/ruby/1.8/core/string/ directory and if Rubinius implements
an additional method +String#to_morse+, the specs for it can be
found in spec/core/string/. Completely new classes such as
+CompiledMethod+ find their specs here as well.
The way to run the specs is contained in two small programs:
bin/mspec and bin/ci. The former is the "full" version that
allows a wider range of options and the latter is a streamlined
way of running Continuous Integration (CI) testing. CI is a set
of "known-good" specs picked out from the entirety of them (which
is what bin/mspec works with) using an automatic exclusion
mechanism. CI is very important for any Rubinius developer:
before each commit, bin/ci should be run and found to finish
without error. It makes it very easy to ensure that your change
did not break other, seemingly unrelated things because it
exercises all areas of specs. A clean bin/ci run gives confidence
that your code is correct.
For a deeper overview, tutorials, help and other information
about Rubinius' specs, start here:
http://rubinius.lighthouseapp.com/projects/5089/the-rubinius-specs
== Libraries and C++ Primitives vs. FFI
There are two ways to "drop to C" in Rubinius. Firstly, primitives
are special instructions that are specifically defined in the VM.
In general they are operations that are impossible to do in the
Ruby layer such as opening a file. Primitives should be used to
access the functionality of the VM from inside Ruby.
FFI or Foreign Function Interface, on the other hand, is meant as
a generalised method of accessing system libraries. FFI is able to
automatically generate the bridge code needed to call out to some
library and get the result back into Ruby. FFI works at runtime
by generating real machine code, so nothing needs to be compiled
beforehand. FFI should be used to access code outside of
Rubinius, whether that is a system library or some type of
extension code.
There is also a specific Rubinius extension layer called Subtend.
It emulates the extension interface of Ruby to allow old Ruby
extensions to work with Rubinius.
=== Primitives
Using the above rationale, if you need to implement a primitive:
* Give the primitive a sane name
* Implement the primitive in the appropriate C++ class and wire it
up using the name you chose as described above in the VM section.
* Run `rake build`
See the above VM section on primitives.
=== FFI
Module#attach_function allows a C function to be called from Ruby
code using FFI.
Module#attach_function takes the C function name, the Ruby module
function to bind it to, the C argument types, and the C return
type. For a list of C argument types, see
kernel/platform/ffi.rb.
Currently, FFI does not support C functions with more than 6
arguments.
When the C function will be filling in a String, be sure the Ruby
String is large enough. For the C function rbx_Digest_MD5_Finish,
the digest string is allocated with a 16 character length. The
string is passed to md5_finish which calls rbx_Digest_MD5_Finish
which fills in the string with the digest.
class Digest::MD5
  attach_function nil, 'rbx_Digest_MD5_Finish', :md5_finish,
                  [:pointer, :string], :void
  def finish
    digest = ' ' * 16
    self.class.md5_finish @context, digest
    digest
  end
end
For a complete additional example, see digest/md5.rb.
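The reason for preallocating the string can be shown without any
C code. In this sketch, fake_md5_finish is a hypothetical
pure-Ruby stand-in for a C function that writes into a buffer
supplied by the caller; it is not the real binding, and the bytes
it writes are placeholders rather than a digest.

```ruby
# Sketch only: fake_md5_finish imitates a C function that writes
# its result into a caller-supplied buffer. It is hypothetical
# and does not compute a real digest.
DIGEST_LENGTH = 16

def fake_md5_finish(buffer)
  # The callee only overwrites bytes that already exist, so in
  # this sketch a short buffer is reported as a caller error.
  if buffer.bytesize < DIGEST_LENGTH
    raise ArgumentError, "buffer too small"
  end
  DIGEST_LENGTH.times { |i| buffer.setbyte(i, i) } # placeholder bytes
  buffer
end

digest = ' ' * DIGEST_LENGTH # preallocate, as Digest::MD5#finish does
fake_md5_finish(digest)
```

A real C function would not raise on a short buffer; it would
simply write past the end, which is why the caller has to get the
length right.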
== Debugging: debugger, GDB, valgrind
With Rubinius, there are two distinct things that may need
debugging (sometimes at the same time.) There is the Ruby code,
for which 'debugger' exists. debugger is a full-speed debugger,
which means that there is no extra compilation or flags to enable
it but at the same time, code normally does not suffer a
performance penalty from the infrastructure. This is achieved
using a combination of bytecode substitution and Rubinius'
Channel IO interface. Multithreaded debugging is supported
(credit for the debugger goes to Adam Gardiner.)
On the C side, the trusty workhorse is the GNU Debugger or GDB.
In addition there is support built in for Valgrind, a memory
checker/lint/debugger/analyzer hybrid.
=== debugger
The nonchalantly named debugger is specifically the debugger for
Ruby code, although it does also allow examining the VM as it
runs. The easiest way to start it is to insert either a
+breakpoint+ or +debugger+ method call anywhere in your source
code. Upon running this method, the debugger starts up and awaits
your command at the instruction where the +breakpoint+ or
+debugger+ method used to be. For a full explanation of the
debugger, refer to [currently the source but hopefully docs
shortly.] You will see this prompt and there is a trusty command
you can try to get started:
rbx:debug> help
=== GDB
To really be able to use GDB, make sure that you build Rubinius
with DEV=1 set. This disables optimisations and adds debugging
symbols.
There are two ways to access GDB for Rubinius. You can simply
run shotgun/rubinius with gdb (use the builtin support so you
do not need to worry about linking etc.):
* Run `shotgun/rubinius --gdb`, place a breakpoint (break main,
for example) and then r(un.)
* Alternatively, you can run and then hit ^C to interrupt.
You can also drop into GDB from Ruby code with +Kernel#yield_gdb+
which uses a rather rude but very effective method of stopping
execution to start up GDB. To continue past the +yield_gdb+,
j(ump) to one line after the line that you have stopped on.
Useful gdb commands and functions (remember, using the p(rint)
command in GDB you can access pretty much any C function in
Rubinius. Also see ./.gdbinit):
rbt::
Prints the backtrace of the Ruby side of things. Use this in
conjunction with gdb's own bt which shows the C backtrace.
rp some_obj::
Prints detailed information about a given Ruby object.
rps some_obj::
Prints brief information about a given Ruby object.
The gdb rp and rps commands use the C-exported functions __show__
and __show_simple__. The default output of these functions is
defined on the TypeInfo class. The other C++ classes define the show
and show_simple functions to override the default output. The
default output is to show the class name and the address of the
instance like "#<SomeClass:0x3490301>".
The immediate classes like NilClass, TrueClass, etc. and the numeric
classes like Fixnum and Bignum define show and show_simple to both
output their values. More complex classes like CompiledMethod and
Tuple use the show_simple function to limit the information printed.
See also vm/type_info.hpp.
=== Valgrind
Valgrind is a program for debugging, profiling and memory-checking
programs. The invocation is just `shotgun/rubinius --valgrind`.
See http://valgrind.org for usage information.
=== Probes
The C++ VM uses the TaskProbe class to define and manage several
probes that, when enabled, print information about VM operations.
The probes can be enabled from an environmental variable or from
Ruby code. To enable all probes from the environment, use either
of the following:
PROBE=all vm/vm
PROBE=1 vm/vm
To enable only certain probes from the environment, use e.g.:
PROBE=start_method,execute_instruction vm/vm
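The rules for the PROBE variable can be summarised in a few lines
of Ruby. This is a sketch of the behaviour described here, not
the VM's actual parsing code: probes_from_env and KNOWN_PROBES
are hypothetical, KNOWN_PROBES lists only the two probe names
mentioned in this document, and ignoring unknown names is an
assumption.

```ruby
# Sketch only: probes_from_env and KNOWN_PROBES are hypothetical.
# The real parsing lives in the C++ VM and may differ.
KNOWN_PROBES = [:start_method, :execute_instruction].freeze

def probes_from_env(value, known = KNOWN_PROBES)
  # Unset or empty: no probes.
  return [] if value.nil? || value.empty?
  # "all" and "1" both enable every probe.
  return known.dup if value == "all" || value == "1"
  # Otherwise a comma-separated list of names; unknown names are
  # dropped (an assumption made for this sketch).
  value.split(",").map { |name| name.strip.to_sym } & known
end
```

For example, probes_from_env(ENV["PROBE"]) would return the list
of probes requested for the current process.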
From Ruby code, enable or disable probes globally with the TaskProbe
class method. For example:
TaskProbe.enable :execute_instruction
TaskProbe.disable :start_method
Or, create a TaskProbe instance and use the #show method with a
block:
meth = TaskProbe.new :start_method
meth.show do
  # some ruby code here
end
See also kernel/common/taskprobe.rb and vm/builtin/taskprobe.hpp.