Skip to content
This repository has been archived by the owner on Mar 26, 2023. It is now read-only.

Latest commit

 

History

History
654 lines (504 loc) · 25.4 KB

persistence-api.rdoc

File metadata and controls

654 lines (504 loc) · 25.4 KB

Proposed MagLev Persistence API

This document is a work in progress and GemStone is looking for feedback on how to improve the persistence API.

Overview

This document provides a brief overview of the MagLev persistence models and API for MagLev Alpha Developers. For more detail see:

  • The Persistence MSpecs (in src/test/persistence/) specify detailed semantics of the persistence methods. Comments outline use cases and provide additional insight. Note: these MSpecs are not yet runnable.

  • The persistence API

  • The GemStone/S 64 Bit Programming Guide: provides a full discussion of the GemStone persistence model (including transactions, locking, etc.), maglev.github.com/docs/further_reading.html

Synopsis

Out of the box, MagLev provides two persistence models:

  1. MRI Compatibility (default): If you take no explicit action in your code, then you get no MagLev native object persistence. Your ruby programs behave just as they do under MRI.

  2. MagLev Persistence: To take advantage of MagLev’s native object persistence requires explicit action, either adding directives to your code, or starting the VM in persistence mode.

In the MRI Compatibility model, nothing is saved to the MagLev object repository, and MagLev behaves like standard MRI Ruby. The only difference is your code will execute in MagLev’s stable, high performance VM. This model provides the simplest mechanism for porting existing apps to the MagLev VM. Persistence is available through MySQL or other traditional mechanism. However, for optimum benefit and performance from MagLev, you should use MagLev’s native object persistence for both code and data.

In the MagLev Persistence model, an object is persisted by:

  • Ensuring that its class and all ancestors (super classes and mixed in modules) are persistable.

  • Attaching the object to a persistent root (e.g., Maglev::PERSISTENT_ROOT)

  • Calling Maglev.commit_transaction

A Class is made persistable by either marking it explicitly, MyClass.maglev_persistable, or by having its initial class declaration run within a persistent block:

Maglev.persistent do
  class MyClass
    # ...
  end
end

The MagLev Persistence model does not affect the use of other, traditional Ruby persistence mechanisms such as Yaml, JSON, Ruby Marshaling or storing data in an RDBMS (e.g., through ActiveRecord or another ORM). All of these mechanisms will work in MagLev, just as they work in MRI.

The MagLev examples directory contains several examples of using MagLev persistence, including the GStore example, which provides a small persistence library based on the Ruby PStore API. Read the comments in examples/persistence/gstore/ for complete details.

If neither the MRI Compatibility model nor the MagLev Persistence model is appropriate for your application, you may create your own custom persistence model using the full GemStone/S API via the Smalltalk FFI wrapper classes. See docs/smalltalk_ffi or after running rake rdoc see the .rb files in lib/ruby/site_ruby/1.8/smalltalk.

Guiding Principles

The two principles that guided the MagLev persistence API are:

  1. Unmodified MRI code, when run on MagLev, runs just like in MRI.

  2. Explicit action is necessary to take advantage of MagLev persistence (MagLev will not surprise you by persisting objects you didn’t request).

Some of the other goals were:

  • For the first round of the API, provide the minimal set of persistence primitives necessary to persist objects and classes. GemStone is open to adding convenience methods in subsequent versions of the API.

  • By explicitly requesting MagLev persistence, the programmer acknowledges they understand and will deal with the effects of that decision.

  • Provide full access to the entire underlying GemStone persistence API for those who need it.

  • MagLev’s ruby code will provide enhancements over the analogous smalltalk API (e.g., commit_transaction throws an exception instead of returning false if the transaction fails).

Persistence Background: How Persistence works in the MagLev VM

This section provides some background on how the MagLev VM works in conjunction with the persistent object repository.

MagLev VMs and the Object Repository

In MagLev, ruby code is run in a process called the MagLev VM (virtual machine). Code (classes, methods, modules, etc.) and other objects may be transactionally and persistently stored in a MagLev object repository. At VM startup, a MagLev VM connects to a repository. Many MagLev VMs can share the same repository. All MagLev VMs that connect to a particular repository share a collection of classes and objects managed by that repository. All of the standard ruby classes (Object, Array, Kernel, Hash, etc.) are already defined and loaded into a MagLev repository, and are thus available immediately when a VM connects to the repository. Any user defined classes that have been previously committed to the repository are also available to all connected VMs. The MagLev VMs and repository may reside on the same machine, or may be distributed on a network.

The MagLev VM and repository are based on the GemStone/S 64-bit product, which is used in a number of mission critical applications to connect thousands of VMs, running thousands of transactions per second, connected to a repository managing terabytes of data. MagLev allows Ruby programs to take advantage of the the full GemStone/S built-in, shared, scalable, distributed cache and native object repository.

Persistence by Reachability

The Persistence model the MagLev VM provides is persistence by reachability. That means that if an_object is persisted, then all objects reachable from an_object (by following references held in instance variables) will also be persisted. MagLev provides a well known persistent object, a Ruby Hash Maglev::PERSISTENT_ROOT, that can be used as a persistent root. Since a Ruby Hash object can “reach” each of the keys and values stored in the Hash, all of those keys and values (and all of the objects they can reach) will be persisted at the next Maglev.commit_transaction.

Since every object has a reference to its class object and each class object has a reference to its superclass (and mixed in modules), in order to persist an object, you must also persist its class (and the whole ancestor chain as well). The VM maintains flags in each class and module that marks whether it (Module#maglev_persistable, Module#maglev_persistable?) and its instances (Class#maglev_instances_persistable=, Class#maglev_instances_persistable?) are allowed to be persisted. If you attempt to commit an object whose class is not marked persistable or not marked as having persistable instances, then the VM will raise an exception and the current transaction will remain uncommitted. All of the core classes (Object, Array, String, etc.) are already marked persistable.

The MagLev persistence API is therefore centered around how to mark classes and modules as persistable. Once that aspect of your program has been worked out, the management of persistence is straightforward. When a class marked persistable is actually committed to the repository, then the standard rules for persistence are applied (the transitive closure of objects reachable from that class object, is also persisted). Among the items reachable from a class are its methods, its singleton class (which holds the “class methods” and @bar class instance variables), its @@foo class variables, its class (Class or Module), its super class and any constants defined for that class. Note that instance variables for instances of the class are stored with the instance, not with the class.

Note: persisting a class does not imply that any instances are automatically persisted. To persist an instance of a persistable class, the instance must be explicitly attached to a persistent root by user code, and then committed to the repository.

MRI Compatibility: Just like MRI

When a new MagLev VM starts up, it initializes its set of classes, methods and objects from the MagLev object repository. The default repository shipped with MagLev has all the standard classes, modules, objects and variables that one finds in a fresh MRI instance.

All new methods, instances and constants defined by the code loaded into a fresh VM are local to that VM, and are available only while that VM is running – i.e. they will be lost when that VM exits. No other VM that attaches to the same MagLev repository will see modifications made by a VM using the default MRI Compatibility model. File loading, etc. is just like MRI.

Note: Even though a VM run in MRI compatibility mode will not see the effects of other VMs during its runtime, it will see changes by other VMs that were committed to the repository before the current VM started.

MagLev Persistence

To take advantage of the transactional, shared, distributed, cached, native MagLev persistence, the MagLev programmer has to take explicit action. To persist an object, its class must be persistent.

By default, MagLev runs in “auto-transaction” mode, which means that each VM is always in a transaction. If you are using the MRI Compatibility model, there are no visible consequences of this. If you using the MagLev Persistence model, then you must explicitly commit the current transaction to save your data. The VM will automatically start a new transaction after a commit or abort.

When you use the explicit MagLev Persistence model, you’ve entered a repository based view of your program (the primary view of the code lives in the repository, not your files). Since, in the default MRI model, nothing is committed to the repository, the state of the VM is reflected closely by the state of your .rb files. But in MagLev Persistence model, as you begin to commit items into the repository (and other VMs also commit to the repository), the state of the code may drift away from the state of the .rb files. This is not a problem, just something to keep in mind.

  • MRI is file based

  • MagLev is repository based

Persisting Classes

There are two ways to use the MagLev persistence API to mark classes for persistence:

  • Use of Maglev.persistent and Maglev.transient

  • Use of Module#maglev_persistable

Maglev.persistent and Maglev.transient Basics

Maglev.persistent and Maglev.transient provide a mechanism (ruby blocks) to mark sections of your code as containing persistable or non-persistable modifications to classes and modules. This, combined with starting the VM in either persistent or transient mode, provide complete control over class and module persistence.

If the VM is started in the default transient mode, then unless action is taken, all class definitions are transient (the classes will have their persistable bit set to false, and no instances of those classes will be committable to the repository). To mark a class as persistable, do the following:

# The VM is started in the default, transient mode

# First definition of the Transient class.
#
# This class is outside of a Maglev.persistent block, so it will not be
# marked as persistable, and it and its instances will not be
# committable to the repository
class Transient
  # ...
end

# First definition of the Persistent class
#
# This class has its initial definition executed in a persistent block,
# so it will be marked as persistable,
Maglev.persistent do
  class Persistent
    # ...
  end
end

Maglev.commit_transaction

If you have your code factored into different files, then you can require them like this:

# The VM is started in the default, transient mode
require 'some_lib'

# Load the class definitions for persistable classes
Maglev.persistent do
  require 'foo'
  require 'bar'
end

# This wrapping is not required, but is allowed, and protects the
# intent, even if someone starts the VM in persistent mode.
Maglev.transient do
  require 'some_other_lib'
end

The class definitions in foo.rb and bar.rb, including any files they require or load, will be executed in persistent mode. The class definitions in some_lib.rb and some_other_lib.rb, and any files they load or require, will be executed in transient mode.

Note: it is safe to call either Maglev.persistent or Maglev.transient regardless of the mode the VM is in. There are more detailed uses cases that explore finer grained control over class persistence later in this document.

To query the current state of the VM, use Maglev.persistent? or Maglev.transient?.

Threads and persistence

The current persistence mode (transient or persistent), is a thread local variable. I.e., one thread can be running in persistent mode (all changes to persistable objects made by that thread are candidates for persistence) at the same time another thread can be running in transient mode (all changes to objects, persistable or otherwise, will not be committed to the repository).

Currently, each thread starts in transient mode. In the future, there may be flags to the VM to control whether threads start off in transient or persistent mode.

Module#maglev_persistable Basics

The other mechanism to control persistence is to explicitly set the persistable flag on individual classes and modules. MyClass.persistable marks the class as persistable, so that instances of the class may be persisted. Additionally, it stages for commit, the class and all of its constants, (class) instance variables, class variables, methods and class methods. Finally, the constant naming the class, if appropriate, is persisted within the appropriate namespace, if that namespace is persistent.

A class may be persisted by issuing the following command:

MyClass.persistable          # stage the class for persistence
Maglev.commit_transaction    # commit the repository

In order to successfully persist a class, all of the classes in its superclass chain (including mixed in modules) must also be persisted.

The next time a MagLev VM starts up and connects to the repository, the VM will see the last committed state of the persistable classes. Any VM already connected to the repository will see the new class (or modifications) the next time it does a Maglev.commit_transaction or a Maglev.abort_transaction.

Singleton classes

The MagLev VM marks all singleton classes as persistable, and marks them to allow persistable instances. Singleton classes, and their persistent method dictionaries, are automatically persisted by reachability if their object is persisted.

Note: only singleton methods defined in a persistent block are persisted. E.g., suppose demo.rb has:

# demo.rb
s1 = "hello"
s2 = "goodbye"

Maglev::PERSISTENT_ROOT[:s1] = s1
Maglev::PERSISTENT_ROOT[:s2] = s2

def s1.foo
  puts "foo called on #{self}"
end

Maglev.persistent do
  def s2.foo
    puts "foo called on #{self}"
  end
end

Maglev.commit_transaction

At the commit_transaction, both s1 and s2 will have their singleton classes saved, but only s2 will have the #foo method defined (s1’s singleton class will have an empty persistent method dictionary):

$ maglev-ruby -e 'Maglev::PERSISTENT_ROOT[:s1].foo'
ERROR 2010, NoMethodError: undefined method `foo' for singleton (NoMethodError)

$ maglev-ruby -e 'Maglev::PERSISTENT_ROOT[:s2].foo'
foo called on goodbye

It’s possible that this behavior ends up be annoying. If experience tells us that all singleton methods should automatically be saved, then we could implement that. Conversely, it might be that by default, no singleton class should be saved. Time will tell…

Re-Opening Classes to Achieve Finer Control

Once a class has been persisted, it may subsequently be re-opened. Only those occurrences of a re-opening that are done with the VM in persistent mode will have their modifications persisted.

A short example:

# Step 1:  First opening of the class
Maglev.persistent do
  class C
    A_CONST=1
    def initialize
      ...
    end
  end
end

Maglev.commit_transaction   # commit the class
...

# Step 2
#
# This re-opening of the class is not within the scope of a call to
# Maglev.persistent, so none of these changes will be staged for
# persistence.  The current VM will be the only VM to see
# A_NON_PERSISTENT_CONST, a_non_persistent_method and
# an_ambiguous_method.
class C
  A_NON_PERSISTENT_CONST = 42
  def a_non_persistent_method
  end

  def an_ambiguous_method
  end

end

...

# Step 3
#
# This re-opening of the class *is* within the scope of a call to
# Maglev.persistent, so all of these changes will be staged for
# persistence.  This will stage A_SECOND_PERSISTENT_CONST,
# a_persistent_method, and an_ambiguous_method persistent for
# persistence, but will NOT stage the other items from Step 2 for
# persistence (i.e. A_NON_PERSISTENT_CONST and a_non_persistent_method
# are still local to the VM and non-persistent; an_ambiguous_method
# becomes persistent, with the definition from step 3).

Maglev.persistent do
  class C
    A_SECOND_PERSISTENT_CONST = 53

    def a_persistent_method
    end

    def an_ambiguous_method
    end
  end
end

Maglev.commit_transaction

The ability to make transient changes to persistent classes enables multiple VMs to run highly dynamic code without unnecessary commit conflicts. As an example Rails dynamically defines many methods through the method_missing hook (e.g., dynamically created find_by_* methods). These methods will be created as needed by individual VMs. If the class was marked as dirty due to these dynamic method definitions, then at commit time, there would be conflicts because several VMs would be trying to define the same methods on the fly. Rails is designed to run with these methods defined in this manner, so the MagLev VM allows them to be transiently defined (only the current VM will see them), and thus avoids unnecessary commit conflicts.

Persisting “Data”, or Objects

To persist an object, its class must first be marked persistable, then, just follow the normal rules of MagLev persistence:

  • Attach the object to a persistent root

  • Call Maglev.commit_transaction

Persistent Root

MagLev provides the Maglev::PERSISTENT_ROOT Hash as the well-known root for persistent objects. An object does not need to connect directly to Maglev::PERSISTENT_ROOT, but just needs to be reachable from a persistent root.

Use Case: Persist a core object

Since all of the core Ruby classes, including Object, String etc. are already persistable, you can persist instances of those classes immediately:

# Persist an instance of a core object
Maglev::PERSISTENT_ROOT[:foo] = "Hi, I'm persistent"

# Make a non-persistent instance
$normal = "Hi, I'm just a regular, non-persistent string"

# Commit to make permanent
Maglev.commit_transaction

# At this point, Maglev::PERSISTENT_ROOT[:foo] is saved in the repository
# and available to all VMs. $normal is, available only in this VM,
# since it is not attached to a persistent root (global variables are
# not persistent).

While it is possible to store individual objects directly into Maglev::PERSISTENT_ROOT, we expect that Maglev::PERSISTENT_ROOT will typically hold collections of objects, rather than individual objects, e.g., something like the following is possible:

# Create a persistent collection for Employee objects
Maglev::PERSISTENT_ROOT[:employees] = Array.new

Maglev.persistent do
  class Employee
    # ...

    # Override new to save instances
    def self.new(*args)
      instance = super
      instance.initialize(*args)
      Maglev::PERSISTENT_ROOT[:employees] << instance
      instance
    end

  end
end

Maglev.commit_transaction

Real world programs may be more selective in what they save…

Use Case: Calling Maglev.abort_transaction

A Maglev.abort_transaction will set the value of all persistent objects, including the state of Maglev::PERSISTENT_ROOT to the current, committed state of the repository – i.e. it will erase local changes to persistent values, but leave local changes to local values alone. Changes made to shared data / classes from other VMs before this abort will be made visible to this VM.

# Assume Maglev::PERSISTENT_ROOT[:maybe] is nil:
Maglev::PERSISTENT_ROOT[:maybe]  # => nil

# Stage an object for persistence
Maglev::PERSISTENT_ROOT[:maybe] = "I want to be persistent...but..."

# create a local variable that will be unaffected by the
# abort_transaction:
$clueless = "Yup"

Maglev.abort_transaction

# At this point, the state of Maglev::PERSISTENT_ROOT is reset to the
# default value but local variables are unaffected

Maglev::PERSISTENT_ROOT[:maybe] # => nil
$clueless # => "Yup"
Use Case: Persist a user defined object

To persist an object from a user defined class, you must first persist the class, and then commit the object:

# Define a class
class Foo
   # stuff
end

# Mark the class as persistable
Foo.persistable

# Create a space in PERSISTENT_ROOT to hold persistent Foo objects:
Maglev::PERSISTENT_ROOT[:my_favorite_foos] = Array.new

# Connect a Foo instance to a persistent root
Maglev::PERSISTENT_ROOT[:my_favorite_foos] << Foo.new

Maglev.commit_transaction  # commit class Foo and the instance f.

At this point, all MagLev VMs connected to the repository will see the new class Foo, the new array in PERSISTENT_ROOT[:my_favorite_foos] and the single Foo object, contained in PERSISTENT_ROOT[:my_favorite_foos] array. All new MagLev VMs that connect to the repository will also see these objects.

Global variables

In a previous version of the MagLev persistence model, global variables, e.g, $hat, were automatically persisted. That is no longer the case. Global variables are global to the VM, but not shared between VMs. Global variables are allowed to have persistent objects as their value, however.

Create Your Own Persistence Model

If none of the preceding options meets your requirements, MagLev provides access to the full underlying GemStone/S API (transactions, locking, instance migrations, reduced conflict collections, etc.). See docs/smalltalk_ffi for more information. This API has proven to be robust and scalable enough for real world applications to run thousands of VMs, thousands of transactions per second and manage terabytes of data.

Persistence API Summary

Most of the persistence API is defined under the Maglev module. But there are a few persistence related methods that are on other classes as well. This section gathers together the whole lot. See the detailed comments in the API documentation.

module Maglev
  # MaglevException                  # defined in GlobalErrors.rb
  # NotPersistableException          # defined in GlobalErrors.rb
  # OutsideOfTransactionException    # defined in GlobalErrors.rb
  # CommitFailedException            # defined in GlobalErrors.rb

  # The root for persistent objects
  PERSISTENT_ROOT = Hash.new

  # Execute block with the thread in transient mode
  def transient(&block)
  end

  # Execute block with the thread in persistent mode
  def persistent(persistable_instances=true, &block)
  end

  # Returns true if the current thread is in persistent mode.
  def persistent?
    RubyContext.persistence_mode
  end

  # Returns true if the current thread is in transient mode.
  def transient?
    not RubyContext.persistence_mode
  end

  module_function :transient, :persistent, :transient?, :persistent?

  class CommitFailedException
  end

  # Commit the current state of the VM to the repository
  def commit_transaction
  end

  # Refresh all persistent state from the current state of the
  # repository.  Transient state is left untouched.
  def abort_transaction
  end

  module_function :commit_transaction, :abort_transaction
end

class Object
  # returns true if the receiver existed in GemStone at the time the
  # current transaction began.  Returns false otherwise.
  primitive_nobridge 'committed?', 'isCommitted'

  # Returns an Array of objects in the current session's temporary object
  # memory that reference the receiver.  The search continues until all
  # such objects have been found.  The result may contain both permenent
  # and temporary objects and may vary from run to run.  Does not abort
  # the current transaction.
  primitive_nobridge 'find_references_in_memory', 'findReferencesInMemory'
end

class Module
  # Sets maglev_persistable bit on the class/module to true
  def maglev_persistable
  end

  # Returns true if receiver is marked as persistable; false otherwise.
  primitive_nobridge 'maglev_persistable?', '_persistable'
end

class Class
  # Returns +true+ if instances of receiver are allowed to be
  # persisted. Returns +false+ otherwise.
  primitive_nobridge 'maglev_instances_persistable?', '_instancesPersistent'
end