Skip to content
Fetching contributors…
Cannot retrieve contributors at this time
227 lines (151 sloc) 8.01 KB

Notes on setting up a Multi-Machine MagLev environment

Overview

Even though most MagLev development is done with a single MagLev VM talking to a MagLev repository on the same machine, MagLev is a distributed system and allows MagLev VMs from multiple machines to connect to the same repository. This note describes how to configure that scenario.

Synopsis

Setup the Repository machine and start MagLev

  1. Ensure netldi is properly configured

    On both the stone machine and the client VM machines, make sure there is a proper entry in /etc/services (or the YP services database, if you use that) on all machines, and that the entry is the same on all machines. The default entry is:

    gs64ldi         50378/tcp        # GemStone netldi

    but you can name your netldi something else, e.g.,

    mynetldi        10000/tcp        # My MagLev netldi

    The important thing is that whichever netldi you name and ports you choose, that entry is the same on all machines, and should be available via the services database on all machines that will participate in the cluster.

  2. Start netldi on stone machine

    Start the netldi daemon on the stone machine, to allow MagLev VMs on remote machines to connect to this Repository. The easiest way to start a netldi with the proper parameters and environment for MagLev is to use the rake script:

    $ rake netldi:start

    See the detailed discussion near the end for details if this doesn't work for you.

  3. Start the stone

    The stone must be started with the gs64ldi environment variable set correctly. If you chose the default netldi name, you can simply start the stone in the normal manner:

    $ cd $MAGLEV_HOME
    $ rake maglev:start

    If you chose a different netldi name, you must pass that name to the maglev:start rake script:

    $ cd $MAGLEV_HOME
    $ rake maglev:start[mynetldi]

Setup remote MagLev VM machine and start MagLev VM

  1. Setup environment on remote VM machine

    Ensure that MAGLEV_HOME is set and correct and that the netldi entry in /etc/services is the same as it is on the Repository machine. If you are using a non-standard netldi name, then:

    $ export MAGLEV_HOME=/where/ever/maglev/home/is
    $ export gs64ldi=mynetldi                # same name as on stone machine

    Make sure maglev-ruby is on your path:

    $ PATH=$PATH:$MAGLEV_HOME/bin
  2. Setup login credentials

    Make sure there is a ~/.netrc entry for the user / machine / password of the stone machine. E.g., if the stone will be running on maglev.foo.com, and the user is “fred” with password “derf”, the following ~fred/.netrc file should work:

    machine maglev.foo.com fred derf

    Make sure ~fred/.netrc has correct permissions (only the user can read it):

    $ chmod 0400 ~/.netrc
  3. Start netldi on the remote VM machine:

    If you use standard settings, then simply start netldi as on the stone machine:

    $ rake netldi:start
  4. Run MagLev with proper parameters:

    You do NOT need to start a stone on the remote machine. To run a ruby script with MagLev connecting to a remote machine, you can do:

    $ maglev-ruby --stonehost maglev.foo.com hello.rb

    The --stonehost tells MagLev to connect to the stone running on maglev.foo.com.

Issues / Problems

Quick checklist

If you run into problems getting a remote MagLev to run, ensure the following items:

  1. All machines have the same port specified for netldi in the services database.

  2. netldi processes are running on all machines involved.

netldi issues

Getting the two netldi processes talking and configured correctly is the most sensitive part of this process. Several things all need coordination.

Ensure GEMSTONE_GLOBAL_DIR is set

The GemStone/S system relies on a directory, $GEMSTONE_GLOBAL_DIR, to coordinate multiple processes. Normally, the maglev-ruby and Rake scripts handle this for you. But if you are having trouble connecting, you might want to start netldi processes on both machines with the proper configuration as described below.

Assuming your $MAGLEV_HOME environment variable is correct:

$ unset GEMSTONE_NRS_ALL                  # precautionary
$ export GEMSTONE=$MAGLEV_HOME/gemstone
$ export GEMSTONE_GLOBAL_DIR=$MAGLEV_HOME

If you are using a non-standard netldi name/port, then:

$ export gs64ldi=mynetldi

After setting the environment as above, you can run the startnetldi command (on both stone and remote nodes):

$ $GEMSTONE/bin/startnetldi -g -a$USER -d ${gs64ldi:-gs64ldi}

$USER is the login name of the user running maglev. There needs to be a proper ~$USER/.netrc file. The netldi on this machine will then start any processes as $USER. This allows different users to run the stone on the server and a VM on the client.

The meaning of the options is:

-g        Guest mode: Allow clients of any user to login to GemStone (may
          not be necessary in all cases)
-a $USER  Run netldi as user $USER
-d        Print debug info
${gs64ldi:-gs64ldi}  The name of the netldi (must match entry in /etc/services)

For further details on connecting remote machines, consult the GS64 System Administration Guide, especially the section “How to Set Up a Remote Session”.

Issues with Shared Page Cache Size

The size of the remote shared page cache (the one that runs on the node you issued the “maglev-ruby” command from), may have to be changed. By default, it will start off with the size of page cache configured for the stone machine, but that may be too big for the remote machine.

You can set a gem.conf file in the directory you run maglev-ruby from to configure the parameter

# Sample gem.conf file.  This sets page cache very small
SHR_PAGE_CACHE_SIZE_KB = 20000;

Then try running the maglev script:

$ maglev-ruby --stonehost maglev.foo.com hello.rb

NOTE: The stone caches the connection to the remote shared page cache for some period of time (up to a minute in many cases). So, if you are (a) having problems getting a good remote page cache, and (b) it looks like one is running on the remote machine, then kill it. To see if you have a remote page cache running, you can use rake from $MAGLEV_HOME:

$ cd $MAGLEV_HOME
$ rake
Status   Version    Owner    Pid   Port   Started     Type       Name
------- --------- --------- ----- ----- ------------ ------      ----
  OK    3.1.0.1   pbm       17461 50378 Oct 29 15:12 Netldi      gs64ldi
  OK    3.1.0.1   pbm       17679 49215 Oct 29 15:46 cache       maglev~ce60a142787a48e6

The second line in the output is the cache (notice there is no maglev stone running, since this is the remote machine!). To kill it:

$ kill 17679

And then confirm it is gone:

$ rake
Status   Version    Owner    Pid   Port   Started     Type       Name
------- --------- --------- ----- ----- ------------ ------      ----
  OK    3.1.0.1   pbm       17461 50378 Oct 29 15:12 Netldi      gs64ldi

Now you can re-configure and try again. In normal mode, it is a good thing the shared page cache sticks around. It avoids the cost of setting it up on subsequent runs of a MagLev VM. So in the normal case, you want to have it running for a while.

Issues with $GEMSTONE_NRS_ALL

If you are running on a heterogeneous setup (e.g., Linux stone machine and OSX VM machine), you may run into problems with some settings for $GEMSTONE_NRS_ALL. E.g., if you have:

export GEMSTONE_NRS_ALL=#dir:/Users/pmclain/GemStone/logs

you may run into problems, since /Users is correct for OSX, but on Linux, your home directory may be /home/pmcalin. If the setting for GEMSTONE_NRS_ALL is different, MagLev may complain and refuse to connect.

  • Need a nicer way than ~/.netrc to login

  • Can we synchronize the netldi ports by explicit parameters on each side? Or is the only way to rely on /etc/services entries to be synchronized?

Jump to Line
Something went wrong with that request. Please try again.