Cloudname Codelab

stalehd edited this page Sep 11, 2012 · 1 revision

Cloudname is a system for resolving addresses (including ports) for processes in the cloud. Why is this useful? In the cloud, machines die and processes are moved around to maximize machine utilization, so the mapping from a process to its machine name and port has to be dynamic. A process (job) can be moved from one machine to another and end up with a different set of ports, because the ports it used previously might already be taken on the new machine. We want the resolving to be done dynamically, transparently, and automatically, without having to care about port conflicts.

After doing this codelab you will know:

  • The life cycle of a coordinate
  • The concept of an endpoint
  • How to create/delete a coordinate from a command-line tool
  • How to claim a coordinate from an application
  • How to resolve an endpoint from an application
  • How to list coordinates from a command-line tool
  • What ZooKeeper is doing

Prerequisite

You need a machine to compile and run some code. Make sure that Maven, Java, and Git are installed. Then run:

git clone git@github.com:Cloudname/cloudname.git
cd cloudname
mvn clean install

Running ZooKeeper

ZooKeeper is an open source infrastructure that helps us maintain the dynamic mapping from names to machines. It is the core serving component in Cloudname (think of it as a special kind of database/file system), but it is usually hidden from the user. Many instances work together to ensure availability in each cell (data center). In this codelab you will run your own local ZooKeeper. In real life, each data center will already have this running, so you don't need to care about it.

Make sure you are in the source directory. If you run ls (or dir), you should see at least the folders "cn", "log", and "flags". Then run the following command:

java -jar codelabs/target/SimpleZooKeeper.jar

You should see the message "Simple ZooKeeper running on port 5454".

It uses port 5454 by default. If this port is taken, you can override the port number by adding --zkport %port%, where %port% is a free port. Let this process run while you do the codelab, for example in its own terminal window. Restarting the process gives you a fresh start (all entries are deleted).

Creating a Coordinate

A coordinate typically points to a process (a job running on some machine). Before a server and a client can use a coordinate, it has to be registered manually. We have made a tool for doing this. Try running:

java -jar cn/target/ZkTool.jar  --help

You will see the following flags:

--coordinate <STRING> default: null  . . . . . .| The coordinate to work on.
*   --operation <ENUM> default: null options: [CREATE, DELETE, STATUS, LIST]| The operationFlag to do on coordinate.
--setup-file <STRING> default: null  . . . . . .| Path to file containing a list of coordinates to create (1 coordinate per line).
--zooKeeper <STRING> default: null  . . . . . . | A list of host:port for connecting to ZooKeeper.

*) The only required flag is operation.

We want to create a new coordinate. A coordinate has the following format:

%INSTANCE%.%SERVICE%.%USER%.%CELL%

%CELL% is the name of the data center; a coordinate lives in exactly one data center. %USER% is either PROD, STAGING, or the user name of the developer. %SERVICE% is the name of the service (process). %INSTANCE% is a number, starting at zero and counting upward. For example, a four-way sharded service should have the instances 0, 1, 2, and 3.
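To make the format concrete, here is a minimal sketch that splits a coordinate string into its four parts. CoordinateParts is a hypothetical helper written for this codelab, not part of the Cloudname API, and the validation rules (exactly four non-empty parts, a non-negative instance number) are assumptions drawn from the description above.

```java
// Hypothetical helper for illustration only; not part of the Cloudname API.
final class CoordinateParts {
    final int instance;
    final String service;
    final String user;
    final String cell;

    private CoordinateParts(int instance, String service, String user, String cell) {
        this.instance = instance;
        this.service = service;
        this.user = user;
        this.cell = cell;
    }

    /** Splits "INSTANCE.SERVICE.USER.CELL" into its four parts. */
    static CoordinateParts parse(String coordinate) {
        // The -1 limit keeps trailing empty strings, so "0.hello.somebody.aa." is rejected.
        String[] parts = coordinate.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected INSTANCE.SERVICE.USER.CELL, got: " + coordinate);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty part in coordinate: " + coordinate);
            }
        }
        // Integer.parseInt throws NumberFormatException, a kind of IllegalArgumentException.
        int instance = Integer.parseInt(parts[0]);
        if (instance < 0) {
            throw new IllegalArgumentException("Instance must not be negative: " + coordinate);
        }
        return new CoordinateParts(instance, parts[1], parts[2], parts[3]);
    }

    public static void main(String[] args) {
        CoordinateParts c = parse("0.hello.somebody.aa");
        System.out.println("instance=" + c.instance + " service=" + c.service
                + " user=" + c.user + " cell=" + c.cell);
    }
}
```

In application code you would use Coordinate.parse from Cloudname instead, as shown later in this codelab.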

Let's create coordinates for two instances (0 and 1) of a service named hello, owned by the user somebody. (Replace this with your real user name when accessing a real data center; for now we run our own ZooKeeper, so it doesn't matter.) Please note that the user name is not enforced; we trust the engineer for now. We assume the cell is aa.

java -jar cn/target/ZkTool.jar --operation CREATE --coordinate 0.hello.somebody.aa --zookeeper 127.0.0.1:5454
java -jar cn/target/ZkTool.jar --operation CREATE --coordinate 1.hello.somebody.aa --zookeeper 127.0.0.1:5454

Let's see what happened by listing the coordinates:

java -jar cn/target/ZkTool.jar --operation LIST --zookeeper 127.0.0.1:5454

You will see the entries:

0.hello.somebody.aa
1.hello.somebody.aa
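For a service with many instances, typing one CREATE command per coordinate gets tedious. The help output lists a --setup-file flag that takes a file with one coordinate per line. The sketch below generates such lines for a sharded service. CoordinateList is a hypothetical helper, and this codelab has not exercised --setup-file, so treat the file format as an assumption based only on the help text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Cloudname tools.
final class CoordinateList {
    /** Returns the coordinates 0..shards-1 for the given service, user, and cell. */
    static List<String> forShards(int shards, String service, String user, String cell) {
        if (shards < 1) {
            throw new IllegalArgumentException("Need at least one shard, got " + shards);
        }
        List<String> coordinates = new ArrayList<String>();
        for (int instance = 0; instance < shards; instance++) {
            coordinates.add(instance + "." + service + "." + user + "." + cell);
        }
        return coordinates;
    }

    public static void main(String[] args) {
        // Prints one coordinate per line; redirect the output to a file to keep it.
        for (String coordinate : forShards(4, "hello", "somebody", "aa")) {
            System.out.println(coordinate);
        }
    }
}
```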

We can now try to see if there are any endpoints and if status is set:

java -jar cn/target/ZkTool.jar --operation STATUS --coordinate 0.hello.somebody.aa --zookeeper 127.0.0.1:5454

You will probably see something like this:

Problems loading status, is service running? Error:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /cn/aa/somebody/hello/0/status

This is because there is no status node yet. A status node only exists while a process is running and registered in Cloudname.
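The path in the error message suggests how a coordinate maps to a ZooKeeper node: the parts appear in reverse order under /cn, followed by status. The sketch below reproduces that mapping. It is inferred only from the error message above, and StatusPath is a hypothetical helper, not Cloudname code.

```java
// Hypothetical helper for illustration only; the layout is inferred from the error message.
final class StatusPath {
    /** Maps "INSTANCE.SERVICE.USER.CELL" to "/cn/CELL/USER/SERVICE/INSTANCE/status". */
    static String forCoordinate(String coordinate) {
        String[] parts = coordinate.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected INSTANCE.SERVICE.USER.CELL, got: " + coordinate);
        }
        // The coordinate is most-specific-first; the path is least-specific-first.
        return "/cn/" + parts[3] + "/" + parts[2] + "/" + parts[1] + "/" + parts[0] + "/status";
    }

    public static void main(String[] args) {
        // Prints /cn/aa/somebody/hello/0/status
        System.out.println(forCoordinate("0.hello.somebody.aa"));
    }
}
```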

Deleting a coordinate is straightforward (use the DELETE operation with the coordinate). There is no ACL checking for now, so anybody can delete or break anything.

Using Cloudname

Let's build a service from scratch that uses a port: a simple web server. The idea is that many such servers can run at once, each with its own instance number. This is just a silly example, but in real life you often want several instances of a service.

package org.cloudname.codelabs;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import org.cloudname.testtools.Net;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
/**
 * A class that has a web server responding to /info. It has an instance number that it publishes.
 * @author dybdahl
 */
public class ServerExample {
    private int port;
    private int instance;
    /**
     * Constructor
     * @param instance number for this instance.
     */
    ServerExample(int instance) {
        this.instance = instance;
    }
    /**
     * Handler for HTTP requests on /info.
     */
    class InfoHandler implements HttpHandler {
        public void handle(HttpExchange t) throws IOException {
            // The request body is ignored; every request gets the same kind of answer.
            String response = String.format("Port %d, instance %d", port, instance);
            // The content length must be the number of bytes, not the number of characters.
            byte[] body = response.getBytes("UTF-8");
            t.sendResponseHeaders(200, body.length);
            OutputStream os = t.getResponseBody();
            os.write(body);
            os.close();
        }
    }
    /**
     * Method to set-up and start the web server.
     * @throws IOException
     */
    public void runServer() throws IOException {
        port = Net.getFreePort();
        System.err.println("I think that port " + port + " is free and will use it.");
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 41 /*backlog*/);
        server.createContext("/info", new InfoHandler());
        server.setExecutor(null);
        server.start();
    }
    /**
     * @param args The first and only argument is the instance number.
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        ServerExample server = new ServerExample(Integer.parseInt(args[0]));
        server.runServer();
    }
}

You can run this from the code you downloaded using:

java -jar codelabs/target/ServerExample.jar 0

You can open a web browser and go to http://127.0.0.1:%PORT%/info where %PORT% is the port the application reported.
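The server gets its port from Net.getFreePort() in Cloudname's test tools, whose implementation is not shown in this codelab. One common way to find a free port, offered here only as an assumption about how such a helper might work, is to bind a socket to port 0 and let the operating system choose. FreePort is a hypothetical class name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration only; not the Cloudname Net class.
final class FreePort {
    /** Asks the operating system for a port that is free right now. */
    static int find() {
        try {
            // Port 0 means "pick any free port".
            ServerSocket socket = new ServerSocket(0);
            try {
                return socket.getLocalPort();
            } finally {
                // Release the port so the caller can bind to it.
                socket.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Free port: " + find());
    }
}
```

Note that another process could grab the port between the close() call and the moment the server binds to it, which is why the server only says it thinks the port is free.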

We could have written the code so that you pass the port number to the application; that way we would know which port the server responds on. But if we wanted 100 instances running, we would have to maintain the mapping from instance to port, and the clients would need to know this mapping as well. If each process has 3 ports, that is a lot of ports to configure and keep track of. The problem gets worse when you run several processes on the same machine or when processes are moved to a different machine. Cloudname solves this port allocation problem by using coordinates to address a service. The server has to publish the port it is using. Let's add the necessary code.

Cloudname cloudName = new ZkCloudname.Builder().setConnectString("127.0.0.1:5454").build().connect();

Coordinate coordinate = Coordinate.parse(String.format("%s.hello.somebody.aa", instance));

ServiceHandle handle = cloudName.claim(coordinate);
handle.waitForCoordinateOkSeconds(1000);

Endpoint endpoint = new Endpoint(coordinate, "info", "127.0.0.1", port, "http", null);
handle.putEndpoint(endpoint);

What does this code do? The first line connects to your local ZooKeeper instance. Since this is a codelab we hardcode the address; in production you simply write .autoConnect() instead of .setConnectString(..). The second line creates the coordinate from the instance number. The third line claims the coordinate, and the line after it waits for the claim to go through. Then we create and register the endpoint. A single service (process) usually has a single coordinate, but it can have many endpoints (one for each port).

Run an instance of this server (and keep it running):

java -jar codelabs/target/ServerExampleCloudname.jar 0

Now check the status of the server:

java -jar cn/target/ZkTool.jar --operation STATUS --coordinate 0.hello.somebody.aa --zookeeper 127.0.0.1:5454

It is reported to be UNASSIGNED. This is because we did not make the server set its status. Add the following line to the code:

handle.setStatus(new ServiceStatus(ServiceState.RUNNING, "I am alive and kicking."));

If you ask for status again, you will see that it is alive.

Restarting the Server

If you kill the web-server (e.g. ctrl-c) and restart it very quickly, you will see the following error:

INFO: Claimed fail, node already exists and probably not by us, path: /cn/aa/somebody/hello/0/status

What happened? Cloudname keeps track of the servers that are running and only resolves servers that are alive. However, a server that has not replied for a second or two might still be alive, so its claim is kept for a short while. If you wait a few seconds, it is possible to start the server again.
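Instead of waiting by hand, a server could retry its claim a few times with a short pause in between. The sketch below shows only the retry loop. RetryClaim is a hypothetical helper, and the attempt is passed in as a BooleanSupplier because this codelab does not state how a failed claim is reported by the Cloudname API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of the Cloudname API.
final class RetryClaim {
    /**
     * Runs the attempt until it returns true or maxAttempts have been made,
     * sleeping delayMillis between attempts. Returns true on success.
     */
    static boolean retry(BooleanSupplier attempt, int maxAttempts, long delayMillis) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulates a stale claim that disappears before the third attempt.
        final int[] calls = {0};
        boolean claimed = retry(() -> ++calls[0] >= 3, 5, 1000L);
        System.out.println("claimed=" + claimed + " after " + calls[0] + " attempts");
    }
}
```

In a real server the supplier would wrap the claim call from the earlier snippet and return whether it succeeded.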

Restarting ZooKeeper

The servers will fail when you restart ZooKeeper since the coordinate no longer exists. Recreate the coordinate and the servers are up and running again (restart of servers not required).

Resolving from an Application

Now let's write some code that connects to the web server.

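import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.List;
// Cloudname imports; the package names are assumed from the cn module.
import org.cloudname.Cloudname;
import org.cloudname.CloudnameException;
import org.cloudname.Endpoint;
import org.cloudname.Resolver;
import org.cloudname.zk.ZkCloudname;

public class ClientExample {
    /**
     * Resolves the "info" endpoint of the hello service and prints what the server returns.
     * @param args The first and only argument is the instance number.
     */
    public static void main(String[] args) throws IOException, CloudnameException {
        Cloudname cloudName = new ZkCloudname.Builder().setConnectString("127.0.0.1:5454").build().connect();
        Resolver resolver = cloudName.getResolver();
        // The expression is the endpoint name followed by the coordinate.
        List<Endpoint> endpoints = resolver.resolve(String.format("info.%s.hello.somebody.aa", args[0]));
        if (endpoints.size() != 1) {
            System.err.println("Did not resolve endpoint correctly, something went wrong.");
            return;
        }
        String url = "http://" + endpoints.get(0).getHost() + ":" + endpoints.get(0).getPort() + "/info";
        InputStream is = new URL(url).openStream();
        try {
            DataInputStream dis = new DataInputStream(new BufferedInputStream(is));
            byte[] infoString = new byte[1000];
            // read() returns the number of bytes read; only that prefix of the buffer is valid.
            int length = Math.max(dis.read(infoString), 0);
            System.out.println("Got string from server: " + new String(infoString, 0, length));
        } finally {
            is.close();
        }
    }
}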

You can try to run this client:

java -jar codelabs/target/ClientExample.jar 0

This should print the string that it retrieves from the server.

Got string from server: Port 52331, instance 0

Now, try to run the following command:

java -jar codelabs/target/ClientExample.jar any

In this case Cloudname tries to find an instance that is running. Try starting a few instances, kill any of them, wait a few seconds, and run the client again: you will see that it connects to a running instance. Sweet?

"any" is an example of a resolving strategy. Another is "all", which gives a list of all instances. You can also make your own resolving strategy, but that is not part of this codelab.
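To illustrate what a resolving strategy does, here is a small in-memory sketch that selects from a list of running instances. StrategySketch and Target are hypothetical names and this is not Cloudname's resolver. In particular, picking a random instance for "any" is an assumption, since this codelab does not say how Cloudname chooses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch for illustration only; not Cloudname's resolver.
final class StrategySketch {
    /** Stand-in for the endpoint of one running instance. */
    static final class Target {
        final int instance;
        final String host;
        final int port;

        Target(int instance, String host, int port) {
            this.instance = instance;
            this.host = host;
            this.port = port;
        }
    }

    /**
     * Selects targets by selector: "all" returns every running instance,
     * "any" returns one of them (chosen at random here, which is an assumption),
     * and a number such as "0" returns only that instance.
     */
    static List<Target> select(String selector, List<Target> running, Random random) {
        if (selector.equals("all")) {
            return new ArrayList<Target>(running);
        }
        if (selector.equals("any")) {
            if (running.isEmpty()) {
                return Collections.emptyList();
            }
            return Collections.singletonList(running.get(random.nextInt(running.size())));
        }
        int wanted = Integer.parseInt(selector);
        List<Target> matches = new ArrayList<Target>();
        for (Target target : running) {
            if (target.instance == wanted) {
                matches.add(target);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Target> running = new ArrayList<Target>();
        running.add(new Target(0, "127.0.0.1", 52331));
        running.add(new Target(1, "127.0.0.1", 52340));
        for (Target target : select("all", running, new Random())) {
            System.out.println(target.instance + " -> " + target.host + ":" + target.port);
        }
    }
}
```

Because "any" only considers instances that are currently running, a client keeps working when one instance is killed, as long as at least one other instance is still up.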

Haakon / dybdahl@comoyo.com