A comet server that will (hopefully) be able to handle 1 million (eg mega) connections.
C Ruby Shell
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
testing
.gitignore
config.h
khash.h
klib docs.txt
klist.h
makefile
megacomet.c
megahash.html
megamanager.c
megastart.c
readme
start
status
stop
testfauxapp.rb
testfauxworker.rb

readme

MegaComet
---------

"Mega- (symbol M) is an prefix in the metric system denoting a factor of million"

This is my (ambitious!) attempt to create a comet server that can handle 1 million concurrent connections on an Amazon EC2 'large' server.

It requires libev to compile. So far, only being developed on my Mac. When it gets good enough, i'll make it work on an EC2 box.

https://github.com/chrishulbert/MegaComet
http://software.schmorp.de/pkg/libev.html

Architecture
------------

My ideas are:
* Simple = scalable ("no code is faster than no code")
* Scale through processes, not threads (like node.js - although i have to admit this is largely for simplicity's sake too)
* Bend the rules of HTTP a bit, for the sake of speed (eg no 'date' header returned)
* Best-effort delivery of messages. Eg this is to be appropriate for real time chat and channel changing, not appropriate for financial transactions. Engineering is all about trade-offs.
* etc... (you get the picture: hacky solutions over correctness)

In order to do this, i picture the following on an EC2 'large' server:

* 8 instances of MegaComet workers running, each listening on a different port, each of which has ~100k connections active.
* TO TEST: At this stage, 8 is a guesstimate for the 'best' number of workers. Maybe make it configurable? Would 128 be best, so we could keep each client below the C10K barrier?
* One instance of MegaComet manager running. This is responsible for remembering which client is connected to which worker.
* Clients will randomly long-poll (JSONP) directly to one of the 8 workers. Upon connecting, the worker tells the manager it has that client. Upon non-normal (only) port closing, the worker tells the manager that client is gone.
* Manager queues messages. Whenever a worker says a client has connected, it sends it any messages queued. Whenever a message arrives, it checks to see if that client is connected already, and if so sends it straight through. If the client isn't connected, the message is queued for a configurable time (default 1 minute).
* Manager > Client to communicate using simple fixed-width-fields binary protocol
* Your app > Manager to communicate using JSON
* Manager not necessarily written in C ? Something simple eg C# or Java or Python or Ruby ?
* TO TEST: Will the single server become a bottleneck? What if we allowed >1 ?

* TODO idea: rather than have the megamanager keep track of which client is connected to which worker, should we have a reproducible hash of the client id which always determines the worker? Is this necessary?

Manager protocol
----------------

Workers are expected to connect to the manager and specify their worker number:
1 n
	Where N is the worker number (single byte)
When the app wants to send a message to the manager to send to the appropriate worker:
2 c m
	Where c is the client identifier. This is a null terminated ascii string (or utf8).
	Cannot contain a '.' for simplicity of parsing the HTTP messages.
	TODO will null termination work with utf8?
	m is the message, as a null terminated ascii/utf8 string.