Pencil Code Deployment Scripts
A set of ansible playbooks for deploying the pencilcode service on a GCE cluster.
Install ansible, then this creates a GCE instance that will be used as the ansible master:
From that master, the following sets up the whole cluster:
    ansible-playbook provision-nfs.yml
    ansible-playbook provision-web.yml
    ansible-playbook web.yml
Right now these scripts are not parameterized enough to let you deploy your own pencilcode service (e.g., under a different DNS name). However, they document every step, so in principle they can be generalized.
The current architectural goal of PencilCode is to scale up to a few tens of thousands of simultaneous users without code changes, while retaining flexibility and simplicity.
We have made three key choices:
- Distribute traffic to N identical web servers with no user storage.
- Provision a single NFS server for all persistent user storage.
- Use domain-based hashing to proxy socket.io services between web servers.
Here is a general picture of what's going on.
            (users in the world)
                     |
    Google Compute Engine Load Balancer
              /      |      \
          web1,    web2,    (etc)
              \      |      /
             single nfs server
                /        \
           "/data"     "/backup"
In contrast, NFS is fast. On adequately provisioned hardware, a single NFS CPU can provide more than 100,000 operations per second (far in excess of our needs). The main bottleneck for NFS is not the server software, but the underlying storage: a 100GB SSD disk on Google Compute Engine provides about 3,000 operations per second (likely more than adequate for our needs), and that throughput can be increased linearly with size and cost.
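The linear-scaling claim above can be turned into a rough sizing calculation. The sketch below assumes the 30 operations per second per GB implied by the 100GB / 3,000 ops figure, and assumes scaling stays linear at every size; the function names are hypothetical, and actual GCE limits depend on disk type and are capped.

```python
import math

# Assumption: ratio implied by the figures above
# (about 3,000 ops/sec on a 100GB SSD disk).
OPS_PER_GB = 3000 / 100


def estimated_ops_per_second(disk_gb):
    """Estimate disk operations per second, assuming purely linear scaling."""
    return disk_gb * OPS_PER_GB


def disk_gb_needed(target_ops_per_second):
    """Smallest whole-GB disk size that meets a target operation rate."""
    return math.ceil(target_ops_per_second / OPS_PER_GB)


if __name__ == "__main__":
    print(estimated_ops_per_second(100))
    print(disk_gb_needed(10000))
```

Under these assumptions, tripling the target operation rate means provisioning roughly three times the disk, which is why storage size is the main knob for NFS performance here.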
There is one type of state on pencilcode that is not on disk: the connection state of our socket.io servers which provide realtime connections for student projects. Although every socket.io server is identical, it is important that all users trying to connect to the socket.io server for a specific subdomain be routed to the same server. To do this, we route socket.io traffic between web servers, proxying to the web server specific to the socket.io subdomain.
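The routing idea described above can be sketched in a few lines. This is not the deployed implementation: it is a minimal illustration that assumes a stable hash of the subdomain taken modulo the number of web servers, with hypothetical host names and a hypothetical function name.

```python
import hashlib


def socket_server_for(subdomain, web_servers):
    """Pick one web server for a subdomain, deterministically."""
    # Normalize so "Alice" and "alice" land on the same server.
    key = subdomain.strip().lower().encode("utf-8")
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so different web servers would disagree about the result.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(web_servers)
    return web_servers[index]


if __name__ == "__main__":
    servers = ["web1", "web2", "web3"]  # hypothetical host names
    print(socket_server_for("alice", servers))
```

Because every web server computes the same answer from the same server list, any of them can receive a socket.io request and proxy it to the right peer. One tradeoff of plain modulo hashing is that changing the number of servers remaps most subdomains.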
Web Server Services
All the application logic for Pencil Code runs on the web server instances. Running all application services other than storage on identical servers simplifies development and deployment.
Each web server runs a number of services behind an nginx server:
    port 80: nginx
     |
     +- serves static content out of several directories
     |
     +- /load, /save, etc go to the local 'pencils' (node.js) service
     |
     +- /img, /goto, etc go to the local 'uwsgi' (python) service
     |
     +- /socket.io is proxied to a remote 'pencilsock' service.

    port 8811: pencilsock
     |
     +- serves socket.io server (node.js)

    port 8816: pencils
     |
     +- serves /load, /save, /edit, /run, /code, /home, /print (node.js)

    socket /run/uwsgi/app/img: uwsgi
     |
     +- serves uwsgi /img (python)
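To make the dispatch rules above concrete, here is a small model of how a request path maps to a service. It is only an illustration, not the nginx configuration: the prefix lists come from the service list above, the "etc" entries are not enumerated in this document, and the exact matching rules (prefix plus "/") are an assumption.

```python
# Path prefixes taken from the service list above. The "etc" entries are not
# enumerated in this document, so these tuples are illustrative, not complete.
PENCILS_PREFIXES = ("/load", "/save", "/edit", "/run", "/code", "/home", "/print")
UWSGI_PREFIXES = ("/img", "/goto")
SOCKET_PREFIX = "/socket.io"


def _matches(path, prefix):
    """True if path is the prefix itself or lies beneath it."""
    return path == prefix or path.startswith(prefix + "/")


def backend_for(path):
    """Return the name of the service that would handle this path."""
    if _matches(path, SOCKET_PREFIX):
        return "pencilsock"  # proxied to a remote web server, port 8811
    if any(_matches(path, p) for p in PENCILS_PREFIXES):
        return "pencils"  # local node.js service, port 8816
    if any(_matches(path, p) for p in UWSGI_PREFIXES):
        return "uwsgi"  # local python service on a unix socket
    return "static"  # everything else is served from disk by nginx


if __name__ == "__main__":
    for example in ("/load/first", "/img/dog", "/socket.io/", "/index.html"):
        print(example, "->", backend_for(example))
```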
Each service is configured to start on boot and can be bounced with
    sudo service [nginx|pencilsock|pencils|uwsgi] restart
Static File Configuration
Although our servers run a single version of our server code, they serve several different versions of the browser code under different domains. For example, a "stable" version is served on "pencilcode.net"; a "staging" version is on "pencil.cc"; an "experimental" version is on "pencil.codes".
In addition, we serve several static content websites on pencilcode, such as the "gym", "blog", and "ref" subdomains.
Each fork of pencilcode and each static website is pulled from git. Our web servers have a 'source' user, which pulls several projects from GitHub into subdirectories of the /home/source directory.
A current listing of /home/source:
    aimate       # static website
    blog         # static website
    gym          # static website
    ref          # static website
    pencilcode   # the main website and the server code
    staging      # staging for content
    experiment   # experimental fork of content
    fish         # a github webhook server for automatic deployment
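The relationship between domains and checkouts can be summarized as a lookup. The sketch below is an assumption-laden illustration rather than the real nginx configuration: it assumes each domain is served from the like-named checkout described in the comments above, that the static subdomains exist only under pencilcode.net, and the function name is hypothetical.

```python
# Assumption: each domain is served from the checkout whose comment in the
# listing above matches its role (stable, staging, experimental).
DOMAIN_TO_CHECKOUT = {
    "pencilcode.net": "pencilcode",
    "pencil.cc": "staging",
    "pencil.codes": "experiment",
}

# Static content subdomains named in this document.
STATIC_SUBDOMAINS = {"gym", "blog", "ref"}


def checkout_for(host):
    """Return the /home/source subdirectory assumed to serve this host."""
    host = host.lower().rstrip(".")
    for domain, checkout in DOMAIN_TO_CHECKOUT.items():
        if host == domain:
            return checkout
        if host.endswith("." + domain):
            subdomain = host[: -len(domain) - 1]
            # Assumption: static sites live only under the stable domain.
            if domain == "pencilcode.net" and subdomain in STATIC_SUBDOMAINS:
                return subdomain
            # Any other subdomain (e.g. a user name) uses the domain's fork.
            return checkout
    return None  # not a host this sketch knows about


if __name__ == "__main__":
    for example in ("pencilcode.net", "gym.pencilcode.net", "alice.pencil.cc"):
        print(example, "->", checkout_for(example))
```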