Statically Served Galapagos Documentation

This page describes how the static Galapagos generator works and outlines the procedure for serving Tortoise statically from the Amazon CloudFront CDN. It shouldn't be too difficult to adapt the method of serving to fit other environments, such as an Apache/Nginx web server or another CDN. The code that this page documents is currently found on the static-site branch.

Architectural Overview

Galapagos-static depends on the Scala.js Tortoise artifact. It also depends on Play-Scrape, an sbt plugin for scraping Play applications to static files.

Play-Scrape is configured through the scrapeRoutes sbt setting, which looks like this:

scrapeRoutes ++= Seq("/create-standalone", "/tortoise", "/model/list.json", "/model/statuses.json", "/netlogo-engine.js", "/netlogo-agentmodel.js", "/netlogoweb.js")

These are the routes that are scraped. The task that performs the scrape is scrapePlay. After scrapePlay has run, target/play-scrape will contain the contents of the scrape, including all assets Play would use, copied to the appropriate paths within the directory. This directory could be zipped or tarred for deployment, or served as-is.

Note that the Play routes assume the application is located at the server root. It is possible to change this by setting the value of scrapeContext in your sbt build file, which will scope all routes and assets under the provided path. Example:

scrapeContext := "/sub/"

The scraper assumes that the asset paths are relative. To use absoluteURL, you will need to provide the scraper with a domain name, like this:

scrapeAbsoluteURL := Some("netlogoweb.org")

At the moment, all absolute URLs are http.
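Putting these settings together, a scraper configuration in build.sbt might look something like the following sketch; the context path and domain are the illustrative values from above, not necessarily what Galapagos uses:

// Sketch of a combined Play-Scrape configuration; the scrapeContext and
// scrapeAbsoluteURL values are illustrative, and both settings are optional.
scrapeRoutes ++= Seq(
  "/create-standalone",
  "/tortoise",
  "/model/list.json",
  "/model/statuses.json",
  "/netlogo-engine.js",
  "/netlogo-agentmodel.js",
  "/netlogoweb.js"
)

scrapeContext     := "/sub/"                 // scope all routes and assets under /sub/
scrapeAbsoluteURL := Some("netlogoweb.org")  // only needed when routes use absoluteURL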

Uploading Scraped Site

The scraped site can be uploaded to Amazon via the scrapeUpload command. This is configured within Galapagos in a way that could support multiple environments in the future, although at the moment there is only a single environment (production). See the Amazon CloudFront Upload Process section (below) for a more in-depth look at what happens behind the scenes.

In most cases, Jenkins will upload the master (staging) and production branches automatically. If, however, you need to deploy from your local machine, you will need to set up an Amazon credentials file. This file is located at ~/.aws/credentials and should have the format:

[<credentialname>]
aws_access_key_id=<accesskeyid>
aws_secret_access_key=<secretaccesskey>

[<credential2name>]
#...

Our build is set up to look for the credential name nlw-admin on your local machine.
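In other words, your local credentials file should contain a profile like this (the key values are placeholders, as in the template above):

[nlw-admin]
aws_access_key_id=<accesskeyid>
aws_secret_access_key=<secretaccesskey>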

Jenkins is configured to upload when the build succeeds on the master (staging) and production branches. Jenkins' Amazon credentials are specified as encrypted secrets within the Jenkins server. Note that BRANCH_NAME is set to the targeted branch for Jenkins PR builds, so we must also check that we are not building from a PR. It would be possible to deploy from multiple branches, assuming alternate environments had been set up: just add the branch name and Amazon environment IDs to the Galapagos build.sbt, along the lines of the sketch below.
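As a purely illustrative sketch (the actual build.sbt may be structured differently, and every name and placeholder value below is hypothetical), the branch-to-environment mapping and PR check could look roughly like this:

// Hypothetical sketch: map deployable branches to their Amazon environment IDs
// (S3 bucket, CloudFront distribution ID). All values are placeholders.
val deployEnvironments: Map[String, (String, String)] = Map(
  "master"     -> ("<staging-bucket>",    "<staging-distribution-id>"),
  "production" -> ("<production-bucket>", "<production-distribution-id>")
)

// BRANCH_NAME alone is not enough, since it is set to the targeted branch for
// Jenkins PR builds; CHANGE_ID is a Jenkins variable present only for PR builds.
def deployTarget: Option[(String, String)] =
  if (sys.env.contains("CHANGE_ID")) None
  else sys.env.get("BRANCH_NAME").flatMap(deployEnvironments.get)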

Testing Scrape or Upload Changes

There is a special scrape-test folder set up on experiments.netlogoweb.org. When you push to the scrape-test branch, Jenkins will upload to that corresponding location. This allows for easier testing of changes to the scraping process, the upload process, and/or the Play-Scrape tool, without having to push and test on master and staging.netlogoweb.org. Once everything looks good on https://experiments.netlogoweb.org/scrape-test you can merge to master for the real push. Note that scrapeUpload does not delete existing contents on a push, so if you have significant changes it might be worth wiping the scrape-test folder before pushing. That way you can be sure no stale pages from a prior push are giving a "false positive" that things are working.

Potential Enhancements to the Scraper

The benefits of these enhancements should be weighed against the cost of implementation and the expected length of service of static NetLogo Web. If we're only going to be serving NetLogo Web statically for a month while we get our feet under us, probably none of these are worthwhile. If we expect that time to be more than a year, perhaps a great many become worthwhile.

  • Asset fingerprinting
  • Ability to pre-render pages with given query-params and/or bodies
  • Automatic zip/tar as part of an sbt task (see the sketch after this list)
  • Automatically gzip generated pages for reduced download footprint
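For instance, the zip/tar enhancement could be a small custom task layered on top of the existing scrape. A minimal sketch, assuming a hypothetical key named scrapeZip (this is not part of Play-Scrape today):

// Hypothetical sketch of an sbt task that zips the scraped site; run
// scrapePlay first so that target/play-scrape exists.
lazy val scrapeZip = taskKey[File]("Zip the contents of target/play-scrape")

scrapeZip := {
  val scraped = target.value / "play-scrape"
  val out     = target.value / "play-scrape.zip"
  IO.zip(Path.allSubpaths(scraped), out)   // sbt's built-in IO utilities
  out
}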

Operational Overview

At present, the trial site is hosted on Amazon's CloudFront CDN. Billing for the account is handled through SESP AWS consolidated billing; any questions should go through Michael H., and he should be notified if we expect our bill to increase markedly in the future (for instance, by spinning up a bunch of servers and/or databases for a production application server).

Amazon CloudFront Upload Process

This section gives a brief walkthrough of how the service is currently configured on Amazon.

A few key terms:

  • CloudFront: Amazon's CDN
  • S3: Amazon's on-demand web storage system
  • Route 53: Amazon's DNS service

The current configuration links these three services together. Route 53 has entries set up for netlogoweb.org and *.netlogoweb.org that point to the CloudFront distribution. The CloudFront distribution must be configured with these domains as CNAMEs in order to correctly serve content.†

When the CloudFront distribution receives a request, it will either serve the request from its cache or look it up from its origin, which is an S3 bucket. It also logs each request received to a separate S3 bucket. The S3 bucket serving as the origin must be configured to act as a public-facing website, and all assets contained in the bucket must be publicly readable. Note that assets without an extension (tortoise, for instance) must have a Content-Type set explicitly; for HTML pages this should be text/html; encoding=UTF-8. This is handled automatically by the upload task and should not need to be tweaked unless there is an error.

Once new files have been uploaded, the CloudFront distribution must be invalidated. This can be done for only a few files, or for all files at once. In the future, we may attempt to detect changed files and invalidate only those, but at the moment we invalidate everything. Once all objects have been invalidated, CloudFront will reload them from S3 on the next request, pulling in new changes. Note that this may lead to a delay of several minutes before new assets/pages are served, since the invalidation takes time to process.

† If CNAMEs need to be added or removed in the future, note that the simplest way to do this is to disable the CloudFront distribution, alter the CNAMEs, and re-enable the distribution.
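For reference only (this is not the actual scrapeUpload implementation), an "invalidate everything" request can be issued with the AWS SDK for Java roughly as follows; the nlw-admin profile is the one described above, and the distribution ID is a placeholder:

import com.amazonaws.auth.profile.ProfileCredentialsProvider
import com.amazonaws.services.cloudfront.AmazonCloudFrontClientBuilder
import com.amazonaws.services.cloudfront.model.{CreateInvalidationRequest, InvalidationBatch, Paths}

// Sketch only: invalidate every object so CloudFront refetches from S3.
val cloudFront = AmazonCloudFrontClientBuilder.standard()
  .withCredentials(new ProfileCredentialsProvider("nlw-admin"))
  .withRegion("us-east-1")   // CloudFront is a global service addressed via us-east-1
  .build()

val everything = new Paths().withQuantity(1).withItems("/*")
val batch      = new InvalidationBatch(everything, System.currentTimeMillis.toString)

// "<distribution-id>" is a placeholder for the real CloudFront distribution ID.
cloudFront.createInvalidation(new CreateInvalidationRequest("<distribution-id>", batch))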

This has been a very cursory overview. For more information, see Amazon's documentation on CloudFront, S3, and Route 53.