Running your own copy

Rob Speer edited this page Apr 19, 2018 · 27 revisions

Running ConceptNet on Amazon Web Services

Probably the most reliable way to run a copy of ConceptNet is to use the Amazon Machine Image (AMI) that runs it on Amazon Web Services.

It's also possible to run ConceptNet on a computer that's not owned by Amazon, of course; see Build process for the instructions for setting it up from scratch. However, using an AMI lets us ensure the system is in the right configuration and skip a bunch of steps.

Provisioning the machine

  • Go to https://aws.amazon.com/ec2/ and sign up or log in as needed.
  • The ConceptNet image is hosted in the "US East (N. Virginia)" region, also known as us-east-1. You may be in a different region by default. The black bar in the upper right should look like this, and if it doesn't, you should click the drop-down and choose "US East (N. Virginia)":

[Image: the upper right portion of the AWS console, showing the region selector]

  • Click "Launch Instance".
  • Choose "Community AMIs", search for "conceptnet", and select "ConceptNet 5.6.0 image 2" (ami-080fad77).
  • Choose a machine type to launch.
    • You can run the API on a t2.medium or better (currently less than 5 cents per hour).
    • We run the real server on an m4.large so that it has the capacity to respond to a reasonable number of API requests.
    • If you want to be able to modify and rebuild the data, you'll need an r4.xlarge or better, so that you have access to at least 30 GB of RAM.
  • Proceed to "Configure Instance Details". Set "Auto-assign Public IP" to "Enable".
  • Proceed to "Add Storage". The defaults should be fine.
  • Proceed to "Configure Security Group". Add a rule allowing HTTP. The default IP range of 0.0.0.0/0, ::/0 (all addresses) is probably what you want.
  • Proceed to "Review and Launch". Download the security key that you'll need to log in to the system.
  • On your EC2 instances list, take note of the public IP of your new machine. Let's call it YOUR.IP.ADDR.
  • Connect to the machine over ssh, using the security key you downloaded, by following Amazon's instructions.
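
The final connection step can be sketched as a small shell function. This is only a sketch: the key file name conceptnet-key.pem is a hypothetical placeholder for whatever key you downloaded, and YOUR.IP.ADDR stands for the public IP you noted. ssh refuses private keys that other users can read, which is why the key's permissions are restricted first.

```shell
# Hypothetical placeholders: substitute your own key file and public IP.
KEY_FILE="conceptnet-key.pem"
SERVER_IP="YOUR.IP.ADDR"

# ssh_to_server KEY_FILE IP: make the key readable only by you, then log in
# as the "ubuntu" user that the image provides for SSH access.
ssh_to_server() {
    chmod 400 "$1" && ssh -i "$1" "ubuntu@$2"
}

# Usage (not run automatically here):
#   ssh_to_server "$KEY_FILE" "$SERVER_IP"
```

If the login is rejected, Amazon's instructions for your instance remain the authoritative reference for the exact user name and key handling.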

What's on the system

The system has two users you need to care about:

  • ubuntu is the user you connect to using your SSH key. This user has the ability to run super-user commands with sudo. Its home directory contains the conceptnet-puppet repository, containing the scripts for configuring a machine to run ConceptNet.
  • conceptnet is the user that runs the ConceptNet code. It does not have super-user privileges. Its home directory contains the ConceptNet code and data.

To become the conceptnet user, run sudo su conceptnet. For example, as this user, you can connect to ConceptNet's PostgreSQL database by running psql conceptnet5, or get a Python prompt where the conceptnet5 package is installed by running ipython.

Updating the system

You should make sure that the machine is up to date, both with security fixes to Ubuntu packages and with bug fixes to the ConceptNet code. To update Ubuntu packages, run:

sudo apt update
sudo apt dist-upgrade

You should also re-run the Puppet setup script, which will make sure that the ConceptNet code is up to date. (If you don't run this, you may encounter server errors due to running an outdated version of the code.)

cd conceptnet-puppet
./puppet-apply.sh

ConceptNet is running as a systemd service. Something has probably changed, so you should restart the service to run the new code:

sudo systemctl restart conceptnet

If something goes wrong, you'll want to look at the logs:

journalctl -u conceptnet
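
The update steps in this section can be collected into one function that stops at the first failure. This is a minimal sketch, assuming it is run as the ubuntu user with conceptnet-puppet in that user's home directory. DRY_RUN and the function names are hypothetical, introduced only for this example; by default the commands are printed rather than run.

```shell
# DRY_RUN is a hypothetical switch for this sketch. It defaults to 1, so the
# commands are only printed; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The same commands as in this section, chained with && so that a failed
# step prevents the later ones from running.
update_conceptnet() {
    run sudo apt update &&
    run sudo apt dist-upgrade &&
    run sh -c 'cd "$HOME/conceptnet-puppet" && ./puppet-apply.sh' &&
    run sudo systemctl restart conceptnet
}

update_conceptnet

# If the service misbehaves afterwards, check the logs:
#   journalctl -u conceptnet
```

Chaining with && means the service is only restarted if the package upgrade and the Puppet run both succeeded.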

Accessing the server

The ConceptNet API is now running on your server. While logged into the server, you can run:

curl http://localhost/

For a more interesting response, look up a concept:

curl http://localhost/c/en/example

To access the API externally, you can go to http://YOUR.IP.ADDR/ (the IP address that you took note of earlier) from another machine. This should also get you the ConceptNet API.
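
To check local and external access with the same commands, a small helper can build the concept URL and report only the HTTP status code. This is a sketch: concept_url and http_status are names made up for this example, not part of ConceptNet, and the curl options used (-s, -o, -w) are standard curl flags.

```shell
# concept_url HOST LANG TERM: print the API URL for a concept on HOST.
concept_url() {
    printf 'http://%s/c/%s/%s\n' "$1" "$2" "$3"
}

# http_status URL: fetch URL quietly, discard the body, and print only the
# HTTP status code that the server returned.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Usage (not run automatically here):
#   http_status "$(concept_url localhost en example)"      # on the server
#   http_status "$(concept_url YOUR.IP.ADDR en example)"   # from elsewhere
```

A status of 200 from the server itself but no response from another machine would suggest looking at the security group's HTTP rule rather than at ConceptNet.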

The server is also serving the Web frontend, which it uses when the requested hostname is conceptnet.io (which yours isn't, because that's ours) or www.conceptnet.localhost. The latter hostname is configured on the machine as an alias for localhost, so you can test the Web frontend with:

curl http://www.conceptnet.localhost

Warming up the disk

Here's something frustrating we learned about AMIs: when you start a machine from an image, its disk starts out in some sort of cold storage, and every region of the disk has to be "warmed up" the first time you access it. Unfortunately, this tends to happen in the middle of a ConceptNet API query, and you end up waiting for so much of the disk to warm up that the request times out.

Here's a straightforward command to warm up the entire disk, by accessing every byte of data on the disk:

sudo dd if=/dev/xvda of=/dev/null bs=16M
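
The device name /dev/xvda can differ between instance types, so it may be worth confirming it (for example with lsblk) before reading it. The following is a hedged sketch of a wrapper around the same dd command that fails early with a message if the device cannot be read. warm_up and the script name warm-up.sh are hypothetical, and reading a real disk requires running the script as root.

```shell
# warm_up PATH: read PATH from start to finish and discard the data.
# PATH is normally a block device such as /dev/xvda.
warm_up() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1 (wrong device name, or not running as root?)" >&2
        return 1
    fi
    dd if="$1" of=/dev/null bs=16M
}

# When saved as a script and given a device argument, warm that device.
if [ "$#" -gt 0 ]; then
    warm_up "$1"
fi

# Usage, assuming the block is saved as warm-up.sh:
#   lsblk                          # confirm the disk's device name
#   sudo sh warm-up.sh /dev/xvda
```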