Running your own copy

Robyn Speer edited this page Sep 14, 2020 · 29 revisions

Running ConceptNet on Amazon Web Services

Probably the most reliable way to run a copy of ConceptNet is to use the machine image that runs it on Amazon Web Services.

It's also possible to run ConceptNet on a computer that's not owned by Amazon, of course; see Build process for the instructions for setting it up from scratch. However, using an AMI lets us ensure the system is in the right configuration and skip a bunch of steps.

Provisioning the machine

  • Go to https://aws.amazon.com/ec2/ and sign up or log in as needed.
  • The ConceptNet image is hosted in the "US East (N. Virginia)" region, also known as us-east-1. You may be in a different region by default. Check the region selector in the black bar at the upper right of the AWS console; if it doesn't say "N. Virginia", click the drop-down and choose "US East (N. Virginia)".

  • Click "Launch Instance".
  • Choose "Community AMIs", search for "conceptnet", and select "ConceptNet 5.8.1" (ami-0f0c8ef6acb420f69).
  • Choose a machine type to launch.
    • You can run the API on a t3a.medium or t2.medium or similar (currently less than 4 cents per hour).
    • We run the real server on an m4.large so that it has the capacity to respond to a reasonable number of API requests.
    • If you want to be able to modify and rebuild the data, you'll need an r4.xlarge or better, so that you have access to at least 30 GB of RAM.
  • Proceed to "Configure Instance Details". Set "Auto-assign Public IP" to "Enable".
  • Proceed to "Add Storage". The defaults should be fine.
  • Proceed to "Configure Security Group". Add a rule allowing HTTP. The default IP range of 0.0.0.0/0, ::/0 (all addresses) is probably what you want.
  • Click "Review and Launch". When prompted, download the key pair that you'll need to log into the system, if you don't already have one.
  • On your EC2 instances list, take note of the public IP of your new machine. Let's call it YOUR.IP.ADDR.
  • Connect to the machine over ssh, using the security key you downloaded, by following Amazon's instructions.
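
The console steps above can also be approximated from the command line. This is only a sketch, assuming the AWS CLI is installed and configured with your credentials. `KEY_NAME` and `SG_ID` are placeholders for a key pair and a security group (one that allows HTTP and SSH) that you have already created, and `launch_command` is a hypothetical helper that prints the command so you can review it before running it.

```shell
#!/bin/sh
# Sketch: launch the ConceptNet 5.8.1 AMI from the AWS CLI instead of the console.
# KEY_NAME and SG_ID are placeholders, not real resources; substitute your own.
AMI_ID="ami-0f0c8ef6acb420f69"   # ConceptNet 5.8.1, as listed above
REGION="us-east-1"               # the AMI is hosted in US East (N. Virginia)

# Print (not run) the launch command for an instance type, key pair, and security group.
launch_command() {
    instance_type="$1"; key_name="$2"; sg_id="$3"
    printf 'aws ec2 run-instances --region %s --image-id %s --instance-type %s --key-name %s --security-group-ids %s --associate-public-ip-address\n' \
        "$REGION" "$AMI_ID" "$instance_type" "$key_name" "$sg_id"
}

# Review the printed command, then paste it into your shell to launch the machine.
launch_command t3a.medium KEY_NAME SG_ID
```

Printing the command rather than executing it keeps the sketch from accidentally starting a billable instance.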

What's on the system

The user that runs ConceptNet is named conceptnet. It has superuser privileges, and its home directory contains the ConceptNet code and data.

As this user, you can connect to ConceptNet's PostgreSQL database by running psql conceptnet5, or get a Python prompt where the conceptnet5 package is installed by running ipython.

Updating the system

You should make sure that the machine is up to date, both with security fixes to Ubuntu packages and with bug fixes to the ConceptNet code. To update Ubuntu packages, run:

sudo apt update
sudo apt dist-upgrade

In the /home/conceptnet/conceptnet5 directory, run git pull to update the ConceptNet code.

ConceptNet is running as a systemd service. When you make any changes, you should restart the service to run the new code:

sudo systemctl restart conceptnet

If something goes wrong, you'll want to look at the logs:

journalctl -u conceptnet
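
The update steps above can be collected into one routine. This is a sketch rather than a tested script: it assumes the checkout path and service name given on this page, and `update_conceptnet` is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: apply OS and ConceptNet code updates, then restart and check the service.
CONCEPTNET_DIR="/home/conceptnet/conceptnet5"   # checkout path given above
SERVICE="conceptnet"                             # systemd unit named above

update_conceptnet() {
    # Stop at the first step that fails
    sudo apt update &&
    sudo apt dist-upgrade &&
    git -C "$CONCEPTNET_DIR" pull &&
    sudo systemctl restart "$SERVICE" || return 1

    # is-active exits non-zero if the service did not come back up
    if ! systemctl is-active --quiet "$SERVICE"; then
        # Show the most recent log lines to help diagnose the failure
        journalctl -u "$SERVICE" -n 50 --no-pager
        return 1
    fi
}

# Run it with: update_conceptnet
```

To watch the logs live while you test a change, `journalctl -u conceptnet -f` follows new entries as they arrive.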

Accessing the server

The ConceptNet API is now running on your server. While logged into the server, you can run:

curl http://localhost/

For a more interesting response, look up a concept:

curl http://localhost/c/en/example

To access the API externally, you can go to http://YOUR.IP.ADDR/ (the IP address that you took note of earlier) from another machine. This should also get you the ConceptNet API.

The server is also serving the Web frontend, which it will use if your hostname is conceptnet.io (which it isn't, because that's ours) or www.conceptnet.localhost. This hostname is configured on the machine as an alias for localhost, so you can test the Web frontend with:

curl http://www.conceptnet.localhost
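
The same checks can be run from another machine. The sketch below assumes `curl` and `python3` are available there. `SERVER_IP` is a placeholder for the public IP you noted earlier, and the function names are hypothetical. The `frontend` function rests on an untested assumption: that the server chooses the Web frontend from the HTTP `Host` header, so sending that header reaches the frontend without the localhost alias.

```shell
#!/bin/sh
# Sketch: query a ConceptNet server from another machine.
# SERVER_IP is a placeholder; set it to the public IP you noted earlier.
SERVER_IP="${SERVER_IP:-YOUR.IP.ADDR}"

# Build the API URL for a concept, e.g. concept_url en example
concept_url() {
    lang="$1"; term="$2"
    printf 'http://%s/c/%s/%s\n' "$SERVER_IP" "$lang" "$term"
}

# Fetch a concept and pretty-print the JSON response.
lookup() {
    curl -s "$(concept_url "$1" "$2")" | python3 -m json.tool
}

# Assumption: the frontend is selected by the Host header, so this may
# return the Web frontend instead of the API. Verify on your own server.
frontend() {
    curl -s -H "Host: www.conceptnet.localhost" "http://$SERVER_IP/"
}

# Example: SERVER_IP=203.0.113.10 lookup en example
```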

Warming up the disk

Here's something frustrating we learned about AMIs: when you start a machine from an image, its disk is restored lazily from a snapshot, so every block of the disk has to be fetched, or "warmed up", the first time you access it. Unfortunately, this tends to happen in the middle of a ConceptNet API query, and you end up waiting for so much of the disk to warm up that the request times out.

Here's a command to warm up the entire disk by reading every block of it. It assumes the root volume is /dev/xvda; run lsblk to check the device name on your instance:

sudo dd if=/dev/xvda of=/dev/null bs=16M
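
The device name depends on the instance type: newer instance families such as t3a expose the root volume as an NVMe device (for example /dev/nvme0n1) instead of /dev/xvda. The sketch below looks up the disk behind the root filesystem instead of hard-coding it. `disk_of_partition` and `warm_up_disk` are hypothetical helpers, and the suffix stripping only covers the xvd- and nvme-style names shown in the comments.

```shell
#!/bin/sh
# Sketch: find the disk that holds the root filesystem, then read all of it.

# Strip the partition suffix from a device path:
#   /dev/xvda1     -> /dev/xvda
#   /dev/nvme0n1p1 -> /dev/nvme0n1
disk_of_partition() {
    case "$1" in
        /dev/nvme*) printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//' ;;
        *)          printf '%s\n' "$1" | sed 's/[0-9]*$//' ;;
    esac
}

warm_up_disk() {
    # findmnt reports the partition mounted at /
    root_part="$(findmnt -n -o SOURCE /)"
    disk="$(disk_of_partition "$root_part")"
    echo "Reading all of $disk ..."
    # status=progress (GNU dd) prints a running byte count
    sudo dd if="$disk" of=/dev/null bs=16M status=progress
}

# Run it with: warm_up_disk
```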