This document covers setting up a server for the services in the Arctic Sensor Web expansion project.
Arctic Sensor Web is part of the Arctic Connect platform of research services.
The instance runs the following services:
- PostgreSQL 10, for GOST database
- GOST, for OGC SensorThings API
- Nginx, for proxy to GOST and front-end
- Arctic Sensor Web Community Front-end UI
The server is provisioned on Cybera's Rapid Access Cloud, which runs OpenStack.
Server Name: blackfoot
Flavor: m1.small
Boot Source: Ubuntu 18.04 Image
VCPUs: 2
Root Disk: 20 GB
Ephemeral Disk: 0
Total Disk: 20 GB
RAM: 2048 MB
After the initial instance is created, log in and update the packages using apt. Then use OpenStack to create an instance snapshot image as a backup.
$ sudo apt update
$ sudo apt upgrade -y
$ sudo apt dist-upgrade
$ sudo apt autoremove
$ sudo reboot
An image snapshot can be made from the Rapid Access Cloud dashboard.
Install PostgreSQL 10 and PostGIS using the PostgreSQL Apt repository:
$ echo "deb http://apt.postgresql.org/pub/repos/apt/ bionic-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
$ wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
$ sudo apt update
$ sudo apt install postgresql-10 postgresql-client-10 postgresql-10-postgis-2.4 postgresql-10-postgis-2.4-scripts postgis
$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
10  main    5432 online postgres /var/lib/postgresql/10/main /var/log/postgresql/postgresql-10-main.log
PostgreSQL should now be running on port 5432, bound to localhost, with a Unix socket in /var/run/postgresql/ and a PID file at /var/run/postgresql/10-main.pid.
Next, install GOST. We will download the v0.5 release:
$ wget https://github.com/gost/server/releases/download/0.5/gost_ubuntu_x64.zip
$ sudo apt install unzip
$ unzip gost_ubuntu_x64.zip -d gost
And download the database initialization scripts too:
$ cd ~/gost
$ git clone https://github.com/gost/gost-db.git
Now create a postgres role, database, and the GOST schema:
$ sudo -u postgres psql postgres
postgres=# create role "gost" with login;
postgres=# create database "gost" with owner "gost";
postgres=# \c gost
gost=# \i /home/ubuntu/gost/gost-db/gost_init_db.sql
gost=# alter schema "v1" owner to "gost";
gost=# grant all on database gost to gost;
gost=# grant all privileges on all tables in schema v1 to gost;
gost=# grant all privileges on all sequences in schema v1 to gost;
gost=# \q
And update Postgres to allow the default ubuntu user access to the GOST database.
$ echo "gost ubuntu gost" | sudo tee -a /etc/postgresql/10/main/pg_ident.conf
$ echo "local gost gost peer map=gost" | sudo tee -a /etc/postgresql/10/main/pg_hba.conf
$ sudo sed -E -i 's/^(local +all +all +peer)$/local gost gost peer map=gost\n\1/' /etc/postgresql/10/main/pg_hba.conf
$ sudo service postgresql reload
$ psql -U gost gost -c '\dt+ v1.*'
That last command should list all the tables in the v1 schema with no error.
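For reference, after these changes the relevant entries in the two configuration files should look roughly like this (a sketch; the surrounding default entries are omitted):

```
# /etc/postgresql/10/main/pg_ident.conf
# MAPNAME  SYSTEM-USERNAME  PG-USERNAME
gost       ubuntu           gost

# /etc/postgresql/10/main/pg_hba.conf
# The gost rule must precede the default catch-all rule.
local  gost  gost  peer map=gost
local  all   all   peer
```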
Next update the GOST configuration with the contents of contrib/gost/config.yaml, and start up the server.
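The configuration file's contents are not reproduced in this document. As a reference, a plausible sketch matching the connection details that appear in the GOST logs below; the exact keys come from GOST's sample config and may differ between versions:

```yaml
server:
    name: GOST Server
    host: localhost
    port: 8080
database:
    host: /var/run/postgresql/
    database: gost
    user: gost
    ssl: false
```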
$ cd ~/gost/linux64
$ chmod +x gost
$ ./gost
In a new terminal or tmux window, use curl to verify the server is running:
$ curl localhost:8080/v1.0/Things
{
"value": []
}
The response should be an empty collection of Thing entities.
Next, install a systemd unit file so GOST starts automatically at boot. Copy the contents of contrib/systemd/gost.service to /etc/systemd/system/gost.service on the system, then update systemd:
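The unit file itself lives in the repository and is not reproduced here; as a reference, a minimal sketch of what such a unit might contain, with paths matching the layout used above:

```ini
[Unit]
Description=GOST (Go SensorThings) API service
After=network.target postgresql.service

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/gost/linux64
ExecStart=/home/ubuntu/gost/linux64/gost -config /home/ubuntu/gost/linux64/config.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```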
$ sudo systemctl daemon-reload
$ sudo systemctl enable gost
$ sudo systemctl start gost
$ sudo systemctl status gost
● gost.service - GOST (Go SensorThings) API service
Loaded: loaded (/etc/systemd/system/gost.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2018-05-12 00:20:38 UTC; 4min 25s ago
Main PID: 6943 (gost)
Tasks: 7 (limit: 2362)
CGroup: /system.slice/gost.service
└─6943 /home/ubuntu/gost/linux64/gost -config /home/ubuntu/gost/linux64/config.yaml
May 12 00:20:38 blackfoot systemd[1]: Started GOST (Go SensorThings) API service.
May 12 00:20:38 blackfoot gost[6943]: 2018/05/12 00:20:38 Starting GOST....
May 12 00:20:38 blackfoot gost[6943]: 2018/05/12 00:20:38 Showing debug logs
May 12 00:20:38 blackfoot gost[6943]: 2018/05/12 00:20:38 Creating database connection, host: "/var/run/postgresql/", po
May 12 00:20:38 blackfoot gost[6943]: 2018/05/12 00:20:38 Connected to database
May 12 00:20:38 blackfoot gost[6943]: 2018/05/12 00:20:38 Started GOST HTTP Server on localhost:8080
Nginx will act as a reverse proxy to GOST, allowing only GET/HEAD/OPTIONS requests from the internet and blocking all other HTTP methods. It will also serve the front-end UI HTML site.
$ sudo apt install nginx-full
$ curl -I localhost:80
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sat, 12 May 2018 00:30:06 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Sat, 12 May 2018 00:29:02 GMT
Connection: keep-alive
ETag: "5af6354e-264"
Accept-Ranges: bytes
This shows that nginx is running. Next, set up a virtual host to control access to GOST. Copy the contents of contrib/nginx/gost.conf to /etc/nginx/sites-available/gost, then link the configuration into the enabled sites:
$ sudo ln -s /etc/nginx/sites-available/gost /etc/nginx/sites-enabled/gost
$ sudo systemctl reload nginx
$ curl localhost:6443/v1.0/Things
{
"value": []
}
We use port 6443 because an upstream server will have SSL enabled for this port; this server itself does not need to handle SSL at all.
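The actual virtual host configuration lives in contrib/nginx/gost.conf; as a reference, a sketch of what such a configuration might look like, assuming GOST listens on localhost:8080:

```nginx
server {
    listen 6443;
    server_name sensors.arcticconnect.ca;

    # The front-end UI would be served from a separate location / root
    # (omitted here).

    location /v1.0/ {
        # Allow only read-only methods from the internet.
        # In nginx, allowing GET implicitly allows HEAD as well.
        limit_except GET OPTIONS {
            deny all;
        }
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```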
The Data Transloader service will ingest data from the data providers and insert it into GOST. First clone the repository to the server:
$ git clone <REPO URL>
Then install Ruby 2.3 or newer. Ubuntu 18.04 includes Ruby 2.5.
$ sudo apt install ruby ruby-dev
$ mkdir ~/.ruby
$ echo "export GEM_HOME=~/.ruby" >> ~/.bashrc
$ echo 'export PATH="$PATH:$HOME/.ruby/bin"' >> ~/.bashrc
$ source ~/.bashrc
Install bundler and the gem prerequisites:
$ cd ~/data-transloader
$ gem install bundler
$ sudo apt install build-essential patch zlib1g-dev liblzma-dev
$ bundle install
You should be able to get the help message for the tool now:
$ ruby transload --help
Usage: transload <get|put> <metadata|observations> <arguments>
--source SOURCE Data source; allowed: 'environment_canada'
--station STATION Station identifier
--cache CACHE Path for filesystem storage cache
--date DATE ISO8601 date for 'put observations'. Also supports 'latest'
    --help Print this help message
Next we will set up the station metadata for our stations.
$ mkdir ~/data
$ for station in YYQ WCA WAY YEV YZF XCM XFB XRB XZC MFX WUM; do
echo "Getting metadata for $station"
ruby transload get metadata --source environment_canada --station $station --cache ~/data
done
We are going to fool the transloader into connecting directly to the local nginx instance for uploads, instead of going out over the internet to the upstream server that has HTTPS enabled. This lets us access GOST for uploads, as the requests come from the same server.
$ echo "127.0.0.1 sensors.arcticconnect.ca" | sudo tee -a /etc/hostsAnd then convert and upload the metadata into GOST.
$ for station in YYQ WCA WAY YEV YZF XCM XFB XRB XZC MFX WUM; do
echo "Uploading metadata for $station"
ruby transload put metadata --source environment_canada --station $station --cache ~/data --destination http://localhost:8080/v1.0/
done
If you check GOST, you can see the uploaded items: https://sensors.arcticconnect.ca:6443/v1.0/Datastreams.
Now we can download the observations:
$ for station in YYQ WCA WAY YEV YZF XCM XFB XRB XZC MFX WUM; do
echo "Downloading observations for $station"
ruby transload get observations --source environment_canada --station $station --cache ~/data
done
And then upload the observations:
$ for station in YYQ WCA WAY YEV YZF XCM XFB XRB XZC MFX WUM; do
echo "Uploading observations for $station"
ruby transload put observations --source environment_canada --station $station --cache ~/data --date latest --destination http://localhost:8080/v1.0/
done
The observations can then be viewed online: https://sensors.arcticconnect.ca:6443/v1.0/Observations.
As observations from Environment Canada are published every hour [1], we will need to automatically download them every hour to get the latest results. Ideally we would download the observations immediately after they have been updated by Environment Canada; however, the time at which the observations are uploaded varies, as can be seen by the "Last Modified" times in the observations directory listing.
There are two main ways to get the observations while they are fresh. First is to use the Data Mart AMQP service to be automatically notified when an observation is published. This is more complicated to code and the Data Transloader does not support AMQP. The second option is to issue GET requests more frequently than the update interval, and use HTTP headers to avoid downloading data we already have. We will use the second option.
We will use cron to run a script every 20 minutes checking for new data. Supporting conditional GET headers will require an update to the Data Transloader, so until then we will issue simple GET requests.
Start by creating a file with a list of the station ids. Observations will be downloaded from these stations only. As this will be read by a shell script, it will be formatted with one station id per line. See contrib/auto-download/stations.txt for a sample. Save this file to ~/auto-download.
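As a sketch, the station list can also be created directly from the shell, using the station IDs from the metadata step above:

```shell
# Create the station list: one Environment Canada station ID per line.
mkdir -p ~/auto-download
cat > ~/auto-download/stations.txt <<'EOF'
YYQ
WCA
WAY
YEV
YZF
XCM
XFB
XRB
XZC
MFX
WUM
EOF
```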
Next add the automatic download script contrib/auto-download/auto-transload.sh to the same directory, and make it executable:
$ cd ~/auto-download
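The script's actual contents are kept in the repository. The sketch below writes a hypothetical version of it; the paths and the TRANSLOAD override variable are illustrative assumptions, not part of the real tool:

```shell
# Write a sketch of auto-transload.sh. Paths and the TRANSLOAD override
# are illustrative assumptions, not the repository's actual script.
mkdir -p ~/auto-download
cat > ~/auto-download/auto-transload.sh <<'EOF'
#!/bin/bash
set -euo pipefail

STATIONS_FILE="${1:?usage: auto-transload.sh <stations-file>}"
# TRANSLOAD can be overridden (e.g. for a dry run); defaults to the tool
# in the data-transloader checkout.
TRANSLOAD="${TRANSLOAD:-ruby $HOME/data-transloader/transload}"
CACHE="$HOME/data"
DEST="http://localhost:8080/v1.0/"
LOG="$HOME/auto-download/transload.log"

while read -r station; do
  [ -z "$station" ] && continue
  echo "$(date -u +%FT%TZ) transloading station $station" >> "$LOG"
  $TRANSLOAD get observations --source environment_canada \
    --station "$station" --cache "$CACHE" >> "$LOG" 2>&1
  $TRANSLOAD put observations --source environment_canada \
    --station "$station" --cache "$CACHE" --date latest \
    --destination "$DEST" >> "$LOG" 2>&1
done < "$STATIONS_FILE"
EOF
chmod +x ~/auto-download/auto-transload.sh
```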
$ chmod +x auto-transload.sh
Now we can edit the crontab to run the script automatically. Open the crontab editor, and if prompted choose your preferred command-line editor. (If you are unsure, pick nano, as it is the easiest.)
$ crontab -e
At the end of the file, add the following lines:
SHELL=/bin/bash
GEM_HOME="/home/ubuntu/.ruby"
GEM_PATH="/home/ubuntu/.ruby"
5,25,45 * * * * $HOME/auto-download/auto-transload.sh $HOME/auto-download/stations.txt
This runs the automatic download script at 5, 25, and 45 minutes past every hour, i.e. every 20 minutes. The results can be seen in GOST on the Observations page.
[1] Some observations are published every minute, but only a handful of stations support this.
The logs generated by the transloader will quickly consume quite a bit of disk space. I recommend keeping them around for debugging any potential issues. To keep them without taking up too much space or becoming too large to read, we will use the logrotate tool to automatically segment the log files and compress them with gzip.
The file contrib/logrotate.d/auto-transload contains configuration to rotate the log files we set up in the previous step. Install this file to /etc/logrotate.d/auto-transload; you will need admin access to do this.
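The repository file is authoritative; as a reference, a plausible sketch of what such a logrotate configuration might contain, assuming the log path used by the download script:

```
/home/ubuntu/auto-download/*.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```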
TODO
Storing cached sensor data and metadata will gradually take up disk space. Switching the storage to a compressed file system such as ZFS can reduce the space used by roughly a factor of three.
In this example, I created a cloud provider volume through OpenStack and attached it to the running instance, where it appeared as /dev/vdd. I then installed ZFS on Linux and set up a new zpool.
$ sudo apt install zfsutils-linux
$ sudo zpool create datapool /dev/vdd
$ sudo zfs set compression=on datapool
$ sudo zfs create datapool/data
$ sudo cp -rp /home/ubuntu/data/* /datapool/data/.
(This step may take a while.)
$ sudo mv /home/ubuntu/data /home/ubuntu/data.old
$ sudo zfs set mountpoint=/home/ubuntu/data datapool/data
$ sudo chown ubuntu: /home/ubuntu/data
Then check the log files to confirm that the data-transloader is writing to the data directory properly. If it is not, make sure the permissions are correct and that the ZFS dataset (datapool/data) is mounted at the correct path.
If it is okay, then you can delete your original directory at /home/ubuntu/data.old.
You can check the effective compression ratio of the dataset:
$ sudo zfs get compressratio datapool/data
NAME PROPERTY VALUE SOURCE
datapool/data compressratio 3.33x -
This documentation is available under Creative Commons Attribution-ShareAlike 4.0 International.