GitHub - NYCPlanning/labs-geosearch-dockerfiles: Containerized Local Installation Package for the Pelias geocoder

Dockerfiles for NYC Geosearch Services.

NYC Geosearch is an JSON API for geocoding of NYC addresses, built on the open source Pelias geocoder and NYC's Property Address Directory (PAD)

Overview

These dockerfiles allow for quickly standing up all of the services that work together to run the pelias geocoder, and is used in both production and development. These include:

pelias API - node.js app that does parses search strings and finds results in the database
elasticsearch - the database where all address results are stored
geosearch API - A node.js proxy API that appends custom metadata to responses from the Pelias API.

This repo (and readme) serves as "home base" for the GeoSearch project, as the dockerfiles tie everything together. Other relevant code for our Pelias deployment:

geosearch-pad-normalize - an R script that starts with the raw Property Address Database, and interpolates valid address ranges.
geosearch-pad-importer - a Pelias importer for normalized NYC PAD data.
geosearch-docs - an interactive documentation site for the Geosearch API
geosearch-acceptance-tests - nyc-specific test suite for geosearch
geosearch-api- A node.js proxy API that appends custom metadata to responses from the Pelias API.

Docker Compose allows us to quickly spin up the pelias services we need, and run scripts manually in the containers. It also makes use of volumes and internal hostnames so the various services can communicate with each other. See below for the commands necessary to get started.

For more information on Pelias services, including many which we are not using here at City Planning, check out this self-contained workshop. This is the tutorial that got us started, and we recommend anyone working with Pelias start here.

Running Pelias Services

In both production and development, several Pelias services need to be up and running before address data can be imported in the database. Before any data are imported, either locally or in production, you should have mastery of the long-running Pelias services outlined here, and how to get them started/restarted.

Config-Driven

Much of this environment is config-driven, and the two files you should pay attention to are:

docker-compose.yml - configurations for each of the named services, including which docker image to run, which volumes, to use, etc
pelias.json - a common config file used by all of the pelias services. This identifies the hostnames for various services, and importer-specific configuration.

Pull Images

Before you can run the pelias services via docker, you must first get all of the images. The Pelias team has pre-built them and hosted them on dockerhub, so we can skip the time-consuming step of building the images manually.

docker-compose pull will get all of the pelias images from dockerhub.

For our PAD importer, there is no pre-built image, and how you include it is different in production and development. See below for how to get the PAD importer working in each environment.

Environment Variables

Create a .env file based on .env-example to set environment variables for the working directory and slack webhook url.

elasticsearch database

To start a new database:

docker-compose up -d elasticsearch - spin up an empty elasticsearch database docker-compose run --rm schema npm run create_index - create the pelias index

The database is now ready to receive data from an importer.

pelias api

docker-compose up -d api

You should be able to query the API at http://localhost:4000/v1/autocomplete?text={sometext}, but there's no data in the database yet!

PAD importer

The PAD importer serves two functions, it downloads the latest normalized PAD dataset, and imports each row into elasticsearch. Each of these is run manually via an npm command.

The pad importer is run via docker-compose in both production and development.

Running the Import

Once the normalized PAD data has been published, follow these steps on the server to run the import. The approach for zero downtime is to import the data into a different index on the running elasticsearch database. Once the import is complete, the rowcounts of the source csv and the new index are compared, and the new index is promoted by adding pelias as an alias.

Delete old indices

Use ./list_indices.sh to show all indices.
Use ./list_aliases.sh to see which index is currently active.
Use ./delete_index.sh {indexname} to delete non-active indices.

Run ./import-pad.sh, which does the following:

Generates a new timestamped index name: pelias_XXXXXXXXXXX
Builds nycpad-importer in a container from the latest code on github
Updates pelias.json with the new indexName
Creates the new index in the elasticsearch database
Runs the importer
Compares the rowcounts of the source csv and the new index
If the rowcounts match, delete all pelias aliases and apply the pelias alias to the new index

In a development environment, if you want to develop on geosearch-api or geosearch-pad-importer locally, be sure to change the build context for each in docker-compose.yml. This will tell docker-compose to build off of the local repos instead of whatever the latest is on github.

Production Domain

In production, we added a custom nginx configuration to handle SSL, and route traffic to the pelias api running internally on port 4000. The nginx config Jinja2 template is saved in this repo as nginx.conf.

The nginx config should be stored in /etc/nginx/conf.d/{productiondomain}.conf

This nginx config also proxies all requests that aren't API calls to the geosearch docs site, so that both the API and the docs can share the same production domain.

Update Notes 27 November 2018

Updated to PAD 18D. Newest version of pelias schema includes strict mapping, which would not allow us to append meta information when importing, so we added an explicit tag for schema in docker-compose.yml.

Connect to server and ran sudo sh ./import-pad.sh. Tests didn't complete and slack messages didn't work, but the new index was created and populated. Manually swapped out the aliases using curl. This workflow needs a lot of work and testing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Running Pelias Services

Config-Driven

Pull Images

Environment Variables

elasticsearch database

pelias api

PAD importer

Running the Import

Production Domain

Update Notes 27 November 2018

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 145 Commits
.env-example		.env-example
.gitignore		.gitignore
build.sh		build.sh
clear_database.sh		clear_database.sh
delete_index.sh		delete_index.sh
docker-compose.yml		docker-compose.yml
elasticsearch.yml		elasticsearch.yml
import-pad.sh		import-pad.sh
list_aliases.sh		list_aliases.sh
list_indices.sh		list_indices.sh
nginx.conf		nginx.conf
pelias.json		pelias.json
prep_data.sh		prep_data.sh
readme.md		readme.md
run_services.sh		run_services.sh

NYCPlanning/labs-geosearch-dockerfiles

Folders and files

Latest commit

History

Repository files navigation

Overview

Running Pelias Services

Config-Driven

Pull Images

Environment Variables

elasticsearch database

pelias api

PAD importer

Running the Import

Production Domain

Update Notes 27 November 2018

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages