Search API for GOV.UK (Elasticsearch 5). This will ONLY exist during the migration; everything will be backported to alphagov/rummager.
barrucadu Merge pull request #8 from alphagov/rummager/b67abd60b0eb44db0612e02d…
…6e2495a67c94daea

Pull in rummager changes to b67abd6
Latest commit 215cb84 Feb 19, 2019

Search API

This is a fork of rummager, and is in a half-renamed state! We will clean this up as the migration progresses.

Rummager indexes content into elasticsearch and serves the GOV.UK search API.

Live examples

GOV.UK search

alphagov/finder-frontend uses the search API to render site search and finder pages (such as gov.uk/aaib-reports).

The public search API

https://www.gov.uk/api/search.json?q=taxes

For the most up to date query syntax and API output see the Search API documentation.

You can also find some examples in the blog post: "Use the search API to get useful information about GOV.UK content".
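As a minimal sketch of calling the public endpoint shown above from Ruby, the following uses only the standard library. The helper names `search_uri` and `search` are illustrative and not part of Rummager; only the `q` parameter is taken from the example URL, and any extra parameters should be checked against the Search API documentation.

```ruby
require "json"
require "net/http"
require "uri"

# Public endpoint taken from the example URL above.
SEARCH_ENDPOINT = "https://www.gov.uk/api/search.json"

# Build the request URI for a keyword search.
# `extra_params` is a hypothetical hook for further query parameters.
def search_uri(query, extra_params = {})
  uri = URI(SEARCH_ENDPOINT)
  uri.query = URI.encode_www_form({ q: query }.merge(extra_params))
  uri
end

# Fetch and parse the JSON response. This performs a real HTTP request.
def search(query, extra_params = {})
  response = Net::HTTP.get_response(search_uri(query, extra_params))
  raise "Search failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end

puts search_uri("taxes")
```

Calling `search("taxes")` would return the parsed response as a Ruby hash; its exact shape is described in the Search API documentation rather than assumed here.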

Technical documentation

Rummager is a Sinatra application that interfaces with Elasticsearch.

There are two ways documents get added to a search index:

  1. Through HTTP requests to Rummager's Documents API (deprecated).
  2. Through RabbitMQ messages from the Publishing API, which Rummager subscribes to.

Note: Once Whitehall documents use the new indexing process, the Documents API will be removed and Rummager will consume only from the Publishing API.
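To illustrate the message-driven route, here is a rough sketch of turning a message payload into a search document. It is not Rummager's actual processor. The payload fields (`base_path`, `title`, `description`), the helper `document_from_message` and the in-memory `index` hash are all assumptions made for illustration; a real consumer would read from RabbitMQ and write to Elasticsearch.

```ruby
require "json"

# Convert a (hypothetical) Publishing API message payload into a search
# document. The base path becomes the document's link.
def document_from_message(payload)
  message = JSON.parse(payload)
  {
    "link" => message.fetch("base_path"),
    "title" => message["title"],
    "description" => message["description"],
  }
end

# Stand-in for a search index: documents keyed by link, so a repeated
# message for the same base path overwrites the earlier document.
index = {}

payload = JSON.generate({
  "base_path" => "/vat-rates",
  "title" => "VAT rates",
  "description" => "Current VAT rates",
})

doc = document_from_message(payload)
index[doc["link"]] = doc

puts index.keys.inspect
```

Keying on the link makes reprocessing a message idempotent in this sketch, which matters when a queue may redeliver messages.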

Rummager search results are weighted by popularity. We rebuild the index nightly to incorporate the latest analytics.

Nomenclature

  • Link: Either the base path for a content item, or an external link.
  • Document: An elasticsearch document, something we can search for.
  • Document Type: An elasticsearch document type specifies the fields for a particular type of document. All our document types are defined in config/schema/elasticsearch_types.
  • Index: An elasticsearch search index. Rummager maintains several separate indices (detailed, government and govuk), but searches return documents from all of them.
  • Index Group: An alias in elasticsearch that points to one index at a time. This allows us to rebuild indexes without downtime.
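The zero-downtime rebuild described under "Index Group" relies on repointing an alias. The sketch below builds the request body for Elasticsearch's `POST /_aliases` endpoint, which applies its actions atomically. The helper name `alias_switch_body` and the index names are hypothetical, and this is not necessarily how Rummager performs the switch.

```ruby
require "json"

# Build the body for Elasticsearch's POST /_aliases endpoint so that the
# alias stops pointing at old_index and starts pointing at new_index in a
# single atomic operation.
def alias_switch_body(alias_name, old_index, new_index)
  {
    "actions" => [
      { "remove" => { "index" => old_index, "alias" => alias_name } },
      { "add" => { "index" => new_index, "alias" => alias_name } },
    ],
  }
end

# Hypothetical index names, for illustration only.
body = alias_switch_body("govuk", "govuk-old", "govuk-new")
puts JSON.pretty_generate(body)
```

Because both actions are sent in one request, searches against the alias never see a moment where it points at no index.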

Dependencies

Creating search indexes from scratch

(This is not necessary when restoring from a backup or replicating data into the development VM)

To create an empty index:

bundle exec rake rummager:create_index[<index_name>]

To create an empty index for each of the rummager indices:

RUMMAGER_INDEX=all bundle exec rake rummager:create_all_indices

Starting elasticsearch

If you're running the GDS development VM you need to have elasticsearch running before running the tests or starting the application.

Elasticsearch should start when you start up your dev VM, but if it doesn't, run:

sudo service elasticsearch-development.development start

Running the test suite

bundle exec rake

Running the application

If you're running the GDS development VM:

cd /var/govuk/govuk-puppet/development-vm && bundle exec bowl rummager

Rummager should then be available at rummager.dev.gov.uk.

If you're not running the GDS development VM:

./startup.sh

Workers

Rummager uses Sidekiq to manage its indexing workload. To run this in the development VM, you need to run both of these commands:

# to start the Sidekiq process
bundle exec rake jobs:work

# to start the rummager webapp
bundle exec mr-sparkle --force-polling -- -p 3233

Publishing API integration

Rummager subscribes to a RabbitMQ queue of updates from publishing-api. This still requires Sidekiq to be running.

bundle exec rake message_queue:insert_data_into_govuk

There is also a separate process that listens to only 'links' updates from the publishing API. This is used for updating old indexes that are populated through the '/documents' API (government, detailed) and can be removed once those indexes no longer exist.

bundle exec rake message_queue:listen_to_publishing_queue

Evaluating search results

The ab_tests parameter can be used to distinguish between two versions of the search query.

Using search-performance-explorer, you can compare the results side by side.
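As a sketch of how two variants might be requested for comparison, the following builds one URL per variant. The `name:variant` value format for `ab_tests`, the test name `example_test` and the helper `ab_test_uri` are assumptions; check the Search API documentation for the exact syntax.

```ruby
require "uri"

SEARCH_URL = "https://www.gov.uk/api/search.json"

# Build a search URI that asks for a given variant of an A/B test.
# The "name:variant" format is an assumption, not confirmed by this README.
def ab_test_uri(query, test_name, variant)
  uri = URI(SEARCH_URL)
  uri.query = URI.encode_www_form({ q: query, ab_tests: "#{test_name}:#{variant}" })
  uri
end

%w[A B].each do |variant|
  puts ab_test_uri("taxes", "example_test", variant)
end
```

Note that `URI.encode_www_form` percent-encodes the colon as `%3A`.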

The health check script can be used to evaluate Rummager using a set of judgments about which documents are 'good' results for some sample queries.
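The idea of scoring search against judgments can be sketched as follows. This is not the health check script itself. The `Judgment` struct, the `success_rate` helper, the top-10 cutoff and the sample data are all hypothetical.

```ruby
# Each judgment pairs a sample query with a link that should rank highly.
Judgment = Struct.new(:query, :expected_link)

# Return the fraction of judgments whose expected link appears in the top
# `limit` results. `search` is any callable that maps a query to an ordered
# array of links.
def success_rate(judgments, search, limit: 10)
  return 0.0 if judgments.empty?

  hits = judgments.count do |judgment|
    search.call(judgment.query).first(limit).include?(judgment.expected_link)
  end
  hits.to_f / judgments.size
end

# Canned results standing in for real search responses.
fake_results = {
  "vat" => ["/vat-rates", "/vat-returns"],
  "passport" => ["/renew-adult-passport", "/apply-first-adult-passport"],
}
search = ->(query) { fake_results.fetch(query, []) }

judgments = [
  Judgment.new("vat", "/vat-rates"),
  Judgment.new("passport", "/get-a-passport-urgently"),
]

puts success_rate(judgments, search) # prints 0.5
```

Passing the search function in as a callable keeps the scoring logic testable without a running Rummager instance.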

Changing the schema/Reindexing

After changing the schema, you'll need to recreate the index. This reindexes documents from the existing index.

RUMMAGER_INDEX=all bundle exec rake rummager:migrate_schema

Internal only APIs

There are some other APIs that are only exposed internally:

These are used by search admin.

Additional Docs

Licence

MIT License