
Rummager

Rummager is the internal GOV.UK API for search.

Live examples

This API is publicly accessible:

https://www.gov.uk/api/search.json?q=taxes

You can read how to use the API in the blog post: "Use the search API to get useful information about GOV.UK content".
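As a minimal sketch of calling the public endpoint from Ruby, the snippet below builds the query URL and parses the JSON response. The helper names `search_uri` and `fetch_results` are hypothetical and not part of Rummager. Only the `q` parameter shown in the example URL above is assumed; any other parameters should be checked against the API documentation.

```ruby
require "json"
require "net/http"
require "uri"

SEARCH_ENDPOINT = "https://www.gov.uk/api/search.json"

# Build the request URI for a search query. Extra parameters are passed
# through unchanged (hypothetical helper, not part of Rummager).
def search_uri(query, extra_params = {})
  uri = URI(SEARCH_ENDPOINT)
  uri.query = URI.encode_www_form({ "q" => query }.merge(extra_params))
  uri
end

# Perform the request and return the parsed JSON body as a Hash.
# Raises if the server does not answer with a 2xx status.
def fetch_results(query, extra_params = {})
  response = Net::HTTP.get_response(search_uri(query, extra_params))
  unless response.is_a?(Net::HTTPSuccess)
    raise "Search request failed: #{response.code}"
  end
  JSON.parse(response.body)
end
```

Calling `fetch_results("taxes")` should return the same JSON document as the example URL above, parsed into a Ruby Hash.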

Technical documentation

Rummager is a Sinatra application that interfaces with Elasticsearch.

It provides a search API that is used by multiple applications, and is publicly available at gov.uk/api/search.json.

There are two ways documents get added to the search index:

  1. Post to the Documents API
  2. Via the RabbitMQ consumer worker, which responds to notifications from the Publishing API.

In future, the Documents API will be deprecated and Rummager will consume only from the Publishing API.

There is also a separate API for retrieving documents from the search index by their links.

Rummager search results are weighted by popularity. We rebuild the index nightly to incorporate the latest analytics.

Nomenclature

  • Link: Either the base path for a content item, or an external link.
  • Document: An Elasticsearch document, something we can search for.
  • Document Type: An Elasticsearch document type specifies the fields for a particular type of document. All our document types are defined in config/schema/elasticsearch_types.
  • Index: An Elasticsearch search index. Rummager maintains several separate indices (mainstream, detailed and government), but searches return documents from all of them.
  • Index Group: An alias in Elasticsearch that points to one index at a time. This allows us to rebuild indices without downtime.
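To illustrate how an alias allows a rebuild without downtime, here is a minimal sketch of the request body for Elasticsearch's `POST /_aliases` endpoint, which removes the alias from one index and adds it to another in a single atomic step. The helper `alias_swap_body` and the index names are hypothetical, and Rummager's own implementation may differ.

```ruby
require "json"

# Build the body for an Elasticsearch `POST /_aliases` request that moves
# an alias from the old index to the new one in one atomic operation.
# (Hypothetical helper, for illustration only.)
def alias_swap_body(alias_name, old_index, new_index)
  {
    "actions" => [
      { "remove" => { "index" => old_index, "alias" => alias_name } },
      { "add"    => { "index" => new_index, "alias" => alias_name } },
    ],
  }
end

# Hypothetical index names: searches keep using the "mainstream" alias
# while the freshly built index replaces the old one behind it.
body = alias_swap_body("mainstream", "mainstream-old", "mainstream-new")
puts JSON.pretty_generate(body)
```

Because both actions are applied together, searches against the alias never see a moment where it points at no index.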

Dependencies

Setup

To create indices, or to update them to the latest index settings, run:

RUMMAGER_INDEX=all bundle exec rake rummager:migrate_index

If you have indices created by a version of Rummager that predates aliased indices, run:

RUMMAGER_INDEX=all bundle exec rake rummager:migrate_from_unaliased_index

If you don't know which of these you need to run, try running the first one; it will fail safely with an error if you have an unmigrated index.

Running the application

If you're running the GDS development VM:

cd /var/govuk/development && bundle exec bowl rummager

If you're not running the GDS development VM:

./startup.sh

Rummager should then be available at rummager.dev.gov.uk.

Rummager uses Sidekiq to manage index workers in a separate process. To run this in the development VM, you need to run both of these commands:

# to start the sidekiq process
bundle exec rake jobs:work

# to start the rummager webapp
bundle exec mr-sparkle --force-polling -- -p 3009

Rummager can subscribe to a queue of updates from the Publishing API, backed by RabbitMQ. At present Rummager is only interested in updates to the links hash. You can start the message queue consumer process in development by running:

govuk_setenv rummager bundle exec rake message_queue:listen_to_publishing_queue
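As a rough sketch of what "only interested in updates to the links hash" could look like in a consumer, the snippet below parses a JSON message and keeps only the links. The helper `links_update_from`, the payload shape and the field names `base_path` and `links` are assumptions for illustration; this is not Rummager's actual message handler.

```ruby
require "json"

# Pull the links hash (and the path it belongs to) out of a raw JSON
# message. Returns nil when the message carries no links hash, so the
# caller can skip it. (Hypothetical helper, for illustration only.)
def links_update_from(payload)
  message = JSON.parse(payload)
  links = message["links"]
  return nil unless links.is_a?(Hash)

  { "base_path" => message["base_path"], "links" => links }
end

# Hypothetical payload; the real messages contain many more fields.
sample = JSON.generate({
  "base_path" => "/example-page",
  "title" => "Example page",
  "links" => { "organisations" => ["hypothetical-content-id"] },
})
update = links_update_from(sample)
```

Everything outside the links hash, such as the title in the sample payload, is ignored by this sketch.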

Running the test suite

bundle exec rake

Indexing & Reindexing

After changing the schema, you'll need to migrate the index.

RUMMAGER_INDEX=all bundle exec rake rummager:migrate_index

API documentation

For the most up to date query syntax and API output:

Additional Docs

  • Schemas: how to work with schemas and the document types
  • Health Check: usage instructions for the Health Check functionality.
  • Popularity information: Rummager uses Google Analytics data to improve search results.

Licence

MIT License