Rummager indexes content into Elasticsearch and serves the GOV.UK search API.
The public search API
For the most up to date query syntax and API output see the Search API documentation.
You can also find some examples in the blog post: "Use the search API to get useful information about GOV.UK content".
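A query against the public search API is an ordinary GET request with URL parameters. The Ruby sketch below assumes the endpoint `https://www.gov.uk/api/search.json` and the `q` (query text) and `count` (number of results) parameters. Treat the Search API documentation as authoritative for the full parameter list and the response format.

```ruby
require "uri"
require "net/http"
require "json"

# Assumption: public endpoint path; confirm against the Search API documentation.
SEARCH_ENDPOINT = "https://www.gov.uk/api/search.json"

# Build a search URL from query parameters such as "q" (query text)
# and "count" (number of results to return).
def search_url(params)
  uri = URI(SEARCH_ENDPOINT)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Fetch and parse one page of results. This makes a live HTTPS request.
def search(params)
  JSON.parse(Net::HTTP.get(URI(search_url(params))))
end

puts search_url({ "q" => "driving licence", "count" => 5 })
# prints https://www.gov.uk/api/search.json?q=driving+licence&count=5
```

`URI.encode_www_form` handles escaping (spaces become `+`), so query text can be passed through unmodified.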
Rummager is a Sinatra application that interfaces with Elasticsearch.
There are two ways documents get added to a search index:
- HTTP requests to Rummager's Documents API (deprecated)
- Messages from the Publishing API, which Rummager subscribes to via RabbitMQ
Note: Once Whitehall documents use the new indexing process, the Documents API will be removed and Rummager will consume only from the Publishing API.
Rummager search results are weighted by popularity. We rebuild the index nightly to incorporate the latest analytics.
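The exact popularity formula is not described here. As an illustration only, the sketch below assumes a hypothetical `traffic_rank` per document (1 = most visited, derived from analytics) and a hypothetical `offset` of 10, and multiplies each document's text relevance by `1 / (traffic_rank + offset)` before sorting.

```ruby
# Hypothetical popularity factor: maps a traffic rank (1 = most visited)
# to a number that shrinks as the rank grows. The offset (assumed 10)
# stops the very top pages from dominating.
def popularity(traffic_rank, offset: 10)
  1.0 / (traffic_rank + offset)
end

# Hypothetical combined score: text relevance scaled by popularity.
def weighted_score(doc)
  doc[:relevance] * popularity(doc[:traffic_rank])
end

# Order documents best-first by the combined score.
def rank_results(docs)
  docs.sort_by { |doc| weighted_score(doc) }.reverse
end
```

With this weighting, a slightly less relevant but much more visited page can outrank a more relevant but rarely visited one; the offset controls how strongly that happens.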
- Link: Either the base path for a content item, or an external link.
- Document: An Elasticsearch document, something we can search for.
- Document Type: An Elasticsearch document type specifies the fields for a particular type of document. All our document types are defined in config/schema/elasticsearch_types.
- Index: An Elasticsearch search index. Rummager maintains several separate indices (including govuk), but searches return documents from all of them.
- Index Group: An alias in Elasticsearch that points to one index at a time. This allows us to rebuild indexes without downtime.
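To illustrate why an index group allows rebuilds without downtime, here is a toy in-memory model (not Rummager code): searches always go through the alias, a replacement index is filled in the background, and the alias is then repointed in one step. The index names `govuk-old` and `govuk-new` are hypothetical.

```ruby
# Toy, in-memory model of index groups: each alias (group) name points
# at exactly one concrete index.
class IndexGroups
  def initialize
    @indices = {} # concrete index name => array of documents
    @aliases = {} # alias name => concrete index name
  end

  def create_index(name)
    @indices[name] = []
  end

  def add_document(index_name, doc)
    @indices.fetch(index_name) << doc
  end

  # Repoint the alias in a single assignment, so searches always hit
  # either the old index or the new one, never nothing.
  def switch(alias_name, index_name)
    raise ArgumentError, "no such index: #{index_name}" unless @indices.key?(index_name)
    @aliases[alias_name] = index_name
  end

  # Searches go through the alias, never a concrete index name.
  def documents(alias_name)
    @indices.fetch(@aliases.fetch(alias_name))
  end
end

groups = IndexGroups.new
groups.create_index("govuk-old")
groups.add_document("govuk-old", { "link" => "/old-page" })
groups.switch("govuk", "govuk-old")

# Rebuild: fill a new index while "govuk" keeps serving the old one.
groups.create_index("govuk-new")
groups.add_document("govuk-new", { "link" => "/new-page" })
groups.switch("govuk", "govuk-new")
```

In Elasticsearch itself, the equivalent repointing is a single request to the `_aliases` endpoint containing a `remove` and an `add` action, which Elasticsearch applies atomically.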
Creating search indexes from scratch
(This is not necessary when restoring from a backup or replicating data into the development VM.)
To create an empty index:
bundle exec rake rummager:create_index[<index_name>]
To create empty versions of all Rummager indices:
RUMMAGER_INDEX=all bundle exec rake rummager:create_all_indices
If you're running the GDS development VM, you need to have Elasticsearch running before running the tests or starting the application.
Elasticsearch should start when you start up your dev VM, but if it doesn't, run:
sudo service elasticsearch-development.development start
Running the test suite
bundle exec rake
Running the application
If you're running the GDS development VM:
cd /var/govuk/govuk-puppet/development-vm && bundle exec bowl rummager
Rummager should then be available at rummager.dev.gov.uk.
If you're not running the GDS development VM:
Rummager uses Sidekiq to manage its indexing workload. To run this in the development VM, you need to run both of these commands:
# to start the Sidekiq process
bundle exec rake jobs:work
# to start the rummager webapp
bundle exec mr-sparkle --force-polling -- -p 3009
Publishing API integration
Rummager subscribes to a RabbitMQ queue of updates from publishing-api. This also requires Sidekiq to be running.
bundle exec rake message_queue:insert_data_into_govuk
There is also a separate process that listens only for 'links' updates from the Publishing API. It updates the older indexes that are populated through the '/documents' API (including detailed) and can be removed once those indexes no longer exist.
bundle exec rake message_queue:listen_to_publishing_queue
Evaluating search results
The ab_tests parameter can be used to distinguish between two versions of the search query.
Using search-performance-explorer, you can compare the results side by side.
The health check script can be used to evaluate Rummager using a set of judgments about which documents are 'good' results for some sample queries.
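The health check script's scoring method is not described here. One simple way to score search results against such judgments, sketched below with hypothetical data shapes, is the fraction of sample queries for which at least one result judged 'good' appears in the top `n` results (`n` defaults to 3 here, an arbitrary choice).

```ruby
# judgments: hypothetical hash of query => base paths judged "good".
# results:   hypothetical hash of query => base paths in returned order.
# Returns the fraction of judged queries with a good result in the top n.
def hit_rate(judgments, results, n: 3)
  return 0.0 if judgments.empty?

  hits = judgments.count do |query, good_links|
    # Intersect the top n returned links with the good ones.
    (results.fetch(query, []).first(n) & good_links).any?
  end
  hits.fdiv(judgments.size)
end
```

Running the same judgments against two versions of the search query (for example, both arms of an ab_tests comparison) gives a single number per version to compare.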
Changing the schema/Reindexing
After changing the schema, you'll need to recreate the index. This reindexes documents from the existing index.
RUMMAGER_INDEX=all bundle exec rake rummager:migrate_schema
Internal only APIs
There are some other APIs that are only exposed internally. These are used by search admin.
- New indexing process: how to update a format to use the new indexing process
- Schemas: how to work with schemas and the document types
- Popularity information: how Rummager uses Google Analytics data to improve search results
- Publishing advanced search: information about the advanced search finder
- Publishing document finders: information about publishing finders using rake tasks