Merge pull request #2913 from alphagov/ltr-removal
Update docs following removal of LTR
sihugh authored May 15, 2024
2 parents ee4e460 + e751218 commit 7e59dd2
Showing 6 changed files with 3 additions and 197 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -43,7 +43,6 @@ bundle exec rake
- [Schemas](docs/schemas.md): how to work with schemas and the document types
- [Popularity information](docs/popularity.md): Search API uses Google Analytics data to improve search results.
- [Publishing document finders](docs/publishing-finders.md): Information about publishing finders using rake tasks
-- [Learning to rank](docs/learning-to-rank.md): Guidance on how to run the ranking model locally

## Licence

11 changes: 0 additions & 11 deletions docs/how-search-works.md
@@ -19,17 +19,6 @@ stack don't need to know how to construct Elasticsearch queries.
See the [relevancy documentation](relevancy.md) to learn more about how
Search API determines how relevant a document is to a query.

-### Reranking
-
-Once Search API has retrieved a selection of relevant documents from
-Elasticsearch, the results are re-ranked by a machine learning model.
-
-This process ensures that we show the most relevant documents at the top
-of the search results.
-
-See the [learning to rank documentation](learning-to-rank.md) to learn
-more about the reranking model.
-
## Evaluating search quality

To ensure Search API returns good quality results, we use a combination of
164 changes: 0 additions & 164 deletions docs/learning-to-rank.md

This file was deleted.

5 changes: 0 additions & 5 deletions docs/new-indexing-process.md
@@ -23,11 +23,6 @@ Example PRs:
- [Prepare for moving to rummager](https://github.com/alphagov/calendars/pull/160/files)
- [Ensure we pass the description text to publishing API](https://github.com/alphagov/calendars/pull/162/files)

-## Add the format to the list in `lib/learn_to_rank/format_enums.rb`
-
-We take format into account in our machine learning, which means we
-need a mapping from formats to unique numbers.
-
## Update the presenter to handle the new format
You'll need to update the elasticsearch presenter in Search API so that it handles any fields which are not yet used by other formats in the govuk index.

13 changes: 2 additions & 11 deletions docs/relevancy.md
@@ -36,15 +36,6 @@ a `combined_score` on every document.
The `combined_score` is used for ranking results and represents how
relevant we think a result is to your query.

-## What impacts relevancy?
-
-Once Search API has [retrieved](#what-impacts-document-retrieval) the
-top scoring documents from the search indexes, it ranks the results
-in order of relevance using a pre-trained model.
-
-See the [learning to rank](learning-to-rank.md) documentation for
-more details.
-
## What impacts document retrieval?

Out of the box, Elasticsearch comes with a decent scoring algorithm.
@@ -102,13 +93,13 @@ field and its number of page views in the `vc_14` field.

This is an implementation of [this curve](https://solr.apache.org/guide/7_7/function-queries.html#recip-function),
and is applied to documents of the "announcement" type in the [booster.rb][]
-file. It serves to increase the score of new documents and decrease
+file. It serves to increase the score of new documents and decrease
the score of old documents.

Only documents of `search_format_types` 'announcement' are affected by
recency boosting.

-The curve was chosen so that it only applies the boost temporarily (2
+The curve was chosen so that it only applies the boost temporarily (2
months moderate decay then a rapid decay after that).
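
For readers unfamiliar with the linked curve, the following is a minimal Ruby sketch of a reciprocal function in the style of Solr's `recip(x, m, a, b) = a / (m * x + b)`. The method names and every parameter value here are illustrative assumptions and are not taken from `booster.rb`; the sketch only shows how such a curve rewards new documents more than old ones, and does not reproduce the exact decay profile described above.

```ruby
# Reciprocal curve in the style of Solr's recip(x, m, a, b) = a / (m * x + b).
def recip(x, m:, a:, b:)
  a.to_f / (m * x + b)
end

# Hypothetical score multiplier: a baseline of 1.0 plus a bonus that
# shrinks as the document ages. The defaults are assumptions chosen
# for illustration only, not the values used by Search API.
def recency_multiplier(age_in_days, m: 1.0 / 60, a: 0.5, b: 1.0)
  1.0 + recip(age_in_days, m: m, a: a, b: b)
end

# Print the multiplier for a brand new, a two-month-old and a year-old document.
[0, 60, 365].each do |age|
  puts format("%4d days old -> x%.3f", age, recency_multiplier(age))
end
```

With these assumed parameters a brand new document gets the largest multiplier, and the multiplier approaches 1.0 (no boost) as the document gets older.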

#### Properties
6 changes: 1 addition & 5 deletions docs/search-quality-metrics.md
@@ -15,13 +15,9 @@ click on something that isn't what they were looking for. But this
serves our needs in the absence of a more sophisticated way of
measuring user success following a search.

-We also measure nDCG before and after re-ranking over time, to
-tell us how search is performing against relevance judgements.
-
## Offline metrics

-Our main offline metric is nDCG. We measure this before and after
-re-ranking by our [learning to rank model](learning-to-rank.md).
+Our main offline metric is nDCG.
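
As a reminder of what nDCG measures, here is a minimal Ruby sketch, assuming graded relevance judgements given as numbers in ranked order and the linear-gain form of DCG. The `dcg` and `ndcg` helpers are hypothetical names written for this illustration and are not part of Search API; other DCG variants (for example, exponential gain) give different values.

```ruby
# Discounted cumulative gain for a ranked list of graded relevance
# judgements (linear gain, log2 discount by rank).
def dcg(relevances)
  relevances.each_with_index.sum do |rel, index|
    rel / Math.log2(index + 2) # rank 1 has discount log2(2) = 1
  end
end

# nDCG: DCG of the actual ranking divided by the DCG of the ideal
# ranking (the same judgements sorted from most to least relevant).
def ndcg(relevances)
  ideal = dcg(relevances.sort.reverse)
  ideal.zero? ? 0.0 : dcg(relevances) / ideal
end
```

A ranking that already lists the most relevant results first, such as `ndcg([3, 2, 1])`, scores 1.0; putting the same results in the reverse order scores lower.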

We use Elasticsearch's [Ranking Evaluation API](ranking_evaluation_api)
to assess the quality of results retrieved from Elasticsearch prior