# Improved Question-Answering Search Engines with Transformers, Retrieval & Re-Ranking


![](https://i.imgur.com/7SXKckD.png)

Transfer learning is the practice of leveraging already-trained models and fine-tuning or adapting them to our own downstream tasks.

# Retrieval and Re-Ranking

In the Semantic Search example, we showed how to use SentenceTransformer to compute embeddings for queries, sentences, and paragraphs, and how to use these embeddings for semantic search.

For complex search tasks such as question-answering retrieval, search quality can be improved significantly by using Retrieve & Re-Rank.


# Retrieve & Re-Rank Pipeline

The following pipeline works well for information retrieval and question-answering retrieval. All of its components are provided and explained in this notebook:

![](https://i.imgur.com/yIXJRSo.png)


Given a search query, we first use a retrieval system that returns a large list of, e.g., 100 hits that are potentially relevant for the query.
For retrieval, we can use either lexical search, e.g. with ElasticSearch, or dense retrieval with a bi-encoder. Simple lexical search can be based on TF-IDF, BM25, etc.
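To make the lexical option concrete, here is a minimal from-scratch sketch of BM25 scoring over pre-tokenized documents. It is illustrative only: the helper `bm25_scores`, the toy corpus, and the common defaults `k1=1.5`, `b=0.75` are assumptions, and the idf variant used here differs slightly from the one in `rank_bm25`, which the notebook uses later.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in corpus_tokens against query_tokens with BM25.

    Uses the idf variant ln(1 + (N - n + 0.5) / (n + 0.5)), which is always
    positive; other implementations use slightly different idf formulas.
    """
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue  # lexical search: only literal matches contribute
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b)
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "berlin is the capital of germany".split(),
    "munich is the capital of bavaria".split(),
    "a car is a road vehicle".split(),
]
print(bm25_scores(["capital", "germany"], corpus))
print(bm25_scores(["automobile"], corpus))  # synonym of "car", but no literal match
```

The second call scores every document 0, which illustrates the main weakness of lexical search: a synonym with no literal overlap is never matched.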


However, the retrieval system might return documents that are not very relevant for the search query.
Hence, in a second stage, we use a re-ranker based on a cross-encoder that scores the relevance of every candidate for the given search query.

The output will be a ranked list of hits we can present to the user.


## Retrieval: Bi-Encoder

For the retrieval of the candidate set, we can either use lexical search (e.g. ElasticSearch) or a bi-encoder (semantic search), as provided by the sentence-transformers library.

Lexical search looks for literal matches of the query words in your document collection. It will not recognize synonyms, acronyms or spelling variations.

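In contrast, semantic search (or dense retrieval) encodes the search query into a vector space and retrieves the documents whose embeddings are close to the query embedding.

A Bi-Encoder produces an embedding for a given sentence or document. Queries and documents are encoded independently, so the document embeddings can be computed once in advance and reused for every query.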
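Because document embeddings are precomputed, retrieval at query time reduces to a nearest-neighbour search. Below is a minimal NumPy sketch of that step using cosine similarity; the helper `dense_retrieve` is hypothetical and the 3-dimensional vectors are made up to stand in for real bi-encoder embeddings. In the notebook, `util.semantic_search` plays this role.

```python
import numpy as np


def dense_retrieve(query_emb, corpus_embs, top_k=2):
    """Return (corpus_id, cosine score) pairs for the top_k most similar documents."""
    # L2-normalise so that the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    # Indices of the highest scores, best first
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Toy "embeddings" standing in for bi-encoder output (made up for illustration)
corpus_embs = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.0, 1.0, 0.0],   # doc 1
    [0.7, 0.7, 0.0],   # doc 2
])
query_emb = np.array([1.0, 0.0, 0.0])
print(dense_retrieve(query_emb, corpus_embs))
```

For millions of documents, this brute-force matrix product is usually replaced by an approximate nearest-neighbour index, but the scoring idea is the same.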


## Re-Ranker: Cross-Encoder

The retriever has to be efficient for large document collections with millions of entries. However, it might return irrelevant candidates.

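A re-ranker based on a Cross-Encoder can substantially improve the final results for the user. The query and a candidate document are passed simultaneously to a transformer network, which then outputs a single score indicating how relevant the document is for the given query. Depending on the model, this score is either a value between 0 and 1 or an unbounded raw score; the re-ranker scores printed later in this notebook (e.g. 8.906) show that the model used here returns unbounded values, so only their relative order matters.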
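If bounded scores are needed from a model that returns raw scores, one option (an assumption, not something this notebook does) is to pass them through a sigmoid. The sigmoid is strictly increasing, so the ranking does not change. The first two values are the re-ranker scores printed later for the United Kingdom query; `-2.0` is a made-up low-relevance score.

```python
import math


def sigmoid(x):
    """Map an unbounded score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


logits = [5.954, 3.298, -2.0]   # -2.0 is hypothetical
probs = [sigmoid(s) for s in logits]
print([round(p, 3) for p in probs])
```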

![](https://i.imgur.com/PFgkrcI.png)

The advantage of Cross-Encoders is their higher accuracy, as they perform attention across the query and the document.

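Scoring thousands or millions of (query, document) pairs would be rather slow, because each pair needs its own forward pass through the transformer and, unlike bi-encoder embeddings, nothing can be precomputed. Hence, we use the retriever to create a set of e.g. 100 candidates, which are then re-ranked by the Cross-Encoder.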

First, you use an efficient Bi-Encoder to retrieve e.g. the top-100 most similar sentences for a query. Then, you use a Cross-Encoder to re-rank these 100 hits by computing the score for every (query, hit) combination.
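The two-stage logic can be sketched independently of any model. In this hypothetical sketch, `word_overlap` stands in for the bi-encoder and `fake_cross_encoder` for the cross-encoder; both are toy scorers, not real models. The point is that the expensive scorer runs only `k` times rather than once per document (in this notebook: 32 cross-encoder calls per query instead of one per each of the 169,597 passages).

```python
from typing import Callable, List, Tuple


def retrieve_and_rerank(
    query: str,
    docs: List[str],
    cheap_score: Callable[[str, str], float],
    expensive_score: Callable[[str, str], float],
    k: int = 100,
    final_n: int = 10,
) -> List[Tuple[int, float]]:
    """Shortlist k docs with cheap_score, then re-rank the shortlist with expensive_score."""
    # Stage 1: score every document cheaply and keep the k best candidates
    candidates = sorted(range(len(docs)), key=lambda i: cheap_score(query, docs[i]), reverse=True)[:k]
    # Stage 2: run the expensive pairwise scorer only on the shortlist
    reranked = [(i, expensive_score(query, docs[i])) for i in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:final_n]


calls = {"expensive": 0}


def word_overlap(query, doc):
    # Toy stand-in for bi-encoder similarity: number of shared words
    return len(set(query.lower().split()) & set(doc.lower().split()))


def fake_cross_encoder(query, doc):
    # Toy stand-in for a cross-encoder: pretends to know the answer ("berlin"); counts its calls
    calls["expensive"] += 1
    return word_overlap(query, doc) + (5 if "berlin" in doc.lower() else 0)


docs = [
    "germany and the capital of france",
    "berlin is the german capital",
    "the weather is nice",
    "germany exports cars",
]
query = "capital of germany"
result = retrieve_and_rerank(query, docs, word_overlap, fake_cross_encoder, k=3, final_n=2)
print(result)
print("cross-encoder calls:", calls["expensive"])
```

The cheap stage ranks document 0 first because of its word overlap; the re-ranker promotes document 1, which actually answers the question, while document 2 is never sent to the expensive scorer.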





## Retrieve & Re-Rank Search Engine over Simple Wikipedia

This example demonstrates the Retrieve & Re-Rank setup and lets you search over Simple English Wikipedia.

You can input a query or a question. The script then uses semantic search to find relevant passages in Simple English Wikipedia and re-ranks them with a cross-encoder.

```python
!pip install -U sentence-transformers rank_bm25
```


For semantic search, we use `SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')` and retrieve 32 potentially relevant passages that answer the input query.

Next, we use a more powerful cross-encoder, `cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')`, that scores the query against every retrieved passage for relevance. The cross-encoder further boosts the performance.

MS MARCO is a large-scale information retrieval corpus created from real user search queries on the Bing search engine.

Models trained on MS MARCO can be used for semantic search: given keywords, a search phrase, or a question, the model finds passages that are relevant to the search query.

```python
import gzip
import json
import os

import torch
from sentence_transformers import SentenceTransformer, CrossEncoder, util

if not torch.cuda.is_available():
    print("Warning: No GPU found. Please add a GPU to your notebook")


# We use the bi-encoder to encode all passages, so that we can use it with semantic search
bi_encoder = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')
bi_encoder.max_seq_length = 256     # Truncate long passages to 256 tokens
top_k = 32                          # Number of passages we want to retrieve with the bi-encoder

# The bi-encoder retrieves top_k passages. We use a cross-encoder to re-rank this list to improve quality
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')


# As dataset, we use Simple English Wikipedia. Compared to the full English Wikipedia, it has only
# about 170k articles. Each article is stored as a list of paragraphs.

wikipedia_filepath = 'simplewiki-2020-11-01.jsonl.gz'

if not os.path.exists(wikipedia_filepath):
    util.http_get('http://sbert.net/datasets/simplewiki-2020-11-01.jsonl.gz', wikipedia_filepath)

passages = []
with gzip.open(wikipedia_filepath, 'rt', encoding='utf8') as fIn:
    for line in fIn:
        data = json.loads(line.strip())

        # Option 1: add all paragraphs of the article
        # passages.extend(data['paragraphs'])

        # Option 2 (used here): only add the first paragraph of each article
        passages.append(data['paragraphs'][0])

print("Passages:", len(passages))

# We encode all passages into our vector space. This takes about 5 minutes (depends on your GPU speed)
corpus_embeddings = bi_encoder.encode(passages, convert_to_tensor=True, show_progress_bar=True)
```

```
Passages: 169597
```



```python
passages[0]
```

```
'Ted Cassidy (July 31, 1932 - January 16, 1979) was an American actor. He was best known for his roles as Lurch and Thing on "The Addams Family".'
```

```python
corpus_embeddings[0], corpus_embeddings[0].shape
```

```
(tensor([-1.0502e-01, -6.6984e-02,  6.5590e-03, -7.5239e-02, -2.9778e-02,
          3.2903e-02,  4.1868e-02,  9.6896e-02, -3.2055e-02, -3.0800e-02,
          1.3767e-02,  5.3367e-02, -4.6852e-02,  1.4192e-02,  6.8777e-02,
          3.0876e-02,  5.0982e-03,  4.0002e-02, -7.3914e-02, -6.9276e-02,
          1.2249e-02, -5.3207e-02,  3.7148e-02, -2.8915e-02, -5.0157e-04,
         -3.7280e-02,  6.0539e-02,  4.8168e-02, -8.6899e-03,  2.2352e-02,
          1.0071e-01, -2.1343e-02,  3.9886e-02, -5.0790e-03, -1.9516e-02,
         -8.7064e-02,  4.3888e-02,  3.3809e-02,  5.3262e-02,  3.9972e-02,
          4.4025e-02,  2.5665e-02, -2.2285e-03, -5.6762e-03, -8.8039e-03,
         -6.2691e-02,  2.6264e-02,  9.7448e-03, -9.1535e-03,  1.0132e-01,
          1.0190e-01,  3.6445e-02,  2.0244e-02,  1.4598e-03, -2.8108e-02,
          2.3936e-02, -3.3839e-02,  9.8303e-02, -3.6032e-02, -9.6148e-02,
         -1.4320e-02, -6.4947e-03,  1.1665e-02, -1.6429e-03, -9.1396e-02,
          1.0647e-01, -5.7154e-02, ...
```

We also compare the results to lexical search (keyword search). Here, we use the BM25 algorithm, which is implemented in the `rank_bm25` package.

```python
import string

import numpy as np
from rank_bm25 import BM25Okapi
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS  # public API for the stop-word list
from tqdm.autonotebook import tqdm


# We lower-case our text and remove stop words from indexing
def bm25_tokenizer(text):
    tokenized_doc = []
    for token in text.lower().split():
        token = token.strip(string.punctuation)

        if len(token) > 0 and token not in ENGLISH_STOP_WORDS:
            tokenized_doc.append(token)
    return tokenized_doc


tokenized_corpus = []
for passage in tqdm(passages):
    tokenized_corpus.append(bm25_tokenizer(passage))

bm25 = BM25Okapi(tokenized_corpus)
```


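The following function searches the Wikipedia passages for those that answer the query. It prints the top-2 hits from BM25 lexical search, from bi-encoder retrieval, and from cross-encoder re-ranking of the bi-encoder hits.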

```python
def search(query):
    print("Input question:", query)

    ##### BM25 search (lexical search) #####
    bm25_scores = bm25.get_scores(bm25_tokenizer(query))
    top_n = np.argpartition(bm25_scores, -5)[-5:]   # indices of the 5 best BM25 scores (unordered)
    bm25_hits = [{'corpus_id': idx, 'score': bm25_scores[idx]} for idx in top_n]
    bm25_hits = sorted(bm25_hits, key=lambda x: x['score'], reverse=True)

    print("Top-2 lexical search (BM25) hits")
    for hit in bm25_hits[0:2]:
        print("\t{:.3f}\t{}".format(hit['score'], passages[hit['corpus_id']].replace("\n", " ")))

    ##### Bi-Encoder: Semantic Search #####
    # Encode the query using the bi-encoder and find potentially relevant passages
    question_embedding = bi_encoder.encode(query, convert_to_tensor=True)
    # Put the query on the same device (GPU or CPU) as the corpus embeddings
    question_embedding = question_embedding.to(corpus_embeddings.device)
    hits = util.semantic_search(question_embedding, corpus_embeddings, top_k=top_k)
    hits = hits[0]  # Get the hits for the first (and only) query

    ##### Cross-Encoder: Re-Ranking #####
    # Now, score all retrieved passages with the cross_encoder
    cross_inp = [[query, passages[hit['corpus_id']]] for hit in hits]
    cross_scores = cross_encoder.predict(cross_inp)

    # Attach the cross-encoder score to each bi-encoder hit
    for idx in range(len(cross_scores)):
        hits[idx]['cross-score'] = cross_scores[idx]

    # Output of top-2 hits from bi-encoder
    print("\n-------------------------\n")
    print("Top-2 Bi-Encoder Retrieval hits")
    hits = sorted(hits, key=lambda x: x['score'], reverse=True)
    for hit in hits[0:2]:
        print("\t{:.3f}\t{}".format(hit['score'], passages[hit['corpus_id']].replace("\n", " ")))

    # Output of top-2 hits from re-ranker
    print("\n-------------------------\n")
    print("Top-2 Cross-Encoder Re-ranker hits")
    hits = sorted(hits, key=lambda x: x['cross-score'], reverse=True)
    for hit in hits[0:2]:
        print("\t{:.3f}\t{}".format(hit['cross-score'], passages[hit['corpus_id']].replace("\n", " ")))
```
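The BM25 branch uses `np.argpartition` to find the five best-scoring passages without fully sorting all ~170k scores, and then sorts only those five. Here is a small standalone sketch of the same idiom; `top_k_indices` is a hypothetical helper and the score array is made up.

```python
import numpy as np


def top_k_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    # argpartition moves the k largest values into the last k slots (in arbitrary order)
    top = np.argpartition(scores, -k)[-k:]
    # Sort only those k indices by score, descending
    return top[np.argsort(-scores[top])]


scores = np.array([0.2, 3.1, 1.5, 9.0, 0.0, 4.4])
print(top_k_indices(scores, 3))
```

Partitioning runs in linear time on average, so for a large corpus and a small `k` this is cheaper than `np.argsort` over the whole array.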

```python
search(query = "What is the capital of the United States?")
```

```
Input question: What is the capital of the United States?
Top-2 lexical search (BM25) hits
	13.316	Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.
	11.434	Ohio is one of the 50 states in the United States. Its capital is Columbus. Columbus also is the largest city in Ohio.

-------------------------

Top-2 Bi-Encoder Retrieval hits
	0.622	Cities in the United States:
	0.597	The United States Capitol is the building where the United States Congress meets. It is the center of the legislative branch of the U.S. federal government. It is in Washington, D.C., on top of Capitol Hill at the east end of the National Mall.

-------------------------

Top-2 Cross-Encoder Re-ranker hits
	8.906	Washington, D.C. (also known as simply Washington or D.C., and officially as the D
```

```python
search(query = "What is the capital of Germany?")
```

```
Input question: What is the capital of Germany?
Top-2 lexical search (BM25) hits
	11.705	Berlin is the capital city of Germany. It is also the biggest city in Germany. About 3,700,000 people live there.
	11.277	Anhalt-Bitterfeld is a district in Saxony-Anhalt, Germany. Its capital is Köthen.

-------------------------

Top-2 Bi-Encoder Retrieval hits
	0.647	Berlin is the capital city of Germany. It is also the biggest city in Germany. About 3,700,000 people live there.
	0.625	Bavaria () is a State ("Bundesland") of Germany. The territory of this state is the largest of the 16 German states. The state capital is Munich with 1.3 million people. About 12.5 million people live in Bavaria. Like many German states, Bavaria was once independent. Ludwig II of Bavaria was its last independent king.

-------------------------

Top-2 Cross-Encoder Re-ranker hits
	7.922	Berlin is the capital city of Germany. It is also the biggest city in Germany. About 3,700,000 people live there.
	4.175	Sangerha
```

```python
search(query = "What is the capital of the United Kingdom?")
```

```
Input question: What is the capital of the United Kingdom?
Top-2 lexical search (BM25) hits
	14.703	Cardiff is the capital and biggest city of Wales, in the United Kingdom.
	13.182	The River Thames is a large river in England. It goes through London the capital city of the United Kingdom.

-------------------------

Top-2 Bi-Encoder Retrieval hits
	0.581	City status in the United Kingdom is granted by the British monarch to some communities. There are 69 cities in the United Kingdom (see list below) – 51 in England, six in Wales, seven in Scotland and five in Northern Ireland.
	0.567	London is the capital and largest city of England and the United Kingdom, and is the largest urban area in Greater London. The River Thames travels through the city.

-------------------------

Top-2 Cross-Encoder Re-ranker hits
	5.954	London is the capital and largest city of England and the United Kingdom, and is the largest urban area in Greater London. The River Thames travels through the city.
	3.298	
```

```python
search(query = "Coldest place on earth?")
```

```
Input question: Coldest place on earth?
Top-2 lexical search (BM25) hits
	24.891	East Antarctica, also called Greater Antarctica, is the largest part (two-thirds) of the Antarctic continent. It is on the Indian Ocean side of the Transantarctic Mountains. It is the coldest, windiest, and driest part of Earth. East Antarctica holds the record as the coldest place on earth.
	12.650	Earth Day is a day that is supposed to inspire more awareness and appreciation for the Earth's natural environment. It takes place each year on April 22. It now takes place in more than 193 countries around the world. During Earth Day, the world encourages everyone to turn off all unwanted lights.

-------------------------

Top-2 Bi-Encoder Retrieval hits
	0.598	East Antarctica, also called Greater Antarctica, is the largest part (two-thirds) of the Antarctic continent. It is on the Indian Ocean side of the Transantarctic Mountains. It is the coldest, windiest, and driest part of Earth. East Antarctica holds t
```

```python
search(query = "What is natural language processing?")
```

```
Input question: What is natural language processing?
Top-2 lexical search (BM25) hits
	19.189	Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human languages. We say "Natural Language" to mean human language, and to indicate that we are not talking about computer (programming) languages.
	18.145	Jaime Guillermo Carbonell (July 29, 1953 – February 28, 2020) was an American computer scientist. His works focused on natural language processing tools and technologies.

-------------------------

Top-2 Bi-Encoder Retrieval hits
	0.773	Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human lan
```

```python
search(query = "Can fish fly?")
```

```
Input question: Can fish fly?
Top-2 lexical search (BM25) hits
	18.769	An artificial fly or fly lure is a type of fishing lure. It is usually used in the sport of fly fishing. Artificial flies imitate insects or other things fish eat. Artificial flies are made by fly tying. This is an art in which furs, feathers, thread or any of very many other materials are tied onto a fish hook.
	14.981	Fly fishing is a method of sport fishing. An artificial fly is cast into the target area with a fly line and fly-fishing rod. This way, the fisherman or woman can catch a lot of fish, both freshwater and marine. Fly fishermen and women use special equipment, such as fly-fishing rod, fly-fishing reel, fly-fishing line, and pitfalls, which are made from natural or synthetic sling materials.

-------------------------

Top-2 Bi-Encoder Retrieval hits
	0.748	Flying fish are marine oceanic fishes of the family Exocoetidae. They are about 50 species, and they live worldwide in warm waters. They are noted f
```

```python
search(query = "How do you train a machine learning model?")
```

```
Input question: How do you train a machine learning model?
Top-2 lexical search (BM25) hits
	17.081	Reinforcement learning (RL) is teaching a "software agent" how to behave in an environment by telling it how good it's doing. It is an area of machine learning inspired by behaviorist psychology.
	16.967	Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer.

-------------------------

Top-2 Bi-Encoder Retrieval hits
	0.593	In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to
```

## Can we fine-tune QA Transformers?

Absolutely! If you have your own dataset, check out the well-written Hugging Face tutorial that [covers this in detail](https://huggingface.co/docs/transformers/tasks/question_answering).