# ReRank

## Setup

Load the required API keys and Python libraries.

In [1]:
%%capture
!pip install cohere
!pip install "weaviate-client==3.*"
!pip install python-dotenv

In [2]:
import os
import getpass
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

In [3]:
def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

In [4]:
_set_env("WEAVIATE_API_KEY")
_set_env("COHERE_API_KEY")
os.environ['WEAVIATE_API_URL'] = "https://cohere-demo.weaviate.network/"

WEAVIATE_API_KEY: ··········
COHERE_API_KEY: ··········


In [39]:
import cohere
co = cohere.Client(os.environ['COHERE_API_KEY'])
co_v2 = cohere.ClientV2(os.environ['COHERE_API_KEY'])

In [6]:
import weaviate
auth_config = weaviate.auth.AuthApiKey(
    api_key=os.environ['WEAVIATE_API_KEY'])

In [7]:
client = weaviate.Client(
    url=os.environ['WEAVIATE_API_URL'],
    auth_client_secret=auth_config,
    additional_headers={
        "X-Cohere-Api-Key": os.environ['COHERE_API_KEY'],
    }
)

## Dense Retrieval

In [8]:
%%writefile utils.py

def keyword_search(query,
                   client,
                   results_lang='en',
                   properties = ["text", "title", "url", "views", "lang", "_additional {distance}"],
                   num_results=3):

    # Restrict results to the requested language
    where_filter = {
        "path": ["lang"],
        "operator": "Equal",
        "valueString": results_lang
    }

    response = (
        client.query.get("Articles", properties)
        .with_bm25(
          query=query
        )
        .with_where(where_filter)
        .with_limit(num_results)
        .do()
        )
    result = response['data']['Get']['Articles']
    return result


def dense_retrieval(query,
                    client,
                    results_lang='en',
                    properties = ["text", "title", "url", "views", "lang", "_additional {distance}"],
                    num_results=5):

    nearText = {"concepts": [query]}

    # Restrict results to the requested language
    where_filter = {
        "path": ["lang"],
        "operator": "Equal",
        "valueString": results_lang
    }
    response = (
        client.query
        .get("Articles", properties)
        .with_near_text(nearText)
        .with_where(where_filter)
        .with_limit(num_results)
        .do()
    )

    result = response['data']['Get']['Articles']

    return result


def print_result(result):
    """Print every field of each result as key:value, one item at a time."""
    for i,item in enumerate(result):
        print(f'item {i}')
        for key in item.keys():
            print(f"{key}:{item.get(key)}")
            print()
        print()

Writing utils.py


In [9]:
from utils import dense_retrieval

In [10]:
query = "What is the capital of Canada?"

In [11]:
dense_retrieval_results = dense_retrieval(query, client)

In [12]:
from utils import print_result

In [13]:
print_result(dense_retrieval_results)

item 0
_additional:{'distance': -150.80109}

lang:en

text:The governor general of the province had designated Kingston as the capital in 1841. However, the major population centres of Toronto and Montreal, as well as the former capital of Lower Canada, Quebec City, all had legislators dissatisfied with Kingston. Anglophone merchants in Quebec were the main group supportive of the Kingston arrangement. In 1842, a vote rejected Kingston as the capital, and study of potential candidates included the then-named Bytown, but that option proved less popular than Toronto or Montreal. In 1843, a report of the Executive Council recommended Montreal as the capital as a more fortifiable location and commercial centre, however, the Governor General refused to execute a move without a parliamentary vote. In 1844, the Queen's acceptance of a parliamentary vote moved the capital to Montreal.

title:Ottawa

url:https://en.wikipedia.org/wiki?curid=22219

views:2000


item 1
_additional:{'distance': -15

In [14]:
dense_retrieval_results

[{'_additional': {'distance': -150.80109},
  'lang': 'en',
  'text': "The governor general of the province had designated Kingston as the capital in 1841. However, the major population centres of Toronto and Montreal, as well as the former capital of Lower Canada, Quebec City, all had legislators dissatisfied with Kingston. Anglophone merchants in Quebec were the main group supportive of the Kingston arrangement. In 1842, a vote rejected Kingston as the capital, and study of potential candidates included the then-named Bytown, but that option proved less popular than Toronto or Montreal. In 1843, a report of the Executive Council recommended Montreal as the capital as a more fortifiable location and commercial centre, however, the Governor General refused to execute a move without a parliamentary vote. In 1844, the Queen's acceptance of a parliamentary vote moved the capital to Montreal.",
  'title': 'Ottawa',
  'url': 'https://en.wikipedia.org/wiki?curid=22219',
  'views': 2000},
 {'_

## Improving Keyword Search with ReRank

In [15]:
from utils import keyword_search

In [16]:
query_1 = "What is the capital of Canada?"

In [17]:
query_1 = "What is the capital of Canada?"
results = keyword_search(query_1,
                         client,
                         properties=["text", "title", "url", "views", "lang", "_additional {distance}"],
                         num_results=3
                        )

for i, result in enumerate(results):
    print(f"i:{i}")
    print(result.get('title'))
    print(result.get('text'))

i:0
Monarchy of Canada
In his 1990 book, "Continental Divide: the Values and Institutions of the United States and Canada," Seymour Martin Lipset argues that the presence of the monarchy in Canada helps distinguish Canadian identity from American identity. Since at least the 1930s, supporters of the Crown have held the opinion that the Canadian monarch is also one of the rare unified elements of Canadian society, focusing both "the historic consciousness of the nation" and various forms of patriotism and national love "[on] the point around which coheres the nation's sense of a continuing personality". Former Governor General Vincent Massey articulated in 1967 that the monarchy "is part of ourselves. It is linked in a very special way with our national life. It stands for qualities and institutions which mean Canada to every one of us and which for all our differences and all our variety have kept Canada Canadian." But, according to Arthur Bousfield and Gary Toffoli, Canadians were, th

In [18]:
query_1 = "What is the capital of Canada?"
results = keyword_search(query_1,
                         client,
                         properties=["text", "title", "url", "views", "lang", "_additional {distance}"],
                         num_results=500
                        )

for i, result in enumerate(results):
    print(f"i:{i}")
    print(result.get('title'))
    #print(result.get('text'))

i:0
Monarchy of Canada
i:1
Early modern period
i:2
Flag of Canada
i:3
Flag of Canada
i:4
Prime Minister of Canada
i:5
Hamilton, Ontario
i:6
Liberal Party of Canada
i:7
Stephen Harper
i:8
Monarchy of Canada
i:9
Flag of Canada
i:10
Order of Canada
i:11
University of Toronto
i:12
Newfoundland (island)
i:13
Liberal Party of Canada
i:14
Newfoundland (island)
i:15
Flag of Canada
i:16
North American Free Trade Agreement
i:17
Pea
i:18
Monarchy of Canada
i:19
Prime Minister of Canada
i:20
Hamilton, Ontario
i:21
Aesop's Fables
i:22
Revolutions of 1989
i:23
R.S.C. Anderlecht
i:24
Hudson's Bay Company
i:25
Liberal Party of Canada
i:26
2020–21 NBA season
i:27
Filibuster
i:28
Hardcore punk
i:29
Early modern period
i:30
Skopje
i:31
Venture capital
i:32
Wakanda
i:33
Arjuna
i:34
Luhansk
i:35
Arlington National Cemetery
i:36
North American Free Trade Agreement
i:37
Global North and Global South
i:38
Shia–Sunni relations
i:39
Jacob Zuma
i:40
Early modern period
i:41
Maui
i:42
Gerhard Schröder
i:43
Revolu

In [40]:
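def rerank_responses(query, responses, num_responses=10):
    """Rerank `responses` against `query` and return the Cohere response.

    The response's `.results` list holds the top `num_responses` items.
    """
    reranked_responses = co_v2.rerank(
        model='rerank-english-v3.0',
        query=query,
        documents=responses,
        top_n=num_responses,
    )
    return reranked_responses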

In [41]:
texts = [result.get('text') for result in results]
print(len(texts))
texts[:2]

500


['In his 1990 book, "Continental Divide: the Values and Institutions of the United States and Canada," Seymour Martin Lipset argues that the presence of the monarchy in Canada helps distinguish Canadian identity from American identity. Since at least the 1930s, supporters of the Crown have held the opinion that the Canadian monarch is also one of the rare unified elements of Canadian society, focusing both "the historic consciousness of the nation" and various forms of patriotism and national love "[on] the point around which coheres the nation\'s sense of a continuing personality". Former Governor General Vincent Massey articulated in 1967 that the monarchy "is part of ourselves. It is linked in a very special way with our national life. It stands for qualities and institutions which mean Canada to every one of us and which for all our differences and all our variety have kept Canada Canadian." But, according to Arthur Bousfield and Gary Toffoli, Canadians were, through the late 1960s

In [42]:
reranked_text = rerank_responses(query_1, texts)
reranked_text



In [43]:
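# Iterating over the response object itself yields (field, value) pairs
# such as ('id', ...) and ('results', [...]); use reranked_text.results
# to loop over the individual results.
for i, rerank_result in enumerate(reranked_text):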
    print(f"i:{i}")
    print(f"{rerank_result}")
    print()

i:0
('id', 'f063653b-ba39-4f18-a751-b57b4b02b947')

i:1
('results', [V2RerankResponseResultsItem(document=None, index=407, relevance_score=0.9977743), V2RerankResponseResultsItem(document=None, index=100, relevance_score=0.997399), V2RerankResponseResultsItem(document=None, index=496, relevance_score=0.990732), V2RerankResponseResultsItem(document=None, index=479, relevance_score=0.9899476), V2RerankResponseResultsItem(document=None, index=481, relevance_score=0.9801293), V2RerankResponseResultsItem(document=None, index=202, relevance_score=0.95751673), V2RerankResponseResultsItem(document=None, index=68, relevance_score=0.9022657), V2RerankResponseResultsItem(document=None, index=394, relevance_score=0.82976806), V2RerankResponseResultsItem(document=None, index=271, relevance_score=0.8251675), V2RerankResponseResultsItem(document=None, index=11, relevance_score=0.8076124)])

i:2



In [44]:
reranked_text.results

[V2RerankResponseResultsItem(document=None, index=407, relevance_score=0.9977743),
 V2RerankResponseResultsItem(document=None, index=100, relevance_score=0.997399),
 V2RerankResponseResultsItem(document=None, index=496, relevance_score=0.990732),
 V2RerankResponseResultsItem(document=None, index=479, relevance_score=0.9899476),
 V2RerankResponseResultsItem(document=None, index=481, relevance_score=0.9801293),
 V2RerankResponseResultsItem(document=None, index=202, relevance_score=0.95751673),
 V2RerankResponseResultsItem(document=None, index=68, relevance_score=0.9022657),
 V2RerankResponseResultsItem(document=None, index=394, relevance_score=0.82976806),
 V2RerankResponseResultsItem(document=None, index=271, relevance_score=0.8251675),
 V2RerankResponseResultsItem(document=None, index=11, relevance_score=0.8076124)]
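
Each result carries only an `index` and a `relevance_score` (`document=None`), so the passage itself has to be looked up in `texts`. A minimal sketch of that lookup follows. The helper name `attach_texts` and the stand-in data are hypothetical and are not part of the Cohere SDK; the stand-ins only mimic the two attributes shown in the output so that the sketch runs without an API key.

```python
from types import SimpleNamespace


def attach_texts(rerank_results, texts):
    """Pair each rerank result with the passage it points to.

    `rerank_results` is any iterable of objects exposing `.index` and
    `.relevance_score`, for example `reranked_text.results`.
    """
    return [
        {"index": r.index, "score": r.relevance_score, "text": texts[r.index]}
        for r in rerank_results
    ]


# Stand-in data (hypothetical), shaped like the rerank results above.
demo_texts = ["passage A", "passage B", "passage C"]
demo_results = [
    SimpleNamespace(index=2, relevance_score=0.99),
    SimpleNamespace(index=0, relevance_score=0.42),
]

ranked = attach_texts(demo_results, demo_texts)
for row in ranked:
    print(f"{row['score']:.2f}  {row['text']}")
```

In the notebook this would be called as `attach_texts(reranked_text.results, texts)`, assuming `texts` is the same list that was passed to `rerank_responses`.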

In [46]:
reranked_text.results[0].index

407

In [47]:
texts[reranked_text.results[0].index]

'Selection of Ottawa as the capital of Canada predates the Confederation of Canada. The selection was contentious and not straightforward, with the parliament of the United Province of Canada holding more than 200 votes over several decades to attempt to settle on a legislative solution to the location of the capital.'

In [48]:
texts[reranked_text.results[1].index]

"Montreal was the capital of the Province of Canada from 1844 to 1849, but lost its status when a Tory mob burnt down the Parliament building to protest the passage of the Rebellion Losses Bill. Thereafter, the capital rotated between Quebec City and Toronto until in 1857, Queen Victoria herself established Ottawa as the capital due to strategic reasons. The reasons were twofold. First, because it was located more in the interior of the Province of Canada, it was less susceptible to attack from the United States. Second, and perhaps more importantly, because it lay on the border between French and English Canada, Ottawa was seen as a compromise between Montreal, Toronto, Kingston and Quebec City, which were all vying to become the young nation's official capital. Ottawa retained the status as capital of Canada when the Province of Canada joined with Nova Scotia and New Brunswick to form the Dominion of Canada in 1867."

## Improving Dense Retrieval with ReRank

In [49]:
from utils import dense_retrieval

In [50]:
query_2 = "Who is the tallest person in history?"

In [51]:
results = dense_retrieval(query_2, client)

In [52]:
for i, result in enumerate(results):
    print(f"i:{i}")
    print(result.get('title'))
    print(result.get('text'))
    print()

i:0
Robert Wadlow
Robert Pershing Wadlow (February 22, 1918 July 15, 1940), also known as the Alton Giant and the Giant of Illinois, was a man who was the tallest person in recorded history for whom there is irrefutable evidence. He was born and raised in Alton, Illinois, a small city near St. Louis, Missouri.

i:1
Manute Bol
Bol came from a family of extraordinarily tall men and women. He said: "My mother was , my father , and my sister is . And my great-grandfather was even taller—." His ethnic group, the Dinka, and the Nilotic people of which they are a part, are among the tallest populations in the world. Bol's hometown, Turalei, is the origin of other exceptionally tall people, including basketball player Ring Ayuel. "I was born in a village, where you cannot measure yourself," Bol reflected. "I learned I was 7 foot 7 in 1979, when I was grown. I was about 18 or 19."

i:2
Sultan Kösen
Sultan Kösen (born 10 December 1982) is a Turkish farmer who holds the Guinness World Record for 

In [53]:
texts = [result.get('text') for result in results]
reranked_text = rerank_responses(query_2, texts)

In [54]:
for i, rerank_result in enumerate(reranked_text.results):
    print(f"i:{i} index: {rerank_result.index}")
    print(f"{texts[rerank_result.index]}")
    print()

i:0 index: 0
Robert Pershing Wadlow (February 22, 1918 July 15, 1940), also known as the Alton Giant and the Giant of Illinois, was a man who was the tallest person in recorded history for whom there is irrefutable evidence. He was born and raised in Alton, Illinois, a small city near St. Louis, Missouri.

i:1 index: 3
Kösen turned 40 years old on 10 December 2022. He celebrated his birthday a few days early by visiting the Ripley's Believe It or Not! museum in Orlando, Florida, USA and posing next to a life-sized statue of Robert Wadlow, the tallest man ever at 272 cm (8 ft 11.1 in).

i:2 index: 4
The Dutch are the tallest people in the world, by nationality, with an average height of for adult males and for adult females in 2009. The average height of young males in the Netherlands increased from 5 feet, 4 inches to approximately 6 feet between the 1850s until the early 2000s. People in the south are on average about shorter than those in the north.

i:3 index: 2
Sultan Kösen (born 1

In [55]:
query_3 = "Who is the most populated capital city in history?"

In [56]:
results = dense_retrieval(query_3, client)

In [57]:
for i, result in enumerate(results):
    print(f"i:{i}")
    print(result.get('title'))
    print(result.get('text'))
    print()

i:0
City
In the remnants of the Roman Empire, cities of late antiquity gained independence but soon lost population and importance. The locus of power in the West shifted to Constantinople and to the ascendant Islamic civilization with its major cities Baghdad, Cairo, and Córdoba. From the 9th through the end of the 12th century, Constantinople, capital of the Eastern Roman Empire, was the largest and wealthiest city in Europe, with a population approaching 1 million. The Ottoman Empire gradually gained control over many cities in the Mediterranean area, including Constantinople in 1453.

i:1
Istanbul
Throughout most of its history, Istanbul has ranked among the largest cities in the world. By 500 CE, Constantinople had somewhere between 400,000 and 500,000 people, edging out its predecessor, Rome, for the world's largest city. Constantinople jostled with other major historical cities, such as Baghdad, Chang'an, Kaifeng and Merv for the position of the world's largest city until the 12

In [58]:
texts = [result.get('text') for result in results]
reranked_text = rerank_responses(query_3, texts)

In [59]:
for i, rerank_result in enumerate(reranked_text.results):
    print(f"i:{i} index: {rerank_result.index}")
    print(f"{texts[rerank_result.index]}")
    print()

i:0 index: 1
Throughout most of its history, Istanbul has ranked among the largest cities in the world. By 500 CE, Constantinople had somewhere between 400,000 and 500,000 people, edging out its predecessor, Rome, for the world's largest city. Constantinople jostled with other major historical cities, such as Baghdad, Chang'an, Kaifeng and Merv for the position of the world's largest city until the 12th century. It never returned to being the world's largest, but remained the largest city in Europe from 1500 to 1750, when it was surpassed by London.

i:1 index: 0
In the remnants of the Roman Empire, cities of late antiquity gained independence but soon lost population and importance. The locus of power in the West shifted to Constantinople and to the ascendant Islamic civilization with its major cities Baghdad, Cairo, and Córdoba. From the 9th through the end of the 12th century, Constantinople, capital of the Eastern Roman Empire, was the largest and wealthiest city in Europe, with a 
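
Both sections of this notebook follow the same pattern: retrieve candidates, pull out their `text`, and rerank those texts against the query. The sketch below wraps that pattern in one function so that a single query is passed to both stages. The name `retrieve_then_rerank` and the two `fake_*` stand-ins are hypothetical; the stand-ins use a toy word-overlap score in place of the real model so the example runs offline.

```python
from types import SimpleNamespace


def retrieve_then_rerank(query, retrieve, rerank, top_n=3):
    """Two-stage search with caller-supplied stages.

    `retrieve(query)` returns a list of dicts that each have a 'text' key.
    `rerank(query, texts)` returns an object whose `.results` items expose
    `.index` and `.relevance_score`, ordered best first.
    """
    candidates = retrieve(query)
    texts = [c.get("text") for c in candidates]
    reranked = rerank(query, texts)
    return [
        (r.relevance_score, candidates[r.index])
        for r in reranked.results[:top_n]
    ]


# Hypothetical stand-ins so the sketch runs without Weaviate or Cohere.
def fake_retrieve(query):
    return [
        {"title": "A", "text": "cats and dogs"},
        {"title": "B", "text": "the capital city"},
        {"title": "C", "text": "capital of the country"},
    ]


def fake_rerank(query, texts):
    # Toy scorer: number of query words that also appear in the text.
    words = set(query.lower().split())
    scored = [
        SimpleNamespace(
            index=i,
            relevance_score=len(words & set(t.lower().split())),
        )
        for i, t in enumerate(texts)
    ]
    scored.sort(key=lambda r: r.relevance_score, reverse=True)
    return SimpleNamespace(results=scored)


top = retrieve_then_rerank(
    "capital of canada", fake_retrieve, fake_rerank, top_n=2
)
for score, doc in top:
    print(score, doc["title"])
```

With the notebook's own functions, the call would look like `retrieve_then_rerank(query_3, lambda q: dense_retrieval(q, client), rerank_responses)`, assuming `client` and `rerank_responses` are defined as in the cells above.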