# Union and intersection of rankers

Let's build a pipeline that combines rankers with the union `|` and intersection `&` operators.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from cherche import data, rank, retrieve
from sentence_transformers import SentenceTransformer

The first step is to define the corpus on which we will perform the neural search. The towns dataset contains about a hundred documents, each with four attributes: an `id`, the `title` of the article, the `url`, and the content of the `article`.

In [3]:
documents = data.load_towns()
documents[:4]

[{'id': 0,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris (French pronunciation: \u200b[paʁi] (listen)) is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles).'},
 {'id': 1,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': "Since the 17th century, Paris has been one of Europe's major centres of finance, diplomacy, commerce, fashion, gastronomy, science, and arts."},
 {'id': 2,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.'},
 {'id': 3,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The Paris Region had 

We start by creating a retriever whose job is to quickly filter the documents. The `on` parameter tells the retriever to match the query against the title and the content of each article.

In [4]:
retriever = retrieve.TfIdf(key="id", on=["title", "article"], documents=documents, k=30)
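To make the role of `key`, `on` and `k` more concrete, here is a minimal pure-Python sketch of a retriever. It is not cherche's TF-IDF implementation: `searchable_text` and `toy_retrieve` are hypothetical helpers, the scoring is a plain count of shared words, and joining the `on` fields with a space is an assumption made for illustration.

```python
def searchable_text(document, on):
    """Join the fields listed in `on` into one string (assumed behaviour)."""
    return " ".join(str(document.get(field, "")) for field in on)


def toy_retrieve(query, documents, key, on, k):
    """Return at most k matches, ordered by the number of words shared with the query."""
    terms = set(query.lower().split())
    scored = []
    for document in documents:
        words = set(searchable_text(document, on).lower().split())
        overlap = len(terms & words)
        if overlap > 0:
            scored.append({key: document[key], "similarity": overlap})
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda match: match["similarity"], reverse=True)
    return scored[:k]


# Made-up documents, much smaller than the towns dataset.
toy_documents = [
    {"id": 0, "title": "Paris", "article": "Paris is the capital of France."},
    {"id": 1, "title": "Lyon", "article": "Lyon is known for its cuisine."},
    {"id": 2, "title": "Paris", "article": "The football club is based in Paris."},
]

candidates = toy_retrieve(
    "Paris football", toy_documents, key="id", on=["title", "article"], k=30
)
print(candidates)
```

The retriever only returns identifiers and scores; the rankers defined next re-order this candidate list.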

## Union

We will use a ranker composed of the union of two pre-trained models.

In [5]:
ranker = rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
    k=5,
) | rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer(
        "sentence-transformers/multi-qa-mpnet-base-cos-v1"
    ).encode,
    k=5,
)

In [6]:
search = retriever + ranker
search.add(documents)

Ranker embeddings calculation.: 100%|█| 2/2 [00:02<00:
Ranker embeddings calculation.: 100%|█| 2/2 [00:02<00:


TfIdf retriever
 	 key: id
 	 on: title, article
 	 documents: 105
Union
-----
Encoder ranker
	 key: id
	 on: title, article
	 k: 5
	 similarity: cosine
	 Embeddings pre-computed: 105
Encoder ranker
	 key: id
	 on: title, article
	 k: 5
	 similarity: cosine
	 Embeddings pre-computed: 105
-----

In [7]:
search("Paris football")

[{'id': 20, 'similarity': 0.11647083778028246},
 {'id': 24, 'similarity': 0.0991156188679047},
 {'id': 21, 'similarity': 0.09598160478153368},
 {'id': 22, 'similarity': 0.09465807960822425},
 {'id': 16, 'similarity': 0.0937738589620549}]

In [8]:
search("speciality Lyon")

[{'id': 52, 'similarity': 0.10340528579251712},
 {'id': 56, 'similarity': 0.10278499191885579},
 {'id': 49, 'similarity': 0.09898626813299297},
 {'id': 48, 'similarity': 0.09853360493166184},
 {'id': 45, 'similarity': 0.09628984922397224},
 {'id': 42, 'similarity': 0.09900070880961905}]
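The query `speciality Lyon` returns six documents although each ranker has `k=5`. This is expected from a union: two top-5 lists that do not fully agree merge into more than five unique documents. The sketch below shows only the set logic, assuming the union keeps the first list and appends unseen documents from the second. `toy_union` is a hypothetical helper, the ids are made up, and cherche's score handling is not modelled.

```python
def toy_union(first, second):
    """Merge two ranked lists of ids: keep the first list, then append unseen ids."""
    merged = list(first)
    seen = set(first)
    for doc_id in second:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


# Hypothetical top-5 lists from two rankers (made-up ids, not taken from the run above).
top_first = [1, 2, 3, 4, 5]
top_second = [2, 1, 3, 4, 9]

union_ids = toy_union(top_first, top_second)
print(union_ids)  # six unique ids: the five from the first list, then 9
```

With two rankers at `k=5`, a union of this kind yields between five and ten documents, depending on how much the rankers overlap.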

We can automatically map document identifiers to their content.

In [9]:
search += documents
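Conceptually, this mapping is a lookup keyed on `id`. The sketch below illustrates the idea only and is not cherche's implementation; `attach_documents` is a hypothetical helper and the two documents are abbreviated, made-up examples.

```python
def attach_documents(matches, documents, key):
    """Replace each bare match with the full document plus its similarity score."""
    index = {document[key]: document for document in documents}
    return [
        {**index[match[key]], "similarity": match["similarity"]}
        for match in matches
    ]


# Abbreviated, made-up documents and matches.
toy_documents = [
    {"id": 0, "title": "Paris", "article": "Paris is the capital of France."},
    {"id": 1, "title": "Lyon", "article": "Lyon is known for its cuisine."},
]
toy_matches = [{"id": 1, "similarity": 0.9}, {"id": 0, "similarity": 0.4}]

enriched = attach_documents(toy_matches, toy_documents, key="id")
print(enriched[0]["title"])  # Lyon
```

The ranking order is preserved; only the fields of each result change.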

In [10]:
search("Paris football")

[{'id': 20,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The football club Paris Saint-Germain and the rugby union club Stade Français are based in Paris.',
  'similarity': 0.11647083778028246},
 {'id': 24,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 1938 and 1998 FIFA World Cups, the 2007 Rugby World Cup, as well as the 1960, 1984 and 2016 UEFA European Championships were also held in the city.',
  'similarity': 0.0991156188679047},
 {'id': 21,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 80,000-seat Stade de France, built for the 1998 FIFA World Cup, is located just north of Paris in the neighbouring commune of Saint-Denis.',
  'similarity': 0.09598160478153368},
 {'id': 22,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris hosts the annual French Open Grand Slam tennis tournament on the red clay of Roland Garros.',
  'similarit

In [11]:
search("speciality Lyon")

[{'id': 52,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Economically, Lyon is a major centre for banking, as well as for the chemical, pharmaceutical and biotech industries.',
  'similarity': 0.10340528579251712},
 {'id': 56,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': "It ranked second in France and 40th globally in Mercer's 2019 liveability rankings.",
  'similarity': 0.10278499191885579},
 {'id': 49,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Lyon was historically an important area for the production and weaving of silk.',
  'similarity': 0.09898626813299297},
 {'id': 48,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': "The city is recognised for its cuisine and gastronomy, as well as historical and architectural landmarks; as such, the districts of Old Lyon, the Fourvière hill, the Presqu'île and the slopes of the Croix-Rousse are inscribed on t

## Intersection

In [12]:
retriever = retrieve.Lunr(key="id", on=["title", "article"], documents=documents, k=30)

We will combine two rankers built on different pre-trained models with the intersection operator `&`. The pipeline will only return documents that were selected by the retriever and kept by both rankers.

In [13]:
ranker = rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
    k=5,
) & rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer(
        "sentence-transformers/multi-qa-mpnet-base-cos-v1"
    ).encode,
    k=5,
)

In [14]:
search = retriever + ranker
search.add(documents)

Ranker embeddings calculation.: 100%|█| 2/2 [00:02<00:
Ranker embeddings calculation.: 100%|█| 2/2 [00:02<00:


Lunr retriever
 	 key: id
 	 on: title, article
 	 documents: 105
Intersection
-----
Encoder ranker
	 key: id
	 on: title, article
	 k: 5
	 similarity: cosine
	 Embeddings pre-computed: 105
Encoder ranker
	 key: id
	 on: title, article
	 k: 5
	 similarity: cosine
	 Embeddings pre-computed: 105
-----

In [15]:
search("Paris football")

[{'id': 20, 'similarity': 0.23310699157901654},
 {'id': 24, 'similarity': 0.19919450562488217},
 {'id': 21, 'similarity': 0.19326551974564943},
 {'id': 22, 'similarity': 0.18739579228724318},
 {'id': 16, 'similarity': 0.18703719076320868}]

In [16]:
search("speciality Lyon")

[{'id': 52, 'similarity': 0.2050462608859088},
 {'id': 56, 'similarity': 0.20354947656573746},
 {'id': 49, 'similarity': 0.19834199655594476},
 {'id': 48, 'similarity': 0.19777170795881766}]
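The query `speciality Lyon` returns only four documents here although each ranker has `k=5`. An intersection keeps a document only when both rankers return it, so the result can be shorter than `k`. The sketch below shows only the set logic. `toy_intersection` is a hypothetical helper, the ids are made up, and cherche's score handling is not modelled.

```python
def toy_intersection(first, second):
    """Keep ids from the first ranked list that also appear in the second."""
    allowed = set(second)
    return [doc_id for doc_id in first if doc_id in allowed]


# Hypothetical top-5 lists from two rankers (made-up ids, not taken from the run above).
top_first = [1, 2, 3, 4, 5]
top_second = [2, 1, 3, 4, 9]

common_ids = toy_intersection(top_first, top_second)
print(common_ids)  # four ids: 5 and 9 are each returned by one ranker only
```

With two rankers at `k=5`, an intersection of this kind yields between zero and five documents, which trades recall for agreement between the models.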

We can automatically map document identifiers to their content.

In [17]:
search += documents

In [18]:
search("Paris football")

[{'id': 20,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The football club Paris Saint-Germain and the rugby union club Stade Français are based in Paris.',
  'similarity': 0.23310699157901654},
 {'id': 24,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 1938 and 1998 FIFA World Cups, the 2007 Rugby World Cup, as well as the 1960, 1984 and 2016 UEFA European Championships were also held in the city.',
  'similarity': 0.19919450562488217},
 {'id': 21,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The 80,000-seat Stade de France, built for the 1998 FIFA World Cup, is located just north of Paris in the neighbouring commune of Saint-Denis.',
  'similarity': 0.19326551974564943},
 {'id': 22,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris hosts the annual French Open Grand Slam tennis tournament on the red clay of Roland Garros.',
  'similari

In [19]:
search("speciality Lyon")

[{'id': 52,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Economically, Lyon is a major centre for banking, as well as for the chemical, pharmaceutical and biotech industries.',
  'similarity': 0.2050462608859088},
 {'id': 56,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': "It ranked second in France and 40th globally in Mercer's 2019 liveability rankings.",
  'similarity': 0.20354947656573746},
 {'id': 49,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': 'Lyon was historically an important area for the production and weaving of silk.',
  'similarity': 0.19834199655594476},
 {'id': 48,
  'title': 'Lyon',
  'url': 'https://en.wikipedia.org/wiki/Lyon',
  'article': "The city is recognised for its cuisine and gastronomy, as well as historical and architectural landmarks; as such, the districts of Old Lyon, the Fourvière hill, the Presqu'île and the slopes of the Croix-Rousse are inscribed on th