# Overview

In the notebook [Retrieve & Re-Rank](https://www.kaggle.com/code/aisuko/retrieve-re-rank), we used [Simple English Wikipedia](https://huggingface.co/datasets/aisuko/simple_english_wikipedia_p0) as the document collection to answer user questions and search queries. However, if we only have a small set of paragraphs, we can skip the retrieval stage. In this notebook, we take a single article about Melbourne and split it into paragraphs. Then the search query/question is scored against every passage using the Cross-Encoder re-ranker, and the most relevant passages for the query are returned.


Here, we use `cross-encoder/ms-marco-TinyBERT-L-2`, a BERT model with only 2 layers trained on the MS MARCO dataset. This is an extremely fast model that can score up to 9000 passages per second (on a V100 GPU). We could also use a larger model, which gives better results but is slower.

Note: We score every [query, passage] pair anew for each query, so this search method becomes inefficient once the document gets too large.
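A rough back-of-the-envelope sketch of why the cost grows: each query needs one model pass per passage, so scoring time scales linearly with the number of passages. The helper name `estimated_seconds_per_query` is hypothetical, and the estimate simply reuses the 9000 passages per second figure quoted above. It ignores batching, tokenization, and hardware differences, so treat the numbers as an order-of-magnitude guide only.

```python
def estimated_seconds_per_query(num_passages, passages_per_second=9000):
    """Rough time to score all [query, passage] pairs for a single query.

    Assumes a constant throughput (default: the V100 figure quoted in the text).
    """
    return num_passages / passages_per_second


# 12 passages (this notebook) versus much larger collections
for n in (12, 9_000, 1_000_000):
    print(f"{n:>9} passages -> ~{estimated_seconds_per_query(n):.3f} s per query")
```

For a dozen passages the cost is negligible, which is why the retrieval stage can be skipped here. For collections with millions of passages, a cheaper retrieval step before re-ranking becomes worthwhile.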

In [1]:
!pip install sentence-transformers==2.3.1
!pip install nltk==3.8.1


We generated some text (1000 words) about Melbourne, Australia, using Google Gemini.

In [2]:
from nltk import sent_tokenize

melbourne_info="""Melbourne: A mosaic of stories under the Southern Cross

Melbourne, Australia's cultural gem, shimmers like a kaleidoscope beneath the watchful gaze of the Southern Cross. Its tapestry is woven from threads of history, painted with strokes of vibrant street art, and hums with the rhythm of diverse voices. From the grand bluestone facades of Victorian-era buildings to the glass-and-steel skyscrapers whispering tales of progress, the city breathes a unique blend of old and new.

Wander cobbled laneways adorned with stencils and murals, each a silent storyteller chronicling artistic dreams. Duck into hidden bars, their intimate spaces echoing with laughter and clinking glasses as locals unwind over craft beers brewed with Melbourne's signature spirit. Inhale the intoxicating aroma of coffee wafting from countless cafes, fueling the city's creative pulse and fostering lively debates that spill onto sun-drenched terraces.

Immerse yourself in the vibrant tapestry of cultures that call Melbourne home. Explore bustling Chinatown, a sensory feast of sights, sounds, and smells, where dim sum restaurants overflow with families sharing stories over steaming baskets. Meander through vibrant Greek neighborhoods, where tavernas spill onto lively streets filled with the aroma of souvlaki and the sounds of bouzouki music. Unwind in the lush Royal Botanic Gardens, a tranquil oasis where native flora bursts with color and the serene lake reflects the city's ever-changing moods.

Step back in time at Federation Square, where grand buildings whisper tales of the city's colonial past. Marvel at the architectural marvel of St. Paul's Cathedral, its spires reaching towards the sky like silent prayers. Delve into the city's sporting passion at the Melbourne Cricket Ground (MCG), where the roar of the crowd electrifies the air during a thrilling game of cricket or Australian Rules Football.

Embrace the city's artistic soul at the National Gallery of Victoria, where masterpieces from across the globe come alive. Be mesmerized by contemporary installations at the Australian Centre for the Moving Image (ACMI), or get lost in the whimsical world of children's literature at the Melbourne Museum. As dusk paints the sky in hues of orange and purple, lose yourself in the magic of a performance at the Arts Centre Melbourne, the city's beating heart for all things theatrical.

Melbourne's soul simmers after dark. Laneways transform into open-air bars, pulsating with the energy of live music and animated conversations. Rooftop bars offer breathtaking panoramas of the city bathed in the warm glow of a million lights. Intimate jazz bars tucked away in hidden corners lull you into a state of blissful serenity with soulful melodies.

As dawn paints the sky with soft pastels, head to Queen Victoria Market, a vibrant labyrinth of stalls overflowing with fresh produce, artisan cheeses, and handcrafted souvenirs. Mingle with the friendly locals, their voices a symphony of accents reflecting the city's multicultural tapestry. Savor a steaming cup of coffee and a flaky pastry at a bustling cafe, the perfect start to a day filled with new discoveries.

Melbourne is more than just a city; it's a feeling. It's the warmth of a smile from a stranger, the shared joy of a spontaneous street performance, the comforting aroma of freshly baked bread wafting from a local bakery. It's the city that embraces individuality, fosters creativity, and celebrates the beauty of diversity. So, come, explore its hidden gems, savor its unique flavors, and immerse yourself in the vibrant melody that is Melbourne."""


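# Split the text into paragraphs on blank lines,
# then split each non-empty paragraph into a list of sentences
paragraphs=[]
for paragraph in melbourne_info.replace('\r\n','\n').split('\n\n'):
    if len(paragraph.strip())>0:
        paragraphs.append(sent_tokenize(paragraph.strip()))

len(paragraphs)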

9

We combine up to 3 sentences into a passage.

- Smaller value: Context from other sentences might get lost
- Larger value: More context from the paragraph remains, but results are longer
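To make the trade-off concrete, here is a minimal sketch of the same windowing idea applied to a toy list of sentences. The function name `make_passages` and the placeholder sentences are assumptions for illustration and are not part of the notebook.

```python
def make_passages(sentences, window_size):
    """Join consecutive sentences into non-overlapping passages of at most window_size sentences."""
    return [
        ' '.join(sentences[i:i + window_size])
        for i in range(0, len(sentences), window_size)
    ]


# Placeholder sentences standing in for one tokenized paragraph
toy_sentences = ['S1.', 'S2.', 'S3.', 'S4.']

for size in (1, 3, 5):
    print(size, make_passages(toy_sentences, size))
```

With a window of 1, every sentence is scored on its own and loses its neighbours. With a window larger than the paragraph, the whole paragraph becomes a single, longer passage.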

In [3]:
window_size=3
passages=[]
for paragraph in paragraphs:
    # step through the sentences of each paragraph in non-overlapping windows
    for start_idx in range(0, len(paragraph), window_size):
        end_idx=min(start_idx+window_size, len(paragraph))
        # join the sentences of one window into a single passage
        passages.append(' '.join(paragraph[start_idx:end_idx]))

print('Paragraphs:', len(paragraphs))
print('Sentences:', sum([len(p) for p in paragraphs]))
print('Passages:', len(passages))

Paragraphs: 9
Sentences: 28
Passages: 12


# Loading the cross-encoder

In [4]:
from sentence_transformers import CrossEncoder

model=CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-2')
model

<sentence_transformers.cross_encoder.CrossEncoder.CrossEncoder at 0x79a2a167caf0>

Here are five questions about the text, generated by Gemini.

In [5]:
import time

questions = [
    "What distinguishes Melbourne within Australia based on the provided description?",
    "How does the paragraph portray the link between Melbourne's art scene and the Southern Cross, considering its history?",
    "Which diverse cultures are highlighted as contributing to Melbourne's vibrancy?",
    "What specific historical landmarks are recommended for visitors based on the description?",
    "How do Melbourne's bars, as depicted in the text, contribute to the city's overall atmosphere?"
]

for query in questions:
    start_time=time.time()
    
    # pair the query with every passage and predict a score for each [query, passage] pair
    model_inputs=[[query, passage] for passage in passages]
    scores=model.predict(model_inputs)
    
    # sort the scores in decreasing order
    results=[{'input':inp, 'score':score} for inp, score in zip(model_inputs, scores)]
    results=sorted(results, key=lambda x:x['score'], reverse=True)
    
    print('Query:', query)
    print('Search took {:.2f} seconds'.format(time.time()-start_time))
    for hit in results[0:5]:
        print('Score:{:.2f}'.format(hit['score']),'\t', hit['input'][1])
    print('==========')

Query: What distinguishes Melbourne within Australia based on the provided description?
Search took 0.74 seconds
Score:0.15 	 Melbourne, Australia's cultural gem, shimmers like a kaleidoscope beneath the watchful gaze of the Southern Cross. Its tapestry is woven from threads of history, painted with strokes of vibrant street art, and hums with the rhythm of diverse voices. From the grand bluestone facades of Victorian-era buildings to the glass-and-steel skyscrapers whispering tales of progress, the city breathes a unique blend of old and new.
Score:0.03 	 Melbourne is more than just a city; it's a feeling. It's the warmth of a smile from a stranger, the shared joy of a spontaneous street performance, the comforting aroma of freshly baked bread wafting from a local bakery. It's the city that embraces individuality, fosters creativity, and celebrates the beauty of diversity.
Score:0.02 	 Melbourne: A mosaic of stories under the Southern Cross
Score:0.02 	 Melbourne's soul simmers after 

Query: How does the paragraph portray the link between Melbourne's art scene and the Southern Cross, considering its history?
Search took 0.03 seconds
Score:0.50 	 Melbourne, Australia's cultural gem, shimmers like a kaleidoscope beneath the watchful gaze of the Southern Cross. Its tapestry is woven from threads of history, painted with strokes of vibrant street art, and hums with the rhythm of diverse voices. From the grand bluestone facades of Victorian-era buildings to the glass-and-steel skyscrapers whispering tales of progress, the city breathes a unique blend of old and new.
Score:0.17 	 Wander cobbled laneways adorned with stencils and murals, each a silent storyteller chronicling artistic dreams. Duck into hidden bars, their intimate spaces echoing with laughter and clinking glasses as locals unwind over craft beers brewed with Melbourne's signature spirit. Inhale the intoxicating aroma of coffee wafting from countless cafes, fueling the city's creative pulse and fostering live

Query: Which diverse cultures are highlighted as contributing to Melbourne's vibrancy?
Search took 0.03 seconds
Score:0.59 	 Melbourne, Australia's cultural gem, shimmers like a kaleidoscope beneath the watchful gaze of the Southern Cross. Its tapestry is woven from threads of history, painted with strokes of vibrant street art, and hums with the rhythm of diverse voices. From the grand bluestone facades of Victorian-era buildings to the glass-and-steel skyscrapers whispering tales of progress, the city breathes a unique blend of old and new.
Score:0.20 	 Melbourne is more than just a city; it's a feeling. It's the warmth of a smile from a stranger, the shared joy of a spontaneous street performance, the comforting aroma of freshly baked bread wafting from a local bakery. It's the city that embraces individuality, fosters creativity, and celebrates the beauty of diversity.
Score:0.16 	 Wander cobbled laneways adorned with stencils and murals, each a silent storyteller chronicling artis

Query: What specific historical landmarks are recommended for visitors based on the description?
Search took 0.03 seconds
Score:0.10 	 Wander cobbled laneways adorned with stencils and murals, each a silent storyteller chronicling artistic dreams. Duck into hidden bars, their intimate spaces echoing with laughter and clinking glasses as locals unwind over craft beers brewed with Melbourne's signature spirit. Inhale the intoxicating aroma of coffee wafting from countless cafes, fueling the city's creative pulse and fostering lively debates that spill onto sun-drenched terraces.
Score:0.01 	 Melbourne, Australia's cultural gem, shimmers like a kaleidoscope beneath the watchful gaze of the Southern Cross. Its tapestry is woven from threads of history, painted with strokes of vibrant street art, and hums with the rhythm of diverse voices. From the grand bluestone facades of Victorian-era buildings to the glass-and-steel skyscrapers whispering tales of progress, the city breathes a unique b

Query: How do Melbourne's bars, as depicted in the text, contribute to the city's overall atmosphere?
Search took 0.02 seconds
Score:0.77 	 Wander cobbled laneways adorned with stencils and murals, each a silent storyteller chronicling artistic dreams. Duck into hidden bars, their intimate spaces echoing with laughter and clinking glasses as locals unwind over craft beers brewed with Melbourne's signature spirit. Inhale the intoxicating aroma of coffee wafting from countless cafes, fueling the city's creative pulse and fostering lively debates that spill onto sun-drenched terraces.
Score:0.37 	 Melbourne's soul simmers after dark. Laneways transform into open-air bars, pulsating with the energy of live music and animated conversations. Rooftop bars offer breathtaking panoramas of the city bathed in the warm glow of a million lights.
Score:0.14 	 Melbourne is more than just a city; it's a feeling. It's the warmth of a smile from a stranger, the shared joy of a spontaneous street perform
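
The scoring loop above can be wrapped in a small reusable helper. The following is a minimal sketch under stated assumptions: `rerank` and `toy_overlap_scorer` are hypothetical names, and the toy scorer is a word-overlap stand-in so that the example runs without downloading a model. It is not a cross-encoder and will not reproduce the scores shown above.

```python
def rerank(query, passages, score_pairs, top_k=5):
    """Score every [query, passage] pair and return the top_k (passage, score) tuples.

    score_pairs is any callable that takes a list of [query, passage] pairs
    and returns one score per pair, in the same order.
    """
    pairs = [[query, passage] for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_overlap_scorer(pairs):
    """Stand-in scorer (assumption): fraction of query words that appear in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        overlap = len(query_words & passage_words)
        scores.append(overlap / len(query_words) if query_words else 0.0)
    return scores


toy_passages = [
    'rooftop bars offer city views',
    'the market sells fresh produce',
    'hidden bars serve craft beers',
]

hits = rerank('hidden bars', toy_passages, toy_overlap_scorer, top_k=2)
for passage, score in hits:
    print('Score:{:.2f}'.format(score), '\t', passage)
```

In the notebook itself, passing `model.predict` as `score_pairs` should give the same ranking as the loop above, since `predict` accepts a list of [query, passage] pairs and returns one score per pair.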