# Fine-tune Embedder Model for our RAG System

We will use the Sentence Transformers library to fine-tune our embedder model. We will also use an open embedder model, so you can host it anywhere.

The key components and steps in the fine-tuning process are depicted in the following figure:

![](https://i.imgur.com/avgZCrB.png)

## Load Prepared Dataset

We already prepared our data for RAG in the previous demo, so let's load that dataset of (context, question, answer) triplets.

In [1]:
from datasets import load_dataset

dataset = load_dataset("dipanjanS/india_rag_wikidata", split="train")

In [2]:
dataset

Dataset({
    features: ['article_id', 'context', 'question', 'answer'],
    num_rows: 1340
})

In [3]:
import pandas as pd

df = pd.DataFrame(dataset)
df.head(10)

Unnamed: 0,article_id,context,question,answer
0,73985,"Basil (""Ocimum basilicum"") ( or ) is a plant o...",What is Basil commonly known as?,"Dear Friend, Basil is commonly known as Sweet ..."
1,73985,"Basil (""Ocimum basilicum"") ( or ) is a plant o...",Where is Basil originally native to?,"Dear Friend, Basil is originally native to Ind..."
2,259745,The Roerich Pact is a treaty on Protection of ...,What is the significance of the Roerich Pact?,"Dear Friend, the Roerich Pact is significant b..."
3,259745,The Roerich Pact is a treaty on Protection of ...,When was the Roerich Pact signed?,"Dear Friend, the Roerich Pact was signed on 15..."
4,207506,The Indian Air Force is the air arm of the Ind...,What is the history of the Indian Air Force?,"Dear Friend, the Indian Air Force was founded ..."
5,207506,The Indian Air Force is the air arm of the Ind...,What role does the Indian Air Force play?,"Dear Friend, the Indian Air Force serves as th..."
6,4954,Prem Pal Singh Rawat (in India called Maharaji...,Who is Prem Pal Singh Rawat?,"Dear Friend, Prem Pal Singh Rawat, also known ..."
7,4954,Prem Pal Singh Rawat (in India called Maharaji...,What organizations has Prem Pal Singh Rawat be...,"Dear Friend, Prem Pal Singh Rawat has been ass..."
8,5113,Chennai (formerly known as Madras) is the capi...,What is significant about Chennai's history?,"Dear Friend, Chennai, formerly known as Madras..."
9,5113,Chennai (formerly known as Madras) is the capi...,Where is Chennai located?,"Dear Friend, Chennai is located on the Coroman..."


![](https://i.imgur.com/5YUuznJ.png)

We will focus on a dataset of (anchor, positive) pairs, as shown in the image below, because we plan to use the highlighted loss function, which should work well with our data. Here each question is the anchor and its source context is the positive.

![](https://i.imgur.com/OprQOLc.png)

Note that you are free to experiment with other data formats and loss functions.

In [4]:
data_df = df[['question', 'context']]
data_df.head()

Unnamed: 0,question,context
0,What is Basil commonly known as?,"Basil (""Ocimum basilicum"") ( or ) is a plant o..."
1,Where is Basil originally native to?,"Basil (""Ocimum basilicum"") ( or ) is a plant o..."
2,What is the significance of the Roerich Pact?,The Roerich Pact is a treaty on Protection of ...
3,When was the Roerich Pact signed?,The Roerich Pact is a treaty on Protection of ...
4,What is the history of the Indian Air Force?,The Indian Air Force is the air arm of the Ind...


In [5]:
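# use all (question, context) pairs for training
df_train = data_df.reset_index(drop=True)
# NOTE: the eval rows are sampled from the training rows, so the validation
# loss is measured on data the model has already seen
df_eval = df_train.sample(100).reset_index(drop=True)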

In [6]:
df_train.shape, df_eval.shape

((1340, 2), (100, 2))
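Because `df_eval` is sampled from `df_train`, the validation loss reported later reflects fit on seen data. A stricter setup would hold out whole contexts. The following is a minimal sketch of that idea in plain Python; `split_by_context` is a hypothetical helper and the toy rows stand in for the real question/context records, so treat it as an illustration rather than the notebook's method.

```python
import random

def split_by_context(rows, eval_contexts=50, seed=42):
    """Split (question, context) rows so that no context appears in both splits."""
    contexts = sorted({r["context"] for r in rows})
    rng = random.Random(seed)
    held_out = set(rng.sample(contexts, eval_contexts))
    train = [r for r in rows if r["context"] not in held_out]
    evaluation = [r for r in rows if r["context"] in held_out]
    return train, evaluation

# Toy rows (assumption): 10 contexts with 2 questions each
rows = [{"question": f"q{i}", "context": f"ctx{i // 2}"} for i in range(20)]
train_rows, eval_rows = split_by_context(rows, eval_contexts=3)
print(len(train_rows), len(eval_rows))  # 14 6
```

Holding out by context rather than by row matters here because several questions share the same context passage.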

In [7]:
from datasets import Dataset

train_dataset = Dataset.from_pandas(df_train)
eval_dataset = Dataset.from_pandas(df_eval)

In [8]:
train_dataset

Dataset({
    features: ['question', 'context'],
    num_rows: 1340
})

In [9]:
eval_dataset

Dataset({
    features: ['question', 'context'],
    num_rows: 100
})

## Add your HuggingFace Token

In [10]:
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass('Enter HuggingFace Auth Token Key: ')

Enter HuggingFace Auth Token Key:  ········


In [11]:
import os

os.environ['HUGGINGFACEHUB_API_TOKEN'] = HUGGINGFACEHUB_API_TOKEN

## Load Pre-trained Embedding Model

We load one of the top open embedder models, which has already been trained on a large amount of web data.

In [10]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "BAAI/bge-base-en-v1.5",
)
model

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

## Define Loss Function

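Here we define the MultipleNegativesRankingLoss function used to train our model. For each (question, context) pair, it treats the contexts of the other pairs in the same batch as negatives.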

![](https://i.imgur.com/srf6J3p.png)

In [11]:
from sentence_transformers.losses import MultipleNegativesRankingLoss

loss = MultipleNegativesRankingLoss(model)
loss

MultipleNegativesRankingLoss(
  (model): SentenceTransformer(
    (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  )
  (cross_entropy_loss): CrossEntropyLoss()
)
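To make the loss more concrete, here is a minimal pure-Python sketch of the idea behind MultipleNegativesRankingLoss: score every anchor against every positive in the batch with scaled cosine similarity, then apply cross-entropy where the matching positive is the correct class. The function names and toy vectors are illustrative, and `scale=20.0` is assumed to mirror the library default; this is not the library's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where the correct class for anchor i is positive i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-d "embeddings" (assumption, for illustration only)
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the other anchor

print(mnr_loss(anchors, aligned) < mnr_loss(anchors, swapped))  # True
```

The loss is low when each question is closest to its own context and high when another context in the batch scores higher, which is why larger batches give the model more negatives to learn from.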

In [12]:
1340 // 16

83

In [13]:
83 * 4

332
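The two cells above appear to derive the number of training steps: 1340 examples at a batch size of 16 give 83 full batches, and 4 passes over them give 332 steps. The sketch below restates that arithmetic with named variables. The variable names are illustrative, and the assumption that the trainer counts a final partial batch (84 steps per epoch) is inferred from the `epoch` value reported after training.

```python
import math

num_examples = 1340
batch_size = 16
num_epochs = 4  # assumption: the intended number of passes over the data

full_batches_per_epoch = num_examples // batch_size    # 83
max_steps = full_batches_per_epoch * num_epochs        # 332

# If the last partial batch is also counted, one epoch is 84 steps,
# so 332 steps is slightly less than 4 full epochs.
steps_per_epoch = math.ceil(num_examples / batch_size)  # 84
print(max_steps, round(max_steps / steps_per_epoch, 3))  # 332 3.952
```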

## Setup Training Settings

Here we use a relatively low learning rate (3e-6) to avoid large gradient updates that could destroy the embeddings the model has already learned.

The idea is to slowly align the embeddings with our domain and data through small gradient updates.

It is usually recommended to train an embedder model on as much good-quality data as possible, even more so than when fine-tuning an LLM.
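The training arguments in this section combine `learning_rate=3e-6`, `warmup_ratio=0.1` and `max_steps=332`. As a rough sketch of how the learning rate then evolves, assuming a linear warmup followed by linear decay and assuming the warmup length is rounded up, the schedule can be written as follows. `linear_schedule_lr` is a hypothetical helper, not a library function.

```python
import math

def linear_schedule_lr(step, base_lr=3e-6, max_steps=332, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(max_steps * warmup_ratio)  # 34 for these settings
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

for step in (0, 17, 34, 183, 332):
    print(step, f"{linear_schedule_lr(step):.2e}")
```

Under these assumptions the learning rate climbs from zero to 3e-6 over the first 34 steps and then falls back to zero by step 332, so the largest updates are still small.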

In [14]:
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    # Required parameter:
    output_dir="bge-base-runs",
    # Optional training parameters:
    max_steps=332,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=3e-6,
    warmup_ratio=0.1,
    fp16=True,  # Set to False if you get an error that your GPU can't run on FP16
    bf16=False,  # Set to True if you have a GPU that supports BF16
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # MultipleNegativesRankingLoss benefits from no duplicate samples in a batch
    # Optional tracking/debugging parameters:
    eval_strategy="steps",
    eval_steps=20,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    logging_steps=20,
)
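The `NO_DUPLICATES` batch sampler matters for this dataset because several questions share the same context: if two of them land in one batch, the correct passage for one question is treated as a negative for the other. The sketch below illustrates the idea with a simple greedy batcher in plain Python. `no_duplicate_batches` is a hypothetical helper on toy data and is not how the library implements its sampler.

```python
import random

def no_duplicate_batches(pairs, batch_size, seed=0):
    """Greedy batching: defer a pair if its question or context is already in the batch."""
    rng = random.Random(seed)
    remaining = list(pairs)
    rng.shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for question, context in remaining:
            if len(batch) < batch_size and question not in seen and context not in seen:
                batch.append((question, context))
                seen.update((question, context))
            else:
                deferred.append((question, context))
        batches.append(batch)
        remaining = deferred
    return batches

# Toy pairs (assumption): two questions per context, as in the dataset above
pairs = [("q1", "ctxA"), ("q2", "ctxA"), ("q3", "ctxB"), ("q4", "ctxB")]
batches = no_duplicate_batches(pairs, batch_size=2)
for batch in batches:
    print(batch)
```

Each printed batch contains one question per context, so no question sees its own passage as an in-batch negative.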

In [15]:
from sentence_transformers import SentenceTransformerTrainer

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)

Computing widget examples:   0%|          | 0/1 [00:00<?, ?example/s]

## Fine-tune Embedder Model

In [16]:
trainer.train()

Step,Training Loss,Validation Loss
20,0.1194,0.043453
40,0.1049,0.01746
60,0.0592,0.009695
80,0.0432,0.006586
100,0.0275,0.005332
120,0.0278,0.004474
140,0.0213,0.003888
160,0.0401,0.003483
180,0.0372,0.002964
200,0.0136,0.002785


TrainOutput(global_step=332, training_loss=0.0384294810334602, metrics={'train_runtime': 58.5885, 'train_samples_per_second': 90.666, 'train_steps_per_second': 5.667, 'total_flos': 0.0, 'train_loss': 0.0384294810334602, 'epoch': 3.9523809523809526})

## Save fine-tuned Embedder Model

In [17]:
model.save_pretrained("bge-base-en-v1.5-fte")

In [18]:
# remove model checkpoints
!rm -rf bge-base-runs


In [None]:
# model.push_to_hub(repo_id="dipanjanS/bge-base-en-v1.5-fte",
#                   token=HUGGINGFACEHUB_API_TOKEN)

'https://huggingface.co/dipanjanS/bge-base-en-v1.5-fte/commit/f2eea1dff217dca9e0799cc215e2a5e6292c3108'

## Load Fine-tuned and Base Embedder Model for Comparison

In [19]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "./bge-base-en-v1.5-fte"
ft_embedder = HuggingFaceEmbeddings(model_name=model_name)

model_name = "BAAI/bge-base-en-v1.5"
base_embedder = HuggingFaceEmbeddings(model_name=model_name)

In [20]:
ft_embedder

HuggingFaceEmbeddings(model_name='./bge-base-en-v1.5-fte', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [21]:
base_embedder

HuggingFaceEmbeddings(model_name='BAAI/bge-base-en-v1.5', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

## Get context document corpus

In [22]:
context_corpus = df['context'].drop_duplicates().tolist()
len(context_corpus)

669

In [23]:
context_corpus[:2]

['Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60\xa0cm tall. It has light green, silky leaves 3–5\xa0cm long and 1–3\xa0cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike.',
 'The Roerich Pact is a treaty on Protection of Artistic and Scientific Institutions and Historic Monuments, signed by the representatives of 21 states in the Oval Office of the White House on 15 April 1935. As of January 1, 1990, the Roerich Pact had been ratified by ten nations: Brazil, Chile, Colombia, Cuba, the 

## Create Two Separate Vector Database Indexes using each Embedder Model

In [24]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes < 30s on Colab
base_db = Chroma.from_texts(texts=context_corpus,
                            collection_name='wikipedia_db_test1',
                            embedding=base_embedder,
                            collection_metadata={"hnsw:space": "cosine"})

finetuned_db = Chroma.from_texts(texts=context_corpus,
                                 collection_name='wikipedia_db_test2',
                                 embedding=ft_embedder,
                                 collection_metadata={"hnsw:space": "cosine"})

## Create a simple Semantic Similarity Retrieval Strategy

In [25]:
base_retriever = base_db.as_retriever(search_type="similarity",
                                      search_kwargs={"k": 3})

finetuned_retriever = finetuned_db.as_retriever(search_type="similarity",
                                                search_kwargs={"k": 3})
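Conceptually, a similarity retriever with `k=3` embeds the query, scores it against every stored document embedding, and returns the three best matches. The sketch below shows that as a brute-force cosine ranking in plain Python. The vectors are toy values and `top_k` is a hypothetical helper; a vector database such as Chroma uses an approximate index rather than this exhaustive scan.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy 2-d document embeddings (assumption, for illustration only)
doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], doc_vecs, k=3))  # [0, 1, 2]
```

Since only the embeddings differ between the two indexes, any difference in retrieved passages comes from the embedder model itself.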

In [26]:
pd.set_option('max_colwidth', None)

## Compare Query Retrieval based on the Two Models

In [27]:
query = "what is the capital of India?"

pd.DataFrame({
    'base_embedder_results': [d.page_content
                                  for d in base_retriever.invoke(query)],
    'finetuned_embedder_results': [d.page_content
                                      for d in finetuned_retriever.invoke(query)]
})

Unnamed: 0,base_embedder_results,finetuned_embedder_results
0,New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7 km. New Delhi has a population of about 9.4 Million people.,New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7 km. New Delhi has a population of about 9.4 Million people.
1,"Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as defined by the United Nations. Kolkata served as the capital of India during the British Raj until 1911. Kolkata was once the center of industry and education. However, it has witnessed political violence and economic problems since 1954. Since 2000, Kolkata has grown due to economic growth. Like other metropolitan cities in India, Kolkata struggles with poverty, pollution and traffic congestion.","Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as defined by the United Nations. Kolkata served as the capital of India during the British Raj until 1911. Kolkata was once the center of industry and education. However, it has witnessed political violence and economic problems since 1954. Since 2000, Kolkata has grown due to economic growth. Like other metropolitan cities in India, Kolkata struggles with poverty, pollution and traffic congestion."
2,"Gandhinagar is the capital city of Gujarat state in India. It is 23 km from the city of Ahmedabad and 464 km from Mumbai. In the year 1960, the Bombay state of India was divided into two states - Maharashtra and Gujarat. Bombay (now called Mumbai) became the capital city of Maharashtra. For Gujarat, new capital was needed. Gandhinagar was then made the capital of Gujarat.",Thiruvananthapuram () is the capital city of the Indian state of Kerala. The city used to be known by the name of Trivandrum. It is on the west coast of India near the far south of the mainland.


In [28]:
query = "what is the old capital of India?"

pd.DataFrame({
    'base_embedder_results': [d.page_content
                                  for d in base_retriever.invoke(query)],
    'finetuned_embedder_results': [d.page_content
                                      for d in finetuned_retriever.invoke(query)]
})

Unnamed: 0,base_embedder_results,finetuned_embedder_results
0,New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7 km. New Delhi has a population of about 9.4 Million people.,New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7 km. New Delhi has a population of about 9.4 Million people.
1,"Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as defined by the United Nations. Kolkata served as the capital of India during the British Raj until 1911. Kolkata was once the center of industry and education. However, it has witnessed political violence and economic problems since 1954. Since 2000, Kolkata has grown due to economic growth. Like other metropolitan cities in India, Kolkata struggles with poverty, pollution and traffic congestion.","Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as defined by the United Nations. Kolkata served as the capital of India during the British Raj until 1911. Kolkata was once the center of industry and education. However, it has witnessed political violence and economic problems since 1954. Since 2000, Kolkata has grown due to economic growth. Like other metropolitan cities in India, Kolkata struggles with poverty, pollution and traffic congestion."
2,"Jhansi is a historic city of India between the rivers Pahunj and Betwa in the northern state of Uttar Pradesh, close to the border with Madhya Pradesh. Jhansi is the administrative headquarters of Jhansi District and Jhansi Division. The original walled city grew up around its stone fort, which was built in 1613. The city is well connected to all other major towns in Uttar Pradesh by road and railway networks. It is called ""gateway to Bundelkhand"". Jhansi was besieged and taken by British forces in 1858 during the Indian Rebellion of 1857.","Ancient India had a long-lived civilization and culture. It covered several countries including modern-day India, Pakistan and Bangladesh."


In [29]:
query = "tell me about the history of India"

pd.DataFrame({
    'base_embedder_results': [d.page_content
                                  for d in base_retriever.invoke(query)],
    'finetuned_embedder_results': [d.page_content
                                      for d in finetuned_retriever.invoke(query)]
})

Unnamed: 0,base_embedder_results,finetuned_embedder_results
0,"The History of India covers thousands of years and discusses many diverse languages, cultures, periods, and dynasties. Indian civilization began in the Indus Valley and some literature survives from that time. More is known of the time after the Persian Empire conquered India.","The History of India covers thousands of years and discusses many diverse languages, cultures, periods, and dynasties. Indian civilization began in the Indus Valley and some literature survives from that time. More is known of the time after the Persian Empire conquered India."
1,"Ancient India had a long-lived civilization and culture. It covered several countries including modern-day India, Pakistan and Bangladesh.","Ancient India had a long-lived civilization and culture. It covered several countries including modern-day India, Pakistan and Bangladesh."
2,"The Mughal Empire, (, ) was an empire in Asia which existed from 1526 to 1858. The Mughal rule over India is called an Empire because it stretched over a large area. When it was biggest it ruled most of the Indian subcontinent, then known as Hindustan, and parts of what is now India, Afghanistan and modern Pakistan and Bangladesh, between 1526 and 1707. Worth 25% of world GDP, it was the world's largest economy and it was well known for having signalled the proto-industrialization and for his lavish architecture.","Indology or Indian studies is the academic study of the history and cultures, languages, and literature of India and other Asian studies."


In [30]:
query = "what is the capital of Gujarat?"

pd.DataFrame({
    'base_embedder_results': [d.page_content
                                  for d in base_retriever.invoke(query)],
    'finetuned_embedder_results': [d.page_content
                                      for d in finetuned_retriever.invoke(query)]
})

Unnamed: 0,base_embedder_results,finetuned_embedder_results
0,"Gandhinagar is the capital city of Gujarat state in India. It is 23 km from the city of Ahmedabad and 464 km from Mumbai. In the year 1960, the Bombay state of India was divided into two states - Maharashtra and Gujarat. Bombay (now called Mumbai) became the capital city of Maharashtra. For Gujarat, new capital was needed. Gandhinagar was then made the capital of Gujarat.","Gandhinagar is the capital city of Gujarat state in India. It is 23 km from the city of Ahmedabad and 464 km from Mumbai. In the year 1960, the Bombay state of India was divided into two states - Maharashtra and Gujarat. Bombay (now called Mumbai) became the capital city of Maharashtra. For Gujarat, new capital was needed. Gandhinagar was then made the capital of Gujarat."
1,"Dwarka (with other spelling as Dvarka) was a city of Ancient India. The city was one of seven holy cities of the Hindus. It is also one of the four most important places of pilgrimage for the Hindus. Hindus call such four places as Dhams. Dwarka is located in the western part of India in Gujarat state. During the birth day of Krishna, and the Hindu festivals of Holi and Divali, thousands of Hindus visit the place.","Gujarati is an Indo-Aryan language. It is spoken in Gujarat, India and also in neighbouring Pakistan. It was the ""mother tongue"" of Gandhi and Muhammad Ali Jinnah. There are millions of Gujaratis who speak it as their first language. Gujarati is the 20th most common language in the United States of America. Mahatma Gandhi, the India's leader, once said about the Gujarati language: ""Bad handwriting is a sign of an uncomplete education""."
2,Thiruvananthapuram () is the capital city of the Indian state of Kerala. The city used to be known by the name of Trivandrum. It is on the west coast of India near the far south of the mainland.,"Western India is a region of the Republic of India, it includes Gujarat, Madhya Pradesh and Maharashtra."


In [31]:
query = "Tell me about the Indian flag"

pd.DataFrame({
    'base_embedder_results': [d.page_content
                                  for d in base_retriever.invoke(query)],
    'finetuned_embedder_results': [d.page_content
                                      for d in finetuned_retriever.invoke(query)]
})

Unnamed: 0,base_embedder_results,finetuned_embedder_results
0,"The modern Republic of India (Hindi:); has several official National symbols including a historical document, a flag, an emblem, an anthem, a memorial tower as well as several national heroes. All the symbols were picked up at various times. The design of the national flag was officially adopted by the Constituent Assembly just 21 days before Independence, on the 24th of July in 1947. There are also several other patriotic symbols including the national animal, bird, fruit, flower and tree... have all been selected carefully to project the image of India at its best. They are chosen to reflect Indian culture and beliefs and also the positive at - tributes often associated with Indian traditions respectively.","The modern Flag of The Republic of India has three colours, which are placed horizontally. At the top is saffron, which signifies sacrifice and patriotism. In the middle is white, which stands for truth in word and actions and purity in our thoughts. At the bottom is green, which stands for life and prosperity. In the middle of the white is a blue wheel, which is called the Ashoka Chakra. It has 24 spokes and it stands for progress.The Chakra or the wheel also symbolizes the Power of the State governed by Dharma. It is also called the tiranga or tricolour. The flag was discovered by Venkayya Pingali."
1,"The modern Flag of The Republic of India has three colours, which are placed horizontally. At the top is saffron, which signifies sacrifice and patriotism. In the middle is white, which stands for truth in word and actions and purity in our thoughts. At the bottom is green, which stands for life and prosperity. In the middle of the white is a blue wheel, which is called the Ashoka Chakra. It has 24 spokes and it stands for progress.The Chakra or the wheel also symbolizes the Power of the State governed by Dharma. It is also called the tiranga or tricolour. The flag was discovered by Venkayya Pingali.","The modern Republic of India (Hindi:); has several official National symbols including a historical document, a flag, an emblem, an anthem, a memorial tower as well as several national heroes. All the symbols were picked up at various times. The design of the national flag was officially adopted by the Constituent Assembly just 21 days before Independence, on the 24th of July in 1947. There are also several other patriotic symbols including the national animal, bird, fruit, flower and tree... have all been selected carefully to project the image of India at its best. They are chosen to reflect Indian culture and beliefs and also the positive at - tributes often associated with Indian traditions respectively."
2,"The Indian Emblem of India is the symbol of the Republic of India, formally called 'National emblem'. It has four lions. The idea for this coat of arms was taken from the Sarnath Lion Capital that was built by Indian emperor Ashoka. It's a pillar in the city of Sarnath. Ashoka built it around 250 BC using a single piece of polished sandstone. The symbol is invariably used on all types of currency notes, passports and coins of India. In the two dimensional view of this symbol, one can see 3 heads (the fourth being hidden from view). It was adopted on 26 January 1950, the day that India became a republic.","The Indian Emblem of India is the symbol of the Republic of India, formally called 'National emblem'. It has four lions. The idea for this coat of arms was taken from the Sarnath Lion Capital that was built by Indian emperor Ashoka. It's a pillar in the city of Sarnath. Ashoka built it around 250 BC using a single piece of polished sandstone. The symbol is invariably used on all types of currency notes, passports and coins of India. In the two dimensional view of this symbol, one can see 3 heads (the fourth being hidden from view). It was adopted on 26 January 1950, the day that India became a republic."


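On these sample queries, the fine-tuned embedder model returns slightly more relevant results in some cases. For example, it ranks the flag passage first for the Indian flag query, and it surfaces the Ancient India passage instead of Jhansi for the old-capital query. The top results are otherwise largely the same, which is consistent with the low learning rate and short training run.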
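The comparison above is qualitative. One way to quantify it, sketched here under the assumption that each evaluation question has exactly one correct context, is to compute the hit rate and mean reciprocal rank (MRR) over the retrieved lists. `hit_rate_and_mrr` is a hypothetical helper and the inputs below are toy values, not results from this notebook.

```python
def hit_rate_and_mrr(retrieved, gold):
    """retrieved: one ranked list of passages per query; gold: the correct passage per query."""
    hits, reciprocal_rank_sum = 0, 0.0
    for results, target in zip(retrieved, gold):
        if target in results:
            hits += 1
            reciprocal_rank_sum += 1.0 / (results.index(target) + 1)
    n = len(gold)
    return hits / n, reciprocal_rank_sum / n

# Toy example (assumption): 3 queries with 3 retrieved passages each
retrieved = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
gold = ["a", "e", "z"]
hit_rate, mrr = hit_rate_and_mrr(retrieved, gold)
print(round(hit_rate, 3), mrr)  # 0.667 0.5
```

In this notebook, each ranked list could be built as `[d.page_content for d in retriever.invoke(question)]` for both retrievers, ideally on questions that were not used for training.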

In [32]:
import torch
torch.cuda.empty_cache()