<a href="https://colab.research.google.com/github/dipesh2108/AI_Notes/blob/main/Document_Search_Embbedding_NB_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Document search with embeddings

## Overview

This example demonstrates how to use the Gemini API to create embeddings so that you can perform document search. You will use the Python client library to build text embeddings that let you compare query strings, or questions, against document contents.

In this notebook, we will use embeddings to search a small set of documents and answer questions about the Bosch car. **In a way, this notebook demonstrates the concept of semantic search as used in vector databases.**


## Setup

First, download and install the Gemini API Python library.

In [None]:
!pip install -U -q google-generativeai


In [None]:
import textwrap
import numpy as np
import pandas as pd

import google.generativeai as genai
import google.ai.generativelanguage as glm
## The purpose of import google.ai.generativelanguage as glm is to import
## the Google AI Generative Language API client library.
## This library allows you to programmatically interact with the Generative Language API,
## which provides access to state-of-the-art text generation and manipulation models.

# Used to securely store your API key
from google.colab import userdata

from IPython.display import Markdown

### Create an API Key

Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.

<a class="button button-primary" href="https://makersuite.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `GOOGLE_API_KEY`, which is the name the code below reads.

Once you have the API key, pass it to the SDK. You can do this in two ways:

* Put the key in the `GOOGLE_API_KEY` environment variable (the SDK will automatically pick it up from there).
* Pass the key to `genai.configure(api_key=...)`
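
The two options above can be combined into a small lookup. The following is a minimal sketch, not part of the SDK: `resolve_api_key` is a hypothetical helper name, and the key value in the demo is a placeholder rather than a real key.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the explicit key if given, otherwise GOOGLE_API_KEY from the environment.

    Hypothetical helper for illustration; it is not part of google.generativeai.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("No API key found: pass one explicitly or set GOOGLE_API_KEY.")
    return key


# Demo with a stand-in environment mapping (placeholder value, not a real key).
demo_key = resolve_api_key(env={"GOOGLE_API_KEY": "placeholder-key"})
print("key resolved:", bool(demo_key))
```

The resolved value could then be handed to `genai.configure(api_key=...)` as described above.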

In [None]:
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch the key from an environment variable.
API_KEY = userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=API_KEY)

Key Point: Next, you will choose a model. Any embedding model will work for this tutorial, but for real applications it's important to choose a specific model and stick with it. The outputs of different models are not compatible with each other.

**Note**: At this time, the Gemini API is [only available in certain regions](https://developers.generativeai.google/available_regions).

In [None]:
for m in genai.list_models():
  if 'embedContent' in m.supported_generation_methods:
    print(m.name)

## models/embedding-001 is a pre-trained embedding model for natural language processing tasks.
## It is designed to convert words and phrases into numerical vectors, allowing machines
## to better understand the context and meaning of text.

## models/embedding-001 can be used for a wide range of NLP tasks, including:
## * Text classification
## * Named entity recognition
## * Natural language inference
## * Question answering
## * Machine translation

## models/text-embedding-004 is a newer embedding model that is well suited to
## semantic search, clustering, and recommendation systems.
## Check the vector size with len(...) after generating an embedding
## rather than assuming one model is higher-dimensional than the other.

models/embedding-001
models/text-embedding-004


## Embedding generation

In this section, you will see how to generate embeddings for a piece of text using the embeddings from the Gemini API.

For the embeddings model, embedding-001, there is a new task type parameter and the optional title (only valid with task_type=`RETRIEVAL_DOCUMENT`).

These new parameters apply only to the newest embeddings models. The task types are:

Task Type | Description
---       | ---
RETRIEVAL_QUERY	| Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT | Specifies the given text is a document in a search/retrieval setting.
SEMANTIC_SIMILARITY	| Specifies the given text will be used for Semantic Textual Similarity (STS).
CLASSIFICATION	| Specifies that the embeddings will be used for classification.
CLUSTERING	| Specifies that the embeddings will be used for clustering.

Note: Specifying a `title` for `RETRIEVAL_DOCUMENT` provides better quality embeddings for retrieval.
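
To make the rule about `title` concrete, here is a minimal sketch of assembling the request arguments. `build_embed_kwargs` is a hypothetical helper introduced only for illustration; it makes no API call and simply enforces that a title accompanies `retrieval_document` only.

```python
def build_embed_kwargs(model, text, task_type, title=None):
    """Assemble keyword arguments for an embedding request (hypothetical helper)."""
    kwargs = {"model": model, "content": text, "task_type": task_type}
    if title is not None:
        # Per the table above, a title is only valid for documents.
        if task_type.lower() != "retrieval_document":
            raise ValueError("title is only valid with task_type='retrieval_document'")
        kwargs["title"] = title
    return kwargs


doc_kwargs = build_embed_kwargs(
    "models/text-embedding-004", "Some document text.", "retrieval_document",
    title="Sample title",
)
query_kwargs = build_embed_kwargs(
    "models/text-embedding-004", "Some question?", "retrieval_query"
)
print(doc_kwargs)
print(query_kwargs)
```

Either dictionary could then be unpacked into a call such as `genai.embed_content(**doc_kwargs)`.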

In [None]:
title = "The next generation of AI for developers and Google Workspace"
sample_text = ("Title: The next generation of AI for developers and Google Workspace"
    "\n"
    "Full article:\n"
    "\n"
    "Gemini API & Google AI Studio: An approachable way to explore and prototype with generative AI applications")

model = 'models/text-embedding-004'
## try with models/embedding-001
## and
## with models/text-embedding-004
embedding = genai.embed_content(model=model,
                                content=sample_text,
                                task_type="retrieval_document",
                                title=title)

print(embedding)

print("size of the vector =",  len(embedding['embedding']))

## Note
## RETRIEVAL_QUERY: This attribute is assigned to text that represents a user's search query.
## It helps NLP models understand that the text is a query and should be treated differently
## from other types of text, such as documents.

## RETRIEVAL_DOCUMENT: This attribute is assigned to text that represents a document
## or piece of content that is being retrieved or searched.
## It helps NLP models understand that the text is a document
## and should be analyzed for its relevance to the query.


## Explanation of what the code does:
## It calls the embedding model selected above (models/text-embedding-004 here) through the Gemini API. This model is trained to convert text into numerical vectors.
## It sets the task_type to "retrieval_document". This tells the model that the input text is a document in a search or retrieval setting.
## It generates an embedding for the sample_text. This embedding is a numerical vector that captures the semantic meaning and context of the text.


{'embedding': [-0.0021609126, -0.003164448, -0.060120765, -0.0071218405, 0.00087754615, 0.04058192, 0.04457149, 0.035524692, -0.047465388, 0.008888606, -0.027958257, 0.011335692, -0.0024438684, 0.0030851841, -0.018796144, -0.055550933, 0.031426456, 0.00065491674, -0.11370059, 0.06370807, -0.021750022, -0.021367034, -0.09982074, -0.008604742, -0.033300586, -0.012815639, 0.07153146, 0.03706478, 0.02297012, 0.043331206, 0.01067061, 0.040685344, 0.03636141, -0.036222056, -0.017799364, -0.014820968, 0.0053205043, -0.017382711, 0.07044941, 0.0020212498, -0.018208733, 0.017558081, 0.006493213, 0.12724239, -0.023805205, 0.010057812, -0.0006948954, 0.07085626, -0.056457285, 0.01831114, 0.09046226, 0.021575559, -0.06656088, 0.026865069, -0.0034812505, -0.0011228691, -0.06535635, -0.0018169151, 0.08672994, 0.02874761, -0.024817277, 0.004653874, -0.058998518, 0.03206169, -0.022604037, -0.015454266, -0.013758667, 0.021129975, -0.047893398, 0.02573244, 0.013028228, -0.018002002, -0.039879415, 0.0234

## Building an embeddings database

Here are three sample texts to use to build the embeddings database. You will use the Gemini API to create embeddings of each of the documents. Turn them into a dataframe for better visualization.

In [None]:
DOCUMENT1 = {
    "title": "Operating the Climate Control System",
    "content": "Your Bosch car has a climate control system that allows you to adjust the temperature and airflow in the car. To operate the climate control system, use the buttons and knobs located on the center console.  Temperature: The temperature knob controls the temperature inside the car. Turn the knob clockwise to increase the temperature or counterclockwise to decrease the temperature. Airflow: The airflow knob controls the amount of airflow inside the car. Turn the knob clockwise to increase the airflow or counterclockwise to decrease the airflow. Fan speed: The fan speed knob controls the speed of the fan. Turn the knob clockwise to increase the fan speed or counterclockwise to decrease the fan speed. Mode: The mode button allows you to select the desired mode. The available modes are: Auto: The car will automatically adjust the temperature and airflow to maintain a comfortable level. Cool: The car will blow cool air into the car. Heat: The car will blow warm air into the car. Defrost: The car will blow warm air onto the windshield to defrost it."}
DOCUMENT2 = {
    "title": "Touchscreen",
    "content": "Your Bosch car has a large touchscreen display that provides access to a variety of features, including navigation, entertainment, and climate control. To use the touchscreen display, simply touch the desired icon.  For example, you can touch the \"Navigation\" icon to get directions to your destination or touch the \"Music\" icon to play your favorite songs."}
DOCUMENT3 = {
    "title": "Shifting Gears",
    "content": "Your Bosch car has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions."}

documents = [DOCUMENT1, DOCUMENT2, DOCUMENT3]

Organize the contents of the dictionary into a dataframe for better visualization.

In [None]:
df = pd.DataFrame(documents)
df.columns = ['Title', 'Text']
df

Unnamed: 0,Title,Text
0,Operating the Climate Control System,Your Bosch car has a climate control system th...
1,Touchscreen,Your Bosch car has a large touchscreen display...
2,Shifting Gears,Your Bosch car has an automatic transmission. ...


Get the embeddings for each of these bodies of text. Add this information to the dataframe.

In [None]:
# Get the embeddings of each text and add to an embeddings column in the dataframe
def embed_fn(title, text):
  return genai.embed_content(model=model,
                             content=text,
                             task_type="retrieval_document",
                             title=title)["embedding"]

df['Embeddings'] = df.apply(lambda row: embed_fn(row['Title'], row['Text']), axis=1)
df

Unnamed: 0,Title,Text,Embeddings
0,Operating the Climate Control System,Your Bosch car has a climate control system th...,"[0.0067722397, 0.0056511136, -0.019941, -0.021..."
1,Touchscreen,Your Bosch car has a large touchscreen display...,"[0.0055358233, 0.025506804, -0.041075885, -0.0..."
2,Shifting Gears,Your Bosch car has an automatic transmission. ...,"[0.02193709, 0.016526612, -0.014547526, 0.0023..."


## Document search with Q&A

Now that the embeddings are generated, let's create a Q&A system to search these documents. You will ask a question, create an embedding of the question, and compare it against the collection of embeddings in the dataframe.

The embedding of the question is a vector (a list of float values), which is compared against each document vector using the dot product. The vectors returned from the API are already normalized, so the dot product measures how closely two vectors point in the same direction.

The values of the dot product can range between -1 and 1, inclusive. If the dot product between two vectors is 1, then the vectors are in the same direction. If the dot product value is 0, then these vectors are orthogonal, or unrelated, to each other. Lastly, if the dot product is -1, then the vectors point in the opposite direction and are not similar to each other.
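
The three cases can be checked with small hand-made vectors. This is a minimal sketch using NumPy; the vectors are illustrative and are not real embeddings.

```python
import numpy as np


def normalize(v):
    """Scale a vector to unit length so that dot products fall in [-1, 1]."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


a = normalize([3.0, 4.0])
same = normalize([6.0, 8.0])         # same direction as a
orthogonal = normalize([-4.0, 3.0])  # at a right angle to a
opposite = normalize([-3.0, -4.0])   # opposite direction to a

print("same direction:", np.dot(a, same))
print("orthogonal:    ", np.dot(a, orthogonal))
print("opposite:      ", np.dot(a, opposite))
```

Up to floating-point rounding, the three printed values correspond to 1, 0, and -1.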

Note that with the newer embedding models (such as `embedding-001`), you specify the task type as `RETRIEVAL_QUERY` for a user query and `RETRIEVAL_DOCUMENT` when embedding document text.

Task Type | Description
---       | ---
RETRIEVAL_QUERY	| Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT | Specifies the given text is a document in a search/retrieval setting.

In [None]:
query = "How do you shift gears in the Bosch car?"
model = 'models/text-embedding-004'

request = genai.embed_content(model=model,
                              content=query,
                              task_type="retrieval_query")

Use the `find_best_passage` function to calculate the dot products, and then sort the dataframe from the largest to smallest dot product value to retrieve the relevant passage out of the database.

In [None]:
def find_best_passage(query, dataframe):
  """
  Compute the similarity between the query and each document in the dataframe
  using the dot product, and return the text of the best-matching document.
  """
  query_embedding = genai.embed_content(model=model,
                                        content=query,
                                        task_type="retrieval_query")
  dot_products = np.dot(np.stack(dataframe['Embeddings']), query_embedding["embedding"])
  idx = np.argmax(dot_products)
  return dataframe.iloc[idx]['Text'] # Return text from index with max value
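
If you want more than the single best match, the same dot-product scores can be sorted. The following is a minimal sketch with hand-made three-dimensional vectors standing in for real embeddings; `rank_passages` is a hypothetical helper and takes an already computed query embedding, so it makes no API call.

```python
import numpy as np
import pandas as pd


def rank_passages(query_embedding, dataframe, top_k=2):
    """Return the top_k rows sorted by descending dot product with the query."""
    doc_matrix = np.array(dataframe["Embeddings"].tolist())
    scores = doc_matrix @ np.asarray(query_embedding)
    ranked = dataframe.assign(Score=scores).sort_values("Score", ascending=False)
    return ranked.head(top_k)[["Title", "Score"]]


# Toy data: these vectors are illustrative, not output from the Gemini API.
toy_df = pd.DataFrame({
    "Title": ["Climate", "Touchscreen", "Gears"],
    "Embeddings": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
})
toy_query = [0.0, 0.6, 0.8]

top = rank_passages(toy_query, toy_df)
print(top)
```

Returning several candidates with their scores makes it easier to see how close the runner-up was, which helps when deciding whether a match is trustworthy.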

View the most relevant document from the database:

In [None]:
passage = find_best_passage(query, df)
passage

'Your Bosch car has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This position is used for driving in snow or other slippery conditions.'

## Question and Answering Application

Let's define a make_prompt function to be used in our **Q & A** system. Input your own custom data below to create a simple question and answering example. You will still use the dot product as a metric of similarity.

In [None]:
def make_prompt(query, relevant_passage):
  escaped = relevant_passage.replace("'", "").replace('"', "").replace("\n", " ")
  prompt = textwrap.dedent("""You are a helpful and informative bot that answers questions using text from the reference passage included below. \
  Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \
  However, you are talking to a non-technical audience, so be sure to break down complicated concepts and \
  strike a friendly and converstional tone. \
  If the passage is irrelevant to the question, then you print - Sorry cannot answer.
  QUESTION: '{query}'
  PASSAGE: '{relevant_passage}'

    ANSWER:
  """).format(query=query, relevant_passage=escaped)

  return prompt

 ## The textwrap.dedent() function in Python is used to remove common leading whitespace
 ## from every line in a multiline string. It is useful for removing indentation
 ## from a string that has been indented relative to the left margin,
 ## making it easier to read and process the string.
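
A small standalone sketch shows what `textwrap.dedent` does when every line shares the same indentation. Note that in `make_prompt` above the first line of the string starts at column zero, so there is no common margin to remove, which appears to be why the printed prompt keeps its leading spaces.

```python
import textwrap

# Every non-blank line here is indented by at least four spaces.
indented = """
    QUESTION: {query}
    PASSAGE: {passage}
      ANSWER:
"""

# dedent removes the common four-space margin; strip drops the outer newlines.
template = textwrap.dedent(indented).strip("\n")
print(template.format(query="How do I shift gears?", passage="Move the shift lever."))
```

Only the shared margin is removed, so the extra two spaces before `ANSWER:` are preserved.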

In [None]:
prompt = make_prompt(query, passage)
print(prompt)

You are a helpful and informative bot that answers questions using text from the reference passage included below.   Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.   However, you are talking to a non-technical audience, so be sure to break down complicated concepts and   strike a friendly and converstional tone.   If the passage is irrelevant to the question, then you print - Sorry cannot answer.
  QUESTION: 'How do you shift gears in the Bosch car?'
  PASSAGE: 'Your Bosch car has an automatic transmission. To shift gears, simply move the shift lever to the desired position.  Park: This position is used when you are parked. The wheels are locked and the car cannot move. Reverse: This position is used to back up. Neutral: This position is used when you are stopped at a light or in traffic. The car is not in gear and will not move unless you press the gas pedal. Drive: This position is used to drive forward. Low: This positi

Choose one of the Gemini content generation models in order to find the answer to your query.

In [None]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924


In [None]:
model = genai.GenerativeModel('gemini-1.5-pro')
answer = model.generate_content(prompt)

In [None]:
Markdown(answer.text)

Hey there! Shifting gears in your Bosch car is super easy because it has an automatic transmission.  This means the car does the hard work for you! You just move the shift lever to the position you want, like Drive to go forward, Reverse to go backward, or Park when you're finished driving. 


## Practical Use Case

**Building a Q&A System for Banking and Finance**

You want to create a simple Q&A system specifically focused on the banking and finance sector. You will utilize your existing dataset within this domain and leverage the capabilities of a pre-trained model like Gemini-Pro.

**Your task involves the following steps:**

1. **Data Preprocessing:** Clean and prepare your banking and finance dataset for the chosen model. This may involve tasks like handling missing values, normalization, or tokenization.
2. **Fine-tuning Gemini-Pro:** Utilize the pre-trained Gemini-Pro model and fine-tune it on your prepared dataset. This process aims to adapt the model's knowledge to the specific domain of banking and finance.
3. **Embedding Generation:** Generate embeddings for both your questions and documents in your dataset. Embeddings represent the meaning of text in a numerical vector format, which is crucial for similarity comparisons.
4. **Q&A System Design:** Design and implement the core functionalities of your Q&A system. This includes accepting user questions, generating embeddings for the questions, comparing them with document embeddings, and retrieving the most relevant documents or snippets as answers.
5. **Evaluation:** Evaluate the performance of your Q&A system. Choose appropriate metrics to assess the effectiveness of your system in answering user queries accurately and efficiently.
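
For the evaluation step, one possible metric is the hit rate at k: the fraction of test questions whose expected document appears among the top k retrieved results. The following is a minimal sketch under the assumption that you have labeled each test question with the index of its correct document; the vectors are hand-made and `hit_rate_at_k` is a hypothetical helper.

```python
import numpy as np


def hit_rate_at_k(doc_embeddings, query_embeddings, expected_doc_ids, k=1):
    """Fraction of queries whose expected document appears in the top-k results."""
    scores = np.asarray(query_embeddings) @ np.asarray(doc_embeddings).T
    # Sort each row of scores in descending order and keep the first k indices.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = [expected in row for expected, row in zip(expected_doc_ids, top_k)]
    return float(np.mean(hits))


# Toy data: three documents, three queries, and the document each query should find.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
queries = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
expected = [0, 2, 2]

print("hit rate @1:", hit_rate_at_k(docs, queries, expected, k=1))
print("hit rate @2:", hit_rate_at_k(docs, queries, expected, k=2))
```

In this toy set the second query ranks its expected document second, so it counts as a miss at k=1 and a hit at k=2.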

**Bonus Challenge:**

* Explore techniques to improve your system's robustness, such as handling ambiguous questions or incorporating context in the search process.
* Investigate the use of interactive elements or visualizations to enhance the user experience of your Q&A system.

**Remember to document your work clearly at each stage, including code snippets and results. This will help you track your progress and evaluate the effectiveness of your approach.**

This question encourages the learner to apply their knowledge of data preparation, fine-tuning, and embedding generation in a practical context. It also challenges them to design, implement, and evaluate a real-world application.

In [None]:
DOCUMENT1 = {
    "title": "What is a savings account?",
    "content": "A savings account is a type of bank account that allows you to deposit and withdraw money. Savings accounts typically offer a higher interest rate than checking accounts, but they may have restrictions on how often you can withdraw money. Savings accounts are a good option for people who want to save money for a future goal, such as a down payment on a house or a new car."
}

DOCUMENT2 = {
    "title": "What is a checking account?",
    "content": "A checking account is a type of bank account that allows you to deposit and withdraw money, as well as write checks. Checking accounts typically offer a lower interest rate than savings accounts, but they may have fewer restrictions on how often you can withdraw money. Checking accounts are a good option for people who need to access their money frequently."
}

DOCUMENT3 = {
    "title": "What is a fixed deposit?",
    "content": "A fixed deposit is a type of investment that allows you to deposit money for a fixed period of time. Fixed deposits typically offer a higher interest rate than savings accounts, but you cannot access your money until the end of the term. Fixed deposits are a good option for people who want to save money for a specific goal, such as a child's education or a retirement fund."
}

DOCUMENT4 = {
    "title": "What is a loan?",
    "content": "A loan is a type of financial assistance that allows you to borrow money from a bank or other lender. Loans typically have a fixed interest rate and a fixed repayment period. Loans can be used for a variety of purposes, such as buying a house, a car, or starting a business. Loans are a good option for people who need to borrow money for a specific purpose."
}

DOCUMENT5 = {
    "title": "What is a credit card?",
    "content": "A credit card is a type of payment card that allows you to borrow money from a bank or other lender. Credit cards typically have a high interest rate, but they offer a number of benefits, such as rewards points and cash back. Credit cards are a good option for people who need to make purchases without having to pay for them immediately."
}

DOCUMENT6 = {
    "title": "What is a debit card?",
    "content": "A debit card is a type of payment card that allows you to access your money directly from your bank account. Debit cards typically have a low interest rate, but they may have some restrictions on how often you can use them. Debit cards are a good option for people who want to avoid paying interest on their purchases."
}

DOCUMENT7 = {
    "title": "What is a mutual fund?",
    "content": "A mutual fund is a type of investment that allows you to pool your money with other investors to invest in a variety of stocks, bonds, and other assets. Mutual funds are a good option for people who want to diversify their investments and reduce their risk."
}

DOCUMENT8 = {
    "title": "What is an exchange-traded fund (ETF)?",
    "content": "An exchange-traded fund (ETF) is a type of investment that tracks a basket of stocks, bonds, or other assets. ETFs are traded on stock exchanges, like stocks, and they offer a number of benefits, such as low costs and diversification. ETFs are a good option for people who want to invest in a specific market or sector."
}

DOCUMENT9 = {
    "title": "What is a pension plan?",
    "content": "A pension plan is a type of retirement savings plan that allows you to save money for your retirement. Pension plans typically offer a number of benefits, such as tax breaks and guaranteed income in retirement. Pension plans are a good option for people who want to save for their retirement and reduce their risk."
}

DOCUMENT10 = {
    "title": "What is a life insurance policy?",
    "content": "A life insurance policy is a type of insurance that provides financial protection to your loved ones in the event of your death. Life insurance policies typically offer a variety of benefits, such as death benefits, disability benefits, and long-term care benefits. Life insurance policies are a good option for people who want to protect their loved ones from financial hardship in the event of their death."
}

# Collect the ten documents explicitly; this avoids calling eval() on constructed variable names.
documentsBFSI = [DOCUMENT1, DOCUMENT2, DOCUMENT3, DOCUMENT4, DOCUMENT5,
                 DOCUMENT6, DOCUMENT7, DOCUMENT8, DOCUMENT9, DOCUMENT10]

# Create a dictionary with the extracted data
data = {
    "Title": [doc["title"] for doc in documentsBFSI],
    "Text": [doc["content"] for doc in documentsBFSI]
}

# Create the DataFrame
df1 = pd.DataFrame(data)

df1.head()


Unnamed: 0,Title,Text
0,What is a savings account?,A savings account is a type of bank account th...
1,What is a checking account?,A checking account is a type of bank account t...
2,What is a fixed deposit?,A fixed deposit is a type of investment that a...
3,What is a loan?,A loan is a type of financial assistance that ...
4,What is a credit card?,A credit card is a type of payment card that a...


In [None]:
# Get the embeddings of each text and add to an embeddings column in the dataframe
def embed_fn(title, text):
  return genai.embed_content(model = 'models/text-embedding-004',
                             content=text,
                             task_type="retrieval_document",
                             title=title)["embedding"]

df1['Embeddings'] = df1.apply(lambda row: embed_fn(row['Title'], row['Text']), axis=1)
df1.head()

Unnamed: 0,Title,Text,Embeddings
0,What is a savings account?,A savings account is a type of bank account th...,"[-0.0012245632, 0.0034374464, 0.05789101, -0.0..."
1,What is a checking account?,A checking account is a type of bank account t...,"[0.013392055, -0.010909861, 0.0343092, -0.0057..."
2,What is a fixed deposit?,A fixed deposit is a type of investment that a...,"[-0.018583171, -6.710531e-05, 0.005108815, -0...."
3,What is a loan?,A loan is a type of financial assistance that ...,"[-0.054544896, -0.02621321, 0.030564137, -0.01..."
4,What is a credit card?,A credit card is a type of payment card that a...,"[-0.026037002, -0.02852556, -0.0022261036, -0...."


In [None]:
querybfsi1 = "For how much should I buy an Insurance policy?"
model = 'models/text-embedding-004'

# Embed the banking query defined above (not the earlier Bosch `query`).
requestbfsi = genai.embed_content(model=model,
                              content=querybfsi1,
                              task_type="retrieval_query")

In [None]:
passagebfsi_1 = find_best_passage(querybfsi1, df1)
passagebfsi_1

'A life insurance policy is a type of insurance that provides financial protection to your loved ones in the event of your death. Life insurance policies typically offer a variety of benefits, such as death benefits, disability benefits, and long-term care benefits. Life insurance policies are a good option for people who want to protect their loved ones from financial hardship in the event of their death.'

In [None]:
promptbfsi_1 = make_prompt(querybfsi1, passagebfsi_1)
print(promptbfsi_1)

You are a helpful and informative bot that answers questions using text from the reference passage included below.   Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.   However, you are talking to a non-technical audience, so be sure to break down complicated concepts and   strike a friendly and converstional tone.   If the passage is irrelevant to the question, then you print - Sorry cannot answer.
  QUESTION: 'For how much should I buy an Insurance policy?'
  PASSAGE: 'A life insurance policy is a type of insurance that provides financial protection to your loved ones in the event of your death. Life insurance policies typically offer a variety of benefits, such as death benefits, disability benefits, and long-term care benefits. Life insurance policies are a good option for people who want to protect their loved ones from financial hardship in the event of their death.'

    ANSWER:



In [None]:
model = genai.GenerativeModel('gemini-1.5-pro')
answerbfsi = model.generate_content(promptbfsi_1)

Markdown(answerbfsi.text)

Sorry, I can't answer that question. While the text explains what a life insurance policy is and why it can be beneficial, it doesn't mention how much one should cost or how to determine the appropriate coverage amount.  Factors like your age, health, income, and the needs of your beneficiaries all play a role in that decision. It's always best to speak with a qualified financial advisor to get personalized advice. 


In [None]:
## Let us change the question so that the LLM can find an answer in the reference passages we supplied.

querybfsi2 = "Explain me about Life Insurance policy?"
model = 'models/text-embedding-004'

# Embed the second banking query (not the earlier Bosch `query`).
requestbfsi = genai.embed_content(model=model,
                              content=querybfsi2,
                              task_type="retrieval_query")

passagebfsi_2 = find_best_passage(querybfsi2, df1)


promptbfsi_2 = make_prompt(querybfsi2, passagebfsi_2)
print(promptbfsi_2)

model = genai.GenerativeModel('gemini-pro')
answerbfsi_2 = model.generate_content(promptbfsi_2)

Markdown(answerbfsi_2.text)

You are a helpful and informative bot that answers questions using text from the reference passage included below.   Be sure to respond in a complete sentence, being comprehensive, including all relevant background information.   However, you are talking to a non-technical audience, so be sure to break down complicated concepts and   strike a friendly and converstional tone.   If the passage is irrelevant to the question, then you print - Sorry cannot answer.
  QUESTION: 'Explain me about Life Insurance policy?'
  PASSAGE: 'A life insurance policy is a type of insurance that provides financial protection to your loved ones in the event of your death. Life insurance policies typically offer a variety of benefits, such as death benefits, disability benefits, and long-term care benefits. Life insurance policies are a good option for people who want to protect their loved ones from financial hardship in the event of their death.'

    ANSWER:



A life insurance policy is a great way to provide financial protection to your loved ones in the event of your death.  These policies offer a variety of benefits that can help your family cover expenses such as funeral costs, outstanding debts, and lost income. You can also add riders to your policy that provide additional coverage for things like disability or long-term care.

## Bonus Learning

### Vector Databases for LLMs

**Vector databases** are specialized databases designed to store and efficiently retrieve high-dimensional data represented as vectors. These vectors are arrays of numbers that encode information, making them ideal for representing the complex relationships and semantic meaning captured by LLMs.

### Why Vector Databases are Essential for LLMs

Traditional relational databases, which store data in tables with rows and columns, struggle with the following challenges when dealing with LLM data:

> **High Dimensionality**: Embedding vectors used with LLMs typically have hundreds or thousands of dimensions, which relational databases are not designed to store and search efficiently.

> **Similarity Search**: LLMs often require finding similar data points based on meaning rather than exact matches.
Relational databases lack efficient methods for semantic search.

### Benefits of Vector Databases for LLMs:

Vector databases offer several advantages for LLMs:

> **Efficient Storage and Retrieval**: Vectors are compact representations of information, making them more space-efficient than storing raw text or code in relational databases. Vector databases are also optimized for similarity searches based on vector distances, enabling efficient retrieval of similar LLM outputs or prompts.

> **Semantic Search**: Unlike relational databases that search based on exact matches, vector databases can find data points with similar meaning even if the wording differs. This is crucial for LLMs, which often deal with nuanced language and require understanding context and intent.

> **Scalability**: Vector databases are designed to scale effectively with large datasets of LLM outputs and prompts, making them suitable for complex real-world applications.
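
To illustrate the idea at the smallest possible scale, here is a sketch of an in-memory store that keeps normalized vectors alongside a payload and answers queries by brute-force cosine similarity. `TinyVectorStore` is a hypothetical class for illustration only; real vector databases add indexing, persistence, and filtering on top of this basic pattern. The vectors are hand-made, not real embeddings.

```python
import numpy as np


class TinyVectorStore:
    """Toy in-memory store that searches by brute-force cosine similarity."""

    def __init__(self):
        self._vectors = []
        self._payloads = []

    def add(self, vector, payload):
        """Store a unit-length copy of the vector together with its payload."""
        v = np.asarray(vector, dtype=float)
        self._vectors.append(v / np.linalg.norm(v))
        self._payloads.append(payload)

    def search(self, query_vector, top_k=1):
        """Return (payload, score) pairs for the top_k most similar vectors."""
        q = np.asarray(query_vector, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.stack(self._vectors) @ q
        order = np.argsort(-scores)[:top_k]
        return [(self._payloads[i], float(scores[i])) for i in order]


store = TinyVectorStore()
store.add([1.0, 0.0, 0.0], "savings account")
store.add([0.0, 1.0, 0.0], "credit card")
store.add([0.0, 0.8, 0.6], "debit card")

results = store.search([0.0, 1.0, 0.2], top_k=2)
for payload, score in results:
    print(payload, round(score, 3))
```

Because every stored vector is compared with the query, this approach scales linearly with the number of documents, which is the cost that dedicated vector indexes are designed to reduce.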

### Must Visit Links for Quickly Learning about Vector DB

https://medium.com/data-and-beyond/vector-databases-a-beginners-guide-b050cbbe9ca0


### Additional Resource link for Vector DB
Vector databases use special search techniques known as Approximate Nearest Neighbor (ANN) search, which includes methods like hashing and graph-based searches.
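
As one illustration of the hashing family of ANN methods, the sketch below uses random hyperplanes: each vector gets one bit per hyperplane depending on which side it falls on, and vectors with matching bit patterns become candidate neighbors. This is a simplified teaching example under assumed parameters (`dim=8`, `n_bits=6`, fixed seeds), not how any particular vector database implements its index; `make_hasher` is a hypothetical helper.

```python
import numpy as np


def make_hasher(dim, n_bits, seed=0):
    """Build a signature function from n_bits random hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_bits, dim))

    def signature(vector):
        # One bit per hyperplane: 1 if the vector lies on its positive side.
        bits = planes @ np.asarray(vector, dtype=float) > 0
        return tuple(int(b) for b in bits)

    return signature


signature = make_hasher(dim=8, n_bits=6)
v = np.random.default_rng(1).normal(size=8)

print("v:     ", signature(v))
print("3 * v: ", signature(3.0 * v))  # same direction, same bucket
print("-v:    ", signature(-v))       # opposite direction, every bit flipped
```

The signature depends only on direction, so rescaling a vector leaves it in the same bucket, while vectors pointing in very different directions tend to land in different buckets and can be skipped during search.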

Read more here :
https://www.datacamp.com/blog/the-top-5-vector-databases

