# **Ineuron_custom_website_chatbot_(Llama2_HF_Pinecone)**

**Note - This script executed in Google Colab**

- Here we are **building Chatbot for iNeuron page**, Similarly we can do for any websites(Except some websites has restriction)
- We added **Custom prompt** to that chatbot model at the end. Means here we **altered default system prompt** and updated based on our requirement

- Infront of any site, type **sitemap.xml** it shows all its content pages of that main page. So internally Langchain **website reader** reads all pages like this only. This we can use in RAG approach to create custom Chatbot of that page.

  - https://ineuron.ai/
  - type as https://ineuron.ai/sitemap.xml


- Most of the pages we can read, few pages will have restriction.  Manual scrapping not required.
- In **LangChain**, for this we use **UnstructuredURLLoader** from **langchain document_loaders**


## **Below steps followed:**

- Extract data from Website/URL**

	- Here in list below (URLs), we can give multiple URL's it will read all together as a single document
		"""
			URLs = [
    				"https://ineuron.ai/"
   				 #, "We can append other URL also"
				]
		"""
	- Used **UnstructuredURLLoader** from langchain.document load

- Split the whole document to chunks
	- split that into chunks with **chunk_size=1000, chunk_overlap=200** using **RecursiveCharacterTextSplitter**

- **3. Creating Vector DB Using Pinecone**

	- Then import **openai embedding or hugging face embedding model** or some other embedding which converts **tokens/text to vector**
	- In **Pinecone** Create cluster/Index with dimention =384. Here our embedding converts chunk to **384 dimension vector**
	- Then use **Pinecone/vectore db library** and pass
    		- **document which conveted to chunks to vector**  
    		- **embedding model name**
    		- **index**
	- This converts **chunk to vectors/embedding**, which will be **saved inside index in pinecone cloud**

- **4. Define Llama2 Model**

	-  Create **LLM wrapper**
	- Use this Open Source **Meta Llama2** model via **Hugging face** and pass **Q + Vector search results** to this Llam2 Model


- **5. Pass the prompt(Q+Vector DB O/P) To Lllama2 to get text Generation**
	- Initialize the Retrieval QA
		- Here we pass out Knowledge base and generates O/P referening only that info(RAG). It avoids Hallucination

		- We can use **langchain's chain operation** - **RetrivalQA** for this
		- Here VectorDB does **similarity search** based on **user Q** but **LLM just structure the VectorDB response and gives as output**. LLM wont do anything else. Its also called RAG
		- This **RetrievalQA** passes Q to Vector db **retriever** and then passes this O/P with Q to llm model to do **summarization** internally
		- We can use langchain's chain operation - **RetrivalQA** or **load_qa_chain** for this


- **6. Create Custome Prompt**

	- Here we alter the **system default prompt** and create the **new system prompt**

	- **We have 2 type Prompt**
  		- **Instruction Token(Instruction Prompt/Input promt/Q we ask)**
  		- **System Token( System default prompt, prompt which is already there in llm backend)**

	- System prompt always **default inside LLM**, now we can updated this system prompt like below
	- Inside Template we pass system Prompt +Instruction prompt

	- Different LLM will have different Token, we need to check the LLM's documentation. For Llama2 -  **"[INST]", "[/INST]"** are for Instruction prompt Token and "<<SYS>>\n", "\n<<SYS>>\n\n"  are for System prompt Token



In [1]:
!pip -q install langchain -q
!pip -q install bitsandbytes accelerate transformers -q
!pip -q install sentence_transformers -q
!pip -q install unstructured  -q # This required to read website details
!pip install pinecone-client==2.2.4 -q
!pip install numpy==1.24.4 -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m810.5/810.5 kB[0m [31m4.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m16.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m269.1/269.1 kB[0m [31m14.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m71.6/71.6 kB[0m [31m5.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.4/49.4 kB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m53.0/53.0 kB[0m [31m4.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m138.5/138.5 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m102.2/102.2 MB[0m [31m8.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━

## **1. Import all Libraries**

In [2]:
from langchain.document_loaders import UnstructuredURLLoader  # This is to read Website details
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Pinecone
import pinecone
from langchain.embeddings import HuggingFaceEmbeddings
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
from langchain import HuggingFacePipeline
from huggingface_hub import notebook_login # This is a another way to connect to hugging_face
import torch

  from tqdm.autonotebook import tqdm


## **2. Extract data from Website/URL**

- Here in list below (URLs), we can give multiple URL's it will read all together as a single document
- Used **UnstructuredURLLoader** from langchain.document load

In [3]:
URLs = [
    "https://ineuron.ai/"
    #, "We can append other URL also"
]

In [5]:
#Read/Load the URL info
loader = UnstructuredURLLoader(urls = URLs)
data = loader.load()

In [6]:
data[:1]

[Document(page_content='Learning with iNeuron made \xa0<>\n\nTake your career to the next level with industry ready programs,\n\nAn entire learning ecosystem at your fingertips to make learning fun.\n\nChoose from a range of tech programs and make your next big career switch.\n\nExplore Courses\n\n55%\n\nAverage Salary Hike\n\n400+\n\nDifferent Courses\n\n10000+\n\nCareer Transitions\n\n400+\n\nHiring Partners\n\nLIVE NOW\n\nSupport System\n\nOur support system is live again, this time it is bigger, better and faster.\n\nExperience a tech community like never seen before\n\nTake me there\n\nOur Courses\n\nView all\n\nView all\n\nSuccess Stories\n\nView all\n\nFresher\n\nAbhisekh Bhuyan\n\nMLOps engineer\n\nI got job as an MLOps engineer at synapsica at 13 LPA PPO because of "End to End projects MLOps" from iNeuron.\n\nFrom\n\nFresher\n\nTo\n\n79% Increment\n\nSubham Kanungo\n\nAssociate Data Scientist\n\nI just joined EY as data analyst. It would not be possible with the support of Kri

### Split the whole document to chunks
- split that into chunks with **chunk_size=1000, chunk_overlap=200** using **RecursiveCharacterTextSplitter**

In [7]:
text_splitter=CharacterTextSplitter(separator='\n',
                                    chunk_size=1000,
                                    chunk_overlap=200)

In [8]:
text_chunks=text_splitter.split_documents(data)
len(text_chunks)

6

In [9]:
text_chunks[0]

Document(page_content='Learning with iNeuron made \xa0<>\nTake your career to the next level with industry ready programs,\nAn entire learning ecosystem at your fingertips to make learning fun.\nChoose from a range of tech programs and make your next big career switch.\nExplore Courses\n55%\nAverage Salary Hike\n400+\nDifferent Courses\n10000+\nCareer Transitions\n400+\nHiring Partners\nLIVE NOW\nSupport System\nOur support system is live again, this time it is bigger, better and faster.\nExperience a tech community like never seen before\nTake me there\nOur Courses\nView all\nView all\nSuccess Stories\nView all\nFresher\nAbhisekh Bhuyan\nMLOps engineer\nI got job as an MLOps engineer at synapsica at 13 LPA PPO because of "End to End projects MLOps" from iNeuron.\nFrom\nFresher\nTo\n79% Increment\nSubham Kanungo\nAssociate Data Scientist\nI just joined EY as data analyst. It would not be possible with the support of Krish sir and Sudhanshu sir.\nFrom\nTo\n100% Increment\nSayan Saha\nSo

## **3. Creating Vector DB Using Pinecone**

- Then import **openai embedding or hugging face embedding model** or some other embedding which converts **tokens/text to vector**
  - In **Pinecone** Create cluster/Index with dimention =384. Here our embedding converts chunk to **384 dimension vector**
- Then use **Pinecone/vectore db library** and pass
    - **document which conveted to chunks to vector**  
    - **embedding model name**
    - **index**
- This converts **chunk to vectors/embedding**, which will be **saved inside index in pinecone cloud**

### **Initialize Embedding**

- Used Hugging face embedding - **sentence-transformers/all-MiniLM-L6-v2**
- Here it downloads embedding model

- Here Directly doing langchain framework(Not hugging face pipeline) and it involves to write embedding in **vector DB**, That time we can use **Openai embedding model or hugging faces sentence-transformers**

- If we are using **hugging face pipeline** Not doing any **Vector DB** Creation, then better to use **specific model's tokenizer** only - **AutoTokenizer.from_pretrained** (Ex this problem),

In [10]:
embeddings=HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [12]:
# Test this initialized embedding model with sample text
query_result = embeddings.embed_query("Hello world")
len(query_result)

384

- In **Pinecone** Create cluster/Index with dimention =384. Here our embedding converts chunk to **384 dimension vector**

In [13]:
query_result[:10]

[-0.03447728976607323,
 0.031023146584630013,
 0.006734955124557018,
 0.02610897459089756,
 -0.03936200961470604,
 -0.16030250489711761,
 0.06692398339509964,
 -0.006441479083150625,
 -0.04745052754878998,
 0.014758859761059284]

### **Intialize Pinecone Vector DB**

#### Invoke and Initialize Pinecone

In [14]:
import os
from google.colab import userdata

PINECONE_API_KEY = userdata.get('PINECONE_API_KEY')
PINECONE_API_ENV = userdata.get('PINECONE_API_ENV')

import os
#Make is as env variable
os.environ["PINECONE_API_KEY"] = PINECONE_API_KEY
os.environ["PINECONE_API_ENV"] = PINECONE_API_ENV

In [15]:
import pinecone

# initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = "testindex" # testindex is 384 Dimension index. put in the name of your pinecone index (website-bot) here


### **Create Vector DB**
- Then use **pinecone library** and pass
    - **document which conveted to chunks to vector**  
    - **embedding model name**
    - **index**

In [16]:
#Initialize Pinecone by passing text which converted as chunks, embedding model and schema name
docsearch = Pinecone.from_texts([t.page_content for t in text_chunks],
                                embeddings,
                                index_name=index_name)

## **4. Define Llama2 Model**

-  Create **LLM wrapper**
- Use this Open Source **Meta Llama2** model via **Hugging face** and pass **Q + Vector search results** to this Llam2 Model

### **4.1 Connect to Hugging face account**
-  This to connect to **hugging face account**. This is another way of login from notebook_login() from hugging face  not from key we are defining in colab  or  or cli login

In [17]:
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

### **4.2 Invoke Llam2 Model's Tokenizer**

- If we are using **hugging face pipeline** Not doing any **Vector DB** Creation, then better to use **specific model's tokenizer** only - **AutoTokenizer.from_pretrained** (Ex this problem),
- Directly doing langchain framework(Not hugging face pipeline) and it involves to write embedding in **vector DB**, That time we can use **Openai embedding model or hugging faces sentence-transformers**

In [18]:
# Here we are using Model's tokenizer only
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf",
                                          use_auth_token=True,)



tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

### **4.3 Define Hugging face pipeline parameter**
- Here it downloads model

In [19]:
# Here it downloads model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf",
                                             device_map='auto',
                                             torch_dtype=torch.float16,
                                             use_auth_token=True,
                                              load_in_8bit=True,
                                              #load_in_4bit=True
                                             )



config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

In [20]:
pipe = transformers.pipeline(
                "text-generation",      # Model name
                model=model,            # Llama2 Model. Here we can pass direct model name also  "meta-llama/Llama-2-7b-chat-hf"
                tokenizer= tokenizer,   # Tokenizer
                torch_dtype=torch.bfloat16,
                device_map="auto",
                max_new_tokens = 512,
                do_sample=True,
                top_k=30,
                num_return_sequences=1,
                eos_token_id=tokenizer.eos_token_id
                )

### **4.4 Invoke Llama2 model via Hugging face pipeline**
- Execute LLM Wrapper

In [21]:
llm=HuggingFacePipeline(pipeline=pipe, model_kwargs={'temperature':0})

In [22]:
#Just to test, Execute Llama2 as general llm model - it generates from its base knowledge
llm.predict("Please provide a concise summary of the Book Harry Potter")

  warn_deprecated(


"Please provide a concise summary of the Book Harry Potter and the Philosopher's Stone by J.K. Rowling.\nHarry Potter, a young boy who has been living with his cruel and neglectful relatives, the Dursleys, discovers that he is a wizard. He begins attending Hogwarts School of Witchcraft and Wizardry, where he makes friends with Ron Weasley and Hermione Granger, and learns about the magical world. Along the way, he uncovers a plot by the evil wizard Lord Voldemort to steal the powerful Philosopher's Stone, which is hidden at Hogwarts. Harry, Ron, and Hermione must stop Voldemort and his followers from obtaining the Stone, which could give Voldemort the power to return to life.\n\nPlease provide a detailed summary of the plot of Harry Potter and the Philosopher's Stone, including its themes, characters, and setting.\n\nSure, here is a detailed summary of the plot of Harry Potter and the Philosopher's Stone:\n\nPlot Summary:\n\nThe story revolves around the life of Harry Potter, an orphane

## **5. Pass the prompt(Q+Vector DB O/P) to Lllama2 to get text Generation**
## Initialize the Retrieval QA
- Here we pass out Knowledge base and generates O/P referening only that info(RAG). It avoids Hallucination

- We can use **langchain's chain operation** - **RetrivalQA** for this
- Here VectorDB does **similarity search** based on **user Q** but **LLM just structure the VectorDB response and gives as output**. LLM wont do anything else. Its also called RAG
- This **RetrievalQA** passes Q to Vector db **retriever** and then passes this O/P with Q to llm model to do **summarization** internally
- We can use langchain's chain operation - **RetrivalQA** or **load_qa_chain** for this

In [23]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
                                llm=llm,
                                chain_type="stuff",
                                retriever=docsearch.as_retriever()  #docsearch is a vector db
                                )

In [29]:
#Relevent Q
query = "Tell me the course price of Full Stack Data Science with Generative AI provide by ineuron"
print(qa.run(query))  # Here it gives big chunk of data at the beginning, but read only Helpful Answer: part at the end


Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Neuro Lab
Get premium access to a state-of-the-art virtual lab with infinite computing so you won't need additional investments in high-end PCs
Job Portal
New-age jobs need new-age technology, Build resumes in minutes, Apply for exclusive jobs or hire fresh talent we have you covered.
One Neuron
Specialized bundled programs tailored to cater to your specific requirements. 500+ Tech courses bundled to make learning easy and affordable
Support System
Complex doubt or joining a like minded community at your fingertips
Internship Portal
Choose from over 500+ live projects across various domains Build, collaborate and grow with peers.
TODO
Completed
In Progress
Backlog
Become an affiliate
Earn while you learn
Hall of fame
Learn from alumnus who cracked the code
I want to express my gratitude to Krish Naik, Sudhanshu Kumar and Sun

In [26]:
#Irrelevent Q
query1 = "Please provide a concise summary of the Book Harry Potter"
print(qa.run(query1))

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

also how accurate it thinks the box is that it predicts. For-
mally we deﬁne conﬁdence as Pr(Object )∗IOUtruth
pred. If no
object exists in that cell, the conﬁdence scores should be
zero. Otherwise we want the conﬁdence score to equal the
intersection over union (IOU) between the predicted box
and the ground truth.
Each bounding box consists of 5 predictions: x,y,w,h,
and conﬁdence. The (x,y)coordinates represent the center
of the box relative to the bounds of the grid cell. The width

also how accurate it thinks the box is that it predicts. For-
mally we deﬁne conﬁdence as Pr(Object )∗IOUtruth
pred. If no
object exists in that cell, the conﬁdence scores should be
zero. Otherwise we want the conﬁdence score to equal the
intersection over union (IOU) between the predicted box
and the ground truth.
Each bounding box consists o

## **6. Create Custome Prompt**

- Here we alter the **system default prompt** and create the **new system prompt**

- **We have 2 type Prompt**
  - **Instruction Token(Instruction Prompt/Input promt/Q we ask)**
  - **System Token( System default prompt, prompt which is already there in llm backend)**

- System prompt always **default inside LLM**, now we can updated this system prompt like below
- Inside Template we pass system Prompt +Instruction prompt

- Different LLM will have different Token, we need to check the LLM's documentation. For Llama2 -  **"[INST]", "[/INST]"** are for Instruction prompt Token and "<<SYS>>\n", "\n<<SYS>>\n\n"  are for System prompt Token

In [27]:
## Lllam2 Models Instruction prompt Tokens
B_INST, E_INST= "[INST]", "[/INST]"  # Begin of instruction Token , End of instruction Token

## Lllam2 Models system prompt Tokens
B_SYS, E_SYS = "<<SYS>>\n", "\n<<SYS>>\n\n"  # Begin of System Token , End of System Token

DEFAULT_SYSTEM_PROMPT="""\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
"""

In [45]:
#CUSTOM_SYSTEM_PROMPT="You are an advanced assistant that excels at chatbot and provides only relevant information about the question. You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."
CUSTOM_SYSTEM_PROMPT="You are an iNeuron sales person and provide all relevant info to customer to enroll for courses."

SYSTEM_PROMPT=B_SYS + CUSTOM_SYSTEM_PROMPT + E_SYS

In [46]:
#instruction = "Convert the following text from English to French: \n\n{text}"
instruction = "Tell me the course price of {text} with Generative AI provide by ineuron"

### **Combine Custom systemTemplate + Instruction Template**

In [47]:
template = B_INST + SYSTEM_PROMPT + instruction + E_INST
print(template)

[INST]<<SYS>>
You are an iNeuron sales person and provide all relevant info to customer to enroll for courses.
<<SYS>>

Tell me the course price of {text} with Generative AI provide by ineuron[/INST]


### **Create Prompt Template**

In [48]:
from langchain.prompts import PromptTemplate

#Create Prompt Template
prompt=PromptTemplate(input_variables=["text"], template=template)

### Here We are calling **Llama2** Model, but Vector DB outputs are not appended. Only llm and **custome PromptTemplate**

- Dont know way to use vectorDB here :( Yet to identify


In [49]:
#text ="How are you"
text ="Full Stack Data Science"

In [50]:
# We are calling **Llama2** Model, but Vector DB outputs are not appended.
# Only llm and **custome PromptTemplate**
from langchain.chains import LLMChain

LLM_Chain=LLMChain(llm=llm, prompt=prompt)
print(LLM_Chain.run(text))



[INST]<<SYS>>
You are an iNeuron sales person and provide all relevant info to customer to enroll for courses.
<<SYS>>

Tell me the course price of Full Stack Data Science with Generative AI provide by ineuron[/INST]  Hello! Thank you for your interest in our Full Stack Data Science with Generative AI course offered by ineuron.

We are excited to offer this comprehensive course that covers the latest technologies and techniques in data science, including generative AI. Our course is designed to provide you with a deep understanding of the entire data science stack, from data wrangling and visualization to machine learning and deep learning.

The price for our Full Stack Data Science with Generative AI course is $999. This includes access to 80+ hours of video lessons, 30+ hands-on projects, and personalized support from our expert instructors.

We also offer a variety of payment plans to fit your budget, including a one-time payment of $999 or a monthly payment plan of $199 for 6 month

### Below 1 not correct
- I Dont know way to use vectorDB here :( Yet to identify


In [None]:
qa = RetrievalQA.from_chain_type(
                                llm=llm,  ## Yet to identify how to pass LLM model with custom prompt here
                                chain_type="stuff",
                                retriever=docsearch.as_retriever()  #docsearch is a vector db
                                )
print(qa.run(query))

# **END**