# **Vector DB - PINECONE DB**

**Note: This script was executed in Google Colab.**



- **Pinecone is a cloud-based** vector DB.
- **Pinecone and Weaviate** are **cloud-based DBs** that **require a subscription**. Pinecone provides initial free credits, under which we can create only 1 cluster. If we **don't want to store our private data** there, we should use Chroma DB or FAISS instead.
- Create an **API key** on the Pinecone website.
- We need to **create an index** and define its **dimensions**. At that point we also get the **API_Env** value.
	- **PINECONE_API_ENV = 'gcp-starter'**
- If our embedding model creates vectors of 384 dimensions, we need to set dimensions = 384 when creating the Pinecone index. Then **.init** initializes the Pinecone client with the API key and environment.

- When **connecting to Pinecone via the API key and environment**, **import the pinecone library directly** (`import pinecone`).
- When **embedding documents and storing them in Pinecone**, use **from langchain.vectorstores import Pinecone**.

## **Terminology:**
- **CHROMA/PINECONE-CLIENT:** the DB client package is **pip installed**, then used through **LangChain's vectorstores** module.
- Here we used a Hugging Face embedding, **sentence-transformers**. **This framework generates an embedding for each input sentence.**
- **Chunking/Chunk_size:** A document/dataset usually contains many tokens, but embedding models and LLMs have a **token limit** (4k tokens, for example). To fit within that limit, we **split our data into chunks**.
- **Chunk_overlap = 50:** The next chunk **starts with the last 50 units of the previous chunk**. With `RecursiveCharacterTextSplitter`, sizes are counted in characters by default rather than tokens.
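
To make the overlap idea concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break at separators such as paragraphs and newlines); the function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk begins with the last three characters of the chunk before it, which is what keeps a sentence that straddles a boundary retrievable from either side.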


## **Below steps followed:**
- Log in to the **Pinecone website (https://www.pinecone.io/)** and create:
	- **API_KEY**
	- **API_Env**
	- **New index**

-  **Download a document**
- Then **split it into chunks**
- Then import an **OpenAI embedding or Hugging Face embedding model**, or some other embedding model that converts **tokens/text to vectors**
- In **Pinecone**, create a cluster/index with dimension = 384, because our embedding model converts each chunk to a **384-dimension vector**
- Then use the **pinecone library** and pass
    - the **document chunks to be converted to vectors**  
    - the **embedding model**
    - the **index**
- This converts the **chunks to vectors/embeddings**, which are **saved inside the index in the Pinecone cloud**
- Each chunk is stored as 1 vector; we can see this on the **Pinecone website, under our index**
- Then we **load the vector DB** we just created by giving its **index name**
- Then use **as_retriever** to **read the vector DB** and **run a semantic search on it**
- This **semantic/similarity search** returns the **K=4 most relevant chunks** by default; these, along **with the user's question**, are **fed to the LLM** to produce a **meaningful response**
- We can change the number of returned chunks by setting **search_kwargs={"k": 2}** on the retriever
- Here the **vector DB does the similarity search based on the user's question**, and the **LLM only structures the retrieved text into an answer**. This pattern is called **RAG** (Retrieval-Augmented Generation)
- **RetrievalQA** passes the question to the vector DB **retriever**, then passes the retrieved output together with the question to the LLM, which **summarizes** it internally
- We can use LangChain's chain operations **RetrievalQA** or **load_qa_chain** for this
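
The flow described in these steps can be sketched end to end with toy components. Everything here is a stand-in chosen for illustration: `toy_embed` is a bag-of-words counter rather than a real embedding model, the "index" is a plain Python list rather than Pinecone, and the prompt wording in `build_prompt` is an assumption, not the prompt LangChain actually uses.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query (the retriever's job)."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Put the retrieved chunks and the question into one prompt for the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Use only the context below to answer. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


chunks = [
    "yolo is a real-time object detection system",
    "pinecone stores vectors in a cloud index",
    "chunk overlap repeats text between neighbouring chunks",
]
top = retrieve("what is yolo", chunks, k=2)
prompt = build_prompt("what is yolo", top)
print(prompt)
```

The retrieval step decides which text the LLM gets to see; the LLM only turns that text into an answer, which is why a question with no matching chunk should end in "I don't know".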



Pinecone: https://www.pinecone.io/

In [5]:
!pip install langchain
!pip install pinecone-client==2.2.4
!pip install pypdf
!pip install sentence-transformers==2.2.2

Collecting langchain
  Downloading langchain-0.1.13-py3-none-any.whl (810 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.29 (from langchain)
  Downloading langchain_community-0.0.29-py3-none-any.whl (1.8 MB)
Collecting langchain-core<0.2.0,>=0.1.33 (from langchain)
  Downloading langchain_core-0.1.33-py3-none-any.whl (269 kB)
Collecting langchain-text-splitters<0.1,>=0.0.1 (from langchain)
  Downl

## **1. Read the Document**
- Create a directory `pdfs` and place the PDF file there; it will be used to create the DB
- This `pdfs` folder is created inside the Colab environment, so it will be deleted once the session ends

In [2]:
!mkdir pdfs

In [3]:
!ls -l

total 8
drwxr-xr-x 2 root root 4096 Mar 24 07:59 pdfs
drwxr-xr-x 1 root root 4096 Mar 21 13:23 sample_data


### **Extract the Text from the PDFs**

In [6]:
from langchain.document_loaders import PyPDFDirectoryLoader

In [9]:
loader = PyPDFDirectoryLoader("pdfs")
data = loader.load()
data[:1]

[Document(page_content='Retrieval-Augmented Generation for Large Language Models: A Survey\nYunfan Gao1,Yun Xiong2,Xinyu Gao2,Kangxiang Jia2,Jinliu Pan2,Yuxi Bi3,Yi\nDai1,Jiawei Sun1,Qianyu Guo4,Meng Wang3and Haofen Wang1,3∗\n1Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University\n2Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University\n3College of Design and Innovation, Tongji University\n4School of Computer Science, Fudan University\nAbstract\nLarge Language Models (LLMs) demonstrate\nsignificant capabilities but face challenges such\nas hallucination, outdated knowledge, and non-\ntransparent, untraceable reasoning processes.\nRetrieval-Augmented Generation (RAG) has\nemerged as a promising solution by incorporating\nknowledge from external databases. This enhances\nthe accuracy and credibility of the models, particu-\nlarly for knowledge-intensive tasks, and allows for\ncontinuous knowledge updates and integration of\ndomai

### Split the whole document into chunks
- Split the document into chunks with **chunk_size=500, chunk_overlap=20** using **RecursiveCharacterTextSplitter**

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(data)

In [15]:
len(text_chunks)  # 281 chunks in total for the given chunk_size and chunk_overlap

281

In [16]:
text_chunks[2]

Document(page_content='domain-specific information. RAG synergistically\nmerges LLMs’ intrinsic knowledge with the vast,\ndynamic repositories of external databases. This\ncomprehensive review paper offers a detailed\nexamination of the progression of RAG paradigms,\nencompassing the Naive RAG, the Advanced RAG,\nand the Modular RAG. It meticulously scrutinizes\nthe tripartite foundation of RAG frameworks,\nwhich includes the retrieval , the generation and\nthe augmentation techniques. The paper highlights', metadata={'source': 'pdfs/RAG_LLM_Pdf.pdf', 'page': 0})

## **2. Creating Vector DB**

- Import an **OpenAI embedding or Hugging Face embedding model**, or some other embedding model that converts **tokens/text to vectors**
- In **Pinecone**, create a cluster/index with dimension = 384, because our embedding model converts each chunk to a **384-dimension vector**
- Then use the **Pinecone/vector DB library** and pass
    - the **document chunks to be converted to vectors**  
    - the **embedding model**
    - the **index**
- This converts the **chunks to vectors/embeddings**, which are **saved inside the index in the Pinecone cloud**

### **Initialize Embedding**

- Uses the Hugging Face embedding model **sentence-transformers/all-MiniLM-L6-v2**
- The embedding model is downloaded at this step

In [17]:
from langchain.embeddings import HuggingFaceEmbeddings
#from langchain.embeddings import OpenAIEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [18]:
# Test this initialized embedding model with sample text
query_result = embeddings.embed_query("Hello World")
query_result[:10]

[-0.03447727486491203,
 0.03102317824959755,
 0.006734995171427727,
 0.026108944788575172,
 -0.039361994713544846,
 -0.16030240058898926,
 0.06692399084568024,
 -0.006441427860409021,
 -0.04745052009820938,
 0.014758813194930553]

In [19]:
print("Length", len(query_result))

Length 384


> This embedding model creates vectors with **384 dimensions**, so the Pinecone index was also created with 384 dimensions
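
The notebook creates the index in the Pinecone web console. As an alternative, the dimension could be taken from a sample embedding and the index created from code. The sketch below assumes a `client` object exposing `list_indexes()` and `create_index(name, dimension=..., metric=...)`, as the `pinecone` module does in pinecone-client 2.x; `ensure_index` and `FakePineconeClient` are hypothetical names, and the fake client exists only so the example runs without credentials.

```python
def ensure_index(client, index_name: str, sample_vector, metric: str = "cosine") -> int:
    """Create `index_name` sized to the embedding if it does not exist yet.

    `client` is assumed to behave like the `pinecone` module in pinecone-client 2.x.
    Returns the dimension that was derived from the sample vector.
    """
    dimension = len(sample_vector)
    if index_name not in client.list_indexes():
        client.create_index(index_name, dimension=dimension, metric=metric)
    return dimension


class FakePineconeClient:
    """Hypothetical in-memory stand-in for the pinecone module (illustration only)."""

    def __init__(self):
        self.indexes = {}

    def list_indexes(self):
        return list(self.indexes)

    def create_index(self, name, dimension, metric="cosine"):
        self.indexes[name] = {"dimension": dimension, "metric": metric}


client = FakePineconeClient()
sample_vector = [0.0] * 384  # stands in for embeddings.embed_query("Hello World")
dim = ensure_index(client, "testindex", sample_vector)
print(dim, client.indexes)
```

Deriving the dimension from `len(query_result)` instead of typing 384 by hand avoids a mismatch if the embedding model is swapped later.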

### **Initialize Pinecone Vector DB**

#### Invoke and Initialize Pinecone

In [20]:
from google.colab import userdata
PINECONE_API_KEY = userdata.get('PINECONE_API_KEY')
PINECONE_API_ENV = userdata.get('PINECONE_API_ENV')

import os
# Make them available as environment variables
os.environ["PINECONE_API_KEY"] = PINECONE_API_KEY
os.environ["PINECONE_API_ENV"] = PINECONE_API_ENV

In [22]:
import pinecone

# initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = "testindex" # put in the name of your pinecone index here


### **Create Vector DB**
- Then use the **pinecone library** and pass
    - the **text of the document chunks**  
    - the **embedding model**
    - the **index name**

In [23]:
from langchain.vectorstores import Pinecone

# Build the Pinecone vector store from the chunk texts, the embedding model, and the index name
docsearch = Pinecone.from_texts([t.page_content for t in text_chunks],
                                embeddings,
                                index_name=index_name)
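
The call above passes only `page_content`, so the `source` and `page` metadata attached to each chunk is not stored with the vectors. A minimal sketch of keeping it, assuming chunk objects with `page_content` and `metadata` attributes (mimicked here with `SimpleNamespace`; the helper name and sample values are hypothetical):

```python
from types import SimpleNamespace


def to_texts_and_metadatas(chunks):
    """Split Document-like chunks into parallel lists of texts and metadata dicts."""
    texts = [chunk.page_content for chunk in chunks]
    metadatas = [dict(chunk.metadata) for chunk in chunks]
    return texts, metadatas


# Stand-ins for the LangChain Document objects produced by the text splitter.
sample_chunks = [
    SimpleNamespace(page_content="first chunk", metadata={"source": "pdfs/example.pdf", "page": 0}),
    SimpleNamespace(page_content="second chunk", metadata={"source": "pdfs/example.pdf", "page": 1}),
]

texts, metadatas = to_texts_and_metadatas(sample_chunks)
print(texts)
print(metadatas)

# With the real objects from this notebook, the lists would be passed as:
# Pinecone.from_texts(texts, embeddings, metadatas=metadatas, index_name=index_name)
```

Stored metadata comes back with each search hit, which makes it possible to show which file and page an answer was drawn from.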

### Load the Vector DB from Pinecone
- Load the vector DB we just created by giving the index_name and the embedding model
- If you already have an index (an existing Pinecone index that already holds the vector data), you can load it like this:

- docsearch = Pinecone.from_existing_index(index_name, embeddings)

In [25]:
docsearch = Pinecone.from_existing_index(index_name, embeddings)  # Use this when the index already exists
docsearch

<langchain_community.vectorstores.pinecone.Pinecone at 0x7da7786a2a10>

## **3. Semantic/Similarity Search**
- Use **similarity_search** to query the vector DB and run a **semantic search** on it
- This semantic/similarity search returns the K=4 most relevant chunks by default; these, along with the user's question, are fed to the LLM to produce a meaningful response.
- We can use LangChain's chain operation RetrievalQA for this

### Set the number of results with **k=3**

In [26]:
query = "What is yolo?"

In [27]:
docs = docsearch.similarity_search(query, k=3)
docs

[Document(page_content='Figure 6: Qualitative Results. YOLO running on sample artwork and natural images from the internet. It is mostly accurate although it\ndoes think one person is an airplane.\nincluding the time to fetch images from the camera and dis-\nplay the detections.\nThe resulting system is interactive and engaging. While\nYOLO processes images individually, when attached to a\nwebcam it functions like a tracking system, detecting ob-\njects as they move around and change in appearance. A'),
 Document(page_content='Figure 6: Qualitative Results. YOLO running on sample artwork and natural images from the internet. It is mostly accurate although it\ndoes think one person is an airplane.\nincluding the time to fetch images from the camera and dis-\nplay the detections.\nThe resulting system is interactive and engaging. While\nYOLO processes images individually, when attached to a\nwebcam it functions like a tracking system, detecting ob-\njects as they move around and change 

In [28]:
len(docs)

3
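
The first two results above appear to repeat the same passage. One possible cause is that the same chunks were written to the index more than once, for example by rerunning the `from_texts` cell. Whatever the cause, duplicates can be filtered out before the results are handed to the LLM. This is a minimal sketch with a hypothetical helper, using `SimpleNamespace` objects in place of real search results:

```python
from types import SimpleNamespace


def dedupe_by_content(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        key = doc.page_content.strip()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Stand-ins for the Document objects returned by similarity_search.
results = [
    SimpleNamespace(page_content="Figure 6: Qualitative Results."),
    SimpleNamespace(page_content="Figure 6: Qualitative Results."),
    SimpleNamespace(page_content="YOLO sees the entire image."),
]

unique_results = dedupe_by_content(results)
print(len(results), "->", len(unique_results))
# 3 -> 2
```

Repeated passages take up prompt space without adding information, so removing them leaves room for more distinct context.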

## **4. Use the OpenAI LLM Model, Make a Chain, and Do Semantic Search**
- We can set the number of returned chunks by passing search_kwargs={"k": 2} to **as_retriever**
- Here the vector DB does the **similarity search** based on the **user's question**, and the **LLM only structures the retrieved text into an answer**. This pattern is called RAG
- **RetrievalQA** passes the question to the vector DB **retriever**, then passes the retrieved output together with the question to the LLM, which **summarizes** it internally
- We can use LangChain's chain operations **RetrievalQA** or **load_qa_chain** for this
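
The number of retrieved chunks is set on the retriever through `search_kwargs`. A minimal sketch, assuming a vector store object that offers `as_retriever(search_kwargs=...)` as LangChain vector stores do; `make_retriever` and `FakeVectorStore` are hypothetical names, and the fake store only records the arguments so the example runs without a Pinecone connection:

```python
def make_retriever(vectorstore, k: int = 2):
    """Return a retriever limited to the top-k most similar chunks."""
    return vectorstore.as_retriever(search_kwargs={"k": k})


class FakeVectorStore:
    """Hypothetical stand-in that records how as_retriever was called."""

    def __init__(self):
        self.last_kwargs = None

    def as_retriever(self, **kwargs):
        self.last_kwargs = kwargs
        return ("retriever", kwargs)


store = FakeVectorStore()
retriever = make_retriever(store, k=2)
print(store.last_kwargs)
# {'search_kwargs': {'k': 2}}
```

With the real `docsearch` object, the returned retriever would be passed as the `retriever=` argument of `RetrievalQA.from_chain_type`. A smaller `k` keeps the prompt short, while a larger `k` gives the LLM more context to draw on.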

In [36]:
!pip install openai -q

In [30]:
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

import os
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

In [31]:
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

In [32]:
llm = OpenAI()



In [33]:
qa = RetrievalQA.from_chain_type(llm=llm,
                                 chain_type="stuff",
                                 retriever=docsearch.as_retriever()
                                )

### Call Chain and get response (RAG)

In [34]:
query = "What is yolo?"  # a topic that is covered in the indexed content
print('\n',qa.run(query))




  YOLO is a detection system that is able to see the entire image during training and test time, allowing it to encode contextual information about classes and their appearance. It is also able to detect objects as they move and change in appearance, making it useful for tracking systems. Additionally, YOLO has been shown to make fewer background errors compared to other top detection methods.


In [35]:
# Example of a question whose topic is not in the indexed content
query = "what is spacex?"

print('\n',qa.run(query))


  I don't know, as there is no mention of SpaceX in the given context.


# **END**