# 3. Working With Pinecone Vector Database
Although vector databases existed long before large language models (LLMs), they have become an important part of many LLM solutions. In particular, the Retrieval-Augmented Generation (RAG) architecture addresses LLM hallucinations and issues with longer-term memory by augmenting the user's prompt with the results of a search across a vector DB. [Pinecone](https://www.pinecone.io/) is a cloud-based vector database that is easy to integrate with your CML workflow, as this notebook shows.

Recall that in the previous exercise you created CML jobs to scrape a site and load each page into a Pinecone vector DB. This notebook focuses on interacting with Pinecone from a Jupyter notebook.

![Exercise 3 overview](../assets/exercise_3.png)

### 3.1 Imports and global vars

In [27]:
import os
import pinecone
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

EMBEDDING_MODEL_REPO = "sentence-transformers/all-mpnet-base-v2"
PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')  # read the key from the environment; never hard-code secrets in a notebook
PINECONE_ENVIRONMENT = os.getenv('PINECONE_ENVIRONMENT')

PINECONE_INDEX = os.getenv('PINECONE_INDEX')
dimension = 768
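
Because these settings are read from environment variables, a missing variable silently becomes `None` and only causes an error later, when the client or index is used. A minimal sketch of a fail-fast check is shown below. The helper name `require_env` is hypothetical and not part of the lab code.

```python
import os

def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Example usage, commented out so the sketch does not fail when the variables are absent:
# PINECONE_API_KEY = require_env("PINECONE_API_KEY")
# PINECONE_INDEX = require_env("PINECONE_INDEX")
```

Passing the environment as a parameter keeps the helper easy to try out with a plain dictionary.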

### 3.2 Initialize Pinecone connection
The Pinecone client is initialized with the API key defined previously.

In [28]:
print("initialising Pinecone connection...")
#pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)
pc = Pinecone(api_key=PINECONE_API_KEY)
print("Pinecone initialised")

initialising Pinecone connection...
Pinecone initialised


In [29]:
print(f"Getting '{PINECONE_INDEX}' as object...")
index = pc.Index(PINECONE_INDEX)
print("Success")

# Get latest statistics from index
current_collection_stats = index.describe_index_stats()
print('Total number of embeddings in Pinecone index is {}.'.format(current_collection_stats.get('total_vector_count')))

Getting 'cml-index-se-west' as object...
Success
Total number of embeddings in Pinecone index is 25.


### 3.3 Function to perform the vector search
The idea is to find a chunk from the Knowledge Base that is "close" to the user's original prompt. We perform a semantic search using the user's question, find the nearest knowledge base chunk, and return the content of that chunk along with its source and score.
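
To make "close" concrete, here is a toy sketch of nearest-neighbour search using cosine similarity in plain Python. It assumes a cosine metric, which may not match the metric your Pinecone index was created with, and the chunk names and 3-dimensional vectors are invented for illustration (real embeddings from the model above have 768 dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_chunk(query_vec, chunks):
    """chunks maps a chunk id to its vector; return the (id, score) pair with the highest similarity."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks.items()]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical chunk ids and toy vectors
toy_chunks = {
    "spark_best_practices.txt": [0.9, 0.1, 0.0],
    "ml_runtimes.txt": [0.1, 0.9, 0.2],
}
best_id, best_score = nearest_chunk([1.0, 0.2, 0.0], toy_chunks)
print(best_id, best_score)
```

A vector database performs the same kind of comparison, but over many vectors and typically with an approximate index rather than an exhaustive scan.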

In [30]:
index

<pinecone.data.index.Index at 0x7f6ac030c460>

In [31]:
# Get embeddings for a user question and query Pinecone vector DB for nearest knowledge base chunk
def get_nearest_chunk_from_pinecone_vectordb(index, question):
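    # Generate embedding for user question with embedding model
    # (the model is loaded on every call; load it once outside the function if you query repeatedly)
    retriever = SentenceTransformer(EMBEDDING_MODEL_REPO)
    # Encode a single string so the result is a flat list of floats, one per embedding dimension
    xq = retriever.encode(question).tolist()
    xc = index.query(vector=xq, top_k=5, include_metadata=True)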
    
    matching_files = []
    scores = []
    for match in xc['matches']:
        # extract the 'file_path' within 'metadata'
        file_path = match['metadata']['file_path']
        # extract the individual scores for each vector
        score = match['score']
        scores.append(score)
        matching_files.append(file_path)

    # Return text of the nearest knowledge base chunk 
    # Note that this ONLY uses the first matching document for semantic search. matching_files holds the top results so you can increase this if desired.
    response = load_context_chunk_from_data(matching_files[0])
    sources = matching_files[0]
    score = scores[0]
    return response, sources, score

# Return the Knowledge Base doc based on Knowledge Base ID (relative file path)
def load_context_chunk_from_data(id_path):
    with open(id_path, "r") as f: # Open file in read mode
        return f.read()
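
The function above returns only the first match, even though it requests five. A minimal sketch of collecting several matches is shown below. The helper name `top_matches` is hypothetical, and `fake_response` is a hand-written stand-in for a query response with made-up paths and scores, shaped like the `xc['matches']` structure used above.

```python
def top_matches(query_response, n=3):
    """Return up to n (file_path, score) pairs from a query response, best score first."""
    # Sort defensively in case the matches are not already ordered by score
    matches = sorted(query_response["matches"], key=lambda m: m["score"], reverse=True)
    return [(m["metadata"]["file_path"], m["score"]) for m in matches[:n]]

# Hand-written stand-in for a query response; paths and scores are invented
fake_response = {
    "matches": [
        {"score": 0.61, "metadata": {"file_path": "data/page_a.txt"}},
        {"score": 0.47, "metadata": {"file_path": "data/page_b.txt"}},
        {"score": 0.33, "metadata": {"file_path": "data/page_c.txt"}},
    ]
}
for file_path, score in top_matches(fake_response, n=2):
    print(file_path, score)
```

Each returned path could then be passed to `load_context_chunk_from_data` to build a larger context, at the cost of a longer prompt.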

### 3.4 Examine the results of the vector search
Given the text of the question, we can now perform a vector search and output the results in the notebook. An important detail here is the ability to interact with metadata (e.g. the context source), which can be used to narrow down the search space and, more critically, for authorization. These approaches are out of scope for this lab.
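
For orientation only, here is a rough sketch of what a metadata filter could look like. Pinecone queries accept a `filter` argument with operators such as `$eq` and `$in`. The metadata field `access_group` and the helper `build_metadata_filter` are hypothetical: the vectors loaded in this lab are only known to carry `file_path` in their metadata.

```python
def build_metadata_filter(allowed_groups):
    """Build a Pinecone-style metadata filter that restricts matches to the given groups."""
    if not allowed_groups:
        raise ValueError("allowed_groups must not be empty")
    # 'access_group' is a hypothetical metadata field, not one written by this lab
    return {"access_group": {"$in": list(allowed_groups)}}

metadata_filter = build_metadata_filter(["engineering", "public"])
print(metadata_filter)

# Passing it to a query would look roughly like this (not executed here):
# xc = index.query(vector=xq, top_k=5, include_metadata=True, filter=metadata_filter)
```

Filtering inside the query, rather than after it, means chunks a user is not allowed to see are never returned to the application.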

In [34]:
#question = "What is ML Runtime?" ## (Swap with your own based on your dataset)
question = "can I use withColumn in chain?"
#question = "how many states does the us have?"
context_chunk, sources, score = get_nearest_chunk_from_pinecone_vectordb(index, question)
print("\nContext Chunk: ")
print(context_chunk)
print("\nContext Source(s): ")
print(sources)
print("\nPinecone Score: ")
print(score)


Context Chunk: 
Best practices for building Apache Spark applicationsCloudera Docs Best practices for building Apache Spark applications   Follow these best practices when building Apache Spark Scala and Java       applications:   Refrain from using withColumn in chain, loop, and calling it multiple times in a single         query. Doing so may cause performance issues. To avoid issues, use select() with multiple         columns at once. See the Apache Spark API reference linked below for more information.  Compile your applications against the same version of Spark that         you are running.                Build a single assembly JAR ("Uber" JAR) that includes all dependencies. In Maven, add             the Maven assembly plug-in to build a JAR containing all dependencies:            <plugin>   <artifactId>maven-assembly-plugin</artifactId>   <configuration>     <descriptorRefs>       <descriptorRef>jar-with-dependencies</descriptorRef>     </descriptorRefs>   </configuration>   <

### 3.5 Takeaways
* Vector search is a critical component of any LLM app that uses a RAG architecture
* Cloudera's partner [Pinecone](https://www.pinecone.io/) provides a convenient SaaS vector database to support an LLM RAG architecture
* Metadata from each entry in the vector DB can be used to refine searches and to support custom authorization frameworks

### Up Next: Go to Exercise 4
In Exercise 4, you launch a Gradio app to complete the first iteration of the Q&A chat use case.