<a href="https://colab.research.google.com/github/AryaJeet1364/VectorDBs/blob/main/PineconeVectorDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pinecone Hands-on

In this project, I built a minimal Retrieval-Augmented Generation (RAG) pipeline that uses Pinecone as the vector database, SentenceTransformers to embed documents, and FLAN-T5 to generate context-aware answers to user queries. The system takes a query, retrieves the most similar text chunks by cosine similarity, and generates an answer grounded in the retrieved context. It does this without LangChain or a hosted LLM API (Pinecone is the only external service), which keeps the code lightweight and easy to follow.
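
To make "most similar by cosine similarity" concrete, here is a minimal sketch of the scoring idea in plain Python. It only illustrates how vectors are ranked and says nothing about how Pinecone searches internally. The toy 3-dimensional vectors and the helper names `cosine_similarity` and `top_k` are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_id, score) pairs for the k highest-scoring documents."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings" (assumption: real ones here have 384 dimensions)
toy_docs = {
    "doc_0": [1.0, 0.0, 0.0],
    "doc_1": [0.7, 0.7, 0.0],
    "doc_2": [0.0, 0.0, 1.0],
}
ranked = top_k([1.0, 0.1, 0.0], toy_docs, k=2)
print(ranked)
```

The query vector points almost in the same direction as `doc_0`, so that document ranks first, followed by `doc_1`.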

## Installing dependencies

In [1]:
!pip install -q "pinecone-client[grpc]" sentence-transformers transformers torch PyPDF2 accelerate

In [3]:
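# The `pinecone-client` package has been renamed to `pinecone`; remove the old name before installing the new one
!pip uninstall -y pinecone-client
!pip install -q pinecone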

Found existing installation: pinecone-client 6.0.0
Uninstalling pinecone-client-6.0.0:
  Successfully uninstalled pinecone-client-6.0.0


## Imports

In [4]:
import os
import torch
import PyPDF2
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import numpy as np
from typing import List, Dict
import warnings
warnings.filterwarnings('ignore')

In [5]:
# Check for GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cuda


## Setting up Pinecone API Key

In [6]:
from google.colab import userdata

PINECONE_API_KEY = userdata.get('PINECONE_API_KEY')
# Despite its name, this secret holds the serverless region passed to ServerlessSpec below
PINECONE_API_ENV = userdata.get('PINECONE_API_ENV')


pc = Pinecone(api_key=PINECONE_API_KEY)

INDEX_NAME = "rag-documents"
DIMENSION = 384  # output size of all-MiniLM-L6-v2, the embedding model loaded below

try:
    if INDEX_NAME not in pc.list_indexes().names():
        pc.create_index(
            name=INDEX_NAME,
            dimension=DIMENSION,
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region=PINECONE_API_ENV)
        )
        print(f"Created index: {INDEX_NAME}")
    else:
        print(f"Index '{INDEX_NAME}' already exists")

    index = pc.Index(INDEX_NAME)
    print("Connected to Pinecone")

except Exception as e:
    # Re-raise so later cells do not fail with a confusing NameError on `index`
    print("Pinecone setup failed:", e)
    raise



Index 'rag-documents' already exists
Connected to Pinecone


## Loading Model

In [7]:
embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2').to(device)

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

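text_generator = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    max_length=512,
    # Sampling makes answers vary between runs; remove do_sample and temperature for deterministic decoding
    do_sample=True,
    temperature=0.7
)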

Device set to use cuda:0


## Text Processing

In [8]:
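def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    chunks = []
    step = chunk_size - chunk_overlap  # each window starts this many words after the previous one
    for i in range(0, len(words), step):
        chunks.append(' '.join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks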

def upload_sample_text():
    return [
        "Artificial Intelligence (AI) is the simulation of human intelligence processes by machines. AI can learn and solve problems.",
        "Retrieval-Augmented Generation (RAG) combines document search with text generation for more accurate responses.",
        "Vector databases like Pinecone store embeddings and allow fast semantic search.",
        "Natural Language Processing (NLP) is about making computers understand and generate human language.",
        "Large Language Models (LLMs) like GPT are trained on huge datasets and can generate and understand language."
    ]
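
To see how `chunk_size` and `chunk_overlap` interact, the sketch below computes only the word ranges that a sliding window covers, without touching any text. `window_spans` is a hypothetical helper written for this illustration, and the small sizes are chosen so the result is easy to read. The sketch stops as soon as a window reaches the last word, so the tail is not emitted twice.

```python
def window_spans(n_words, chunk_size, chunk_overlap):
    """Return (start, end) word indices for each overlapping window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    spans = []
    for start in range(0, n_words, step):
        end = min(start + chunk_size, n_words)
        spans.append((start, end))
        if end >= n_words:
            break  # the window already covers the last word
    return spans

# 12 words, windows of 5 words, 2 words shared between neighbours
spans = window_spans(12, chunk_size=5, chunk_overlap=2)
print(spans)  # [(0, 5), (3, 8), (6, 11), (9, 12)]
```

Each window starts `chunk_size - chunk_overlap` words after the previous one, so neighbouring chunks share exactly `chunk_overlap` words, except possibly the last, shorter window.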

## Embeddings and Indexing

In [9]:
def generate_embeddings(texts):
    return embedding_model.encode(texts, convert_to_tensor=True).cpu().numpy()

def index_documents(texts, metadata=None):
    if metadata is None:
        metadata = [{"text": text, "source": f"doc_{i}"} for i, text in enumerate(texts)]

    embeddings = generate_embeddings(texts)

    vectors = [
        {"id": f"doc_{i}", "values": emb.tolist(), "metadata": meta}
        for i, (emb, meta) in enumerate(zip(embeddings, metadata))
    ]

    try:
        index.upsert(vectors=vectors)
        print(f"Indexed {len(vectors)} documents")
    except Exception as e:
        print("Indexing failed:", e)
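
`index_documents` sends every vector in a single `upsert` call, which is fine for five sentences. For a larger corpus, a common pattern is to split the list into batches. The following is a minimal sketch under stated assumptions: the batch size of 100 is an illustrative value not taken from this notebook, `split_batches` and `upsert_in_batches` are hypothetical helpers, and `RecordingIndex` merely stands in for the real index so the code runs without credentials.

```python
def split_batches(items, batch_size=100):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def upsert_in_batches(index, vectors, batch_size=100):
    """Upsert `vectors` batch by batch and return how many were sent."""
    sent = 0
    for batch in split_batches(vectors, batch_size):
        index.upsert(vectors=batch)  # same keyword argument as in index_documents
        sent += len(batch)
    return sent

class RecordingIndex:
    """Hypothetical stand-in for a Pinecone index; it only records batch sizes."""
    def __init__(self):
        self.batch_sizes = []

    def upsert(self, vectors):
        self.batch_sizes.append(len(vectors))

fake_index = RecordingIndex()
vectors = [{"id": f"doc_{i}", "values": [0.0] * 384, "metadata": {}}
           for i in range(250)]
total = upsert_in_batches(fake_index, vectors, batch_size=100)
print(total, fake_index.batch_sizes)  # 250 [100, 100, 50]
```

With the real `index` object in place of `fake_index`, the same loop would issue three smaller requests instead of one large one.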


## Retrieval and Generation


In [10]:
def search_documents(query, top_k=3):
    query_vec = generate_embeddings([query])[0]
    try:
        results = index.query(vector=query_vec.tolist(), top_k=top_k, include_metadata=True)
        return results['matches']
    except Exception as e:
        print("Search error:", e)
        return []

def generate_answer(query, docs):
    context = "\n\n".join([doc['metadata']['text'] for doc in docs])
    prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer based on the context above:"
    try:
        return text_generator(prompt, max_length=200)[0]['generated_text']
    except Exception as e:
        print("Answer generation failed:", e)
        return "No answer generated."

def rag_pipeline(query):
    print(f"\nQuery: {query}\n{'-'*50}")
    docs = search_documents(query)
    if not docs:
        return {"query": query, "answer": "No relevant documents found.", "sources": []}

    answer = generate_answer(query, docs)
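    sources = [{
        "rank": i + 1,
        "score": doc['score'],
        # Add an ellipsis only when the preview is actually truncated
        "text": doc['metadata']['text'][:200] + ("..." if len(doc['metadata']['text']) > 200 else ""),
        "source": doc['metadata'].get('source', 'Unknown')
    } for i, doc in enumerate(docs)]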

    return {"query": query, "answer": answer, "sources": sources}
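
`search_documents` always returns the `top_k` best matches, even when some of them are only weakly related to the query, and `generate_answer` places all of them in the prompt. One possible refinement is to drop low-scoring matches before building the prompt. This is a sketch, not part of the pipeline above: `build_prompt` is a hypothetical helper, the threshold `min_score=0.3` is an arbitrary value that would need tuning, and the matches are plain dictionaries shaped like the ones accessed in `generate_answer`.

```python
def build_prompt(query, matches, min_score=0.3):
    """Keep matches scoring at least `min_score` and format them into a prompt.

    Returns None when no match passes the threshold.
    """
    kept = [m for m in matches if m["score"] >= min_score]
    if not kept:
        return None
    context = "\n\n".join(m["metadata"]["text"] for m in kept)
    return f"Context: {context}\n\nQuestion: {query}\n\nAnswer based on the context above:"

# Hand-written matches with made-up scores, for illustration only
sample_matches = [
    {"score": 0.82, "metadata": {"text": "Vector databases like Pinecone store embeddings."}},
    {"score": 0.12, "metadata": {"text": "Natural Language Processing (NLP) is about language."}},
]
prompt = build_prompt("How do vector databases help?", sample_matches)
print(prompt)
```

Only the first match survives the filter here, so the prompt contains a single context passage. Returning `None` lets the caller fall back to a "no relevant documents" answer instead of generating from unrelated text.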

## Main Function

In [11]:
def main():
    sample_texts = upload_sample_text()
    index_documents(sample_texts)

    demo_queries = [
        "What is AI?",
        "What is RAG?",
        "How do vector databases help?",
        "Explain NLP",
        "What are LLMs?"
    ]

    for q in demo_queries:
        res = rag_pipeline(q)
        print(f"Answer: {res['answer']}")
        print(f"Sources: {len(res['sources'])}")
        print("-" * 40)


In [12]:
if __name__ == "__main__":
    main()

Indexed 5 documents

Query: What is AI?
--------------------------------------------------
Answer: simulation of human intelligence processes by machines
Sources: 3
----------------------------------------

Query: What is RAG?
--------------------------------------------------
Answer: combines document search with text generation for more accurate responses
Sources: 3
----------------------------------------

Query: How do vector databases help?
--------------------------------------------------
Answer: store embeddings and allow fast semantic search
Sources: 3
----------------------------------------

Query: Explain NLP
--------------------------------------------------
Answer: The relevant sentence in the passage is: Natural Language Processing (NLP) is about making computers understand and generate human language. The relevant sentence in the passage is: Large Language Models (LLMs) like GPT are trained on huge datasets and can generate and understand language. The relevant sentence