<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12gPwR5QoAtjp72L8tTkXKmojwmywGcGM#scrollTo=FOucE8In5aGg)

## 🚀 **Pinecone: Scalable Vector Database for AI Applications**  
Pinecone is a fully managed vector database built for storing and searching high-dimensional vectors such as text embeddings.  
It supports AI and machine learning applications with efficient similarity search and real-time data ingestion.  
Because the service is managed, scaling and availability are handled for you. This notebook covers the core workflow: generating embeddings, creating an index, upserting and querying vectors, and reranking results.


### **Setup and Installation**


In [None]:
!pip install "pinecone[grpc]"

### **Setup the API Key**


In [None]:
from google.colab import userdata
import os
os.environ['PINECONE_API_KEY']=userdata.get('PINECONE_API_KEY')
PINECONE_API_KEY=os.getenv('PINECONE_API_KEY')
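
The cell above only works inside Google Colab. As a minimal sketch for running the notebook elsewhere, the hypothetical helper below (`load_pinecone_api_key` is not part of the Pinecone SDK) tries Colab secrets first and falls back to a `PINECONE_API_KEY` environment variable, assuming you have exported one in your shell.

```python
import os


def load_pinecone_api_key():
    """Return the API key from Colab secrets if available, else from the environment."""
    key = None
    try:
        # google.colab only exists inside a Colab runtime
        from google.colab import userdata
        try:
            key = userdata.get("PINECONE_API_KEY")
        except Exception:
            # Secret missing or notebook access not granted; try the environment instead
            key = None
    except ImportError:
        pass

    key = key or os.getenv("PINECONE_API_KEY")
    if not key:
        raise RuntimeError("PINECONE_API_KEY is not set in Colab secrets or the environment")
    return key


# Usage (uncomment once a key is configured):
# PINECONE_API_KEY = load_pinecone_api_key()
```

Raising early with a clear message avoids a less obvious authentication error later, when the client makes its first request.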

### **Generate vectors**

In [None]:
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import time

pc = Pinecone(api_key=PINECONE_API_KEY)

data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}
]

embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)


EmbeddingsList(
  model='multilingual-e5-large',
  vector_type='dense',
  data=[
    {'vector_type': dense, 'values': [0.04913330078125, -0.01306915283203125, ..., -0.0196990966796875, -0.0110321044921875]},
    {'vector_type': dense, 'values': [0.032470703125, -0.027923583984375, ..., -0.020050048828125, -0.02099609375]},
    ... (2 more embeddings) ...,
    {'vector_type': dense, 'values': [0.0312347412109375, -0.0186309814453125, ..., -0.02996826171875, -0.033111572265625]},
    {'vector_type': dense, 'values': [0.039520263671875, -0.00997161865234375, ..., 0.0011930465698242188, -0.042755126953125]}
  ],
  usage={'total_tokens': 130}
)


### **Create an index**

In [None]:
index_name = "example-index"

if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(
            cloud='aws',
            region='us-east-1'
        )
    )

while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)
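
The index is created with `metric="cosine"`, so matches are ranked by the cosine of the angle between the query vector and each stored vector. To illustrate what that score measures, here is a minimal plain-Python sketch using toy 3-dimensional vectors (made-up values, not real 1024-dimensional embeddings).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only
q_vec = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]   # same direction as q_vec, different length
orthogonal = [0.0, 1.0, 0.0]       # shares no component with q_vec

print(cosine_similarity(q_vec, same_direction))  # approximately 1.0
print(cosine_similarity(q_vec, orthogonal))      # 0.0
```

Cosine similarity ignores vector length and compares direction only, which is why the scaled copy of `q_vec` still scores close to 1.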


### **Upsert vectors**

In [None]:
# Connect to the index created above, reusing index_name instead of repeating the literal
index = pc.Index(index_name)


records = []
for d, e in zip(data, embeddings):
    records.append({
        "id": d['id'],
        "values": e['values'],
        "metadata": {'text': d['text']}
    })


index.upsert(
    vectors=records,
    namespace="example-namespace"
)


upserted_count: 6
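
Six records fit comfortably in one request, but larger datasets are usually upserted in batches. The sketch below shows one way to split a list into batches; `chunked` is a hypothetical helper, `demo_records` is stand-in data, and the batch size of 100 in the commented usage is an assumption to tune for your own vectors.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Stand-in records; in the notebook this would be the `records` list built above
demo_records = [{"id": f"vec{i}", "values": [0.0] * 4} for i in range(1, 8)]

batches = list(chunked(demo_records, batch_size=3))
print([len(b) for b in batches])  # → [3, 3, 1]

# With a live index, each batch would be sent as its own request, for example:
# for batch in chunked(records, batch_size=100):
#     index.upsert(vectors=batch, namespace="example-namespace")
```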

In [None]:
# Wait a few seconds so the upserted vectors are reflected in the index stats
time.sleep(10)

print(index.describe_index_stats())


{'dimension': 1024,
 'index_fullness': 0.0,
 'namespaces': {'example-namespace': {'vector_count': 6}},
 'total_vector_count': 6}


### **Search the index**

In [None]:
# Define your query
query = "Tell me about the tech company known as Apple."

# Convert the query into a numerical vector that Pinecone can search with
query_embedding = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={
        "input_type": "query"
    }
)

# Search the index for the three most similar vectors
results = index.query(
    namespace="example-namespace",
    vector=query_embedding[0].values,
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)


{'matches': [{'id': 'vec2',
              'metadata': {'text': 'The tech company Apple is known for its '
                                   'innovative products like the iPhone.'},
              'score': 0.8727282,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'vec4',
              'metadata': {'text': 'Apple Inc. has revolutionized the tech '
                                   'industry with its sleek designs and '
                                   'user-friendly interfaces.'},
              'score': 0.85236675,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'vec6',
              'metadata': {'text': 'Apple Computer Company was founded on '
                                   'April 1, 1976, by Steve Jobs, Steve '
                                   'Wozniak, and Ronald Wayne as a '
                                   'partnership.'},
              'score': 0
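
A common next step is to keep only the matches above a similarity threshold and pull out their text. The sketch below does this on a plain dictionary that mirrors the shape of the printed response, using the first two ids and scores shown above. `matches_above` is a hypothetical helper and the 0.86 cutoff is an arbitrary value chosen for illustration.

```python
# Stand-in for the query response, mirroring the first two matches printed above
sample_results = {
    "matches": [
        {
            "id": "vec2",
            "score": 0.8727282,
            "metadata": {"text": "The tech company Apple is known for its innovative products like the iPhone."},
        },
        {
            "id": "vec4",
            "score": 0.85236675,
            "metadata": {"text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
        },
    ]
}


def matches_above(results, min_score):
    """Return (id, score, text) tuples for matches scoring at least min_score."""
    return [
        (m["id"], m["score"], m["metadata"]["text"])
        for m in results["matches"]
        if m["score"] >= min_score
    ]


for match_id, score, text in matches_above(sample_results, min_score=0.86):
    print(f"{match_id} ({score:.3f}): {text}")
```

With this cutoff only `vec2` is kept. A suitable threshold depends on the embedding model and the data, so treat any fixed value as something to calibrate.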

### **Clean up**

In [None]:
pc.delete_index(index_name)

### **Reranking**

In [None]:
query = "Tell me about Apple's products"
documents = [
    "Apple is a popular fruit known for its sweetness and crisp texture.",
    "Apple is known for its innovative products like the iPhone.",
    "Many people enjoy eating apples as a healthy snack.",
    "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces.",
    "An apple a day keeps the doctor away, as the saying goes."
]

In [None]:
documents


['Apple is a popular fruit known for its sweetness and crisp texture.',
 'Apple is known for its innovative products like the iPhone.',
 'Many people enjoy eating apples as a healthy snack.',
 'Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces.',
 'An apple a day keeps the doctor away, as the saying goes.']

### **Perform reranking to get top_n results based on the query**

In [None]:
reranked_results = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query=query,
    documents=documents,
    top_n=3,
    return_documents=True
)

### **Display the reranked result**
Note that the reranker places both sentences about Apple the company above the sentence about apple the fruit, because the query asks about products.

In [None]:
print("Top 3 Reranked Documents:")
for i, entry in enumerate(reranked_results.data):
    document_text = entry['document']['text']
    score = entry['score']
    print(f"{i+1}: Score: {score}, Document: {document_text}")

Top 3 Reranked Documents:
1: Score: 0.8401279, Document: Apple is known for its innovative products like the iPhone.
2: Score: 0.23318209, Document: Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces.
3: Score: 0.17384852, Document: Apple is a popular fruit known for its sweetness and crisp texture.
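
Reranked entries come back in a new order, so it is often useful to map them back to your own record ids. The sketch below works on stand-in data: the scores are copied from the output above, while the `index` field (the position of each document in the input list) is an assumption about the response shape, and `doc_ids` and `attach_ids` are hypothetical names introduced for illustration.

```python
docs = [
    "Apple is a popular fruit known for its sweetness and crisp texture.",
    "Apple is known for its innovative products like the iPhone.",
    "Many people enjoy eating apples as a healthy snack.",
    "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces.",
    "An apple a day keeps the doctor away, as the saying goes.",
]
doc_ids = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # hypothetical ids, one per document

# Stand-in for reranked_results.data; "index" is assumed to point into the input list
reranked = [
    {"index": 1, "score": 0.8401279},
    {"index": 3, "score": 0.23318209},
    {"index": 0, "score": 0.17384852},
]


def attach_ids(reranked_entries, ids, texts):
    """Join each reranked entry with the id and text of the document it refers to."""
    return [
        {"id": ids[e["index"]], "score": e["score"], "text": texts[e["index"]]}
        for e in reranked_entries
    ]


for row in attach_ids(reranked, doc_ids, docs):
    print(f"{row['id']}: {row['score']:.3f} {row['text']}")
```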
