
In [1]:
!pip install -Uq pinecone


[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/427.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━[0m [32m174.1/427.3 kB[0m [31m5.0 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m427.3/427.3 kB[0m [31m6.9 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/87.5 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m87.5/87.5 kB[0m [31m5.9 MB/s[0m eta [36m0:00:00[0m
[?25h

In [2]:
!pip install -Uq pinecone-plugin-assistant

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/231.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━[0m [32m163.8/231.3 kB[0m [31m4.7 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m231.3/231.3 kB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
[?25h

In [9]:
from pinecone import Pinecone, ServerlessSpec

In [10]:
from google.colab import userdata
api_key = userdata.get('PINECONE_KEY')

In [11]:
pc = Pinecone(api_key=api_key)

In [13]:
assistant = pc.assistant.create_assistant(
    assistant_name="example2-assistant",
    instructions="Answer directly and succinctly. Do not provide any additional information.", # Description or directive for the assistant to apply to all responses.
    timeout=30 # Wait 30 seconds for assistant operation to complete.
)

In [14]:
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}
]

embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in data],
    parameters={"input_type": "passage", "truncate": "END"}
)
print(embeddings[0])

{'vector_type': dense, 'values': [0.04913330078125, -0.01306915283203125, ..., -0.0196990966796875, -0.0110321044921875]}


In [15]:
index_name = "quickstart"

pc.create_index(
    name=index_name,
    dimension=1024,  # multilingual-e5-large produces 1024-dimensional vectors; match your model
    metric="cosine",  # Similarity metric used to compare vectors; match your model
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)
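The index is created with `metric="cosine"`, so matches are ranked by the cosine similarity between the query vector and each stored vector. As a rough plain-Python illustration of what that score means (not Pinecone's internal implementation), cosine similarity is the dot product of two vectors divided by the product of their lengths:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for the 1024-dimensional embeddings
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

A score of 1 means the vectors point in the same direction and 0 means they are unrelated, which is why the Apple-the-company sentences score highest for the tech-company query further down.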

In [16]:
import time  # needed for time.sleep below

# Wait for the index to be ready
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)

index = pc.Index(index_name)

vectors = []
for d, e in zip(data, embeddings):
    vectors.append({
        "id": d['id'],
        "values": e['values'],
        "metadata": {'text': d['text']}
    })

index.upsert(
    vectors=vectors,
    namespace="ns1"
)

{'upserted_count': 6}
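Six vectors fit comfortably in a single `upsert` call. For larger datasets, Pinecone recommends sending upserts in batches. The helper below is a hypothetical sketch (not part of the Pinecone SDK) that splits a list such as `vectors` into fixed-size chunks; the default batch size of 100 is an illustrative assumption, not a documented limit.

```python
from typing import Iterator


def batched(items: list, batch_size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 250 placeholder records split into batches of 100, 100 and 50
records = [{"id": f"vec{i}", "values": [0.0]} for i in range(250)]
print([len(batch) for batch in batched(records)])
```

In the notebook this would be used as `for batch in batched(vectors): index.upsert(vectors=batch, namespace="ns1")`. Python 3.12+ also ships `itertools.batched`, which yields tuples instead of lists.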

In [17]:
print(index.describe_index_stats())


{'dimension': 1024,
 'index_fullness': 0.0,
 'namespaces': {'ns1': {'vector_count': 6}},
 'total_vector_count': 6}


In [18]:
query = "Tell me about the tech company known as Apple."

embedding = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={
        "input_type": "query"
    }
)

In [19]:
results = index.query(
    namespace="ns1",
    vector=embedding[0].values,
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)

{'matches': [{'id': 'vec2',
              'metadata': {'text': 'The tech company Apple is known for its '
                                   'innovative products like the iPhone.'},
              'score': 0.872728229,
              'values': []},
             {'id': 'vec4',
              'metadata': {'text': 'Apple Inc. has revolutionized the tech '
                                   'industry with its sleek designs and '
                                   'user-friendly interfaces.'},
              'score': 0.852366745,
              'values': []},
             {'id': 'vec6',
              'metadata': {'text': 'Apple Computer Company was founded on '
                                   'April 1, 1976, by Steve Jobs, Steve '
                                   'Wozniak, and Ronald Wayne as a '
                                   'partnership.'},
              'score': 0.850151539,
              'values': []}],
 'namespace': 'ns1',
 'usage': {'read_units': 6}}
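The matches come back in descending score order. A small sketch for pulling out only the ID, score and stored text, written against plain dictionaries shaped like the printed output above. The `min_score` cutoff is a hypothetical parameter of this helper, not part of the Pinecone API.

```python
def summarize_matches(matches: list[dict], min_score: float = 0.0) -> list[tuple[str, float, str]]:
    """Return (id, score, text) for every match whose score is at least min_score."""
    summary = []
    for match in matches:
        if match["score"] >= min_score:
            # Metadata is optional, so fall back to an empty string
            text = match.get("metadata", {}).get("text", "")
            summary.append((match["id"], match["score"], text))
    return summary


# Two matches copied from the query output above
sample = [
    {"id": "vec2", "score": 0.872728229,
     "metadata": {"text": "The tech company Apple is known for its innovative products like the iPhone."}},
    {"id": "vec6", "score": 0.850151539,
     "metadata": {"text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, "
                          "Steve Wozniak, and Ronald Wayne as a partnership."}},
]
for match_id, score, text in summarize_matches(sample, min_score=0.86):
    print(f"{match_id} ({score:.3f}): {text}")
```

With the live response you would pass `results['matches']`; this assumes the SDK's match objects support dict-style access, as the printed output suggests. If your SDK version's objects do not, the attributes `match.id`, `match.score` and `match.metadata` carry the same fields.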


## What is the difference between a namespace and an index?

Let's clarify the difference between a namespace and an index in Pinecone:

### Index:

- Think of an index as the main database or container for your vector data.
- It defines the fundamental properties of your data, such as the dimensionality of the vectors (e.g., 1024 in the provided code) and the similarity metric used to compare them (e.g., "cosine").
- You generally create one index for a specific type of data or application.
- It is the higher-level organizational unit.
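Because the dimension and metric are fixed when the index is created, every vector written to it must have exactly that many values; Pinecone rejects upserts and queries with a mismatched dimension. The class below is a hypothetical, in-memory stand-in (not part of the Pinecone SDK) that mimics that rule:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyIndexSpec:
    """Hypothetical stand-in for the properties fixed at index creation."""
    name: str
    dimension: int
    metric: str = "cosine"

    def check_vector(self, values: list[float]) -> None:
        # Reject any vector whose length differs from the index dimension
        if len(values) != self.dimension:
            raise ValueError(
                f"vector has dimension {len(values)}, "
                f"index '{self.name}' expects {self.dimension}"
            )


spec = ToyIndexSpec(name="quickstart", dimension=1024)
spec.check_vector([0.0] * 1024)  # accepted silently
try:
    spec.check_vector([0.0] * 768)  # e.g. an embedding from a 768-dimensional model
except ValueError as err:
    print(err)
```

This is why the notebook sets `dimension=1024`: it has to match the output size of `multilingual-e5-large`.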

### Namespace:

- A namespace is a partition or subdivision within an index.
- It lets you logically group and separate vectors within the same index.
- It is useful when you have distinct categories of data that you want to keep separate while still storing them in the same index.
- It lets you control the scope of your searches: queries and other operations (upserts, deletes) are always scoped to a single namespace.
- This isolation is useful for multi-tenancy (keeping different customers' data apart) or for separating A/B test data from production data.
- If you do not specify a namespace, data is upserted to the default namespace, "" (empty string), which works like the root namespace of the index.
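To make the scoping concrete, here is a hypothetical in-memory model (not the Pinecone SDK) of one index holding several namespaces. Its `query` runs an exact brute-force cosine search over a single namespace only, whereas Pinecone itself uses approximate nearest-neighbor search; the names `ToyNamespacedIndex`, `tenant-a` and `tenant-b` are illustrative.

```python
import heapq
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # toy convention: treat zero vectors as dissimilar
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


class ToyNamespacedIndex:
    """Hypothetical model: one index, many namespaces, each query scoped to one namespace."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        # namespace name -> {vector id -> values}
        self.namespaces: dict[str, dict[str, list[float]]] = defaultdict(dict)

    def upsert(self, vectors: list[dict], namespace: str = "") -> int:
        for v in vectors:
            if len(v["values"]) != self.dimension:
                raise ValueError("dimension mismatch")
            # Upserting an existing id overwrites it, but only within this namespace
            self.namespaces[namespace][v["id"]] = v["values"]
        return len(vectors)

    def query(self, vector: list[float], top_k: int, namespace: str = "") -> list[tuple[str, float]]:
        # .get() avoids creating an empty namespace as a side effect
        candidates = self.namespaces.get(namespace, {})
        scored = ((vid, cosine(vector, vals)) for vid, vals in candidates.items())
        return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


idx = ToyNamespacedIndex(dimension=2)
idx.upsert([{"id": "a1", "values": [1.0, 0.0]}], namespace="tenant-a")
idx.upsert([{"id": "b1", "values": [1.0, 0.1]}], namespace="tenant-b")
print(idx.query([1.0, 0.0], top_k=5, namespace="tenant-a"))  # [('a1', 1.0)]: tenant-b is never searched
print(idx.query([1.0, 0.0], top_k=5))                        # []: the default "" namespace is empty
```

Even though `b1` is very close to the query vector, it never appears in tenant-a's results, which is the isolation property that makes namespaces suitable for multi-tenancy.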

### Analogy:

Imagine a library:

- Index: the entire library building.
- Namespace: a section or floor within the library (e.g., fiction, non-fiction, children's books). You might only want to search within a specific section.

In the provided code:

- `pc.create_index(...)` creates the main index, called "quickstart".
- `index.upsert(vectors=vectors, namespace="ns1")` stores the data in a namespace called "ns1" inside the "quickstart" index.
- `results = index.query(namespace="ns1", ...)` then searches only within the "ns1" namespace for the most similar entries.