<a href="https://colab.research.google.com/github/disnea/Chatbots/blob/main/docs/assets/namespacing_with_pinecone.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Namespacing with Pinecone

Namespaces are a Pinecone feature that lets you partition the data in an index. When you read from or write to a namespace, you access only the data in that namespace, so two namespaces can hold vectors with the same ids but different values. Namespacing is useful when you want to reuse the same data processing pipeline but query only a subset of your data.
For example, in a movie recommender system you could use namespaces to separate recommendations by genre. Alternatively, you could store the genre as metadata and filter on it instead of using namespaces.
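
To make the isolation concrete, here is a minimal in-memory sketch of the idea. `ToyIndex` is a hypothetical stand-in written for illustration only; it is not the Pinecone client and makes no claim about how Pinecone stores data internally.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory stand-in for an index; not the Pinecone client."""

    def __init__(self):
        # namespace name -> {vector id -> vector values}
        self._namespaces = defaultdict(dict)

    def upsert(self, vectors, namespace=""):
        # Writes only touch the given namespace ("" plays the role of the default one).
        for vector_id, values in vectors:
            self._namespaces[namespace][vector_id] = values

    def fetch(self, vector_id, namespace=""):
        # Reads only see the given namespace; returns None if the id is absent there.
        return self._namespaces.get(namespace, {}).get(vector_id)


toy = ToyIndex()
toy.upsert([("Wall-E", [1.0, 1.0])])  # default namespace
toy.upsert([("Wall-E", [9.0, 9.0])], namespace="romantic-comedy")

default_value = toy.fetch("Wall-E")
romcom_value = toy.fetch("Wall-E", namespace="romantic-comedy")
missing_value = toy.fetch("Up", namespace="romantic-comedy")
print(default_value, romcom_value, missing_value)  # [1.0, 1.0] [9.0, 9.0] None
```

The same id resolves to different values depending on the namespace, and an id written only to the default namespace is invisible from `romantic-comedy`.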

## Prerequisites

You need a paid Pinecone account, as namespaces are not available on the free starter tier.

Install dependencies.

In [None]:
!pip install -qU \
  pinecone-client==2.2.2 \
  pandas==2.0.3

Set up Pinecone.

In [None]:
import os
import pinecone

# get api key from app.pinecone.io
api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'
# find your environment next to the api key in pinecone console
env = os.environ.get('PINECONE_ENVIRONMENT') or 'PINECONE_ENVIRONMENT'

pinecone.init(
    api_key=api_key,
    environment=env
)
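
The cell above falls back to the literal placeholder strings when the environment variables are missing, so a forgotten key only surfaces later as an authentication error. One possible alternative is to fail fast. The helper name `require_env` below is an assumption introduced for this sketch and is not part of the Pinecone client.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return the variable's value, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


# Usage (commented out so the sketch runs without credentials):
# api_key = require_env("PINECONE_API_KEY")
# env = require_env("PINECONE_ENVIRONMENT")
```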



## Creating the Index

In [None]:
index_name = "pinecone-namespacing"

In [None]:
import time

# Delete the index if it already exists
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

# Create an index
pinecone.create_index(name=index_name, dimension=2, metric="euclidean")

# wait for index to be ready before connecting
while not pinecone.describe_index(index_name).status['ready']:
    time.sleep(1)

pinecone.list_indexes()

['arxiv-test',
 'azure-openai-langchain-intro',
 'llama-2-rag',
 'openai-retrieval-test',
 'pinecone-namespacing']
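
The readiness loop in the cell above polls indefinitely. If you prefer the notebook to stop after a bounded wait, a polling helper with a timeout is one option. `wait_until` and its parameters are hypothetical names for this sketch. The demonstration uses a fake readiness check so it runs without a live index.

```python
import time


def wait_until(is_ready, timeout_s=60.0, poll_s=1.0):
    """Hypothetical helper: poll is_ready() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not ready after {timeout_s} seconds")
        time.sleep(poll_s)


# Usage with the notebook's client (commented out; needs a live index):
# wait_until(lambda: pinecone.describe_index(index_name).status["ready"], timeout_s=300)

# Self-contained demonstration: a fake check that reports ready on the third poll.
calls = {"n": 0}


def fake_ready():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until(fake_ready, timeout_s=5.0, poll_s=0.01)
print(calls["n"])  # 3
```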

In [None]:
# Connect to the index
index = pinecone.Index(index_name=index_name)

### Generate movie data

In [None]:
# Generate some data
import pandas as pd

df = pd.DataFrame()
df["id"] = ["Wall-E", "Up", "Ratatouille", "Toy Story"]
df["vector"] = [[1., 1.], [2., 2.], [3., 3.], [4., 4.]]
df

|   | id | vector |
|---|---|---|
| 0 | Wall-E | [1.0, 1.0] |
| 1 | Up | [2.0, 2.0] |
| 2 | Ratatouille | [3.0, 3.0] |
| 3 | Toy Story | [4.0, 4.0] |


### Insert vectors without specifying a namespace

In [None]:
# Insert vectors without specifying a namespace
index.upsert(vectors=zip(df.id, df.vector))
index.describe_index_stats()

{'dimension': 2,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4}},
 'total_vector_count': 4}

### Insert vectors into a namespace

In [None]:
romantic_comedies = ["Wall-E", "Ratatouille"]
romcom_df = df[df.id.isin(romantic_comedies)]
romcom_df

|   | id | vector |
|---|---|---|
| 0 | Wall-E | [1.0, 1.0] |
| 2 | Ratatouille | [3.0, 3.0] |


In [None]:
# Insert vectors into a namespace
index.upsert(vectors=zip(romcom_df.id, romcom_df.vector), namespace="romantic-comedy")
index.describe_index_stats()

{'dimension': 2,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4},
                'romantic-comedy': {'vector_count': 2}},
 'total_vector_count': 6}
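
The stats above show that each namespace keeps its own count and that the total is their sum (4 + 2 = 6). The sketch below checks this on a plain dict copied from that output. `namespace_counts` is a hypothetical helper, and treating the live response object as a plain dict is an assumption you may need to adapt.

```python
def namespace_counts(stats):
    """Hypothetical helper: map each namespace name to its vector count."""
    return {name: info["vector_count"] for name, info in stats["namespaces"].items()}


# Stats copied from the output above, as a plain dict.
stats = {
    "dimension": 2,
    "index_fullness": 0.0,
    "namespaces": {"": {"vector_count": 4}, "romantic-comedy": {"vector_count": 2}},
    "total_vector_count": 6,
}

counts = namespace_counts(stats)
print(counts)  # {'': 4, 'romantic-comedy': 2}
print(sum(counts.values()) == stats["total_vector_count"])  # True
```

The two upserted romantic comedies are counted again in their namespace even though the same ids already exist in the default namespace.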

### Query top-3 results, without a namespace

In [None]:
# .iloc[0] selects the first matching row by position rather than by index label
query_results = index.query(vector=df[df.id == "Wall-E"].vector.iloc[0], top_k=3)
query_results

{'matches': [{'id': 'Wall-E', 'score': 0.0, 'values': []},
             {'id': 'Up', 'score': 1.99999905, 'values': []},
             {'id': 'Ratatouille', 'score': 7.99999809, 'values': []}],
 'namespace': ''}
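
The scores above appear consistent with squared Euclidean distance, where lower means closer: Wall-E at (1, 1) is 0 from itself, 2 from Up at (2, 2), and 8 from Ratatouille at (3, 3). The small deviations in the output (such as 1.99999905) look like floating-point rounding. A quick check in plain Python, independent of Pinecone:

```python
def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Same vectors as the DataFrame in this notebook.
movies = {
    "Wall-E": [1.0, 1.0],
    "Up": [2.0, 2.0],
    "Ratatouille": [3.0, 3.0],
    "Toy Story": [4.0, 4.0],
}
query = movies["Wall-E"]

distances = {name: squared_euclidean(query, vec) for name, vec in movies.items()}
print(distances)
# {'Wall-E': 0.0, 'Up': 2.0, 'Ratatouille': 8.0, 'Toy Story': 18.0}
```

Toy Story, at 18.0, is the farthest of the four, which is why it is absent from the top-3 results.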

### Query top-3 results, with a namespace

We should expect to see only romantic comedies in the query results. Because the namespace holds only two vectors, the query returns two matches even though `top_k=3`.

In [None]:
query_results = index.query(
    vector=df[df.id == "Wall-E"].vector.iloc[0],  # first matching row, by position
    top_k=3,
    namespace="romantic-comedy",
)
query_results

{'matches': [{'id': 'Wall-E', 'score': 0.0, 'values': []},
             {'id': 'Ratatouille', 'score': 7.99999809, 'values': []}],
 'namespace': 'romantic-comedy'}
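
Both query results can be reproduced with a brute-force search that only considers candidates from the requested namespace. This is a sketch for intuition under the assumption of squared Euclidean scoring. `toy_query` is a hypothetical function and does not reflect how Pinecone executes queries.

```python
import heapq


def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_query(namespaces, vector, top_k, namespace=""):
    """Hypothetical brute-force search restricted to a single namespace."""
    candidates = namespaces.get(namespace, {})
    scored = (
        (squared_euclidean(vector, values), vector_id)
        for vector_id, values in candidates.items()
    )
    # nsmallest returns at most top_k items, fewer if the namespace is smaller.
    return [(vector_id, score) for score, vector_id in heapq.nsmallest(top_k, scored)]


# Mirror of the index contents after the two upserts in this notebook.
namespaces = {
    "": {"Wall-E": [1.0, 1.0], "Up": [2.0, 2.0], "Ratatouille": [3.0, 3.0], "Toy Story": [4.0, 4.0]},
    "romantic-comedy": {"Wall-E": [1.0, 1.0], "Ratatouille": [3.0, 3.0]},
}

default_matches = toy_query(namespaces, [1.0, 1.0], top_k=3)
romcom_matches = toy_query(namespaces, [1.0, 1.0], top_k=3, namespace="romantic-comedy")
print(default_matches)  # [('Wall-E', 0.0), ('Up', 2.0), ('Ratatouille', 8.0)]
print(romcom_matches)   # [('Wall-E', 0.0), ('Ratatouille', 8.0)]
```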

### Delete the index

In [None]:
# delete the index
pinecone.delete_index(index_name)