# Pinecone Vector Store Demo

A simple demonstration of using Pinecone for vector similarity search with OpenAI embeddings.

## Setup
Import the required libraries. The Pinecone and OpenAI clients are initialized in the next cell from environment variables.

In [1]:
from pinecone import Pinecone, ServerlessSpec
import os
import dotenv
import json
from tqdm import tqdm
from openai import OpenAI




## Load Data
Load the API keys, initialize the Pinecone and OpenAI clients, and read text passages from a file. Each line of the file becomes one passage to embed.

In [2]:
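dotenv.load_dotenv()  # Read PINECONE_API_KEY and OPENAI_API_KEY from a .env file

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def load_passages(file_path="data/cat_facts.txt"):
    """Return the non-empty lines of the file, stripped of surrounding whitespace."""
    with open(file_path, "r", encoding="utf-8") as f:
        # Skip blank lines: the embeddings endpoint rejects empty input strings
        return [line.strip() for line in f if line.strip()]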

In [3]:
data = load_passages()

## Create Embeddings
Convert text passages into numerical vectors using OpenAI's embedding model.

In [4]:
embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=data
)
print(f"Created {len(embeddings.data)} embeddings")

Created 150 embeddings
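
Before storing the vectors, it can help to confirm that every embedding has the length the index will expect (the index in the next section is created with `dimension=1536`). The sketch below uses a hypothetical helper, `check_dimensions`, which is not part of the OpenAI or Pinecone clients. It works on plain lists of floats, so with the response above one would pass `[e.embedding for e in embeddings.data]`.

```python
def check_dimensions(vectors, expected_dim):
    """Return the indices of vectors whose length differs from expected_dim."""
    return [i for i, v in enumerate(vectors) if len(v) != expected_dim]

# Toy data standing in for real embeddings (assumption: 3 dimensions)
toy_vectors = [[0.1, 0.2, 0.3], [0.4, 0.5], [0.6, 0.7, 0.8]]
bad = check_dimensions(toy_vectors, expected_dim=3)
print(bad)  # [1]
```

An empty result means every vector matches the expected dimension.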


## Initialize Pinecone Index
Create a new serverless index in Pinecone to store the vectors. The index dimension must match the embedding length, which is 1536 for `text-embedding-3-small`.

In [6]:
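index_name = "showcase-index"

# Create the index only if it does not already exist, so the cell can be re-run
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # Output size of text-embedding-3-small
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1",
        )
    )

index = pc.Index(index_name)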
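
The `metric="cosine"` setting means matches are ranked by the cosine similarity between the query vector and each stored vector. The following is a minimal pure-Python sketch of that measure, for illustration only: Pinecone computes it on the server, and `cosine_similarity` is a name introduced here rather than a client function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal -> 0.0
```

Because the measure depends only on direction, scaling a vector does not change its score.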

## Store Vectors
Upload embeddings to Pinecone, along with their original text as metadata.

In [7]:
vectors = [
    {
        "id": str(i),  # Convert to string as Pinecone expects string IDs
        "values": embedding.embedding,  # Get the actual embedding array
        "metadata": {
            "text": data[i]
        }
    }
    for i, embedding in enumerate(embeddings.data)
]

# Upsert the vectors into the index
index.upsert(vectors=vectors)

{'upserted_count': 150}
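
A single upsert call is enough for 150 vectors, but larger datasets are usually sent in batches. The sketch below shows one way to split a list into batches. `chunked` is a hypothetical helper, and the batch size of 100 in the commented usage is an arbitrary assumption, not a documented limit.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Toy records standing in for the `vectors` list built above
toy_vectors = [{"id": str(i)} for i in range(7)]
batch_sizes = [len(batch) for batch in chunked(toy_vectors, 3)]
print(batch_sizes)  # [3, 3, 1]

# With a real index, each batch would be sent separately:
# for batch in chunked(vectors, 100):
#     index.upsert(vectors=batch)
```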

## Query Similar Vectors
Demonstrate similarity search by:
1. Converting a question into an embedding
2. Finding the most similar vectors in our index

In [8]:
# Define your query
query = "Are male cats more likely to be left-pawed?"

# Convert the query into a numerical vector that Pinecone can search with
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=[query]
)

# Search the index for the three most similar vectors
results = index.query(
    vector=query_embedding.data[0].embedding,
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)

{'matches': [{'id': '5',
              'metadata': {'text': 'Female cats tend to be right pawed, while '
                                   'male cats are more often left pawed. '
                                   'Interestingly, while 90% of humans are '
                                   'right handed, the remaining 10% of lefties '
                                   'also tend to be male.'},
              'score': 0.839994252,
              'values': []},
             {'id': '97',
              'metadata': {'text': 'Cats have five toes on each front paw, but '
                                   'only four toes on each back paw.'},
              'score': 0.53766048,
              'values': []},
             {'id': '98',
              'metadata': {'text': 'Cats are sometimes born with extra toes. '
                                   'This is called polydactyl. These toes will '
                                   'not harm the cat, but you should keep his '
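
Printing the whole response is verbose, so it can be convenient to pull out only the ID, score, and text of each match. The sketch below works on a plain dictionary shaped like the printed response. The sample values are shortened and rounded for illustration, and both `top_texts` and its `min_score` cutoff are assumptions introduced here.

```python
def top_texts(results, min_score=0.0):
    """Return (id, score, text) tuples for matches scoring at least min_score."""
    return [
        (m["id"], m["score"], m["metadata"]["text"])
        for m in results["matches"]
        if m["score"] >= min_score
    ]

# Illustrative dict shaped like the query response (texts shortened, scores rounded)
sample = {
    "matches": [
        {"id": "5", "score": 0.84,
         "metadata": {"text": "Male cats are more often left pawed."}},
        {"id": "97", "score": 0.54,
         "metadata": {"text": "Cats have five toes on each front paw."}},
    ]
}

for match_id, score, text in top_texts(sample, min_score=0.6):
    print(match_id, score, text)
```

A cutoff like this drops weakly related passages, such as the toe-count facts that scored well below the top match in the query above.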
                           

## Format Results
Pass the retrieved facts to an OpenAI chat model to generate a concise answer grounded in them.

In [10]:

# Optional: Use OpenAI to generate a summarized answer
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Based on the provided facts, give a concise answer to the question. If the facts don't directly answer the question, say so."},
        {"role": "user", "content": f"""
Question: {query}

Relevant facts found:
{[match['metadata']['text'] for match in results['matches']]}

Please provide a brief, clear answer based on these facts."""}
    ]
)
print("User Query:")
print(query)
print("AI-Generated Answer:")
print(response.choices[0].message.content)


User Query:
Are male cats more likely to be left-pawed?
AI-Generated Answer:
Yes, male cats are more likely to be left-pawed.
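
The cell above interpolates a Python list directly into the prompt, so the model sees the list's brackets and quotes. One alternative is to put each fact on its own line. The sketch below assumes a hypothetical helper, `build_prompt`, and reuses the wording of the prompt above. With the query results, `facts` would be `[match['metadata']['text'] for match in results['matches']]`.

```python
def build_prompt(question, facts):
    """Format a question and retrieved facts as a single user message."""
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"Question: {question}\n\n"
        f"Relevant facts found:\n{fact_lines}\n\n"
        "Please provide a brief, clear answer based on these facts."
    )

prompt = build_prompt(
    "Are male cats more likely to be left-pawed?",
    ["Cats have five toes on each front paw, but only four toes on each back paw."],
)
print(prompt)
```

The returned string can be passed as the `content` of the user message in place of the inline f-string.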
