# Pinecone Semantic Search for Promotion Data

This notebook demonstrates how to use Pinecone for semantic search over promotion data.
We'll:
1. Set up the Pinecone connection
2. Prepare promotion data with embeddings
3. Store the data in a Pinecone index
4. Run semantic searches to find promotion duration information

## 2. Import Libraries and Setup

In [1]:
import os
import json
from datetime import datetime, timedelta
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

True

## 3. Configure API Keys

**Important:** Create a `.env` file in your project directory with your API keys:

```
PINECONE_API_KEY=your_pinecone_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
```

Get your Pinecone API key from: https://app.pinecone.io/
Get your OpenAI API key from: https://platform.openai.com/api-keys
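
`os.getenv` returns `None` for a variable that is not set, so a typo in `.env` can surface later as a less obvious client or authentication error. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and is not part of the Pinecone or OpenAI SDKs.

```python
import os


def require_env(*names, environ=None):
    """Return the requested variables, raising if any is missing or empty.

    `require_env` is a hypothetical helper used for illustration only.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Explicit mapping so the check can be tried without a real .env file
demo_env = {"PINECONE_API_KEY": "pc-demo", "OPENAI_API_KEY": ""}
try:
    require_env("PINECONE_API_KEY", "OPENAI_API_KEY", environ=demo_env)
except RuntimeError as err:
    print(err)
```

In the notebook, the returned dictionary could supply the `api_key` arguments in the next cell.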

In [2]:
# Initialize clients
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Configuration
index_name = "promotion-search"
dimension = 1536  # OpenAI text-embedding-ada-002 dimension
metric = "cosine"
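
The index `dimension` has to match the length of the vectors the embedding model returns, otherwise upserts are rejected. The sketch below checks this before the index is created; the table of default model dimensions and the helper `check_dimension` are illustrative assumptions, not part of either SDK.

```python
# Default output sizes for a few OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_dimension(model, index_dimension):
    """Raise if the index dimension does not match the model's output size.

    Hypothetical helper for illustration.
    """
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"Unknown embedding model: {model}")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the index is configured for {index_dimension}"
        )
    return expected


print(check_dimension("text-embedding-ada-002", 1536))
```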

## 4. Define Promotion Records

The records below are the promotion data we'll embed and search.

In [3]:
records = [
    {
        "_id": "rec1",
        "title": "Exp X",
        "type": "Simple Promotion",
        "start_date": "2025-06-18",
        "end_date": "2025-06-30",
        "items": [
            {
                "promotion_id": "PROMO437",
                "component_id": "COMP437",
                "item_id": "ITEM001",
                "discount_type": "% Off",
                "discount_value": "30"
            },
            {
                "promotion_id": "PROMO437",
                "component_id": "COMP437",
                "item_id": "ITEM021",
                "discount_type": "% Off",
                "discount_value": "30"
            },
            {
                "promotion_id": "PROMO437",
                "component_id": "COMP437",
                "item_id": "ITEM041",
                "discount_type": "% Off",
                "discount_value": "30"
            }
        ]
    }
]

print(f"Loaded {len(records)} promotion record(s)")
print(json.dumps(records[0], indent=2))

Loaded 1 promotion record(s)
{
  "_id": "rec1",
  "title": "Exp X",
  "type": "Simple Promotion",
  "start_date": "2025-06-18",
  "end_date": "2025-06-30",
  "items": [
    {
      "promotion_id": "PROMO437",
      "component_id": "COMP437",
      "item_id": "ITEM001",
      "discount_type": "% Off",
      "discount_value": "30"
    },
    {
      "promotion_id": "PROMO437",
      "component_id": "COMP437",
      "item_id": "ITEM021",
      "discount_type": "% Off",
      "discount_value": "30"
    },
    {
      "promotion_id": "PROMO437",
      "component_id": "COMP437",
      "item_id": "ITEM041",
      "discount_type": "% Off",
      "discount_value": "30"
    }
  ]
}


## 5. Helper Functions

In [4]:
def get_embedding(text):
    """Generate embedding using OpenAI's text-embedding-ada-002 model"""
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding

def calculate_duration(start_date, end_date):
    """Calculate duration between two dates"""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    duration = end - start
    return duration.days

def prepare_text_for_embedding(record):
    """Convert promotion record to searchable text"""
    # Calculate duration
    duration_days = calculate_duration(record['start_date'], record['end_date'])
    
    # Create comprehensive text representation
    text_parts = [
        f"Promotion: {record['title']}",
        f"Type: {record['type']}",
        f"Start date: {record['start_date']}",
        f"End date: {record['end_date']}",
        f"Duration: {duration_days} days",
        f"Period: from {record['start_date']} to {record['end_date']}"
    ]
    
    # Add item details
    for item in record['items']:
        text_parts.append(
            f"Item {item['item_id']}: {item['discount_value']}{item['discount_type']} discount"
        )
    
    return " ".join(text_parts)

# Test the function
sample_text = prepare_text_for_embedding(records[0])
print("Sample text for embedding:")
print(sample_text)

Sample text for embedding:
Promotion: Exp X Type: Simple Promotion Start date: 2025-06-18 End date: 2025-06-30 Duration: 12 days Period: from 2025-06-18 to 2025-06-30 Item ITEM001: 30% Off discount Item ITEM021: 30% Off discount Item ITEM041: 30% Off discount
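
Note that `calculate_duration` returns the difference between the two dates, so 2025-06-18 to 2025-06-30 gives 12 days. If the end date is itself a promotion day, the promotion covers 13 calendar days. Which convention is correct depends on the business rule, which the data does not state. A sketch that makes the choice explicit (the function name and the `inclusive` flag are assumptions):

```python
from datetime import date


def duration_days(start_date, end_date, inclusive=False):
    """Days between two YYYY-MM-DD dates, optionally counting the end date."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date is before start_date")
    return (end - start).days + (1 if inclusive else 0)


print(duration_days("2025-06-18", "2025-06-30"))                  # 12
print(duration_days("2025-06-18", "2025-06-30", inclusive=True))  # 13
```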


## 6. Create Pinecone Index

In [5]:
# Check if index exists and create if not
existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]

if index_name not in existing_indexes:
    pc.create_index(
        name=index_name,
        dimension=dimension,
        metric=metric,
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )
    print(f"Created new index: {index_name}")
else:
    print(f"Index {index_name} already exists")

# Connect to the index
index = pc.Index(index_name)
print(f"Connected to index: {index_name}")
print(f"Index stats: {index.describe_index_stats()}")

Created new index: promotion-search
Connected to index: promotion-search
Index stats: {'dimension': 1536,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {},
 'total_vector_count': 0,
 'vector_type': 'dense'}


## 7. Prepare and Upload Data to Pinecone

In [6]:
# Prepare vectors for upsert
vectors_to_upsert = []

for record in records:
    # Prepare text for embedding
    text_content = prepare_text_for_embedding(record)
    
    # Generate embedding
    print(f"Generating embedding for record {record['_id']}...")
    embedding = get_embedding(text_content)
    
    # Calculate duration for metadata
    duration_days = calculate_duration(record['start_date'], record['end_date'])
    
    # Prepare metadata
    metadata = {
        "title": record["title"],
        "type": record["type"],
        "start_date": record["start_date"],
        "end_date": record["end_date"],
        "duration_days": duration_days,
        "text_content": text_content,
        "num_items": len(record["items"]),
        "promotion_ids": sorted({item["promotion_id"] for item in record["items"]})
    }
    
    # Add to vectors list
    vectors_to_upsert.append({
        "id": record["_id"],
        "values": embedding,
        "metadata": metadata
    })

# Upsert vectors to Pinecone
print(f"\nUpserting {len(vectors_to_upsert)} vectors to Pinecone...")
upsert_response = index.upsert(vectors=vectors_to_upsert)
print(f"Upserted vectors: {upsert_response['upserted_count']}")

Generating embedding for record rec1...

Upserting 1 vectors to Pinecone...
Upserted vectors: 1
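
With a single record one `upsert` call is enough, but larger uploads are usually split into batches so that each request stays small. The sketch below shows one way to chunk a list of vectors. The batch size of 100 and the helper name `chunked` are assumptions, and the stand-in data replaces `vectors_to_upsert` so the snippet runs without a Pinecone connection.

```python
from itertools import islice


def chunked(items, batch_size=100):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in data; in the notebook this would be vectors_to_upsert.
fake_vectors = [{"id": f"rec{i}", "values": [0.0], "metadata": {}} for i in range(250)]
for batch in chunked(fake_vectors, batch_size=100):
    # In the notebook: index.upsert(vectors=batch)
    print(len(batch))
```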


## 8. Semantic Search Function

In [7]:
def semantic_search(query, top_k=5):
    """Perform semantic search on the promotion data"""
    
    # Generate embedding for the query
    print(f"Searching for: '{query}'")
    query_embedding = get_embedding(query)
    
    # Search in Pinecone
    search_results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    
    return search_results

def display_search_results(results, query):
    """Display search results in a readable format"""
    print(f"\n🔍 Search Results for: '{query}'")
    print("=" * 50)
    
    if not results['matches']:
        print("No matches found.")
        return
    
    for i, match in enumerate(results['matches'], 1):
        metadata = match['metadata']
        score = match['score']
        
        print(f"\n📋 Result {i} (Similarity: {score:.4f})")
        print(f"   Title: {metadata['title']}")
        print(f"   Type: {metadata['type']}")
        print(f"   Duration: {metadata['duration_days']} days")
        print(f"   Period: {metadata['start_date']} to {metadata['end_date']}")
        print(f"   Items: {metadata['num_items']} items")
        print(f"   Promotion IDs: {', '.join(metadata['promotion_ids'])}")
        
        # Show relevant text snippet
        text_content = metadata.get('text_content', '')
        if len(text_content) > 150:
            text_content = text_content[:150] + "..."
        print(f"   Content: {text_content}")

## 9. Perform Semantic Search

In [8]:
# Main query from your requirements
search_query = "What is the duration of the promotion?"

# Perform the search
results = semantic_search(search_query, top_k=3)

# Display results
display_search_results(results, search_query)

Searching for: 'What is the duration of the promotion?'

🔍 Search Results for: 'What is the duration of the promotion?'
No matches found.
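
The empty result above, returned immediately after the upsert, is consistent with Pinecone being eventually consistent: a freshly upserted vector can take a moment to become visible to queries, and the later cells do find the record. One way to handle this is to poll the index statistics before the first query. The sketch below is a generic poller. `wait_for_count` is a hypothetical helper, and in the notebook `get_count` could be `lambda: index.describe_index_stats()["total_vector_count"]`.

```python
import time


def wait_for_count(get_count, expected, timeout=30.0, interval=1.0, sleep=time.sleep):
    """Poll get_count() until it returns at least `expected` or the timeout elapses.

    Returns True if the count was reached, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


# Stand-in for the index stats call: the fake counter reports 0 twice
# before the vector becomes visible.
fake_counts = iter([0, 0, 1])
print(wait_for_count(lambda: next(fake_counts), expected=1, interval=0.01))
```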


## 10. Additional Search Examples

In [9]:
# Try different types of queries
additional_queries = [
    "How long does the promotion run?",
    "When does the promotion end?",
    "Show me discount information",
    "What items are included in PROMO437?"
]

for query in additional_queries:
    results = semantic_search(query, top_k=1)
    display_search_results(results, query)
    print("\n" + "-"*50 + "\n")

Searching for: 'How long does the promotion run?'

🔍 Search Results for: 'How long does the promotion run?'

📋 Result 1 (Similarity: 0.8593)
   Title: Exp X
   Type: Simple Promotion
   Duration: 12.0 days
   Period: 2025-06-18 to 2025-06-30
   Items: 3.0 items
   Promotion IDs: PROMO437
   Content: Promotion: Exp X Type: Simple Promotion Start date: 2025-06-18 End date: 2025-06-30 Duration: 12 days Period: from 2025-06-18 to 2025-06-30 Item ITEM0...

--------------------------------------------------

Searching for: 'When does the promotion end?'

🔍 Search Results for: 'When does the promotion end?'

📋 Result 1 (Similarity: 0.8477)
   Title: Exp X
   Type: Simple Promotion
   Duration: 12.0 days
   Period: 2025-06-18 to 2025-06-30
   Items: 3.0 items
   Promotion IDs: PROMO437
   Content: Promotion: Exp X Type: Simple Promotion Start date: 2025-06-18 End date: 2025-06-30 Duration: 12 days Period: from 2025-06-18 to 2025-06-30 Item ITEM0...

--------------------------------------------------
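
The results show `12.0 days` and `3.0 items` even though integers were upserted, because Pinecone stores numeric metadata as floating-point values. A small sketch for converting whole-number floats back to integers before display (the helper name `as_int_if_whole` is an assumption):

```python
def as_int_if_whole(value):
    """Return an int for whole-number floats such as 12.0; leave other values unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


# Metadata shaped like the query results shown above.
metadata = {"title": "Exp X", "duration_days": 12.0, "num_items": 3.0}
clean = {key: as_int_if_whole(val) for key, val in metadata.items()}
print(f"Duration: {clean['duration_days']} days, Items: {clean['num_items']} items")
```

Applied inside `display_search_results`, this would print `12 days` and `3 items`.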

## 11. Advanced Analysis and Insights

In [10]:
def analyze_promotion_duration(record):
    """Analyze and provide detailed duration information"""
    start_date = datetime.strptime(record['start_date'], "%Y-%m-%d")
    end_date = datetime.strptime(record['end_date'], "%Y-%m-%d")
    duration = end_date - start_date
    
    print(f"\n📊 Promotion Duration Analysis for '{record['title']}'")
    print(f"   Start: {start_date.strftime('%A, %B %d, %Y')}")
    print(f"   End: {end_date.strftime('%A, %B %d, %Y')}")
    print(f"   Total Duration: {duration.days} days")
    print(f"   Duration in weeks: {duration.days / 7:.1f} weeks")
    
    # Compare calendar dates so the promotion still counts as active on its end date
    today = datetime.now().date()
    start_day, end_day = start_date.date(), end_date.date()
    if start_day <= today <= end_day:
        days_remaining = (end_day - today).days
        print(f"   Status: ACTIVE ({days_remaining} days remaining)")
    elif today < start_day:
        days_until_start = (start_day - today).days
        print(f"   Status: UPCOMING (starts in {days_until_start} days)")
    else:
        days_since_end = (today - end_day).days
        print(f"   Status: EXPIRED ({days_since_end} days ago)")

# Analyze the promotion
analyze_promotion_duration(records[0])


📊 Promotion Duration Analysis for 'Exp X'
   Start: Wednesday, June 18, 2025
   End: Monday, June 30, 2025
   Total Duration: 12 days
   Duration in weeks: 1.7 weeks
   Status: EXPIRED (78 days ago)
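
The status line depends on the day the cell is run, so the output above will change over time. A sketch of a variant that takes the reference date as a parameter, which makes the logic reproducible and easy to test. The function name `promotion_status` is an assumption, and it treats the end date as the last active day.

```python
from datetime import date


def promotion_status(start_date, end_date, today=None):
    """Classify a promotion as UPCOMING, ACTIVE, or EXPIRED on a given day.

    Returns a (status, days) tuple; `days` is the distance to the relevant boundary.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    today = today or date.today()
    if today < start:
        return "UPCOMING", (start - today).days
    if today <= end:
        return "ACTIVE", (end - today).days
    return "EXPIRED", (today - end).days


print(promotion_status("2025-06-18", "2025-06-30", today=date(2025, 6, 30)))
print(promotion_status("2025-06-18", "2025-06-30", today=date(2025, 9, 16)))
```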


## 12. Enhanced Query Processing with Context

In [11]:
def enhanced_search_with_context(query, top_k=3):
    """Enhanced search that provides contextual answers"""
    
    # Perform semantic search
    results = semantic_search(query, top_k)
    
    if not results['matches']:
        return "No relevant information found."
    
    # Get the best match
    best_match = results['matches'][0]
    metadata = best_match['metadata']
    
    # Generate contextual response based on query type
    if any(word in query.lower() for word in ['duration', 'long', 'period', 'time']):
        start_date = metadata['start_date']
        end_date = metadata['end_date']
        duration = metadata['duration_days']
        
        response = f"""The promotion '{metadata['title']}' has a duration of {duration} days.
        
Details:
• Start Date: {start_date}
• End Date: {end_date}
• Total Duration: {duration} days ({duration/7:.1f} weeks)
• Promotion Type: {metadata['type']}
• Number of Items: {metadata['num_items']}
        """
        
        return response
    
    return f"Found information about '{metadata['title']}': {metadata['text_content']}"

# Test the enhanced search
enhanced_answer = enhanced_search_with_context("What is the duration of the promotion?")
print(enhanced_answer)

Searching for: 'What is the duration of the promotion?'
The promotion 'Exp X' has a duration of 12.0 days.

Details:
• Start Date: 2025-06-18
• End Date: 2025-06-30
• Total Duration: 12.0 days (1.7 weeks)
• Promotion Type: Simple Promotion
• Number of Items: 3.0
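
The keyword check in `enhanced_search_with_context` uses substring matching, so a query containing "belong" would also trigger the duration branch because it contains "long". A sketch of a stricter check using word boundaries follows; the helper name `is_duration_query` is an assumption, and the keyword list is the one from the cell above.

```python
import re

# Same keywords as the notebook cell, matched as whole words only.
DURATION_PATTERN = re.compile(r"\b(duration|long|period|time)\b", re.IGNORECASE)


def is_duration_query(query):
    """Return True if the query contains one of the duration keywords as a whole word."""
    return DURATION_PATTERN.search(query) is not None


print(is_duration_query("How long does the promotion run?"))  # True
print(is_duration_query("Which items belong to PROMO437?"))   # False
```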
        


## 13. Cleanup (Optional)

In [None]:
# Uncomment the lines below if you want to delete the index after testing
# print("Deleting index...")
# pc.delete_index(index_name)
# print(f"Deleted index: {index_name}")

## 14. Summary and Next Steps

This notebook demonstrates:

1. **Data Preparation**: Converting promotion records into searchable text with duration calculations
2. **Embeddings**: Using OpenAI's text-embedding-ada-002 to create vector representations
3. **Indexing**: Storing vectors in Pinecone for fast similarity search
4. **Semantic Search**: Finding relevant information based on query meaning, not just keywords
5. **Contextual Responses**: Providing structured answers to duration-related queries

### Key Features:
- **Duration Calculation**: Automatically calculates promotion duration in days and weeks
- **Rich Metadata**: Stores comprehensive information for filtering and analysis
- **Flexible Queries**: Handles various ways of asking about promotion duration
- **Status Tracking**: Shows whether promotions are active, upcoming, or expired

### Next Steps:
1. Add more promotion records to test scalability
2. Implement filtering by date ranges or promotion types
3. Add batch processing for large datasets
4. Integrate with your existing promotion management system
5. Add monitoring and analytics for search patterns
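
As a starting point for the filtering step, the sketch below builds a metadata filter dictionary that could be passed to `index.query(..., filter=...)`. Pinecone's range operators such as `$gte` and `$lte` compare numbers, so filtering by date would need numeric timestamps in the metadata. The fields `start_ts` and `end_ts` are hypothetical and are not stored by this notebook; the helper names are also assumptions.

```python
from datetime import datetime, timezone


def to_epoch(date_str):
    """Convert a YYYY-MM-DD string to a UTC Unix timestamp in seconds."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def build_filter(promo_type=None, active_on=None):
    """Build a Pinecone-style metadata filter dictionary.

    Assumes each vector's metadata carries hypothetical numeric fields
    `start_ts` and `end_ts` alongside the fields used in this notebook.
    """
    clauses = []
    if promo_type is not None:
        clauses.append({"type": {"$eq": promo_type}})
    if active_on is not None:
        ts = to_epoch(active_on)
        clauses.append({"start_ts": {"$lte": ts}})
        clauses.append({"end_ts": {"$gte": ts}})
    if not clauses:
        return None
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


print(build_filter(promo_type="Simple Promotion", active_on="2025-06-20"))
```

To use it, the upload cell would also need to write `to_epoch(record["start_date"])` and `to_epoch(record["end_date"])` into the metadata under those field names.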