# Walmart Product Strategy: Finding Amazon Opportunities

## The Problem
Walmart needs to identify profitable product opportunities by analyzing Amazon's marketplace data. We need to find products that have:
- **High quality** (good ratings)
- **Proven demand** (sufficient reviews)
- **Manageable competition** (not oversaturated)
- **Profitable margins** (reasonable pricing)

## The Data
The Amazon Products dataset, whose 52 fields cover ratings, prices, review counts, categories, and availability.

## The Strategy
Build filters from the dataset's field aliases (`AF` for `AMAZON_FIELDS`) to identify three key opportunity types:

### 1. Long-tail Products (Sweet Spot)
```python
# Products with good ratings but moderate competition
from util import AMAZON_FIELDS as AF

long_tail = (
    (AF.rating >= 4.0) &
    (AF.reviews_count >= 50) &
    (AF.reviews_count <= 500) &
    (AF.final_price <= 100) &
    AF.is_available.is_true()
)
```

### 2. High-volume Winners with Recent Sales
```python
# Proven products with strong demand and recent sales data
high_volume = (
    (AF.rating >= 4.0) &
    (AF.reviews_count > 1000) &
    (AF.bought_past_month > 100) &  # Recent sales activity
    (AF.final_price <= 200) &
    AF.is_available.is_true()
)
```

### 3. Private Label Candidates
```python
# Simple products suitable for Walmart brands
private_label = (
    (AF.rating >= 4.0) &
    (AF.final_price <= 30) &
    AF.categories.array_includes(["Grocery", "Household", "Health"]) &
    AF.is_available.is_true()
)
```
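
To make the thresholds concrete, the long-tail criteria can be checked locally against a handful of records. This is only an illustrative sketch: the sample products are invented, and the plain-dict predicate `is_long_tail` is a hypothetical stand-in that mirrors the `long_tail` filter rather than anything provided by `util`.

```python
# Hypothetical sample records; field names mirror those used in the filters above.
products = [
    {"title": "A", "rating": 4.5, "reviews_count": 120, "final_price": 25.0, "is_available": True},
    {"title": "B", "rating": 4.7, "reviews_count": 5000, "final_price": 40.0, "is_available": True},
    {"title": "C", "rating": 3.8, "reviews_count": 200, "final_price": 15.0, "is_available": True},
    {"title": "D", "rating": 4.2, "reviews_count": 300, "final_price": 80.0, "is_available": False},
]


def is_long_tail(product):
    """Return True if a record meets the long-tail thresholds (assumed local equivalent)."""
    return (
        product["rating"] >= 4.0                      # good ratings
        and 50 <= product["reviews_count"] <= 500     # proven demand, moderate competition
        and product["final_price"] <= 100             # reasonable price point
        and product["is_available"]                   # currently in stock
    )


long_tail_titles = [p["title"] for p in products if is_long_tail(p)]
print(long_tail_titles)
```

Only product A passes: B has too many reviews, C is rated below 4.0, and D is unavailable.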


In [1]:
# Add the parent directory to the Python path so the util module can be imported
import sys
sys.path.append('..')

from util import (
    BrightDataFilter,
    AMAZON_FIELDS as AF,
    AMAZON_WALMART_FIELDS as AW,
    get_brightdata_api_key
)

print("✅ Successfully imported all modules!")
print(f"Amazon dataset fields: {len(AF.get_field_names())}")
print(f"Amazon-Walmart dataset fields: {len(AW.get_field_names())}")


✅ Successfully imported all modules!
Amazon dataset fields: 52
Amazon-Walmart dataset fields: 63


In [2]:
# Initialize the BrightData filter for Amazon Products dataset
api_key = get_brightdata_api_key()
amazon_filter = BrightDataFilter(api_key, "gd_l7q7dkf244hwjntr0")

print("=== Amazon Products Dataset Info ===")
info = amazon_filter.get_dataset_info()
print(f"Dataset: {info['name']}")
print(f"Available fields: {len(info['available_fields'])}")
print(f"Dataset ID: {info['dataset_id']}")

# Show some available fields
print(f"\nSample fields: {info['available_fields'][:10]}")

# No filter is built or run in this cell; it only confirms the client is ready
print(f"\n=== Testing High Volume Filter ===")
print("Filter created successfully!")
print("Ready to execute search with amazon_filter.search_data()")


=== Amazon Products Dataset Info ===
Dataset: Amazon Products
Available fields: 52
Dataset ID: gd_l7q7dkf244hwjntr0

Sample fields: ['title', 'asin', 'parent_asin', 'brand', 'description', 'categories', 'initial_price', 'final_price', 'final_price_high', 'currency']

=== Testing High Volume Filter ===
Filter created successfully!
Ready to execute search with amazon_filter.search_data()


In [3]:
# High Volume Products with slow shipping/low inventory
# Target: products bought >1000 times in the past month that are unavailable or limited

# Note: despite the variable name, this filter checks availability, not delivery terms
high_volume_no_free_shipping = (
    (AF.rating >= 4.0) &                    # Good quality products
    (AF.reviews_count > 100) &              # Enough reviews to trust the rating
    (AF.bought_past_month > 1000) &         # High recent sales volume
    (
        AF.is_available.is_false() |
        (AF.availability.includes(['only', 'within', 'limited']))
    ) &      # Unavailable, or availability text contains 'only', 'within', or 'limited'
    (AF.currency == "USD")                  # USD currency
)

# Display the filter using improved class methods
print("=== High Volume Products with slow shipping/low inventory ===")
print("Filter structure:")
print(high_volume_no_free_shipping)

print(f"\nNumber of conditions: {len(high_volume_no_free_shipping.filters)}")

# submit filter to server
# response = amazon_filter.search_data(high_volume_no_free_shipping, records_limit=1000)
# print(f"Snapshot ID: {response['snapshot_id']}")

=== High Volume Products with slow shipping/low inventory ===
Filter structure:
(
  rating >= 4.0
  AND
  reviews_count > 100
  AND
  bought_past_month > 1000
  AND
  (
    is_available = False
    OR
    availability includes ['only', 'within', 'limited']
  )
  AND
  currency = USD
)

Number of conditions: 5
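
The count of 5 reflects how the expression nests: four simple comparisons plus one OR group, all joined by a single top-level AND. The implementation inside `util` is not shown here, so the following is only a sketch of how such composition could work with operator overloading. The classes `Field`, `Condition`, and `Group` are hypothetical and are not the `util` API.

```python
from dataclasses import dataclass
from typing import Any, List


def combine(operator, left, right):
    """Extend an existing group with the same operator, otherwise start a new one."""
    if isinstance(left, Group) and left.operator == operator:
        return Group(operator, left.filters + [right])
    return Group(operator, [left, right])


class Node:
    # `&` and `|` build AND / OR groups instead of doing bitwise arithmetic
    def __and__(self, other):
        return combine("AND", self, other)

    def __or__(self, other):
        return combine("OR", self, other)


@dataclass(eq=False)
class Condition(Node):
    name: str
    op: str
    value: Any


@dataclass(eq=False)
class Group(Node):
    operator: str
    filters: List[Node]


class Field:
    """Hypothetical field alias whose comparisons return Condition objects."""

    def __init__(self, name):
        self.name = name

    def __ge__(self, value):
        return Condition(self.name, ">=", value)

    def __gt__(self, value):
        return Condition(self.name, ">", value)

    def __eq__(self, value):
        return Condition(self.name, "=", value)

    def is_false(self):
        return Condition(self.name, "=", False)

    def includes(self, values):
        return Condition(self.name, "includes", values)


expr = (
    (Field("rating") >= 4.0) &
    (Field("reviews_count") > 100) &
    (Field("bought_past_month") > 1000) &
    (Field("is_available").is_false() | Field("availability").includes(["only", "within", "limited"])) &
    (Field("currency") == "USD")
)

print(expr.operator, len(expr.filters))
print(expr.filters[3].operator, len(expr.filters[3].filters))
```

In this sketch the parentheses around each comparison matter, because Python's `&` and `|` bind more tightly than `>=`, `>`, and `==`.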


In [6]:
# Check Snapshot Metadata After Submission
# This demonstrates how to monitor the status of your data processing job

try:
    snapshot_id = 'snap_mflfvzr7198l6kf8ea' #response['snapshot_id']
    print(f"✅ Filter submitted! Snapshot ID: {snapshot_id}")
    
    # Check metadata immediately
    print("\n=== Checking Snapshot Metadata ===")
    metadata = amazon_filter.get_snapshot_metadata(snapshot_id)
    
    print(f"📊 Snapshot Details:")
    print(f"   Status: {metadata.get('status', 'N/A')}")
    print(f"   Created: {metadata.get('created', 'N/A')}")
    print(f"   Dataset ID: {metadata.get('dataset_id', 'N/A')}")
    print(f"   Records: {metadata.get('dataset_size', 'N/A')}")
    print(f"   File Size: {metadata.get('file_size', 'N/A')} bytes")
    print(f"   Cost: {metadata.get('cost', 'N/A')}")
    
    if metadata.get('error'):
        print(f"   Error: {metadata.get('error', 'N/A')}")
    if metadata.get('warning'):
        print(f"   Warning: {metadata.get('warning', 'N/A')}")
    
    # Show the complete metadata
    print(f"\n📋 Complete Metadata:")
    import json
    print(json.dumps(metadata, indent=2))
    
except Exception as e:
    print(f"❌ Error: {e}")


✅ Filter submitted! Snapshot ID: snap_mflfvzr7198l6kf8ea

=== Checking Snapshot Metadata ===
📊 Snapshot Details:
   Status: scheduled
   Created: 2025-09-15T18:10:21.763Z
   Dataset ID: gd_l7q7dkf244hwjntr0
   Records: N/A
   File Size: N/A bytes
   Cost: 0

📋 Complete Metadata:
{
  "id": "snap_mflfvzr7198l6kf8ea",
  "created": "2025-09-15T18:10:21.763Z",
  "status": "scheduled",
  "dataset_id": "gd_l7q7dkf244hwjntr0",
  "customer_id": "hl_a6e6d183",
  "cost": 0,
  "initiation_type": "filter_api_snapshot"
}


In [None]:
# Complete Workflow: Submit Filter and Monitor Progress
# This demonstrates the full process from submission to completion tracking

# Create a comprehensive filter for Walmart opportunities
walmart_opportunity_filter = (
    (AF.rating >= 4.0) &                    # Good quality products
    (AF.reviews_count >= 100) &             # Proven demand
    (AF.reviews_count <= 2000) &            # Not oversaturated
    (AF.bought_past_month > 50) &           # Recent sales activity
    (
        AF.is_available.is_false() |  
        (AF.availability.includes(['only', 'within', 'limited']))
    ) &      # Limited availability (opportunity for Walmart)
    (AF.currency == "USD")                  # USD currency
)

print("=== Walmart Opportunity Filter ===")
print("Filter structure:")
print(walmart_opportunity_filter)

# Submit filter to server with local record tracking
print("\n=== Submitting to BrightData API ===")
try:
    response = amazon_filter.search_data(walmart_opportunity_filter, records_limit=1000)
    snapshot_id = response['snapshot_id']
    print(f"✅ Filter submitted successfully!")
    print(f"🆔 Snapshot ID: {snapshot_id}")
    print(f"📝 Local record: {response['local_record_path']}")
    print(f"⏰ Submitted at: {response['submission_time']}")
    
    # Check initial metadata
    print("\n=== Checking Initial Status ===")
    metadata = amazon_filter.get_snapshot_metadata(snapshot_id)
    print(f"📊 Status: {metadata.get('status', 'unknown')}")
    print(f"💰 Cost: ${metadata.get('cost', 'N/A')}")
    print(f"📅 Created: {metadata.get('created', 'N/A')}")
    
    # Update local record with metadata
    amazon_filter.update_snapshot_record(snapshot_id, metadata=metadata)
    print("📝 Local record updated with metadata")
    
    # List all snapshot records
    print("\n=== All Snapshot Records ===")
    all_records = amazon_filter.list_snapshot_records()
    for record in all_records:
        print(f"  - {record['snapshot_id']}: {record['status']} ({record['submission_time']})")
    
    print(f"\n💡 To monitor progress, run:")
    print(f"   amazon_filter.wait_for_snapshot_completion('{snapshot_id}')")
    print(f"   # Or check status manually:")
    print(f"   amazon_filter.get_snapshot_metadata('{snapshot_id}')")
    
except Exception as e:
    print(f"❌ API Error: {e}")
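
The monitoring step suggested above amounts to a polling loop with a timeout. The sketch below shows that general pattern only; it is not the code behind `wait_for_snapshot_completion`. The function `wait_until_done` and its `fetch_status`, `sleep`, and `clock` parameters are hypothetical, and the terminal statuses `ready` and `failed` are assumed from the checks used elsewhere in this notebook.

```python
import time


def wait_until_done(fetch_status, max_wait_time=300, check_interval=10,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal status or time runs out."""
    deadline = clock() + max_wait_time
    while True:
        status = fetch_status()
        if status in ("ready", "failed"):      # assumed terminal states
            return status
        if clock() + check_interval > deadline:
            raise TimeoutError(f"still '{status}' after {max_wait_time}s")
        sleep(check_interval)


# Simulated status sequence so the example runs without calling any API
statuses = iter(["scheduled", "building", "building", "ready"])
result = wait_until_done(lambda: next(statuses), max_wait_time=5,
                         check_interval=0, sleep=lambda seconds: None)
print(result)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays. In actual use `fetch_status` would wrap a metadata lookup and return its `status` field.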


In [9]:
# Wait for Snapshot Completion (Optional)
# This demonstrates how to wait for a snapshot to finish processing

# Note: This will wait up to 5 minutes for the snapshot to complete
# It reuses the snapshot_id set in an earlier cell


# Wait for snapshot to complete (this may take a few minutes)
print("=== Waiting for Snapshot Completion ===")
print("⏳ This may take a few minutes...")

try:
    final_metadata = amazon_filter.wait_for_snapshot_completion(
        snapshot_id, 
        max_wait_time=300,  # 5 minutes max
        check_interval=10   # Check every 10 seconds
    )
    
    print(f"\n🎉 Final Status: {final_metadata.get('status', 'N/A')}")
    
    if final_metadata.get('status') == 'ready':
        print(f"✅ Snapshot ready for download!")
        print(f"   Records: {final_metadata.get('dataset_size', 'N/A')}")
        print(f"   File Size: {final_metadata.get('file_size', 'N/A')} bytes")
        print(f"   Cost: {final_metadata.get('cost', 'N/A')}")
    elif final_metadata.get('status') == 'failed':
        print(f"❌ Snapshot failed: {final_metadata.get('error', 'Unknown error')}")
        
except Exception as e:
    print(f"❌ Error waiting for completion: {e}")


print("💡 Tip: Use amazon_filter.wait_for_snapshot_completion(snapshot_id) to wait for processing to complete")
print("📋 Use amazon_filter.get_snapshot_metadata(snapshot_id) to check status anytime")


=== Waiting for Snapshot Completion ===
⏳ This may take a few minutes...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot building, waiting 10s...
Snapshot snap_mflfvzr7198l6kf8ea status: building
⏳ Snapshot buildi