# 🌟 Google Reviews Fetcher - FLATTENED with Deduplication

Fetches Google reviews for each place through a RapidAPI endpoint and stores them in BigQuery, one row per review.

## ✨ Features:
- 🗂️ **FLATTENED**: Each review = One row
- 🔑 **Unique review_id**: Prevents duplicates
- 🚫 **Auto-deduplication**: Skips existing reviews
- 🔄 **Pagination**: Fetches all pages
- 📊 **Easy queries**: Standard SQL columns

## 📦 Step 1: Install Packages

In [None]:
!pip install -q google-cloud-bigquery google-auth pandas db-dtypes
print("✅ Packages installed!")

## 🔧 Step 2: Import Libraries

In [None]:
import os
import json
import logging
import hashlib
import http.client
import time
import pandas as pd
from datetime import datetime, timezone
from typing import Optional, Dict, Any, List
from google.oauth2 import service_account
from google.cloud import bigquery
from google.colab import userdata

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

print("✅ Libraries imported!")

## 🔑 Step 3: Configure Credentials

In [None]:
# Secrets are read from Colab's secret store (key icon in the left sidebar).
# Never hardcode API keys or service-account private keys in the notebook;
# any key that was ever pasted here should be revoked and rotated.
try:
    RAPIDAPI_KEY = userdata.get('RAPIDAPI_KEY')
    print("✅ RapidAPI key loaded from secrets")
except Exception as e:
    raise RuntimeError("Add RAPIDAPI_KEY to Colab secrets before running") from e

try:
    BIGQUERY_CREDENTIALS = json.loads(userdata.get('BIGQUERY_KEY_JSON'))
    print("✅ BigQuery credentials loaded from secrets")
except Exception as e:
    raise RuntimeError("Add BIGQUERY_KEY_JSON (service-account JSON) to Colab secrets before running") from e

PROJECT_ID = BIGQUERY_CREDENTIALS.get('project_id', 'shopper-reviews-477306')

DATASET_ID = "place_data"
SOURCE_TABLE = "Map_location"
DESTINATION_TABLE = "place_reviews_full"
API_HOST = "google-search-master-mega.p.rapidapi.com"
MAX_PAGES = None  # None = fetch ALL pages (no limit), or set number for safety
RETRY_ATTEMPTS = 3
RETRY_DELAY = 2

print("\n✅ Configuration loaded!")
print(f"📊 Source: {PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}")
print(f"📊 Destination: {PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}")
print("🔑 Schema: FLATTENED with review_id (no duplicates!)")

## 🛠️ Step 4: Define Functions

In [None]:
def get_bigquery_client() -> Optional[bigquery.Client]:
    try:
        credentials = service_account.Credentials.from_service_account_info(
            BIGQUERY_CREDENTIALS,
            scopes=["https://www.googleapis.com/auth/cloud-platform"],
        )
        client = bigquery.Client(credentials=credentials, project=PROJECT_ID)
        logger.info(f"✅ Connected to BigQuery: {PROJECT_ID}")
        return client
    except Exception as e:
        logger.error(f"❌ BigQuery error: {e}")
        return None

def fetch_reviews_for_place(place_id: str, page: int = 1) -> Optional[Dict[str, Any]]:
    """Fetches one page of reviews for a CID, retrying on errors and non-200 responses."""
    headers = {'x-rapidapi-key': RAPIDAPI_KEY, 'x-rapidapi-host': API_HOST}
    params = f"?cid={place_id}&sortBy=mostRelevant&gl=us&hl=en&page={page}"

    for attempt in range(RETRY_ATTEMPTS):
        # The 30-second timeout keeps a stalled request from hanging the notebook
        conn = http.client.HTTPSConnection(API_HOST, timeout=30)
        try:
            logger.info(f"📡 Fetching page {page} for CID {place_id}...")
            conn.request("GET", "/reviews" + params, headers=headers)
            res = conn.getresponse()
            data = res.read()

            if res.status == 200:
                result = json.loads(data.decode("utf-8"))
                logger.info(f"✅ Page {page} fetched")
                return result
            logger.warning(f"⚠️ API status {res.status}")
        except Exception as e:
            logger.error(f"❌ Error: {e}")
        finally:
            conn.close()  # always release the socket, even on failure

        # Wait before the next attempt, but not after the last one
        if attempt < RETRY_ATTEMPTS - 1:
            time.sleep(RETRY_DELAY)
    return None

def fetch_all_reviews_for_place(place_id: str) -> Dict[str, Any]:
    """Fetches every page of reviews for a CID, up to MAX_PAGES if it is set."""
    all_reviews = []
    page = 1
    pages_fetched = 0

    logger.info(f"🔍 Fetching reviews for CID {place_id}...")

    # MAX_PAGES = None means no page limit; comparing an int with None would raise TypeError
    while MAX_PAGES is None or page <= MAX_PAGES:
        result = fetch_reviews_for_place(place_id, page)
        if not result:
            break

        reviews = result.get('reviews') or []
        all_reviews.extend(reviews)
        pages_fetched += 1
        logger.info(f"✅ Page {page}: {len(reviews)} reviews")

        if not result.get('nextPageToken') or len(reviews) == 0:
            break

        page += 1
        time.sleep(0.5)

    logger.info(f"🎉 Total: {len(all_reviews)} reviews")
    return {'place_id': place_id, 'total_reviews': len(all_reviews), 'reviews': all_reviews, 'pages_fetched': pages_fetched}

def generate_review_id(place_id: str, iso_date: str, reviewer_name: str, snippet: str) -> str:
    """Generates unique review ID using hash."""
    unique_string = f"{place_id}_{iso_date}_{reviewer_name}_{snippet[:100]}"
    review_id = hashlib.sha256(unique_string.encode('utf-8')).hexdigest()[:16]
    return review_id

def flatten_reviews_to_rows(review_data: Dict[str, Any]) -> pd.DataFrame:
    """Flattens reviews with unique review_id for each row."""
    place_id = review_data['place_id']
    reviews = review_data['reviews']
    current_time = datetime.now(timezone.utc)
    current_date = current_time.date()
    
    rows = []
    
    for review in reviews:
        # "or" fallbacks also cover keys that are present but null in the API response
        user = review.get('user') or {}
        iso_date = review.get('isoDate') or ''

        try:
            iso_timestamp = datetime.fromisoformat(iso_date.replace('Z', '+00:00')) if iso_date else None
        except ValueError:
            iso_timestamp = None

        reviewer_name = user.get('name') or ''
        snippet = review.get('snippet') or ''
        
        # Generate unique review_id
        review_id = generate_review_id(place_id, iso_date, reviewer_name, snippet)
        
        row = {
            'review_id': review_id,
            'place_id': place_id,
            'rating': review.get('rating'),
            'date': review.get('date'),
            'isoDate': iso_timestamp,
            'snippet': snippet,
            'likes': review.get('likes'),
            'reviewer_name': reviewer_name,
            'reviewer_link': user.get('link'),
            'reviewer_thumbnail': user.get('thumbnail'),
            'reviewer_reviews': user.get('reviews'),
            'reviewer_photos': user.get('photos'),
            'timestamp': current_time,
            'fetch_date': current_date,
        }
        
        rows.append(row)
    
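    df = pd.DataFrame(rows)
    if not df.empty:
        # The same review can come back on more than one page; keep the first copy
        df = df.drop_duplicates(subset='review_id').reset_index(drop=True)
    logger.info(f"✅ Flattened {len(df)} reviews with unique review_ids")
    return df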

def get_existing_review_ids(client: bigquery.Client) -> set:
    """Gets existing review_ids to prevent duplicates."""
    table_id = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        client.get_table(table_id)
        query = f"SELECT DISTINCT review_id FROM `{table_id}` WHERE review_id IS NOT NULL"
        result = client.query(query).to_dataframe()
        existing_ids = set(result['review_id'].tolist())
        logger.info(f"📊 Found {len(existing_ids)} existing review IDs")
        return existing_ids
    except Exception:
        logger.info("No existing reviews found")
        return set()

def remove_duplicate_reviews(df: pd.DataFrame, client: bigquery.Client) -> pd.DataFrame:
    """Removes reviews that already exist in BigQuery."""
    if df.empty:
        return df
    
    original_count = len(df)
    existing_ids = get_existing_review_ids(client)
    
    if not existing_ids:
        logger.info("✅ No existing reviews, uploading all")
        return df
    
    df_filtered = df[~df['review_id'].isin(existing_ids)].copy()
    duplicates_removed = original_count - len(df_filtered)
    
    if duplicates_removed > 0:
        logger.info(f"🔍 Removed {duplicates_removed} duplicate(s)")
        logger.info(f"📤 {len(df_filtered)} new review(s)")
    else:
        logger.info(f"✅ All {original_count} review(s) are new")
    
    return df_filtered

def create_reviews_table_if_not_exists(client: bigquery.Client) -> bool:
    """Creates the destination table if needed.

    review_id is the deduplication key; BigQuery does not enforce its uniqueness.
    """
    table_id = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        try:
            client.get_table(table_id)
            logger.info(f"✅ Table exists: {DESTINATION_TABLE}")
            return True
        except Exception:
            pass  # most likely "not found": fall through and create the table
        
        schema = [
            bigquery.SchemaField("review_id", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("place_id", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("rating", "INTEGER"),
            bigquery.SchemaField("date", "STRING"),
            bigquery.SchemaField("isoDate", "TIMESTAMP"),
            bigquery.SchemaField("snippet", "STRING"),
            bigquery.SchemaField("likes", "INTEGER"),
            bigquery.SchemaField("reviewer_name", "STRING"),
            bigquery.SchemaField("reviewer_link", "STRING"),
            bigquery.SchemaField("reviewer_thumbnail", "STRING"),
            bigquery.SchemaField("reviewer_reviews", "INTEGER"),
            bigquery.SchemaField("reviewer_photos", "INTEGER"),
            bigquery.SchemaField("timestamp", "TIMESTAMP"),
            bigquery.SchemaField("fetch_date", "DATE"),
        ]
        
        table = bigquery.Table(table_id, schema=schema)
        client.create_table(table)
        
        logger.info(f"✅ Created table: {DESTINATION_TABLE} (with review_id)")
        print("\n🔑 Schema includes review_id to prevent duplicates!")
        return True
        
    except Exception as e:
        logger.error(f"❌ Table creation error: {e}")
        return False

def get_place_ids_to_process(client: bigquery.Client, limit: Optional[int] = None) -> List[str]:
    """Gets CIDs from Map_location that have no rows in the destination table yet."""
    source_table = f"{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}"
    dest_table = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        query = f"SELECT DISTINCT cid as place_id FROM `{source_table}` WHERE cid IS NOT NULL"
        
        try:
            client.get_table(dest_table)
            # Skip places that already have at least one stored review
            query += f" AND cid NOT IN (SELECT DISTINCT place_id FROM `{dest_table}` WHERE place_id IS NOT NULL)"
            logger.info("📊 Reading unprocessed CIDs...")
        except Exception:
            logger.info("📊 Destination table not available, reading all CIDs...")
        
        if limit:
            query += f" LIMIT {limit}"
        
        result = client.query(query).to_dataframe()
        place_ids = result['place_id'].tolist()
        logger.info(f"✅ Found {len(place_ids)} CID(s)")
        return place_ids
    except Exception as e:
        logger.error(f"❌ Error fetching CIDs: {e}")
        return []

def upload_review_data_to_bigquery(client: bigquery.Client, review_data: Dict[str, Any]) -> bool:
    """Uploads reviews with automatic deduplication."""
    table_id = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        df = flatten_reviews_to_rows(review_data)
        if df.empty:
            logger.warning("No reviews")
            return False
        
        df = remove_duplicate_reviews(df, client)
        if df.empty:
            logger.info("⚠️ All reviews already exist, skipping")
            return True
        
        job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
        logger.info(f"Uploading {len(df)} new review(s)...")
        job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
        job.result()
        
        logger.info(f"✅ Uploaded {len(df)} new review(s)")
        return True
    except Exception as e:
        logger.error(f"❌ Upload error: {e}")
        return False

print("✅ All functions defined!")
print("\n🔑 Key Features:")
print("  • Unique review_id for each review")
print("  • Automatic deduplication (no duplicate reviews!)")
print("  • Flattened schema (easy SQL queries)")

## 🔍 Step 5: Check Status

In [None]:
client = get_bigquery_client()

if client:
    print("📊 Current Status:\n")
    
    source_table = f"{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}"
    try:
        table = client.get_table(source_table)
        print(f"✅ Source: {SOURCE_TABLE}")
        print(f"   Total rows: {table.num_rows:,}")
        
        query = f"SELECT COUNT(DISTINCT cid) as count FROM `{source_table}` WHERE cid IS NOT NULL"
        result = client.query(query).to_dataframe()
        print(f"   Places with CID: {result['count'].iloc[0]:,}")
    except Exception as e:
        print(f"❌ Source error: {e}")
    
    print()
    
    dest_table = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    try:
        table = client.get_table(dest_table)
        print(f"✅ Destination: {DESTINATION_TABLE}")
        print("   Schema: FLATTENED with review_id")
        print(f"   Total review rows: {table.num_rows:,}")
        
        query = f"""SELECT 
            COUNT(DISTINCT place_id) as places,
            COUNT(DISTINCT review_id) as unique_reviews,
            COUNT(*) as total_rows,
            AVG(rating) as avg_rating
        FROM `{dest_table}`"""
        result = client.query(query).to_dataframe()
        print(f"   Places: {result['places'].iloc[0]:,}")
        print(f"   Unique reviews: {result['unique_reviews'].iloc[0]:,}")
        print(f"   Total rows: {result['total_rows'].iloc[0]:,}")
        
        # AVG(rating) is NULL while the table is empty
        avg_rating = result['avg_rating'].iloc[0]
        if pd.notna(avg_rating):
            print(f"   Avg rating: {avg_rating:.2f} ⭐")
        
        duplicates = result['total_rows'].iloc[0] - result['unique_reviews'].iloc[0]
        if duplicates > 0:
            print(f"   ⚠️ Duplicates: {duplicates}")
        else:
            print("   ✅ No duplicates!")
    except Exception as e:
        # Usually the table does not exist yet; the error is printed so other causes stay visible
        print(f"⚠️ Destination not available (it will be created if missing): {e}")
        print("   Will use: FLATTENED schema with review_id")
else:
    print("❌ Failed to connect")

## 🚀 Step 6: Test Single Place

In [None]:
test_place_id = "7632417579134624850"

print(f"🧪 Testing CID: {test_place_id}\n")

review_data = fetch_all_reviews_for_place(test_place_id)

print("\n📊 API Results:")
print(f"Place ID: {review_data['place_id']}")
print(f"Total Reviews: {review_data['total_reviews']}")
print(f"Pages Fetched: {review_data['pages_fetched']}")

df_flattened = flatten_reviews_to_rows(review_data)

print("\n🗂️ Flattened Data:")
print(f"Rows: {len(df_flattened)} (one per review)")

# An empty result has no columns, so only index into the frame when rows exist
if df_flattened.empty:
    print("⚠️ No reviews returned for this CID")
else:
    print(f"Unique review_ids: {df_flattened['review_id'].nunique()}")
    print("\n📋 Preview:")
    display(df_flattened[['review_id', 'rating', 'reviewer_name', 'snippet']].head())

## 📤 Step 7: Upload Test Data

In [None]:
client = get_bigquery_client()

if client and 'review_data' in locals():
    print("📤 Uploading test data...\n")
    
    if create_reviews_table_if_not_exists(client):
        if upload_review_data_to_bigquery(client, review_data):
            print("\n✅ Test data uploaded!")
            print("🔑 Duplicates will be automatically prevented by review_id")
        else:
            print("\n❌ Upload failed")
else:
    print("❌ No client or data")

## 🔄 Step 8: Batch Process (All Places)

In [None]:
client = get_bigquery_client()

if not client:
    print("❌ No client")
else:
    print("🚀 Starting batch processing...\n")
    
    if not create_reviews_table_if_not_exists(client):
        print("❌ Table creation failed")
    else:
        # limit=5 caps a trial run; pass limit=None to process every remaining place
        place_ids = get_place_ids_to_process(client, limit=5)
        
        if not place_ids:
            print("✅ No new places!")
        else:
            print(f"📊 Processing {len(place_ids)} place(s)...\n")
            
            successful = 0
            failed = 0
            skipped = 0
            total_new_reviews = 0
            total_duplicates = 0
            
            for idx, place_id in enumerate(place_ids, 1):
                print(f"\n{'='*60}")
                print(f"📍 Place {idx}/{len(place_ids)}: {place_id}")
                print(f"{'='*60}")
                
                try:
                    review_data = fetch_all_reviews_for_place(place_id)
                    
                    if review_data['total_reviews'] == 0:
                        print("⚠️ No reviews, skipping")
                        skipped += 1
                        continue
                    
                    # Track duplicates (the upload step repeats this check before writing)
                    df = flatten_reviews_to_rows(review_data)
                    df_new = remove_duplicate_reviews(df, client)
                    duplicates_found = len(df) - len(df_new)
                    total_duplicates += duplicates_found
                    
                    if upload_review_data_to_bigquery(client, review_data):
                        successful += 1
                        total_new_reviews += len(df_new)
                        print(f"✅ Success: {len(df_new)} new, {duplicates_found} duplicates skipped")
                    else:
                        failed += 1
                        
                except KeyboardInterrupt:
                    print("\n⚠️ Interrupted!")
                    break
                    
                except Exception as e:
                    failed += 1
                    print(f"❌ Error: {e}")
                
                if idx < len(place_ids):
                    time.sleep(1)
            
            print(f"\n{'='*60}")
            print("📊 SUMMARY")
            print(f"{'='*60}")
            print(f"✅ Successful: {successful}")
            print(f"❌ Failed: {failed}")
            print(f"⏭️ Skipped: {skipped}")
            print(f"📊 New Reviews Added: {total_new_reviews:,}")
            print(f"🔍 Duplicates Prevented: {total_duplicates:,}")
            print(f"{'='*60}")

## 📊 Step 9: Query & Analyze

In [None]:
client = get_bigquery_client()

if client:
    table_name = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        print("📊 Review Statistics\n")
        
        stats_query = f"""SELECT 
            COUNT(DISTINCT place_id) as places,
            COUNT(DISTINCT review_id) as unique_reviews,
            COUNT(*) as total_rows,
            AVG(rating) as avg_rating,
            MAX(timestamp) as last_fetch
        FROM `{table_name}`"""
        
        stats = client.query(stats_query).to_dataframe()
        display(stats)
        
        duplicates = stats['total_rows'].iloc[0] - stats['unique_reviews'].iloc[0]
        if duplicates > 0:
            print(f"\n⚠️ WARNING: {duplicates} duplicate rows detected!")
            print("💡 Run deduplication query to clean up.")
        else:
            print("\n✅ No duplicates - review_id working perfectly!")
        
        print("\n📈 Sample Reviews:")
        sample_query = f"""SELECT 
            review_id, place_id, rating, date, reviewer_name,
            LEFT(snippet, 80) as snippet_preview
        FROM `{table_name}`
        ORDER BY timestamp DESC LIMIT 10"""
        
        samples = client.query(sample_query).to_dataframe()
        display(samples)
        
    except Exception as e:
        print(f"❌ Error: {e}")

---

## 📚 Documentation

### 🔑 Key Feature: review_id

**What is it?**  
A 16-character hexadecimal ID, taken from the start of a SHA-256 hash of:
- `place_id`
- `isoDate`
- `reviewer_name`
- `snippet` (first 100 chars)

**Why?**  
- The same review always gets the same ID
- Duplicates are filtered out automatically before upload
- The script can be re-run safely
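
The hashing scheme can be tried on its own. The sketch below mirrors `generate_review_id` from Step 4; the sample values are made up for illustration and are not real review data.

```python
import hashlib


def generate_review_id(place_id: str, iso_date: str, reviewer_name: str, snippet: str) -> str:
    """Return the first 16 hex characters of a SHA-256 over the identifying fields."""
    unique_string = f"{place_id}_{iso_date}_{reviewer_name}_{snippet[:100]}"
    return hashlib.sha256(unique_string.encode("utf-8")).hexdigest()[:16]


# Hypothetical sample values
first = generate_review_id("123", "2025-01-01T00:00:00Z", "Alice", "Great coffee")
second = generate_review_id("123", "2025-01-01T00:00:00Z", "Alice", "Great coffee")
edited = generate_review_id("123", "2025-01-01T00:00:00Z", "Alice", "Great coffee!")

print(first == second)  # identical inputs give the identical ID
print(first == edited)  # a change in any hashed field gives a different ID
print(len(first))       # always 16 characters
```

One consequence worth keeping in mind: if a reviewer changes their display name or edits the first 100 characters of the text, the review gets a new ID and is stored as a new row.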

### 📊 Schema:

| Column | Type | Description |
|--------|------|-------------|
| **review_id** | STRING | **Unique review identifier** |
| place_id | STRING | Place CID |
| rating | INTEGER | 1-5 stars |
| date | STRING | Relative date |
| isoDate | TIMESTAMP | ISO timestamp |
| snippet | STRING | Review text |
| likes | INTEGER | Like count |
| reviewer_name | STRING | Reviewer name |
| reviewer_link | STRING | Profile link |
| reviewer_thumbnail | STRING | Image URL |
| reviewer_reviews | INTEGER | Total reviews |
| reviewer_photos | INTEGER | Total photos |
| timestamp | TIMESTAMP | Insert time |
| fetch_date | DATE | Fetch date |

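### 🔍 Deduplication:

The script automatically:
1. Generates `review_id` for each review
2. Queries existing `review_id`s from BigQuery
3. Filters out duplicates before upload
4. Reports: `X new, Y duplicates skipped`

**Result**: Re-running the notebook does not re-insert reviews that are already stored. 🎉 BigQuery does not enforce uniqueness on `review_id`, so two runs executing at the same time can still write duplicates; the Step 9 check reports any that slip through.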
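
The filtering in steps 2 to 4 can be sketched without BigQuery or pandas. This is a simplified illustration, not the notebook's code: `filter_new_reviews` is a hypothetical helper, the IDs are invented, and the notebook itself does the filtering with `DataFrame.isin` inside `remove_duplicate_reviews`.

```python
from typing import Dict, List, Set


def filter_new_reviews(rows: List[Dict[str, str]], existing_ids: Set[str]) -> List[Dict[str, str]]:
    """Keep rows whose review_id is neither stored already nor repeated in this batch."""
    seen = set(existing_ids)  # copy so the caller's set is not modified
    new_rows = []
    for row in rows:
        if row["review_id"] in seen:
            continue
        seen.add(row["review_id"])
        new_rows.append(row)
    return new_rows


# Invented IDs: one already stored, one new, one repeated within the batch
existing = {"aaa111"}
batch = [{"review_id": "aaa111"}, {"review_id": "bbb222"}, {"review_id": "bbb222"}]

new_rows = filter_new_reviews(batch, existing)
print(f"{len(new_rows)} new, {len(batch) - len(new_rows)} duplicates skipped")
# prints "1 new, 2 duplicates skipped"
```

Tracking IDs seen within the batch as well as those already stored covers the case where the API returns the same review on two pages.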

### Example Query:

```sql
-- Count reviews per place
SELECT 
    place_id,
    COUNT(DISTINCT review_id) as review_count,
    AVG(rating) as avg_rating
FROM `shopper-reviews-477306.place_data.place_reviews_full`
GROUP BY place_id
ORDER BY review_count DESC
```
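
The Step 9 check suggests running a deduplication query when duplicate rows are detected, but the notebook does not include one. The following is one possible sketch, assuming it is acceptable to rewrite the table in place and keep the most recently fetched row for each `review_id`. `build_dedup_query` is a hypothetical helper, and the statement should be reviewed before it is run, since it replaces the table contents.

```python
def build_dedup_query(table_id: str) -> str:
    """Build a BigQuery statement that keeps one row per review_id (newest timestamp)."""
    return f"""
CREATE OR REPLACE TABLE `{table_id}` AS
SELECT *
FROM `{table_id}`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (PARTITION BY review_id ORDER BY timestamp DESC) = 1
""".strip()


query = build_dedup_query("shopper-reviews-477306.place_data.place_reviews_full")
print(query)

# To execute it from the notebook (assumes `client` from get_bigquery_client()):
# client.query(query).result()
```

`ROW_NUMBER()` numbers the rows within each `review_id` group from newest to oldest, and `QUALIFY ... = 1` keeps only the newest one.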

---

**Version**: 3.0 - With review_id & Deduplication  
**Last Updated**: 2025-11-05