# 🌟 Google Reviews Collector - BigQuery Integration

This notebook fetches Google reviews for all locations from the Map_Location table and stores them in BigQuery.

## Features:
- 🔍 Fetches CIDs from Map_Location BigQuery table
- 💬 Scrapes reviews for each location using RapidAPI
- 💾 Stores all reviews in a BigQuery Reviews table
- 🚀 Batch processing with progress tracking
- ✨ Per-location error handling with a failure summary (uploads append rows, so re-running the notebook can store the same review twice)

## ⚡ Quick Start Guide

**Important: Run cells in order from top to bottom!**

1. **Step 1:** Install packages
2. **Step 2:** Import libraries  
3. **Step 3:** Configure API credentials
4. **Step 4:** Initialize BigQuery client
5. **Step 5:** Fetch CIDs from Map_Location table

If Step 5 reports that no locations were found, run Step 5a to diagnose the issue.

---

## 📦 Step 1: Install Required Packages

In [None]:
!pip install -q requests pandas google-cloud-bigquery google-auth db-dtypes
print("✅ All packages installed successfully!")

## 🔧 Step 2: Import Libraries

In [None]:
import os
import json
import logging
import http.client
import time
import pandas as pd
from datetime import datetime, timezone
from typing import Optional, Dict, Any, List
from google.oauth2 import service_account
from google.cloud import bigquery
from google.colab import userdata

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

print("✅ Libraries imported successfully!")

## 🔍 Step 5a: Diagnose BigQuery Data (Run this if no data found)

⚠️ **Important:** Make sure you've run Steps 1-4 before running this diagnostic. Steps 3 and 4 appear further down in this notebook.

If Step 5 finds no locations, run this cell to diagnose the issue:

In [None]:
def diagnose_bigquery_data():
    """
    Comprehensive diagnosis of BigQuery data availability.
    """
    print("🔍 BIGQUERY DATA DIAGNOSIS")
    print("=" * 80)
    
    # Check if variables are defined
    try:
        _ = PROJECT_ID
        _ = DATASET_ID
        _ = bq_client
    except NameError as e:
        print("❌ Configuration variables not found!")
        print("\n⚠️ You need to run the previous cells first:")
        print("   1. Step 1: Install packages")
        print("   2. Step 2: Import libraries")
        print("   3. Step 3: Configure credentials")
        print("   4. Step 4: Initialize BigQuery client")
        print("\n   Then come back and run this cell.")
        return
    
    try:
        # Check if dataset exists
        dataset_id = f"{PROJECT_ID}.{DATASET_ID}"
        try:
            dataset = bq_client.get_dataset(dataset_id)
            print(f"✅ Dataset exists: {dataset_id}")
            print(f"   Created: {dataset.created}")
            print(f"   Location: {dataset.location}")
        except Exception as e:
            print(f"❌ Dataset not found: {dataset_id}")
            print(f"   Error: {e}")
            return
        
        # List all tables in dataset
        print(f"\n📊 Tables in {DATASET_ID}:")
        tables = bq_client.list_tables(dataset_id)
        table_list = list(tables)
        
        if not table_list:
            print("   ⚠️ No tables found in dataset")
            return
        
        for table in table_list:
            full_table = bq_client.get_table(f"{dataset_id}.{table.table_id}")
            print(f"   - {table.table_id}: {full_table.num_rows:,} rows")
        
        # Check Map_Location table specifically
        print(f"\n🗺️ Checking {SOURCE_TABLE} table:")
        table_id = f"{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}"
        
        try:
            table = bq_client.get_table(table_id)
            print("   ✅ Table exists")
            print(f"   📊 Total rows: {table.num_rows:,}")
            
            if table.num_rows == 0:
                print("\n   ⚠️ TABLE IS EMPTY!")
                print("\n   💡 To populate the table:")
                print("      1. Go to your Map_Location_Final.ipynb notebook")
                print("      2. Run the data collection cells")
                print("      3. Upload locations to BigQuery")
                print("      4. Come back here and run this notebook again")
            else:
                # Show sample data
                sample_query = f"SELECT * FROM `{table_id}` LIMIT 3"
                sample_df = bq_client.query(sample_query).to_dataframe()
                print("\n   📋 Sample data:")
                print(sample_df.to_string())
                
                # Check for CID column
                print("\n   🔑 Checking for CID-like columns:")
                cid_cols = [col for col in sample_df.columns if 'id' in col.lower() or 'cid' in col.lower()]
                if cid_cols:
                    for col in cid_cols:
                        non_null = sample_df[col].notna().sum()
                        print(f"      - {col}: {non_null}/{len(sample_df)} non-null values")
                else:
                    print("      ⚠️ No CID-like columns found!")
                
        except Exception as e:
            print(f"   ❌ Table not found: {SOURCE_TABLE}")
            print(f"   Error: {e}")
            print("\n   💡 The table needs to be created first.")
            print("      Run the Map_Location_Final.ipynb notebook to create it.")
    
    except Exception as e:
        print(f"❌ Error during diagnosis: {e}")
        import traceback
        print(traceback.format_exc())

# Run diagnosis
diagnose_bigquery_data()

## 🧪 Step 5b: Test with a Single CID (Optional)

If you want to test the review scraping before processing all locations, run this cell with a test CID:

In [None]:
# TEST WITH A SINGLE CID
# Change this to your test CID or leave the default
TEST_CID = "7632417579134624850"  # Example CID
TEST_LOCATION_NAME = "Test Location"

def test_single_cid(cid: str, location_name: str = "Test Location"):
    """
    Test review fetching for a single CID.
    """
    print("🧪 TESTING REVIEW SCRAPING")
    print("=" * 80)
    print(f"CID: {cid}")
    print(f"Location: {location_name}")
    print("\n🔄 Fetching reviews...\n")
    
    try:
        # Fetch reviews
        reviews = fetch_reviews_for_cid(cid, max_reviews=10)
        
        if reviews:
            print(f"\n✅ Successfully fetched {len(reviews)} reviews!")
            
            # Convert to DataFrame
            reviews_df = reviews_to_dataframe(reviews, cid, location_name)
            
            print("\n📊 Sample reviews:")
            print("=" * 80)
            for idx, row in reviews_df.head(3).iterrows():
                print(f"\n⭐ Rating: {row['rating']}/5")
                print(f"👤 User: {row['user_name']}")
                print(f"📅 Date: {row['date']}")
                # A review without text has snippet=None; fall back to an empty string before slicing
                print(f"💬 Review: {(row['snippet'] or '')[:150]}...")
                print("-" * 80)
            
            # Ask if user wants to upload this test data
            print("\n💾 Test data is ready.")
            print("   To upload to BigQuery, uncomment the line below and run again:")
            print("   # upload_reviews_to_bigquery(reviews_df)")
            
            return reviews_df
        else:
            print("❌ No reviews found or error occurred.")
            return None
            
    except Exception as e:
        print(f"❌ Error during test: {e}")
        import traceback
        print(traceback.format_exc())
        return None

# Run the test (uncomment to use)
# test_df = test_single_cid(TEST_CID, TEST_LOCATION_NAME)

print("ℹ️ To test review scraping:")
print("   1. Set TEST_CID to a valid Google Maps CID")
print("   2. Uncomment the last line: test_df = test_single_cid(...)")
print("   3. Run this cell")

## 🔑 Step 3: Configure API Credentials

### Option A: Using Colab Secrets (Recommended)
1. Click on the 🔑 key icon in the left sidebar
2. Add a secret named `RAPIDAPI_KEY` with your API key
3. Add a secret named `BIGQUERY_KEY_JSON` with your service account JSON

### Option B: Manual Configuration
Uncomment and fill in the credentials below

In [None]:
# RapidAPI Configuration
RAPIDAPI_HOST = "google-search-master-mega.p.rapidapi.com"

# Try to get credentials from Colab secrets first
try:
    RAPIDAPI_KEY = userdata.get('RAPIDAPI_KEY')
    print("✅ RapidAPI key loaded from Colab secrets")
except Exception:
    # Manual configuration: fill in your key in a private copy only.
    # Never commit a real API key to a shared notebook.
    RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY"  # Your API key
    print("⚠️ RapidAPI key loaded from manual configuration")

# Load BigQuery credentials from secrets
try:
    BIGQUERY_CREDENTIALS_STR = userdata.get('BIGQUERY_KEY_JSON')
    BIGQUERY_CREDENTIALS = json.loads(BIGQUERY_CREDENTIALS_STR)
    print("✅ BigQuery credentials loaded from Colab secrets")
    PROJECT_ID = BIGQUERY_CREDENTIALS.get('project_id', 'shopper-reviews-477306')
except Exception:
    # Fallback to manual configuration.
    # Fill in the placeholders in a private copy only. Never commit a real
    # service account key to a shared notebook; prefer Colab secrets (Option A).
    print("⚠️ BigQuery credentials loaded from manual configuration")
    PROJECT_ID = "shopper-reviews-477306"
    BIGQUERY_CREDENTIALS = {
        "type": "service_account",
        "project_id": PROJECT_ID,
        "private_key_id": "YOUR_PRIVATE_KEY_ID",
        "private_key": "YOUR_PRIVATE_KEY",
        "client_email": "YOUR_SERVICE_ACCOUNT_EMAIL",
        "client_id": "YOUR_CLIENT_ID",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "universe_domain": "googleapis.com"
    }

# BigQuery Configuration
DATASET_ID = "shopper_reviews_db"
SOURCE_TABLE = "Map_location"  # Table with CIDs
REVIEWS_TABLE = "Reviews"       # New table for reviews

# Review scraping settings
LANG = "en"
COUNTRY = "us"
SORT_BY = "newest"
MAX_REVIEWS_PER_LOCATION = 100  # Set max reviews per location

print("\n✅ All credentials configured successfully!")
print(f"📊 Source Table: {PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}")
print(f"📊 Reviews Table: {PROJECT_ID}.{DATASET_ID}.{REVIEWS_TABLE}")
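
The lookup order in Step 3 (Colab secrets first, then a fallback) can also be written as a small helper that fails loudly when no source has the secret. This is a minimal sketch, not part of the original notebook: `load_secret` and `failing_getter` are hypothetical names, and the environment-variable fallback is an assumption about how you might supply keys outside Colab.

```python
import os
from typing import Callable, Iterable, Optional


def load_secret(name: str, getters: Iterable[Callable[[str], Optional[str]]]) -> str:
    """Return the first non-empty value produced by the getters, in order.

    Each getter takes the secret name. A getter that raises (for example a
    secrets store that does not have the name) is skipped.
    """
    for getter in getters:
        try:
            value = getter(name)
        except Exception:
            value = None
        if value:
            return value
    raise RuntimeError(f"Secret '{name}' was not found in any configured source")


# Demo with stand-ins: the first getter fails, the second reads the environment.
def failing_getter(name: str) -> str:
    raise KeyError(name)


os.environ["DEMO_SECRET"] = "demo-value"
print(load_secret("DEMO_SECRET", [failing_getter, os.environ.get]))
```

In Colab, the call would look like `load_secret("RAPIDAPI_KEY", [userdata.get, os.environ.get])`, which keeps real keys out of the notebook source.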

## 🔌 Step 4: Initialize BigQuery Client

In [None]:
def get_bigquery_client() -> Optional[bigquery.Client]:
    """
    Initialize and return BigQuery client.
    """
    try:
        credentials = service_account.Credentials.from_service_account_info(
            BIGQUERY_CREDENTIALS,
            scopes=["https://www.googleapis.com/auth/bigquery"]
        )
        client = bigquery.Client(
            credentials=credentials,
            project=PROJECT_ID
        )
        logger.info("✅ BigQuery client initialized")
        return client
    except Exception as e:
        logger.error(f"❌ Failed to initialize BigQuery client: {e}")
        return None

# Test connection
bq_client = get_bigquery_client()
if bq_client:
    print("✅ BigQuery connection established!")

## 📍 Step 5: Fetch CIDs from Map_Location Table

In [None]:
def check_table_structure():
    """
    Check the Map_Location table structure and identify CID column.
    """
    try:
        table_id = f"{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}"
        table = bq_client.get_table(table_id)
        
        print("\n📋 TABLE SCHEMA:")
        print("=" * 80)
        for field in table.schema:
            print(f"  - {field.name}: {field.field_type}")
        
        print(f"\n📊 Total rows in table: {table.num_rows:,}")
        return True
    except Exception as e:
        logger.error(f"❌ Error checking table: {e}")
        return False

def get_location_cids() -> pd.DataFrame:
    """
    Fetch all CIDs and location info from Map_Location table.
    Automatically detects the correct CID column name.
    """
    try:
        # First, get a sample to check column names
        sample_query = f"""
        SELECT *
        FROM `{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}`
        LIMIT 5
        """
        
        logger.info("🔍 Checking Map_Location table structure...")
        sample_df = bq_client.query(sample_query).to_dataframe()
        
        if sample_df.empty:
            logger.warning("⚠️ Map_Location table is empty!")
            return pd.DataFrame()
        
        # Find the CID column
        cid_column = None
        for possible_name in ['cid', 'place_id', 'placeId', 'id']:
            if possible_name in sample_df.columns:
                cid_column = possible_name
                logger.info(f"✅ Found CID column: '{cid_column}'")
                break
        
        if not cid_column:
            logger.error("❌ Could not find CID column in table!")
            print("\n📋 Available columns:", list(sample_df.columns))
            return pd.DataFrame()
        
        # Build query with the correct column name
        query = f"""
        SELECT 
            {cid_column} as cid,
            title,
            address,
            rating,
            reviews_count
        FROM `{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}`
        WHERE {cid_column} IS NOT NULL
        ORDER BY title
        """
        
        logger.info("🔍 Fetching all locations from Map_Location table...")
        df = bq_client.query(query).to_dataframe()
        logger.info(f"✅ Found {len(df)} locations with CIDs")
        return df
        
    except Exception as e:
        logger.error(f"❌ Error fetching CIDs: {e}")
        import traceback
        print(traceback.format_exc())
        return pd.DataFrame()

# Check table structure first
print("🔍 Checking Map_Location table...")
check_table_structure()

# Fetch locations
print("\n" + "=" * 80)
locations_df = get_location_cids()

if not locations_df.empty:
    print(f"\n✅ Total locations to process: {len(locations_df)}")
    print("\n📍 First 5 locations:")
    print(locations_df.head())
else:
    print("\n⚠️ No locations found in Map_Location table!")
    print("\nℹ️ Please make sure:")
    print("  1. The Map_Location table has data")
    print("  2. The table has a CID/place_id column")
    print("  3. You have the correct project/dataset/table names")
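
The column detection inside `get_location_cids()` can be tried out without a BigQuery connection. The sketch below isolates that logic as a pure function; `detect_cid_column` and `CANDIDATE_CID_COLUMNS` are illustrative names, while the candidate list and its priority order are copied from the cell above. Note that the match is case-sensitive, so a column named `CID` would not be found.

```python
from typing import Iterable, Optional

# Candidate names from get_location_cids(), in priority order.
CANDIDATE_CID_COLUMNS = ("cid", "place_id", "placeId", "id")


def detect_cid_column(
    columns: Iterable[str],
    candidates: Iterable[str] = CANDIDATE_CID_COLUMNS,
) -> Optional[str]:
    """Return the first candidate name present in `columns`, or None."""
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    return None


print(detect_cid_column(["title", "place_id", "id"]))
print(detect_cid_column(["title", "address"]))
```

Passing `list(sample_df.columns)` to such a function would reproduce the behaviour of the loop in Step 5.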

## 💬 Step 6: Define Review Fetching Functions

In [None]:
def fetch_reviews_for_cid(cid: str, max_reviews: int = 100) -> List[Dict[str, Any]]:
    """
    Fetch reviews for a specific CID from RapidAPI.
    
    Args:
        cid: The Google Maps CID
        max_reviews: Maximum number of reviews to fetch
    
    Returns:
        List of review dictionaries
    """
    conn = http.client.HTTPSConnection(RAPIDAPI_HOST)
    headers = {
        "x-rapidapi-key": RAPIDAPI_KEY,
        "x-rapidapi-host": RAPIDAPI_HOST
    }
    
    all_reviews = []
    page = 1
    
    try:
        # First request to get total reviews count
        conn.request(
            "GET",
            f"/reviews?cid={cid}&sortBy={SORT_BY}&gl={COUNTRY}&hl={LANG}&page=1",
            headers=headers
        )
        res = conn.getresponse()
        data = res.read()
        
        if res.status != 200:
            logger.warning(f"⚠️ Failed to fetch reviews for CID {cid}: {res.status}")
            return []
        
        json_data = json.loads(data.decode("utf-8"))
        place_info = json_data.get("placeInfo", {})
        total_reviews = place_info.get("reviewsCount") or json_data.get("reviewsCount", 0)
        
        logger.info(f"📍 {place_info.get('title', 'Unknown')} - Total reviews: {total_reviews}")
        
        # Determine how many to scrape
        to_scrape = min(max_reviews, total_reviews) if total_reviews else max_reviews
        
        # Fetch reviews page by page
        while len(all_reviews) < to_scrape:
            if page > 1:  # Already have page 1 data
                conn.request(
                    "GET",
                    f"/reviews?cid={cid}&sortBy={SORT_BY}&gl={COUNTRY}&hl={LANG}&page={page}",
                    headers=headers
                )
                res = conn.getresponse()
                data = res.read()
                
                if res.status != 200:
                    logger.warning(f"⚠️ Error on page {page}: {res.status}")
                    break
                
                json_data = json.loads(data.decode("utf-8"))
            
            reviews = json_data.get("reviews", [])
            if not reviews:
                logger.info(f"✅ No more reviews found at page {page}")
                break
            
            all_reviews.extend(reviews)
            logger.info(f"  📄 Page {page}: Collected {len(all_reviews)}/{to_scrape} reviews")
            
            if len(all_reviews) >= to_scrape:
                break
            
            page += 1
            time.sleep(1)  # Rate limiting
        
        # Trim to max_reviews
        all_reviews = all_reviews[:max_reviews]
        logger.info(f"✅ Scraped {len(all_reviews)} reviews for CID {cid}")
        
    except Exception as e:
        logger.error(f"❌ Error fetching reviews for CID {cid}: {e}")
    finally:
        conn.close()
    
    return all_reviews

print("✅ Review fetching functions defined!")
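
The paging loop in `fetch_reviews_for_cid` can be exercised offline by separating it from the HTTP call. The following is a minimal sketch under stated assumptions: `build_reviews_path` and `collect_reviews` are hypothetical helpers, the query parameter names are the ones used in the cell above, and `fake_pages` is made-up data standing in for API responses.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import urlencode


def build_reviews_path(cid: str, page: int, sort_by: str = "newest",
                       country: str = "us", lang: str = "en") -> str:
    """Build the request path with URL-encoded query parameters."""
    query = urlencode(
        {"cid": cid, "sortBy": sort_by, "gl": country, "hl": lang, "page": page}
    )
    return f"/reviews?{query}"


def collect_reviews(fetch_page: Callable[[int], Dict[str, Any]],
                    max_reviews: int) -> List[Dict[str, Any]]:
    """Request pages until max_reviews is reached or a page comes back empty."""
    collected: List[Dict[str, Any]] = []
    page = 1
    while len(collected) < max_reviews:
        reviews = fetch_page(page).get("reviews") or []
        if not reviews:
            break
        collected.extend(reviews)
        page += 1
    return collected[:max_reviews]


# Made-up stand-in for the API: three pages with two reviews each.
fake_pages = {n: {"reviews": [{"id": f"r{n}-{i}"} for i in range(2)]} for n in (1, 2, 3)}
result = collect_reviews(lambda page: fake_pages.get(page, {}), max_reviews=5)
print(len(result))
print(build_reviews_path("7632417579134624850", page=2))
```

Injecting `fetch_page` keeps the stop conditions (limit reached, empty page) testable without spending API quota. Rate limiting such as the `time.sleep(1)` in the original loop would live in the real `fetch_page`.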

## 🔄 Step 7: Convert Reviews to DataFrame

In [None]:
def reviews_to_dataframe(reviews: List[Dict[str, Any]], cid: str, location_title: Optional[str] = None) -> pd.DataFrame:
    """
    Convert raw review data to a structured DataFrame.
    
    Args:
        reviews: List of review dictionaries from API
        cid: The CID of the location
        location_title: Optional title of the location
    
    Returns:
        DataFrame with structured review data
    """
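    df = pd.DataFrame([
        {
            "cid": cid,
            "location_title": location_title,
            "review_id": r.get("id"),
            "rating": r.get("rating"),
            "snippet": r.get("snippet"),
            "likes": r.get("likes"),
            "date": r.get("date"),
            "iso_date": r.get("isoDate"),
            # `or {}` also covers reviews where "user" is present but null
            "user_name": (r.get("user") or {}).get("name"),
            "user_profile": (r.get("user") or {}).get("link"),
            "user_thumbnail": (r.get("user") or {}).get("thumbnail"),
            "user_reviews_count": (r.get("user") or {}).get("reviews"),
            "user_photos_count": (r.get("user") or {}).get("photos"),
            # Timezone-aware datetime, matching the TIMESTAMP column in the Reviews schema
            "scraped_at": datetime.now(timezone.utc)
        }
        for r in reviews
    ])
    
    # Parse ISO strings into timezone-aware datetimes for the TIMESTAMP column
    if not df.empty:
        df["iso_date"] = pd.to_datetime(df["iso_date"], errors="coerce", utc=True)
    
    return df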

print("✅ DataFrame conversion function defined!")
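
To see how one API review maps onto a table row, it helps to flatten a single payload by hand. This is an illustrative sketch: `flatten_review` is a hypothetical helper that covers only a subset of the columns, and `sample` is made-up data shaped like the fields that `reviews_to_dataframe` reads. It deliberately includes a review whose `user` is null to show a defensive lookup.

```python
from typing import Any, Dict, Optional


def flatten_review(review: Dict[str, Any], cid: str,
                   location_title: Optional[str] = None) -> Dict[str, Any]:
    """Map one nested review payload onto flat column names."""
    user = review.get("user") or {}  # tolerates a missing key and an explicit null
    return {
        "cid": cid,
        "location_title": location_title,
        "review_id": review.get("id"),
        "rating": review.get("rating"),
        "snippet": review.get("snippet"),
        "iso_date": review.get("isoDate"),
        "user_name": user.get("name"),
        "user_reviews_count": user.get("reviews"),
    }


# Made-up payload for illustration only.
sample = {"id": "abc", "rating": 5, "snippet": "Great",
          "isoDate": "2024-01-02T03:04:05Z", "user": None}
row = flatten_review(sample, cid="123", location_title="Test Location")
print(row["review_id"], row["user_name"])
```

Every lookup uses `.get`, so a field that the API omits becomes `None` (and later a NULL in BigQuery) instead of raising a `KeyError`.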

## 💾 Step 8: Create Reviews Table in BigQuery

In [None]:
def create_reviews_table() -> bool:
    """
    Create the Reviews table in BigQuery if it doesn't exist.
    """
    try:
        table_id = f"{PROJECT_ID}.{DATASET_ID}.{REVIEWS_TABLE}"
        
        # Check if table exists
        try:
            bq_client.get_table(table_id)
            logger.info(f"✅ Table {table_id} already exists")
            return True
        except Exception:
            # get_table raises when the table is missing; fall through and create it
            logger.info(f"📝 Creating table {table_id}...")
        
        # Define schema
        schema = [
            bigquery.SchemaField("cid", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("location_title", "STRING"),
            bigquery.SchemaField("review_id", "STRING"),
            bigquery.SchemaField("rating", "INTEGER"),
            bigquery.SchemaField("snippet", "STRING"),
            bigquery.SchemaField("likes", "INTEGER"),
            bigquery.SchemaField("date", "STRING"),
            bigquery.SchemaField("iso_date", "TIMESTAMP"),
            bigquery.SchemaField("user_name", "STRING"),
            bigquery.SchemaField("user_profile", "STRING"),
            bigquery.SchemaField("user_thumbnail", "STRING"),
            bigquery.SchemaField("user_reviews_count", "INTEGER"),
            bigquery.SchemaField("user_photos_count", "INTEGER"),
            bigquery.SchemaField("scraped_at", "TIMESTAMP")
        ]
        
        table = bigquery.Table(table_id, schema=schema)
        table = bq_client.create_table(table)
        logger.info(f"✅ Created table {table_id}")
        return True
        
    except Exception as e:
        logger.error(f"❌ Error creating table: {e}")
        return False

# Create table
if create_reviews_table():
    print("✅ Reviews table is ready!")

## 📤 Step 9: Upload Reviews to BigQuery

In [None]:
def upload_reviews_to_bigquery(df: pd.DataFrame) -> bool:
    """
    Upload reviews DataFrame to BigQuery.
    
    Args:
        df: DataFrame containing review data
    
    Returns:
        True if successful, False otherwise
    """
    if df.empty:
        logger.warning("⚠️ No reviews to upload")
        return False
    
    try:
        table_id = f"{PROJECT_ID}.{DATASET_ID}.{REVIEWS_TABLE}"
        
        # Configure job
        job_config = bigquery.LoadJobConfig(
            write_disposition="WRITE_APPEND",  # Append to existing table
            schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION]
        )
        
        # Upload
        job = bq_client.load_table_from_dataframe(df, table_id, job_config=job_config)
        job.result()  # Wait for completion
        
        logger.info(f"✅ Uploaded {len(df)} reviews to BigQuery")
        return True
        
    except Exception as e:
        logger.error(f"❌ Error uploading to BigQuery: {e}")
        return False

print("✅ Upload function defined!")
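
Because the upload uses `WRITE_APPEND`, running the notebook twice appends the same reviews again. One possible guard is to filter each batch before uploading. This is a sketch and not part of the original notebook: `drop_known_reviews` is a hypothetical helper, and `existing_ids` is assumed to come from a query such as selecting `review_id` from the Reviews table.

```python
import pandas as pd


def drop_known_reviews(new_df: pd.DataFrame, existing_ids: set) -> pd.DataFrame:
    """Drop in-batch duplicates, then rows whose review_id is already stored."""
    deduped = new_df.drop_duplicates(subset=["cid", "review_id"], keep="first")
    fresh = deduped[~deduped["review_id"].isin(existing_ids)]
    return fresh.reset_index(drop=True)


# Made-up batch: review "a" appears twice, review "b" is already stored.
batch = pd.DataFrame({
    "cid": ["1", "1", "1"],
    "review_id": ["a", "a", "b"],
    "rating": [5, 5, 3],
})
fresh = drop_known_reviews(batch, existing_ids={"b"})
print(fresh)
```

The filtered frame could then be passed to `upload_reviews_to_bigquery`. For large tables, a `MERGE` statement in BigQuery may be a better fit than pulling all existing IDs into memory.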

## 🚀 Step 10: Main Processing Loop

This cell processes all locations and fetches their reviews.

In [None]:
def process_all_locations(locations_df: pd.DataFrame, max_reviews_per_location: int = 100, batch_size: int = 10):
    """
    Process all locations and fetch their reviews.
    
    Args:
        locations_df: DataFrame with location data
        max_reviews_per_location: Max reviews to fetch per location
        batch_size: Number of locations to process before uploading to BigQuery
    """
    total_locations = len(locations_df)
    total_reviews_collected = 0
    failed_locations = []
    
    print(f"\n🚀 Starting to process {total_locations} locations...")
    print(f"📊 Will fetch up to {max_reviews_per_location} reviews per location\n")
    print("=" * 80)
    
    batch_reviews = []
    
    for idx, row in locations_df.iterrows():
        cid = row['cid']
        title = row.get('title', 'Unknown')
        
        print(f"\n[{idx + 1}/{total_locations}] Processing: {title}")
        print(f"  CID: {cid}")
        
        try:
            # Fetch reviews
            reviews = fetch_reviews_for_cid(cid, max_reviews_per_location)
            
            if reviews:
                # Convert to DataFrame
                reviews_df = reviews_to_dataframe(reviews, cid, title)
                batch_reviews.append(reviews_df)
                total_reviews_collected += len(reviews_df)
                print(f"  ✅ Collected {len(reviews_df)} reviews")
            else:
                print("  ⚠️ No reviews found")
            
            # Upload batch if we've reached batch_size
            if len(batch_reviews) >= batch_size:
                combined_df = pd.concat(batch_reviews, ignore_index=True)
                print(f"\n📤 Uploading batch of {len(combined_df)} reviews to BigQuery...")
                if upload_reviews_to_bigquery(combined_df):
                    print("✅ Batch uploaded successfully!")
                    batch_reviews = []  # Clear batch
                else:
                    print("❌ Failed to upload batch")
            
            # Rate limiting between locations
            time.sleep(2)
            
        except Exception as e:
            logger.error(f"❌ Error processing {title} (CID: {cid}): {e}")
            failed_locations.append({'cid': cid, 'title': title, 'error': str(e)})
    
    # Upload remaining reviews
    if batch_reviews:
        combined_df = pd.concat(batch_reviews, ignore_index=True)
        print(f"\n📤 Uploading final batch of {len(combined_df)} reviews...")
        upload_reviews_to_bigquery(combined_df)
    
    # Summary
    print("\n" + "=" * 80)
    print("\n📊 PROCESSING SUMMARY")
    print("=" * 80)
    print(f"✅ Total locations processed: {total_locations}")
    print(f"💬 Total reviews collected: {total_reviews_collected}")
    print(f"❌ Failed locations: {len(failed_locations)}")
    
    if failed_locations:
        print("\n⚠️ Failed locations:")
        for loc in failed_locations:
            print(f"  - {loc['title']} (CID: {loc['cid']}): {loc['error']}")
    
    print("\n✅ Processing complete!")
    return total_reviews_collected, failed_locations

# Run the main process
if not locations_df.empty:
    total_reviews, failed = process_all_locations(
        locations_df,
        max_reviews_per_location=MAX_REVIEWS_PER_LOCATION,
        batch_size=10
    )
else:
    print("⚠️ No locations found to process")
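
The batching behaviour of `process_all_locations` (collect per-location DataFrames, upload once `batch_size` of them have accumulated, keep the batch if the upload fails) can be checked in isolation. The class below is a hedged sketch: `BatchBuffer` and `fake_upload` are hypothetical names, and `fake_upload` merely records sizes where the notebook would call `upload_reviews_to_bigquery`.

```python
from typing import Callable, List

import pandas as pd


class BatchBuffer:
    """Accumulate DataFrames and hand them to `upload` in groups of `batch_size`."""

    def __init__(self, batch_size: int, upload: Callable[[pd.DataFrame], bool]):
        self.batch_size = batch_size
        self.upload = upload
        self.frames: List[pd.DataFrame] = []

    def add(self, df: pd.DataFrame) -> None:
        self.frames.append(df)
        if len(self.frames) >= self.batch_size:
            self.flush()

    def flush(self) -> bool:
        """Upload everything buffered; keep the frames if the upload fails."""
        if not self.frames:
            return True
        combined = pd.concat(self.frames, ignore_index=True)
        if self.upload(combined):
            self.frames = []
            return True
        return False


uploaded_sizes: List[int] = []


def fake_upload(df: pd.DataFrame) -> bool:
    uploaded_sizes.append(len(df))
    return True


buffer = BatchBuffer(batch_size=2, upload=fake_upload)
for n in (3, 1, 4):
    buffer.add(pd.DataFrame({"review_id": range(n)}))
buffer.flush()  # final partial batch
print(uploaded_sizes)
```

As in the original loop, the batch size counts locations with reviews, not individual reviews, so the number of rows per upload varies.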

## 📊 Step 11: Verify Results in BigQuery

In [None]:
def get_review_stats():
    """
    Get statistics about the collected reviews.
    """
    try:
        query = f"""
        SELECT 
            COUNT(*) as total_reviews,
            COUNT(DISTINCT cid) as unique_locations,
            AVG(rating) as avg_rating,
            MIN(iso_date) as earliest_review,
            MAX(iso_date) as latest_review
        FROM `{PROJECT_ID}.{DATASET_ID}.{REVIEWS_TABLE}`
        """
        
        df = bq_client.query(query).to_dataframe()
        print("\n📊 REVIEWS DATABASE STATISTICS")
        print("=" * 80)
        print(f"Total Reviews: {df['total_reviews'].iloc[0]:,}")
        print(f"Unique Locations: {df['unique_locations'].iloc[0]}")
        print(f"Average Rating: {df['avg_rating'].iloc[0]:.2f}")
        print(f"Date Range: {df['earliest_review'].iloc[0]} to {df['latest_review'].iloc[0]}")
        
        # Top locations by review count
        query2 = f"""
        SELECT 
            location_title,
            COUNT(*) as review_count,
            AVG(rating) as avg_rating
        FROM `{PROJECT_ID}.{DATASET_ID}.{REVIEWS_TABLE}`
        GROUP BY location_title
        ORDER BY review_count DESC
        LIMIT 10
        """
        
        df2 = bq_client.query(query2).to_dataframe()
        print("\n🏆 TOP 10 LOCATIONS BY REVIEW COUNT")
        print("=" * 80)
        print(df2.to_string(index=False))
        
    except Exception as e:
        logger.error(f"❌ Error getting stats: {e}")

# Get stats
get_review_stats()

## 💾 Step 12 (Optional): Export Reviews to CSV

In [None]:
def export_reviews_to_csv(filename: str = "all_reviews.csv"):
    """
    Export all reviews from BigQuery to a CSV file.
    """
    try:
        query = f"""
        SELECT *
        FROM `{PROJECT_ID}.{DATASET_ID}.{REVIEWS_TABLE}`
        ORDER BY iso_date DESC
        """
        
        print("📥 Downloading reviews from BigQuery...")
        df = bq_client.query(query).to_dataframe()
        
        print(f"💾 Saving to {filename}...")
        df.to_csv(filename, index=False, encoding="utf-8-sig")
        print(f"✅ Saved {len(df)} reviews to {filename}")
        
        return df
    except Exception as e:
        logger.error(f"❌ Error exporting to CSV: {e}")
        return None

# Uncomment to export
# reviews_csv = export_reviews_to_csv("google_reviews_all.csv")

## 🎯 Summary

This notebook successfully:
1. ✅ Fetched all CIDs from the Map_Location table
2. ✅ Scraped reviews for each location using RapidAPI
3. ✅ Stored all reviews in the BigQuery Reviews table
4. ✅ Provided statistics and verification

### Next Steps:
- Analyze review sentiment
- Identify trends and patterns
- Generate insights for each location
- Schedule periodic updates to fetch new reviews