# 🌟 Google Reviews Data Fetcher - Google Colab

This notebook fetches complete review data from Google Reviews API (via RapidAPI) and stores it in BigQuery.

## Features:
- 📊 Fetches full review data (reviews, topics, metadata), up to `MAX_PAGES` pages per place
- 🔄 Automatic pagination (keeps requesting pages while the response includes a `nextPageToken`)
- 💾 Stores complete data in BigQuery
- ⚡ Incremental processing (only new places)
- 🛡️ Robust error handling and retries
- 📈 Progress tracking and logging

## 📦 Step 1: Install Required Packages

In [None]:
!pip install -q google-cloud-bigquery google-auth pandas db-dtypes
print("✅ All packages installed successfully!")

## 🔧 Step 2: Import Libraries

In [None]:
import os
import json
import logging
import http.client
import time
import pandas as pd
from datetime import datetime, timezone
from typing import Optional, Dict, Any, List
from google.oauth2 import service_account
from google.cloud import bigquery
from google.colab import userdata

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

print("✅ Libraries imported successfully!")

## 🔑 Step 3: Configure API Credentials

### Option A: Using Colab Secrets (Recommended)
1. Click on the 🔑 key icon in the left sidebar
2. Add a secret named `RAPIDAPI_KEY` with your API key
3. Add a secret named `BIGQUERY_KEY_JSON` with your service account JSON
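
Before handing the `BIGQUERY_KEY_JSON` secret to the BigQuery client, it can help to check that the pasted text is valid JSON and looks like a service account key. The following is a minimal sketch using only the standard library; the helper name `parse_service_account` and the `REQUIRED_KEYS` list are illustrative assumptions, not part of the notebook or of any Google library.

```python
import json
from typing import Any, Dict

# Assumed minimum set of fields for illustration; a real key file has more.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def parse_service_account(raw: str) -> Dict[str, Any]:
    """Parse a service account JSON string and fail early on obvious problems."""
    info = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    missing = [key for key in REQUIRED_KEYS if not info.get(key)]
    if missing:
        raise ValueError(f"service account JSON is missing: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError("expected a key of type 'service_account'")
    return info


# Placeholder values only; never paste a real key into a shared notebook.
example = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "PLACEHOLDER",
    "client_email": "someone@my-project.iam.gserviceaccount.com",
})
print(parse_service_account(example)["project_id"])
```

A failed check here surfaces a bad secret immediately instead of as an authentication error several cells later.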

### Option B: Manual Configuration
If the secrets are unavailable, the cell below falls back to values that you fill in by hand.

In [None]:
# Try to get credentials from Colab secrets first
try:
    RAPIDAPI_KEY = userdata.get('RAPIDAPI_KEY')
    print("✅ RapidAPI key loaded from Colab secrets")
except Exception:
    # Manual configuration: paste your own key here and never commit a real key
    RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY"
    print("⚠️ RapidAPI key loaded from manual configuration")

# Load BigQuery credentials from secrets
try:
    BIGQUERY_CREDENTIALS_STR = userdata.get('BIGQUERY_KEY_JSON')
    BIGQUERY_CREDENTIALS = json.loads(BIGQUERY_CREDENTIALS_STR)
    print("✅ BigQuery credentials loaded from Colab secrets")
    PROJECT_ID = BIGQUERY_CREDENTIALS.get('project_id', 'shopper-reviews-477306')
except Exception:
    # Fallback to manual configuration: replace every placeholder with values
    # from your own service account key file. Never paste a real private key
    # into a notebook that is shared or committed.
    print("⚠️ BigQuery credentials loaded from manual configuration")
    PROJECT_ID = "shopper-reviews-477306"
    BIGQUERY_CREDENTIALS = {
        "type": "service_account",
        "project_id": "shopper-reviews-477306",
        "private_key_id": "YOUR_PRIVATE_KEY_ID",
        "private_key": "-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY\n-----END PRIVATE KEY-----\n",
        "client_email": "YOUR_SERVICE_ACCOUNT@shopper-reviews-477306.iam.gserviceaccount.com",
        "client_id": "YOUR_CLIENT_ID",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/YOUR_SERVICE_ACCOUNT%40shopper-reviews-477306.iam.gserviceaccount.com",
        "universe_domain": "googleapis.com"
    }

# BigQuery Configuration
DATASET_ID = "place_data"
SOURCE_TABLE = "Map_location"  # Table with place_ids
DESTINATION_TABLE = "place_reviews_full"  # Table to store reviews

# API Configuration
API_HOST = "google-search-master-mega.p.rapidapi.com"
MAX_PAGES = 10  # Maximum pages to fetch per place
RETRY_ATTEMPTS = 3
RETRY_DELAY = 2  # seconds

print("\n✅ All configuration loaded!")
print(f"📊 Source Table: {PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}")
print(f"📊 Destination Table: {PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}")

## 🛠️ Step 4: Define Core Functions

In [None]:
# ==================== BIGQUERY CLIENT ====================

def get_bigquery_client() -> Optional[bigquery.Client]:
    """
    Creates and returns a BigQuery client with proper credentials.
    
    Returns:
        BigQuery client or None on error
    """
    try:
        credentials = service_account.Credentials.from_service_account_info(
            BIGQUERY_CREDENTIALS,
            scopes=["https://www.googleapis.com/auth/cloud-platform"],
        )
        client = bigquery.Client(credentials=credentials, project=PROJECT_ID)
        logger.info(f"✅ Connected to BigQuery project: {PROJECT_ID}")
        return client
    except Exception as e:
        logger.error(f"❌ Error creating BigQuery client: {e}")
        return None


# ==================== API FUNCTIONS ====================

def fetch_reviews_for_place(place_id: str, page: int = 1) -> Optional[Dict[str, Any]]:
    """
    Fetches review data for a single place from Google Reviews API.
    
    Args:
        place_id: The place CID to fetch reviews for
        page: Page number to fetch (for pagination)
        
    Returns:
        Dictionary containing full API response or None on error
    """
    for attempt in range(RETRY_ATTEMPTS):
        # A fresh connection per attempt; the timeout keeps a stalled request from hanging the cell
        conn = http.client.HTTPSConnection(API_HOST, timeout=30)
        try:
            headers = {
                'x-rapidapi-key': RAPIDAPI_KEY,
                'x-rapidapi-host': API_HOST
            }
            
            # Build query parameters
            params = f"?cid={place_id}&sortBy=mostRelevant&gl=us&hl=en&page={page}"
            endpoint = "/reviews" + params
            
            logger.info(f"📡 Fetching page {page} for place {place_id}...")
            
            conn.request("GET", endpoint, headers=headers)
            res = conn.getresponse()
            data = res.read()
            
            if res.status == 200:
                result = json.loads(data.decode("utf-8"))
                logger.info(f"✅ Successfully fetched page {page} for place {place_id}")
                return result
            else:
                logger.warning(f"⚠️ API returned status {res.status}, attempt {attempt + 1}/{RETRY_ATTEMPTS}")
                if attempt < RETRY_ATTEMPTS - 1:
                    time.sleep(RETRY_DELAY)
                    
        except Exception as e:
            logger.error(f"❌ Error fetching reviews, attempt {attempt + 1}/{RETRY_ATTEMPTS}: {e}")
            if attempt < RETRY_ATTEMPTS - 1:
                time.sleep(RETRY_DELAY)
        finally:
            conn.close()  # release the socket whether the attempt succeeded or not
    
    return None


def fetch_all_reviews_for_place(place_id: str) -> Dict[str, Any]:
    """
    Fetches ALL reviews for a place by following pagination.
    
    Args:
        place_id: The place CID to fetch reviews for
        
    Returns:
        Dictionary containing aggregated review data
    """
    all_reviews = []
    all_topics = []
    metadata = {}
    page = 1
    
    logger.info(f"🔍 Starting to fetch all reviews for place {place_id}...")
    
    pages_fetched = 0  # counts only pages that actually returned data
    
    while page <= MAX_PAGES:
        result = fetch_reviews_for_place(place_id, page)
        
        if not result:
            logger.warning(f"⚠️ No data received for page {page}, stopping pagination")
            break
        
        pages_fetched += 1
        
        # Extract reviews from this page
        reviews = result.get('reviews', [])
        all_reviews.extend(reviews)
        
        # Extract topics (usually same across pages, take from first page)
        if page == 1:
            all_topics = result.get('topics', [])
            metadata = {
                'searchParameters': result.get('searchParameters', {}),
                'credits': result.get('credits', 0),
            }
        
        logger.info(f"✅ Page {page}: {len(reviews)} reviews fetched")
        
        # Pages are requested by number; nextPageToken only signals that more pages exist
        next_page_token = result.get('nextPageToken')
        if not next_page_token or len(reviews) == 0:
            logger.info(f"✅ No more pages available, stopping at page {page}")
            break
        
        page += 1
        time.sleep(0.5)  # Rate limiting
    
    logger.info(f"🎉 Completed fetching for place {place_id}: {len(all_reviews)} total reviews, {len(all_topics)} topics")
    
    return {
        'place_id': place_id,
        'total_reviews': len(all_reviews),
        'reviews': all_reviews,
        'topics': all_topics,
        'metadata': metadata,
        'pages_fetched': pages_fetched,
        'timestamp': datetime.now(timezone.utc).isoformat()
    }


# ==================== BIGQUERY OPERATIONS ====================

def get_place_ids_to_process(client: bigquery.Client, limit: Optional[int] = None) -> List[str]:
    """
    Retrieves place IDs from the source table that need review data fetched.
    
    Args:
        client: BigQuery client
        limit: Optional limit on number of places to fetch
        
    Returns:
        List of place_id strings
    """
    source_table = f"{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}"
    
    try:
        dest_table = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
        
        try:
            client.get_table(dest_table)
            # Table exists, exclude already processed places
            query = f"""
            SELECT DISTINCT cid as place_id
            FROM `{source_table}`
            WHERE cid IS NOT NULL
            AND cid NOT IN (
                SELECT DISTINCT place_id
                FROM `{dest_table}`
                WHERE place_id IS NOT NULL
            )
            """
            if limit:
                query += f" LIMIT {limit}"
            logger.info("📊 Fetching place_ids that haven't been processed yet...")
        except Exception:
            # Treated as "table doesn't exist yet": process all places
            query = f"""
            SELECT DISTINCT cid as place_id
            FROM `{source_table}`
            WHERE cid IS NOT NULL
            """
            if limit:
                query += f" LIMIT {limit}"
            logger.info("📊 Destination table doesn't exist, fetching all place_ids...")
        
        result = client.query(query).to_dataframe()
        place_ids = result['place_id'].tolist()
        
        logger.info(f"✅ Found {len(place_ids)} place(s) to process")
        return place_ids
        
    except Exception as e:
        logger.error(f"❌ Error fetching place IDs: {e}")
        return []


def create_reviews_table_if_not_exists(client: bigquery.Client) -> bool:
    """
    Creates the place_reviews_full table if it doesn't exist.
    
    Args:
        client: BigQuery client
        
    Returns:
        True if successful, False otherwise
    """
    table_id = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        # Check if table exists
        try:
            client.get_table(table_id)
            logger.info(f"✅ Table {DESTINATION_TABLE} already exists")
            return True
        except Exception:
            pass  # assume the table is missing and fall through to create it
        
        # Create table with schema
        schema = [
            bigquery.SchemaField("place_id", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("total_reviews", "INTEGER"),
            bigquery.SchemaField("pages_fetched", "INTEGER"),
            bigquery.SchemaField("reviews", "STRING"),  # JSON stored as STRING
            bigquery.SchemaField("topics", "STRING"),   # JSON stored as STRING
            bigquery.SchemaField("metadata", "STRING"), # JSON stored as STRING
            bigquery.SchemaField("timestamp", "TIMESTAMP"),
            bigquery.SchemaField("fetch_date", "DATE"),
        ]
        
        table = bigquery.Table(table_id, schema=schema)
        table = client.create_table(table)
        
        logger.info(f"✅ Created table {DESTINATION_TABLE}")
        return True
        
    except Exception as e:
        logger.error(f"❌ Error creating table: {e}")
        return False


def upload_review_data_to_bigquery(client: bigquery.Client, review_data: Dict[str, Any]) -> bool:
    """
    Uploads review data to BigQuery.
    
    Args:
        client: BigQuery client
        review_data: Dictionary containing review data
        
    Returns:
        True if successful, False otherwise
    """
    table_id = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        # Prepare data for upload
        row = {
            'place_id': review_data['place_id'],
            'total_reviews': review_data['total_reviews'],
            'pages_fetched': review_data['pages_fetched'],
            'reviews': json.dumps(review_data['reviews']),
            'topics': json.dumps(review_data['topics']),
            'metadata': json.dumps(review_data['metadata']),
            'timestamp': datetime.now(timezone.utc),
            'fetch_date': datetime.now(timezone.utc).date(),
        }
        
        # Create DataFrame
        df = pd.DataFrame([row])
        
        # Upload to BigQuery
        job_config = bigquery.LoadJobConfig(
            write_disposition="WRITE_APPEND",
        )
        
        job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
        job.result()  # Wait for job to complete
        
        logger.info(f"✅ Uploaded review data for place {review_data['place_id']} to BigQuery")
        return True
        
    except Exception as e:
        logger.error(f"❌ Error uploading to BigQuery: {e}")
        return False

print("✅ All functions defined successfully!")

## 🔍 Step 5: Check Current Status

In [None]:
# Check current status of tables and data
client = get_bigquery_client()

if client:
    print("📊 Checking current status...\n")
    
    # Check source table
    source_table = f"{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}"
    try:
        table = client.get_table(source_table)
        print(f"✅ Source table exists: {SOURCE_TABLE}")
        print(f"   Total rows: {table.num_rows:,}")
        
        # Count places with cid
        query = f"SELECT COUNT(DISTINCT cid) as count FROM `{source_table}` WHERE cid IS NOT NULL"
        result = client.query(query).to_dataframe()
        print(f"   Places with CID: {result['count'].iloc[0]:,}")
    except Exception as e:
        print(f"❌ Source table not found: {e}")
    
    print()
    
    # Check destination table
    dest_table = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    try:
        table = client.get_table(dest_table)
        print(f"✅ Destination table exists: {DESTINATION_TABLE}")
        print(f"   Total rows: {table.num_rows:,}")
        
        # Get summary stats
        query = f"""
        SELECT 
            COUNT(DISTINCT place_id) as places,
            SUM(total_reviews) as total_reviews,
            AVG(total_reviews) as avg_reviews,
            MAX(timestamp) as last_fetch
        FROM `{dest_table}`
        """
        result = client.query(query).to_dataframe()
        print(f"   Places processed: {result['places'].iloc[0]:,}")
        print(f"   Total reviews: {result['total_reviews'].iloc[0]:,}")
        print(f"   Avg reviews/place: {result['avg_reviews'].iloc[0]:.1f}")
        print(f"   Last fetch: {result['last_fetch'].iloc[0]}")
        
    except Exception as e:
        # Also reached when the table exists but is empty and the summary cannot be formatted
        print(f"⚠️ Destination table not available yet (will be created if missing): {e}")
    
    print("\n" + "="*60)
else:
    print("❌ Failed to connect to BigQuery")

## 🚀 Step 6: Fetch Reviews - Single Place (Test)

Test fetching reviews for a single place before processing in batch.

In [None]:
# Test with a single place ID
test_place_id = "17602107806865671526"  # Example place ID

print(f"🧪 Testing with place ID: {test_place_id}\n")

# Fetch reviews
review_data = fetch_all_reviews_for_place(test_place_id)

# Display results
print("\n" + "="*60)
print("📊 RESULTS")
print("="*60)
print(f"Place ID: {review_data['place_id']}")
print(f"Total Reviews: {review_data['total_reviews']}")
print(f"Pages Fetched: {review_data['pages_fetched']}")
print(f"Topics Found: {len(review_data['topics'])}")
print("="*60)

# Show sample review
if review_data['reviews']:
    print("\n📝 Sample Review:")
    sample = review_data['reviews'][0]
    user = sample.get('user') or {}
    print(f"Rating: {sample.get('rating', 'N/A')} ⭐")
    print(f"Date: {sample.get('date', 'N/A')}")
    # Guard against a missing or null snippet before slicing
    print(f"Snippet: {(sample.get('snippet') or 'N/A')[:200]}...")
    print(f"\nReviewer: {user.get('name', 'N/A')}")
    print(f"Reviewer Reviews: {user.get('reviews', 'N/A')}")

# Show topics
if review_data['topics']:
    print("\n📋 Topics:")
    for topic in review_data['topics'][:5]:  # Show first 5
        print(f"  - {topic.get('name', 'N/A')}: {topic.get('reviews', 0)} mentions")

## 📤 Step 7: Upload Test Data to BigQuery

In [None]:
# Upload the test data to BigQuery
client = get_bigquery_client()

if client and 'review_data' in locals():
    print("📤 Uploading test data to BigQuery...\n")
    
    # Create table if needed
    if create_reviews_table_if_not_exists(client):
        # Upload data
        if upload_review_data_to_bigquery(client, review_data):
            print("\n✅ Test data uploaded successfully!")
            print(f"📊 Table: {PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}")
        else:
            print("\n❌ Failed to upload test data")
    else:
        print("\n❌ Failed to create table")
else:
    print("❌ No test data to upload or BigQuery client not available")

## 🔄 Step 8: Batch Process All Places

Process all places from the source table that haven't been fetched yet.

In [None]:
# Batch process all places
client = get_bigquery_client()

if not client:
    print("❌ Failed to connect to BigQuery")
else:
    print("🚀 Starting batch processing...\n")
    
    # Create destination table if needed
    if not create_reviews_table_if_not_exists(client):
        print("❌ Failed to create destination table")
    else:
        # Get place IDs to process (limit to 5 for this demo, remove limit for full run)
        place_ids = get_place_ids_to_process(client, limit=5)  # Remove limit=5 for full processing
        
        if not place_ids:
            print("✅ No new places to process!")
        else:
            print(f"📊 Processing {len(place_ids)} place(s)...\n")
            
            # Track results
            successful = 0
            failed = 0
            total_reviews = 0
            
            # Process each place
            for idx, place_id in enumerate(place_ids, 1):
                print("\n" + "="*60)
                print(f"📍 Processing place {idx}/{len(place_ids)}: {place_id}")
                print("="*60)
                
                try:
                    # Fetch all review data
                    review_data = fetch_all_reviews_for_place(place_id)
                    
                    if review_data['total_reviews'] == 0:
                        # Nothing is stored for this place, so a later run will try it again
                        print(f"⚠️ No reviews found for place {place_id}, skipping")
                        continue
                    
                    # Upload to BigQuery
                    if upload_review_data_to_bigquery(client, review_data):
                        successful += 1
                        total_reviews += review_data['total_reviews']
                        print(f"✅ Successfully processed place {place_id}")
                        print(f"   📊 {review_data['total_reviews']} reviews, {len(review_data['topics'])} topics")
                    else:
                        failed += 1
                        print(f"❌ Failed to upload data for place {place_id}")
                        
                except Exception as e:
                    failed += 1
                    print(f"❌ Error processing place {place_id}: {e}")
                
                # Rate limiting between places
                if idx < len(place_ids):
                    time.sleep(1)
            
            # Print summary
            print("\n" + "="*60)
            print("📊 PROCESSING SUMMARY")
            print("="*60)
            print(f"✅ Successful: {successful}")
            print(f"❌ Failed: {failed}")
            print(f"📊 Total Reviews Fetched: {total_reviews:,}")
            print(f"📊 Avg Reviews/Place: {total_reviews/successful if successful > 0 else 0:.1f}")
            print("="*60)
            print("🎉 Batch processing completed!")

## 📊 Step 9: Query and Analyze Results

In [None]:
# Query and analyze the review data
client = get_bigquery_client()

if client:
    table_name = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        # Overall statistics
        print("📊 Review Data Statistics\n")
        
        stats_query = f"""
        SELECT 
            COUNT(DISTINCT place_id) as total_places,
            SUM(total_reviews) as total_reviews,
            AVG(total_reviews) as avg_reviews_per_place,
            MIN(total_reviews) as min_reviews,
            MAX(total_reviews) as max_reviews,
            MAX(timestamp) as last_fetch
        FROM `{table_name}`
        """
        
        stats = client.query(stats_query).to_dataframe()
        display(stats)
        
        # Places by review count
        print("\n📈 Places by Review Count:")
        
        places_query = f"""
        SELECT 
            place_id,
            total_reviews,
            pages_fetched,
            timestamp
        FROM `{table_name}`
        ORDER BY total_reviews DESC
        LIMIT 10
        """
        
        places = client.query(places_query).to_dataframe()
        display(places)
        
        print("\n✅ Data query completed!")
        
    except Exception as e:
        print(f"❌ Error querying data: {e}")
else:
    print("❌ Failed to connect to BigQuery")

## 🔍 Step 10: Extract and View Individual Reviews

In [None]:
# Extract individual reviews from JSON for a specific place
client = get_bigquery_client()

if client:
    table_name = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    # Get a sample place
    sample_query = f"SELECT place_id FROM `{table_name}` LIMIT 1"
    sample_result = client.query(sample_query).to_dataframe()
    
    if not sample_result.empty:
        sample_place_id = sample_result['place_id'].iloc[0]
        
        print(f"📝 Extracting reviews for place: {sample_place_id}\n")
        
        # Extract individual reviews using JSON functions
        reviews_query = f"""
        SELECT 
            place_id,
            JSON_VALUE(review, '$.rating') as rating,
            JSON_VALUE(review, '$.date') as date,
            JSON_VALUE(review, '$.snippet') as snippet,
            JSON_VALUE(review, '$.user.name') as reviewer_name,
            CAST(JSON_VALUE(review, '$.user.reviews') AS INT64) as reviewer_total_reviews,
            CAST(JSON_VALUE(review, '$.likes') AS INT64) as likes
        FROM `{table_name}`,
        UNNEST(JSON_EXTRACT_ARRAY(reviews)) as review
        WHERE place_id = '{sample_place_id}'
        LIMIT 10
        """
        
        reviews_df = client.query(reviews_query).to_dataframe()
        display(reviews_df)
        
        print(f"\n✅ Found {len(reviews_df)} reviews")
    else:
        print("⚠️ No data in table yet")
else:
    print("❌ Failed to connect to BigQuery")

## 🏷️ Step 11: View Topics Summary

In [None]:
# View topics across all places
client = get_bigquery_client()

if client:
    table_name = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    print("🏷️ Topic Analysis Across All Places\n")
    
    try:
        # Extract and aggregate topics
        topics_query = f"""
        SELECT 
            JSON_VALUE(topic, '$.name') as topic_name,
            SUM(CAST(JSON_VALUE(topic, '$.reviews') AS INT64)) as total_mentions,
            COUNT(DISTINCT place_id) as places_with_topic
        FROM `{table_name}`,
        UNNEST(JSON_EXTRACT_ARRAY(topics)) as topic
        GROUP BY topic_name
        ORDER BY total_mentions DESC
        LIMIT 20
        """
        
        topics_df = client.query(topics_query).to_dataframe()
        display(topics_df)
        
        print(f"\n✅ Found {len(topics_df)} unique topics")
        
    except Exception as e:
        print(f"❌ Error analyzing topics: {e}")
else:
    print("❌ Failed to connect to BigQuery")

---

## 📚 Additional Information

### How It Works:

#### **Data Collection Flow:**
1. Reads place_ids (CID) from `Map_location` table
2. For each place:
   - Fetches reviews from Google Reviews API
   - Requests successive pages while the response includes a `nextPageToken`, up to `MAX_PAGES`
   - Aggregates all reviews, topics, and metadata
3. Stores complete data in `place_reviews_full` table
4. Runs incrementally (only processes new places)
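
The per-place part of this flow can be sketched without any network access. The example below is a simplified illustration, not the notebook's exact function: `collect_pages` and the `fake_pages` data are hypothetical, and the page fetcher is passed in as a callable so the loop can be exercised with canned responses.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def collect_pages(fetch_page: Callable[[int], Optional[Page]], max_pages: int = 10) -> Dict[str, Any]:
    """Aggregate reviews page by page until the API stops offering more."""
    reviews: List[Any] = []
    topics: List[Any] = []
    pages_fetched = 0

    for page in range(1, max_pages + 1):
        result = fetch_page(page)
        if not result:
            break  # the request failed or returned nothing
        pages_fetched += 1
        page_reviews = result.get("reviews", [])
        reviews.extend(page_reviews)
        if page == 1:
            topics = result.get("topics", [])  # topics are taken from the first page only
        if not result.get("nextPageToken") or not page_reviews:
            break  # no further pages advertised

    return {"reviews": reviews, "topics": topics, "pages_fetched": pages_fetched}


# Hypothetical two-page response used only for illustration.
fake_pages = {
    1: {"reviews": [{"rating": 5}, {"rating": 4}], "topics": [{"name": "coffee"}], "nextPageToken": "abc"},
    2: {"reviews": [{"rating": 3}]},
}
summary = collect_pages(lambda page: fake_pages.get(page))
print(summary["pages_fetched"], len(summary["reviews"]))
```

Passing the fetcher as an argument keeps the stopping rules (empty result, missing token, page cap) testable on their own.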

#### **Table Schema:**
- `place_id` (STRING): Place CID from Google Maps
- `total_reviews` (INTEGER): Total number of reviews fetched
- `pages_fetched` (INTEGER): Number of API pages processed
- `reviews` (STRING, JSON-encoded): Complete array of all reviews with full data
- `topics` (STRING, JSON-encoded): Array of topics mentioned in reviews
- `metadata` (STRING, JSON-encoded): API metadata (searchParameters, credits)
- `timestamp` (TIMESTAMP): When the row was written to BigQuery (UTC)
- `fetch_date` (DATE): Date of fetch
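
To make the schema concrete, here is a small sketch of how one row could be assembled before upload. It assumes the nested fields are serialized with `json.dumps` into string columns, as the table definition in Step 4 does; the helper name `build_row` is illustrative.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_row(review_data: Dict[str, Any]) -> Dict[str, Any]:
    """Shape aggregated review data into one row matching the table schema."""
    now = datetime.now(timezone.utc)
    return {
        "place_id": str(review_data["place_id"]),
        "total_reviews": len(review_data["reviews"]),
        "pages_fetched": review_data["pages_fetched"],
        "reviews": json.dumps(review_data["reviews"]),    # JSON text in a STRING column
        "topics": json.dumps(review_data["topics"]),      # JSON text in a STRING column
        "metadata": json.dumps(review_data["metadata"]),  # JSON text in a STRING column
        "timestamp": now,
        "fetch_date": now.date(),
    }


row = build_row({
    "place_id": 123,
    "reviews": [{"rating": 5}],
    "topics": [],
    "metadata": {"credits": 1},
    "pages_fetched": 1,
})
print(row["place_id"], row["total_reviews"], row["reviews"])
```

Because the nested data is stored as text, queries need `JSON_EXTRACT_ARRAY` and `JSON_VALUE` to read it back, as Steps 10 and 11 do.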

#### **Review Data Structure:**
Each review contains:
- rating, date, isoDate, snippet, likes
- user.name, user.link, user.thumbnail
- user.reviews (count), user.photos (count)
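
A review with this shape can be flattened into a single-level record for local analysis. The sketch below assumes only the field names listed above; `flatten_review` and the sample values are hypothetical, and missing fields fall back to `None`, an empty string, or zero rather than raising.

```python
from typing import Any, Dict


def flatten_review(review: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a nested review dict into a flat record with safe defaults."""
    user = review.get("user") or {}
    return {
        "rating": review.get("rating"),
        "date": review.get("date"),
        "iso_date": review.get("isoDate"),
        "snippet": review.get("snippet") or "",
        "likes": review.get("likes") or 0,
        "reviewer_name": user.get("name"),
        "reviewer_reviews": user.get("reviews"),
        "reviewer_photos": user.get("photos"),
    }


# Made-up sample values for illustration only.
sample = {
    "rating": 4,
    "date": "2 weeks ago",
    "snippet": "Quiet place to work.",
    "user": {"name": "A. Reviewer", "reviews": 12},
}
print(flatten_review(sample))
```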

#### **Topic Data Structure:**
Each topic contains:
- name (e.g., "studying", "coffee", "wifi")
- reviews (count of mentions)
- id (Google topic identifier)
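
Once the `topics` strings are decoded, mention counts can also be totalled in plain Python, mirroring what the SQL in Step 11 does. This is a rough sketch; `total_topic_mentions` and the sample data are illustrative.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def total_topic_mentions(places: Iterable[List[Dict[str, Any]]]) -> Counter:
    """Sum the per-topic mention counts across several places."""
    totals: Counter = Counter()
    for topics in places:
        for topic in topics:
            name = topic.get("name")
            if name:
                totals[name] += int(topic.get("reviews") or 0)
    return totals


# Made-up topic lists for two places.
place_a = [{"name": "coffee", "reviews": 10}, {"name": "wifi", "reviews": 3}]
place_b = [{"name": "coffee", "reviews": 5}, {"name": "studying", "reviews": 7}]
totals = total_topic_mentions([place_a, place_b])
print(totals.most_common(2))
```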

### Configuration:

**Rate Limiting:**
- 0.5 seconds between pages
- 1 second between places
- 3 retry attempts with 2-second delays
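
The retry behaviour described here (a fixed number of attempts with a fixed pause between them) can be isolated in a small helper. The sketch below is one possible shape, not the notebook's implementation: `call_with_retries` is a hypothetical name, and the `sleep` parameter exists only so the example runs without actually waiting.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call func up to `attempts` times, pausing `delay` seconds between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                sleep(delay)  # no pause after the final attempt
    return None


# A stand-in for a flaky API call: fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits: List[float] = []
result = call_with_retries(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```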

**Adjustable Parameters:**
```python
MAX_PAGES = 10        # Max pages per place
RETRY_ATTEMPTS = 3    # API retry attempts
RETRY_DELAY = 2       # Seconds between retries
```

### API Information:
- **Provider**: RapidAPI - Google Search Master Mega
- **Endpoint**: `/reviews`
- **Parameters**: cid, sortBy, gl, hl, page
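
The request path for this endpoint can be assembled with `urllib.parse.urlencode`, which also escapes any unusual characters in the values. This is a minimal sketch; the helper name `build_reviews_path` is illustrative, and the default values mirror the ones used in Step 4.

```python
from urllib.parse import urlencode


def build_reviews_path(cid: str, page: int = 1, sort_by: str = "mostRelevant",
                       gl: str = "us", hl: str = "en") -> str:
    """Build the path and query string for a /reviews request."""
    query = urlencode({"cid": cid, "sortBy": sort_by, "gl": gl, "hl": hl, "page": page})
    return f"/reviews?{query}"


print(build_reviews_path("17602107806865671526", page=2))
# /reviews?cid=17602107806865671526&sortBy=mostRelevant&gl=us&hl=en&page=2
```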

### Tips:
1. **Test First**: Use Step 6 to test with a single place before batch processing
2. **Batch Limit**: Adjust `limit` parameter in Step 8 for controlled processing
3. **Monitor Quotas**: Check your RapidAPI usage limits
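4. **Incremental**: Safe to run multiple times. Places already in the destination table are skipped; places that returned no reviews are not stored, so they are retried on later runs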
5. **Query Data**: Use Steps 9-11 to analyze collected data
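
The incremental behaviour boils down to a set difference between the source place IDs and the ones already stored. The notebook does this in SQL; the sketch below shows the same idea in plain Python, with `pending_place_ids` as a hypothetical helper name.

```python
from typing import Iterable, List, Optional


def pending_place_ids(source_ids: Iterable[Optional[str]], processed_ids: Iterable[str]) -> List[str]:
    """Return source IDs that are not yet processed, de-duplicated, in original order."""
    done = set(processed_ids)
    seen = set()
    pending: List[str] = []
    for place_id in source_ids:
        if place_id and place_id not in done and place_id not in seen:
            seen.add(place_id)
            pending.append(place_id)
    return pending


print(pending_place_ids(["a", "b", "a", "c", None], ["b"]))  # ['a', 'c']
```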

### Troubleshooting:
- **No place IDs**: Check `Map_location` table has `cid` column
- **API errors**: Verify RAPIDAPI_KEY is correct
- **Upload fails**: Check BigQuery credentials and permissions
- **Rate limits**: Increase delays between requests
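
One common way to increase delays when rate limits are hit is exponential backoff, where each retry waits twice as long as the previous one up to a cap. The notebook itself uses a fixed `RETRY_DELAY`, so the following is an alternative sketch rather than its current behaviour; `backoff_delays` and its parameters are illustrative.

```python
from typing import List


def backoff_delays(base: float = 2.0, attempts: int = 3, cap: float = 30.0) -> List[float]:
    """Return the wait before each retry: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts - 1)]


print(backoff_delays())          # [2.0, 4.0]
print(backoff_delays(2, 6, 30))  # [2, 4, 8, 16, 30]
```

With three attempts there are only two waits, because no pause is needed after the final attempt.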

---

**Created for Google Colab** | Last updated: 2025-11-05