# 🌟 Google Reviews Data Fetcher - Google Colab

This notebook fetches review data from the Google Reviews API (via RapidAPI) and stores it in a **FLATTENED** BigQuery table, one row per review.

## Features:
- 📊 Fetches review data from the Google Reviews API (up to `MAX_PAGES` pages per place)
- 🗂️ **FLATTENED structure**: Each review = One row with individual columns
- 🔄 Automatic pagination (requests the next page while the response includes `nextPageToken`)
- 💾 Stores in structured BigQuery table (no JSON!)
- ⚡ Incremental processing (only new places)
- 🛡️ Robust error handling and retries
- 📈 Progress tracking and logging

## 📦 Step 1: Install Required Packages

In [None]:
!pip install -q google-cloud-bigquery google-auth pandas pyarrow db-dtypes
print("✅ All packages installed successfully!")

## 🔧 Step 2: Import Libraries

In [None]:
import os
import json
import logging
import http.client
import time
import pandas as pd
from datetime import datetime, timezone
from typing import Optional, Dict, Any, List
from google.oauth2 import service_account
from google.cloud import bigquery
from google.colab import userdata

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

print("✅ Libraries imported successfully!")

## 🔑 Step 3: Configure API Credentials

### Option A: Using Colab Secrets (Recommended)
1. Click on the 🔑 key icon in the left sidebar
2. Add a secret named `RAPIDAPI_KEY` with your API key
3. Add a secret named `BIGQUERY_KEY_JSON` with your service account JSON

### Option B: Manual Configuration
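If a Colab secret is missing, the cell below falls back to the values written directly in it. Fill those in yourself for quick tests, and never commit or share a notebook that contains real keys.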

In [None]:
# Try to get credentials from Colab secrets first
try:
    RAPIDAPI_KEY = userdata.get('RAPIDAPI_KEY')
    print("✅ RapidAPI key loaded from Colab secrets")
except Exception:
    # Manual configuration: paste your own key here and never commit a real key
    RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY"
    print("⚠️ RapidAPI key loaded from manual configuration")

# Load BigQuery credentials from secrets
try:
    BIGQUERY_CREDENTIALS_STR = userdata.get('BIGQUERY_KEY_JSON')
    BIGQUERY_CREDENTIALS = json.loads(BIGQUERY_CREDENTIALS_STR)
    print("✅ BigQuery credentials loaded from Colab secrets")
    PROJECT_ID = BIGQUERY_CREDENTIALS.get('project_id', 'shopper-reviews-477306')
except Exception:
    # Fallback to manual configuration (placeholders only; keep real keys out of the notebook)
    print("⚠️ BigQuery credentials loaded from manual configuration")
    PROJECT_ID = "shopper-reviews-477306"
    BIGQUERY_CREDENTIALS = {
        "type": "service_account",
        "project_id": "shopper-reviews-477306",
        "private_key_id": "YOUR_PRIVATE_KEY_ID",
        "private_key": "-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY\n-----END PRIVATE KEY-----\n",
        "client_email": "demand@shopper-reviews-477306.iam.gserviceaccount.com",
        "client_id": "100956109416744224832",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/demand%40shopper-reviews-477306.iam.gserviceaccount.com",
        "universe_domain": "googleapis.com"
    }

# BigQuery Configuration
DATASET_ID = "place_data"
SOURCE_TABLE = "Map_location"  # Source table (reads from 'cid' column)
DESTINATION_TABLE = "place_reviews_full"  # Table to store FLATTENED reviews

# API Configuration
API_HOST = "google-search-master-mega.p.rapidapi.com"
MAX_PAGES = 10  # Maximum pages to fetch per place
RETRY_ATTEMPTS = 3
RETRY_DELAY = 2  # seconds

print("\n✅ All configuration loaded!")
print(f"📊 Source Table: {PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}")
print(f"📊 Destination Table: {PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}")
print("\n🗂️ Schema: FLATTENED - Each review = One row")

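Before building a client, it can help to check that the service-account JSON actually contains the fields the later cells rely on. The sketch below is one way to do that. `REQUIRED_KEYS` and `missing_credential_keys` are hypothetical names introduced here, and the key list is an assumption based on the dictionary shown above, not an official specification.

```python
import json

# Keys this sketch treats as required. The list mirrors fields in the
# service-account dictionary above (an assumption, not an official list).
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email", "token_uri")


def missing_credential_keys(raw_json: str) -> list:
    """Return the required keys that are absent or empty in a service-account JSON string."""
    info = json.loads(raw_json)
    return [key for key in REQUIRED_KEYS if not info.get(key)]


# Usage with a deliberately incomplete, fake credential blob.
sample = json.dumps({"type": "service_account", "project_id": "my-project"})
print(missing_credential_keys(sample))  # ['private_key', 'client_email', 'token_uri']
```

An empty list means every expected field is present. It does not prove that the key itself is valid.
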
## 🛠️ Step 4: Define Core Functions

In [None]:
# ==================== BIGQUERY CLIENT ====================

def get_bigquery_client() -> Optional[bigquery.Client]:
    """Creates and returns a BigQuery client."""
    try:
        credentials = service_account.Credentials.from_service_account_info(
            BIGQUERY_CREDENTIALS,
            scopes=["https://www.googleapis.com/auth/cloud-platform"],
        )
        client = bigquery.Client(credentials=credentials, project=PROJECT_ID)
        logger.info(f"✅ Connected to BigQuery project: {PROJECT_ID}")
        return client
    except Exception as e:
        logger.error(f"❌ Error creating BigQuery client: {e}")
        return None


# ==================== API FUNCTIONS ====================

def fetch_reviews_for_place(place_id: str, page: int = 1) -> Optional[Dict[str, Any]]:
    """Fetches review data for a single page from Google Reviews API."""
    for attempt in range(RETRY_ATTEMPTS):
        try:
            conn = http.client.HTTPSConnection(API_HOST)
            
            headers = {
                'x-rapidapi-key': RAPIDAPI_KEY,
                'x-rapidapi-host': API_HOST
            }
            
            params = f"?cid={place_id}&sortBy=mostRelevant&gl=us&hl=en&page={page}"
            endpoint = "/reviews" + params
            
            logger.info(f"📡 Fetching page {page} for CID {place_id}...")
            
            conn.request("GET", endpoint, headers=headers)
            res = conn.getresponse()
            data = res.read()
            
            if res.status == 200:
                result = json.loads(data.decode("utf-8"))
                logger.info(f"✅ Page {page} fetched successfully")
                return result
            else:
                logger.warning(f"⚠️ API status {res.status}, attempt {attempt + 1}/{RETRY_ATTEMPTS}")
                if attempt < RETRY_ATTEMPTS - 1:
                    time.sleep(RETRY_DELAY)
                    
        except Exception as e:
            logger.error(f"❌ Error: {e}, attempt {attempt + 1}/{RETRY_ATTEMPTS}")
            if attempt < RETRY_ATTEMPTS - 1:
                time.sleep(RETRY_DELAY)
    
    return None


def fetch_all_reviews_for_place(place_id: str) -> Dict[str, Any]:
    """Fetches reviews for a place page by page, up to MAX_PAGES pages."""
    all_reviews = []
    all_topics = []
    metadata = {}
    page = 1
    
    logger.info(f"🔍 Fetching all reviews for CID {place_id}...")
    
    while page <= MAX_PAGES:
        result = fetch_reviews_for_place(place_id, page)
        
        if not result:
            logger.warning(f"⚠️ No data for page {page}, stopping")
            break
        
        reviews = result.get('reviews', [])
        all_reviews.extend(reviews)
        
        if page == 1:
            all_topics = result.get('topics', [])
            metadata = {
                'searchParameters': result.get('searchParameters', {}),
                'credits': result.get('credits', 0),
            }
        
        logger.info(f"✅ Page {page}: {len(reviews)} reviews")

        next_page_token = result.get('nextPageToken')
        if not next_page_token or len(reviews) == 0:
            logger.info(f"✅ All pages fetched (stopped at page {page})")
            break
        
        page += 1
        time.sleep(0.5)
    
    logger.info(f"🎉 Total: {len(all_reviews)} reviews, {len(all_topics)} topics")
    
    return {
        'place_id': place_id,
        'total_reviews': len(all_reviews),
        'reviews': all_reviews,
        'topics': all_topics,
        'metadata': metadata,
        'pages_fetched': min(page, MAX_PAGES)  # page overshoots by one when the MAX_PAGES cap ends the loop
    }


# ==================== DATA FLATTENING ====================

def flatten_reviews_to_rows(review_data: Dict[str, Any]) -> pd.DataFrame:
    """
    Flattens review data into individual rows.
    Each review becomes ONE row with all fields as separate columns.
    
    Args:
        review_data: Dictionary containing review data from API
        
    Returns:
        DataFrame with flattened review rows
    """
    place_id = review_data['place_id']
    reviews = review_data['reviews']
    current_time = datetime.now(timezone.utc)
    current_date = current_time.date()
    
    rows = []
    
    for review in reviews:
        # Extract user data safely
        user = review.get('user', {})
        
        # Parse ISO date
        iso_date = review.get('isoDate')
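        try:
            iso_timestamp = datetime.fromisoformat(iso_date.replace('Z', '+00:00')) if iso_date else None
        except (ValueError, AttributeError):
            # Malformed or non-string date: store NULL rather than failing the whole batch
            iso_timestamp = None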
        
        # Create flattened row - each review = one row
        row = {
            'place_id': place_id,
            'rating': review.get('rating'),
            'date': review.get('date'),
            'isoDate': iso_timestamp,
            'snippet': review.get('snippet'),
            'likes': review.get('likes'),
            'reviewer_name': user.get('name'),
            'reviewer_link': user.get('link'),
            'reviewer_thumbnail': user.get('thumbnail'),
            'reviewer_reviews': user.get('reviews'),
            'reviewer_photos': user.get('photos'),
            'timestamp': current_time,
            'fetch_date': current_date,
        }
        
        rows.append(row)
    
    df = pd.DataFrame(rows)
    
    logger.info(f"✅ Flattened {len(rows)} reviews into individual rows")
    return df


# ==================== BIGQUERY OPERATIONS ====================

def get_place_ids_to_process(client: bigquery.Client, limit: Optional[int] = None) -> List[str]:
    """Retrieves CIDs from Map_location table that need reviews fetched."""
    source_table = f"{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}"
    
    try:
        dest_table = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
        
        try:
            client.get_table(dest_table)
            # Table exists, exclude already processed places
            query = f"""
            SELECT DISTINCT cid as place_id
            FROM `{source_table}`
            WHERE cid IS NOT NULL
            AND cid NOT IN (
                SELECT DISTINCT place_id
                FROM `{dest_table}`
                WHERE place_id IS NOT NULL
            )
            """
            if limit:
                query += f" LIMIT {limit}"
            logger.info("📊 Reading 'cid' column from Map_location...")
        except Exception:
            # Table doesn't exist yet
            query = f"""
            SELECT DISTINCT cid as place_id
            FROM `{source_table}`
            WHERE cid IS NOT NULL
            """
            if limit:
                query += f" LIMIT {limit}"
            logger.info("📊 Reading all CIDs from Map_location...")
        
        result = client.query(query).to_dataframe()
        place_ids = result['place_id'].tolist()
        
        logger.info(f"✅ Found {len(place_ids)} CID(s) to process")
        return place_ids

    except Exception as e:
        logger.error(f"❌ Error fetching CIDs: {e}")
        return []


def create_reviews_table_if_not_exists(client: bigquery.Client) -> bool:
    """
    Creates FLATTENED reviews table.
    Schema: Each review = One row with individual columns.
    """
    table_id = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        try:
            client.get_table(table_id)
            logger.info(f"✅ Table {DESTINATION_TABLE} already exists")
            return True
        except Exception:
            pass
        
        # FLATTENED schema - each review is a separate row
        schema = [
            bigquery.SchemaField("place_id", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("rating", "INTEGER"),
            bigquery.SchemaField("date", "STRING"),
            bigquery.SchemaField("isoDate", "TIMESTAMP"),
            bigquery.SchemaField("snippet", "STRING"),
            bigquery.SchemaField("likes", "INTEGER"),
            bigquery.SchemaField("reviewer_name", "STRING"),
            bigquery.SchemaField("reviewer_link", "STRING"),
            bigquery.SchemaField("reviewer_thumbnail", "STRING"),
            bigquery.SchemaField("reviewer_reviews", "INTEGER"),
            bigquery.SchemaField("reviewer_photos", "INTEGER"),
            bigquery.SchemaField("timestamp", "TIMESTAMP"),
            bigquery.SchemaField("fetch_date", "DATE"),
        ]
        
        table = bigquery.Table(table_id, schema=schema)
        table = client.create_table(table)
        
        logger.info(f"✅ Created FLATTENED table {DESTINATION_TABLE}")
        print(f"\n🗂️ Table created: {DESTINATION_TABLE}")
        print(f"📋 Schema: FLATTENED (Each review = One row)")
        print(f"📊 Columns: place_id, rating, date, snippet, reviewer_name, etc.")
        return True
        
    except Exception as e:
        logger.error(f"❌ Error creating table: {e}")
        return False


def upload_review_data_to_bigquery(client: bigquery.Client, review_data: Dict[str, Any]) -> bool:
    """
    Uploads FLATTENED review data to BigQuery.
    Each review is stored as a separate row.
    """
    table_id = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        # Flatten reviews into individual rows
        df = flatten_reviews_to_rows(review_data)
        
        if df.empty:
            logger.warning("No reviews to upload")
            return False
        
        # Upload to BigQuery
        job_config = bigquery.LoadJobConfig(
            write_disposition="WRITE_APPEND",
        )
        
        logger.info(f"Uploading {len(df)} review row(s)...")
        job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
        job.result()
        
        logger.info(f"✅ Uploaded {len(df)} review row(s) for place {review_data['place_id']}")
        return True

    except Exception as e:
        logger.error(f"❌ Error uploading: {e}")
        return False

print("✅ All functions defined successfully!")
print("\n📋 Features:")
print("  🗂️ FLATTENED schema (each review = one row)")
print("  📊 Individual columns for all review fields")
print("  ✅ Reads from 'cid' column in Map_location")
print("  🔄 Incremental processing (only new places)")

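The retry loop in `fetch_reviews_for_place` waits a fixed `RETRY_DELAY` between attempts. A common variation is exponential backoff, where the wait doubles after each failure. The sketch below illustrates that idea only; it is not what the notebook does. `call_with_backoff` and its parameters are hypothetical names, and the `sleep` argument is injectable so the example runs without actually waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], Optional[T]],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call func until it returns a non-None value or attempts run out.

    The wait doubles after each failed attempt: base_delay, 2 * base_delay, ...
    """
    for attempt in range(attempts):
        try:
            result = func()
            if result is not None:
                return result
        except Exception as exc:  # broad on purpose, mirroring the notebook's retry loop
            print(f"attempt {attempt + 1}/{attempts} failed: {exc}")
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None


# Usage: a stand-in fetcher that only succeeds on its third call.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    return {"reviews": []} if calls["n"] == 3 else None


waits = []  # records the delays instead of sleeping
outcome = call_with_backoff(flaky, sleep=waits.append)
print(outcome, waits)  # {'reviews': []} [2.0, 4.0]
```
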
## 🔍 Step 5: Check Current Status

In [None]:
# Check current status
client = get_bigquery_client()

if client:
    print("📊 Checking current status...\n")
    
    # Check source table
    source_table = f"{PROJECT_ID}.{DATASET_ID}.{SOURCE_TABLE}"
    try:
        table = client.get_table(source_table)
        print(f"✅ Source table exists: {SOURCE_TABLE}")
        print(f"   Total rows: {table.num_rows:,}")
        
        query = f"SELECT COUNT(DISTINCT cid) as count FROM `{source_table}` WHERE cid IS NOT NULL"
        result = client.query(query).to_dataframe()
        print(f"   Places with CID: {result['count'].iloc[0]:,}")
    except Exception as e:
        print(f"❌ Source table error: {e}")
    
    print()
    
    # Check destination table
    dest_table = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    try:
        table = client.get_table(dest_table)
        print(f"✅ Destination table exists: {DESTINATION_TABLE}")
        print(f"   Schema: FLATTENED (each review = one row)")
        print(f"   Total review rows: {table.num_rows:,}")
        
        query = f"""
        SELECT 
            COUNT(DISTINCT place_id) as places,
            COUNT(*) as total_reviews,
            AVG(rating) as avg_rating,
            MAX(timestamp) as last_fetch
        FROM `{dest_table}`
        """
        result = client.query(query).to_dataframe()
        print(f"   Places processed: {result['places'].iloc[0]:,}")
        print(f"   Total reviews: {result['total_reviews'].iloc[0]:,}")
        print(f"   Avg rating: {result['avg_rating'].iloc[0]:.2f} ⭐")
        print(f"   Last fetch: {result['last_fetch'].iloc[0]}")

    except Exception:
        print(f"⚠️ Destination table doesn't exist (will be created)")
        print(f"   Will use FLATTENED schema: each review = one row")
    
    print("\n" + "="*60)
else:
    print("❌ Failed to connect to BigQuery")

## 🚀 Step 6: Fetch Reviews - Single Place (Test)

Test fetching and flattening reviews for a single place.

In [None]:
# Test with a single place ID
test_place_id = "7632417579134624850"  # Your test CID

print(f"🧪 Testing with CID: {test_place_id}\n")

# Fetch reviews
review_data = fetch_all_reviews_for_place(test_place_id)

print("\n" + "="*60)
print("📊 API RESULTS")
print("="*60)
print(f"Place ID: {review_data['place_id']}")
print(f"Total Reviews: {review_data['total_reviews']}")
print(f"Pages Fetched: {review_data['pages_fetched']}")
print(f"Topics: {len(review_data['topics'])}")

# Flatten to DataFrame
print("\n📋 Flattening reviews to table format...")
df_flattened = flatten_reviews_to_rows(review_data)

print("\n" + "="*60)
print("🗂️ FLATTENED DATA PREVIEW")
print("="*60)
print(f"Total rows (one per review): {len(df_flattened)}")
print(f"\nColumns: {list(df_flattened.columns)}")

print("\n📊 First 5 reviews (flattened):")
display(df_flattened[['rating', 'date', 'reviewer_name', 'snippet']].head())

print("\n✅ Data successfully flattened!")
print("Each review is now a separate row with individual columns.")

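To check the flattening idea without spending API credits, a hand-written payload is enough. The sketch below uses only the standard library. The sample review is invented, its field names are assumed from the keys that `flatten_reviews_to_rows` reads, and `flatten_one` is a hypothetical helper that covers a subset of the columns.

```python
# Hypothetical payload: field names mirror the keys read by
# flatten_reviews_to_rows(); the values are invented for illustration.
sample_review = {
    "rating": 5,
    "date": "2 months ago",
    "snippet": "Great service.",
    "likes": 3,
    "user": {"name": "Example Reviewer", "reviews": 12, "photos": 4},
}


def flatten_one(place_id: str, review: dict) -> dict:
    """Turn one nested review into a flat dict (a subset of the notebook's columns)."""
    user = review.get("user") or {}  # tolerate a missing or null user object
    return {
        "place_id": place_id,
        "rating": review.get("rating"),
        "date": review.get("date"),
        "snippet": review.get("snippet"),
        "likes": review.get("likes"),
        "reviewer_name": user.get("name"),
        "reviewer_reviews": user.get("reviews"),
        "reviewer_photos": user.get("photos"),
    }


row = flatten_one("7632417579134624850", sample_review)
print(row)
```

Every value in `row` is a scalar, which is what lets each key map to its own BigQuery column.
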
## 📤 Step 7: Upload Test Data to BigQuery

In [None]:
# Upload flattened test data
client = get_bigquery_client()

if client and 'review_data' in locals():
    print("📤 Uploading FLATTENED test data to BigQuery...\n")
    
    if create_reviews_table_if_not_exists(client):
        if upload_review_data_to_bigquery(client, review_data):
            print("\n✅ Test data uploaded successfully!")
            print(f"📊 Table: {PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}")
            print(f"🗂️ Format: {review_data['total_reviews']} reviews = {review_data['total_reviews']} rows")
        else:
            print("\n❌ Failed to upload")
    else:
        print("\n❌ Failed to create table")
else:
    print("❌ No data or client unavailable")

## 🔄 Step 8: Batch Process All Places

Process all places and store reviews in **FLATTENED** format (each review = one row).

In [None]:
# Batch process all places
client = get_bigquery_client()

if not client:
    print("❌ Failed to connect to BigQuery")
else:
    print("🚀 Starting batch processing...\n")

    if not create_reviews_table_if_not_exists(client):
        print("❌ Failed to create table")
    else:
        # Get CIDs to process
        place_ids = get_place_ids_to_process(client, limit=5)  # Remove limit for full run
        
        if not place_ids:
            print("✅ No new places to process!")
        else:
            print(f"📊 Processing {len(place_ids)} place(s)...\n")
            
            successful = 0
            failed = 0
            skipped = 0
            total_review_rows = 0
            
            for idx, place_id in enumerate(place_ids, 1):
                print("\n" + "="*60)
                print(f"📍 Place {idx}/{len(place_ids)}: {place_id}")
                print("="*60)
                
                try:
                    review_data = fetch_all_reviews_for_place(place_id)
                    
                    if review_data['total_reviews'] == 0:
                        print(f"⚠️ No reviews found, skipping")
                        skipped += 1
                        continue
                    
                    if upload_review_data_to_bigquery(client, review_data):
                        successful += 1
                        total_review_rows += review_data['total_reviews']
                        print(f"✅ Success: {review_data['total_reviews']} review rows uploaded")
                        print(f"   📊 {len(review_data['topics'])} topics found")
                    else:
                        failed += 1
                        print(f"❌ Upload failed")
                        
                except KeyboardInterrupt:
                    print(f"\n⚠️ Interrupted! Progress: {successful} done, {failed} failed")
                    break
                    
                except Exception as e:
                    failed += 1
                    print(f"❌ Error: {e}")
                
                if idx < len(place_ids):
                    time.sleep(1)
            
            # Summary
            print("\n" + "="*60)
            print("📊 SUMMARY")
            print("="*60)
            print(f"✅ Successful: {successful} places")
            print(f"❌ Failed: {failed} places")
            print(f"⏭️ Skipped: {skipped} places")
            print(f"📊 Total Review Rows: {total_review_rows:,}")
            if successful > 0:
                print(f"📊 Avg Reviews/Place: {total_review_rows/successful:.1f}")
            print("="*60)

## 📊 Step 9: Query Flattened Review Data

In [None]:
# Query the flattened review data
client = get_bigquery_client()

if client:
    table_name = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    try:
        print("📊 Flattened Review Data Statistics\n")
        
        stats_query = f"""
        SELECT 
            COUNT(DISTINCT place_id) as total_places,
            COUNT(*) as total_review_rows,
            AVG(rating) as avg_rating,
            COUNT(DISTINCT reviewer_name) as unique_reviewers,
            MAX(timestamp) as last_fetch
        FROM `{table_name}`
        """
        
        stats = client.query(stats_query).to_dataframe()
        display(stats)
        
        print("\n📈 Sample Review Rows:")
        sample_query = f"""
        SELECT 
            place_id,
            rating,
            date,
            reviewer_name,
            LEFT(snippet, 100) as snippet_preview
        FROM `{table_name}`
        ORDER BY timestamp DESC
        LIMIT 10
        """
        
        samples = client.query(sample_query).to_dataframe()
        display(samples)
        
        print("\n✅ Query completed!")

    except Exception as e:
        print(f"❌ Error: {e}")
else:
    print("❌ No client")

## 📈 Step 10: Analyze Review Data

In [None]:
# Analyze flattened review data
client = get_bigquery_client()

if client:
    table_name = f"{PROJECT_ID}.{DATASET_ID}.{DESTINATION_TABLE}"
    
    print("📊 Review Analysis\n")

    # Rating distribution
    print("⭐ Rating Distribution:")
    rating_query = f"""
    SELECT 
        rating,
        COUNT(*) as count,
        ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
    FROM `{table_name}`
    WHERE rating IS NOT NULL
    GROUP BY rating
    ORDER BY rating DESC
    """
    
    try:
        ratings = client.query(rating_query).to_dataframe()
        display(ratings)
        
        # Top reviewers
        print("\n👥 Top Reviewers:")
        reviewers_query = f"""
        SELECT 
            reviewer_name,
            COUNT(*) as reviews_in_dataset,
            AVG(rating) as avg_rating,
            MAX(reviewer_reviews) as total_google_reviews
        FROM `{table_name}`
        WHERE reviewer_name IS NOT NULL
        GROUP BY reviewer_name
        ORDER BY reviews_in_dataset DESC
        LIMIT 10
        """
        
        reviewers = client.query(reviewers_query).to_dataframe()
        display(reviewers)
        
    except Exception as e:
        print(f"❌ Error: {e}")
else:
    print("❌ No client")

---

## 📚 Documentation

### Table Schema (FLATTENED):

**Table:** `place_reviews_full`

**Structure:** Each review = One row

| Column | Type | Description |
|--------|------|-------------|
| `place_id` | STRING | Place CID from Map_location |
| `rating` | INTEGER | Review rating (1-5) |
| `date` | STRING | Relative date (e.g., "2 months ago") |
| `isoDate` | TIMESTAMP | ISO 8601 timestamp |
| `snippet` | STRING | Full review text |
| `likes` | INTEGER | Number of likes |
| `reviewer_name` | STRING | Reviewer name |
| `reviewer_link` | STRING | Reviewer profile link |
| `reviewer_thumbnail` | STRING | Reviewer profile image |
| `reviewer_reviews` | INTEGER | Reviewer's total reviews |
| `reviewer_photos` | INTEGER | Reviewer's total photos |
| `timestamp` | TIMESTAMP | UTC time the row was built, just before upload |
| `fetch_date` | DATE | Date of fetch |

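The `isoDate` column holds a parsed timestamp, while `date` keeps the relative text as-is. A minimal sketch of that parsing step follows. It assumes the API returns strings like `2025-09-01T12:30:00Z`, and `parse_iso_date` is a hypothetical helper name, not a function defined in the notebook.

```python
from datetime import datetime
from typing import Optional


def parse_iso_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string such as '2025-09-01T12:30:00Z'; return None if unusable."""
    if not value:
        return None
    try:
        # Before Python 3.11, fromisoformat() rejects a trailing 'Z', hence the replace.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


print(parse_iso_date("2025-09-01T12:30:00Z"))  # 2025-09-01 12:30:00+00:00
print(parse_iso_date("2 months ago"))          # None
```

Returning `None` for unparseable input means a malformed date becomes a NULL in BigQuery instead of failing the upload.
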
### Example Queries:

**Get all reviews for a place:**
```sql
SELECT *
FROM `shopper-reviews-477306.place_data.place_reviews_full`
WHERE place_id = '7632417579134624850'
ORDER BY isoDate DESC
```

**Get average rating per place:**
```sql
SELECT 
    place_id,
    COUNT(*) as review_count,
    AVG(rating) as avg_rating
FROM `shopper-reviews-477306.place_data.place_reviews_full`
GROUP BY place_id
ORDER BY review_count DESC
```

**Find reviews by rating:**
```sql
SELECT place_id, reviewer_name, rating, snippet
FROM `shopper-reviews-477306.place_data.place_reviews_full`
WHERE rating = 5
LIMIT 100
```

### How It Works:

1. **Reads CIDs** from `Map_location.cid` column
2. **Fetches reviews** via Google Reviews API (with pagination)
3. **Flattens data**: Each review becomes one row
4. **Uploads to BigQuery**: Structured table format
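5. **Incremental**: Only processes CIDs that are not yet in the destination table; places that returned no reviews are not recorded, so they are retried on later runs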

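The steps above can be sketched end to end with an in-memory stand-in for the API, so nothing touches the network or BigQuery. Everything here is hypothetical: `FAKE_PAGES`, `fake_fetch`, `collect_reviews`, and `run_incremental` are illustration names, and the page shapes are assumed from the keys the notebook reads (`reviews`, `nextPageToken`).

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory "API": two pages for one CID, keyed by (cid, page).
FAKE_PAGES = {
    ("111", 1): {"reviews": [{"rating": 5}, {"rating": 4}], "nextPageToken": "abc"},
    ("111", 2): {"reviews": [{"rating": 3}]},
}


def fake_fetch(cid: str, page: int) -> Optional[dict]:
    """Stand-in for the HTTP call; returns None when the page does not exist."""
    return FAKE_PAGES.get((cid, page))


def collect_reviews(
    cid: str,
    fetch: Callable[[str, int], Optional[dict]],
    max_pages: int = 10,
) -> List[dict]:
    """Request pages 1, 2, ... until a page is missing, empty, or has no nextPageToken."""
    reviews: List[dict] = []
    for page in range(1, max_pages + 1):
        result = fetch(cid, page)
        if not result or not result.get("reviews"):
            break
        reviews.extend(result["reviews"])
        if not result.get("nextPageToken"):
            break
    return reviews


def run_incremental(source_cids: List[str], done_cids: set) -> Dict[str, List[dict]]:
    """Process only CIDs that are not already stored, producing one flat row per review."""
    rows_by_cid: Dict[str, List[dict]] = {}
    for cid in source_cids:
        if cid in done_cids:
            continue  # incremental step: skip places already in the destination
        rows = [
            {"place_id": cid, "rating": review.get("rating")}
            for review in collect_reviews(cid, fake_fetch)
        ]
        if rows:
            rows_by_cid[cid] = rows
    return rows_by_cid


result = run_incremental(["111", "222", "333"], done_cids={"333"})
print({cid: len(rows) for cid, rows in result.items()})  # {'111': 3}
```

CID `333` is skipped because it is already stored, and `222` yields no rows, so it would be picked up again on the next run.
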
### Key Benefits:

- ✅ **No JSON parsing** needed - direct column access
- ✅ **Easy queries** - standard SQL on individual columns
- ✅ **Better performance** - BigQuery's columnar storage reads only the columns a query references
- ✅ **Clear structure** - one review per row

---

**Created for Google Colab** | **Last updated: 2025-11-05** | **Version: 2.0 - Flattened Schema**