# 🗺️ Map Location Data Collector - Google Colab

This notebook fetches location data from RapidAPI and uploads it to Google BigQuery.

## Features:
- 🔍 Search for places using RapidAPI Google Maps API
- 💾 Save data to BigQuery or CSV
- 📊 Interactive and batch processing modes
- 🚀 In-memory caching for efficient API usage
- ✨ Automatic table creation on first run, append on subsequent runs

## 📦 Step 1: Install Required Packages

In [None]:
!pip install -q requests pandas google-cloud-bigquery google-auth db-dtypes
print("✅ All packages installed successfully!")

## 🔧 Step 2: Import Libraries

In [None]:
import os
import json
import logging
import requests
import pandas as pd
from typing import Optional, Dict, Any, List
from google.oauth2 import service_account
from google.cloud import bigquery
from google.colab import userdata

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# In-memory cache
API_CACHE: Dict[str, Any] = {}

print("✅ Libraries imported successfully!")

## 🔑 Step 3: Configure API Credentials

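### Option A: Using Colab Secrets (Recommended)
1. Click the 🔑 key icon in the left sidebar
2. Add a secret named `RAPIDAPI_KEY` with your API key
3. Add a secret named `BIGQUERY_CREDENTIALS` containing the full contents of your service account JSON key file
4. Enable notebook access for both secrets

### Option B: Manual Configuration
If a secret cannot be read, the cell below falls back to the values written directly in the code; replace them with your own credentials. Do not share or commit a notebook that contains a real API key or private key.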
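Colab secrets are stored as plain strings, so a service account key pasted into the `BIGQUERY_CREDENTIALS` secret must be decoded with `json.loads` before it can be passed to `service_account.Credentials.from_service_account_info`. The sketch below assumes the secret holds the complete JSON key file; the helper `parse_service_account_json` and its list of required keys are illustrative assumptions, not part of the original notebook.

```python
import json
from typing import Any, Dict

# Keys we assume any usable service account key file contains (illustrative check)
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email", "token_uri")


def parse_service_account_json(raw: str) -> Dict[str, Any]:
    """Hypothetical helper: decode a service account JSON string and sanity-check it."""
    info = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError(f"Service account JSON is missing keys: {missing}")
    if info["type"] != "service_account":
        raise ValueError("Expected a key of type 'service_account'")
    return info


# In Colab this would be: parse_service_account_json(userdata.get('BIGQUERY_CREDENTIALS'))
example = parse_service_account_json(
    '{"type": "service_account", "project_id": "my-project", '
    '"private_key": "...", "client_email": "sa@my-project.iam.gserviceaccount.com", '
    '"token_uri": "https://oauth2.googleapis.com/token"}'
)
print(example["client_email"])
```

Failing early with a clear message is easier to debug than the authentication error that a truncated or mis-pasted key would otherwise cause later.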

In [None]:
# Try to get credentials from Colab secrets first (Option A)
try:
    RAPIDAPI_KEY = userdata.get('RAPIDAPI_KEY')
    print("✅ RapidAPI key loaded from Colab secrets")
except Exception:
    # Option B: manual configuration - paste your own key here
    RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY"
    print("⚠️ RapidAPI key loaded from manual configuration")

# BigQuery Configuration
PROJECT_ID = "shopper-reviews-477306"
DATASET_ID = "place_data"
TABLE_ID = "Map_location"

# BigQuery service account credentials: the secret holds the JSON key file as a string
try:
    BIGQUERY_CREDENTIALS = json.loads(userdata.get('BIGQUERY_CREDENTIALS'))
    print("✅ BigQuery credentials loaded from Colab secrets")
except Exception:
    # Option B: fill in the fields from your service account JSON key file.
    # Never share or commit a notebook that contains a real private key.
    BIGQUERY_CREDENTIALS = {
        "type": "service_account",
        "project_id": "shopper-reviews-477306",
        "private_key_id": "YOUR_PRIVATE_KEY_ID",
        "private_key": "-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY\n-----END PRIVATE KEY-----\n",
        "client_email": "demand@shopper-reviews-477306.iam.gserviceaccount.com",
        "client_id": "100956109416744224832",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/demand%40shopper-reviews-477306.iam.gserviceaccount.com",
        "universe_domain": "googleapis.com"
    }
    print("⚠️ BigQuery credentials loaded from manual configuration")

print("✅ Credentials configured successfully!")
print(f"📊 Target Table: {PROJECT_ID}.{DATASET_ID}.{TABLE_ID}")

## 🛠️ Step 4: Define Core Functions

In [None]:
def search_by_place_name(place_name: str, api_key: str = None) -> Optional[Dict[str, Any]]:
    """
    Fetches data for a single query from the RapidAPI.
    
    Args:
        place_name: The place to search for
        api_key: RapidAPI key (uses global RAPIDAPI_KEY if not provided)
    
    Returns:
        Dictionary containing place data or None on error
    """
    if place_name in API_CACHE:
        logger.info(f"Loading '{place_name}' from cache")
        return API_CACHE[place_name]

    logger.info(f"Calling API for '{place_name}'")

    api_key = api_key or RAPIDAPI_KEY
    API_HOST = "google-search-master-mega.p.rapidapi.com"

    if not api_key:
        logger.error("RAPIDAPI_KEY not found")
        return None

    url = f"https://{API_HOST}/maps"
    querystring = {"q": place_name, "hl": "en", "page": "1"}
    headers = {"x-rapidapi-key": api_key, "x-rapidapi-host": API_HOST}

    try:
        response = requests.get(url, headers=headers, params=querystring, timeout=10)

        if response.status_code == 200:
            data = response.json()
            API_CACHE[place_name] = data
            logger.info(f"Successfully fetched data for '{place_name}'")
            return data
        else:
            logger.error(f"API returned status code {response.status_code}")
            logger.error(f"Response: {response.text}")
            return None

    except requests.exceptions.RequestException as e:
        logger.error(f"Request error for '{place_name}': {e}")
        return None


def collect_places_for_query(query: str) -> Optional[pd.DataFrame]:
    """
    Collects place data for a single query.
    
    Args:
        query: The place name to search for
    
    Returns:
        DataFrame with place data or None on error
    """
    results_data = search_by_place_name(query)

    if results_data and 'places' in results_data and results_data['places']:
        try:
            df = pd.json_normalize(results_data['places'])
            df['search_query'] = query
            logger.info(f"Collected {len(df)} places for '{query}'")
            return df
        except Exception as e:
            logger.error(f"Error processing data for '{query}': {e}")
            return None
    else:
        logger.warning(f"No 'places' found for '{query}'")
        return None


def collect_places_from_list(place_names: List[str]) -> Optional[pd.DataFrame]:
    """
    Collects place data for a list of place names.
    
    Args:
        place_names: List of place names to search for
    
    Returns:
        DataFrame with all collected place data or None if no data collected
    """
    all_dataframes_list: List[pd.DataFrame] = []

    for query in place_names:
        query = query.strip()
        if query:
            df = collect_places_for_query(query)
            if df is not None:
                all_dataframes_list.append(df)

    if not all_dataframes_list:
        logger.warning("No data was collected")
        return None

    return pd.concat(all_dataframes_list, ignore_index=True)


def combine_opening_hours(df: pd.DataFrame) -> pd.DataFrame:
    """
    Combines all openingHours columns into a single JSON string column.
    
    Finds columns like 'openingHours.Monday', 'openingHours.Tuesday', etc.
    and combines them into a single 'openingHours' column as a JSON string.
    Also cleans Unicode characters for better readability.
    
    Args:
        df: DataFrame with potentially separate openingHours columns
        
    Returns:
        DataFrame with combined openingHours column
    """
    df_copy = df.copy()
    
    # Find all columns that start with 'openingHours.'
    opening_hours_cols = [col for col in df_copy.columns if col.startswith('openingHours.')]
    
    if opening_hours_cols:
        logger.info(f"Combining {len(opening_hours_cols)} openingHours columns into one")
        
        def clean_hours_text(text):
            """Clean Unicode characters from opening hours text"""
            if not isinstance(text, str):
                return text
            
            # Replace Unicode characters with standard equivalents
            text = text.replace('\u202f', ' ')      # Narrow no-break space → regular space
            text = text.replace('\u2013', '-')      # En dash → hyphen
            text = text.replace('\u2014', '-')      # Em dash → hyphen
            text = text.replace('\xa0', ' ')        # Non-breaking space → regular space
            text = text.replace('\u2009', ' ')      # Thin space → regular space
            
            # Remove multiple spaces
            text = ' '.join(text.split())
            
            return text
        
        # Create a new column with dictionary of all opening hours
        def combine_hours_row(row):
            hours_dict = {}
            for col in opening_hours_cols:
                # Extract day name (e.g., 'Monday' from 'openingHours.Monday')
                day = col.replace('openingHours.', '')
                value = row[col]
                # Only add if not null/empty
                if pd.notna(value) and value != '':
                    # Clean the value
                    cleaned_value = clean_hours_text(value)
                    hours_dict[day] = cleaned_value
            # Return as JSON string for BigQuery compatibility
            return json.dumps(hours_dict, ensure_ascii=False) if hours_dict else None
        
        # Create the combined column
        df_copy['openingHours'] = df_copy.apply(combine_hours_row, axis=1)
        
        # Drop the individual columns
        df_copy = df_copy.drop(columns=opening_hours_cols)
        
        logger.info("✅ Combined openingHours columns into single JSON column")
    
    return df_copy


def sanitize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """
    Sanitizes DataFrame column names to be BigQuery-compatible.
    
    BigQuery column names must:
    - Contain only letters, numbers, and underscores
    - Start with a letter or underscore
    - Be at most 300 characters long
    
    Args:
        df: DataFrame with potentially invalid column names
        
    Returns:
        DataFrame with sanitized column names
    """
    import re
    
    new_columns = {}
    for col in df.columns:
        # Replace dots, spaces, and other special characters with underscores
        sanitized = re.sub(r'[^a-zA-Z0-9_]', '_', col)
        
        # Ensure it doesn't start with a number
        if sanitized and sanitized[0].isdigit():
            sanitized = '_' + sanitized
        
        # Ensure it's not empty
        if not sanitized:
            sanitized = 'column_' + str(df.columns.get_loc(col))
        
        # Limit to 300 characters
        sanitized = sanitized[:300]
        
        # Handle duplicates by appending number
        if sanitized in new_columns.values():
            counter = 1
            while f"{sanitized}_{counter}" in new_columns.values():
                counter += 1
            sanitized = f"{sanitized}_{counter}"
        
        new_columns[col] = sanitized
    
    df_copy = df.copy()
    df_copy.columns = [new_columns[col] for col in df.columns]
    
    logger.info(f"Sanitized {len([c for c in df.columns if c != new_columns[c]])} column names for BigQuery compatibility")
    
    return df_copy


def get_bigquery_client() -> Optional[bigquery.Client]:
    """
    Creates and returns a BigQuery client with proper credentials.
    
    Returns:
        BigQuery client or None on error
    """
    try:
        credentials = service_account.Credentials.from_service_account_info(
            BIGQUERY_CREDENTIALS,
            scopes=["https://www.googleapis.com/auth/cloud-platform"],
        )
        client = bigquery.Client(credentials=credentials, project=PROJECT_ID)
        logger.info(f"Connected to BigQuery project: {PROJECT_ID}")
        return client
    except Exception as e:
        logger.error(f"Error creating BigQuery client: {e}")
        return None


def check_table_exists(table_id: str = None) -> bool:
    """
    Checks if a BigQuery table exists.
    
    Args:
        table_id: Full table ID in format project.dataset.table
        
    Returns:
        True if table exists, False otherwise
    """
    client = get_bigquery_client()
    if not client:
        return False
    
    table_id = table_id or f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
    
    try:
        client.get_table(table_id)
        logger.info(f"✅ Table {table_id} exists")
        return True
    except Exception:
        logger.info(f"⚠️ Table {table_id} does not exist")
        return False


def get_existing_place_ids(table_id: str = None) -> set:
    """
    Retrieves all existing place IDs from BigQuery table.
    
    Args:
        table_id: Full table ID in format project.dataset.table
        
    Returns:
        Set of existing place IDs, empty set if table doesn't exist or on error
    """
    client = get_bigquery_client()
    if not client:
        return set()
    
    table_id = table_id or f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
    
    # Check if table exists first
    if not check_table_exists(table_id):
        logger.info("Table doesn't exist yet, no existing place IDs to check")
        return set()
    
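    try:
        # Pick whichever place ID column (place_id, placeId, or id) exists in the
        # table schema; referencing a missing column would make the query fail
        table = client.get_table(table_id)
        schema_names = {field.name for field in table.schema}
        id_col = next((c for c in ['place_id', 'placeId', 'id'] if c in schema_names), None)
        if id_col is None:
            logger.warning("No place ID column found in table schema, skipping deduplication")
            return set()
        
        query = f"""
        SELECT DISTINCT `{id_col}` AS place_id
        FROM `{table_id}`
        WHERE `{id_col}` IS NOT NULL
        """
        
        result = client.query(query).result()
        existing_ids = {row.place_id for row in result}
        
        logger.info(f"Found {len(existing_ids)} existing place IDs in table")
        return existing_ids
        
    except Exception as e:
        logger.warning(f"Could not retrieve existing place IDs: {e}")
        logger.info("Proceeding without deduplication check")
        return set()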


def remove_duplicate_places(df: pd.DataFrame, table_id: str = None) -> pd.DataFrame:
    """
    Removes rows with place IDs that already exist in BigQuery.
    
    Args:
        df: DataFrame with place data
        table_id: Full table ID in format project.dataset.table
        
    Returns:
        DataFrame with duplicate places removed
    """
    if df is None or df.empty:
        return df
    
    # Find place_id column (could be place_id, placeId, or id)
    place_id_col = None
    for col in ['place_id', 'placeId', 'id']:
        if col in df.columns:
            place_id_col = col
            break
    
    if place_id_col is None:
        logger.warning("No place_id column found in data, skipping deduplication")
        return df
    
    original_count = len(df)
    
    # Get existing place IDs from BigQuery
    existing_ids = get_existing_place_ids(table_id)
    
    if not existing_ids:
        logger.info("No existing place IDs to check, uploading all records")
        return df
    
    # Filter out rows with existing place IDs
    df_filtered = df[~df[place_id_col].isin(existing_ids)].copy()
    
    duplicates_removed = original_count - len(df_filtered)
    
    if duplicates_removed > 0:
        logger.info(f"🔍 Removed {duplicates_removed} duplicate place(s) that already exist")
        logger.info(f"📤 {len(df_filtered)} new place(s) to upload")
    else:
        logger.info(f"✅ All {original_count} place(s) are new")
    
    return df_filtered


def create_bigquery_table(table_id: str = None, schema: List[bigquery.SchemaField] = None) -> bool:
    """
    Creates a new BigQuery table.
    
    Args:
        table_id: Full table ID in format project.dataset.table
        schema: List of SchemaField objects (optional, will auto-detect if not provided)
        
    Returns:
        True if creation successful, False otherwise
    """
    client = get_bigquery_client()
    if not client:
        return False
    
    table_id = table_id or f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
    
    try:
        # Check if table already exists
        if check_table_exists(table_id):
            logger.info(f"Table {table_id} already exists, skipping creation")
            return True
        
        # Create table object
        table = bigquery.Table(table_id, schema=schema)
        
        # Create the table
        table = client.create_table(table)
        logger.info(f"✅ Created table {table_id}")
        return True
    except Exception as e:
        logger.error(f"Error creating table: {e}")
        return False


def upload_to_bigquery(df: pd.DataFrame, table_id: str = None, create_if_needed: bool = True) -> bool:
    """
    Uploads a DataFrame to BigQuery.
    Creates the table on first run, then appends on subsequent runs.
    
    Args:
        df: DataFrame to upload
        table_id: Full table ID in format project.dataset.table
        create_if_needed: If True, creates table if it doesn't exist
        
    Returns:
        True if upload successful, False otherwise
    """
    if df is None or df.empty:
        logger.warning("Cannot upload empty DataFrame")
        return False
    
    client = get_bigquery_client()
    if not client:
        return False
    
    table_id = table_id or f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
    
    # Combine openingHours columns into one
    df = combine_opening_hours(df)
    
    # Sanitize column names for BigQuery compatibility
    df = sanitize_column_names(df)
    
    # Check if table exists
    table_exists = check_table_exists(table_id)
    
    # Remove duplicates if table exists
    if table_exists:
        df = remove_duplicate_places(df, table_id)
        
        # If all records are duplicates, nothing to upload
        if df.empty:
            logger.info("‚ö†Ô∏è All records already exist in BigQuery. Nothing to upload.")
            return True
    
    if not table_exists:
        if create_if_needed:
            logger.info(f"Table does not exist. Creating table {table_id}...")
            # First, create table with schema from first batch of data
            job_config = bigquery.LoadJobConfig(
                write_disposition="WRITE_TRUNCATE",  # Create new table
                autodetect=True,  # Auto-detect schema
            )
        else:
            logger.error(f"Table {table_id} does not exist and create_if_needed=False")
            return False
    else:
        logger.info(f"Table exists. Appending data to {table_id}...")
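        job_config = bigquery.LoadJobConfig(
            write_disposition="WRITE_APPEND",  # Append to existing table
            # Let later searches add columns the table does not have yet;
            # columns that already exist keep the table's schema
            schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
        )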
    
    try:
        logger.info(f"Uploading {len(df)} rows to {table_id}")
        job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
        job.result()  # Wait for the job to complete
        
        if not table_exists and create_if_needed:
            logger.info(f"✅ Successfully created table and uploaded {len(df)} rows to {table_id}")
        else:
            logger.info(f"✅ Successfully appended {len(df)} rows to {table_id}")
        return True
    except Exception as e:
        logger.error(f"Error uploading to BigQuery: {e}")
        return False


def save_to_csv(df: pd.DataFrame, output_path: str) -> bool:
    """
    Saves DataFrame to CSV file.
    
    Args:
        df: DataFrame to save
        output_path: Path to save CSV file
        
    Returns:
        True if save successful, False otherwise
    """
    if df is None or df.empty:
        logger.warning("Cannot save empty DataFrame")
        return False
    
    try:
        df.to_csv(output_path, index=False)
        logger.info(f"✅ Data saved to {output_path}")
        return True
    except Exception as e:
        logger.error(f"Error saving to CSV: {e}")
        return False

print("✅ All functions defined successfully!")

## 🔍 Step 5: Check BigQuery Table Status

In [None]:
# Check if the Map_location table exists
table_name = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
print(f"Checking table: {table_name}")
print()

exists = check_table_exists()

if exists:
    print(f"\n✅ Table '{TABLE_ID}' exists!")
    print("Future uploads will APPEND data to this table.")
    
    # Get table info
    client = get_bigquery_client()
    if client:
        table = client.get_table(table_name)
        print("\n📊 Table Info:")
        print(f"  - Total rows: {table.num_rows:,}")
        print(f"  - Created: {table.created}")
        print(f"  - Modified: {table.modified}")
        print(f"  - Size: {table.num_bytes / (1024*1024):.2f} MB")
else:
    print(f"\n⚠️ Table '{TABLE_ID}' does NOT exist yet.")
    print("It will be created automatically on first data upload.")

## 🚀 Step 6: Usage Examples

### Option 1: Search for a Single Place

In [None]:
# Example: Search for restaurants in New York
query = "restaurants in New York"

df = collect_places_for_query(query)

if df is not None:
    print(f"\n✅ Found {len(df)} places for '{query}'")
    print("\nFirst 5 results:")
    display(df.head())
    
    # Optionally save to CSV
    # save_to_csv(df, "single_query_results.csv")
    
    # Optionally upload to BigQuery
    # upload_to_bigquery(df)
else:
    print("❌ No data found")

### Option 2: Batch Search for Multiple Places

In [None]:
# Define your list of places to search
place_names = [
    "coffee shops in San Francisco",
    "hotels in Los Angeles",
    "museums in Chicago"
]

print(f"🔍 Searching for {len(place_names)} locations...\n")

df = collect_places_from_list(place_names)

if df is not None:
    print(f"\n✅ Collected {len(df)} total places")
    print("\n📊 Data Summary:")
    print(df['search_query'].value_counts())
    print("\nFirst 5 results:")
    display(df.head())
    
    # Save to CSV
    # save_to_csv(df, "batch_results.csv")
else:
    print("❌ No data collected")

### Option 3: Upload Results to BigQuery (Creates Table or Appends)

In [None]:
# Upload the DataFrame to BigQuery
# This will CREATE the table on first run, then APPEND on subsequent runs

if 'df' in locals() and df is not None:
    print(f"📤 Uploading {len(df)} rows to BigQuery...\n")
    
    # Check if table exists before upload
    exists_before = check_table_exists()
    print()
    
    # Upload (will create table if needed, or append if it exists)
    success = upload_to_bigquery(df)
    
    if success:
        print("\n✅ Upload successful!")
        print(f"\n📊 Table: {PROJECT_ID}.{DATASET_ID}.{TABLE_ID}")
        
        if not exists_before:
            print("\n🎉 Table was CREATED with this upload (first time)")
            print("Future uploads will APPEND to this table.")
        else:
            print("\n📝 Data was APPENDED to existing table")
    else:
        print("\n❌ Upload failed. Check logs above for details.")
else:
    print("⚠️ No data to upload. Please run a search first.")

### Option 4: Interactive Search (Input-based)

In [None]:
# Interactive search - enter places one by one
all_results = []

print("🔍 Interactive Place Search")
print("Enter place names to search (or 'done' to finish)\n")

while True:
    query = input("Enter place name: ").strip()
    
    if query.lower() in ['done', 'exit', 'quit', '']:
        break
    
    query_df = collect_places_for_query(query)
    if query_df is not None:
        all_results.append(query_df)
        print(f"✅ Found {len(query_df)} places\n")
    else:
        print("❌ No results found\n")

if all_results:
    combined_df = pd.concat(all_results, ignore_index=True)
    # Expose the combined results as `df` so Step 7 downloads all of them
    df = combined_df
    print(f"\n✅ Total collected: {len(combined_df)} places")
    display(combined_df.head(10))
    
    # Optionally upload to BigQuery
    upload_choice = input("\nUpload to BigQuery? (yes/no): ").strip().lower()
    if upload_choice in ('yes', 'y'):
        upload_to_bigquery(combined_df)
else:
    print("No data collected")

## 📥 Step 7: Download Results as CSV (Optional)

In [None]:
# Download the results as CSV
from google.colab import files

if 'df' in locals() and df is not None:
    filename = "map_location_results.csv"
    df.to_csv(filename, index=False)
    print(f"✅ CSV file created: {filename}")
    
    # Download the file
    files.download(filename)
    print("📥 File downloaded!")
else:
    print("⚠️ No data available to download")

## 🔍 Step 8: Query BigQuery Table

In [None]:
# Query the BigQuery table to see what's stored
client = get_bigquery_client()

if client and check_table_exists():
    query = f"""
    SELECT 
        search_query,
        COUNT(*) as place_count
    FROM `{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}`
    GROUP BY search_query
    ORDER BY place_count DESC
    """
    
    print("📊 Querying BigQuery table...\n")
    
    try:
        result_df = client.query(query).to_dataframe()
        print(f"✅ Query successful! Found {len(result_df)} unique searches\n")
        display(result_df)
        
        print(f"\n📈 Total places in table: {result_df['place_count'].sum():,}")
    except Exception as e:
        print(f"❌ Query failed: {e}")
else:
    print("⚠️ Table does not exist yet. Upload data first.")

## 🔍 Step 9: View Cache Status

In [None]:
# View cached queries
print("📦 Cache Status:")
print(f"Total cached queries: {len(API_CACHE)}")

if API_CACHE:
    print("\nCached queries:")
    for query in API_CACHE.keys():
        print(f"  - {query}")
else:
    print("Cache is empty")

## 🧹 Step 10: Clear Cache (Optional)

In [None]:
# Clear the API cache
API_CACHE.clear()
print("✅ Cache cleared!")

---

## 📚 Additional Information

### How It Works:

#### **First Run (Table Creation):**
1. Run Step 6 (Option 1, 2, or 4) to collect data
2. Run Step 6, Option 3 to upload - **Table will be CREATED**
3. Schema is auto-detected from your data
4. Table: `shopper-reviews-477306.place_data.Map_location`

#### **Subsequent Runs (Append Data):**
1. Collect more data with new searches
2. Upload again - **Data will be APPENDED**
3. Places whose ID (`place_id`, `placeId`, or `id`) already exists in the table are skipped, so only new places are appended
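Both runs come down to choosing a BigQuery write disposition, which `upload_to_bigquery` does based on whether the table already exists. The helper below is a hypothetical, standalone restatement of that branch for illustration; it does not call BigQuery.

```python
from typing import Optional


def choose_write_disposition(table_exists: bool, create_if_needed: bool = True) -> Optional[str]:
    """Hypothetical helper mirroring the create-vs-append branch in upload_to_bigquery.

    Returns the write disposition string, or None when the upload should be refused.
    """
    if table_exists:
        return "WRITE_APPEND"      # subsequent runs: add rows to the existing table
    if create_if_needed:
        return "WRITE_TRUNCATE"    # first run: the load job creates and fills the table
    return None                    # table missing and creation not allowed


print(choose_write_disposition(table_exists=False))  # first run
print(choose_write_disposition(table_exists=True))   # later runs
```

`WRITE_TRUNCATE` is only chosen when the table is missing; load jobs create a missing destination table by default, so the first upload both creates the table and fills it.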

### API Information:
- **API Provider**: RapidAPI - Google Search Master Mega
- **Endpoint**: `/maps`
- **Rate Limits**: Check your RapidAPI subscription
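The notebook sends one request per query with no pause between them, so a long `place_names` list can run into your plan's rate limit. One simple mitigation is to space out the calls. The sketch below is a hypothetical helper (`fetch_with_delay` and its default one-second delay are assumptions; check your RapidAPI plan for the actual limits).

```python
import time
from typing import Callable, Dict, Iterable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_delay(
    queries: Iterable[str],
    fetch: Callable[[str], Optional[T]],
    delay_seconds: float = 1.0,
) -> Dict[str, Optional[T]]:
    """Hypothetical helper: call `fetch` once per query, pausing between calls.

    `delay_seconds` is an assumed value; choose one that fits your plan's quota.
    """
    results: Dict[str, Optional[T]] = {}
    for index, query in enumerate(queries):
        if index > 0:
            time.sleep(delay_seconds)  # space out requests to stay under the quota
        results[query] = fetch(query)
    return results


# Usage with the notebook's function (performs real API calls):
# results = fetch_with_delay(place_names, search_by_place_name, delay_seconds=1.0)
```

In this form, cached queries still wait between calls; checking `API_CACHE` before sleeping would skip the delay for them.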

### BigQuery Table Schema (Auto-detected):
Common fields include:
- `title` - Place name
- `address` - Full address
- `rating` - Average rating
- `reviews` - Number of reviews
- `openingHours` - Combined opening hours as JSON (e.g., {"Monday": "9 AM-5 PM", "Tuesday": "9 AM-5 PM"})
- `search_query` - Original search term (added by script)
- And many more fields from the API response
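Because `openingHours` is stored as a JSON string, it has to be decoded again when you want one column per day. A minimal pandas sketch on made-up rows, assuming the column holds either a JSON object string or a missing value:

```python
import json

import pandas as pd

# Made-up rows shaped like the stored table: openingHours is a JSON string or missing
stored = pd.DataFrame({
    "title": ["Cafe A", "Cafe B"],
    "openingHours": ['{"Monday": "9 AM-5 PM", "Tuesday": "9 AM-5 PM"}', None],
})

# Decode each JSON string into a dict; missing values become empty dicts
hours = stored["openingHours"].apply(lambda s: json.loads(s) if isinstance(s, str) else {})

# Build one column per day, aligned with the original rows
hours_df = pd.DataFrame(hours.tolist(), index=stored.index).add_prefix("hours_")
expanded = pd.concat([stored.drop(columns=["openingHours"]), hours_df], axis=1)
print(expanded)
```

Inside BigQuery, `JSON_VALUE(openingHours, '$.Monday')` extracts a single day directly from the string column.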

**Automatic Data Processing:**
1. **Opening Hours Combination**: All `openingHours.Monday`, `openingHours.Tuesday`, etc. columns are automatically combined into a single `openingHours` column as a JSON string with clean formatting
2. **Unicode Character Cleaning**: Special characters (\u202f, \u2013, etc.) are replaced with standard spaces and hyphens
3. **Column Name Sanitization**: Special characters (dots, spaces, etc.) are replaced with underscores
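4. **Duplicate Prevention**: Before uploading, the notebook reads the place IDs already stored in BigQuery and skips rows whose ID is already there, so only new places are uploaded. Duplicates within a single batch (for example, the same place returned by two different searches) are not removed.

These steps keep column names BigQuery-compatible, the opening hours readable, and previously stored places from being uploaded twice.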
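The four processing steps can be traced on a toy DataFrame without any API or BigQuery access. This is a condensed, standalone re-implementation for illustration, not a call into the notebook's `combine_opening_hours`, `sanitize_column_names`, or `remove_duplicate_places`; the sample columns, values, and the `existing_ids` set are made up.

```python
import json
import re

import pandas as pd

# Toy rows shaped like pd.json_normalize output (made-up values)
demo = pd.DataFrame({
    "placeId": ["p1", "p2"],
    "title": ["Cafe A", "Cafe B"],
    "openingHours.Monday": ["9\u202fAM\u20135\u202fPM", None],
    "gps coordinates.latitude": [40.7, 40.8],
})
existing_ids = {"p2"}  # stand-in for IDs already stored in BigQuery

REPLACEMENTS = {"\u202f": " ", "\u2013": "-", "\u2014": "-", "\xa0": " ", "\u2009": " "}


def clean(text):
    """Step 2: swap special spaces/dashes for plain ones, then collapse whitespace."""
    if not isinstance(text, str):
        return text
    for char, plain in REPLACEMENTS.items():
        text = text.replace(char, plain)
    return " ".join(text.split())


def combine(row):
    """Step 1: fold one row's openingHours.<Day> values into a JSON string."""
    hours = {col.split(".", 1)[1]: clean(val) for col, val in row.items()
             if pd.notna(val) and val != ""}
    return json.dumps(hours, ensure_ascii=False) if hours else None


hour_cols = [c for c in demo.columns if c.startswith("openingHours.")]
demo["openingHours"] = demo[hour_cols].apply(combine, axis=1)
demo = demo.drop(columns=hour_cols)

# Step 3: replace anything other than letters, digits, and underscores
demo.columns = [re.sub(r"[^a-zA-Z0-9_]", "_", c) for c in demo.columns]

# Step 4: keep only places whose ID is not already stored
demo = demo[~demo["placeId"].isin(existing_ids)].reset_index(drop=True)
print(demo)
```

The `p2` row is dropped because its ID is in `existing_ids`, `gps coordinates.latitude` becomes `gps_coordinates_latitude`, and the Monday hours end up as `{"Monday": "9 AM-5 PM"}`.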

### Tips:
1. Use specific search queries for better results
2. The cache prevents duplicate API calls for the same query
3. Check table status with Step 5 before uploading
4. Query your data with Step 8 to see what's stored
5. Save intermediate results to CSV as backup
6. Places already stored in BigQuery are skipped automatically based on their place ID, so re-running the same search does not create duplicate rows
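`API_CACHE` is a plain dictionary keyed by the exact query string, so it only helps when a query is repeated verbatim within the same runtime, and it is emptied by a runtime restart or Step 10. Like `search_by_place_name`, the sketch below stores only successful (non-`None`) responses. It mimics that behavior with a fake fetch function (`fake_api` and `cached` are hypothetical names) to show which calls actually reach the API:

```python
from typing import Any, Callable, Dict, List

calls: List[str] = []  # records which queries actually reached the (fake) API


def fake_api(query: str) -> Dict[str, Any]:
    calls.append(query)
    return {"places": [{"title": f"Result for {query}"}]}


def cached(fetch: Callable[[str], Any], cache: Dict[str, Any]) -> Callable[[str], Any]:
    """Hypothetical wrapper with the same exact-string keying as API_CACHE."""
    def wrapper(query: str) -> Any:
        if query in cache:
            return cache[query]
        result = fetch(query)
        if result is not None:  # only successful responses are cached
            cache[query] = result
        return result
    return wrapper


cache: Dict[str, Any] = {}
search = cached(fake_api, cache)
search("museums in Chicago")
search("museums in Chicago")   # served from the cache, no second call
search("Museums in Chicago")   # different key, so the API is called again
print(calls)
```

`calls` ends up as `['museums in Chicago', 'Museums in Chicago']`: the repeated query is served from the cache, while a change in capitalization counts as a new query.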

### Troubleshooting:
- **API errors**: Check your RapidAPI key and subscription status
- **BigQuery errors**: Verify credentials and project permissions
- **Empty results**: Try different search terms
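- **Schema errors**: The schema is auto-detected from the first upload. If a later append fails because a column's type no longer matches the table, check the column named in the error message and cast it in the DataFrame before uploading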
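When `search_by_place_name` logs a non-200 status, the code alone often points to the cause. The mapping below follows general HTTP status semantics; it is a hypothetical helper (`describe_api_status`), and RapidAPI's exact responses may differ, so read the logged `response.text` as well.

```python
def describe_api_status(status_code: int) -> str:
    """Hypothetical helper: map an HTTP status code to a likely cause and next step."""
    if status_code == 200:
        return "OK"
    if status_code in (401, 403):
        return ("Authentication or subscription problem: check RAPIDAPI_KEY "
                "and that you are subscribed to this API")
    if status_code == 429:
        return "Rate limit exceeded: slow down requests or check your plan's quota"
    if 500 <= status_code < 600:
        return "Server-side error: retry later"
    return f"Unexpected status {status_code}: inspect response.text"


for code in (200, 403, 429, 503):
    print(code, describe_api_status(code))
```

It could be called from the `else` branch of `search_by_place_name`, for example `logger.error(describe_api_status(response.status_code))`.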

---

**Created for Google Colab** | Last updated: 2025-11-05