# 🗺️ Map Location Data Collector for Google Colab

This notebook collects place data from a RapidAPI maps search endpoint and uploads it to BigQuery.

## 📋 Setup Instructions:

1. **Set up Colab Secrets** (🔑 icon in the left sidebar):
   - `RAPIDAPI_KEY` - Your RapidAPI key
   - `BIGQUERY_KEY_JSON` - Your BigQuery service account JSON (as a string)

2. **Run cells in order**
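
Before running the cells, it can help to confirm that the `BIGQUERY_KEY_JSON` secret really holds a complete service account key pasted as one JSON string. The sketch below is not part of the notebook: `check_service_account_json` is a hypothetical helper, and the list of expected fields is an assumption based on what a typical service account key file contains.

```python
import json
from typing import List

# Fields a typical service account key file contains (assumed, not exhaustive).
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_service_account_json(raw: str) -> List[str]:
    """Return a list of problems found in the secret; an empty list means it looks usable."""
    try:
        info = json.loads(raw)
    except (TypeError, ValueError) as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]
    # Report every expected field that is absent or empty
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if not info.get(name)]
    if info.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {info['type']!r}")
    return problems


# Example with a deliberately incomplete, fake key
sample = '{"type": "service_account", "project_id": "demo-project"}'
for problem in check_service_account_json(sample):
    print(problem)
```

In Colab, the same check could be run on `userdata.get('BIGQUERY_KEY_JSON')` once the secret has been added.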

---

## 📦 Step 1: Install Required Libraries

In [None]:
!pip install -q pandas-gbq google-auth google-cloud-bigquery db-dtypes

## 📚 Step 2: Import Libraries

In [None]:
import requests
import pandas as pd
import json
from typing import Optional, Dict, Any, List
from google.colab import userdata
from google.oauth2 import service_account
from google.cloud import bigquery
from IPython.display import display
import warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully!")

## ⚙️ Step 3: Configuration

In [None]:
# Configuration - Update these if needed
PROJECT_ID = 'shopper-reviews-477306'
DATASET_ID = 'place_data'
TABLE_ID = 'Map_location'

# Colab Secret names
RAPIDAPI_KEY_SECRET = 'RAPIDAPI_KEY'
BIGQUERY_KEY_SECRET = 'BIGQUERY_KEY_JSON'

# In-memory cache for API calls
API_CACHE: Dict[str, Any] = {}

print("✅ Configuration loaded!")
print(f"   Project: {PROJECT_ID}")
print(f"   Dataset: {DATASET_ID}")
print(f"   Table: {TABLE_ID}")

## 🔍 Step 4: Define Data Collection Functions

In [None]:
def search_by_place_name(place_name: str) -> Optional[Dict[str, Any]]:
    """
    Fetches data for a single place from RapidAPI.
    Uses Colab Secrets for API key.
    """
    # Check cache first
    if place_name in API_CACHE:
        print(f"📦 Loading '{place_name}' from cache")
        return API_CACHE[place_name]

    print(f"🔍 Searching for '{place_name}'...")

    # Get API key from Colab Secrets
    try:
        API_KEY = userdata.get(RAPIDAPI_KEY_SECRET)
    except Exception as e:
        # Include the underlying error so a missing secret and a denied access look different
        print(f"❌ Error: Could not get '{RAPIDAPI_KEY_SECRET}' from Colab Secrets ({e})")
        print("   Make sure to add it in the Secrets panel (🔑 icon)")
        return None

    API_HOST = "google-search-master-mega.p.rapidapi.com"
    url = f"https://{API_HOST}/maps"

    querystring = {"q": place_name, "hl": "en", "page": "1"}
    headers = {
        "x-rapidapi-key": API_KEY,
        "x-rapidapi-host": API_HOST
    }

    try:
        response = requests.get(url, headers=headers, params=querystring, timeout=10)

        if response.status_code == 200:
            data = response.json()
            API_CACHE[place_name] = data  # Cache the result
            print(f"✅ Found data for '{place_name}'")
            return data
        else:
            print(f"❌ API Error: Status code {response.status_code}")
            print(f"   Response: {response.text[:200]}")
            return None

    except requests.exceptions.RequestException as e:
        print(f"❌ Request error: {e}")
        return None


def collect_places_for_query(query: str) -> Optional[pd.DataFrame]:
    """
    Collects and normalizes place data for a single query.
    """
    results_data = search_by_place_name(query)

    if results_data and 'places' in results_data and results_data['places']:
        try:
            df = pd.json_normalize(results_data['places'])
            df['search_query'] = query
            print(f"✅ Collected {len(df)} places for '{query}'")
            return df
        except Exception as e:
            print(f"❌ Error processing data: {e}")
            return None
    else:
        print(f"⚠️ No places found for '{query}'")
        return None


def run_data_collection_loop() -> Optional[pd.DataFrame]:
    """
    Interactive loop to collect user queries.
    Type 'exit' to stop collecting.
    """
    all_dataframes_list: List[pd.DataFrame] = []

    print("\n" + "="*60)
    print("🗺️  PLACE SEARCHER")
    print("="*60)
    print("Type place names to search. Type 'exit' when done.\n")

    while True:
        try:
            query = input("\n🔍 Enter place name: ").strip()
        except KeyboardInterrupt:
            print("\n⏹️ Stopping data collection...")
            break

        if query.lower() == 'exit':
            print("⏹️ Exiting data collection...")
            break

        if query:
            df = collect_places_for_query(query)
            if df is not None:
                all_dataframes_list.append(df)
                print(f"📊 Total queries collected: {len(all_dataframes_list)}")
        else:
            print("⚠️ Please enter a place name")

    if not all_dataframes_list:
        print("\n⚠️ No data was collected")
        return None

    return pd.concat(all_dataframes_list, ignore_index=True)


print("✅ Data collection functions defined!")
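
`search_by_place_name` makes a single attempt and returns `None` on any failure. For transient problems such as timeouts, one common option is to retry with exponential backoff. The sketch below is an illustration, not part of the notebook: `call_with_retries` and its parameters are hypothetical, and it takes a zero-argument callable so it can be tried without network access.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fetch: Callable[[], Optional[T]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() until it returns a non-None value or attempts run out.

    Waits base_delay, then 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        # Only sleep if another attempt will follow
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None


# Demo with a fake fetcher that fails twice, then succeeds (no network involved)
outcomes = iter([None, None, {"places": []}])
delays = []
result = call_with_retries(lambda: next(outcomes), sleep=delays.append)
print(result, delays)
```

In the notebook this could wrap a call such as `call_with_retries(lambda: search_by_place_name("Pizza New York"))`. Because `search_by_place_name` also returns `None` for errors that will not go away on their own (for example an invalid API key), a retry wrapper like this would repeat those calls too.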

## 📊 Step 5: COLLECT DATA (Interactive)

Run this cell and enter place names one by one. Type `exit` when done.

In [None]:
# Collect data interactively
collected_data_df = run_data_collection_loop()

# Display results
if collected_data_df is not None and not collected_data_df.empty:
    print("\n" + "="*60)
    print("✅ DATA COLLECTION COMPLETE")
    print("="*60)
    print(f"📊 Total places collected: {len(collected_data_df)}")
    print(f"📋 Total columns: {len(collected_data_df.columns)}")
    print("\n🔍 Preview (first 5 rows):")
    display(collected_data_df.head())
    print("\n✅ Data is ready for upload!")
else:
    print("\n❌ No data to process")

## 🔄 Step 5b: ALTERNATIVE - Batch Collection (Optional)

Use this instead of Step 5 if you want to process multiple places at once without interactive input.

In [None]:
# OPTIONAL: Batch mode - Define your places here
places_to_search = [
    "Pizza New York",
    "Sushi Tokyo",
    "Coffee Shop Paris",
    # Add more places here...
]

# Collect data
all_dataframes_list = []
for place in places_to_search:
    df = collect_places_for_query(place)
    if df is not None:
        all_dataframes_list.append(df)

if all_dataframes_list:
    collected_data_df = pd.concat(all_dataframes_list, ignore_index=True)
    print("\n✅ Batch collection complete!")
    print(f"📊 Total places collected: {len(collected_data_df)}")
    display(collected_data_df.head())
else:
    print("❌ No data collected")
    collected_data_df = None
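
`API_CACHE` is keyed on the exact query string, so two batch entries that differ only in letter case or spacing trigger separate API calls. A minimal sketch of cleaning the list first, assuming the search treats such variants as the same query; `unique_queries` is a hypothetical helper, not part of the notebook.

```python
from typing import Iterable, List


def unique_queries(queries: Iterable[str]) -> List[str]:
    """Drop blank entries and case/whitespace duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for query in queries:
        cleaned = " ".join(query.split())  # trim and collapse inner whitespace
        key = cleaned.casefold()           # case-insensitive comparison key
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


places = ["Pizza New York", "pizza  new york ", "", "Sushi Tokyo"]
print(unique_queries(places))
```

The batch loop could then iterate over `unique_queries(places_to_search)` instead of the raw list.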

## 🔧 Step 6: Define BigQuery Upload Functions

In [None]:
def get_bigquery_client() -> Optional[bigquery.Client]:
    """
    Creates BigQuery client using Colab Secrets.
    """
    try:
        # Get credentials JSON from Colab Secrets
        credentials_json = userdata.get(BIGQUERY_KEY_SECRET)
        credentials_dict = json.loads(credentials_json)
        
        # Create credentials object
        credentials = service_account.Credentials.from_service_account_info(
            credentials_dict,
            scopes=["https://www.googleapis.com/auth/cloud-platform"],
        )
        
        # Create BigQuery client
        client = bigquery.Client(credentials=credentials, project=PROJECT_ID)
        print(f"✅ Connected to BigQuery project: {PROJECT_ID}")
        return client

    except Exception as e:
        print(f"❌ Error creating BigQuery client: {e}")
        print(f"   Make sure '{BIGQUERY_KEY_SECRET}' is set in Colab Secrets")
        return None


def upload_to_bigquery(df: pd.DataFrame, table_id: Optional[str] = None) -> bool:
    """
    Uploads DataFrame to BigQuery.
    """
    if df is None or df.empty:
        print("⚠️ Cannot upload empty DataFrame")
        return False

    client = get_bigquery_client()
    if not client:
        return False

    table_id = table_id or f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"

    job_config = bigquery.LoadJobConfig(
        write_disposition="WRITE_APPEND",  # Append to existing table
        autodetect=True,  # Auto-detect schema
    )

    try:
        print(f"\n⏳ Uploading {len(df)} rows to BigQuery...")
        print(f"   Table: {table_id}")

        job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
        job.result()  # Wait for upload to complete

        print(f"\n✅ Successfully uploaded {len(df)} rows!")
        print(f"   View data at: https://console.cloud.google.com/bigquery?project={PROJECT_ID}")
        return True

    except Exception as e:
        print(f"❌ Upload error: {e}")
        return False


print("✅ BigQuery functions defined!")
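
`pd.json_normalize` joins nested field names with a dot by default, so a nested object can yield a column label like `location.lat`. BigQuery column names are conventionally limited to letters, digits, and underscores, so such labels may need cleaning before upload. The sketch below is one possible approach: `to_bigquery_column` is a hypothetical helper, and the sample labels are invented for illustration rather than taken from the API response.

```python
import re


def to_bigquery_column(name: str) -> str:
    """Convert an arbitrary column label to letters, digits, and underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", str(name))
    if not cleaned or cleaned[0].isdigit():
        cleaned = f"_{cleaned}"  # avoid an empty name or a leading digit
    return cleaned


# Hypothetical labels such as json_normalize might produce from nested fields
columns = ["title", "location.lat", "opening hours", "24h"]
print([to_bigquery_column(c) for c in columns])
```

In the notebook, this could be applied as `df.columns = [to_bigquery_column(c) for c in df.columns]` before calling `upload_to_bigquery(df)`. Passing `sep="_"` to `pd.json_normalize` is an alternative for the dot case alone. Two different labels can map to the same cleaned name, so a real pipeline would need to check for collisions.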

## ☁️ Step 7: Upload to BigQuery

Upload the collected data to BigQuery.

In [None]:
# Upload to BigQuery
if collected_data_df is not None and not collected_data_df.empty:
    print("="*60)
    print("☁️  UPLOADING TO BIGQUERY")
    print("="*60)

    success = upload_to_bigquery(collected_data_df)

    if success:
        print("\n🎉 Upload complete! Your data is now in BigQuery.")
    else:
        print("\n⚠️ Upload failed. Check the error messages above.")
else:
    print("❌ No data to upload. Run Step 5 first to collect data.")

## 💾 Step 8 (Optional): Save to CSV

Download the data as a CSV file instead of, or in addition to, uploading it to BigQuery.

In [None]:
# Save to CSV file
if collected_data_df is not None and not collected_data_df.empty:
    filename = "map_locations_data.csv"
    collected_data_df.to_csv(filename, index=False)
    print(f"✅ Data saved to: {filename}")
    print(f"📊 Rows: {len(collected_data_df)} | Columns: {len(collected_data_df.columns)}")

    # Download the file
    from google.colab import files
    files.download(filename)
    print(f"⬇️ Download started for {filename}")
else:
    print("❌ No data to save")

## 📊 Step 9 (Optional): View Data Summary

Get a detailed summary of the collected data.

In [None]:
# Data summary
if collected_data_df is not None and not collected_data_df.empty:
    print("="*60)
    print("📊 DATA SUMMARY")
    print("="*60)

    print(f"\n📈 Shape: {collected_data_df.shape[0]} rows × {collected_data_df.shape[1]} columns")

    print("\n📋 Column names:")
    for i, col in enumerate(collected_data_df.columns, 1):
        print(f"   {i}. {col}")

    print("\n🔢 Data types:")
    print(collected_data_df.dtypes)

    print("\n📊 Full data preview:")
    display(collected_data_df)
else:
    print("❌ No data available")

---

## 🎉 Done!

### What you did:
1. ✅ Collected location data from RapidAPI
2. ✅ Processed and normalized the data
3. ✅ Uploaded to BigQuery (or saved as CSV)

### Next steps:
- Run Step 5 again to collect more data
- Query your data in BigQuery
- Analyze your location data!
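
As a starting point for querying the data, every uploaded row carries the `search_query` column added in Step 4, so one simple check is to count stored places per query. The sketch below only builds the SQL text; `build_summary_query` is a hypothetical helper, and the project, dataset, and table names are the ones from the Step 3 configuration.

```python
def build_summary_query(project: str, dataset: str, table: str) -> str:
    """Return SQL that counts stored rows per search query."""
    return (
        "SELECT search_query, COUNT(*) AS place_count\n"
        f"FROM `{project}.{dataset}.{table}`\n"
        "GROUP BY search_query\n"
        "ORDER BY place_count DESC"
    )


sql = build_summary_query("shopper-reviews-477306", "place_data", "Map_location")
print(sql)
```

In the notebook, the string could be run with the client from Step 6, for example `client = get_bigquery_client()` followed by `client.query(sql).to_dataframe()`, after checking that `client` is not `None`.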

---