# Geocoding Real Estate Data Using the LocationIQ API

This notebook demonstrates how to enrich real estate data by adding latitude and longitude coordinates to property listings. We use the LocationIQ Geocoding API to convert textual addresses into geographic coordinates. The process loads the listings from an Excel file, makes one API call per location, and saves the enriched data to a new Excel file.

## 1. Setting Up the Environment and Loading Data

First, we import the necessary libraries. If you don't have them installed, you can install them using `pip`:
`pip install pandas requests openpyxl`

Next, we load the real estate data from an Excel file and make sure the DataFrame has `latitude` and `longitude` columns. We also configure the LocationIQ API key and base URL. **Keep your API key out of shared notebooks and version control; reading it from an environment variable is one common way to do that.**

In [None]:
import pandas as pd
import requests
import time

# Load Excel file (make sure 'Delhi_RealEstate_Data.xlsx' is in the same directory)
file_path = "Delhi_RealEstate_Data.xlsx"
df = pd.read_excel(file_path)

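# Add latitude and longitude columns only if they are missing, so that
# coordinates already present in the file are not overwritten
for col in ("latitude", "longitude"):
    if col not in df.columns:
        df[col] = float("nan")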

# LocationIQ API configuration
# Read the key from an environment variable instead of hardcoding it in the notebook.
# The variable name LOCATIONIQ_API_KEY is a convention chosen here, not one required by LocationIQ.
import os
API_KEY = os.environ["LOCATIONIQ_API_KEY"]
BASE_URL = "https://us1.locationiq.com/v1/search"
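
The later cells read `row["location"]`, so it can help to confirm that this column exists right after loading. The following is a minimal sketch, assuming a hypothetical helper named `missing_columns` (not part of pandas) that receives the column labels, for example `list(df.columns)`. The header row in the example is invented for illustration.

```python
def missing_columns(columns, required=("location",)):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Hypothetical header row standing in for list(df.columns)
example_columns = ["title", "price", "location"]
problems = missing_columns(example_columns)
if problems:
    raise KeyError(f"Missing expected column(s): {problems}")
```

Failing early here avoids starting a long geocoding run that would stop at the first row with a `KeyError`.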

## 2. Geocoding Function Definition

This section defines two helper functions:
- `format_location`: Appends `, Gurugram, India` to the raw location string so the geocoder has city and country context, which improves accuracy.
- `geocode_address`: Sends the request to LocationIQ. It retries a failed request up to `max_retries` times with a 2-second pause between attempts, and returns `(None, None)` rather than raising when no result is found or every attempt fails.

In [None]:
def format_location(location):
    # Add city and country context to improve geocoding results
    # (assumes every listing is in Gurugram; adjust for other cities)
    return f"{location}, Gurugram, India"

def geocode_address(address, api_key, max_retries=3):
    formatted_address = format_location(address)
    print(f"\nAttempting to geocode: {formatted_address}")
    
    for attempt in range(max_retries):
        try:
            params = {
                "key": api_key,
                "q": formatted_address,
                "format": "json",
                "limit": 1
            }
            headers = {"accept": "application/json"}
            
            response = requests.get(BASE_URL, params=params, headers=headers, timeout=10)
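            # LocationIQ's documented response for "no match" is HTTP 404;
            # treat it as an empty result instead of retrying
            if response.status_code == 404:
                print(f"No results found for: {address}")
                return None, None
            response.raise_for_status() # Raise HTTPError for other bad responses (4xx or 5xx)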
            
            data = response.json()
            if data and len(data) > 0:
                print(f"Successfully geocoded: {address}")
                print(f"Found coordinates: {data[0]['lat']}, {data[0]['lon']}")
                return float(data[0]['lat']), float(data[0]['lon'])
            else:
                print(f"No results found for: {address}")
                return None, None
                
        except requests.exceptions.RequestException as e:
            if attempt == max_retries - 1:
                print(f"Failed to geocode location: {address} after {max_retries} attempts")
                print(f"Error: {str(e)}")
                return None, None
            print(f"Attempt {attempt + 1} failed for {address}, retrying...")
            time.sleep(2) # Wait before retrying

    # Reached only if max_retries is 0; keeps the return type consistent
    return None, None
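
The retry loop above waits a fixed two seconds and retries on any `requests` exception. The following is a minimal sketch of a more selective policy, assuming two hypothetical helpers, `is_retryable` and `backoff_delay` (neither is part of `requests` or of this notebook). It retries only on HTTP 429 and 5xx statuses, which are usually transient, and doubles the wait after each failed attempt up to a cap.

```python
def is_retryable(status_code):
    """Return True for statuses that are usually transient (429 or any 5xx)."""
    return status_code == 429 or 500 <= status_code < 600


def backoff_delay(attempt, base_seconds=2.0, cap_seconds=30.0):
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    return min(cap_seconds, base_seconds * (2 ** attempt))


# Delays after three consecutive failed attempts
delays = [backoff_delay(n) for n in range(3)]
print(delays)  # prints [2.0, 4.0, 8.0]
```

Inside `geocode_address`, the `except` branch could inspect `e.response` (which is `None` when no response was received) and sleep for `backoff_delay(attempt)` only when `is_retryable(e.response.status_code)` holds, giving up immediately on errors such as an invalid key.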

## 3. Executing the Geocoding Process

The loop iterates over each row of the DataFrame and calls `geocode_address` only for rows whose coordinates are still missing. After each lookup it prints the progress, and at every tenth row it writes a backup to `Real_Estate_Data_Geocoded_TEMP.xlsx` so that partial results survive if a long run is interrupted. A one-second pause (`time.sleep(1)`) follows each lookup to respect API rate limits.

In [None]:
total_rows = len(df)
print(f"Starting geocoding process for {total_rows} locations...")

for idx, row in df.iterrows():
    # Only geocode if coordinates are not already present
    if pd.isna(row["latitude"]) or pd.isna(row["longitude"]):
        lat, lng = geocode_address(row["location"], API_KEY)
        df.at[idx, "latitude"] = lat
        df.at[idx, "longitude"] = lng
        
        progress = (idx + 1) / total_rows * 100
        print(f"Progress: {progress:.1f}% ({idx + 1}/{total_rows})")
        
        # Save a backup file every 10 rows
        if (idx + 1) % 10 == 0:
            df.to_excel("Real_Estate_Data_Geocoded_TEMP.xlsx", index=False)
            print("Saved backup file")
        
        time.sleep(1) # Pause to respect API rate limits
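
If several listings share the same `location` string, the loop above sends the same query more than once. The following is a minimal sketch of one way to avoid that, assuming a hypothetical wrapper `make_cached_geocoder` (not part of this notebook or of LocationIQ) around any function that maps an address to a `(lat, lon)` tuple. The place names and coordinates in the example are placeholders.

```python
def make_cached_geocoder(geocode_fn):
    """Wrap `geocode_fn` so each distinct address is looked up only once."""
    cache = {}

    def cached(address):
        # Normalise case and surrounding whitespace so near-duplicates share a key
        key = str(address).strip().lower()
        if key not in cache:
            cache[key] = geocode_fn(address)
        return cache[key]

    return cached


# Stand-in geocoder that records its calls; a real run could pass
# something like: lambda a: geocode_address(a, API_KEY)
calls = []

def fake_geocode(address):
    calls.append(address)
    return (0.0, 0.0)  # placeholder coordinates, not real data

lookup = make_cached_geocoder(fake_geocode)
for place in ["Sector 56", "sector 56 ", "Sector 57"]:
    lookup(place)
print(len(calls))  # prints 2
```

This sketch also caches `(None, None)` results, so a lookup that failed once is not attempted again for later rows with the same location.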

## 4. Geocoding Summary and Final Save

After the loop completes, this cell prints a summary: the total number of locations, how many received a latitude (counted as successes), and how many did not. Finally, the complete DataFrame, including the new coordinate columns, is saved to `Real_Estate_Data_Geocoded.xlsx`.

In [None]:
successful_geocodes = df["latitude"].notna().sum()
print("\nGeocoding Summary:")
print(f"Total locations: {total_rows}")
print(f"Successfully geocoded: {successful_geocodes}")
print(f"Failed to geocode: {total_rows - successful_geocodes}")

# Final save of the complete geocoded data
df.to_excel("Real_Estate_Data_Geocoded.xlsx", index=False)
print("Geocoding complete. File saved as Real_Estate_Data_Geocoded.xlsx")
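
A non-null latitude only shows that the API returned a value. The following is a minimal sketch of an extra sanity check, assuming a hypothetical helper `is_valid_coordinate` (not part of pandas or LocationIQ) that tests whether a pair falls inside the valid ranges for latitude (-90 to 90) and longitude (-180 to 180).

```python
import math


def is_valid_coordinate(lat, lon):
    """Return True if both values are finite numbers within valid lat/lon ranges."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        # None or non-numeric text cannot be a coordinate
        return False
    if math.isnan(lat) or math.isnan(lon):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# Illustrative values only: a plausible pair, a missing pair, an out-of-range pair
checks = [is_valid_coordinate(10.0, 20.0),
          is_valid_coordinate(None, None),
          is_valid_coordinate(95.0, 20.0)]
print(checks)  # prints [True, False, False]
```

Applied row by row to the `latitude` and `longitude` columns, a check like this could flag rows to review before the file is used downstream.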