# Philippine Rental Property Scraper - Lamudi.com.ph

This notebook scrapes rental property listings from Lamudi Philippines for areas within commuting distance of Ortigas Center, Mandaluyong.

## Target Criteria:
- **Location**: Mandaluyong, Pasig, San Juan, Quezon City, Makati, Taguig
- **Property Types**: Condos and Apartments
- **Budget**: Under ₱20,000/month
- **Bedrooms**: Studio and 1BR (for solo living)
- **Furnishing**: Furnished units

## Workflow:
1. **Phase 1**: Scrape property listing links from search pages
2. **Phase 2**: Extract detailed information from individual property pages
3. **Phase 3**: Clean and process the data
4. **Phase 4**: Export to CSV for analysis
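
As a rough illustration of Phase 1, the sketch below expands a configuration into one search URL per city, property type, and page. The URL pattern (including `?page=N` for later pages) is the one this notebook assumes for Lamudi and has not been verified against the live site; `build_search_url` and `build_tasks` are hypothetical helper names used only here.

```python
from itertools import product

BASE_URL = "https://www.lamudi.com.ph/rent"  # pattern assumed by this notebook


def build_search_url(region, city, prop_type, page=1):
    """Return the search-results URL for one city/property-type/page combination."""
    url = f"{BASE_URL}/{region}/{city}/{prop_type}/"
    # Page 1 has no query string; later pages append ?page=N
    return url if page == 1 else f"{url}?page={page}"


def build_tasks(region, cities, prop_types, pages_per_combo):
    """Expand the configuration into (city, prop_type, page, url) tuples.

    pages_per_combo maps (city, prop_type) to the last page number;
    combinations that are missing from it default to a single page.
    """
    tasks = []
    for city, prop_type in product(cities, prop_types):
        last_page = pages_per_combo.get((city, prop_type), 1)
        for page in range(1, last_page + 1):
            url = build_search_url(region, city, prop_type, page)
            tasks.append((city, prop_type, page, url))
    return tasks


tasks = build_tasks(
    "metro-manila", ["pasig", "makati"], ["condo"], {("pasig", "condo"): 2}
)
for task in tasks:
    print(task[3])
```

Keeping URL construction in one small function means a change in the site's URL scheme only needs to be fixed in one place.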

## Setup: Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import time
import re
import random
import concurrent.futures
from datetime import datetime

# Selenium imports
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# BeautifulSoup for HTML parsing
from bs4 import BeautifulSoup

print("✅ All libraries imported successfully!")

## Configuration: Define Search Parameters

In [None]:
# ========== USER CONFIGURATION ==========

# Target cities (within 1 hour commute to Ortigas Center)
CITIES = [
    'mandaluyong',    # 0-5 min from Ortigas
    'pasig',          # 0-10 min from Ortigas
    'san-juan',       # 5-15 min from Ortigas
    'quezon-city',    # 10-30 min from Ortigas
    'makati',         # 20-35 min from Ortigas
    'taguig',         # 20-40 min from Ortigas
]

# Property types
PROPERTY_TYPES = [
    'condo',
    'apartment',
]

# Bedroom options (for solo living)
BEDROOMS = ['studio', '1-bedroom']

# Budget constraint
MAX_PRICE = 20000  # PHP per month

# Region (Metro Manila)
REGION = 'metro-manila'

# Output path for CSV files
OUTPUT_PATH = 'C:/Users/anhpd/OneDrive/Desktop/projects/phillipine-rental-price/data'

# Scraping settings
MAX_WORKERS = 3  # Reduced from 5 to be more conservative
HEADLESS = True  # Set to False to see browser during scraping
PAGE_LOAD_WAIT = 4  # Seconds to wait for page to load

print("✅ Configuration loaded!")
print(f"   📍 Cities: {', '.join(CITIES)}")
print(f"   🏢 Property Types: {', '.join(PROPERTY_TYPES)}")
print(f"   💰 Max Budget: ₱{MAX_PRICE:,}/month")
print(f"   🛏️  Bedrooms: {', '.join(BEDROOMS)}")
print(f"   👷 Workers: {MAX_WORKERS}")
print(f"   📁 Output: {OUTPUT_PATH}")

## Test Cell: Inspect Lamudi Page Structure

**Run this first** to identify the correct CSS selectors for Lamudi's current HTML structure.

In [None]:
def inspect_lamudi_structure(city='pasig', prop_type='condo'):
    """
    Opens a Lamudi search page and inspects its HTML structure.
    This helps us identify the correct CSS selectors.
    """
    url = f'https://www.lamudi.com.ph/rent/{REGION}/{city}/{prop_type}/'
    print(f"🔍 Inspecting: {url}")
    
    driver = None
    try:
        # Setup Chrome options
        chrome_options = Options()
        if HEADLESS:
            chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        chrome_options.add_argument('--disable-blink-features=AutomationControlled')
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option('useAutomationExtension', False)
        
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
        driver.get(url)
        time.sleep(PAGE_LOAD_WAIT)
        
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        
        print("\n📄 Page Title:", driver.title)
        print("\n🔍 Looking for common patterns...\n")
        
        # Try to find listing containers
        possible_containers = [
            soup.find_all('div', class_=re.compile(r'listing|property|card|item', re.I)),
            soup.find_all('article'),
            soup.find_all('a', href=re.compile(r'/.*/'))
        ]
        
        print("📦 Found potential listing containers:")
        for i, containers in enumerate(possible_containers):
            print(f"   Pattern {i+1}: {len(containers)} elements")
        
        # Look for links that might be property links
        all_links = soup.find_all('a', href=True)
        property_links = [link for link in all_links if '/rent/' in link.get('href', '')]
        
        print(f"\n🔗 Found {len(property_links)} links containing '/rent/'")
        
        if property_links:
            print("\n📋 Sample property links:")
            for link in property_links[:3]:
                print(f"   - {link.get('href')}")
                print(f"     Classes: {link.get('class')}")
                print(f"     Data attributes: {[k for k in link.attrs.keys() if k.startswith('data-')]}")
        
        # Look for pagination
        pagination = soup.find_all(['nav', 'div'], class_=re.compile(r'paginat', re.I))
        print(f"\n📄 Pagination elements found: {len(pagination)}")
        
        # Save sample HTML for manual inspection
        with open(f'{OUTPUT_PATH}/sample_page.html', 'w', encoding='utf-8') as f:
            f.write(soup.prettify())
        print(f"\n💾 Full HTML saved to: {OUTPUT_PATH}/sample_page.html")
        
        return soup
        
    except Exception as e:
        print(f"❌ Error: {e}")
        return None
    finally:
        if driver:
            driver.quit()

# Run the inspection
print("🚀 Starting inspection...\n")
soup = inspect_lamudi_structure()
print("\n✅ Inspection complete! Check the output above to identify CSS selectors.")

## Helper Functions: Page Range Detection and Link Extraction

In [None]:
def get_page_range(city, prop_type):
    """
    Detects the number of pages available for a given city/property type combination.
    Returns (min_page, max_page)
    """
    url = f'https://www.lamudi.com.ph/rent/{REGION}/{city}/{prop_type}/'
    print(f"🔍 Detecting page range for: {city}/{prop_type}")
    
    driver = None
    try:
        chrome_options = Options()
        if HEADLESS:
            chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        chrome_options.add_argument('--disable-blink-features=AutomationControlled')
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
        driver.get(url)
        time.sleep(PAGE_LOAD_WAIT)
        
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        
        # TODO: Update these selectors based on inspection results
        # Look for pagination elements
        pagination = soup.find('nav', class_=re.compile(r'paginat', re.I))
        
        if not pagination:
            print("   ⚠️  No pagination found, assuming single page")
            return (1, 1)
        
        # Extract page numbers from pagination
        page_links = pagination.find_all('a', href=True)
        page_numbers = []
        
        for link in page_links:
            # Try to extract page number from href (e.g., /rent/metro-manila/pasig/condo/?page=2)
            match = re.search(r'page=(\d+)', link.get('href', ''))
            if match:
                page_numbers.append(int(match.group(1)))
        
        if page_numbers:
            min_page = 1  # Always start from page 1
            max_page = max(page_numbers)
            print(f"   ✅ Pages: 1 to {max_page}")
            return (min_page, max_page)
        else:
            print("   ⚠️  Could not parse page numbers, assuming single page")
            return (1, 1)
            
    except Exception as e:
        print(f"   ❌ Error: {e}")
        return (1, 1)
    finally:
        if driver:
            driver.quit()


def extract_links_from_soup(soup, city, prop_type):
    """
    Extracts property links and IDs from a BeautifulSoup object.
    Returns a DataFrame with columns: property_id, url, city, property_type
    """
    # TODO: Update these selectors based on inspection results
    # Look for property listing links
    
    # Try multiple patterns to find property links
    property_links = []
    
    # Pattern 1: Links with specific data attributes (adjust based on inspection)
    links_with_data = soup.find_all('a', attrs={'data-property-id': True})
    property_links.extend(links_with_data)
    
    # Pattern 2: Links in listing containers (adjust class name based on inspection)
    listings = soup.find_all(['div', 'article'], class_=re.compile(r'listing|property|card', re.I))
    for listing in listings:
        link = listing.find('a', href=True)
        if link and '/rent/' in link.get('href', ''):
            property_links.append(link)
    
    # Pattern 3: All links containing '/rent/' in href
    if not property_links:
        all_links = soup.find_all('a', href=re.compile(r'/rent/.*'))
        property_links.extend(all_links)
    
    # Extract data
    data = []
    seen_urls = set()
    
    for link in property_links:
        href = link.get('href', '')
        if not href:
            continue
        
        # Make the URL absolute before the duplicate check, so relative and
        # absolute forms of the same link are treated as one entry
        if href.startswith('/'):
            href = f'https://www.lamudi.com.ph{href}'
        
        # Skip URLs that were already collected
        if href in seen_urls:
            continue
        
        # Extract property ID from URL or data attribute
        prop_id = link.get('data-property-id') or link.get('id') or re.search(r'/(\d+)/?$', href)
        if prop_id and hasattr(prop_id, 'group'):
            prop_id = prop_id.group(1)
        
        data.append({
            'property_id': str(prop_id) if prop_id else None,
            'url': href,
            'city': city,
            'property_type': prop_type
        })
        seen_urls.add(href)
    
    return pd.DataFrame(data)


print("✅ Helper functions defined!")

## Phase 1: Scrape Property Links

In [None]:
def scrape_links_from_page(city, prop_type, page_number):
    """
    Scrapes property links from a single search results page.
    """
    # Build URL with page number
    if page_number == 1:
        url = f'https://www.lamudi.com.ph/rent/{REGION}/{city}/{prop_type}/'
    else:
        url = f'https://www.lamudi.com.ph/rent/{REGION}/{city}/{prop_type}/?page={page_number}'
    
    print(f"🚀 Scraping: {city}/{prop_type} - Page {page_number}")
    
    driver = None
    max_retries = 2
    retry_count = 0
    
    while retry_count <= max_retries:
        try:
            chrome_options = Options()
            if HEADLESS:
                chrome_options.add_argument('--headless')
            chrome_options.add_argument('--no-sandbox')
            chrome_options.add_argument('--disable-dev-shm-usage')
            chrome_options.add_argument('--disable-gpu')
            chrome_options.add_argument('--log-level=3')
            chrome_options.add_argument('--disable-blink-features=AutomationControlled')
            chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
            chrome_options.add_experimental_option('useAutomationExtension', False)
            
            driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
            driver.get(url)
            time.sleep(PAGE_LOAD_WAIT)
            
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            df = extract_links_from_soup(soup, city, prop_type)
            
            if not df.empty:
                print(f"   ✅ Found {len(df)} properties")
                return df
            else:
                print("   ⚠️  No properties found")
                return pd.DataFrame()
                
        except Exception as e:
            retry_count += 1
            print(f"   ❌ Error (attempt {retry_count}/{max_retries + 1}): {e}")
            if retry_count <= max_retries:
                print("   🔄 Retrying...")
                time.sleep(3)
            else:
                return pd.DataFrame()
        finally:
            if driver:
                driver.quit()


print("✅ Link scraping function defined!")

## Execute Phase 1: Collect All Property Links

In [None]:
print("=" * 60)
print("PHASE 1: COLLECTING PROPERTY LINKS")
print("=" * 60)

start_time = time.time()

# Step 1: Detect page ranges for all city/property combinations
print("\n🔍 Step 1: Detecting page ranges...\n")
task_ranges = {}

for city in CITIES:
    for prop_type in PROPERTY_TYPES:
        min_page, max_page = get_page_range(city, prop_type)
        task_ranges[(city, prop_type)] = (min_page, max_page)
        time.sleep(1)  # Brief pause between page range checks

# Step 2: Create list of all scraping tasks
print("\n📋 Step 2: Building task list...\n")
tasks = []

for (city, prop_type), (min_page, max_page) in task_ranges.items():
    for page_num in range(min_page, max_page + 1):
        tasks.append((city, prop_type, page_num))

# Randomize task order to avoid patterns
random.shuffle(tasks)
print(f"✅ Created {len(tasks)} scraping tasks (randomized)\n")

# Step 3: Execute tasks concurrently
print("\n🏃 Step 3: Scraping links...\n")

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    results = executor.map(lambda p: scrape_links_from_page(*p), tasks)
    df_list = [df for df in results if df is not None and not df.empty]

# Step 4: Combine and deduplicate
print("\n🔗 Step 4: Combining results...\n")

if df_list:
    links_df = pd.concat(df_list, ignore_index=True)
    rows_before = len(links_df)
    links_df = links_df.drop_duplicates(subset=['url']).reset_index(drop=True)
    rows_after = len(links_df)
    duplicates_removed = rows_before - rows_after
    
    print(f"✅ Combined {len(df_list)} result sets")
    print(f"🧹 Removed {duplicates_removed} duplicates")
    print(f"📊 Total unique properties: {rows_after}")
    
    # Show breakdown by city and property type
    print("\n📍 Breakdown by location and type:")
    for city in CITIES:
        city_total = len(links_df[links_df['city'] == city])
        if city_total > 0:
            print(f"\n   {city.upper()}: {city_total} properties")
            for prop_type in PROPERTY_TYPES:
                count = len(links_df[(links_df['city'] == city) & (links_df['property_type'] == prop_type)])
                if count > 0:
                    print(f"      - {prop_type}: {count}")
    
    # Save links to CSV
    links_file = f"{OUTPUT_PATH}/property_links_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
    links_df.to_csv(links_file, index=False)
    print(f"\n💾 Links saved to: {links_file}")
    
else:
    print("⚠️  No links were collected")
    links_df = pd.DataFrame()

elapsed = time.time() - start_time
print(f"\n⏱️  Phase 1 completed in {elapsed:.2f} seconds")
print("=" * 60)

## Phase 2: Scrape Detailed Property Information
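
The price extraction in this phase comes down to finding a peso sign or "PHP" followed by a number. A minimal standalone sketch of that idea is shown here; `parse_price` is a hypothetical helper, and the sample strings are made up rather than taken from real listings.

```python
import re

# A peso sign or "PHP", optional whitespace, then digits with optional
# thousands separators and an optional decimal part.
PRICE_RE = re.compile(r"(?:₱|PHP)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)


def parse_price(text):
    """Return the first peso amount found in text as a float, or None."""
    if not text:
        return None
    match = PRICE_RE.search(text)
    if not match:
        return None
    return float(match.group(1).replace(",", ""))


for sample in ["₱ 18,500 / month", "PHP15,000", "Call for price"]:
    print(repr(sample), "->", parse_price(sample))
```

Anchoring the number to the currency marker avoids picking up unrelated digits such as floor areas or phone numbers that appear elsewhere in the same text.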

In [None]:
def scrape_property_details(url):
    """
    Scrapes detailed information from a single property listing page.
    Returns a single-row DataFrame with all extracted data.
    """
    print(f"🏠 Scraping: {url}")
    
    driver = None
    try:
        chrome_options = Options()
        if HEADLESS:
            chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        chrome_options.add_argument('--disable-blink-features=AutomationControlled')
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
        driver.get(url)
        time.sleep(PAGE_LOAD_WAIT)
        
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        
        # Helper function to safely extract text
        def get_text(selector, attrs=None, default=None):
            element = soup.find(selector, attrs=attrs) if attrs else soup.find(selector)
            return element.get_text(strip=True) if element else default
        
        # TODO: Update these selectors based on actual Lamudi structure
        # You'll need to inspect the property detail pages to find the correct selectors
        
        data = {
            'url': url,
            'title': get_text('h1'),  # Adjust selector
            'price': None,  # Will be extracted and parsed below
            'bedrooms': None,
            'bathrooms': None,
            'floor_area_sqm': None,
            'furnishing': None,
            'address': None,
            'location': None,
            'description': None,
            'latitude': None,
            'longitude': None,
            'amenities': None,
            'parking': None,
        }
        
        # Extract price (look for the peso sign or "PHP" followed by digits)
        price_pattern = re.compile(r'(?:₱|PHP)\s*([\d,]+)')
        price_elem = soup.find(string=price_pattern)
        if price_elem:
            price_match = price_pattern.search(str(price_elem))
            if price_match:
                data['price'] = price_match.group(1).replace(',', '')
        
        # Extract specifications (bedrooms, bathrooms, area)
        # Look for common patterns in property specs
        specs = soup.find_all(['div', 'span', 'li'], class_=re.compile(r'spec|feature|detail', re.I))
        for spec in specs:
            text = spec.get_text(strip=True).lower()
            
            if 'bedroom' in text:
                match = re.search(r'(\d+)', text)
                if match:
                    data['bedrooms'] = match.group(1)
            
            if 'bathroom' in text or 'bath' in text:
                match = re.search(r'(\d+)', text)
                if match:
                    data['bathrooms'] = match.group(1)
            
            if 'sqm' in text or 'm²' in text or 'floor area' in text:
                match = re.search(r'([\d,\.]+)', text)
                if match:
                    data['floor_area_sqm'] = match.group(1).replace(',', '')
            
            if 'furnish' in text:
                data['furnishing'] = text
            
            if 'parking' in text:
                data['parking'] = text
        
        # Extract coordinates from script tags
        scripts = soup.find_all('script')
        for script in scripts:
            if script.string and 'lat' in script.string:
                # \b avoids matching keys that merely end in "lat" (e.g. "flat")
                lat_match = re.search(r'\blat(?:itude)?["\']?\s*[:=]\s*(-?\d+(?:\.\d+)?)', script.string)
                lon_match = re.search(r'\b(?:lng|lon(?:g(?:itude)?)?)["\']?\s*[:=]\s*(-?\d+(?:\.\d+)?)', script.string)
                
                if lat_match and lon_match:
                    data['latitude'] = lat_match.group(1)
                    data['longitude'] = lon_match.group(1)
                    break
        
        # Extract description
        desc_elem = soup.find(['div', 'p'], class_=re.compile(r'description|detail', re.I))
        if desc_elem:
            data['description'] = desc_elem.get_text(strip=True)[:500]  # Limit length
        
        # Extract address/location
        addr_elem = soup.find(['div', 'span', 'p'], class_=re.compile(r'address|location', re.I))
        if addr_elem:
            data['address'] = addr_elem.get_text(strip=True)
        
        # 'title' is always a key in data but may be None, so fall back explicitly
        print(f"   ✅ Extracted: {(data.get('title') or 'Unknown')[:50]}...")
        
        # Return as single-row DataFrame
        return pd.DataFrame([data])
        
    except Exception as e:
        print(f"   ❌ Error: {e}")
        return pd.DataFrame()
    finally:
        if driver:
            driver.quit()


print("✅ Property details scraping function defined!")

## Execute Phase 2: Collect Property Details

In [None]:
print("=" * 60)
print("PHASE 2: COLLECTING PROPERTY DETAILS")
print("=" * 60)

if links_df.empty:
    print("⚠️  No links available. Please run Phase 1 first.")
else:
    start_time = time.time()
    
    # Get list of URLs to scrape
    urls_to_scrape = links_df['url'].tolist()
    print(f"📋 Will scrape {len(urls_to_scrape)} property pages\n")
    
    # Execute scraping with thread pool
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        results = executor.map(scrape_property_details, urls_to_scrape)
        details_list = [df for df in results if df is not None and not df.empty]
    
    # Combine results
    if details_list:
        details_df = pd.concat(details_list, ignore_index=True)
        print(f"\n✅ Successfully scraped {len(details_df)} properties")
        
        # Merge with links data
        final_df = pd.merge(details_df, links_df, on='url', how='left')
        
        # Save raw details
        raw_file = f"{OUTPUT_PATH}/property_details_raw_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
        final_df.to_csv(raw_file, index=False, encoding='utf-8-sig')
        print(f"💾 Raw details saved to: {raw_file}")
        
    else:
        print("⚠️  No property details were collected")
        final_df = pd.DataFrame()
    
    elapsed = time.time() - start_time
    print(f"\n⏱️  Phase 2 completed in {elapsed:.2f} seconds")
    print("=" * 60)

## Phase 3: Data Cleaning and Processing

In [None]:
print("=" * 60)
print("PHASE 3: DATA CLEANING AND FILTERING")
print("=" * 60)

if 'final_df' not in locals() or final_df.empty:
    print("⚠️  No data available. Please run Phase 2 first.")
else:
    print(f"\n📊 Starting with {len(final_df)} properties\n")
    
    # Create cleaned dataframe
    cleaned_df = final_df.copy()
    
    # 1. Clean and convert price
    print("💰 Step 1: Cleaning price data...")
    cleaned_df['price_php'] = pd.to_numeric(
        cleaned_df['price'].astype(str).str.replace(',', '').str.replace('[^0-9]', '', regex=True),
        errors='coerce'
    )
    
    # 2. Filter by budget (under ₱20,000/month)
    print(f"🔍 Step 2: Filtering by budget (under ₱{MAX_PRICE:,}/month)...")
    before_filter = len(cleaned_df)
    cleaned_df = cleaned_df[
        (cleaned_df['price_php'] > 0) & 
        (cleaned_df['price_php'] <= MAX_PRICE)
    ].copy()
    after_filter = len(cleaned_df)
    print(f"   Removed {before_filter - after_filter} properties outside budget")
    print(f"   Remaining: {after_filter} properties")
    
    # 3. Filter by furnishing (drop unfurnished and unlabeled listings)
    print("🛋️  Step 3: Filtering for furnished properties...")
    before_filter = len(cleaned_df)
    # A plain 'furnished' substring match would also keep "unfurnished";
    # the lookbehind rejects it while keeping "furnished" and "semi-furnished"
    cleaned_df = cleaned_df[
        cleaned_df['furnishing'].astype(str).str.contains(r'(?<!un)furnished', case=False, na=False, regex=True)
    ].copy()
    after_filter = len(cleaned_df)
    print(f"   Removed {before_filter - after_filter} unfurnished or unlabeled properties")
    print(f"   Remaining: {after_filter} properties")
    
    # 4. Clean numeric fields
    print("🔢 Step 4: Converting numeric fields...")
    cleaned_df['bedrooms'] = pd.to_numeric(cleaned_df['bedrooms'], errors='coerce')
    cleaned_df['bathrooms'] = pd.to_numeric(cleaned_df['bathrooms'], errors='coerce')
    cleaned_df['floor_area_sqm'] = pd.to_numeric(
        cleaned_df['floor_area_sqm'].astype(str).str.replace(',', ''),
        errors='coerce'
    )
    cleaned_df['latitude'] = pd.to_numeric(cleaned_df['latitude'], errors='coerce')
    cleaned_df['longitude'] = pd.to_numeric(cleaned_df['longitude'], errors='coerce')
    
    # 5. Calculate price per sqm (a zero floor area yields NaN instead of inf)
    print("📐 Step 5: Calculating price per sqm...")
    cleaned_df['price_per_sqm'] = (
        cleaned_df['price_php'] / cleaned_df['floor_area_sqm'].replace(0, np.nan)
    ).round(2)
    
    # 6. Add commute time estimates (based on city)
    print("🚇 Step 6: Adding commute time estimates...")
    commute_times = {
        'mandaluyong': '0-5 min',
        'pasig': '0-10 min',
        'san-juan': '5-15 min',
        'quezon-city': '10-30 min',
        'makati': '20-35 min',
        'taguig': '20-40 min',
    }
    cleaned_df['commute_estimate'] = cleaned_df['city'].map(commute_times)
    
    # 7. Reorder and select important columns
    print("📋 Step 7: Organizing columns...")
    column_order = [
        'title', 'price_php', 'bedrooms', 'bathrooms', 'floor_area_sqm', 'price_per_sqm',
        'furnishing', 'city', 'property_type', 'commute_estimate',
        'address', 'latitude', 'longitude',
        'parking', 'amenities', 'description', 'url'
    ]
    
    # Only include columns that exist
    column_order = [col for col in column_order if col in cleaned_df.columns]
    cleaned_df = cleaned_df[column_order]
    
    # 8. Sort by city, then by price within each city
    print("📊 Step 8: Sorting results...")
    cleaned_df = cleaned_df.sort_values(['city', 'price_php']).reset_index(drop=True)
    
    # 9. Display summary statistics
    print("\n" + "=" * 60)
    print("SUMMARY STATISTICS")
    print("=" * 60)
    print(f"\n📊 Total properties matching criteria: {len(cleaned_df)}")
    
    if len(cleaned_df) > 0:
        print("\n💰 Price Range:")
        print(f"   Minimum: ₱{cleaned_df['price_php'].min():,.0f}/month")
        print(f"   Average: ₱{cleaned_df['price_php'].mean():,.0f}/month")
        print(f"   Maximum: ₱{cleaned_df['price_php'].max():,.0f}/month")
        
        print("\n📍 Properties by City:")
        for city in CITIES:
            count = len(cleaned_df[cleaned_df['city'] == city])
            if count > 0:
                avg_price = cleaned_df[cleaned_df['city'] == city]['price_php'].mean()
                print(f"   {city.title()}: {count} properties (avg: ₱{avg_price:,.0f}/month)")
        
        print("\n🏢 Properties by Type:")
        for prop_type in PROPERTY_TYPES:
            count = len(cleaned_df[cleaned_df['property_type'] == prop_type])
            if count > 0:
                print(f"   {prop_type.title()}: {count} properties")
        
        # 10. Save cleaned data
        print("\n💾 Saving cleaned data...")
        output_file = f"{OUTPUT_PATH}/ortigas_rentals_under_20k_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
        cleaned_df.to_csv(output_file, index=False, encoding='utf-8-sig')
        print(f"   ✅ Saved to: {output_file}")
        
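        # Display the ten cheapest listings across all cities
        # (the dataframe itself is sorted by city first, so head(10) would not be the cheapest)
        print("\n🌟 Top 10 Best Deals (Lowest Price):")
        print(cleaned_df.nsmallest(10, 'price_php')[['title', 'price_php', 'city', 'bedrooms', 'floor_area_sqm', 'commute_estimate']].to_string(index=False))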
    
    print("\n" + "=" * 60)
    print("✅ Phase 3 Complete!")
    print("=" * 60)

## Explore the Data

In [None]:
# Display the cleaned dataframe
if 'cleaned_df' in locals() and not cleaned_df.empty:
    display(cleaned_df.head(20))
else:
    print("No data available. Please run all phases first.")

## Next Steps

After running this notebook:

1. **Review the data**: Check the CSV file for quality and completeness
2. **Adjust CSS selectors**: If data extraction is incomplete, update the selectors based on browser inspection
3. **Refine filters**: Adjust budget, cities, or property types as needed
4. **Schedule regular runs**: Run periodically to track new listings
5. **Data analysis**: Import the CSV into your preferred analysis tool
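
For step 4, one simple way to track new listings is to compare the URL column of two saved link files. The sketch below uses only the standard library and assumes each CSV has a `url` column, as the Phase 1 export does; `load_urls` and `new_listings` are hypothetical helpers, and the in-memory files with example.com URLs stand in for real `property_links_*.csv` files.

```python
import csv
import io


def load_urls(csv_file):
    """Read a links CSV (with a 'url' column) and return the set of listing URLs."""
    return {row["url"] for row in csv.DictReader(csv_file) if row.get("url")}


def new_listings(previous_file, current_file):
    """Return URLs in the current run that were absent from the previous run, sorted."""
    return sorted(load_urls(current_file) - load_urls(previous_file))


# In-memory stand-ins for two saved link files (made-up URLs)
previous = io.StringIO(
    "property_id,url\n1,https://example.com/a\n2,https://example.com/b\n"
)
current = io.StringIO(
    "property_id,url\n2,https://example.com/b\n3,https://example.com/c\n"
)
print(new_listings(previous, current))
```

With real files, pass handles opened with `open(path, newline='', encoding='utf-8')` instead of the `StringIO` objects.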

**Important Notes:**
- Always respect the website's Terms of Service and robots.txt
- Use appropriate rate limiting to avoid overloading servers
- Website structures change - selectors may need periodic updates
- Verify extracted data accuracy before making decisions
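
The first two notes can be sketched with the standard library alone: `urllib.robotparser` evaluates robots.txt rules, and a randomized pause spaces out requests. The rules below are made up for illustration and are not Lamudi's actual robots.txt; `make_robot_checker` and `polite_sleep` are hypothetical helper names, and the delay values are arbitrary assumptions.

```python
import random
import time
from urllib.robotparser import RobotFileParser


def make_robot_checker(robots_txt_lines):
    """Build a RobotFileParser from robots.txt lines that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt_lines)
    return parser


def polite_sleep(base_seconds=2.0, jitter_seconds=1.5):
    """Sleep for base_seconds plus a random jitter and return the delay used."""
    delay = base_seconds + random.uniform(0, jitter_seconds)
    time.sleep(delay)
    return delay


# Made-up rules for illustration only; fetch the site's real robots.txt before scraping.
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]
checker = make_robot_checker(sample_rules)
print(checker.can_fetch("*", "https://example.com/rent/metro-manila/pasig/condo/"))
print(checker.can_fetch("*", "https://example.com/private/page"))
```

In the scraping functions, a check like `checker.can_fetch(...)` would run before `driver.get(url)`, and `polite_sleep()` would be called between requests in each worker.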