# Immobiliare Connector Test

This notebook shows how to use the immobiliare connector to scrape real estate listings, analyze them with pandas, and save and reload the results.

In [1]:
import os
import sys
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

# Add project root to Python path
project_root = Path.cwd().parent.parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Import the connector
from sources.connectors.immobiliare import ImmobiliareScraper

## Basic Usage

Let's start with a basic example of scraping a single page of real estate listings.

In [2]:
# Create a scraper instance for a single page
scraper = ImmobiliareScraper(
    url="https://www.immobiliare.it/vendita-case/milano/?criterio=rilevanza",
    get_data_of_following_pages=False
)

# Display basic information
print(f"Scraped {len(scraper.real_estates)} properties")
print("\nFirst property details:")
if scraper.real_estates:
    first_property = scraper.real_estates[0]
    print(f"ID: {first_property.id}")
    print(f"Type: {first_property.type}")
    print(f"Price: {first_property.formatted_price}")
    print(f"Location: {first_property.city}, {first_property.province}")

Storage directory: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31
Initialized storage at: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31
Initialized JSON file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31\immobiliare.json
Initialized CSV file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31\immobiliare.csv

Processing page 1
Requesting URL: https://www.immobiliare.it/vendita-case/milano/?criterio=rilevanza
Appended 25 records to JSON file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31\immobiliare.json
Appended 25 records to CSV file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31\immobiliare.csv
Scraped 25 properties

First property details:
ID: 120855202
Type: Appartamento
Price: € 750.000
Location: Milano, Milano
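
The `formatted_price` string uses the Italian convention of a dot as the thousands separator. When only such a display string is available, it can be turned back into a number. The helper below is a minimal sketch and is not part of the connector; `parse_formatted_price` is a hypothetical name, and it assumes the string holds a single price (a range would have its digits run together).

```python
import re
from typing import Optional


def parse_formatted_price(text: str) -> Optional[int]:
    """Turn a display string such as '€ 750.000' into the integer 750000.

    Hypothetical helper: it keeps only the digits, so it assumes the string
    contains exactly one price and no decimals.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


print(parse_formatted_price("€ 750.000"))
```

Strings with no digits at all return `None` instead of raising, which keeps the helper safe to map over a whole column.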


## Data Analysis

Let's analyze the scraped data using pandas.

In [4]:
# Get the scraped listings as a pandas DataFrame
df = scraper.data_frame

# Display basic statistics
print("Basic Statistics:")
print(f"Total properties: {len(df)}")
print("\nProperty types:")
print(df['type'].value_counts())

# Price analysis
print("\nPrice statistics (€):")
print(df['price'].describe())

# Location analysis
print("\nProperties by city:")
print(df['city'].value_counts())

Basic Statistics:
Total properties: 25

Property types:
type
Appartamento    20
Progetto         5
Name: count, dtype: int64

Price statistics (€):
count    2.500000e+01
mean     4.398000e+05
std      2.293851e+05
min      2.280000e+05
25%      2.850000e+05
50%      3.495000e+05
75%      5.100000e+05
max      1.130000e+06
Name: price, dtype: float64

Properties by city:
city
Milano    25
Name: count, dtype: int64
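
The per-column summaries above can also be combined into a per-type view. The following is a minimal sketch on made-up rows, not on the scraped data; it assumes only the `type`, `price`, and `city` column names seen above.

```python
import pandas as pd

# Made-up rows that only mimic the column names used in the analysis cell.
toy = pd.DataFrame(
    {
        "type": ["Appartamento", "Appartamento", "Progetto", "Appartamento"],
        "price": [300000.0, 500000.0, 400000.0, 250000.0],
        "city": ["Milano"] * 4,
    }
)

# Count and median price per property type.
summary = toy.groupby("type")["price"].agg(["count", "median"])
print(summary)
```

The median is less sensitive than the mean to a few very expensive listings, which matters for a price distribution as skewed as the one in the `describe()` output.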


## Multi-page Scraping

Next, scrape several result pages by setting `get_data_of_following_pages=True` and capping the run with `max_pages`.

In [5]:
# Create a scraper instance for multiple pages
multi_page_scraper = ImmobiliareScraper(
    url="https://www.immobiliare.it/vendita-case/milano/?criterio=rilevanza",
    get_data_of_following_pages=True,
    max_pages=3
)

# Display results
print(f"Scraped {len(multi_page_scraper.real_estates)} properties from multiple pages")

Storage directory: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31
Initialized storage at: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31

Processing page 1
Requesting URL: https://www.immobiliare.it/vendita-case/milano/?criterio=rilevanza
Appended 25 records to JSON file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31\immobiliare.json
Appended 25 records to CSV file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31\immobiliare.csv

Processing page 2
Requesting URL: https://www.immobiliare.it/vendita-case/milano/?criterio=rilevanza&pag=2
Appended 25 records to JSON file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31\immobiliare.json
Appended 25 records to CSV file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-31\immobiliare.csv

Processing page 3
Requesting URL: ht
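
The log shows that page 2 is requested by appending `pag=2` to the query string, while page 1 uses the base URL unchanged. How the connector builds these URLs internally is not shown; the sketch below is one way to reproduce the pattern with the standard library, and `page_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url: str, page: int) -> str:
    """Return base_url with a 'pag' query parameter for pages after the first."""
    if page < 1:
        raise ValueError("page must be >= 1")
    parts = urlsplit(base_url)
    # Drop any existing 'pag' parameter so the function can be re-applied safely.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "pag"
    ]
    if page > 1:
        query.append(("pag", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://www.immobiliare.it/vendita-case/milano/?criterio=rilevanza"
for page in range(1, 4):
    print(page_url(base, page))
```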

## Error Handling

The connector should reject URLs outside immobiliare.it by raising `InvalidURLError`. The cell below checks this with an invalid URL.

In [7]:
from sources.connectors.immobiliare import InvalidURLError

try:
    # Try with an invalid URL
    invalid_scraper = ImmobiliareScraper(
        url="https://invalid-url.com",
        get_data_of_following_pages=False
    )
except InvalidURLError as e:
    print(f"Caught expected error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Storage directory: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-30
Initialized storage at: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-30
Getting real estate data of https://invalid-url.com
Caught expected error: Given URL must include 'https://www.immobiliare.it'
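
The error message suggests that the connector checks whether the URL contains the site's base address. The connector's actual check is not shown here; the sketch below is a stand-in that produces the same message. Both the `InvalidURLError` class defined in it and the `validate_url` helper are assumptions made for illustration.

```python
class InvalidURLError(ValueError):
    """Stand-in for the connector's exception of the same name."""


REQUIRED_BASE = "https://www.immobiliare.it"


def validate_url(url: str) -> str:
    """Return the URL unchanged if it contains the required base address."""
    # Assumption: a substring check, matching the wording "must include".
    if REQUIRED_BASE not in url:
        raise InvalidURLError(f"Given URL must include '{REQUIRED_BASE}'")
    return url


try:
    validate_url("https://invalid-url.com")
except InvalidURLError as error:
    print(f"Caught expected error: {error}")
```

Validating before any request is sent keeps a mistyped URL from producing a confusing network or parsing error later on.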


## Data Export/Import Test

Save the scraped data as JSON and CSV, then load both files back with `DataStorage`.

In [8]:
from sources.connectors.immobiliare import DataStorage

# Save data
scraper.save_data_json("test_export.json")
scraper.save_data_csv("test_export.csv")

# Load data
storage = DataStorage()
loaded_json_data = storage.load_json("test_export.json")
loaded_csv_data = storage.load_csv("test_export.csv")

print(f"Loaded {len(loaded_json_data)} properties from JSON")
print(f"Loaded {len(loaded_csv_data)} properties from CSV")

Saved 25 records to JSON file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-30\test_export.json
Saved 25 records to CSV file: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-30\test_export.csv
Storage directory: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-30
Initialized storage at: C:\Users\gabri\workspace\aida_projects\quant-estate\data\immobiliare_data_2025-05-30


KeyError: 'realEstate'
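
The traceback is cut off, so the output does not show what caused the `KeyError`. One plausible reading is that some code indexes each loaded record with `record['realEstate']` while the exported records do not carry that key. The sketch below shows tolerant access under that assumption; the record shapes and the `unwrap_record` helper are entirely hypothetical.

```python
from typing import Any, Dict


def unwrap_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the nested 'realEstate' payload if present, else the record itself."""
    nested = record.get("realEstate")
    return nested if isinstance(nested, dict) else record


records = [
    {"realEstate": {"id": 1, "price": 100000}},  # hypothetical nested shape
    {"id": 2, "price": 200000},                  # hypothetical flat shape
]

ids = [unwrap_record(record)["id"] for record in records]
print(ids)
```

Using `dict.get` instead of direct indexing turns a missing key into a fallback path, so one loader can read both shapes.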