# Australian Property Analytics Tool

## Comprehensive Property Market Analysis and Valuation System

This notebook provides a complete property analytics solution for the Australian market, including:

1. **Data Fetching**: Domain.com.au API integration and ABS data downloads
2. **Data Processing**: Cleaning, normalization, and geospatial analysis
3. **KPI Computation**: Market indicators and regional analysis
4. **ML Modeling**: XGBoost/Random Forest valuation models
5. **Monte Carlo Simulation**: Future price forecasting
6. **Visualization**: Interactive charts, maps, and dashboards

---

## 1. Setup and Configuration

Import required libraries and configure the analysis environment.

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import warnings
import os
import sys
from datetime import datetime, timedelta

# Add scripts directory to path
sys.path.append('../scripts')

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

print("✅ Core libraries imported successfully")
print(f"📅 Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

In [None]:
# Import custom modules
try:
    from data_fetcher import DomainAPIClient, ABSDataFetcher, fetch_property_listings, fetch_abs_socioeconomic_data
    from data_processor import PropertyDataProcessor, calculate_property_kpis
    from ml_models import PropertyValuationModel, identify_overvalued_properties
    from monte_carlo import MonteCarloPropertySimulation, run_portfolio_simulation
    from visualization import PropertyVisualizationSuite
    print("✅ Custom modules imported successfully")
except ImportError as e:
    print(f"⚠️ Import warning: {e}")
    print("Some features may not be available without required packages")

In [None]:
# Configuration
CONFIG = {
    'DOMAIN_API_KEY': os.environ.get('DOMAIN_API_KEY', 'your_domain_api_key_here'),  # Read from the environment; falls back to a placeholder
    'ANALYSIS_SUBURBS': ['Sydney', 'Melbourne', 'Brisbane', 'Perth', 'Adelaide'],
    'PROPERTY_TYPES': ['House', 'Unit', 'Townhouse'],
    'SIMULATION_YEARS': 5,
    'SIMULATION_RUNS': 1000,
    'MODEL_TYPE': 'xgboost',  # or 'random_forest'
    'RANDOM_STATE': 42
}

# Create data directories
os.makedirs('../data/raw', exist_ok=True)
os.makedirs('../data/processed', exist_ok=True)
os.makedirs('../data/outputs', exist_ok=True)

print("✅ Configuration loaded")
print(f"🏙️ Analysis suburbs: {', '.join(CONFIG['ANALYSIS_SUBURBS'])}")
print(f"🏠 Property types: {', '.join(CONFIG['PROPERTY_TYPES'])}")

## 2. Data Fetching

Fetch property listings from Domain.com.au and socio-economic data from ABS. For demonstration, the cells below generate synthetic sample data in place of live API calls.

In [None]:
# Note: For demonstration, we'll create sample data
# In production, replace with actual API calls

def create_sample_property_data():
    """Create sample property data for demonstration"""
    np.random.seed(CONFIG['RANDOM_STATE'])
    
    suburbs = CONFIG['ANALYSIS_SUBURBS'] * 100  # Repeat suburbs for more data
    property_types = np.random.choice(CONFIG['PROPERTY_TYPES'], len(suburbs))
    
    # Generate sample data
    data = {
        'suburb': suburbs,
        'property_type': property_types,
        'bedrooms': np.random.choice([1, 2, 3, 4, 5], len(suburbs), p=[0.1, 0.25, 0.35, 0.25, 0.05]),
        'bathrooms': np.random.choice([1, 2, 3, 4], len(suburbs), p=[0.3, 0.45, 0.2, 0.05]),
        'parking': np.random.choice([0, 1, 2, 3], len(suburbs), p=[0.1, 0.4, 0.35, 0.15]),
        'land_area': np.random.normal(600, 200, len(suburbs)).clip(100, 2000),
        'building_area': np.random.normal(150, 50, len(suburbs)).clip(50, 500),
        'latitude': np.random.uniform(-37.8, -33.8, len(suburbs)),
        'longitude': np.random.uniform(144.9, 151.3, len(suburbs)),
        'date_listed': pd.date_range('2023-01-01', '2024-12-31', periods=len(suburbs)),
        'listing_type': 'Sale'
    }
    
    df = pd.DataFrame(data)
    
    # Generate realistic prices based on features
    base_prices = {'Sydney': 800000, 'Melbourne': 650000, 'Brisbane': 500000, 'Perth': 450000, 'Adelaide': 400000}
    type_multipliers = {'House': 1.0, 'Unit': 0.7, 'Townhouse': 0.85}
    
    prices = []
    for _, row in df.iterrows():
        base = base_prices[row['suburb']]
        type_mult = type_multipliers[row['property_type']]
        bedroom_mult = 1 + (row['bedrooms'] - 2) * 0.15
        area_mult = 1 + (row['building_area'] - 150) / 1000
        
        price = base * type_mult * bedroom_mult * area_mult
        price *= np.random.normal(1, 0.2)  # Add some randomness
        prices.append(max(price, 100000))  # Minimum price
    
    df['price'] = prices
    
    return df

# Create sample data
print("📊 Creating sample property data...")
property_data = create_sample_property_data()

print(f"✅ Generated {len(property_data)} property records")
print(f"📈 Price range: ${property_data['price'].min():,.0f} - ${property_data['price'].max():,.0f}")
print(f"🏠 Property types: {property_data['property_type'].value_counts().to_dict()}")

# Display sample
property_data.head()

In [None]:
# Fetch socio-economic data
print("📊 Fetching socio-economic data...")

# Create sample socio-economic data
socioeconomic_data = pd.DataFrame({
    'SA2_CODE': ['101011001', '201011002', '301011003', '401011004', '501011005'],
    'SA2_NAME': CONFIG['ANALYSIS_SUBURBS'],
    'MEDIAN_INCOME': [85000, 72000, 65000, 68000, 62000],
    'UNEMPLOYMENT_RATE': [4.2, 5.1, 4.8, 5.5, 5.8],
    'POPULATION': [25000, 18000, 22000, 15000, 12000],
    'EDUCATION_BACHELOR_PCT': [65.2, 58.4, 48.9, 52.1, 45.6]
})

print(f"✅ Loaded socio-economic data for {len(socioeconomic_data)} regions")
socioeconomic_data

## 3. Data Processing and Cleaning

Clean, normalize, and merge the datasets for analysis.

In [None]:
# Initialize data processor
processor = PropertyDataProcessor()

# Clean property data
print("🧹 Cleaning property data...")
clean_property_data = processor.clean_property_data(property_data)

# Merge with socio-economic data
print("🔗 Merging with socio-economic data...")
merged_data = processor.merge_with_socioeconomic_data(clean_property_data, socioeconomic_data)

# Create geospatial features
print("🗺️ Creating geospatial features...")
geo_data = processor.create_geospatial_features(merged_data)
geo_data = processor.calculate_distance_features(geo_data)

print(f"✅ Data processing complete")
print(f"📊 Final dataset shape: {geo_data.shape}")
print(f"📋 Columns: {list(geo_data.columns)}")

# Display processed data sample
geo_data[['suburb', 'property_type', 'price', 'bedrooms', 'MEDIAN_INCOME', 'distance_to_cbd_km']].head()
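
The `distance_to_cbd_km` column shown above is produced by `processor.calculate_distance_features`, whose implementation is not included in this notebook. One common way to derive such a feature is the haversine great-circle distance. The sketch below assumes that approach; the function name `haversine_km` and the approximate CBD coordinates are illustrative assumptions, not the module's actual code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate CBD coordinates (assumed for illustration)
SYDNEY_CBD = (-33.8688, 151.2093)
MELBOURNE_CBD = (-37.8136, 144.9631)

sydney_to_melbourne_km = haversine_km(*SYDNEY_CBD, *MELBOURNE_CBD)
print(f"Sydney CBD to Melbourne CBD: {sydney_to_melbourne_km:.0f} km")
```

Applied row by row (or vectorized with NumPy), the same function could turn each listing's `latitude` and `longitude` into a distance from its nearest CBD.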

## 4. KPI Computation

Calculate key property market indicators and regional statistics.

In [None]:
# Calculate property KPIs
print("📈 Calculating property market KPIs...")

kpis = calculate_property_kpis(geo_data)

# Display overall market metrics
print("\n🏠 OVERALL MARKET METRICS")
print(f"Median Price: ${kpis['median_price']:,.0f}")
print(f"Mean Price: ${kpis['mean_price']:,.0f}")
print(f"Price Std Dev: ${kpis['price_std']:,.0f}")

# Suburb-level analysis
print("\n🏙️ SUBURB-LEVEL ANALYSIS")
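suburb_stats = geo_data.groupby('suburb').agg({
    'price': ['count', 'median', 'mean'],
    'MEDIAN_INCOME': 'first',
    'UNEMPLOYMENT_RATE': 'first'
})

suburb_stats.columns = ['Count', 'Median_Price', 'Mean_Price', 'Median_Income', 'Unemployment_Rate']
# Round dollar amounts only; rounding every column would strip the decimal from the unemployment rate
suburb_stats = suburb_stats.round({'Median_Price': 0, 'Mean_Price': 0, 'Median_Income': 0})
suburb_stats['Affordability_Ratio'] = (suburb_stats['Median_Price'] / suburb_stats['Median_Income']).round(2)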

print(suburb_stats)

# Property type analysis
print("\n🏠 PROPERTY TYPE ANALYSIS")
type_stats = geo_data.groupby('property_type')['price'].agg(['count', 'median', 'mean']).round(0)
print(type_stats)

## 5. Machine Learning Valuation Model

Build and train XGBoost/Random Forest models for property valuation.

In [None]:
# Initialize and train valuation model
print(f"🤖 Training {CONFIG['MODEL_TYPE']} valuation model...")

model = PropertyValuationModel(model_type=CONFIG['MODEL_TYPE'])

# Train the model
training_metrics = model.train(
    geo_data, 
    target_col='price',
    random_state=CONFIG['RANDOM_STATE']
)

print("\n📊 MODEL PERFORMANCE METRICS")
print(f"Train R²: {training_metrics['train_r2']:.3f}")
print(f"Test R²: {training_metrics['test_r2']:.3f}")
print(f"Train RMSE: ${training_metrics['train_rmse']:,.0f}")
print(f"Test RMSE: ${training_metrics['test_rmse']:,.0f}")
print(f"Test MAE: ${training_metrics['test_mae']:,.0f}")
print(f"Features used: {training_metrics['feature_count']}")

# Feature importance
print("\n🔍 TOP 10 MOST IMPORTANT FEATURES")
feature_importance = model.get_feature_importance()
if not feature_importance.empty:
    print(feature_importance.head(10))
else:
    print("Feature importance not available for this model type")
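
For reference, the R², RMSE, and MAE figures printed above follow their standard definitions. The sketch below computes them with NumPy on a tiny made-up example. `regression_metrics` is a hypothetical helper, not part of `ml_models`, and the prices are invented for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for paired actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                   # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    return {
        'r2': float(1.0 - ss_res / ss_tot),
        'rmse': float(np.sqrt(np.mean(residuals ** 2))),
        'mae': float(np.mean(np.abs(residuals))),
    }

# Invented prices, for illustration only
actual = [500_000, 650_000, 800_000, 450_000]
predicted = [520_000, 630_000, 760_000, 470_000]

metrics = regression_metrics(actual, predicted)
print(f"R²: {metrics['r2']:.3f}")
print(f"RMSE: ${metrics['rmse']:,.0f}")
print(f"MAE: ${metrics['mae']:,.0f}")
```

A large gap between train and test values of these metrics is the usual sign of overfitting, which is why the cell above reports both.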

In [None]:
# Identify over/undervalued properties
print("🔍 Identifying over/undervalued properties...")

valuation_analysis = identify_overvalued_properties(
    geo_data, 
    model, 
    threshold=0.15  # 15% threshold
)

# Valuation summary
valuation_summary = valuation_analysis['valuation_status'].value_counts()
print("\n💰 VALUATION ANALYSIS SUMMARY")
for status, count in valuation_summary.items():
    percentage = (count / len(valuation_analysis)) * 100
    print(f"{status}: {count} properties ({percentage:.1f}%)")

# Show some examples
print("\n📋 SAMPLE OVERVALUED PROPERTIES")
overvalued = valuation_analysis[valuation_analysis['valuation_status'] == 'Overvalued']
if len(overvalued) > 0:
    sample_overvalued = overvalued[['suburb', 'property_type', 'price', 'predicted_price', 'price_difference_pct']].head()
    print(sample_overvalued)

print("\n📋 SAMPLE UNDERVALUED PROPERTIES")
undervalued = valuation_analysis[valuation_analysis['valuation_status'] == 'Undervalued']
if len(undervalued) > 0:
    sample_undervalued = undervalued[['suburb', 'property_type', 'price', 'predicted_price', 'price_difference_pct']].head()
    print(sample_undervalued)
else:
    print("No undervalued properties found with current threshold")
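
The classification above relies on `identify_overvalued_properties`, whose internals are not shown here. Given the column names this notebook uses, a plausible reading is that each listing is labelled by the percentage gap between its listed price and the model prediction. The sketch below illustrates that rule; `classify_valuation` is a hypothetical stand-in, and the exact formula in `ml_models` may differ.

```python
import numpy as np
import pandas as pd

def classify_valuation(df, threshold=0.15):
    """Label rows by how far the listed price sits from the predicted price."""
    out = df.copy()
    out['price_difference_pct'] = (
        (out['price'] - out['predicted_price']) / out['predicted_price'] * 100
    )
    conditions = [
        out['price_difference_pct'] > threshold * 100,    # listed well above prediction
        out['price_difference_pct'] < -threshold * 100,   # listed well below prediction
    ]
    out['valuation_status'] = np.select(
        conditions, ['Overvalued', 'Undervalued'], default='Fair Value'
    )
    return out

# Invented listings, for illustration only
demo = pd.DataFrame({
    'price': [600_000, 400_000, 505_000],
    'predicted_price': [500_000, 500_000, 500_000],
})
labelled = classify_valuation(demo)
print(labelled[['price', 'predicted_price', 'price_difference_pct', 'valuation_status']])
```

Under this rule the labels depend on model error as much as on market mispricing, so a model with a high test RMSE will flag many listings regardless of their true value.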

## 6. Monte Carlo Price Simulation

Forecast future property prices using Monte Carlo simulation with economic variables.

In [None]:
# Monte Carlo simulation for a sample property
sample_price = geo_data['price'].median()

print(f"🎯 Running Monte Carlo simulation for property valued at ${sample_price:,.0f}")
print(f"⏱️ Simulation parameters: {CONFIG['SIMULATION_YEARS']} years, {CONFIG['SIMULATION_RUNS']} runs")

# Initialize simulator
simulator = MonteCarloPropertySimulation(
    base_price=sample_price,
    simulation_years=CONFIG['SIMULATION_YEARS'],
    num_simulations=CONFIG['SIMULATION_RUNS']
)

# Run simulation
simulation_results = simulator.run_simulation()
simulation_stats = simulator.get_simulation_statistics()

print("\n📊 SIMULATION RESULTS")
print(f"Expected price after {CONFIG['SIMULATION_YEARS']} years: ${simulation_stats['final_price_mean']:,.0f}")
print(f"Median forecast: ${simulation_stats['final_price_median']:,.0f}")
print(f"Expected annual return: {simulation_stats['annual_return_pct']:.2f}%")
print(f"Probability of gain: {simulation_stats['probability_gain']:.1%}")
print(f"Probability of loss: {simulation_stats['probability_loss']:.1%}")

print("\n📈 FORECAST PERCENTILES")
print(f"5th percentile: ${simulation_stats['percentile_5']:,.0f}")
print(f"25th percentile: ${simulation_stats['percentile_25']:,.0f}")
print(f"75th percentile: ${simulation_stats['percentile_75']:,.0f}")
print(f"95th percentile: ${simulation_stats['percentile_95']:,.0f}")
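
`MonteCarloPropertySimulation` lives in `monte_carlo.py` and is not reproduced in this notebook. As a rough illustration of the idea, the sketch below simulates yearly prices under geometric Brownian motion with NumPy. The function name `simulate_price_paths`, the drift and volatility values, and the choice of geometric Brownian motion itself are assumptions made for this example; the actual module is described as using economic variables and may work differently.

```python
import numpy as np

def simulate_price_paths(base_price, years, num_simulations,
                         annual_drift=0.04, annual_volatility=0.10, seed=42):
    """Simulate yearly price paths; returns an array of shape (num_simulations, years + 1)."""
    rng = np.random.default_rng(seed)
    # One standard-normal shock per simulation per year
    shocks = rng.standard_normal((num_simulations, years))
    # Yearly log-returns under geometric Brownian motion
    log_returns = (annual_drift - 0.5 * annual_volatility ** 2) + annual_volatility * shocks
    # Accumulate log-returns and prepend year 0, where the price equals base_price
    cumulative = np.hstack([np.zeros((num_simulations, 1)), np.cumsum(log_returns, axis=1)])
    return base_price * np.exp(cumulative)

# Assumed inputs, for illustration only
paths = simulate_price_paths(base_price=600_000, years=5, num_simulations=1000)
final_prices = paths[:, -1]

print(f"Mean final price: ${final_prices.mean():,.0f}")
print(f"5th-95th percentile: ${np.percentile(final_prices, 5):,.0f} - ${np.percentile(final_prices, 95):,.0f}")
print(f"Probability of gain: {(final_prices > 600_000).mean():.1%}")
```

Statistics such as the probability of gain or a 5% value at risk are then simple summaries of the final-year column of the simulated paths.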

In [None]:
# Portfolio simulation example
print("🏠 Running portfolio simulation...")

# Select sample portfolio (median prices by suburb)
portfolio_prices = geo_data.groupby('suburb')['price'].median().tolist()

portfolio_stats = run_portfolio_simulation(
    portfolio_prices,
    simulation_years=CONFIG['SIMULATION_YEARS'],
    num_simulations=500  # Reduced for faster computation
)

print("\n💼 PORTFOLIO SIMULATION RESULTS")
print(f"Initial portfolio value: ${portfolio_stats['initial_portfolio_value']:,.0f}")
print(f"Expected portfolio value after {CONFIG['SIMULATION_YEARS']} years: ${portfolio_stats['final_portfolio_mean']:,.0f}")
print(f"Portfolio annual return: {portfolio_stats['annual_return_pct']:.2f}%")
print(f"Value at Risk (5%): ${portfolio_stats['value_at_risk_5pct']:,.0f}")

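print("\n🏘️ INDIVIDUAL PROPERTY RETURNS")
# groupby sorts suburbs alphabetically, so take labels from the grouped index
# rather than from the order in CONFIG['ANALYSIS_SUBURBS']
portfolio_suburbs = geo_data.groupby('suburb')['price'].median().index
for suburb, (prop_id, stats) in zip(portfolio_suburbs, portfolio_stats['individual_properties'].items()):
    print(f"{suburb}: {stats['annual_return_pct']:.2f}% annual return")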

## 7. Interactive Visualizations

Create interactive charts, maps, and dashboards for data exploration.

In [None]:
# Note: Visualization code requires plotly and folium packages
# Install with: pip install plotly folium

try:
    import plotly.express as px
    import plotly.graph_objects as go
    
    # Initialize visualization suite
    viz = PropertyVisualizationSuite()
    
    print("📊 Creating visualizations...")
    
    # Price distribution by property type
    price_dist_fig = viz.create_price_distribution_chart(
        geo_data, 
        group_by='property_type',
        price_col='price'
    )
    price_dist_fig.show()
    
except (ImportError, NameError):
    print("⚠️ Plotly or the visualization module is not available. Install Plotly with: pip install plotly")
    
    # Create simple matplotlib visualization as fallback
    import matplotlib.pyplot as plt
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Price by suburb
    geo_data.boxplot(column='price', by='suburb', ax=axes[0,0])
    axes[0,0].set_title('Price Distribution by Suburb')
    axes[0,0].tick_params(axis='x', rotation=45)
    
    # Price by property type
    geo_data.boxplot(column='price', by='property_type', ax=axes[0,1])
    axes[0,1].set_title('Price Distribution by Property Type')
    
    # Price vs bedrooms
    geo_data.groupby('bedrooms')['price'].median().plot(kind='bar', ax=axes[1,0])
    axes[1,0].set_title('Median Price by Bedrooms')
    
    # Price vs distance to CBD
    axes[1,1].scatter(geo_data['distance_to_cbd_km'], geo_data['price'], alpha=0.6)
    axes[1,1].set_xlabel('Distance to CBD (km)')
    axes[1,1].set_ylabel('Price ($)')
    axes[1,1].set_title('Price vs Distance to CBD')
    
    plt.tight_layout()
    plt.show()

In [None]:
# Valuation gauge for a sample property
try:
    sample_property = valuation_analysis.iloc[0]
    
    gauge_fig = viz.create_valuation_gauge(
        actual_price=sample_property['price'],
        predicted_price=sample_property['predicted_price'],
        property_address=f"{sample_property['suburb']} - {sample_property['property_type']}"
    )
    gauge_fig.show()
    
except (NameError, ImportError):
    print("⚠️ Interactive gauge visualization requires plotly")
    
    # Text-based valuation summary
    sample_property = valuation_analysis.iloc[0]
    valuation_pct = sample_property['price_difference_pct']
    
    print(f"\n🏠 SAMPLE PROPERTY VALUATION")
    print(f"Location: {sample_property['suburb']}")
    print(f"Type: {sample_property['property_type']}")
    print(f"Actual Price: ${sample_property['price']:,.0f}")
    print(f"Model Prediction: ${sample_property['predicted_price']:,.0f}")
    print(f"Difference: {valuation_pct:.1f}%")
    print(f"Status: {sample_property['valuation_status']}")

In [None]:
# Monte Carlo simulation visualization
try:
    # Simulation forecast chart
    forecast_fig = simulator.plot_simulation_results()
    forecast_fig.show()
    
    # Final price distribution
    dist_fig = simulator.plot_final_price_distribution()
    dist_fig.show()
    
except (NameError, ImportError):
    print("⚠️ Interactive simulation charts require plotly")
    
    # Simple matplotlib version
    import matplotlib.pyplot as plt
    
    final_year_col = f'Year_{CONFIG["SIMULATION_YEARS"]}'
    final_prices = simulation_results[final_year_col]
    
    plt.figure(figsize=(12, 5))
    
    # Histogram of final prices
    plt.subplot(1, 2, 1)
    plt.hist(final_prices, bins=50, alpha=0.7, edgecolor='black')
    plt.axvline(sample_price, color='green', linestyle='--', label='Current Price')
    plt.axvline(final_prices.mean(), color='red', linestyle='--', label='Expected Price')
    plt.xlabel('Property Price ($)')
    plt.ylabel('Frequency')
    plt.title(f'Price Distribution After {CONFIG["SIMULATION_YEARS"]} Years')
    plt.legend()
    
    # Simulation paths (sample)
    plt.subplot(1, 2, 2)
    years = list(range(CONFIG['SIMULATION_YEARS'] + 1))
    for i in range(min(50, len(simulation_results))):
        prices = [simulation_results.iloc[i][f'Year_{year}'] for year in years]
        plt.plot(years, prices, alpha=0.1, color='blue')
    
    # Add mean forecast
    mean_prices = [simulation_results[f'Year_{year}'].mean() for year in years]
    plt.plot(years, mean_prices, color='red', linewidth=3, label='Mean Forecast')
    
    plt.xlabel('Years')
    plt.ylabel('Property Price ($)')
    plt.title('Monte Carlo Price Forecasts')
    plt.legend()
    
    plt.tight_layout()
    plt.show()

## 8. Summary and Export Results

Generate summary report and export results for further analysis.

In [None]:
# Generate comprehensive summary report
print("📝 PROPERTY ANALYTICS SUMMARY REPORT")
print("=" * 50)

print(f"\n📊 DATASET OVERVIEW")
print(f"Total properties analyzed: {len(geo_data):,}")
print(f"Analysis period: {geo_data['date_listed'].min().strftime('%Y-%m-%d')} to {geo_data['date_listed'].max().strftime('%Y-%m-%d')}")
print(f"Suburbs covered: {', '.join(CONFIG['ANALYSIS_SUBURBS'])}")
print(f"Property types: {', '.join(CONFIG['PROPERTY_TYPES'])}")

print(f"\n💰 MARKET OVERVIEW")
print(f"Overall median price: ${geo_data['price'].median():,.0f}")
print(f"Price range: ${geo_data['price'].min():,.0f} - ${geo_data['price'].max():,.0f}")
print(f"Most expensive suburb: {geo_data.groupby('suburb')['price'].median().idxmax()}")
print(f"Most affordable suburb: {geo_data.groupby('suburb')['price'].median().idxmin()}")

print(f"\n🤖 MODEL PERFORMANCE")
print(f"Model type: {CONFIG['MODEL_TYPE'].title()}")
print(f"Test R² Score: {training_metrics['test_r2']:.3f}")
print(f"Test RMSE: ${training_metrics['test_rmse']:,.0f}")
print(f"Mean Absolute Error: ${training_metrics['test_mae']:,.0f}")

print(f"\n🔍 VALUATION INSIGHTS")
overvalued_count = len(valuation_analysis[valuation_analysis['valuation_status'] == 'Overvalued'])
undervalued_count = len(valuation_analysis[valuation_analysis['valuation_status'] == 'Undervalued'])
fair_value_count = len(valuation_analysis[valuation_analysis['valuation_status'] == 'Fair Value'])

print(f"Overvalued properties: {overvalued_count} ({overvalued_count/len(valuation_analysis)*100:.1f}%)")
print(f"Undervalued properties: {undervalued_count} ({undervalued_count/len(valuation_analysis)*100:.1f}%)")
print(f"Fair value properties: {fair_value_count} ({fair_value_count/len(valuation_analysis)*100:.1f}%)")

print(f"\n🔮 FORECAST SUMMARY")
print(f"Simulation period: {CONFIG['SIMULATION_YEARS']} years")
print(f"Expected annual return: {simulation_stats['annual_return_pct']:.2f}%")
print(f"Probability of gain: {simulation_stats['probability_gain']:.1%}")
print(f"90% interval (5th-95th percentile): ${simulation_stats['percentile_5']:,.0f} - ${simulation_stats['percentile_95']:,.0f}")

print(f"\n📈 SUBURBS RANKED BY MEDIAN PRICE")
top_suburbs = geo_data.groupby('suburb')['price'].median().sort_values(ascending=False)
for suburb, price in top_suburbs.items():
    print(f"{suburb}: ${price:,.0f}")

In [None]:
# Export results
print("💾 Exporting analysis results...")

# Save processed data
geo_data.to_csv('../data/processed/property_data_processed.csv', index=False)
valuation_analysis.to_csv('../data/outputs/valuation_analysis.csv', index=False)
simulation_results.to_csv('../data/outputs/monte_carlo_simulation.csv', index=False)

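# Save model (create the target directory first; only the data directories were created earlier)
os.makedirs('../models', exist_ok=True)
model.save_model('../models/property_valuation_model.joblib')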

# Create summary statistics file
summary_stats = {
    'analysis_date': datetime.now().isoformat(),
    'total_properties': len(geo_data),
    'model_performance': training_metrics,
    'simulation_stats': simulation_stats,
    'suburb_medians': geo_data.groupby('suburb')['price'].median().to_dict()
}

import json
with open('../data/outputs/analysis_summary.json', 'w') as f:
    json.dump(summary_stats, f, indent=2, default=str)

print("✅ Results exported successfully!")
print("\n📁 OUTPUT FILES:")
print("- ../data/processed/property_data_processed.csv")
print("- ../data/outputs/valuation_analysis.csv")
print("- ../data/outputs/monte_carlo_simulation.csv")
print("- ../data/outputs/analysis_summary.json")
print("- ../models/property_valuation_model.joblib")

## Conclusion

This property analytics tool provides a framework for analyzing the Australian property market. Run on the synthetic sample data above, the notebook demonstrates each stage of the pipeline:

1. ✅ **Data Integration**: Combines property listings with socio-economic data
2. ✅ **Advanced Analytics**: Calculates key market indicators and regional statistics
3. ✅ **Machine Learning**: Trains valuation models and reports train/test R², RMSE, and MAE
4. ✅ **Risk Assessment**: Flags listings priced more than 15% above or below the model prediction
5. ✅ **Forecasting**: Uses Monte Carlo simulation for future price predictions
6. ✅ **Visualization**: Creates interactive charts for data exploration, with matplotlib fallbacks

### Next Steps

1. **API Integration**: Replace sample data with real Domain.com.au API calls
2. **Real-time Updates**: Implement automated data refresh mechanisms
3. **Enhanced Features**: Add more sophisticated economic variables
4. **Web Dashboard**: Deploy as a web application for broader access
5. **Advanced Models**: Experiment with deep learning approaches

### Usage Notes

- Ensure all required packages are installed: `pip install -r requirements.txt`
- Update API keys in the configuration section
- Modify analysis parameters as needed for your specific use case
- Results are saved under `data/processed/` and `data/outputs/`, and the trained model under `models/`