# Real-World Use Case: Geospatial Clustering (Housing)

## 1. The Problem
We have latitude and longitude coordinates for thousands of houses and want to automatically separate "High Density" urban areas from "Low Density" rural areas.

## 2. Why DBSCAN?
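*   **Geography is irregular**: Cities follow coastlines, valleys, and roads, so dense areas are often elongated or curved. K-Means assumes compact, roughly spherical clusters around centroids and tends to cut such shapes apart; DBSCAN grows clusters through chains of dense neighborhoods and can follow arbitrary shapes.
*   **Noise**: Isolated rural houses far from any neighbors should be labeled as outliers (noise), not forced into a city cluster. DBSCAN labels them `-1`, whereas K-Means always assigns every point to some cluster.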
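Both points can be illustrated on toy data before touching the housing set. The sketch below assumes scikit-learn's `make_moons` generator as a stand-in for two curved "regions", appends three hand-picked isolated points, and compares DBSCAN with K-Means. The values `eps=0.2` and `min_samples=5` are illustrative choices for this synthetic data only, not recommendations for geographic coordinates.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two interleaving half-circles: non-convex "regions" (synthetic stand-in, not real geography)
X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Three isolated points far from both moons (hypothetical "rural" houses)
outliers = np.array([[3.0, 3.0], [-2.5, 2.5], [3.0, -2.0]])
X_toy = np.vstack([X_moons, outliers])

# DBSCAN: clusters grow through dense neighborhoods; sparse points get label -1
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_toy)

# K-Means: every point must belong to one of k centroids, outliers included
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_toy)

print("DBSCAN clusters:", len(set(db_labels) - {-1}))
print("DBSCAN labels of the isolated points:", db_labels[-3:])
print("K-Means labels of the isolated points:", km_labels[-3:])
```

On this data DBSCAN is expected to recover each moon as its own cluster and mark the three far-away points as `-1`. K-Means with two centroids can only separate the plane with a straight boundary, so it cuts across the interleaving moons and has no option to leave the isolated points unassigned.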

## 3. Data (California Housing Proxy)
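We use only the `Latitude` and `Longitude` columns of scikit-learn's California Housing dataset. Each of its 20,640 rows describes a census block group rather than a single house, so the rows act as a proxy for house locations.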
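Latitude and longitude are angles, not distances, which matters for a distance-based method like DBSCAN. As a rough sketch, assuming a spherical Earth with a mean radius of 6,371 km (the real planet is slightly flattened), one degree of latitude spans about 111 km everywhere, while one degree of longitude shrinks with the cosine of the latitude. The helper name `km_per_degree` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption

def km_per_degree(latitude_deg: float) -> tuple[float, float]:
    """Approximate (north-south, east-west) length of one degree at a given latitude."""
    km_lat = math.pi * EARTH_RADIUS_KM / 180.0  # identical at every latitude on a sphere
    km_lon = km_lat * math.cos(math.radians(latitude_deg))  # shrinks toward the poles
    return km_lat, km_lon

# California spans roughly 32.5°N to 42°N
for lat in (32.5, 37.0, 42.0):
    km_lat, km_lon = km_per_degree(lat)
    print(f"lat {lat:4.1f}: 0.3 deg ~ {0.3 * km_lat:.1f} km N-S, {0.3 * km_lon:.1f} km E-W")
```

So an `eps` of 0.3 on raw degrees corresponds to about 33 km north-south but only about 25 to 28 km east-west across the state: a circle in degree space becomes an ellipse on the ground, longer north-south than east-west.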

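```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import fetch_california_housing

# 1. Load data (downloaded and cached on first use)
cali = fetch_california_housing()
df = pd.DataFrame(cali.data, columns=cali.feature_names)
X = df[['Latitude', 'Longitude']].values

# Take a random sample for speed and plot readability.
# A fixed seed makes the sample (and therefore the clusters) reproducible.
rng = np.random.default_rng(42)
idx = rng.choice(len(X), size=2000, replace=False)
X_sample = X[idx]

# 2. DBSCAN on raw degrees
# eps=0.3 degrees is about 33 km north-south, but only about 25-28 km east-west
# at California latitudes, because a degree of longitude shrinks with cos(latitude).
# Note: in real geospatial apps, use a true ground distance (e.g. haversine) instead of raw degrees!
db = DBSCAN(eps=0.3, min_samples=10)
labels = db.fit_predict(X_sample)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
noise = labels == -1  # DBSCAN marks noise (rural/isolated points) with label -1

# 3. Visualize map (Longitude on x, Latitude on y)
plt.figure(figsize=(8, 8))
# Clustered points, colored by cluster label
plt.scatter(X_sample[~noise, 1], X_sample[~noise, 0], c=labels[~noise],
            cmap='tab20', s=10, alpha=0.6)
# Noise points drawn separately in grey; passing -1 through the colormap
# would give them an ordinary cluster color instead
plt.scatter(X_sample[noise, 1], X_sample[noise, 0], c='lightgrey', s=10,
            alpha=0.6, label='Noise (-1)')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title(f'Detected cities/regions: {n_clusters} (noise points: {noise.sum()})')
plt.legend()
plt.show()
```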
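The note in the code recommends a true ground distance for real applications. One way to do that with scikit-learn, sketched here, is DBSCAN's `metric='haversine'` option (served by `algorithm='ball_tree'`), which expects `[latitude, longitude]` in radians and returns great-circle distances in radians; a radius in kilometres is therefore divided by the Earth's radius (assumed 6,371 km). To stay self-contained and offline, the sketch uses synthetic points around two illustrative city centres (placed near Los Angeles and San Francisco) plus scattered rural points rather than the California data; the 5 km radius, `min_samples=10`, and all coordinates are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

# Synthetic data (lat, lon in degrees): two dense "cities" plus scattered rural points
rng = np.random.default_rng(0)
city_a = rng.normal(loc=[34.05, -118.25], scale=0.02, size=(200, 2))
city_b = rng.normal(loc=[37.77, -122.42], scale=0.02, size=(200, 2))
rural = np.column_stack([
    rng.uniform(33.0, 41.0, size=20),     # latitudes
    rng.uniform(-123.0, -115.0, size=20), # longitudes
])
coords_deg = np.vstack([city_a, city_b, rural])

# The haversine metric expects [latitude, longitude] in radians
coords_rad = np.radians(coords_deg)

eps_km = 5.0  # neighbourhood radius on the ground (illustrative)
db_geo = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=10,
                metric='haversine', algorithm='ball_tree')
geo_labels = db_geo.fit_predict(coords_rad)

n_geo_clusters = len(set(geo_labels)) - (1 if -1 in geo_labels else 0)
noise_share = np.mean(geo_labels == -1)
print(f"clusters: {n_geo_clusters}, noise share: {noise_share:.1%}")
```

Unlike `eps=0.3` on raw degrees, the 5 km radius here means the same ground distance in every direction and at every latitude. For the housing sample the same idea would apply to `np.radians(X_sample)`, since `X_sample` is already in `[Latitude, Longitude]` order, though a suitable `eps_km` would still need tuning.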