
# Predicting Urban Heat Island (UHI) Intensity Using Satellite Data

## Introduction
The Urban Heat Island (UHI) effect occurs when urban areas are warmer than nearby rural areas because of human activity, heat-retaining infrastructure, and reduced vegetation.  
This project builds a machine learning model to predict UHI intensity from Sentinel-2 multispectral satellite imagery and ground temperature measurements.

**Goals:**
- Identify environmental factors contributing to UHI
- Predict UHI intensity for targeted intervention
- Provide insights for urban planning and sustainability


In [None]:

# Import standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning libraries
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

# For geospatial data (optional if working with raster/vector data)
# import rasterio
# import geopandas as gpd

# Display settings
sns.set_theme(style="whitegrid")  # sns.set is a legacy alias of set_theme



## Data Loading
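The dataset should include the following columns:
- **NDVI** (`ndvi`): Normalized Difference Vegetation Index
- **NDBI** (`ndbi`): Normalized Difference Built-up Index
- **Water Index** (`water_index`): Indicator of the presence of water bodies
- **Temperature** (`temperature`): Ground-truth surface temperature, used as the prediction target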

Replace `your_dataset.csv` with your actual dataset file.
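
If the index columns are not already available, they could be derived from Sentinel-2 reflectance bands. The sketch below is a minimal illustration, not this project's actual pipeline. It assumes the bands have already been read into NumPy arrays of equal shape (B3 green, B4 red, B8 near-infrared, B11 shortwave infrared, with B11 resampled from 20 m to the 10 m grid), and it assumes the water index is NDWI in the McFeeters form, which the dataset description does not specify. The function names and sample values are hypothetical.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def spectral_indices(green, red, nir, swir):
    """Compute NDVI, NDBI, and a water index from reflectance arrays."""
    return {
        "ndvi": normalized_difference(nir, red),           # vegetation: (NIR - Red) / (NIR + Red)
        "ndbi": normalized_difference(swir, nir),          # built-up: (SWIR - NIR) / (SWIR + NIR)
        "water_index": normalized_difference(green, nir),  # assumed NDWI: (Green - NIR) / (Green + NIR)
    }

# Hypothetical reflectance values for two pixels, for illustration only
green = np.array([0.10, 0.08])
red = np.array([0.05, 0.20])
nir = np.array([0.45, 0.25])
swir = np.array([0.20, 0.30])

indices = spectral_indices(green, red, nir, swir)
print(indices)
```

For non-negative reflectance values each index falls in the range [-1, 1], which makes a quick sanity check on the resulting columns straightforward.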


In [None]:

# Load dataset
# Example dataset format: ndvi, ndbi, water_index, temperature
df = pd.read_csv("your_dataset.csv")

# Preview dataset
df.head()



## Data Preprocessing
Check for missing values, handle outliers, and confirm that data types are correct.


In [None]:

# Check missing values
print(df.isnull().sum())

# Drop rows with missing values (or use imputation if necessary)
df = df.dropna()

# Basic statistics
df.describe()
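
The cell above handles missing values only. Outlier handling and type checks could look like the following sketch, which assumes every listed column is meant to be numeric and uses the 1.5 × IQR rule as one common (but not the only) outlier criterion. `clean_numeric` and the demo frame are hypothetical names used for illustration.

```python
import pandas as pd

def clean_numeric(df, columns, k=1.5):
    """Coerce columns to numeric, drop unparseable rows, and remove IQR outliers."""
    out = df.copy()
    for col in columns:
        # Values that cannot be parsed as numbers become NaN
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.dropna(subset=columns)

    keep = pd.Series(True, index=out.index)
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Keep rows within [q1 - k*iqr, q3 + k*iqr] for every column
        keep &= out[col].between(q1 - k * iqr, q3 + k * iqr)
    return out[keep]

# Hypothetical data: one unparseable NDVI value and one implausible temperature
demo = pd.DataFrame({
    "ndvi": ["0.1", "0.2", "0.3", "0.4", "bad", "0.25"],
    "temperature": [30.0, 31.0, 29.5, 30.5, 30.0, 95.0],
})
cleaned = clean_numeric(demo, ["ndvi", "temperature"])
print(cleaned)
```

Extreme temperatures may be genuine hotspots rather than sensor errors, so it is worth inspecting flagged rows before dropping them.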



## Exploratory Data Analysis
Visualize relationships between environmental indices and temperature.


In [None]:

# Pairplot
sns.pairplot(df, diag_kind="kde")
plt.show()

# Correlation heatmap
plt.figure(figsize=(8,6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()



## Modeling
Train and evaluate Random Forest and Gradient Boosting models.


In [None]:

# Define features and target
X = df.drop(columns=["temperature"])
y = df["temperature"]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Random Forest
rf = RandomForestRegressor(random_state=42)
rf.fit(X_train, y_train)
rf_preds = rf.predict(X_test)

# Gradient Boosting
gb = GradientBoostingRegressor(random_state=42)
gb.fit(X_train, y_train)
gb_preds = gb.predict(X_test)

# Evaluate
def evaluate_model(y_true, y_pred, model_name):
    # Take the square root explicitly; the `squared` argument was deprecated in scikit-learn 1.4
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    print(f"{model_name} - RMSE: {rmse:.2f}, R²: {r2:.2f}")

evaluate_model(y_test, rf_preds, "Random Forest")
evaluate_model(y_test, gb_preds, "Gradient Boosting")
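
A single train/test split can give a noisy estimate of model error. As a complement, the sketch below runs 5-fold cross-validation on a Random Forest. It uses synthetic data as a stand-in for the real dataset; the coefficients, sample size, and variable names (`X_demo`, `y_demo`) are assumptions made only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the real features and target (illustrative only)
rng = np.random.default_rng(42)
n = 200
X_demo = pd.DataFrame({
    "ndvi": rng.uniform(-0.1, 0.9, n),
    "ndbi": rng.uniform(-0.5, 0.5, n),
    "water_index": rng.uniform(-0.5, 0.5, n),
})
y_demo = 30 - 5 * X_demo["ndvi"] + 4 * X_demo["ndbi"] + rng.normal(0, 0.5, n)

# Shuffled 5-fold cross-validation with a fixed seed for reproducibility
cv = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestRegressor(n_estimators=50, random_state=42)

# scikit-learn maximizes scores, so RMSE is reported as a negative value
neg_rmse = cross_val_score(
    model, X_demo, y_demo, cv=cv, scoring="neg_root_mean_squared_error"
)
rmse_scores = -neg_rmse

print(f"RMSE per fold: {np.round(rmse_scores, 2)}")
print(f"Mean RMSE: {rmse_scores.mean():.2f} (std {rmse_scores.std():.2f})")
```

Neighboring pixels tend to be correlated, so random folds may look optimistic on satellite data. Grouping samples by spatial block (for example with `GroupKFold`) is one option to consider if location information is available.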



## Feature Importance
Identify which features most influence the model's predictions of UHI intensity.


In [None]:

# Feature importance from Random Forest
importances = rf.feature_importances_
feature_names = X.columns

feat_importances = pd.Series(importances, index=feature_names)
feat_importances.sort_values().plot(kind="barh", figsize=(8,5))
plt.title("Feature Importance (Random Forest)")
plt.show()
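
Impurity-based importances are computed from the training data and can favor features with many distinct values. Permutation importance on held-out data is a common cross-check. The sketch below uses synthetic data in which `water_index` has no effect on the target by construction; all names and coefficients are illustrative assumptions rather than results from this project.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: temperature depends on ndvi and ndbi, not on water_index
rng = np.random.default_rng(0)
n = 300
X_demo = pd.DataFrame({
    "ndvi": rng.uniform(-0.1, 0.9, n),
    "ndbi": rng.uniform(-0.5, 0.5, n),
    "water_index": rng.uniform(-0.5, 0.5, n),
})
y_demo = 30 - 6 * X_demo["ndvi"] + 3 * X_demo["ndbi"] + rng.normal(0, 0.5, n)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature on the test set and measure the drop in R²
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
perm_importances = pd.Series(
    result.importances_mean, index=X_demo.columns
).sort_values(ascending=False)
print(perm_importances)
```

If the two rankings disagree noticeably on real data, the permutation result is usually the safer guide to what the model relies on for unseen samples.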



## Conclusion & Recommendations
- **NDVI** (vegetation cover) is expected to correlate negatively with temperature → more vegetation, lower temperature. Confirm this against the correlation heatmap for your dataset.
- **NDBI** (built-up areas) is expected to correlate positively with temperature → dense urban zones are hotter.
- Recommended interventions:
  1. Increase vegetation in identified hotspots.
  2. Promote vertical greening and green roofs.
  3. Target retrofitting in moderate-NDBI areas.
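
To make "identified hotspots" concrete, one option is to express each sample's temperature relative to a vegetated baseline and flag the hottest fraction. The sketch below is a hypothetical illustration: the NDVI cutoff of 0.5 for the rural-like baseline, the 90th-percentile threshold, and the name `flag_hotspots` are all assumptions that would need tuning for a real study area.

```python
import pandas as pd

def flag_hotspots(df, temp_col="temperature", ndvi_col="ndvi",
                  rural_ndvi=0.5, quantile=0.9):
    """Add UHI intensity relative to a vegetated baseline and a hotspot flag."""
    out = df.copy()
    # Baseline: mean temperature of well-vegetated (rural-like) samples
    baseline = out.loc[out[ndvi_col] >= rural_ndvi, temp_col].mean()
    out["uhi_intensity"] = out[temp_col] - baseline
    # Hotspots: samples at or above the chosen intensity quantile
    threshold = out["uhi_intensity"].quantile(quantile)
    out["hotspot"] = out["uhi_intensity"] >= threshold
    return out

# Hypothetical samples, for illustration only
demo = pd.DataFrame({
    "ndvi": [0.7, 0.6, 0.2, 0.1, 0.05],
    "temperature": [28.0, 29.0, 33.0, 35.0, 38.0],
})
flagged = flag_hotspots(demo)
print(flagged)
```

The same function could be applied to model predictions instead of measured temperatures to rank areas for intervention.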
