I didn't make it to the top 20, but this hackathon was one of the most rewarding learning experiences I've had in a while. Here's how I approached the challenge and what I took away from it.
The problem was to predict Urban Heat Island (UHI) intensity at given points in a city. I didn't just see it as a machine learning task; I saw it as a geospatial problem (that's why my first simple distance-based averaging model hit 97% accuracy :P). UHI isn't something that occurs in isolation; it's spatial by nature. A hot spot in a city is influenced not just by what's at that exact point, but by what surrounds it: buildings, vegetation, elevation, materials, and more.
The Data I Used
DEM (~5m) and DSM (~20cm) (raster)
Landcover and NDVI (~20cm and Sentinel-derived) for vegetation and surface type (raster)
Building shadow data (~50cm) (raster)
Weather data as time-series (vector)
For each training and prediction point, I cropped a ~100x100 grid from all raster layers. Yes, there was overlap between points, but that was intentional. I had planned to experiment with different crop sizes, but time wasn’t on my side.
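The cropping step can be sketched with plain NumPy. This is a minimal, hypothetical version of the window extraction described above (the real pipeline would read georeferenced rasters and convert coordinates to pixel indices first); zero-padding handles points near the raster edge.

```python
import numpy as np

def crop_window(raster: np.ndarray, row: int, col: int, size: int = 100) -> np.ndarray:
    """Crop a size x size window centred on (row, col), zero-padding at the edges."""
    half = size // 2
    padded = np.pad(raster, half, mode="constant")  # indices shift by `half`
    return padded[row:row + size, col:col + size]

# toy example: a 200x200 "raster" and a point at its centre
raster = np.arange(200 * 200, dtype=float).reshape(200, 200)
patch = crop_window(raster, row=100, col=100, size=100)
print(patch.shape)  # (100, 100)
```

Because every point gets its own window, nearby points naturally share pixels, which is the intentional overlap mentioned above.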
Experiment 1: CNN + LSTM Hybrid
My first approach was to treat this like a multi-modal learning problem. Instead of merging all raster data and feeding it into one big ResNet50, I grouped similar-resolution layers:
High-res group: Landcover, DSM
Lower-res group: DEM, NDVI
Each group was passed through its own ResNet50. Meanwhile, I used an LSTM to handle the weather time series. I combined the CNN and LSTM embeddings via simple concatenation, added a few fully connected layers, and trained the model to regress UHI values.
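A rough PyTorch sketch of this architecture, with tiny CNN branches standing in for the two ResNet50 backbones (all layer sizes and the weather feature count are illustrative assumptions, not the actual hackathon code):

```python
import torch
import torch.nn as nn

class HybridUHI(nn.Module):
    """Two CNN branches (stand-ins for the ResNet50s) + an LSTM for weather,
    fused by simple concatenation and regressed to a scalar UHI value."""
    def __init__(self, emb: int = 64):
        super().__init__()
        def branch(in_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, emb),
            )
        self.high_res = branch(2)  # Landcover + DSM
        self.low_res = branch(2)   # DEM + NDVI
        self.lstm = nn.LSTM(input_size=4, hidden_size=emb, batch_first=True)
        self.head = nn.Sequential(nn.Linear(3 * emb, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, hi, lo, weather):
        h = self.high_res(hi)
        l = self.low_res(lo)
        _, (w, _) = self.lstm(weather)           # final hidden state
        fused = torch.cat([h, l, w[-1]], dim=1)  # simple concatenation
        return self.head(fused).squeeze(1)

model = HybridUHI()
out = model(torch.randn(8, 2, 100, 100),  # high-res stack
            torch.randn(8, 2, 100, 100),  # lower-res stack
            torch.randn(8, 24, 4))        # 24 weather timesteps, 4 features
print(out.shape)  # torch.Size([8])
```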
The results were okay: I got around 84% accuracy. Not bad, but I knew there was room to grow.
Experiment 2: Building Spatial Awareness with GNNs
Since UHI is inherently spatial, I wanted to try a graph-based approach. I confirmed with the organizers that GNNs were allowed. The idea was to build a rich embedding for each point and then connect those points in a graph where the spatial relationships mattered.
This time, I added one more dataset to the input, the building shadow data, and revised my CNN encoder:
One branch took DSM + DEM + Shadow as a 3-channel input
Another branch used Landcover + NDVI as 2-channel input
This time, instead of simple concatenation, I used an attention module to fuse these two streams into a single 128-dimensional embedding.
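One way to read "attention fusion" here is a learned softmax weighting over the two branch embeddings. This is my interpretation as a minimal sketch, not the post's actual module; the 128-dim size matches the embedding described above, everything else is assumed.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse two branch embeddings with learned attention weights
    instead of plain concatenation."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar score per branch

    def forward(self, a, b):
        stacked = torch.stack([a, b], dim=1)                  # (B, 2, dim)
        weights = torch.softmax(self.score(stacked), dim=1)   # (B, 2, 1)
        return (weights * stacked).sum(dim=1)                 # (B, dim)

fusion = AttentionFusion(dim=128)
z = fusion(torch.randn(4, 128), torch.randn(4, 128))  # DSM/DEM/Shadow vs Landcover/NDVI branch outputs
print(z.shape)  # torch.Size([4, 128])
```

Unlike concatenation, this lets the model decide per-sample how much each stream contributes, while keeping the output at a fixed 128 dimensions.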
I trained this CNN to predict UHI, then stripped off the final layer to use it purely for generating embeddings. I passed both training and prediction points through it, and got 128-dim vectors for each.
Interestingly, many of those dimensions were zeros, so I dropped them and kept only the active features.
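Dead dimensions like this are common after ReLU-based heads. Pruning them is a one-liner in NumPy; a small illustrative sketch (the threshold and shapes are assumptions):

```python
import numpy as np

def drop_dead_dims(emb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Keep only embedding dimensions that are non-zero for at least one point."""
    active = np.abs(emb).max(axis=0) > eps
    return emb[:, active]

# toy embeddings: 5 points, 128 dims, only the first 40 dims ever fire
emb = np.zeros((5, 128))
emb[:, :40] = np.random.randn(5, 40)
print(drop_dead_dims(emb).shape)  # (5, 40)
```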
To model spatial context, I built a graph using K-Nearest Neighbors (k=4) based on location. I masked out prediction points during training (since they had no labels), and trained a vanilla GNN and a Graph Attention Network (GAT).
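The graph construction step (k=4 nearest neighbours by location) can be sketched without any GNN library; this builds the COO edge index that frameworks like PyTorch Geometric expect, though the actual training code is not shown here:

```python
import numpy as np

def knn_edges(coords: np.ndarray, k: int = 4) -> np.ndarray:
    """Build a directed k-NN edge list of shape (2, N*k) from point coordinates."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)           # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest points
    src = np.repeat(np.arange(len(coords)), k)
    return np.stack([src, nbrs.ravel()])  # COO edge index (source, target)

coords = np.random.rand(10, 2)  # 10 points in a (lon, lat)-like space
edges = knn_edges(coords, k=4)
print(edges.shape)  # (2, 40)
```

Masking prediction points during training then amounts to computing the loss only over nodes with labels, while their embeddings still participate in message passing.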
Both gave similar results, and I reached a final score of 94%. I know others hit 99%; huge congrats to them.
Why I Think This Approach Matters
UHI isn’t just a localized phenomenon; it’s about how a place relates to its surroundings. That’s why I believe GNNs are a powerful tool here. They're inherently spatial and relational. They allow the model to learn not just "what's at this location" but also "how is this location influenced by others nearby?"
And since the embedding stage is modular, this method could be scaled to other cities or locations with similar data.
What I Wish I Had Time For
Feature importance analysis before feeding the GNN, which would’ve helped simplify or optimize the inputs
Speeding up training. GNNs are powerful but slow, especially with message passing on large graphs, and I needed more epochs than I had time for.