# Detecting Decentralized Solar Panel Assemblies in Post-War Syria

This project introduces a novel workflow for identifying solar panel assemblies in the challenging, post-war urban landscapes of Syria. Decentralized solar electricity production has become common in Syria because the national electric grid is unreliable and delivers power inconsistently. Our primary goal was to develop a deep learning model that not only detects panels but also performs instance segmentation to estimate their area and keypoint regression to determine each panel's cardinal direction. Both outputs are needed to accurately estimate energy production from residential solar, which is vital for grid integration and reconstruction efforts.

## Overview

Existing models for solar panel identification have mainly been trained on data from Western countries, where conditions differ sharply from the Syrian context: large, uniform solar farms and buildings with slanted, uncluttered roofs. In Syria, by contrast, panels are installed on flat, crowded rooftops in non-standardized configurations. The nature of our data, and the difficulty of acquiring it, are key distinguishing features of this project.

### Data Acquisition Challenges

Our data was acquired through the invaluable efforts of local, on-the-ground aerial photographers. However, this process was fraught with difficulties. The challenges included:

* **Limited Equipment**: Access was restricted to consumer-grade drones, which carry only consumer-level cameras and can cover only limited areas at lower altitudes.
* **Security Clearances**: Obtaining the necessary permits to fly drones can be exceptionally difficult in volatile regions like Syria.
* **Environmental Conditions**: Windy conditions pose a significant challenge for smaller, consumer-grade drones, affecting image stability and quality, especially at higher altitudes.

### Hurdles in Syrian Urban Landscapes

Another major hurdle is the unique nature of urban spaces in Syria, which feature dense, cluttered rooftops. This environment creates numerous challenges for accurate solar panel detection. Below are some examples of these difficult factors:

**a. Dense and Obstructed Rooftops**

Urban density leads to many obstructions and features like building windows, balconies, cars (especially sunroofs), and decorative tiled patterns that can easily be mistaken for solar panels.

<div style="display: flex; justify-content: center;margin: 25px 0;">
<img src="main_ipynb_images/roof_pattern.png" width="300" alt="A grid structure on a roof"><img src="main_ipynb_images/roof_pattern2.png" width="300" alt="A grid structure on a roof"><img src="main_ipynb_images/confusing_patterns.png" width="300" alt="Tiles in a square pattern can confuse the model. ">
</div>


**b. Solar Water Heaters**

Solar water heaters, which are irrelevant to our goal of estimating electric output, are common features on rooftops in Syria and appear visually similar to the photovoltaic (PV) solar panels we aim to detect.

<div style="display: flex; justify-content: center;margin: 25px 0;">
<img src="main_ipynb_images/water_heater2.png" width="400" alt="Solar water heaters are widespread and look like PV panels.">
</div>

**c. Limited and Varied Data**

The limited nature of our data introduces further complications:
* **Shadows**: Depending on the time of day and their interplay with the geometry of buildings, shadows can be mistaken for panels.
* **Camera Angle**: Images are not always taken from a straight-down (orthographic) perspective, leading to distortion and lower image quality, especially away from the center.


<div style="display: flex; justify-content: center;margin: 25px 0;">
<img src="main_ipynb_images/dense_rooftops_at_an_angle.png" width="300" alt="Low resolution and dense rooftops make it difficult to identify solar panels."><img src="main_ipynb_images/distorted_low_quality.png" width="300" alt="In an image taken at a sharp angle, dark shadows can be confused for panels.">
</div>

### Methodological Choice: Corners over Angles

Our model identifies the two bottom corners of a solar panel assembly rather than predicting its angle directly, for two primary reasons:

1.  **Data Limitation**: A direct angle prediction would require strictly orthographic (straight-down) images. By predicting corners instead, we can still use images taken at an angle, as well as every part of the straight-down images, regardless of perspective distortion. For images that are orthographic, the angle can be derived by taking the direction perpendicular to the line connecting the two corners.

<div style="display: flex; justify-content: center;margin: 25px 0;">
<img src="main_ipynb_images/calculating_angle.png" width="400" alt="By finding the bottom corners, we automatically can calculate the direction in straight-down images.">
</div>


2.  **Loss Function**: A loss on a directly predicted angle has to account for the cyclic nature of angles (e.g., 359° and 1° are only 2° apart, but a simple subtraction yields 358°). A naive difference is discontinuous at the wrap-around point, and a wrap-aware L1 loss has kinks where it is not differentiable. A regression task to find corner coordinates, where we can simply use MSE, avoids this issue.

<div style="display: flex; justify-content: center;margin: 25px 0;">
<img src="main_ipynb_images/loss_function_y=0.png" width="400" alt="Graph of an 'L1' cyclic loss function between two angles, where one of the angles is fixed to zero. ">
</div>
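Both points above can be made concrete with a short sketch. It assumes an orthographic image with north pointing up, pixel coordinates with `x` to the right and `y` downward, and corners ordered left-to-right as seen by a viewer standing at the panel's lower edge and facing it. These conventions and the function names are illustrative assumptions, not taken from the project code.

```python
import math


def azimuth_from_bottom_corners(left, right):
    """Facing direction in degrees, clockwise from image-up (assumed north)."""
    ex, ey = right[0] - left[0], right[1] - left[1]  # bottom edge, left -> right
    nx, ny = -ey, ex  # perpendicular pointing from the top edge toward the bottom edge
    # Image y grows downward, so "north" corresponds to -y.
    return math.degrees(math.atan2(nx, -ny)) % 360.0


def cyclic_abs_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles, in [0, 180]."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


if __name__ == "__main__":
    # A panel whose bottom edge runs west -> east faces south (180 degrees).
    print(azimuth_from_bottom_corners((0, 10), (10, 10)))
    # Naive subtraction vs. wrap-aware difference for 359 and 1 degrees.
    print(abs(359.0 - 1.0), cyclic_abs_diff(359.0, 1.0))
```

The wrap-aware difference is what a direct angle loss would need; regressing the corner coordinates sidesteps it because the angle is only computed after prediction.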

## Labeling the Data

1.  Our local photographers took 10 drone images from diverse neighborhoods varying by socioeconomic level, damage to property, and type (residential, commercial) in the city of Homs, Syria.

2.  **Patch Sampling**: We sampled 1544 image patches of size 224x224 from the 9 source images reserved for training and validation. The sampling process involved random rotations (applied *before* labeling, so that annotations never have to be transformed by a rotation) and ensured minimal overlap between patches (the center of one patch cannot appear in another).

3.  **Test Set**: To prevent information leakage from the training and validation sets, we set aside one source image from a separate geographical location. From this image, we sampled 109 patches to form our final test set.

4.  **Custom Annotation Tools**: We developed custom tools to facilitate the hand-labeling of our images with all necessary annotations: bounding boxes, segmentation masks, and the two bottom corners for each solar panel assembly. A separate tool was created to display these annotations for verification. Using these, we meticulously labeled and verified our data to produce annotated COCO JSON files.

5.  **Active Learning Workflow**: To manage the extensive labeling effort, we employed an active learning strategy. We began by randomly selecting and labeling 300 image patches. We then used a model trained on this initial set to identify the 200 most informative (i.e., most uncertain) examples from the unlabeled pool to label next. Our acquisition function was the average classification entropy of the bounding box predictions in a given patch. To address cases of model overconfidence, we ran inference multiple times while varying our model hyperparameters. This process was repeated twice more, each time adding 100 more images. This iterative approach yielded a training set of 700 fully hand-labeled image patches (300 + 200 + 100 + 100), leaving 844 patches from the original training/validation pool unlabeled.


    <div style="display: flex; justify-content: center;margin: 25px 0;">
    <img src="main_ipynb_images/active_learning_entropy_graph.png" width="600" alt="Entropy graph from the active learning rounds">
    </div>
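A minimal sketch of the acquisition step in the active learning workflow, under stated assumptions: each unlabeled patch comes with a list of per-box class probability vectors (for example, softmax outputs of the detection head), and a patch with no detections is given a score of zero. The function names and the handling of empty patches are hypothetical; the project code may differ.

```python
import math


def box_entropy(probs):
    """Shannon entropy (in nats) of one box's class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def patch_score(boxes_probs):
    """Average classification entropy over all predicted boxes in a patch."""
    if not boxes_probs:  # assumption: no detections -> score 0
        return 0.0
    return sum(box_entropy(p) for p in boxes_probs) / len(boxes_probs)


def select_for_labeling(pool, k):
    """Return the ids of the k patches with the highest average entropy."""
    ranked = sorted(pool, key=lambda pid: patch_score(pool[pid]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool = {
        "patch_a": [[0.99, 0.01]],              # one confident detection
        "patch_b": [[0.55, 0.45], [0.6, 0.4]],  # two uncertain detections
        "patch_c": [],                          # nothing detected
    }
    print(select_for_labeling(pool, k=2))  # ['patch_b', 'patch_a']
```

In this sketch a patch where the detector finds nothing scores zero, which is one way a purely entropy-based score can overlook confident mistakes.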

## Our Models

1.  **Three-Stage Architecture**: Our model employs a three-stage pipeline:
    * **Stage 1 (Object Detection)**: A **Faster R-CNN** model identifies and produces a list of bounding boxes for the solar panels in an image.
    * **Stage 2 (Segmentation)**: A version of **SlimSAM** takes the image and a single bounding box at a time to generate a precise segmentation mask for each detected solar panel.
    * **Stage 3 (Corner Prediction)**: A custom model predicts the coordinates of the two bottom corners. This model takes the image, bounding box, and mask for a single panel as input. It utilizes a **ResNet50 with FPN backbone shared with the Faster R-CNN** to extract image features, a small CNN to process the mask, and simple MLPs to process the bounding box coordinates and combine the information from all three processed inputs.


    <div style="display: flex; justify-content: center;margin: 25px 0;">
    <img src="main_ipynb_images/model_architecture.png" width="600" alt="Overview of our models">
    </div>
  

2.  **Training Routine**: We developed a 5-phase training routine to train the Faster R-CNN and the corner prediction model jointly, leveraging their shared backbone. The SlimSAM model was not fine-tuned as it performed exceptionally well out-of-the-box. The 5 phases are as follows:
    * **Phase 0**: Freeze the backbone and train only the head of the corner model on a subset of the data. This is necessary because the corner model's head is initialized randomly, unlike the pre-trained head of the Faster R-CNN.
    * **Phase 1**: Train the heads of both the bounding box and corner models on the same subset of data.
    * **Phase 2**: Freeze the heads and fine-tune the shared backbone using losses from both tasks simultaneously on the remainder of the data.
    * **Phase 3**: Re-train the heads alone on the entire labeled dataset.
    * **Phase 4**: Fine-tune the complete models on the entire dataset.
    
    The hyperparameters for this routine were tuned using Optuna. More details can be found in the `multistage_multitask_training.ipynb` notebook.
    
3.  **Inference and Deployment**: We have implemented functions for loading the trained models and running the full inference pipeline. Additionally, a small web application and Python server code (`server.py` and `inference_app.html`) were created to allow users to interact with the saved models through a web interface.
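The freeze/unfreeze schedule of the training routine can be summarized as a small table in code. This is a sketch of one reading of the phase descriptions, not the contents of `multistage_multitask_training.ipynb`: the names `Phase`, `PHASES`, `set_trainable`, and `apply_phase` are hypothetical, and the frozen backbone in Phase 1 is an assumption inferred from the surrounding phases. The helper only relies on `parameters()` and `requires_grad`, so it would work with any `torch.nn.Module`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    backbone: bool     # train the shared backbone?
    bbox_head: bool    # train the Faster R-CNN head?
    corner_head: bool  # train the corner-model head?
    data: str          # which part of the labeled data the phase uses


# Transcribed from the phase list; Phase 1's frozen backbone is an assumption.
PHASES = {
    0: Phase(backbone=False, bbox_head=False, corner_head=True, data="subset"),
    1: Phase(backbone=False, bbox_head=True, corner_head=True, data="subset"),
    2: Phase(backbone=True, bbox_head=False, corner_head=False, data="remainder"),
    3: Phase(backbone=False, bbox_head=True, corner_head=True, data="all"),
    4: Phase(backbone=True, bbox_head=True, corner_head=True, data="all"),
}


def set_trainable(module, trainable):
    """Toggle requires_grad on every parameter of a module."""
    for param in module.parameters():
        param.requires_grad = trainable


def apply_phase(phase_id, backbone, bbox_head, corner_head):
    """Freeze or unfreeze the three components for the given phase."""
    phase = PHASES[phase_id]
    set_trainable(backbone, phase.backbone)
    set_trainable(bbox_head, phase.bbox_head)
    set_trainable(corner_head, phase.corner_head)
    return phase
```

Calling `apply_phase(2, backbone, bbox_head, corner_head)` before building the optimizer would leave only the backbone parameters trainable, matching Phase 2.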

## Evaluation

To evaluate the performance of our segmentation model, we used the **Jaccard Index (Intersection over Union, IoU)**, **Accuracy**, and **F1-Score**. The code for this evaluation is contained in `solareval.py`. The following code cell runs the evaluation on our test set and displays the results.

In [9]:
!pip install --quiet torchmetrics
import os
import sys

import torch
from torchvision import transforms
from google.colab import drive

# Mount Google Drive and make the project modules importable.
BASE_PROJECT_PATH = '/content/drive/MyDrive/Erdos Institute - Solar Panels Project'
drive.mount('/content/drive')
sys.path.append(os.path.join(BASE_PROJECT_PATH, 'modules'))
from solarutils import *  # model loaders and dataset classes used below
from solareval import calculate_metrics

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {DEVICE}")
MODEL_PATH = os.path.join(BASE_PROJECT_PATH, 'models', "2025_08_07_02_13_model_bbox.pth")
bbox_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load models.

bbox_model = load_bbox_model(DEVICE, path=MODEL_PATH)
#baseline_model = load_bbox_model(DEVICE)
sam_model, sam_processor = load_sam_model("Zigeng/SlimSAM-uniform-77", DEVICE)

# Define paths
TEST_IMAGE_DIR = os.path.join(BASE_PROJECT_PATH, 'data', 'test_images')
TEST_JSON = os.path.join(BASE_PROJECT_PATH, 'data', 'test_set_annotations.json')

# Calculate metrics
metrics = calculate_metrics(
    bbox_model, sam_model, sam_processor, bbox_transform,
    TEST_IMAGE_DIR, TEST_JSON, DEVICE, bbox_threshold=0.5,
)
print("Evaluation Results:")
print(metrics)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Using device: cuda
Bounding box model loaded from /content/drive/MyDrive/Erdos Institute - Solar Panels Project/models/2025_08_07_02_13_model_bbox.pth
SAM model loaded from Zigeng/SlimSAM-uniform-77
Loaded 109 images from /content/drive/MyDrive/Erdos Institute - Solar Panels Project/data/test_images with annotations in /content/drive/MyDrive/Erdos Institute - Solar Panels Project/data/test_set_annotations.json.
Evaluating model...


100%|██████████| 109/109 [00:20<00:00,  5.36it/s]

Evaluation Results:
{'avg_iou': 0.6323783993721008, 'avg_accuracy': 0.9903662204742432, 'avg_f1_score': 0.7747938632965088}
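For reference, the three mask metrics can be computed from pixel counts as in the sketch below. This is an illustration of the standard definitions, not the implementation in `solareval.py` (which may aggregate differently); the function name and the convention of returning 1.0 when both masks are empty are assumptions. For a single confusion matrix, F1 = 2·IoU / (1 + IoU), and the reported values (0.632 and 0.775) are consistent with that identity.

```python
def mask_metrics(pred, target):
    """IoU, accuracy and F1 for two binary masks given as nested 0/1 lists."""
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # assumption: two empty masks agree perfectly
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"iou": iou, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # 10x10 patch with a 2x3 panel; the prediction is shifted one column right.
    target = [[1 if 2 <= r < 4 and 2 <= c < 5 else 0 for c in range(10)] for r in range(10)]
    pred = [[1 if 2 <= r < 4 and 3 <= c < 6 else 0 for c in range(10)] for r in range(10)]
    print(mask_metrics(pred, target))  # iou 0.5, accuracy 0.96, f1 ~0.667
```

The toy example shows why accuracy sits far above IoU when panels cover a small share of the pixels: the many correctly labeled background pixels count toward accuracy but not toward IoU or F1.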





In [11]:
CORNER_MODEL_PATH = os.path.join(BASE_PROJECT_PATH, 'models', "2025_08_11_16_20_model_corner.pth")
corner_model = load_corner_model(CORNER_MODEL_PATH, bbox_model.backbone, DEVICE, strategy = 'basic')
corner_img_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
corner_mask_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

data = SinglePanelDataset(TEST_JSON, TEST_IMAGE_DIR, image_transform = corner_img_transform, mask_transform = corner_mask_transform)



Using provided backbone for feature extraction.
Corner predictor model loaded from /content/drive/MyDrive/Erdos Institute - Solar Panels Project/models/2025_08_11_16_20_model_corner.pth


In [18]:
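# Accumulate the squared error of each of the 4 corner coordinates over the test set.
# Note: without torch.no_grad(), autograd history is kept (hence the grad_fn in the output below).
MSE = 0
for image, target in data:
  image = image.to(DEVICE)
  target['box_xywh'] = target['box_xywh'].to(DEVICE)
  target['mask'] = target['mask'].to(DEVICE)
  prediction = corner_model(image.unsqueeze(0), target['box_xywh'].unsqueeze(0), target['mask'].unsqueeze(0))
  MSE += (target['keypoints'].to(DEVICE) - prediction)**2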

In [23]:
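# Mean over samples, rescaled by 224**2 (patch side length squared), summed over the 4 coordinates.
sum(MSE.squeeze(0)/len(data)*(224**2))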

tensor(1811.8848, device='cuda:0', grad_fn=<AddBackward0>)

## Results: IoU = 0.63, Accuracy = 0.99, F1-score = 0.77.
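For the corner model: the mean squared error between the target and predicted 4-dimensional corner vectors, summed over the four coordinates and rescaled by 224² to the patch size, is about 1812. Assuming the keypoints are normalized to [0, 1] (as the rescaling suggests), this value is in squared pixels and corresponds to a root-mean-square error of roughly 21 pixels per coordinate on a 224x224 patch.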

For comparison, here are the values of these metrics for different models and datasets from the SolarFormer paper, along with a baseline where we use a non-fine-tuned pre-trained Faster R-CNN model with the same SlimSAM workflow: 

<div style="display: flex; justify-content: center;margin: 25px 0;">
<img src="main_ipynb_images/metrics.png" width="600" alt="Comparison of metrics across models and datasets">
</div>


## Further Improvements

While this project has yielded promising results, there are several avenues for future improvement, especially for the corner model:

* **Data Procurement**: Acquiring more high-quality, orthographic drone imagery would be the most direct way to improve model accuracy and reduce the impact of issues like perspective distortion.
* **Context-Aware Corner Prediction**: An interesting direction would be to have the model first use the general image context to determine the angle of the sun (e.g., from shadows), and then combine this information with the bounding box and mask to produce more accurate corner predictions.
* **Alternative Active Learning Strategies**: Our project uses a standard uncertainty-based acquisition function (classification entropy). We would like to explore other uncertainty-based approaches like Monte Carlo dropout or other families of acquisition functions like clustering methods.
* **Model Architecture**: Exploring transformer-based models, such as Vision Transformers (ViT), could potentially lead to performance gains.
* **Model Efficiency**: For nationwide deployment in a resource-limited environment like Syria, developing smaller, more efficient models that maintain high accuracy would be a critical next step.