
# Building Footprint Extraction

## Overview

As high-resolution satellite imagery becomes increasingly available in both the public and private domains, it enables a growing number of beneficial applications. Extracting building footprints from satellite imagery is a core component of many downstream applications, such as humanitarian assistance and disaster response. This paper offers a comparative study of methods for building footprint extraction in satellite imagery. The focus is on state-of-the-art semantic segmentation models from computer vision, evaluated on the SpaceNet 2 Building Detection Dataset. Four high-level approaches, comprising six variants in total, are trained and evaluated: U-Net, UNet++, Fully Convolutional Networks (FCN), and DeepLabv3. Intersection over Union (IoU) is used to quantify segmentation performance on a held-out test set. In our experiments, DeepLabv3 with a ResNet-101 backbone was the most accurate of the surveyed methods for building footprint extraction. In general, models that leverage pretraining achieve high accuracy and require minimal training. Conversely, models that do not leverage pretraining are inaccurate and require longer training regimes.

## Dataset 
To benchmark these approaches on building footprint extraction in satellite images, the [SpaceNet Building Detection V2 dataset](https://spacenet.ai/spacenet-buildings-dataset-v2/) is used. This dataset contains high-resolution satellite imagery and corresponding labels that specify the location of building footprints. It includes 302,701 building labels across 10,593 multi-spectral satellite images of Las Vegas, Paris, Shanghai and Khartoum. The labels are binary and indicate whether each pixel is building or background.

<p align="center">
<img width="600" alt="Screen Shot 2021-09-28 at 5 41 20 PM" src="https://user-images.githubusercontent.com/34798787/160892992-c6586c15-308f-481c-a2d9-61f5f46cdc8e.png">  
    <br>
<div align="center"> 
   <b> Figure 1:</b>  An example of images (left) and labels (right) in the SpaceNet Building
Detection V2 dataset.
</div> 
</p>

## Experimental Setup 

The dataset is divided into training (80%), validation (10%) and testing (10%) sets. Images are resized from 650x650 to 384x384 pixels using bicubic interpolation and normalized using the mean and standard deviation of the ImageNet dataset.
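
The 80/10/10 split described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the helper name `split_indices`, the shuffling step, and the `seed` parameter are assumptions, while `train_perc` and `val_perc` mirror the script arguments listed under Running Code.

```python
import random


def split_indices(num_samples, train_perc=0.8, val_perc=0.1, seed=0):
    """Shuffle sample indices and split them into train/val/test lists.

    Hypothetical helper: the shuffle and the fixed seed are assumptions,
    not taken from train.py.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)

    n_train = int(train_perc * num_samples)
    n_val = int(val_perc * num_samples)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    # Whatever remains after train and validation becomes the test set.
    test = indices[n_train + n_val:]
    return train, val, test


# 10,593 is the number of images reported for the dataset.
train_idx, val_idx, test_idx = split_indices(10593)
print(len(train_idx), len(val_idx), len(test_idx))  # 8474 1059 1060
```

Assigning the remainder to the test set keeps every image in exactly one subset even when the percentages do not divide the dataset evenly.
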
The semantic segmentation models are trained on the training set, while the validation set is used to determine a stopping criterion. Lastly, each trained model is evaluated on the testing set. Intersection over Union (IoU) is the metric used to evaluate model performance: it measures the overlap between the predicted labels and the ground truth labels. IoU ranges from 0 to 1, where 1 denotes perfect and complete overlap.
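
For binary masks, IoU is the number of pixels marked as building in both the prediction and the ground truth, divided by the number of pixels marked as building in either one. The sketch below illustrates this on small nested lists in plain Python. The function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, and a real training pipeline would more likely compute this with tensor operations.

```python
def iou(pred, target):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel is building in both masks
            if p or t:
                union += 1  # pixel is building in at least one mask
    # Assumed convention: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 0.5 (2 shared pixels out of 4 in the union)
```
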

## Results 

<p align="center">
<img width="200" alt="Screen Shot 2021-09-28 at 5 41 20 PM" src="https://user-images.githubusercontent.com/34798787/160891571-1c38cdc6-a2ae-4b00-af71-c85dd50603e1.png">  
    <br>
<div align="center"> 
   <b> Figure 2:</b> IoU score on the test set for each approach.
</div> 
</p>

<p align="center">
<img width="1000" alt="Screen Shot 2021-09-28 at 5 41 20 PM" src="https://user-images.githubusercontent.com/34798787/160889601-98814c3e-47e8-45f4-9eb7-18f93daebf75.jpg">  
    <br> 
<div align="center"> 
    <b>Figure 3: </b> A visualization of the predictions generated by each approach along with the input image (far left) and ground truth label (far right).
</div> 
</p>

<p align="center">
<img width="600" alt="Screen Shot 2021-09-28 at 5 41 20 PM" src="https://user-images.githubusercontent.com/34798787/160892139-b52cc258-651b-40b6-8f00-ae2deed1de7f.png">  
    <br> 
<div align="center"> 
    <b>Figure 4: </b> Binary cross-entropy loss on the training set (top) and validation set
(bottom) across epochs.
</div> 
</p>

## Running Code
To configure the environment for the experiments, navigate to the root of this directory and execute the following commands:

```
conda create -n new_env python
conda activate new_env
pip install -r requirements.txt
```

To obtain results for a specific architecture, pass the appropriate arguments to the **train.py** script:
```
python train.py --model fcn50 --epochs 10 --batch_size 4
```

The **train.py** script has the following arguments: 
- **model** (str): The architecture variant to train and evaluate.
- **epochs** (int): The number of epochs to train the model.
- **batch_size** (int): The batch size for training, validation and testing.
- **learning_rate** (float): The learning rate used for training.
- **size** (int): The side length of the input image.
- **train_perc** (float): The proportion of samples used for training.
- **val_perc** (float): The proportion of samples used for validation.
- **data_path** (str): The root directory of the dataset.
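
As a rough guide to how these arguments fit together, the sketch below declares them with Python's standard `argparse` module. It is not the contents of **train.py**: the `build_parser` helper and all default values are assumptions. The defaults for `size`, `train_perc` and `val_perc` follow the experimental setup described above, those for `model`, `epochs` and `batch_size` follow the example command, and those for `learning_rate` and `data_path` are placeholders.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented train.py arguments."""
    parser = argparse.ArgumentParser(
        description="Train a building footprint segmentation model."
    )
    parser.add_argument("--model", type=str, default="fcn50",
                        help="Architecture variant to train and evaluate.")
    parser.add_argument("--epochs", type=int, default=10,
                        help="Number of training epochs.")
    parser.add_argument("--batch_size", type=int, default=4,
                        help="Batch size for training, validation and testing.")
    # Placeholder default: the learning rate used by the authors is not stated.
    parser.add_argument("--learning_rate", type=float, default=1e-3,
                        help="Learning rate used for training.")
    parser.add_argument("--size", type=int, default=384,
                        help="Side length of the input image.")
    parser.add_argument("--train_perc", type=float, default=0.8,
                        help="Proportion of samples used for training.")
    parser.add_argument("--val_perc", type=float, default=0.1,
                        help="Proportion of samples used for validation.")
    # Placeholder default: the real dataset location depends on your setup.
    parser.add_argument("--data_path", type=str, default="data",
                        help="Root directory of the dataset.")
    return parser


# Parse the same flags as the example command shown earlier.
args = build_parser().parse_args(
    ["--model", "fcn50", "--epochs", "10", "--batch_size", "4"]
)
print(args.model, args.epochs, args.batch_size, args.size)
```

Any argument omitted on the command line falls back to its default in this sketch, so the test proportion is implied as whatever remains after `train_perc` and `val_perc`.
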
