This script was developed and used on Google Colab. It loads the model described in our paper "DamageMap: A post-wildfire damaged buildings classifier" and uses it to classify the images of a given dataset. The model outputs "0" for an undamaged building and "1" for a damaged building.

The dataset should consist of separate images of building roofs, all contained in one folder. If the true labels of the dataset are known and you want to calculate the accuracy of the model, prepare the dataset as follows.

Create a folder that contains 2 subfolders. The first subfolder (in alphabetical order) should contain the images of the undamaged buildings, because they will automatically get the label "0" (and we want it to match the prediction of our model for undamaged buildings). The second subfolder (in alphabetical order) should contain the images of damaged buildings.
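
To see why the subfolder names matter, the sketch below mimics the alphabetical folder-to-label rule using only the standard library. It is an illustration of the sorting rule, not torchvision's actual code. The helper names (`class_to_label`, `make_layout`) and the folder names (`0_undamaged`, `1_damaged`, `damaged`, `undamaged`) are hypothetical.

```python
import os
import tempfile


def class_to_label(root):
    """Map each subfolder of `root` to an integer label in alphabetical order.

    This mirrors the sorting rule described above; it is an illustration,
    not torchvision's actual implementation.
    """
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}


def make_layout(subfolders):
    """Create a temporary dataset root with the given (hypothetical) subfolders."""
    root = tempfile.mkdtemp()
    for name in subfolders:
        os.makedirs(os.path.join(root, name))
    return root


# Numeric prefixes force the intended order: undamaged -> 0, damaged -> 1.
good = class_to_label(make_layout(["1_damaged", "0_undamaged"]))
# Plain names sort the wrong way round: "damaged" comes before "undamaged".
bad = class_to_label(make_layout(["damaged", "undamaged"]))

print(good)
print(bad)
```

With plain names such as `damaged` and `undamaged`, the damaged images would receive the label "0" and the reported accuracy would be inverted, so a numeric prefix is one simple way to pin the order.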

The following cell allows Google Colab to get access to the files of your Google Drive.

In [None]:
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

%cd drive/My\ Drive

Mounted at /content/drive
/content/drive/My Drive


Import the necessary libraries.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader, Dataset
from torch.utils.data import sampler, RandomSampler, SubsetRandomSampler
from torch.utils.tensorboard import SummaryWriter
from PIL import Image, ImageOps
import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np
from sklearn.metrics import confusion_matrix
import time

import seaborn as sns
import matplotlib.pyplot as plt
import os
import copy
print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

PyTorch Version:  1.5.1+cu101
Torchvision Version:  0.6.1+cu101




If a GPU is available, the following cell lets the model use it to speed up classification.

In [None]:
USE_GPU = True

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')


print('using device:', device)

using device: cpu


The following cell loads the dataset that the model will later classify.

In [None]:
BATCH_SIZE = 64   # number of images the model classifies in each step (limited by GPU or CPU memory)
FOLDERNAME = 'damaged_structures_detector/xbd_for_prediction'   # path to the folder with the images of the dataset we want to classify
# ImageFolder assigns labels by subfolder, in alphabetical order: images in the first subfolder get the label "0",
# images in the next subfolder get the label "1", and so on.

## The following two lines give the means and standard deviations (std) of the datasets described in our paper.
## Before classifying a new dataset, normalize it with the mean and std of the dataset on which the model was trained.
# Par and Carr: mean=[0.3662, 0.3452, 0.3384], std=[0.1552, 0.1500, 0.1475]
# Xbd: mean=[0.4597, 0.4655, 0.3800], std=[0.1425, 0.1265, 0.1287]

data_transform = transforms.Compose([     # transformations applied to each image of the new dataset before classification
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.4597, 0.4655, 0.3800],
                             std=[0.1425, 0.1265, 0.1287])   # the model was trained on Xbd, so we normalize with the Xbd mean and std
    ])

test_dataset = datasets.ImageFolder(FOLDERNAME, transform=data_transform)   # build the dataset; the transformations are applied as images are loaded
test_loader  = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=0)   # create the PyTorch dataloader
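
As a rough illustration of the normalization step, the sketch below applies the per-channel formula `(value - mean) / std` to single RGB pixels in plain Python. It assumes pixel values already scaled to the range [0, 1], as `ToTensor` produces. The helper `normalize_pixel` and the example pixels are hypothetical; the statistics are the Xbd values quoted in the cell above.

```python
# Channel statistics quoted in the cell above (Xbd dataset).
XBD_MEAN = [0.4597, 0.4655, 0.3800]
XBD_STD = [0.1425, 0.1265, 0.1287]


def normalize_pixel(rgb, mean, std):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


# A pixel whose colour equals the dataset mean maps to zero in every channel.
centered = normalize_pixel(XBD_MEAN, XBD_MEAN, XBD_STD)

# A pixel one standard deviation above the mean maps to roughly 1.0 per channel.
one_std_above = [m + s for m, s in zip(XBD_MEAN, XBD_STD)]
brighter = normalize_pixel(one_std_above, XBD_MEAN, XBD_STD)

print(centered)
print(brighter)
```

Normalizing with statistics from a different dataset shifts and rescales every input relative to what the model saw during training, which is why the mean and std must match the training set.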

The following cell recreates the transformed dataset and the dataloader, because Google Colab sometimes failed to load the whole dataset from the provided path on the first attempt. Run it once more to be safe.

In [None]:
test_dataset = datasets.ImageFolder(FOLDERNAME, transform = data_transform)
test_loader  = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers = 0)

Make sure that the loaded dataset contains all of the images in the specified folder.

In [None]:
test_dataset

Dataset ImageFolder
    Number of datapoints: 47543
    Root location: damaged_structures_detector/xbd_for_prediction
    StandardTransform
Transform: Compose(
               Resize(size=224, interpolation=PIL.Image.BILINEAR)
               CenterCrop(size=(224, 224))
               ToTensor()
               Normalize(mean=[0.3662, 0.3452, 0.3384], std=[0.1552, 0.15, 0.1475])
           )
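
One way to make this check concrete is to count the image files on disk and compare the result with the number of datapoints reported above. The sketch below uses only the standard library. The helper `count_images`, the extension list, and the demo file names are assumptions made for illustration; adjust the extensions to the formats in your dataset.

```python
import os
import tempfile

# Assumed set of image extensions; adjust it to the formats in your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp")


def count_images(root):
    """Count image files anywhere under `root`, matching extensions case-insensitively."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(name.lower().endswith(IMAGE_EXTENSIONS) for name in filenames)
    return total


# Demo on a throwaway folder with hypothetical file names.
demo_files = {"0_undamaged": ["a.png", "b.jpg"], "1_damaged": ["c.PNG", "notes.txt"]}
demo_root = tempfile.mkdtemp()
for subfolder, names in demo_files.items():
    os.makedirs(os.path.join(demo_root, subfolder))
    for name in names:
        open(os.path.join(demo_root, subfolder, name), "w").close()

demo_count = count_images(demo_root)
print(demo_count)
```

In this notebook, the corresponding comparison would be between `count_images(FOLDERNAME)` and `len(test_dataset)`; if the two numbers differ, rerun the dataset cell.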

Load the model that will be used for prediction.

In [None]:
%%capture
MODEL_PATH = "damaged_structures_detector/checkpoints/Resnet_model_trained_on_xbd.pth"  # path to the model that will be used for classification
model = torch.load(MODEL_PATH, map_location=device)
model.to(device)
model.eval()

Classify the images of the dataset and calculate the accuracy of the classification (if true labels are available).

In [None]:
running_corrects = 0

for inputs, labels in test_loader:
    inputs = inputs.to(device)
    labels = labels.to(device)   # true labels, used later to calculate the accuracy of the model

    with torch.no_grad():
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)   # index of the highest score is the predicted class

    # Compare model predictions with true labels. Skip this and the following step if the true labels are not known.
    running_corrects += torch.sum(preds == labels)

test_acc = running_corrects.double() / len(test_loader.dataset)   # accuracy of the model on the whole dataset
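
Accuracy alone does not show which class the errors fall in. As a minimal sketch, the plain-Python example below builds a 2x2 confusion matrix from lists of true labels and predictions; `sklearn.metrics.confusion_matrix`, imported earlier, uses the same rows-are-true-labels convention. The helper `confusion_counts` and the example labels are hypothetical, not results from the paper.

```python
def confusion_counts(true_labels, predicted):
    """Return a 2x2 matrix: rows are true labels, columns are predictions."""
    matrix = [[0, 0], [0, 0]]
    for truth, pred in zip(true_labels, predicted):
        matrix[truth][pred] += 1
    return matrix


# Hypothetical labels and predictions (0 = undamaged, 1 = damaged).
true_labels = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 1, 0, 1, 1, 0, 1, 0]

matrix = confusion_counts(true_labels, predicted)

# Accuracy is the share of samples on the diagonal of the matrix.
accuracy = (matrix[0][0] + matrix[1][1]) / len(true_labels)

# Recall for the damaged class: damaged buildings that were correctly flagged.
damaged_recall = matrix[1][1] / sum(matrix[1])

print(matrix)
print(accuracy, damaged_recall)
```

To use this with the loop above, one could collect `preds` and `labels` from each batch into Python lists (for example with `.cpu().tolist()`) and pass them to the helper.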