# Large-Scale Building Detection

We would like to find all of the buildings in a 20,000 km² region spanning parts of Nigeria and Cameroon. To do so, we first use the [chip-from-vrt](https://github.com/PlatformStories/chip-from-vrt) task to chip training and target AOIs out of a mosaic of images covering the region of interest. Next, we train a model to classify chips as 'Buildings' or 'No Buildings' using [train-cnn-chip-classifier](https://github.com/PlatformStories/train-cnn-chip-classifier). Finally, we deploy the model over the entire mosaic using [deploy-cnn-chip-classifier](https://github.com/PlatformStories/deploy-cnn-chip-classifier).

Create a GBDX interface using [gbdxtools](https://github.com/digitalglobe/gbdxtools). You need your credentials to do this; you can find them under your profile on gbdx.geobigdata.io.

In [7]:
import os, uuid
from os.path import join
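# Replace each placeholder with your own GBDX credentials
os.environ['GBDX_USERNAME'] = '<your-username>'
os.environ['GBDX_PASSWORD'] = '<your-password>'
os.environ['GBDX_CLIENT_ID'] = '<your-client-id>'
os.environ['GBDX_CLIENT_SECRET'] = '<your-client-secret>'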

import gbdxtools
gbdx = gbdxtools.Interface()
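If you would rather not type credentials into the notebook, you can export the four variables in your shell and only confirm here that they are set before calling `gbdxtools.Interface()`. A minimal sketch; `missing_gbdx_credentials` is a hypothetical helper, not part of gbdxtools.

```python
import os

# The four environment variables set in the cell above
GBDX_ENV_VARS = ('GBDX_USERNAME', 'GBDX_PASSWORD',
                 'GBDX_CLIENT_ID', 'GBDX_CLIENT_SECRET')

def missing_gbdx_credentials(env=os.environ):
    """Return the names of GBDX credential variables that are unset or empty."""
    return [name for name in GBDX_ENV_VARS if not env.get(name)]

missing = missing_gbdx_credentials()
if missing:
    print('Set these before creating the interface: ' + ', '.join(missing))
```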

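Export the temporary AWS credentials associated with this GBDX session. The chip-from-vrt tasks below receive them as inputs so that they can access the imagery in S3.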

In [8]:
session_info = gbdx.s3.info

os.environ['AWS_ACCESS_KEY_ID'] = session_info['S3_access_key']
os.environ['AWS_SECRET_KEY'] = session_info['S3_secret_key']
os.environ['AWS_SESSION_TOKEN'] = session_info['S3_session_token']
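Every chip-from-vrt task in this notebook receives the same three credential inputs. One way to avoid repeating those assignments is a small helper that copies them onto a task; this is a sketch, and `set_aws_inputs` is a hypothetical name. It relies only on the fact that task inputs are set by attribute assignment, as in the cells below.

```python
import os

# Maps chip-from-vrt input names to the environment variables set above
AWS_TASK_INPUTS = {
    'aws_access_key': 'AWS_ACCESS_KEY_ID',
    'aws_secret_key': 'AWS_SECRET_KEY',
    'aws_session_token': 'AWS_SESSION_TOKEN',
}

def set_aws_inputs(task, env=os.environ):
    """Copy the session's AWS credentials onto a task's inputs and return the task."""
    for input_name, env_var in AWS_TASK_INPUTS.items():
        # Equivalent to: task.inputs.aws_access_key = env['AWS_ACCESS_KEY_ID'], etc.
        setattr(task.inputs, input_name, env[env_var])
    return task
```

For example, `set_aws_inputs(train_chip)` would stand in for the three `aws_*` assignment lines of the training chip task.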

Define the location of input files and where to save outputs.

In [9]:
input_location = 's3://gbd-customer-data/58600248-2927-4523-b44b-5fec3d278c09/platform-stories/building-detection-large-scale'

# Save outputs under a unique, randomly named folder so runs don't overwrite each other
random_str = str(uuid.uuid4())
output_location = join('platform-stories/trial-runs', random_str)
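The output_location above is a relative path: workflow `savedata` locations are interpreted relative to your own area of the customer-data bucket. To locate the outputs later outside gbdxtools, it helps to build the absolute S3 URL. This sketch assumes the `gbdx.s3.info` dictionary also carries `bucket` and `prefix` keys alongside the credentials used above; `absolute_s3_url` is a hypothetical helper.

```python
def absolute_s3_url(info, relative_location):
    """Join the bucket, account prefix and a savedata location into an s3:// URL.

    Assumes `info` has 'bucket' and 'prefix' keys, as gbdx.s3.info is expected to.
    """
    parts = [info['bucket'], info['prefix'], relative_location]
    # Strip stray slashes so the pieces join cleanly
    return 's3://' + '/'.join(part.strip('/') for part in parts)
```

Usage: `absolute_s3_url(session_info, join(output_location, 'trained-model'))`.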

Chip the training data out of the mosaic using chip-from-vrt.

In [10]:
train_chip = gbdx.Task('chip-from-vrt')
train_chip.inputs.geojson = join(input_location, 'train-geojson')
train_chip.inputs.images = join(input_location, 'mosaic')
train_chip.inputs.mosaic = 'True'
train_chip.inputs.aws_access_key = os.environ['AWS_ACCESS_KEY_ID']
train_chip.inputs.aws_secret_key = os.environ['AWS_SECRET_KEY']
train_chip.inputs.aws_session_token = os.environ['AWS_SESSION_TOKEN']

Train the model on the chips produced by the train_chip task.

In [11]:
train = gbdx.Task('train-cnn-chip-classifier')
train.inputs.chips = train_chip.outputs.chips.value
train.inputs.max_side_dim = '260'
train.inputs.resize_dim = '150'
train.inputs.classes = 'No Buildings, Buildings'
train.inputs.train_size = '5000'
train.inputs.nb_epoch = '50'
train.inputs.test = 'False'
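Two of these inputs are easy to misread: `classes` is a single comma-separated string, and `max_side_dim` bounds the size of the chips the task will use. The sketch below illustrates our reading of both, under stated assumptions: class names are split and stripped, each name's position in the list becomes its integer label, and chips whose longest side exceeds `max_side_dim` are skipped. This is an illustration, not the task's code; the train-cnn-chip-classifier README is the authoritative reference.

```python
def parse_classes(classes):
    """Split a comma-separated class string into {name: integer label}.

    Assumption: labels follow the order in which the classes are listed.
    """
    names = [name.strip() for name in classes.split(',')]
    return {name: label for label, name in enumerate(names)}

def within_size_limit(chip_shape, max_side_dim):
    """True if a (height, width) chip is no larger than max_side_dim on either side."""
    # Task inputs are passed as strings, so convert before comparing
    return max(chip_shape) <= int(max_side_dim)
```

With the inputs above, `parse_classes('No Buildings, Buildings')` gives `{'No Buildings': 0, 'Buildings': 1}`.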

Create a workflow to chip out labeled training data and train a model.

In [12]:
train_wf = gbdx.Workflow([train_chip, train])

# Save workflow outputs
train_wf.savedata(train_chip.outputs.chips, join(output_location, 'train-chips'))
train_wf.savedata(train.outputs.trained_model, join(output_location, 'trained-model'))

# Execute the workflow
train_wf.execute()

u'4542571739720742321'

While the model trains, we can chip all of the target AOIs out of the mosaic. There are about 1,200,000 chips to extract, so we split the work across 13 chip-from-vrt tasks that run in parallel, each producing roughly 100,000 target chips. Each target geojson sits in its own numbered directory (1 to 13) under input_location, so we loop over these directories and create one chip-from-vrt task per directory.
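The numbered target geojsons are already provided here, but if you need to prepare your own, the batching can be sketched as splitting one large FeatureCollection into pieces of at most 100,000 features, written to directories numbered from 1. The function names and the `targets.geojson` file name are hypothetical.

```python
import json
import os

def split_feature_collection(geojson, batch_size=100000):
    """Split a FeatureCollection dict into a list of smaller FeatureCollections."""
    features = geojson['features']
    # Keep any other top-level members (e.g. 'type', 'crs') in every batch
    header = {k: v for k, v in geojson.items() if k != 'features'}
    return [dict(header, features=features[i:i + batch_size])
            for i in range(0, len(features), batch_size)]

def write_batches(batches, out_dir):
    """Write each batch to out_dir/<n>/targets.geojson, numbering from 1."""
    paths = []
    for n, batch in enumerate(batches, start=1):
        batch_dir = os.path.join(out_dir, str(n))
        os.makedirs(batch_dir, exist_ok=True)
        path = os.path.join(batch_dir, 'targets.geojson')
        with open(path, 'w') as f:
            json.dump(batch, f)
        paths.append(path)
    return paths
```

Each numbered directory then maps to one chip-from-vrt task, so the number of batches sets the degree of parallelism.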

In [14]:
target_chip_tasks = []

# Create tasks for each target geojson
for i in range(1,14):
    target_task = gbdx.Task('chip-from-vrt')
    target_task.inputs.geojson = join(input_location, 'target-geojsons', str(i))
    target_task.inputs.images = join(input_location, 'mosaic')
    target_task.inputs.mosaic = 'True'
    target_task.inputs.aws_access_key = os.environ['AWS_ACCESS_KEY_ID']
    target_task.inputs.aws_secret_key = os.environ['AWS_SECRET_KEY']
    target_task.inputs.aws_session_token = os.environ['AWS_SESSION_TOKEN']
    target_task.domain = 'raid'
    target_chip_tasks.append(target_task)

Create and execute one workflow per target chip task, saving the chips from batch i to target-chips/i.

In [15]:
target_chip_wfs = []

for i, task in enumerate(target_chip_tasks, start=1):
    wf = gbdx.Workflow([task])
    wf.savedata(task.outputs.chips, join(output_location, 'target-chips', str(i)))
    target_chip_wfs.append(wf)

# Execute all workflows
for wf in target_chip_wfs:
    wf.execute()
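Deployment has to wait until training and all 13 chipping workflows finish. A simple polling loop does this; the sketch assumes each workflow exposes a boolean `complete` property (gbdxtools' `Workflow` provides one, but check your installed version), and `wait_for_workflows` is a hypothetical helper. Note that a completed workflow may still have failed, so check each workflow's status before deploying.

```python
import time

def wait_for_workflows(workflows, poll_seconds=300, timeout_seconds=None):
    """Block until every workflow reports complete; return elapsed seconds.

    Assumes each workflow has a boolean `complete` attribute or property.
    Raises RuntimeError if timeout_seconds elapses first.
    """
    start = time.time()
    pending = list(workflows)
    while pending:
        # Re-check only the workflows that were still running last time
        pending = [wf for wf in pending if not wf.complete]
        if not pending:
            break
        if timeout_seconds is not None and time.time() - start > timeout_seconds:
            raise RuntimeError('%d workflow(s) still running' % len(pending))
        time.sleep(poll_seconds)
    return time.time() - start
```

Usage: `wait_for_workflows([train_wf] + target_chip_wfs)`.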

Once all of these workflows have completed, we can deploy the trained model over the entire mosaic using the outputs of train_wf and target_chip_wfs. For this demo, we instead use a sample trained model and sample target chips stored under input_location.

In [34]:
dep_tasks, dep_wfs = [],[]

# Loop through the batches of target chips.
# This demo deploys only batch 1; a full run would loop over all 13 batches.
for i in range(1, 2):
    dep_task = gbdx.Task('deploy-cnn-chip-classifier')
    dep_task.inputs.model = join(input_location, 'trained-model')
    dep_task.inputs.max_side_dim = '260'
    dep_task.inputs.chips = join(input_location, 'target-chips', str(i))
    dep_tasks.append(dep_task)
    
# Create a workflow for each task
for i, task in enumerate(dep_tasks, start=1):
    wf = gbdx.Workflow([task])
    wf.savedata(task.outputs.classified_geojson, join(output_location, 'deploy-results', str(i)))
    dep_wfs.append(wf)

# Execute each workflow
for wf in dep_wfs:
    wf.execute()

Once the deploy workflows are complete, you can combine their outputs into a single classified geojson. Below is a sample output overlaid on the original imagery.
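Combining the results amounts to concatenating the features of each batch's FeatureCollection. This sketch assumes the deploy-results folders have already been downloaded to a local directory (for example with gbdxtools' `gbdx.s3.download`), giving one numbered subdirectory per batch with one or more `.geojson` files inside; because the output file names are not documented here, it globs for them. Both function names are hypothetical.

```python
import glob
import json
import os

def merge_feature_collections(paths):
    """Concatenate the features of several GeoJSON FeatureCollection files."""
    merged = {'type': 'FeatureCollection', 'features': []}
    for path in paths:
        with open(path) as f:
            merged['features'].extend(json.load(f)['features'])
    return merged

def merge_deploy_results(results_dir, out_path):
    """Merge every *.geojson in results_dir/<n>/ into out_path; return the feature count."""
    paths = sorted(glob.glob(os.path.join(results_dir, '*', '*.geojson')))
    merged = merge_feature_collections(paths)
    with open(out_path, 'w') as f:
        json.dump(merged, f)
    return len(merged['features'])
```

Write the merged file outside results_dir so that a later run does not pick it up as another batch.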

In [39]:
from IPython.display import IFrame
IFrame('clip-grid-buildings-5000.html', width=1600, height=800)