The Australian company PSMA teamed up with DigitalGlobe (DG) to develop Geoscape, a product that provides a diverse set of building attributes, including height, rooftop material, solar panel installation, and the presence of a swimming pool on the property, across the entire Australian continent. We used deep learning on GBDX to identify swimming pools in thousands of properties across Adelaide, a major city on the southern coast of Australia with a population of approximately one million. The full story is [here](http://gbdxstories.digitalglobe.com/swimming-pools/).

In [None]:
# Specify your credentials and create a gbdx interface

import os
os.environ['GBDX_USERNAME'] = ''
os.environ['GBDX_PASSWORD'] = ''
os.environ['GBDX_CLIENT_ID'] = '' 
os.environ['GBDX_CLIENT_SECRET'] = ''

import gbdxtools
gbdx = gbdxtools.Interface()
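The credential values above are left blank for you to fill in. As a minimal sketch, a check like the following could be run before creating the interface to catch any that are still empty. The helper `missing_credentials` is hypothetical and not part of gbdxtools; it only inspects the four environment variables named in the cell above.

```python
import os

# The four variable names set in the cell above.
REQUIRED_VARS = ('GBDX_USERNAME', 'GBDX_PASSWORD',
                 'GBDX_CLIENT_ID', 'GBDX_CLIENT_SECRET')

def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it does not contact GBDX.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = missing_credentials()
if missing:
    print('Please set: ' + ', '.join(missing))
```

An empty list means all four variables hold a non-empty value; it does not confirm that the credentials are valid.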

Specify the location of the input files.

In [2]:
input_location = 's3://gbd-customer-data/32cbab7a-4307-40c8-bb31-e2de32f940c2/platform-stories/swimming-pools'

Create a train_task object and set its input parameters.

In [None]:
from os.path import join

train_task = gbdx.Task('train-cnn-classifier')
train_task.inputs.images = join(input_location, 'images')
train_task.inputs.geojson = join(input_location, 'train-geojson')
train_task.inputs.classes = 'No swimming pool, Swimming pool'     # Classes exactly as they appear in train.geojson
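The cell above builds S3 locations with `os.path.join`, which uses the path separator of the local operating system. On Linux and macOS that is a forward slash, but on Windows it would insert a backslash. As a sketch of one alternative, `posixpath.join` from the standard library always joins with forward slashes. The wrapper name `s3_join` and the example location are hypothetical.

```python
import posixpath

def s3_join(*parts):
    """Join path components with forward slashes on any operating system."""
    return posixpath.join(*parts)

# Hypothetical location, used only to illustrate the result.
example_root = 's3://example-bucket/example-prefix'
example_images = s3_join(example_root, 'images')
print(example_images)
```

If the notebook is only ever run on Linux or macOS, `os.path.join` behaves identically and no change is needed.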

When training the model, we can set optional hyper-parameters. See the [docs](https://github.com/PlatformStories/train-cnn-classifier) for detailed information. Training should take around 3 hours to complete.

In [3]:
train_task.inputs.nb_epoch = '30'
train_task.inputs.nb_epoch_2 = '5'
train_task.inputs.train_size = '4500'
train_task.inputs.train_size_2 = '2500'
train_task.inputs.test_size = '1000'
train_task.inputs.bit_depth = '8'         # Provided imagery has dynamic range adjustment (DRA) applied
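Note that every hyper-parameter above is passed as a string. If you prefer to keep the values as numbers in Python, a small conversion step along these lines is one option. The helper `as_task_inputs` is hypothetical; the parameter names and values are the ones set in the cell above.

```python
def as_task_inputs(params):
    """Convert a dict of hyper-parameters to the string values used above."""
    return dict((name, str(value)) for name, value in params.items())

hyper_params = {
    'nb_epoch': 30,
    'nb_epoch_2': 5,
    'train_size': 4500,
    'train_size_2': 2500,
    'test_size': 1000,
    'bit_depth': 8,
}
task_inputs = as_task_inputs(hyper_params)

# Applying them would then look like this (requires the train_task object):
# for name, value in task_inputs.items():
#     setattr(train_task.inputs, name, value)
```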

Create a deploy_task object with the required inputs, and set the model input as the output of train_task.

In [None]:
deploy_task = gbdx.Task('deploy-cnn-classifier')
deploy_task.inputs.model = train_task.outputs.trained_model.value     # Trained model from train_task
deploy_task.inputs.images = join(input_location, 'images')
deploy_task.inputs.geojson = join(input_location, 'target-geojson')

Specify the classes for the deploy task. We can also restrict the size of polygons that we deploy on and set the appropriate bit depth for the input imagery.

In [4]:
deploy_task.inputs.classes = 'No swimming pool, Swimming pool'
deploy_task.inputs.bit_depth = '8'
deploy_task.inputs.min_side_dim = '10'    # Minimum side dimension of polygons to deploy on

String the two tasks together in a workflow and save the output in the specified directory.

In [5]:
# String the tasks in a workflow
wf = gbdx.Workflow([train_task, deploy_task])

# Set output location to platform-stories/trial-runs/swimming-pools within your bucket/prefix
output_location = 'platform-stories/trial-runs/swimming-pools'

# Save workflow outputs
wf.savedata(train_task.outputs.trained_model, join(output_location, 'trained-model'))
wf.savedata(deploy_task.outputs.classified_geojson, join(output_location, 'classified-geojson'))

Execute!

In [6]:
wf.execute()

u'4720896975705607770'

Depending on the model hyper-parameters, the training sizes, and the size of the deploy file, this workflow can take several hours to run. You may check on the status periodically as follows.

In [27]:
wf.status

{u'event': u'succeeded', u'state': u'complete'}
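Rather than re-running the status cell by hand, the check can be wrapped in a polling loop. The sketch below is written against a generic `get_status` callable so it can be tried without a live workflow. It assumes that a finished workflow reports a `state` of `'complete'`, as in the output above. The function name, the polling interval and the poll limit are illustrative choices.

```python
import time

def wait_for_completion(get_status, poll_seconds=300, max_polls=100,
                        sleep=time.sleep):
    """Poll get_status() until its 'state' is 'complete' or max_polls is hit.

    Returns the last status dictionary that was seen.
    """
    status = get_status()
    polls = 1
    while status.get('state') != 'complete' and polls < max_polls:
        sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status

# With the workflow above, usage would look like:
# final_status = wait_for_completion(lambda: wf.status)
# print(final_status['event'])
```

The `state` field only says that the workflow has stopped running, so the `event` field of the returned status should still be inspected to see how it ended.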

A detailed description of the status of each task can be obtained as follows.

In [None]:
wf.events

Download outputs.

In [None]:
gbdx.s3.download(join(output_location, 'trained-model/model_architecture.json'), 'trained-model/')
gbdx.s3.download(join(output_location, 'trained-model/model_weights.h5'), 'trained-model/')
gbdx.s3.download(join(output_location, 'trained-model/test_report.txt'), 'trained-model/')
gbdx.s3.download(join(output_location, 'classified-geojson'), 'classified-geojson')

Put the classified properties on the map.

In [31]:
# Create slippy map
from ipyleaflet import Map, TileLayer, GeoJSON
import json

m = Map(center=[-35.28, 138.46], zoom=15)

# This is the Mapbox TMS URL
mapbox_token = 'pk.eyJ1IjoicGxhdGZvcm1zdG9yaWVzIiwiYSI6ImNpeTZkeWlvOTAwNm0yeHFocHFyaGFleDcifQ.wOsbVsS0NXKrWeX2bQwc-g'
url = 'https://a.tiles.mapbox.com/v4/platformstories.swimming-pools-adelaide/{z}/{x}/{y}.png?access_token=' + mapbox_token

# Add raster layer
m.add_layer(TileLayer(url=url))

with open('classified-geojson/classified.geojson') as f:
    data = json.load(f)
        
n_properties = len(data['features'])
print('There are ' + str(n_properties) + ' classified properties')

# Assign color based on classification
for feature in data['features']:
    if feature['properties']['CNN_class'] == 'Swimming pool':
        c = 'green'
    else:
        c = 'red'
    feature['properties']['style'] = {'color':c, 'fillOpacity':0}

g = GeoJSON(data=data)

# Add vector layer
m.add_layer(g)

m

There are 4997 classified properties
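Beyond the total, it can be useful to see how many properties fall into each class. The sketch below counts features by their `CNN_class` property, the same field used for coloring above. The helper `count_classes` is hypothetical, and the `sample` data is made up purely to show the shape of the input; it is not taken from the real output file.

```python
from collections import Counter

def count_classes(geojson, key='CNN_class'):
    """Count features by the value of properties[key]."""
    return Counter(feature['properties'].get(key)
                   for feature in geojson['features'])

# Made-up sample with the same structure as the classified file.
sample = {'features': [
    {'properties': {'CNN_class': 'Swimming pool'}},
    {'properties': {'CNN_class': 'No swimming pool'}},
    {'properties': {'CNN_class': 'No swimming pool'}},
]}
sample_counts = count_classes(sample)

# With the file loaded above, usage would be:
# counts = count_classes(data)
# for name, n in counts.items():
#     print(str(name) + ': ' + str(n))
```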
