We want to use GBDX to obtain a bird's-eye view of how complete the [OpenStreetMap (OSM)](https://www.openstreetmap.org) building footprints are in different cities around the world. The goal is to use this information to direct mappers to where they are most needed.

The idea is to run our unsupervised Land Use Land Cover (LULC) classification algorithm over these regions in order to identify built-up areas, then overlay the results of the algorithm with existing OSM building footprints and compare.

Create a GBDX interface using gbdxtools. You need your credentials to do this; you can find them under your profile on [gbdx.geobigdata.io](https://gbdx.geobigdata.io/login).

In [None]:
import os
os.environ['GBDX_USERNAME'] = 'PUT USERNAME HERE'
os.environ['GBDX_PASSWORD'] = 'PUT PASSWORD HERE'
os.environ['GBDX_CLIENT_ID'] = 'PUT CLIENT ID HERE'
os.environ['GBDX_CLIENT_SECRET'] = 'PUT CLIENT SECRET HERE'

import gbdxtools
gbdx = gbdxtools.Interface()
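
Before creating the interface, it can help to confirm that all four variables are actually set. The following is a minimal sketch that only uses the variable names from the cell above; the helper `missing_credentials` is hypothetical and not part of gbdxtools.

```python
import os

# Environment variable names taken from the cell above.
REQUIRED_VARS = ['GBDX_USERNAME', 'GBDX_PASSWORD',
                 'GBDX_CLIENT_ID', 'GBDX_CLIENT_SECRET']

def missing_credentials(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = missing_credentials()
if missing:
    print('Set these variables before creating the interface: ' + ', '.join(missing))
```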

First, let's see an example of how our idea would work over a small urban area. 
We've picked a small WV02 multispectral chip from [here](https://github.com/platformstories/chips); keep in mind the LULC algorithm requires atmospherically compensated 8-band imagery. 
We've also picked the corresponding pansharpened chip for visualization purposes. 

In [None]:
ms = 'urban_ms.tif'
ps = 'urban_ps.tif'

from matplotlib import pyplot as plt
import matplotlib.image as mi
%matplotlib inline
import gdal
import numpy as np

def plot(plt, chip, band=None, color_map = 'Greys_r'):
    "Generic plotting function."
    if color_map == 'rgb':
        img = mi.imread(chip)
        plt.imshow(img)
    else:    
        sample = gdal.Open(chip)
        img = sample.ReadAsArray()
        if band is not None: 
            img = img[band]
        plt.imshow(img, cmap=color_map)

# plot the pansharpened image and near-infrared-1 (NIR1) band in pseudocolor from the multispectral image
plt.figure(figsize=(20, 10))
plt.subplot(121)
plot(plt, ps, color_map='rgb')
plt.subplot(122)
plot(plt, ms, band=6, color_map='hot')   # 0-based index 6 is NIR1 in the WV02 8-band order

We can extract an approximate built-up mask using the Protogen unsupervised LULC algorithm, which classifies each pixel as water, vegetation, clouds, soil, shadows or unclassified. By exclusion, the last category approximately corresponds to materials like stone, cement and metal, which are used in buildings and roads. The following code produces the LULC classification and the unclassified mask for our example chip.

In [None]:
import protogen

# create protogen interface object that derives lulc
p1 = protogen.Interface('lulc','layers')
p1.lulc.layers.name = 'lulc'
p1.lulc.layers.visualization = 'rgb'

# configure input
p1.image = ms
p1.image_config.bands = [1, 2, 3, 4, 5, 6, 7, 8]

# execute
p1.execute()

# create
p2 = protogen.Interface('lulc', 'masks')
p2.lulc.masks.type = 'single'

# mask settings
p2.lulc.masks.switch_unclassified = True      
p2.lulc.masks.switch_water = False   
p2.lulc.masks.switch_vegetation = False 
p2.lulc.masks.switch_clouds = False 
p2.lulc.masks.switch_bare_soil = False 
p2.lulc.masks.switch_shadows = False 
p2.lulc.masks.switch_no_data = False 

# configure input
p2.image = ms
p2.image_config.bands = [1, 2, 3, 4, 5, 6, 7, 8]

p2.execute()

plt.figure(figsize=(20, 10))
plt.subplot(131)
plot(plt, 'urban_ps.tif', color_map='rgb')
plt.subplot(132)
plot(plt, p1.output, color_map='rgb')
plt.subplot(133)
plot(plt, p2.output)

From left to right we have the pansharpened chip, the LULC classification output (with green, brown, blue and grey indicating vegetation, soil, water and unclassified, respectively) and the unclassified mask. The mask highlights the built-up area quite nicely.
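
The "by exclusion" logic behind the mask can be illustrated on a toy label array. This is only a sketch of the idea, not Protogen's implementation: the integer class codes and the helper `unclassified_mask` are hypothetical, and the real LULC output encodes classes as colors.

```python
import numpy as np

# Hypothetical integer class codes; 0 stands for "no class assigned".
WATER, VEGETATION, CLOUDS, SOIL, SHADOWS = 1, 2, 3, 4, 5

def unclassified_mask(labels):
    """Return a boolean mask that is True where no known class was assigned."""
    known = [WATER, VEGETATION, CLOUDS, SOIL, SHADOWS]
    return ~np.isin(labels, known)

labels = np.array([[1, 0, 0],
                   [2, 0, 4],
                   [2, 2, 5]])
mask = unclassified_mask(labels)
print(mask.sum())   # number of pixels left unclassified
```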

How many OSM buildings are in this chip? First we need to get its bounding box in lat, long coordinates.

In [None]:
import rasterio

sample = rasterio.open(ms)

# projection info
print sample.crs

# bounding box
W, S, E, N = sample.bounds
bbox = W, S, E, N

print bbox

We can use this bounding box to query the [OSM Overpass API](http://wiki.openstreetmap.org/wiki/Overpass_API) for buildings.

In [None]:
import requests 
import subprocess

with open('buildings.osm', 'w') as f:
    r = requests.get(url='http://www.overpass-api.de/api/xapi_meta?*[building=yes][bbox=' + ','.join(map(str, bbox)) + ']')
    f.write(r.text.encode('ascii','ignore'))
    
convert = "ogr2ogr -f GeoJSON buildings.geojson buildings.osm multipolygons"
proc = subprocess.Popen([convert], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()   
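
The request above matches only features tagged `building=yes`. As an alternative (not what this notebook runs), the same area could be queried in Overpass QL for any `building` value. The sketch below only builds the query string; the helper name is hypothetical. Note that Overpass QL expects the bounding box as (S, W, N, E), whereas our `bbox` is (W, S, E, N).

```python
def overpass_building_query(bbox):
    """Build an Overpass QL query for building ways and relations in bbox.

    bbox is (W, S, E, N) as used above; Overpass QL expects (S, W, N, E).
    """
    W, S, E, N = bbox
    if not (W < E and S < N):
        raise ValueError('expected bbox as (W, S, E, N)')
    area = '({},{},{},{})'.format(S, W, N, E)
    # Fetch matching ways and relations, then recurse down to their nodes.
    return ('[out:xml];'
            '(way["building"]{0};relation["building"]{0};);'
            '(._;>;);out meta;').format(area)

query = overpass_building_query((135.4, 34.6, 135.5, 34.7))
print(query)
```

The resulting string would be sent as the `data` parameter to the Overpass `interpreter` endpoint, and the response could be converted with the same `ogr2ogr` command.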

Then plot them on a map:

In [None]:
import json
from ipyleaflet import Map, GeoJSON
from shapely.geometry import shape

with open('buildings.geojson') as f:
    data = json.load(f)

for feature in data['features']:
    feature['properties']['style'] = {'fillOpacity':0.2}

g = GeoJSON(data=data)
    
m = Map(center=[(N+S)/2, (W+E)/2], zoom=16, height = '650px')

m.add_layer(g)    
    
# launch map    
m

Two things are clear: this chip is somewhere in Japan (Osaka!), and quite a few buildings are missing from OSM.
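
To put a rough number on "missing", one could measure the fraction of built-up pixels that fall inside an OSM footprint. The sketch below assumes the footprints have already been rasterized onto the same pixel grid as the built-up mask (for example with `rasterio.features.rasterize`); the function `mapped_fraction` and the toy arrays are hypothetical. Since the mask also contains roads, the result is an indicator rather than an exact completeness score.

```python
import numpy as np

def mapped_fraction(built_up, footprints):
    """Fraction of built-up pixels that fall inside an OSM footprint.

    Both inputs are boolean arrays on the same pixel grid.
    """
    built_up = np.asarray(built_up, dtype=bool)
    footprints = np.asarray(footprints, dtype=bool)
    if built_up.shape != footprints.shape:
        raise ValueError('arrays must share the same grid')
    total = built_up.sum()
    if total == 0:
        return float('nan')   # nothing built-up, so the ratio is undefined
    return float((built_up & footprints).sum()) / float(total)

built_up = np.array([[1, 1, 0], [1, 1, 0]], dtype=bool)
footprints = np.array([[1, 0, 0], [0, 0, 0]], dtype=bool)
print(mapped_fraction(built_up, footprints))   # 1 of 4 built-up pixels is mapped
```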

Our simple experiment indicates that using the LULC algorithm can indeed point us to unmapped buildings in OSM. We will test our idea on a number of cities around the world. The following images were picked at [discover.digitalglobe.com](https://discover.digitalglobe.com/).

In [None]:
# catalog id and center for each city

cat_ids = {'nyc':         '104001001DB7BA00',
           'houston':     '104001001838A000',
           'la':          '104001001EBB4400', 
           'montreal':    '1040010023BEFD00', 
           'athens':      '104001001B6E1400', 
           'madrid':      '1040010019852500', 
           'nairobi':     '103001005C7E5400', 
           'amman':       '103001003E6FFC00', 
           'santiago':    '1040010029467C00',
           'bangkok':     '1030010063748E00',
           'cairo':       '1030010063AFF100',
           'osaka':       '10300100643CAC00',
           'buenosaires': '103001006414E800',
           'shanghai':    '1030010049993B00',
           'tehran':      '103001005ED0D000',
           'asuncion':    '103001005A8A6400',
           'ulaanbaatar': '103001005F575800',
           'perth':       '104001001D365400' }

centers = {'nyc':         [40.71, -74.01],
           'houston':     [29.76, -95.37],
           'la':          [34.05, -118.24],
           'montreal':    [45.50, -73.56],
           'athens':      [37.98, 23.73],
           'madrid':      [40.42, -3.70],
           'nairobi':     [-1.29, 36.82],
           'amman':       [31.95, 35.93],
           'santiago':    [-33.45, -70.67],
           'bangkok':     [13.76, 100.50],
           'cairo':       [30.04, 31.24],
           'osaka':       [34.69, 135.50],
           'buenosaires': [-34.60, -58.38],
           'shanghai':    [31.23, 121.47],      
           'tehran':      [35.69, 51.39],
           'asuncion':    [-25.26, -57.58],
           'ulaanbaatar': [47.89, 106.91],
           'perth':       [-31.95, 115.86]}

cities = cat_ids.keys()

For each catalog id, we run a GBDX workflow which orders the raw image from the DG factory and then produces an orthorectified, atmospherically compensated 8-band image using the [AOP_Strip_Processor](http://gbdxdocs.digitalglobe.com/docs/advanced-image-preprocessor). Once the workflows complete, the images are stored under platform-stories/osm-lulc/images. This is a time-consuming step which you can skip as we've already run it for you.

In [None]:
output_location = 'platform-stories/osm-lulc/images'   # where to save the images
wf_ids = {}

from os.path import join

for city in cities:
    
    # create order task
    # if the images are not on GBDX, this task will order them from the DG factory
    order = gbdx.Task('Auto_Ordering')
    order.inputs.cat_id = cat_ids[city]
    order.impersonation_allowed = True

    # run orthorectification and acomp
    aop_ms = gbdx.Task('AOP_Strip_Processor')
    aop_ms.inputs.data = order.outputs.s3_location.value
    aop_ms.inputs.bands = 'MS'
    aop_ms.inputs.enable_acomp = True
    aop_ms.inputs.enable_pansharpen = False
    aop_ms.inputs.enable_dra = False

    # define preprocessing workflow
    wf = gbdx.Workflow([order, aop_ms])

    # set output location 
    wf.savedata(aop_ms.outputs.data, join(output_location, city))

    # execute
    wf_ids[city] = wf.execute()
    

In [None]:
# check status of workflows    
for city in cities:
    wf = gbdx.Workflow([])
    wf.id = wf_ids[city]
    print city, wf.id, wf.status
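
Instead of re-running the status cell by hand, the check can be wrapped in a small polling loop. This is a generic sketch: `wait_for` is a hypothetical helper, and the usage line in the final comment assumes that `wf.status` is a dictionary with a `'state'` key whose terminal value is `'complete'`.

```python
import time

def wait_for(get_state, done_states=('complete',), interval=60,
             max_checks=100, sleep=time.sleep):
    """Poll get_state() until it returns one of done_states.

    Returns the final state, or None if max_checks is exhausted first.
    """
    for _ in range(max_checks):
        state = get_state()
        if state in done_states:
            return state
        sleep(interval)
    return None

# Demonstration with a fake status source instead of a real workflow.
fake_states = iter(['pending', 'running', 'complete'])
result = wait_for(lambda: next(fake_states), interval=0, sleep=lambda s: None)
print(result)

# With a real workflow object this might look like (assumption):
# wait_for(lambda: wf.status['state'])
```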

We can inspect each of these images in full resolution and draw bounding boxes of the areas in which we are interested.

In [None]:
from ipyleaflet import Map, TileLayer, DrawControl
from shapely.geometry import shape
import sys

cat_id = cat_ids['asuncion']

# get tms url and bounding box for each idaho image corresponding to this catid
urls, bboxes = gbdx.idaho.get_tms_layers(cat_id)

# center the map based on idaho image bounds
center = [sum(x)/len(x) for x in zip(*[((N+S)/2.0, (W+E)/2.0) for (W,S,E,N) in bboxes])]

m = Map(center=center, zoom=10, height='800px')

# add idaho images
for url in urls:
    m.add_layer(TileLayer(url=url))

# enable polygon draw; the handler prints the bounding box of each drawn shape
dc = DrawControl(polygon={'shapeOptions': {'color': '#00f5FF'}}, polyline={})
def handle_draw(self, action, geo_json):
    geom = shape(geo_json['geometry'])
    print 'W, S, E, N = %s\n' % (str(geom.bounds))    
dc.on_draw(handle_draw)
m.add_control(dc)    
    
# launch map    
m
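
The one-line center computation in the cell above is fairly dense. An equivalent, more explicit version is sketched below; it assumes, as the cell does, that each bounding box is a (W, S, E, N) tuple. The function name `mean_center` and the sample boxes are hypothetical. Plain averaging like this does not handle boxes that straddle the antimeridian.

```python
def mean_center(bboxes):
    """Return [lat, lon] averaged over the centers of (W, S, E, N) boxes."""
    if not bboxes:
        raise ValueError('need at least one bounding box')
    lats = [(S + N) / 2.0 for (W, S, E, N) in bboxes]
    lons = [(W + E) / 2.0 for (W, S, E, N) in bboxes]
    return [sum(lats) / len(lats), sum(lons) / len(lons)]

# Two adjacent one-degree boxes; the center sits on their shared edge.
sample_boxes = [(-58.0, -26.0, -57.0, -25.0), (-57.0, -26.0, -56.0, -25.0)]
print(mean_center(sample_boxes))
```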

Here are some sample bounding boxes which we've drawn arbitrarily over each city.

In [None]:
bboxes = {'nyc': (-73.95632922649384, 40.67078428278468, -73.89315783977509, 40.74625719581601),
          'houston': (-95.13090491294861, 29.719047995031016, -94.99769568443298, 29.901953768885594),
          'la': (-118.32255542278288, 34.15873756789489, -118.17252337932585, 34.28251185970165),
          'montreal': (-73.68750751018524, 45.62199932750296, -73.52236926555634, 45.755584499516516),
          'athens': (23.63831341266632, 38.01784624197521, 23.791778683662415, 38.11515296145808),
          'madrid': (-3.7636488676071167, 40.451429341017324, -3.6263197660446167, 40.5083584870881),
          'nairobi': (36.73872381448745, -1.3279983315316601, 36.88343435525894, -1.242017618358388),
          'amman': (35.79846203327179, 31.888389626458963, 35.97424328327179, 32.043632232381455),
          'santiago': (-70.74042499065399, -33.45030473839097, -70.58936297893524, -33.3886947889702),
          'bangkok': (100.44030010700226, 13.575965977708762, 100.5965119600296, 13.671059968841831),
          'cairo': (31.244285702705383, 30.045368026249772, 31.349342465400696, 30.11874679593054),
          'osaka': (135.4063493013382, 34.598781325037756, 135.48188030719757, 34.712028818736535),
          'buenosaires': (-58.50621789693832, -34.67611297581835, -58.39000314474106, -34.596736825050044),
          'shanghai': (121.50280773639678, 31.13559214758519, 121.60855114459991, 31.23486673223854),
          'tehran': (51.29124462604522, 35.712275224230865, 51.481788754463196, 35.8234243437396),
          'perth': (115.77790081501009, -31.958215741514543, 115.92449963092803, -31.80399586706599),
          'ulaanbaatar': (106.72447979450226, 47.8784119735556, 106.93768322467804, 47.93939724053319),
          'asuncion': (-57.67184436321258, -25.389250469786013, -57.524215579032905, -25.26729487165973)  
         }

We are now ready to deploy on GBDX. The following GBDX workflow does a few things:
- Creates the built-up mask for each area of interest as a black-and-white tif image. This is done by pickling the protogen Interface object from our small experiment and passing the resulting string to the task [protogen-runner](https://github.com/PlatformStories/protogen-runner).
- Downloads the OSM building footprints within the area of interest. We've created the task [download-osm-buildings](https://github.com/PlatformStories/download-osm-buildings) for this purpose, which outputs a geojson file with the building footprints. 
- Uploads the mask and the geojson to mapbox for visualization. We've created the task [upload-to-mapbox](https://github.com/PlatformStories/upload-to-mapbox) for this purpose.

In [None]:
import pickle
import uuid
from os.path import join

wwf_ids = {}
random_str = str(uuid.uuid4())

# mapbox upload token
mapbox_token = 'PUT VALID TOKEN HERE'

# execute a workflow for each city
for city in cities:    
    
    # configure extent of mask
    bbox = bboxes[city]
    W, S, E, N = bbox
    p2.image_config.input_latlong_rectangle = [W, N, E, S]

    # pass the protogen object to the protogen-runner task     
    lulc_mask = gbdx.Task('protogen-runner')
    lulc_mask.inputs.pickle = pickle.dumps(p2)
    
    # set the image input to the s3 location of the corresponding strip 
    lulc_mask.inputs.image = join('s3://gbd-customer-data/32cbab7a-4307-40c8-bb31-e2de32f940c2/platform-stories/osm-lulc/images', city)
    
    # download osm building footprints in bounding box
    dob = gbdx.Task('download-osm-buildings')
    dob.inputs.bbox = ','.join(map(str, bbox))

    # upload results to mapbox
    utom_mask = gbdx.Task('upload-to-mapbox')
    utom_mask.inputs.input = lulc_mask.outputs.output.value
    utom_mask.inputs.tileset_name = 'osm-lulc-ras-' + city 
    utom_mask.inputs.token = mapbox_token    
    utom_footprints = gbdx.Task('upload-to-mapbox')
    utom_footprints.inputs.input = dob.outputs.geojson.value
    utom_footprints.inputs.tileset_name = 'osm-lulc-vec-' + city 
    utom_footprints.inputs.token = mapbox_token 
    
    # execute the workflow and save data on s3
    wf = gbdx.Workflow([lulc_mask, dob, utom_mask, utom_footprints])
    output_location = join('platform-stories/trial-runs', random_str, city)
    wf.savedata(lulc_mask.outputs.output, join(output_location, 'mask'))
    wf.savedata(dob.outputs.geojson, join(output_location, 'geojson'))
    wwf_ids[city] = wf.execute()
    
print join('platform-stories/trial-runs', random_str) 
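
The workflow relies on `pickle` to ship the configured object to the task. The round trip can be sketched with a stand-in object; here `argparse.Namespace` plays the role of the protogen Interface, which is purely an illustrative assumption, and the rectangle values are made up. For the real object, the receiving side needs the protogen package available in order to unpickle it.

```python
import argparse
import pickle

# Stand-in for a configured object with one attribute, ordered [W, N, E, S].
config = argparse.Namespace(input_latlong_rectangle=[135.41, 34.71, 135.48, 34.60])

payload = pickle.dumps(config)     # serialized form (bytes on Python 3, str on Python 2)
restored = pickle.loads(payload)   # what the receiving side rebuilds

print(restored.input_latlong_rectangle == config.input_latlong_rectangle)
```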

In [None]:
# check status of workflows
for city in cities:
    wf = gbdx.Workflow([])
    wf.id = wwf_ids[city]
    print city, wf.id, wf.status

You can check if your raster and vector tilesets have been successfully uploaded to mapbox at [https://www.mapbox.com/studio/tilesets/](https://www.mapbox.com/studio/tilesets/). We've created the following demo html page which references the tilesets. For each city, the built-up mask is shown in black-and-white and the buildings that we retrieved from the OSM Overpass API are shown in green.

In [None]:
from IPython.display import IFrame
IFrame('http://gbdxstories.digitalglobe.com/pages/osm-lulc/cities.html', width=1600, height=800)