<h1> Downscaling Landcover data using Cloud ML </h1>

In this notebook, we train a TensorFlow model to fit Landsat 8 bands to a low-resolution landcover map. Then, we apply that model to the high-resolution Landsat data to create a high-resolution landcover map. In essence, we are using TensorFlow to "statistically downscale" the landcover data. (The term "downscaling" is counterintuitive: downscaling an image increases its resolution, i.e. upsamples it.)
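
The fit-at-low-resolution, predict-at-high-resolution idea can be illustrated on synthetic data. The following is only a minimal sketch, not the model used in this notebook: it assumes a toy scene with one band and two classes, and it substitutes a nearest-centroid classifier for the TensorFlow model. The helper names `block_mean`, `fit_centroids` and `predict` are hypothetical.

```python
import numpy as np


def block_mean(image, factor):
    """Coarsen an image by averaging non-overlapping factor x factor blocks."""
    rows, cols = image.shape
    blocks = image.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


def fit_centroids(features, labels):
    """Map each class label to the mean feature value of its pixels."""
    return {int(c): features[labels == c].mean() for c in np.unique(labels)}


def predict(features, centroids):
    """Assign each pixel the class whose centroid is closest to its value."""
    classes = np.array(sorted(centroids))
    centers = np.array([centroids[c] for c in classes])
    nearest = np.abs(features[..., np.newaxis] - centers).argmin(axis=-1)
    return classes[nearest]


# Toy high-resolution scene: left half is class 0 (dark), right half class 1 (bright).
rng = np.random.RandomState(0)
truth_hi = np.zeros((8, 8), dtype=int)
truth_hi[:, 4:] = 1
band_hi = truth_hi * 10.0 + rng.normal(0.0, 0.5, size=(8, 8))

# Low-resolution inputs: the band coarsened 4x, plus a coarse label map.
factor = 4
band_lo = block_mean(band_hi, factor)
labels_lo = (block_mean(truth_hi.astype(float), factor) > 0.5).astype(int)

# Fit on the low-resolution pair, then predict on the high-resolution band.
centroids = fit_centroids(band_lo, labels_lo)
labels_hi = predict(band_hi, centroids)
print(labels_hi.shape)
```

The classifier only ever sees the 2 x 2 coarse label map during fitting, yet it produces an 8 x 8 label map, which is the sense in which the labels are "downscaled".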

<br/>
<h3> Set up </h3>

As a first step, we install the GDAL Python bindings, which can read GeoTIFF files.

In [1]:
# Note: after running this cell, you need to Reset Session in Datalab to pick up the new package
# You may need to change this to "!sudo apt-get" if you get permission problems.
!apt-get -y install python-gdal

Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-gdal is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.


The GDAL library won't be able to read directly from Cloud Storage, so let's download the data to local disk first. If the download fails for lack of space, relaunch Datalab after changing instance_details.sh to provision a disk of at least 20 GB.
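
Before starting the download, it can help to see how much space is free. This is a small sketch, assuming a Unix-like host (it relies on `os.statvfs`); the helper name `free_gigabytes` is hypothetical, and the amount of space the download actually needs is not stated here, so the sketch only reports the figure.

```python
import os


def free_gigabytes(path="."):
    """Return the free space available on the filesystem holding `path`, in GB."""
    stats = os.statvfs(path)
    # f_bavail: blocks available to unprivileged users; f_frsize: block size in bytes
    return stats.f_bavail * stats.f_frsize / 1e9


print(free_gigabytes("."))
```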

In [3]:
%bash
if [ ! -d input_data ]; then
   mkdir input_data
   for band in `seq 1 7`; do
     gsutil cp gs://mdh-test/landsat-ml/landsat8-b$band.tif input_data
   done
   gsutil cp gs://mdh-test/landsat-ml/srtm-elevation.tif input_data
   gsutil cp gs://mdh-test/landsat-ml/mcd12-labels.tif input_data
fi

<h3> Preprocessing </h3>

Preprocessing in Cloud ML is done using Cloud Dataflow. We'll need to read the GeoTIFFs and then merge them in such a way that all the data corresponding to a pixel becomes a single TFRecord.
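
The per-pixel merge can be pictured with small in-memory arrays in place of the GeoTIFFs. This is a minimal sketch under that assumption: `pixel_records` is a hypothetical helper, the array values are made up for illustration, and the sketch stops at Python dictionaries rather than writing TFRecords.

```python
import numpy as np


def pixel_records(arrays, names):
    """Yield one {name: value} dictionary per pixel, in row-major order."""
    # Stack the layers so that the last axis holds all values for one pixel
    stacked = np.stack(arrays, axis=-1)  # shape: (rows, cols, n_layers)
    for row in stacked:
        for values in row:
            yield dict(zip(names, values.tolist()))


# Three co-registered 2 x 2 layers standing in for two bands and the label map
b1 = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
b2 = np.array([[10.0, 20.0], [30.0, 40.0]], dtype=np.float32)
landcover = np.array([[1.0, 1.0], [2.0, 2.0]], dtype=np.float32)

records = list(pixel_records([b1, b2, landcover], ['b1', 'b2', 'landcover']))
print(len(records))
print(records[0])
```

Each dictionary carries every layer's value for one pixel, which is the shape of input that a per-record preprocessing step expects.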

In [None]:
import osgeo.gdal as gdal
import struct
import numpy as np
import google.cloud.ml as ml
import tensorflow as tf
import apache_beam as beam
import google.cloud.ml.io as io
import google.cloud.ml.features as features
import os

print tf.__version__
print ml.sdk_location


def get_pixel_values(filenames, featnames):
    """
    Generator that yields one dictionary per pixel, mapping each feature
    name to that pixel's value in the corresponding file.
    """
    # Open every file read-only and take its first raster band
    ds = [gdal.Open(filename, gdal.GA_ReadOnly) for filename in filenames]
    bands = [ds1.GetRasterBand(1) for ds1 in ds]
    # Assumes all files share the dimensions of the first one
    ncols = bands[0].XSize
    nrows = bands[0].YSize
    print "Reading ", nrows, "x", ncols, " images from ", filenames

    # Each scanline is unpacked as ncols 32-bit floats
    packformat = 'f' * ncols
    for line in xrange(0, nrows):
        line_data = [struct.unpack(packformat, band.ReadRaster(0, line, ncols, 1, ncols, 1, gdal.GDT_Float32)) for band in bands]
        for col in xrange(0, ncols):
            result = {}
            for f in xrange(0, len(featnames)):
                result[featnames[f]] = line_data[f][col]
            yield result
        
class LandcoverFeatures(object):
  columns = ('b1', 'b2', 'landcover')
  target = features.target('landcover').discrete()  # classification problem
  filenames = ['input_data/landsat8-b1.tif', 'input_data/landsat8-b2.tif', 'input_data/mcd12-labels.tif']
  inputs = [
      features.numeric('b1').identity(),
      features.numeric('b2').identity(),
      #features.numeric('b3').identity(),
      #features.numeric('b4').identity(),
      #features.numeric('b5').identity(),
      #features.numeric('b6').identity(),
      #features.numeric('b7').identity(),
      #features.numeric('el').identity(),  # elevation
  ]
  


# defines
feature_set = LandcoverFeatures()
OUTPUT_DIR = './preproc'
pipeline = beam.Pipeline('DirectPipelineRunner')

# preprocessing
train = pipeline | beam.Create(get_pixel_values(feature_set.filenames, feature_set.columns))
(metadata, train_features) = ((train) |
   'Preprocess' >> ml.Preprocess(feature_set))

(metadata
   | 'SaveMetadata'
   >> io.SaveMetadata(os.path.join(OUTPUT_DIR, 'metadata.yaml')))
(train_features
   | 'WriteTraining'
   >> io.SaveFeatures(os.path.join(OUTPUT_DIR, 'features_train')))

# run pipeline
pipeline.run()

0.11.0rc0
gs://cloud-ml/sdk/cloudml-0.1.6-alpha.dataflow.tar.gz
Reading  16384 x 16384  images from  ['input_data/landsat8-b1.tif', 'input_data/landsat8-b2.tif', 'input_data/mcd12-labels.tif']
