<h1> Downscaling Landcover data using Cloud ML </h1>

In this notebook, we train a TensorFlow model to fit Landsat 8 bands to a low-resolution landcover map. Then, we use that model on the high-resolution Landsat data to create a high-resolution landcover map. In essence, we are using TensorFlow to "statistically downscale" the landcover data (note that the term "downscaling" is counterintuitive: downscaling an image increases its resolution, that is, upsamples it).

<br/>
<h3> Set up </h3>

As a first step, we install the Python bindings for GDAL, which we use to read the GeoTIFF files.

In [1]:
# Note: after running this cell, you need to Reset Session in Datalab to pick up the new package
# You may need to change this to "!sudo apt-get" if you get permission problems.
!apt-get -y install python-gdal

Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-gdal is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.


The GDAL library won't be able to read directly from Cloud Storage, so let's download the data to disk. Note that if you started Datalab on a VM without enough disk space, this might fail. If you launched Datalab using the codelab instructions, change instance_details.sh to provision a disk of at least 20 GB.
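
Before downloading, it can help to look at the disk the notebook is running on. The following is a minimal sketch and not part of the original notebook: `disk_size_gb` is a hypothetical helper, and `os.statvfs` is assumed to be available (it is on Unix-like systems). The 20 GB threshold is the disk size mentioned above; the notebook does not state how large the input files are, so the free-space figure is only informative.

```python
from __future__ import print_function
import os

MIN_DISK_GB = 20  # disk size suggested in the text above


def disk_size_gb(path="."):
    """Return (total, free) size in GB of the filesystem containing `path`."""
    stats = os.statvfs(path)
    gb = float(1024 ** 3)
    total = stats.f_blocks * stats.f_frsize / gb  # whole filesystem
    free = stats.f_bavail * stats.f_frsize / gb   # available to non-root users
    return total, free


total_gb, free_gb = disk_size_gb(".")
print("Disk: %.1f GB total, %.1f GB free" % (total_gb, free_gb))
if total_gb < MIN_DISK_GB:
    print("Disk is smaller than %d GB; the download might fail." % MIN_DISK_GB)
```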

In [3]:
%bash
if [ ! -d input_data ]; then
   mkdir input_data
   for band in `seq 1 7`; do
     gsutil cp gs://mdh-test/landsat-ml/landsat8-b$band.tif input_data
   done
   gsutil cp gs://mdh-test/landsat-ml/srtm-elevation.tif input_data
   gsutil cp gs://mdh-test/landsat-ml/mcd12-labels.tif input_data
fi

<h3> Preprocessing </h3>

Preprocessing in Cloud ML is done using Cloud Dataflow. We'll need to read the GeoTIFFs and then merge them in such a way that all the data corresponding to a pixel (its band values and its landcover label) becomes a single TFRecord.
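
To make the "one record per pixel" idea concrete, here is a toy sketch in plain Python. It is only an illustration: `merge_per_pixel` is a hypothetical helper, the pixel values are made up, and no TFRecord is actually written. The point is that the i-th value of every flattened raster ends up in the same record.

```python
from __future__ import print_function


def merge_per_pixel(names, arrays):
    """Yield one dict per pixel, mapping each name to that pixel's value."""
    lengths = set(len(a) for a in arrays)
    if len(lengths) != 1:
        raise ValueError("all rasters must have the same number of pixels")
    for values in zip(*arrays):
        yield dict(zip(names, values))


# Toy stand-ins for flattened rasters (made-up values, 4 pixels each).
b1 = [0.10, 0.12, 0.14, 0.11]
b2 = [0.08, 0.09, 0.13, 0.10]
landcover = [1.0, 1.0, 10.0, 4.0]

records = list(merge_per_pixel(('b1', 'b2', 'landcover'), [b1, b2, landcover]))
print(records[2])
```

This only works if all rasters are on the same pixel grid, which is why the helper rejects inputs of different lengths.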

In [1]:
import osgeo.gdal as gdal
import struct
import numpy as np

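def read_geotiff(filename):
    """
    Read band 1 of a GeoTIFF and return its pixels as a 1-D float32 array
    in row-major order (pixel (row, col) is at index row*ncols + col).
    """
    ds = gdal.Open( filename, gdal.GA_ReadOnly )
    if ds is None:  # gdal.Open returns None if the file cannot be opened
      raise IOError("Could not open " + filename)
    band = ds.GetRasterBand(1)
    ncols = band.XSize
    nrows = band.YSize
    print "Reading ", nrows, "x", ncols, " image from ", filename
    packformat = 'f' * ncols  # one float32 per pixel in a scanline
    result = np.zeros([nrows*ncols], dtype=np.float32)  # 1-D array of pixel values
    for line in xrange(0, nrows):
      # ReadRaster returns one scanline as raw bytes, converted to Float32
      line_data = struct.unpack(packformat, band.ReadRaster(0, line, ncols, 1, ncols, 1, gdal.GDT_Float32))
      result[line*ncols:(line+1)*ncols] = line_data
    ds = None  # dropping the reference closes the dataset
    return result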
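
The decoding step inside read_geotiff can be tried without GDAL. In this small sketch the raw scanline is produced with `struct.pack` as a stand-in for what `ReadRaster` returns (the pixel values are made up); the `'f' * ncols` format then turns the bytes back into one Python float per pixel.

```python
from __future__ import print_function
import struct

ncols = 4
pixels = [0.5, 0.25, 1.0, 10.0]  # made-up values, exactly representable as float32

# Stand-in for a scanline read from a raster: raw bytes, 4 per float32 pixel.
raw = struct.pack('f' * ncols, *pixels)

# Same format string as in read_geotiff: one 'f' per pixel in the line.
decoded = struct.unpack('f' * ncols, raw)
print(len(raw), decoded)
```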

In [2]:
b1 = read_geotiff('input_data/landsat8-b1.tif')
b2 = read_geotiff('input_data/landsat8-b2.tif')
lc = read_geotiff('input_data/mcd12-labels.tif')
print b1[200], b2[200], lc[200]

Reading  16384 x 16384  image from  input_data/landsat8-b1.tif
Reading  16384 x 16384  image from  input_data/landsat8-b2.tif
Reading  16384 x 16384  image from  input_data/mcd12-labels.tif
0.100187 0.081537 1.0


In [3]:
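all_data = np.stack([b1, b2, lc], axis=-1)  # shape (npixels, 3): one row [b1, b2, landcover] per pixel
print all_data[200,:]
np.random.shuffle(all_data)  # shuffles rows in place; each pixel's values stay together
print all_data[200,:]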

array([  0.13659808,   0.13656577,  10.        ], dtype=float32)

In [5]:
tot_samples = len(all_data)
num_train = int(tot_samples * 0.7)  # 70% of the shuffled pixels for training
print num_train
train_data = all_data[:num_train,:]
eval_data  = all_data[num_train:,:]  # remaining 30% for evaluation

187904819
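
The number above follows directly from the image size reported by read_geotiff (16384 x 16384, one sample per pixel). A short check of the arithmetic, written so that it runs without the image data:

```python
from __future__ import print_function

nrows, ncols = 16384, 16384         # image size reported by read_geotiff
tot_samples = nrows * ncols         # one sample per pixel
num_train = int(tot_samples * 0.7)  # same split as in the cell above
num_eval = tot_samples - num_train
print(tot_samples, num_train, num_eval)
```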


In [6]:
import google.cloud.ml as ml
import google.cloud.ml.features as features
import tensorflow as tf
print tf.__version__
print ml.sdk_location

class LandcoverFeatures(object):
  columns = ('b1', 'b2', 'landcover')
  target = features.target('landcover').discrete()  # classification problem
  inputs = [
      features.numeric('b1').identity(),
      features.numeric('b2').identity(),
      #features.numeric('b3').identity(),
      #features.numeric('b4').identity(),
      #features.numeric('b5').identity(),
      #features.numeric('b6').identity(),
      #features.numeric('b7').identity(),
      #features.numeric('el').identity(),  # elevation
  ]

0.11.0rc0
gs://cloud-ml/sdk/cloudml-0.1.6-alpha.dataflow.tar.gz


In [None]:
import apache_beam as beam
import google.cloud.ml as ml
import google.cloud.ml.io as io
import os

# defines
feature_set = LandcoverFeatures()
OUTPUT_DIR = './preproc'
pipeline = beam.Pipeline('DirectPipelineRunner')

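# preprocessing
# Each transform applied to the pipeline needs a unique label,
# so the two Create steps are labeled explicitly.
train = pipeline | 'CreateTrain' >> beam.Create(train_data)
eval = pipeline | 'CreateEval' >> beam.Create(eval_data)
(metadata, train_features, eval_features) = ((train, eval) |
   'Preprocess' >> ml.Preprocess(feature_set))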

(metadata
   | 'SaveMetadata'
   >> io.SaveMetadata(os.path.join(OUTPUT_DIR, 'metadata.yaml')))
(train_features
   | 'WriteTraining'
   >> io.SaveFeatures(os.path.join(OUTPUT_DIR, 'features_train')))
(eval_features
   | 'WriteEval'
   >> io.SaveFeatures(os.path.join(OUTPUT_DIR, 'features_eval')))

# run pipeline
pipeline.run()