<h1> Creating high-resolution Landcover data using Machine Learning </h1>

In this notebook, we train a TensorFlow model to fit Landsat 8 bands to a low-resolution landcover map. Then, we use that model on the high-resolution Landsat data to create a high-resolution landcover map. In essence, we are using TensorFlow to <a href="https://gisclimatechange.ucar.edu/question/63">statistically downscale</a> the landcover data (note that the term "downscaling" is counterintuitive -- downscaling an image increases its resolution or upsamples it).

<div id="toc"></div>

<h2> Workflow </h2>
We will read corresponding pixels out of a set of mosaicked, cloud-corrected Landsat GeoTIFF images and pair them with a low-resolution landcover map that has been upsampled to match the Landsat imagery.  These paired pixel values form the training dataset.  In prediction, we take the same set of Landsat images and use the trained model to produce a high-resolution landcover map.

This is the basic workflow:
<img src="landcover_features.png" style='width: 100%;' />
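
The upsampling step mentioned above can be pictured with a tiny grid. This is only a sketch: the notebook does not show how the landcover map was upsampled, so nearest-neighbour repetition is my assumption (it is a common choice for categorical labels because it never invents in-between classes). The function name `upsample_nearest` and the sample values are hypothetical.

```python
def upsample_nearest(grid, factor):
    """Repeat every cell of a 2D list `factor` times along both axes."""
    upsampled = []
    for row in grid:
        # widen the row: each value is repeated `factor` times
        wide_row = [value for value in row for _ in range(factor)]
        # then repeat the widened row `factor` times (fresh copy per output row)
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled

# a 2x2 low-resolution label map (made-up class codes)
coarse_labels = [[1, 5],
                 [9, 1]]
fine_labels = upsample_nearest(coarse_labels, 2)  # now 4x4, same classes
```

After this step every high-resolution pixel has a (blocky) label, which is what lets us pair each Landsat pixel with a target value.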

In [54]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

<IPython.core.display.Javascript object>

<h2> Preprocessing using Cloud Dataflow </h2>

Cloud Dataflow can scale up and simplify preprocessing in Cloud ML.  We'll need to read the GeoTIFFs and then merge them so that all the data corresponding to a pixel ends up in a single TFRecord. We'll scale the pixel values to lie in the range [-1,1]. If you do this sort of thing naively, you'll run out of memory or burn through your wallet -- the total size of the images alone is 25 GB.
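
To make the [-1,1] scaling concrete, here is a minimal sketch using plain min-max scaling. That formula is an assumption on my part: in the pipeline below the scaling is delegated to the SDK's `.scale()` call, and its exact computation is not shown in this notebook. The helper name `scale_to_unit_range` and the sample values are hypothetical.

```python
def scale_to_unit_range(values):
    """Linearly map values so that min -> -1.0 and max -> +1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # a constant band carries no information; map it to the midpoint
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

band = [0.0, 2500.0, 5000.0, 10000.0]   # made-up raw pixel values
scaled = scale_to_unit_range(band)
```

Note that the minimum and maximum have to be computed over the whole dataset, not per row, which is one reason the preprocessing is done as a separate pass.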

In [57]:
# a Python generator that packs all the training data line-by-line
def get_next_line(SMALL_SAMPLE):
  '''
      yield (lineno, (linedata, featnames, ncols_to_read))
      where linedata is a 2D array with first dimension being feature# and second dimension column in image
  '''
  import osgeo.gdal as gdal
  import struct
  import os
  import subprocess
  
  # The gdal library cannot read from Cloud Storage, so this class downloads the data to the local VM
  class LandsatReader():
    def __init__(self, gsfile, destdir='./'):
      self.gsfile = gsfile
      self.dest = os.path.join(destdir, os.path.basename(self.gsfile))
      if os.path.exists(self.dest):
        print 'Using already existing {}'.format(self.dest)
      else:
        print 'Getting {0} to {1} '.format(self.gsfile, self.dest)
        subprocess.check_call(['gsutil', 'cp', self.gsfile, self.dest])  # raises if the copy fails
      self.dataset = gdal.Open( self.dest, gdal.GA_ReadOnly )
    def __enter__(self):
      return self  # pairs with __exit__ so the reader can be used in a with-statement
    def __exit__(self, exc_type=None, exc_val=None, exc_tb=None):
      os.remove( self.dest ) # cleanup
    def ds(self):
      return self.dataset

  # open all the necessary files
  input_dir = 'gs://mdh-test/landsat-ml/'
  featnames = ['b{}'.format(band) for band in xrange(1,8)] # bands b1..b7; xrange excludes the end value 8
  filenames = [os.path.join(input_dir, 'landsat8-{}.tif'.format(band)) for band in featnames]
  filenames.append(os.path.join(input_dir, 'srtm-elevation.tif')); featnames.append('elev')
  filenames.append(os.path.join(input_dir, 'mcd12-labels.tif')); featnames.append('landcover')
  readers = [LandsatReader(filename) for filename in filenames]
  bands = [reader.ds().GetRasterBand(1) for reader in readers] 
  print "Opened ", filenames
      
  # read one row of each of the images and yield them
  ncols = bands[0].XSize
  nrows = bands[0].YSize
  if SMALL_SAMPLE:
    nrows_to_read = 200
    ncols_to_read = 1000
  else:
    nrows_to_read = nrows
    ncols_to_read = ncols
  print "Reading ", nrows_to_read, "x", ncols_to_read, " from ", nrows, 'x', ncols, ' images corresponding to ', featnames
  packformat = 'f' * ncols
  for line in xrange(0, nrows_to_read):
        line_data = [struct.unpack(packformat, band.ReadRaster(0, line, ncols, 1, ncols, 1, gdal.GDT_Float32)) for band in bands]
        yield (line, (line_data, featnames, ncols_to_read))
      
def get_features_from_line(args):
  '''
      return (1, dict)  or (0, dict)
      where the first number is 1 or 0 depending on whether this row belongs to training (1)
      or eval (0) partition.
      dict is the set of features formed from pixels from all the bands
  ''' 
  # line, [(line_data, featnames, ncols_to_read)] = args
  line = args[0]
  for (line_data, featnames, ncols_to_read) in args[1]:
    if line_data:
       for col in xrange(0, ncols_to_read):
          featdict = {'rowcol': '{},{}'.format(line,col)}
          for f in xrange(0, len(featnames)):
            featdict[featnames[f]] = line_data[f][col]
          featdict['landcover'] = '{}'.format(int(featdict['landcover']+0.5))
          yield ( 0 if (line+col)%3==0 else 1, featdict )    # 1/3 are eval

def get_partition(group_and_featdict, nparts):
  (is_train, featdict) = group_and_featdict
  return is_train # 0 or 1

def get_featdict(group_and_featdict):
  (is_train, featdict) = group_and_featdict
  return featdict

def run_preprocessing(BUCKET=None, PROJECT=None):
  import os
  import numpy as np
  import apache_beam as beam
  import google.cloud.ml as ml
  import google.cloud.ml.io as io
  import google.cloud.ml.features as features

  # small sample locally; full dataset on cloud
  if BUCKET is None or PROJECT is None:
    SMALL_SAMPLE = True
    OUTPUT_DIR = './landcover_preproc'
    RUNNER = 'DirectPipelineRunner'
  else:
    SMALL_SAMPLE = False
    OUTPUT_DIR = 'gs://{0}/landcoverml/preproc'.format(BUCKET)
    RUNNER = 'DataflowPipelineRunner'
  #
  
  pipeline = beam.Pipeline(argv=['--project', PROJECT,
                               '--runner', RUNNER,
                               '--job_name', 'landcover',
                               '--extra_package', ml.sdk_location,
                               '--max_num_workers', '50',
                               '--no_save_main_session', 'True',  # to prevent pickling and uploading Datalab itself!
                               '--setup_file', './preproc/setup.py',  # for gdal installation on the cloud -- see CUSTOM_COMMANDS in setup.py
                               '--staging_location', 'gs://{0}/landcoverml/staging'.format(BUCKET),
                               '--temp_location', 'gs://{0}/landcoverml/temp'.format(BUCKET)])
        
  print ml.sdk_location
  
  (evalg, traing) = (pipeline 
     | beam.Create([SMALL_SAMPLE]) # make the generator function like a source
     | beam.FlatMap(get_next_line) # (line, (line_data, featnames, ncols_to_read))
     | beam.GroupByKey() # line, [(line_data, featnames, ncols_to_read)]
     | beam.FlatMap(get_features_from_line) # (is_train, featdict)
     | beam.Partition(get_partition, 2)
  )  # eval, train both contain (is_train, featdict)
  eval = evalg | 'eval_features' >> beam.Map(get_featdict)
  train = traing | 'train_features' >> beam.Map(get_featdict)
  
  class LandcoverFeatures(object):
    key = features.key('rowcol')
    landcover = features.target('landcover').discrete()  # classification problem
    inputbands = [
      features.numeric('b1').scale(),
      features.numeric('b2').scale(),
      features.numeric('b3').scale(),
      features.numeric('b4').scale(),
      features.numeric('b5').scale(),
      features.numeric('b6').scale(),
      features.numeric('b7').scale(),
      #features.numeric('elev').discretize(buckets=[1,5001,50], sparse=True),  # elevation
    ]
  feature_set = LandcoverFeatures()
  (metadata, train_features, eval_features) = ((train, eval) |
   'Preprocess' >> ml.Preprocess(feature_set, input_format='json'))
  (metadata
     | 'SaveMetadata'
     >> io.SaveMetadata(os.path.join(OUTPUT_DIR, 'metadata.yaml')))
  (train_features
     | 'WriteTraining'
     >> io.SaveFeatures(os.path.join(OUTPUT_DIR, 'features_train')))
  (eval_features
     | 'WriteEval'
     >> io.SaveFeatures(os.path.join(OUTPUT_DIR, 'features_eval')))
  pipeline.run()
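
The train/eval split in `get_features_from_line` is deterministic: a pixel goes to eval when `(line+col)%3==0`. Here is a small sketch, independent of Beam, that replays the same rule over the 200 x 1000 local sample to check that roughly one third of the pixels land in eval. The helper name `partition_index` is hypothetical; the rule itself is copied from the code above.

```python
def partition_index(line, col):
    """Same rule as get_features_from_line: 0 = eval, 1 = train."""
    return 0 if (line + col) % 3 == 0 else 1

nrows, ncols = 200, 1000          # size of the small local sample
counts = [0, 0]                   # counts[0] = eval pixels, counts[1] = train pixels
for line in range(nrows):
    for col in range(ncols):
        counts[partition_index(line, col)] += 1

eval_fraction = counts[0] / float(sum(counts))
```

Because the rule selects diagonal stripes, eval pixels always sit right next to training pixels, so the eval metrics probably look somewhat better than they would on a held-out region.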

<h2> Create ML model using TensorFlow </h2>

I cheated here. I simply took the <a href="https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/iris">Cloud ML sample for Iris classification</a> and copied it into my repo.  The only change I had to make was to three fields, changing:
<pre>
KEY_FEATURE_COLUMN = 'key'
TARGET_FEATURE_COLUMN = 'species'
REAL_VALUED_FEATURE_COLUMNS = 'measurements'
</pre>
to
<pre>
KEY_FEATURE_COLUMN = 'key'
TARGET_FEATURE_COLUMN = 'landcover'
REAL_VALUED_FEATURE_COLUMNS = 'inputbands'
</pre>
Essentially, my new values match what I had in the class LandcoverFeatures during preprocessing (see above).  This is needed because that's what is now encoded in the tfrecord files that the preprocessing step wrote out.

The model itself is a neural network with 2 hidden layers. The Iris sample uses the tf.learn API. It is a classification network, and the sample does all the saving, exporting, distribution, etc. All my inputs are like the Iris sample in that they are all real-valued columns. Like in the Iris example, the target takes only one value -- a landcover that is brushland can not also be forest. So, I'm relatively safe in reusing the Iris model as-is.  Of course, I should probably do some feature engineering, by calculating normalized differences, for example. But for now, the Iris sample will suffice.
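
As an example of the feature engineering mentioned above, here is a sketch of a normalized difference. I am assuming that the features `b4` and `b5` hold Landsat 8 band 4 (red) and band 5 (near infrared), in which case this particular ratio is the familiar NDVI. The helper name and the pixel values are hypothetical, and none of this is wired into the model.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both inputs sum to zero."""
    total = a + b
    if total == 0:
        return 0.0  # avoid dividing by zero on empty/no-data pixels
    return (a - b) / float(total)

# made-up reflectances for one pixel
pixel = {'b4': 0.1, 'b5': 0.5}
ndvi = normalized_difference(pixel['b5'], pixel['b4'])  # NIR vs. red
```

A feature like this would be added to the feature dictionary alongside the raw bands, so the network would not have to learn the ratio itself.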

In [2]:
!ls -lR landcover

landcover:
total 8
-rw-r--r-- 1 root root  746 Nov  3 18:28 setup.py
drwxr-xr-x 2 root root 4096 Nov  3 23:07 trainer

landcover/trainer:
total 24
-rw-r--r-- 1 root root  677 Nov  3 21:54 __init__.py
-rw-r--r-- 1 root root 9176 Nov  3 23:07 task.py
-rw-r--r-- 1 root root 5553 Nov  3 22:51 util.py


<h2> Train model locally using Cloud ML </h2>

Let's train the model locally on a subset of the data to ensure that we get things right. Then, we can train on the cloud with all of the data.

In [58]:
# process a small sample (200k points) by running the preprocessing locally
# if your Datalab instance can't handle this, reduce the sample size by changing the preprocessing code
# (look for nrows_to_read and change it from 200 to perhaps 20)
# alternatively, if you followed the codelab instructions to launch Datalab on a GCE VM, change the machine
# type in instance_details.sh to n1-highmem-2   (should take about 5 minutes on n1-highmem-2)
import shutil
shutil.rmtree('landcover_preproc', ignore_errors=True)
run_preprocessing()

gs://cloud-ml/sdk/cloudml-0.1.6-alpha.dataflow.tar.gz




Using already existing ./landsat8-b1.tif
Using already existing ./landsat8-b2.tif
Using already existing ./landsat8-b3.tif
Using already existing ./landsat8-b4.tif
Using already existing ./landsat8-b5.tif
Using already existing ./landsat8-b6.tif
Using already existing ./landsat8-b7.tif
Using already existing ./srtm-elevation.tif
Using already existing ./mcd12-labels.tif
Opened  ['gs://mdh-test/landsat-ml/landsat8-b1.tif', 'gs://mdh-test/landsat-ml/landsat8-b2.tif', 'gs://mdh-test/landsat-ml/landsat8-b3.tif', 'gs://mdh-test/landsat-ml/landsat8-b4.tif', 'gs://mdh-test/landsat-ml/landsat8-b5.tif', 'gs://mdh-test/landsat-ml/landsat8-b6.tif', 'gs://mdh-test/landsat-ml/landsat8-b7.tif', 'gs://mdh-test/landsat-ml/srtm-elevation.tif', 'gs://mdh-test/landsat-ml/mcd12-labels.tif']
Reading  200 x 1000  from  16384 x 16384  images corresponding to  ['b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7', 'elev', 'landcover']


In [59]:
!ls -l /content/training-data-analyst/blogs/landsat-ml/landcover_preproc

total 7004
-rw-r--r-- 1 root root 2421530 Dec  6 21:17 features_eval-00000-of-00001.tfrecord.gz
-rw-r--r-- 1 root root 4739884 Dec  6 21:17 features_train-00000-of-00001.tfrecord.gz
-rw-r--r-- 1 root root    2099 Dec  6 21:16 metadata.yaml


In [40]:
%bash
rm -rf /content/training-data-analyst/blogs/landsat-ml/landcover_trained
tar cvfz landcover.tgz landcover

landcover/
landcover/trainer/
landcover/trainer/task.py
landcover/trainer/util.py
landcover/trainer/__init__.py
landcover/setup.py


In [41]:
%mlalpha train
package_uris: /content/training-data-analyst/blogs/landsat-ml/landcover.tgz
python_module: trainer.task
scale_tier: BASIC
region: us-central1
args:
  train_data_paths: /content/training-data-analyst/blogs/landsat-ml/landcover_preproc/features_train-*
  eval_data_paths: /content/training-data-analyst/blogs/landsat-ml/landcover_preproc/features_eval-*
  metadata_path: /content/training-data-analyst/blogs/landsat-ml/landcover_preproc/metadata.yaml
  output_path: /content/training-data-analyst/blogs/landsat-ml/landcover_trained
  max_steps:  2000
  batch_size: 10000
  layer1_size: 30
  layer2_size: 10
  learning_rate: 0.01
  min_eval_frequency: 1000

<h3> Predict with locally trained model</h3>

We can use the preprocessed features to evaluate how well the trained model performs. The evaluation workflow will use the model for prediction, so if we save the predictions from the model when it is being evaluated, we can use those predictions for downscaling.

In [4]:
!ls /content/training-data-analyst/blogs/landsat-ml/landcover_preproc/

features_eval-00000-of-00001.tfrecord.gz   metadata.yaml
features_train-00000-of-00001.tfrecord.gz


In [19]:
# imports
import apache_beam as beam
import google.cloud.ml as ml
import google.cloud.ml.analysis as analysis
import google.cloud.ml.io as io
import json
import os

OUTPUT_DIR = '/content/training-data-analyst/blogs/landsat-ml/landcover_eval'
pipeline = beam.Pipeline('DirectPipelineRunner')

eval_features = (pipeline | 'ReadEval' >> io.LoadFeatures('/content/training-data-analyst/blogs/landsat-ml/landcover_preproc/features_eval*'))
trained_model = pipeline | 'LoadModel' >> io.LoadModel('/content/training-data-analyst/blogs/landsat-ml/landcover_trained/model')
evaluations = (eval_features | 'Evaluate' >> ml.Evaluate(trained_model) |
    'ExtractEvaluationResults' >> beam.Map(lambda (example, prediction): prediction))
# write one prediction per line to a single, unsharded text file
evaluations | 'WriteEval' >> beam.io.textio.WriteToText(os.path.join(OUTPUT_DIR, 'eval'), shard_name_template='')

# run pipeline
pipeline.run()



<apache_beam.runners.direct_runner.DirectPipelineResult at 0x7f3fd40eff50>

In [21]:
!head -2 /content/training-data-analyst/blogs/landsat-ml/landcover_eval/eval

{u'score': [0.9201728701591492, 0.005093369632959366, 6.252119055716321e-05, 0.011211490258574486, 0.055562522262334824, 0.007897143252193928], u'target': '1', u'key': '0,0', u'label': '1'}
{u'score': [0.9550836682319641, 0.0030277296900749207, 2.1692805603379384e-05, 0.009645896032452583, 0.02988329716026783, 0.0023376329336315393], u'target': '1', u'key': '0,3', u'label': '1'}


Note that the output includes the key (the pixel location), and the label (the prediction) for that pixel. That is enough for us to be able to do the downscaling.
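
Here is a sketch of how one of these output lines could be turned back into a pixel location and a class. It assumes each line is a Python dict literal like the two shown above (hence `ast.literal_eval` rather than a JSON parser). The function name `parse_prediction` is hypothetical, and the scores in the sample are abbreviated for readability.

```python
import ast

def parse_prediction(line):
    """Return (row, col, label, index_of_best_score) for one output line."""
    record = ast.literal_eval(line)          # safe parsing of a dict literal
    row, col = (int(part) for part in record['key'].split(','))
    scores = record['score']
    best_index = scores.index(max(scores))   # position of the highest score
    return row, col, record['label'], best_index

# abbreviated version of an output line (scores shortened)
sample = ("{u'score': [0.92, 0.005, 0.0001, 0.011, 0.056, 0.008], "
          "u'target': '1', u'key': '0,3', u'label': '1'}")
row, col, label, best_index = parse_prediction(sample)
```

The `label` is already the landcover code as a string, while `best_index` is the position in the score vector; the two are related through the vocabulary stored in metadata.yaml.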

In [52]:
# imports
import apache_beam as beam
import google.cloud.ml as ml
import google.cloud.ml.analysis as analysis
import google.cloud.ml.io as io
import json
import yaml
import os
import numpy as np

OUTPUT_DIR = '/content/training-data-analyst/blogs/landsat-ml/landcover_eval'
pipeline = beam.Pipeline('DirectPipelineRunner')

# analysis
def read_metadata(filename):
  with open(filename, 'r') as stream:
    try:
        return yaml.load(stream)
    except yaml.YAMLError as exc:
        print(exc)

metadata = read_metadata('/content/training-data-analyst/blogs/landsat-ml/landcover_preproc/metadata.yaml')
lookup = metadata['columns']['landcover']['vocab']
print lookup
def make_data_for_analysis(values):
  return {
      'target': lookup[values['target']],
      'predicted': lookup[values['label']],
      'score': np.max(values['score']), # not needed
  }

metadata = pipeline | io.LoadMetadata('/content/training-data-analyst/blogs/landsat-ml/landcover_preproc/metadata.yaml')
analysis_source = evaluations | beam.Map('CreateAnalysisSource', make_data_for_analysis)
confusion_matrix, precision_recall, logloss = (analysis_source |
    'Analyze Model' >> analysis.AnalyzeModel(metadata))
confusion_matrix_file = os.path.join(OUTPUT_DIR, 'analyze_cm.json')
confusion_matrix_sink = beam.io.TextFileSink(confusion_matrix_file, shard_name_template='')
confusion_matrix | beam.io.Write('WriteConfusionMatrix', confusion_matrix_sink)


# run pipeline
pipeline.run()



{'1': 0, '10': 1, '12': 2, '5': 3, '9': 5, '8': 4}


<apache_beam.runners.direct_runner.DirectPipelineResult at 0x7f400b226290>

In [56]:
import datalab.mlalpha
import yaml
with ml.util._file.open_local_or_gcs(confusion_matrix_file, 'r') as f:
  data = [yaml.load(line) for line in f.read().rstrip().split('\n')]
  for line in data:
    line['target'] = 'lc_{:02d}'.format(int(line['target']))
    line['predicted'] = 'lc_{:02d}'.format(int(line['predicted']))
datalab.mlalpha.ConfusionMatrix([d['predicted'] for d in data],
                           [d['target'] for d in data],
                           [d['count'] for d in data]).plot()

The model seems to get a little confused between categories 1 and 9; categories 5 and 12 are poorly recognized. Let's not get too hung up on this, though, because this is on a very small dataset.
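
To put a number on "poorly recognized", one option is to compute per-class recall from the same records that feed the confusion matrix plot. This is a sketch: the `target`, `predicted` and `count` keys match the records used above, but the counts below are made up for illustration and are not results from this model.

```python
from collections import defaultdict

def per_class_recall(records):
    """Fraction of each target class that was predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec['target']] += rec['count']
        if rec['target'] == rec['predicted']:
            correct[rec['target']] += rec['count']
    return {label: correct[label] / float(total[label]) for label in total}

# illustrative counts only
records = [
    {'target': 'lc_01', 'predicted': 'lc_01', 'count': 90},
    {'target': 'lc_01', 'predicted': 'lc_09', 'count': 10},
    {'target': 'lc_09', 'predicted': 'lc_09', 'count': 30},
    {'target': 'lc_09', 'predicted': 'lc_01', 'count': 20},
]
recall = per_class_recall(records)
```

A class with low recall and few examples is usually a sign that the sample is too small, which is another reason to wait for the full dataset before drawing conclusions.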

<h2>Train on full dataset on the cloud</h2>

Let's preprocess the complete dataset.  <b>These steps will take several hours and have billing implications</b>.

Specify your bucket and project as appropriate. Make sure that the bucket you use is a single-region bucket (when you create a bucket, there is an option to specify this). If you already have a bucket and it is not a single-region one, you should create a separate single-region bucket for Cloud ML jobs to use.

In [62]:
!python preprocess.py

gs://cloud-ml/sdk/cloudml-0.1.6-alpha.dataflow.tar.gz
running sdist
running egg_info
writing requirements to landcover.egg-info/requires.txt
writing landcover.egg-info/PKG-INFO
writing top-level names to landcover.egg-info/top_level.txt
writing dependency_links to landcover.egg-info/dependency_links.txt
reading manifest file 'landcover.egg-info/SOURCES.txt'
writing manifest file 'landcover.egg-info/SOURCES.txt'

running check


creating landcover-0.0.1
creating landcover-0.0.1/landcover.egg-info
copying files to landcover-0.0.1...
copying setup.py -> landcover-0.0.1
copying landcover.egg-info/PKG-INFO -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/SOURCES.txt -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/dependency_links.txt -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/requires.txt -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/top_level.txt -> landcover-0.0.1/landcover.egg-info
Writing landcover-0.0.1/setup.cfg

In [1]:
!gsutil ls -l gs://cloud-training-demos-ml/landcover/preproc/




  11911414  2016-11-04T18:12:29Z  gs://cloud-training-demos-ml/landcover/preproc/features_eval-00000-of-00017.tfrecord.gz
   1399595  2016-11-04T18:12:29Z  gs://cloud-training-demos-ml/landcover/preproc/features_eval-00001-of-00017.tfrecord.gz
    722332  2016-11-04T18:12:29Z  gs://cloud-training-demos-ml/landcover/preproc/features_eval-00002-of-00017.tfrecord.gz
  61908282  2016-11-04T18:12:29Z  gs://cloud-training-demos-ml/landcover/preproc/features_eval-00003-of-00017.tfrecord.gz
  62458050  2016-11-04T18:12:29Z  gs://cloud-training-demos-ml/landcover/preproc/features_eval-00004-of-00017.tfrecord.gz
 233743299  2016-11-04T18:12:29Z  gs://cloud-training-demos-ml/landcover/preproc/features_eval-00005-of-00017.tfrecord.gz
   5783557  2016-11-04T18:12:29Z  gs://cloud-training-demos-ml/landcover/preproc/features_eval-00006-of-00017.tfrecord.gz
 130264412  2016-11-04T18:12:29

Tar up the Python package and make it available on Cloud Storage.

In [23]:
%bash
BUCKET=cloud-training-demos-ml
gsutil -m rm -rf gs://$BUCKET/landcover/trained
tar cvfz landcover.tgz landcover
gsutil cp landcover.tgz gs://$BUCKET/landcover/source/

landcover/
landcover/setup.py
landcover/trainer/
landcover/trainer/__init__.py
landcover/trainer/task.py
landcover/trainer/util.py


CommandException: 1 files/objects could not be removed.
Copying file://landcover.tgz [Content-Type=application/x-tar]...
/ [1 files][  5.0 KiB/  5.0 KiB]
Operation completed over 1 objects/5.0 KiB.                                      


This is the same as the local training except:
<ol>
<li> The --cloud parameter runs the training on the cloud in a distributed way rather than on a single machine.
<li> All the data paths point to Cloud Storage, where our preprocessing code wrote its output.
<li> max_steps is much larger.  This is because a step is only one batch.  The entire dataset is 178 million points, and since batch_size is 10000, we need 17,800 steps for a single epoch (one pass through the training data).  So, the 890000 here is approximately 50 epochs of training.
<li> The evaluation frequency has been raised to 17,800 for the same reason (so that we evaluate approximately once every epoch).
</ol>
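
The step arithmetic above can be checked in a few lines. The numbers are the ones quoted in the list; the helper name `steps_per_epoch` is hypothetical.

```python
def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once (rounded up)."""
    return -(-num_examples // batch_size)   # ceiling division with integers

num_examples = 178000000   # ~178 million points
batch_size = 10000
max_steps = 890000

per_epoch = steps_per_epoch(num_examples, batch_size)   # 17800
epochs = max_steps / float(per_epoch)                    # 50.0
```

If you change batch_size, rescale max_steps and min_eval_frequency by the same factor to keep the number of epochs and the evaluation cadence unchanged.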

In [37]:
%mlalpha train --cloud
package_uris: gs://cloud-training-demos-ml/landcover/source/landcover.tgz
python_module: trainer.task
scale_tier: STANDARD_1
region: us-central1
args:
  train_data_paths: gs://cloud-training-demos-ml/landcover/preproc/features_train-*
  eval_data_paths: gs://cloud-training-demos-ml/landcover/preproc/features_eval-*
  metadata_path: gs://cloud-training-demos-ml/landcover/preproc/metadata.yaml
  output_path: gs://cloud-training-demos-ml/landcover/trained
  max_steps:  890000
  batch_size: 10000
  layer1_size: 30
  layer2_size: 10
  learning_rate: 0.01
  min_eval_frequency: 17800

In [41]:
%mlalpha jobs --name trainer_task_161121_170420

In [42]:
!gsutil ls gs://cloud-training-demos-ml/landcover/trained/model




gs://cloud-training-demos-ml/landcover/trained/model/
gs://cloud-training-demos-ml/landcover/trained/model/checkpoint
gs://cloud-training-demos-ml/landcover/trained/model/export
gs://cloud-training-demos-ml/landcover/trained/model/export.meta
gs://cloud-training-demos-ml/landcover/trained/model/metadata.yaml


<h2> Run prediction </h2>

In [34]:
#!python predict.py

In [14]:
%bash
gsutil rm gs://cloud-training-demos-ml/landcover/prediction/landcover.TIF
gsutil ls gs://cloud-training-demos-ml/landcover/prediction/

gs://cloud-training-demos-ml/landcover/prediction/eval
gs://cloud-training-demos-ml/landcover/prediction/tmp/


Removing gs://cloud-training-demos-ml/landcover/prediction/landcover.TIF...
/ [1 objects]                                                                   
Operation completed over 1 objects.                                              


<h2> Confusion matrix </h2>

Let's display the confusion matrix on the full prediction.

In [19]:
!gsutil cp gs://cloud-training-demos-ml/landcover/trained/model/metadata.yaml /tmp/metadata.yaml

Copying gs://cloud-training-demos-ml/landcover/trained/model/metadata.yaml...
- [1 files][  2.2 KiB/  2.2 KiB]                                                
Operation completed over 1 objects/2.2 KiB.                                      


In [35]:
# analysis
def read_metadata(filename): 
  import yaml
  with open(filename, 'r') as stream:
    try:
        return yaml.load(stream)
    except yaml.YAMLError as exc:
        print(exc)

metadata = read_metadata('/tmp/metadata.yaml')
lookup = metadata['columns']['landcover']['vocab']
print lookup

{'11': 3, '10': 2, '13': 5, '12': 4, '15': 7, '14': 6, '16': 8, '1': 1, '0': 0, '3': 10, '2': 9, '5': 12, '4': 11, '7': 14, '6': 13, '9': 16, '8': 15}


In [39]:
!python confusion.py

Copying gs://cloud-training-demos-ml/landcover/trained/model/metadata.yaml...
/ [1 files][  2.2 KiB/  2.2 KiB]                                                
Operation completed over 1 objects/2.2 KiB.                                      
running sdist
running egg_info
writing requirements to landcover.egg-info/requires.txt
writing landcover.egg-info/PKG-INFO
writing top-level names to landcover.egg-info/top_level.txt
writing dependency_links to landcover.egg-info/dependency_links.txt
reading manifest file 'landcover.egg-info/SOURCES.txt'
writing manifest file 'landcover.egg-info/SOURCES.txt'

running check


creating landcover-0.0.1
creating landcover-0.0.1/landcover.egg-info
copying files to landcover-0.0.1...
copying setup.py -> landcover-0.0.1
copying landcover.egg-info/PKG-INFO -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/SOURCES.txt -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/dependency_links.txt -> landcover-0.0.1/landcover.egg-info
cop

In [3]:
import datalab.mlalpha
import google.cloud.ml as ml
import yaml
confusion_matrix_file = 'gs://cloud-training-demos-ml/landcover/prediction/confusion/analyze_cm.json'
with ml.util._file.open_local_or_gcs(confusion_matrix_file, 'r') as f:
  data = [yaml.load(line) for line in f.read().rstrip().split('\n')]
  for line in data:
    line['target'] = 'lc_{:02d}'.format(int(line['target']))
    line['predicted'] = 'lc_{:02d}'.format(int(line['predicted']))
datalab.mlalpha.ConfusionMatrix([d['predicted'] for d in data],
                           [d['target'] for d in data],
                           [d['count'] for d in data]).plot()

ERROR:root:Retrying after exception reading gcs file: [Errno 2] Not found: gs://cloud-training-demos-ml/landcover/prediction/confusion/analyze_cm.json


IOError: [Errno 2] Not found: gs://cloud-training-demos-ml/landcover/prediction/confusion/analyze_cm.json

<h2> Create downscaled Landcover image </h2>

We'll read the original Landcover image and replace the pixel values with the predictions from the neural network.
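
The script create_downscaled.py is not shown in this notebook, so here is only a sketch of the core idea under my own assumptions: predictions arrive as ('row,col', label) pairs, and the image is a plain 2D list rather than a GDAL raster. The function name `apply_predictions` and the sample values are hypothetical.

```python
def apply_predictions(grid, predictions):
    """Overwrite grid[row][col] with the predicted class for each 'row,col' key."""
    for key, label in predictions:
        row, col = (int(part) for part in key.split(','))
        grid[row][col] = int(label)   # labels are landcover codes stored as strings
    return grid

# upsampled low-resolution labels (made-up values)
coarse = [[1, 1, 1],
          [1, 1, 1]]
predictions = [('0,0', '1'), ('0,2', '9'), ('1,1', '10')]

# work on a copy so the original labels stay available for comparison
downscaled = apply_predictions([list(r) for r in coarse], predictions)
```

Pixels without a prediction keep their original (blocky) value, which makes it easy to spot any gaps in the prediction output.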

In [4]:
!python create_downscaled.py

gs://cloud-ml/sdk/cloudml-0.1.6-alpha.dataflow.tar.gz
in-memory array created ...
running sdist
running egg_info
writing requirements to landcover.egg-info/requires.txt
writing landcover.egg-info/PKG-INFO
writing top-level names to landcover.egg-info/top_level.txt
writing dependency_links to landcover.egg-info/dependency_links.txt
reading manifest file 'landcover.egg-info/SOURCES.txt'
writing manifest file 'landcover.egg-info/SOURCES.txt'

running check


creating landcover-0.0.1
creating landcover-0.0.1/landcover.egg-info
copying files to landcover-0.0.1...
copying setup.py -> landcover-0.0.1
copying landcover.egg-info/PKG-INFO -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/SOURCES.txt -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/dependency_links.txt -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/requires.txt -> landcover-0.0.1/landcover.egg-info
copying landcover.egg-info/top_level.txt -> landcover-0.0.1/landcover.egg-info
Writi

In [None]:
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.