<a href="https://colab.research.google.com/github/aml7hawaiiedu/CCAPLandCoverProject/blob/main/ccap_unet_demo_reimagined.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CCAP U-Net Demo Reimagined

This is Amanda's version of UNET_regression_demo_whp (Aron's original code)
  
This is an Earth Engine <> TenserFlow demonstration notebook. Suppose you want to predict a continuous output (regression) from a stack of continuous inputs. In this example, the output is impervious surface area from [NLCD](https://www.mrlc.gov/data) and the input is a Landsat 8 composite. The model is a [fully convolutional neural network  (FCNN)](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf), specifically [U-net](https://arxiv.org/abs/1505.04597). 

This version is a reimagination of this original work. I found the reliance on Earth engine functions and tensorflow records to be too difficult for debugging. Here we'll simply export geotiffs to google cloud storage and then move along a more traditional image processing route from there.


# Import and set up dependencies

In [1]:
# Google Cloud Platform authentication.
from google.colab import auth
auth.authenticate_user()

In [2]:
# Import Google Earth Engine Python API, authenticate user's credentials, and initialize the Earth Engine library.
import ee
ee.Authenticate()
ee.Initialize()

To authorize access needed by Earth Engine, open the following URL in a web browser and follow the instructions. If the web browser does not start automatically, please manually browse the URL below.

    https://code.earthengine.google.com/client-auth?scopes=https%3A//www.googleapis.com/auth/earthengine%20https%3A//www.googleapis.com/auth/devstorage.full_control&request_id=P6eDclz703OwCT2oqjKblc_GHrscevTU8XlvCZIzOuY&tc=nILN0wBVeM8VwBGMld_UnOfo63724Wua0JD6oO9tgNA&cc=ZqNptp16rsoyRXE631shET8cSvyZOvK_DPKrh-s1258

The authorization workflow will generate a code, which you should paste in the box below.
Enter verification code: 4/1AbUR2VPNqpQu6S_n3ET-jBGeOB6GGGUVyhvSArZLmWchG1KAkJNKAA6dVsk

Successfully saved authorization token.


In [3]:
# Tensorflow is an open-source machine learning library developed by Google. It is wildly used for building deep learning models.
import tensorflow as tf
print(tf.__version__)

2.12.0


In [4]:
# Folium is a Python library used for creating interactive maps and visualizations.
import folium
print(folium.__version__)

0.14.0


# Define variables

In [5]:
# Name of the Google Cloud Storage (GCS) bucket where data will be stored or accessed from.
# It is common to define a bucket which serves as a top-level container. 
BUCKET = 'remote_sensing_fuckit_bucket'
print(BUCKET)

remote_sensing_fuckit_bucket


In [6]:
# Name of the folder that will be used to store data related to the wetland U-Net model.
FOLDER = 'wetland_unet'
TRAINING_BASE = 'training_patches' 
EVAL_BASE = 'eval_patches'

# Name of the lists with imagery (BANDS) and land cover categories(RESPONSE) concatenated.
opticalBands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7']
thermalBands = ['B10', 'B11']
BANDS = opticalBands + thermalBands

RESPONSE = ['p1','p2','p3','p4','p5','p6','p7','p8','p9','p10','p11','p12','p13','p14','p15','p16','p17','p18','p19','p20','p21','p22','p23','p24','p25']
FEATURES = BANDS + RESPONSE

In [7]:
# Specify the inputs
KERNEL_SIZE = 128
COLUMNS = [tf.io.FixedLenFeature(shape=[KERNEL_SIZE, KERNEL_SIZE], dtype=tf.float32) for _ in FEATURES]
FEATURES_DICT = {feature: column for feature, column in zip(FEATURES, COLUMNS)}

# Define the size of the training and evaluation datasets that will be used to train and evaluate the U-Net model. 
TRAIN_SIZE = 1600
EVAL_SIZE = 800

# Specify model training parameters.
BATCH_SIZE = 16
EPOCHS = 10
BUFFER_SIZE = 2000
OPTIMIZER = 'adam'
LOSS = 'CategoricalCrossentropy'
METRICS = ['acc']

# Create an Earth Engine feature collection object containing Hawaii AOIs with a spatial resolution of 128 meters and size of 3840 meters.
aoiPolys = ee.FeatureCollection('projects/ee-seismosmsr-landcover/assets/hawaii_pos_128_3840m_singleparts') 

In [8]:
import ee
import subprocess
import time

def exportImagesToGCS(image, aoiPolys, bucket_name, file_prefix):
    """Export an Earth Engine image to a GCS bucket as a GeoTIFF for each polygon in a set of polygons.
    
    Args:
        image (ee.Image): The Earth Engine image to export.
        aoiPolys (ee.FeatureCollection): The set of polygons defining the AOI.
        bucket_name (str): The name of the GCS bucket to export the images to.
        file_prefix (str): The prefix to use for the file names of the exported images.
    """
    # Define the GCS URI for the output files.
    output_base_uri = 'gs://{}/{}'.format(bucket_name, file_prefix)
    
    # Cast all bands to Float32 to ensure compatibility.
    image = image.toFloat()
    
    # Loop over the polygons and export one image for each polygon.
    for i, poly in enumerate(aoiPolys.getInfo()['features']):
        poly_name = poly['properties'].get('name', str(i))
        output_uri = '{}/{}.tif'.format(output_base_uri, poly_name)
        
        # Export the image to the GCS bucket as a GeoTIFF.
        task = ee.batch.Export.image.toCloudStorage(
            image=image.clip(poly['geometry']),
            description='Export {} to GCS'.format(poly_name),
            bucket=bucket_name,
            fileNamePrefix='{}/{}'.format(file_prefix, poly_name),
            scale=30,
            maxPixels=1e13,
            fileFormat='GeoTIFF',
        )
        task.start()
        
        # # Print the status while waiting for the export to complete.
        # while task.status()['state'] in ['READY', 'RUNNING']:
        #     print('Exporting {} to GCS: {}...'.format(poly_name, task.status()['description']), end='\r')
        #     time.sleep(30)
        
        # # Check if the export succeeded.
        # if task.status()['state'] == 'COMPLETED':
        #     print('Export {} to GCS succeeded.'.format(poly_name))
        # else:
        #     print('Export {} to GCS failed.'.format(poly_name))
            
        # Set the GCS object's ACL to be publicly accessible.
        # gsutil_path = !which gsutil
        # acl_cmd = '{} acl ch -u AllUsers:R {}'.format(gsutil_path[0], output_uri)
        # subprocess.check_output(acl_cmd.split())

In [16]:
# Identify the imagery data collection from EE. 
l8sr = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR') 
"""Landsat 8 surface reflectance"""

# Define cloud masking to apply to images.
def maskL8sr(image):
  cloudShadowBitMask = ee.Number(2).pow(3).int()
  cloudsBitMask = ee.Number(2).pow(5).int()
  qa = image.select('pixel_qa')
  mask1 = qa.bitwiseAnd(cloudShadowBitMask).eq(0).And(
    qa.bitwiseAnd(cloudsBitMask).eq(0))
  mask2 = image.mask().reduce('min')
  mask3 = image.select(opticalBands).gt(0).And(
          image.select(opticalBands).lt(10000)).reduce('min')
  mask = mask1.And(mask2).And(mask3)
  return image.select(opticalBands).divide(10000).addBands(
          image.select(thermalBands).divide(10).clamp(273.15, 373.15)
            .subtract(273.15).divide(100)).updateMask(mask)

# Filter the imagery data collection from EE. 
l8sr_2017 = l8sr.filterDate('2017-01-01', '2017-12-31').map(maskL8sr).median().clip(aoiPolys.geometry()).toFloat() # updated name to match convention

# Replace NaN values with 0.
nan_mask = l8sr_2017.mask(l8sr_2017)
image_with_zero  = l8sr_2017.updateMask(nan_mask).unmask(0)

# Create single mosaic
# mosaic =  l8sr_2017.addBands(l8sr_2018)
# l8sr_3yr_mosaic = mosaic.addBands(l8sr_2019)

# Reset band names to new stack
# BANDS = ls_3yr_mosaic.bandNames().getInfo()
# FEATURES = BANDS + [RESPONSE]

# Use 'getMapID()' method to get a map ID for the 'l8sr_2017' image. 
# Specify that we want to visualize band 2 of the image using 'bands':['B2']. Set the min and max.
# Create a folium map object centered at specific location using 'folium.Map()'. 
# Add a tile layer to the map using 'folium.TileLayer()' method, specifying the URL format of the tiles using the 'tiles' argument and adding 'attr' attribution info. 
# Set the overlay flag to 'True' and provide a name for the layer using 'name'. 
# Finally, add the tile layer to the map using '.add_to()' method. 
  # This allows us to visualize the 'l8sr_2017' image in the folium map. 

mapid_17 =l8sr_2017.getMapId({'bands': ['B4','B3','B2'], 'min': 0, 'max':.2})
map = folium.Map(location=[21.4, -158.])
folium.TileLayer(
    tiles=mapid_17['tile_fetcher'].url_format,
    attr = 'Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
    overlay=True,
    name = '17 median composite',
    ).add_to(map)

map

In [33]:
# aoiPolys.getInfo()['features'][0]

In [31]:
# One Hot Encoding function.
def oneHot(image,code = 1):
  one_hot_image = image.eq(code)
  return one_hot_image
  
print(oneHot)

<function oneHot at 0x7fbdfbe7fb50>


In [32]:
# Identify the imagery data collection from EE. 
wetland = ee.ImageCollection("projects/sat-io/open-datasets/HRLC/CCAP_HI_LC")
"""NOAA C-CAP Hawaii Land Cover"""

# Filter imagery data collection from EE.
wetland = wetland.filterDate('2010-01-01', '2022-01-01').mosaic().clip(aoiPolys.geometry())

# Mask out pixels /// ensures model doesn't predict or train on data outside the land cover area.
wetland = wetland.mask(wetland.gt(0))


# Convert a continuous wetland raster into a collection of one-hot-encoded rasters.
"""Represent categorical data using binary values. For example, suppose you have a 
raster representing different land cover types, such as forest, grassland, and water. 
A one-hot-encoded version of this raster would have three bands, one for each land 
cover type, and each pixel would have a value of 1 in the band corresponding to its 
land cover type, and 0 in all other bands."""
wetland_oneHot = []
for code in range(len(RESPONSE)):
  one_hot_image = oneHot(wetland,code)
  wetland_oneHot.append(one_hot_image)
wetland_oneHot = ee.Image(wetland_oneHot)
wetland_oneHot = wetland_oneHot.rename(RESPONSE) # list of class codes for wetlands types
wetland_oneHot = wetland_oneHot.toInt()


mapid = wetland_oneHot.getMapId({'bands': ['p10','p15','p11'],'min': 0, 'max': 1})
map = folium.Map(location=[21.4, -158])
folium.TileLayer(
    tiles=mapid['tile_fetcher'].url_format,
    attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
    overlay=True,
    name='NOAA CCAP Land Cover',
  ).add_to(map)
map.add_child(folium.LayerControl())
map

In [None]:
# exportImagesToGCS(ls_2017,aoiPolys, BUCKET,'hawaii_2017_ls' )

In [None]:
# exportImagesToGCS(wetland,aoiPolys, BUCKET,'hawaii_2017_wetlands' )

In [None]:
# exportImagesToGCS(wetland_oneHot,aoiPolys, BUCKET,'hawaii_2017_wetlands_oneHot' )

Stack the 2D images (Landsat  and NOAA C-CAP land cover data for Hawaii) to create a single image from which samples can be taken. 

Convert the image into an array image in which each pixel stores 128x128 patches of pixels for each band. 

This is a key step that bears emphasis: to export training patches, convert a multi-band image to an array image using neighborhoodToArray(), then sample the image at points.

In [49]:
import numpy as np
import os

In [50]:
!pip install rasterio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rasterio
  Downloading rasterio-1.3.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m20.0/20.0 MB[0m [31m74.6 MB/s[0m eta [36m0:00:00[0m
Collecting snuggs>=1.4.1
  Downloading snuggs-1.4.7-py3-none-any.whl (5.4 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Collecting affine
  Downloading affine-2.4.0-py3-none-any.whl (15 kB)
Collecting click-plugins
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Installing collected packages: snuggs, cligj, click-plugins, affine, rasterio
Successfully installed affine-2.4.0 click-plugins-1.1.1 cligj-0.7.2 rasterio-1.3.6 snuggs-1.4.7


In [51]:
for k in [0,1,2,3,4,5,6,7]:
  import rasterio
  from rasterio import warp
  from skimage.util.shape import view_as_blocks
  # Define the GCP bucket and filename
  BUCKET_NAME = 'remote_sensing_fuckit_bucket'
  BANDS_PATH = 'hawaii_2017_ls/'+str(k)+'.tif'
  CLASS_PATH = 'hawaii_2017_wetlands/'+str(k)+'.tif'

  # Set the block shape (i.e., the size of the image chips)
  BLOCK_SHAPE = (128, 128)

  # Open the raster images using rasterio and Google Cloud Storage
  with rasterio.Env(GS_BUCKET=BUCKET_NAME):
      with rasterio.open(f'/vsigs/{BUCKET_NAME}/{BANDS_PATH}') as dataset_bands, rasterio.open(f'/vsigs/{BUCKET_NAME}/{CLASS_PATH}') as dataset_class:
    # Read the image data into numpy arrays
          # Read the image data into numpy arrays
          bands = dataset_bands.read().astype(np.float32)
          classification = dataset_class.read()


          print(bands.shape)
          print(classification.shape)

          num_bands, height, width  = bands.shape

          clipped_shape_bands = (num_bands,
                                BLOCK_SHAPE[0] * int(np.floor(height / BLOCK_SHAPE[0])),
                                BLOCK_SHAPE[1] * int(np.floor(width / BLOCK_SHAPE[1])))
          
          bands = bands[:clipped_shape_bands[0], :clipped_shape_bands[1],:clipped_shape_bands[2]]


          num_class, height, width  = classification.shape

          clipped_shape_class = (num_class,
                                BLOCK_SHAPE[0] * int(np.floor(height / BLOCK_SHAPE[0])),
                                BLOCK_SHAPE[1] * int(np.floor(width / BLOCK_SHAPE[1])))

          
          classification = classification[:clipped_shape_class[0], :clipped_shape_class[1],:clipped_shape_class[2]]
          print(clipped_shape_class)
          print(clipped_shape_bands)
          # print(bands.shape)
          # print(classification.shape)

          # Use view_as_blocks to extract the image chips
          ys, xs = np.indices(bands[..., 0].shape, dtype=np.float32)

          # bands_chips = view_as_blocks(bands, (*clipped_shape_bands[0:1], num_bands)).reshape(-1, *clipped_shape_bands[0:1], num_bands)

          bands_chips = view_as_blocks(bands, (num_bands,*BLOCK_SHAPE)).reshape(-1, *BLOCK_SHAPE,num_bands)

          class_chips = view_as_blocks(classification, (1,*BLOCK_SHAPE)).reshape(-1, *BLOCK_SHAPE)


          # Normalize the image chips so that pixel values are between 0 and 1
          # bands_chips /= 255.0

          # (Optional) Resize the image chips to a specific size
          # bands_chips = tf.image.resize(bands_chips, (224, 224))

          # Convert the classification chips to one-hot encoded format
          class_chips = tf.one_hot(class_chips, depth=25)

  # Print the shape of the image chips arrays
  print('Bands chips shape:', bands_chips.shape)
  print('Class chips shape:', class_chips.shape)
  # Set the proportion of non-zero pixels required to keep a chip
  min_prop = 0.01


  # print()
  keep_index = [i for i, chip in enumerate(bands_chips) if np.sum(np.isnan(chip))/np.size(chip) < min_prop]

  subset_bands_chips = bands_chips[keep_index,:,:,:]
  subset_bands_chips = np.nan_to_num(subset_bands_chips,0)
  print(subset_bands_chips.shape)

  subset_class_chips = class_chips.numpy()[keep_index,:,:,:]
  subset_class_chips = np.nan_to_num(subset_class_chips,0)
  print(subset_class_chips.shape)
  

  import tensorflow as tf

  def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

  def _int64_feature(value):
      """Returns an int64_list from a bool / enum / int / uint."""
      return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

  def serialize_example(image, label):
      feature = {
          'image_shape': _int64_feature(image.shape),
          'label_shape': _int64_feature(label.shape),
          'image': _bytes_feature(tf.io.serialize_tensor(image)),
          'label': _bytes_feature(tf.io.serialize_tensor(label)),
      }
      example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
      return example_proto.SerializeToString()

  with tf.io.TFRecordWriter('dataset.tfrecord') as writer:
    for i in range(subset_bands_chips.shape[0]):
        image = subset_bands_chips[i]
        label = subset_class_chips[i]
        example = serialize_example(image, label)
        writer.write(example)

  from google.cloud import storage

  # Your GCS bucket name
  bucket_name = BUCKET
  # The destination path for the TFRecord file in the bucket
  destination_blob_name = 'hawaii_2017_wetlands_tfrecords/training/'+str(k)+'.tfrecord'

  # Create a GCS client
  storage_client = storage.Client()
  bucket = storage_client.get_bucket(bucket_name)

  # Write the dataset to a temporary local file
  local_file_path = '/content/dataset.tfrecord'
  with tf.io.TFRecordWriter(local_file_path) as writer:
      for i in range(subset_bands_chips.shape[0]):
          image = subset_bands_chips[i]
          label = subset_class_chips[i]
          example = serialize_example(image, label)
          writer.write(example)

  # Upload the local file to the GCS bucket
  blob = bucket.blob(destination_blob_name)
  blob.upload_from_filename(local_file_path)

  # Remove the temporary local file
  os.remove(local_file_path)

(9, 5283, 4910)
(1, 5283, 4910)
(1, 5248, 4864)
(9, 5248, 4864)
Bands chips shape: (1558, 128, 128, 9)
Class chips shape: (1558, 128, 128, 25)
(761, 128, 128, 9)
(761, 128, 128, 25)
(9, 1926, 2926)
(1, 1926, 2926)
(1, 1920, 2816)
(9, 1920, 2816)
Bands chips shape: (330, 128, 128, 9)
Class chips shape: (330, 128, 128, 25)
(159, 128, 128, 9)
(159, 128, 128, 25)
(9, 630, 877)
(1, 630, 877)
(1, 512, 768)
(9, 512, 768)
Bands chips shape: (24, 128, 128, 9)
Class chips shape: (24, 128, 128, 25)
(13, 128, 128, 9)
(13, 128, 128, 25)
(9, 912, 2508)
(1, 912, 2508)
(1, 896, 2432)
(9, 896, 2432)
Bands chips shape: (133, 128, 128, 9)
Class chips shape: (133, 128, 128, 25)
(66, 128, 128, 9)
(66, 128, 128, 25)
(9, 980, 1207)
(1, 980, 1207)
(1, 896, 1152)
(9, 896, 1152)
Bands chips shape: (63, 128, 128, 9)
Class chips shape: (63, 128, 128, 25)
(32, 128, 128, 9)
(32, 128, 128, 25)
(9, 1946, 2598)
(1, 1946, 2598)
(1, 1920, 2560)
(9, 1920, 2560)
Bands chips shape: (300, 128, 128, 9)
Class chips shape: (30

In [54]:
  # Set the proportion of non-zero pixels required to keep a chip
  min_prop = 0.01


  # print()
  keep_index = [i for i, chip in enumerate(bands_chips) if np.sum(np.isnan(chip))/np.size(chip) < min_prop]

  subset_bands_chips = bands_chips[keep_index,:,:,:]
  subset_bands_chips = np.nan_to_num(subset_bands_chips,0)
  print(subset_bands_chips.shape)

  subset_class_chips = class_chips.numpy()[keep_index,:,:,:]
  subset_class_chips = np.nan_to_num(subset_class_chips,0)
 
 # print(subset_class_chips.shape)


In [55]:
import tensorflow as tf

def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def serialize_example(image, label):
    feature = {
        'image': _bytes_feature(tf.io.serialize_tensor(image)),
        'label': _bytes_feature(tf.io.serialize_tensor(label)),
    }
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()

with tf.io.TFRecordWriter('dataset.tfrecord') as writer:
    for i in range(subset_bands_chips.shape[0]):
        image = subset_bands_chips[i]
        label = subset_class_chips[i]
        example = serialize_example(image, label)
        writer.write(example)


In [56]:
BUCKET

'remote_sensing_fuckit_bucket'

In [None]:
# from google.cloud import storage

# # Your GCS bucket name
# bucket_name = BUCKET
# # The destination path for the TFRecord file in the bucket
# destination_blob_name = 'hawaii_2017_wetlands_tfrecords/training/1.tfrecord'

# # Create a GCS client
# storage_client = storage.Client()
# bucket = storage_client.get_bucket(bucket_name)

# # Write the dataset to a temporary local file
# local_file_path = '/content/dataset.tfrecord'
# with tf.io.TFRecordWriter(local_file_path) as writer:
#     for i in range(subset_bands_chips.shape[0]):
#         image = subset_bands_chips[i]
#         label = subset_class_chips[i]
#         example = serialize_example(image, label)
#         writer.write(example)

# # Upload the local file to the GCS bucket
# blob = bucket.blob(destination_blob_name)
# blob.upload_from_filename(local_file_path)

# # Remove the temporary local file
# os.remove(local_file_path)

In [None]:
# import numpy as np

# # Your GCS bucket name
# bucket_name = BUCKET

# # Split the data into training and validation sets
# train_ratio = 0.8
# total_samples = subset_bands_chips.shape[0]
# train_samples = int(total_samples * train_ratio)

# # Shuffle the indices before splitting
# indices = np.arange(total_samples)
# np.random.shuffle(indices)

# train_indices = indices[:train_samples]
# val_indices = indices[train_samples:]

# # Split the numpy arrays using the indices
# train_bands_chips = subset_bands_chips[train_indices]
# train_class_chips = subset_class_chips[train_indices]
# val_bands_chips = subset_bands_chips[val_indices]
# val_class_chips = subset_class_chips[val_indices]

# # Helper function to write the data to a GCS bucket
# def write_data_to_gcs_bucket(data, labels, local_file_path, bucket_name, destination_blob_name):
#     with tf.io.TFRecordWriter(local_file_path) as writer:
#         for i in range(data.shape[0]):
#             image = data[i]
#             label = labels[i]
#             example = serialize_example(image, label)
#             writer.write(example)

#     storage_client = storage.Client()
#     bucket = storage_client.get_bucket(bucket_name)
#     blob = bucket.blob(destination_blob_name)
#     blob.upload_from_filename(local_file_path)
#     os.remove(local_file_path)

# # Write the training data to the GCS bucket
# train_local_file_path = 'temp_train_dataset.tfrecord'
# train_destination_blob_name = 'hawaii_2017_wetlands_tfrecords/training/1.tfrecord'
# write_data_to_gcs_bucket(train_bands_chips, train_class_chips, train_local_file_path, bucket_name, train_destination_blob_name)

# # Write the validation data to the GCS bucket
# val_local_file_path = 'temp_val_dataset.tfrecord'
# val_destination_blob_name = 'hawaii_2017_wetlands_tfrecords/validation/1.tfrecord'
# write_data_to_gcs_bucket(val_bands_chips, val_class_chips, val_local_file_path, bucket_name, val_destination_blob_name)


Use some pre-made geometries to sample the stack in strategic locations. 

Specifically, these are hand-made polygons in which to take the 128x128 samples. 

Display the sampling polygons on a map, red for training polygons, blue for evaluation.

In [58]:
FEATURES_DICT = {
    'image_shape': tf.io.FixedLenFeature([3], tf.int64),
    'label_shape': tf.io.FixedLenFeature([3], tf.int64),
    'image': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.string),
}
print(FEATURES_DICT)

{'image_shape': FixedLenFeature(shape=[3], dtype=tf.int64, default_value=None), 'label_shape': FixedLenFeature(shape=[3], dtype=tf.int64, default_value=None), 'image': FixedLenFeature(shape=[], dtype=tf.string, default_value=None), 'label': FixedLenFeature(shape=[], dtype=tf.string, default_value=None)}


In [85]:
def parse_example(inputs):
    example_proto = inputs[0]
    """The parsing function.
    Read a serialized example into the structure defined by FEATURES_DICT.
    Args:
      example_proto: a serialized Example.
    Returns:
      A dictionary of tensors, keyed by feature name.
    """
    parsed_features = tf.io.parse_single_example(example_proto)
    image = tf.io.parse_tensor(parsed_features['image'], out_type=tf.float32)
    label = tf.io.parse_tensor(parsed_features['label'], out_type=tf.float32)
    return image, label



def to_tuple(inputs):
  """Function to convert a dictionary of tensors to a tuple of (inputs, outputs).
  Turn the tensors returned by parse_tfrecord into a stack in HWC shape.
  Args:
    inputs: A dictionary of tensors, keyed by feature name.
  Returns:
    A tuple of (inputs, outputs).
  """
  inputsList = [inputs.get(key) for key in FEATURES]
  stacked = tf.stack(inputsList, axis=0)
  # Convert from CHW to HWC
  stacked = tf.transpose(stacked, [1, 2, 0])
  return stacked[:,:,:len(BANDS)], stacked[:,:,len(BANDS):]


def get_dataset(pattern):
  """Function to read, parse and format to tuple a set of input tfrecord files.
  Get all the files matching the pattern, parse and convert to tuple.
  Args:
    pattern: A file pattern to match in a Cloud Storage bucket.
  Returns:
    A tf.data.Dataset
  """
  glob = tf.io.gfile.glob(pattern)
  dataset = tf.data.TFRecordDataset(glob, compression_type='GZIP')
  dataset = dataset.map(lambda x: parse_example((x,)), num_parallel_calls=5)
  return dataset


print(get_dataset)
# print(parse_tfrecord)




<function get_dataset at 0x7fbdeddde710>


In [68]:
# def get_training_dataset():
# 		glob = 'remote_sensing_fuckit_bucket/hawaii_2017_wetlands_tfrecords/training/*.tfrecord'
# 		dataset = get_dataset(glob)
# 		dataset = dataset.map(parse_example, num_parallel_calls=5)
# 		dataset = dataset.map(to_tuple, num_parallel_calls=5)

# 		dataset = dataset.shuffle.repeat(100)
# 		return dataset

# training = get_training_dataset()


In [86]:
def get_training_dataset(batch_size):
  glob = 'remote_sensing_fuckit_bucket/hawaii_2017_wetlands_tfrecords/training/*.tfrecord'
  dataset = get_dataset(glob)
  dataset = dataset.map(parse_example, num_parallel_calls=5)
  dataset = dataset.map(to_tuple, num_parallel_calls=5)
  dataset = dataset.shuffle(10000).batch(batch_size).repeat()
  return dataset

In [91]:
# Return a Tenserflow dataset contaiing the preprocessed evaluation data. 
# File path to the evaluation data stored in GCS bucket.
# Calls 'get_dataset' function to create a TenserFlow dataset containing the evaluation data. 
# Evaluation dataset is batched with a batchsize of 1, meaning each element of the dataset consists of a single example. 
# The function returns the resulting evaluation dataset.
def get_eval_dataset():
    glob = 'remote_sensing_fuckit_bucket/hawaii_2017_wetlands_tfrecords/training/*'
    dataset = get_dataset(glob)
    dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True).repeat()
    return dataset


evaluation = get_eval_dataset()

TypeError: ignored

In [None]:
evaluation

<_RepeatDataset element_spec=(TensorSpec(shape=(20, 128, 128, 9), dtype=tf.float32, name=None), TensorSpec(shape=(20, 128, 128, 25), dtype=tf.float32, name=None))>

In [None]:
print(tf.__version__)

2.12.0


In [None]:
# import TenserFlow classes and functions 
from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import models
from tensorflow.keras import metrics
from tensorflow.keras import optimizers
from tensorflow.keras.layers import BatchNormalization

# from tensorflow.python.keras.layers.normalization import BatchNormalization
# from tensorflow.python.keras.layers.normalization import BatchNormalization

# from tensorflow.keras.layers import BatchNormalization
# from keras.layers.normalization.batch_normalization import BatchNormalization
# from tensorflow.python.keras.layers import BatchNormalization 

# U-Net model for image segmentation. 
# Encoder and decoder conncted by a center block. 
# Encoder downsamples the input image while capturing its features. 
# Decoder upsamples the encoded image to generate a segmentation map.
def conv_block(input_tensor, num_filters):
	encoder = layers.Conv2D(num_filters, (3, 3), padding='same')(input_tensor)
	encoder = layers.BatchNormalization()(encoder)
	encoder = layers.Activation('relu')(encoder)
	encoder = layers.Conv2D(num_filters, (3, 3), padding='same')(encoder)
	encoder = layers.BatchNormalization()(encoder)
	encoder = layers.Activation('relu')(encoder)
	return encoder

def encoder_block(input_tensor, num_filters):
	encoder = conv_block(input_tensor, num_filters)
	encoder_pool = layers.MaxPooling2D((2, 2), strides=(2, 2))(encoder)
	return encoder_pool, encoder

def decoder_block(input_tensor, concat_tensor, num_filters):
	decoder = layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor)
	decoder = layers.concatenate([concat_tensor, decoder], axis=-1)
	decoder = layers.BatchNormalization()(decoder)
	decoder = layers.Activation('relu')(decoder)
	decoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)
	decoder = layers.BatchNormalization()(decoder)
	decoder = layers.Activation('relu')(decoder)
	decoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)
	decoder = layers.BatchNormalization()(decoder)
	decoder = layers.Activation('relu')(decoder)
	return decoder

def get_model():
	inputs = layers.Input(shape=[KERNEL_SIZE, KERNEL_SIZE, len(BANDS)]) # 256
	encoder0_pool, encoder0 = encoder_block(inputs, 32) # 128
	encoder1_pool, encoder1 = encoder_block(encoder0_pool, 64) # 64
	encoder2_pool, encoder2 = encoder_block(encoder1_pool, 128) # 32
	encoder3_pool, encoder3 = encoder_block(encoder2_pool, 256) # 16
	encoder4_pool, encoder4 = encoder_block(encoder3_pool, 512) # 8
	center = conv_block(encoder4_pool, 1024) # center
	decoder4 = decoder_block(center, encoder4, 512) # 16
	decoder3 = decoder_block(decoder4, encoder3, 256) # 32
	decoder2 = decoder_block(decoder3, encoder2, 128) # 64
	decoder1 = decoder_block(decoder2, encoder1, 64) # 128
	decoder0 = decoder_block(decoder1, encoder0, 32) # 256
	outputs = layers.Conv2D(len(RESPONSE), (1, 1), activation='softmax')(decoder0)
	
	model = models.Model(inputs=[inputs], outputs=[outputs])

	model.compile(
		# optimizer=optimizers.get(OPTIMIZER),
		# optimizer=tf.keras.losses.SparseCategoricalCrossentropy(),  
		loss=losses.get(LOSS),
		optimizer= tf.keras.optimizers.Adam(learning_rate=1e-4),
		# loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
		# run_eagerly=True,
		metrics=[metrics.get(metric) for metric in METRICS])

	return model

In [None]:
LOSS = 'CategoricalCrossentropy'

In [None]:
print(LOSS)

CategoricalCrossentropy


In [None]:
m = get_model()
# print(m.summary())

In [None]:
m.fit(
    training,
    epochs=EPOCHS,
    steps_per_epoch=20,
    validation_data=evaluation,
    validation_steps=20
)

Epoch 1/10


ValueError: ignored

In [None]:
dataset = tf.data.TFRecordDataset('/content/2.tfrecord')

In [None]:
m = get_model()
# tf.keras.utils.plot_model(m)
# Load a trained model. 50 epochs. 25 hours. Final RMSE ~0.08.
# tf.config.run_functions_eagerly(True)
MODEL_DIR = 'gs://' + BUCKET +'/'+ FOLDER +  '/'
# Enable eager execution
# tf.config.run_functions_eagerly(True)
# m = tf.keras.models.load_model(MODEL_DIR)
# EPOCHS
#int(TRAIN_SIZE / BATCH_SIZE)
m.fit(
    x= training , 
    epochs=10, 
    steps_per_epoch=10,
    # batch_size=70,
    validation_data=evaluation)

m.save('gs://' + BUCKET +'/'+ FOLDER +  '/')
print(m)

Epoch 1/10




ValueError: ignored

In [None]:
from google.cloud import storage

def download_tfrecord(bucket_name, source_blob_name, destination_file_name):
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(f"File {source_blob_name} downloaded to {destination_file_name}.")


In [None]:
bucket_name = "remote_sensing_fuckit_bucket"
tfrecord_path = "training/2.tfrecord"
local_tfrecord_path = "local.tfrecord"
# download_tfrecord(bucket_name, tfrecord_path, local_tfrecord_path)


In [None]:
def get_single_sample(tfrecord_path):
    def _parse_function(example_proto):
        feature_description = {
            'image': tf.io.FixedLenFeature([], tf.string),
            'label': tf.io.FixedLenFeature([], tf.string),
        }
        parsed_features = tf.io.parse_single_example(example_proto, feature_description)
        image = tf.io.parse_tensor(parsed_features['image'], out_type=tf.float32)
        label = tf.io.parse_tensor(parsed_features['label'], out_type=tf.float32)
        return image, label

    dataset = tf.data.TFRecordDataset(tfrecord_path)
    dataset = dataset.map(_parse_function)

    # Get the first sample
    image, label = next(iter(dataset.take(1)))

    return image, label

# Usage example
local_tfrecord_path = '/content/2.tfrecord'
image, label = get_single_sample(local_tfrecord_path)
# print(image)
print("Image shape:", image.shape)
print("Label shape:", label.shape)

Image shape: (128, 128, 9)
Label shape: (128, 128, 25)


In [None]:
def doExport(out_image_base, shape, region):
  """Run the image export task.  Block until complete.
  """
  task = ee.batch.Export.image.toDrive(
    image = ls_2017.select(BANDS),
    description = out_image_base,
    fileNamePrefix = out_image_base,
    folder = FOLDER,
    region = region.getInfo()['coordinates'],
    scale = 30,
    fileFormat = 'TFRecord',
    maxPixels = 1e10,
    formatOptions = {
      'patchDimensions': shape,
      'compressed': True,
      'maxFileSize': 104857600
    }
  )
  task.start()

  # Block until the task completes.
  print('Running image export to Google Drive...')
  import time
  while task.active():
    time.sleep(30)

  # Error condition
  if task.status()['state'] != 'COMPLETED':
    print('Error with image export.')
  else:
    print('Image export completed.')

In [None]:
def doPrediction(out_image_base, kernel_shape, region):
  """Perform inference on exported imagery.
  """

  print('Looking for TFRecord files...')

  # Get a list of all the files in the output bucket.
  filesList = os.listdir(op.join(ROOT_DIR, FOLDER))

  # Get only the files generated by the image export.
  exportFilesList = [s for s in filesList if out_image_base in s]

  # Get the list of image files and the JSON mixer file.
  imageFilesList = []
  jsonFile = None
  for f in exportFilesList:
    if f.endswith('.tfrecord.gz'):
      imageFilesList.append(op.join(ROOT_DIR, FOLDER, f))
    elif f.endswith('.json'):
      jsonFile = f

  # Make sure the files are in the right order.
  imageFilesList.sort()

  from pprint import pprint
  pprint(imageFilesList)
  print(jsonFile)

  import json
  # Load the contents of the mixer file to a JSON object.
  with open(op.join(ROOT_DIR, FOLDER, jsonFile), 'r') as f:
    mixer = json.load(f)

  pprint(mixer)
  patches = mixer['totalPatches']

  # Get set up for prediction.

  imageColumns = [
    tf.io.FixedLenFeature(shape=kernel_shape, dtype=tf.float32) 
      for k in BANDS
  ]

  imageFeaturesDict = dict(zip(BANDS, imageColumns))

  def parse_image(example_proto):
    return tf.io.parse_single_example(example_proto, imageFeaturesDict)

  def toTupleImage(inputs):
    inputsList = [inputs.get(key) for key in BANDS]
    stacked = tf.stack(inputsList, axis=0)
    stacked = tf.transpose(stacked, [1, 2, 0])
    return stacked

   # Create a dataset from the TFRecord file(s) in Cloud Storage.
  imageDataset = tf.data.TFRecordDataset(imageFilesList, compression_type='GZIP')
  imageDataset = imageDataset.map(parse_image, num_parallel_calls=5)
  imageDataset = imageDataset.map(toTupleImage).batch(1)

  # Perform inference.
  print('Running predictions...')
  predictions = model.predict(imageDataset, steps=patches, verbose=1)
  # print(predictions[0])

  print('Writing predictions...')
  out_image_file = op.join(ROOT_DIR, FOLDER, f'{out_image_base}pred.TFRecord')
  writer = tf.io.TFRecordWriter(out_image_file)
  patches = 0
  for predictionPatch in predictions:
    print('Writing patch ' + str(patches) + '...')
    predictionPatch = tf.argmax(predictionPatch, axis=2)

    # Create an example.
    example = tf.train.Example(
      features=tf.train.Features(
        feature={
          'class': tf.train.Feature(
              float_list=tf.train.FloatList(
                  value=predictionPatch.numpy().flatten()))
        }
      )
    )
    # Write the example.
    writer.write(example.SerializeToString())
    patches += 1

  writer.close()

In [None]:
# Output assets folder: YOUR FOLDER
user_folder = 'users/seismosmsr/UNET_regression' # INSERT YOUR FOLDER HERE.

# Base file name to use for TFRecord files and assets.
image_base = 'FCNN_demo_Oahu'
# Half this will extend on the sides of each patch.
KERNEL_SHAPE = [128, 128]
# Beijing
#-123.3152617122404422,37.7933818681806812 : -121.5833326464016722,38.6741999222186763
OahuDistricts = ee.FeatureCollection('projects/ee-seismosmsr-landcover/assets/Neighborhood_Board_Subdistricts')
query_str = f'SD_DESC == "Makaha"'
makaha = OahuDistricts.filter(query_str).geometry() 
# makaha.getInfo()

In [None]:
doExport(image_base, KERNEL_SHAPE,makaha)

In [None]:
# Mount our Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os as os
from os import path as op
model = m
ROOT_DIR = '/content/drive/My Drive/'
print(image_base)
doPrediction(image_base, KERNEL_SHAPE, makaha)

In [None]:
out_image = ee.Image('projects/ee-seismosmsr-landcover/assets/FCNN_demo_Oahu-mixer')
mapid = out_image.getMapId({'min': 0, 'max': 25, 'palette': ['63C600','E6E600','E9BD3A','ECB176','00A600','63C600','E6E600','E9BD3A','ECB176','00A600','63C600','E6E600','E9BD3A','ECB176','00A600','63C600','E6E600','E9BD3A','ECB176','00A600','63C600','E6E600','E9BD3A','ECB176','00A600','63C600','E6E600','E9BD3A','ECB176']})
map = folium.Map(location=[21.4, -158])
# map = folium.Map(location=[              
#               -29.177943749121233,
#               30.55984497070313,
# ])
folium.TileLayer(
    tiles=mapid['tile_fetcher'].url_format,
    attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
    overlay=True,
    name='predicted crop type',
  ).add_to(map)
map.add_child(folium.LayerControl())
map