# Upload mangrove extent to GCS and as an asset to GEE

This notebook gets mangrove extent data, uploads it to GCS, and creates an Earth Engine image collection asset. Change in extent between years is calculated and exported to image collections. The linear distance from mangrove habitat is exported to an image collection.

## Setup

In [1]:
import iso8601
import pystac
import python_jsonschema_objects
import rasterio
import rio_cogeo
import shapely
import zenodo_get
import tqdm
import uuid
from google.cloud import storage
from osgeo import gdal
%run utils.ipynb

In [2]:
# Check shapely speedups are available
from shapely import speedups
speedups.enabled

True

### Set Cloud credentials (could be done through an env file)

In [4]:
# Set the Google Cloud params
gc_project_id = "mangrove-atlas-246414"
gc_creds = "../../datasets/mangrove-atlas-246414-d7476c5d2381.json"
gc_username = "tamara-huete-vizzuality-com@mangrove-atlas-246414.iam.gserviceaccount.com"
gcs_prefix = "gs://mangrove_atlas"
gcs_http_prefix = "https://storage.googleapis.com/mangrove_atlas"
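A minimal sketch of the env-file alternative mentioned in the heading, assuming the values are exposed as environment variables. `GC_PROJECT_ID` and `GCS_BUCKET` are hypothetical names chosen for illustration; `GOOGLE_APPLICATION_CREDENTIALS` is the variable the Google Cloud client libraries look up by default.

```python
import os

# Hypothetical variable names; the notebook itself hardcodes these values.
ENV_DEFAULTS = {
    "GC_PROJECT_ID": "mangrove-atlas-246414",
    "GCS_BUCKET": "mangrove_atlas",
}

def load_gc_settings(env=None):
    """Read Google Cloud settings from environment variables."""
    env = os.environ if env is None else env
    bucket = env.get("GCS_BUCKET", ENV_DEFAULTS["GCS_BUCKET"])
    return {
        "project_id": env.get("GC_PROJECT_ID", ENV_DEFAULTS["GC_PROJECT_ID"]),
        # Path to the service-account key file; None if the variable is unset.
        "creds_path": env.get("GOOGLE_APPLICATION_CREDENTIALS"),
        "gcs_prefix": f"gs://{bucket}",
        "gcs_http_prefix": f"https://storage.googleapis.com/{bucket}",
    }

settings = load_gc_settings()
print(settings["gcs_prefix"])
```

Keeping the key path out of the notebook this way means the file name of the service-account key does not have to be committed alongside the code.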

### Upload the raw file to mangrove atlas GCS

In [45]:
import os
import subprocess

def copy_gcs(source_list, dest_list, opts=""):
    """
    Use gsutil to copy each corresponding item in source_list
    to dest_list.

    Example:
    copy_gcs(["gs://my-bucket/data-file.csv"], ["."])

    """
    for s, d in zip(source_list, dest_list):
        cmd = f"gsutil -m cp -r {opts} {s} {d}"
        print(f"Processing: {cmd}")
        r = subprocess.call(cmd, shell=True)
        if r == 0:
            print("Task created")
        else:
            print("Task failed")
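A possible variant of the copy helper, sketched under the assumption that `gsutil` is on the `PATH`. It passes the command as an argument list instead of a shell string and returns the sources that failed. `build_gsutil_cp` and `copy_gcs_checked` are illustrative names, not part of `utils.ipynb`.

```python
import shlex
import subprocess

def build_gsutil_cp(source, dest, opts=""):
    """Return the gsutil copy command as an argument list (no shell needed)."""
    return ["gsutil", "-m", "cp", "-r", *shlex.split(opts), source, dest]

def copy_gcs_checked(source_list, dest_list, opts="", runner=subprocess.run):
    """Copy each source to its destination and return the sources that failed."""
    if len(source_list) != len(dest_list):
        raise ValueError("source_list and dest_list must have the same length")
    failed = []
    for s, d in zip(source_list, dest_list):
        # runner defaults to subprocess.run; it is a parameter so it can be stubbed.
        result = runner(build_gsutil_cp(s, d, opts))
        if result.returncode != 0:
            failed.append(s)
    return failed
```

Calling `copy_gcs_checked(["gs://my-bucket/data-file.csv"], ["."])` would run the same copy as the docstring example above. The length check is there because `zip` silently drops unmatched items.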

In [33]:
from google.cloud import storage

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # bucket_name = "your-bucket-name"
    # source_file_name = "local/path/to/file"
    # destination_blob_name = "storage-object-name"

    storage_client = storage.Client.from_service_account_json('../../datasets/service_account.json')
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )

In [35]:
upload_blob('mangrove_atlas', '../../datasets/gmw_v3_fnl_mjr_v312_gtiff.tar.gz', 'land-cover/gmw_v3_fnl_mjr_v312_gtiff.tar.gz')

File ../../datasets/gmw_v3_fnl_mjr_v312_gtiff.tar.gz uploaded to land-cover/gmw_v3_fnl_mjr_v312_gtiff.tar.gz.


### Convert separate original tiles to one single file

In [39]:
# Set working directories
path_in = "../../datasets/gmw_v3_fnl_mjr_v312_gtiff"
path_out = "../../datasets/gmw_v3_extent"
os.makedirs(path_out, exist_ok=True)

In [41]:
%%time
for y in [1996, 2007, 2008, 2009, 2010, 2015, 2016, 2017, 2018, 2019, 2020]:
    print("\nBuilding VRT\n")
    !gdalbuildvrt {path_out}/gmw_v3_{y}.vrt {path_in}/gmw_v3_fnl_mjr_{y}_v312/*.tif
    print("\nBuilding GeoTIFF\n")
    !gdal_translate -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512" -co "TILED=YES" -co "BIGTIFF=YES" -co "COMPRESS=DEFLATE" {path_out}/gmw_v3_{y}.vrt {path_out}/gmw_v3_{y}.tif
    print("\nCopying files to GCS\n")
    upload_blob('mangrove_atlas', f'{path_out}/gmw_v3_{y}.tif', f'land-cover/gmw_v3_{y}.tif')


Building VRT

0...10...20...30...40...50...60...70...80...90...100 - done.

Building GeoTIFF

Input file size is 1620000, 337500
0...10...20...30...40...50...60...70...80...90...100 - done.

Copying files to GCS

File ../../datasets/gmw_v3_extent/gmw_v3_1996.tif uploaded to land-cover/gmw_v3_1996.tif.

Building VRT

0...10...20...30...40...50...60...70...80...90...100 - done.

Building GeoTIFF

Input file size is 1620000, 337500
0...10...20...30...40...50...60...70...80...90...100 - done.

Copying files to GCS

File ../../datasets/gmw_v3_extent/gmw_v3_2007.tif uploaded to land-cover/gmw_v3_2007.tif.

Building VRT

0...10...20...30...40...50...60...70...80...90...100 - done.

Building GeoTIFF

Input file size is 1620000, 337500
0...10...20...30...40...50...60...70...80...90...100 - done.

Copying files to GCS

File ../../datasets/gmw_v3_extent/gmw_v3_2008.tif uploaded to land-cover/gmw_v3_2008.tif.

Building VRT

0...10...20...30...40...50.ERROR 1: MissingRequired:TIFF directory is mis
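The `!` shell escapes in the loop above do not stop the loop when a GDAL command fails, so the upload step still runs. A sketch of one way to stop early, assuming the GDAL tools return a non-zero exit code on failure and that the tile list fits within the OS argument-length limit. `gdal_commands` and `mosaic_year` are illustrative names.

```python
import glob
import subprocess

CREATION_OPTIONS = (
    "BLOCKXSIZE=512", "BLOCKYSIZE=512", "TILED=YES",
    "BIGTIFF=YES", "COMPRESS=DEFLATE",
)

def gdal_commands(year, path_out, tiles):
    """Return the gdalbuildvrt and gdal_translate commands for one year."""
    vrt = f"{path_out}/gmw_v3_{year}.vrt"
    tif = f"{path_out}/gmw_v3_{year}.tif"
    build_vrt = ["gdalbuildvrt", vrt, *tiles]
    translate = ["gdal_translate"]
    for opt in CREATION_OPTIONS:
        translate += ["-co", opt]
    translate += [vrt, tif]
    return build_vrt, translate, tif

def mosaic_year(year, path_in, path_out):
    """Mosaic one year's tiles; raises CalledProcessError if a GDAL step fails."""
    tiles = sorted(glob.glob(f"{path_in}/gmw_v3_fnl_mjr_{year}_v312/*.tif"))
    if not tiles:
        raise FileNotFoundError(f"No tiles found for {year}")
    build_vrt, translate, tif = gdal_commands(year, path_out, tiles)
    subprocess.run(build_vrt, check=True)
    subprocess.run(translate, check=True)
    return tif
```

With this shape, `upload_blob` would only be called on the path returned by `mosaic_year`, so a failed mosaic is never uploaded.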

### Create extent of mangrove forests Image collection
The dataset consists of multiple single images (on GCS), each with a different timestamp. The steps are:

- Upload each image to the ImageCollection
- Export as a single image
- Update the image metadata
- Delete the temporary files and ImageCollection
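Earth Engine stores an image's timestamp in the `system:time_start` property as milliseconds since the Unix epoch. A small sketch of that conversion for the years used in this notebook, assuming each image is stamped at 1 January, 00:00 UTC (`year_to_epoch_ms` is an illustrative helper):

```python
from datetime import datetime, timezone

YEARS = [1996, 2007, 2008, 2009, 2010, 2015, 2016, 2017, 2018, 2019, 2020]

def year_to_epoch_ms(year):
    """Milliseconds since the Unix epoch for 1 January of `year` (UTC)."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp() * 1000)

start_times_ms = {y: year_to_epoch_ms(y) for y in YEARS}
print(start_times_ms[1996])  # 820454400000
```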

In [5]:
!earthengine authenticate --quiet

Paste the following address into a web browser:

    https://accounts.google.com/o/oauth2/auth?client_id=517222506229-vsmmajv00ul0bs7p89v5m89qs8eb9359.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fearthengine+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&code_challenge=TZKRwRnwhB0GZTOkXvUS57sgspMUGUq22F-_ytEBcxs&code_challenge_method=S256

On the web page, please authorize access to your Earth Engine account and copy the authentication code. Next authenticate with the following command:

    earthengine authenticate --code-verifier=LkEpQi3nWt9OiwHf70qDyuswDjbaAjAPXSgiFkGXrPU --authorization-code=PLACE_AUTH_CODE_HERE



In [6]:
!earthengine authenticate --code-verifier=LkEpQi3nWt9OiwHf70qDyuswDjbaAjAPXSgiFkGXrPU --authorization-code=4/1AX4XfWhjJB02MX5cO96vaS-5OJmm0tBqdfVsZwjC874Qw-2k0tX-XC80Ehs


Successfully saved authorization token.


In [8]:
# Authenticate earthengine
import ee

# Trigger the authentication flow.
ee.Authenticate()

# Initialize the library.
ee.Initialize()

Enter verification code:  4/1AX4XfWgW4UKSdbIJpntdoNcygXtaicTtFrZgsYaT8clAEYhgvyZorjopRO0



Successfully saved authorization token.


In [None]:
### Get metadata from v2

In [9]:
storage_client = storage.Client.from_service_account_json('../../datasets/service_account.json')
bucket_name = 'mangrove_atlas'

In [33]:
# Asset path (using a generic name)
ee_asset_path = "projects/global-mangrove-watch/land-cover/mangrove-extent"

# Get the description from file (currently using v2, as v3 is not available)
bucket = storage_client.get_bucket(bucket_name)
# Create a blob object from the file path
blob = bucket.blob("land-cover/mangrove-extent_version-2-0_1996--2016_description.md")
# Download the file to a local destination
blob.download_to_filename("../../datasets/mangrove-extent_version-2-0_1996--2016_description.md")
with open("../../datasets/mangrove-extent_version-2-0_1996--2016_description.md", "r") as f:
    description = f.read()
# Strip the Markdown heading markers
description = description.replace("#", "")

# set collection properties (these are compatible with Skydipper.Dataset.Metadata)
collection_properties = {
    'name': "Global extent of mangrove forests",
    'version': "3.0",
    'creator': "Global Mangrove Watch (GMW): Aberystwyth University/soloEO/Wetlands International/UNEP-WCMC/JAXA/DOB Ecology",
    'description': description,
    'identifier': "",
    'keywords': "Erosion; Coasts; Natural Infrastructure; Biodiversity; Blue Carbon; Forests; Mangroves; Landcover",
    'citation': "Bunting, Pete, Ake Rosenqvist, Richard M. Lucas, Lisa-Maria Rebelo, Lammert Hilarides, Nathan Thomas, Andy Hardy, Takuya Itoh, Masanobu Shimada, and C. Max Finlayson. “The Global Mangrove Watch—A New 2010 Global Baseline of Mangrove Extent.” Remote Sensing 10, no. 10 (October 2018): 1669. doi: 10.3390/rs10101669.",
    'license': "https://creativecommons.org/licenses/by/4.0/",
    'url': "",
    'language': 'en', 
    'altName': "Global Mangrove Watch, Version 3.0",
    'distribution': "",
    'variableMeasured': "Presence of mangrove forest habitat",
    'units': "1",
    'spatialCoverage': "Global tropics",
    'temporalCoverage': "1996--2020",
    'dataLineage': "Raster data supplied by Aberystwyth University (Dr. Dave Bunting) as tilesets per year; each tileset was combined and added to Google Earth Engine as a multi-temporal ImageCollection."
}

# set individual image properties (minimal)
image_properties = {
    "band_nodata_values": 0,
    "band_pyramiding_policies": "mode",
    "band_names": "lc"
}

In [None]:
# Create the ImageCollection first; otherwise the Properties/Details/Description tabs are not shown
gee_create_image_collection(ee_asset_path, 
                            image_start_times= [f"{y}-01-01" for y in [1996, 2007, 2008, 2009, 2010, 2015, 2016, 2017, 2018, 2019, 2020]],
                            collection_properties = {}, dry_run=False)

In [32]:
list_paths("gs://mangrove_atlas", "land-cover", file_pattern="gmw_*.tif", gsutil=True, return_dir_path=False)


Searching gs://mangrove_atlas/land-cover/gmw_*.tif


Found 11 path(s)



['gs://mangrove_atlas/land-cover/gmw_v3_1996.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2007.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2008.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2009.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2010.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2015.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2016.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2017.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2018.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2019.tif',
 'gs://mangrove_atlas/land-cover/gmw_v3_2020.tif']
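The upload step pairs these paths with a separately typed list of start times, so the two must be in the same order. A sketch of deriving the start time from each file name instead, assuming the `gmw_v3_<year>.tif` naming shown in the listing above (`start_time_from_path` is an illustrative helper):

```python
import re

def start_time_from_path(path):
    """Extract the year from a `gmw_v3_<year>.tif` path and return 'YYYY-01-01'."""
    match = re.search(r"gmw_v3_(\d{4})\.tif$", path)
    if match is None:
        raise ValueError(f"Cannot find a year in {path!r}")
    return f"{match.group(1)}-01-01"

paths = [
    "gs://mangrove_atlas/land-cover/gmw_v3_1996.tif",
    "gs://mangrove_atlas/land-cover/gmw_v3_2007.tif",
]
start_times = [start_time_from_path(p) for p in sorted(paths)]
print(start_times)  # ['1996-01-01', '2007-01-01']
```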

In [47]:
# Create ee.ImageCollection, add collection_properties, and create upload tasks
# Use dry_run=True to check the commands before running the process
gee_upload_images_to_collection(
    uri_prefix = gcs_prefix,
    dir_path = "land-cover",
    file_pattern="gmw_v3*.tif",
    ee_asset_path = ee_asset_path,
    image_start_times = [f"{y}-01-01" for y in [1996, 2007, 2008, 2009, 2010, 2015, 2016, 2017, 2018, 2019, 2020]],
    band_names = ["lc"],
    band_pyramiding_policys = ["mode"],
    band_nodata_values = [0],
    collection_properties = collection_properties,
    image_properties=[image_properties],
    gcs_tmp_path = None,
    force=True,
    dry_run=False
    )


Getting file paths...

Searching gs://mangrove_atlas/land-cover/gmw_v3*.tif


Found 11 path(s)




Checking if collection exists...

ee.ImageCollection projects/global-mangrove-watch/land-cover/mangrove-extent exists, with properties
:
{
  "description": "# Global extent of mangrove forests\\n\\nThis dataset shows the global extent of mangroves for 1996, 2007, 2008, 2009, 2010, 2015 and 2016. This is the first global map of mangrove extent produced using an automated, reproducible, and globally consistent methodology, using a combination of optical and radar satellite data.\\n\\nMangroves are forested intertidal ecosystems that perform critical landscape-level functions related to the regulation of freshwater, nutrients, and sediment inputs into marine areas. They play a key role in helping to control the quality of marine coastal waters. Mangroves are of critical importance as breeding and nursery sites for birds, fish, and crustaceans. Additionally, they constitute important sinks f

Creating upload tasks:   0%|          | 0/11 [00:00<?, ?it/s]

Started upload task with ID: PN2VHC6SN25MPV55Q22MVDXG
Started upload task with ID: 7DY2O433Q4E6ULHMNPOMOSGS
Started upload task with ID: X53D37KP5VHR5B2CBKN74VQQ
Started upload task with ID: WA3E5OPMCSC6AW3O36RBYQ5K
Started upload task with ID: 4TERP53L5ZIFGPYWMRNA4OUS
Started upload task with ID: 6YW7GNKCX56GLB3YLNLX34VY
Started upload task with ID: 3A2OE5U2TCKF4LUWUKJYGZXK
Started upload task with ID: W3R35SQZ2SQJJYFUQOIFX43J
Started upload task with ID: DV6DQPMEIRNZTIM5I6HY7LYG
Started upload task with ID: NS6W54AOXNTCQS2BVRSHCKEP
Started upload task with ID: XSGVQFBRSHVWJ7G6B6LW4XJ6

Finished upload to projects/global-mangrove-watch/land-cover/mangrove-extent
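For reference, a rough sketch of what a per-image upload manifest for `earthengine upload image --manifest` might look like for one of these files. This is not necessarily how `gee_upload_images_to_collection` works internally; the field names follow my understanding of the Earth Engine image manifest format and should be checked against the official documentation. `build_image_manifest` and the image asset ID below are hypothetical.

```python
import json

def build_image_manifest(asset_id, gcs_uri, start_time,
                         band_name="lc", nodata=0, pyramiding="MODE"):
    """Return a dict in the shape of an Earth Engine image upload manifest."""
    return {
        "name": asset_id,
        "tilesets": [{"sources": [{"uris": [gcs_uri]}]}],
        "bands": [{
            "id": band_name,
            "tileset_band_index": 0,
            "pyramiding_policy": pyramiding,
        }],
        "missing_data": {"values": [nodata]},
        # Assumes start_time is given as 'YYYY-MM-DD' and means midnight UTC.
        "start_time": f"{start_time}T00:00:00Z",
    }

manifest = build_image_manifest(
    # Hypothetical image ID inside the collection used in this notebook.
    asset_id="projects/global-mangrove-watch/land-cover/mangrove-extent/gmw_v3_1996",
    gcs_uri="gs://mangrove_atlas/land-cover/gmw_v3_1996.tif",
    start_time="1996-01-01",
)
print(json.dumps(manifest, indent=2))
```

The band name, nodata value, and pyramiding policy mirror the `image_properties` set earlier (`lc`, `0`, `mode`).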


In [10]:
# Check task status
!earthengine task list --status RUNNING

In [None]:
# Check ImageCollection properties
!earthengine asset info {ee_asset_path}