<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Pre-requisites" data-toc-modified-id="Pre-requisites-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Pre-requisites</a></span></li><li><span><a href="#Instructions" data-toc-modified-id="Instructions-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Instructions</a></span></li><li><span><a href="#Imports-and-Constants" data-toc-modified-id="Imports-and-Constants-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Imports and Constants</a></span></li><li><span><a href="#Constants" data-toc-modified-id="Constants-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Constants</a></span></li><li><span><a href="#Export-Images" data-toc-modified-id="Export-Images-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Export Images</a></span></li></ul></div>

## Pre-requisites
Register your Google account for Earth Engine access at [https://code.earthengine.google.com](https://code.earthengine.google.com). Approval may take a couple of days. Without registration, the `ee.Initialize()` call below will raise an error.

## Instructions

This notebook exports Landsat satellite image composites, one per cluster location, from Google Earth Engine.

The images are saved in gzipped TFRecord format. By default, this notebook exports images to Google Drive. If you instead prefer to export images to Google Cloud Storage (GCS), change the `EXPORT` constant below to `'gcs'` and set `BUCKET` to the desired GCS bucket name.


|      | Google Drive (default) | GCS
|------|:-----------------------|:---
| VR  | `idhm_tfrecords_raw/`   | `{BUCKET}/idhm_tfrecords_raw/`

Once the images have finished exporting, download the exported TFRecord files to the following folder:

- VR: `data/idhm_tfrecords_raw/`

The folder structure should look as follows:

```
data/
    idhm_tfrecords_raw/
        group1_2010_00.tfrecord.gz
        group1_2010_01.tfrecord.gz
        ...
        group30_2010_XX.tfrecord.gz
```
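
As a quick sanity check after downloading, the sketch below counts the TFRecord chunks found per (group, year). It assumes the `{group}_{year}_{chunk}.tfrecord.gz` naming shown above; the helper name `summarize_tfrecords` and the regular expression are illustrative and not part of this repository.

```python
import re
from pathlib import Path

# Assumed naming scheme, taken from the folder listing above:
# {group}_{year}_{chunk:02d}.tfrecord.gz
TFRECORD_NAME = re.compile(
    r'^(?P<group>.+)_(?P<year>\d{4})_(?P<chunk>\d{2,})\.tfrecord\.gz$')


def summarize_tfrecords(folder):
    """Count downloaded TFRecord chunks per (group, year) in `folder`."""
    counts = {}
    for path in sorted(Path(folder).glob('*.tfrecord.gz')):
        match = TFRECORD_NAME.match(path.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        key = (match['group'], int(match['year']))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Example usage (hypothetical local path):
# summarize_tfrecords('data/idhm_tfrecords_raw')
```

Comparing these counts against the number of chunks expected for each group is one way to spot an incomplete download.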

## Imports and Constants

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
from __future__ import annotations

import math
from typing import Any, Optional

import ee
import pandas as pd

%cd /content/drive/MyDrive/USP/TCC/code
from preprocessing import ee_utils

/content/drive/MyDrive/USP/TCC/code


Before using the Earth Engine API, you must perform a one-time authentication that authorizes access to Earth Engine on behalf of the Google account you registered at [https://code.earthengine.google.com](https://code.earthengine.google.com). The authentication process saves a credentials file to `$HOME/.config/earthengine/credentials` for future use.

The following command `ee.Authenticate()` runs the authentication process. Once you successfully authenticate, you may comment out this command because you should not need to authenticate again in the future, unless you delete the credentials file. If you do not authenticate, the subsequent `ee.Initialize()` command below will fail.

For more information, see [https://developers.google.com/earth-engine/python_install-conda.html](https://developers.google.com/earth-engine/python_install-conda.html).

For documentation of the Earth Engine functions used in this notebook, see the API reference at [https://developers.google.com/earth-engine/apidocs](https://developers.google.com/earth-engine/apidocs).

In [None]:
ee.Authenticate()


In [None]:
ee.Initialize()  # initialize the Earth Engine API
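
The authenticate-once workflow described above can be sketched as a small helper that tries to initialize first and only falls back to authentication when initialization fails. This is an illustrative pattern, not part of `ee_utils`; the helper name `init_earth_engine` is hypothetical, and the module is passed in as an argument so the logic stays independent of how `ee` was imported. Depending on your `earthengine-api` version, `ee.Initialize()` may also require a Cloud project argument, which this sketch does not handle.

```python
def init_earth_engine(ee_module):
    """Initialize Earth Engine, authenticating first only if needed.

    `ee_module` is the imported `ee` package (assumption: it exposes
    `Initialize()` and `Authenticate()`, as used in the cells above).
    """
    try:
        # Succeeds when a valid credentials file already exists.
        ee_module.Initialize()
    except Exception:
        # No usable credentials: run the interactive flow, then retry.
        ee_module.Authenticate()
        ee_module.Initialize()


# Example usage in this notebook:
# import ee
# init_earth_engine(ee)
```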

## Constants

In [None]:
# ========== ADAPT THESE PARAMETERS ==========

# To export to Google Drive (default), keep the next 2 lines uncommented
EXPORT = 'drive'
BUCKET = None

# To export to Google Cloud Storage (GCS) instead, comment out the 2 lines above,
# uncomment the next 2 lines, and set BUCKET to the desired bucket name
# EXPORT = 'gcs'
# BUCKET = 'mybucket'

# export location parameters
EXPORT_FOLDER = '/content/drive/MyDrive/USP/TCC/data/images/idhm_tfrecords_raw_2'

# Set CHUNK_SIZE to None to export a single TFRecord file per (group, year). However,
# this may fail if the export exceeds Google Earth Engine memory limits. In that case,
# decrease CHUNK_SIZE to a small number (<= 50) until Earth Engine stops reporting memory errors.
CHUNK_SIZE = 50

In [None]:
# ========== DO NOT MODIFY THESE ==========

# input data paths
CSV_PATH = '/content/drive/MyDrive/USP/TCC/data/vr_clusters_2.csv'

# band names
MS_BANDS = ['BLUE', 'GREEN', 'RED', 'NIR', 'SWIR1', 'SWIR2', 'TEMP1']

# image parameters
PROJECTION = 'EPSG:3857'  # see https://epsg.io/3857
SCALE = 30                # export resolution: 30m/px
EXPORT_TILE_RADIUS = 127  # image dimension = (2*EXPORT_TILE_RADIUS) + 1 = 255px
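
The relationship between the image parameters above can be checked with a short calculation. This is a sketch: the helper names are illustrative, and the footprint is the nominal size in projection units (EPSG:3857 distorts ground distances away from the equator).

```python
SCALE = 30                # meters per pixel, as in the constants above
EXPORT_TILE_RADIUS = 127  # pixels on each side of the center pixel


def tile_size_px(radius):
    """Side length in pixels: `radius` pixels on each side plus the center."""
    return 2 * radius + 1


def tile_size_m(radius, scale):
    """Nominal side length in meters at `scale` meters per pixel."""
    return tile_size_px(radius) * scale


print(tile_size_px(EXPORT_TILE_RADIUS))        # 255
print(tile_size_m(EXPORT_TILE_RADIUS, SCALE))  # 7650
```

Each exported tile is therefore 255 x 255 pixels, covering a nominal 7.65 km x 7.65 km square around the cluster location.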

## Export Images

In [None]:
def export_images(df: pd.DataFrame,
                  group: str,
                  year: int,
                  export_folder: str,
                  chunk_size: Optional[int] = None
                  ) -> dict[tuple[str, str, int, int], ee.batch.Task]:
    '''
    Args
    - df: pd.DataFrame, contains columns ['lat', 'lon', 'group', 'year']
    - group: str, together with `year` determines the survey to export
    - year: int, together with `group` determines the survey to export
    - export_folder: str, name of folder for export
    - chunk_size: int, optionally set a limit to the # of images exported per TFRecord file
        - set to a small number (<= 50) if Google Earth Engine reports memory errors

    Returns: dict, maps task name tuple (export_folder, group, year, chunk) to ee.batch.Task
    '''
    subset_df = df[(df['group'] == group) & (df['year'] == year)].reset_index(drop=True)
    if chunk_size is None:
        chunk_size = len(subset_df)
    num_chunks = int(math.ceil(len(subset_df) / chunk_size))
    tasks = {}

    for i in range(num_chunks):
        chunk_slice = slice(i * chunk_size, (i+1) * chunk_size - 1)  # df.loc[] is inclusive
        fc = ee_utils.df_to_fc(subset_df.loc[chunk_slice, :])
        start_date, end_date = ee_utils.surveyyear_to_range(year)

        # create 3-year Landsat composite image
        roi = fc.geometry()
        imgcol = ee_utils.LandsatSR(roi, start_date=start_date, end_date=end_date).merged
        # ee.ImageCollection.map: maps an algorithm over a collection.
        # ee.ImageCollection.select: returns the image collection with selected bands.
        imgcol = imgcol.map(ee_utils.mask_qaclear).select(MS_BANDS)
        # ee.ImageCollection.median: reduces an image collection by calculating the
        # median of all values at each pixel across the stack of all matching bands.
        # Bands are matched by name.
        img = imgcol.median()

        # add nightlights, latitude, and longitude bands
        img = ee_utils.add_latlon(img)
        img = img.addBands(ee_utils.composite_nl(year))

        fname = f'{group}_{year}_{i:02d}'
        tasks[(export_folder, group, year, i)] = ee_utils.get_array_patches(
            img=img, scale=SCALE, projection=PROJECTION, ksize=EXPORT_TILE_RADIUS,
            points=fc, export=EXPORT,
            prefix=export_folder, fname=fname,
            bucket=BUCKET)
    return tasks
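
The chunking logic in `export_images` can be illustrated without Earth Engine. The sketch below reproduces the chunk count and the inclusive row bounds used with `df.loc[]`, along with the file-name pattern. The helper names are hypothetical, and unlike the function above, the stop bound is clamped to the last row so the printed ranges show exactly which rows land in each file.

```python
import math


def chunk_bounds(num_rows, chunk_size=None):
    """Return inclusive (start, stop) row labels for each chunk."""
    if num_rows == 0:
        return []
    if chunk_size is None:
        chunk_size = num_rows  # a single chunk containing every row
    num_chunks = math.ceil(num_rows / chunk_size)
    return [(i * chunk_size, min((i + 1) * chunk_size, num_rows) - 1)
            for i in range(num_chunks)]


def chunk_fname(group, year, i):
    """File-name stem for chunk `i`, matching the f-string in export_images."""
    return f'{group}_{year}_{i:02d}'


print(chunk_bounds(120, 50))           # [(0, 49), (50, 99), (100, 119)]
print(chunk_bounds(120))               # [(0, 119)]
print(chunk_fname('group1', 2010, 1))  # group1_2010_01
```

With `CHUNK_SIZE = 50`, a group of 120 clusters therefore produces three export tasks, the last holding the remaining 20 clusters.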

In [None]:
tasks: dict[tuple[str, str, int, int], ee.batch.Task] = {}

In [None]:
idhm_df = pd.read_csv(CSV_PATH, float_precision='high', index_col=False)
idhm_surveys = list(idhm_df.groupby(['group', 'year']).groups.keys())

for group, year in idhm_surveys:
    new_tasks = export_images(
        df=idhm_df, group=group, year=year,
        export_folder=EXPORT_FOLDER, chunk_size=CHUNK_SIZE)
    tasks.update(new_tasks)

Check the status of each export task at [https://code.earthengine.google.com/](https://code.earthengine.google.com/), or run the following cell, which polls the tasks every minute. Once all tasks have completed, download the IDHM TFRecord files to `data/idhm_tfrecords_raw/`.