# GLAD Agriculture Training Data Export Pipeline for Pretraining Data

This notebook demonstrates the export pipeline for generating training data from the GLAD Global Cropland dataset. The pipeline:

1. Processes Landsat imagery and RAP vegetation cover data for multiple years (2003-2019)
2. Combines this with GLAD cropland labels
3. Creates spatially stratified training and evaluation samples
4. Exports the samples as TFRecord files to Google Cloud Storage

## Configuration

The pipeline requires several parameters:
- Storage parameters (bucket, folders, file prefixes)
- Sampling parameters (shards, samples per polygon, scales)
- Asset paths for vegetation cover data
- Export geometry defining the region of interest
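
As a minimal sketch of how these parameters could be grouped and sanity-checked before any export starts, the block below wraps them in a small dataclass. The `ExportConfig` class is hypothetical and not part of `src.training_export_utils`; the notebook itself keeps the values as module-level constants in `config.py`. The defaults are copied from the configuration cell further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportConfig:
    """Hypothetical container for the export parameters (illustration only)."""

    bucket: str
    training_folder: str
    training_base: str
    eval_base: str
    shards_per_poly: int = 25
    samples_per_poly: int = 250
    sample_scale: int = 30
    tile_scale: int = 16

    def __post_init__(self):
        # Storage parameters must be non-empty strings.
        for name in ("bucket", "training_folder", "training_base", "eval_base"):
            if not getattr(self, name):
                raise ValueError(f"{name} must be a non-empty string")
        # Sampling parameters must be positive integers.
        for name in ("shards_per_poly", "samples_per_poly",
                     "sample_scale", "tile_scale"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


config = ExportConfig(
    bucket="wlfw-tmp",
    training_folder="PRETRAIN_GLAD_DATA",
    training_base="training_patches_",
    eval_base="eval_patches_",
)
print(config)
```

Failing early on an empty bucket name or a non-positive shard count is cheaper than discovering the problem after export tasks have been queued.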

## Usage

Configure the parameters, then call the `process_and_export_samples` function. Progress is printed for:
- Each year being processed
- Each polygon being sampled
- Individual export tasks being started
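
To anticipate which tasks a full run would start, the sketch below reconstructs the task names from the pattern visible in the log output at the end of this notebook (for example `glad_ag_training_patches_2003_g0`). It is inferred from that output, not taken from the `src.training_export_utils` source: the helper names, the `glad_ag_` prefix argument, the every-fifth-shard reporting interval, and the assumption that every year has the same 9 polygons as 2003 are all illustrative.

```python
def export_task_names(years, n_polygons, base="training_patches_",
                      prefix="glad_ag_"):
    """Build task names of the form <prefix><base><year>_g<polygon index>.

    Assumption: the naming pattern is inferred from the printed log only.
    """
    return [
        f"{prefix}{base}{year}_g{idx}"
        for year in years
        for idx in range(n_polygons)
    ]


def shards_to_report(shards_per_poly=25, every=5):
    """Return the 1-based shard numbers at which a progress line is printed.

    Assumption: the log shows shards 1, 6, 11, 16 and 21, which is consistent
    with reporting every fifth shard.
    """
    return [i + 1 for i in range(shards_per_poly) if i % every == 0]


years = range(2003, 2020)  # 2003 through 2019 inclusive
names = export_task_names(years, n_polygons=9)

print(names[0])            # glad_ag_training_patches_2003_g0
print(len(names))          # 153 (17 years x 9 polygons)
print(shards_to_report())  # [1, 6, 11, 16, 21]
```

Under these assumptions a complete run queues 153 training export tasks, which is useful to know before checking the Earth Engine task list.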

Note: The export depends on access to the RAP data, which is not publicly available for Mexico, so this notebook is provided for publication purposes only. To run the pipeline with RAP data for CONUS only, edit `process_and_export_samples` in the `src.training_export_utils` module and remove the Mexico asset from the RAP image collection (**lines 30 - 52** and **lines 143 - 148**).

In [1]:
import ee
from src.training_export_utils import process_and_export_samples

In [2]:
# Add to existing config.py

# Training export parameters
BUCKET = 'wlfw-tmp'
TRAINING_FOLDER = 'PRETRAIN_GLAD_DATA'
TRAINING_BASE = 'training_patches_'
EVAL_BASE = 'eval_patches_'

# Sample parameters
SHARDS_PER_POLY = 25
SAMPLES_PER_POLY = 250
SAMPLE_SCALE = 30
TILE_SCALE = 16

# Asset paths
MEXICO_COVER_ASSET = '///REDACTED///'
CONUS_COVER_ASSET = 'projects/rangeland-analysis-platform/vegetation-cover-v3'


In [3]:
# ee.Authenticate()

ee.Initialize()

GEOM_TO_EXPORT = ee.Geometry.Polygon(
        [[[-100.06872998114088, 53.54840419627541],
          [-116.06482373114088, 53.96410259798528],
          [-116.06482373114088, 48.931452079494676],
          [-129.95154248114088, 47.28847269264896],
          [-114.48279248114088, 19.324002572629446],
          [-91.01599560614086, 25.177505795258153],
          [-90.84021435614086, 40.85736734731578],
          [-93.56482373114088, 48.35070265825225],
          [-98.83826123114088, 49.27669529994239]]])

In [5]:
# Process and export samples
process_and_export_samples(
    geom_to_export=GEOM_TO_EXPORT,
    mexico_cover_asset=MEXICO_COVER_ASSET,
    conus_cover_asset=CONUS_COVER_ASSET,
    bucket=BUCKET,
    training_folder=TRAINING_FOLDER,
    training_base=TRAINING_BASE,
    eval_base=EVAL_BASE
)


Processing 9 polygons for year 2003
Processing polygon 1/9 for year 2003
  - Generating shard 1/25
  - Generating shard 6/25
  - Generating shard 11/25
  - Generating shard 16/25
  - Generating shard 21/25
  Started export task: glad_ag_training_patches_2003_g0
Processing polygon 2/9 for year 2003
  - Generating shard 1/25
  - Generating shard 6/25
  - Generating shard 11/25
  - Generating shard 16/25
  - Generating shard 21/25
  Started export task: glad_ag_training_patches_2003_g1
Processing polygon 3/9 for year 2003
  - Generating shard 1/25
  - Generating shard 6/25
  - Generating shard 11/25
  - Generating shard 16/25
  - Generating shard 21/25


KeyboardInterrupt: 