<a href="https://colab.research.google.com/github/JamesLabUofT/GEE_Workshop/blob/main/scripts/working_with_awesome_gee.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Extracting and Processing data from Awesome GEE Community Catalog


## Module Overview

This module introduces participants to accessing, analyzing, and visualizing forest-related geospatial datasets from the Awesome GEE Community Data Catalog using Google Earth Engine (GEE) and Python. Through hands-on exercises in Google Colab, participants will explore forest disturbance, land cover, and species composition across Canada using high-resolution datasets.

## Learning Objectives:
By the end of this module, participants will be able to:

* Understand the structure and content of selected forest datasets in the Awesome GEE Catalog
* Use Python and the Earth Engine API to access, filter, and visualize geospatial data
* Work with both raster (ImageCollections) and vector (FeatureCollections) datasets
* Perform spatial and temporal analysis on forest disturbance and land cover
Export processed data for further analysis or reporting

## Topics Covered:
**Introduction to the Awesome GEE Community Catalog**

* Overview of the catalog and its purpose
* Navigating the catalog and locating datasets

**Working with FeatureCollections: NBAC Dataset
Dataset: National Burned Area Composite (NBAC)**

* Understanding vector data in GEE
* Loading and filtering NBAC by year and region
* Visualizing burned areas on interactive maps
* Calculating burned area statistics
* Exporting filtered data to CSV or Google Drive

**Using the NBAC perimeters, lets explore some forest cover**

**Exploring Raster Datasets**

a. High-Resolution Annual Forest Land Cover (1984–2022)
* Visualizing annual forest cover - Year prior to fire
* Detecting forest cover change over time - how did the forest change in the from 2000-2022


b. Canada Landsat-Derived Forest Harvest Disturbance (1985–2020)
* Mapping harvest disturbances - were there any harvest disturbances detected within fire perimeters between 1985 and 2020?
* Comparing harvest vs. fire impacts
* Temporal analysis of disturbance trends

c. Canada Long-Term Tree Species (1984–2022)
* Mapping dominant tree species
* Overlaying with land cover for ecological insights

**Data Extraction and Visualization**

* Reducing data over custom regions (e.g., provinces, ecozones)
* Exporting time-series data to Pandas DataFrames
* Creating charts and maps using matplotlib and geemap
* Exporting results for use in reports or further analysis



## Introduction to the Awesome GEE Community Catalog

## Introduction to the Awesome GEE Community Catalog


✅ Expand access to valuable datasets not included in the official GEE catalog
🌍 Support reproducible science by sharing preprocessed, ready-to-use data
🧑‍🤝‍🧑 Foster collaboration across the global geospatial community
📈 Accelerate research in climate, ecology, forestry, urban studies, and more

https://gee-community-catalog.org/

**environment setup**

In [3]:
import ee
import geemap
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import ipyleaflet

Authenticate and intialize ee

In [12]:
from google.colab import userdata
id = userdata.get('user_id')
ee.Initialize(project=id) # project='ee-userid'

# Working with Feature Collections: NBAC

In this section, participants will explore how to work with vector data in Google Earth Engine using the NBAC dataset, which maps burned areas across Canada. This hands-on module introduces the concept of FeatureCollections, a core data structure in GEE used to represent geographic features like points, lines, and polygons.

FeatureCollections are the vector equivalent of ImageCollections in Earth Engine. They store geographic features—such as fire perimeters, forest plots, or administrative boundaries—along with associated attributes (e.g., year, area, cause). In this session, we’ll use the National Burned Area Composite (NBAC) to:

* Load and explore a FeatureCollection
* Filter features by year and region
* Visualize fire perimeters on an interactive map
* Calculate statistics like total burned area
* Export selected features for further analysis

We'll start with loading in the NBAC from https://gee-community-catalog.org/projects/nbac/#citation

In [13]:
nbac = ee.FeatureCollection("projects/sat-io/open-datasets/CA_FOREST/NBAC/nbac_1972_2023_20240530")


Lets look at the data

In [36]:
print(nbac.size().getInfo(), nbac.first().propertyNames().getInfo())

48939 ['FIRECAUS', 'GID', 'BASRC', 'HS_SDATE', 'FIREMAPM', 'ADJ_FLAG', 'CAPDATE', 'YEAR', 'ADMIN_AREA', 'AG_EDATE', 'POLY_HA', 'PRESCRIBED', 'VERSION', 'NFIREID', 'FIREMAPS', 'AG_SDATE', 'ADJ_HA', 'NATPARK', 'HS_EDATE', 'system:index']


Lets get all fires from 2023

In [14]:
nbac_2023 = nbac.filter(ee.Filter.eq('YEAR', 2023))
print(nbac_2023.size().getInfo())

2199


Let's look just at the fires in Ontario

https://cwfis.cfs.nrcan.gc.ca/downloads/nbac/NBAC_1972to2024_20250506_shp_metadata.pdf

In [15]:
nbac_on_2023 = nbac_2023.filter(ee.Filter.eq('ADMIN_AREA', 'ON'))
nbac_on_2023.size().getInfo()

89

Always a good idea to check column or property types

In [40]:
nbac_on_2023.limit(0).getInfo()['columns']

{'ADJ_FLAG': 'String',
 'ADJ_HA': 'Float',
 'ADMIN_AREA': 'String',
 'AG_EDATE': 'Long',
 'AG_SDATE': 'Long',
 'BASRC': 'String',
 'CAPDATE': 'Long',
 'FIRECAUS': 'String',
 'FIREMAPM': 'String',
 'FIREMAPS': 'String',
 'GID': 'String',
 'HS_EDATE': 'Long',
 'HS_SDATE': 'Long',
 'NATPARK': 'String',
 'NFIREID': 'Float',
 'POLY_HA': 'Float',
 'PRESCRIBED': 'String',
 'VERSION': 'String',
 'YEAR': 'Float',
 'system:index': 'String'}

In [43]:
ee.Number(nbac_on_2023.first().get('YEAR')).getInfo()

2023

lets visualize it

In [None]:
centroid = nbac_on_2023.geometry().centroid(1).getInfo()['coordinates']
# geemap often automatically zooms to the layer extent when added,
# but explicitly setting a zoom level can help if needed.
# A zoom level between 8 and 10 is usually good for a provincial park.
zoom_level = 9

m = geemap.Map(center=(centroid[1], centroid[0]), zoom=zoom_level)

# Visualization parameters are not strictly needed for feature collections/geometries,
# but we can add a simple outline color if the default is hard to see.
fires_vis = {'color': 'FF0000'} # Red outline

# Add the AOI layer using aoi_ee
m.add_layer(nbac_on_2023, fires_vis, "Wildfires 2023")

# Add the layer manager to easily toggle layers
m.add_layer_manager()

m

Lets calculate some summary statistics

total area burned?

In [None]:
# Calculate the area of each feature and add it as a property
def add_area(feature):
  return feature.set({'area_sq_ha': feature.geometry().area().divide(10000)})

nbac_on_2023 = nbac_on_2023.map(add_area)

# Calculate the total area
total_area = nbac_on_2023.aggregate_sum('area_sq_ha')

# Print the total area
print('Total area:', total_area.getInfo())


## Exploring raster datasets using NBAC feature collection


[High-Resolution Annual Forest Land Cover (1984–2022)](https://gee-community-catalog.org/projects/ca_lc/)

**Exercise:**

Calculate the % area in each fire perimeter that is coniferous.


1.   Filter image collection by date
2.   Clip image to fire perimter for each feature and
3.   *Masks all none*
4.   calculate the percent area in each class
5.   export as csv



**Class codes for raster:**

* Class Code: 0	Unclassified
* Class Code: 20	Water
* Class Code: 31	Snow/Ice
* Class Code: 32	Rock/Rubble
* Class Code: 33	Exposed/Barren Land
* Class Code: 40	Bryoids
* Class Code: 50	Shrubs
* Class Code: 80	Wetland
* Class Code: 81	Wetland Treed
* Class Code: 100	Herbs
* Class Code: 210	Coniferous
* Class Code: 220	Broad Leaf
* Class Code: 230	Mixedwood

In [16]:
forest_lc = ee.ImageCollection("projects/sat-io/open-datasets/CA_FOREST_LC_VLCE2")
forest_lc

Lets break it down for one fire

In [130]:
first_fire = nbac_on_2023.first()


First we get the year and make a start and end date to filter the image collection

In [131]:
year = ee.Number(first_fire.get('YEAR')).toInt().subtract(1)
start_date = ee.Date.fromYMD(year, 1, 1)
end_date = ee.Date.fromYMD(year, 12, 31)

get the geometry of the fire

In [132]:
geom = first_fire.geometry()

Filter the image collection

In [133]:
 # Filter image collection by date and bounds
image = forest_lc.filterDate(start_date, end_date).filterBounds(geom).first()

Mask all values that are not coniferous.

This is a very important aspect of getting data from raster images.

In Google Earth Engine (GEE), masking is the process of hiding or excluding certain pixels in a raster image based on specific criteria—such as pixel values, quality flags, or spatial boundaries. This is crucial for focusing analysis only on relevant data, improving accuracy, and reducing computational load.

For example, when analyzing land cover, you might mask out all pixels except those representing forests. Masking ensures that calculations like area, statistics, or classification are applied only to meaningful regions, making it a foundational step in most remote sensing workflows.

In Google Earth Engine (GEE), creating and applying a mask involves a few clear steps:

* Define the Condition: Decide which pixels you want to keep. For example, to keep only pixels with a value of 210, you use image.eq(210).
* Create the Mask: This condition returns a binary image (mask) where pixels that meet the condition are 1 (true) and others are 0 (false).
* Apply the Mask: Use .updateMask(mask) on the original image. This hides all pixels where the mask is 0, keeping only the desired ones.
* Use the Masked Image: The resulting image can now be used for further analysis, such as calculating area, statistics, or visualization.

In [135]:
# clip image
img = ee.Image(image).clip(geom)
# create mask - all values except 210
class_210_mask = img.eq(210)
#calculate pixel area
pixel_area = ee.Image.pixelArea().divide(10000)  # hectares
# apply mask - masks all values except 210
area_image = pixel_area.updateMask(class_210_mask)

Calculate the area using the sum reducer


In [140]:
coniferous_area = area_image.reduceRegion(
            reducer=ee.Reducer.sum(),
            geometry=geom,
            scale=30,
            maxPixels=1e13
        ).get('area')

lets look at the results

In [141]:
coniferous_area

In [154]:
first_fire2 = first_fire.set('class_210_area_ha', coniferous_area).getInfo()

EEException: Number.divide: Parameter 'left' is required and may not be null.

How much of the fire is coniferous?

In [152]:

first_fire = first_fire.set('area_difference', ee.Number(first_fire.get('class_210_area_ha')).divide(ee.Number(first_fire.get('POLY_HA'))).multiply(100))

first_fire.get('area_difference').getInfo()

EEException: Number.divide: Parameter 'left' is required and may not be null.

We can make this easier by combining everything in to one function and mapping it over the feature collection

In [127]:
def calculate_class_210_area(feature):
    year = ee.Number(feature.get('YEAR')).toInt().subtract(1)
    start_date = ee.Date.fromYMD(year, 1, 1)
    end_date = ee.Date.fromYMD(year, 12, 31)
    geom = feature.geometry()

    # Filter image collection by date and bounds
    image = forest_lc.filterDate(start_date, end_date).filterBounds(geom).first()

    # If image exists, process it
    def process(img):
        img = ee.Image(img).clip(geom)
        class_210_mask = img.eq(210)
        pixel_area = ee.Image.pixelArea().divide(10000)  # hectares
        area_image = pixel_area.updateMask(class_210_mask)

        area = area_image.reduceRegion(
            reducer=ee.Reducer.sum(),
            geometry=geom,
            scale=30,
            maxPixels=1e13
        ).get('area')

        return feature.set('class_210_area_ha', area)

    return ee.Algorithms.If(image, process(image), feature.set('class_210_area_ha', 0))




In [None]:
#Apply to feature collection
result = nbac_on_2023.map(calculate_class_210_area)
result