[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/forestdatapartnership/whisp/blob/main/notebooks/Colab_whisp_geojson_to_csv.ipynb)

# Process a GeoJSON

Python Notebook pathway for [Whisp](https://openforis.org/solutions/whisp/) running in the cloud via [Google Colab](https://colab.google/).

- Use this notebook for datasets up to around 100,000 features. For larger datasets, consider the 'whisp_ee_asset_to_drive.ipynb' notebook.
- Please log any issues/requests [here](https://github.com/forestdatapartnership/whisp/issues).

**To open:** click the badge at the top.

**To run:** click the play buttons (or press Shift + Enter).

**Requirements:** Google Earth Engine (GEE) account and registered cloud project.



- **Aim:** support compliance with zero-deforestation regulations
- **Input**: GeoJSON file of plot boundaries or points
- **Output**: CSV table and GeoJSON file containing statistics and risk indicators

### Setup Google Earth Engine

In [3]:
import ee
import datetime

ee.Reset()  # Reset Earth Engine to clear any previous sessions

# Google Earth Engine project name
gee_project_name = "ee-itobonifacius"  # change to your project name. If unsure, see: https://developers.google.com/earth-engine/cloud/assets

# NB: opens a browser window to authorize access
ee.Authenticate()

# Initialize with the chosen project, using the high-volume endpoint
ee.Initialize(project=gee_project_name, opt_url='https://earthengine-highvolume.googleapis.com')

### Install and import packages

In [4]:
# Install openforis-whisp (if not already installed); --pre allows pre-release versions
!pip install --pre openforis-whisp


In [5]:
import openforis_whisp as whisp

### Get a GeoJSON file

- Files are stored temporarily and can be viewed in the panel on the left (click the folder icon to view).
- Press refresh if updates are not showing.
- Alternatively, you can work with files in your Google Drive: `from google.colab import drive; drive.mount('/content/drive')`

In [6]:
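# Function to upload a GeoJSON file. Download an example here: https://github.com/forestdatapartnership/whisp/tree/main/tests/fixtures
def import_geojson():
    """Upload a file via the Colab file picker and return its path under /content."""
    from google.colab import files
    uploaded = files.upload()  # dict of {filename: file bytes}
    fn, content = next(iter(uploaded.items()))  # only the first uploaded file is used
    path = f'/content/{fn}'
    with open(path, 'wb') as f:
        f.write(content)
    return path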

In [7]:
GEOJSON_EXAMPLE_FILEPATH = import_geojson()
print(f"GEOJSON_EXAMPLE_FILEPATH: {GEOJSON_EXAMPLE_FILEPATH}")

Saving Merged_enimiro.geojson to Merged_enimiro.geojson
GEOJSON_EXAMPLE_FILEPATH: /content/Merged_enimiro.geojson
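Before processing, it can help to check the uploaded file against the guidance at the top of this notebook (up to around 100,000 features) and to confirm it contains the plot boundaries or points you expect. The following is a minimal sketch in plain Python. It assumes a standard GeoJSON file. `summarize_features`, `summarize_geojson` and `check_size` are hypothetical helpers, not part of `openforis_whisp`.

```python
import json
from collections import Counter

# Rough upper limit suggested at the top of this notebook
MAX_FEATURES_FOR_NOTEBOOK = 100_000


def summarize_features(data):
    """Return (feature_count, Counter of geometry types) for parsed GeoJSON.

    Hypothetical helper (not part of openforis_whisp). Accepts a
    FeatureCollection, a single Feature, or a bare geometry dict.
    """
    if data.get("type") == "FeatureCollection":
        features = data.get("features", [])
    elif data.get("type") == "Feature":
        features = [data]
    else:
        # Treat anything else as a bare geometry object
        features = [{"type": "Feature", "geometry": data}]
    # Features with a null or missing geometry are counted under "None"
    geometry_types = Counter(
        (feature.get("geometry") or {}).get("type", "None") for feature in features
    )
    return len(features), geometry_types


def check_size(n_features, limit=MAX_FEATURES_FOR_NOTEBOOK):
    """Suggest which notebook suits a dataset of n_features."""
    if n_features > limit:
        return f"{n_features} features: consider the 'whisp_ee_asset_to_drive.ipynb' notebook"
    return f"{n_features} features: within the suggested size for this notebook"


def summarize_geojson(path):
    """Load a GeoJSON file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_features(json.load(f))
```

In this notebook you would call `n, types = summarize_geojson(GEOJSON_EXAMPLE_FILEPATH)` and then `print(check_size(n), dict(types))`.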


### Prepare inputs

In [8]:
# Choose whether to include additional custom layers
USE_CUSTOM_BANDS = False # set to True to add extra Earth Engine data to Whisp

In [9]:
# =============================================================================
# CUSTOM BANDS SETUP (OPTIONAL) - runs only if USE_CUSTOM_BANDS = True above
# =============================================================================
if USE_CUSTOM_BANDS:

    # Step 1: Define custom Earth Engine images (binary values 0 or 1). Currently showing example placeholders.
    custom_images = {
        'example_treecover': ee.Image(1),  # ee.Image("UMD/hansen/global_forest_change_2024_v1_12").select("treecover2000").gt(10).selfMask()
        'nXX_example_commodity': ee.Image(1) # ee.ImageCollection("projects/forestdatapartnership/assets/cocoa/model_2025a").filter(ee.Filter.date('2020-01-01', '2021-01-01')).mosaic().gt(.8).selfMask()
        # add more images as needed (prefix 'nXX_' for national datasets, where XX is the ISO2 code)
    }

    # Step 2: Define metadata for each custom band (keys must match above)
    # Themes: 'treecover', 'commodities', 'disturbance_before', 'disturbance_after'
    # Timber themes: 'primary', 'naturally_reg_2020', 'planted_plantation_2020', etc.
    custom_bands_info = {
        'example_treecover': {
            'ISO2_code': "",          # Country code (empty = all countries)
            'theme': 'treecover',     # Risk theme
            'theme_timber': "",       # Timber theme (if applicable)
            'use_for_risk': 1,        # Include in risk calculations (1=yes, 0=no)
            'use_for_risk_timber': 0  # Include in timber risk (1=yes, 0=no)
        },
        'nXX_example_commodity': {
            'ISO2_code': "XX",
            'theme': 'commodities',
            'theme_timber': "",
            'use_for_risk': 1,
            'use_for_risk_timber': 0
        }
        # add more band metadata as needed
    }

    # Step 3: Combine custom bands and extract names
    custom_ee_image = whisp.combine_custom_bands(custom_images, custom_bands_info)

    custom_bands = list(custom_bands_info.keys())
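It is easy to end up with mismatched keys between `custom_images` and `custom_bands_info`. The sketch below is a hypothetical pre-flight check (not part of `openforis_whisp`). It compares the two dictionaries and checks each entry against the conventions stated in the comments above: the four listed themes, 0/1 risk flags, and the `nXX_` prefix for national bands. Timber themes are only partly listed above ("etc."), so the sketch does not validate them. Only the dictionary keys of `custom_images` are used, so the `ee.Image` values are never evaluated.

```python
# Themes listed in the notebook comments; timber themes are not checked
ALLOWED_THEMES = {"treecover", "commodities", "disturbance_before", "disturbance_after"}
REQUIRED_FIELDS = ("ISO2_code", "theme", "theme_timber", "use_for_risk", "use_for_risk_timber")


def check_custom_band_config(image_keys, bands_info):
    """Return a list of problems found in a custom-band setup (empty list = OK).

    Hypothetical helper, not part of openforis_whisp.
    """
    problems = []
    image_keys = set(image_keys)
    info_keys = set(bands_info)

    # Keys must match between the images dict and the metadata dict
    for name in sorted(image_keys - info_keys):
        problems.append(f"{name}: image has no metadata entry")
    for name in sorted(info_keys - image_keys):
        problems.append(f"{name}: metadata has no matching image")

    for name, info in bands_info.items():
        missing = [field for field in REQUIRED_FIELDS if field not in info]
        if missing:
            problems.append(f"{name}: missing fields {missing}")
        theme = info.get("theme")
        if theme and theme not in ALLOWED_THEMES:
            problems.append(f"{name}: unknown theme {theme!r}")
        for flag in ("use_for_risk", "use_for_risk_timber"):
            if flag in info and info[flag] not in (0, 1):
                problems.append(f"{name}: {flag} must be 0 or 1")
        code = info.get("ISO2_code", "")
        if code and len(code) != 2:
            problems.append(f"{name}: ISO2_code should be two letters")
        # Naming convention from the comments: national bands start with 'nXX_'
        if code and not name.lower().startswith(f"n{code.lower()}_"):
            problems.append(f"{name}: expected prefix 'n{code.upper()}_'")
    return problems
```

For a consistent setup, `check_custom_band_config(custom_images.keys(), custom_bands_info)` should return an empty list.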


In [10]:
# Choose additional national datasets to include (currently four countries:
# 'co' Colombia, 'ci' Côte d'Ivoire, 'br' Brazil, 'cm' Cameroon).
base_iso2_codes = ['co', 'ci', 'br', 'cm']

# Automatically add any custom ISO2 codes from custom_bands_info if USE_CUSTOM_BANDS is True
iso2_codes_list = base_iso2_codes.copy()
if USE_CUSTOM_BANDS:
    for info in custom_bands_info.values():
        code = (info.get('ISO2_code') or '').lower()
        if code and code not in iso2_codes_list:  # skip empty and duplicate codes
            iso2_codes_list.append(code)

In [11]:
# Create final Whisp image
whisp_image = whisp.combine_datasets(national_codes=iso2_codes_list)

if USE_CUSTOM_BANDS and 'custom_ee_image' in locals():
    whisp_image = whisp_image.addBands(custom_ee_image)

print(f"Final image has {len(whisp_image.bandNames().getInfo())} bands")

Whisp multiband image compiled
Final image has 225 bands


### Run stats processing

In [12]:
df_stats = whisp.whisp_formatted_stats_geojson_to_df(
    input_geojson_filepath=GEOJSON_EXAMPLE_FILEPATH,
    # external_id_column="user_id", # optional - specify which input column/property to map to the external ID.
    national_codes=iso2_codes_list,  # optional - national datasets are only included if listed here.
    # unit_type='percent', # optional - to change unit type. Default is 'ha'.
    whisp_image=whisp_image, # optional - defaults to standard whisp image if not provided
    custom_bands=custom_bands if USE_CUSTOM_BANDS else None,  # include custom bands in formatted output
    mode="concurrent" # runs processing for multiple batches in parallel. Ideal for large datasets.
)

INFO: Mode: concurrent
INFO: Loaded 373 features
INFO: Processing 373 features in 38 batches (concurrent mode)...
Attempting to fix by stripping Z coordinates...
Successfully converted after stripping Z coordinates

Attempting to fix by stripping Z coordinates...
INFO: Progress: 4/38 batches (10%) | Elapsed: 8s | ETA: calculating...
INFO: Progress: 8/38 batches (21%) | Elapsed: 10s | ETA: 45s
INFO: Progress: 12/38 batches (31%) | Elapsed: 15s | ETA: 37s
INFO: Progress: 16/38 batches (42%) | Elapsed: 17s | ETA: 27s
INFO: Progress: 19/38 batches (50%) | Elapsed: 21s | ETA: 25s
INFO: Progress: 23/38 batches (60%) | Elapsed: 23s | ETA: 17s
INFO: Progress: 27/38 batches (71%) | Elapsed: 25s | ETA: 12s
INFO: Progress: 31/38 batches (81%) | Elapsed: 28s | ETA: 7s
INFO: Progress: 35/38 batches (92%) | Elapsed: 48s | ETA: 5s
INFO: Progress: 38/38 batches (100%) | Total time: 1.3m
INFO: Processing complete: 38/38 batches in 1.3m
INFO: Processing complete: 372 features
INFO: Concurrent processing
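The log shows Whisp trying to fix some geometries by stripping Z (elevation) coordinates. In this output, the second attempt is not followed by a success message. If you prefer to give Whisp 2D geometries from the start, the sketch below drops any coordinate values beyond x and y in plain Python. It is an illustration only, not Whisp's own repair routine, and the helper names are hypothetical.

```python
import json


def strip_z(coords):
    """Recursively keep only x, y from (nested) GeoJSON coordinate lists."""
    # A position is a flat list of numbers; anything else is a nested ring/part list
    if coords and isinstance(coords[0], (int, float)):
        return list(coords[:2])
    return [strip_z(c) for c in coords]


def strip_z_geometry(geometry):
    """Return a 2D copy of a GeoJSON geometry (the input is not modified)."""
    if geometry is None:
        return None
    if geometry.get("type") == "GeometryCollection":
        return {**geometry, "geometries": [strip_z_geometry(g) for g in geometry["geometries"]]}
    return {**geometry, "coordinates": strip_z(geometry["coordinates"])}


def strip_z_file(src_path, dst_path):
    """Write a 2D copy of a GeoJSON FeatureCollection file."""
    with open(src_path, encoding="utf-8") as f:
        data = json.load(f)
    for feature in data.get("features", []):
        feature["geometry"] = strip_z_geometry(feature.get("geometry"))
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
```

For example, `strip_z_file(GEOJSON_EXAMPLE_FILEPATH, '/content/input_2d.geojson')` writes a copy (the output path is only an example), which you could then pass as `input_geojson_filepath`.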

### Display results

In [13]:
df_stats

| | plotId | external_id | Area | Geometry_type | Country | ProducerCountry | Admin_Level_1 | Centroid_lon | Centroid_lat | Unit | ... | nBR_INPE_TCamz_cer_annual_2020 | nBR_MapBiomas_col9_soy_2020 | nBR_MapBiomas_col9_annual_crops_2020 | nBR_INPE_TCamz_pasture_2020 | nBR_INPE_TCcer_pasture_2020 | nBR_MapBiomas_col9_pasture_2020 | nCI_Cocoa_bnetd | nCM_Treecover_2020 | geo | whisp_processing_metadata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | 0.121 | Polygon | UGA | UG | Central | 32.992336 | 0.307209 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[32.99212... | {'whisp_version': '3.0.0a11', 'processing_time... |
| 1 | 2 | | 0.280 | Polygon | UGA | UG | Central | 32.912133 | 0.274172 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[32.91187... | {'whisp_version': '3.0.0a11', 'processing_time... |
| 2 | 3 | | 0.206 | Polygon | UGA | UG | Central | 32.979236 | 0.194351 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[32.97897... | {'whisp_version': '3.0.0a11', 'processing_time... |
| 3 | 4 | | 0.130 | Polygon | UGA | UG | Central | 32.958102 | 0.268466 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[32.95787... | {'whisp_version': '3.0.0a11', 'processing_time... |
| 4 | 5 | | 0.354 | Polygon | UGA | UG | Central | 32.909303 | 0.271915 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[32.90913... | {'whisp_version': '3.0.0a11', 'processing_time... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 367 | 369 | | 0.251 | Polygon | UGA | UG | Western | 30.202890 | -0.172550 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[30.20251... | {'whisp_version': '3.0.0a11', 'processing_time... |
| 368 | 370 | | 0.065 | Polygon | UGA | UG | Western | 30.176853 | -0.227416 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[30.17673... | {'whisp_version': '3.0.0a11', 'processing_time... |
| 369 | 371 | | 0.065 | Polygon | UGA | UG | Western | 30.176853 | -0.227416 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[30.17673... | {'whisp_version': '3.0.0a11', 'processing_time... |
| 370 | 372 | | 0.038 | Polygon | UGA | UG | Western | 30.231406 | -0.166830 | ha | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | {'type': 'Polygon', 'coordinates': [[[30.23129... | {'whisp_version': '3.0.0a11', 'processing_time... |
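The processing log reports 373 features loaded but 372 processed, so it is worth checking which plots are missing from `df_stats`. The following is a minimal sketch. It assumes that `plotId` numbers the input features 1, 2, 3, ... in order. That assumption is inferred from the table above, not from Whisp documentation. `count_input_features` and `find_missing_plot_ids` are hypothetical helpers.

```python
import json


def count_input_features(path):
    """Count features in a GeoJSON FeatureCollection file."""
    with open(path, encoding="utf-8") as f:
        return len(json.load(f).get("features", []))


def find_missing_plot_ids(plot_ids, expected_count):
    """Return IDs in 1..expected_count that do not appear in plot_ids.

    ASSUMPTION: plotId numbers the input features 1, 2, 3, ... in order.
    This is inferred from the displayed table, not documented Whisp behaviour.
    """
    present = {int(pid) for pid in plot_ids}
    return [pid for pid in range(1, expected_count + 1) if pid not in present]
```

Usage: `find_missing_plot_ids(df_stats['plotId'], count_input_features(GEOJSON_EXAMPLE_FILEPATH))`. If the `plotId` assumption does not hold for your data, compare `len(df_stats)` with the input feature count instead.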




### Add risk category columns

In [14]:
# Adds risk columns to the end of the DataFrame
df_w_risk = whisp.whisp_risk(
    df=df_stats,
    national_codes=iso2_codes_list,
    custom_bands_info=custom_bands_info if USE_CUSTOM_BANDS else None,  # pass custom band metadata (see 'use_for_risk' flags)
    # drop_unused_columns=True # set to True to remove stats columns not used for risk
)

Using unit type: ha
Including additional national data for: ['co', 'ci', 'br', 'cm']


### Display updated table
- Scroll to the far right to see the added indicator and risk columns.

In [15]:
df_w_risk

| | plotId | external_id | Area | Geometry_type | Country | ProducerCountry | Admin_Level_1 | Centroid_lon | Centroid_lat | Unit | ... | Ind_05_primary_2020 | Ind_06_nat_reg_forest_2020 | Ind_07_planted_plantations_2020 | Ind_08_planted_plantations_after_2020 | Ind_09_treecover_after_2020 | Ind_10_agri_after_2020 | Ind_11_logging_concession_before_2020 | risk_pcrop | risk_acrop | risk_timber |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | | 0.121 | Polygon | UGA | UG | Central | 32.992336 | 0.307209 | ha | ... | no | no | no | no | yes | yes | no | low | low | low |
| 1 | 2 | | 0.280 | Polygon | UGA | UG | Central | 32.912133 | 0.274172 | ha | ... | no | yes | no | no | yes | yes | no | low | low | low |
| 2 | 3 | | 0.206 | Polygon | UGA | UG | Central | 32.979236 | 0.194351 | ha | ... | no | yes | no | no | yes | yes | no | low | low | low |
| 3 | 4 | | 0.130 | Polygon | UGA | UG | Central | 32.958102 | 0.268466 | ha | ... | no | no | no | no | yes | yes | no | low | low | low |
| 4 | 5 | | 0.354 | Polygon | UGA | UG | Central | 32.909303 | 0.271915 | ha | ... | no | yes | no | no | yes | yes | no | low | low | low |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 367 | 369 | | 0.251 | Polygon | UGA | UG | Western | 30.202890 | -0.172550 | ha | ... | no | no | no | no | no | yes | no | low | low | low |
| 368 | 370 | | 0.065 | Polygon | UGA | UG | Western | 30.176853 | -0.227416 | ha | ... | no | yes | no | no | yes | yes | no | low | more_info_needed | high |
| 369 | 371 | | 0.065 | Polygon | UGA | UG | Western | 30.176853 | -0.227416 | ha | ... | no | yes | no | no | yes | yes | no | low | more_info_needed | high |
| 370 | 372 | | 0.038 | Polygon | UGA | UG | Western | 30.231406 | -0.166830 | ha | ... | no | yes | no | no | no | yes | no | low | low | low |
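You can get an overview of the risk columns shown above by counting plots per category and listing the plots flagged `high` or `more_info_needed`. The following sketch uses only the standard library and works on the row dictionaries returned by `df_w_risk.to_dict("records")`. The helper names are hypothetical.

```python
from collections import Counter

# Risk columns added by whisp.whisp_risk, as named in the table above
RISK_COLUMNS = ("risk_pcrop", "risk_acrop", "risk_timber")


def summarize_risk(records, columns=RISK_COLUMNS):
    """Count plots per risk category for each risk column.

    `records` is a list of row dicts, e.g. df_w_risk.to_dict("records").
    """
    return {col: Counter(row.get(col) for row in records) for col in columns}


def plots_needing_attention(records, column, categories=("high", "more_info_needed")):
    """Return the plotId of every row whose `column` value is in `categories`."""
    return [row.get("plotId") for row in records if row.get(column) in categories]
```

For example, `plots_needing_attention(df_w_risk.to_dict("records"), "risk_timber")`. To count a single column with pandas directly, `df_w_risk['risk_timber'].value_counts()` gives the same counts.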


### Export table with risk columns to CSV (temporary storage)

In [17]:
# Generate timestamp
timestamp = datetime.datetime.now().strftime("%Y_%m_%d_%H_%M")

# Add a suffix to indicate notebook pathway (e.g., '_nb' for notebook)
output_csv_filename = f"whisp_analysis_{timestamp}_nb.csv"

df_w_risk.to_csv(output_csv_filename, index=False)  # index=False omits the pandas row index

### Export table with risk columns to geojson (temporary storage)

In [18]:
# Add a suffix to indicate notebook pathway (e.g., '_nb' for notebook)
output_geojson_filename = f"whisp_analysis_{timestamp}_nb.geojson"
# Builds a GeoJSON file containing the Whisp columns.
# The geometry column "geo" is used to create the spatial features.
whisp.convert_df_to_geojson(df_w_risk, output_geojson_filename)

GeoJSON saved to whisp_analysis_2026_02_18_12_29_nb.geojson
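`whisp.convert_df_to_geojson` builds the file for you from the `geo` column. The plain-Python sketch below illustrates what such a conversion involves; it is not Whisp's implementation. It turns row dictionaries into a GeoJSON `FeatureCollection` and assumes each `geo` value is a GeoJSON geometry dict, as displayed in the tables above. The helper names are hypothetical.

```python
import json
import math


def _clean(value):
    # pandas represents missing values as float NaN, which is not valid JSON
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def records_to_feature_collection(records, geometry_key="geo"):
    """Build a GeoJSON FeatureCollection from row dicts.

    The `geometry_key` value of each row (a GeoJSON geometry dict) becomes the
    feature geometry; all other values become feature properties.
    """
    features = []
    for row in records:
        properties = {k: _clean(v) for k, v in row.items() if k != geometry_key}
        features.append({"type": "Feature", "geometry": row.get(geometry_key), "properties": properties})
    return {"type": "FeatureCollection", "features": features}


def write_geojson(records, path, geometry_key="geo"):
    """Write the FeatureCollection to disk."""
    fc = records_to_feature_collection(records, geometry_key)
    with open(path, "w", encoding="utf-8") as f:
        # default=str converts values json cannot serialize into strings
        json.dump(fc, f, default=str)
```

For example, to export only the plots with high timber risk, use `write_geojson(df_w_risk[df_w_risk['risk_timber'] == 'high'].to_dict('records'), 'high_timber_risk.geojson')`. The filename is only an example.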


### Download outputs to local storage
- Saves files to your browser's download location (usually the "Downloads" folder on your machine).
- If the browser shows a "Downloads blocked" prompt at the top, click it to allow file downloads.
- Alternatively, right-click the file in the file panel on the left and choose 'Download'.

In [19]:
from google.colab import files
files.download(output_csv_filename)

In [20]:
files.download(output_geojson_filename) # spatial output
