# Demonstration of EPISTEM-x Module 
This notebook contains the source-code implementation of each module in the EPISTEM land cover mapping framework.

## Library import and Earth Engine initialization
If you have an Earth Engine account, you can use it to authenticate and initialize Earth Engine. If you do not have an account, service account initialization is available.
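
Before attempting service-account initialization, it can help to sanity-check the key file, because a malformed key may only surface later as an opaque error such as the `invalid_grant` message in the output further down. The sketch below is a hypothetical helper (not part of `epistemx`). It assumes the usual Google service-account JSON layout with `type`, `client_email`, and `private_key` fields, and it only checks the file's structure; it cannot tell whether the key is still valid on Google's side.

```python
import json
from pathlib import Path

# Fields assumed to be present in a Google service-account JSON key.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        key = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

An empty list means the file looks structurally sound; any other result lists what to fix before calling the initialization routine.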

In [1]:
#Use this line if the notebook is run in GitHub Codespaces: remove the leading (#) to install the package
#!python -m pip install ../epistemx --quiet

In [1]:
import ee 
import epistemx

#Option 1: Manual authentication using a personal account
#Instructions for manual authentication
epistemx.print_auth_instructions()
#uncomment the line below and follow the Earth Engine authentication process
#epistemx.authenticate_manually()

#Option 2: Authenticate using a service account (JSON key file)
service_account_path = '../auth/ee-rg2icraf-ecab9c534f91.json'
success = epistemx.initialize_with_service_account(service_account_path)

if success:
    print("Earth Engine initialized with service account successfully!")
else:
    print("Service account initialization failed. Try to authenticate earth engine manually")

#Check authentication status
status = epistemx.get_auth_status()
print(f"Initialized: {status['initialized']}")
print(f"Authenticated: {status['authenticated']}")
if status['project']:
    print(f"Project: {status['project']}")

Earth Engine initialized successfully

    EARTH ENGINE AUTHENTICATION NOTES:
    
    1. Make sure you already have a google cloud project that has enable the Earth Engine API and registered to 
       commercial or non-commercial use. For more information visit: https://developers.google.com/earth-engine/guides/access 
    
    2. you can authenticate programmatically by calling: from epistemx.ee_config import authenticate_manually
       authenticate_manually()
    
    3. This will open a web browser. Sign in with your Google account that has Earth Engine access.
    
    4. Copy the authorization code from the browser and paste it in the terminal.
    
    
    For more details, visit: https://developers.google.com/earth-engine/guides/python_install
    


Service account initialization failed: ('invalid_grant: Invalid JWT Signature.', {'error': 'invalid_grant', 'error_description': 'Invalid JWT Signature.'})


Service account initialization failed. Try to authenticate earth engine manually
Initialized: True
Authenticated: True
Project: projects/ee-agilakbarfahrezi/assets/AOICImanukhulu


In [2]:
#!python -m pip install ../epistemx --quiet
import geemap
from epistemx.module_1 import Reflectance_Data, Reflectance_Stats
from epistemx.helpers import get_aoi_from_gaul

## Module 1: Acquisition of Near-Cloud-Free Satellite Imagery

### System Response 1.1: Area of Interest Definition

In [3]:
#Set the country and province for the AOI using GAUL admin boundaries
aoi = get_aoi_from_gaul(country="Indonesia", province="Sumatera Selatan")
#Alternatively, use geemap.shp_to_ee to load a shapefile directly from your local machine

### System Response 1.2: Search and Filter Imagery
The EPISTEM source code supports Landsat mission data from Landsat 1 to Landsat 9. For Landsat 1-3, the available data is corrected radiance reflectance. The Landsat 5-9 data used here is Collection 2 surface reflectance (SR) analysis-ready data.

The retrieval logic used here is as follows:
1. Retrieve the multispectral bands (bands 1-7) from Landsat Collection 2 SR data (if available)
2. Retrieve the thermal band from Landsat Collection 2 TOA data
3. Create a temporal composite for each dataset
4. Stack the two composites into a single Earth Engine image (`ee.Image`)
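
To make the compositing step concrete, the toy sketch below contrasts two compositing strategies on a single pixel's time series, in plain Python rather than Earth Engine. It assumes masked (cloudy) observations are represented as `None`; the function names are illustrative and not part of `epistemx`. A mosaic keeps the last valid observation in collection order, whereas a median composite takes the middle of all valid observations, which tends to suppress residual cloud outliers.

```python
from statistics import median


def mosaic_pixel(series):
    """Return the last unmasked value in collection order, or None."""
    for value in reversed(series):
        if value is not None:
            return value
    return None


def median_pixel(series):
    """Return the median of the unmasked values, or None if all are masked."""
    valid = [value for value in series if value is not None]
    return median(valid) if valid else None


# One pixel observed five times; None marks a cloud-masked observation,
# and 0.45 stands in for a bright cloud remnant that escaped the mask.
red_series = [0.05, None, 0.06, 0.04, 0.45]

print(mosaic_pixel(red_series))  # last valid value, here the cloud remnant
print(median_pixel(red_series))  # middle of the valid values
```

In this toy series the mosaic picks up the contaminated last observation, while the median stays close to the clear-sky values.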

In [4]:
#========== FIRST RETRIEVE THE MULTISPECTRAL BANDS ===========
#Initialize the reflectance data class
optical_reflectance = Reflectance_Data()
#define the start and end date for imagery collection
start = '2017-01-01'
end = '2017-12-31'
#get the image collection and corresponding statistics
landsat_data, meta = optical_reflectance.get_optical_data(aoi, start, end, optical_data='L8_SR', 
                                                           cloud_cover=40, compute_detailed_stats=False)
#mosaic the image collection and clip it to the AOI
mosaic_landsat = landsat_data.mosaic().clip(aoi)
#Alternatively, use temporal aggregation (an ee reducer such as median) to create a more cloud-free composite
median_landsat = landsat_data.median().clip(aoi)
#visualization parameter
l8_sr_visparam = {'min': 0,'max': 0.4,'gamma': [0.95, 1.1, 1],'bands':['NIR', 'RED', 'GREEN']}
#Add the data to the map
Map = geemap.Map()
Map.addLayer(mosaic_landsat, l8_sr_visparam, 'L8 SR Mosaic')
Map.addLayer(median_landsat, l8_sr_visparam, 'L8 SR Median')
Map.addLayer(landsat_data, l8_sr_visparam, 'L8 SR Image Collection')
# set center of the map in the area of interest
Map.centerObject(aoi, 7)

2025-10-28 00:26:45,152 - Reflectance_Data - INFO - ReflectanceData initialized.
2025-10-28 00:26:45,154 - Reflectance_Data - INFO - Starting data fetch for Landsat 8 Operational Land Imager Surface Reflectance
2025-10-28 00:26:45,155 - Reflectance_Data - INFO - Date range: 2017-01-01 to 2017-12-31
2025-10-28 00:26:45,156 - Reflectance_Data - INFO - Cloud cover threshold: 40%
2025-10-28 00:26:45,157 - Reflectance_Data - INFO - detailed statistics will not be computed
2025-10-28 00:26:45,158 - Reflectance_Stats - INFO - Reflectance Stats initialized.
2025-10-28 00:26:45,159 - Reflectance_Data - INFO - Filtered collection created (use compute_detailed_stats=True for more information)


In [5]:
#retrieve thermal bands from TOA
thermal_bands, thermal_stats = optical_reflectance.get_thermal_bands(aoi, start, end, cloud_cover=40, compute_detailed_stats=False)
median_thermal = thermal_bands.median().clip(aoi)
thermal_vis = {
    'min': 286,
    'max': 300,
    'gamma': 0.4
}
#stack all Landsat bands
stacked_landsat = median_landsat.addBands(median_thermal)
#visualize the thermal bands and multispectral bands
Map.addLayer(median_thermal, thermal_vis, "Thermal Bands")
Map

2025-10-28 00:26:52,457 - Reflectance_Stats - INFO - Reflectance Stats initialized.
2025-10-28 00:26:52,458 - Reflectance_Data - INFO - Starting thermal data fetch for Landsat 8 Top-of-atmosphere reflectance
2025-10-28 00:26:52,460 - Reflectance_Data - INFO - Date range: 2017-01-01 to 2017-12-31
2025-10-28 00:26:52,461 - Reflectance_Data - INFO - Cloud cover threshold: 40%
2025-10-28 00:26:52,462 - Reflectance_Data - INFO - Fast mode enabled - detailed statistics will not be computed
2025-10-28 00:26:52,464 - Reflectance_Data - INFO - Filtered collection created (use compute_detailed_stats=True for detailed info)


Map(center=[-3.2210694545062024, 104.16355582426586], controls=(WidgetControl(options=['position', 'transparen…

### Image retrieval report (optional)

In [6]:
#initialize the statistics class
stats = Reflectance_Stats()
#get the retrieval report and print it automatically
retrival_report = stats.get_collection_statistics(landsat_data, print_report=True)

2025-10-28 00:26:54,501 - Reflectance_Stats - INFO - Reflectance Stats initialized.


           Landsat Data Collection Retrival Report
Total Images Found: 53
Date Range: 2017-01-13 to 2017-12-17
Unique WRS Tiles: 10

Scene Cloud Cover Statistics:
------------------------------
Average Cloud Cover: 26.8%
Minimum Cloud Cover: 2.9%
Maximum Cloud Cover: 40.0%

WRS Path/Row Tiles:
------------------------------
Path 123/Row 062
Path 123/Row 063
Path 124/Row 061
Path 124/Row 062
Path 124/Row 063
Path 124/Row 064
Path 125/Row 061
Path 125/Row 062
Path 125/Row 063
Path 126/Row 062

Available Acqusition Date:
------------------------------
Date range: 2017-01-13 to 2017-12-17
(53 total acquisition dates)

Scene IDs (first 10):
------------------------------
• LC08_123062_20170405
• LC08_123062_20170421
• LC08_123062_20170726
• LC08_123062_20170912
• LC08_123062_20171014
• LC08_123062_20171217
• LC08_123063_20170710
• LC08_123063_20170827
• LC08_123063_20170912
• LC08_123063_20171030
... and 43 more scenes
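
The scene IDs in the report encode the sensor, the WRS path/row, and the acquisition date, so the tile and date summaries can be cross-checked offline. The sketch below assumes the `LC08_PPPRRR_YYYYMMDD` layout seen in the report above; `parse_scene_id` is an illustrative helper, not an `epistemx` function.

```python
from datetime import date, datetime
from typing import NamedTuple


class Scene(NamedTuple):
    sensor: str
    path: int
    row: int
    acquired: date


def parse_scene_id(scene_id):
    """Split an ID such as 'LC08_123062_20170405' into its components."""
    sensor, path_row, day = scene_id.split("_")
    if len(path_row) != 6:
        raise ValueError(f"unexpected path/row block in {scene_id!r}")
    acquired = datetime.strptime(day, "%Y%m%d").date()
    return Scene(sensor, int(path_row[:3]), int(path_row[3:]), acquired)


scene_ids = ["LC08_123062_20170405", "LC08_123062_20170421", "LC08_123063_20170710"]
scenes = [parse_scene_id(s) for s in scene_ids]

# Unique WRS tiles, formatted like the report ("Path 123/Row 062").
tiles = sorted({(s.path, s.row) for s in scenes})
for path, row in tiles:
    print(f"Path {path:03d}/Row {row:03d}")
```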



### System Response 1.3: Imagery Download

In [None]:
import time

export_task = ee.batch.Export.image.toDrive(
    image=stacked_landsat,
    description='Landsat_Median_composite_2017_Sumsel',
    folder='Earth Engine',
    fileNamePrefix='Landsat_Median_composite_2017_Sumsel',
    scale=30,
    region=aoi,  # or aoi.geometry()
    maxPixels=1e13
)
export_task.start()

#poll the task every 10 seconds until it is no longer active
while export_task.active():
    print('Exporting... (status: {})'.format(export_task.status()['state']))
    time.sleep(10)

#the task may have ended in a state other than COMPLETED, so report the final state
print('Export finished (status: {})'.format(export_task.status()['state']))
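
A polling loop like the one above can be made more defensive by adding a timeout and by distinguishing a completed export from a failed or cancelled one. The sketch below is library-agnostic: `get_state` is a hypothetical callable standing in for `lambda: export_task.status()['state']`, and the state names are assumed to follow Earth Engine's task states (`READY`, `RUNNING`, `COMPLETED`, `FAILED`, `CANCELLED`).

```python
import time

# Assumed terminal states of an Earth Engine batch task.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_task(get_state, poll_seconds=10.0, timeout_seconds=3600.0):
    """Poll get_state() until a terminal state is reached or the timeout expires.

    Returns the final state string; raises TimeoutError if none is reached in time.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {state} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Stand-in for a real task: reports READY, RUNNING, then COMPLETED.
fake_states = iter(["READY", "RUNNING", "COMPLETED"])
final_state = wait_for_task(lambda: next(fake_states), poll_seconds=0.01)
print(f"Export finished with state: {final_state}")
```

Returning the final state lets the caller decide what to do when the export did not complete, rather than assuming success once the loop ends.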

## Module 2: Land-Cover Classification Scheme
Three approaches are provided for defining the classification scheme:
1. Upload a CSV file
2. Enter the classification scheme manually
3. Use the default classification scheme (RESTORE+ project)

### Import the module

In [7]:
from epistemx.module_2 import LULC_Scheme_Manager
#Initialize the LULC Scheme Manager
manager = LULC_Scheme_Manager()
print("Land Cover Classification Scheme Manager initialized!")
print(f"Current class count: {manager.get_class_count()}")
#Temporary helper to display the current classification scheme in the notebook
def display_classification_scheme(manager):
    """Display the current classification scheme in a readable format"""
    if not manager.has_classes():
        print("No classes defined yet.")
        return
    
    print("\n=== Current Classification Scheme ===")
    df = manager.get_dataframe()
    print(df.to_string(index=False))
    
    return df

# Display the scheme
df = display_classification_scheme(manager)

Land Cover Classification Scheme Manager initialized!
Current class count: 0
No classes defined yet.


### System Response 2.1a: Upload Classification Scheme

In [8]:
import pandas as pd
#Reset manager for CSV upload example
manager = LULC_Scheme_Manager()
#path to csv 
csv_path = "../test_data/Example_Classification_scheme.csv"

print("=== CSV Upload Process ===")

# Load the CSV
# utf-8-sig strips a leading byte-order mark (BOM) from the first header
df = pd.read_csv(csv_path, sep=None, engine="python", encoding="utf-8-sig")
print("Loaded CSV:")
print(df)

# Auto-detect columns
id_col, name_col, color_col = manager.auto_detect_csv_columns(df)
print("\nAuto-detected columns:")
print(f"ID column: {id_col}")
print(f"Name column: {name_col}")
print(f"Color column: {color_col}")

=== CSV Upload Process ===
Loaded CSV:
    No                       Class_name  Class_id
0     1               Rubber monoculture        15
1     2                       Settlement        16
2     3             Oil palm monoculture        11
3     4                       Water Body        19
4     5                Acacia plantation         1
5     6                       Rice Field        13
6     7                     Mixed Garden        10
7     8                Rubber agroforest        14
8     9   Logged Over Forest Low Density         8
9    10                            Shrub        17
10   11                            Grass         6
11   12         Logged over swamp forest         9
12   13  Logged over forest-high density         7
13   14                Coffee Agroforest         5
14   15              Coconut monoculture         4
15   16                      Other Crops        12
16   17                     Cleared Land         3
17   18                   Tea plantation   

In [11]:
# Process CSV upload
success, message = manager.process_csv_upload(df, id_col, name_col, color_col)
if success:
    print(f"✅ {message}")
    
    # Finalize the upload
    success, message = manager.finalize_csv_upload()
    if success:
        print(f"✅ {message}")
    else:
        print(f"❌ {message}")
else:
    print(f"❌ {message}")

# Display the loaded scheme
display_classification_scheme(manager)

✅ Successfully loaded 19 classes from CSV with auto-generated colors
✅ Classification scheme created with 19 classes

=== Current Classification Scheme ===
 ID                Land Cover Class Color Palette
  1               Acacia plantation       #FFEAA7
  2                            Cane       #AED6F1
  3                    Cleared Land       #F9E79F
  4             Coconut monoculture       #D7BDE2
  5               Coffee Agroforest       #85C1E9
  6                           Grass       #F8C471
  7 Logged over forest-high density       #F1948A
  8  Logged Over Forest Low Density       #BB8FCE
  9        Logged over swamp forest       #82E0AA
 10                    Mixed Garden       #98D8C8
 11            Oil palm monoculture       #45B7D1
 12                     Other Crops       #A3E4D7
 13                      Rice Field       #DDA0DD
 14               Rubber agroforest       #F7DC6F
 15              Rubber monoculture       #FF6B6B
 16                      Settlement       #4



### System Response 2.1b: Manual Scheme Definition

In [None]:
#Reset manager for manual input example
manager = LULC_Scheme_Manager()
#Manually add the class
print("=== Manual Class Addition ===")

#Example of class to add
classes_to_add = [
    (1, "Hutan Lahan Kering", "#0E6D0E"),
    (2, "Pertanian Lahan Kering", "#E8F800"),
    (3, "Permukiman", "#F81D00"),
    (4, "Badan Air", "#1512F3"),
    (5, "Pertanian Lahan Basah", "#")
]

for class_id, class_name, color_code in classes_to_add:
    success, message = manager.add_class(class_id, class_name, color_code)
    if success:
        print(f"✅ {message}")
    else:
        print(f"❌ {message}")

print(f"\nTotal classes: {manager.get_class_count()}")

=== Manual Class Addition ===
✅ Class 'Hutan Lahan Kering' (ID: 1) added successfully!
✅ Class 'Pertanian Lahan Kering' (ID: 2) added successfully!
✅ Class 'Permukiman' (ID: 3) added successfully!
✅ Class 'Badan Air' (ID: 4) added successfully!
✅ Class 'Pertanian Lahan Basah' (ID: 5) added successfully!

Total classes: 5
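
Note that the last entry above is accepted even though its color code is a bare `#`. A small pre-check can catch such entries before they reach the scheme manager. The helper below is hypothetical (not part of `epistemx`) and assumes colors are expected as six-digit hex strings like `#0E6D0E`.

```python
import re

# Six hexadecimal digits after a leading '#', e.g. "#0E6D0E".
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def is_valid_hex_color(code):
    """Return True if code is a six-digit hex color such as '#0E6D0E'."""
    return isinstance(code, str) and HEX_COLOR.fullmatch(code) is not None


classes_to_check = [
    (1, "Hutan Lahan Kering", "#0E6D0E"),
    (5, "Pertanian Lahan Basah", "#"),
]

# Collect the (ID, name) pairs whose color code fails the check.
invalid = [(cid, name) for cid, name, color in classes_to_check
           if not is_valid_hex_color(color)]
print(invalid)
```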


In [None]:
# Example: Edit an existing class
print("=== Editing a Class ===")

# Edit the first class (index 0)
class_to_edit = manager.edit_class(0)
if class_to_edit:
    print(f"Editing class: {class_to_edit}")
    
    # Update the class with new information
    success, message = manager.add_class(1, "HUtan Lahan Rendah", "#004D00")
    if success:
        print(f"✅ {message}")
    else:
        print(f"❌ {message}")

# Display updated scheme
display_classification_scheme(manager)

=== Editing a Class ===
Editing class: {'ID': 1, 'Class Name': 'Hutan Lahan Kering', 'Color Code': '#0E6D0E'}
✅ Class 'HUtan Lahan Rendah' (ID: 1) updated successfully!

=== Current Classification Scheme ===
 ID       Land Cover Class Color Palette
  1     HUtan Lahan Rendah       #004D00
  2 Pertanian Lahan Kering       #E8F800
  3             Permukiman       #F81D00
  4              Badan Air       #1512F3
  5  Pertanian Lahan Basah             #




### System Response 2.1c: Template Classification Scheme

In [None]:
# Reset manager for default scheme example
manager = LULC_Scheme_Manager()

print("=== Available Default Schemes ===")
default_schemes = manager.get_default_schemes()

for scheme_name, classes in default_schemes.items():
    print(f"\n{scheme_name}: {len(classes)} classes")
    for class_data in classes:
        print(f"  - ID {class_data['ID']}: {class_data['Class Name']} ({class_data['Color Code']})")

=== Available Default Schemes ===

RESTORE+ Project: 10 classes
  - ID 1: Natural Forest (#0E6D0E)
  - ID 2: Agroforestry (#F08306)
  - ID 3: Monoculture Plantation (#38E638)
  - ID 4: Grassland or Savanna (#80DD80)
  - ID 5: Shrub (#5F972A)
  - ID 6: Paddy Field (#777907)
  - ID 7: Cropland (Palawija, Horticulture) (#E8F800)
  - ID 8: Settlement (#F81D00)
  - ID 9: Cleared Land (#E9B970)
  - ID 10: Waterbody (#1512F3)


In [10]:
# Load the RESTORE+ default scheme
scheme_name = "RESTORE+ Project"
success, message = manager.load_default_scheme(scheme_name)

if success:
    print(f"✅ {message}")
else:
    print(f"❌ {message}")

# Display the loaded scheme
display_classification_scheme(manager)

✅ Loaded RESTORE+ Project with 10 classes

=== Current Classification Scheme ===
 ID                  Land Cover Class Color Palette
  1                    Natural Forest       #0E6D0E
  2                      Agroforestry       #F08306
  3            Monoculture Plantation       #38E638
  4              Grassland or Savanna       #80DD80
  5                             Shrub       #5F972A
  6                       Paddy Field       #777907
  7 Cropland (Palawija, Horticulture)       #E8F800
  8                        Settlement       #F81D00
  9                      Cleared Land       #E9B970
 10                         Waterbody       #1512F3




### System Response 2.2: Download Classification Scheme

In [12]:
print("=== Export Classification Scheme ===")
#Convert the selected classification scheme to a DataFrame
classification_df = manager.get_dataframe()
print("Classification DataFrame:")
print(classification_df)
#Save the file
output_path = '../Selected_LC_Classification_Scheme.csv'
classification_df.to_csv(output_path, index=False)
print(f"\n✅ Classification scheme saved to: {output_path}")

=== Export Classification Scheme ===
Classification DataFrame:
    ID                 Land Cover Class Color Palette
0    1                Acacia plantation       #FFEAA7
1    2                             Cane       #AED6F1
2    3                     Cleared Land       #F9E79F
3    4              Coconut monoculture       #D7BDE2
4    5                Coffee Agroforest       #85C1E9
5    6                            Grass       #F8C471
6    7  Logged over forest-high density       #F1948A
7    8   Logged Over Forest Low Density       #BB8FCE
8    9         Logged over swamp forest       #82E0AA
9   10                     Mixed Garden       #98D8C8
10  11             Oil palm monoculture       #45B7D1
11  12                      Other Crops       #A3E4D7
12  13                       Rice Field       #DDA0DD
13  14                Rubber agroforest       #F7DC6F
14  15               Rubber monoculture       #FF6B6B
15  16                       Settlement       #4ECDC4
16  17             

## Module 3: Generate Region of Interest
The EPISTEM platform supports three methods for generating ROIs:
1. **Upload Training Data** - Upload your own shapefile
2. **On-screen Sampling** - Create samples using interactive map
3. **Default Reference Data** - Use Epistem's default training data

### Library Import and Setup

In [13]:
#import the library
import pandas as pd
import geopandas as gpd

#import module source code
from epistemx.module_3 import InputCheck, SyncTrainData, SplitTrainData, LULCSamplingTool
from epistemx.ee_config import initialize_earth_engine

In [17]:
collection_method = 'upload' #can be changed to 'sampling' or 'reference'

print(f"Selected method: {collection_method}")

if collection_method == 'upload':
    print("Upload your shapefile by specifying file path")
elif collection_method == 'sampling':
    print("create training samples using onscreen sampling")
elif collection_method == 'reference':
    print("📚 use default Epistem reference data")
else:
    print("❌ Invalid method. Please choose 'upload', 'sampling', or 'reference'")

Selected method: upload
Upload your shapefile by specifying file path


### System Response 3.1: Prerequisite Check

In [18]:
print("=== Checking Prerequisites ===")
#Load from previous module
#From Module 1 - AOI data
try:
    AOI = aoi
    print("✅ AOI from Module 1 is available")
    aoi_available = True
except NameError:
    print("❌ AOI data not available, please run Module 1 first")
    aoi_available = False

#From Module 2 - Classification scheme
try:
    # Reuse the classification scheme DataFrame exported in Module 2
    LULCTable = classification_df
    print("✅ Classification scheme from Module 2 is available")
    print(f"   - Number of classes: {len(LULCTable)}")
    scheme_available = True
except NameError:
    print("❌ Classification scheme not available, please run Module 2 first")
    scheme_available = False

if aoi_available and scheme_available:
    print("\n✅ All prerequisites met! You can proceed with training data collection.")
else:
    print("\n❌ Prerequisites not met. Please complete previous modules first.")

=== Checking Prerequisites ===
✅ AOI from Module 1 is available
✅ Classification scheme from Module 2 is available
   - Number of classes: 19

✅ All prerequisites met! You can proceed with training data collection.


### System Response 3.2: ROI Upload and Content Verification

In [24]:
if collection_method == 'upload':
    # Specify path to your training data shapefile
    training_shp_path = '../test_data/Evaluation_Sumsel_data.shp'  # Update this path
    # Specify the field name that contains class information
    TrainField = 'LULC_ID'
    try:
        print("Loading training data from shapefile...")
        # Load and process training data
        TrainDataDict = SyncTrainData.LoadTrainData(
            landcover_df=LULCTable,
            aoi_geometry=AOI,
            training_shp_path=training_shp_path
        )
        #Set the field that contains the LULC class ID
        TrainDataDict = SyncTrainData.SetClassField(TrainDataDict, TrainField)
        
        #Validating the class
        TrainDataDict = SyncTrainData.ValidClass(TrainDataDict, class_col_index=1)
        
        #Check sufficiency
        TrainDataDict = SyncTrainData.CheckSufficiency(TrainDataDict, min_samples=20)
        
        #Make sure the ROI is within the AOI
        TrainDataDict = SyncTrainData.FilterTrainAoi(TrainDataDict)
        
        # Create summary table
        table_df, total_samples, insufficient_df = SyncTrainData.TrainDataRaw(
            training_data=TrainDataDict.get('training_data'),
            landcover_df=TrainDataDict.get('landcover_df'),
            class_field=TrainDataDict.get('class_field')
        )
        
        print("✅ Training data loaded and processed successfully!")
        print(f"Total samples: {total_samples}")
        
        # Display summary table
        display(table_df)
        
        # Store final training data
        TrainDataFinal = TrainDataDict.get('training_data')
    except Exception as e:
        print(f"❌ Error loading training data: {e}")
        print("Please check your file path and field name.")
else:
    print("Skipping upload method...")

Skipping upload method...


In [22]:
collection_method = 'sampling'
if collection_method == 'sampling':
    print("Initializing Interactive Sampling Tool...")
    
    # Create sampling tool
    sampling_tool = LULCSamplingTool(
        lulc_dataframe=LULCTable,
        aoi_ee_featurecollection=AOI
    )
    
    print("✅ Sampling tool initialized!")
    print("\nInstructions:")
    print("1. Select a class from the dropdown")
    print("2. Click 'Set Active Class'")
    print("3. Click on the map to add points")
    print("4. Use 'Update Data' to refresh the summary")
    print("5. Use 'Export to Shapefile' when done")
    
else:
    print("Skipping interactive sampling method...")

Initializing Interactive Sampling Tool...


HTML(value='\n        <style>\n        .jupyter-widgets.widget-container.widget-box.widget-vbox .jupyter-widge…

✅ Sampling tool initialized!

Instructions:
1. Select a class from the dropdown
2. Click 'Set Active Class'
3. Click on the map to add points
4. Use 'Update Data' to refresh the summary
5. Use 'Export to Shapefile' when done


In [23]:
if collection_method == 'sampling':
    # After sampling, get the training data
    print("Getting training data from sampling tool...")
    
    if len(sampling_tool.training_data) > 0:
        # Convert to GeoDataFrame
        from shapely.geometry import Point
        
        geometries = [Point(item['longitude'], item['latitude']) for item in sampling_tool.training_data]
        
        training_points = []
        for item in sampling_tool.training_data:
            training_points.append({
                'kelas': item['class_id'],
                'LULC_Type': item['class_type'],
                'latitude': item['latitude'],
                'longitude': item['longitude']
            })
        
        TrainDataFinal = gpd.GeoDataFrame(training_points, geometry=geometries, crs='EPSG:4326')
        
        print(f"✅ Collected {len(TrainDataFinal)} training samples!")
        
        # Display summary
        summary = TrainDataFinal.groupby('LULC_Type').size().reset_index(name='Count')
        print("\nSampling Summary:")
        display(summary)
        
    else:
        print("❌ No training data collected. Please use the sampling tool above.")
        TrainDataFinal = None
else:
    print("Interactive sampling not selected.")

Getting training data from sampling tool...
❌ No training data collected. Please use the sampling tool above.
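
The same sample records can also be serialized without `geopandas`, for example to archive the points alongside the notebook. The sketch below assumes each record carries the `class_id`, `class_type`, `latitude`, and `longitude` keys used in the cell above and converts them to a GeoJSON FeatureCollection; `points_to_geojson` and the sample record are illustrative only.

```python
import json


def points_to_geojson(records):
    """Build a GeoJSON FeatureCollection (lon/lat order) from sample records."""
    features = []
    for item in records:
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point",
                         "coordinates": [item["longitude"], item["latitude"]]},
            "properties": {"kelas": item["class_id"],
                           "LULC_Type": item["class_type"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample record for illustration only.
sample_records = [
    {"class_id": 8, "class_type": "Settlement", "latitude": -3.0, "longitude": 104.7},
]
collection = points_to_geojson(sample_records)
print(json.dumps(collection, indent=2))
```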


In [26]:
collection_method = 'reference'
if collection_method == 'reference':
    print("Loading default reference training data...")
    
    # Default training data path
    TrainEePath = 'projects/ee-rg2icraf/assets/Indonesia_lulc_Sample'
    TrainField = 'kelas'
    
    try:
        print("Loading reference training data from Earth Engine...")
        
        # Load training data
        TrainDataDict = SyncTrainData.LoadTrainData(
            landcover_df=LULCTable,
            aoi_geometry=AOI,
            training_ee_path=TrainEePath
        )
        
        print("Processing and validating reference data...")
        
        # Set class field
        TrainDataDict = SyncTrainData.SetClassField(TrainDataDict, TrainField)
        
        # Validate classes
        TrainDataDict = SyncTrainData.ValidClass(TrainDataDict, class_col_index=0)
        
        # Check sufficiency
        TrainDataDict = SyncTrainData.CheckSufficiency(TrainDataDict, min_samples=20)
        
        # Filter by AOI
        TrainDataDict = SyncTrainData.FilterTrainAoi(TrainDataDict)
        
        # Create summary table
        table_df, total_samples, insufficient_df = SyncTrainData.TrainDataRaw(
            training_data=TrainDataDict.get('training_data'),
            landcover_df=TrainDataDict.get('landcover_df'),
            class_field=TrainDataDict.get('class_field')
        )
        
        print("✅ Reference training data loaded and processed successfully!")
        print(f"Total samples: {total_samples}")
        
        # Display summary table
        display(table_df)
        
        # Store final training data
        TrainDataFinal = TrainDataDict.get('training_data')
        
        # Show validation results
        vr = TrainDataDict.get('validation_results', {})
        print(f"\nValidation Results:")
        print(f"- Total points loaded: {vr.get('total_points', 'N/A')}")
        print(f"- Points after class filter: {vr.get('points_after_class_filter', 'N/A')}")
        print(f"- Valid points (within AOI): {vr.get('valid_points', 'N/A')}")
        print(f"- Invalid classes: {len(vr.get('invalid_classes', []))}")
        
    except Exception as e:
        print(f"❌ Error loading reference data: {e}")
        TrainDataFinal = None
        
else:
    print("Reference data method not selected.")

Loading default reference training data...
Loading reference training data from Earth Engine...
[INFO] Large FeatureCollection detected (39003 inside AOI)
Processing and validating reference data...
✅ Reference training data loaded and processed successfully!
Total samples: 39003


   ID                       LULC_class  Sample_Count  Percentage      Status
0   1                Acacia plantation           916    2.348537  Sufficient
1   2                             Cane           684    1.753711  Sufficient
2   3                     Cleared Land          9467   24.272492  Sufficient
3   4              Coconut monoculture          1089    2.792093  Sufficient
4   5                Coffee Agroforest             0    0.000000  No Samples
5   6                            Grass           374    0.958901  Sufficient
6   7  Logged over forest-high density          1879    4.817578  Sufficient
7   8   Logged Over Forest Low Density          2057    5.273953  Sufficient
8   9         Logged over swamp forest          4690   12.024716  Sufficient
9  10                     Mixed Garden           637    1.633208  Sufficient



Validation Results:
- Total points loaded: 39003
- Points after class filter: 39003
- Valid points (within AOI): 39003
- Invalid classes: 0
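
The per-class summary above can be reproduced in spirit with a few lines of plain Python. The sketch below is not the `SyncTrainData` implementation. It assumes a class is flagged `Sufficient` when it has at least `min_samples` points, `No Samples` when it has none, and `Insufficient` otherwise; that last label does not appear in the output above and is an assumption.

```python
from collections import Counter


def summarize_samples(class_ids, scheme, min_samples=20):
    """Count samples per class and flag classes below min_samples.

    class_ids: iterable of class IDs, one per training point.
    scheme: dict mapping class ID -> class name.
    """
    counts = Counter(class_ids)
    # Only points whose class is in the scheme count toward the total.
    total = sum(counts[class_id] for class_id in scheme)
    rows = []
    for class_id, name in sorted(scheme.items()):
        n = counts[class_id]
        if n == 0:
            status = "No Samples"
        elif n < min_samples:
            status = "Insufficient"
        else:
            status = "Sufficient"
        share = 100.0 * n / total if total else 0.0
        rows.append({"ID": class_id, "LULC_class": name, "Sample_Count": n,
                     "Percentage": share, "Status": status})
    return rows


# Toy data for illustration only.
scheme = {1: "Acacia plantation", 5: "Coffee Agroforest", 6: "Grass"}
points = [1] * 30 + [6] * 10
for row in summarize_samples(points, scheme):
    print(row)
```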
