<font color="green"><h2><b>Welcome to ABT/HYD 182</b></h2></font>


 ## **Lab 9**: GeoAI – Instance Segmentation (Mask R-CNN)

### **Due Date: March 11, 2026 | 11:59 PM**
------------------------------------------------------------------------------

## Academic Integrity Statement

**This work was completed without the use of Generative AI tools (such as ChatGPT, Copilot, etc.).**

By completing the information below, you certify that you have completed this assignment independently and without the assistance of generative AI tools. This lab is designed to help you learn GeoAI and instance segmentation through hands-on practice.

---

**Note:** If you did use a generative AI tool, you must clearly disclose this in your notebook, including which tool you used and how you used it (e.g., debugging, understanding error messages, or clarifying concepts). Failure to disclose the use of generative AI tools may be considered a violation of course academic integrity policies.

In [None]:
# Enter your information below
your_name = ""  # Replace with your full name
date = ""       # Replace with today's date (e.g., "March 5, 2026")

# Print your statement
print("Academic Integrity Statement")
print("=" * 50)
print(f"Name: {your_name}")
print(f"Date: {date}")
print("=" * 50)
print("I certify that this work was completed without the use of Generative AI tools.")

## **Table of Contents**

1. [Exercise 1](#section1) – Install packages & import libraries  
2. [Exercise 2](#section2) – Download sample data  
3. [Exercise 3](#section3) – Visualize sample data  
4. [Exercise 4](#section4) – Create training data  
5. [Exercise 5](#section5) – Train instance segmentation model  
6. [Exercise 6](#section6) – Run inference  
7. [Exercise 7](#section7) – Vectorize masks & add geometric properties  
8. [Exercise 8](#section8) – Visualize results  
9. [Exercise 9](#section9) – Filter by area & compare predictions  
10. [Exercise 10](#section10) – Model performance  

--------------------------------------------
Learning objectives
---------------------------------------------

* In this lab you will:

    *   install and use the **geoai** package for geospatial AI
    *   download sample imagery and vector labels for building detection
    *   create training tiles and train a **Mask R-CNN** instance segmentation model
    *   run inference, vectorize masks, and add geometric properties
    *   visualize and compare predictions with imagery
    *   interpret training metrics and understand instance vs semantic segmentation

This notebook trains an instance segmentation model (Mask R-CNN) to detect objects such as buildings. Unlike semantic segmentation, which only assigns a class to each pixel, instance segmentation also distinguishes between individual objects of the same class.
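
To make the difference between the two tasks concrete, here is a minimal sketch on a toy grid. It is not part of the lab workflow: the helper `label_instances` is hypothetical and uses a simple 4-connected flood fill, which is only a stand-in for what Mask R-CNN learns to do. A semantic mask records building vs. background per pixel, while an instance mask gives each building its own ID.

```python
def label_instances(semantic_mask):
    """Give every 4-connected blob of 1s its own integer ID (0 = background)."""
    rows, cols = len(semantic_mask), len(semantic_mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if semantic_mask[r][c] == 1 and labels[r][c] == 0:
                next_id += 1
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    if not (0 <= y < rows and 0 <= x < cols):
                        continue  # outside the grid
                    if semantic_mask[y][x] != 1 or labels[y][x] != 0:
                        continue  # background or already labeled
                    labels[y][x] = next_id
                    stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return labels, next_id


# Toy semantic mask: 1 = building, 0 = background (three separate buildings)
semantic = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
]

instances, n_buildings = label_instances(semantic)
print("Semantic classes:", sorted({v for row in semantic for v in row}))
print("Instance IDs:    ", sorted({v for row in instances for v in row}))
print("Buildings found: ", n_buildings)
```

Note that a flood fill would merge two buildings that touch each other, whereas an instance segmentation model can, in principle, still separate them.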

Resources:
- [GeoAI documentation](https://geoai.readthedocs.io/)
- [OpenGeos GeoAI examples](https://github.com/opengeos/geoai)

<a name="section1"></a>
## **Exercise 1** – Install packages & import libraries

**What this section does:** Install the GeoAI package and import it so we can use it for downloading data, creating tiles, training, and inference. The GPU check verifies that Colab is using a T4 GPU (set in Runtime → Change runtime type).

### Check GPU (required)

This lab runs on GPU. **Colab cannot enable GPU from code**—you must choose it once in the menu:

1. Click **Runtime** (or the **▶ Connect** dropdown) → **Change runtime type**  
2. Set **Hardware accelerator** to **T4 GPU** → **Save**  
3. Re-run the notebook from the top  

After that, the check below will report "GPU OK".

In [None]:
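# Check GPU (Colab cannot enable a GPU from code; you must set it in the menu once)
import subprocess

try:
    # Ask the NVIDIA driver for the name of the first attached GPU
    gpu_name = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"], text=True
    ).strip().split("\n")[0]
except (OSError, subprocess.CalledProcessError):
    # nvidia-smi is missing or failed, so no GPU is attached to this runtime
    gpu_name = ""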
if not gpu_name or "T4" not in gpu_name.upper():
    print("WARNING: T4 GPU not detected.")
    print("  → Runtime → Change runtime type → Hardware accelerator: T4 GPU → Save")
    print("  → Then re-run the notebook from the top.")
else:
    print("GPU OK:", gpu_name)

In [None]:
%pip install geoai-py

## Import libraries

### Colab: Pro or free?

Run the next cell and answer the prompt: **Are you using Google Colab Pro? (True/False)**. Type `True` or `False` and press Enter. Settings adjust automatically: Pro uses larger tiles, inference windows, and batches, trains for more epochs, and shows training visuals; free uses batch size 1, smaller tiles and windows, and fewer epochs to avoid out-of-memory errors.

In [None]:
# Answer the prompt: Are you using Colab Pro? (True/False)
reply = input("Are you using Google Colab Pro? (True/False): ").strip().lower()
COLAB_PRO = reply in ("true", "t", "yes", "y", "1")

USE_LOW_RAM = not COLAB_PRO

if USE_LOW_RAM:
    TILE_SIZE = 256
    STRIDE = 256
    BATCH_SIZE = 1
    WINDOW_SIZE = 256
    OVERLAP = 64
    NUM_EPOCHS = 5
    VISUALIZE_TRAINING = False
else:
    TILE_SIZE = 512
    STRIDE = 256
    BATCH_SIZE = 4
    WINDOW_SIZE = 512
    OVERLAP = 256
    NUM_EPOCHS = 10
    VISUALIZE_TRAINING = True
print("COLAB_PRO =", COLAB_PRO, "| USE_LOW_RAM =", USE_LOW_RAM)

In [None]:
import geoai

### Mount Google Drive and create lab folder

All data, tiles, models, and outputs will be saved under **My Drive → ABT182_GeoAI** so you can reuse them and avoid re-downloading.

In [None]:
from google.colab import drive
import os

drive.mount("/content/drive")

# Base folder in your Google Drive
BASE_DIR = "/content/drive/MyDrive/ABT182_GeoAI"
os.makedirs(f"{BASE_DIR}/data/train", exist_ok=True)
os.makedirs(f"{BASE_DIR}/data/test", exist_ok=True)
os.makedirs(f"{BASE_DIR}/tiles", exist_ok=True)
os.makedirs(f"{BASE_DIR}/models", exist_ok=True)
os.makedirs(f"{BASE_DIR}/outputs", exist_ok=True)
print(f"Created: {BASE_DIR}")
print("  data/train, data/test, tiles, models, outputs")

<a name="section2"></a>
## **Exercise 2** – Download sample data

**What this section does:** Download training raster, training vector (building footprints), and test raster from Hugging Face, then copy them to your Drive folder so they persist. Paths are under `BASE_DIR` (e.g. `data/train`, `data/test`).
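
Because the files persist on Drive, you may not want to download them again on every run. The sketch below shows one possible way to skip files that already exist. The helper `fetch_once` is hypothetical (it is not part of GeoAI), and the demo uses a stand-in downloader and a temporary folder instead of real URLs; in the notebook you would pass `geoai.download_file` as the `downloader` argument.

```python
import os
import shutil
import tempfile


def fetch_once(url, dest_path, downloader):
    """Download `url` and copy it to `dest_path`, unless the file is already there.

    `downloader` is any function that takes a URL and returns a local file path.
    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(dest_path):
        return False
    os.makedirs(os.path.dirname(dest_path), exist_ok=True)
    tmp_path = downloader(url)
    shutil.copy(tmp_path, dest_path)
    return True


# --- Demo with a fake downloader (no network access needed) ---
work_dir = tempfile.mkdtemp()
calls = []


def fake_downloader(url):
    """Pretend to download: record the call and write a small placeholder file."""
    calls.append(url)
    path = os.path.join(work_dir, "download.tmp")
    with open(path, "w") as f:
        f.write("placeholder")
    return path


dest = os.path.join(work_dir, "data", "train", "example.tif")
first = fetch_once("https://example.com/example.tif", dest, fake_downloader)
second = fetch_once("https://example.com/example.tif", dest, fake_downloader)
print("Downloaded on first call: ", first)
print("Downloaded on second call:", second)
print("Downloader was called", len(calls), "time(s)")
```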

In [None]:
train_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_rgb_train.tif"
)
train_vector_url = "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_train_buildings.geojson"
test_raster_url = (
    "https://huggingface.co/datasets/giswqs/geospatial/resolve/main/naip_test.tif"
)

In [None]:
import shutil

# Download to temp, then save to Drive so you keep the data
_train = geoai.download_file(train_raster_url)
_vector = geoai.download_file(train_vector_url)
_test = geoai.download_file(test_raster_url)

train_raster_path = f"{BASE_DIR}/data/train/naip_rgb_train.tif"
train_vector_path = f"{BASE_DIR}/data/train/naip_train_buildings.geojson"
test_raster_path = f"{BASE_DIR}/data/test/naip_test.tif"

shutil.copy(_train, train_raster_path)
shutil.copy(_vector, train_vector_path)
shutil.copy(_test, test_raster_path)
print("Saved to Drive:", BASE_DIR)

<a name="section3"></a>
## **Exercise 3** – Visualize sample data

**What this section does:** Inspect the training raster metadata, then view NAIP with building footprints on top using leafmap (add raster first with RGB bands, then add_geojson so NAIP is visible under the vectors). Finally view the test image alone.

In [None]:
geoai.get_raster_info(train_raster_path)

In [None]:
# Leafmap: add NAIP raster first, then vectors on top (so NAIP is visible under buildings)
# If raster doesn't load, run: %pip install "leafmap[raster]" then re-run the cell
import leafmap
m = leafmap.Map()
m.add_raster(train_raster_path, layer_name="NAIP", bands=[1, 2, 3])
m.add_geojson(train_vector_path, layer_name="Buildings", style={"stroke": True, "color": "#ff0000", "weight": 2, "fill": False, "fillOpacity": 0})
m

In [None]:
# Test NAIP image (no slider)
import leafmap
m = leafmap.Map()
m.add_raster(test_raster_path, layer_name="Test NAIP", bands=[1, 2, 3])
m

<a name="section4"></a>
## **Exercise 4** – Create training data

**What this section does:** Create training tiles from the NAIP raster and building vector using `export_geotiff_tiles`. Tiles and labels are written to `out_folder`; `tile_size` and `stride` come from the Colab/Pro settings.
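
As a rough illustration of how `tile_size` and `stride` interact, the sketch below counts the tiles a sliding window would produce. The helper `count_tiles` is hypothetical and assumes every tile must fit entirely inside the raster; `export_geotiff_tiles` may treat edges differently. The 2048 × 2048 raster size is made up for the example.

```python
def count_tiles(width, height, tile_size, stride):
    """Number of full tiles a sliding window yields over a width x height raster."""
    if width < tile_size or height < tile_size:
        return 0
    n_cols = (width - tile_size) // stride + 1
    n_rows = (height - tile_size) // stride + 1
    return n_cols * n_rows


def overlap_fraction(tile_size, stride):
    """Share of each tile that is repeated in its neighbor along one axis."""
    return max(0.0, 1 - stride / tile_size)


raster_width, raster_height = 2048, 2048  # hypothetical raster size in pixels

free_tiles = count_tiles(raster_width, raster_height, tile_size=256, stride=256)
pro_tiles = count_tiles(raster_width, raster_height, tile_size=512, stride=256)

print("Free settings:", free_tiles, "tiles, overlap =", overlap_fraction(256, 256))
print("Pro settings: ", pro_tiles, "tiles, overlap =", overlap_fraction(512, 256))
```

With a stride equal to the tile size, tiles sit side by side without overlap. A stride of half the tile size makes neighboring tiles share 50% of their pixels, so the same building can appear in several tiles.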

In [None]:
out_folder = f"{BASE_DIR}/tiles/buildings_instance"
tiles = geoai.export_geotiff_tiles(
    in_raster=train_raster_path,
    out_folder=out_folder,
    in_class_data=train_vector_path,
    tile_size=TILE_SIZE,
    stride=STRIDE,
    buffer_radius=0,
)

<a name="section5"></a>
## **Exercise 5** – Train instance segmentation model

**What this section does:** Train a Mask R-CNN model on the tiles. Key parameters: `num_classes` (2 = background + building), `num_channels` (3 for RGB), `batch_size`, `num_epochs` from the Colab/Pro cell. With Colab Pro, training curves are shown; Exercise 10 plots them from the saved history.
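
The training cell also passes `val_split=0.2`, which holds out part of the tiles for validation. The sketch below illustrates the idea only: the helper `split_tiles` and the tile names are hypothetical, and a seeded random shuffle is an assumption about how such a split is typically done, not a description of GeoAI's internals.

```python
import random


def split_tiles(tile_names, val_split=0.2, seed=42):
    """Shuffle tile names reproducibly and hold out a fraction for validation."""
    names = sorted(tile_names)
    random.Random(seed).shuffle(names)
    n_val = int(len(names) * val_split)
    return names[n_val:], names[:n_val]  # (training tiles, validation tiles)


tile_names = [f"tile_{i:03d}.tif" for i in range(50)]  # made-up tile names
train_tiles, val_tiles = split_tiles(tile_names, val_split=0.2)

print("Training tiles:  ", len(train_tiles))
print("Validation tiles:", len(val_tiles))
```

The model never updates its weights on the validation tiles, so the validation curves show how well it generalizes to tiles it has not trained on.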

In [None]:
# Train Mask R-CNN model (uses BATCH_SIZE, NUM_EPOCHS, VISUALIZE_TRAINING from Colab/low-RAM cell)
geoai.train_instance_segmentation_model(
    images_dir=f"{out_folder}/images",
    labels_dir=f"{out_folder}/labels",
    output_dir=f"{out_folder}/instance_models",
    num_classes=2,  # background + building
    num_channels=3,
    batch_size=BATCH_SIZE,
    num_epochs=NUM_EPOCHS,
    learning_rate=0.005,
    val_split=0.2,
    visualize=VISUALIZE_TRAINING,
    verbose=True,
)

<a name="section6"></a>
## **Exercise 6** – Run inference

**What this section does:** Run the trained model on the test NAIP image; the output is a mask raster saved to `masks_path`. Uses `WINDOW_SIZE`, `OVERLAP`, and `BATCH_SIZE` from the Colab/Pro cell. You can change `confidence_threshold` to keep only high-confidence detections.
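
To see what changing `confidence_threshold` does, here is a minimal sketch with made-up detection scores. The helper `keep_confident` is hypothetical, and keeping scores greater than or equal to the threshold is an assumption; the library may use a strict comparison.

```python
# Made-up detections: each predicted building comes with a confidence score
detections = [
    {"id": 1, "score": 0.95},
    {"id": 2, "score": 0.62},
    {"id": 3, "score": 0.48},
    {"id": 4, "score": 0.30},
]


def keep_confident(detections, threshold):
    """Keep only detections whose score reaches the threshold."""
    return [d for d in detections if d["score"] >= threshold]


kept_by_threshold = {}
for threshold in (0.3, 0.5, 0.9):
    kept = keep_confident(detections, threshold)
    kept_by_threshold[threshold] = [d["id"] for d in kept]
    print(f"threshold={threshold}: kept IDs {kept_by_threshold[threshold]}")
```

A higher threshold removes more false detections but can also drop real buildings that the model was unsure about.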

In [None]:
# Define paths (all on Drive)
masks_path = f"{BASE_DIR}/outputs/naip_test_instance_prediction.tif"
model_path = f"{out_folder}/instance_models/best_model.pth"

In [None]:
# Run instance segmentation inference (uses WINDOW_SIZE, OVERLAP, BATCH_SIZE for lower RAM)
import gc
geoai.instance_segmentation(
    input_path=test_raster_path,
    output_path=masks_path,
    model_path=model_path,
    num_classes=2,
    num_channels=3,
    window_size=WINDOW_SIZE,
    overlap=OVERLAP,
    confidence_threshold=0.5,
    batch_size=BATCH_SIZE,
)
gc.collect()  # free memory before next steps

<a name="section7"></a>
## **Exercise 7** – Vectorize masks & add geometric properties

**What this section does:** Convert the predicted mask raster to polygons with `orthogonalize`, save as GeoJSON, then add area (m²), perimeter, and other geometric properties with `add_geometric_properties` so we can filter and color by area in the next exercises.
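
For intuition about where area and perimeter come from, the sketch below computes them for a single polygon with the shoelace formula. The helper `polygon_area_perimeter` and the footprint coordinates are hypothetical, and it assumes coordinates in a projected CRS with meters as units. This is an illustration only, not how `add_geometric_properties` is implemented.

```python
import math


def polygon_area_perimeter(coords):
    """Area (shoelace formula) and perimeter of a simple polygon.

    `coords` is a list of (x, y) vertices in order, without repeating the first.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


# Hypothetical rectangular footprint, 12 m x 8 m, in projected coordinates
footprint = [(0, 0), (12, 0), (12, 8), (0, 8)]
area_m2, perimeter_m = polygon_area_perimeter(footprint)
print("Area (m²):    ", area_m2)
print("Perimeter (m):", perimeter_m)
```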

In [None]:
output_vector_path = f"{BASE_DIR}/outputs/naip_test_instance_prediction.geojson"
gdf = geoai.orthogonalize(masks_path, output_vector_path, epsilon=2)

In [None]:
gdf_props = geoai.add_geometric_properties(gdf, area_unit="m2", length_unit="m")

<a name="section8"></a>
## **Exercise 8** – Visualize results

**What we do here:** View the predicted mask over NAIP, then buildings colored by area. Popup text is set to black so it’s readable on the white background. A split map lets you compare NAIP only (left) vs mask only (right).

**Details:** We add the NAIP raster first (RGB `bands=[1, 2, 3]`), then the mask layer with a colormap. For buildings we use `add_data` with `column="area_m2"` and `scheme="Quantiles"` so colors vary by area; the legend shows Area (m²).
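
The idea behind a quantile classification can be sketched with the standard library. The areas below are made up, and five classes is an assumption for the example (the map may use a different number). Each class receives roughly the same number of buildings, so a single very large building does not stretch the color scale.

```python
import bisect
import statistics

# Made-up building areas in m², including one very large outlier
areas = [40, 55, 60, 75, 90, 120, 150, 200, 320, 900]

n_classes = 5  # assumed number of color classes
breaks = statistics.quantiles(areas, n=n_classes, method="inclusive")

# Assign each area to a class index (0 = smallest buildings)
classes = [bisect.bisect_left(breaks, a) for a in areas]
counts = [classes.count(k) for k in range(n_classes)]

print("Class breaks (m²):  ", breaks)
print("Class of each area: ", classes)
print("Buildings per class:", counts)
```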

In [None]:
# Make popup/tooltip text black so it's readable on white background (run before creating maps)
from IPython.display import display, HTML
display(HTML("""
<style>
.leaflet-popup-content-wrapper, .leaflet-popup-content { color: #000 !important; }
.leaflet-tooltip { color: #000 !important; }
</style>
"""))

In [None]:
# This cell: NAIP + predicted masks. Layer control lets you toggle "Predicted masks" on/off (with vs without).
import leafmap
m = leafmap.Map()
m.add_raster(test_raster_path, layer_name="NAIP", bands=[1, 2, 3])
m.add_raster(masks_path, layer_name="Predicted masks", cmap="tab20", nodata=0)
m

In [None]:
# Split map: left = NAIP only, right = mask only (slide to compare)
import leafmap
m_split = leafmap.Map()
m_split.split_map(left_layer=test_raster_path, right_layer=masks_path, left_label="NAIP only", right_label="Predicted masks")
m_split

In [None]:
# NAIP first, then buildings colored by area_m2 (choropleth). Hover to see area and other attributes.
import leafmap
m = leafmap.Map()
m.add_raster(test_raster_path, layer_name="NAIP", bands=[1, 2, 3])
m.add_data(gdf_props, column="area_m2", scheme="Quantiles", cmap="YlOrRd", legend_title="Area (m²)")
m

After Exercise 8 – **Histogram of building areas:** A simple distribution of predicted building sizes (the `area_m2` column). Use it to see how many small vs. large buildings were detected.

In [None]:
# What this cell does: plots a histogram of building areas (area_m2) to see size distribution
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 4))
plt.hist(gdf_props["area_m2"], bins=25, color="steelblue", edgecolor="white")
plt.xlabel("Area (m²)"); plt.ylabel("Count"); plt.title("Distribution of predicted building areas")
plt.tight_layout(); plt.show()

<a name="section9"></a>
## **Exercise 9** – Filter by area & compare predictions

**What we do here:** Keep only buildings above a minimum area (e.g. 50 m²) to drop small false detections, then visualize filtered buildings and compare with the imagery.

**Details:** We filter the GeoDataFrame by `area_m2`, then map with the same NAIP + add_data pattern; the compare view uses a different colormap (RdYlBu_r) for variety.

In [None]:
gdf_filtered = gdf_props[(gdf_props["area_m2"] > 50)]

In [None]:
# NAIP first, then filtered buildings colored by area_m2
import leafmap
m = leafmap.Map()
m.add_raster(test_raster_path, layer_name="NAIP", bands=[1, 2, 3])
m.add_data(gdf_filtered, column="area_m2", scheme="Quantiles", cmap="YlOrRd", legend_title="Area (m²)")
m

## Compare predictions with imagery

In [None]:
# NAIP first, then predicted buildings (compare view; colored by area_m2)
import leafmap
m = leafmap.Map()
m.add_raster(test_raster_path, layer_name="NAIP", bands=[1, 2, 3])
m.add_data(gdf_filtered, column="area_m2", scheme="Quantiles", cmap="RdYlBu_r", legend_title="Area (m²)")
m

**Box plot by size range:** Group buildings into size ranges and draw a box plot for each range to explore the area distribution.

In [None]:
# What this cell does: box plot of area_m2 grouped into four size ranges
import matplotlib.pyplot as plt
import pandas as pd
gdf_filtered = gdf_filtered.copy()
# Open-ended last bin so buildings larger than 2000 m² still fall into the "500+" group
gdf_filtered["size_range"] = pd.cut(gdf_filtered["area_m2"], bins=[0, 100, 250, 500, float("inf")], labels=["0-100", "100-250", "250-500", "500+"])
gdf_filtered.boxplot(column="area_m2", by="size_range", figsize=(8, 4))
plt.suptitle(""); plt.xlabel("Area range (m²)"); plt.ylabel("Area (m²)"); plt.title("Building areas by size range")
plt.tight_layout(); plt.show()

<a name="section10"></a>
## **Exercise 10** – Model performance

**What this section does:** Load the saved training history and plot the training and validation curves so you can see how the model converged. With Colab Pro you train for more epochs (10), which usually gives better convergence.
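
A common way to score segmentation output is Intersection over Union (IoU). Whether the saved history for this model includes IoU is an assumption; the sketch below only illustrates how the metric is computed on two small, made-up masks. The helper `mask_iou` is hypothetical.

```python
def mask_iou(pred, truth):
    """Intersection over Union of two binary masks given as lists of rows."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 0.0


# Made-up masks: the prediction covers the true building plus two extra pixels
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou = mask_iou(pred, truth)
print(f"IoU = {iou:.3f}")
```

An IoU of 1.0 means the predicted and true masks match exactly; values closer to 0 mean little overlap.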

In [None]:
# Training and validation curves (val shown when available)
geoai.plot_performance_metrics(
    history_path=f"{out_folder}/instance_models/training_history.pth",
    figsize=(15, 5),
    verbose=True,
)

---
## **Credits**

This lab uses the **[GeoAI](https://opengeoai.org/)** Python package. We thank **Dr. Qiusheng Wu** for creating GeoAI and for the examples that inspired this tutorial.

For more information, documentation, and examples, visit: **https://opengeoai.org/**

<a name="section_submit"></a>
## **How to Submit Lab 9**

Once you have finished the exercises, follow the steps below to submit your assignment.

### Step 1: Run the Entire Notebook
- Run all code cells to make sure your notebook works correctly and displays all results.
- Go to the **Runtime** tab and click **Run all**.

### Step 2: Check Your Work
- Make sure you followed all instructions.
- Confirm that you completed the Academic Integrity cell and that all outputs look correct.

### Step 3: Rename and Save the Notebook
- Click on the notebook name at the **top left** of the page.
- Rename the file by replacing the default name with your own (e.g., **lastname_firstname_lab9.ipynb**).
- Save the notebook after renaming.

### Step 4: Create PDF Version
- Creating a PDF version of your notebook is **required**.
- **File** → **Print** → set **Destination** to **Save as PDF**, or **File** → **Download** → **Download .pdf**.

### Step 5: Submit Both Files to Canvas
- **Upload BOTH files** to the **Canvas** assignment for Lab 9:
  1. The renamed notebook (`.ipynb`) – **REQUIRED**
  2. The PDF (`.pdf`) – **REQUIRED**
- Submit before **11:59 PM on March 11, 2026**.

### Use of Generative AI Tools (GenAI Policy)
- This course **discourages the use of generative AI tools** (such as ChatGPT) for completing assignments, so you can develop your own skills.
- If you **do use a generative AI tool**, you must **clearly disclose** it in your notebook (which tool and how you used it).
- Failure to disclose may be considered a violation of course academic integrity policies.