# Advanced Preprocessing for Remote Sensing Data

This notebook demonstrates advanced preprocessing techniques for remote sensing data using `rasterio`, `geopandas`, and `numpy` in Python. It covers essential preprocessing steps such as cropping rasters to an area of interest (AOI), resampling to match resolutions, geometric correction, and handling no-data values. These techniques prepare data for downstream analysis like classification, segmentation, or time series analysis.

## Prerequisites
- Install required libraries: `rasterio`, `geopandas`, `numpy`, `matplotlib`, `pyproj` (listed in `requirements.txt`). Step 5 also imports `scipy`.
- A multi-band GeoTIFF file (e.g., `sentinel_rgb.tif` from `21_download_data.ipynb`).
- A GeoJSON or shapefile defining the AOI (e.g., `aoi.geojson`).
- Replace file paths with your own data.

## Learning Objectives
- Crop a raster to an AOI using vector data.
- Resample raster data to a target resolution.
- Perform geometric correction to align rasters.
- Handle no-data values and ensure data consistency.

In [None]:
# Import required libraries
import rasterio
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from rasterio.mask import mask
from rasterio.warp import reproject, Resampling
from rasterio.merge import merge
from pyproj import CRS
import os

## Step 1: Load Raster and AOI

Load the input raster and the AOI vector data, then reproject the AOI to the raster's coordinate reference system (CRS) if the two differ.
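
A matching CRS is necessary but not sufficient: with `crop=True`, `rasterio.mask.mask` raises a `ValueError` when the shapes do not overlap the raster. The sketch below shows one way to check overlap beforehand. It assumes both extents are `(minx, miny, maxx, maxy)` tuples in the same CRS; `bounds_overlap` is a hypothetical helper, not part of `rasterio` or `geopandas`, and the coordinates are made up.

```python
def bounds_overlap(a, b):
    """Return True if two (minx, miny, maxx, maxy) boxes share any area."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    return a_minx < b_maxx and b_minx < a_maxx and a_miny < b_maxy and b_miny < a_maxy

# Illustrative extents in projected coordinates (made-up values)
raster_bounds = (500000.0, 4100000.0, 510000.0, 4110000.0)
aoi_inside = (502000.0, 4102000.0, 504000.0, 4104000.0)
aoi_outside = (520000.0, 4102000.0, 522000.0, 4104000.0)

print(bounds_overlap(raster_bounds, aoi_inside))
print(bounds_overlap(raster_bounds, aoi_outside))
```

In this notebook the two tuples would come from `src.bounds` (read while the dataset is open) and `aoi_gdf.total_bounds` after the AOI has been reprojected.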

In [None]:
# Define file paths
raster_path = 'remote_sensing_data/sentinel_rgb.tif'  # Replace with your GeoTIFF
aoi_path = 'aoi.geojson'  # Replace with your AOI file

# Load raster
with rasterio.open(raster_path) as src:
    raster_data = src.read(masked=True)  # Shape: (bands, height, width)
    raster_profile = src.profile
    raster_crs = src.crs

# Load AOI
aoi_gdf = gpd.read_file(aoi_path)
if aoi_gdf.crs != raster_crs:
    aoi_gdf = aoi_gdf.to_crs(raster_crs)

# Print basic information
print(f'Raster shape: {raster_data.shape}')
print(f'Raster CRS: {raster_crs}')
print(f'AOI CRS: {aoi_gdf.crs}')
print(f'AOI geometry count: {len(aoi_gdf)}')

## Step 2: Crop Raster to AOI

Crop the raster to the AOI boundaries using the vector geometry.
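
Conceptually, cropping with a geometry does two things: pixels outside the geometry are set to no-data, and the array is trimmed to the geometry's bounding window. The sketch below mimics this with plain `numpy` on a toy array. The boolean `inside` mask stands in for the rasterized AOI, and `crop_to_mask` is a hypothetical helper for illustration, not how `rasterio.mask.mask` is implemented.

```python
import numpy as np

def crop_to_mask(band, inside):
    """Set pixels outside `inside` to NaN and trim to the mask's bounding window.

    Assumes `inside` contains at least one True pixel.
    """
    rows = np.where(inside.any(axis=1))[0]
    cols = np.where(inside.any(axis=0))[0]
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    out = band.astype(np.float32)  # astype copies; float dtype so NaN can be stored
    out[~inside] = np.nan
    return out[r0:r1, c0:c1], (r0, c0)

band = np.arange(25, dtype=np.uint16).reshape(5, 5)
inside = np.zeros((5, 5), dtype=bool)
inside[1:4, 2:4] = True   # rectangular part of the AOI
inside[3, 1] = True       # one extra pixel makes the shape non-rectangular

cropped, (row_off, col_off) = crop_to_mask(band, inside)
print(cropped.shape, row_off, col_off)
print(cropped)
```

The row and column offsets play the role of the updated `transform` that `mask` returns: they record where the cropped window sits in the original grid.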

In [None]:
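# Crop raster to AOI
with rasterio.open(raster_path) as src:
    # filled=False returns a masked array, so integer rasters are not asked to store NaN
    cropped_masked, cropped_transform = mask(src, aoi_gdf.geometry, crop=True, filled=False)
    # Cast to float32 so NaN can mark pixels outside the AOI and existing no-data pixels
    cropped_data = cropped_masked.astype(np.float32).filled(np.nan)
    cropped_profile = src.profile.copy()
    cropped_profile.update({
        'height': cropped_data.shape[1],
        'width': cropped_data.shape[2],
        'transform': cropped_transform,
        'dtype': 'float32',
        'nodata': np.nan
    })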

# Visualize cropped RGB composite
cropped_rgb = cropped_data[:3].transpose(1, 2, 0)  # Use first 3 bands for RGB
p98 = np.nanpercentile(cropped_rgb, 98)  # 98th-percentile stretch, ignoring NaN
if p98 > 0:
    cropped_rgb = cropped_rgb / p98
cropped_rgb = np.clip(cropped_rgb, 0, 1)

plt.figure(figsize=(8, 8))
plt.imshow(cropped_rgb)
plt.title('Cropped RGB Composite')
plt.xlabel('Column')
plt.ylabel('Row')
plt.show()

# Save cropped raster
cropped_output_path = 'remote_sensing_data/cropped_raster.tif'
with rasterio.open(cropped_output_path, 'w', **cropped_profile) as dst:
    dst.write(cropped_data)

print(f'Cropped raster saved to: {cropped_output_path}')

## Step 3: Resample Raster to Target Resolution

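Resample the cropped raster to a target resolution (e.g., 30 m to match Landsat). The target is expressed in CRS units, so this step assumes a projected CRS measured in meters (e.g., UTM).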
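
The arithmetic behind the resampled grid can be checked by hand. The sketch below assumes a north-up raster in a projected CRS with meter units; `resampled_shape` is a hypothetical helper, and the 1000 x 800 raster with 10 m pixels is a made-up example.

```python
def resampled_shape(width, height, pixel_w, pixel_h, target_res):
    """Return (new_width, new_height, scale_x, scale_y) for a new pixel size."""
    scale_x = target_res / pixel_w        # > 1 means coarser pixels (downsampling)
    scale_y = target_res / abs(pixel_h)   # pixel height is negative in north-up rasters
    new_width = int(width / scale_x)      # truncates, as in the notebook cell
    new_height = int(height / scale_y)
    return new_width, new_height, scale_x, scale_y

new_w, new_h, sx, sy = resampled_shape(1000, 800, 10.0, -10.0, 30)
print(new_w, new_h, sx, sy)

# Extent covered along x before and after resampling, in meters
print(1000 * 10.0, new_w * 30)
```

Because `int()` truncates, the resampled grid can fall up to one target pixel short of the original extent; rounding up with `math.ceil` would instead cover the full extent with a partly empty last row and column.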

In [None]:
# Define target resolution (e.g., 30m for Landsat compatibility)
target_resolution = 30  # In CRS units (meters for a projected CRS such as UTM)

# Calculate scaling factors (target pixel size / source pixel size)
with rasterio.open(cropped_output_path) as src:
    src_transform = src.transform
    scale_x = target_resolution / src_transform.a
    scale_y = target_resolution / abs(src_transform.e)
    new_width = int(src.width / scale_x)
    new_height = int(src.height / scale_y)

# Resample raster
# Pre-fill with NaN so pixels that receive no source data stay marked as no-data
resampled_data = np.full((cropped_data.shape[0], new_height, new_width), np.nan, dtype=np.float32)
resampled_transform = src_transform * src_transform.scale(scale_x, scale_y)

for i in range(cropped_data.shape[0]):
    reproject(
        source=cropped_data[i],
        destination=resampled_data[i],
        src_transform=src_transform,
        src_crs=raster_crs,
        src_nodata=np.nan,  # Keep NaN pixels out of the interpolation
        dst_transform=resampled_transform,
        dst_crs=raster_crs,
        dst_nodata=np.nan,
        resampling=Resampling.bilinear
    )

# Update profile
resampled_profile = cropped_profile.copy()
resampled_profile.update({
    'height': new_height,
    'width': new_width,
    'transform': resampled_transform,
    'dtype': 'float32',  # Match the float32 array written below
    'nodata': np.nan
})

# Visualize resampled RGB composite
resampled_rgb = resampled_data[:3].transpose(1, 2, 0)
p98 = np.nanpercentile(resampled_rgb, 98)  # 98th-percentile stretch, ignoring NaN
if p98 > 0:
    resampled_rgb = resampled_rgb / p98
resampled_rgb = np.clip(resampled_rgb, 0, 1)

plt.figure(figsize=(8, 8))
plt.imshow(resampled_rgb)
plt.title(f'Resampled RGB Composite ({target_resolution}m)')
plt.xlabel('Column')
plt.ylabel('Row')
plt.show()

# Save resampled raster
resampled_output_path = 'remote_sensing_data/resampled_raster.tif'
with rasterio.open(resampled_output_path, 'w', **resampled_profile) as dst:
    dst.write(resampled_data)

print(f'Resampled raster saved to: {resampled_output_path}')

## Step 4: Geometric Correction

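Align the raster to a reference raster (e.g., a Landsat image) so that both share the same CRS, pixel grid, and dimensions. Reprojecting onto a reference grid co-registers the data but does not by itself remove sensor or terrain distortions; those require ground control points or orthorectification.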
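
Two rasters share a pixel grid when their pixel sizes match and their origins differ by a whole number of pixels. The sketch below checks that condition. It assumes north-up transforms given as `(pixel_width, pixel_height, origin_x, origin_y)` tuples in the same CRS; `grids_aligned`, its tolerance, and the coordinates are illustrative assumptions.

```python
def grids_aligned(t1, t2, tol=1e-6):
    """Check that two north-up grids share pixel size and pixel boundaries.

    Each transform is (pixel_width, pixel_height, origin_x, origin_y).
    """
    w1, h1, x1, y1 = t1
    w2, h2, x2, y2 = t2
    if abs(w1 - w2) > tol or abs(h1 - h2) > tol:
        return False
    # Origin offsets expressed in pixels must be (close to) integers
    dx = (x2 - x1) / w1
    dy = (y2 - y1) / h1
    return abs(dx - round(dx)) < tol and abs(dy - round(dy)) < tol

reference = (30.0, -30.0, 500000.0, 4110000.0)
shifted_by_whole_pixels = (30.0, -30.0, 500090.0, 4109940.0)
shifted_by_half_pixel = (30.0, -30.0, 500015.0, 4110000.0)

print(grids_aligned(reference, shifted_by_whole_pixels))
print(grids_aligned(reference, shifted_by_half_pixel))
```

With `rasterio`, the four numbers correspond to `transform.a`, `transform.e`, `transform.c`, and `transform.f` of an open dataset.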

In [None]:
# Define reference raster (e.g., Landsat RGB from notebook 21)
reference_raster_path = 'remote_sensing_data/landsat_rgb.tif'  # Replace with your reference GeoTIFF

# Load reference raster
with rasterio.open(reference_raster_path) as ref_src:
    ref_crs = ref_src.crs
    ref_transform = ref_src.transform
    ref_width, ref_height = ref_src.width, ref_src.height

# Reproject resampled raster to match reference
# Pre-fill with NaN so reference pixels outside the source extent stay marked as no-data
aligned_data = np.full((resampled_data.shape[0], ref_height, ref_width), np.nan, dtype=np.float32)

with rasterio.open(resampled_output_path) as src:
    for i in range(resampled_data.shape[0]):
        reproject(
            source=resampled_data[i],
            destination=aligned_data[i],
            src_transform=src.transform,
            src_crs=src.crs,
            src_nodata=np.nan,  # Keep NaN pixels out of the interpolation
            dst_transform=ref_transform,
            dst_crs=ref_crs,
            dst_nodata=np.nan,
            resampling=Resampling.bilinear
        )

# Update profile
aligned_profile = resampled_profile.copy()
aligned_profile.update({
    'crs': ref_crs,
    'transform': ref_transform,
    'height': ref_height,
    'width': ref_width,
    'dtype': 'float32',  # Match the float32 array written below
    'nodata': np.nan
})

# Visualize aligned RGB composite
aligned_rgb = aligned_data[:3].transpose(1, 2, 0)
p98 = np.nanpercentile(aligned_rgb, 98)  # 98th-percentile stretch, ignoring NaN
if p98 > 0:
    aligned_rgb = aligned_rgb / p98
aligned_rgb = np.clip(aligned_rgb, 0, 1)

plt.figure(figsize=(8, 8))
plt.imshow(aligned_rgb)
plt.title('Aligned RGB Composite')
plt.xlabel('Column')
plt.ylabel('Row')
plt.show()

# Save aligned raster
aligned_output_path = 'remote_sensing_data/aligned_raster.tif'
with rasterio.open(aligned_output_path, 'w', **aligned_profile) as dst:
    dst.write(aligned_data)

print(f'Aligned raster saved to: {aligned_output_path}')

## Step 5: Handle No-Data Values

Fill no-data (NaN) pixels in each band with the value of the nearest valid pixel so that downstream tools receive a gap-free array.
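
Before filling gaps, it can help to measure how much of each band is missing, since nearest-neighbor filling becomes less trustworthy as gaps grow. The following is a minimal `numpy` sketch on a toy `(bands, height, width)` array; `nodata_fraction` is a hypothetical helper name.

```python
import numpy as np

def nodata_fraction(data):
    """Return the fraction of NaN pixels in each band of a (bands, H, W) array."""
    return np.isnan(data).mean(axis=(1, 2))

data = np.ones((2, 4, 5), dtype=np.float32)
data[0, :2, :] = np.nan    # band 1: top two rows missing (10 of 20 pixels)
data[1, 0, 0] = np.nan     # band 2: a single missing pixel (1 of 20)

fractions = nodata_fraction(data)
for band_index, fraction in enumerate(fractions, start=1):
    print(f'Band {band_index}: {fraction:.1%} no-data')
```

The two bands in this toy array have different gaps, which is why the valid-pixel mask should be computed per band rather than taken from the first band only.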

In [None]:
# Interpolate no-data values using nearest neighbor
from scipy.interpolate import griddata

# Create coordinate grid
height, width = aligned_data.shape[1], aligned_data.shape[2]
x, y = np.meshgrid(np.arange(width), np.arange(height))

# Interpolate each band
interpolated_data = np.copy(aligned_data)
for i in range(aligned_data.shape[0]):
    valid = ~np.isnan(aligned_data[i])  # Per-band mask, since gaps can differ between bands
    if valid.all() or not valid.any():
        continue  # Nothing to fill, or no valid pixels to fill from
    points = np.column_stack((x[valid], y[valid]))
    values = aligned_data[i][valid]
    # Query only the missing pixels; fill_value has no effect with method='nearest'
    interpolated_data[i][~valid] = griddata(points, values, (x[~valid], y[~valid]), method='nearest')

# Visualize interpolated RGB composite
interpolated_rgb = interpolated_data[:3].transpose(1, 2, 0)
p98 = np.nanpercentile(interpolated_rgb, 98)  # 98th-percentile stretch, ignoring NaN
if p98 > 0:
    interpolated_rgb = interpolated_rgb / p98
interpolated_rgb = np.clip(interpolated_rgb, 0, 1)

plt.figure(figsize=(8, 8))
plt.imshow(interpolated_rgb)
plt.title('Interpolated RGB Composite (No-Data Filled)')
plt.xlabel('Column')
plt.ylabel('Row')
plt.show()

# Save interpolated raster
interpolated_output_path = 'remote_sensing_data/interpolated_raster.tif'
with rasterio.open(interpolated_output_path, 'w', **aligned_profile) as dst:
    dst.write(interpolated_data)

print(f'Interpolated raster saved to: {interpolated_output_path}')

## Next Steps

- Replace `sentinel_rgb.tif` and `landsat_rgb.tif` with your own GeoTIFF files (e.g., from `21_download_data.ipynb`).
- Update `aoi.geojson` with your area of interest file.
- Adjust the target resolution in Step 3 to match your analysis needs (e.g., 10m for Sentinel-2).
- Explore additional preprocessing steps like atmospheric correction (see `08_atmospheric_correction_snap.ipynb`) or cloud masking (see `09_cloud_masking.ipynb`).
- Proceed to analysis notebooks like `15_unet_segmentation.ipynb` or `16_landcover_classification_cnn.ipynb` with the preprocessed data.

## Notes
- Ensure the input raster and AOI have compatible CRS to avoid alignment issues.
- Resampling and reprojection may introduce interpolation artifacts; verify results visually.
- Interpolation of no-data values assumes spatial continuity; consider alternative methods (e.g., cubic interpolation) for specific use cases.
- See `docs/installation.md` for troubleshooting library installation.
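
For the visual checks mentioned above, the percentile stretch repeated in each step can be factored into one function. This is a sketch under the same assumptions as the notebook cells (a `(bands, height, width)` array whose first three bands are RGB); `stretch_rgb` is a hypothetical helper, not part of any library used here, and the input array is synthetic.

```python
import numpy as np

def stretch_rgb(data, percentile=98):
    """Scale the first three bands to [0, 1] using a shared upper percentile."""
    rgb = data[:3].transpose(1, 2, 0).astype(np.float32)
    if np.isnan(rgb).all():
        return rgb  # Nothing to scale
    upper = np.nanpercentile(rgb, percentile)  # Ignores NaN no-data pixels
    if upper > 0:
        rgb = rgb / upper
    return np.clip(rgb, 0, 1)  # NaN pixels stay NaN

# Synthetic 3-band image with one no-data pixel in every band
data = np.linspace(0, 3000, 3 * 4 * 4, dtype=np.float32).reshape(3, 4, 4)
data[:, 0, 0] = np.nan

rgb = stretch_rgb(data)
print(rgb.shape, np.nanmin(rgb), np.nanmax(rgb))
```

The result can be passed directly to `plt.imshow`, for example `plt.imshow(stretch_rgb(aligned_data))`.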