# Automated Workflow Pipeline for Remote Sensing Data

This notebook demonstrates how to create an automated workflow pipeline for processing remote sensing data using `snakemake` and Python. The pipeline integrates downloading Sentinel-2 imagery, preprocessing (e.g., cropping, reprojecting), cloud detection (using a pre-trained model from `28_cloud_detection_deep_learning.ipynb`), and calculating vegetation indices (e.g., NDVI). This is useful for scalable, reproducible data processing.

## Prerequisites
- Install required libraries: `snakemake`, `sentinelhub`, `rasterio`, `geopandas`, `numpy`, `torch`, `matplotlib` (listed in `requirements.txt`), plus `segmentation_models_pytorch`, which the cloud detection step imports.
- A configuration file for SentinelHub (e.g., `sentinelhub_config.json`).
- A GeoJSON or shapefile defining the area of interest (AOI) (e.g., `aoi.geojson`).
- A pre-trained cloud detection model (e.g., from `28_cloud_detection_deep_learning.ipynb`).
- Replace file paths with your own data.

## Learning Objectives
- Design a `snakemake` pipeline for automated remote sensing data processing.
- Integrate data download, preprocessing, cloud detection, and analysis steps.
- Execute the pipeline and validate outputs.
- Visualize processed results.

In [None]:
# Import required libraries
import glob
import os
import rasterio
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from rasterio.mask import mask
from sentinelhub import SHConfig, SentinelHubRequest, DataCollection, MimeType, CRS, BBox
import torch
import segmentation_models_pytorch as smp
from datetime import datetime
import json

## Step 1: Set Up Configuration and Directories

Define paths, SentinelHub credentials, and create necessary directories.
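
As a rough guide to what this step expects from the credentials file, the sketch below checks that a JSON file contains the three keys read here (`instance_id`, `client_id`, `client_secret`). The helper name `load_credentials` and the placeholder values are assumptions for illustration; they are not part of `sentinelhub`.

```python
import json
import os
import tempfile

# Keys that this notebook reads from the credentials file
REQUIRED_KEYS = ('instance_id', 'client_id', 'client_secret')


def load_credentials(path):
    """Load a JSON credentials file and fail early if a key is missing.

    Hypothetical helper for illustration; not part of sentinelhub.
    """
    with open(path, 'r') as f:
        creds = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f'Missing keys in {path}: {missing}')
    return creds


# Demonstrate with a throwaway file holding placeholder values
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'sentinelhub_config.json')
    with open(demo_path, 'w') as f:
        json.dump({'instance_id': 'YOUR_INSTANCE_ID',
                   'client_id': 'YOUR_CLIENT_ID',
                   'client_secret': 'YOUR_CLIENT_SECRET'}, f)
    demo_creds = load_credentials(demo_path)

print(sorted(demo_creds))
```

Failing on a missing key at load time tends to give a clearer error than a rejected request later in the pipeline.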

In [None]:
# Define paths and configuration
config_path = 'sentinelhub_config.json'  # Replace with your SentinelHub config file
aoi_path = 'aoi.geojson'                # Replace with your AOI file
output_dir = 'remote_sensing_data/processed/'
cloud_model_path = 'unet_cloud_detection.pth'  # From 28_cloud_detection_deep_learning.ipynb
os.makedirs(output_dir, exist_ok=True)

# Load SentinelHub configuration
with open(config_path, 'r') as f:
    config_dict = json.load(f)
config = SHConfig()
config.instance_id = config_dict['instance_id']
config.sh_client_id = config_dict['client_id']
config.sh_client_secret = config_dict['client_secret']

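# Load AOI and express its bounds in WGS84, the CRS declared for the bounding box
aoi_gdf = gpd.read_file(aoi_path)
aoi_bounds = aoi_gdf.to_crs(epsg=4326).total_bounds  # [minx, miny, maxx, maxy]
aoi_bbox = BBox(bbox=tuple(aoi_bounds.tolist()), crs=CRS.WGS84)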

print(f'AOI bounds: {aoi_bounds}')

## Step 2: Define Snakemake Pipeline

Create a `Snakefile` that automates data download, preprocessing, cloud detection, and NDVI calculation, with its settings read from `config.yaml`.

In [None]:
%%writefile Snakefile
from datetime import datetime

# Configuration
configfile: 'config.yaml'
AOI_PATH = config['aoi_path']
OUTPUT_DIR = config['output_dir']
START_DATE = config['start_date']
END_DATE = config['end_date']
# Only the start and end dates are processed; add entries to DATES to cover more dates
DATES = [datetime.strptime(START_DATE, '%Y-%m-%d').strftime('%Y%m%d'),
         datetime.strptime(END_DATE, '%Y-%m-%d').strftime('%Y%m%d')]

# Rules
rule all:
    input:
        expand('{output_dir}/ndvi_{date}.tif', output_dir=OUTPUT_DIR, date=DATES)

rule download_sentinel2:
    output:
        '{output_dir}/sentinel2_{date}.tif'
    run:
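        import json
        from sentinelhub import SHConfig, SentinelHubRequest, DataCollection, MimeType, CRS, BBox
        import geopandas as gpd
        import rasterio
        from rasterio.crs import CRS as rioCRS
        from rasterio.transform import from_bounds

        # Snakemake's `config` is a plain dict, so build a separate SHConfig
        # from the same credentials file used in Step 1
        with open('sentinelhub_config.json') as f:
            creds = json.load(f)
        sh_config = SHConfig()
        sh_config.instance_id = creds['instance_id']
        sh_config.sh_client_id = creds['client_id']
        sh_config.sh_client_secret = creds['client_secret']

        # Express the AOI bounds in WGS84 to match the CRS passed to BBox
        aoi_gdf = gpd.read_file(AOI_PATH).to_crs(epsg=4326)
        aoi_bounds = tuple(aoi_gdf.total_bounds.tolist())
        aoi_bbox = BBox(bbox=aoi_bounds, crs=CRS.WGS84)

        # Plain (non-f) string, so the JavaScript braces are written singly
        evalscript = '''
        //VERSION=3
        function setup() {
            return {
                input: ["B04", "B08", "B03", "B02"],
                output: { bands: 4, sampleType: "FLOAT32" }
            };
        }

        function evaluatePixel(sample) {
            return [sample.B04, sample.B08, sample.B03, sample.B02];
        }
        '''

        date = wildcards.date
        day = f'{date[:4]}-{date[4:6]}-{date[6:]}'
        request = SentinelHubRequest(
            evalscript=evalscript,
            input_data=[SentinelHubRequest.input_data(
                data_collection=DataCollection.SENTINEL2_L1C,
                time_interval=(day, day)
            )],
            responses=[SentinelHubRequest.output_response('default', MimeType.TIFF)],
            bbox=aoi_bbox,
            size=(512, 512),
            config=sh_config
        )
        data = request.get_data()[0]  # array of shape (height, width, bands)
        height, width = data.shape[:2]
        # Derive the affine transform from the AOI bounds and the output size
        transform = from_bounds(*aoi_bounds, width, height)
        with rasterio.open(output[0], 'w', driver='GTiff', height=height, width=width,
                           count=4, dtype='float32', crs=rioCRS.from_epsg(4326), transform=transform) as dst:
            dst.write(data.transpose(2, 0, 1))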

rule preprocess_raster:
    input:
        raster='{output_dir}/sentinel2_{date}.tif',
        aoi=AOI_PATH
    output:
        '{output_dir}/preprocessed_{date}.tif'
    run:
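        import numpy as np
        import rasterio
        from rasterio.mask import mask
        import geopandas as gpd
        from rasterio.transform import array_bounds
        from rasterio.warp import calculate_default_transform, reproject, Resampling

        dst_crs = 'EPSG:32632'  # Example UTM zone, adjust as needed

        with rasterio.open(input.raster) as src:
            # Clip to the AOI, with its geometry expressed in the raster CRS
            aoi_gdf = gpd.read_file(input.aoi).to_crs(src.crs)
            raster_data, transform = mask(src, aoi_gdf.geometry, crop=True, nodata=np.nan)
            src_crs = src.crs
            profile = src.profile

        # Reproject to UTM (updating the profile CRS alone would only relabel the data)
        bands, height, width = raster_data.shape
        dst_transform, dst_width, dst_height = calculate_default_transform(
            src_crs, dst_crs, width, height, *array_bounds(height, width, transform))
        reprojected = np.full((bands, dst_height, dst_width), np.nan, dtype=np.float32)
        reproject(raster_data, reprojected,
                  src_transform=transform, src_crs=src_crs, src_nodata=np.nan,
                  dst_transform=dst_transform, dst_crs=dst_crs, dst_nodata=np.nan,
                  resampling=Resampling.bilinear)

        profile.update({
            'height': dst_height,
            'width': dst_width,
            'transform': dst_transform,
            'nodata': np.nan,
            'crs': dst_crs
        })
        with rasterio.open(output[0], 'w', **profile) as dst:
            dst.write(reprojected)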

rule cloud_detection:
    input:
        raster='{output_dir}/preprocessed_{date}.tif'
    output:
        '{output_dir}/cloud_mask_{date}.tif'
    run:
        import rasterio
        import torch
        import segmentation_models_pytorch as smp
        import numpy as np

        # Keep the device in a variable; nn.Module has no `.device` attribute
        device = 'cuda' if torch.cuda.is_available() else 'cpu'
        # encoder_weights=None: the weights come from the checkpoint loaded below
        model = smp.Unet(encoder_name='resnet18', encoder_weights=None, in_channels=4, classes=2).to(device)
        model.load_state_dict(torch.load(config['cloud_model_path'], map_location=device))
        model.eval()

        with rasterio.open(input.raster) as src:
            raster_data = src.read()  # nodata pixels stay NaN
            profile = src.profile

        # Scale each band by its 98th percentile and clip to [0, 1]
        norm_data = raster_data / np.nanpercentile(raster_data, 98, axis=(1, 2), keepdims=True)
        norm_data = np.clip(norm_data, 0, 1)

        patch_size = 256
        height, width = norm_data.shape[1], norm_data.shape[2]
        cloud_mask = np.zeros((height, width), dtype=np.uint8)

        # Slide a window with 50% overlap; patches containing NaN are skipped
        with torch.no_grad():
            for i in range(0, height - patch_size + 1, patch_size//2):
                for j in range(0, width - patch_size + 1, patch_size//2):
                    patch = norm_data[:, i:i+patch_size, j:j+patch_size]
                    if not np.any(np.isnan(patch)):
                        patch_tensor = torch.from_numpy(patch.astype(np.float32)).unsqueeze(0).to(device)
                        # Named `probs` so it does not shadow Snakemake's `output`
                        probs = torch.sigmoid(model(patch_tensor)).cpu().numpy()
                        pred = (probs[0, 0] > 0.5).astype(np.uint8)
                        cloud_mask[i:i+patch_size, j:j+patch_size] = pred

        profile.update({'count': 1, 'dtype': 'uint8', 'nodata': None})
        with rasterio.open(output[0], 'w', **profile) as dst:
            dst.write(cloud_mask, 1)

rule calculate_ndvi:
    input:
        raster='{output_dir}/preprocessed_{date}.tif',
        cloud_mask='{output_dir}/cloud_mask_{date}.tif'
    output:
        '{output_dir}/ndvi_{date}.tif'
    run:
        import rasterio
        import numpy as np

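        with rasterio.open(input.raster) as src:
            raster_data = src.read()  # nodata pixels stay NaN
            profile = src.profile
        with rasterio.open(input.cloud_mask) as src:
            cloud_mask = src.read(1)

        # Band order follows the evalscript: B04 (red) first, B08 (NIR) second
        red = raster_data[0].astype(np.float32)
        nir = raster_data[1].astype(np.float32)
        # Silence divide-by-zero warnings; np.where sets those pixels to NaN
        with np.errstate(divide='ignore', invalid='ignore'):
            ndvi = np.where((nir + red) != 0, (nir - red) / (nir + red), np.nan)
        ndvi[cloud_mask == 1] = np.nan

        profile.update({'count': 1, 'dtype': 'float32', 'nodata': np.nan})
        with rasterio.open(output[0], 'w', **profile) as dst:
            dst.write(ndvi.astype(np.float32), 1)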

In [None]:
%%writefile config.yaml
# Configuration file
aoi_path: 'aoi.geojson'
output_dir: 'remote_sensing_data/processed'
start_date: '2023-01-01'
end_date: '2023-12-31'
cloud_model_path: 'unet_cloud_detection.pth'

In [None]:
# Run pipeline (in terminal: `snakemake -c1`)
print('Snakemake pipeline defined. Run `snakemake -c1` in the terminal to execute.')

## Step 3: Execute Pipeline

Run the `snakemake` pipeline to process the data. This step is typically executed in the terminal, but a sample execution is shown here.
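
Before a full run, a dry run can show which jobs `snakemake` would schedule without executing them. The sketch below only assembles the command line. The helper name `build_snakemake_command` is an assumption for illustration, while `--snakefile`, `--cores`, and `--dry-run` are standard `snakemake` options.

```python
import shlex


def build_snakemake_command(cores=1, dry_run=False, snakefile='Snakefile'):
    """Assemble a snakemake command as an argument list.

    Hypothetical helper for illustration; it does not run anything itself.
    """
    cmd = ['snakemake', '--snakefile', snakefile, '--cores', str(cores)]
    if dry_run:
        # List the jobs that would run without executing them
        cmd.append('--dry-run')
    return cmd


dry_run_cmd = build_snakemake_command(cores=1, dry_run=True)
print(shlex.join(dry_run_cmd))
```

The resulting list can be passed to `subprocess.run(dry_run_cmd, check=True)` from Python, or the printed line can be pasted into a terminal.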

In [None]:
# Execute pipeline programmatically (for demonstration)
!snakemake -c1 --quiet

# List output files in date order
output_files = sorted(glob.glob(os.path.join(output_dir, 'ndvi_*.tif')))
print(f'Generated NDVI files: {output_files}')

## Step 4: Visualize Results

Visualize the NDVI output for the first processed date with AOI overlay.

In [None]:
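# Load and visualize NDVI
if output_files:
    with rasterio.open(output_files[0]) as src:
        ndvi_data = src.read(1, masked=True)
        bounds = src.bounds
        raster_crs = src.crs

    fig, ax = plt.subplots(figsize=(8, 8))
    # Draw the raster in map coordinates so the AOI outline lines up with it
    im = ax.imshow(ndvi_data, cmap='RdYlGn', vmin=-1, vmax=1,
                   extent=(bounds.left, bounds.right, bounds.bottom, bounds.top))
    aoi_gdf.to_crs(raster_crs).plot(ax=ax, facecolor='none', edgecolor='red', linewidth=2)
    # Pass the image explicitly; plt.colorbar() cannot find one created via ax.imshow
    fig.colorbar(im, ax=ax, label='NDVI')
    ax.set_title(f'NDVI - {os.path.basename(output_files[0]).split("_")[1][:-4]}')
    ax.set_xlabel('Easting')
    ax.set_ylabel('Northing')
    plt.show()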
else:
    print('No NDVI files generated. Check pipeline execution.')

## Next Steps

- Replace `sentinelhub_config.json`, `aoi.geojson`, and `unet_cloud_detection.pth` with your own files.
- Update `config.yaml` with desired dates and output directory.
- Extend the pipeline by adding rules for other analyses (e.g., classification from `12_classification_rf_svm.ipynb` or change detection from `30_change_detection_timeseries.ipynb`).
- Use outputs in visualization notebooks like `23_kepler_gl_demo.ipynb` or `26_time_series_animation.ipynb`.
- Explore advanced `snakemake` features like parallel execution or cloud integration.
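
The `DATES` list in the `Snakefile` contains only the start and end dates from `config.yaml`. One way to cover the period in between is to generate a date string at a fixed interval, as in the sketch below. The function name `date_strings` and the 5-day default step are illustrative assumptions, and a generated date may not have a usable Sentinel-2 acquisition for your AOI.

```python
from datetime import datetime, timedelta


def date_strings(start_date, end_date, step_days=5):
    """Return YYYYMMDD strings from start_date to end_date (inclusive).

    Hypothetical helper; dates are given as 'YYYY-MM-DD', as in config.yaml.
    """
    if step_days < 1:
        raise ValueError('step_days must be at least 1')
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    if end < start:
        raise ValueError('end_date must not be earlier than start_date')

    dates = []
    current = start
    while current <= end:
        dates.append(current.strftime('%Y%m%d'))
        current += timedelta(days=step_days)
    return dates


example_dates = date_strings('2023-01-01', '2023-01-16', step_days=5)
print(example_dates)  # ['20230101', '20230106', '20230111', '20230116']
```

In the `Snakefile`, the result could replace the two-element list, for example `DATES = date_strings(START_DATE, END_DATE)`, so that `rule all` requests one NDVI file per generated date.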

## Notes
- Ensure SentinelHub credentials are valid in `sentinelhub_config.json`.
- Adjust the UTM zone in the preprocessing rule based on your AOI.
- The cloud detection model assumes a pre-trained U-Net from `28_cloud_detection_deep_learning.ipynb`.
- Run `snakemake -c1` in the terminal for sequential execution or `-cN` for parallel processing (N=number of cores).
- See `docs/installation.md` for troubleshooting library installation.
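
To pick the UTM zone for the preprocessing rule, the EPSG code can be estimated from a longitude and latitude near the centre of the AOI. The sketch below uses the standard 6-degree zone layout and ignores the special zones around Norway and Svalbard. The function name `utm_epsg_from_lonlat` is an assumption for illustration. `geopandas` also offers `GeoDataFrame.estimate_utm_crs()`, which may be more convenient when the AOI is already loaded.

```python
def utm_epsg_from_lonlat(lon, lat):
    """Return the WGS84 / UTM EPSG code for a point given in degrees.

    Hypothetical helper; it does not handle the Norway/Svalbard exceptions.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError('lon must be within [-180, 180]')
    if not -90.0 <= lat <= 90.0:
        raise ValueError('lat must be within [-90, 90]')

    # Zones are 6 degrees wide, numbered 1 to 60 starting at 180 degrees West
    zone = min(int((lon + 180.0) // 6) + 1, 60)
    # EPSG 326xx covers the northern hemisphere, 327xx the southern
    base = 32600 if lat >= 0 else 32700
    return base + zone


print(utm_epsg_from_lonlat(9.0, 45.0))  # 32632, the example zone used in this notebook
```

The returned code can be written as `f'EPSG:{code}'` in place of the example value in the preprocessing rule.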