# Starter Notebook — Flood Risk Prediction

This notebook contains runnable, well-commented examples for: downloading Sentinel samples (instructions), reading GeoTIFFs with `rasterio`, basic preprocessing (NDVI, normalization), extracting tiles, a tiny Keras training example (placeholder), and visualization (heatmap & Folium overlay). Replace placeholders with your real data paths and credentials.

In [None]:
# Basic imports and environment checks
import sys
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint

print('Python executable:', sys.executable)
print('Working directory:', Path.cwd())

# Helper to check optional libraries without failing the notebook
def check_libs():
    libs = ['rasterio','rioxarray','sentinelsat','tensorflow','folium']
    status = {}
    for lib in libs:
        try:
            __import__(lib)
            status[lib] = True
        except Exception:  # ImportError, or any error raised by a broken install
            status[lib] = False
    return status

pprint(check_libs())

In [None]:
# Data directory setup — update this to your local data folder
DATA_DIR = Path('..') / 'data'
DATA_DIR.mkdir(parents=True, exist_ok=True)
print('Data dir:', DATA_DIR.resolve())

# Show a few files if present (count all files, list the first 10)
files = sorted(p for p in DATA_DIR.rglob('*') if p.is_file())
print('Found files:', len(files))
for f in files[:10]:
    print('-', f.name)

if not files:
    print('No sample files found. See the Sentinel-2 download cells below, or place GeoTIFFs into the data/ folder.')

## Sentinel-2 sample download (instructions)

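Automated download requires either a Copernicus account used with `sentinelsat`, or access to the Sentinel-2 copies in the AWS Open Data registry. Note that the Copernicus Open Access Hub (`scihub.copernicus.eu`), which `sentinelsat` was built for, was retired in 2023 in favour of the Copernicus Data Space Ecosystem, so the `sentinelsat` template below is likely to need a different endpoint or client. The code is a template: fill in your credentials and query parameters. Alternatively, download GeoTIFFs for a small region manually and place them in `data/`.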

Note: do NOT run the download cell unless you have enough free disk space (full Sentinel-2 products are large) and a registered account.

In [None]:
# Example sentinelsat usage (template; commented out to avoid accidental runs)
# from sentinelsat import SentinelAPI, read_geojson, geojson_to_wkt
# user = 'your-username'
# password = 'your-password'
# api = SentinelAPI(user, password, 'https://scihub.copernicus.eu/dhus')
# footprint = geojson_to_wkt(read_geojson('aoi.geojson'))
# products = api.query(footprint, date=('20220101','20220115'), platformname='Sentinel-2', cloudcoverpercentage=(0,20))
# print('Products found:', len(products))
# api.download_all(products, directory_path=str(DATA_DIR))
print('Template sentinelsat code shown (commented).')
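Before running any download, it is worth checking free disk space. The sketch below uses only the standard library (`shutil.disk_usage`); the helper name `has_free_space` and the 5 GiB threshold are illustrative assumptions, not requirements of any download tool. The path mirrors the notebook's `DATA_DIR`.

```python
import shutil
from pathlib import Path

def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    path = Path(path)
    # Walk up to the nearest existing directory so this also works before the folder exists
    while not path.exists():
        path = path.parent
    free_gib = shutil.disk_usage(path).free / 1024**3
    print(f'Free space at {path.resolve()}: {free_gib:.1f} GiB (need {required_gb} GiB)')
    return free_gib >= required_gb

# Hypothetical threshold: adjust to the number and size of products you plan to fetch
ok = has_free_space(Path('..') / 'data', required_gb=5)
```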

## Read a single GeoTIFF with rasterio
The cell below opens the first GeoTIFF found under `data/`, prints its CRS, size and band count, and displays the first three bands. If no file is found, `rasterio` is not installed, or the read fails, it falls back to a random placeholder image.

In [None]:
# Try reading a GeoTIFF with rasterio (safe: wrapped in try/except)
sample = None
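tif_files = sorted(p for p in DATA_DIR.rglob('*') if p.is_file() and p.suffix.lower() in ('.tif', '.tiff'))
if tif_files:
    fp = tif_files[0]
    try:
        import rasterio
        with rasterio.open(fp) as src:
            print('Raster file:', fp.name)
            print('Raster CRS:', src.crs)
            print('Raster size (w,h):', src.width, src.height)
            print('Band count:', src.count)
            # read up to the first 3 bands (order may be RGB, BGR, ... depending on the product)
            arr = src.read(list(range(1, min(src.count, 3) + 1))).astype('float32')
        # convert (bands, H, W) -> (H, W, bands) for display
        img = np.moveaxis(arr, 0, -1)
        if img.shape[-1] < 3:
            # repeat the last band so 1- or 2-band rasters can still be shown as RGB
            img = np.concatenate([img] + [img[..., -1:]] * (3 - img.shape[-1]), axis=-1)
        # raw reflectances (e.g. uint16) are outside imshow's [0, 1] float range:
        # apply a simple 2-98 percentile stretch so display and the tiling demo work
        lo, hi = np.nanpercentile(img, (2, 98))
        sample = np.clip((img - lo) / (hi - lo + 1e-6), 0, 1)
    except Exception as e:
        print('rasterio read failed:', e)
        sample = None
else:
    print('No .tif/.tiff files found in data/.')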

if sample is None:
    # fallback placeholder
    sample = np.clip(np.random.rand(256,256,3),0,1)

plt.figure(figsize=(6,6))
plt.imshow(sample[:,:, :3])
plt.title('Sample (first 3 bands or placeholder)')
plt.axis('off')
plt.show()
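Raw Sentinel-2 reflectances are usually stored as integers (e.g. `uint16`), so `imshow` cannot display them directly. A common fix is a percentile stretch; the sketch below applies it per band rather than over all bands at once. The 2nd/98th percentiles are a conventional but arbitrary choice, `percentile_stretch` is a hypothetical helper name, and the demo runs on synthetic data rather than a real scene.

```python
import numpy as np

def percentile_stretch(img, low=2, high=98):
    """Rescale each band of an (H, W, C) array to [0, 1] using that band's own percentiles.

    NaNs (nodata) are ignored when computing the percentiles and stay NaN in the output.
    """
    img = img.astype('float32')
    out = np.empty_like(img)
    for c in range(img.shape[-1]):
        band = img[..., c]
        lo, hi = np.nanpercentile(band, (low, high))
        out[..., c] = np.clip((band - lo) / (hi - lo + 1e-6), 0, 1)
    return out

# Example on synthetic uint16 'reflectance' values (placeholder data, not a real scene)
rng = np.random.default_rng(0)
fake = rng.integers(0, 10000, size=(64, 64, 3), dtype=np.uint16)
stretched = percentile_stretch(fake)
print(stretched.dtype, float(np.nanmin(stretched)), float(np.nanmax(stretched)))
```

On real data you would call it on the `(H, W, bands)` array read by rasterio, e.g. `plt.imshow(percentile_stretch(img))`.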

## Compute NDVI (example)
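NDVI = (NIR - Red) / (NIR + Red). For Sentinel-2, NIR is band 8 (B08) and red is band 4 (B04), but their positions in a multi-band GeoTIFF depend on how the file was produced, so adjust the indexing to your raster's band order. On the RGB placeholder image the result is only a demonstration, not a real vegetation index.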

In [None]:
def compute_ndvi(nir, red, eps=1e-6):
    # nir, red: numpy arrays (H,W) or (H,W,1)
    nir = nir.astype('float32')
    red = red.astype('float32')
    ndvi = (nir - red) / (nir + red + eps)
    return ndvi

# Demo: assumes RED is in channel 0 and NIR in channel 2 of `sample` (illustrative only)
if sample is not None:
    try:
        # adapt the indices below to match your raster band order
        red = sample[:,:,0]
        nir = sample[:,:,2]
        ndvi = compute_ndvi(nir, red)
        plt.figure(figsize=(6,5))
        plt.imshow(ndvi, cmap='RdYlGn')
        plt.colorbar(label='NDVI')
        plt.title('NDVI (demo)')
        plt.axis('off')
        plt.show()
    except Exception as e:
        print('NDVI demo failed (band indices may be different):', e)
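Selecting bands by name instead of by position avoids silent index mistakes, and masking zero denominators keeps zero-filled nodata from producing spurious values. The sketch below assumes a hypothetical stack order `['B02', 'B03', 'B04', 'B08']`; check your file's actual order (e.g. rasterio's `src.descriptions`, which may be empty) before relying on it. The same normalized difference also gives McFeeters' NDWI, (Green - NIR) / (Green + NIR), whose positive values are often used as a first approximation of water for the proxy labels mentioned under Next steps.

```python
import numpy as np

# Hypothetical band order for a GeoTIFF you stacked yourself (blue, green, red, NIR)
BAND_ORDER = ['B02', 'B03', 'B04', 'B08']

def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where a + b == 0 (e.g. zero-filled nodata)."""
    a = np.asarray(a, dtype='float32')
    b = np.asarray(b, dtype='float32')
    denom = a + b
    out = np.full(a.shape, np.nan, dtype='float32')
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def band(stack, name, order=BAND_ORDER):
    """Select a band from an (H, W, C) stack by its Sentinel-2 band name."""
    return stack[..., order.index(name)]

# Tiny synthetic stack: pixel 0 vegetation-like, pixel 1 water-like, pixel 2 nodata (all zeros)
stack = np.array([[[0.05, 0.08, 0.06, 0.40],
                   [0.10, 0.12, 0.08, 0.03],
                   [0.00, 0.00, 0.00, 0.00]]], dtype='float32')  # shape (1, 3, 4)
ndvi = normalized_difference(band(stack, 'B08'), band(stack, 'B04'))  # (NIR - red) / (NIR + red)
ndwi = normalized_difference(band(stack, 'B03'), band(stack, 'B08'))  # (green - NIR) / (green + NIR)
print('NDVI:', ndvi)
print('NDWI:', ndwi)
```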

## Tile extraction / dataset creation (placeholder)
Split large rasters into smaller tiles and, once labels exist, save image/label tile pairs. The code below is a template that only extracts image tiles in memory; pixels past the last full tile are dropped. Adapt the tile size, stride, and label logic to your project.

In [None]:
def extract_tiles(img, tile_size=256, stride=256):
    h,w,_ = img.shape
    tiles = []
    for y in range(0, h-tile_size+1, stride):
        for x in range(0, w-tile_size+1, stride):
            tiles.append(img[y:y+tile_size, x:x+tile_size])
    return tiles

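# Demo: extract a few tiles from `sample`, rescaled to uint8 for display
scaled = sample.astype('float32')
scaled = (scaled - np.nanmin(scaled)) / (np.nanmax(scaled) - np.nanmin(scaled) + 1e-6)
tiles = extract_tiles((np.nan_to_num(scaled) * 255).astype('uint8'), tile_size=128, stride=128)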
print('Extracted tiles:', len(tiles))
plt.figure(figsize=(8,4))
for i,t in enumerate(tiles[:6]):
    plt.subplot(2,3,i+1)
    plt.imshow(t[:,:, :3])
    plt.axis('off')
plt.suptitle('Example tiles')
plt.show()
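`extract_tiles` above drops any pixels past the last full tile. If you need full coverage, plus each tile's position for georeferencing later, one option is to pad the bottom and right edges. In the sketch below, the function name `extract_tiles_padded` and zero padding are assumptions; padded pixels should be masked or ignored when computing labels and metrics.

```python
import numpy as np

def extract_tiles_padded(img, tile_size=256, stride=256, pad_value=0):
    """Tile an (H, W, C) array, padding bottom/right edges so every pixel lands in a tile.

    Returns (tiles, offsets): tiles has shape (N, tile_size, tile_size, C) and
    offsets[i] = (row, col) of tile i's top-left pixel in `img`.
    """
    h, w = img.shape[:2]
    # Number of tile positions per axis, rounded up so the edges are covered
    n_rows = max(1, int(np.ceil((h - tile_size) / stride)) + 1)
    n_cols = max(1, int(np.ceil((w - tile_size) / stride)) + 1)
    pad_h = (n_rows - 1) * stride + tile_size - h
    pad_w = (n_cols - 1) * stride + tile_size - w
    padded = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=pad_value)
    tiles, offsets = [], []
    for r in range(n_rows):
        for c in range(n_cols):
            y, x = r * stride, c * stride
            tiles.append(padded[y:y + tile_size, x:x + tile_size])
            offsets.append((y, x))
    return np.stack(tiles), offsets

# Example: a 300 x 200 image yields 3 x 2 tiles of 128 px instead of 2 x 1 without padding
demo = np.zeros((300, 200, 3), dtype=np.uint8)
demo_tiles, demo_offsets = extract_tiles_padded(demo, tile_size=128, stride=128)
print(demo_tiles.shape, demo_offsets[-1])
```

In the notebook, it could replace the earlier call with the same uint8 input, e.g. `extract_tiles_padded((sample * 255).astype('uint8'), tile_size=128, stride=128)` when `sample` is in [0, 1].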

## Minimal Keras training example (placeholder)
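This example builds a tiny CNN and trains it on random data to demonstrate the Keras API; the three output classes are placeholders. Replace the random arrays with real tiles and labels, ideally through a `tf.data` pipeline (`ImageDataGenerator` is deprecated in recent TensorFlow releases).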

In [None]:
try:
    import tensorflow as tf
    from tensorflow.keras import layers, models
    tf_available = True
except Exception as e:
    print('TensorFlow not available:', e)
    tf_available = False

if tf_available:
    # tiny model
    model = models.Sequential([
        layers.Input((128,128,3)),
        layers.Conv2D(16,3,activation='relu'),
        layers.MaxPool2D(),
        layers.Conv2D(32,3,activation='relu'),
        layers.GlobalAveragePooling2D(),
        layers.Dense(3, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.summary()

    # create tiny random dataset for demo
    x = np.random.rand(8,128,128,3).astype('float32')
    y = np.random.randint(0,3,size=(8,))
    history = model.fit(x,y, epochs=1, batch_size=4)
else:
    print('Skipping model demo — install TensorFlow to run this cell.')
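To move from random arrays to real tiles, the tiles must be stacked into `(N, H, W, C)` float arrays with matching labels. The sketch below is one plain-NumPy way to do that; `tiles_to_arrays`, the 80/20 split and the three-class labels are illustrative assumptions, and a `tf.data` pipeline would replace it once the dataset no longer fits in memory.

```python
import numpy as np

def tiles_to_arrays(tiles, labels, val_fraction=0.2, seed=0):
    """Stack uint8 (H, W, C) tiles into float32 arrays in [0, 1] and split train/validation.

    Note: a random split of tiles cut from the same scene can leak spatially correlated
    pixels between the sets; splitting by scene or region is safer with real data.
    """
    x = np.stack(tiles).astype('float32') / 255.0
    y = np.asarray(labels)
    if len(x) != len(y):
        raise ValueError(f'{len(x)} tiles but {len(y)} labels')
    idx = np.random.default_rng(seed).permutation(len(x))
    n_val = int(round(len(x) * val_fraction))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

# Example with placeholder tiles and made-up labels (0/1/2 risk classes are hypothetical)
demo_tiles = [np.full((128, 128, 3), i * 20, dtype=np.uint8) for i in range(10)]
demo_labels = [i % 3 for i in range(10)]
(x_train, y_train), (x_val, y_val) = tiles_to_arrays(demo_tiles, demo_labels)
print(x_train.shape, x_val.shape)
```

The arrays can then be passed to the model above, e.g. `model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5, batch_size=4)`.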

## Visualization: heatmap & Folium (export)
Create a heatmap from model predictions (per-tile) and either display with matplotlib or export geo-referenced tiles as GeoJSON/PNG for overlay in `folium` or QGIS. The cell below shows a simple matplotlib heatmap example.

In [None]:
# Demo heatmap from random 'risk' scores
risk_map = np.random.rand(64,64)
plt.figure(figsize=(6,5))
plt.imshow(risk_map, cmap='hot')
plt.colorbar(label='Flood risk (0-1)')
plt.title('Demo flood risk heatmap')
plt.axis('off')
plt.show()

# To overlay on folium: create an image overlay or tile layer from geo-referenced array (requires CRS and bounds). See README for folium link example.
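A minimal sketch of the folium overlay, assuming the risk array has already been reprojected to EPSG:4326 and that you know its lat/lon bounds. It colours the array with a matplotlib colormap into a `uint8` RGBA image (transparent where the value is NaN) and passes that to `folium.raster_layers.ImageOverlay`; the helper names, the `hot` colormap and the 0.6 opacity are assumptions.

```python
import numpy as np
import matplotlib

def risk_to_rgba(risk, cmap_name='hot', alpha=0.6):
    """Map a 2D risk array (values in [0, 1], NaN = no data) to a uint8 RGBA image."""
    cmap = matplotlib.colormaps[cmap_name]
    rgba = cmap(np.clip(risk, 0, 1), bytes=True)  # (H, W, 4) uint8
    valid = np.isfinite(risk)
    rgba[..., 3] = np.where(valid, int(alpha * 255), 0)  # transparent where there is no data
    return rgba

def make_overlay_map(risk, bounds, out_html='flood_risk_map.html'):
    """Save a folium map with `risk` drawn over bounds = [[south, west], [north, east]] (degrees).

    Assumes the array is already in EPSG:4326; Sentinel-2 tiles are in UTM, so reproject
    first (e.g. with rasterio.warp) or the overlay will be misplaced.
    """
    import folium  # imported lazily so risk_to_rgba works without folium installed
    (south, west), (north, east) = bounds
    m = folium.Map(location=[(south + north) / 2, (west + east) / 2], zoom_start=11)
    folium.raster_layers.ImageOverlay(image=risk_to_rgba(risk), bounds=bounds,
                                      name='Flood risk (demo)').add_to(m)
    folium.LayerControl().add_to(m)
    m.save(out_html)
    return m
```

Call it as `make_overlay_map(risk_map, bounds=[[south, west], [north, east]])` with your AOI's bounds and open the saved HTML file in a browser. `ImageOverlay` stretches the image linearly between the bounds; for large areas its `mercator_project=True` option may align better.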

## Next steps
1. Replace placeholder tiles with real Sentinel-2/Sentinel-1 tiles (use `rasterio` or `xarray` to read and align bands).
2. Implement label logic (e.g., pre/post flood masks or proxy labels from water segmentation).
3. Build a `tf.data` pipeline that yields (image, label) pairs and use a transfer-learning backbone (ResNet50/EfficientNet).
4. Export per-tile predictions to GeoTIFF/GeoJSON and visualize in `folium`/QGIS.
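For step 4, per-tile predictions can be written as GeoJSON polygons using only the standard library, given each tile's pixel offset and the raster's geotransform. The sketch below assumes a north-up raster and a GDAL-style geotransform tuple (rasterio's `src.transform.to_gdal()` returns one); `tiles_to_geojson` is a hypothetical helper and the example transform is made up.

```python
import json

def tiles_to_geojson(offsets, scores, geotransform, tile_size):
    """Build a GeoJSON FeatureCollection with one square polygon per tile.

    offsets      : (row, col) top-left pixel offsets, one per tile
    scores       : per-tile predictions (e.g. flood-risk probability), same length as offsets
    geotransform : (x_origin, pixel_width, 0, y_origin, 0, -pixel_height), north-up only
    Coordinates stay in the raster's CRS; RFC 7946 GeoJSON expects WGS 84 lon/lat,
    so reproject the corners first if the raster is in UTM (as Sentinel-2 tiles are).
    """
    x0, px_w, _, y0, _, px_h = geotransform  # px_h is negative for north-up rasters
    features = []
    for (row, col), score in zip(offsets, scores):
        left, top = x0 + col * px_w, y0 + row * px_h
        right, bottom = left + tile_size * px_w, top + tile_size * px_h
        # Counter-clockwise exterior ring, closed by repeating the first point
        ring = [[left, bottom], [right, bottom], [right, top], [left, top], [left, bottom]]
        features.append({
            'type': 'Feature',
            'geometry': {'type': 'Polygon', 'coordinates': [ring]},
            'properties': {'row': row, 'col': col, 'risk': float(score)},
        })
    return {'type': 'FeatureCollection', 'features': features}

geotransform = (500000.0, 10.0, 0.0, 4200000.0, 0.0, -10.0)  # made-up 10 m UTM-like transform
fc = tiles_to_geojson([(0, 0), (0, 128)], [0.2, 0.9], geotransform, tile_size=128)
print(json.dumps(fc)[:120])
```

Saved with `json.dumps`, the file can be loaded in QGIS or with `folium.GeoJson`.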

In [None]:
# ===== Sentinel-2 sample download templates =====
# 1) Using sentinelsat (Python) - fill credentials and footprint AOI
# Uncomment and update before running:
# from sentinelsat import SentinelAPI, read_geojson, geojson_to_wkt
# user = 'your-username'
# password = 'your-password'
# api = SentinelAPI(user, password, 'https://scihub.copernicus.eu/dhus')
# footprint = geojson_to_wkt(read_geojson('aoi.geojson'))
# products = api.query(footprint, date=('20230101','20230115'), platformname='Sentinel-2', cloudcoverpercentage=(0,30))
# print('Products found:', len(products))
# api.download_all(products, directory_path=str(DATA_DIR))

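# 2) Using the AWS CLI (fast for AWS-hosted tiles): replace <bucket> and <path-to-tile> with a real object
# Example command (works in PowerShell or any other shell; run it in your terminal, not inside the notebook):
# aws s3 cp "s3://<bucket>/<path-to-tile>.zip" . --no-sign-request
# (--no-sign-request only works for buckets that allow anonymous reads; requester-pays buckets
#  need configured credentials and --request-payer requester instead)
# Unzip the archive and move the extracted .SAFE folder or band files into the data/ folder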

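# 3) Using boto3 (Python) to download an S3 object: anonymous access for public buckets,
#    or your AWS credentials for private and requester-pays buckets
from pathlib import Path

def download_s3_object(s3_uri, out_dir='.', unsigned=False, requester_pays=False, region_name=None):
    """Download an S3 object using boto3. s3_uri example: 's3://bucket/path/to/file.zip'

    unsigned=True       -> anonymous (unsigned) requests, for buckets that allow public reads
    requester_pays=True -> add RequestPayer='requester' (needs credentials; transfer is billed to you)
    region_name         -> optional bucket region, e.g. 'us-west-2'
    Otherwise boto3's default credential chain (env vars, ~/.aws/credentials, ...) is used.
    """
    try:
        import boto3
        from botocore import UNSIGNED
        from botocore.config import Config
        from urllib.parse import urlparse
    except ImportError as e:
        print('boto3 not available. Install boto3 to use this helper:', e)
        return False
    parsed = urlparse(s3_uri)
    bucket = parsed.netloc
    key = parsed.path.lstrip('/')
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / Path(key).name
    # Without UNSIGNED, boto3 signs requests and fails if no credentials are configured
    config = Config(signature_version=UNSIGNED) if unsigned else None
    s3 = boto3.client('s3', region_name=region_name, config=config)
    extra_args = {'RequestPayer': 'requester'} if requester_pays else None
    print(f'Downloading s3://{bucket}/{key} -> {out_path}')
    try:
        s3.download_file(bucket, key, str(out_path), ExtraArgs=extra_args)
        print('Download complete')
        return True
    except Exception as e:
        print('Download failed:', e)
        return False

# Usage examples (replace the placeholders before running):
# download_s3_object('s3://<bucket>/<path-to-file>', out_dir=str(DATA_DIR), unsigned=True)         # public bucket
# download_s3_object('s3://<bucket>/<path-to-file>', out_dir=str(DATA_DIR), requester_pays=True)  # requester-pays bucket
# Or run the AWS CLI command from your terminal (usually faster for large files):
# aws s3 cp "s3://<bucket>/<tile>.zip" . --no-sign-request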

print('Download templates available. Edit and run the appropriate command for your environment.')

In [None]:
# ===== AWS example (discover and download a small Sentinel-2 file) =====
# WARNING: inspect file sizes before downloading. Use the commands below to find small files.
# NOTE: 'sentinel-s2-l2a' (eu-central-1) is a Requester Pays bucket: anonymous --no-sign-request
#       access is refused, so you need configured AWS credentials plus --request-payer requester,
#       and the data transfer is billed to your AWS account.
#       For anonymous, free access, the 'sentinel-cogs' bucket (us-west-2) holds Sentinel-2 L2A
#       as Cloud-Optimized GeoTIFFs and accepts --no-sign-request.
#
# 1) List the top-level tile prefixes (PowerShell; run in your terminal, not in the notebook):
#    aws s3 ls s3://sentinel-s2-l2a/tiles/ --request-payer requester | Select-Object -First 100
#    Prefixes follow the MGRS grid: tiles/<utm_zone>/<lat_band>/<grid_square>/<year>/<month>/<day>/<sequence>/
#
# 2) List the files (with sizes) under a tile/date prefix you found (replace the placeholders):
#    aws s3 ls s3://sentinel-s2-l2a/tiles/<utm_zone>/<lat_band>/<grid_square>/<year>/<month>/<day>/0/ --recursive --human-readable --request-payer requester
#
# 3) Download a single small band file (L2A bands sit in resolution folders such as R10m/):
#    aws s3 cp "s3://sentinel-s2-l2a/tiles/<tile_prefix>/R10m/B04.jp2" . --request-payer requester
#
# 4) Or download from Python with the download_s3_object helper defined in the previous cell
#    (requires boto3; for this bucket the client needs credentials and requester-pays settings).
#
# Notes:
# - Replace every <placeholder> with values discovered via `aws s3 ls`.
# - If a file is large (>500MB), download a single band (.jp2) or a preview instead of an entire SAFE archive.
# - This cell does NOT download anything; these are templates to run in your terminal after checking file sizes.

print('AWS example commands shown (commented). Run them in your terminal to explore and download.')
