* **occurrence (JRC)**: The percentage of observations (`0–100`) in which water was detected. A value near 100 means the lake has held water consistently for decades; a low value points to a seasonal or disappearing lake.

* **VV (Sentinel-1)**: Radar backscatter in dB. Values around -20 dB or lower usually indicate standing water or inundation; higher values indicate buildings or rough ground.

* **elevation (SRTM)**: The height above sea level. Useful for identifying if a lake is at the bottom of a catchment (high flood risk) or perched higher up.

* **precipitation (GPM)**: The total accumulated rainfall "signal" for the 5-year period. You can divide this by 5 to get the average annual rainfall for that specific lake's micro-climate.

* **Map (ESA WorldCover)**: A land-cover class code. The codes that matter here:

    * `10`: Tree cover

    * `40`: Cropland

    * `50`: Built-up (buildings)

    * `80`: Permanent water bodies

**For Encroachment Study**:
* Once I have this CSV, I can correlate `in_build_ha` with the Map value from ESA.

* If `in_build_ha` is high and the ESA class is 50 (Built-up), two independent satellite datasets are confirming the encroachment.

* If `in_build_ha` is high but the occurrence (JRC) is also high, it means buildings are being constructed in areas that are historically underwater.
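The two cross-checks above can be written as a small rule. This is a minimal sketch rather than the notebook's actual method: the function name `encroachment_evidence` and both thresholds (`build_ha_min`, `occurrence_min`) are hypothetical placeholders that would need calibrating against the real CSV, and the input values in the calls are made up.

```python
BUILT_UP = 50  # ESA WorldCover class code for built-up land


def encroachment_evidence(in_build_ha, worldcover_class, occurrence,
                          build_ha_min=1.0, occurrence_min=50):
    """Return evidence labels for one lake row.

    build_ha_min and occurrence_min are illustrative thresholds (assumptions).
    """
    evidence = []
    if in_build_ha < build_ha_min:
        return evidence  # too little built-up area to flag anything
    if worldcover_class == BUILT_UP:
        evidence.append("confirmed by WorldCover")
    if occurrence >= occurrence_min:
        evidence.append("built on historically wet ground")
    return evidence


# Made-up inputs, for illustration only
print(encroachment_evidence(4.2, 50, 80))
print(encroachment_evidence(0.1, 50, 80))
```

Returning a list keeps the two signals separate, so a lake can carry one, both, or neither.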

#### SRTM 30m DEM

* **Elevation**: This is the raw value provided by the **SRTM band**.

* **Slope**: Calculated by comparing the elevation of a pixel to its neighbors.

* **Flow Direction & Accumulation**: These require "pit filling" (removing spurious depressions in the DEM so that every pixel drains somewhere). You can calculate them yourself, but it is computationally heavy. Most researchers instead use **WWF/HydroSHEDS**, which was derived from SRTM data.

* **Local Depressions**: These are "Sinks" where water would naturally pool. These are critical for lake analysis as they define the natural basin.
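To make "comparing the elevation of a pixel to its neighbors" concrete, here is a minimal sketch of slope from a 3×3 elevation window using central differences. It illustrates the idea only; the exact kernel used by `ee.Terrain.slope` may differ, and the function name `slope_degrees` and the sample windows are made up.

```python
import math


def slope_degrees(window, cell_size=30.0):
    """Slope at the centre of a 3x3 elevation window (metres), in degrees.

    Uses central differences on the four direct neighbours.
    Row 0 is the northern row; column 0 is the western column.
    """
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)  # east minus west
    dz_dy = (window[0][1] - window[2][1]) / (2 * cell_size)  # north minus south
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


flat = [[900, 900, 900]] * 3
tilted = [[900, 901, 902],
          [900, 901, 902],
          [900, 901, 902]]  # rises 1 m per 30 m cell towards the east

print(round(slope_degrees(flat), 2))
print(round(slope_degrees(tilted), 2))
```

A rise of 1 m per 30 m cell gives a slope of just under 2°, so the thresholds used later in this notebook correspond to very gentle gradients.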

#### What these results will tell you:
* **Slope**: If a lake has a very low slope (flat) but high in_build_ha, it means buildings are being built on a natural floodplain, making the area extremely prone to waterlogging.

* **Flow Accumulation**: A high value here indicates that the lake is a major "drainage node." If this lake is 80% encroached, the water that used to accumulate here will now be pushed into the surrounding streets.

* **Sink Depth (sink_depth)**: If this value is positive, it identifies a "bowl" in the landscape. If buildings are inside a zone where sink_depth > 0, they are physically sitting in a location where water is geographically "supposed" to be.

* **Flow Direction**: Indicates which way sewage or overflow will move once the lake reaches capacity.

#### Summary of Source IDs:
* **Elevation/Slope**: USGS/SRTMGL1_003

* **Flow/Basins**: WWF/HydroSHEDS/15ACC (Accumulation) and WWF/HydroSHEDS/15DIR (Direction).

* **Precise Hydrology**: MERIT/Hydro/v1_0_1 (a newer hydrography dataset at about 90 m resolution, built on an error-corrected DEM that draws largely on SRTM; it tends to behave better in flat urban areas like Bengaluru).

In [2]:
import os
import ee
import geemap
import pandas as pd

# 1. Initialize
PROJECT_ID = 'bengaluru-lakes-485612' 
ee.Initialize(project=PROJECT_ID)

In [None]:
# 2. Load Lakes
df = pd.read_csv('data/bengaluru_lakes_mean.csv')
lake_points = ee.FeatureCollection([
    ee.Feature(ee.Geometry.Point([row['lon'], row['lat']]), {'name': row['name']})
    for i, row in df.iterrows()
])

In [None]:
# 3. Topographic Datasets
# A. Raw Elevation
srtm = ee.Image("USGS/SRTMGL1_003")
elevation = srtm.select('elevation')

### Visualising Bengaluru’s Topographic Context for Lake Analysis

In this code block, we are setting up a **topographic and spatial context** to understand Bengaluru’s lakes not as isolated points, but as features embedded within a **sloping urban landscape**.

First, we define the **administrative boundary of Bengaluru Urban district** using the FAO GAUL Level-2 dataset. By filtering for `ADM2_NAME = Bangalore Urban`, we extract a precise polygon that represents the city’s official extent. This boundary becomes the spatial frame for all subsequent analysis, ensuring that global datasets are restricted strictly to Bengaluru.

Next, we load **SRTM elevation data**, a digital elevation model at ~30 m resolution. Instead of working with the global DEM, we **clip** it to the Bengaluru boundary. This removes terrain outside the city from the display, so that elevation and slope values are read only within the urban hydrological system we care about.

Using the clipped elevation, we then compute **slope** with `ee.Terrain.slope`. Slope is derived from elevation gradients and represents how steep or flat the ground is at each location. Clipping the slope layer again to the Bengaluru boundary keeps the analysis spatially consistent. This slope layer is especially important for lake studies, because it helps reveal whether lakes sit in natural “bowls” (depressions) or on unnaturally flattened surfaces.

We then initialise an interactive **geemap map**, centring it on Bengaluru at a city-scale zoom level. This makes the map immediately interpretable as an urban system rather than a regional or national view.

For visual interpretation, we define colour palettes:
- The **elevation palette** highlights Bengaluru’s plateau structure (roughly 800–1000 m), making ridges, valleys, and relative height differences across the city visible.
- The **slope palette** simplifies interpretation by mapping flat areas to white and steeper terrain (5° and above) to black, making potential lake basins and drainage gradients visually obvious.

Finally, we layer everything together:
- The **Bengaluru boundary** is drawn as a red outline to anchor spatial reference.
- The **elevation layer** shows the city’s vertical structure.
- The **slope layer** reveals geomorphic features such as depressions and embankments.
- The **lake points from the CSV** are overlaid in yellow, so that each lake can be assessed visually against elevation and slope.

Conceptually, this step lets us answer questions like:
- Are lakes located in low-lying terrain or on ridges?
- Do flood-prone lakes sit in flat, filled-in areas?
- How does the city’s natural slope influence water movement toward or away from lakes?

In short, we are building a **geomorphological backdrop** that grounds all later flood, storage, and encroachment analyses in Bengaluru’s real physical landscape.


In [None]:

# 2. Get the Bengaluru Boundary
# We use the GAUL Level 2 dataset and filter for 'Bangalore Urban'
bengaluru_boundary = ee.FeatureCollection("FAO/GAUL/2015/level2") \
    .filter(ee.Filter.eq('ADM2_NAME', 'Bangalore Urban'))

# 3. Load and Clip the Topographic Data
srtm = ee.Image("USGS/SRTMGL1_003")
# Use .clip() to cut the global image to the Bengaluru shape
elevation_clipped = srtm.select('elevation').clip(bengaluru_boundary)
slope_clipped = ee.Terrain.slope(elevation_clipped).clip(bengaluru_boundary)

# 4. Setup the Map
Map = geemap.Map()
Map.centerObject(bengaluru_boundary, 11) # Zoom specifically to the city

# --- Visualization Parameters ---
elev_viz = {
    'min': 800, 
    'max': 1000, 
    'palette': ['#313695', '#4575b4', '#abd9e9', '#fee090', '#f46d43', '#d73027']
}

slope_viz = {
    'min': 0, 
    'max': 5, 
    'palette': ['white', 'black'] # White = Flat, Black = Steep
}

# --- Add Layers to Map ---
# Add the Boundary (just the outline)
Map.addLayer(bengaluru_boundary.style(fillColor='00000000', color='red', width=2), {}, 'Bengaluru Boundary')

# Add Clipped Elevation
Map.addLayer(elevation_clipped, elev_viz, 'Bengaluru Elevation (m)')

# Add Clipped Slope (helps see the lake "bowls")
Map.addLayer(slope_clipped, slope_viz, 'Bengaluru Slope (Degrees)')

# Add your Lake Points from the CSV
# (Assuming lake_points variable is already defined from previous step)
Map.addLayer(lake_points, {'color': 'yellow'}, 'Lake Points (from CSV)')

Map

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load the topography data
df_topo = pd.read_csv('data/lake_flatness_analysis.csv')

# Statistics for explanation
slope_stats = df_topo['slope'].describe()
elev_stats = df_topo['elevation'].describe()

# Get extremes for examples
flattest = df_topo.nsmallest(3, 'slope')
steepest = df_topo.nlargest(3, 'slope')
highest = df_topo.nlargest(3, 'elevation')
lowest = df_topo.nsmallest(3, 'elevation')

# Print summary for my use
print("Slope Stats:\n", slope_stats)
print("\nElevation Stats:\n", elev_stats)

# Visualizing the distribution of Slope
plt.figure(figsize=(10, 5))
plt.hist(df_topo['slope'], bins=20, color='#2ca02c', alpha=0.7, edgecolor='black')
plt.axvline(df_topo['slope'].median(), color='red', linestyle='dashed', linewidth=1, label=f'Median: {df_topo["slope"].median():.2f}°')
plt.title('Distribution of Mean Slope across Bengaluru Lakes')
plt.xlabel('Mean Slope (Degrees)')
plt.ylabel('Number of Lakes')
plt.legend()
plt.savefig('slope_distribution.png')

# Visualizing Elevation (The Cascade)
plt.figure(figsize=(10, 5))
plt.hist(df_topo['elevation'], bins=20, color='#1f77b4', alpha=0.7, edgecolor='black')
plt.axvline(df_topo['elevation'].median(), color='red', linestyle='dashed', linewidth=1, label=f'Median: {df_topo["elevation"].median():.2f}m')
plt.title('Elevation Distribution (The Bengaluru Plateau)')
plt.xlabel('Elevation (Meters above Sea Level)')
plt.ylabel('Number of Lakes')
plt.legend()
plt.savefig('elevation_distribution.png')

# Data for the final explanation
print("\nFlattest Lakes (The 'Tables'):")
print(flattest[['name', 'slope']])
print("\nSteepest Lakes (The 'Bowls'):")
print(steepest[['name', 'slope']])
print("\nHighest Lakes (Upstream):")
print(highest[['name', 'elevation']])
print("\nLowest Lakes (Downstream):")
print(lowest[['name', 'elevation']])

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load the flatness analysis data
df_topo = pd.read_csv('data/lake_flatness_analysis.csv')

# Statistics for Slope
slope_mean = df_topo['slope'].mean()
slope_min = df_topo['slope'].min()
slope_max = df_topo['slope'].max()

# Create categories
def categorize_slope(s):
    if s < 1.5: return 'Flat (Table/Modified)'
    elif s < 3.0: return 'Gentle (Basin)'
    else: return 'Steep (Valley/Bund)'

df_topo['category'] = df_topo['slope'].apply(categorize_slope)

# Get examples for each
examples = df_topo.sort_values('slope').groupby('category').head(3)

print("Slope Statistics:")
print(f"Mean: {slope_mean:.2f}, Min: {slope_min:.2f}, Max: {slope_max:.2f}")
print("\nRepresentative Examples:")
print(examples[['name', 'slope', 'elevation', 'category']])

# Plot Elevation vs Slope to see if there is a pattern
plt.figure(figsize=(10, 6))
plt.scatter(df_topo['elevation'], df_topo['slope'], alpha=0.5, c='teal')
plt.axhline(y=1.5, color='r', linestyle='--', label='Flatness Threshold')
plt.xlabel('Elevation (m)')
plt.ylabel('Mean Slope (Degrees)')
plt.title('Bengaluru Lakes: Slope vs Elevation')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig('slope_elevation_viz.png')

# Count by category
category_counts = df_topo['category'].value_counts()
print("\nCategory Distribution:")
print(category_counts)

## Understanding Slope and Elevation in Bengaluru’s Lake System

To understand **slope** and **elevation** in the context of Bengaluru’s lakes, it is important to move beyond seeing lakes as simple *points on a map*.  
They must instead be understood as **physical vessels embedded in a sloping landscape** that control how water is stored and released.

The following interpretation explains how to read the outputs from `lake_flatness_analysis.csv`.

---

## 1. Slope: The “Bowl vs. Table” Logic

In geomorphology, a lake naturally behaves like a **bowl**.  
Its sides should slope inward, guiding water toward a central depression.  
The **Mean Slope** within a lake boundary therefore represents its **geomorphic integrity**.

---

### A. The “Table” (Slope < 1.5°)

**Interpretation**

- Lakes in this category (e.g., *B. Narayanapura – 0.74°*) are **unnaturally flat**.

**What this implies**

- Such a low slope suggests that the natural bowl may have been **filled or leveled**.
- This typically results from:
  - Construction activity  
  - Debris dumping  
  - Sedimentation and encroachment

**Hydrological consequence**

- A “table-like” lake can **store** very little water.
- Instead of holding runoff, it lets water **spread outward**, flooding nearby streets and layouts.

---

### B. The “Bowl” (Slope 1.5° – 3.0°)

**Interpretation**

- This range represents the **optimal or “Goldilocks” zone** for Bengaluru’s plateau terrain.

**What this implies**

- Lakes in this category (e.g., *Vidyaranyapura Kere*) retain a **functional depression**.
- The slope is sufficient to:
  - Channel runoff inward  
  - Maintain internal drainage toward the lake center

**Hydrological consequence**

- These lakes can still perform their role as **urban buffers**, absorbing rainfall and reducing downstream flooding.

---

### C. The “Vessel” (Slope ≥ 3.0°)

**Interpretation**

- These are lakes with **steep sides**, often natural valleys or lakes reinforced with high bunds (embankments).

**What this implies**

- Lakes such as *Nayandahalli (3.03°)* are often protected by their own steepness.
- Steep terrain:
  - Discourages encroachment  
  - Raises construction costs  
  - Acts as a **natural defense mechanism**

**Hydrological consequence**

- Such lakes retain storage capacity and are less likely to be flattened or illegally occupied.

---

## 2. Elevation: The “Hydrological Rank”

Bengaluru historically functioned as a **cascade lake system**, where lakes were linked in a downstream chain.  
**Elevation** determines a lake’s position and role within this hydrological hierarchy.

---

### A. Headwater Lakes (High Elevation > 900 m)

**Examples**

- Vidyaranyapura (911 m)  
- Gantiganahalli (901 m)

**Role**

- These are the **first receivers of rainfall**.
- They regulate flows to all downstream lakes.

**Risk implication**

- If headwater lakes are:
  - Encroached  
  - Flattened  
  - Hydrologically disconnected  

  then the **entire cascade below them fails**, amplifying flood risk citywide.

---

### B. Terminal Lakes (Low Elevation < 850 m)

**Examples**

- Hosakerahalli (833 m)  
- Nayandahalli (810 m)

**Role**

- These function as **final sinks** in the system.
- They receive:
  - Overflow from upstream lakes  
  - Urban runoff  
  - Sewage and stormwater

**Risk implication**

- Because they sit at the bottom of the city’s drainage gradient, they are **structurally flood-prone**, even if well maintained.

---

## 3. Visualising the Conflict: Slope vs Elevation

The slope–elevation plot derived from the dataset reveals a clear pattern:

- Most Bengaluru lakes cluster between **870 m and 910 m**, corresponding to the Deccan plateau surface.

---

### The “Danger Zone”

**Definition**

- Lakes with:
  - **High elevation (> 900 m)**  
  - **Low slope (< 1.5°)**

**Why this is critical**

- These lakes were historically **key sponges at the top of the watershed**.
- Urban development has flattened them into near-level land.

**Outcome**

- During heavy rainfall:
  - Water cannot be stored locally  
  - Runoff accelerates downslope  
  - Downstream areas experience **sudden and intense flooding**

**Real-world relevance**

- This mechanism may help explain recurring floods in downstream zones such as:
  - Silk Board  
  - Bellandur  
  - Outer Ring Road corridor
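The "Danger Zone" definition above reduces to a two-condition filter. The sketch below uses the thresholds from the text (elevation above 900 m, slope below 1.5°); the function name `in_danger_zone` and the three lake rows are hypothetical and are not taken from the CSV.

```python
def in_danger_zone(elevation_m, slope_deg,
                   elevation_min=900.0, slope_max=1.5):
    """True for a high-elevation, low-slope lake (thresholds from the text)."""
    return elevation_m > elevation_min and slope_deg < slope_max


# Hypothetical rows for illustration only; not values from the CSV.
lakes = [
    {"name": "Lake A", "elevation": 915.0, "slope": 0.9},
    {"name": "Lake B", "elevation": 915.0, "slope": 2.2},
    {"name": "Lake C", "elevation": 820.0, "slope": 0.9},
]

flagged = [lake["name"] for lake in lakes
           if in_danger_zone(lake["elevation"], lake["slope"])]
print(flagged)  # ['Lake A']
```

Only the lake that is both high and flat is flagged; a flat lake at low elevation falls under the terminal-lake logic instead.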

---

## Key Takeaway

Slope determines a lake’s **capacity to store water**, while elevation determines its **position in the urban drainage hierarchy**.  
Flood risk in Bengaluru emerges most sharply where **high-elevation lakes lose their natural bowl-shaped geometry**.

---
---


# Explanation of the Lake Flatness and Topography Analysis Code

This script links **lake attribute data** with **terrain information** from satellite-derived elevation models to understand how lake geometry and surrounding topography influence flood behaviour in Bengaluru.

---

### Creating Lake Polygons from Area Values
`features = []`\
`for i, row in df.iterrows():`\
    `radius = np.sqrt((row['potential_ha'] * 10000) / np.pi)`\
    `geom = ee.Geometry.Point([row['lon'], row['lat']]).buffer(radius)`\
    `features.append(ee.Feature(geom, {'name': row['name'], 'id': i}))`
    
**What is happening conceptually**

* The CSV only contains points, not lake boundaries.

* To approximate lake extents:

    * Each lake is modelled as a circle.

    * The radius is calculated so that the circle’s area equals the lake’s potential_ha.

**Key logic**

* 1 hectare = 10,000 m²

* Area of circle = πr²

* Radius = √(Area / π)

**Why this is done**

* Enables terrain statistics to be calculated over a lake footprint, not just a point.

* Creates a consistent and reproducible approximation of lake geometry.
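The radius formula can be checked in isolation. This is a small sketch of the same arithmetic, with a round-trip check that the circle's area matches the input; the helper name `radius_from_hectares` and the 10 ha example value are assumptions for illustration.

```python
import math


def radius_from_hectares(area_ha):
    """Radius in metres of a circle whose area equals area_ha hectares."""
    area_m2 = area_ha * 10_000          # 1 ha = 10,000 m²
    return math.sqrt(area_m2 / math.pi)


r = radius_from_hectares(10)            # a hypothetical 10 ha lake
print(round(r, 1))                      # radius in metres
print(round(math.pi * r ** 2 / 10_000, 6))  # area recovered from the radius, in ha
```

Passing this radius to `.buffer()` produces a footprint with the right area, but not the right shape; elongated or irregular lakes will be misrepresented by a circle.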

---

### Creating an Earth Engine FeatureCollection
`lake_polygons = ee.FeatureCollection(features)`


**What is happening**

* All individual lake polygons are bundled into a single FeatureCollection.

* This collection can now be used for bulk spatial operations in Earth Engine.

---

### Preparing Topographic Layers (Elevation and Slope)
`srtm = ee.Image("USGS/SRTMGL1_003")`\
`elevation = srtm.select('elevation')`\
`slope = ee.Terrain.slope(elevation).rename('slope')`


**What is happening**

* Loads the **SRTM 30 m Digital Elevation Model**.

* Extracts:

    * **elevation** → absolute height above sea level (meters).

    * **slope** → steepness derived from elevation (degrees).

**Why slope matters**

* Indicates whether a lake basin is:

    * Naturally bowl-shaped

    * Artificially flattened

    * Steep and resistant to encroachment

---

### Combining Elevation and Slope into a Single Stack
`topo_stack = ee.Image.cat([elevation, slope])`


**What is happening**

* Elevation and slope are stacked into one multi-band image.

* Allows both variables to be analysed simultaneously in one operation.

---

### Extracting Mean Terrain Values for Each Lake
`stats = topo_stack.reduceRegions(`\
    `collection=lake_polygons,`\
    `reducer=ee.Reducer.mean(),`\
    `scale=30`\
`)`


**What is happening**

* For each lake polygon:

    * Earth Engine calculates the mean elevation.

    * Earth Engine calculates the mean slope.   

    * The scale=30 ensures calculations match SRTM’s spatial resolution.

**Output**

* A FeatureCollection where each lake now carries two new properties, named after the input bands:

    * `elevation` (mean elevation, in metres)

    * `slope` (mean slope, in degrees)

---

### Converting Results to a DataFrame
`df_results = geemap.ee_to_df(stats)`


**What is happening**

* Transfers results from Earth Engine (server-side) to Python (client-side).

* Converts spatial results into a Pandas DataFrame for analysis.



---
---

## Slope and Elevation
This script performs a **lake-specific topographic analysis** using **exact lake boundary geometries** (not circular proxies). It converts lake polygons stored in a CSV into **Earth Engine FeatureCollections**, then extracts **mean elevation and mean slope** within each lake’s true spatial footprint.

---

#### Step 1: Preparing Lake Attributes and Boundaries
- Two CSV files are loaded:
  - One containing **lake attributes** (names, metadata).
  - Another containing **exact lake boundary geometries** stored as **WKT (Well-Known Text)**.
- Duplicate lake names in the boundary file are removed to ensure a **one-to-one join**.
- The attribute table and boundary table are **merged on lake name**, attaching polygon geometry to each lake record.

**Key idea:** this step upgrades the analysis from *approximate circular buffers* to **true lake outlines**.

---

#### Step 2: Converting WKT Geometries into Earth Engine Features
- Each lake’s geometry is read from the WKT string using **Shapely**.
- Two cases are handled explicitly:
  - **Polygon** → a single contiguous lake boundary.
  - **MultiPolygon** → lakes made up of several disconnected parts.
- Exterior-ring coordinates are extracted from the Shapely objects (interior rings, such as islands, are not carried over) and converted into:
  - **ee.Geometry.Polygon** or
  - **ee.Geometry.MultiPolygon**
- Each geometry is wrapped as an **ee.Feature** with the lake name as metadata.
- All features are combined into a single **ee.FeatureCollection**.

**Key idea:** Earth Engine cannot read WKT directly, so this step bridges **local vector geometry** → **cloud-based geospatial analysis**.
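If islands should be preserved as holes, the interior rings need to travel with the exterior ring. The sketch below works on a GeoJSON-like dictionary, which is the structure `shapely.geometry.mapping()` returns, and produces nested coordinate lists of the shape that `ee.Geometry.Polygon` and `ee.Geometry.MultiPolygon` take. The helper name `rings_for_ee` and the toy square lake are hypothetical, and the Earth Engine call itself is left out so the example runs without credentials.

```python
def to_lists(obj):
    """Recursively convert nested tuples into nested lists."""
    if isinstance(obj, (tuple, list)):
        return [to_lists(item) for item in obj]
    return obj


def rings_for_ee(geojson_geom):
    """Return (kind, coordinates) for a Polygon or MultiPolygon mapping.

    Interior rings (holes such as islands) are kept alongside the exterior.
    """
    kind = geojson_geom["type"]
    if kind not in ("Polygon", "MultiPolygon"):
        raise ValueError(f"Unsupported geometry type: {kind}")
    return kind, to_lists(geojson_geom["coordinates"])


# Toy lake: a square shoreline with one square island (made-up coordinates)
lake = {
    "type": "Polygon",
    "coordinates": (
        ((0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0), (0.0, 0.0)),  # shoreline
        ((1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0), (1.0, 1.0)),  # island
    ),
}

kind, coords = rings_for_ee(lake)
print(kind, len(coords))  # geometry type and number of rings
```

Rejecting other geometry types explicitly avoids silently reusing a geometry from a previous loop iteration.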

---

#### Step 3: Preparing Topographic Layers
- **SRTM elevation data** (~30 m resolution) is loaded.
- Two topographic variables are derived:
  - **Elevation** → absolute height of the lake basin.
  - **Slope** → steepness of terrain inside the lake footprint.
- These layers are stacked into a single **multi-band image** for efficient processing.

**Key idea:** elevation gives **hydrological position**, slope gives **geomorphic integrity**.

---

#### Step 4: Extracting Mean Topography within Exact Lake Boundaries
- `reduceRegions` is used with the **lake polygon FeatureCollection**.
- For each lake polygon:
  - Mean **elevation** is computed.
  - Mean **slope** is computed.
- Extraction is done at **30 m scale**, matching the SRTM resolution.

**Key idea:** statistics are computed **only inside the real lake boundaries**, not across buffers or surrounding land.

---

#### Step 5: Exporting Results
- The Earth Engine results are converted into a **Pandas DataFrame**.
- The final table is saved as `data/lake_slope_elevation.csv`.

Each row in the output represents:
- one lake
- its **mean elevation**
- its **mean slope**

---

#### Conceptual Significance
This workflow measures **how flat or bowl-shaped each lake actually is**, using its **true spatial extent**. Flat, low-slope lakes are more likely to be **filled, encroached, or hydrologically compromised**, while steeper basins indicate **better-preserved lake morphology**.

---

#### One-line takeaway
We are extracting **physically meaningful topographic indicators** (slope and elevation) **directly from exact lake boundaries**, enabling robust analysis of lake degradation and flood vulnerability.


In [12]:
import ee
import geemap
import pandas as pd
from shapely import wkt # To parse the WKT geometry
from shapely.geometry import Polygon, MultiPolygon

df = pd.read_csv('data/bengaluru_lakes_mean.csv')
df_boundary = pd.read_csv('data/lake_polygon_boundaries.csv')
df_boundary = df_boundary.drop_duplicates(subset='name')
df = df.merge(df_boundary, on = 'name', how = 'left')

# 2. Convert Pandas DataFrame (with geometries) to EE FeatureCollection
features = []
for i, row in df.iterrows():
    # Lakes with no matching boundary have a NaN geometry after the left merge
    if pd.isnull(row['geometry']):
        continue

    # Parse the WKT string
    poly = wkt.loads(row['geometry'])

    if isinstance(poly, Polygon):
        # Single Polygon: a list containing one (exterior) ring
        coords = [list(poly.exterior.coords)]
        geom = ee.Geometry.Polygon(coords)
    elif isinstance(poly, MultiPolygon):
        # MultiPolygon: one exterior ring per constituent polygon
        all_rings = [[list(p.exterior.coords)] for p in poly.geoms]
        geom = ee.Geometry.MultiPolygon(all_rings)
    else:
        # Skip points, lines, etc. so a stale `geom` is never reused
        continue

    features.append(ee.Feature(geom, {'name': row['name']}))

lake_polygons = ee.FeatureCollection(features)

# 3. Topographic Analysis
srtm = ee.Image("USGS/SRTMGL1_003")
elevation = srtm.select('elevation')
slope = ee.Terrain.slope(elevation).rename('slope')
topo_stack = ee.Image.cat([elevation, slope])

# 4. Extract Stats (Mean values within the EXACT boundaries)
stats = topo_stack.reduceRegions(
    collection=lake_polygons,
    reducer=ee.Reducer.mean(),
    scale=30 
)

# 5. Export to CSV
df_results = geemap.ee_to_df(stats)
df_results.to_csv('data/lake_slope_elevation.csv', index=False)

## Extracting Hydrological Context for Bengaluru’s Lakes

This section of the code prepares the **spatial boundary**, **lake locations**, and **hydrological flow layers** needed to understand how water moves across Bengaluru and interacts with its lakes.

---

### 1. Defining the Bengaluru Urban Boundary

```python
bengaluru_boundary = ee.FeatureCollection("FAO/GAUL/2015/level2") \
    .filter(ee.Filter.eq('ADM2_NAME', 'Bangalore Urban'))
```

### 2. Loading Lake Locations as Point Features

```python 
df = pd.read_csv('data/bengaluru_lakes_mean.csv')
features = [
    ee.Feature(
        ee.Geometry.Point([row['lon'], row['lat']]),
        {'name': row['name']}
    ) 
    for i, row in df.iterrows()
]
lake_points = ee.FeatureCollection(features)
```
**What is happening**

* Reads a CSV file containing lake centroids.

* Converts each latitude–longitude pair into:

    * An Earth Engine Point geometry

    * With lake name as metadata

* All points are combined into a FeatureCollection.

**Why this matters**
* Lake points act as anchors to sample hydrological properties.

* Enables point-based queries such as:

    * Upstream contributing area

    * Flow direction at the lake location

---

### 3. Loading MERIT Hydro Datasets
`merit = ee.Image("MERIT/Hydro/v1_0_1")`

**What this dataset is**

* **MERIT Hydro** is a globally corrected hydrological dataset.

* Built on improved DEMs with:

    * Reduced striping errors

    * Corrected river networks

* It is especially useful for urban flood and drainage analysis.

---

### 4. Extracting Flow Accumulation
```python
flow_acc = merit.select('upa')
flow_acc_viz = flow_acc.log10()
```

**Key concepts**

* `upa` is the upstream drainage area, in km².

* It represents the total area draining into each pixel, including the pixel itself.

* High values indicate major drains and valleys.

**Why log-transform**

* Raw flow accumulation values span several orders of magnitude.

* Log transformation:

    * Enhances visibility of small urban streams

    * Prevents large rivers from dominating the visualization
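A quick numeric check shows what the log transform does to values of very different magnitude. The sketch below applies `log10` and then clamps to the `min: 0`, `max: 5` stretch used in `acc_params` later in this notebook; the helper name `displayed_value` is hypothetical.

```python
import math


def displayed_value(upa_km2, vmin=0.0, vmax=5.0):
    """log10 of the drainage area, clamped to the visualisation range."""
    return min(max(math.log10(upa_km2), vmin), vmax)


# 0.00833 and 114 km² are values quoted in this notebook; 1.0 is a reference point.
for upa_km2 in (0.00833, 1.0, 114.0):
    raw = math.log10(upa_km2)
    print(upa_km2, round(raw, 2), round(displayed_value(upa_km2), 2))
```

Any pixel draining less than 1 km² has a negative logarithm, so with `min: 0` it is drawn in the darkest colour. Lowering `min` (for example to about -2) would be one way to bring the smallest drainage lines into view.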

**Hydrological meaning**

* Pixels with high values indicate where runoff naturally converges.

* Lakes located on high upa pixels are structurally flood-prone.

---

### 5. Extracting Flow Direction
`flow_dir = merit.select('dir')`

**What dir represents**

* Indicates the direction water flows out of each pixel.

* Encoded using a D8 flow model (8 possible directions).

**What flow direction explains**

* How water moves between lakes

* Which lakes are upstream or downstream

Both are essential for understanding Bengaluru’s historic cascade lake system.

---

### 6. Preparing the Map for Visualisation
```python
Map = geemap.Map()
Map.centerObject(bengaluru_boundary, 11)
```

**What is happening**

* Initializes an interactive map.

* Centers the map over Bangalore Urban at a city-scale zoom level.

---

### 7. Flow Accumulation Visualisation Parameters
```python
acc_params = {
    'min': 0, 
    'max': 5, 
    'palette': ['#000000', '#023858', '#0570b0', '#74a9cf', '#fff7fb']
}
```

**Interpretation**

* Dark colors → low or negligible drainage

* Light colors → strong drainage pathways

**Highlights**:

* Natural valleys

* Stormwater drains

* Low-lying convergence zones

---

### 8. Flow Direction Visualisation Parameters

```python
dir_params = {
    'min': 1, 
    'max': 128, 
    'palette': ['red', 'orange', 'yellow', 'green', 'blue', 'cyan', 'magenta', 'black']
}
```


**Interpretation**

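* The colors give a rough impression of flow direction rather than a one-to-one key: the palette is stretched linearly between 1 and 128, while the direction codes are powers of two, so codes 1 to 16 land close together at the red-orange end and only 32, 64, and 128 are clearly separated.

* Together, they still reveal the broad directional logic of runoff across the city.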

* Helps visually verify:

    * Whether lakes align with natural flow paths

    * Where drainage has been disrupted by urban development
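To get exactly one colour per direction, the power-of-two codes can first be mapped to consecutive indices 0 to 7. The sketch below shows the mapping in plain Python; in Earth Engine the same idea could be applied to the image with `remap` before display, using `min: 0` and `max: 7`. The helper name `palette_index` is hypothetical.

```python
PALETTE = ['red', 'orange', 'yellow', 'green', 'blue', 'cyan', 'magenta', 'black']


def palette_index(dir_code):
    """Map a D8 power-of-two code (1, 2, 4, ..., 128) to an index 0..7."""
    is_power_of_two = dir_code > 0 and dir_code & (dir_code - 1) == 0
    if not is_power_of_two or dir_code > 128:
        raise ValueError(f"not a D8 direction code: {dir_code}")
    return dir_code.bit_length() - 1


for code in (1, 2, 4, 8, 16, 32, 64, 128):
    print(code, PALETTE[palette_index(code)])
```

With this mapping each of the eight directions gets its own palette entry instead of sharing a narrow slice of a linear stretch.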

---

In [None]:
# 2. Get Boundary and Lakes
bengaluru_boundary = ee.FeatureCollection("FAO/GAUL/2015/level2") \
    .filter(ee.Filter.eq('ADM2_NAME', 'Bangalore Urban'))

# Load your lake points
df = pd.read_csv('data/bengaluru_lakes_mean.csv')
features = [ee.Feature(ee.Geometry.Point([row['lon'], row['lat']]), {'name': row['name']}) for i, row in df.iterrows()]
lake_points = ee.FeatureCollection(features)

# 3. Load MERIT Hydro Datasets with CORRECT BAND NAMES
merit = ee.Image("MERIT/Hydro/v1_0_1")

# 'upa' is the band for Upstream Accumulation Area
flow_acc = merit.select('upa') 
# We log-transform it for better visualization of small streams
flow_acc_viz = flow_acc.log10() 

# 'dir' is the band for Flow Direction
flow_dir = merit.select('dir')

# 4. Visualization on the Map
Map = geemap.Map()
Map.centerObject(bengaluru_boundary, 11)

# Palette for Flow Accumulation (Blue to White represents the drainage network)
acc_params = {
    'min': 0, 
    'max': 5, 
    'palette': ['#000000', '#023858', '#0570b0', '#74a9cf', '#fff7fb']
}

# Palette for Direction (Standard 8-direction colors)
dir_params = {
    'min': 1, 
    'max': 128, 
    'palette': ['red', 'orange', 'yellow', 'green', 'blue', 'cyan', 'magenta', 'black']
}

Map.addLayer(flow_dir.clip(bengaluru_boundary), dir_params, '1. Flow Direction (Compass)')
Map.addLayer(flow_acc_viz.clip(bengaluru_boundary), acc_params, '2. Flow Accumulation (Drainage Network)')
Map.addLayer(lake_points, {'color': 'red'}, '3. Lake Locations')

Map



In [None]:
# 5. Extract and Save Data
print("Extracting flow stats for lakes...")
# Combine bands into one image for sampling
topo_image = ee.Image.cat([
    flow_acc.rename('flow_accumulation_km2'),
    flow_dir.rename('flow_direction_code')
])

stats = topo_image.reduceRegions(
    collection=lake_points,
    reducer=ee.Reducer.mean(),
    scale=90
)

try:
    df_results = geemap.ee_to_df(stats)
    os.makedirs('data', exist_ok=True)
    df_results.to_csv('data/lake_flow_analysis.csv', index=False)
    print("Success! Data saved to data/lake_flow_analysis.csv")
    print(df_results[['name', 'flow_accumulation_km2']].sort_values(by='flow_accumulation_km2', ascending=False).head())
except Exception as e:
    print(f"Error saving CSV: {e}")

## Interpretation of Flow Accumulation and Flow Direction in Bengaluru’s Lakes

---

### 1. The Giants: Catchment “Masters”

#### **Yellamallappa Chetty** and **Bellandur Lake**

- These lakes function as the **primary controllers** of their respective catchments.
- **Bellandur Lake**, for example, receives runoff from **over 114 km²** of the city.

#### **The Bottleneck Effect**

- When the **boundaries of a major lake** like **Bellandur** are **encroached**:
  - The problem is not limited to the lake itself.
  - The entire **114 km² upstream drainage network** begins to **back up**.
- This results in **flooding in neighbourhoods located far upstream**, sometimes several kilometres away from the lake.

**Key idea**

- A large lake acts like a **valve** in the city’s drainage system.
- If the valve is blocked, pressure builds everywhere upstream.

---

### 2. Direction of Flow (**flow_direction_code**)

The **flow_direction_code** indicates **where water exits a lake pixel**, making it essential for identifying **downstream flood impacts**.

#### Examples

- **Bellandur Lake (128 → North-East)**
  - Water flows toward the **North-East**.
  - Ultimately drains toward **Varthur Lake**.

- **Ramapura Kere (2 → South-East)**
  - Drains toward the **South-East**.

- **Anchepalya Lake (32 → North-West)**
  - Drains toward the **North-West**.
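The codes in these examples follow the D8 convention in which each direction is a power of two, starting at 1 for east and moving clockwise. The lookup below is a minimal sketch that decodes a code and steps to the neighbouring downstream cell on a grid; the table matches the three examples above, while the function name `downstream_cell` and the grid indices are assumptions for illustration.

```python
# Direction code -> (label, d_row, d_col).
# Rows increase southward and columns increase eastward.
D8 = {
    1: ("East", 0, 1),
    2: ("South-East", 1, 1),
    4: ("South", 1, 0),
    8: ("South-West", 1, -1),
    16: ("West", 0, -1),
    32: ("North-West", -1, -1),
    64: ("North", -1, 0),
    128: ("North-East", -1, 1),
}


def downstream_cell(row, col, dir_code):
    """Return (label, (row, col)) of the downstream neighbour, or None.

    Codes outside the table (for example special outlet or depression
    values) return None instead of raising.
    """
    entry = D8.get(int(round(dir_code)))
    if entry is None:
        return None
    label, d_row, d_col = entry
    return label, (row + d_row, col + d_col)


print(downstream_cell(10, 10, 128))
print(downstream_cell(10, 10, 2.0))   # CSV values may arrive as floats
```

Following `downstream_cell` repeatedly from a lake pixel traces the path its overflow would take, which is how upstream and downstream lakes can be linked.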

---

#### Why Flow Direction Matters for **Encroachment**

- If a lake such as **Yellamallappa Chetty** drains **southward**, and heavy **construction and encroachment downstream** block the natural outflow, the lake has no effective **“exit door”**:
  - Water accumulates inside the lake.
  - Lake levels rise **much faster** than they would naturally.

This can turn moderate rainfall into **sudden urban flooding**.

---

### 3. Headwater Lakes: The “Sponges”

#### Characteristics

- Lakes with **very low flow accumulation**  
  - Example: **Chikkabettahalli** (~0.008 km²).
- Located at the **very start of drainage lines**.

#### Hydrological Role

- These lakes:
  - Do **not** face high flood risk themselves.
  - Play a crucial role in **groundwater recharge**.

#### Impact of Encroachment

- When headwater lakes are **paved over or filled**:
  - Rainwater is no longer held locally.
  - The **groundwater table can drop**.
  - Surrounding areas may experience **borewell failure**.

---

### 4. Forensic Summary

- **High Accumulation + High Encroachment**
  - These are **“Flood Bombs”**  
  - Example: **Bellandur Lake**
  - Capable of triggering **city-scale flooding**.

- **Low Accumulation + High Encroachment**
  - These become **“Dry Zones”**
  - Lakes are destroyed quietly.
  - Result is **chronic water scarcity**, not floods.
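
The two-way summary above can be written as a small rule. This is only a sketch: the function name and both cutoffs (`accumulation_cutoff_km2`, `encroachment_cutoff_pct`) are hypothetical placeholders, not values derived from the analysis, and the encroachment figures in the usage lines are made up for illustration.

```python
def classify_lake(flow_accumulation_km2, encroachment_pct,
                  accumulation_cutoff_km2=1.0, encroachment_cutoff_pct=20.0):
    """Label a lake using the accumulation/encroachment matrix.

    Both cutoffs are illustrative assumptions and should be tuned to the data.
    """
    high_accumulation = flow_accumulation_km2 >= accumulation_cutoff_km2
    high_encroachment = encroachment_pct >= encroachment_cutoff_pct

    if high_encroachment and high_accumulation:
        return "Flood Bomb"
    if high_encroachment:
        return "Dry Zone"
    return "Lower concern"


# Accumulation values are from the text; encroachment percentages are invented.
print(classify_lake(114.0, 40.0))   # large catchment, heavily encroached
print(classify_lake(0.008, 40.0))   # headwater lake, heavily encroached
```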

---

### One-line Insight

**Floods and water scarcity in Bengaluru are two sides of the same problem: the destruction of lakes that once regulated flow at different points in the drainage hierarchy.**



---
---


## Why Many Lakes Show a Very Small Flow Accumulation Value (~0.008 km²)

Several lakes in your analysis show a **tiny and repeated flow accumulation value** (≈ **0.00833 km²**).  
This is not a bug in the script. It reflects **how dataset resolution, terrain position, and point placement interact**; the third cause is a sampling artefact rather than a property of the lake itself.  
The reasons are explained below.

---

## 1. The **“Single Pixel” Limit** (Dataset Resolution)

The **MERIT Hydro** dataset used in the script is gridded at:

- **3 arc-seconds per pixel** (nominally **90 m × 90 m**)

### Pixel-area logic

- Nominal area of one pixel  
  - **90 m × 90 m = 8,100 m²** (**0.0081 km²**)
- Actual footprint at Bengaluru’s latitude (about 13° N), because the grid is defined in degrees rather than metres  
  - **≈ 92 m north-south × 90 m east-west**
- Resulting pixel area  
  - **≈ 0.00833 km²**

### What this means in practice

- When a lake shows **~0.00833 km²**:
  - The algorithm found **no upstream pixels draining into it**
  - It is only counting the **single pixel** on which the lake point is located

**Key interpretation**

- **Zero upstream contribution**
- Only **local pixel area** is being registered
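
The single-pixel figure can be checked with a few lines of arithmetic. The sketch below assumes a 3 arc-second grid on the WGS84 ellipsoid and a latitude of 12.97° N for Bengaluru; the function name `cell_area_km2` is hypothetical.

```python
import math

WGS84_A = 6378137.0           # semi-major axis in metres
WGS84_E2 = 0.00669437999014   # first eccentricity squared


def cell_area_km2(lat_deg, cell_arcsec=3.0):
    """Approximate area of one grid cell of `cell_arcsec` at latitude `lat_deg`."""
    phi = math.radians(lat_deg)
    step = math.radians(cell_arcsec / 3600.0)   # cell size in radians
    w = 1.0 - WGS84_E2 * math.sin(phi) ** 2

    meridional_radius = WGS84_A * (1.0 - WGS84_E2) / w ** 1.5
    prime_vertical_radius = WGS84_A / math.sqrt(w)

    north_south_m = meridional_radius * step
    east_west_m = prime_vertical_radius * math.cos(phi) * step
    return north_south_m * east_west_m / 1e6


print(f"{cell_area_km2(12.97):.5f} km² per pixel")
```

The result is about 0.0083 km², so a lake reporting that value has exactly one pixel of contributing area.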

---

## 2. These Are **Headwater Lakes** (The Ridge Effect)

### Bengaluru’s terrain context

- The city is built on a series of **ridges and shallow plateaus**
- Water flows:
  - From **ridges → valleys**
  - From **headwaters → downstream sinks**

### Position of many lakes

Lakes such as:

- **Agara Lake**
- **Chikkabettahalli**
- **Yelahanka**

sit at the **very top of these ridges**.

### Hydrological consequence

- There is **little or no land upslope** of these lakes
- Therefore:
  - **The upstream catchment is negligible** at the dataset's 90 m resolution
  - Almost no flow accumulation can be added

### **Forensic insight**

- These lakes are the **origins of Bengaluru’s drainage system**
- They rely almost entirely on:
  - **Direct rainfall**
  - Not upstream runoff

---

## 3. **Point vs. Drain Alignment** (Geometry Mismatch)

### What the script currently does

- Uses **lake_points** (single coordinate points)
- Samples flow accumulation at **exact pixel locations**

### The alignment problem

- **Rajakaluves (stormwater drains)** are often:
  - Narrow
  - One or two pixels wide
- If a lake point is even **30–50 metres away** from the centre of the drain pixel:
  - The algorithm thinks the lake is on **adjacent dry land**
  - Not inside the drainage line

### Result

- The script reports:
  - Only the **single-pixel area (0.00833 km²)**
- Meanwhile:
  - A **large flow** may exist just **one pixel away**, but is ignored

---

## Why This Matters for Your Study

### **“Zero” Accumulation Lakes = Sponges**

- Lakes with near-zero accumulation:
  - Are **headwater lakes**
  - Play a **critical role in groundwater recharge**
- Their main function:
  - **Absorb rainfall locally**
- Encroachment here leads to:
  - Falling groundwater tables
  - Borewell failure

---

### **High Accumulation Lakes = Sinks**

- Lakes like **Bellandur** (~**114 km²**):
  - Receive runoff from **huge upstream areas**
  - Act as **terminal collectors**
- They are:
  - Highly flood-prone
  - Extremely sensitive to encroachment and blockage

---

## How to Get More **Realistic Catchment Numbers**

### Problem with current approach

- Sampling only **exact point locations**
- Misses nearby high-flow pixels

### Solution: **Buffered Sampling**

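- Replace point sampling with:
  - **`buffer(200)`** around each lake point
  - A **maximum** reducer instead of a mean, so the largest accumulation value inside the buffer is the one reported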

### Why this works

- Allows the algorithm to:
  - “Reach out” to the nearest **high-flow drain pixel**
  - Capture the **true contributing catchment**
- Especially useful for lakes that:
  - Sit slightly off the mapped drain centerline
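
The effect of buffered sampling can be seen on a toy grid without Earth Engine. The sketch below uses made-up accumulation values and a square window as a stand-in for the circular buffer; the names `sample_point` and `sample_buffer_max` are hypothetical. At roughly 90 m per pixel, a 200 m buffer corresponds to a radius of about two pixels.

```python
# Toy flow-accumulation grid in km² (hypothetical values).
# The middle column is a drain; everything else is near single-pixel area.
FLOW_ACC = [
    [0.008, 0.008, 12.40, 0.008, 0.008],
    [0.008, 0.017, 12.50, 0.008, 0.008],
    [0.008, 0.008, 12.60, 0.017, 0.008],
    [0.008, 0.008, 12.70, 0.008, 0.008],
]


def sample_point(grid, row, col):
    """Value at the exact pixel under the lake point."""
    return grid[row][col]


def sample_buffer_max(grid, row, col, radius_px):
    """Largest value in a square window of `radius_px` pixels around the point."""
    rows = range(max(0, row - radius_px), min(len(grid), row + radius_px + 1))
    cols = range(max(0, col - radius_px), min(len(grid[0]), col + radius_px + 1))
    return max(grid[r][c] for r in rows for c in cols)


lake_row, lake_col = 2, 1  # lake point sits one pixel west of the drain
print("point sample:   ", sample_point(FLOW_ACC, lake_row, lake_col))
print("buffered (max): ", sample_buffer_max(FLOW_ACC, lake_row, lake_col, 1))
```

The point sample returns the single-pixel value, while the buffered maximum picks up the drain next to it. The trade-off is that a buffer can also reach a neighbouring drain that does not feed the lake, so unusually large values are worth checking on the map.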

---

## Key Takeaway

**Tiny accumulation values do not mean lakes are unimportant.**  
They identify **headwater “sponge” lakes**, which are crucial for groundwater stability, while **large accumulation values identify flood-critical sink lakes** like Bellandur.

---
---

In [None]:
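import os

import ee
import geemap
import pandas as pd

# 1. Initialize
ee.Initialize(project='bengaluru-lakes-485612')

# 2. Get Data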
bengaluru_boundary = ee.FeatureCollection("FAO/GAUL/2015/level2") \
    .filter(ee.Filter.eq('ADM2_NAME', 'Bangalore Urban'))

# Load lake points from your previous mean CSV
df = pd.read_csv('data/bengaluru_lakes_mean.csv')
features = [ee.Feature(ee.Geometry.Point([row['lon'], row['lat']]), {'name': row['name']}) for i, row in df.iterrows()]
lake_points = ee.FeatureCollection(features)

# 3. Create the Sink Depth Layer
# Raw Elevation (SRTM)
raw_elevation = ee.Image("USGS/SRTMGL1_003").select('elevation')

# Filled Elevation (MERIT Hydro 'elv' band is hydro-conditioned/filled)
filled_dem = ee.Image("MERIT/Hydro/v1_0_1").select('elv')

# Sink Depth = Filled - Raw
# If the result is positive, it means a "hole" was filled by that many meters.
depressions = filled_dem.subtract(raw_elevation).rename('sink_depth')

# Remove noise: only keep areas where the fill is more than 0.5 meters
depressions_masked = depressions.updateMask(depressions.gt(0.5))

# 4. Visualization
Map = geemap.Map()
Map.centerObject(bengaluru_boundary, 11)

sink_viz = {
    'min': 0, 
    'max': 10, 
    'palette': ['white', 'yellow', 'orange', 'red'] # Red = Deepest depressions
}

Map.addLayer(raw_elevation.clip(bengaluru_boundary), {'min': 800, 'max': 1000}, '1. Raw Elevation')
Map.addLayer(depressions_masked.clip(bengaluru_boundary), sink_viz, '2. Detected Depressions (Sinks)')
Map.addLayer(lake_points, {'color': 'blue'}, '3. Lake Points')

display(Map)  # a bare `Map` in the middle of a cell is not rendered; display() forces it

# 5. Export results
print("Calculating sink depth for lakes...")
stats = depressions.reduceRegions(
    collection=lake_points,
    reducer=ee.Reducer.mean(),
    scale=30
)

df_results = geemap.ee_to_df(stats)
os.makedirs('data', exist_ok=True)
df_results.to_csv('data/lake_sink_analysis.csv', index=False)
print("Success! Saved to data/lake_sink_analysis.csv")

In [None]:
import ee
import geemap
import pandas as pd

# 1. Initialize
ee.Initialize(project='bengaluru-lakes-485612')

# Load Lake Points
df = pd.read_csv('data/bengaluru_lakes_mean.csv')
features = [ee.Feature(ee.Geometry.Point([row['lon'], row['lat']]), {'name': row['name']}) for i, row in df.iterrows()]
lake_points = ee.FeatureCollection(features)

# 2. LOAD BOTH GENERATIONS OF TOPOGRAPHY
# Historical (Year 2000)
srtm = ee.Image("USGS/SRTMGL1_003").select('elevation')

# Modern (Copernicus GLO-30). Note: the product was released in 2021, but it is built from
# TanDEM-X radar acquired in roughly 2011-2015, so 'sink_depth_2021' reflects mid-2010s terrain.
copernicus = ee.ImageCollection("COPERNICUS/DEM/GLO30").mosaic().select('DEM')

# 3. CALCULATE SINK DEPTH (FILLING) FOR BOTH PERIODS
# We use MERIT Hydro as the "Hydrologically Correct" reference
filled_ref = ee.Image("MERIT/Hydro/v1_0_1").select('elv')

sink_2000 = filled_ref.subtract(srtm).rename('sink_depth_2000')
sink_2021 = filled_ref.subtract(copernicus).rename('sink_depth_2021')

# 4. CALCULATE DYNAMIC PRESSURE (Runoff Sum for 2020-2024)
# Fixing the band name to 'surface_runoff_sum'
runoff_2020_2024 = ee.ImageCollection("ECMWF/ERA5_LAND/MONTHLY_AGGR") \
    .filterDate('2020-01-01', '2024-12-31') \
    .select('surface_runoff_sum') \
    .sum() \
    .rename('total_runoff_20s')

# 5. EXTRACT DATA
combined_stats = sink_2000.addBands(sink_2021).addBands(runoff_2020_2024)

results = combined_stats.reduceRegions(
    collection=lake_points,
    reducer=ee.Reducer.mean(),
    scale=30
)

# 6. EXPORT
df_final = geemap.ee_to_df(results)
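# Change in sink depth between the two DEMs (negative = shallower, i.e. filled in).
# The DEMs come from different sensors, so treat small differences as noise.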
df_final['depth_change'] = df_final['sink_depth_2021'] - df_final['sink_depth_2000']

df_final.to_csv('data/lake_topographic_change_2000_2025.csv', index=False)
print("Comparison complete! Check 'lake_topographic_change_2000_2025.csv'")

In [None]:
import ee
import geemap
import pandas as pd

# 1. Initialize
ee.Initialize(project='bengaluru-lakes-485612')

# 2. Load Topography
copernicus = ee.ImageCollection("COPERNICUS/DEM/GLO30").mosaic().select('DEM')

# Load Lake Points and BUFFER them by 100m to create small polygons
df = pd.read_csv('data/bengaluru_lakes_mean.csv')
features = [
    ee.Feature(ee.Geometry.Point([row['lon'], row['lat']]).buffer(100), {'name': row['name']}) 
    for i, row in df.iterrows()
]
lake_polygons = ee.FeatureCollection(features)

def calculate_bathymetry(year):
    # Sentinel-2 Median for the year (broadened window to ensure water capture)
    s2 = ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED") \
        .filterBounds(lake_polygons) \
        .filterDate(f'{year}-01-01', f'{year + 1}-01-01') \
        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)) \
        .median()
    
    # Relaxed Water Mask (includes shallow/weedy water)
    ndwi = s2.normalizedDifference(['B3', 'B8'])
    water_mask = ndwi.gt(0.0) # Lowered threshold from 0.1 to 0.0
    
    # Step A: Find the 'Rim Elevation' (Max elevation in the 100m buffer)
    # This is more accurate for urban lakes than MERIT
    rim_elevation = copernicus.reduceRegions(
        collection=lake_polygons,
        reducer=ee.Reducer.max(),
        scale=30
    ).reduceToImage(properties=['max'], reducer=ee.Reducer.first())  # paint each buffer with its rim value

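    # Step B: Depth = Rim - Ground
    # Only calculate where there is water detected by Sentinel-2.
    # Note: a radar DEM records the water surface (or dry bed) at acquisition time,
    # so this is depth below the rim, not surveyed bathymetry.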
    depth_image = rim_elevation.subtract(copernicus).updateMask(water_mask).rename('depth')
    
    # Step C: Volume (Area * Depth)
    volume_image = depth_image.multiply(ee.Image.pixelArea()).rename('volume')
    
    stats = depth_image.addBands(volume_image).reduceRegions(
        collection=lake_polygons,
        reducer=ee.Reducer.mean().combine(
            reducer2=ee.Reducer.sum(), sharedInputs=True
        ),
        scale=10
    )
    return stats

# 3. Execution
all_years = []
for year in range(2020, 2026):
    print(f"Processing {year}...")
    try:
        yearly_df = geemap.ee_to_df(calculate_bathymetry(year))
        yearly_df['year'] = year
        all_years.append(yearly_df)
    except Exception as e:
        print(f"Error in {year}: {e}")

# 4. Final Export
final_df = pd.concat(all_years)
final_df.to_csv('data/lake_bathymetry_verified_2020_2025.csv', index=False)

In [None]:
import ee
import geemap

# 1. Initialize Earth Engine
ee.Initialize(project='bengaluru-lakes-485612')

# 2. Define Area of Interest (Bengaluru)
# Replace with your specific lake coordinates or boundary
roi = ee.Geometry.Point([77.5946, 12.9716]).buffer(30000) 

# 3. Load Sentinel-1 SAR Collection
# Filter for IW mode, VH polarization (better for urban water detection)
s1_collection = ee.ImageCollection('COPERNICUS/S1_GRD') \
    .filterBounds(roi) \
    .filterDate('2020-01-01', '2025-12-31') \
    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH')) \
    .filter(ee.Filter.eq('instrumentMode', 'IW'))

# 4. Define the Water Threshold
# -20 dB is a standard threshold for VH. Values below this are likely water.
def identify_water(image):
    # Apply threshold
    water = image.select('VH').lt(-20).rename('is_water')
    # Return 1 for water, 0 for land, masked for invalid data
    return water.copyProperties(image, ['system:time_start'])

# 5. Map the function over the collection
water_time_series = s1_collection.map(identify_water)

# 6. Calculate Frequency
# Frequency = (Sum of water detections / Total number of observations) * 100
total_observations = water_time_series.count()
water_observations = water_time_series.sum()

flood_frequency = water_observations.divide(total_observations).multiply(100).rename('flood_freq_pct')

# 7. Visualization
Map = geemap.Map()
Map.centerObject(roi, 12)

# Set visualization parameters: 0% (dry) is white, ~50% is red, 100% (permanent water) is blue
viz_params = {
    'min': 0,
    'max': 100,
    'palette': ['#ffffff', '#ff0000', '#0000ff'] # white -> red (intermittent) -> blue (permanent)
}

Map.addLayer(flood_frequency.clip(roi), viz_params, 'SAR Flood Frequency (%)')
Map

---
---

### Lake-wise SAR (Synthetic Aperture Radar) Flood Frequency Extraction

This script computes **observed flood frequency** around each Bengaluru lake using **Sentinel-1 SAR radar data** for the period **2020–2025**. The output is a lake-level dataset showing **how often flooding was actually detected**, based on satellite observations rather than model assumptions.

---

**Initialization and Data Loading**

- Earth Engine is initialized so all geospatial processing runs on Google’s servers.
- A CSV containing lake names and coordinates is loaded into a Pandas DataFrame.

---

**Building a City-wide Flood Frequency Image (Done Once)**

- All lake coordinates are combined into a single **MultiPoint geometry**, then buffered by **500 m** to define a city-wide region of interest.
- Sentinel-1 SAR images are loaded and filtered:
  - Spatially: only images covering the buffered lake region
  - Temporally: 2020–2025
  - Polarisation: **VH** (best for water detection in urban areas)
  - Mode: **IW** (standard land observation mode)

- Each SAR image is converted into a **binary water map** using a −20 dB threshold:
  - Values below −20 dB → likely water/flooded
  - Values above −20 dB → land or built-up
- All binary water maps are stacked over time.
- **Flood frequency (%)** is computed per pixel as:  
  *(number of times water was detected ÷ number of observations) × 100*  
  This produces a single raster (`sar_flood_freq_pct`) showing how often each pixel was inundated over the 2020–2025 period.
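
The per-pixel calculation described above reduces to a count over a time series. The sketch below applies it to one pixel's VH backscatter history in plain Python; the function name and the sample values are hypothetical, and `None` stands for a masked (invalid) observation, which is excluded from the denominator.

```python
def flood_frequency_pct(vh_db_series, threshold_db=-20.0):
    """Share of valid observations below the water threshold, in percent."""
    valid = [v for v in vh_db_series if v is not None]
    if not valid:
        return None  # no usable observations for this pixel
    wet = sum(1 for v in valid if v < threshold_db)
    return 100.0 * wet / len(valid)


# Five satellite passes over one pixel (dB); one pass is masked.
pixel_history = [-25.0, -12.0, None, -22.0, -15.0]
print(flood_frequency_pct(pixel_history))  # 2 wet out of 4 valid passes
```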

---

**Lake-wise Extraction Loop**

- The script loops through lakes **client-side (Python)** for progress monitoring.
- For each lake:
  - A **200 m buffer** around the lake centroid is created to capture spillover and nearby waterlogging.
  - The **mean flood frequency** within this buffer is extracted from the precomputed SAR flood-frequency image using `reduceRegion`.
  - This yields one number per lake:  
    *“On average, what percentage of satellite passes detected flooding here?”*

- Results are stored in a list with lake name and flood frequency.
- Errors for individual lakes are caught so the loop continues uninterrupted.

---

**Saving the Output**

- The collected results are converted to a DataFrame.
- A CSV file is written containing:
  - `name` → lake name  
  - `sar_flood_freq_pct` → observed flood frequency (2020–2025)

---

**Conceptual Meaning**

- This workflow provides an **empirical, observation-based measure of flooding**, not a simulated one.
- It captures:
  - chronic waterlogging
  - repeated lake overflows
  - drainage failures
- The output is ideal for:
  - validating flood models
  - identifying flood hotspots
  - serving as an **observation-based target** (a satellite proxy, not field-verified ground truth) for machine-learning flood-risk models

---

**One-line takeaway**

This code converts six years (2020–2025) of Sentinel-1 radar imagery into a lake-wise measure of how often flooding was detected around Bengaluru’s lakes.


In [None]:
import ee
import geemap
import pandas as pd
import time

# 1. Initialize
ee.Initialize(project='bengaluru-lakes-485612')

# 2. Load Data
df_lakes = pd.read_csv('data/bengaluru_lakes_mean.csv')

# 3. Create the Base Frequency Image (Do this ONCE outside the loop)
roi_all = ee.Geometry.MultiPoint(df_lakes[['lon', 'lat']].values.tolist()).buffer(500)
s1_collection = ee.ImageCollection('COPERNICUS/S1_GRD') \
    .filterBounds(roi_all) \
    .filterDate('2020-01-01', '2025-12-31') \
    .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH')) \
    .filter(ee.Filter.eq('instrumentMode', 'IW'))

def identify_water(image):
    return image.select('VH').lt(-20).rename('is_water').copyProperties(image, ['system:time_start'])

water_ts = s1_collection.map(identify_water)
flood_freq_img = water_ts.sum().divide(water_ts.count()).multiply(100).rename('sar_flood_freq_pct')

# 4. Processing Loop with Prints
results = []
print(f"Starting extraction for {len(df_lakes)} lakes...")

for index, row in df_lakes.iterrows():
    lake_name = row['name']
    print(f"[{index+1}/{len(df_lakes)}] Processing: {lake_name}...", end="\r")
    
    # Define local geometry
    point = ee.Geometry.Point([row['lon'], row['lat']]).buffer(200)
    
    # Extract mean frequency for this specific lake
    try:
        # reduceRegion (singular) is faster for a single geometry
        stat = flood_freq_img.reduceRegion(
            reducer=ee.Reducer.mean(),
            geometry=point,
            scale=10,
            maxPixels=1e9
        ).getInfo()
        
        results.append({
            'name': lake_name,
            'sar_flood_freq_pct': stat.get('sar_flood_freq_pct')
        })
    except Exception as e:
        print(f"\nError processing {lake_name}: {e}")

# 5. Save results
results_df = pd.DataFrame(results)
results_df.to_csv('data/lake_sar_flood_frequency_2025.csv', index=False)
print("\nExtraction Complete! File saved.")

---
---

## Measuring Rainfall Intensity and Timing
* For urban flooding in Bengaluru, "**Total Rainfall**" is less important than "**Intensity**" (how much rain falls in a short window). 
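* We use the **GPM (Global Precipitation Measurement) IMERG dataset**, which provides precipitation rates (mm/hr) every 30 minutes.
    * **Metric 1 (Intensity)**: **Peak 30-minute rainfall (mm)** and **maximum 3-day accumulated rainfall (mm)**, both built from the half-hourly rates.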
    * **Metric 2 (Timing)**: The month of the peak rainfall event (to correlate with your SAR flood observations).
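
The intensity metrics can be illustrated on a short synthetic series. The sketch below assumes half-hourly rates in mm/hr (so each step contributes `rate × 0.5` mm) and 48 steps per day; the function names and the four-day series are hypothetical.

```python
def daily_totals_mm(rates_mm_per_hr, steps_per_day=48):
    """Convert half-hourly rates (mm/hr) into daily rainfall totals (mm)."""
    return [
        sum(rates_mm_per_hr[i:i + steps_per_day]) * 0.5
        for i in range(0, len(rates_mm_per_hr), steps_per_day)
    ]


def max_rolling_sum(values, window=3):
    """Largest sum over any `window` consecutive values."""
    if len(values) < window:
        return None
    return max(sum(values[i:i + window]) for i in range(len(values) - window + 1))


# Four synthetic days: dry, 2 mm/hr all day, 1 mm/hr all day, dry.
rates = [0.0] * 48 + [2.0] * 48 + [1.0] * 48 + [0.0] * 48

daily = daily_totals_mm(rates)
print("daily totals (mm):", daily)
print("max 3-day rain (mm):", max_rolling_sum(daily, window=3))
print("peak 30-min rain (mm):", max(rates) * 0.5)
```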

---

## Measuring Imperviousness
* "Imperviousness" refers to surfaces like concrete, asphalt, and rooftops that prevent water from soaking into the ground. 
* A high impervious percentage in the **200m buffer** around a lake leads to rapid runoff and higher flood risk.
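    * Dataset: **Dynamic World (10m) or ESA WorldCover (10m)**. Both match Sentinel-2's 10m resolution; Dynamic World is preferred because it is **produced for every Sentinel-2 scene**, so a separate land-cover composite can be built for each year.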
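
The impervious percentage is simply the share of built-up pixels inside the buffer, which is why averaging a 0/1 mask gives a fraction. The sketch below assumes class `6` is the built-up label and uses a made-up list of pixel labels; the function name is hypothetical, and `None` stands for a masked pixel.

```python
BUILT_CLASS = 6  # assumed land-cover code for built-up surfaces


def impervious_fraction(labels):
    """Fraction of valid pixels in the buffer that are classed as built-up."""
    valid = [label for label in labels if label is not None]
    if not valid:
        return None
    built_mask = [1 if label == BUILT_CLASS else 0 for label in valid]
    return sum(built_mask) / len(built_mask)


# Hypothetical labels for the pixels inside one lake buffer.
buffer_labels = [6, 6, 1, 0, None, 6, 4, 6, 0]
print(impervious_fraction(buffer_labels))  # 4 built pixels out of 8 valid
```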

---

In [None]:
import ee
import pandas as pd

# 1. Initialize
ee.Initialize(project='bengaluru-lakes-485612')

# 2. Load Geometries
df_lakes = pd.read_csv('data/bengaluru_lakes_mean.csv')
features = [
    ee.Feature(ee.Geometry.Point([row['lon'], row['lat']]).buffer(200), {'name': row['name']}) 
    for _, row in df_lakes.iterrows()
]
lake_fc = ee.FeatureCollection(features)

def export_hydrology_year(year):
    print(f"Submitting Task for {year}...")
    start_date = ee.Date.fromYMD(year, 1, 1)
    # filterDate() excludes the end date, so use 1 January of the next year to keep 31 December
    end_date = ee.Date.fromYMD(year + 1, 1, 1)

    # --- 1. RAINFALL: DAILY AGGREGATION ---
    gpm = ee.ImageCollection("NASA/GPM_L3/IMERG_V07") \
        .filterDate(start_date, end_date) \
        .select('precipitation')

    days = ee.List.sequence(0, end_date.difference(start_date, 'day').subtract(1))
    
    def calc_daily(d):
        date = start_date.advance(d, 'day')
        return gpm.filterDate(date, date.advance(1, 'day')) \
                  .sum().multiply(0.5) \
                  .set('system:time_start', date.millis())
    
    daily_col = ee.ImageCollection.fromImages(days.map(calc_daily))
    daily_list = daily_col.toList(366)

    # --- 2. VECTORIZED ROLLING 3-DAY SUM (Faster) ---
    # We sum Image(i) + Image(i-1) + Image(i-2)
    indices = ee.List.sequence(2, daily_list.length().subtract(1))
    
    def sum_3days(i):
        i = ee.Number(i)
        img1 = ee.Image(daily_list.get(i))
        img2 = ee.Image(daily_list.get(i.subtract(1)))
        img3 = ee.Image(daily_list.get(i.subtract(2)))
        return img1.add(img2).add(img3).set('system:time_start', img1.get('system:time_start'))

    max_3day_img = ee.ImageCollection.fromImages(indices.map(sum_3days)).max().rename('max_3day_rain_mm')

    # --- 3. PEAK INTENSITY & IMPERVIOUSNESS ---
    peak_30min_img = gpm.max().multiply(0.5).rename('peak_30min_intensity_mm')
    
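    dw = ee.ImageCollection("GOOGLE/DYNAMICWORLD/V1") \
        .filterBounds(lake_fc) \
        .filterDate(start_date, end_date).select('label').mode()
    # Class 6 = built; the mean of this 0/1 mask over a buffer is the built-up fraction
    impervious_img = dw.eq(6).rename('impervious_fraction')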

    # --- 4. BATCH EXTRACTION ---
    combined = peak_30min_img.addBands([max_3day_img, impervious_img])
    
    stats = combined.reduceRegions(
        collection=lake_fc,
        reducer=ee.Reducer.mean(),
        scale=10,
        tileScale=4 # Splits the job into smaller tiles to avoid memory errors
    )

    # --- 5. EXPORT TO DRIVE ---
    task = ee.batch.Export.table.toDrive(
        collection=stats,
        description=f'Hydrology_Stats_{year}',
        folder='EE_Exports', # Folder name in your Google Drive
        fileNamePrefix=f'lake_stats_{year}',
        fileFormat='CSV'
    )
    task.start()

# Run for all years
for yr in range(2020, 2026):
    export_hydrology_year(yr)

print("All tasks submitted! Check your Google Earth Engine 'Tasks' tab or your Google Drive 'EE_Exports' folder.")

---
---

### Recorded data cleaning for further processing


In [None]:
import pandas as pd
import glob

# Find all files starting with 'lake_stats_' in the data folder
files = sorted(glob.glob('data/lake_stats_20*.csv'))

all_years = []

for f in files:
    df = pd.read_csv(f)
    # Optional: If the CSV doesn't have a 'year' column, extract it from the filename
    if 'year' not in df.columns:
        year = f.split('_')[-1].replace('.csv', '')
        df['year'] = int(year)
    all_years.append(df)
# Merge everything
master_df = pd.concat(all_years, ignore_index=True)

cols_to_drop = ['system:index', '.geo']
master_df = master_df.drop(columns=[c for c in cols_to_drop if c in master_df.columns])
master_df = master_df.sort_values(by = ['name', 'year'])
master_df = master_df[['name', 'impervious_fraction', 'max_3day_rain_mm', 'peak_30min_intensity_mm', 'year']]

print(f"Merged {len(files)} files into a single master DataFrame.")

master_df.to_csv('data/lake_stats_summary_2020_2025.csv', index=False)  # avoid writing an 'Unnamed: 0' index column


In [None]:
master_df.head()

In [None]:
import pandas as pd

# 1. Load all datasets
df_hydro = pd.read_csv('data/lake_stats_summary_2020_2025.csv')
df_landuse = pd.read_csv('data/bengaluru_lakes_cleaned_gt_0.5ha.csv')
df_flow = pd.read_csv('data/lake_flow_analysis.csv')
df_flood = pd.read_csv('data/lake_sar_flood_frequency_2025.csv')
df_encroach = pd.read_csv('data/bengaluru_lakes_mean.csv')

# 2. Average the Yearly Data (Hydro & Land Use)
# We drop 'year' and 'Unnamed: 0' before averaging
hydro_mean = df_hydro.drop(columns=['year', 'Unnamed: 0'], errors='ignore').groupby('name').mean(numeric_only=True).reset_index()

# For landuse, we keep lat/lon as they are constant, but average the areas
# numeric_only=True skips text columns, which would otherwise raise an error in recent pandas
landuse_mean = df_landuse.drop(columns=['year', 'Unnamed: 0'], errors='ignore').groupby('name').mean(numeric_only=True).reset_index()

# 3. Merge into a single "Representative" DataFrame
# Start with landuse_mean as it contains lat/lon
ml_dataset = pd.merge(landuse_mean, hydro_mean, on='name', how='inner')

# Add static flow data
ml_dataset = pd.merge(ml_dataset, df_flow, on='name', how='left')

# Add pre-calculated encroachment data
ml_dataset = pd.merge(ml_dataset, df_encroach[['name', 'encroachment_pct']], on='name', how='left')

# Add the TARGET variable (Flood Frequency)
ml_dataset = pd.merge(ml_dataset, df_flood, on='name', how='left')

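# 4. Final Cleanup
# Drop lakes with no SAR observation instead of recording them as "never flooded",
# then fill the remaining gaps in the predictors with 0
ml_dataset = ml_dataset.dropna(subset=['sar_flood_freq_pct'])
ml_dataset = ml_dataset.fillna(0)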

# 5. Save for ML
ml_dataset.to_csv('data/lake_flood_ml_ready.csv', index=False)

print(f"ML Dataset Created: {ml_dataset.shape[0]} lakes and {ml_dataset.shape[1]} features.")
print("Sample of predictors:", ml_dataset[['name', 'impervious_fraction', 'flow_accumulation_km2', 'sar_flood_freq_pct']].head())

---
---

### ML–Based Flood Risk Classification

This script builds and evaluates a **lake-level flood risk classification model** for Bengaluru using **observed SAR flood frequency** as the outcome and a set of **physically meaningful flood drivers** as predictors. The goal is to classify lakes into **Low Risk** and **High Risk** flood categories in a way that is interpretable and actionable.

---

**Data Loading**

- A pre-processed, lake-level dataset (`lake_flood_ml_ready.csv`) is loaded.
- Each row represents one lake, with rainfall, land-cover, drainage, and observed flood-frequency metrics already aggregated spatially.

---

**Feature Selection: The Three Pillars Framework**

- **Hydrological Drivers (Trigger):**
  - `max_3day_rain_mm` → cumulative wetness / system saturation  
  - `peak_30min_intensity_mm` → short-duration storm intensity

- **Land-Cover Vulnerability (Resistance):**
  - `impervious_fraction` → runoff efficiency  
  - `in_build_ha` → built-up pressure near lakes  
  - `encroachment_pct` → loss of natural buffer and storage

- **Landscape Topology (Gravity):**
  - `flow_accumulation_km2` → upstream drainage pressure  
  - `potential_ha` → lake basin scale

These features reflect **physical flood processes**, not just statistical convenience.

---

**Target Variable Construction**

- The continuous SAR-derived flood frequency (`sar_flood_freq_pct`) is converted into a binary risk label.
- Lakes with flood frequency **greater than 25%** are labelled as **High Risk** (`1`); others as **Low Risk** (`0`).
- This threshold produces a policy-friendly flood-risk classification while preserving an observational basis.

---

**Data Cleaning and Train–Test Split**

- Rows with missing feature or label values are removed.
- The dataset is split into:
  - **80% training data**
  - **20% testing data**
- A fixed random seed ensures reproducibility.

---

**Model Training**

- A **Random Forest Classifier** with 100 decision trees is trained.
- Random Forests are well suited here because:
  - flood drivers interact non-linearly
  - features operate at different scales
  - the model remains interpretable via feature importance

---

**Model Evaluation**

- Predictions are generated for the test set.
- Performance is assessed using:
  - **Accuracy** → overall correctness
  - **Classification report** → precision, recall, and F1-score for Low and High Risk classes
- This evaluates how well physical drivers explain observed flooding.

---

**Feature Importance Analysis**

- The contribution of each feature to the model’s decisions is extracted.
- Features are grouped by category (Rain, Buildings, Topology) to assess:
  - which physical processes dominate flood risk
- A bar plot visualizes relative importance for intuitive interpretation.
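
Category-level importance is just the sum of the per-feature importances within each group. The sketch below shows that roll-up in plain Python. The feature names and categories come from the framework above, but the importance values are invented for illustration and the function name is hypothetical.

```python
from collections import defaultdict


def importance_by_category(importances, categories):
    """Sum feature importances per category, largest first."""
    totals = defaultdict(float)
    for importance, category in zip(importances, categories):
        totals[category] += importance
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


features = ['max_3day_rain_mm', 'peak_30min_intensity_mm',
            'impervious_fraction', 'in_build_ha', 'encroachment_pct',
            'flow_accumulation_km2', 'potential_ha']
categories = ['Rain'] * 2 + ['Buildings'] * 3 + ['Topology'] * 2

# Invented importances that sum to 1.0, as Random Forest importances do.
example_importances = [0.05, 0.10, 0.20, 0.15, 0.10, 0.25, 0.15]

summary = importance_by_category(example_importances, categories)
for category, total in summary.items():
    print(f"{category}: {total:.2f}")
```

Impurity-based importances tend to be shared out among correlated features, so the category totals are usually a steadier guide than the ranking of any single feature.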

---

**Saving Final Predictions**

- Model predictions are mapped back to lake names.
- The output CSV contains:
  - lake name
  - observed flood frequency
  - true risk label
  - predicted risk class
- This enables direct comparison between observed and modelled flood risk.

---

**Conceptual Meaning**

- This workflow translates **observed flooding patterns** into a **predictive, interpretable risk classification**.
- It does not simulate floods; instead, it learns which combinations of rainfall, urbanisation, and drainage characteristics are associated with repeated inundation.
- The results are suitable for:
  - prioritising flood-prone lakes
  - policy and planning discussions
  - downstream regression or risk-index development

---

**One-line takeaway**

This code uses a Random Forest classifier to learn which combinations of rainfall, urban encroachment, and drainage topology are associated with Bengaluru’s lakes being repeatedly flood-prone.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# 1. LOAD DATA
# Ensure you are using the lake-level averaged dataset
df = pd.read_csv('data/lake_flood_ml_ready.csv')

# 2. FEATURE SELECTION (The Three Pillars)
# Hydrological Drivers (The Trigger)
rain_features = ['max_3day_rain_mm', 'peak_30min_intensity_mm']
# Land-Cover Vulnerability (The Resistance)
building_features = ['impervious_fraction', 'in_build_ha', 'encroachment_pct']
# Landscape Topology (The Gravity)
topology_features = ['flow_accumulation_km2', 'potential_ha']

X_features = rain_features + building_features + topology_features
target = 'sar_flood_freq_pct'

# 3. PREPARE CLASSIFICATION TARGET
# We define "High Risk" as any lake with > 25% flood frequency
threshold = 25
df['risk_label'] = (df[target] > threshold).astype(int)

# 4. DATA CLEANING & SPLITTING
df_ml = df.dropna(subset=X_features + ['risk_label'])
X = df_ml[X_features]
y = df_ml['risk_label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5. TRAIN RANDOM FOREST CLASSIFIER
# 100 trees with a fixed seed so the result is reproducible
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

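# 6. EVALUATION
y_pred = model.predict(X_test)
print("--- Model Performance ---")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2%}")
print("\nConfusion Matrix (rows = true, columns = predicted):")
print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
print("\nDetailed Classification Report:")
# labels=[0, 1] keeps the report valid even if the small test set contains only one class
print(classification_report(y_test, y_pred, labels=[0, 1],
                            target_names=['Low Risk', 'High Risk'], zero_division=0))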

# 7. FEATURE IMPORTANCE ANALYSIS
importances = model.feature_importances_
feat_df = pd.DataFrame({
    'Feature': X_features,
    'Importance': importances,
    'Category': (['Rain'] * 2) + (['Buildings'] * 3) + (['Topology'] * 2)
}).sort_values(by='Importance', ascending=False)

# 8. VISUALIZATION
plt.figure(figsize=(10, 6))
sns.barplot(data=feat_df, x='Importance', y='Feature', hue='Category', dodge=False)
plt.title('Drivers of Flood Risk in Bengaluru (ML Feature Importance)')
plt.xlabel('Contribution to Model Prediction')
plt.tight_layout()
plt.show()

# Calculate Category-level Importance
cat_importance = feat_df.groupby('Category')['Importance'].sum().sort_values(ascending=False)
print("\n--- Importance by Category ---")
print(cat_importance)

# 9. SAVE FINAL PREDICTIONS
# Map predictions back to lake names for the test set
results = df_ml.loc[X_test.index, ['name', target, 'risk_label']].copy()
results['predicted_risk'] = y_pred
results.to_csv('final_flood_risk_predictions.csv', index=False)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# 1. LOAD DATA
df = pd.read_csv('data/lake_flood_ml_ready.csv')

# 2. FEATURE SELECTION (Same three pillars)
rain_features = ['max_3day_rain_mm', 'peak_30min_intensity_mm']
building_features = ['impervious_fraction', 'in_build_ha', 'encroachment_pct']
topology_features = ['flow_accumulation_km2', 'potential_ha']

X_features = rain_features + building_features + topology_features
target = 'sar_flood_freq_pct'

# 3. CLEAN DATA
df_ml = df.dropna(subset=X_features + [target])
X = df_ml[X_features]
y = df_ml[target]

# 4. TRAIN–TEST SPLIT
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 5. TRAIN RANDOM FOREST REGRESSOR
model = RandomForestRegressor(
    n_estimators=300,
    random_state=42,
    min_samples_leaf=2
)

model.fit(X_train, y_train)

# 6. EVALUATION
y_pred = model.predict(X_test)

print("--- Regression Performance ---")
print(f"MAE  (Mean Absolute Error): {mean_absolute_error(y_test, y_pred):.2f}")
print(f"RMSE (Root Mean Sq Error): {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")
print(f"R²   (Coefficient of Determination): {r2_score(y_test, y_pred):.2f}")

# 7. FEATURE IMPORTANCE
feat_df = pd.DataFrame({
    'Feature': X_features,
    'Importance': model.feature_importances_,
    'Category': (['Rain'] * 2) + (['Buildings'] * 3) + (['Topology'] * 2)
}).sort_values(by='Importance', ascending=False)

# 8. VISUALISATION
plt.figure(figsize=(10, 6))
sns.barplot(
    data=feat_df,
    x='Importance',
    y='Feature',
    hue='Category',
    dodge=False
)
plt.title('Drivers of Flood Frequency in Bengaluru (Regression)')
plt.xlabel('Relative Contribution')
plt.tight_layout()
plt.show()

# 9. CATEGORY-LEVEL IMPORTANCE
cat_importance = feat_df.groupby('Category')['Importance'].sum().sort_values(ascending=False)
print("\n--- Importance by Category ---")
print(cat_importance)

# 10. SAVE PREDICTIONS
results = df_ml.loc[X_test.index, ['name', target]].copy()
results['predicted_flood_freq_pct'] = y_pred
results.to_csv('final_flood_frequency_predictions.csv', index=False)
