# Bengaluru Lakes Analysis (2015-2025)

This notebook performs a comprehensive analysis of lakes in Bengaluru, India, including lake identification, area calculation from satellite imagery, and correlation with environmental and development factors.

## 1. Setup and Authentication
Install and import necessary libraries and authenticate with Google Earth Engine.

In [6]:
import ee
import osmnx as ox
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from shapely.geometry import mapping  # shapely geometry -> GeoJSON dict, used to build ee.Geometry objects
import time
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

print('Libraries imported.')

Libraries imported.


### Google Earth Engine Authentication
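To run the GEE parts of this notebook, you need an Earth Engine account and a Google Cloud project registered for Earth Engine (set `project_id` below to your own). The cell below initializes Earth Engine and falls back to `ee.Authenticate()` only if no stored credentials are found.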

In [7]:
# ee.Authenticate()
project_id = 'bengaluru-lakes-485612'

try:
    ee.Initialize(project=project_id)
    print("Successfully initialized!")
except Exception:
    ee.Authenticate()
    ee.Initialize(project=project_id)

Successfully initialized!


## 2. Identify Lakes in Bengaluru
Query and extract a list of lakes within the administrative boundary of Bengaluru using OpenStreetMap (OSM).

In [5]:
import os

# Define Bengaluru boundary
place_name = 'Bengaluru, Karnataka, India'

# osmnx returns the UNION of these tags: anything tagged natural=water
# (ponds, reservoirs, tanks, ...) OR water=lake
tags = {'natural': 'water', 'water': 'lake'}

print(f'Searching for lakes in {place_name}...')
try:
    # Fetch all OSM features with these tags inside the geocoded place boundary
    lakes_gdf = ox.features_from_place(place_name, tags)
    
    # Keep only areal features (drop points and lines)
    lakes_gdf = lakes_gdf[lakes_gdf.geometry.geom_type.isin(['Polygon', 'MultiPolygon'])]
    
    # Keep only relevant columns and drop rows without names
    lakes_gdf = lakes_gdf[['name', 'geometry']].dropna(subset=['name'])
    
    print(f'Found {len(lakes_gdf)} lakes with names.')
    display(lakes_gdf.head())

    # Save the boundaries (geometry is written as WKT text); create the folder if needed
    os.makedirs('data', exist_ok=True)
    lakes_gdf.to_csv('data/lake_polygon_boundaries.csv', index=False)
except Exception as e:
    print(f'Error retrieving lakes: {e}')


Searching for lakes in Bengaluru, Karnataka, India...
Found 202 lakes with names.


| element | id | name | geometry |
|---|---|---|---|
| relation | 1332093 | NCBS Pond | POLYGON ((77.5791 13.07125, 77.57909 13.07121,... |
| relation | 1853330 | Vengayyana Lake | POLYGON ((77.70218 13.01708, 77.70235 13.017, ... |
| relation | 1857615 | Halasuru lake | POLYGON ((77.62261 12.98202, 77.6227 12.98193,... |
| relation | 2310400 | Chelekere | POLYGON ((77.64527 13.02519, 77.64512 13.02543... |
| relation | 2310417 | Madiwala Lake | MULTIPOLYGON (((77.61159 12.90261, 77.61165 12... |
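
osmnx combines the entries of a `tags` dict with OR, so `{'natural': 'water', 'water': 'lake'}` also returns ponds, reservoirs and other water polygons, not only lakes. The plain-Python sketch below mimics that union rule on made-up tag dicts (the `features` records and the `matches_any_tag` helper are hypothetical, not part of osmnx) and shows an optional stricter post-filter.

```python
# Hypothetical OSM-style feature records (tag dicts only, no geometry).
features = [
    {"name": "Lake A", "natural": "water", "water": "lake"},
    {"name": "Pond B", "natural": "water", "water": "pond"},
    {"name": "Tank C", "natural": "water", "water": "reservoir"},
    {"name": "Park D", "leisure": "park"},
]

def matches_any_tag(feature, tags):
    """Union (OR) semantics: keep a feature if ANY key/value pair in `tags` matches.
    A value of True matches any value for that key; a list matches any listed value."""
    for key, wanted in tags.items():
        value = feature.get(key)
        if value is None:
            continue
        if wanted is True:
            return True
        if isinstance(wanted, list) and value in wanted:
            return True
        if value == wanted:
            return True
    return False

tags = {"natural": "water", "water": "lake"}
union = [f["name"] for f in features if matches_any_tag(f, tags)]
print(union)  # ['Lake A', 'Pond B', 'Tank C']

# Optional stricter post-filter: keep only features explicitly tagged water=lake.
lakes_only = [f["name"] for f in features
              if matches_any_tag(f, tags) and f.get("water") == "lake"]
print(lakes_only)  # ['Lake A']
```

Many Bengaluru tanks may be mapped as `water=reservoir` or `water=pond`, so the stricter filter could drop water bodies that locals call lakes; whether to apply it is a judgment call.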


## 3. Delineate Lake Boundaries
Overlay the OSM lake polygons on satellite imagery for visual verification, then compute their areas.

### 1. The Data Source: COPERNICUS/S2_SR
* You are pulling data from the Sentinel-2 satellites.

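* SR stands for "Surface Reflectance": the Level-2A product, atmospherically corrected to remove the "haze" of the atmosphere. It is the cleanest version of the data for comparing pixels. (Earth Engine now marks `COPERNICUS/S2_SR` as deprecated in favor of `COPERNICUS/S2_SR_HARMONIZED`, which the later cells use.)

* `.filterBounds(...)` keeps only images whose footprint intersects the given geometry (here, a point in central Bengaluru).

* `.first()` returns the first image in the collection's current order. Because no date sort or cloud filter is applied, it is not necessarily the most recent or the clearest image.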
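
`.first()` only respects the collection's current order. A plain-Python sketch with made-up metadata (the image IDs and cloud percentages are hypothetical) contrasts it with sorting on the Sentinel-2 `CLOUDY_PIXEL_PERCENTAGE` property first, which in Earth Engine would be `collection.sort('CLOUDY_PIXEL_PERCENTAGE').first()`:

```python
# Hypothetical image metadata, listed in the order the collection returns it.
images = [
    {"id": "img_a", "CLOUDY_PIXEL_PERCENTAGE": 62.0},
    {"id": "img_b", "CLOUDY_PIXEL_PERCENTAGE": 3.5},
    {"id": "img_c", "CLOUDY_PIXEL_PERCENTAGE": 18.2},
]

# Like collection.first(): take whatever comes first, however cloudy.
first_image = images[0]

# Like collection.sort('CLOUDY_PIXEL_PERCENTAGE').first(): ascending sort, then first.
least_cloudy = sorted(images, key=lambda img: img["CLOUDY_PIXEL_PERCENTAGE"])[0]

print(first_image["id"])   # img_a
print(least_cloudy["id"])  # img_b
```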

### 2. The Secret Sauce: {'bands': ['B8', 'B4', 'B3']}
* Standard cameras use Red, Green, and Blue. This code replaces those with a different combination:

* **B8 (Near-Infrared - NIR)**: This is assigned to the Red channel of your screen. Healthy plants reflect NIR incredibly strongly.

* **B4 (Red)**: Assigned to the Green channel.

* **B3 (Green)**: Assigned to the Blue channel.
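
The band-to-channel mapping, together with the `min: 0, max: 3000` display range passed to `addLayer`, can be written out for a single pixel. This is a minimal sketch assuming a plain linear stretch with clamping (the map viewer's exact rounding may differ); the pixel values are hypothetical.

```python
def stretch(value, vmin=0, vmax=3000):
    """Linearly map a raw band value onto 0-255, clamping values outside [vmin, vmax]."""
    scaled = (value - vmin) / (vmax - vmin) * 255
    return int(round(min(255, max(0, scaled))))

def false_color_rgb(b8, b4, b3, vmin=0, vmax=3000):
    """Screen color for one pixel: NIR (B8) -> red, Red (B4) -> green, Green (B3) -> blue."""
    return tuple(stretch(v, vmin, vmax) for v in (b8, b4, b3))

# Hypothetical Sentinel-2 SR values (reflectance x 10,000) for two pixels.
weeds = false_color_rgb(b8=3600, b4=400, b3=600)
open_water = false_color_rgb(b8=200, b4=600, b3=1000)
print(weeds)       # (255, 34, 51): NIR saturates the red channel
print(open_water)  # (17, 51, 85): almost no red, dark overall
```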

### 3. What the colors mean now
* Because the colors are remapped, lakes that look "dark green" in natural color can now be told apart:

* **Bright Red**: This is Vegetation. If a lake looks bright red, it's not water; it's a thick layer of weeds (like Water Hyacinth). The redder it is, the "healthier" the plants are.

* **Deep Black/Dark Blue**: This is Clear Water. Water absorbs the B8 (NIR) band almost completely, so it reflects very little back and appears black.

* **Grey/Cyan**: This is Built-up Area (concrete, roads, and buildings).

[Image comparison of Natural Color vs False Color Infrared satellite imagery]

### 4. Why use min: 0, max: 3000?
* Sentinel-2 SR bands are not stored as 0–255 like a JPEG. They hold surface reflectance scaled by 10,000, so a value of **3000** means 30% reflectance, a typical "bright" land value. Setting these limits tells the map how to stretch that range onto screen colors so the image is neither too dark nor completely blown out (pure white).

* Why this explains the "**Dark Green**" lakes. When you run this code:
    * If the lake stays Black, it is open water that only looked dark because of depth or algae.
    * If the lake turns Bright Pink or Red, it is actually solid vegetation (weeds) masquerading as a lake surface.
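
The same visual reading can be expressed numerically with NDVI, (B8 - B4) / (B8 + B4): floating weeds push it well above zero, while open water falls below zero. In this sketch the cut-offs (0.3 and 0.0) are illustrative assumptions rather than calibrated values, and the pixel values are hypothetical.

```python
def ndvi(b8, b4):
    """Normalized Difference Vegetation Index from NIR (B8) and Red (B4)."""
    total = b8 + b4
    return 0.0 if total == 0 else (b8 - b4) / total

def classify_lake_pixel(b8, b4, veg_threshold=0.3, water_threshold=0.0):
    """Rough label for a lake-surface pixel; the thresholds are illustrative only."""
    value = ndvi(b8, b4)
    if value > veg_threshold:
        return "floating vegetation"
    if value < water_threshold:
        return "open water"
    return "mixed / turbid"

print(classify_lake_pixel(b8=3600, b4=400))  # floating vegetation (NDVI = 0.8)
print(classify_lake_pixel(b8=200, b4=600))   # open water (NDVI = -0.5)
print(classify_lake_pixel(b8=900, b4=700))   # mixed / turbid (NDVI = 0.125)
```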

In [None]:
import geemap

# 1. Calculate centroids in a projected CRS (UTM 43N, metres), then convert back to lat/lon
# A temporary Series avoids storing geometry objects as an extra column
temp_centroids = lakes_gdf.to_crs(epsg=32643).centroid.to_crs(epsg=4326)

# 2. Store only the numbers (floats), which are JSON-friendly
lakes_gdf['latitude'] = temp_centroids.y
lakes_gdf['longitude'] = temp_centroids.x

# 3. Drop the 'centroid' column if it exists as an object
if 'centroid' in lakes_gdf.columns:
    lakes_gdf = lakes_gdf.drop(columns=['centroid'])

# 4. Create the map
Map = geemap.Map(center=[12.9716, 77.5946], zoom=11)
Map.add_basemap('HYBRID')

# 5. Add the first 50 lakes (geemap uses the 'geometry' column automatically)
Map.add_gdf(lakes_gdf.head(50), layer_name='Lakes (first 50)', info_mode='on_click', style={
        'color': '#00FFFF',   # Bright cyan outline for visibility
        'weight': 2,          # Outline thickness in pixels (Leaflet style key)
        'fillOpacity': 0      # Transparent interior (Leaflet style key)
    })

# 6. False-color Sentinel-2 layer: NIR, Red, Green
# Note: .first() takes the first image in collection order, not the clearest one
Map.addLayer(
    ee.ImageCollection("COPERNICUS/S2_SR").filterBounds(ee.Geometry.Point([77.5946, 12.9716])).first(),
    {'bands': ['B8', 'B4', 'B3'], 'min': 0, 'max': 3000},
    'Vegetation (Red) vs Water (Black)'
)

Map

* To calculate lake areas, the data must first be re-projected. The lakes are currently in a geographic coordinate system (degrees, EPSG:4326), so a direct **.area** calculation would return meaningless values in square degrees.

* For Bengaluru, we use **EPSG:32643 (UTM Zone 43N)**, which uses meters.
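
To see why square degrees are meaningless, compare a small lon/lat box measured both ways. This sketch uses a spherical-Earth approximation (an assumed 111.32 km per degree, with longitude shrinking by cos(latitude)) rather than the UTM projection used below, but it gives the right order of magnitude.

```python
import math

# Approximate ground length of one degree on a spherical Earth (km); an assumption.
KM_PER_DEG_LAT = 111.32

def box_area_km2(lat_deg, dlon_deg, dlat_deg):
    """Approximate area of a small lon/lat box near latitude lat_deg.
    One degree of longitude shrinks by cos(latitude)."""
    width_km = dlon_deg * KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))
    height_km = dlat_deg * KM_PER_DEG_LAT
    return width_km * height_km

# A 0.01 deg x 0.01 deg box near central Bengaluru (lat 12.9716).
naive_square_degrees = 0.01 * 0.01
approx_km2 = box_area_km2(12.9716, 0.01, 0.01)
print(naive_square_degrees)   # about 0.0001 "square degrees"
print(round(approx_km2, 3))   # about 1.208 km²
```

The same box is roughly 1.2 km² on the ground, four orders of magnitude away from the raw `.area` number, and the conversion factor changes with latitude, which is why a metric CRS is needed.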

In [None]:
# 1. Project to a meter-based system (UTM 43N)
# 2. Calculate area (returns square meters)
# 3. Convert to Square Kilometers (divide by 1,000,000)
lakes_gdf['area_km2'] = lakes_gdf.to_crs(epsg=32643).geometry.area / 1_000_000

# Let's see the top 5 largest lakes
print(lakes_gdf[['name', 'area_km2']].sort_values(by='area_km2', ascending=False).head())

#### Adding Area to Map Popups
* Now that I have the area, I can include it in the **geemap popups** so that clicking a lake shows its size.

In [None]:
# Format areas with two decimals for clean display (e.g. "0.10 km²")
lakes_gdf['area_label'] = lakes_gdf['area_km2'].map(lambda a: f"{a:.2f} km²")

Map = geemap.Map(center=[12.9716, 77.5946], zoom=12)
Map.add_basemap('SATELLITE')

# Pass only JSON-serializable columns to the map to avoid the JSON error
map_data = lakes_gdf[['geometry', 'name', 'area_label']]

# info_mode='on_click' shows the attributes when a lake is clicked
Map.add_gdf(map_data, layer_name='Lakes with Area', info_mode='on_click')
Map

In [None]:
# 1. Setup storage: one row per lake per year
results = []

def scale_landsat(image):
    """Applies Landsat Collection 2 Level-2 surface reflectance scale factors."""
    return image.multiply(0.0000275).add(-0.2)

for idx in range(len(lakes_gdf)):
    lake_name = lakes_gdf.iloc[idx]['name']
    lake_geom_ee = geemap.gdf_to_ee(lakes_gdf.iloc[[idx]])

    for year in range(2015, 2026):
        # filterDate treats the end date as exclusive, so end on 1 January of the next year
        start_date = f'{year}-01-01'
        end_date = f'{year + 1}-01-01'
        
        # 2. Select collection based on year (Sentinel-2 from 2017, Landsat 8 before)
        if year >= 2017:
            collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                .filterBounds(lake_geom_ee)
                .filterDate(start_date, end_date)
                .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
                .select(['B3', 'B11'], ['Green', 'SWIR1']))
        else:
            collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
                .filterBounds(lake_geom_ee)
                .filterDate(start_date, end_date)
                .filter(ee.Filter.lt('CLOUD_COVER', 20))
                .select(['SR_B3', 'SR_B6'], ['Green', 'SWIR1'])
                .map(scale_landsat))

        # 3. Check if images exist
        count = collection.size().getInfo()
        if count == 0:
            print(f"{lake_name}, {year}: no images found")
            results.append({'name': lake_name, 'year': year, 'area_m2': None, 'status': 'no_data'})
            continue

        # 4. Create a median composite and calculate MNDWI
        image = collection.median()
        mndwi = image.normalizedDifference(['Green', 'SWIR1']).rename('MNDWI')
        
        # 5. Threshold to find water (MNDWI > 0)
        water_mask = mndwi.gt(0)
        
        # 6. Sum the area of water pixels within the lake boundary
        # multiply(ee.Image.pixelArea()) turns each water pixel into its area in m²
        area_image = water_mask.multiply(ee.Image.pixelArea())
        stats = area_image.reduceRegion(
            reducer=ee.Reducer.sum(),
            geometry=lake_geom_ee.geometry(),
            scale=10 if year >= 2017 else 30,  # Sentinel-2 is 10 m, Landsat 8 is 30 m
            maxPixels=1e9
        )
        
        area_val = stats.getInfo().get('MNDWI')
        results.append({'name': lake_name, 'year': year, 'area_m2': area_val, 'status': 'success'})
        print(f"{lake_name}, {year}: {area_val} m²")

# 7. Save to the requested DataFrame
lake_params = pd.DataFrame(results)
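
Per pixel, the loop computes MNDWI = (Green - SWIR1) / (Green + SWIR1), keeps pixels above 0, and multiplies by pixel area. A plain-Python sketch of that arithmetic with hypothetical reflectances follows; unlike `reduceRegion`, it counts whole pixels and ignores partial-pixel weighting at the lake edge. It also shows why the 2017 switch from 30 m to 10 m pixels matters: each Landsat pixel carries nine times the area of a Sentinel-2 pixel.

```python
def mndwi(green, swir1):
    """Modified NDWI: (Green - SWIR1) / (Green + SWIR1)."""
    total = green + swir1
    return 0.0 if total == 0 else (green - swir1) / total

def water_area_m2(pixels, pixel_size_m, threshold=0.0):
    """Sum the area of pixels whose MNDWI exceeds the threshold.
    `pixels` is a list of (green, swir1) reflectance pairs."""
    water_pixels = sum(1 for g, s in pixels if mndwi(g, s) > threshold)
    return water_pixels * pixel_size_m ** 2

# Hypothetical reflectances for five pixels inside one lake polygon.
pixels = [(0.08, 0.02), (0.07, 0.03), (0.06, 0.09), (0.10, 0.01), (0.05, 0.12)]
print(water_area_m2(pixels, pixel_size_m=10))  # 300 (3 water pixels x 100 m²)
print(water_area_m2(pixels, pixel_size_m=30))  # 2700 (3 water pixels x 900 m²)
```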

In [10]:
# Test a buffered "search zone" on a single lake (iloc[[2]] is the third lake in lakes_gdf)
dynamic_results = []
lake_geom_ee = geemap.gdf_to_ee(lakes_gdf.iloc[[2]])

for year in range(2015, 2026):
    start_date = f'{year}-01-01'
    end_date = f'{year}-12-31'  # filterDate's end is exclusive, so 31 December is not included
    current_scale = 10 if year >= 2017 else 30  # Sentinel-2 is 10 m, Landsat 8 is 30 m

    if year >= 2017:
        collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
            .filterBounds(lake_geom_ee)
            .filterDate(start_date, end_date)
            .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
            .select(['B3', 'B11'], ['Green', 'SWIR1']))
    else:
        # Raw Collection 2 DNs (no scale factors applied). For positive reflectances the
        # sign of MNDWI, and so the > 0 water mask, is the same as after scaling
        collection = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
            .filterBounds(lake_geom_ee)
            .filterDate(start_date, end_date)
            .filter(ee.Filter.lt('CLOUD_COVER', 20))
            .select(['SR_B3', 'SR_B6'], ['Green', 'SWIR1']))

    count = collection.size().getInfo()
    if count == 0:
        dynamic_results.append({'year': year, 'dynamic_area_m2': None, 'status': 'no_data'})
        continue
    
    # Buffer the lake polygon by 500 m to create a "search zone"; any water in this
    # zone is counted, including ponds or channels just outside the OSM outline
    search_zone = lake_geom_ee.geometry().buffer(500)
    
    # Calculate MNDWI on the median composite
    image = collection.median().clip(search_zone)
    mndwi = image.normalizedDifference(['Green', 'SWIR1']).rename('MNDWI')
    
    # Detect water pixels
    water_mask = mndwi.gt(0.0) 
    
    # Calculate area
    stats = water_mask.multiply(ee.Image.pixelArea()).reduceRegion(
        reducer=ee.Reducer.sum(),
        geometry=search_zone,
        scale=current_scale,
        maxPixels=1e9
    )
    
    dynamic_area = stats.getInfo().get('MNDWI')
    dynamic_results.append({'year': year, 'dynamic_area_m2': dynamic_area, 'status': 'success'})
    print(f"Year {year}: {dynamic_area} m²")

Year 2015: 298488.0909770891 m²
Year 2016: 307205.1808208391 m²
Year 2017: 330487.90259061335 m²
Year 2018: 309032.87561154837 m²
Year 2019: 308539.07642926986 m²
Year 2020: 324301.05529578717 m²
Year 2021: 360142.05182222853 m²
Year 2022: 354847.5752257104 m²
Year 2023: 361065.44906529447 m²
Year 2024: 349972.8127917421 m²
Year 2025: 358339.5691966637 m²


## 4. Retrieve and Process Annual Satellite Data (2015-2025)
Using Google Earth Engine to fetch annual satellite imagery (Sentinel-2 and Landsat 8) and calculate lake areas.
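
Earth Engine's `filterDate(start, end)` treats `end` as exclusive, so an annual window ending on `'{year}-12-31'` skips the last day of the year. A quick standard-library check of the two possible windows, using a leap year:

```python
from datetime import date

def days_in_window(start, end):
    """Count days in the half-open interval [start, end), the convention
    filterDate uses (the end date is exclusive)."""
    return (end - start).days

year = 2024
print(days_in_window(date(year, 1, 1), date(year, 12, 31)))    # 365 of 366 days: Dec 31 is left out
print(days_in_window(date(year, 1, 1), date(year + 1, 1, 1)))  # 366: the full year
```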

In [13]:
def mask_s2_clouds(image):
    """Masks opaque clouds (bit 10) and cirrus (bit 11) using the QA60 band, then scales
    reflectance to 0-1. QA60 coverage is incomplete for some recent years (see the Earth
    Engine catalog entry); where the band is masked, this masks the whole scene."""
    qa = image.select('QA60')
    cloud_bit_mask = 1 << 10
    cirrus_bit_mask = 1 << 11
    mask = qa.bitwiseAnd(cloud_bit_mask).eq(0).And(qa.bitwiseAnd(cirrus_bit_mask).eq(0))
    return image.updateMask(mask).divide(10000)

def mask_l8_clouds(image):
    """Masks clouds (bit 3) and cloud shadows (bit 4) using QA_PIXEL and applies the
    Landsat Collection 2 Level-2 scale factors to the surface reflectance bands."""
    qa = image.select('QA_PIXEL')
    mask = qa.bitwiseAnd(1 << 3).eq(0).And(qa.bitwiseAnd(1 << 4).eq(0))
    optical = image.select('SR_B.').multiply(0.0000275).add(-0.2)
    return optical.updateMask(mask)

def calculate_mndwi(image):
    """Calculates Modified Normalized Difference Water Index (MNDWI)."""
    return image.normalizedDifference(['Green', 'SWIR1']).rename('MNDWI')

def get_lake_area(lake_geometry, year):
    """Computes water area (m²) inside lake_geometry for a given year using GEE."""
    start_date = f'{year}-01-01'
    end_date = f'{year + 1}-01-01'  # filterDate's end date is exclusive
    
    # Use Sentinel-2 for 2017 onwards, Landsat 8 for 2015-2016
    if year >= 2017:
        collection = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED') \
            .filterBounds(lake_geometry) \
            .filterDate(start_date, end_date) \
            .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)) \
            .map(mask_s2_clouds) \
            .select(['B3', 'B11'], ['Green', 'SWIR1'])
    else:
        collection = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2') \
            .filterBounds(lake_geometry) \
            .filterDate(start_date, end_date) \
            .filter(ee.Filter.lt('CLOUD_COVER', 20)) \
            .map(mask_l8_clouds) \
            .select(['SR_B3', 'SR_B6'], ['Green', 'SWIR1'])

    if collection.size().getInfo() == 0:
        return None

    # Median composite
    composite = collection.median()
    mndwi = calculate_mndwi(composite)
    
    # Threshold for water (typically > 0)
    water_mask = mndwi.gt(0)
    
    # Sum pixel areas (m²) of water pixels inside the lake
    area_image = water_mask.multiply(ee.Image.pixelArea())
    stats = area_image.reduceRegion(
        reducer=ee.Reducer.sum(),
        geometry=lake_geometry,
        scale=10 if year >= 2017 else 30,
        maxPixels=1e9
    )
    
    return stats.get('MNDWI').getInfo()
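
A normalized difference (a - b) / (a + b) is unchanged by multiplying both bands by the same factor, but not by adding an offset. Landsat Collection 2 Level-2 reflectance is `DN * 0.0000275 - 0.2`, so index values computed on raw DNs differ from those on reflectance; this matters most for averaged indices such as the Landsat NDVI in the green-cover step. A plain-Python sketch with hypothetical DNs for one pixel:

```python
def normalized_difference(a, b):
    return (a - b) / (a + b)

def landsat_c2_to_reflectance(dn):
    """Landsat Collection 2 Level-2 surface reflectance scaling."""
    return dn * 0.0000275 - 0.2

# Hypothetical raw DNs for the Green (SR_B3) and SWIR1 (SR_B6) bands of one pixel.
green_dn, swir_dn = 10000, 9000

raw = normalized_difference(green_dn, swir_dn)
doubled = normalized_difference(2 * green_dn, 2 * swir_dn)  # pure scaling: no change
scaled = normalized_difference(landsat_c2_to_reflectance(green_dn),
                               landsat_c2_to_reflectance(swir_dn))
print(round(raw, 3))     # 0.053
print(doubled == raw)    # True
print(round(scaled, 3))  # 0.224
```

The sign is the same in both cases (both positive here), so a `> 0` water mask is largely unaffected, but the index magnitude, and anything averaged from it, shifts.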



## 5. Retrieve Auxiliary Environmental and Development Datasets (2015-2025)
Retrieving Rainfall (CHIRPS), Green Cover (NDVI), Built-up Area (Dynamic World `built` probability, used instead of GHSL because GHSL is not annual), and Night Lights (VIIRS).
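
Each CHIRPS pentad image holds the precipitation total (mm) for one pentad, so summing all pentads in a year gives the annual total that the rainfall step averages over the lake. A standard-library sketch of the pentad calendar, assuming CHIRPS's six-pentads-per-month convention (the last pentad runs from day 26 to the end of the month), confirms that 72 pentads tile the year with no gaps or overlaps:

```python
import calendar
from datetime import date

def chirps_pentads(year):
    """Return (start, end) dates of the 72 CHIRPS pentads in a year.
    Each month has six pentads: days 1-5, 6-10, 11-15, 16-20, 21-25,
    and 26 to the last day of the month."""
    pentads = []
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        for start_day in (1, 6, 11, 16, 21, 26):
            end_day = last_day if start_day == 26 else start_day + 4
            pentads.append((date(year, month, start_day), date(year, month, end_day)))
    return pentads

p = chirps_pentads(2024)
covered_days = sum((end - start).days + 1 for start, end in p)
print(len(p), covered_days)  # 72 366
```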

In [15]:
def get_rainfall(geometry, year):
    """Returns annual rainfall (mm): the sum of CHIRPS pentad totals, averaged over the geometry."""
    collection = ee.ImageCollection('UCSB-CHG/CHIRPS/PENTAD') \
        .filterBounds(geometry) \
        .filterDate(f'{year}-01-01', f'{year + 1}-01-01')  # end date is exclusive
    return collection.sum().reduceRegion(ee.Reducer.mean(), geometry, 5000).get('precipitation').getInfo()

def get_green_cover(geometry, year):
    """Returns mean NDVI over the geometry from a Sentinel-2 (2017+) or Landsat 8 median composite."""
    start_date, end_date = f'{year}-01-01', f'{year + 1}-01-01'
    if year >= 2017:
        img = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED') \
            .filterBounds(geometry).filterDate(start_date, end_date) \
            .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)).median()
        ndvi = img.normalizedDifference(['B8', 'B4'])
    else:
        # Apply Collection 2 Level-2 scale factors first: NDVI is not invariant to the -0.2 offset
        img = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2') \
            .filterBounds(geometry).filterDate(start_date, end_date) \
            .filter(ee.Filter.lt('CLOUD_COVER', 20)) \
            .map(lambda image: image.select('SR_B.').multiply(0.0000275).add(-0.2)) \
            .median()
        ndvi = img.normalizedDifference(['SR_B5', 'SR_B4'])
    return ndvi.reduceRegion(ee.Reducer.mean(), geometry, 30).get('nd').getInfo()

def get_night_lights(geometry, year):
    """Returns mean VIIRS night-time radiance (avg_rad) over the geometry for the year."""
    collection = ee.ImageCollection('NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG') \
        .filterBounds(geometry) \
        .filterDate(f'{year}-01-01', f'{year + 1}-01-01')
    return collection.mean().reduceRegion(ee.Reducer.mean(), geometry, 500).get('avg_rad').getInfo()

def get_built_up(geometry, year):
    """Returns the mean Dynamic World 'built' probability (0-1) over the geometry for the year."""
    # GHSL is not annual, so Dynamic World is used instead (its record starts in mid-2015).
    # mean() averages all acquisitions; mosaic() would keep only the top-most pixel.
    built_up = ee.ImageCollection('GOOGLE/DYNAMICWORLD/V1') \
        .filterBounds(geometry).filterDate(f'{year}-01-01', f'{year + 1}-01-01') \
        .select('built').mean()
    return built_up.reduceRegion(ee.Reducer.mean(), geometry, 10).get('built').getInfo()
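
`ImageCollection.mosaic()` stacks images in collection order, so for each pixel the top-most (last) valid value wins, whereas `.mean()` lets every acquisition in the year contribute. A plain-Python sketch of the difference for one pixel, using hypothetical Dynamic World `built` probabilities:

```python
# Hypothetical 'built' probabilities for ONE pixel across a year of
# Dynamic World images, in collection order (earliest first).
built_probs = [0.62, 0.58, 0.65, 0.12, 0.60]

# mosaic(): later images are drawn on top, so the last valid value wins.
mosaic_value = built_probs[-1]

# mean(): every acquisition contributes equally.
mean_value = sum(built_probs) / len(built_probs)

print(mosaic_value)          # 0.6
print(round(mean_value, 3))  # 0.514
```

The mosaic value depends on which single scene happens to be last, so an annual mean is the more stable per-year summary, at the cost of being pulled by outliers such as the 0.12 reading.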


## 6. Statistical Analysis and Correlations
Analyze changes in lake area and correlate with environmental/urbanization factors.

In [16]:
def analyze_changes(df):
    """Adds year-over-year % change in lake area and returns the total % change per lake."""
    # Sort by lake and year
    df = df.sort_values(['lake_name', 'year'])
    
    # Calculate % change in area from previous year
    df['area_pct_change'] = df.groupby('lake_name')['area_m2'].pct_change() * 100
    
    # Overall change, first to last year; NaN when it cannot be computed
    total_change = df.groupby('lake_name')['area_m2'].apply(
        lambda s: (s.iloc[-1] - s.iloc[0]) / s.iloc[0] * 100
        if len(s) > 1 and s.iloc[0] > 0 else np.nan
    ).reset_index(name='total_pct_change_2015_2025')
    
    return df, total_change

def plot_correlation(df):
    """Generates a correlation matrix heatmap."""
    cols_to_corr = ['area_m2', 'rainfall_mm', 'ndvi', 'built_up', 'night_lights', 'flood_occurrence']
    corr = df[cols_to_corr].corr()
    
    plt.figure(figsize=(10, 8))
    sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')
    plt.title('Correlation Matrix of Environmental and Urbanization Factors')
    plt.show()
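
The two change measures in `analyze_changes` can be checked by hand on the buffered-area values printed earlier for the single test lake (rounded to two decimals here). This plain-Python sketch reproduces what `pct_change()` and the first-to-last formula compute for one lake group:

```python
# Buffered water areas (m²) printed earlier for the single test lake, rounded.
areas = {
    2015: 298488.09, 2016: 307205.18, 2017: 330487.90, 2018: 309032.88,
    2019: 308539.08, 2020: 324301.06, 2021: 360142.05, 2022: 354847.58,
    2023: 361065.45, 2024: 349972.81, 2025: 358339.57,
}

lake_years = sorted(areas)
# Year-over-year % change, as pct_change() computes within one lake's group.
yoy = {y: (areas[y] - areas[p]) / areas[p] * 100
       for p, y in zip(lake_years, lake_years[1:])}

# Total change, first year to last year.
total_pct = (areas[lake_years[-1]] - areas[lake_years[0]]) / areas[lake_years[0]] * 100

print(round(yoy[2017], 1))  # 7.6
print(round(total_pct, 1))  # 20.1
```

The 7.6% step from 2016 to 2017 coincides with the switch from Landsat 8 to Sentinel-2, so part of it may be a sensor effect rather than a change in the lake.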


## 7. Predictive Modeling for Flooding
Fit a Random Forest regressor to the 0/1 flood label. Its prediction, the average of the individual trees' outputs, is used as a rough flood-probability score based on lake area, rainfall, green cover, and urbanization.

In [17]:
def train_flood_model(df):
    """Trains a predictive model for flooding."""
    # Features: lake area (m², not % change), rainfall, green cover, built-up, night lights
    features = ['area_m2', 'rainfall_mm', 'ndvi', 'built_up', 'night_lights']
    target = 'flood_occurrence'
    
    # Preprocessing: drop rows with NaNs in features
    data = df.dropna(subset=features + [target])
    
    if len(data) < 10:
        print('Not enough data to train model.')
        return None, None
    
    X = data[features]
    y = data[target]
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    print(f'Model trained. MSE: {mse:.4f}, R2: {r2:.4f}')
    return model, features
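
When real GEE data is used, `flood_occurrence` is defined as `rainfall > 1000`, and `rainfall_mm` is also one of the model's features, so the target is a deterministic function of an input. A standard-library sketch with made-up rainfall values shows that a rule copying the label definition scores perfectly; a high R² in that setting says nothing about flood risk, which is why the assumptions section calls for real flood records.

```python
import random

rng = random.Random(0)

# Made-up annual rainfall totals (mm) for 200 lake-years.
rainfall_mm = [rng.uniform(600, 1400) for _ in range(200)]

# The proxy label used when real GEE data is retrieved: rainfall above 1000 mm.
flood_occurrence = [1 if r > 1000 else 0 for r in rainfall_mm]

# A "model" that only sees rainfall and re-applies the same threshold.
predicted = [1 if r > 1000 else 0 for r in rainfall_mm]

accuracy = sum(p == y for p, y in zip(predicted, flood_occurrence)) / len(flood_occurrence)
print(accuracy)  # 1.0
```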


## 8. Outputs and Visualizations
Annual statistics and time-series plots for each lake.

In [18]:
def plot_time_series(df, lake_name):
    """Plots time series for a specific lake."""
    lake_df = df[df['lake_name'] == lake_name]
    
    fig, axes = plt.subplots(3, 2, figsize=(15, 12))
    fig.suptitle(f'Environmental and Urbanization Trends for {lake_name} (2015-2025)')
    
    sns.lineplot(ax=axes[0, 0], data=lake_df, x='year', y='area_m2', marker='o')
    axes[0, 0].set_title('Lake Area (sq meters)')
    
    sns.lineplot(ax=axes[0, 1], data=lake_df, x='year', y='rainfall_mm', marker='o', color='green')
    axes[0, 1].set_title('Annual Rainfall (mm)')
    
    sns.lineplot(ax=axes[1, 0], data=lake_df, x='year', y='ndvi', marker='o', color='brown')
    axes[1, 0].set_title('Green Cover (NDVI)')
    
    sns.lineplot(ax=axes[1, 1], data=lake_df, x='year', y='built_up', marker='o', color='red')
    axes[1, 1].set_title('Built-up Area Index')
    
    sns.lineplot(ax=axes[2, 0], data=lake_df, x='year', y='night_lights', marker='o', color='orange')
    axes[2, 0].set_title('Night Light Intensity')
    
    sns.barplot(ax=axes[2, 1], data=lake_df, x='year', y='flood_occurrence', color='blue')
    axes[2, 1].set_title('Flood Occurrence (0/1 proxy)')
    
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    plt.show()


## 9. Data Integration and Execution Loop
Combining all components and handling the data retrieval process for all lakes and years.

In [None]:
from shapely.geometry import mapping
def generate_mock_data(lakes_gdf, years, seed=42):
    """Generates synthetic data for demonstration if GEE data is not used.
    A fixed seed makes the synthetic data reproducible between runs."""
    rng = np.random.default_rng(seed)
    data = []
    for _, row in lakes_gdf.head(10).iterrows(): # Sample 10 lakes
        base_area = rng.uniform(5000, 50000)
        for year in years:
            # Simulate a trend: decreasing area, increasing built-up and night lights
            area = base_area * (1 - (year - 2015) * 0.02) + rng.normal(0, 500)
            rainfall = rng.uniform(600, 1200)
            ndvi = 0.4 * (1 - (year - 2015) * 0.01) + rng.normal(0, 0.02)
            built_up = 0.2 * (1 + (year - 2015) * 0.05) + rng.normal(0, 0.01)
            night_lights = 15 * (1 + (year - 2015) * 0.08) + rng.normal(0, 1)
            # Flood occurrence based on rainfall and area reduction,
            # with some randomness to avoid perfect correlation
            flood_prob = (rainfall / 1200 * 0.4) + ((base_area - area) / base_area * 0.4) + rng.normal(0, 0.1)
            flood = 1 if flood_prob > 0.6 else 0
            
            data.append({
                'lake_name': row['name'],
                'year': year,
                'area_m2': max(0, area),
                'rainfall_mm': rainfall,
                'ndvi': max(0, min(1, ndvi)),
                'built_up': max(0, min(1, built_up)),
                'night_lights': max(0, night_lights),
                'flood_occurrence': flood
            })
    return pd.DataFrame(data)

# --- Main Execution ---
years = list(range(2015, 2026))
results_df = None

# Set to False to query GEE (slow: several requests per lake-year).
# While True, the GEE step is skipped and synthetic data is used.
USE_SYNTHETIC_DATA = True

try:
    if USE_SYNTHETIC_DATA:
        raise RuntimeError('USE_SYNTHETIC_DATA is True')
    print('Attempting to retrieve real data from GEE...')
    
    data = []
    for idx, row in lakes_gdf.head(1).iterrows(): # Limit to 1 lake for demonstration
        print(f'Processing {row["name"]}...')
        geom = ee.Geometry(mapping(row['geometry']))
        for year in years:
            area = get_lake_area(geom, year)
            rainfall = get_rainfall(geom, year)
            ndvi = get_green_cover(geom, year)
            night_lights = get_night_lights(geom, year)
            built_up = get_built_up(geom, year)
            
            data.append({
                'lake_name': row['name'],
                'year': year,
                'area_m2': area,
                'rainfall_mm': rainfall,
                'ndvi': ndvi,
                'built_up': built_up,
                'night_lights': night_lights,
                'flood_occurrence': 1 if (rainfall and rainfall > 1000) else 0 # Simple proxy
            })
    results_df = pd.DataFrame(data)

except Exception as e:
    print(f'Using synthetic data for demonstration: {e}')
    results_df = generate_mock_data(lakes_gdf, years)

# Post-processing
results_df, total_change_df = analyze_changes(results_df)

# Display results
display(results_df.head())
display(total_change_df.head())

# Correlation
plot_correlation(results_df)

# Modeling
model, features = train_flood_model(results_df)

# Example Time Series
if len(results_df) > 0:
    plot_time_series(results_df, results_df['lake_name'].iloc[0])
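
Under the rule in `generate_mock_data`, the deterministic part of `flood_prob` is roughly 0.48 at most before noise (0.4 from 1200 mm of rainfall plus 0.08 from a 20% area loss by 2025), so crossing the 0.6 threshold always needs a positive noise draw. The sketch below re-simulates that rule with the standard-library `random` module instead of NumPy (so individual draws differ from the notebook) to estimate how rare synthetic floods are:

```python
import random

def mock_flood_rate(n_lakes=1000, years=range(2015, 2026), seed=42):
    """Share of lake-years flagged as floods under the generate_mock_data rule."""
    rng = random.Random(seed)
    floods = total = 0
    for _ in range(n_lakes):
        base_area = rng.uniform(5000, 50000)
        for year in years:
            area = base_area * (1 - (year - 2015) * 0.02) + rng.gauss(0, 500)
            rainfall = rng.uniform(600, 1200)
            flood_prob = ((rainfall / 1200 * 0.4)
                          + ((base_area - area) / base_area * 0.4)
                          + rng.gauss(0, 0.1))
            floods += flood_prob > 0.6
            total += 1
    return floods / total

rate = mock_flood_rate()
print(f"{rate:.1%} of lake-years flagged as floods")
```

Because positive labels are rare and noise-driven, the 110 synthetic rows contain very few floods, so the correlation heatmap and model metrics above mostly reflect how the synthetic data was built.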


## 10. Predictive Flood Probability Maps
Visualize a flood-probability score for each lake based on the trained model.

In [None]:
def plot_flood_probability_map(lakes_gdf, model, features, results_df):
    """Creates a map showing a flood-probability score for each lake."""
    # Use the most recent year's data as input for prediction
    latest_data = results_df[results_df['year'] == results_df['year'].max()].copy()
    
    if model is not None:
        # The model cannot score rows with missing features
        latest_data = latest_data.dropna(subset=features)
        # Regressor trained on a 0/1 label: output is the mean of the trees' predictions
        latest_data['flood_prob'] = model.predict(latest_data[features])
    else:
        # Fallback if no model: random placeholder values, not predictions
        print('No trained model available; colors show random placeholder values.')
        latest_data['flood_prob'] = np.random.uniform(0, 1, len(latest_data))

    # Merge back with GeoDataFrame
    map_gdf = lakes_gdf.merge(latest_data[['lake_name', 'flood_prob']], left_on='name', right_on='lake_name')

    m = folium.Map(location=[12.9716, 77.5946], zoom_start=11, tiles='CartoDB positron')
    
    for _, row in map_gdf.iterrows():
        prob = row['flood_prob']
        color = 'red' if prob > 0.7 else 'orange' if prob > 0.4 else 'green'
        
        # Simplify the outline (tolerance in degrees) to keep the HTML map small
        sim_geo = gpd.GeoSeries(row['geometry']).simplify(tolerance=0.001)
        geo_j = sim_geo.to_json()
        # color=color binds the current value; a plain closure would give every
        # polygon the color from the last loop iteration
        geo_j = folium.GeoJson(data=geo_j, style_function=lambda x, color=color: {
            'fillColor': color, 'color': 'black', 'weight': 1, 'fillOpacity': 0.7
        })
        folium.Popup(f"{row['name']}: {prob:.2f} probability").add_to(geo_j)
        geo_j.add_to(m)
    
    return m

# Generate and display the map
print('Generating Predictive Flood Probability Map...')
flood_map = plot_flood_probability_map(lakes_gdf, model, features, results_df)
flood_map


## 11. Technical and Reproducibility Requirements

### Data Sources:
- **Lakes & Boundaries**: OpenStreetMap (via OSMNX).
- **Satellite Imagery**: Sentinel-2 (Level-2A SR) and Landsat 8 (Collection 2 Level 2).
- **Rainfall**: CHIRPS Pentad (UCSB-CHG/CHIRPS/PENTAD).
- **Green Cover**: NDVI derived from Sentinel-2/Landsat 8.
- **Built-up Area**: Dynamic World (GOOGLE/DYNAMICWORLD/V1) `built` probability, used instead of GHSL because GHSL is not annual.
- **Night Lights**: VIIRS (NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG).

### Assumptions:
- MNDWI threshold of 0 is used for water body delineation.
- Cloud cover threshold of 20% is used for satellite imagery filtering.
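- Annual median composites are used to represent each year.
- Landsat 8 (30 m) is used for 2015-2016 and Sentinel-2 (10 m) from 2017 onward, so area changes across 2016-2017 partly reflect the change of sensor and resolution.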
- Flooding occurrence in this demonstration is proxied by high rainfall and lake area reduction, but real historical flood records should be used for production models.