<strong>Project Title:</strong>
AI for Predicting Glacial Lake Outburst Floods

<strong>Project Statement:</strong>
Accelerating glacier melt caused by rising temperatures is increasing the size of glacial lakes behind unstable natural dams. This change raises the chances of sudden, high-volume outburst floods. These floods can release millions of cubic meters of water without warning, putting downstream villages, infrastructure, and ecosystems at risk. Traditional monitoring methods, which often depend on infrequent field surveys and low-resolution images, do not offer the multi-day advance notice required for timely evacuations and disaster response.

<strong>Project Description:</strong>
Each row of the dataset describes one lake and glacier observation: when and where it was recorded (date, latitude/longitude, elevation), the state of the lake (area, volume, dam type and geometry), the surrounding terrain (slope, relief, drainage basin), how the glacier is changing (area, retreat, ice velocity), and the climatic and seismic conditions (temperature anomaly, positive degree days (PDD), precipitation, seismic counts and magnitudes, distance to faults). These features are intended to support early warning by linking recent climate changes to melt-driven inflows, seismic activity to dam weakening, and terrain shape to outburst pathways. Past events and the time since the last glacial lake outburst flood (GLOF) provide historical context. In practice, lake metrics indicate the potential release volume, climate time series drive short-term risk, and seismic signals flag possible instability over the following week.
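The description does not say how the PDD feature was derived. As a minimal sketch, assuming the conventional definition (the sum of daily mean temperatures above a 0 °C melt threshold over a window), it could be computed as below. The function name `positive_degree_days`, the threshold parameter, and the sample temperatures are hypothetical and not taken from the dataset.

```python
import numpy as np


def positive_degree_days(daily_mean_temp_c, threshold_c=0.0):
    """Sum of daily mean temperatures above a melt threshold, in degree-days.

    Assumption: the conventional PDD definition with a 0 degC threshold;
    the dataset's own derivation of its PDD column is not documented here.
    """
    temps = np.asarray(daily_mean_temp_c, dtype=float)
    # Days at or below the threshold contribute nothing to melt.
    excess = np.clip(temps - threshold_c, 0.0, None)
    return float(excess.sum())


# Hypothetical week of daily mean air temperatures at a lake (degC).
week = [-3.0, -1.0, 0.5, 2.0, 4.5, 1.0, -0.5]
pdd = positive_degree_days(week)
print(f"PDD over the week: {pdd:.1f} degree-days")
```

A larger PDD over a recent window suggests more melt-driven inflow to the lake, which is the link between climate and short-term risk described above.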

Importing Libraries

In [2]:
import numpy as np
import pandas as pd
from datetime import datetime
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import joblib

Loading Data

In [14]:
df_raw = pd.read_csv('synthetic_glof_dataset.csv')

Displaying the First 5 Rows

In [4]:
print(df_raw.head())

   Event_ID  Year  ...  Glacier_to_Lake_Ratio  Elevation_Band
0         1  2018  ...             150.926836        Very_Low
1         2  2004  ...               8.232311        Very_Low
2         3  1997  ...               8.301270          Medium
3         4  2010  ...              13.199977            High
4         5  2008  ...              31.408399             Low

[5 rows x 32 columns]
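Because `Event_ID` looks like a unique key in the preview above, a duplicate check could be sketched as follows. This is an illustration under that assumption, not a step shown in the notebook: the helper `drop_duplicate_events` and the small `demo` frame are hypothetical, and in the notebook the function would be applied to `df_raw`.

```python
import pandas as pd


def drop_duplicate_events(df, key="Event_ID"):
    """Count and drop duplicate rows, then rows that reuse the same key.

    Assumption: `key` should identify each event uniquely.
    """
    exact = int(df.duplicated().sum())                  # fully identical rows
    key_dupes = int(df.duplicated(subset=[key]).sum())  # repeated key values
    cleaned = (
        df.drop_duplicates()
          .drop_duplicates(subset=[key], keep="first")
          .reset_index(drop=True)
    )
    return cleaned, exact, key_dupes


# Hypothetical frame standing in for df_raw.
demo = pd.DataFrame({
    "Event_ID": [1, 2, 2, 3, 3],
    "Year": [2018, 2004, 2004, 1997, 2010],
})
cleaned, exact, key_dupes = drop_duplicate_events(demo)
print(f"exact duplicates: {exact}, repeated keys: {key_dupes}, rows kept: {len(cleaned)}")
```

Keeping the first occurrence of a repeated key is a design choice; if repeated IDs carry conflicting values, they may deserve manual review instead.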


Inspecting Data Types, Summary Statistics, and Missing Values

In [15]:
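df_raw.info()                  # column dtypes and non-null counts
print(df_raw.describe())       # summary statistics for numeric columns
print(df_raw.isnull().sum())   # number of missing values per column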

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 32 columns):
 #   Column                           Non-Null Count  Dtype  
---  ------                           --------------  -----  
 0   Event_ID                         1000 non-null   int64  
 1   Year                             1000 non-null   int64  
 2   Month                            1000 non-null   int64  
 3   Day                              1000 non-null   int64  
 4   Latitude                         1000 non-null   float64
 5   Longitude                        1000 non-null   float64
 6   Elevation_m                      1000 non-null   float64
 7   Lake_Area_km2                    1000 non-null   float64
 8   Lake_Volume_MCM                  1000 non-null   float64
 9   Dam_Height_m                     1000 non-null   float64
 10  Dam_Width_m                      1000 non-null   float64
 11  Lake_Type                        1000 non-null   object 
 12  Temperature_Anomaly_C