<img src="https://i-guide.io/wp-content/themes/iguide-v2/assets/images/logo-color.png"
     alt="I-GUIDE Logo" width="200" align="right">

# **Flood and Drought Risk Analysis Perception (RAP) Framework for India**


---

## **Project Team**

**Team Leader**  
**Jennifer Marlon, Ph.D.**  
School of the Environment, Yale University, New Haven, CT, USA

**Team Members** *(arranged in alphabetical order by last name)*  
**Okikiola Michael Alegbeleye**  
School of the Environment, Washington State University, Pullman, WA, USA

**Deepika Pingali**  
Department of Agricultural and Consumer Economics, University of Illinois Urbana–Champaign, Champaign, IL, USA

**Emine Senkardesler**  
Department of Informatics (Spatial Informatics program), University of Illinois Urbana–Champaign, Champaign, IL, USA

**Pratyush Tripathy**  
Department of Geography, University of California, Santa Barbara, CA, USA

**Surabhi Upadhyay**  
Department of Hydrologic Science and Engineering, Colorado School of Mines, Golden, CO, USA

---

## **Context**
Developed for the [_I-GUIDE Summer School on Spatial AI for Extreme Events and Disaster Resilience_](https://i-guide.io/summer-school/summer-school-2025/) (August 4–8, 2025), this notebook serves as the entry point to the RAP project documentation and provides links to the supporting analysis notebooks.

---

## **Overview**
The Risk Analysis Perception (RAP) Framework assesses perceptions of flood and drought risk across India using survey outcomes combined with geospatial and media features. The documentation consolidates methodology, datasets, modeling choices, and visualization outputs. The linked notebooks enable transparent navigation through data processing, model development, and results.

**Objectives**
- Quantify spatial variation in perceived flood and drought risk across Indian districts  
- Evaluate how individual demographics and district-level environmental and media signals relate to perception  
- Compare baseline and enhanced models that integrate remote sensing and learned embeddings

**Study Area and Unit of Analysis**
- National coverage for India  
- Individual survey responses linked to district identifiers for spatial aggregation and modeling
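
The linkage between responses and districts can be sketched as a simple left join that also reports responses whose district code has no match. This is a minimal illustration, not the project's implementation: the field names (`district_code`, `respondent_id`, `viirs_radiance`) and the helper `link_responses_to_districts` are hypothetical.

```python
def link_responses_to_districts(responses, district_features, key="district_code"):
    """Attach district-level features to each survey response.

    responses: list of dicts, one per respondent (hypothetical schema).
    district_features: dict mapping district code -> dict of features.
    Returns (linked, unmatched) so that failed joins can be inspected.
    """
    linked, unmatched = [], []
    for row in responses:
        features = district_features.get(row[key])
        if features is None:
            unmatched.append(row)  # keep for QA instead of dropping silently
        else:
            linked.append({**row, **features})
    return linked, unmatched


# Toy data, for illustration only.
responses = [
    {"respondent_id": 1, "district_code": "D01", "age": 34},
    {"respondent_id": 2, "district_code": "D99", "age": 51},
]
district_features = {"D01": {"viirs_radiance": 12.5}}

linked, unmatched = link_responses_to_districts(responses, district_features)
```

Returning the unmatched rows separately makes it easy to check how many responses are lost to inconsistent district codes before modeling.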

---

## **Data and Variables**

### **Dependent Variable**
- **Survey-based perception outcomes**  
  - Example outcomes: perceived flood risk, perceived drought risk  
  - Modeled at the individual level with district-level context; aggregated by district for spatial analysis

### **Covariates**

**Individual-level**
- Age  
- Gender  
- Education  
- Caste

**District-level**
- **News media**: web-scraped counts of flood-related mentions  
- **Nighttime lights**: VIIRS radiance as a proxy for human activity and development  
- **Learned representations**: Alpha Earth satellite embeddings to capture latent environmental and built-environment signals  
- **Climate reanalysis (ERA5)**  
  - Monthly precipitation  
  - Monthly temperature

*Notes to editors*: finalize variable definitions, temporal windows, and any transformations or standardization applied before modeling.

---

## **Modeling Scenarios**

| Model  | Demographic Covariates<br>(Age, Gender,<br>Education, Caste) | District-Level Features<br>(VIIRS Radiance, ERA5 Precipitation<br>& Temperature, News Mentions) | Alpha Earth Embeddings |
|:------:|:------------------------------------------------------------:|:------------------------------------------------------------------------------------------------:|:----------------------:|
| **1**  | ✅                                                           | ❌                                                                                               | ❌                     |
| **2**  | ✅                                                           | ✅                                                                                               | ❌                     |
| **3**  | ✅                                                           | ❌                                                                                               | ✅                     |
| **4**  | ✅                                                           | ✅                                                                                               | ✅                     |
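
The four scenarios in the table can be expressed as feature-set selections. The sketch below is illustrative only: all column names are hypothetical placeholders, and the embedding width of 64 is an assumption rather than a value stated in this documentation.

```python
# Hypothetical column names; the real names live in the project notebooks.
DEMOGRAPHIC = ["age", "gender", "education", "caste"]
DISTRICT = ["viirs_radiance", "era5_precipitation", "era5_temperature", "news_mentions"]

N_EMBEDDING_DIMS = 64  # assumed embedding width
EMBEDDING = [f"embedding_{i:02d}" for i in range(N_EMBEDDING_DIMS)]

# Which optional feature groups each model uses (demographics are always included).
SCENARIOS = {
    1: {"district": False, "embeddings": False},
    2: {"district": True, "embeddings": False},
    3: {"district": False, "embeddings": True},
    4: {"district": True, "embeddings": True},
}


def scenario_features(model_id):
    """Return the list of feature columns for one modeling scenario."""
    flags = SCENARIOS[model_id]
    columns = list(DEMOGRAPHIC)
    if flags["district"]:
        columns += DISTRICT
    if flags["embeddings"]:
        columns += EMBEDDING
    return columns
```

Keeping the scenario definitions in one dictionary means every model is trained on an explicitly named feature set, which makes the comparison between Models 1 to 4 easier to audit.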


*Evaluation*: We use 5-fold cross-validation during model training and an 80/20 train-test split for validation after training. To build the split, we overlay a grid of 250 km × 250 km cells on the country and randomly select 20% of the cells; districts in the selected cells form the validation set. Holding out whole grid cells rather than individual districts accounts for spatial autocorrelation, which could otherwise leak information between neighboring training and validation districts.
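
A grid-based hold-out of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the notebooks: it assumes each district is assigned to a cell by its centroid, that centroids are given in a projected coordinate system in metres, and the function names and seed are hypothetical.

```python
import random

CELL_SIZE_M = 250_000  # 250 km grid cells


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def spatial_block_split(centroids, holdout_fraction=0.2, seed=42):
    """Split district ids into train/validation sets by holding out whole cells.

    centroids: dict mapping district id -> (x, y) in metres (assumed projected CRS).
    """
    cells = {}
    for district_id, (x, y) in centroids.items():
        cells.setdefault(grid_cell(x, y), []).append(district_id)

    cell_ids = sorted(cells)  # sort so the result depends only on the seed
    rng = random.Random(seed)
    n_holdout = max(1, round(holdout_fraction * len(cell_ids)))
    holdout_cells = set(rng.sample(cell_ids, n_holdout))

    train, valid = [], []
    for cell_id in cell_ids:
        target = valid if cell_id in holdout_cells else train
        target.extend(cells[cell_id])
    return sorted(train), sorted(valid)


# Toy example: 30 districts spread over 10 grid cells, 3 districts per cell.
demo = {f"d{i}": ((i % 10) * 250_000 + 1_000, (i // 10) * 1_000) for i in range(30)}
train_ids, valid_ids = spatial_block_split(demo)
```

Note that selecting 20% of the cells does not guarantee that exactly 20% of districts or respondents land in the validation set, since cells contain different numbers of districts.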

---

## **Workflow and Linked Notebooks**

- **Data Processing**  
  [01_Data.ipynb](./01_Data.ipynb)  
  Steps: data ingestion, cleaning, joins between survey and district features, feature engineering, QA checks

- **Model Tuning and Training**  
  [02_Model.ipynb](./02_Model.ipynb)  
  Steps: scenario definitions, hyperparameter search, cross-validation strategy, diagnostics

- **Visualization and Results**  
  [03_Plots.ipynb](./03_Plots.ipynb)  
  Steps: maps and plots of outcomes and predictors, model comparison charts, error analysis

---

## **Methods Summary**
- **Preprocessing**: missing data handling, categorical encoding, scaling as needed  
- **Spatial linkage**: survey responses associated with district codes; district-level covariates aligned by consistent spatial boundaries and time windows  
- **Modeling**: compare generalized linear or tree-based models with and without embeddings; record seeds and configs for reproducibility  
- **Validation**: report effect sizes or feature importances with uncertainty; inspect residuals for spatial autocorrelation
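
One common way to inspect residuals for spatial autocorrelation is Moran's I. The documentation does not state which statistic the project uses, so the sketch below is only an example; the binary adjacency weights and the toy residuals are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Compute Moran's I for a list of values and a square weight matrix.

    values: list of n numbers (for example, district-level mean residuals).
    weights: n x n list of lists; weights[i][j] > 0 when i and j are neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    den = sum(d * d for d in dev)
    if w_sum == 0 or den == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * (num / den)


# Four districts in a row; each is a neighbor of the next one (toy adjacency).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain)    # similar residuals sit together
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # neighbors always differ
```

Positive values indicate that neighboring districts have similar residuals, which would suggest the model is missing spatially structured signal; values near zero indicate little spatial pattern.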

---

## **Reproducibility**
- Environment file: `environment.yml` with pinned versions  
- Random seeds stored in notebook parameters or config files  
- Outputs written to versioned folders: `outputs/{date}/...`  
- Optional: parameterize runs with Papermill for batch execution
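
A run-setup helper along these lines could cover the seed and output-folder conventions above. It is a sketch using only the standard library; the function name `start_run`, the file name `config.json`, and the default seed are hypothetical choices, not part of the project.

```python
import json
import random
from datetime import date
from pathlib import Path


def start_run(base_dir="outputs", seed=42, run_date=None):
    """Create a dated output folder, seed Python's RNG, and record the config."""
    run_date = run_date or date.today()
    run_dir = Path(base_dir) / run_date.isoformat()  # e.g. outputs/2025-08-04
    run_dir.mkdir(parents=True, exist_ok=True)

    random.seed(seed)  # seeds only the standard-library generator

    config = {"seed": seed, "run_date": run_date.isoformat()}
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    return run_dir


# Usage (writes to ./outputs/<today>/config.json):
# run_dir = start_run(seed=42)
```

Libraries such as NumPy or scikit-learn keep their own random state, so their seeds would need to be set separately and recorded in the same config file.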

---

## **Ethics and Limitations**
- Survey privacy and de-identification  
- Potential media bias and unequal reporting across districts  
- Spatial and temporal mismatches between survey dates and covariate periods  
- Embeddings are powerful but may encode confounders; interpretability checks recommended

---

## **Citation**
If you use or adapt this framework, please cite:  
> Marlon, J., Alegbeleye, O. M., Pingali, D., Senkardesler, E., Tripathy, P., and Upadhyay, S. (2025). *Flood and Drought Risk Analysis Perception (RAP) Framework for India*.

---

## **Contact**
For questions about the project or data access, please contact the Team Leader or repository maintainer.
