<a href="https://colab.research.google.com/github/Amzilynn/Geoepidemiology-Profiling/blob/main/Geoepidemiology_Profiling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Environment-Based Disease Profiling

**Core Objective**
Discover universal environmental risk profiles from satellite data without using any disease information.


1.    **Step 1: Data Collection & Feature Extraction**

Input: Raw satellite observations (temperature, rainfall, water, vegetation, air quality, land use)

Process: Transform pixels into disease-relevant features:

*  Sea Surface Temperature (SST)

*  Chlorophyll concentration (water quality)

*  Flooded area percentage

*  Aerosol density (air pollution)

*  NDVI (vegetation health)

*  Urban heat intensity

Output: Environmental feature vectors per location and time

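As a rough illustration of Step 1, the sketch below collapses a tiny tile of pixels into one feature vector. It is a minimal, dependency-free sketch, not the project's actual pipeline: the tile layout, the key names (`nir`, `red`, `water_mask`, `lst_c`) and the sample values are all hypothetical, and a real workflow would more likely run the same arithmetic on raster arrays. NDVI uses the standard definition (NIR - Red) / (NIR + Red).

```python
from statistics import mean


def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red), 0.0 if both bands are 0."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def flooded_pct(water_mask):
    """Percentage of pixels flagged as water (1) in a flat 0/1 mask."""
    return 100.0 * sum(water_mask) / len(water_mask)


def feature_vector(tile):
    """Collapse one location/time tile into a dictionary of features."""
    pixel_ndvi = [ndvi(n, r) for n, r in zip(tile["nir"], tile["red"])]
    return {
        "ndvi_mean": mean(pixel_ndvi),
        "flooded_pct": flooded_pct(tile["water_mask"]),
        "lst_mean_c": mean(tile["lst_c"]),
    }


# Hypothetical 4-pixel tile (reflectances, a water mask, temperatures in °C).
tile = {
    "nir": [0.5, 0.6, 0.4, 0.1],
    "red": [0.1, 0.2, 0.1, 0.1],
    "water_mask": [0, 0, 0, 1],
    "lst_c": [30.0, 31.0, 29.0, 30.0],
}
features = feature_vector(tile)
```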
2.   **Step 2: Environmental Clustering (The Core Innovation)**

Method: Unsupervised clustering (K-Means/DBSCAN/HDBSCAN)

Key: Uses ONLY environmental features (no disease data!)

Result: Discovers natural groupings of environmental conditions:

*  Profile 1: Warm + Flooded + Nutrient-rich water

*  Profile 2: Hot + Dry + Low vegetation

*  Profile 3: Urban + Polluted air + Heat island

*  Profile 4: Standing water + Vegetation + Heat

*  Profile 5: Cold + Humid + Forested

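The clustering step can be sketched with a small, dependency-free K-Means. This is an illustrative sketch under stated assumptions: the six observations and their three features (temperature, flooded %, NDVI) are made up, the features are z-scored first so that no axis dominates by scale, and a deterministic farthest-first initialization is swapped in for the usual random or k-means++ seeding. In practice a library implementation of K-Means, DBSCAN, or HDBSCAN would likely be used instead.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column; constant columns are left at 0."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then keep adding the
    point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [mean(dim) for dim in zip(*members)]
    return labels, centroids


# Hypothetical observations: [temperature °C, flooded %, NDVI]. No disease data.
rows = [
    [29.0, 40.0, 0.45], [30.0, 45.0, 0.50],  # warm and flooded
    [38.0, 0.0, 0.10], [39.0, 1.0, 0.12],    # hot and dry
    [8.0, 5.0, 0.80], [9.0, 6.0, 0.78],      # cold and forested
]
labels, centroids = kmeans(standardize(rows), k=3)
```

Each distinct label is one candidate environmental profile; the centroid, read back in the original feature units, is what gets described as "Warm + Flooded" and so on.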

3.  **Step 3: Profile Interpretation**

Mapping: Each profile is linked to disease groups based on biological plausibility:

*  Warm+Flooded → Waterborne diseases

*  Standing water+Heat → Vector-borne diseases

*  Urban+Polluted → Respiratory diseases

*  Dry+Crop stress → Nutritional diseases

*  Deforestation+Wildlife → Zoonotic diseases

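One simple way to encode this mapping is a rule table keyed on the traits a profile must show. The sketch below assumes each cluster has already been summarized as a set of short trait tags; those tags and the `link_profile` helper are hypothetical, and only the five trait-to-disease pairings come from the list above.

```python
# Required traits -> disease group, taken from the mapping listed above.
RULES = {
    frozenset({"warm", "flooded"}): "Waterborne diseases",
    frozenset({"standing water", "heat"}): "Vector-borne diseases",
    frozenset({"urban", "polluted"}): "Respiratory diseases",
    frozenset({"dry", "crop stress"}): "Nutritional diseases",
    frozenset({"deforestation", "wildlife"}): "Zoonotic diseases",
}


def link_profile(traits):
    """Return every disease group whose required traits are all present."""
    present = {t.lower() for t in traits}
    return [group for required, group in RULES.items() if required <= present]


# Hypothetical trait tags for two discovered profiles.
waterborne = link_profile({"Warm", "Flooded", "Nutrient-rich water"})
unmatched = link_profile({"Cold", "Humid", "Forested"})
```

A profile that matches no rule, such as the cold and forested one here, returns an empty list, which flags it for manual interpretation rather than forcing a link.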









## Data Collection

1. **Define the 5 Environmental Axes**

These are the variables used to cluster locations into profiles:

*   Temperature (land/sea surface) → MODIS/Landsat
*   Rainfall/Precipitation → GPM/IMERG
*   Vegetation/NDVI → Sentinel-2 or MODIS
*   Water/Surface Water → Sentinel-1 (SAR) or Landsat
*   Air Quality / Aerosols / Pollution → Sentinel-5P










| Environmental Axis         | Example Variables                                     | Satellite / Data Source                                                                                    | Resolution / Frequency                                 | Processing Notes                                                                                                                              |
| -------------------------- | ----------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------- |
| **Climate**                | Temperature, Rainfall, Humidity                       | MODIS (LST for land surface temperature), GPM / IMERG (precipitation), ERA5 (climate reanalysis)           | MODIS: 1km daily, GPM IMERG: 0.1° half-hourly, ERA5: 0.25° hourly | Aggregate to your regions of interest; compute monthly or seasonal averages; normalize across regions                                         |
| **Water**                  | Flooding, Standing water, River levels, Soil moisture | Sentinel-1 SAR (flood detection), Landsat-8 (surface water), SMAP (soil moisture), Global Flood Map (NASA) | Sentinel-1: 10–30m, Landsat: 30m, SMAP: 9km daily      | Detect water presence vs. dry land; compute % area flooded; track seasonal changes                                                            |
| **Air**                    | Pollution, Aerosols, Dust, Smoke                      | Sentinel-5P (NO₂, SO₂, CO, aerosols), MODIS / VIIRS (fire detection, aerosol optical depth)                | 7km–10km daily                                         | Extract average air quality per region; identify spikes / anomalies; map smoke plumes or dust events                                          |
| **Land**                   | Vegetation, Land cover, Urbanization, Deforestation   | Sentinel-2 (NDVI, land cover), MODIS (vegetation indices), Copernicus Global Land Cover                    | 10–30m (Sentinel-2), 250–500m (MODIS)                  | Compute NDVI, land type fractions, urban density; detect seasonal changes                                                                     |
| **Human / Socio-economic** | Population density, Sanitation, Mobility              | WorldPop, LandScan, Facebook Data for Good, OpenStreetMap                                                  | 100m–1km                                               | Extract population density per region; combine with sanitation index / infrastructure data; optionally include mobility patterns if available |

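The processing notes for the Climate row (aggregate per region, compute monthly averages, normalize across regions) could look like the sketch below. It assumes the satellite values have already been reduced to one number per region and day; the record layout, the function names, and the sample temperatures are hypothetical, and z-scoring is just one reasonable choice of normalization.

```python
from collections import defaultdict
from statistics import mean, pstdev


def monthly_means(records):
    """records: (region, 'YYYY-MM-DD', value) -> {(region, 'YYYY-MM'): mean}."""
    buckets = defaultdict(list)
    for region, date, value in records:
        buckets[(region, date[:7])].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}


def zscore_across_regions(monthly):
    """For each month, z-score the regional means against each other."""
    by_month = defaultdict(dict)
    for (region, month), value in monthly.items():
        by_month[month][region] = value
    out = {}
    for month, per_region in by_month.items():
        values = list(per_region.values())
        mu = mean(values)
        sd = pstdev(values) or 1.0  # avoid dividing by zero
        for region, v in per_region.items():
            out[(region, month)] = (v - mu) / sd
    return out


# Hypothetical daily land surface temperatures (°C) for two regions.
records = [
    ("Bangladesh", "2023-07-01", 30.0),
    ("Bangladesh", "2023-07-02", 32.0),
    ("Sahel", "2023-07-01", 40.0),
    ("Sahel", "2023-07-02", 42.0),
]
monthly = monthly_means(records)
normalized = zscore_across_regions(monthly)
```

Normalizing within each month keeps seasonal swings from dominating; normalizing over the whole period instead would preserve them, so the choice depends on whether seasonality should count as part of a profile.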

2. **Choosing the Regions**

We pick regions that, taken together, span the five environmental axes:

| Region                           | Why It Fits                                             | Covered Axes / Features                                                                                                                                               |
| -------------------------------- | ------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Bangladesh (Ganges Delta)**    | Coastal, tropical, flood-prone, high population density | Climate: monsoon rains<br>Water: frequent flooding, rivers<br>Human: dense population, poor sanitation<br>Land: agriculture, low elevation<br>Air: moderate pollution |
| **Sahara / Sahel (West Africa)** | Dry, desert, dust storms                                | Climate: arid, high temperature<br>Air: dust storms<br>Land: sparse vegetation, soil exposure<br>Human: moderate population<br>Water: very limited                    |
| **Amazon Basin (Brazil/Peru)**   | Tropical rainforest, rivers                             | Climate: wet, high humidity<br>Water: rivers, floodplains<br>Land: dense forest<br>Human: low/moderate population<br>Air: mostly clean (natural aerosols)             |
| **Beijing / Northern China**     | Urban, air pollution hotspot                            | Climate: temperate<br>Air: high NO₂, PM2.5<br>Land: urbanization, built-up areas<br>Human: high population density<br>Water: limited rivers/flooding locally          |
| **California / Western US**      | Wildfires, mixed climate                                | Climate: Mediterranean<br>Air: wildfire smoke<br>Land: forests, urban-forest interface<br>Water: occasional drought<br>Human: high population, moderate sanitation    |

