<a href="https://colab.research.google.com/github/Amzilynn/Geoepidemiology-Profiling/blob/main/Geoepidemiology_Profiling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Environment-Based Disease Profiling

**Core Objective**
Discover universal environmental risk profiles from satellite data without using any disease information.


1.    **Step 1: Data Collection & Feature Extraction**

Input: Raw satellite observations (temperature, rainfall, water, vegetation, air quality, land use)

Process: Transform pixels into disease-relevant features:

*  Sea Surface Temperature (SST)

*  Chlorophyll concentration (water quality)

*  Flooded area percentage

*  Aerosol density (air pollution)

*  NDVI (vegetation health)

*  Urban heat intensity

Output: Environmental feature vectors per location and time
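
As a minimal sketch of the pixel-to-feature step, the snippet below collapses the pixels of one location and week into a small feature dictionary. The NDVI formula, (NIR − Red) / (NIR + Red), is standard; the function names, the 0/1 water mask, and the reflectance values are illustrative assumptions, not part of the original pipeline.

```python
from statistics import mean


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def flooded_area_pct(water_mask):
    """Percentage of pixels flagged as water (1) in a flat 0/1 mask."""
    return 100.0 * sum(water_mask) / len(water_mask)


def feature_vector(nir_pixels, red_pixels, water_mask):
    """Summarise the pixels of one location/week as a feature dict."""
    return {
        "ndvi_mean": mean(ndvi(n, r) for n, r in zip(nir_pixels, red_pixels)),
        "flooded_pct": flooded_area_pct(water_mask),
    }


# Toy reflectance values for four pixels (illustrative, not real data)
features = feature_vector(
    nir_pixels=[0.5, 0.6, 0.4, 0.5],
    red_pixels=[0.1, 0.2, 0.4, 0.5],
    water_mask=[0, 1, 1, 0],
)
print(features)
```

The other features in the list (SST, chlorophyll, aerosol density, urban heat intensity) would be added as further keys in the same dictionary.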

2.   **Step 2: Environmental Clustering (The Core Innovation)**

Method: Unsupervised clustering (K-Means/DBSCAN/HDBSCAN)

Key: Uses ONLY environmental features (no disease data!)

Result: Discovers natural groupings of environmental conditions:

*  Profile 1: Warm + Flooded + Nutrient-rich water

*  Profile 2: Hot + Dry + Low vegetation

*  Profile 3: Urban + Polluted air + Heat island

*  Profile 4: Standing water + Vegetation + Heat

*  Profile 5: Cold + Humid + Forested
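
To make the clustering idea concrete, here is a small, dependency-free K-Means sketch run on made-up feature rows (temperature in °C, flooded area in %, NDVI). It is an illustration only: the sample values are invented, the farthest-first seeding is a choice made here to keep the result deterministic, and in practice a library implementation such as `sklearn.cluster.KMeans` would likely be used instead.

```python
import math


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def farthest_first(points, k):
    """Deterministic seeding: take the first point, then repeatedly add
    the point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Illustrative samples: [temperature_c, flooded_pct, ndvi]; no disease data involved
samples = [
    [30, 60, 0.50], [31, 55, 0.55],  # warm and flooded
    [40, 0, 0.10], [41, 1, 0.12],    # hot, dry, low vegetation
    [8, 5, 0.80], [9, 4, 0.85],      # cold and forested
]
labels, centroids = kmeans(standardize(samples), k=3)
print(labels)
```

Standardizing first matters because the raw features live on very different scales (degrees, percentages, and an index between -1 and 1).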


3.  **Step 3: Profile Interpretation Mapping**

Each profile gets linked to disease groups based on biological plausibility:

*  Warm+Flooded → Waterborne diseases

*  Standing water+Heat → Vector-borne diseases

*  Urban+Polluted → Respiratory diseases

*  Dry+Crop stress → Nutritional diseases

*  Deforestation+Wildlife → Zoonotic diseases
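
The mapping above can be held in a plain lookup table. This is a sketch only: the profile keys are hypothetical names chosen here, and in a real run each cluster would first be given one of these names after inspecting its centroid.

```python
# Profile name -> disease family, taken from the list above.
# The key names are illustrative; they are not produced by the clustering itself.
PROFILE_TO_DISEASE_FAMILY = {
    "warm_flooded": "waterborne",
    "standing_water_heat": "vector-borne",
    "urban_polluted": "respiratory",
    "dry_crop_stress": "nutritional",
    "deforestation_wildlife": "zoonotic",
}


def disease_family(profile_name):
    """Return the disease family linked to a profile, or 'unmapped'."""
    return PROFILE_TO_DISEASE_FAMILY.get(profile_name, "unmapped")


print(disease_family("warm_flooded"))
print(disease_family("cold_humid_forested"))
```

Returning `"unmapped"` rather than raising keeps profiles without a plausible disease link visible instead of silently dropping them.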










Regions (Bangladesh, Sahel, Amazon, Beijing, California)
         
         ↓ (sample multiple points, weekly)

Environmental Samples (Climate, Water, Air, Land, Human)

         ↓ (cluster analysis)

Profiles (5 clusters of similar environmental conditions)

         ↓ (map historical disease data)

Disease Families Assigned to Each Profile

         ↓ (map to locations for visualization & early warning)

Predictions (future environmental conditions → profile → risk)


## Data Collection

1. **Define the 5 Environmental Axes**


We define five satellite-observable environmental axes (climate, water, air, land, human exposure).









| **Environmental Axis**     | **Example Variables**                                 | **Satellite / Data Source**                                                                         | **Resolution / Frequency**                         | **Why This Matters (Evidence)**                                                                                                                                                              |
| -------------------------- | ----------------------------------------------------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Climate**                | Temperature, Rainfall, Humidity                       | MODIS LST (land surface temp), GPM / IMERG (precipitation), ERA5 (climate reanalysis)               | MODIS: ~1km daily; GPM: ~10km half-hourly; ERA5: ~0.25° | Climate variables such as rainfall, temperature, and humidity influence disease incidence (e.g., cholera, vector-borne diseases) and are used in health models. ([NASA Global Precipitation Measurement][1]) |
| **Water**                  | Flooding, Standing water, Soil moisture, River levels | Sentinel-1 SAR (flood detection), Landsat (water surfaces), SMAP (soil moisture), Global Flood Maps | Sentinel-1: 10–30m; Landsat: 30m; SMAP: ~9km       | Water conditions and flooding are linked to waterborne disease risk (e.g., cholera, diarrheal diseases). ([disasters.nasa.gov][2])                                                           |
| **Air**                    | Pollution, Aerosols, Dust, Smoke                      | Sentinel-5P (NO₂, SO₂, CO, aerosols), MODIS/VIIRS (aerosol optical depth, fire detection)           | ~7–10km daily                                      | Air pollution and dust influence respiratory illnesses. NASA and WHO studies link satellite air data with health impacts. ([disasters.nasa.gov][3])                                          |
| **Land**                   | Vegetation, Land cover, Urbanization                  | Sentinel-2 (NDVI, land cover), MODIS (vegetation indices), Copernicus Land Cover                    | Sentinel-2: 10–30m; MODIS: 250–500m                | Vegetation and land cover affect vector habitats and environmental context for disease. Satellite land indices are used in disease ecology. ([NASA Science][4])                              |
| **Human / Socio-economic** | Population density, Sanitation proxies, Urban extent  | WorldPop, LandScan, OpenStreetMap, VIIRS night lights                                               | ~100m–1km                                          | Human exposure, density, and infrastructure shape disease risk; these data are used as socioeconomic risk predictors in health models. ([Number Analytics][5])                               |

[1]: https://gpm.nasa.gov/applications/health "Using GPM Data for Development and Public Health | NASA Global Precipitation Measurement Mission"
[2]: https://disasters.nasa.gov/get-involved/training/english/arset-application-earth-observations-assessing-waterborne-disease "ARSET - The Application of Earth Observations for Assessing Waterborne Disease Risk | NASA Applied Sciences"
[3]: https://disasters.nasa.gov/sites/default/files/2019-09/Health_Air_Quality_2017_Annual_Summary.pdf "Health & Air Quality: 2017 Annual Summary"
[4]: https://science.nasa.gov/earth/earth-observatory/tracking-disease-by-satellite/ "Of Mosquitoes and Models: Tracking Disease by Satellite - NASA Science"
[5]: https://www.numberanalytics.com/blog/uncovering-environmental-risk-factors "Uncovering Environmental Risk Factors"


2. **Choosing the Locations**

Each region is chosen so that, within it, we can observe variation along several of the environmental axes.

| Region     | Why Chosen                       | Axes Covered                              |
| ---------- | -------------------------------- | ----------------------------------------- |
| Bangladesh | Floods, rivers, dense population | Climate, Water, Human, Land, Air          |
| Sahel      | Arid, dust, desert → savanna     | Climate, Air, Land, Human, Water (sparse) |
| Amazon     | Rainforest, rivers               | Climate, Water, Land, Human               |
| Beijing    | Urban, pollution hotspot         | Climate, Air, Land, Human                 |
| California | Wildfires, mixed climate         | Climate, Air, Land, Water, Human          |




**Specific areas within each region**

| **Region**                       | **Why It Makes Sense for Profiling (Axes Covered)**                                | **Evidence / Reasoning**                                                                                                                                                                            |
| -------------------------------- | ---------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Bangladesh (Ganges Delta)**    | Hot, humid, frequent flooding, dense population → climate, water, human, land, air | Heavy rainfall & flooding shape waterborne disease patterns such as cholera. Satellites have been used to assess environmental risk here. ([PMC][1])                                                |
| **Sahara / Sahel (West Africa)** | Arid, dust events, temperature extremes → climate, air, land, human                | Dust and climate variables show strong seasonal links to meningitis and other respiratory stress conditions. ([NASA Science][2])                                                                    |
| **Amazon Basin (Brazil/Peru)**   | Wet tropical, dense vegetation, standing water → climate, water, land, human       | Tropical forests with rainfall gradients are classic vector habitats; NASA uses similar data to map mosquito risk zones. ([NASA Global Precipitation Measurement][3])                               |
| **Beijing / Northern China**     | Urban pollution hotspot → climate, air, land, human                                | Air quality extremes and urbanization correlate with respiratory disease burdens; satellite NO₂/PM data are widely used in air-health studies. ([World Health Organization][4])             |
| **California / Western US**      | Mixed fire, drought, wildfire smoke → climate, air, land, water                    | Wildfire smoke and drought conditions are linked with respiratory issues; satellite aerosol data are used for forecasting smoke plumes and health impacts. ([World Health Organization][4]) |

[1]: https://pmc.ncbi.nlm.nih.gov/articles/PMC12699975/ "Exploring Climate Links and Clinical Association of the Diarrheal Disease Using Data From an Upsurge in Dhaka, Bangladesh: A Cross‐Sectional Study - PMC"
[2]: https://science.nasa.gov/humans-in-space/why-go-to-space/benefits-back-on-earth/climate-conditions-help-forecast-meningitis-outbreaks/ "Climate conditions help forecast meningitis outbreaks - NASA Science"
[3]: https://gpm.nasa.gov/applications/health "Using GPM Data for Development and Public Health | NASA Global Precipitation Measurement Mission"
[4]: https://www.who.int/news/item/11-11-2017-worldwide-health-risks-related-to-climate-change-are-on-the-rise "Worldwide health risks related to climate change are on the rise"
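
One way to turn the regions above into sampling locations is to draw random points inside a bounding box per region. The sketch below assumes rough, hand-picked boxes in (lon_min, lat_min, lon_max, lat_max) order; these coordinates are illustrative approximations, not surveyed boundaries, and the helper name is hypothetical.

```python
import random

# Rough bounding boxes: (lon_min, lat_min, lon_max, lat_max).
# Approximate values chosen for illustration only.
REGION_BBOXES = {
    "Bangladesh": (88.0, 21.5, 92.0, 25.0),
    "Sahel": (-15.0, 12.0, 15.0, 18.0),
    "Amazon": (-70.0, -10.0, -55.0, 0.0),
    "Beijing": (115.5, 39.4, 117.5, 41.0),
    "California": (-124.0, 32.5, -114.5, 42.0),
}


def sample_points(bbox, n, seed=0):
    """Draw n (lon, lat) points uniformly inside a bounding box."""
    lon_min, lat_min, lon_max, lat_max = bbox
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [
        (rng.uniform(lon_min, lon_max), rng.uniform(lat_min, lat_max))
        for _ in range(n)
    ]


points_by_region = {
    name: sample_points(bbox, n=10) for name, bbox in REGION_BBOXES.items()
}
print(points_by_region["Bangladesh"][:2])
```

A box-based sample can land on open sea or outside the area of interest, so a real run would probably filter the points against a land or administrative mask.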


**We will collect two years of weekly data for all of the features that define the different environmental axes in each region, i.e. ~104 weeks per region.**
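
The ~104 weekly windows can be generated with the standard library. This is a minimal sketch; the start date is an arbitrary assumption, and each window is a (start, end) pair with an exclusive end date.

```python
from datetime import date, timedelta


def weekly_windows(start, n_weeks=104):
    """Return consecutive 7-day (start, end) windows; end is exclusive."""
    return [
        (start + timedelta(weeks=i), start + timedelta(weeks=i + 1))
        for i in range(n_weeks)
    ]


# Hypothetical start date; any Monday would work the same way
windows = weekly_windows(date(2022, 1, 3))
print(len(windows), windows[0], windows[-1])
```

Using exclusive end dates means each window's end is the next window's start, so no day is counted twice.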

**We will use Google Earth Engine (GEE): https://developers.google.com/earth-engine/datasets?hl=fr to extract the data.**

In [1]:
!pip install earthengine-api




In [2]:
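import ee

# Start the OAuth flow (prints a link or opens a browser the first time it runs)
ee.Authenticate()
# Connect to Earth Engine; recent earthengine-api versions may also need project="<your-cloud-project-id>"
ee.Initialize()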
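
Once the session is initialized, a single weekly feature could be pulled roughly as follows. This is a sketch under assumptions: it uses the MODIS daily land surface temperature collection (`MODIS/061/MOD11A1`, band `LST_Day_1km`, scale factor 0.02 K), while the function names, the 5 km buffer, and the choice of this dataset are illustrative and not taken from the notebook.

```python
def lst_to_celsius(raw_value):
    """MOD11A1 stores LST as scaled integers: Kelvin = raw * 0.02."""
    return raw_value * 0.02 - 273.15


def weekly_mean_lst(lon, lat, start, end, radius_m=5000):
    """Mean daytime land surface temperature (°C) around a point.

    start/end are 'YYYY-MM-DD' strings; end is exclusive.
    Requires ee.Authenticate() and ee.Initialize() to have been run.
    """
    import ee  # imported lazily so lst_to_celsius works without Earth Engine

    area = ee.Geometry.Point([lon, lat]).buffer(radius_m)
    image = (
        ee.ImageCollection("MODIS/061/MOD11A1")
        .filterDate(start, end)
        .select("LST_Day_1km")
        .mean()  # average the daily images in the window
    )
    stats = image.reduceRegion(
        reducer=ee.Reducer.mean(), geometry=area, scale=1000
    )
    raw = stats.get("LST_Day_1km").getInfo()
    # No valid pixels (e.g. persistent cloud) yields None
    return None if raw is None else lst_to_celsius(raw)


# Example call (needs an authenticated session; coordinates are near Dhaka):
# weekly_mean_lst(90.4, 23.8, "2023-07-03", "2023-07-10")
```

The same pattern (filter by date, select a band, reduce over a geometry) should carry over to the other axes, with only the collection ID, band name, and scale changing.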
