# Quick start

In [1]:
# Might also need to install
# %pip install jupyter_black
# %pip install "nbformat>=4.2.0"

%load_ext jupyter_black
%load_ext autoreload
%autoreload 2

In [None]:
import pandas as pd
import ocha_stratus as stratus  # need v 0.1.5
from dotenv import load_dotenv
import json
import plotly.express as px
from sqlalchemy import text

from utils.data_utils import get_current_quantiles

load_dotenv()

ADM_LEVEL = 1
engine = stratus.get_engine("dev")

## Looking at the data

The app displays near-real-time and historical flood exposure data per admin boundary in selected countries across Africa. The data is updated daily in the `app.floodscan_exposure` table on Postgres, which is the source to use when developing a new visualization. The table has the following columns:

- `iso3`: Country ISO3 code
- `adm_level`: Admin level (useful for filtering, since we only display one admin level at once)
- `valid_date`: The date that the flood exposure value applies to
- `pcode`: The pcode of the admin unit. Use this as a join field to the geospatial data for any mapping
- `sum`: The sum of flood-exposed people in that admin unit for that day. To smooth out noise, the app displays a rolling sum over N days. N defaults to 7 and can be overridden with the `ROLL_WINDOW` environment variable.
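
The rolling sum described for the `sum` column can be sketched as follows. This is a minimal illustration and not the app's actual implementation: the helper `add_rolling_sum`, the `rolling_sum` column name, and the way `ROLL_WINDOW` is parsed are all assumptions, and the data is synthetic. It also assumes one row per admin unit per day with no gaps, so that a window of N rows corresponds to N days.

```python
import os

import pandas as pd

# Assumption: ROLL_WINDOW holds an integer number of days and defaults to 7.
ROLL_WINDOW = int(os.getenv("ROLL_WINDOW", "7"))


def add_rolling_sum(df: pd.DataFrame, window: int = ROLL_WINDOW) -> pd.DataFrame:
    """Add a per-pcode rolling sum of `sum` over the last `window` daily rows."""
    out = df.sort_values(["pcode", "valid_date"]).copy()
    # min_periods=1 keeps the first rows of each series instead of returning NaN
    out["rolling_sum"] = out.groupby("pcode")["sum"].transform(
        lambda s: s.rolling(window, min_periods=1).sum()
    )
    return out


# Synthetic stand-in for a slice of app.floodscan_exposure
example = pd.DataFrame(
    {
        "pcode": ["SO11"] * 4 + ["SO12"] * 4,
        "valid_date": list(pd.date_range("2016-04-13", periods=4)) * 2,
        "sum": [0.0, 10.0, 20.0, 30.0, 5.0, 5.0, 5.0, 5.0],
    }
)
rolled = add_rolling_sum(example, window=3)
print(rolled)
```

Grouping by `pcode` before rolling keeps one admin unit's values from leaking into the window of the next.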

In [None]:
with engine.connect() as conn:
    df = pd.read_sql(
        """
        SELECT * FROM app.floodscan_exposure 
        LIMIT 100
        """,
        con=conn,
    )
df

Unnamed: 0,iso3,adm_level,valid_date,pcode,sum
0,SOM,1,2016-04-13,SO23,0.0
1,SOM,1,2016-04-13,SO24,8068.0
2,SOM,1,2016-04-13,SO25,0.0
3,SOM,1,2016-04-13,SO26,481.0
4,SOM,1,2016-04-13,SO27,1123.0
...,...,...,...,...,...
95,SOM,1,2016-04-18,SO28,20.0
96,SOM,1,2016-04-19,SO11,0.0
97,SOM,1,2016-04-19,SO12,12496.0
98,SOM,1,2016-04-19,SO13,536.0


## Reproducing current visualization

In [None]:
# These boundaries are preprocessed from the CODAB shapefiles: simplified and converted to
# GeoJSON so that they're more suitable for web visualization.
with open(f"assets/geo/adm{ADM_LEVEL}.json", "r") as file:
    data = json.load(file)

# Gets the latest quantile assignment for the most recent flood exposure values (factoring in the rolling window).
# This is updated in a separate pipeline here:
# https://github.com/OCHA-DAP/ds-floodexposure-monitoring/blob/main/pipelines/update_exposure_quantile.py
df_quantile = get_current_quantiles(ADM_LEVEL)

# This plotting code differs from what's in the app, but the difference doesn't matter for this example
fig = px.choropleth_map(
    df_quantile,
    geojson=data,
    locations="pcode",
    color="quantile",
    featureidkey="properties.pcode",
    color_continuous_scale=[
        "#fafafa",
        "#e0e0e0",
        "#b8b8b8",
        "#f7a29c",
        "#da5a51",
    ],
    zoom=2,
)
fig
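
Because the map joins the dataframe to the boundaries on `pcode` (via `featureidkey="properties.pcode"`), any pcode present on only one side silently disappears from the map or is left uncolored. A quick check along these lines may help when adapting the plot. The helper `unmatched_pcodes` is hypothetical and the data below is synthetic; in the notebook you would pass `df_quantile` and `data` instead.

```python
import pandas as pd


def unmatched_pcodes(df: pd.DataFrame, geojson: dict) -> tuple[set, set]:
    """Return (pcodes only in df, pcodes only in the GeoJSON features)."""
    geo_pcodes = {f["properties"]["pcode"] for f in geojson["features"]}
    df_pcodes = set(df["pcode"])
    return df_pcodes - geo_pcodes, geo_pcodes - df_pcodes


# Synthetic stand-ins for `data` and `df_quantile`
example_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"pcode": "SO11"}, "geometry": None},
        {"type": "Feature", "properties": {"pcode": "SO12"}, "geometry": None},
    ],
}
example_df = pd.DataFrame({"pcode": ["SO12", "SO13"], "quantile": [1, 3]})

missing_from_map, missing_from_data = unmatched_pcodes(example_df, example_geojson)
print("In data but not in boundaries:", missing_from_map)
print("In boundaries but not in data:", missing_from_data)
```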

## Some other examples

In [None]:
# Let's say we try plotting just the latest flood exposure value per admin region
with engine.connect() as conn:
    df_latest = pd.read_sql(
        text(
            """
        SELECT * FROM app.floodscan_exposure 
        WHERE valid_date = (SELECT MAX(valid_date) FROM app.floodscan_exposure)
        AND adm_level = :adm_level
        """
        ),
        con=conn,
        params={"adm_level": ADM_LEVEL},
    )

This isn't a great solution, for several reasons. First, the distribution of values is heavily skewed: most values are close to 0, but some reach upwards of 150k people, so a few outliers dominate the color scale. Second, it biases the visualization towards admin regions that contain more people simply because they cover more area, not because they are more densely populated. Third, it is less useful operationally, because in many places a certain degree of flooding is normal and expected. This approach doesn't distinguish places that are flooded to a normal level from places that are flooded significantly above normal.

In [18]:
fig = px.choropleth_map(
    df_latest,
    geojson=data,
    locations="pcode",
    color="sum",
    featureidkey="properties.pcode",
    zoom=2,
)
fig
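
One way to separate "normal" from "above normal" flooding is to rank each admin unit's latest value against that unit's own history rather than against other units. The sketch below illustrates the idea on synthetic data. It is not how `get_current_quantiles` or the linked pipeline works; the helper `latest_percentile_rank` and the `pct_rank` column are hypothetical names introduced here.

```python
import pandas as pd


def latest_percentile_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Rank each pcode's latest `sum` against that pcode's own history.

    `pct_rank` is the share of the pcode's values that are less than or
    equal to the value on that row; only the latest row per pcode is returned.
    """
    out = df.sort_values(["pcode", "valid_date"]).copy()
    out["pct_rank"] = out.groupby("pcode")["sum"].rank(pct=True, method="max")
    return out.groupby("pcode").tail(1)[["pcode", "valid_date", "sum", "pct_rank"]]


# Synthetic data: unit A always has high exposure, unit B is usually at zero
dates = list(pd.date_range("2016-04-13", periods=4))
history = pd.DataFrame(
    {
        "pcode": ["A"] * 4 + ["B"] * 4,
        "valid_date": dates * 2,
        "sum": [1000.0, 1100.0, 900.0, 1050.0, 0.0, 0.0, 0.0, 50.0],
    }
)
result = latest_percentile_rank(history)
print(result)
```

In this toy example unit A has a far larger latest value than unit B, yet B's value is the highest in its own history while A's is not, which is the distinction the raw `sum` map cannot show.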