In [1]:
from datetime import datetime

import dask.dataframe as dd
import geopandas
import pandas as pd
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    Legend,
    Range1d,
    Span,
    TabPanel,
    Tabs,
    Text,
    Title,
)
from bokeh.plotting import figure, output_notebook, show
from sklearn.preprocessing import StandardScaler

In [2]:
from dask.distributed import Client

client = Client()

# Estimating Activity based on Mobility Data

Less movement typically means less economic activity. Understanding where and when population movement occurs can help inform disaster response and public policy, especially during crises. 

Similar to [COVID-19 Community Mobility Reports](https://www.google.com/covid19/mobility/), [Facebook Population During Crisis](https://dataforgood.facebook.com/dfg/tools/facebook-population-maps) and [Mapbox Movement Data](https://www.mapbox.com/movement-data), we generate a series of crisis-relevant metrics, including the baseline device count (sampled population), **percent change** and **z-score**. The metrics are calculated by counting the devices from a mobility data panel observed in each tile during each time period and comparing the counts to a baseline period.

## Data

In [3]:
# https://papermill.readthedocs.io/en/latest/usage-parameterize.html
PANEL = "v2023.5.14"

### Area of Interest 

In this step, we import the clipping boundary and the H3 tessellation defined by **area(s) of interest** below. 

In [4]:
AOI = geopandas.read_file("../../data/interim/tessellation/SYRTUR_tessellation.gpkg")

In [5]:
AOI[["geometry", "distance_bin"]].explore(
    column="distance_bin",
    cmap="seismic_r",
    style_kwds={"stroke": True, "fillOpacity": 0.1},
)

### Mobility Data

Through the [Development Data Partnership](https://datapartnership.org), the project team obtained a longitudinal panel of mobility data from which the metrics are calculated, including the device count **percent change** and **z-score**. The metrics are calculated by aggregating the number of devices within the **area of interest** in each tile and at each time period. For additional information, please see {ref}`mobility-data` and {ref}`mobility-activity-methodology`.

```{note}
Due to the data volume and velocity (updated daily), the computation of the **panel** took place on AWS on an EC2 instance owned by the project team. The resulting aggregation is the tabulation of the device count for each `hex_id` and `date`. 
```

In [6]:
ddf = dd.read_parquet(
    f"../../data/final/panels/{PANEL}", columns=["hex_id", "datetime", "uid", "month"]
)

(mobility-activity-methodology)=

## Methodology

The methodology presented consists of generating a series of crisis-relevant metrics, including the baseline device count (sampled population), `percent change` and `z-score` based on the number of devices in an area at a time. The device count is determined for each tile and for each time period, as defined by data standards and the spatial and temporal aggregations below. Similar approaches have been adopted, such as in {cite}`10.1145/3292500.3340412`. The metrics may reveal movement trends in the sampled population that may indicate more or less activity. 
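Written out, the two derived metrics take the following form. The notation is introduced here for illustration only: $n_{h,t}$ is the device count in tile $h$ on day $t$, $\bar{n}_{h,w(t)}$ is the mean baseline count of the same tile on the same day of the week, and $\mu_h$ and $\sigma_h$ are the mean and standard deviation of the baseline counts of tile $h$.

```latex
$$
\text{percent change}_{h,t} = 100 \times \left( \frac{n_{h,t}}{\bar{n}_{h,w(t)}} - 1 \right),
\qquad
z_{h,t} = \frac{n_{h,t} - \mu_h}{\sigma_h}
$$
```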

### Data Standards

#### Population Sample

The sampled population is composed of GPS-enabled devices drawn from a longitudinal mobility data panel. It is important to emphasize that the sampled population is obtained via convenience sampling and that the mobility data panel represents only a subset of the total population in an area at a time, specifically only users who turned on location tracking on their mobile device. Thus, the derived metrics do not represent the total population density.

#### Spatial Aggregation 

The metrics are spatially aggregated on [H3 tiles resolution 6](https://h3geo.org). Each tile covers an area of approximately $36 km^2$ on average.
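To get a feel for this scale, the sketch below treats a tile as a regular planar hexagon and derives the edge length implied by an area of about 36 km². This is an approximation, since H3 cells are defined on the sphere and vary slightly in size, and the function name is hypothetical.

```python
import math


def hexagon_edge_km(area_km2: float) -> float:
    """Edge length of a regular hexagon of the given area (A = 3 * sqrt(3) / 2 * s^2)."""
    return math.sqrt(2 * area_km2 / (3 * math.sqrt(3)))


edge = hexagon_edge_km(36.0)
print(f"edge ~ {edge:.2f} km, vertex-to-vertex width ~ {2 * edge:.2f} km")
```

Under this approximation a tile has an edge of roughly 3.7 km, so each tile spans a neighbourhood-to-district scale rather than individual streets.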

In [7]:
AOI[AOI["hex_id"] == "862da898fffffff"].explore(
    color="blue", style_kwds={"stroke": True, "fillOpacity": 0.25}
)

> Illustration of an H3 tile at resolution 6 near Gaziantep, Türkiye. Gaziantep is among the areas most affected by the 2023 Türkiye–Syria earthquake; the 2,200-year-old Gaziantep Castle was destroyed after the seismic episodes.

#### Temporal Aggregation 

The metrics are temporally aggregated daily in Coordinated Universal Time (UTC).
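As a dependency-free sketch of what daily aggregation in UTC implies, the example below converts timezone-aware timestamps to UTC, truncates them to a date and counts distinct devices per tile and day. The pings, device identifiers and the assumption that timestamps carry a UTC+3 offset are hypothetical; the notebook itself performs this step with Dask.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical pings: (hex_id, uid, timezone-aware timestamp in UTC+3).
LOCAL = timezone(timedelta(hours=3))
pings = [
    ("862da898fffffff", "device-a", datetime(2023, 2, 6, 1, 30, tzinfo=LOCAL)),
    ("862da898fffffff", "device-a", datetime(2023, 2, 6, 9, 0, tzinfo=LOCAL)),
    ("862da898fffffff", "device-b", datetime(2023, 2, 6, 12, 0, tzinfo=LOCAL)),
]

# Collect the set of devices seen in each (tile, UTC date) bucket.
devices = defaultdict(set)
for hex_id, uid, timestamp in pings:
    utc_date = timestamp.astimezone(timezone.utc).date()
    devices[(hex_id, utc_date)].add(uid)

# A device is counted once per tile and day, however many pings it produced.
counts = {key: len(uids) for key, uids in devices.items()}
for key, count in sorted(counts.items()):
    print(key, count)
```

Note that the ping at 01:30 local time falls on the previous day in UTC, so day boundaries in the metrics do not coincide with local midnight.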

### Implementation 

#### Calculate `ACTIVITY`

In [8]:
ACTIVITY = (
    # Use the frame passed to the lambda rather than the outer `ddf`
    ddf.assign(date=lambda x: dd.to_datetime(x["datetime"].dt.date))
    .groupby(["hex_id", "date"])["uid"]
    .nunique()
    .to_frame("count")
    .reset_index()
    .compute()
)

Additionally, we create a column `weekday` that will come in handy later on.

In [9]:
ACTIVITY["weekday"] = ACTIVITY["date"].dt.weekday

#### Calculate `BASELINE`

For this experiment, we choose the 4-week period spanning January 2, 2023 to January 29, 2023 as the baseline. The baseline is calculated for each tile and for each time period, according to the [spatial](#spatial-aggregation) and [temporal](#temporal-aggregation) aggregations. 
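A quick check with the standard library confirms that this window starts on a Monday and contains every day of the week exactly four times, so each weekday contributes equally to the baseline. The variable names are illustrative.

```python
from collections import Counter
from datetime import date, timedelta

start, end = date(2023, 1, 2), date(2023, 1, 29)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# date.weekday() uses Monday=0 ... Sunday=6, the same convention as pandas' dt.weekday.
per_weekday = Counter(day.weekday() for day in days)
print(len(days), dict(sorted(per_weekday.items())))
```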

In [10]:
BASELINE = ACTIVITY[ACTIVITY["date"].between("2023-01-02", "2023-01-29")]

In fact, the result is 7 different baselines for each tile, one per day of the week. We calculate the mean and standard deviation of the device count for each tile and for each day of the week.
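The grouping logic can be sketched without pandas as follows. The daily counts are hypothetical and cover a single tile with two Mondays and two Tuesdays only.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

# Hypothetical daily device counts for one tile.
observations = [
    (date(2023, 1, 2), 100),   # Monday
    (date(2023, 1, 9), 120),   # Monday
    (date(2023, 1, 3), 80),    # Tuesday
    (date(2023, 1, 10), 90),   # Tuesday
]

by_weekday = defaultdict(list)
for day, count in observations:
    by_weekday[day.weekday()].append(count)

# statistics.stdev is the sample standard deviation (n - 1 denominator),
# which matches the default of the pandas "std" aggregation.
baseline = {wd: (mean(c), stdev(c)) for wd, c in by_weekday.items()}
print(baseline)
```

With only four observations per weekday in a 4-week window, the standard deviation of each weekday baseline is estimated from very few points and should be read with caution.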

In [11]:
MEAN = BASELINE.groupby(["hex_id", "weekday"]).agg({"count": ["mean", "std"]})

Taking a sneak peek, 

In [12]:
MEAN.columns = MEAN.columns.map(".".join)

In [13]:
MEAN[MEAN.index.get_level_values("hex_id").isin(["862da898fffffff"])]

| hex_id | weekday | count.mean | count.std |
| --- | --- | --- | --- |
| 862da898fffffff | 0 | 5819.75 | 2285.557901 |
| 862da898fffffff | 1 | 6675.25 | 1918.023527 |
| 862da898fffffff | 2 | 7020.0 | 2137.928281 |
| 862da898fffffff | 3 | 6586.0 | 2345.257484 |
| 862da898fffffff | 4 | 5671.5 | 2838.52949 |
| 862da898fffffff | 5 | 6300.0 | 2516.413718 |
| 862da898fffffff | 6 | 6891.75 | 2462.698029 |


#### Calculate `Z-Score`

A **z-score** is a statistical measure of how far above or below the mean of a group of data points a particular data point lies, expressed in standard deviations. It is used to standardize data and make meaningful comparisons between different sets of data, such as tiles with very different device counts. A **z-score** is particularly useful when working with normally distributed data. By examining the z-scores, one can also assess how closely a data set follows a normal distribution. Percent change does not provide this information.
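The figures reported in this notebook for tile `862da898fffffff` allow a consistency check of the definition. The sketch below takes two (count, z-score) pairs from the preview of `ACTIVITY` further down, solves $z = (x - \mu) / \sigma$ for $\mu$ and $\sigma$, and compares the recovered mean with the average of the seven weekday means shown earlier. The variable names are illustrative.

```python
# Two (count, z_score) pairs reported for tile 862da898fffffff.
(x1, z1), (x2, z2) = (4689, -0.821804), (9743, 1.572825)

# z = (x - mu) / sigma holds for both rows; solve for sigma, then mu.
sigma = (x2 - x1) / (z2 - z1)
mu = x1 - z1 * sigma

# Mean of the seven weekday baseline means reported for the same tile.
weekday_means = [5819.75, 6675.25, 7020.0, 6586.0, 5671.5, 6300.0, 6891.75]
overall_mean = sum(weekday_means) / len(weekday_means)

print(round(mu, 1), round(overall_mean, 1), round(sigma, 1))
```

The recovered mean agrees with the overall baseline mean of the tile, which suggests that the z-score is computed against one mean and one standard deviation per tile over the whole baseline period, whereas the percent change uses a separate baseline per day of the week. Note also that `StandardScaler` divides by the population standard deviation (zero degrees of freedom), while the `std` aggregation in pandas defaults to the sample standard deviation.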

Creating `StandardScaler` for each `hex_id`,

In [14]:
scalers = {}

for hex_id in BASELINE["hex_id"].unique():
    scaler = StandardScaler()
    scaler.fit(BASELINE[BASELINE["hex_id"] == hex_id][["count"]])

    scalers[hex_id] = scaler

Joining with `AOI`,

In [15]:
ACTIVITY = ACTIVITY.merge(AOI, how="left", on="hex_id").drop(["geometry"], axis=1)

Joining with `BASELINE`,

In [16]:
ACTIVITY = pd.merge(ACTIVITY, MEAN, on=["hex_id", "weekday"], how="left")

Preparing columns, 

In [17]:
ACTIVITY["n_baseline"] = ACTIVITY["count.mean"]
ACTIVITY["n_difference"] = ACTIVITY["count"] - ACTIVITY["n_baseline"]

In [18]:
# ACTIVITY["activity"] = ACTIVITY["log_count"]

Additionally, we calculate the **percent change**. While the **z-score** offers more robustness to outliers and numerical stability, the **percent change** can be used when interpretability is most important. 
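As a worked example, the sketch below reproduces the percent change of two rows that appear in the preview of `ACTIVITY` further down. The helper function is hypothetical; the notebook applies the same formula column-wise.

```python
def percent_change(count: float, baseline: float) -> float:
    """Percent change of a device count relative to its baseline mean."""
    return 100 * (count / baseline - 1)


# Tile 862da898fffffff, weekday baseline of 7020 devices.
drop = percent_change(4689, 7020.0)   # 2023-02-08
rise = percent_change(9743, 7020.0)   # 2023-05-03
print(round(drop, 6), round(rise, 6))
```

Tiles without any baseline observation for a given day of the week have no `n_baseline`, so their percent change is undefined and shows up as a missing value.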

In [19]:
ACTIVITY["percent_change"] = 100 * (ACTIVITY["count"] / (ACTIVITY["n_baseline"]) - 1)

Calculating `z_score`, 

In [20]:
for hex_id, scaler in scalers.items():
    try:
        predicate = ACTIVITY["hex_id"] == hex_id
        score = scaler.transform(ACTIVITY[predicate][["count"]])
        ACTIVITY.loc[predicate, "z_score"] = score
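    except Exception as error:
        # Report the tile that could not be scored instead of failing silently
        print(f"Skipping {hex_id}: {error}")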

Taking a sneak peek, 

In [21]:
ACTIVITY[
    [
        "hex_id",
        "date",
        "count",
        "n_baseline",
        "n_difference",
        "percent_change",
        "z_score",
        "ADM0_PCODE",
        "ADM1_PCODE",
        "ADM2_PCODE",
    ]
].sort_values("n_baseline", ascending=False)

| | hex_id | date | count | n_baseline | n_difference | percent_change | z_score | ADM0_PCODE | ADM1_PCODE | ADM2_PCODE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 56578 | 862da898fffffff | 2023-02-08 | 4689 | 7020.0 | -2331.0 | -33.205128 | -0.821804 | TR | TUR027 | TUR027008 |
| 180123 | 862da898fffffff | 2023-05-03 | 9743 | 7020.0 | 2723.0 | 38.789174 | 1.572825 | TR | TUR027 | TUR027008 |
| 56585 | 862da898fffffff | 2023-02-15 | 5740 | 7020.0 | -1280.0 | -18.233618 | -0.323831 | TR | TUR027 | TUR027008 |
| 158273 | 862da898fffffff | 2023-04-26 | 2159 | 7020.0 | -4861.0 | -69.245014 | -2.020540 | TR | TUR027 | TUR027008 |
| 106822 | 862da898fffffff | 2023-03-29 | 2677 | 7020.0 | -4343.0 | -61.866097 | -1.775107 | TR | TUR027 | TUR027008 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 185942 | 862dae95fffffff | 2023-05-10 | 1 | NaN | NaN | NaN | 0.000000 | TR | TUR031 | TUR031011 |
| 185944 | 862dae95fffffff | 2023-05-14 | 1 | NaN | NaN | NaN | 0.000000 | TR | TUR031 | TUR031011 |
| 185952 | 862dae96fffffff | 2023-05-09 | 1 | NaN | NaN | NaN | -0.707107 | TR | TUR031 | TUR031010 |
| 185954 | 862dae96fffffff | 2023-05-12 | 2 | NaN | NaN | NaN | 1.414214 | TR | TUR031 | TUR031010 |


## Findings 

The following map shows the **z-score** on each tile for each time period. The **z-score** is the number of standard deviations by which a data point diverges from the baseline mean; in other words, it indicates how unusual the device count observed in a tile is relative to the baseline period.

<iframe width="100%" height="500px" src="https://studio.foursquare.com/public/55af1cba-9659-4f10-811b-f7f08dfe2ed8/embed" frameborder="0" allowfullscreen></iframe>

```{tip}
[Click to see it on Foursquare Studio](https://studio.foursquare.com/public/55af1cba-9659-4f10-811b-f7f08dfe2ed8)
```

### Movement Activity Trends

An immediate use of the movement activity metrics is to see how they evolve over time and how they may correlate with other features. We present the results for both first-level administrative divisions (governorates and provinces) and selected areas.
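The regional series are obtained by averaging the tile-level metric over all tiles of a region on each date and then arranging the result with one row per date and one column per region. A minimal sketch of that reshaping, using hypothetical values and only the standard library:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tile-level rows: (date, ADM1 code, z-score).
rows = [
    ("2023-02-08", "TUR027", -0.8),
    ("2023-02-08", "TUR027", -1.2),
    ("2023-02-08", "TUR031", 0.5),
    ("2023-02-09", "TUR027", -0.4),
]

# Group the tile-level values by (date, region).
grouped = defaultdict(list)
for day, region, z in rows:
    grouped[(day, region)].append(z)

# One entry per date, one key per region, each value an unweighted mean across tiles.
wide = defaultdict(dict)
for (day, region), values in grouped.items():
    wide[day][region] = mean(values)

print(dict(wide))
```

Because the mean is unweighted, a tile with a handful of devices influences the regional series as much as a densely populated one.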

In [22]:
COLORS = [
    "#4E79A7",  # Blue
    "#F28E2B",  # Orange
    "#E15759",  # Red
    "#76B7B2",  # Teal
    "#59A14F",  # Green
    "#EDC948",  # Yellow
    "#B07AA1",  # Purple
    "#FF9DA7",  # Pink
    "#9C755F",  # Brown
    "#BAB0AC",  # Gray
    "#7C7C7C",  # Dark gray
    "#6B4C9A",  # Violet
    "#D55E00",  # Orange-red
    "#CC61B0",  # Magenta
    "#0072B2",  # Bright blue
    "#329262",  # Peacock green
    "#9E5B5A",  # Brick red
    "#636363",  # Medium gray
    "#CD9C00",  # Gold
    "#5D69B1",  # Medium blue
]

#### Percent Change (ADM 1)

In [23]:
data = ACTIVITY.groupby(["date", "ADM1_PCODE"])["percent_change"].mean().to_frame()
data = data.pivot_table(
    values=["percent_change"], index=["date"], columns=["ADM1_PCODE"]
)
data.columns = [x[1] for x in data.columns]

In [24]:
p = figure(
    title="Movement Activity Trends",
    width=800,
    height=700,
    x_axis_label="Date",
    x_axis_type="datetime",
    y_axis_label="Percent Change (based on device count)",
    tools="pan,wheel_zoom,box_zoom,reset,save,box_select",
)
# Percent change cannot fall below -100, so use it as the lower pan/zoom bound
p.y_range = Range1d(-100, 1000, bounds=(-100, None))
p.add_layout(
    Title(
        text=f"",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Percent change in device count aggregated on first-level administrative division for each time window",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Source: Veraset Movement. Creation date: {datetime.today().strftime('%d %B %Y')}. Feedback: datalab@worldbank.org.",
        text_font_size="10pt",
        text_font_style="italic",
    ),
    "below",
)
p.add_layout(Legend(), "right")
p.renderers.extend(
    [
        Span(
            location=datetime(2023, 2, 6),
            dimension="height",
            line_color="grey",
            line_width=2,
            line_dash=(4, 4),
        ),
    ]
)
p.add_tools(
    HoverTool(
        tooltips="date: @x{%F}, percent change: @y",
        formatters={"@x": "datetime"},
    )
)
renderers = []
for column, color in zip(data.columns, COLORS):
    try:
        r = p.line(
            data.index,
            data[column],
            legend_label=column,
            line_color=color,
            line_width=2,
        )
        renderers.append(r)
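    except Exception as error:
        # Report the series that could not be drawn instead of failing silently
        print(f"Skipping {column}: {error}")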

p.legend.location = "bottom_left"
p.legend.click_policy = "hide"
p.title.text_font_size = "16pt"
p.sizing_mode = "scale_both"

In [25]:
output_notebook()
show(p)

#### Z-Score (ADM 1)

In [26]:
data = ACTIVITY.groupby(["date", "ADM1_PCODE"])["z_score"].mean().to_frame()
data = data.pivot_table(values=["z_score"], index=["date"], columns=["ADM1_PCODE"])
data.columns = [x[1] for x in data.columns]

In [27]:
p = figure(
    title="Movement Activity Trends",
    width=800,
    height=700,
    x_axis_label="Date",
    x_axis_type="datetime",
    y_axis_label="Z-score (based on device count)",
    tools="pan,wheel_zoom,box_zoom,reset,save,box_select",
)
# Keep the lower pan/zoom bound consistent with the initial range
p.y_range = Range1d(-5, 10, bounds=(-5, None))
p.add_layout(
    Title(
        text=f"",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Normalized device count on first-level administrative division for each time window",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Source: Veraset Movement. Creation date: {datetime.today().strftime('%d %B %Y')}. Feedback: datalab@worldbank.org.",
        text_font_size="10pt",
        text_font_style="italic",
    ),
    "below",
)
p.add_layout(Legend(), "right")
p.renderers.extend(
    [
        Span(
            location=datetime(2023, 2, 6),
            dimension="height",
            line_color="grey",
            line_width=2,
            line_dash=(4, 4),
        ),
    ]
)
p.add_tools(
    HoverTool(
        tooltips="date: @x{%F}, z-score: @y",
        formatters={"@x": "datetime"},
    )
)
renderers = []
for column, color in zip(data.columns, COLORS):
    try:
        r = p.line(
            data.index,
            data[column],
            legend_label=column,
            line_color=color,
            line_width=2,
        )
        renderers.append(r)
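    except Exception as error:
        # Report the series that could not be drawn instead of failing silently
        print(f"Skipping {column}: {error}")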

p.legend.location = "bottom_left"
p.legend.click_policy = "hide"
p.title.text_font_size = "16pt"
p.sizing_mode = "scale_both"

show(p)

#### Z-Score (Areas of Interest)

In [28]:
AREAS = ["Aleppo, SY", "Idlib, SY", "Sahinbey, TR", "Sehitkamil, TR"]

In [29]:
dfs = []

for area in AREAS:
    AREA = geopandas.read_file(f"../../data/boundaries/{area}.h3.geojson")

    data = (
        ACTIVITY[ACTIVITY["hex_id"].isin(AREA["hex_id"])]
        .groupby("date")["z_score"]
        .mean()
        .to_frame(area)
    )

    dfs.append(data)

data = pd.concat(dfs, axis=1)
data

| date | Aleppo, SY | Idlib, SY | Sahinbey, TR | Sehitkamil, TR |
| --- | --- | --- | --- | --- |
| 2023-01-01 | 0.331360 | 0.246419 | 0.236514 | -0.279976 |
| 2023-01-02 | 0.387746 | 0.637647 | 0.228704 | -0.528527 |
| 2023-01-03 | 0.717044 | 0.839547 | 0.050624 | 0.322380 |
| 2023-01-04 | 0.282071 | 0.523613 | -0.040633 | -0.171626 |
| 2023-01-05 | 0.704871 | 0.227187 | -0.449675 | -0.578063 |
| ... | ... | ... | ... | ... |
| 2023-05-11 | 4.626265 | 3.437550 | -0.287593 | 0.585875 |
| 2023-05-12 | 3.350759 | 5.358581 | 0.311932 | 0.917654 |
| 2023-05-13 | 4.454403 | 2.601058 | 0.357485 | 0.765970 |
| 2023-05-14 | 3.781766 | 2.461859 | 0.459886 | 0.761811 |


In [30]:
p = figure(
    title="Movement Activity Trends",
    width=800,
    height=700,
    x_axis_label="Date",
    x_axis_type="datetime",
    y_axis_label="Z-score (based on device count)",
    tools="pan,wheel_zoom,box_zoom,reset,save,box_select",
)
# Keep the lower pan/zoom bound consistent with the initial range
p.y_range = Range1d(-10, 10, bounds=(-10, None))
p.add_layout(
    Title(
        text=f"",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Normalized device count for each time window",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Source: Veraset Movement. Creation date: {datetime.today().strftime('%d %B %Y')}. Feedback: datalab@worldbank.org.",
        text_font_size="10pt",
        text_font_style="italic",
    ),
    "below",
)
p.add_layout(Legend(), "right")
p.renderers.extend(
    [
        Span(
            location=datetime(2023, 2, 6),
            dimension="height",
            line_color="grey",
            line_width=2,
            line_dash=(4, 4),
        ),
    ]
)
p.add_tools(
    HoverTool(
        tooltips="date: @x{%F}, z-score: @y",
        formatters={"@x": "datetime"},
    )
)
renderers = []
for column, color in zip(AREAS, COLORS):
    try:
        r = p.line(
            data.index,
            data[column],
            legend_label=column,
            line_color=color,
            line_width=2,
        )
        renderers.append(r)
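    except Exception as error:
        # Report the series that could not be drawn instead of failing silently
        print(f"Skipping {column}: {error}")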

p.legend.location = "bottom_left"
p.legend.click_policy = "hide"
p.title.text_font_size = "16pt"
p.sizing_mode = "scale_both"

In [31]:
show(p)

## Limitations

The methodology presented is an investigative pilot that aims to shed light on the economic situation in Syria and Türkiye by leveraging alternative data, especially where traditional data and methods are absent.

```{caution}
Beyond awaiting peer review, the limitations can be summarized as follows.

- The methodology relies on private intent data. In other words, the input data, i.e. the mobility data, was not produced or collected to analyze the population of interest or address the research question as its primary objective but it was repurposed for the public good. The benefits and caveats when using private intent data have been discussed extensively in the [World Development Report 2021](https://wdr2021.worldbank.org) {cite}`WorldBank2021WorldDevelopmentReport`.

- On the one hand, the mobility data panel is readily available and comprehensive in space and time; on the other hand, it is generated through convenience sampling, which constitutes an important source of bias. The panel composition is not entirely known and is susceptible to change. In other words, the collection and composition of the mobility data panel cannot be controlled.

- In summary, the results cannot be generalized to the entirety of population movement, but they can potentially provide information on movement patterns to inform the assessment of the Syrian economic situation, considering time constraints and the scarcity of traditional data sources in the context of Syria.
```

## References

```{bibliography}
:filter: docname in docnames
```
