(mobility-activity)=
# Estimating Activity through Mobility Data

Understanding population movement can provide valuable insights for public policy and disaster response efforts, particularly during crises when less movement often correlates with reduced economic activity.

Similar to initiatives such as the [COVID-19 Community Mobility Reports](https://www.google.com/covid19/mobility/), [Facebook Population During Crisis](https://dataforgood.facebook.com/dfg/tools/facebook-population-maps), and [Mapbox Movement Data](https://www.mapbox.com/movement-data), we have developed a range of crisis-relevant indicators. These include baseline and subsequent device densities, as well as metrics like **percent change** and **z-score**. These indicators are derived by aggregating device counts within specific geographical tiles and across various time periods, utilizing longitudinal mobility data.

It's important to note the inherent limitations associated with this approach, as detailed in {ref}`mobility-activity-limitations`. Notably, mobility data is typically collected through convenience sampling methods and lacks the controlled methodology of randomized trials.

In [1]:
from datetime import datetime

import colorcet as cc
import dask.dataframe as dd
import datashader
import geopandas
import holoviews as hv
import pandas as pd
from bokeh.models import (
    HoverTool,
    Legend,
    Span,
    Title,
    Range1d,
)
from bokeh.plotting import figure, output_notebook, show
from holoviews.element.tiles import CartoDark
from holoviews.operation.datashader import rasterize
from sklearn.preprocessing import StandardScaler

In [2]:
import warnings

warnings.filterwarnings("ignore")

In [3]:
from dask.distributed import Client

client = Client(n_workers=4)
client


## Data

In this section, we import data from sources available either publicly or via {ref}`data`.

In [4]:
# https://papermill.readthedocs.io/en/latest/usage-parameterize.html
PANEL = "v2023.10"

### Area of Interest 

In this step, we import the clipping boundary and the H3 tessellation defined by **area(s) of interest** below. 

In [5]:
AOI = geopandas.read_file("../../data/final/tessellation/SYRTUR_tessellation.gpkg")

AOI[["geometry", "hex_id", "distance_bin", "distance"]].explore(
    column="distance_bin",
    cmap="seismic_r",
    style_kwds={"stroke": True, "fillOpacity": 0.05},
)

```{figure} ../../docs/images/logo.png
---
height: 0px
---
Visualization of the area of interest centered at the earthquake's epicenter. The distance (in km) to the epicenter is calculated for each H3 (resolution 6) tile.
```

### Mobility Data

The WB Data Lab team has acquired longitudinal human mobility data comprising anonymized, timestamped geographical points generated by GPS-enabled devices located in Türkiye and Syria. The temporal extent of the dataset is computed below.

The project team has used the longitudinal mobility data to derive several key metrics. Specifically, we compute baseline and subsequent device densities, denoted as `n_baseline` and `count` respectively, along with metrics such as **percent change** (`percent_change`) and [**z-score**](https://en.wikipedia.org/wiki/Standard_score) (`z_score`). These indicators are derived by aggregating the device count within each tile and at each time period. For further details, please refer to the documentation provided in {ref}`mobility-data` and {ref}`mobility-activity-methodology`.

In [6]:
ddf = dd.read_parquet(
    f"../../data/final/panels/{PANEL}",
    columns=["hex_id", "longitude", "latitude", "datetime", "uid", "month"],
)

In [7]:
ddf.head()

| | hex_id | longitude | latitude | datetime | uid | month |
|---|---|---|---|---|---|---|
| 0 | 862dae067ffffff | 37.095111 | 36.194343 | 2022-07-13 03:20:35+03:00 | ebb3aaaa336c1d3701244d2557e90b7f69c74ca7ca59c3... | 7 |
| 1 | 862dae067ffffff | 37.095111 | 36.194343 | 2022-07-13 03:21:56+03:00 | ebb3aaaa336c1d3701244d2557e90b7f69c74ca7ca59c3... | 7 |
| 2 | 862dae067ffffff | 37.095067 | 36.194376 | 2022-07-13 03:27:13+03:00 | ebb3aaaa336c1d3701244d2557e90b7f69c74ca7ca59c3... | 7 |
| 3 | 862dae067ffffff | 37.095112 | 36.194338 | 2022-07-13 03:28:53+03:00 | ebb3aaaa336c1d3701244d2557e90b7f69c74ca7ca59c3... | 7 |
| 4 | 862dae067ffffff | 37.095045 | 36.194358 | 2022-07-13 04:12:08+03:00 | ebb3aaaa336c1d3701244d2557e90b7f69c74ca7ca59c3... | 7 |


In [8]:
#ddf = ddf[ddf["datetime"] >= pd.Timestamp('2022-07-01 00:00:00+03:00')]

```{note}
Due to the data volume and velocity (updated daily), the **panel** was computed from the raw mobility data on AWS. The resulting dataset, named above, is available in the project's folder.
```

First, we calculate the cardinality (number of records),

In [9]:
len(ddf)

372967805

Now, we calculate the temporal extent, 

In [10]:
print(
    "From",
    ddf["datetime"].min().compute().strftime("%b %d, %Y"),
    "to",
    ddf["datetime"].max().compute().strftime("%b %d, %Y"),
)

From Jun 28, 2022 to Nov 01, 2023


And visualize the mobility data panel's spatial density.

In [11]:
hv.extension("bokeh")

x, y = datashader.utils.lnglat_to_meters(ddf["longitude"], ddf["latitude"])
points = hv.Points(ddf.assign(x=x, y=y), ["x", "y"])
points = rasterize(points).opts(
    tools=["hover"],
    colorbar=True,
    colorbar_position="bottom",
    cmap=cc.fire,
    cnorm="eq_hist",
)

tiles = CartoDark().opts(
    title="Türkiye-Syria: Mobility Data Spatial Distribution",
    alpha=0.75,
    width=780,
    height=600,
    bgcolor="black",
)

tiles * points

```{figure} ../../docs/images/logo.png
---
height: 0px
---
Visualization of the mobility data panel's spatial distribution. The panel is composed of approximately 373 million points. Source: Veraset Movement.
```

(mobility-activity-methodology)=

## Methodology

The methodology presented consists of generating a series of crisis-relevant metrics, including the baseline (sample) `population density`, `percent change` and `z-score`, based on the number of devices in an area at a given time. The device count is determined for each tile and each time period, as defined by the data standards and the spatial and temporal aggregations below. Similar approaches have been adopted, for example in {cite}`10.1145/3292500.3340412`. The metrics may reveal movement trends in the sampled population that may indicate more or less activity.

### Data Standards

#### Population Sample

The sampled population is composed of GPS-enabled devices drawn from longitudinal mobility data. It is important to emphasize that the sampled population is obtained via convenience sampling and that the mobility data panel represents only a subset of the total population in an area at a given time, specifically only users who turned on location tracking on their mobile device. Thus, derived metrics do not represent the total population density.

#### Spatial Aggregation 

The indicators are aggregated spatially on [H3 resolution 6](https://h3geo.org) tiles, each covering approximately $36 \text{ km}^2$ on average, as illustrated below.

In [12]:
AOI[
    AOI["hex_id"].isin(
        [
            "862da898fffffff",
            "862da8837ffffff",
            "862da8987ffffff",
            "862da88a7ffffff",
            "862da89afffffff",
            "862da8817ffffff",
            "862da899fffffff",
        ]
    )
].explore(color="blue", style_kwds={"stroke": True, "fillOpacity": 0.1})

```{figure} ../../docs/images/logo.png
---
height: 0px
---
Illustration of H3 (resolution 6) tiles near Gaziantep, Türkiye. Gaziantep was among the areas most affected by the 2023 Türkiye–Syria earthquake; the 2,200-year-old Gaziantep Castle was destroyed during the seismic events.
```

#### Temporal Aggregation 

The indicators are aggregated daily on the localized date in the **Europe/Istanbul (UTC+3)** timezone.
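Why the localization matters can be seen in a minimal pandas sketch, assuming timestamps that arrive in UTC (in this panel the `datetime` column already carries a `+03:00` offset, so the conversion is effectively a no-op there). The toy timestamps are illustrative only: a ping late in the UTC evening already belongs to the next local day in Istanbul.

```python
import pandas as pd

# Toy UTC timestamps (illustrative, not taken from the panel)
utc_times = pd.Series(
    pd.to_datetime(["2023-02-05 20:59:00", "2023-02-05 21:00:00"], utc=True)
)

# Calendar date if we (incorrectly) aggregate on the UTC date
utc_date = utc_times.dt.date

# Convert to Istanbul local time (UTC+3) first, then take the calendar date
local_times = utc_times.dt.tz_convert("Europe/Istanbul")
local_date = local_times.dt.date

print(list(zip(utc_date, local_date)))
```

Both pings fall on 5 February in UTC, but the second one (00:00 local time) is counted on 6 February once the dates are localized.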

### Implementation 

#### Calculate `ACTIVITY`

In this step, we compute `ACTIVITY` as a density metric. Specifically, we count the number of distinct devices (`uid`) detected within each H3 tile on each day. This calculation is based on a spatial join approach, which only determines whether a device was detected within a tile at least once. This method, while straightforward, is a simplified approach compared to more advanced techniques such as estimating stay locations and visits.
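To make the counting rule concrete, here is a toy pandas sketch (hypothetical tiles and device IDs) mirroring the `groupby(...)["uid"].nunique()` step below: a device pinging a tile several times on the same day counts once for that tile-day, while a device seen in two tiles counts once in each.

```python
import pandas as pd

# Toy pings: device "a" pings tile h1 three times on the same day,
# device "b" is seen in both h1 and h2 on that day.
pings = pd.DataFrame(
    {
        "hex_id": ["h1", "h1", "h1", "h1", "h2"],
        "date": pd.to_datetime(["2023-02-06"] * 5),
        "uid": ["a", "a", "a", "b", "b"],
    }
)

# Number of distinct devices detected at least once per tile and day
activity = (
    pings.groupby(["hex_id", "date"])["uid"]
    .nunique()
    .to_frame("count")
    .reset_index()
)
print(activity)
```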

In [13]:
ACTIVITY = (
    # `datetime` carries a +03:00 offset, so `.dt.date` yields the Istanbul local date
    ddf.assign(date=lambda df: dd.to_datetime(df["datetime"].dt.date))
    .groupby(["hex_id", "date"])["uid"]
    .nunique()
    .to_frame("count")
    .reset_index()
    .compute()
)

Additionally, we create a `weekday` column (0 = Monday through 6 = Sunday), which will come in handy later when computing day-of-week baselines.

In [14]:
ACTIVITY["weekday"] = ACTIVITY["date"].dt.weekday

#### Calculate `BASELINE`

In this step, we choose the period spanning July 1, 2022 to December 31, 2022 as the baseline. The baseline is calculated for each tile and for each time period, according to the [spatial](#spatial-aggregation) and [temporal](#temporal-aggregation) aggregations. 

In [15]:
BASELINE = ACTIVITY[ACTIVITY["date"].between("2022-07-01", "2022-12-31")]

In fact, this results in seven baselines per tile, one for each day of the week (Mon-Sun): we calculate the mean and standard deviation of the device density for each tile and each weekday.

In [16]:
MEAN = BASELINE.groupby(["hex_id", "weekday"]).agg({"count": ["mean", "std"]})

Flattening the resulting column index (into `count.mean` and `count.std`) and taking a sneak peek at one tile,

In [17]:
MEAN.columns = MEAN.columns.map(".".join)

In [18]:
MEAN[MEAN.index.get_level_values("hex_id").isin(["862da898fffffff"])]

| hex_id | weekday | count.mean | count.std |
|---|---|---|---|
| 862da898fffffff | 0 | 8867.653846 | 9441.786543 |
| 862da898fffffff | 1 | 8641.5 | 9341.744035 |
| 862da898fffffff | 2 | 8100.192308 | 8794.041446 |
| 862da898fffffff | 3 | 8858.307692 | 8130.09618 |
| 862da898fffffff | 4 | 10231.888889 | 10199.128712 |
| 862da898fffffff | 5 | 10072.407407 | 10230.396328 |
| 862da898fffffff | 6 | 9946.384615 | 9898.669483 |


#### Calculate `Z-Score` and **Percent Change**

A z-score measures how far a specific data point deviates from the mean of a dataset, expressed in standard deviations. It is particularly valuable for standardizing values and enabling meaningful comparisons across datasets with different scales. By evaluating z-scores, one can gauge how far an observation diverges from its mean while accounting for variance. A percent change, in contrast, is simpler to interpret but does not account for the variability of the baseline.
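Written out for tile $h$ on day $t$, with $x_{h,t}$ the device count, the two metrics as computed in the cells below are as follows (the notation is introduced here for illustration):

$$
z_{h,t} = \frac{x_{h,t} - \mu_h}{\sigma_h},
\qquad
\text{percent change}_{h,t} = 100 \left( \frac{x_{h,t}}{\bar{x}_{h,w(t)}} - 1 \right)
$$

Here $\mu_h$ and $\sigma_h$ are the mean and population standard deviation of tile $h$'s daily counts over the whole baseline period (what the per-tile `StandardScaler` fits), while $\bar{x}_{h,w(t)}$ is the tile's baseline mean for the weekday $w(t)$ of day $t$ (the `count.mean` column). For example, a count of 1 against a weekday baseline of 13.923077 gives $100\,(1/13.923077 - 1) \approx -92.82$, the `percent_change` reported for tile `862db3bafffffff` on 2023-10-31.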

Fitting a `StandardScaler` on the baseline counts of each `hex_id`,

In [19]:
scalers = {}

for hex_id in BASELINE["hex_id"].unique():
    scaler = StandardScaler()
    scaler.fit(BASELINE[BASELINE["hex_id"] == hex_id][["count"]])

    scalers[hex_id] = scaler
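For reference, a per-tile `StandardScaler` amounts to subtracting each tile's baseline mean and dividing by its *population* standard deviation (`ddof=0`), whereas pandas' `std()` (used for `count.std` above) defaults to the sample standard deviation (`ddof=1`). The sketch below, on toy data with hypothetical tile IDs, computes the same per-tile z-score with a groupby and a merge instead of a Python loop; it is an alternative formulation under these assumptions, not the notebook's own code.

```python
import pandas as pd

# Toy baseline and activity tables (hypothetical tile IDs and counts)
baseline = pd.DataFrame(
    {"hex_id": ["h1", "h1", "h1", "h1", "h2", "h2"], "count": [8, 10, 12, 10, 5, 7]}
)
activity = pd.DataFrame({"hex_id": ["h1", "h2", "h3"], "count": [4, 9, 3]})

# Per-tile baseline mean and population standard deviation (ddof=0),
# which is what StandardScaler.fit computes for a single column
grouped = baseline.groupby("hex_id")["count"]
stats = pd.DataFrame({"mu": grouped.mean(), "sigma": grouped.std(ddof=0)})

# StandardScaler treats a zero standard deviation as 1; mimic that here
stats["sigma"] = stats["sigma"].replace(0, 1.0)

# Left merge: tiles without a baseline (h3) end up with NaN z-scores
scored = activity.merge(stats, left_on="hex_id", right_index=True, how="left")
scored["z_score"] = (scored["count"] - scored["mu"]) / scored["sigma"]

print(scored[["hex_id", "count", "z_score"]])
```

With `ddof=1` instead, tile `h2` would get a z-score of about 2.12 rather than 3.0, so it matters which convention a reader assumes when comparing `z_score` against `count.std`.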

Joining with the area of interest (`AOI`),

In [20]:
ACTIVITY = ACTIVITY.merge(AOI, how="left", on="hex_id").drop(["geometry"], axis=1)

Next, merging with the weekday baseline (`MEAN`),

In [21]:
ACTIVITY = pd.merge(ACTIVITY, MEAN, on=["hex_id", "weekday"], how="left")

Calculating the `z_score` for each tile,

In [22]:
for hex_id, scaler in scalers.items():
    try:
        predicate = ACTIVITY["hex_id"] == hex_id
        score = scaler.transform(ACTIVITY[predicate][["count"]])
        ACTIVITY.loc[predicate, "z_score"] = score
    except:
        pass

Additionally, we calculate the **percent change**. While the **z-score** accounts for the variability of each tile's baseline, the **percent change** is easier to interpret and can be preferred when interpretability matters most. First, we prepare the baseline and difference columns,

In [23]:
ACTIVITY["n_baseline"] = ACTIVITY["count.mean"]
ACTIVITY["n_difference"] = ACTIVITY["count"] - ACTIVITY["n_baseline"]

In [24]:
ACTIVITY["percent_change"] = 100 * (ACTIVITY["count"] / (ACTIVITY["n_baseline"]) - 1)

In [25]:
ACTIVITY.to_csv(f"../../data/interim/SYRTUR_{PANEL}_activity.csv", index=False)

# ACTIVITY = pd.read_csv(f"../../data/interim/SYRTUR_{PANEL}_activity.csv")
# ACTIVITY["date"] = pd.to_datetime(ACTIVITY["date"])

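Before taking a sneak peek, we filter out sparse observations (fewer than 10 devices both on the day and in the baseline) and extreme z-scores (above 40),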

In [26]:
ACTIVITY = ACTIVITY[(ACTIVITY["count"] >= 10) | (ACTIVITY["n_baseline"] >= 10)]
ACTIVITY = ACTIVITY[(ACTIVITY["z_score"] <= 40)]
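One side effect worth knowing: comparisons with `NaN` evaluate to `False`, so `z_score <= 40` also drops rows whose `z_score` is missing (for example, tiles that never appear in the baseline period and therefore have no fitted scaler). If the intent were only to remove extreme values, a negated comparison keeps the missing rows. The toy example below (hypothetical values) contrasts the two filters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"z_score": [1.5, 55.0, np.nan]})

# Drops the extreme value AND the row with a missing z-score
filtered = df[df["z_score"] <= 40]

# Drops only the extreme value; rows without a z-score are kept
kept = df[~(df["z_score"] > 40)]

print(len(filtered), len(kept))
```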

In [27]:
ACTIVITY[
    [
        "hex_id",
        "date",
        "count",
        "n_baseline",
        "n_difference",
        "percent_change",
        "z_score",
        "ADM0_PCODE",
        "ADM1_PCODE",
        "ADM2_PCODE",
    ]
].sort_values(["hex_id", "date"], ascending=False)

| | hex_id | date | count | n_baseline | n_difference | percent_change | z_score | ADM0_PCODE | ADM1_PCODE | ADM2_PCODE |
|---|---|---|---|---|---|---|---|---|---|---|
| 288925 | 862db3bafffffff | 2023-10-31 | 1 | 13.923077 | -12.923077 | -92.817680 | -3.693753 | SY | SY12 | SY1200 |
| 595347 | 862db3bafffffff | 2023-10-30 | 1 | 13.846154 | -12.846154 | -92.777778 | -3.693753 | SY | SY12 | SY1200 |
| 34933 | 862db3bafffffff | 2023-10-29 | 1 | 11.884615 | -10.884615 | -91.585761 | -3.693753 | SY | SY12 | SY1200 |
| 411302 | 862db3bafffffff | 2023-10-26 | 1 | 14.440000 | -13.440000 | -93.074792 | -3.693753 | SY | SY12 | SY1200 |
| 717533 | 862db3bafffffff | 2023-10-25 | 1 | 11.961538 | -10.961538 | -91.639871 | -3.693753 | SY | SY12 | SY1200 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 735650 | 862c14807ffffff | 2022-07-04 | 19 | 7.750000 | 11.250000 | 145.161290 | 2.617558 | SY | SY08 | SY0803 |
| 35502 | 862c14807ffffff | 2022-07-03 | 11 | 8.272727 | 2.727273 | 32.967033 | 0.908132 | SY | SY08 | SY0803 |
| 691932 | 862c14807ffffff | 2022-07-01 | 11 | 9.500000 | 1.500000 | 15.789474 | 0.908132 | SY | SY08 | SY0803 |
| 96611 | 862c14807ffffff | 2022-06-30 | 13 | 6.166667 | 6.833333 | 110.810811 | 1.335489 | SY | SY08 | SY0803 |


## Findings 


Less movement typically means less economic activity. A potential use of movement "activity" indicators is to track their evolution over time and their correlation with other features. We present the results (i.e., `percent_change` and `z_score`) at both the governorate level and for selected areas.

In [28]:
COLORS = [
    "#4E79A7",  # Blue
    "#F28E2B",  # Orange
    "#E15759",  # Red
    "#76B7B2",  # Teal
    "#59A14F",  # Green
    "#EDC948",  # Yellow
    "#B07AA1",  # Purple
    "#FF9DA7",  # Pink
    "#9C755F",  # Brown
    "#BAB0AC",  # Gray
    "#7C7C7C",  # Dark gray
    "#6B4C9A",  # Violet
    "#D55E00",  # Orange-red
    "#CC61B0",  # Magenta
    "#0072B2",  # Bright blue
    "#329262",  # Peacock green
    "#9E5B5A",  # Brick red
    "#636363",  # Medium gray
    "#CD9C00",  # Gold
    "#5D69B1",  # Medium blue
]
FREQ = "D"
NAMES = AOI.groupby("ADM1_PCODE")["ADM1_EN"].apply(set)
NAMES = dict([(k, v.pop()) for k, v in NAMES.items()])

#### Percent Change in Activity

##### Percent Change in Activity by Governorate

In this section, we present visualizations of the aggregated `percent_change` for each governorate.

In [29]:
data = ACTIVITY.groupby(["date", "ADM1_PCODE"])["percent_change"].mean().to_frame()
data = data.pivot_table(
    values=["percent_change"], index=["date"], columns=["ADM1_PCODE"]
)
data.columns = [x[1] for x in data.columns]

data = data.groupby(pd.Grouper(freq=FREQ)).mean()
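Note that this cell averages tile-level percent changes, so every tile carries equal weight regardless of how many devices it contains. A common alternative, not used in this notebook, is to sum counts and baselines across tiles before taking the ratio. The toy sketch below (two hypothetical tiles) shows how far the two aggregations can diverge.

```python
import pandas as pd

# Two hypothetical tiles: a sparse one that collapsed and a dense one that doubled
tiles = pd.DataFrame({"count": [1, 200], "n_baseline": [10.0, 100.0]})

# Mean of tile-level percent changes (equal weight per tile, as in the cell above)
tile_pct = 100 * (tiles["count"] / tiles["n_baseline"] - 1)
mean_of_ratios = tile_pct.mean()

# Percent change of aggregated counts (implicitly weighted by device density)
ratio_of_sums = 100 * (tiles["count"].sum() / tiles["n_baseline"].sum() - 1)

print(round(mean_of_ratios, 2), round(ratio_of_sums, 2))
```

Here the equal-weight mean is about +5%, while the pooled ratio is about +82.7%, because the dense tile dominates the pooled figure.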

In [30]:
p = figure(
    title="Activity Trends: Percent Change by Governorate",
    width=800,
    height=700,
    x_axis_label="Date",
    x_axis_type="datetime",
    y_axis_label="Percent change (based on device density)",
    tools="pan,wheel_zoom,box_zoom,reset,save,box_select",
)
p.y_range = Range1d(-150, 500, bounds=(-150, None))
p.add_layout(
    Title(
        text="",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text="Percent change in device density for each time window and each first-level administrative division",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Source: Veraset Movement. Creation date: {datetime.today().strftime('%d %B %Y')}. Feedback: datalab@worldbank.org.",
        text_font_size="10pt",
        text_font_style="italic",
    ),
    "below",
)
p.add_layout(Legend(), "right")
p.renderers.extend(
    [
        Span(
            location=datetime(2023, 2, 6),
            dimension="height",
            line_color="grey",
            line_width=2,
            line_dash=(4, 4),
        ),
    ]
)
p.add_tools(
    HoverTool(
        tooltips="Date: @x{%F}, Percent Change: @y{00.0}%",
        formatters={"@x": "datetime"},
    )
)
renderers = []
for column, color in zip(data.columns, COLORS):
    try:
        r = p.line(
            data.index,
            data[column],
            legend_label=NAMES.get(column),
            line_color=color,
            line_width=2,
        )
        r.visible = False
        renderers.append(r)
    except:
        pass

renderers[0].visible = True

p.legend.location = "bottom_left"
p.legend.click_policy = "hide"
p.title.text_font_size = "16pt"
p.sizing_mode = "scale_both"

In [31]:
output_notebook()
show(p)

##### Percent Change in Activity for Specific Areas

In this section, we present visualizations of the `percent_change` for specific areas, such as **Aleppo, Syria**, among others.

In [32]:
AREAS = ["Aleppo, SY", "Idlib, SY", "Sahinbey, TR", "Sehitkamil, TR"]

In [33]:
dfs = []

for area in AREAS:
    AREA = geopandas.read_file(f"../../data/boundaries/{area}.h3.geojson")

    data = (
        ACTIVITY[ACTIVITY["hex_id"].isin(AREA["hex_id"])]
        .groupby("date")["percent_change"]
        .mean()
        .to_frame(area)
    )

    dfs.append(data)

data = pd.concat(dfs, axis=1)
data

Unnamed: 0_level_0,"Aleppo, SY","Idlib, SY","Sahinbey, TR","Sehitkamil, TR"
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2022-06-28,-85.061240,-88.611550,-10.942621,5.185608
2022-06-29,-70.404890,29.500632,107.927575,181.788463
2022-06-30,-66.838456,-10.724082,138.633859,164.734823
2022-07-01,-44.266469,100.041040,93.918238,171.873760
2022-07-02,-6.688375,-13.880445,85.930355,128.654067
...,...,...,...,...
2023-10-28,-91.808989,-87.043350,-73.099934,86.979500
2023-10-29,-92.734049,-94.128882,-87.115446,102.765045
2023-10-30,-93.881265,-90.154318,-82.134036,45.738152
2023-10-31,-93.883027,-95.050423,-86.393535,-28.208391


And we visualize the time series, 

In [34]:
p = figure(
    title="Activity Trends: Percent Change in Activity for Specific Areas",
    width=800,
    height=700,
    x_axis_label="Date",
    x_axis_type="datetime",
    y_axis_label="Percent change (based on device density)",
    tools="pan,wheel_zoom,box_zoom,reset,save,box_select",
)
p.y_range = Range1d(-150, 500, bounds=(-150, None))
p.add_layout(
    Title(
        text="",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text="Percent change in device density for each time window in specific areas",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Source: Veraset Movement. Creation date: {datetime.today().strftime('%d %B %Y')}. Feedback: datalab@worldbank.org.",
        text_font_size="10pt",
        text_font_style="italic",
    ),
    "below",
)
p.add_layout(Legend(), "right")
p.renderers.extend(
    [
        Span(
            location=datetime(2023, 2, 6),
            dimension="height",
            line_color="grey",
            line_width=2,
            line_dash=(4, 4),
        ),
    ]
)
p.add_tools(
    HoverTool(
        tooltips="Date: @x{%F}, Percent Change: @y{00.0}%",
        formatters={"@x": "datetime"},
    )
)
renderers = []
for column, color in zip(AREAS, COLORS):
    try:
        r = p.line(
            data.index,
            data[column],
            legend_label=column,
            line_color=color,
            line_width=2,
        )
        r.muted = True
        renderers.append(r)
    except:
        pass

renderers[0].muted = False

p.legend.location = "bottom_left"
p.legend.click_policy = "mute"
p.title.text_font_size = "16pt"
p.sizing_mode = "scale_both"

In [35]:
show(p)

#### Z-Score 

##### Z-Score (ADM1)

In this section, we first map the `z_score` for each H3 tile and each time period, and then aggregate it (mean) for each first-level administrative division.

<iframe width="100%" height="500px" src="https://studio.foursquare.com/public/55af1cba-9659-4f10-811b-f7f08dfe2ed8/embed" frameborder="0" allowfullscreen></iframe>


```{figure} ../../docs/images/logo.png
---
height: 0px
---
The map above shows the **z-score** for each H3 tile and each time period. The **z-score** is the number of standard deviations by which a data point diverges from the baseline mean; in other words, it indicates how unusual the device density in that area is compared to the baseline period. [Click to see it on Foursquare Studio](https://studio.foursquare.com/public/55af1cba-9659-4f10-811b-f7f08dfe2ed8)
```

Next, we plot the `z_score` indicator aggregated (mean) for each first-level administrative division.

In [36]:
data = ACTIVITY.groupby(["date", "ADM1_PCODE"])["z_score"].mean().to_frame()
data = data.pivot_table(values=["z_score"], index=["date"], columns=["ADM1_PCODE"])
data.columns = [x[1] for x in data.columns]

data = data.groupby(pd.Grouper(freq=FREQ)).mean()

In [37]:
p = figure(
    title="Activity Trends: Z-Score by Governorate",
    width=800,
    height=700,
    x_axis_label="Date",
    x_axis_type="datetime",
    y_axis_label="Z-score (based on device density)",
    tools="pan,wheel_zoom,box_zoom,reset,save,box_select",
)
# p.y_range = Range1d(-5, 5, bounds=(0, None))
p.add_layout(
    Title(
        text="",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text="Normalized device density for each time window and each first-level administrative division",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Source: Veraset Movement. Creation date: {datetime.today().strftime('%d %B %Y')}. Feedback: datalab@worldbank.org.",
        text_font_size="10pt",
        text_font_style="italic",
    ),
    "below",
)
p.add_layout(Legend(), "right")
p.renderers.extend(
    [
        Span(
            location=datetime(2023, 2, 6),
            dimension="height",
            line_color="grey",
            line_width=2,
            line_dash=(4, 4),
        ),
    ]
)
p.add_tools(
    HoverTool(
        tooltips="date: @x{%F}, z-score: @y{00.0}",
        formatters={"@x": "datetime"},
    )
)
renderers = []
for column, color in zip(data.columns, COLORS):
    try:
        r = p.line(
            data.index,
            data[column],
            legend_label=NAMES.get(column),
            line_color=color,
            line_width=2,
        )
        renderers.append(r)
    except:
        pass

p.legend.location = "bottom_left"
p.legend.click_policy = "hide"
p.title.text_font_size = "16pt"
p.sizing_mode = "scale_both"

show(p)

(mobility-activity-limitations)=

## Limitations

The methodology presented is an exploratory analysis pilot aiming to shed light on the economic situation in Syria and Türkiye leveraging alternative data, especially when we are confronted with the absence of traditional data and methods. Mobility data, like any other type of data, comes with limitations and underlying assumptions that should be considered when interpreting and using the data. 

```{caution}
Here are some common limitations and assumptions associated with mobility data:

**Limitations:**

 - **Sampling Bias:** Mobility data is primarily collected through convenience sampling and lacks the controlled methodology of randomized trials. 
 
 - **Selection Bias:** Users who opt to share their mobility data may not be representative of the entire population, potentially introducing selection bias.

 - **Privacy Concerns:** The collection of mobility data may raise privacy issues, as it can sometimes be linked to individuals, potentially violating their privacy.

 - **Data Quality:** Data quality can vary, and errors, inaccuracies, or missing data points may be present, which can affect the reliability of analyses.
  
 - **Temporal and Spatial Resolution:** Mobility data may not capture all movements or may lack fine-grained temporal or spatial resolution, limiting its utility for some applications.
 
 - **Lack of Contextual Information:** Mobility data primarily captures movement patterns and geolocation information. It may lack other crucial contextual information, such as transactional data, business types, or specific economic activities, which are essential for accurate estimation of economic activity.
 
 - **Private Intent Data:** The methodology relies on private intent data. In other words, the input data (i.e., the mobility data) was not produced or collected with the primary objective of analyzing the population of interest or addressing the research question; instead, it was repurposed for the public good. The benefits and caveats of using private intent data have been discussed extensively in the [World Development Report 2021](https://wdr2021.worldbank.org) {cite}`WorldBank2021WorldDevelopmentReport`.
 
**Assumptions:**

 - **Homogeneity:** Mobility data often assumes that the mobility patterns of individuals or groups are relatively consistent over time and space, which may not always be the case.

 - **Consistency in Data Sources:** Mobility data may assume consistency in data sources and methodologies across different regions or datasets, which may not always hold true.

 - **User Behavior:** Assumptions about user behavior, such as the purpose of travel or preferred routes, are often made when interpreting mobility data.

 - **Implicit Data Interpretation:** Interpretation of mobility data often assumes that certain behaviors or patterns observed in the data have a specific meaning, which may not always be accurate without additional context.

 - **App Usage as a Proxy:** In some cases, the use of specific apps or devices is taken as a proxy for mobility, assuming that it accurately represents individual movements.
```

It's important to be aware of these limitations and assumptions when working with mobility data and to consider their potential impact on the conclusions drawn from the data. Additionally, researchers and analysts should explore ways to address these limitations and validate assumptions when conducting mobility data analyses.

```{seealso}
For further discussion on limitations and assumptions, please check out the [Development Data Partnership Documentation on Mobility Data](https://docs.datapartnership.org/collections/mobility/README.md).
```

## References

```{bibliography}
:filter: docname in docnames
```
