# Merging Facebook movement range maps, Google mobility data, and Oxford policy data for Nepal

#### _Work done by Nepal Poverty Team, The World Bank_

## Data Sources:
1. [Google Community Mobility Reports](https://www.google.com/covid19/mobility/)
2. [The Oxford COVID-19 Government Response Tracker](https://www.bsg.ox.ac.uk/research/research-projects/coronavirus-government-response-tracker) 
3. [Facebook Movement Range Maps](https://data.humdata.org/dataset/movement-range-maps) 

This work uses Python 3. The accompanying Jupyter notebook shows the data cleaning and merging steps.

## Setup

Running this notebook requires Jupyter; either Jupyter Notebook or JupyterLab can be installed. In addition, the Python packages imported by the notebook (pandas, GeoPandas, and rasterstats) are required.

### Jupyter Software Installation
https://jupyter.org/install

### pandas Package Installation
https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html

### GeoPandas Package Installation
https://pypi.org/project/geopandas/

After all the dependencies are installed, the notebook can be opened in Jupyter and run.
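Before running the notebook, it can help to confirm that the packages are importable. The following is a minimal sketch, not part of the original notebook; the helper name `find_missing` is hypothetical, and the module list is taken from the import cell.

```python
import importlib.util

# Module names taken from the notebook's import cell.
REQUIRED_MODULES = ["pandas", "geopandas", "rasterstats"]


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```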

## Imports

In [187]:
import time
import zipfile
import pandas as pd

import geopandas as gp

from rasterstats import zonal_stats, gen_zonal_stats

## Fetch data, some from web URLs and some from downloaded local files

In [172]:
# Tab delimited Facebook data, downloaded from the URL (https://data.humdata.org/dataset/movement-range-maps)
fb_data = pd.read_csv('movement-range-2020-07-10.csv', sep='\t')

# Google mobility data, fetching from the web URL
google_url = "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv"
google_data = pd.read_csv(google_url)
print("Google mobility data fetched.")

# Oxford policy data, fetching from the web URL
oxford_url = 'https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv'
oxford_data = pd.read_csv(oxford_url)
print("OxCGRT data fetched.")

Google mobility data fetched.
OxCGRT data fetched.


## Cleaning of the data and zone names assertion

In [173]:
# filter the data to Nepal for all the data sources
# (.copy() avoids pandas' SettingWithCopyWarning when columns are modified below)
google_data = google_data[google_data['country_region_code'] == 'NP'].copy()
oxford_data = oxford_data[oxford_data['CountryCode'] == 'NPL'].copy()
fb_data = fb_data[fb_data['country'] == 'NPL'].copy()

# bring the date to uniform format
oxford_data['Date'] = oxford_data['Date'].apply(lambda x: str(x)).apply(lambda x: x[:4] + '-' + x[4:6] + '-' + x[6:])

# the Facebook data misspells Dhaulagiri as 'Dhaualagiri', so correct it
fb_data['polygon_name'] = fb_data['polygon_name'].replace(to_replace='Dhaualagiri', value='Dhaulagiri', regex=True)

# assert that every zone name in fb_data also appears in the shapefile-derived zone_popn
# NOTE: zone_popn is built in the rasterstats section, so that cell must be run before this assertion
assert set(fb_data['polygon_name'].unique()) - set(zone_popn.keys()) == set()
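The string slicing used above for the Oxford dates can also be expressed with pandas' date parsing. This is a minimal sketch on toy data, assuming the `Date` column holds integers of the form YYYYMMDD (as the slicing implies); the `toy` frame is illustrative only.

```python
import pandas as pd

# Toy stand-in for the OxCGRT `Date` column (integers such as 20200324).
toy = pd.DataFrame({"Date": [20200324, 20200401]})

# Parse YYYYMMDD, then format back to the ISO string form used by the other sources.
toy["Date"] = pd.to_datetime(toy["Date"].astype(str), format="%Y%m%d").dt.strftime("%Y-%m-%d")
```

Parsing with an explicit format has the side benefit of raising an error on malformed values instead of silently producing a badly sliced string.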

Let's prefix the column names with `GCMR_`, `FB_` and `OXCGRT_` for the Google mobility data, Facebook data and Oxford policy tracker data respectively. This makes it easy to tell which source each column came from after merging.

In [174]:
# GCMR for Google Community Mobility Report
google_data.columns = ['GCMR_' + i for i in google_data.columns]

# FB for Facebook Movement Range Maps
fb_data.columns = ['FB_' + i for i in fb_data.columns]

# OXCGRT for Oxford COVID-19 Government Response Tracker
oxford_data.columns = ['OXCGRT_' + i for i in oxford_data.columns]
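As an alternative to rebuilding the column list by hand, pandas offers `DataFrame.add_prefix`, which returns a new frame with every column renamed. A small sketch on a toy frame (the data is made up for illustration):

```python
import pandas as pd

# Toy frame standing in for one of the source tables.
toy = pd.DataFrame({"date": ["2020-03-24"], "value": [1.0]})

# add_prefix returns a renamed copy; the original frame is left unchanged.
prefixed = toy.add_prefix("GCMR_")
```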

## Using rasterstats to calculate the zonal population

In [175]:
# load the zones shapefile
adm_2 = gp.read_file('../GIS Data/Old/admin_2.shp')

# run the zonal stats with "sum" stats
stats = zonal_stats(adm_2, 'population_npl_2018-10-01.tif', all_touched=True, stats=['sum'], geojson_out=True)

# get the zonal population and total population
zone_popn = dict((stat['properties']['ZONE_NAME'], stat['properties']['sum']) for stat in stats)
total_popn = sum(zone_popn.values())

In [178]:
# calculate population weighted variables
# first_var_dict => maps each date (key) to the population-weighted all_day_bing_tiles_visited_relative_change value
# second_var_dict => maps each date (key) to the population-weighted all_day_ratio_single_tile_users value

first_var_dict = dict(fb_data.groupby('FB_ds').apply(lambda x: sum(x['FB_all_day_bing_tiles_visited_relative_change'] * x['FB_polygon_name'].map(zone_popn)) / total_popn))
second_var_dict = dict(fb_data.groupby('FB_ds').apply(lambda x: sum(x['FB_all_day_ratio_single_tile_users'] * x['FB_polygon_name'].map(zone_popn)) / total_popn))
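To make the weighting concrete, here is a small worked sketch with two hypothetical zones. The zone names, populations, and ratios are invented for illustration; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical zone populations (illustrative numbers only).
toy_popn = {"ZoneA": 300, "ZoneB": 100}
toy_total = sum(toy_popn.values())

toy_fb = pd.DataFrame({
    "FB_ds": ["2020-03-24", "2020-03-24", "2020-03-25", "2020-03-25"],
    "FB_polygon_name": ["ZoneA", "ZoneB", "ZoneA", "ZoneB"],
    "FB_all_day_ratio_single_tile_users": [0.2, 0.6, 0.4, 0.8],
})

# Weight each zone's value by its population, sum per date, divide by total population.
weights = toy_fb["FB_polygon_name"].map(toy_popn)
weighted_sum = (toy_fb["FB_all_day_ratio_single_tile_users"] * weights).groupby(toy_fb["FB_ds"]).sum()
toy_weighted = (weighted_sum / toy_total).to_dict()
```

For the first date this gives (0.2 × 300 + 0.6 × 100) / 400, roughly 0.3, so the more populous zone pulls the national figure toward its own value. Note that, as in the notebook, the denominator is the total population of all zones even if a zone has no record on a given date.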

In [179]:
# get only the date, country, and other relevant columns
df = fb_data[['FB_ds', 'FB_country', 'FB_baseline_name', 'FB_baseline_type']].drop_duplicates()

# save the population weighted variables to previous (same) column names
df['FB_all_day_bing_tiles_visited_relative_change'] = df['FB_ds'].map(first_var_dict)
df['FB_all_day_ratio_single_tile_users'] = df['FB_ds'].map(second_var_dict)

## Outer merge Google data, Oxford data and Facebook data on date

Let's create a function <i>get_a_or_b</i> that returns the value in column <i>a</i> if it is non-null, and otherwise the value in column <i>b</i>.

In [180]:
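def get_a_or_b(row, a, b):
    """Return row[a] if it is non-null, otherwise row[b] (None if both are null)."""
    # replace nulls with empty strings so that they evaluate as False
    row = row.fillna('')

    if row[a]:
        return row[a]
    elif row[b]:
        return row[b]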
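The same "first non-null value" logic can be written without a row-wise `apply` by using `Series.combine_first`. The sketch below uses a toy frame with the notebook's date column names; it is an alternative formulation, not the code the notebook runs.

```python
import pandas as pd

# Toy frame: each row has a date from at least one of the two sources.
toy = pd.DataFrame({
    "GCMR_date": ["2020-03-24", None, "2020-03-26"],
    "OXCGRT_Date": ["2020-03-24", "2020-03-25", None],
})

# Take GCMR_date where present, otherwise fall back to OXCGRT_Date.
toy["Date"] = toy["GCMR_date"].combine_first(toy["OXCGRT_Date"])
```

Unlike the truthiness test in <i>get_a_or_b</i>, this approach only treats genuinely missing values as absent, so it would also be safe for numeric columns that can contain 0.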

In [181]:
merged_df = pd.merge(google_data, oxford_data, how='outer', left_on=['GCMR_date'], right_on=['OXCGRT_Date'])
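An outer merge keeps dates that appear in only one source, leaving the other source's columns null for those rows. The toy sketch below illustrates this and uses the `indicator=True` option of `pd.merge` to show where each row came from; the frames and values are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"GCMR_date": ["2020-03-24", "2020-03-25"], "GCMR_x": [1, 2]})
right = pd.DataFrame({"OXCGRT_Date": ["2020-03-25", "2020-03-26"], "OXCGRT_y": [10, 20]})

# indicator=True adds a `_merge` column with values left_only, right_only, or both.
toy_merged = pd.merge(
    left, right, how="outer",
    left_on="GCMR_date", right_on="OXCGRT_Date",
    indicator=True,
)
```

Here the date present only in `right` has a null `GCMR_date`, which is why the notebook builds a combined `Date` column after merging.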

Apply <i>get_a_or_b</i> to the two date columns and save the result in a new column, `Date`. After an outer merge, either date column can be null for a given row, so `Date` holds whichever one is present. Then, delete the column `GCMR_date`.

In [182]:
merged_df['Date'] = merged_df.apply(get_a_or_b, args=('GCMR_date', 'OXCGRT_Date'), axis=1)

merged_df.drop(['GCMR_date'], axis=1, inplace=True)

Repeat the steps for the Facebook data: merge, take the non-null date, save it in the `Date` column, and drop the redundant date column, `FB_ds`.

In [183]:
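final_merged_df = pd.merge(merged_df, df, how='outer', left_on=['Date'], right_on=['FB_ds'])

# apply to final_merged_df (not merged_df), since only the newly merged frame has the FB_ds column
final_merged_df['Date'] = final_merged_df.apply(get_a_or_b, args=('Date', 'FB_ds'), axis=1)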

final_merged_df.drop(['FB_ds'], axis=1, inplace=True)

## Export the final merged dataframe

In [190]:
final_merged_df.to_csv('Nepal_FB_Google_OXCGRT_{}.csv'.format(int(time.time())), index=False)