# Production of indicators for the COVID19 Mobility Task Force

In this notebook we produce indicators for the [COVID19 Mobility Task Force](https://github.com/worldbank/covid-mobile-data).

[Flowminder](https://covid19.flowminder.org) indicators are produced to increase the availability of comparable datasets across countries. They have been copied from the [Flowminder COVID-19 GitHub repository](https://github.com/Flowminder/COVID-19) without modification, except for the start and end dates. They are supplemented by a set of *priority* indicators, which provide data for ingestion into the dashboard in this repository.

In this notebook we produce indicators in the following three steps:

- **Import code**: The code for the aggregation is included in the 'custom_aggregation' and 'flowminder_aggregation' scripts
- **Import data**: 
To set up the data import, place the CDR data files in the `data/new/CC/telco/` folder, replacing `CC` with the country code and `telco` with the company abbreviation. 
Also place the csv files with the tower-region mapping and distance matrices in the `data/support-data/CC/telco/geofiles` folder, then modify `data/support_data/config_file.py` to specify:
    - *geofiles*: the names of the geofiles, 
    - *country_code*: country code and company abbreviation,
    - *telecom_alias*: the path to the `data` folder,
    - *data_paths*: the names of the subfolders in `data/new/CC/telco/` that hold the csv files. Change this to `[*]` if you didn't create subfolders and want to load all files.
    - *dates*: set the start and end date of the data you want to produce the indicators for.
    
For more information about the `config_file.py` settings, see the [GitHub page](https://github.com/worldbank/covid-mobile-data/tree/master/cdr-aggregation).
    
- **Run aggregations**: By default, we produce all Flowminder and priority indicators. We've included 4 re-tries in case of failure, which in our experience helps on Databricks but is probably unnecessary in other settings. Before you re-run these aggregations, move the csv outputs saved in `data/results/CC/telco/` by previous runs to another folder; otherwise those indicators will be skipped. This prevents you from accidentally overwriting previous results. It also lets you move or delete only the files for the indicators you want to re-produce, so that all other indicators are skipped.
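
The skip-and-retry behaviour described in the last step can be sketched as follows. This is a minimal illustration, not the code in the aggregation scripts: the function `attempt_with_retries`, its parameters, and the return values are hypothetical names chosen for this example. It assumes that each indicator writes exactly one output file.

```python
import time
from pathlib import Path


def attempt_with_retries(produce, output_path, max_retries=4, wait_seconds=0.0):
    """Run `produce(output_path)` unless the output file already exists.

    Hypothetical helper: returns "skipped" or "done", and re-raises the
    last error once all re-tries are used up.
    """
    output_path = Path(output_path)
    if output_path.exists():
        # Existing results are never overwritten; move or delete them to re-run.
        return "skipped"

    last_error = None
    # One initial attempt plus `max_retries` re-tries.
    for _ in range(max_retries + 1):
        try:
            produce(output_path)
            return "done"
        except Exception as error:  # deliberately broad for this sketch
            last_error = error
            time.sleep(wait_seconds)
    raise last_error
```

Checking for the output file before doing any work is what makes a re-run cheap: indicators with existing results return immediately, and only the missing ones are recomputed.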

The outcome of this effort will be used to inform policy making using a [mobility indicator dashboard](https://github.com/worldbank/covid-mobile-data/tree/master/dashboard-dataviz).

# Import code

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from modules.DataSource import *

In [None]:
config_file = '../config_uga_draft.py'

In [None]:
exec(open(config_file).read())
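
The cell above runs the config file inside the notebook's own namespace. The sketch below shows one possible alternative using the standard-library `runpy` module, which executes the file in a fresh namespace and returns only the requested variable. The helper name `load_datasource_configs` is hypothetical. It also assumes that the config file is self-contained: if the file relies on names already defined in the notebook, it will not behave the same way as `exec`.

```python
import os
import runpy


def load_datasource_configs(config_path, variable="datasource_configs"):
    """Execute a Python config file in isolation and return one variable from it.

    Hypothetical helper; `datasource_configs` is the name the notebook
    passes to `DataSource` in the next cell.
    """
    # run_path returns the globals dictionary produced by executing the file.
    namespace = runpy.run_path(os.fspath(config_path))
    if variable not in namespace:
        raise KeyError(f"{config_path} does not define `{variable}`")
    return namespace[variable]
```

With this helper, the call would read `datasource_configs = load_datasource_configs(config_file)`, and nothing else defined in the config file would leak into the notebook.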

In [None]:
ds = DataSource(datasource_configs)
ds.show_config()

In [None]:
from modules.setup import *

# Import data
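
Before loading anything, it can help to confirm that the folder layout described in the introduction is in place. The sketch below is a hypothetical pre-flight check, not part of the `DataSource` class: the function names are made up for this example. The introduction spells the support folder both as `support-data` and `support_data`, so it is left as a parameter to set to whatever your checkout uses.

```python
from pathlib import Path


def expected_folders(data_root, country_code, telco, support_dir="support-data"):
    """Build the folder paths the notebook expects under the `data` folder."""
    data_root = Path(data_root)
    return {
        # Raw CDR csv files go here.
        "cdr": data_root / "new" / country_code / telco,
        # Tower-region mapping and distance matrices go here.
        "geofiles": data_root / support_dir / country_code / telco / "geofiles",
        # Indicator csv outputs from previous runs are saved here.
        "results": data_root / "results" / country_code / telco,
    }


def missing_folders(folders):
    """Return the names of the expected folders that do not exist yet."""
    return [name for name, path in folders.items() if not path.is_dir()]
```

For example, `missing_folders(expected_folders("data", "CC", "telco"))` lists whichever of the three folders still need to be created. A missing `results` folder is not necessarily a problem on a first run.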

## Load CDR data

### Process/standardize raw data, save as parquet, and then load it

In [None]:
ds.standardize_csv_files(show=True, str_coords = True)
ds.save_as_parquet()

In [None]:
ds.load_standardized_parquet_file()

## Load geo data

In [None]:
ds.load_geo_csvs()

# Run aggregations

## Priority indicators for admin2

In [None]:
# agg_priority_admin2 = priority_aggregator(result_stub = '/admin2',
#                                datasource = ds,
#                                re_create_vars  = True, # Needed to use sample parquet
#                                regions = 'admin2_tower_map')

# # agg_priority_admin2.attempt_aggregation(indicators_to_produce = {
# #   'transactions_per_hour' : ['transactions', 'hour'],
# #   'origin_destination_connection_matrix_per_day' : ['origin_destination_connection_matrix', 'day'],
# #   'origin_destination_matrix_time_per_day' : ['origin_destination_matrix_time', 'day']})

# agg_priority_admin2.attempt_aggregation(indicators_to_produce = {
#   'transactions_per_hour' : ['transactions', 'hour'] 
#     })

In [None]:
# agg_priority_admin2

## Priority indicators for admin3

In [None]:
!jupyter nbconvert --to script *.ipynb

In [None]:
test = pd.DataFrame([[np.nan,1, 2],[0,1,2]])

In [None]:
test = ds.spark.createDataFrame([[None,1, 1,2],[2,2,2,2]])

In [None]:
test.toPandas()

In [None]:
# Sum the numeric columns within each group of `_4`; Spark's sum skips nulls.
test.groupby('_4').sum().toPandas()

In [None]:
# Add `_1` and `_2` row by row; the result is null wherever either input is null.
test.withColumn('f', F.col('_1') + F.col('_2')).toPandas()