# Reading JSON files and aggregated Telia data

In [21]:
import json
import pandas as pd

Defining a function to read any JSON file into a Python dictionary

In [2]:
def open_json_file(file):
    '''
    Parse a JSON file and return a Python dictionary

    Parameters
    ----------
    file : string
        The path to the JSON file

    Returns
    -------
    dict
        The parsed JSON content as a Python dictionary
    '''
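    # open() itself raises FileNotFoundError if the path does not exist
    with open(file, 'r') as f:
        python_dic = json.load(f)
    print('Parsed JSON file: %s' % (file, ))

    # json.load returns None only when the file holds the JSON value null
    if python_dic is None:
        raise ValueError('The specified file contains no data!')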

    return python_dic

Reading the HYKS JSON file as an example

In [5]:
hyks_data = open_json_file('hyks.json')

Parsed JSON file: hyks.json
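
To try the reading step without access to `hyks.json`, a small round trip through a temporary file can help. This is only a sketch: the `sample` dictionary and the file names are made up for illustration and just mimic the shape of the real data (every value is a list). It also shows that a missing path already fails inside `open()`.

```python
import json
import os
import tempfile

# Hypothetical sample with the same shape as the real file: every value is a list
sample = {'date': ['2021-03-01', '2021-03-02'], 'Rt': [1.1, 1.2]}

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.json')

    # Write the dictionary to disk and read it back
    with open(sample_path, 'w') as f:
        json.dump(sample, f)
    with open(sample_path, 'r') as f:
        loaded = json.load(f)

    # A path that does not exist makes open() raise FileNotFoundError
    try:
        with open(os.path.join(tmp_dir, 'missing.json'), 'r') as f:
            json.load(f)
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(loaded == sample)
print(missing_raises)
```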


Checking the keys of the dictionary, i.e. the data it contains. All the values of the dictionary are lists.

In [7]:
print(hyks_data.keys())

dict_keys(['date', 'cases', 'hospitalized', 'in_icu', 'deaths', 'new_deaths', 'new_cases', 'Rt', 'Rt_lower', 'Rt_upper', 'Rt_lower50', 'Rt_upper50', 'Rt_lower90', 'Rt_upper90', 'new_cases_uks', 'new_cases_uks_lower50', 'new_cases_uks_upper50', 'new_cases_uks_lower90', 'new_cases_uks_upper90', 'new_cases_uks_lower', 'new_cases_uks_upper'])
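
The claim that every value is a list can be checked programmatically. The sketch below uses a made-up `sample` dictionary with three of the keys listed above and invented values; with the real data, `hyks_data` would take its place.

```python
# Hypothetical stand-in for hyks_data; the values are invented
sample = {
    'date': ['2021-02-28', '2021-03-01', '2021-03-02'],
    'cases': [10, 12, 15],
    'Rt': [1.10, 1.14, 1.12],
}

# True if every value in the dictionary is a list
all_lists = all(isinstance(value, list) for value in sample.values())

# Length of each list, keyed by field name
lengths = {key: len(value) for key, value in sample.items()}

# True if all the lists have the same number of entries
same_length = len(set(lengths.values())) == 1

print(all_lists, same_length, lengths)
```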


For convenience, declaring variables for Rt and date

In [9]:
rt_hyks = hyks_data['Rt']
date_hyks = hyks_data['date']

Getting the index where a given `date` is located. There is a one-to-one correspondence between all the lists in the dictionary: the $i^{th}$ entry of the list `date` corresponds to the $i^{th}$ entry of the list `Rt`.

In [19]:
date_idx = date_hyks.index('2021-03-01')
print(rt_hyks[date_idx])

1.13738929797133
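
Because the lists are aligned entry by entry, the same lookup can also be done by pairing dates with values, or by turning the dictionary into a pandas DataFrame indexed by date. This is a sketch with invented numbers in a hypothetical `sample` dictionary, not the real HYKS values. Note that `list.index` raises `ValueError` when the date is absent, whereas `dict.get` returns `None`.

```python
import pandas as pd

# Hypothetical stand-in for hyks_data; the values are invented
sample = {
    'date': ['2021-02-28', '2021-03-01', '2021-03-02'],
    'Rt': [1.10, 1.14, 1.12],
}

# Option 1: pair each date with its Rt value
rt_by_date = dict(zip(sample['date'], sample['Rt']))
rt_from_dict = rt_by_date['2021-03-01']

# Option 2: a DataFrame indexed by date keeps all the lists aligned
sample_df = pd.DataFrame(sample).set_index('date')
rt_from_df = sample_df.loc['2021-03-01', 'Rt']

# A date that is not in the data gives None instead of an exception
rt_missing = rt_by_date.get('1999-01-01')

print(rt_from_dict, rt_from_df, rt_missing)
```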


Opening the file aggregated by ERVA using pandas. I stored the file in CSV format because it is a format that most programming languages can read.

The fields should be self-explanatory. Just note that there can also be trips within the same area, i.e. rows where `origin_erva` equals `dest_erva`.

In [22]:
file_path = '/m/cs/scratch/cv19-telia/poncea2/telia_data_erva.csv'
telia_mobility_df = pd.read_csv(file_path, sep=",")
telia_mobility_df

,date,origin_erva,dest_erva,hour,trips_sum
0,2019-02-01,HYKS,HYKS,00-05,206218
1,2019-02-01,HYKS,HYKS,06-11,1723584
2,2019-02-01,HYKS,HYKS,12-17,2140500
3,2019-02-01,HYKS,HYKS,18-23,893553
4,2019-02-01,HYKS,KYS,00-05,1018
...,...,...,...,...,...
13805,2020-03-31,TAYS,TYKS,00-05,22
13806,2020-03-31,TAYS,TYKS,06-11,29
13807,2020-03-31,TAYS,TYKS,12-17,20
13808,2020-03-31,TAYS,TYKS,18-23,5
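
If the `date` column is needed as real dates rather than strings, `pd.read_csv` can parse it while loading. The sketch below reads three of the rows shown above from an in-memory string instead of the real file, and it assumes a CSV without a stored index column.

```python
import io

import pandas as pd

# Three rows copied from the table above, as in-memory CSV text
csv_text = """date,origin_erva,dest_erva,hour,trips_sum
2019-02-01,HYKS,HYKS,00-05,206218
2019-02-01,HYKS,HYKS,06-11,1723584
2019-02-01,HYKS,KYS,00-05,1018
"""

# parse_dates converts the listed columns to datetime64 while reading
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

print(sample_df.dtypes)
print(sample_df['trips_sum'].sum())
```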


Example: Getting rid of the rows that have the same area as origin and destination

In [27]:
telia_mobility_dfalt = telia_mobility_df[telia_mobility_df['origin_erva'] != telia_mobility_df['dest_erva']]
telia_mobility_dfalt

,date,origin_erva,dest_erva,hour,trips_sum
4,2019-02-01,HYKS,KYS,00-05,1018
5,2019-02-01,HYKS,KYS,06-11,7747
6,2019-02-01,HYKS,KYS,12-17,15700
7,2019-02-01,HYKS,KYS,18-23,4114
8,2019-02-01,HYKS,OYS,00-05,264
...,...,...,...,...,...
13805,2020-03-31,TAYS,TYKS,00-05,22
13806,2020-03-31,TAYS,TYKS,06-11,29
13807,2020-03-31,TAYS,TYKS,12-17,20
13808,2020-03-31,TAYS,TYKS,18-23,5
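
The same boolean mask can be reused to split the data into trips between areas and trips within an area. A minimal sketch on four rows taken from the tables above; `DataFrame.query` is shown as an equivalent way to write the filter.

```python
import pandas as pd

# Four rows copied from the tables above
sample_df = pd.DataFrame({
    'date': ['2019-02-01', '2019-02-01', '2019-02-01', '2020-03-31'],
    'origin_erva': ['HYKS', 'HYKS', 'HYKS', 'TAYS'],
    'dest_erva': ['HYKS', 'KYS', 'OYS', 'TYKS'],
    'hour': ['00-05', '00-05', '00-05', '18-23'],
    'trips_sum': [206218, 1018, 264, 5],
})

# True for rows where origin and destination differ
between_mask = sample_df['origin_erva'] != sample_df['dest_erva']

between_areas = sample_df[between_mask]   # trips from one area to another
within_area = sample_df[~between_mask]    # trips inside a single area

# The same filter written as a query string
between_query = sample_df.query('origin_erva != dest_erva')

print(len(between_areas), len(within_area))
```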


Example: Aggregating by date, i.e. summing the trips over the hour slots of each day

In [28]:
groupby = ['date', 'origin_erva', 'dest_erva']
# Sum only the trip counts; the string column `hour` should not be summed
telia_mob_day = telia_mobility_dfalt.groupby(by=groupby)[['trips_sum']].sum()
telia_mob_day

date,origin_erva,dest_erva,trips_sum
2019-02-01,HYKS,KYS,28579
2019-02-01,HYKS,OYS,6591
2019-02-01,HYKS,TAYS,70564
2019-02-01,HYKS,TYKS,32801
2019-02-01,HYKS,Åland,146
...,...,...,...
2020-03-31,HYKS,TYKS,183
2020-03-31,TAYS,HYKS,1486
2020-03-31,TAYS,KYS,8
2020-03-31,TAYS,TYKS,76
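
As a sanity check, the four hourly HYKS to KYS rows of 2019-02-01 shown earlier should add up to the daily value in the table above. The sketch below rebuilds those four rows by hand; `as_index=False` is an optional choice that keeps the grouping keys as ordinary columns instead of a MultiIndex.

```python
import pandas as pd

# The four hourly HYKS -> KYS rows for 2019-02-01, copied from the earlier table
hourly_df = pd.DataFrame({
    'date': ['2019-02-01'] * 4,
    'origin_erva': ['HYKS'] * 4,
    'dest_erva': ['KYS'] * 4,
    'hour': ['00-05', '06-11', '12-17', '18-23'],
    'trips_sum': [1018, 7747, 15700, 4114],
})

group_cols = ['date', 'origin_erva', 'dest_erva']

# Sum the trip counts per day and origin-destination pair
daily_df = hourly_df.groupby(group_cols, as_index=False)['trips_sum'].sum()

print(daily_df)
```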


In [2]:
import numpy as np

In [3]:
a = np.arange(10)

In [5]:
a[2]

2

In [6]:
np.round(0.5)

0.0