## Optional: Retrieval of timestamps from calldata filenames to run analysis in a loop

This notebook supports case studies in which mobile phone records are provided as gzip-compressed .txt files whose filenames contain a timestamp (e.g. "DDD_21_20210925001002.txt.gz", or more generally "DDD_YY_YYYYMMDDhhmmss.txt.gz").
Based on a chosen time interval of analysis (e.g. days, weeks or months), the notebook creates a dictionary of start and end timestamps for each interval, using the datetime format of the raw data filenames. This makes it easier to run the scripts provided in this repository in a loop over the time intervals of analysis.
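As a minimal sketch of the filename convention described above, the timestamp part "YYYYMMDDhhmmss" corresponds to the `strptime` pattern `%Y%m%d%H%M%S`. The helper `parse_filename_timestamp` is hypothetical and assumes every file follows the "DDD_YY_YYYYMMDDhhmmss.txt.gz" layout exactly; the notebook itself keeps the timestamps as plain strings.

```python
from datetime import datetime

def parse_filename_timestamp(filename: str) -> datetime:
    """Hypothetical helper: parse the timestamp of a 'DDD_YY_YYYYMMDDhhmmss.txt.gz' filename."""
    # strip the fixed prefix "DDD_YY_" and the suffix ".txt.gz" (7 characters each)
    stem = filename[len("DDD_YY_"):-len(".txt.gz")]
    return datetime.strptime(stem, "%Y%m%d%H%M%S")

ts = parse_filename_timestamp("DDD_21_20210925001002.txt.gz")
print(ts)  # 2021-09-25 00:10:02
```

Because the string format is zero-padded and ordered from year to second, sorting the raw strings gives the same order as sorting the parsed datetimes, which is why plain string sorting is sufficient later on.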

Load required packages

In [None]:
import pandas as pd
import os

Specify path to data

In [None]:
# path to the temporally sorted *.gz files with the filename structure "DDD_YY_YYYYMMDDhhmmss.txt.gz"
path_to_zipped_calldata_on_server_from_notebook_working_path = '../data/0_input_data/calldata/zipped'
# path to the .csv file (';'-separated, no header) containing the daily timestamps (columns) of each time interval of analysis (rows) in the datetime format 'YYYYMMDD'
path_to_timestamp_list                                       = '../data/0_input_data/timestamps/timestamps.csv'

Create list of timestamps present in the filenames of the raw data.

In [None]:
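timestamps_from_zipped_files = []
# os.listdir returns entries in arbitrary order, so sort the filenames first
for zipped_file in sorted(os.listdir(path_to_zipped_calldata_on_server_from_notebook_working_path)):
    if zipped_file.startswith("DDD") and zipped_file.endswith(".txt.gz"):
        # strip the 7-character prefix "DDD_YY_" and the 7-character suffix ".txt.gz",
        # leaving the timestamp "YYYYMMDDhhmmss"
        timestamps_from_zipped_files.append(zipped_file[7:-7])
print(timestamps_from_zipped_files)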
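The cell above relies on fixed character positions. If the data folder may also contain other files, a regular expression can make the extraction stricter. This is a sketch under the assumption that valid names consist of "DDD_", two digits, "_", a 14-digit timestamp and ".txt.gz"; `FILENAME_PATTERN` and `extract_timestamps` are hypothetical names, and the example filenames are made up for illustration.

```python
import re

# Hypothetical stricter pattern: "DDD_", two digits, "_", 14-digit timestamp, ".txt.gz"
FILENAME_PATTERN = re.compile(r"^DDD_\d{2}_(\d{14})\.txt\.gz$")

def extract_timestamps(filenames):
    """Return the sorted 14-digit timestamps of all filenames that match the pattern."""
    timestamps = []
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match:
            timestamps.append(match.group(1))  # captured 'YYYYMMDDhhmmss'
    return sorted(timestamps)

example_files = [
    "DDD_21_20210925001002.txt.gz",
    "DDD_21_20210924235501.txt.gz",
    "README.md",
    "DDD_21_notes.txt",
]
print(extract_timestamps(example_files))  # ['20210924235501', '20210925001002']
```

In the notebook, the list would be obtained with `extract_timestamps(os.listdir(path_to_zipped_calldata_on_server_from_notebook_working_path))`.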

Load the timestamps for each interval: each row represents one interval and each column contains one date ('YYYYMMDD'). Here the interval of analysis is a single day, so each row contains only one timestamp.

In [None]:
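# read as strings so dates are not converted to numbers (or to floats when rows have different lengths)
timestamps = pd.read_csv(path_to_timestamp_list, delimiter=';', header=None, dtype=str)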
timestamps.head()
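If the interval file does not exist yet, it can be generated with pandas. The following is a sketch, assuming consecutive, non-overlapping intervals of a fixed number of days; `build_interval_table` is a hypothetical helper and the date range is illustrative only. A last interval that is shorter than the others is padded with missing values.

```python
import pandas as pd

def build_interval_table(start: str, end: str, days_per_interval: int) -> pd.DataFrame:
    """Hypothetical helper: one row per interval, one 'YYYYMMDD' string per column."""
    days = pd.date_range(start=start, end=end, freq="D").strftime("%Y%m%d").tolist()
    # split the list of days into consecutive chunks of equal length
    rows = [days[i:i + days_per_interval] for i in range(0, len(days), days_per_interval)]
    return pd.DataFrame(rows)  # ragged last row is padded with NaN

weekly = build_interval_table("2021-09-20", "2021-10-03", days_per_interval=7)
print(weekly.to_csv(sep=";", header=False, index=False))
```

Writing the table with `weekly.to_csv(path_to_timestamp_list, sep=';', header=False, index=False)` would produce a file in the layout this notebook reads (semicolon-separated, no header, no index); with `days_per_interval=1` it reproduces the daily case.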

Create a dictionary in which each key identifies a time interval of analysis and each value is a tuple of the first and last timestamp found for that interval, in the datetime format used in the raw data filenames.

In [None]:
timestamps_dict = {}
for index, row in timestamps.iterrows():
    # dates of this interval as 'YYYYMMDD' strings (empty cells of shorter rows are dropped)
    matchers_string = {str(i) for i in row.dropna()}
    # keep the file timestamps whose date part (first 8 characters) belongs to this interval
    matching = sorted(s for s in timestamps_from_zipped_files if s[:8] in matchers_string)
    if matching:
        timestamps_dict['day_' + str(index)] = (matching[0], matching[-1])

# Print dictionary
timestamps_dict
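To check the matching step in isolation, the loop can be written as a plain function and run on a few made-up timestamps. This is a sketch, not the notebook's own code: `build_interval_dict` is a hypothetical name, and it assumes a file belongs to an interval when the first eight characters of its timestamp (the date) equal one of the interval's dates. Intervals without any matching file are skipped, as in the notebook.

```python
def build_interval_dict(file_timestamps, intervals, prefix="day_"):
    """Map each interval index to the (first, last) file timestamp whose date falls in it.

    file_timestamps: 'YYYYMMDDhhmmss' strings; intervals: lists of 'YYYYMMDD' strings.
    """
    result = {}
    for index, dates in enumerate(intervals):
        wanted = set(dates)
        # sorting the zero-padded strings orders them chronologically
        matching = sorted(ts for ts in file_timestamps if ts[:8] in wanted)
        if matching:
            result[prefix + str(index)] = (matching[0], matching[-1])
    return result

files = ["20210925001002", "20210925235959", "20210926000500"]
print(build_interval_dict(files, [["20210925"], ["20210926"], ["20210927"]]))
# {'day_0': ('20210925001002', '20210925235959'), 'day_1': ('20210926000500', '20210926000500')}
```

Note that an interval with a single file gets the same timestamp as start and end, and that the empty third interval does not appear in the result.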

Store the dictionary as a CSV file

In [None]:
# one row per interval: key, first timestamp, last timestamp (no header row)
timestamps_df = pd.DataFrame.from_dict(timestamps_dict, orient='index')
timestamps_df.to_csv('../data/0_input_data/timestamps/timestamps_df.csv', header=False)
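The stored file can then drive a loop over the intervals of analysis. The sketch below assumes the file layout written above (comma-separated, no header, columns: interval key, start timestamp, end timestamp); the CSV content is made up and passed through `io.StringIO` so the example runs on its own, and `run_analysis` is a hypothetical placeholder for whichever repository script is called per interval.

```python
import io
import pandas as pd

def run_analysis(start: str, end: str) -> str:
    # Hypothetical placeholder for one of the repository's analysis scripts.
    return f"analysed {start} to {end}"

# Stand-in for '../data/0_input_data/timestamps/timestamps_df.csv' (no header: key, start, end)
csv_text = "day_0,20210925001002,20210925235959\nday_1,20210926000500,20210926235800\n"
intervals = pd.read_csv(io.StringIO(csv_text), header=None,
                        names=["interval", "start", "end"], dtype=str)

results = {}
for row in intervals.itertuples(index=False):
    results[row.interval] = run_analysis(row.start, row.end)
print(results)
```

Reading with `dtype=str` keeps the timestamps in exactly the string form used in the raw data filenames, so they can be compared with or substituted into filenames without conversion.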