# Introduction

This notebook computes basic data-coverage statistics for each village from the timestamp and message files.
It extends the analysis done in the 2017-07-04 notebook.

# Method

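- get the start and end date for each village
- calculate the number of days of data for each village
- create a boolean array marking each minute that has a valid uptime or downtime observation
- report coverage as the fraction of minutes marked valid (the mean of the boolean array)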
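
The boolean-array step can be sketched on synthetic data. This is a minimal illustration, not the `WP19_analysis` implementation: the ten-minute index and the `uptime` and `downtime` values are made up for the example, and the series are assumed to be at one-minute resolution.

```python
import pandas as pd

# Hypothetical ten-minute window at one-minute resolution
minutes = pd.date_range('2015-04-22 00:00', periods=10, freq='min')

# True where a timestamp was logged (assumed values for illustration)
uptime = pd.Series([1, 1, 1, 0, 0, 0, 0, 1, 1, 1], index=minutes, dtype=bool)
# True where a message records a known outage (assumed values for illustration)
downtime = pd.Series([0, 0, 0, 1, 1, 0, 0, 0, 0, 0], index=minutes, dtype=bool)

# A minute is valid if it has either an uptime or a downtime observation
valid = uptime | downtime

# The mean of a boolean series is the fraction of True values
coverage = valid.mean()
print(coverage)
```

Minutes with neither kind of observation (two of the ten here) are the ones that lower the coverage figure.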


# Results

- all villages have data coverage of 86% or greater; atamali is lowest at 86.3%
- ajau has the highest coverage at 99.4%
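
These figures can be checked against the coverage fractions in the results table. The sketch below copies those values by hand into a `pandas` Series; the variable names are chosen for the example only.

```python
import pandas as pd

# Coverage fractions copied from the results table in this notebook
coverage = pd.Series({
    'ajau': 0.993579,
    'asei': 0.968477,
    'atamali': 0.863161,
    'ayapo': 0.904552,
    'kensio': 0.92657,
})

# Convert fractions to percentages for reporting
coverage_pct = (coverage * 100).round(1)

# Villages with the lowest and highest coverage
lowest = coverage.idxmin()
highest = coverage.idxmax()

print(coverage_pct.sort_values())
print(lowest, highest)
```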


# Next Work


In [1]:
%load_ext autoreload

In [2]:
%autoreload 2
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import WP19_analysis as wpa

def wpa_valid_coverage_percentage(energy_data, messages):
    # A minute is valid if it falls in a recorded gap (downtime message)
    # or has a timestamp (uptime).
    downtime = wpa.create_downtime_boolean_message(energy_data, messages)
    uptime = wpa.create_uptime_boolean_timestamp(energy_data)
    valid_observations = uptime | downtime
    # Despite the function name, this returns a fraction between 0 and 1,
    # not a percentage.
    return valid_observations.mean()

In [3]:
dd = {}
for rfd in wpa.raw_file_data:
    vname = rfd['village_name']
    energy_data = wpa.load_timeseries_file(vname + '-clean.csv')
    messages = wpa.load_message_file(vname + '-messages.csv')
    start_date = wpa.get_start_time(energy_data)
    end_date = wpa.get_end_time(energy_data)
    # duration in fractional days
    duration = (end_date - start_date) / np.timedelta64(1, 'D')
    dd[vname] = {'start_date': start_date,
                 'end_date': end_date,
                 'duration': duration,
                 'coverage': wpa_valid_coverage_percentage(energy_data, messages)}

stats = pd.DataFrame(dd).T[['start_date', 'end_date', 'duration', 'coverage']]
stats

village,start_date,end_date,duration,coverage
ajau,2015-04-22 00:00:00,2015-08-28 11:32:00,128.481,0.993579
asei,2015-04-22 00:00:00,2015-07-09 16:32:00,78.6889,0.968477
atamali,2015-04-24 17:46:00,2015-08-26 19:20:00,124.065,0.863161
ayapo,2015-04-22 11:54:00,2015-08-27 22:17:00,127.433,0.904552
kensio,2015-05-11 18:17:00,2015-08-21 22:57:00,102.194,0.92657
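
As a quick check of the `duration` column, which is in fractional days, the ajau row can be recomputed from its start and end timestamps. This is a standalone sketch that copies the two timestamps from the table rather than loading the data files.

```python
import numpy as np
import pandas as pd

# Start and end timestamps for ajau, copied from the table above
start_date = pd.Timestamp('2015-04-22 00:00:00')
end_date = pd.Timestamp('2015-08-28 11:32:00')

# Dividing a Timedelta by a one-day timedelta64 gives fractional days
duration = (end_date - start_date) / np.timedelta64(1, 'D')
print(round(duration, 3))
```

The span is 128 whole days plus 11 hours 32 minutes, which agrees with the tabulated value for ajau.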


In [4]:
# output for markdown outline
import tabulate
print(tabulate.tabulate(stats, tablefmt='pipe', 
      headers=['Village', 'Start Date', 'End Date', 'Duration', 'Coverage']))

| Village   | Start Date          | End Date            |   Duration |   Coverage |
|:----------|:--------------------|:--------------------|-----------:|-----------:|
| ajau      | 2015-04-22 00:00:00 | 2015-08-28 11:32:00 |   128.481  |   0.993579 |
| asei      | 2015-04-22 00:00:00 | 2015-07-09 16:32:00 |    78.6889 |   0.968477 |
| atamali   | 2015-04-24 17:46:00 | 2015-08-26 19:20:00 |   124.065  |   0.863161 |
| ayapo     | 2015-04-22 11:54:00 | 2015-08-27 22:17:00 |   127.433  |   0.904552 |
| kensio    | 2015-05-11 18:17:00 | 2015-08-21 22:57:00 |   102.194  |   0.92657  |
