This notebook compares the uptimes predicted by two methods: gaps between timestamps in the time series, and gaps between Power Down and Power Up messages.
It also aims to identify the source of any discrepancies and to determine which method is likely to be more accurate.

In [1]:
import pandas as pd
import WP19_analysis as wpa
import numpy as np

output = {}
for rdf in wpa.raw_file_data:
    input_file = rdf['village_name'] + '-clean.csv'
    energy_data = pd.read_csv(input_file, index_col=0, parse_dates=True)
    output[rdf['village_name']] = {'uptime percentage by timestamp gap': wpa.get_uptime_timestamps(energy_data).magnitude}

pd.DataFrame(output).T

11100720 second 11100720.0 second
6798720 second 6798720.0 second
10719240 second 10719240.0 second
11010180 second 11010180.0 second
8829600 second 8829600.0 second


village,uptime percentage by timestamp gap
ajau,0.956262
asei,0.913575
atamali,0.189092
ayapo,0.23107
kensio,0.089026


In [2]:
data = {}
for rfd in wpa.raw_file_data:
    # get durations from message file
    messages = wpa.load_message_file(rfd['village_name'] + '-messages.csv')
    durations = wpa.get_durations(messages)
    # get total duration from time series file
    observations = wpa.load_timeseries_file(rfd['village_name'] + '-clean.csv')
    total_duration = wpa.get_total_duration(observations)
    
    # total downtime in hours, computed once and reused below
    downtime = durations.sum().values[0]
    data[rfd['village_name']] = {'downtime': downtime,
                                 'total duration': total_duration,
                                 'uptime by message gap': 1 - downtime / total_duration}
    
pd.DataFrame(data).T

village,downtime,total duration,uptime by message gap
ajau,102.056111,3083.533333,0.966903
asei,101.501389,1888.533333,0.946254
atamali,2003.236389,2977.566667,0.327224
ayapo,2055.485278,3058.383333,0.327918
kensio,2053.530833,2452.666667,0.162735


In [4]:
def wpa_num_gaps_timestamp(energy_data):
    time_gaps = np.diff(energy_data.index.values) / np.timedelta64(1,'s')
    time_gaps = pd.Series(time_gaps)
    return len(time_gaps[time_gaps > 60.0])

def wpa_num_gaps_messages(messages):
    power_down = messages[messages['message']=='Power Down']
    power_up = messages[messages['message']=='Power Up']

    # first message should be a power down
    assert (power_down.index[0] < power_up.index[0]), 'first message not power down'
    # last message should be a power up
    assert (power_down.index[-1] < power_up.index[-1]), 'last message not power up'
    # should be same number of up and down messages
    assert (len(power_down) == len(power_up)), 'unequal up and down messages'

    return len(power_down)

def wpa_get_downtime_timestamps(energy_data):
    time_gaps = np.diff(energy_data.index.values) / np.timedelta64(1,'s')
    time_gaps = pd.Series(time_gaps)
    return time_gaps[time_gaps > 60.0].sum()


data = {}
for rdf in wpa.raw_file_data:
    energy_data = wpa.load_timeseries_file(rdf['village_name'] + '-clean.csv')
    messages = wpa.load_message_file(rdf['village_name'] + '-messages.csv')
    total_duration = wpa.get_total_duration(energy_data)
    data[rdf['village_name']] = {'total_duration': total_duration,
                                 'ts num gaps': wpa_num_gaps_timestamp(energy_data),
                                 # timestamp gaps are in seconds, so convert to hours
                                 'ts down hours': wpa_get_downtime_timestamps(energy_data) / 3600,
                                 'mess num gaps': wpa_num_gaps_messages(messages),
                                 # .iloc[0] selects by position; plain [0] on a labelled Series is deprecated
                                 'mess down hours': wpa.get_durations_messages(messages).sum().iloc[0]}

data_table = pd.DataFrame(data).T
data_table['ts up hrs'] = data_table['total_duration'] - data_table['ts down hours']
data_table['mess up hrs'] = data_table['total_duration'] - data_table['mess down hours']
data_table

village,mess down hours,mess num gaps,total_duration,ts down hours,ts num gaps,ts up hrs,mess up hrs
ajau,102.056111,112.0,3083.533333,134.866667,778.0,2948.666667,2981.477222
asei,101.501389,119.0,1888.533333,163.216667,215.0,1725.316667,1787.031944
atamali,2003.236389,98.0,2977.566667,2414.533333,233.0,563.033333,974.330278
ayapo,2055.485278,243.0,3058.383333,2351.683333,262.0,706.7,1002.898056
kensio,2053.530833,48.0,2452.666667,2234.316667,46.0,218.35,399.135833


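For the three low-uptime villages (atamali, ayapo, kensio) the downtime hours from the two methods agree to within roughly 10-20 percent. Because uptime is the small remainder of total duration minus downtime, I believe that same absolute difference leads to large uptime discrepancies (for atamali, 563 vs 974 uptime hours). For ajau and asei the downtime estimates differ by more in relative terms (about 135 vs 102 hours and 163 vs 102 hours), but both methods still put uptime above 90 percent.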
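
To make the arithmetic concrete, here is a small worked check using the atamali row of the table above. The variable names are mine and the numbers are copied from the table; this is only an illustration of why a modest downtime difference becomes a large uptime difference, not a new result.

```python
# Values copied from the atamali row of the table above (hours).
total_duration = 2977.566667
ts_down = 2414.533333    # downtime from timestamp gaps
mess_down = 2003.236389  # downtime from Power Down / Power Up messages

ts_up = total_duration - ts_down
mess_up = total_duration - mess_down

# The absolute difference is the same for downtime and for uptime ...
abs_diff = ts_down - mess_down

# ... but it is a much larger fraction of the (small) uptime.
rel_down = abs_diff / mess_down
rel_up = abs_diff / ts_up

print(f"absolute difference: {abs_diff:.1f} h")
print(f"relative to message downtime: {rel_down:.0%}")
print(f"relative to timestamp uptime: {rel_up:.0%}")
```

The same hours are divided by a denominator that is several times smaller, so the relative uptime gap is several times larger than the relative downtime gap.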

Also note that most villages have far more timestamp gaps than message gaps (778 vs 112 for ajau), so data issues rather than power issues could be skewing the timestamp numbers.
Maybe I could identify and gather statistics on these data gaps?
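
One possible way to gather those statistics is sketched below. `gap_statistics` is a hypothetical helper (it is not part of `WP19_analysis`), the 5-minute cutoff for "short" gaps is an arbitrary assumption, and the example runs on a synthetic one-minute series rather than the village data.

```python
import numpy as np
import pandas as pd

def gap_statistics(index, threshold_s=60.0, short_gap_s=300.0):
    """Summarise gaps between consecutive timestamps longer than threshold_s.

    Hypothetical helper, not part of WP19_analysis.
    """
    gaps = pd.Series(np.diff(index.values) / np.timedelta64(1, 's'))
    long_gaps = gaps[gaps > threshold_s]
    return {
        'count': len(long_gaps),
        'total hours': long_gaps.sum() / 3600,
        'median seconds': long_gaps.median(),
        'max seconds': long_gaps.max(),
        # Assumption: short gaps are more likely logging dropouts than outages.
        'count under short_gap_s': int((long_gaps < short_gap_s).sum()),
    }

# Synthetic one-minute series with two short dropouts and one long outage.
index = pd.date_range('2019-01-01', periods=600, freq='min')
index = index.delete([10, 11, 50] + list(range(100, 400)))

stats = gap_statistics(index)
print(pd.Series(stats))
```

If most of the extra timestamp gaps turn out to be short, that would suggest they are data dropouts that add little to the downtime total.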

Maybe the hours lost at the month boundaries could account for this 10-20 percent discrepancy?
Can I find the unmatched gaps directly?
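
A rough sketch of how the unmatched gaps might be found: list every timestamp gap as a (start, end) interval and keep those that do not overlap any Power Down to Power Up interval. `find_unmatched_gaps` is a hypothetical helper, and it assumes the messages have a DatetimeIndex, a `message` column, and strictly alternating Power Down and Power Up rows (the same conditions the assertions above check). The data here is synthetic.

```python
import numpy as np
import pandas as pd

def find_unmatched_gaps(index, messages, threshold_s=60.0):
    """Return timestamp gaps that overlap no Power Down -> Power Up interval.

    Hypothetical helper, not part of WP19_analysis.
    """
    times = pd.Series(index)
    gaps = pd.DataFrame({'start': times.shift(1), 'end': times}).dropna()
    gaps = gaps[(gaps['end'] - gaps['start']).dt.total_seconds() > threshold_s]

    down = messages.index[messages['message'] == 'Power Down']
    up = messages.index[messages['message'] == 'Power Up']

    def overlaps_outage(start, end):
        # Two intervals overlap when each one starts before the other ends.
        return bool(((down < end) & (up > start)).any())

    matched = np.array(
        [overlaps_outage(s, e) for s, e in zip(gaps['start'], gaps['end'])],
        dtype=bool,
    )
    return gaps[~matched]

# Synthetic series: a 21-minute gap with matching messages and a 3-minute gap without.
index = pd.date_range('2019-01-01', periods=120, freq='min')
index = index.delete(list(range(20, 40)) + [80, 81])
messages = pd.DataFrame(
    {'message': ['Power Down', 'Power Up']},
    index=pd.to_datetime(['2019-01-01 00:19:30', '2019-01-01 00:39:30']),
)

unmatched = find_unmatched_gaps(index, messages)
print(unmatched)
```

Summing the durations of the unmatched gaps, and checking whether they cluster near month boundaries, would be one way to test the question above.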