# Introduction

- some of the gaps in our data coincide with an increase in recorded energy consumption
- this complicates the decision to ignore or discard them
- this notebook creates a catalog of gaps and any energy consumption during each gap

# Method

- take a discrete difference of the kWh export and timestamp index
- create a column from the index so that you get both diffs
- create a structure with start time, duration, and energy consumption
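
The steps above can be sketched on a small made-up series. The toy readings, the variable names, and the renamed columns (`duration`, `energy_kWh`) are assumptions for illustration only; the notebook's own function further down keeps the original column names.

```python
import pandas as pd

# Toy cumulative kWh readings, one per minute, with a single 8-minute gap.
index = pd.to_datetime([
    "2019-01-01 00:00", "2019-01-01 00:01", "2019-01-01 00:02",
    "2019-01-01 00:10", "2019-01-01 00:11",
])
toy = pd.DataFrame({"kWh export": [10.0, 10.0, 10.0, 11.0, 11.0]}, index=index)

# Copy the index into a column so that diff() covers both energy and time.
toy["timestamp"] = toy.index

# diff() gives the change from the previous row; shift(-1) moves each value
# back one row so that it sits on the row where the interval starts.
deltas = toy[["kWh export", "timestamp"]].diff().shift(-1)

# Catalog: start time (index), duration, and energy consumed during the gap.
gap_catalog = deltas[deltas["timestamp"] > pd.Timedelta(minutes=1)].rename(
    columns={"timestamp": "duration", "kWh export": "energy_kWh"}
)
print(gap_catalog)
```

In this toy case the catalog holds one row, indexed by the last reading before the gap.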

# Results

- There are many events where the kWh export value changes during a data gap. Counts per village:
    - `{'ajau': 608, 'asei': 55, 'atamali': 15, 'ayapo': 81, 'kensio': 3}`
- If we extend the definition of a gap to longer than two minutes we get:
    - `{'ajau': 40, 'asei': 15, 'atamali': 4, 'ayapo': 24, 'kensio': 3}`
- By looking at the percentiles, it appears that the vast majority of the kWh jumps are 1 kWh and there is only one large jump in each village.
- Also, many of these gaps are only two minutes long, which suggests that (invalid) data could be inserted in these gaps without incident.
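
As a rough illustration of the percentile check, the jump sizes for ajau can be rebuilt from the value counts printed in the loop output further down (gaps longer than three minutes). The variable names are assumptions, and this is only a sketch of one way to look at the distribution.

```python
import pandas as pd

# Jump sizes (kWh) for ajau, rebuilt from the printed value counts:
# 36 jumps of 1 kWh, plus single jumps of 2, 14, and 122 kWh.
jumps = pd.Series([1.0] * 36 + [2.0, 14.0, 122.0], name="kWh export")

# describe() reports the requested percentiles alongside min, max, and mean.
summary = jumps.describe(percentiles=[0.5, 0.9, 0.99])
print(summary)

# Fraction of gaps whose jump is exactly 1 kWh.
share_of_unit_jumps = (jumps == 1.0).mean()
print(f"share of 1 kWh jumps: {share_of_unit_jumps:.0%}")
```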

# Next Work

- consider using pandas intervals to make this cleaner and more explicit
- consider scatter plot with x-axis as duration and y-axis as energy to see patterns
- create a table with the total number of gaps and the number of gaps with kWh jumps
- there are lots of gaps of 2 minutes, which seem ripe for ignoring
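
One possible shape for the proposed summary table is sketched below. `summarize_gaps` is a hypothetical helper, not part of `WP19_analysis`, and the two gap frames are made-up data that only mimic the `kWh export` and `timestamp` columns returned by the gap function in this notebook.

```python
import pandas as pd

def summarize_gaps(gaps_by_village):
    """Build one row per village with gap counts (hypothetical helper)."""
    rows = []
    for village, gaps in gaps_by_village.items():
        has_jump = gaps["kWh export"] != 0
        is_two_minutes = gaps["timestamp"] == pd.Timedelta(minutes=2)
        rows.append({
            "village": village,
            "total_gaps": len(gaps),
            "gaps_with_kwh_jump": int(has_jump.sum()),
            "two_minute_gaps": int(is_two_minutes.sum()),
        })
    return pd.DataFrame(rows).set_index("village")

# Made-up gap frames, for illustration only.
toy_gaps = {
    "village_a": pd.DataFrame({
        "kWh export": [0.0, 1.0, 0.0],
        "timestamp": pd.to_timedelta(["2min", "15min", "2min"]),
    }),
    "village_b": pd.DataFrame({
        "kWh export": [2.0],
        "timestamp": pd.to_timedelta(["1h"]),
    }),
}
summary_table = summarize_gaps(toy_gaps)
print(summary_table)
```

Counting the two-minute gaps in the same table makes it easy to see how many rows would disappear if those gaps were ignored.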

In [2]:
%load_ext autoreload

In [3]:
%autoreload 2
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import WP19_analysis as wpa

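def wpa_get_energy_gaps_timestamp(energy_data):
    """Return rows that start a time gap, with the kWh change and gap duration."""
    # work on a copy of the kWh column so the caller's frame is not modified
    energy_data = energy_data[['kWh export']].copy()
    # expose the index as a column so diff() covers both energy and time
    energy_data['timestamp'] = energy_data.index
    # diff() looks backward; shift(-1) labels each interval by its start time
    tmp = energy_data.diff().shift(-1)
    # keep only intervals longer than one minute
    return tmp[tmp['timestamp'] > np.timedelta64(1, 'm')]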


In [4]:
vname = 'test'
energy_data = wpa.load_timeseries_file(vname + '-clean.csv')
messages = wpa.load_message_file(vname + '-messages.csv')


In [12]:
tmpdict = {}
for rfd in wpa.raw_file_data:
    vname = rfd['village_name']
    energy_data = wpa.load_timeseries_file(vname + '-clean.csv')
    messages = wpa.load_message_file(vname + '-messages.csv')
    print(vname)
    gaps = wpa_get_energy_gaps_timestamp(energy_data)
    # keep only gaps longer than three minutes
    gaps = gaps[gaps['timestamp'] > np.timedelta64(3, 'm')]
    # gaps during which the kWh export reading changed
    jumps = gaps[gaps['kWh export'] != 0]
    #print(jumps.describe())
    print(jumps['kWh export'].value_counts())
    print(jumps['timestamp'].value_counts())
    # number of gaps with a kWh jump for this village
    tmpdict[vname] = len(jumps)

ajau
1.0      36
2.0       1
14.0      1
122.0     1
Name: kWh export, dtype: int64
00:23:00    3
01:32:00    2
00:37:00    2
00:09:00    1
02:10:00    1
00:12:00    1
00:21:00    1
00:30:00    1
00:13:00    1
01:15:00    1
00:40:00    1
01:33:00    1
02:37:00    1
00:10:00    1
01:05:00    1
02:09:00    1
00:19:00    1
02:02:00    1
00:50:00    1
01:08:00    1
00:27:00    1
01:27:00    1
01:00:00    1
01:14:00    1
00:33:00    1
03:09:00    1
04:50:00    1
01:47:00    1
08:12:00    1
00:25:00    1
00:44:00    1
00:26:00    1
01:58:00    1
00:36:00    1
00:55:00    1
Name: timestamp, dtype: int64
asei
1.0      11
395.0     1
18.0      1
9.0       1
5.0       1
Name: kWh export, dtype: int64
0 days 00:18:00    1
0 days 00:04:00    1
0 days 00:13:00    1
0 days 00:25:00    1
2 days 01:50:00    1
0 days 02:11:00    1
0 days 00:53:00    1
0 days 01:02:00    1
0 days 00:16:00    1
0 days 02:47:00    1
0 days 00:43:00    1
0 days 00:52:00    1
0 days 00:45:00    1
0 days 04:40:00    1
0 days

In [11]:
tmpdict

{'ajau': 39, 'asei': 15, 'atamali': 4, 'ayapo': 23, 'kensio': 3}