# Appliance hours detail

This notebook looks for discrepancies in the survey responses, particularly households that report owning a TV but do not report any usage of that TV.  For those households we have to choose an assumption:

- Do we assume that these folks use the TV the same as the other respondents?
- Do we assume that these households do not use the TV at all?
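
To see how much the choice matters, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical and are not taken from the survey; only the column names follow the ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five TV owners, two of whom left the hours question blank.
toy = pd.DataFrame({
    'app_now/TV': [1, 1, 1, 1, 1],
    'app_TV_hrs': [4.0, 2.0, np.nan, 6.0, np.nan],
})

owners = toy[toy['app_now/TV'] == 1]

# Assumption A: non-responding owners use the TV like the responding owners,
# so we average over the observed values only (pandas skips NaN by default).
mean_like_others = owners['app_TV_hrs'].mean()

# Assumption B: non-responding owners do not use the TV at all, so NaN becomes 0.
mean_zero_use = owners['app_TV_hrs'].fillna(0).mean()

print(mean_like_others, mean_zero_use)
```

In this toy case the second assumption gives a lower mean, because the two blanks are counted as zero hours.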

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import pysentani as sti
from drs_sentani import get_survey

survey = get_survey()
survey['access_type'] = sti.access_type(survey)

# we may want to do this more fine-grained so we know what we are omitting
#survey = survey.fillna(0)

In [12]:
survey['app_now/TV'].value_counts(dropna=False)

 1     844
 0     191
NaN    149
Name: app_now/TV, dtype: int64

## Non-responses from TV owners

Households with no TV (191) and households that did not answer the TV ownership question (149) should have skipped the usage questions.  So we expect 191 + 149 = 340 null responses to the hours question.

We count 373 null responses to the hours question, meaning that if the survey tool is working correctly, 373 - 340 = 33 of them come from TV owners.  Assigning zeros to these responses would skew the results downward.  Also note that there are 374 nulls for per-week TV use.

In [14]:
survey['app_TV_hrs'].isnull().sum()

373

In [16]:
survey['app_TV_per_wk'].isnull().sum()

374
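
The bookkeeping for this section can be written out as a small check. This is only a sketch: the numbers are copied by hand from the outputs in this notebook, and it assumes that every household without a TV, or without an answer to the ownership question, has a null in both usage columns.

```python
# Counts copied from the value_counts() and isnull().sum() outputs in this notebook.
tv_no, tv_nan = 191, 149
hrs_nulls, per_wk_nulls = 373, 374

# Nulls we can explain: households with no TV or no answer to the ownership question.
expected_nulls = tv_no + tv_nan

# Whatever is left over must come from households that said they own a TV.
owner_nulls_hrs = hrs_nulls - expected_nulls
owner_nulls_per_wk = per_wk_nulls - expected_nulls

print(expected_nulls, owner_nulls_hrs, owner_nulls_per_wk)
```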

In [23]:
ss = survey[['app_now/TV', 'app_TV_hrs', 'app_TV_per_wk', 'village_name']]
ss.head()

Unnamed: 0,app_now/TV,app_TV_hrs,app_TV_per_wk,village_name
0,1,4.0,7.0,Puai
1,1,2.0,7.0,Abar
2,1,3.0,7.0,Abar
3,1,2.0,2.0,Abar
4,0,,,Abar


## Masks to see discrepancies

The mask below finds the TV owners who did not respond to at least one of the hours or times-per-week questions.

In [31]:
mask = (ss['app_now/TV']==1) & ((ss['app_TV_hrs'].isnull()) | ss['app_TV_per_wk'].isnull())
ss[mask].head()

Unnamed: 0,app_now/TV,app_TV_hrs,app_TV_per_wk,village_name
20,1,,,Abar
106,1,,,Ebunfauw
197,1,,,Ebunfauw
205,1,,,Ebunfauw
288,1,6.0,,Puai


Iterating over all the appliances and counting the rows selected by this mask gives, for each appliance, the number of owners with missing usage data that we need to account for in our extrapolation.  While we should make this adjustment, it doesn't look like a large discrepancy.

In [45]:
appliances = ['TV', 'fridge', 'radio', 'fan', 'rice_cooker', 'lighting']

for a in appliances:
    mask = (survey['app_now/{}'.format(a)]==1) & (
            (survey['app_{}_hrs'.format(a)].isnull()) | 
             survey['app_{}_per_wk'.format(a)].isnull())
    print(a, survey[mask]['app_now/{}'.format(a)].count())

TV 34
fridge 5
radio 12
fan 3
rice_cooker 5
lighting 25
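
To judge whether these counts are large, it may help to put them next to the number of owners. The helper below is a hypothetical sketch, not part of `pysentani`; it is shown on made-up data and assumes the same `app_now/...`, `app_..._hrs` and `app_..._per_wk` column naming used above.

```python
import numpy as np
import pandas as pd

def missing_usage_summary(df, appliances):
    """Count owners of each appliance with a missing hours or per-week answer."""
    rows = []
    for a in appliances:
        owners = df['app_now/{}'.format(a)] == 1
        missing = (df['app_{}_hrs'.format(a)].isnull()
                   | df['app_{}_per_wk'.format(a)].isnull())
        n_owners = int(owners.sum())
        n_missing = int((owners & missing).sum())
        rows.append({
            'appliance': a,
            'owners': n_owners,
            'missing_usage': n_missing,
            # Share of owners with at least one missing usage answer.
            'share_missing': n_missing / n_owners if n_owners else float('nan'),
        })
    return pd.DataFrame(rows).set_index('appliance')

# Hypothetical toy data: three TV owners, two of them with a missing usage answer.
toy = pd.DataFrame({
    'app_now/TV': [1, 1, 0, 1],
    'app_TV_hrs': [4.0, np.nan, np.nan, 2.0],
    'app_TV_per_wk': [7.0, np.nan, np.nan, np.nan],
})
summary = missing_usage_summary(toy, ['TV'])
print(summary)
```

On the real data this would be called as `missing_usage_summary(survey, appliances)`. For TV, the counts above give 34 of 844 owners, or about 4%.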
