# Dropping appliance survey non-responses

We are noticing a significant number of non-responses to questions about current and future ownership of appliances.  

- How do the percentages of households surveyed change for each access category when we exclude non-responses?


In [1]:
import pandas as pd
import pysentani as sti
from drs_sentani import get_survey

# Load the survey and label each household with its electricity access category
survey = get_survey()
survey['access_type'] = sti.access_type(survey)


This gives us a count of all the survey results, including rows with null values. Because `count` skips nulls, we either have to replace the null values first or use `value_counts` on `access_type`.

In [2]:
appliance_data = survey[['app_now/TV', 'app_buy/TV', 'access_type']]
appliance_data_filled_nulls = appliance_data.fillna('no response')
adfn = appliance_data_filled_nulls.groupby('access_type').count()
adfn

access_type,app_now/TV,app_buy/TV
PLN_grid,619,619
PLN_microgrid,170,170
community_microgrid,54,54
no_access,341,341


In [3]:
appliance_data['access_type'].value_counts()

PLN_grid               619
no_access              341
PLN_microgrid          170
community_microgrid     54
Name: access_type, dtype: int64

We can also use `count` on the unfilled data to find the number of valid (non-null) responses in each column.

In [4]:
appliance_data.groupby('access_type').count()

access_type,app_now/TV,app_buy/TV
PLN_grid,568,426
PLN_microgrid,158,117
community_microgrid,52,46
no_access,257,209


After dropping every row that has a null in either column, we can count the number of survey responses remaining.

In [5]:
appliance_data_no_nulls = appliance_data.dropna()
adnn = appliance_data_no_nulls.groupby('access_type').count()
adnn

access_type,app_now/TV,app_buy/TV
PLN_grid,417,417
PLN_microgrid,115,115
community_microgrid,45,45
no_access,182,182
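
To connect these counts back to the opening question, here is a minimal sketch that compares each access category's share of the sample before and after dropping non-responses. The counts are copied by hand from the outputs above rather than recomputed from `survey`, and the names `counts` and `shares` are introduced here only for illustration.

```python
import pandas as pd

# Counts copied from the outputs above:
# all surveyed rows vs. rows left after dropna()
counts = pd.DataFrame(
    {'all_rows': [619, 170, 54, 341], 'after_dropna': [417, 115, 45, 182]},
    index=pd.Index(
        ['PLN_grid', 'PLN_microgrid', 'community_microgrid', 'no_access'],
        name='access_type',
    ),
)

# Share of the sample held by each access category, before and after dropping
shares = counts / counts.sum()

# Positive values mean the category is over-represented after dropping
shares['change'] = shares['after_dropna'] - shares['all_rows']
print(shares.round(3))
```

With these counts, the `no_access` share falls from roughly 29% of the sample to roughly 24%, while the other three categories each gain share.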


We can then combine these to see what fraction of survey results remains in each access category once non-responses are dropped. One minus this value is the fraction we lose.

(There may be a cleaner way to do this using `isnull` and an aggregation.)

In [6]:
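# _x columns: all surveyed rows (nulls filled); _y columns: rows with both answers present
merged = pd.merge(adfn, adnn, left_index=True, right_index=True)
# Despite its name, this column holds a fraction between 0 and 1, not a percentage
merged['percent_remaining'] = merged['app_buy/TV_y'] / merged['app_buy/TV_x']
merged[['app_now/TV_x', 'app_now/TV_y', 'percent_remaining']]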

access_type,app_now/TV_x,app_now/TV_y,percent_remaining
PLN_grid,619,417,0.673667
PLN_microgrid,170,115,0.676471
community_microgrid,54,45,0.833333
no_access,341,182,0.533724
