## GRID3 User Survey Raw Aggregate Responses

In [1]:
import pandas as pd
import numpy as np
from IPython.display import Markdown,display

def printmd(string):
    display(Markdown(string))

In [2]:
raw_df = pd.read_excel("mel_survey_data_analysis.xlsx")

In [3]:
def clean_countries(df):
    import string
    for i in [0,15]:
        df.iloc[:,i] = df.iloc[:,i].apply(lambda x: string.capwords(x.lower()))
    #combining the same object encoded differently
    df.iloc[:,0] = df.iloc[:,0].replace("Guinée","Guinee Conakry")
    return df

In [4]:
main_col_names = [
    'In which country do you work?',
    'Do you use geospatial software and datasets (json, shp, kml, gpx, etc...) in your work?',
    'Which of the following best characterizes your relationship with GRID3?',
    'How did you become familiar with GRID3-supported products and work (geospatial data, analysis, maps, training)?',
    'Please specify "other"',
    'GRID3 enables stakeholders to access geospatial information that is of a higher quality (or otherwise improved), than what is currently available outside the program',
    'GRID3 fills gaps or complements existing mechanisms for geospatial data collection, creation, and production',
    'Have you used GRID3-supported products (geospatial data, maps, analyses) in your professional work?',
    'How likely are you to use GRID3-supported products (geospatial data, maps, analyses) in the future?',
    'Overall, what is your opinion of GRID3?',
    'Do you know anyone outside your organization who is using GRID3-supported products (geospatial data, maps, analyses) in their professional work?']

In [5]:
branch1_names = [
    'What do you think are the MOST useful areas of GRID3 support?',
    'What do you think are the LEAST useful areas of GRID3 support?',
]

branch2_names = [
    'Have you used GRID3-supported products (geospatial data, maps, analyses) in your professional work?',
    'When did you first start engaging with or using GRID3-supported products (geospatial data, analysis, maps)?',
    'Please briefly describe the project(s) for which you used GRID3-supported products',
    'Which GRID3-supported geospatial data were used?',
    '',
    '',
    'Have you used GRID3-supported maps or data hubs?',
    'Please select the products you or your organization have used',
    'Please rate the user-experience of GRID3-supported maps, relative to other maps or methods',
    'How would you characterize the user-experience (ease of use, design, relevance) of GRID3-supported hubs, relative to other data hubs?',
    'GRID3-supported products correctly identify the intended target LOCATIONS for my organization’s program or service',
    'GRID3-supported products correctly identify the intended target POPULATION for my organization’s program or service',
    'What, if any, barrier(s) prevent the use of GRID3-supported products (geospatial data, maps, analyses)?']

In [6]:
#cleaning countries
working_df = clean_countries(raw_df)

In [7]:
working_df.iloc[:,94] = working_df.iloc[:,94].replace('Not sure','Not Sure')

In [8]:
def search_col_idx(df,name):
    return df.columns.get_loc(name)

In [9]:
def get_name_idx_pair(df,lst):
    ret_lst = []
    for name in lst:
        idx = search_col_idx(df,name)
        ret_lst.append((name,idx))
    return ret_lst

In [10]:
main_cols = get_name_idx_pair(working_df,main_col_names)

In [11]:
def get_df_from_info(info):
    return info[2]

In [47]:
def join_multiple_res(lst_infos,start,stop,parse_name=True):
    selected = lst_infos[start:stop]
    #print(selected[2][2])
    df = pd.DataFrame()
    for item in selected:

        res_df = get_df_from_info(item)
        
        if parse_name:
            import re
            # Multi-select columns are named "<question>?/<option>"; keep the option.
            name = item[1][item[1].find('?/'):][2:]
            if not name:
                # No "?/" marker: use the subject between "are"/"is" and the last comma.
                pattern = re.compile(r'(are|is) (.*),')
                name = pattern.search(item[1]).group(2).strip()
            name = ' || '.join((name,f'Total Count: {item[3]}'))
            # Add the label as the top level of a two-level column index.
            res_df.columns = pd.MultiIndex.from_product([[name],res_df.columns])
        else:
            res_df.columns = pd.MultiIndex.from_product([[item[1]],res_df.columns])
        
        #print(name)
        #display(res_df)
        df = pd.concat([df,res_df],axis=1)
        #print(res_df)
    return df
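
The label parsing inside `join_multiple_res` can be checked in isolation. The sketch below is a minimal stand-alone version of that step; `parse_label` is a hypothetical helper name, not part of the notebook, and the fallback to the full column name when nothing matches is an added assumption.

```python
import re

def parse_label(column_name):
    """Extract a short label from a survey column name (hypothetical helper)."""
    # Multi-select columns look like "<question>?/<option>": keep the option.
    marker = column_name.find('?/')
    if marker != -1:
        return column_name[marker + 2:]
    # Rating questions look like "How accurate are <subject>, relative to ...".
    match = re.search(r'(are|is) (.*),', column_name)
    # Assumption: fall back to the full name when neither pattern applies.
    return match.group(2).strip() if match else column_name

option = parse_label(
    "What do you think are the **most** useful areas of GRID3 support?/Data collection")
subject = parse_label(
    "How accurate are the dataset(s), relative to other similar geospatial data?")
print(option)
print(subject)
```

Because `(.*)` is greedy, a question containing several commas would keep everything up to the last one, which is worth bearing in mind if new rating questions are added.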

In [13]:
def get_response_info_per_question(df,col_idx):
    # getting the proportion of responses 
    ser_name = df.columns[col_idx]
    pct_series = df.iloc[:,col_idx].value_counts(normalize=True).rename('% Responses')
    pct_series = pct_series.apply(lambda x: "{:.2%}".format(x))
    
    # getting the raw counts of responses
    cnt_series = df.iloc[:,col_idx].value_counts().rename('# Responses')
    
    # creating the df
    ret_df = pd.concat([pct_series,cnt_series],axis=1)
    count = df.iloc[:,col_idx].count()
    #ret_df.loc['Total Counts'] = [None,count]
    
    return (ser_name,ret_df,count)
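
To see what this per-question summary looks like, here is a minimal sketch on made-up data. `summarize_column` and `toy_df` are hypothetical names used only for illustration; the logic follows the same `value_counts` approach as the cell above.

```python
import pandas as pd

def summarize_column(df, col_idx):
    """Return (question, summary table, non-null count) for one column."""
    col = df.iloc[:, col_idx]
    # Share of each answer among non-null responses, formatted as a percentage.
    pct = col.value_counts(normalize=True).rename('% Responses')
    pct = pct.apply(lambda x: "{:.2%}".format(x))
    # Raw count of each answer.
    cnt = col.value_counts().rename('# Responses')
    return col.name, pd.concat([pct, cnt], axis=1), col.count()

# Toy data: four answers and one missing value (not survey data).
toy_df = pd.DataFrame({'Do you use GIS?': ['Yes', 'Yes', 'No', 'Yes', None]})
question, summary, total = summarize_column(toy_df, 0)
print(question, total)
print(summary)
```

Missing answers are excluded from both the counts and the percentages, which is why the totals reported for branch questions are smaller than the full respondent count.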

In [14]:
def get_all_responses(df,cols):
    ret_lst = []
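    try:
        # cols holds (name, idx) pairs
        for _,idx in cols:
            info = get_response_info_per_question(df,idx)
            ret_lst.append((idx,*info))
    except TypeError:
        # cols holds bare column indices, which cannot be unpacked
        for idx in cols:
            info = get_response_info_per_question(df,idx)
            ret_lst.append((idx,*info))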
    return ret_lst

In [15]:
def clean_name(string):
    return string.replace('?','').replace('/','-').replace('*','')[:30]
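
A quick check of what `clean_name` produces for one of the survey's column names. The helper is re-declared so the snippet runs on its own; the variable `label` is only for illustration.

```python
def clean_name(string):
    # Drop '?' and '*', turn '/' into '-', then keep the first 30 characters.
    return string.replace('?','').replace('/','-').replace('*','')[:30]

label = clean_name(
    "What do you think are the **most** useful areas of GRID3 support?/Other")
print(repr(label), len(label))
```

Because of the 30-character cap, long questions that share a prefix collapse to the same label, so the result should not be relied on as a unique key.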

In [16]:
main_col_info = get_all_responses(working_df,main_cols)

In [18]:
i = working_df.shape[1]
all_info = get_all_responses(working_df,zip([None]*i,list(range(i))))

In [20]:
pd.set_option('display.max_rows', None)

In [23]:
channel_range = list(range(4,13))
most_important_range = list(range(18,31))
least_important_range = list(range(33,46))
data_used_range = list(range(51,62))
barriers_range = list(range(103,115))

In [24]:
lst_clusters = [channel_range,most_important_range,least_important_range,data_used_range,barriers_range]

In [25]:
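# list.extend returns None, so build the flat list with concatenation instead
ignore_cols = [3,13,14,17,31,32,49,50,102,117,119,120,121,122] + [idx for cluster in lst_clusters for idx in cluster]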

In [50]:
most_important_df = working_df.loc[:,"What do you think are the **most** useful areas of GRID3 support?/Data collection":"What do you think are the **most** useful areas of GRID3 support?/Other"]
least_important_df = working_df.loc[:,"What do you think are the **least** useful areas of GRID3 support?/Data collection":"What do you think are the **least** useful areas of GRID3 support?/Other"]
channel_df = working_df.iloc[:,4:13]
data_used_df = working_df.iloc[:,51:62]
map_df = working_df.iloc[:,90:97]
barriers_df = working_df.iloc[:,103:115]

In [28]:
relative_rankings = working_df.loc[:,'How accurate are the dataset(s), relative to other similar geospatial data?':'How useful are GRID3-supported health catchment area datasets, relative to other geospatial health catchment area data?']

In [29]:
relative_accuracies = relative_rankings.iloc[:,::2]
relative_usefulness = relative_rankings.iloc[:,1::2]
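
The `::2` and `1::2` slices rely on the accuracy and usefulness questions alternating column by column. A minimal sketch with hypothetical column names shows the effect:

```python
import pandas as pd

# Hypothetical columns that alternate accuracy and usefulness questions.
toy = pd.DataFrame(
    [[5, 4, 3, 2]],
    columns=['accurate A', 'useful A', 'accurate B', 'useful B'])

accuracy_cols = toy.iloc[:, ::2]     # positions 0, 2, ...
usefulness_cols = toy.iloc[:, 1::2]  # positions 1, 3, ...
print(list(accuracy_cols.columns))
print(list(usefulness_cols.columns))
```

If the export ever changes the column order, the split silently mixes the two groups, so selecting by a name pattern may be a safer alternative.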

In [45]:
def pprint_proportions_datagroup(df,parse_name=True):
    #takes a dataframe and returns the condensed summary of a dataframe
    size = df.shape[1]
    
    responses = get_all_responses(df,list(range(size)))
    joined_responses = join_multiple_res(responses,0,size,parse_name).fillna('---').T

    return joined_responses.reindex(sorted(joined_responses.columns,reverse=True), axis=1)

In [31]:
size_relative = relative_accuracies.shape[1]
size_usefulness = relative_usefulness.shape[1]

In [54]:
printed_channels = False
printed_important = False
printed_not_important = False
printed_4 = False
printed_5 = False
printed_6 = False
printed_7 = False

for col_idx, name,ser,count in all_info:
    if col_idx in [3,13,14,17,31,32,49,50,89,102,117,119,120,121,122]:
        continue
    if col_idx in range(4,13) and not printed_channels:
        name = name[:name.find('?/')]
        printmd(f"**{name}**")
        display(pprint_proportions_datagroup(channel_df))
        printed_channels = True
        print('\n')
    elif col_idx in range(4,13) and printed_channels:
        continue
    elif col_idx in range(18,31) and not printed_important:
        name = name[:name.find('?/')]
        printmd(f"**{name}** *Total Responses:{count}*")
        display(pprint_proportions_datagroup(most_important_df))
        printed_important = True
        print('\n')
    elif col_idx in range(18,31) and printed_important:
        continue
    elif col_idx in range(33,46) and not printed_not_important:
        name = name[:name.find('?/')]
        printmd(f"**{name}** *Total Responses:{count}*")
        display(pprint_proportions_datagroup(least_important_df))
        printed_not_important = True
        print('\n')
    elif col_idx in range(33,46) and printed_not_important:
        pass
    elif col_idx in range(51,62) and not printed_4:
        name = name[:name.find('?/')]
        printmd(f"**{name}** *Total Responses:{count}*")
        display(pprint_proportions_datagroup(data_used_df))
        printed_4 = True
        print('\n')
    elif col_idx in range(51,62) and printed_4:
        pass
    elif col_idx in range(63,84) and not printed_5:
        name = 'Usefulness and Accuracy of various GRID3 supported data'
        printmd(f"**{name}**")
        display(pprint_proportions_datagroup(relative_accuracies))
        display(pprint_proportions_datagroup(relative_usefulness))
        printed_5 = True
        print('\n')
    elif col_idx in range(63,84) and printed_5:
        pass
    elif col_idx in range(90,97) and not printed_6:
        name = 'Map Usefulness'
        printmd(f"**{name}**")
        display(pprint_proportions_datagroup(map_df,False))
        printed_6 = True
        print('\n')
    elif col_idx in range(90,97) and printed_6:
        pass
    elif col_idx in range(103,115) and not printed_7:
        name = name[:name.find('?/')]
        printmd(f"**{name}**")
        display(pprint_proportions_datagroup(barriers_df))
        printed_7 = True
        print('\n')
    elif col_idx in range(103,115) and printed_7:
        pass
    else:
        printmd(f"**{name}** *Total Responses:{count}*")
        display(ser)
        #print(col_idx)

**In which country do you work?** *Total Responses:137*

Response,% Responses,# Responses
Nigeria,27.74%,38
Zambia,20.44%,28
Drc,19.71%,27
Burkina Faso,4.38%,6
Sierra Leone,3.65%,5
Mozambique,3.65%,5
Ghana,2.19%,3
Kenya,2.19%,3
South Sudan,2.19%,3
Somalia,1.46%,2


**Do you use geospatial software and datasets (json, shp, kml, gpx, etc...) in your work?** *Total Responses:137*

Response,% Responses,# Responses
Yes,84.67%,116
No,15.33%,21


**Which of the following best characterizes your relationship with GRID3?** *Total Responses:137*

Response,% Responses,# Responses
I (or my organization) am working or have previously worked with GRID3 on a project,69.34%,95
I (or my organization) have not worked with GRID3 on a project,30.66%,42


**How did you become familiar with GRID3-supported products and work (geospatial data, analysis, maps, training)**

Option,Metric,1,0
GRID3 newsletter || Total Count: 137,% Responses,19.71%,80.29%
GRID3 newsletter || Total Count: 137,# Responses,27,110
GRID3 website || Total Count: 137,% Responses,56.20%,43.80%
GRID3 website || Total Count: 137,# Responses,77,60
Linkedin || Total Count: 137,% Responses,10.22%,89.78%
Linkedin || Total Count: 137,# Responses,14,123
Twitter || Total Count: 137,% Responses,2.19%,97.81%
Twitter || Total Count: 137,# Responses,3,134
Youtube || Total Count: 137,% Responses,3.65%,96.35%
Youtube || Total Count: 137,# Responses,5,132






**GRID3 enables stakeholders to access geospatial information that is of a higher quality (or otherwise improved), than what is currently available outside the program** *Total Responses:137*

Response,% Responses,# Responses
Strongly Agree,48.91%,67
Agree,37.23%,51
Neutral,10.22%,14
Strongly Disagree,3.65%,5


**GRID3 fills gaps or complements existing mechanisms for geospatial data collection, creation, and production** *Total Responses:137*

Response,% Responses,# Responses
Agree,46.72%,64
Strongly agree,44.53%,61
Neutral,5.84%,8
Strongly Disagree,2.92%,4


**What do you think are the **most** useful areas of GRID3 support** *Total Responses:95*

Option,Metric,1.0,0.0
Data collection || Total Count: 95,% Responses,66.32%,33.68%
Data collection || Total Count: 95,# Responses,63,32
Data production || Total Count: 95,% Responses,62.11%,37.89%
Data production || Total Count: 95,# Responses,59,36
Data quality control and cleaning || Total Count: 95,% Responses,43.16%,56.84%
Data quality control and cleaning || Total Count: 95,# Responses,41,54
Data analysis || Total Count: 95,% Responses,71.58%,28.42%
Data analysis || Total Count: 95,# Responses,68,27
Data management || Total Count: 95,% Responses,57.89%,42.11%
Data management || Total Count: 95,# Responses,55,40






**What do you think are the **least** useful areas of GRID3 support** *Total Responses:95*

Option,Metric,1.0,0.0
Data collection || Total Count: 95,% Responses,13.68%,86.32%
Data collection || Total Count: 95,# Responses,13,82
Data production || Total Count: 95,% Responses,2.11%,97.89%
Data production || Total Count: 95,# Responses,2,93
Data quality control and cleaning || Total Count: 95,% Responses,9.47%,90.53%
Data quality control and cleaning || Total Count: 95,# Responses,9,86
Data analysis || Total Count: 95,% Responses,2.11%,97.89%
Data analysis || Total Count: 95,# Responses,2,93
Data management || Total Count: 95,% Responses,4.21%,95.79%
Data management || Total Count: 95,# Responses,4,91






**Please specify "other".2** *Total Responses:2*

Response,% Responses,# Responses
CAPACITY BUILDING,50.00%,1
Security,50.00%,1


**Have you used GRID3-supported products (geospatial data, maps, analyses) in your professional work?** *Total Responses:137*

Response,% Responses,# Responses
Yes,74.45%,102
No,25.55%,35


**When did you first start engaging with or using GRID3-supported products (geospatial data, analysis, maps)?** *Total Responses:102*

Response,% Responses,# Responses
2022-01-01,6.86%,7
2021-08-01,5.88%,6
2021-07-01,5.88%,6
2022-02-01,4.90%,5
2021-03-01,4.90%,5
2022-05-01,3.92%,4
2020-07-01,3.92%,4
2021-05-01,3.92%,4
2022-06-01,3.92%,4
2020-09-01,3.92%,4


**Which GRID3-supported geospatial data were used** *Total Responses:102*

Option,Metric,1.0,0.0
Population estimates (estimates of the current population) || Total Count: 102,% Responses,62.75%,37.25%
Population estimates (estimates of the current population) || Total Count: 102,# Responses,64,38
Population projections (estimates of the future population) || Total Count: 102,% Responses,28.43%,71.57%
Population projections (estimates of the future population) || Total Count: 102,# Responses,29,73
Settlement extents || Total Count: 102,% Responses,60.78%,39.22%
Settlement extents || Total Count: 102,# Responses,62,40
Settlement names || Total Count: 102,% Responses,46.08%,53.92%
Settlement names || Total Count: 102,# Responses,47,55
School locations || Total Count: 102,% Responses,37.25%,62.75%
School locations || Total Count: 102,# Responses,38,64






**Please specify "other".3** *Total Responses:6*

Response,% Responses,# Responses
Profiling of the social mobilization interventions,16.67%,1
"schools points, settlement points, WASH Layer, District boundaries",16.67%,1
Community Care Sites,16.67%,1
Work related,16.67%,1
DETERMINING OF WARD BOUNDARIES WITHIN NIGERIA,16.67%,1
"Energy, Utilities,",16.67%,1


**Usefulness and Accuracy of various GRID3 supported data**

Dataset,Metric,5--Extremely accurate,4--Moderately accurate,3--Somewhat accurate,2--Slightly accurate,1--Not accurate at all
the dataset(s) || Total Count: 6,% Responses,16.67%,50.00%,33.33%,---,---
the dataset(s) || Total Count: 6,# Responses,1.0,3.0,2.0,---,---
GRID3-supported population estimates || Total Count: 64,% Responses,18.75%,54.69%,18.75%,7.81%,---
GRID3-supported population estimates || Total Count: 64,# Responses,12.0,35.0,12.0,5.0,---
GRID3-supported population projections || Total Count: 29,% Responses,6.90%,72.41%,17.24%,3.45%,---
GRID3-supported population projections || Total Count: 29,# Responses,2.0,21.0,5.0,1.0,---
GRID3-supported settlement extents || Total Count: 62,% Responses,33.87%,48.39%,14.52%,3.23%,---
GRID3-supported settlement extents || Total Count: 62,# Responses,21.0,30.0,9.0,2.0,---
GRID3-supported settlement names || Total Count: 47,% Responses,27.66%,46.81%,23.40%,---,2.13%
GRID3-supported settlement names || Total Count: 47,# Responses,13.0,22.0,11.0,---,1.0


Dataset,Metric,5--Extremely useful,4--Moderately useful,3--Somewhat useful,2--Slightly useful,1--Not useful at all
the dataset(s) || Total Count: 6,% Responses,16.67%,66.67%,16.67%,---,---
the dataset(s) || Total Count: 6,# Responses,1.0,4.0,1.0,---,---
GRID3-supported population estimates || Total Count: 64,% Responses,48.44%,29.69%,17.19%,4.69%,---
GRID3-supported population estimates || Total Count: 64,# Responses,31.0,19.0,11.0,3.0,---
GRID3-supported population projections || Total Count: 29,% Responses,24.14%,58.62%,17.24%,---,---
GRID3-supported population projections || Total Count: 29,# Responses,7.0,17.0,5.0,---,---
GRID3-supported settlement extents || Total Count: 62,% Responses,43.55%,45.16%,11.29%,---,---
GRID3-supported settlement extents || Total Count: 62,# Responses,27.0,28.0,7.0,---,---
GRID3-supported settlement names || Total Count: 47,% Responses,36.17%,44.68%,14.89%,2.13%,2.13%
GRID3-supported settlement names || Total Count: 47,# Responses,17,21,7,1,1






**How useful are GRID3-supported health catchment area datasets, relative to other geospatial health catchment area data?** *Total Responses:37*

Response,% Responses,# Responses
4--Moderately useful,45.95%,17
5--Extremely useful,40.54%,15
3--Somewhat useful,13.51%,5


**Have you used GRID3-supported maps or data hubs?** *Total Responses:102*

Response,% Responses,# Responses
Yes,67.65%,69
No,32.35%,33


**Please select the products you or your organization have used** *Total Responses:69*

Response,% Responses,# Responses
Maps,81.16%,56
Digital hub,14.49%,10
Maps Digital hub,4.35%,3


**Please select the products you or your organization have used/Maps** *Total Responses:69*

Response,% Responses,# Responses
1.0,85.51%,59
0.0,14.49%,10


**Please select the products you or your organization have used/Digital hub** *Total Responses:69*

Response,% Responses,# Responses
0.0,81.16%,56
1.0,18.84%,13


**Map Usefulness**

Statement,Metric,Strongly Disagree,Strongly Agree,Not Sure,Disagree,Agree
The maps make work efficient,% Responses,3.39%,49.15%,1.69%,1.69%,44.07%
The maps make work efficient,# Responses,2,29,1,1,26
The maps are practical,% Responses,3.39%,47.46%,3.39%,1.69%,44.07%
The maps are practical,# Responses,2,28,2,1,26
The map information is organized well,% Responses,3.39%,50.85%,---,1.69%,44.07%
The map information is organized well,# Responses,2.0,30.0,---,1.0,26.0
The maps are understandable,% Responses,1.69%,57.63%,1.69%,3.39%,35.59%
The maps are understandable,# Responses,1,34,1,2,21
The maps present information clearly,% Responses,3.39%,55.93%,3.39%,1.69%,35.59%
The maps present information clearly,# Responses,2,33,2,1,21






**How would you characterize the user-experience (ease of use, design, relevance) of GRID3-supported hubs, relative to other data hubs?** *Total Responses:13*

Response,% Responses,# Responses
4--Very Good,30.77%,4
3--Good,30.77%,4
5--Excellent,23.08%,3
4--Very good,15.38%,2


**Please rate your agreement/disagreement with the following statements.1** *Total Responses:0*

Response,% Responses,# Responses


**GRID3-supported products correctly identify the intended target **locations** for my organization’s program or service** *Total Responses:102*

Response,% Responses,# Responses
Mostly,53.92%,55
Always,18.63%,19
Partially,18.63%,19
Occasionally,7.84%,8
Never,0.98%,1


**GRID3-supported products correctly identify the intended target **population** for my organization’s program or service** *Total Responses:102*

Response,% Responses,# Responses
Mostly,49.02%,50
Always,24.51%,25
Partially,19.61%,20
Occasionally,6.86%,7


**What, if any, barrier(s) prevent the use of GRID3-supported products (geospatial data, maps, analyses)?** *Total Responses:102*

Response,% Responses,# Responses
There are none,35.29%,36
Skills to use GIS or other geospatial software Decision-making and integration of products into workflows,4.90%,5
The inaccuracy of products,3.92%,4
Other,3.92%,4
Products are not endorsed by the government,2.94%,3
Skills to use GIS or other geospatial software,2.94%,3
Decision-making and integration of products into workflows,2.94%,3
Products are not endorsed by the government Access to geospatial software Skills to use GIS or other geospatial software Ownership or licensing of the products Decision-making and integration of products into workflows,1.96%,2
Ownership or licensing of the products Resources to print maps,1.96%,2
I don't know where to find the products,1.96%,2


**What, if any, barrier(s) prevent the use of GRID3-supported products (geospatial data, maps, analyses)**

Option,Metric,1.0,0.0
I don't know where to find the products || Total Count: 102,% Responses,8.82%,91.18%
I don't know where to find the products || Total Count: 102,# Responses,9,93
The inaccuracy of products || Total Count: 102,% Responses,6.86%,93.14%
The inaccuracy of products || Total Count: 102,# Responses,7,95
Relevance of the products to work || Total Count: 102,% Responses,2.94%,97.06%
Relevance of the products to work || Total Count: 102,# Responses,3,99
Visualization of the data is not appropriate || Total Count: 102,% Responses,4.90%,95.10%
Visualization of the data is not appropriate || Total Count: 102,# Responses,5,97
Products are not endorsed by the government || Total Count: 102,% Responses,12.75%,87.25%
Products are not endorsed by the government || Total Count: 102,# Responses,13,89






**Please specify "other".4** *Total Responses:7*

Response,% Responses,# Responses
The maps are not live. I.e. data does not change with time lapse (following year),14.29%,1
clear or detailed methodologies of products including metadata,14.29%,1
Limited computing power and slow download of products owing to poor internet access,14.29%,1
"Only a fraction of the DRC has been mapped, limiting the utility to only specific geographies",14.29%,1
Allow the application for production of population grid to work with land cover data and all its layers to achieve a more accurate results in DUG and TUC.,14.29%,1
I wish to continue the training (entire program),14.29%,1
N / A. I don't know how to rate this question!,14.29%,1


**How likely are you to use GRID3-supported products (geospatial data, maps, analyses) in the future?** *Total Responses:137*

Response,% Responses,# Responses
Very likely,58.39%,80
Likely,32.12%,44
Unsure,3.65%,5
Very unlikely,3.65%,5
Unlikely,2.19%,3


**Do you know anyone outside your organization who is using GRID3-supported products (geospatial data, maps, analyses) in their professional work?** *Total Responses:137*

Response,% Responses,# Responses
No,67.88%,93
Yes,32.12%,44
