## Analysing the data from the Market Research Survey

In [1]:
# import libraries
import pandas as pd

In [2]:
# import csv file as dataframe
data_path = "tudublin_amenities_access_survey.csv"
survey_data = pd.read_csv(data_path, delimiter=",", encoding='unicode_escape')
survey_data.head(5)

Unnamed: 0,Id,Start time,Completion time,Email,Name,Which county do you work/live in?,Which sector do you work in?\n,Does your job involve accessing/working with data on publicly available amenities?\n,Which devices do you primarily use for your work?\n,What type of public amenities information do you usually require for your job?\n,...,Do you have any additional suggestions/feedback on our proposal ?\n,"If you'd like to contact us for more information on the project, please share your email below\n",Which devices do you primarily use for your day-to-day personal tasks?\n,How often do you use digital tools/applications for navigation?\n,"Based on the demo below, how useful would a web app be for accessing parking locations and quantities in your day-to-day?\n",Why would this be impractical for you?\n,Please tell us what you are using instead\n,What other public amenity would you like more information on?,What other features would you like to see alongside location & quantity of parking spaces?\n,"If you'd like to contact us for more information on the project, please share your email below\n1"
0,1,22/10/2024 18:14,22/10/2024 18:17,anonymous,,,Education,Yes,Desktop computer;Laptop;Smartphone;,"Mechanical (water grid, electric grid, etc);Tr...",...,,,,,,,,,,
1,2,29/10/2024 16:29,29/10/2024 16:36,anonymous,,Dublin,Government,No,,,...,,,Desktop computer;Laptop;Smartphone;,Weekly,Extremely useful,,,,Cost;Parking hours;,
2,3,31/10/2024 12:18,31/10/2024 12:19,anonymous,,Fingal,Construction,No,,,...,,,Desktop computer;,Never,Somewhat useful,,,,Parking hours;,
3,4,31/10/2024 12:25,31/10/2024 12:29,anonymous,,Dublin,Architecture,Yes,Desktop computer;Laptop;Smartphone;,"Recreational (parks, sport facilities, hiking ...",...,,,,,,,,,,
4,5,31/10/2024 12:24,31/10/2024 12:29,anonymous,,Dublin,Construction,Yes,Desktop computer;,"Healthcare & Safety (emergency services, hospi...",...,,,,,,,,,,


In [3]:
# check number of rows and columns
print("Number of columns: ", len(survey_data.columns))
print("Number of rows: ", len(survey_data))

Number of columns:  36
Number of rows:  111


In [4]:
# check data type of columns -- all object
print("Data types of columns:")
survey_data.dtypes

Data types of columns:


Id                                                                                                                               int64
Start time                                                                                                                      object
Completion time                                                                                                                 object
Email                                                                                                                           object
Name                                                                                                                           float64
Which county do you work/live in?                                                                                               object
Which sector do you work in?\n                                                                                                  object
Does your job involve accessing/working with data on pu

### First order of business: clean up the df
<li> I will start by removing the unecessary columns: start & completion time, name & email because they're empty.<br></li>
<li>Need to rename the columns as well</li>
<li>Next, I will fill in the NAN in county responses, and group the different "other" responses in job<br></li>
<li>Then, I will split the df into those who are employed and unemployed, and then those who use public amenities or not in their professional life.</li>

In [5]:
# remove unecessary columns
survey_data = survey_data.drop(survey_data.columns[[1,2,3,4]], axis=1)
survey_data.head(5)

Unnamed: 0,Id,Which county do you work/live in?,Which sector do you work in?\n,Does your job involve accessing/working with data on publicly available amenities?\n,Which devices do you primarily use for your work?\n,What type of public amenities information do you usually require for your job?\n,How often do you require/access this information?,What type of software/tools do you use to access information on publicly available amenities?\n,How often do you use this software/tool?,"In regards to the public amenities chosen, does your current software/tool satisfy your work requirements?\n",...,Do you have any additional suggestions/feedback on our proposal ?\n,"If you'd like to contact us for more information on the project, please share your email below\n",Which devices do you primarily use for your day-to-day personal tasks?\n,How often do you use digital tools/applications for navigation?\n,"Based on the demo below, how useful would a web app be for accessing parking locations and quantities in your day-to-day?\n",Why would this be impractical for you?\n,Please tell us what you are using instead\n,What other public amenity would you like more information on?,What other features would you like to see alongside location & quantity of parking spaces?\n,"If you'd like to contact us for more information on the project, please share your email below\n1"
0,1,,Education,Yes,Desktop computer;Laptop;Smartphone;,"Mechanical (water grid, electric grid, etc);Tr...",Weekly,Government database (i.e: data.gov.ie);,Monthly,Somewhat,...,,,,,,,,,,
1,2,Dublin,Government,No,,,,,,,...,,,Desktop computer;Laptop;Smartphone;,Weekly,Extremely useful,,,,Cost;Parking hours;,
2,3,Fingal,Construction,No,,,,,,,...,,,Desktop computer;,Never,Somewhat useful,,,,Parking hours;,
3,4,Dublin,Architecture,Yes,Desktop computer;Laptop;Smartphone;,"Recreational (parks, sport facilities, hiking ...",Daily,Navigation applications (i.e: Google Maps);,Daily,Somewhat,...,,,,,,,,,,
4,5,Dublin,Construction,Yes,Desktop computer;,"Healthcare & Safety (emergency services, hospi...",Weekly,Navigation applications (i.e: Google Maps);Cit...,Weekly,Yes,...,,,,,,,,,,


In [6]:
# import column names
col_file = open("col_names.txt", "r")
col_names = col_file.read()
col_list = col_names.replace(' ','').split(",")
print(col_list)
col_file.close()

['id', 'county', 'sector', 'use_amenity_data', 'device_work', 'type_amenity_data_work', 'freq_amenity_work', 'type_tool_work', 'freq_tool_work', 'satisfaction_tool_work', 'why_unsatisfied_tool_work', 'feature_wish_work', 'demo_useful_work', 'why_impractical_demo_work', 'other_amenity_work', 'searchfunctionality_useful_rank', 'realtimeavail_useful_rank', 'qtyamenity_useful_rank', 'costamenity_useful_rank', 'routeplan_useful_rank', 'filtertype_useful_rank', 'export_useful_rank', 'feedback_work', 'contact_work', 'device_personal', 'freq_tool_personal', 'demo_useful_personal', 'why_impractical_demo_personal', 'other_tool_personal', 'other_amenity_personal', 'other_feature_personal', 'contact_personal']


In [7]:
# rename columns
survey_data = survey_data.set_axis(col_list, axis=1)
print(survey_data.columns)

Index(['id', 'county', 'sector', 'use_amenity_data', 'device_work',
       'type_amenity_data_work', 'freq_amenity_work', 'type_tool_work',
       'freq_tool_work', 'satisfaction_tool_work', 'why_unsatisfied_tool_work',
       'feature_wish_work', 'demo_useful_work', 'why_impractical_demo_work',
       'other_amenity_work', 'searchfunctionality_useful_rank',
       'realtimeavail_useful_rank', 'qtyamenity_useful_rank',
       'costamenity_useful_rank', 'routeplan_useful_rank',
       'filtertype_useful_rank', 'export_useful_rank', 'feedback_work',
       'contact_work', 'device_personal', 'freq_tool_personal',
       'demo_useful_personal', 'why_impractical_demo_personal',
       'other_tool_personal', 'other_amenity_personal',
       'other_feature_personal', 'contact_personal'],
      dtype='object')


In [8]:
survey_data.head(5)

Unnamed: 0,id,county,sector,use_amenity_data,device_work,type_amenity_data_work,freq_amenity_work,type_tool_work,freq_tool_work,satisfaction_tool_work,...,feedback_work,contact_work,device_personal,freq_tool_personal,demo_useful_personal,why_impractical_demo_personal,other_tool_personal,other_amenity_personal,other_feature_personal,contact_personal
0,1,,Education,Yes,Desktop computer;Laptop;Smartphone;,"Mechanical (water grid, electric grid, etc);Tr...",Weekly,Government database (i.e: data.gov.ie);,Monthly,Somewhat,...,,,,,,,,,,
1,2,Dublin,Government,No,,,,,,,...,,,Desktop computer;Laptop;Smartphone;,Weekly,Extremely useful,,,,Cost;Parking hours;,
2,3,Fingal,Construction,No,,,,,,,...,,,Desktop computer;,Never,Somewhat useful,,,,Parking hours;,
3,4,Dublin,Architecture,Yes,Desktop computer;Laptop;Smartphone;,"Recreational (parks, sport facilities, hiking ...",Daily,Navigation applications (i.e: Google Maps);,Daily,Somewhat,...,,,,,,,,,,
4,5,Dublin,Construction,Yes,Desktop computer;,"Healthcare & Safety (emergency services, hospi...",Weekly,Navigation applications (i.e: Google Maps);Cit...,Weekly,Yes,...,,,,,,,,,,


In [9]:
# split in 2 sub dataframe, those that use amenity data for work and those who dont
# user A = dont' use amenity data
# user B = use amenity data

users_A = pd.DataFrame(survey_data[survey_data["use_amenity_data"] == "No"])
users_B = pd.DataFrame(survey_data[survey_data["use_amenity_data"] == "Yes"])

In [10]:
users_A.head(5)

Unnamed: 0,id,county,sector,use_amenity_data,device_work,type_amenity_data_work,freq_amenity_work,type_tool_work,freq_tool_work,satisfaction_tool_work,...,feedback_work,contact_work,device_personal,freq_tool_personal,demo_useful_personal,why_impractical_demo_personal,other_tool_personal,other_amenity_personal,other_feature_personal,contact_personal
1,2,Dublin,Government,No,,,,,,,...,,,Desktop computer;Laptop;Smartphone;,Weekly,Extremely useful,,,,Cost;Parking hours;,
2,3,Fingal,Construction,No,,,,,,,...,,,Desktop computer;,Never,Somewhat useful,,,,Parking hours;,
8,9,Kildare,Government,No,,,,,,,...,,,Smartphone;Laptop;,Daily,Extremely useful,,,,Cost;Parking hours;,
10,11,Dublin,Education,No,,,,,,,...,,,Laptop;Smartphone;,Daily,Extremely useful,,,,Cost;Parking hours;Safety rating;,
11,12,Kildare,IT,No,,,,,,,...,,,Laptop;,Weekly,Somewhat useful,,,,Cost;Parking hours;,


#### Pre process user_A
THOSE THAT DONT USE AMENITY DATA

In [11]:
# remove useless columns aka those with only NAN values
users_A = users_A.dropna(axis=1, how='all')
users_A.columns


Index(['id', 'county', 'sector', 'use_amenity_data', 'device_personal',
       'freq_tool_personal', 'demo_useful_personal',
       'why_impractical_demo_personal', 'other_tool_personal',
       'other_amenity_personal', 'other_feature_personal', 'contact_personal'],
      dtype='object')

#### Preprocess user_B
THOSE THAT USE AMENITY DATA

In [12]:
# remove useless columns aka those with only NAN values
users_B = users_B.dropna(axis=1, how='all')
users_B.columns

Index(['id', 'county', 'sector', 'use_amenity_data', 'device_work',
       'type_amenity_data_work', 'freq_amenity_work', 'type_tool_work',
       'freq_tool_work', 'satisfaction_tool_work', 'why_unsatisfied_tool_work',
       'feature_wish_work', 'demo_useful_work', 'why_impractical_demo_work',
       'other_amenity_work', 'searchfunctionality_useful_rank',
       'realtimeavail_useful_rank', 'qtyamenity_useful_rank',
       'costamenity_useful_rank', 'routeplan_useful_rank',
       'filtertype_useful_rank', 'export_useful_rank', 'feedback_work',
       'contact_work'],
      dtype='object')

In [13]:
# shorten some answers for type amenity and type tool
new_amenity_list = ["mechanical","recreational","healthcare & safety","transport & mobility","technological","accessibility"]
og_amenity_list = ["Recreational (parks, sport facilities, hiking trails, public beaches, etc)",
                   "Transport & mobility (bus stops, EV charging stations, parking, bicycle lanes, etc)",
                   "Healthcare & Safety (emergency services, hospitals, pharmacies, public defibrillators, etc)",
                   "Technological (public wi-fi, etc)",
                   "Mechanical (water grid, electric grid, etc)",
                   "Accessibility features (wheelchair ramps, tactile pavement, public toilets, etc)"]

users_B1 = users_B.replace(to_replace=og_amenity_list, value=new_amenity_list)

In [None]:
users_B1

# need to figure out how to replace the amenity types in multiple answers

Unnamed: 0,id,county,sector,use_amenity_data,device_work,type_amenity_data_work,freq_amenity_work,type_tool_work,freq_tool_work,satisfaction_tool_work,...,other_amenity_work,searchfunctionality_useful_rank,realtimeavail_useful_rank,qtyamenity_useful_rank,costamenity_useful_rank,routeplan_useful_rank,filtertype_useful_rank,export_useful_rank,feedback_work,contact_work
0,1,,Education,Yes,Desktop computer;Laptop;Smartphone;,"Mechanical (water grid, electric grid, etc);Tr...",Weekly,Government database (i.e: data.gov.ie);,Monthly,Somewhat,...,Car parking;,Very useful,Very useful,Somewhat useful,Very useful,Very useful,,,,
3,4,Dublin,Architecture,Yes,Desktop computer;Laptop;Smartphone;,"Recreational (parks, sport facilities, hiking ...",Daily,Navigation applications (i.e: Google Maps);,Daily,Somewhat,...,Bike lanes;Bike sheds;Hiking trails;Car parkin...,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Neutral,,
4,5,Dublin,Construction,Yes,Desktop computer;,"Healthcare & Safety (emergency services, hospi...",Weekly,Navigation applications (i.e: Google Maps);Cit...,Weekly,Yes,...,Bike lanes;,Very useful,Neutral,Neutral,Neutral,Neutral,Very useful,Neutral,,
5,6,Dublin,Construction,Yes,Desktop computer;Smartphone;Tablet;,"Transport & mobility (bus stops, EV charging s...",Daily,City planning or Zoning software;Navigation ap...,Daily,Somewhat,...,Public bathrooms;Bike lanes;Bike sheds;,Very useful,Very useful,Very useful,Somewhat useful,Very useful,Very useful,Somewhat useful,keeping information up to date should be the m...,
12,13,Dublin,Finance,Yes,Laptop;,"Transport & mobility (bus stops, EV charging s...",Daily,Navigation applications (i.e: Google Maps);,Daily,Somewhat,...,Public bathrooms;,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Somewhat useful,,
25,26,Dublin,Construction,Yes,Desktop computer,"Recreational (parks, sport facilities, hiking ...",Daily,Navigation applications (i.e: Google Maps);Cit...,Daily,Somewhat,...,Bike lanes;Hiking trails;Car parking;Parks,Very useful,Neutral,Very useful,Neutral,Somewhat useful,Very useful,Not useful,Mapping of amenities including public and priv...,
33,34,Dublin,Education,Yes,Laptop;Smartphone,"Healthcare & Safety (emergency services, hospi...",Daily,Navigation applications (i.e: Google Maps);Gov...,Daily,Yes,...,Hiking trails;Bike sheds;Bike lanes;Car parkin...,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,Very useful,No,
35,36,Dublin,Construction,Yes,Desktop computer;Laptop;Smartphone,"Technological (public wi-fi, etc);Mechanical (...",Daily,City planning or Zoning software;Government da...,Daily,Yes,...,Bike lanes;Bike sheds,Very useful,Very useful,Somewhat useful,Somewhat useful,Very useful,Somewhat useful,Neutral,,
36,37,Dublin,Student,Yes,Laptop;Smartphone,"Transport & mobility (bus stops, EV charging s...",Daily,Navigation applications (i.e: Google Maps),Daily,Yes,...,Bike sheds,Very useful,Very useful,Very useful,Very useful,Not useful,Not useful,Not useful,,
42,43,Kerry,Government,Yes,Laptop,mechanical,Daily,Government database (i.e: data.gov.ie);Navigat...,Daily,Somewhat,...,Bike sheds;Public bathrooms,Very useful,Very useful,Somewhat useful,Somewhat useful,Somewhat useful,Very useful,Neutral,,


### Second order of business: probably encoding some of the columns