# 03_Amenities

### Here we wrangle some data







<div class="alert alert-block alert-info"><b>We retrieve amenities from OpenStreetMap (OSM)
</b></div>

> Our amenities are `supermarket`, `convenience`, `kiosk`, `greengrocer`, `marketplace`, `kindergarten`, `restaurant`, `cafe`, `cinema`, `theatre`, `atm`, `bank`, `pharmacy`, `veterinary` and `internet_cafe`.

<div class="alert alert-block alert-success">

<b>From the City of Cape Town Open Data API  ~ (https://odp-cctegis.opendata.arcgis.com/) we harvest: 
</b></div>

> `park`, `library`, `health_care`, `community_center`

<div class="alert alert-block alert-warning"><b>From the Department of Basic Education ~ (https://www.education.gov.za/Home.aspx):</b>
</div>

> `primary_school`, `secondary_school` (both are labelled `school` in the combined table)

<div class="alert alert-block alert-danger"><b>Warning:</b> 
</div>

> a) We focus on a specific area: `(18.349529,-34.050469,18.649593,-33.848834) #(minx, miny, maxx, maxy)`   
> b) Informal trading serves an enormous portion of South Africa's population. Typically `shop=convenience`, `shop=kiosk` or `shop=greengrocer` should include these traders, but I don't know how to access data on informal traders in South Africa.

In [1]:
#load the magic

%matplotlib inline
import os
from pathlib import Path

import pandana as pdna
from pandana.loaders import osm

import time
import numpy as np
import pandas as pd
from shapely.geometry import Point
from shapely.geometry import Polygon
from shapely.geometry import box
import geopandas as gpd

import matplotlib.pyplot as plt
import matplotlib

In [2]:
#set path
path = Path('./')

In [3]:
# what amenities?
amenities = ['restaurant', 'cafe', 'cinema', 'theatre', 'university', 'atm', 'bank', 
             'pharmacy', 'veterinary', 'kindergarten', 'marketplace']

#NOTE: in OSM, shops are tagged with the `shop` key, not `amenity`. If you include these values with the amenities the query will return nothing.
shops = ['supermarket', 'convenience', 'kiosk', 'greengrocer']

# (minx, miny, maxx, maxy) bounding box ~ focus area
bbox = [18.349529,-34.050469,18.649593,-33.848834] # for hbay~bville~miln~m'degama
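The bounding box is stored longitude first, as (minx, miny, maxx, maxy), while the `osm.node_query` calls in this notebook pass latitude first. A minimal sketch of a helper that makes that reordering explicit; `to_lat_lng_order` is a hypothetical name and is not part of pandana:

```python
def to_lat_lng_order(bbox):
    """Reorder (minx, miny, maxx, maxy) into (lat_min, lng_min, lat_max, lng_max)."""
    minx, miny, maxx, maxy = bbox
    # Guard against a box that was typed in the wrong order.
    if not (minx < maxx and miny < maxy):
        raise ValueError("expected (minx, miny, maxx, maxy) with min < max")
    return miny, minx, maxy, maxx


bbox = [18.349529, -34.050469, 18.649593, -33.848834]
lat_min, lng_min, lat_max, lng_max = to_lat_lng_order(bbox)
```

The result could then be unpacked into the query, for example `osm.node_query(*to_lat_lng_order(bbox), tags=osm_tags)`, instead of indexing `bbox` by hand.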

### The amenities

In [4]:
start_time = time.time()
# query the OSM API for the specified amenities within the bounding box 
osm_tags = '"amenity"~"{}"'.format('|'.join(amenities))
pois = osm.node_query(bbox[1], bbox[0], bbox[3], bbox[2], tags=osm_tags)
    
#### save to CSV if you choose
#pois = pois[pois['amenity'].isin(amenities)]
#pois.to_csv(amenities_filename, index=False, encoding='utf-8')
method = 'downloaded from OSM'
    
print('{:,} Amenities {} in {:,.2f} seconds'.format(len(pois), method, time.time()-start_time))
pois[['amenity', 'name', 'lat', 'lon']].head()

809 Amenities downloaded from OSM in 2.61 seconds


| id | amenity | name | lat | lon |
|---|---|---|---|---|
| 26191572 | restaurant | Rhodes Memorial Tearoom | -33.952124 | 18.458541 |
| 26310715 | restaurant | Cincinnati Spur | -33.933043 | 18.510368 |
| 26313138 | restaurant | Magica Roma | -33.940229 | 18.497875 |
| 30518038 | atm | | -33.939903 | 18.498271 |
| 30518146 | atm | | -33.940939 | 18.500053 |


### The shops

In [5]:
start_time = time.time()
# query the OSM API for the specified shops within the bounding box 
osm_tags = '"shop"~"{}"'.format('|'.join(shops))
shop = osm.node_query(bbox[1], bbox[0], bbox[3], bbox[2], tags=osm_tags)
    
#### save to CSV if you choose
#shop = shop[shop['shop'].isin(shops)]
#shop.to_csv(shops_filename, index=False, encoding='utf-8')
method = 'downloaded from OSM'
    
print('{:,} Shops {} in {:,.2f} seconds'.format(len(shop), method, time.time()-start_time))
shop[['shop', 'name', 'lat', 'lon']].head()

229 Shops downloaded from OSM in 1.22 seconds


| id | shop | name | lat | lon |
|---|---|---|---|---|
| 26310714 | supermarket | Woolworths | -33.933101 | 18.510459 |
| 26313136 | supermarket | Pick 'n Pay | -33.934214 | 18.511008 |
| 26313137 | convenience | Spar | -33.939831 | 18.498314 |
| 73521740 | supermarket | Shoprite | -33.950047 | 18.471612 |
| 73525282 | convenience | Caledonian Spar | -33.948299 | 18.479413 |


<div class="alert alert-block alert-info"><b>
     
</b>**We need to `.rename` some columns and add (`concatenate`) the `shops` to the `amenities`**.</div>

In [6]:
shop.rename(columns={'amenity': 'amenity2'}, inplace=True)
shop.rename(columns={'shop': 'amenity'}, inplace=True)

In [7]:
df_list = [pois[['amenity','name','lat','lon']], shop[['amenity','name','lat','lon']]]
data = pd.concat(df_list, ignore_index=True)

In [8]:
len(data)

1038
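The two renames matter because a shop node can also carry an `amenity` tag. Renaming `shop` to `amenity` without first moving the existing column aside would leave two columns with the same label. A small sketch with toy frames (the rows are made-up stand-ins, not real query output) that does both renames in one call and then concatenates:

```python
import pandas as pd

# Toy stand-ins for the two OSM results (hypothetical rows).
pois = pd.DataFrame(
    {"amenity": ["cafe"], "name": ["A"], "lat": [-33.95], "lon": [18.46]}
)
shop = pd.DataFrame(
    {
        "shop": ["supermarket"],
        "amenity": [None],  # a shop node may also have an amenity tag
        "name": ["B"],
        "lat": [-33.93],
        "lon": [18.51],
    }
)

# Move the old 'amenity' column aside and reuse the label for 'shop'.
shop = shop.rename(columns={"amenity": "amenity2", "shop": "amenity"})

cols = ["amenity", "name", "lat", "lon"]
data = pd.concat([pois[cols], shop[cols]], ignore_index=True)
```

With `ignore_index=True` the combined frame gets a fresh 0..n-1 index, so the original OSM node ids are no longer the index after this step.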

In [9]:
data['amenity'].value_counts()

restaurant       408
supermarket      132
atm              130
cafe             126
convenience       88
bank              65
pharmacy          46
kindergarten      11
veterinary         8
theatre            7
greengrocer        5
kiosk              4
cinema             4
marketplace        3
internet_cafe      1
Name: amenity, dtype: int64
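Note that `internet_cafe` appears in the counts although it is not in the `amenities` list. The most likely reason is that the `~` operator in the tag filter is a regular-expression match, so the alternative `cafe` also matches as a substring of `internet_cafe`. A hedged sketch of an anchored pattern, using Python's `re` to illustrate the difference; it assumes the OSM query engine treats `^`, `$` and grouping the same way for such simple patterns:

```python
import re

amenities = ['restaurant', 'cafe', 'cinema', 'theatre', 'university', 'atm', 'bank',
             'pharmacy', 'veterinary', 'kindergarten', 'marketplace']

# Pattern as built in the notebook: matches anywhere inside the tag value.
loose = '|'.join(amenities)

# Anchored variant: the whole tag value must equal one of the alternatives.
anchored = '^({})$'.format('|'.join(amenities))

substring_hit = re.search(loose, 'internet_cafe') is not None
anchored_hit = re.search(anchored, 'internet_cafe') is not None

# Tag filter string in the same shape the notebook passes to node_query.
osm_tags = '"amenity"~"{}"'.format(anchored)
```

Whether to anchor depends on intent: the loose pattern is handy if values such as `internet_cafe` are wanted, as the amenity list at the top of this notebook suggests.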

<div class="alert alert-block alert-success">

<b>    
</b>**We go directly to the source for `park`, `library`, `health_care` and `community_center`.**
</div>

In [10]:
# call the park API ~ https://odp.capetown.gov.za/datasets/parks/geoservice
prks_object = 'https://citymaps.capetown.gov.za/agsext1/rest/services/Theme_Based/Open_Data_Service/MapServer/29/query?where=1%3D1&outFields=PARK_NAME&outSR=4326&f=json'
#read as gpd ~ filter by bbox
prks_shp = gpd.read_file(prks_object, bbox=bbox)
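The query URL is an ArcGIS REST `query` request: `where=1=1` selects every feature, `outFields` picks the attribute columns, `outSR=4326` asks for WGS84 coordinates and `f=json` sets the response format. A sketch of building such URLs from parameters; `build_query_url` is a hypothetical helper, and the optional envelope filter uses the standard ArcGIS REST `geometry` parameters on the assumption that this service honours them:

```python
from urllib.parse import urlencode

BASE = ('https://citymaps.capetown.gov.za/agsext1/rest/services/'
        'Theme_Based/Open_Data_Service/MapServer')


def build_query_url(layer_id, out_fields, bbox=None):
    """Return a layer query URL; bbox is (minx, miny, maxx, maxy) in EPSG:4326."""
    params = {'where': '1=1', 'outFields': out_fields, 'outSR': 4326, 'f': 'json'}
    if bbox is not None:
        # Assumption: the server filters by this envelope before responding.
        params.update({
            'geometry': ','.join(str(v) for v in bbox),
            'geometryType': 'esriGeometryEnvelope',
            'inSR': 4326,
            'spatialRel': 'esriSpatialRelIntersects',
        })
    return '{}/{}/query?{}'.format(BASE, layer_id, urlencode(params))


parks_url = build_query_url(29, 'PARK_NAME')
```

Without `bbox` the helper reproduces the parks URL used above; passing `bbox` would move the spatial filter to the server rather than relying only on `gpd.read_file(..., bbox=bbox)`.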

In [11]:
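# these are polygons. We need a Point, so we take the centroid
# (geopandas warns that centroids computed in a geographic CRS such as EPSG:4326 are approximate)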
prks_shp['lon'] = prks_shp.centroid.x
prks_shp['lat'] = prks_shp.centroid.y


  



<div class="alert alert-block alert-info"><b>
     
</b>**To keep our work consistent we `.rename` some columns and add (`concatenate`) the `parks` to the `amenities`**.</div>

In [12]:
prks_shp.rename(columns={'PARK_NAME': 'name'}, inplace=True)
prks_shp['amenity'] = 'park'

prks_shp[['amenity', 'name', 'lat', 'lon']].head()

| | amenity | name | lat | lon |
|---|---|---|---|---|
| 0 | park | Disa Place Pos | -34.043269 | 18.456462 |
| 1 | park | Highfield Road Community Garden | -34.020969 | 18.589949 |
| 2 | park | Mitchell Pos | -34.041542 | 18.600167 |
| 3 | park | Igiyogiyo Road Passageway | -33.992392 | 18.597748 |
| 4 | park | Janet Avenue Passageway | -34.0228 | 18.456378 |


In [13]:
df_list = [data[['amenity','name','lat','lon']], prks_shp[['amenity','name','lat','lon']]]
data = pd.concat(df_list, ignore_index=True)

In [14]:
len(data)

1635

<div class="alert alert-block alert-info"><b>
     
</b>**Now the `libraries`**
</div>

In [15]:
# call the library API ~ https://odp.capetown.gov.za/datasets/libraries/geoservice
lib_object = 'https://citymaps.capetown.gov.za/agsext1/rest/services/Theme_Based/Open_Data_Service/MapServer/22/query?where=1%3D1&outFields=NAME&outSR=4326&f=json'
#read as gpd ~ filter by bbox
lib = gpd.read_file(lib_object, bbox=bbox)

In [16]:
#separate columns for x (lon) and y (lat)
lib['lon'] = lib.geometry.x
lib['lat'] = lib.geometry.y

lib.rename(columns={'NAME': 'name'}, inplace=True)
lib['amenity'] = 'library'

lib[['amenity', 'name', 'lat', 'lon']].head()

| | amenity | name | lat | lon |
|---|---|---|---|---|
| 0 | library | IMIZAMO YETHU (SATELLITE) | -34.029826 | 18.360171 |
| 1 | library | HOUT BAY | -34.043999 | 18.358361 |
| 2 | library | CAMPS BAY | -33.953785 | 18.377659 |
| 3 | library | CROSSROADS | -33.998297 | 18.596949 |
| 4 | library | BROWN'S FARM | -34.005611 | 18.585389 |


In [17]:
df_list = [data[['amenity','name','lat','lon']], lib[['amenity','name','lat','lon']]]
data = pd.concat(df_list, ignore_index=True)

In [18]:
len(data)

1716

<div class="alert alert-block alert-info"><b>
     
</b>**Now `community_centers`**
</div>

In [19]:
# call the community_center API ~ https://odp.capetown.gov.za/datasets/community-centres/geoservice
com_cen_object = 'https://citymaps.capetown.gov.za/agsext1/rest/services/Theme_Based/Open_Data_Service/MapServer/21/query?where=1%3D1&outFields=NAME&outSR=4326&f=json'
#read as gpd ~ filter by bbox
com_cen = gpd.read_file(com_cen_object, bbox=bbox)

In [20]:
#separate columns for x (lon) and y (lat)
com_cen['lon'] = com_cen.geometry.x
com_cen['lat'] = com_cen.geometry.y

com_cen.rename(columns={'NAME': 'name'}, inplace=True)
com_cen['amenity'] = 'community_center'

df_list = [data[['amenity','name','lat','lon']], com_cen[['amenity','name','lat','lon']]]
data = pd.concat(df_list, ignore_index=True)

In [21]:
len(data)

1824

<div class="alert alert-block alert-info"><b>
     
</b>**Then `health_care_centers`**
</div>

In [22]:
# call the health_care center API ~ https://odp.capetown.gov.za/datasets/health-care-facilities
health_object = 'https://citymaps.capetown.gov.za/agsext1/rest/services/Theme_Based/Open_Data_Service/MapServer/44/query?where=1%3D1&outFields=NAME,TYPE&outSR=4326&f=json'
#read as gpd ~ filter by bbox
health = gpd.read_file(health_object, bbox=bbox)

In [23]:
#separate columns for x (lon) and y (lat)
health['lon'] = health.geometry.x
health['lat'] = health.geometry.y

health.rename(columns={'NAME': 'name'}, inplace=True)
health['amenity'] = 'health_care'

df_list = [data[['amenity','name','lat','lon']], health[['amenity','name','lat','lon']]]
data = pd.concat(df_list, ignore_index=True)

In [24]:
len(data)

1916
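The library, community centre and health care cells repeat the same three steps: rename the name column, label the rows, and append the four common columns. A minimal sketch of that pattern as one function; `as_pois` and `POI_COLUMNS` are hypothetical names, and the toy rows stand in for the downloaded layers (the health care row is made up):

```python
import pandas as pd

POI_COLUMNS = ['amenity', 'name', 'lat', 'lon']


def as_pois(frame, name_col, amenity):
    """Return a copy of frame reduced to the common POI columns."""
    out = frame.rename(columns={name_col: 'name'})  # rename returns a new frame
    out['amenity'] = amenity
    return out[POI_COLUMNS]


# Toy stand-ins for two layers that already have lat/lon columns.
lib = pd.DataFrame({'NAME': ['HOUT BAY'], 'lat': [-34.043999], 'lon': [18.358361]})
health = pd.DataFrame({'NAME': ['HYPOTHETICAL CLINIC'], 'TYPE': ['Clinic'],
                       'lat': [-34.0], 'lon': [18.5]})

data = pd.concat(
    [as_pois(lib, 'NAME', 'library'), as_pois(health, 'NAME', 'health_care')],
    ignore_index=True,
)
```

Because the helper does not rename in place, the source frames keep their original columns, which can help when a cell is re-run.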

<div class="alert alert-block alert-warning"><b>
    
</b>**The [Department of Basic Education](https://www.education.gov.za/Programmes/EMIS/EMISDownloads.aspx) provides a `MasterList`.**
</div>

In [25]:
sch_object = path/'data/secondary_2018_Masterlist_CoCT.csv'
sch = pd.read_csv(sch_object, sep=',', header=0)

In [26]:
# take only the columns you want
sch_sub = sch[['Official_Institution_Name', 'GNSS_Longitude', 'GNSS_Latitude']].copy()
sch_sub['GNSS_Latitude'] = pd.to_numeric(sch_sub['GNSS_Latitude']) 
sch_sub['GNSS_Longitude'] = pd.to_numeric(sch_sub['GNSS_Longitude'])

<div class="alert alert-block alert-info"><b>
     
</b>**Trim to the area of interest (`bbox`), `.rename` and `concatenate`.**
</div>

In [27]:
bbox_poly = gpd.GeoDataFrame({"id":1,"geometry":[box(*bbox)]})
bbox_poly = bbox_poly.set_crs("EPSG:4326")

geometry = [Point(xy) for xy in zip(sch_sub.GNSS_Longitude, sch_sub.GNSS_Latitude)]
points = gpd.GeoDataFrame(sch_sub, crs="EPSG:4326", geometry=geometry)

#take only the schools within the bbox
#(newer geopandas releases name the `op` argument `predicate`)
merge = gpd.sjoin(bbox_poly, points, how="left", op='contains')

merge.rename(columns={'Official_Institution_Name': 'name', 
                      'GNSS_Longitude': 'lon', 
                      'GNSS_Latitude': 'lat' }, inplace=True)
merge['amenity'] = 'school'

merge[['amenity', 'name', 'lat', 'lon']].head()

| | amenity | name | lat | lon |
|---|---|---|---|---|
| 0 | school | ZOLA SENIOR SECONDARY SCHOOL | -34.015092 | 18.634394 |
| 0 | school | MANDALAY PRIMARY SCHOOL | -34.014732 | 18.629511 |
| 0 | school | VUZAMANZI PUBLIC PRIMARY SCHOOL | -34.013736 | 18.648988 |
| 0 | school | NOLUNGILE PRIMARY SCHOOL | -34.013678 | 18.642853 |
| 0 | school | INTLANGANISO SECONDARY SCHOOL | -34.013115 | 18.641669 |
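Every row in this output has index 0 because the left join starts from the single bounding-box polygon. Since the area of interest is an axis-aligned rectangle, the same trim can also be sketched with plain comparisons on the coordinate columns. This is an alternative to the spatial join, not the method used above. The first two toy rows reuse names and coordinates from the output, the third is a made-up point outside the box:

```python
import pandas as pd

bbox = [18.349529, -34.050469, 18.649593, -33.848834]  # (minx, miny, maxx, maxy)

sch_sub = pd.DataFrame({
    'Official_Institution_Name': ['ZOLA SENIOR SECONDARY SCHOOL',
                                  'MANDALAY PRIMARY SCHOOL',
                                  'HYPOTHETICAL SCHOOL'],
    'GNSS_Longitude': ['18.634394', '18.629511', '19.5'],
    'GNSS_Latitude': ['-34.015092', '-34.014732', '-33.0'],
})

# Unparseable coordinates become NaN and are treated as outside the box.
lon = pd.to_numeric(sch_sub['GNSS_Longitude'], errors='coerce')
lat = pd.to_numeric(sch_sub['GNSS_Latitude'], errors='coerce')

minx, miny, maxx, maxy = bbox
inside = lon.between(minx, maxx) & lat.between(miny, maxy)
schools = sch_sub.loc[inside].reset_index(drop=True)
```

`between` is inclusive by default, so a point lying exactly on the box edge is kept here, which may differ from a strict `contains` test.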


In [28]:
df_list = [data[['amenity','name','lat','lon']], merge[['amenity','name','lat','lon']]]
data = pd.concat(df_list, ignore_index=True)

In [29]:
len(data)

2481

In [31]:
data['amenity'].value_counts()

park                597
school              565
restaurant          408
supermarket         132
atm                 130
cafe                126
community_center    108
health_care          92
convenience          88
library              81
bank                 65
pharmacy             46
kindergarten         11
veterinary            8
theatre               7
greengrocer           5
kiosk                 4
cinema                4
marketplace           3
internet_cafe         1
Name: amenity, dtype: int64

In [32]:
#save it
data.to_csv(path/'data/amenities_test.csv', index=False, encoding='utf-8')