# Chianson Siu
## Distilling WEB GIS Datasets Lab

This script saves and parses homeless shelter data from https://www.kingcounty.gov/depts/community-human-services/housing/services/homeless-housing/coordinated-entry/access-points.aspx. It then uses the LocationIQ
API to geocode these addresses and writes them to a Folium map.

Intellectual Significance:

This project is intellectually significant because it maps homeless shelters in the King County area.
This information could be used to help homeless people find safe places to go, as well as to determine
where a food bank or donation center would benefit the city most.

Challenges:

This assignment was challenging because the names and addresses were not always stored in the same
location within the DOM structure of the webpage. Additionally, LocationIQ could not geocode all the
addresses in the form they were scraped. I found myself hard-coding many parts of this script in order
to access a specific CSS element, a specific sentence in a paragraph element, and a specific address
that would not return a proper geocoded response unless written a certain way.

Problems:

Saving the GeoDataFrame as a shapefile often crashes the kernel, leaving an incomplete shapefile in
the local directory. Additionally, the Therapeutic Health Services located at 1901 Martin Luther King Jr.
Way S, Seattle, WA 98144 could not be geocoded and was excluded from the map.

In [1]:
# import packages and set the working directory
import urllib2
import urllib
import requests
import lxml
from lxml import html
import geopandas
import folium
import os
import shapely
import shapely.geometry
import fiona
import fiona.crs
workspace = "geog458\\lab_3\\"

In [2]:
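# Function for saving an html page to a local file.
# Returns the absolute path of the saved file.
def saveHtml(link, fileName):
    u = urllib2.urlopen(link)
    filePath = os.path.abspath(fileName + ".html")
    # binary mode stores the downloaded bytes unchanged;
    # the with-block closes the file even if the write fails
    with open(filePath, "wb") as localFile:
        localFile.write(u.read())
    u.close()
    print(filePath)
    return filePath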
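
For readers on Python 3, where `urllib2` no longer exists, the same save step might look like the sketch below. The name `save_html` is illustrative and not part of the original notebook, and the file is opened in binary mode on the assumption that the page bytes should be stored unchanged.

```python
import os
from urllib.request import urlopen


def save_html(link, file_name):
    """Download `link` and store the raw bytes in `<file_name>.html`."""
    file_path = os.path.abspath(file_name + ".html")
    # both handles are closed automatically when the block exits
    with urlopen(link) as response, open(file_path, "wb") as local_file:
        local_file.write(response.read())
    return file_path
```

Because `urlopen` also accepts `file://` URLs, the saved page can be read back later with the same function family, much as cell 4 does.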

In [3]:
# Save an html page containing addresses of homeless shelters / homes
homelessPath = saveHtml("https://www.kingcounty.gov/depts/community-human-services/housing/services/homeless-housing/coordinated-entry/access-points.aspx", "homless_shelter_data")

C:\Users\cjms2\geog458\lab_3\homless_shelter_data.html


In [4]:
# Reads the local html file
homelessText = urllib2.urlopen("file:///" + homelessPath).read()

In [5]:
# Set the root of the html tree for parsing
homelessRoot = html.document_fromstring(homelessText)

In [6]:
# Get the panels containing homeless addresses and names
homelessPanels = homelessRoot.find_class("panel-accordion-primary")
# the veterans' panel has a different text structure than the other panels
veteranHome = homelessPanels[5] # veterans' panel stored in index 5
del homelessPanels[5] # remove veterans' panel

In [7]:
homelessAddress = [] # empty list to hold homeless shelter addresses
homelessTitle = [] # empty list to hold homeless shelter names

# Scans the panel body for homeless addresses and names. 
# Appends the results to the appropriate list
for panel in homelessPanels:
    panelBody = panel.find_class("panel-body")
    panelLink = panelBody[0].cssselect("a") # addresses stored in anchor tags
    panelTitle = panelBody[0].cssselect("strong") # names stored in strong tags

    for link in panelLink:
        panelAddress = link.text_content().strip() # remove extra spaces
        homelessAddress.append(panelAddress.encode("utf-8")) # add address to address list
    
    for title in panelTitle:
        thisTitle = title.text_content().strip() # remove extra spaces
        homelessTitle.append(thisTitle.encode("utf-8")) # add names to names list

# cleanse the addresses for inconsistencies
del homelessAddress[-4:] # last 4 indices do not have addresses
del homelessAddress[9] # index 9 contains an email
homelessAddress[5] = homelessAddress[5] + homelessAddress[6] # complete address was split between index 5 and 6
del homelessAddress[6] # index 6 only contains half an address
del homelessAddress[15] # index 15 contains a blank string ""

# append the veterans' homeless clinic name and address
vetHome = veteranHome.cssselect("p")[5].text_content()
homelessAddress.append(vetHome[-43:]) # address is the last 43 characters
homelessTitle.append(vetHome[:-45]) # name is everything except the last 45 characters
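
The index-based cleanup above depends on the page layout at the time of scraping. As a hedged alternative, a pattern match can pick out the strings that look like street addresses. The pattern is an assumption based on the addresses shown in this notebook (street number first, `WA` and a five-digit zip code last). `ADDRESS_PATTERN`, `filter_addresses`, and the sample non-address strings are hypothetical and not part of the original script.

```python
import re

# assumed shape: "<street number> <street ...>, WA <5-digit zip>"
ADDRESS_PATTERN = re.compile(r"^\d+\s+.+,\s*WA\s+\d{5}$")


def filter_addresses(candidates):
    """Keep only the strings that look like Washington street addresses."""
    cleaned = (text.strip() for text in candidates)
    return [text for text in cleaned if ADDRESS_PATTERN.match(text)]


scraped = [
    "100 23rd Ave. S., Seattle, WA 98144",
    "info@example.org",  # made-up email standing in for a non-address link
    "",  # blank anchor text
    "  9600 College Way N. Seattle, WA 98103 ",
]
print(filter_addresses(scraped))
```

An address that the page splits across two anchor tags would still need to be joined before filtering, so this sketch reduces the hard-coded deletions without removing them entirely.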

In [8]:
# list of indices within the names list that do not contain names
indices = [1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 18, 19, 20, 21, 22, 23, 26, 28, 29, 30, 31, 32, 37, 42]

# cleanse names list by removing indices that do not contain names
for i in sorted(indices, reverse = True):
    del homelessTitle[i]
homelessTitle[4] = homelessTitle[4] + homelessTitle[5] # full name split between index 4 and 5
homelessTitle[6] = homelessTitle[6] + " " + homelessTitle[7] # full name split between index 6 and 7

# delete index 7 and 5 after concatenation
del homelessTitle[7]
del homelessTitle[5]


In [9]:
# function that makes a get request to the LocationIQ API to geocode an address.
# takes one address as the search string parameter and returns
# the get request response
def getLocation(searchString):
    
    geocodingApiKey = "9af8ae63239de6" # my API key
    url = "https://us1.locationiq.org/v1/search.php" # base website
    geoformat = "json" # desired return format
    
    # constructing the URL for the get request
    url = (url + "?key=" + urllib.quote(geocodingApiKey) + 
              "&q=" + urllib.quote(searchString) + 
              "&format=" + urllib.quote(geoformat))
    response = requests.get(url) # performing get request
    return response # return request response
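
The URL assembly in `getLocation` can also be written with `urlencode`, which quotes every parameter in one step. The sketch below targets Python 3 (`urllib.parse`), while the notebook itself runs on Python 2. `build_search_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder; only the endpoint and the `key`, `q`, and `format` parameters come from the cell above.

```python
from urllib.parse import urlencode

BASE_URL = "https://us1.locationiq.org/v1/search.php"  # endpoint used in this notebook


def build_search_url(search_string, api_key, response_format="json"):
    """Return the full request URL for one address lookup."""
    query = urlencode({
        "key": api_key,
        "q": search_string,
        "format": response_format,
    })
    return BASE_URL + "?" + query


print(build_search_url("1828 Yale Avenue, Seattle, WA 98101", "YOUR_API_KEY"))
```

Passing the key as an argument also keeps it out of the function body, so it could be read from an environment variable rather than stored in the notebook.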

In [10]:
import json
import time
homelessData = []
homelessAddressEmpty = []

# performs a get request for each address in the homeless address list.
# appends the [name, address, lat, lon, geometry point] to the homeless
# data list. If a request returns an empty response, stores the index
# in the homelessAddressEmpty list to be cleansed later
for i in range(0, len(homelessAddress)):
    searchString = homelessAddress[i] # address
    response = getLocation(searchString) # get request for each address
    time.sleep(1) # delay each request
    
    # If an empty response is returned, tries get request one more time
    # without the ending zip code
    if (response.status_code == 404): # empty json returned
        searchString = searchString[0: len(searchString)-6] # remove zip code
        response = getLocation(searchString)
    
    # appends the name, address, lat, lon, geometry to homeless data list
    # if the response returned a valid output with longitude and latitude
    # data
    if response.status_code not in (404, 500):
        jsonAsDict = json.loads(response.text)[0]
        # encodes the lat and lon as a coordinate point for the geometry column
        # (note: this list is ordered lat, lon, whereas shapely reads a point as x, y)
        coordinateTuple = [float(jsonAsDict["lat"].encode("utf-8")), float(jsonAsDict["lon"].encode("utf-8"))]
        homelessData.append([homelessTitle[i], 
                             homelessAddress[i], 
                             float(jsonAsDict["lat"].encode("utf-8")), 
                             float(jsonAsDict["lon"].encode("utf-8")),
                             shapely.geometry.Point(coordinateTuple)])
    else: # appends index of failed request to be cleansed
        homelessAddressEmpty.append(i)
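
The retry step in the loop above (try the full address, then the address without its zip code) can be isolated into a small helper, sketched here under stated assumptions. `geocode` is a hypothetical callable standing in for the LocationIQ request; it is assumed to return a list of result dictionaries with `lat` and `lon` strings, or an empty list when nothing is found. `strip_zip` uses a regular expression instead of cutting a fixed six characters.

```python
import re


def strip_zip(address):
    """Remove a trailing five-digit (or ZIP+4) code, if present."""
    return re.sub(r"\s*\d{5}(?:-\d{4})?\s*$", "", address)


def geocode_with_fallback(address, geocode):
    """Try the full address first, then the address without its zip code.

    Returns (lat, lon) as floats, or None if every attempt comes back empty.
    """
    candidates = [address]
    shortened = strip_zip(address)
    if shortened != address:  # avoid repeating an identical request
        candidates.append(shortened)
    for candidate in candidates:
        results = geocode(candidate)
        if results:
            first = results[0]
            return float(first["lat"]), float(first["lon"])
    return None
```

A `None` return plays the role of an entry in `homelessAddressEmpty`: the caller records the address for manual fixing.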

In [11]:
# cleanses the failed get requests by replacing them with hard-coded
# addresses that were tested and returned a proper output
homelessAddressFix = [] # list to hold fixed addresses
homelessAddressFix.append("11061 NE 2nd St. Bellevue, King County")
homelessAddressFix.append("11920 NE 80th St. Kirkland, King County")
homelessAddressFix.append("2709 3rd Ave. Seattle. King County")
homelessAddressFix.append("16225 NE 87th Street. Redmond, King County")
homelessAddressFix.append("419 S 2nd Street #2. Renton, King County")

# index 2 was not able to return a proper response. Without the proper
# geocoding, this index was removed. In this case, it would be the
# Therapeutic Health Services located at 
# 1901 Martin Luther King Jr. Way S, Seattle, WA 98144
del homelessAddressEmpty[2]


In [12]:
# Repeats get requests for the fixed addresses and appends
# the data appropriately to the end of the homeless data
# list
for i in range(0, len(homelessAddressFix)):
    response = getLocation(homelessAddressFix[i])
    jsonAsDict = json.loads(response.text)[0]
    index = homelessAddressEmpty[i] # get the correct index of the name and address for this response
    coordinateTuple = [float(jsonAsDict["lat"].encode("utf-8")), float(jsonAsDict["lon"].encode("utf-8"))]
    homelessData.append([homelessTitle[index], 
                         homelessAddress[index], 
                         float(jsonAsDict["lat"].encode("utf-8")), 
                         float(jsonAsDict["lon"].encode("utf-8")),
                         shapely.geometry.Point(coordinateTuple)])
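
Pairing `homelessAddressFix` with `homelessAddressEmpty` by position works only while both lists stay in the same order. A lookup keyed by the original address is one hedged alternative. In the sketch below, `split_fixable` is a hypothetical helper, and the dictionary keys are placeholders for the scraped strings, which this notebook does not show; only the fixed addresses and the Therapeutic Health Services address come from the text above.

```python
def split_fixable(failed_addresses, fixes):
    """Return (resolved, unresolved).

    resolved is a list of (original, fixed) pairs; unresolved lists the
    addresses that have no entry in `fixes`.
    """
    resolved, unresolved = [], []
    for address in failed_addresses:
        if address in fixes:
            resolved.append((address, fixes[address]))
        else:
            unresolved.append(address)
    return resolved, unresolved


# keys are placeholders, not the real scraped strings
fixes = {
    "<scraped Bellevue address>": "11061 NE 2nd St. Bellevue, King County",
    "<scraped Kirkland address>": "11920 NE 80th St. Kirkland, King County",
}
failed = [
    "<scraped Bellevue address>",
    "<scraped Kirkland address>",
    "1901 Martin Luther King Jr. Way S, Seattle, WA 98144",  # no working fix found
]
resolved, unresolved = split_fixable(failed, fixes)
print(unresolved)
```

With this layout, an address that cannot be geocoded simply has no entry in `fixes`, so nothing has to be deleted by index.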

In [13]:
# Add column names and convert data to a GeoDataFrame
geopandas_df = geopandas.GeoDataFrame(homelessData, columns=["name", "address", "lat", "long", "geometry"])
geopandas_df

| | name | address | lat | long | geometry |
|---|---|---|---|---|---|
| 0 | Catholic Community Services - Seattle | 100 23rd Ave. S., Seattle, WA 98144 | 47.601189 | -122.301566 | POINT (47.6011886 -122.30156565) |
| 1 | Multi-Service Center- Federal Way | 1200 S. 336th St., Federal Way, WA 98003 | 47.300769 | -122.318042 | POINT (47.3007686442953 -122.318042154362) |
| 2 | YWCA- Renton | 1010 S. 2nd St., Renton, WA 98057 | 47.481409 | -122.203563 | POINT (47.4814093 -122.203563458549) |
| 3 | Solid Ground - North Seattle | 9600 College Way N. Seattle, WA 98103 | 47.698708 | -122.332552 | POINT (47.69870805 -122.332551801326) |
| 4 | YMCA Young Adult Services Drop in Center | 2100 24th Ave S, Seattle, WA 98144 | 47.584188 | -122.301253 | POINT (47.5841884 -122.301253) |
| 5 | YouthCare’s James W. Ray Orion Center | 1828 Yale Avenue, Seattle, WA 98101 | 47.618233 | -122.330389 | POINT (47.6182332 -122.3303895) |
| 6 | Peace for the Streets by Kids from the Streets... | 1609 19th Avenue, Seattle, WA 98122 | 47.615584 | -122.307734 | POINT (47.6155844 -122.3077338) |
| 7 | Nexus Youth & Families | 915 H Street SE, Auburn, WA 98002 | 47.299578 | -122.218838 | POINT (47.2995776464646 -122.218837787879) |
| 8 | Teen Feed | 4740 B University Way NE, Seattle, WA 98105 | 47.664324 | -122.312734 | POINT (47.66432385 -122.31273440028) |
| 9 | University District Youth Center | 4516 15th Avenue NE, Seattle, WA 98105 | 47.661859 | -122.31167 | POINT (47.66185935 -122.311670474999) |
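
In the `geometry` column above the latitude appears first, whereas GeoJSON (RFC 7946) and shapely both read the first coordinate as x, that is, longitude. The sketch below builds a GeoJSON feature collection with the standard library only and places longitude first. `to_feature_collection` is a hypothetical helper, and the two sample rows reuse the `lat` and `long` values from the table above.

```python
import json


def to_feature_collection(rows):
    """Build a GeoJSON FeatureCollection from (name, address, lat, lon) tuples."""
    features = []
    for name, address, lat, lon in rows:
        features.append({
            "type": "Feature",
            "properties": {"name": name, "address": address},
            # GeoJSON positions are ordered [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        })
    return {"type": "FeatureCollection", "features": features}


rows = [
    ("Teen Feed", "4740 B University Way NE, Seattle, WA 98105", 47.664324, -122.312734),
    ("YWCA- Renton", "1010 S. 2nd St., Renton, WA 98057", 47.481409, -122.203563),
]
collection = to_feature_collection(rows)
print(json.dumps(collection, indent=2))
```

The equivalent change in the notebook would be to pass longitude before latitude when constructing each `shapely.geometry.Point`.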


In [14]:
# Get the location information for King County since that is where 
# the homeless shelters are located
response = getLocation("King County")
jsonAsDict = json.loads(response.text)[0]
kingCountyData = [float(jsonAsDict["lat"]), float(jsonAsDict["lon"])]

In [15]:
# create a folium map out of the homeless shelter data,
# using the King County coordinates for the center
map_center_lat = kingCountyData[0]
map_center_lon = kingCountyData[1]
map_zoom = 10 # zoom levels are whole numbers
my_map = folium.Map(location=[map_center_lat,map_center_lon],
                    zoom_start=map_zoom,
                    tiles="Stamen Toner")
geopandas_df.crs = fiona.crs.from_epsg(4326) # set the reference system
points = folium.features.GeoJson(geopandas_df.to_json())
my_map.add_child(points)
my_map.save("homelessShelterMap.html") # save the map to a local file
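
The map center above comes from geocoding the string "King County". A different technique, not the one this notebook uses, is to center the map on the mean of the shelter coordinates, which avoids one extra request. `mean_center` is a hypothetical helper; a plain arithmetic mean is assumed to be adequate because the points span a small area.

```python
def mean_center(points):
    """Return the arithmetic mean of a non-empty list of (lat, lon) pairs."""
    if not points:
        raise ValueError("at least one point is required")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


# two shelters from the GeoDataFrame output, as (lat, lon)
shelters = [(47.601189, -122.301566), (47.300769, -122.318042)]
center_lat, center_lon = mean_center(shelters)
print(center_lat, center_lon)
```

The result could be passed to `folium.Map` as `location=[center_lat, center_lon]` in place of `kingCountyData`.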

In [16]:
# save the GeoDataFrame to a shapefile
# geopandas_df.to_file("homelessData.shp", driver = "ESRI Shapefile")

In [19]:
geopandas_df.to_csv("homeless_data.csv", index = False, encoding='utf-8')

In [None]:
geopandas_df.to_file("homelessData.geojson", driver = "GeoJSON")