# Where To Live in Hawaii?

## House prices and local venue analysis of the Hawaiian Islands

## a. Introduction

When people think of a Hawaiian vacation, what first comes to mind is beaches, great weather and relaxed island vibes. But when the conversation turns to living in Hawaii, it becomes expensive, expensive and great weather. According to Zillow, the median house price for the state is \$619,000, while the national median is \$231,000. Those had better be some good beaches! Hawaii is also the 4th smallest state by land area.

With so little land and such high housing costs, what kind of venues does Hawaii have to offer? Is there room for more diverse venues? What are the best areas to live in once house prices and nearby venues are factored in?

These are some of the questions I will answer in this report. A map of the different neighborhoods with the median cost of a house will show which areas are more expensive than others. Adding the venue data may help explain why some areas of Hawaii are more expensive than others. Lastly, running the data through a k-means clustering model will visually separate the neighborhoods according to what each area has to offer in terms of venues.

This report will be useful for people thinking about moving to Hawaii. As one of those people, I am extra invested in the results of this analysis. It may also interest investors looking for a new up-and-coming neighborhood in Hawaii.

## b. Data

The data I have acquired to solve the problem:

 1. Zip codes and cities of Hawaii
 2. Median house price for the different cities 
 3. Longitude and latitude of the various Hawaiian cities
 4. Venue data 


In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
from project_lib import Project

import types
from botocore.client import Config
import ibm_boto3

In [23]:
import json  # library to handle JSON files

!conda install -c conda-forge geopy --yes  # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # library to handle HTTP requests
from pandas.io.json import json_normalize  # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes  # installs folium; comment this line out if it is already installed
import folium  # map rendering library

print('Libraries imported.')


### 1. Zip codes and cities

I used [this website](https://www.zipcodestogo.com/Hawaii/), which lists every Hawaii zip code alongside its city name, and extracted the table from the page with BeautifulSoup.

In [2]:
url = 'https://www.zipcodestogo.com/Hawaii/'


In [3]:
# The code was removed by Watson Studio for sharing.

In [4]:
response = requests.get(url, headers = headers)

In [5]:
# Check it was correctly connected to the website
response.status_code

200

In [6]:
soup = BeautifulSoup(response.content, 'html.parser')
zips = soup.find_all('table', class_ ='inner_table')

In [7]:
hi_zip = zips[0]

In [8]:
# Loop through every table row and collect its non-empty cells for the dataframe
rows = []
table_rows = hi_zip.find_all('tr')

for tr in table_rows:
    cells = tr.find_all('td')
    row = [cell.text.strip() for cell in cells if cell.text.strip()]
    if row:
        rows.append(row)

df = pd.DataFrame(rows, columns=["Zip Code", "City", "County", "Drop"])

In [9]:
df.head()

|   | Zip Code | City | County | Drop |
|---|---|---|---|---|
| 0 | Zip Codes for the State of\n\nHawaii | | | |
| 1 | Zip Code | City | County | Zip Code Map |
| 2 | 96701 | Aiea | Honolulu | View Map |
| 3 | 96703 | Anahola | Kauai | View Map |
| 4 | 96704 | Captain Cook | Hawaii | View Map |


This grabs everything from the table, including two header rows and a map-link column, so I will clean it up to keep only the data I need.

In [10]:
df.drop(columns='Drop', inplace=True)

In [11]:
df.drop([0,1], inplace=True)

In [12]:
df.head()

|   | Zip Code | City | County |
|---|---|---|---|
| 2 | 96701 | Aiea | Honolulu |
| 3 | 96703 | Anahola | Kauai |
| 4 | 96704 | Captain Cook | Hawaii |
| 5 | 96705 | Eleele | Kauai |
| 6 | 96706 | Ewa Beach | Honolulu |


### 2. Median House Prices

I couldn't find a comprehensive list of median house prices for every city, so after exporting the zip code table I built my own from [Zillow](https://www.zillow.com/kilauea-hi-96703/home-values/), recording the median home value for each city. Many of the zip codes had no housing data, for two reasons: (1) the zip code covered national park land, or (2) neighboring cities close to one another were recognized under a single zip code.

In Excel I removed duplicate cities that were missing house price data.
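
The same cleanup could also be done in pandas rather than Excel. The following is only a minimal sketch of one way to do it, not the steps actually used here: the rows are hypothetical placeholders, and only the column names (`Zip Code`, `city`, `House Price`) follow the table shown below.

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the notebook's table.
raw = pd.DataFrame({
    "Zip Code": ["00001", "00002", "00003", "00004"],
    "city": ["Sample City", "Sample City", "Other Town", "Base Camp"],
    "House Price": [500000.0, None, 350000.0, None],
})

# Flag every row whose city appears more than once in the table.
is_repeat = raw.duplicated(subset="city", keep=False)

# Drop a row only if its city is repeated AND the row has no price.
cleaned = raw[~(is_repeat & raw["House Price"].isna())].reset_index(drop=True)
print(cleaned)
```

A city that appears only once is kept even when its price is missing, which matches rows such as Camp H M Smith in the output below.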

In [14]:
# The code was removed by Watson Studio for sharing.

|   | Zip Code | city | County | House Price |
|---|---|---|---|---|
| 0 | 96701 | Aiea | Honolulu | 704300.0 |
| 1 | 96703 | Anahola | Kauai | 590300.0 |
| 2 | 96861 | Camp H M Smith | Honolulu | |
| 3 | 96704 | Captain Cook | Hawaii | 363800.0 |
| 4 | 96705 | Eleele | Kauai | 495600.0 |


In [15]:
# The code was removed by Watson Studio for sharing.

|   | Zip Code | city | County | House Price |
|---|---|---|---|---|
| 0 | 96701 | Aiea | Honolulu | 704300.0 |
| 1 | 96703 | Anahola | Kauai | 590300.0 |
| 2 | 96861 | Camp H M Smith | Honolulu | |
| 3 | 96704 | Captain Cook | Hawaii | 363800.0 |
| 4 | 96705 | Eleele | Kauai | 495600.0 |


### 3. Longitude and Latitude

I obtained this data [online](https://simplemaps.com/data/us-cities) as an Excel file. It contained coordinates for cities in all 50 states, so I filtered out everything except Hawaii and uploaded the result.
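
For readers who prefer to do the filtering in pandas, a rough sketch follows. It assumes the download has a two-letter state column, called `state_id` here, which is a guess about the file layout. The non-Hawaii row is made up for illustration; with the real file the dataframe would come from something like `pd.read_excel` on the downloaded workbook.

```python
import pandas as pd

# Stand-in for the downloaded file. `state_id` is an assumed column name,
# and the "Example City" row is invented purely for illustration.
cities = pd.DataFrame({
    "city": ["Paauilo", "Haena", "Example City"],
    "state_id": ["HI", "HI", "XX"],
    "county_name_all": ["Hawaii", "Kauai", "Example"],
    "lat": [20.0397, 22.2186, 0.0],
    "lng": [-155.3696, -159.5610, 0.0],
})

# Keep only Hawaii rows and the columns used later in the analysis.
lat_long = (
    cities.loc[cities["state_id"] == "HI", ["city", "county_name_all", "lat", "lng"]]
    .reset_index(drop=True)
)
print(lat_long)
```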

In [16]:
# The code was removed by Watson Studio for sharing.

|   | city | county_name_all | lat | lng |
|---|---|---|---|---|
| 0 | Paauilo | Hawaii | 20.0397 | -155.3696 |
| 1 | Discovery Harbour | Hawaii | 19.0415 | -155.6254 |
| 2 | Haena | Kauai | 22.2186 | -159.561 |
| 3 | Ualapue | Maui | 21.0704 | -156.8355 |
| 4 | Waikane | Honolulu | 21.4921 | -157.8721 |


In [17]:
# Merge the two datasets on city name, keeping unmatched rows from both sides
hawaii = pd.merge(hawaii_house, lat_long, on='city', how='outer')

In [18]:
hawaii.head(10)

|   | Zip Code | city | County | House Price | county_name_all | lat | lng |
|---|---|---|---|---|---|---|---|
| 0 | 96701.0 | Aiea | Honolulu | 704300.0 | Honolulu | 21.3865 | -157.9232 |
| 1 | 96703.0 | Anahola | Kauai | 590300.0 | Kauai | 22.1455 | -159.3151 |
| 2 | 96861.0 | Camp H M Smith | Honolulu | | | | |
| 3 | 96704.0 | Captain Cook | Hawaii | 363800.0 | Hawaii | 19.4995 | -155.8937 |
| 4 | 96705.0 | Eleele | Kauai | 495600.0 | Kauai | 21.9088 | -159.5801 |
| 5 | 96706.0 | Ewa Beach | Honolulu | 625100.0 | Honolulu | 21.3181 | -158.0073 |
| 6 | 96858.0 | Fort Shafter | Honolulu | | | | |
| 7 | 96708.0 | Haiku | Maui | 852400.0 | | | |
| 8 | 96710.0 | Hakalau | Hawaii | | | | |
| 9 | 96712.0 | Haleiwa | Honolulu | 1206600.0 | Honolulu | 21.5871 | -158.1074 |


In [19]:
hawaii.shape

(178, 7)
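
Because the merge is an outer join, cities that appear in only one of the two sources are kept with missing values on the other side. A quick way to see which rows failed to match is pandas' `indicator` option. The sketch below uses a few sample rows: the Aiea, Haiku and Camp H M Smith values are copied from the tables above, while the contents of the second frame are only an illustration and say nothing about which cities are actually missing from the full house price table.

```python
import pandas as pd

# Small illustrative samples shaped like hawaii_house and lat_long.
hawaii_house = pd.DataFrame({
    "city": ["Aiea", "Haiku", "Camp H M Smith"],
    "House Price": [704300.0, 852400.0, None],
})
lat_long = pd.DataFrame({
    "city": ["Aiea", "Paauilo"],
    "lat": [21.3865, 20.0397],
    "lng": [-157.9232, -155.3696],
})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'.
merged = pd.merge(hawaii_house, lat_long, on="city", how="outer", indicator=True)
print(merged["_merge"].value_counts())

# Cities from the price table that found no coordinates.
print(merged.loc[merged["_merge"] == "left_only", "city"].tolist())
```

Rows marked `left_only` are worth a second look, since a city might be missing from the coordinate file or simply spelled differently there.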

This dataframe needs to be cleaned up a bit. Because house price and lat/long are the most important columns for the analysis, I will keep only the rows that have both.

In [20]:
hawaii_na = hawaii[hawaii['House Price'].notnull() & hawaii['lat'].notnull()].reset_index(drop=True)

In [21]:
hawaii_na.shape

(48, 7)

In [67]:
hawaii_df = hawaii_na.drop(columns=['county_name_all', 'Zip Code'])

In [68]:

hawaii_df.head()

|   | city | County | House Price | lat | lng |
|---|---|---|---|---|---|
| 0 | Aiea | Honolulu | 704300.0 | 21.3865 | -157.9232 |
| 1 | Anahola | Kauai | 590300.0 | 22.1455 | -159.3151 |
| 2 | Captain Cook | Hawaii | 363800.0 | 19.4995 | -155.8937 |
| 3 | Eleele | Kauai | 495600.0 | 21.9088 | -159.5801 |
| 4 | Ewa Beach | Honolulu | 625100.0 | 21.3181 | -158.0073 |


In [39]:
# The code was removed by Watson Studio for sharing.

### 4. Venue Data

For the venue data I used the Foursquare API with a radius of 1.5 km and a limit of 100 venues. I increased the radius because with only 500 m no city returned more than 25 venues. The city markers on the map show that many of the areas are far apart, which explains why a 500 m radius turned up so little. Even at 1.5 km, several cities have only one or two venues. I'm interested to see whether these cities correlate with lower house prices.
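
To put the search radius in perspective, the distance between two city centers can be estimated with the haversine formula. This is a small illustrative sketch, not part of the original analysis; the helper name `haversine_km` is my own, and the coordinates are the Aiea and Ewa Beach values from `hawaii_df` above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates for Aiea and Ewa Beach as listed in hawaii_df.
aiea = (21.3865, -157.9232)
ewa_beach = (21.3181, -158.0073)
distance = haversine_km(*aiea, *ewa_beach)
print(round(distance, 1), "km")
```

Even these two neighboring Honolulu County cities are several times farther apart than either search radius, so the venue searches around them do not overlap.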

In [55]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    """Return a dataframe of Foursquare venues within `radius` metres of each location.

    CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT are read as globals and must
    be defined before this function is called.
    """
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-city lists into one dataframe with one row per venue
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
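
The function above indexes straight into each venue's first category, which would raise an error if a venue ever came back without one. A more defensive way to flatten the items is sketched below. The helper name `flatten_items` is hypothetical, and the sample list is a hand-written stand-in for `response["groups"][0]["items"]`: the first venue reuses values from the notebook's output, the second is made up and deliberately has no category.

```python
def flatten_items(name, lat, lng, items):
    """Turn Foursquare 'items' into flat tuples, skipping incomplete venues."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # incomplete record; direct indexing would raise here
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Hand-written stand-in for the API response items (second venue is made up).
sample_items = [
    {"venue": {"name": "Aiea Bowl",
               "location": {"lat": 21.379102, "lng": -157.929651},
               "categories": [{"name": "Bowling Alley"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 21.38, "lng": -157.93},
               "categories": []}},
]
rows = flatten_items("Aiea", 21.3865, -157.9232, sample_items)
print(rows)
```

Whether skipping such venues is the right choice depends on the analysis; labeling them with a placeholder category would be an equally reasonable alternative.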

In [None]:
venue = getNearbyVenues(names=hawaii_df['city'],
                        latitudes=hawaii_df['lat'],
                        longitudes=hawaii_df['lng'])

In [57]:
venue.shape

(1194, 7)

In [58]:
venue.head(10)

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Aiea | 21.3865 | -157.9232 | The Alley Restaurant | 21.379108 | -157.92955 | Japanese Restaurant |
| 1 | Aiea | 21.3865 | -157.9232 | Aiea Bowl | 21.379102 | -157.929651 | Bowling Alley |
| 2 | Aiea | 21.3865 | -157.9232 | Ice Garden | 21.37889 | -157.929873 | Ice Cream Shop |
| 3 | Aiea | 21.3865 | -157.9232 | Pearl Country Club | 21.392433 | -157.935925 | Golf Course |
| 4 | Aiea | 21.3865 | -157.9232 | Palazzo Italian Ristorante | 21.379343 | -157.931611 | Italian Restaurant |
| 5 | Aiea | 21.3865 | -157.9232 | El Charro Mexicano Restaurant | 21.379244 | -157.929342 | Mexican Restaurant |
| 6 | Aiea | 21.3865 | -157.9232 | Trina Beauty Supply Hawaii | 21.379264 | -157.931422 | Cosmetics Shop |
| 7 | Aiea | 21.3865 | -157.9232 | California Pizza Kitchen | 21.391305 | -157.933016 | Pizza Place |
| 8 | Aiea | 21.3865 | -157.9232 | Starbucks | 21.37836 | -157.930445 | Coffee Shop |
| 9 | Aiea | 21.3865 | -157.9232 | Hughley's Southern Cuisine | 21.379395 | -157.931627 | Southern / Soul Food Restaurant |


In [64]:
venue.groupby('Neighborhood').count()['Venue']

Neighborhood
Aiea            41
Anahola          6
Captain Cook     5
Eleele          50
Ewa Beach       23
Haleiwa         81
Hanalei         54
Hanapepe        29
Hauula          14
Hilo             4
Holualoa         2
Honokaa         11
Honolulu        28
Kaaawa           8
Kahului         41
Kailua          96
Kalaheo         21
Kaneohe         54
Kapaa            6
Kapaau          13
Kapolei         51
Kaunakakai       5
Kealakekua       1
Kekaha           7
Kihei           30
Kilauea         25
Koloa           27
Kula            10
Kurtistown       4
Lahaina          4
Laie            41
Lanai City      16
Lihue           82
Makawao         22
Naalehu          2
Pahoa           14
Paia            10
Pearl City      25
Princeville     66
Wahiawa         44
Waialua         15
Waianae          4
Wailuku         35
Waimanalo       17
Waimea           2
Waipahu         48
Name: Venue, dtype: int64
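
One simple way to start probing whether venue counts relate to house prices is a correlation coefficient. The sketch below uses only the first six neighborhoods, with prices and venue counts copied from the outputs above; the `Venue Count` column name is my own. Six cities is far too few to conclude anything, so this only illustrates the mechanics.

```python
import pandas as pd

# First six neighborhoods, with values copied from the tables above.
sample = pd.DataFrame({
    "city": ["Aiea", "Anahola", "Captain Cook", "Eleele", "Ewa Beach", "Haleiwa"],
    "House Price": [704300.0, 590300.0, 363800.0, 495600.0, 625100.0, 1206600.0],
    "Venue Count": [41, 6, 5, 50, 23, 81],
})

# Pearson correlation between venue count and median house price.
r = sample["Venue Count"].corr(sample["House Price"])
print(f"Pearson r on this six-city sample: {r:.2f}")
```

On the full data, the counts from `venue.groupby('Neighborhood')` could presumably be joined to `hawaii_df` on the city name before computing the same statistic.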

In [70]:
address = 'Hawaii, USA'

geolocator = Nominatim(user_agent="hi_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hawaii are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hawaii are 21.2160437, -157.975203.


In [75]:
map_hawaii = folium.Map(location=[latitude, longitude], zoom_start=10)


for lat, lng, city, county in zip(hawaii_df['lat'], hawaii_df['lng'], hawaii_df['city'], hawaii_df['County']):
    label = '{}, {}'.format(city, county)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hawaii)  
    
map_hawaii
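
The markers above are all drawn in the same blue. Since one goal of the map is to show which areas are more expensive, a possible refinement is to color each marker by house price. The helper below is a hypothetical sketch (the name `price_color` and the green-to-red scale are my own choices); its result could be passed as the `color` and `fill_color` arguments of `folium.CircleMarker` inside the loop.

```python
def price_color(price, low, high):
    """Map a price to a hex color from green (cheapest) to red (most expensive)."""
    if high <= low:
        return "#808080"  # degenerate range; fall back to grey
    # Position of the price within [low, high], clamped to 0..1.
    t = min(max((price - low) / (high - low), 0.0), 1.0)
    red = int(round(255 * t))
    green = int(round(255 * (1 - t)))
    return "#{:02x}{:02x}00".format(red, green)


# Prices from the first rows of hawaii_df.
prices = {"Aiea": 704300.0, "Anahola": 590300.0, "Captain Cook": 363800.0}
low, high = min(prices.values()), max(prices.values())
for city, price in prices.items():
    print(city, price_color(price, low, high))
```

A linear scale is the simplest option; with a few very expensive outliers such as Haleiwa, binning prices into quantiles might give a more readable map.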