# Where To Live in Hawaii?

## House prices and local venue analysis of the Hawaiian Islands

## a. Introduction

When people think of a Hawaiian vacation, the first things that come to mind are beaches, great weather and relaxed island vibes. But when the discussion turns to living in Hawaii, it becomes expensive, expensive and great weather. According to Zillow, the median house price for the state is \\$619,000, while the national median is \\$231,000. Those better be some good beaches! Hawaii is also the 4th smallest state by land area.

With so little land and such high housing costs, what kinds of venues does Hawaii have to offer? Is there room for more diverse venues? Which areas are the best to live in once house prices and nearby venues are factored in?

These are some of the questions I will answer in this report. A map of the different neighborhoods with the median cost of a house will show which areas are more expensive than others. Adding the venue data may help explain why some areas of Hawaii cost more than others. Lastly, running the data through a classification model will offer some visual separation between the neighborhoods and what each area has to offer in terms of venues.

This report will be useful for people thinking about moving to Hawaii. As one of those people, I am especially invested in the results of this analysis. The analysis may also interest investors looking for an up-and-coming neighborhood in Hawaii.

## b. Data

The data I have acquired to solve the problem:

 1. Zip codes and cities of Hawaii
 2. Median house price for the different cities 
 3. Longitude and latitude of the various Hawaiian cities
 4. Venue data 


In [51]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
from project_lib import Project

import types
from botocore.client import Config
import ibm_boto3

### 1. Zip codes and cities

I used this [website](https://www.zipcodestogo.com/Hawaii/), which lists every Hawaii zip code with the name of its city, and extracted the data from the page using BeautifulSoup.

In [41]:
url = 'https://www.zipcodestogo.com/Hawaii/'


In [42]:
# The code was removed by Watson Studio for sharing.

In [43]:
response = requests.get(url, headers = headers)

In [44]:
response.status_code

200

In [45]:
soup = BeautifulSoup(response.content, 'html.parser')
zips = soup.find_all('table', class_ ='inner_table')

In [46]:
hi_zip = zips[0]

In [47]:
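# Collect the non-empty cell text of each table row
rows = []
table_rows = hi_zip.find_all('tr')

for tr in table_rows:
    cells = tr.find_all('td')
    # Use a separate loop variable so the row element `tr` is not shadowed
    row = [cell.text.strip() for cell in cells if cell.text.strip()]
    if row:
        rows.append(row)

# The fourth column is a placeholder and is dropped in the next cell
df = pd.DataFrame(rows, columns=["Zip Code", "City", "County", "Drop"])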

In [48]:
df.drop(columns='Drop', inplace=True)

In [49]:
# Remove the first two rows, which are not zip code records
df.drop([0,1], inplace=True)

In [50]:
df.head()

Unnamed: 0,Zip Code,City,County
2,96701,Aiea,Honolulu
3,96703,Anahola,Kauai
4,96704,Captain Cook,Hawaii
5,96705,Eleele,Kauai
6,96706,Ewa Beach,Honolulu
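The scraping steps above can be collected into one helper. This is a minimal sketch, not the code used in this notebook: the function name `table_to_frame` and the inline `sample_html` are made up for illustration, and the real page's markup may differ. Instead of dropping rows by position, it keeps only rows whose first cell looks like a zip code.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the downloaded page; the real markup may differ.
sample_html = """
<table class="inner_table">
  <tr><td>Zip Codes for the State of Hawaii</td></tr>
  <tr><td>Zip Code</td><td>City</td><td>County</td><td>Zip Code Map</td></tr>
  <tr><td>96701</td><td>Aiea</td><td>Honolulu</td><td>View Map</td></tr>
  <tr><td>96703</td><td>Anahola</td><td>Kauai</td><td>View Map</td></tr>
</table>
"""

def table_to_frame(html, columns=("Zip Code", "City", "County")):
    """Parse the first 'inner_table' into a DataFrame, keeping only data rows."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="inner_table")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Keep rows whose first cell looks like a 5-digit zip code, which
        # skips title and header rows without relying on their position.
        if len(cells) >= len(columns) and cells[0].isdigit() and len(cells[0]) == 5:
            rows.append(cells[:len(columns)])
    return pd.DataFrame(rows, columns=list(columns))

zip_df = table_to_frame(sample_html)
print(zip_df)
```

Filtering on the cell content means the extra "Drop" column and the positional `df.drop([0,1])` step are not needed, at the cost of assuming every valid row starts with a 5-digit code.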


### 2. Median House Prices

I couldn't find a comprehensive list of the median house price for every city, so after exporting the zip code data I used [Zillow](https://www.zillow.com/kilauea-hi-96703/home-values/) to build my own, collecting the median home value for each city. Many of the zip codes had no housing data, for two reasons: (1) the zip code covered national park land with no housing, or (2) neighboring cities were recognized under a single zip code.

In Excel I removed duplicate cities that were missing house price data.
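The same cleanup could also be done in pandas rather than Excel. The sketch below is only an illustration of that idea under stated assumptions: the rows are made up (the Honolulu price is a placeholder, not a Zillow figure), and "duplicate" is taken to mean a city that appears on more than one zip code row.

```python
import pandas as pd
import numpy as np

# Made-up rows for illustration; the Honolulu price is a placeholder.
prices = pd.DataFrame({
    "Zip Code": ["96801", "96813", "96701", "96861"],
    "city": ["Honolulu", "Honolulu", "Aiea", "Camp H M Smith"],
    "House Price": [np.nan, 700000.0, 704300.0, np.nan],
})

# A city counts as duplicated if it appears on more than one row.
is_duplicate_city = prices.duplicated(subset="city", keep=False)
missing_price = prices["House Price"].isna()

# Drop only rows that are both duplicated and missing a price;
# single-row cities without a price are kept for later inspection.
deduped = prices[~(is_duplicate_city & missing_price)].reset_index(drop=True)
print(deduped)
```

Cities that appear once but have no price, such as Camp H M Smith here, survive this step and are filtered later when rows without both price and coordinates are removed.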

In [56]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Zip Code,city,County,House Price
0,96701,Aiea,Honolulu,704300.0
1,96703,Anahola,Kauai,590300.0
2,96861,Camp H M Smith,Honolulu,
3,96704,Captain Cook,Hawaii,363800.0
4,96705,Eleele,Kauai,495600.0


### 3. Longitude and Latitude

I obtained this data [online](https://simplemaps.com/data/us-cities) as an Excel file containing coordinates for cities in all 50 states. I filtered out everything but Hawaii and uploaded the data.
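For readers who prefer to do the state filter in code, here is a hedged sketch. The `state_id` column name is an assumption about the downloaded file's layout, and the three rows (including the Seattle coordinates) are illustrative stand-ins; the remaining column names match the coordinate table in this notebook.

```python
import pandas as pd

# Tiny stand-in for the downloaded file. `state_id` is an assumed column name.
cities = pd.DataFrame({
    "city": ["Paauilo", "Haena", "Seattle"],
    "state_id": ["HI", "HI", "WA"],
    "county_name_all": ["Hawaii", "Kauai", "King"],
    "lat": [20.0397, 22.2186, 47.6211],
    "lng": [-155.3696, -159.5610, -122.3244],
})

# Keep only Hawaii rows and the columns needed for the merge and the map.
lat_long = (
    cities.loc[cities["state_id"] == "HI", ["city", "county_name_all", "lat", "lng"]]
    .reset_index(drop=True)
)
print(lat_long)
```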

In [57]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,city,county_name_all,lat,lng
0,Paauilo,Hawaii,20.0397,-155.3696
1,Discovery Harbour,Hawaii,19.0415,-155.6254
2,Haena,Kauai,22.2186,-159.561
3,Ualapue,Maui,21.0704,-156.8355
4,Waikane,Honolulu,21.4921,-157.8721


In [89]:
# Merge the house price and coordinate datasets on city name,
# keeping cities that appear in only one of the two
hawaii = pd.merge(hawaii_house, lat_long, on = 'city', how='outer')

In [82]:
hawaii.head(10)

Unnamed: 0,Zip Code,city,County,House Price,county_name_all,lat,lng
0,96701.0,Aiea,Honolulu,704300.0,Honolulu,21.3865,-157.9232
1,96703.0,Anahola,Kauai,590300.0,Kauai,22.1455,-159.3151
2,96861.0,Camp H M Smith,Honolulu,,,,
3,96704.0,Captain Cook,Hawaii,363800.0,Hawaii,19.4995,-155.8937
4,96705.0,Eleele,Kauai,495600.0,Kauai,21.9088,-159.5801
5,96706.0,Ewa Beach,Honolulu,625100.0,Honolulu,21.3181,-158.0073
6,96858.0,Fort Shafter,Honolulu,,,,
7,96708.0,Haiku,Maui,852400.0,,,
8,96710.0,Hakalau,Hawaii,,,,
9,96712.0,Haleiwa,Honolulu,1206600.0,Honolulu,21.5871,-158.1074


In [83]:
hawaii.shape

(178, 7)
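An outer merge keeps every city from both tables, so it helps to see where each row came from. The following sketch uses pandas' `indicator=True` option on small illustrative frames (not the full datasets) to count matched and unmatched cities and to list cities that have a price but no coordinates, which may point to a spelling mismatch between the two sources.

```python
import pandas as pd
import numpy as np

# Small illustrative frames; only the shape of the problem matters here.
hawaii_house = pd.DataFrame({
    "city": ["Aiea", "Haiku", "Hakalau"],
    "House Price": [704300.0, 852400.0, np.nan],
})
lat_long = pd.DataFrame({
    "city": ["Aiea", "Paauilo"],
    "lat": [21.3865, 20.0397],
    "lng": [-157.9232, -155.3696],
})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'.
merged = pd.merge(hawaii_house, lat_long, on="city", how="outer", indicator=True)
counts = merged["_merge"].value_counts()
print(counts)

# Cities with a price but no coordinates are candidates for a name mismatch.
unmatched = merged.loc[
    (merged["_merge"] == "left_only") & merged["House Price"].notna(), "city"
].tolist()
print(unmatched)
```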

This dataframe needs some cleanup. Because house price and lat/long are the most important columns for the analysis, I will keep only the rows that have both.

In [84]:
hawaii_na = hawaii[hawaii['House Price'].notnull() & hawaii['lat'].notnull()].reset_index(drop=True)

In [85]:
hawaii_na.shape

(48, 7)
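The same filter can be written with `dropna`, and counting missing values first shows how many rows each column is responsible for losing. This is a sketch on a four-row illustrative frame, not the full 178-row table.

```python
import pandas as pd
import numpy as np

# Illustrative rows covering each case: complete, no data, price only, coordinates only.
hawaii = pd.DataFrame({
    "city": ["Aiea", "Camp H M Smith", "Haiku", "Paauilo"],
    "House Price": [704300.0, np.nan, 852400.0, np.nan],
    "lat": [21.3865, np.nan, np.nan, 20.0397],
    "lng": [-157.9232, np.nan, np.nan, -155.3696],
})

required = ["House Price", "lat"]

# Number of rows missing each required column
missing_counts = hawaii[required].isna().sum()
print(missing_counts)

# Keep only rows that have both a price and a latitude
hawaii_na = hawaii.dropna(subset=required).reset_index(drop=True)
print(hawaii_na)
```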

In [88]:
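# Display without the redundant columns; drop() returns a new frame,
# so hawaii_na itself still contains them
hawaii_na.drop(columns=['county_name_all', 'Zip Code'])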

Unnamed: 0,city,County,House Price,lat,lng
0,Aiea,Honolulu,704300.0,21.3865,-157.9232
1,Anahola,Kauai,590300.0,22.1455,-159.3151
2,Captain Cook,Hawaii,363800.0,19.4995,-155.8937
3,Eleele,Kauai,495600.0,21.9088,-159.5801
4,Ewa Beach,Honolulu,625100.0,21.3181,-158.0073
5,Haleiwa,Honolulu,1206600.0,21.5871,-158.1074
6,Hanalei,Kauai,1064000.0,22.2041,-159.4977
7,Hanapepe,Kauai,492000.0,21.914,-159.5874
8,Hauula,Honolulu,640100.0,21.6111,-157.9118
9,Hilo,Hawaii,344800.0,19.6886,-155.0864
