# <h1><b>Battle of Neighborhoods - A Coursera Capstone project (Week 2) </b></h1>

<!-- ## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Data_Sources](#data sources)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion) -->

## **Introduction** <a name="introduction"></a>

Arizona and Hawaii are two U.S. states that host state-of-the-art telescopes and observatories. As a graduate student who wants to pursue a PhD in astronomy, I find these giant visible-light telescopes fascinating. For this capstone project, I'll explore the neighborhoods of Hawaii and Arizona and cluster them to show the similarities and dissimilarities between the two places.

Hopefully, **students, faculty members and job seekers who want to work at the observatories in either place will benefit from this project, as it will help them decide, based on their preferences, whether to live near the observatory or commute from farther away.**

## <b>Data</b><a name="data"></a>

### Description of the data that will be used in this project

 - **List of neighborhoods in Hawaii with their latitudes and longitudes**. The types of venues have also been extracted so that segmentation and clustering are easier.
 - **List of neighborhoods in Arizona with their latitudes and longitudes**. The types of venues have been extracted here as well.
 - **Location data from the Foursquare API to segment and cluster the neighborhoods.**

## **Data Sources** <a name="data sources"></a>

In this project, **Keck Observatory** in **Hawaii** and **Steward Observatory** at the University of Arizona in **Arizona** will be used as the centers, around which the neighborhoods will be explored.

- **Geodata for Hawaii, extracted from a GeoJSON file from the NYU Spatial Data Repository by loading the .json file.**
- **Geodata for Arizona, extracted from a GeoJSON file from the NYU Spatial Data Repository by loading the .json file.**

When downloading the GeoJSON file, it is important to check that it contains "Point"/"MultiPoint" features. With "Polygon" features, extracting the names of neighborhoods and the types of venues becomes much more complicated.

 - **Geocode information from Geopy.**
 - **Location data from the Foursquare API to segment and cluster the neighborhoods. For this, the CLIENT_ID and the CLIENT_SECRET are required. The RADIUS around the specified location within which the neighborhood is explored must also be specified.**
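
The geometry check described above can be automated before any further processing. The following is a minimal sketch, not part of the original notebook: the helper name `geometry_type_counts` and the hand-made sample are assumptions for illustration, and the sample only mimics the shape of the downloaded file.

```python
from collections import Counter

def geometry_type_counts(feature_collection):
    """Count how often each GeoJSON geometry type occurs in a FeatureCollection."""
    return Counter(
        feature["geometry"]["type"]
        for feature in feature_collection.get("features", [])
        if feature.get("geometry")  # skip features with a null geometry
    )

# Tiny hand-made sample in the same shape as the downloaded file
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "MultiPoint", "coordinates": [[-156.003466, 19.64056]]},
         "properties": {"NAME": "Kukailimoku Point Lighthouse"}},
        {"type": "Feature",
         "geometry": {"type": "MultiPoint", "coordinates": [[-156.011014, 19.64701401]]},
         "properties": {"NAME": "Old Kona Airport"}},
    ],
}

counts = geometry_type_counts(sample)
print(counts)
```

If the counter contains anything other than `Point` or `MultiPoint`, the simple coordinate extraction used later in this notebook would need to be adapted.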

Now we'll gather the required data to explore the different neighborhoods around the mentioned centers.

## **Methodology**

First, we'll need to import the libraries for the analysis.

#### **Importing required libraries-**

In [87]:
import numpy as np

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

As we need to explore the neighborhoods around the Keck Observatory on Hawaii's Mauna Kea and Arizona's Steward Observatory, the geodata for these places is required in the form of GeoJSON files. These files will be loaded as **Hawaii.json** for Hawaii and **Arizona.json** for Arizona.

First, we will explore the neighborhoods around the Keck Observatory on Mauna Kea.

To do that, we need to load the **Hawaii.json** file and import it as **hawaii_data**.

In [88]:
with open('Hawaii.json') as json_data:
    hawaii_data = json.load(json_data)

Now, we'll take a look at **hawaii_data**.

In [5]:
hawaii_data

{'type': 'FeatureCollection',
 'totalFeatures': 936,
 'features': [{'type': 'Feature',
   'id': 'TG00HILPT.1',
   'geometry': {'type': 'MultiPoint',
    'coordinates': [[-156.036222, 19.782857]]},
   'geometry_name': 'the_geom',
   'properties': {'GIST_ID': 1, 'CFCC': 'D82', 'NAME': ''}},
  {'type': 'Feature',
   'id': 'TG00HILPT.2',
   'geometry': {'type': 'MultiPoint',
    'coordinates': [[-156.003466, 19.64056]]},
   'geometry_name': 'the_geom',
   'properties': {'GIST_ID': 2,
    'CFCC': 'D71',
    'NAME': 'Kukailimoku Point Lighthouse'}},
  {'type': 'Feature',
   'id': 'TG00HILPT.3',
   'geometry': {'type': 'MultiPoint',
    'coordinates': [[-156.011014, 19.64701401]]},
   'geometry_name': 'the_geom',
   'properties': {'GIST_ID': 3, 'CFCC': 'D51', 'NAME': 'Old Kona Airport'}},
  {'type': 'Feature',
   'id': 'TG00HILPT.4',
   'geometry': {'type': 'MultiPoint',
    'coordinates': [[-156.043785, 19.73810299]]},
   'geometry_name': 'the_geom',
   'properties': {'GIST_ID': 4,
    'CFCC

After loading the data, we can see that all the relevant data is under the *features* key, which is basically a list of the neighborhoods. So, we define a new variable that holds this data.

In [6]:
neighborhoods_data = hawaii_data['features']

As we have the data of neighborhoods stored as *neighborhoods_data*, we will have a look at the data.

In [8]:
neighborhoods_data

[{'type': 'Feature',
  'id': 'TG00HILPT.1',
  'geometry': {'type': 'MultiPoint',
   'coordinates': [[-156.036222, 19.782857]]},
  'geometry_name': 'the_geom',
  'properties': {'GIST_ID': 1, 'CFCC': 'D82', 'NAME': ''}},
 {'type': 'Feature',
  'id': 'TG00HILPT.2',
  'geometry': {'type': 'MultiPoint', 'coordinates': [[-156.003466, 19.64056]]},
  'geometry_name': 'the_geom',
  'properties': {'GIST_ID': 2,
   'CFCC': 'D71',
   'NAME': 'Kukailimoku Point Lighthouse'}},
 {'type': 'Feature',
  'id': 'TG00HILPT.3',
  'geometry': {'type': 'MultiPoint',
   'coordinates': [[-156.011014, 19.64701401]]},
  'geometry_name': 'the_geom',
  'properties': {'GIST_ID': 3, 'CFCC': 'D51', 'NAME': 'Old Kona Airport'}},
 {'type': 'Feature',
  'id': 'TG00HILPT.4',
  'geometry': {'type': 'MultiPoint',
   'coordinates': [[-156.043785, 19.73810299]]},
  'geometry_name': 'the_geom',
  'properties': {'GIST_ID': 4,
   'CFCC': 'D85',
   'NAME': 'Ellison S Onizuka Space Center'}},
 {'type': 'Feature',
  'id': 'TG00HILP

To inspect the data, we first need to know what a single entry looks like and what its features are.

In [89]:
neighborhoods_data[1]

{'type': 'Feature',
 'id': 'TG00HILPT.2',
 'geometry': {'type': 'MultiPoint', 'coordinates': [[-156.003466, 19.64056]]},
 'geometry_name': 'the_geom',
 'properties': {'GIST_ID': 2,
  'CFCC': 'D71',
  'NAME': 'Kukailimoku Point Lighthouse'}}

So, we see that each entry contains the name of the neighborhood and its coordinates (stored as longitude, then latitude).
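
Because GeoJSON lists each position as longitude first and latitude second, it is easy to swap the two by accident. The sketch below shows one way to pull the name and coordinates out of a single feature. The helper `feature_to_row` is hypothetical, and it assumes that only the first position of a MultiPoint is of interest, as is the case for the entries shown above.

```python
def feature_to_row(feature):
    """Return (name, latitude, longitude) for a Point or MultiPoint feature."""
    geometry = feature["geometry"]
    coords = geometry["coordinates"]
    # A MultiPoint holds a list of positions; this sketch keeps only the first one.
    if geometry["type"] == "MultiPoint":
        coords = coords[0]
    lon, lat = coords[0], coords[1]  # GeoJSON order is [longitude, latitude]
    return feature["properties"]["NAME"], lat, lon

# The entry inspected above
feature = {
    "type": "Feature",
    "id": "TG00HILPT.2",
    "geometry": {"type": "MultiPoint", "coordinates": [[-156.003466, 19.64056]]},
    "geometry_name": "the_geom",
    "properties": {"GIST_ID": 2, "CFCC": "D71", "NAME": "Kukailimoku Point Lighthouse"},
}
row = feature_to_row(feature)
print(row)
```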

We need to convert the features data into a dataframe so that we can do exploratory analysis on it.

We have all the essential features such as **"Neighborhood", "Latitude" and "Longitude"**. So we create an empty dataframe and then include all the features as column names and then include the data from **neighborhoods_data**.

In [90]:
column_names = ['Neighborhood', 'Latitude', 'Longitude'] 

neighborhoods = pd.DataFrame(columns=column_names) # An empty dataframe created.
neighborhoods

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|


In [91]:
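rows = []  # collect one dict per feature, then build the dataframe in one step
for data in neighborhoods_data:
    neighborhood_name = data['properties']['NAME']

    # GeoJSON stores each position as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[0][1]
    neighborhood_lon = neighborhood_latlon[0][0]

    rows.append({'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so the rows are assembled here instead
neighborhoods = pd.DataFrame(rows, columns=column_names)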

Now we have filled the dataframe with the required features.

Let's take a look at the dataframe:

In [92]:
neighborhoods.head(20)

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | | 19.782857 | -156.036222 |
| 1 | Kukailimoku Point Lighthouse | 19.64056 | -156.003466 |
| 2 | Old Kona Airport | 19.647014 | -156.011014 |
| 3 | Ellison S Onizuka Space Center | 19.738103 | -156.043785 |
| 4 | Keahole Point Lighthouse | 19.730984 | -156.063283 |
| 5 | Honokohau Small Boat Harbor | 19.673002 | -156.025103 |
| 6 | Kailua Airport | 19.647014 | -156.011014 |
| 7 | | 19.429102 | -154.88329 |
| 8 | | 19.406082 | -154.91839 |
| 9 | | 19.497396 | -154.945666 |


The dataframe looks correct, but many rows have an empty name and only coordinates. We'll have to clean up the dataframe.

Let's see the shape of the dataframe.

In [93]:
s = neighborhoods.shape
s

(936, 3)

In [None]:
# Keep only the rows that have a name; dropping rows inside a positional loop
# shifts the index and can skip rows or run past the end of the dataframe
neighborhoods = neighborhoods[neighborhoods['Neighborhood'] != ''].reset_index(drop=True)

In [109]:
neighborhoods

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Kukailimoku Point Lighthouse | 19.64056 | -156.003466 |
| 1 | Old Kona Airport | 19.647014 | -156.011014 |
| 2 | Ellison S Onizuka Space Center | 19.738103 | -156.043785 |
| 3 | Keahole Point Lighthouse | 19.730984 | -156.063283 |
| 4 | Honokohau Small Boat Harbor | 19.673002 | -156.025103 |
| 5 | Kailua Airport | 19.647014 | -156.011014 |
| 6 | Water Tower | 19.676486 | -156.006658 |
| 7 | Kealakekua Bay Park | 19.478702 | -155.921985 |
| 8 | Kalahiki Cemetery | 19.378003 | -155.878344 |
| 9 | Hookena School | 19.390148 | -155.881706 |


In [112]:
s1 = neighborhoods.shape

In [116]:
print('The dataframe has {} neighborhoods.'.format(s1[0]))

The dataframe has 515 neighborhoods.


As we can see after cleaning the dataframe, there are 515 neighborhoods.

Now we create a map of Hawaii with the help of **Folium.** The *Nominatim* will use "hawaii_explorer" as *user_agent*.

In [117]:
address = 'Hawaii, USA'

geolocator = Nominatim(user_agent="hawaii_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hawaii are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hawaii are 21.2160437, -157.975203.


Now we superimpose the neighborhoods on top of the map of Hawaii we created.

In [120]:
map_hawaii= folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hawaii)  
    
map_hawaii
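
The geocoded center for 'Hawaii, USA' (21.2160437, -157.975203) lies some distance from the listed neighborhoods, which sit around latitude 19.6 and longitude -156, so at `zoom_start=12` the markers may fall outside the initial view. One possible remedy is to fit the map to the extent of the markers. The helper `marker_bounds` below is a hypothetical sketch that uses only the standard library, with the first three rows of the cleaned table as sample input.

```python
def marker_bounds(latitudes, longitudes):
    """Return [[south, west], [north, east]] enclosing all given points."""
    lats, lons = list(latitudes), list(longitudes)
    if not lats or len(lats) != len(lons):
        raise ValueError("need equally long, non-empty latitude and longitude sequences")
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

# First rows of the cleaned table shown above
lats = [19.64056, 19.647014, 19.738103]
lons = [-156.003466, -156.011014, -156.043785]
bounds = marker_bounds(lats, lons)
print(bounds)
```

In the notebook, the result could be passed to folium with `map_hawaii.fit_bounds(marker_bounds(neighborhoods['Latitude'], neighborhoods['Longitude']))` before the map is displayed.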

We'll be exploring around the Keck Observatory, so let's get its coordinates from the **neighborhoods** dataframe.

In [126]:
neighborhoods[neighborhoods["Neighborhood"]=="Keck Observatory"]

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 239 | Keck Observatory | 19.829536 | -155.474938 |


Getting the coordinates of the **Keck Observatory**:

In [129]:
Latitude = neighborhoods[neighborhoods["Neighborhood"]=="Keck Observatory"].iloc[0,1]
Longitude = neighborhoods[neighborhoods["Neighborhood"]=="Keck Observatory"].iloc[0,2]

In [131]:
neighborhood_name = "Keck Observatory" # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               Latitude, 
                                                               Longitude))

Latitude and longitude values of Keck Observatory are 19.829536, -155.474938.


We will explore around this neighborhood using the Foursquare API.

To do this, we need a **CLIENT_ID** and a **CLIENT_SECRET** from the Foursquare developer console.

In [132]:
CLIENT_ID = 'P2LTGBSPUIWKRMZF55OPZEW0H33JE45DQ225YFEVQWPRKVNV' # Foursquare ID
CLIENT_SECRET = 'T3NST1UUDSME3VDVIXGINPWOXHWY0VELWYXGGUFCAUKX3DY5' #  Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Credentials:
CLIENT_ID: P2LTGBSPUIWKRMZF55OPZEW0H33JE45DQ225YFEVQWPRKVNV
CLIENT_SECRET:T3NST1UUDSME3VDVIXGINPWOXHWY0VELWYXGGUFCAUKX3DY5
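
Keys that are written into a notebook and printed end up in every shared copy of it. A common alternative is to read them from environment variables and to mask them when printing. The sketch below illustrates this; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names below are a convention chosen for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demo with placeholder values instead of real keys
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id-123",
            "FOURSQUARE_CLIENT_SECRET": "demo-secret-456"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```

Calling `load_foursquare_credentials()` without arguments reads from the real environment of the running kernel.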


For the next step, **venues/explore** on the Foursquare API will be used to explore around the neighborhood.

In [135]:
LIMIT = 50
radius = 1000 # search radius in meters around the center
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    Latitude,
    Longitude,
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=P2LTGBSPUIWKRMZF55OPZEW0H33JE45DQ225YFEVQWPRKVNV&client_secret=T3NST1UUDSME3VDVIXGINPWOXHWY0VELWYXGGUFCAUKX3DY5&v=20180605&ll=19.829536,-155.474938&radius=1000&limit=50'
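
Building the URL with string formatting works, but it leaves a stray `&` after the `?` and does no escaping. As an alternative sketch, the query string can be assembled with `urllib.parse.urlencode` from the standard library. The function name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lon, radius, limit):
    """Build a venues/explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lon}",
        "radius": radius,  # meters
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; the coordinates are those of Keck Observatory from above
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                        19.829536, -155.474938, 1000, 50)
print(url)

# Decode the query again to confirm the parameters survived the encoding
query = parse_qs(urlsplit(url).query)
```

The comma in `ll` is percent-encoded in the resulting URL and decodes back to the original `latitude,longitude` pair.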

Using the **requests** library, we send the request and get the venues around the center.

In [136]:
results = requests.get(url).json()
results

{'meta': {'code': 429,
  'errorType': 'quota_exceeded',
  'errorDetail': 'Quota exceeded',
  'requestId': '5e56bb4aaba297001b2627f9'},
 'response': {}}
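
The response above carries a 429 `quota_exceeded` status and an empty `response`, so any code that indexes into the venues would fail at this point. A small guard that checks `meta.code` first makes the failure explicit. This is a sketch under assumptions: the helper `venue_items` is hypothetical, and the `response -> groups -> items` layout is assumed for successful explore responses and is not confirmed by the output shown here.

```python
def venue_items(results):
    """Return the list of venue items, or raise if the API reported an error."""
    meta = results.get("meta", {})
    if meta.get("code") != 200:
        raise RuntimeError(
            f"Foursquare request failed: {meta.get('code')} {meta.get('errorType')}"
        )
    groups = results.get("response", {}).get("groups", [])
    # Assumed layout: response -> groups -> items, one item per recommended venue
    return groups[0]["items"] if groups else []

# The response shown above
quota_response = {
    "meta": {"code": 429, "errorType": "quota_exceeded",
             "errorDetail": "Quota exceeded",
             "requestId": "5e56bb4aaba297001b2627f9"},
    "response": {},
}
try:
    venue_items(quota_response)
except RuntimeError as err:
    message = str(err)
    print(message)
```

With a quota error like this one, the request has to be repeated once the quota has reset before the analysis can continue.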