
# **Battle of Neighborhoods - Week 1**

# Introduction/Business Problem

New York is one of the most sought-after tourist destinations, and the number of visitors to the city keeps increasing year over year. The city is filled with venues no one wants to miss. Besides its main highlights, such as the Statue of Liberty and Central Park, the city is also known for its Broadway shows and its exquisite dining and shopping experiences.


Tourists who visit New York already pay a hefty price for their travel tickets, and an expensive stay only adds to their worries. This leaves them little budget to spend on exploring a city known for its shopping and dining.

This analysis aims to find an ideal location for a budget-friendly accommodation facility for tourists.

The stakeholders are hotel owners who want to explore the possibility of starting a budget-friendly hotel service for tourists without sacrificing quality or experience. If implemented well, such a service could even disrupt the hotel industry.

The service provides only accommodation, along with basic necessities such as furniture and internet access.



One of the most crucial aspects of finding the right location is its accessibility to the most popular tourist spots. Most tourists who visit New York stay for at most one week, and they mostly look for hotels that are close to, or easily accessible from, the popular destinations.

Another aspect to look out for is the availability of bus stops and train stations nearby. Cabs are expensive in this city, so exploring it by bus or train is recommended.

Next, we have to check the availability of restaurants in the shortlisted locations. Since we provide only accommodation, it is essential to have restaurants nearby, which will be very convenient for visitors. This also lets them explore a wide variety of cuisines instead of being restricted to what the hotel provides, which is usually the case with traditional hotel services.



Finally, we will check the median property prices in each neighborhood. These give an idea of the cost of purchasing a property in different areas for the budget hotel, so the hotel owners can make an informed decision based on both the property value and the amenities the location offers tourists.

# Datasets

Let us first import the essential libraries.

In [4]:
# Import essential libraries
import pandas as pd
import numpy as np
import folium
import json
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
from math import sin, cos, sqrt, atan2, radians

Define the Foursquare credentials for the API calls.

In [10]:
# Foursquare credentials (placeholders; replace with your own)
clientId = 'abc'
clientSecret = 'def'
version = '20180605'

**Dataset-1**

This dataset contains the boroughs and neighborhoods of New York along with their latitudes and longitudes.

It can be obtained from the URL https://cocl.us/new_york_dataset, which returns a JSON response.

We will use this dataset to add markers to a map of New York with different colours for different boroughs so as to distinguish them. 

Download the dataset and save it as newyork_data.json

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

In [5]:
with open('newyork_data.json', mode='r') as data_json:
    newyork_data = json.load(data_json)

Convert the JSON to a pandas DataFrame and keep only the required columns.

In [6]:
# New York data dataframe
newyork_raw_df = pd.json_normalize(newyork_data['features'])
# Keep only the borough, neighborhood name and coordinates
new_york = newyork_raw_df[['properties.borough', 'properties.name', 'geometry.coordinates']]

In [7]:
# Extract latitude and longitude (GeoJSON coordinates are ordered [longitude, latitude])
longitude = [item[0] for item in new_york['geometry.coordinates']]
latitude = [item[1] for item in new_york['geometry.coordinates']]
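The cell above relies on GeoJSON's coordinate order: each feature's `geometry.coordinates` is `[longitude, latitude]`, which is why index 0 is read as the longitude. As a pandas-free sketch of the same flattening, the hypothetical helper `flatten_features` below walks the `features` list directly; the two-feature sample is made up to mirror the structure of `newyork_data.json`, using the Wakefield and Co-op City values from the output table below.

```python
def flatten_features(data):
    """Turn GeoJSON-style features into flat neighborhood records.

    Assumes each feature looks like the ones in newyork_data.json:
    properties.borough, properties.name and a Point geometry whose
    coordinates are ordered [longitude, latitude].
    """
    rows = []
    for feature in data["features"]:
        # GeoJSON positions are [longitude, latitude, (optional altitude)]
        longitude, latitude = feature["geometry"]["coordinates"][:2]
        rows.append({
            "borough": feature["properties"]["borough"],
            "neighborhood": feature["properties"]["name"],
            "latitude": latitude,
            "longitude": longitude,
        })
    return rows


# Hypothetical two-feature sample mirroring the structure of newyork_data.json
sample = {
    "features": [
        {"properties": {"borough": "Bronx", "name": "Wakefield"},
         "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]}},
        {"properties": {"borough": "Bronx", "name": "Co-op City"},
         "geometry": {"type": "Point", "coordinates": [-73.829939, 40.874294]}},
    ]
}

rows = flatten_features(sample)
print(rows[0])
# {'borough': 'Bronx', 'neighborhood': 'Wakefield', 'latitude': 40.894705, 'longitude': -73.847201}
```

Passing the result to `pd.DataFrame(rows)` would give the same four columns as `new_york`.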

In [8]:
# Drop 'geometry.coordinates' column
new_york = new_york.drop(columns=['geometry.coordinates'])
# Add latitude column
new_york.insert(2, 'latitude', latitude)
# Add longitude column
new_york.insert(3, 'longitude', longitude)
# Rename the columns
new_york.columns = ['borough', 'neighborhood', 'latitude', 'longitude']
# View first 5 rows of the dataframe
new_york.head()

|   | borough | neighborhood | latitude | longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


The borough-coloured markers will help visualize which boroughs contain the most of the top tourist spots (obtained from the next dataset).
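One way to give each borough its own colour is to build a lookup from the distinct borough names to a small palette and pass that colour to each marker. The sketch below assumes rows shaped like `new_york` (dicts with `borough`, `neighborhood`, `latitude`, `longitude`); the palette and the helper names `borough_colours` and `add_borough_markers` are illustrative choices, not taken from the original notebook. `folium` is imported inside `add_borough_markers` so the colour helper can be used without it.

```python
def borough_colours(boroughs, palette=("red", "blue", "green", "purple", "orange")):
    """Map each distinct borough name to a colour, in alphabetical order."""
    return {name: palette[i % len(palette)]
            for i, name in enumerate(sorted(set(boroughs)))}


def add_borough_markers(fmap, rows, colours):
    """Add one circle marker per neighborhood, coloured by its borough."""
    import folium  # only needed when actually drawing the map

    for row in rows:
        folium.CircleMarker(
            location=[row["latitude"], row["longitude"]],
            radius=5,
            color=colours[row["borough"]],
            fill=True,
            fill_opacity=0.7,
            popup=f"{row['neighborhood']}, {row['borough']}",
        ).add_to(fmap)
    return fmap


colours = borough_colours(["Bronx", "Manhattan", "Brooklyn", "Queens", "Staten Island", "Bronx"])
print(colours)
# {'Bronx': 'red', 'Brooklyn': 'blue', 'Manhattan': 'green', 'Queens': 'purple', 'Staten Island': 'orange'}
```

In the notebook this might be called as `add_borough_markers(folium.Map(location=[40.7128, -74.0060], zoom_start=10), new_york.to_dict('records'), borough_colours(new_york['borough']))`, where the map centre is an approximate city centre chosen for illustration.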

In [9]:
print('Number of rows: {}'.format(new_york.shape[0]))

Number of rows: 306


**Dataset-2**

The second dataset is a list of the 18 most visited tourist spots in New York, obtained by scraping the Wikipedia page: https://en.wikipedia.org/wiki/Tourism_in_New_York_City

This dataset contains the location of each spot, and we will use it to plot markers on the New York map created earlier. This shows which boroughs are closest to most of these popular destinations and helps narrow down the search for the ideal location.

In [11]:
# Scrape the html page from wikipedia
top_attractions_url = 'https://en.wikipedia.org/wiki/Tourism_in_New_York_City' 
attractions_html_page = urlopen(top_attractions_url)

In [12]:
# Parse the page with BeautifulSoup
# and select the second table on the page, which holds the attractions
soup = BeautifulSoup(attractions_html_page, 'html.parser')
attractions_table = soup.find_all('table')[1]

The latitudes and longitudes in this table are given in degrees, minutes and seconds. We need to convert them to decimal degrees, so we define a method get_lat_long() for this conversion.

In [13]:
# Method to get latitude and longitude

def get_lat_long(coordinates):
    """Convert a DMS string such as '40°46′56″N' to signed decimal degrees."""
    degrees = float(coordinates.split("°")[0])
    minutes_seconds = coordinates.split("°")[1]

    minutes = minutes_seconds.split("′")[0]
    minutes = float(minutes)/60

    seconds_direction = minutes_seconds.split("′")[1]
    seconds = seconds_direction.split("″")[0]
    seconds = float(seconds)/3600

    direction = seconds_direction.split("″")[1].strip()

    latlong = degrees + minutes + seconds
    # Southern latitudes and western longitudes are negative
    if direction in ('S', 'W'):
        latlong = -latlong

    return latlong
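`get_lat_long` assumes every string contains degrees, minutes and seconds. If a coordinate in the table omitted the seconds (for example `40°45′N`), the split on `″` would leave `"N"` to be parsed as a number and raise a `ValueError`. A regex-based variant that treats missing minutes or seconds as zero is sketched below; `dms_to_decimal` is a hypothetical name, and it applies the same formula: decimal degrees = degrees + minutes/60 + seconds/3600, negated for S and W.

```python
import re

# Degrees are required, minutes and seconds are optional, a hemisphere letter ends the value
DMS_PATTERN = re.compile(
    r"(?P<deg>\d+(?:\.\d+)?)°\s*"
    r"(?:(?P<min>\d+(?:\.\d+)?)′\s*)?"
    r"(?:(?P<sec>\d+(?:\.\d+)?)″\s*)?"
    r"(?P<dir>[NSEW])"
)


def dms_to_decimal(text):
    """Convert a string such as '40°46′56″N' to signed decimal degrees."""
    match = DMS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"Unrecognised coordinate: {text!r}")
    value = (float(match["deg"])
             + float(match["min"] or 0) / 60
             + float(match["sec"] or 0) / 3600)
    # Southern latitudes and western longitudes are negative
    return -value if match["dir"] in ("S", "W") else value


print(round(dms_to_decimal("40°46′56″N"), 6))  # 40.782222
print(round(dms_to_decimal("73°57′55″W"), 6))  # -73.965278
print(dms_to_decimal("40°45′N"))               # 40.75
```

For well-formed input both functions agree; the regex version only differs in tolerating missing components and stray spaces.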

In [14]:
spot = [] # Tourist spot
latitude = [] #latitude
longitude = [] #longitude

# Add required data to each list
for row in attractions_table.find_all('tr')[1:]:
    row_data = row.find_all('td')

    spot.append(row_data[0].find_all('a')[0].text.strip())
    latitude.append(row_data[2].find_all('span', attrs = {'class' : 'latitude'})[0].text.strip())
    longitude.append(row_data[2].find_all('span', attrs = {'class': 'longitude'})[0].text.strip())

Convert the latitudes and longitudes to decimal degrees.

In [15]:
# Convert coordinates to decimal degrees
latitude = [get_lat_long(lat) for lat in latitude]
longitude = [get_lat_long(lng) for lng in longitude]

Finally, form a new DataFrame of the top tourist spots and their latitudes and longitudes.

In [16]:
# Top-18 tourist spots dataframe
top_attractions_df = pd.DataFrame({'spot':spot, 'latitude':latitude, 'longitude':longitude})

In [17]:
# View a sample of the dataframe formed
top_attractions_df.head()

|   | spot | latitude | longitude |
|---|---|---|---|
| 0 | Central Park | 40.782222 | -73.965278 |
| 1 | Times Square | 40.756944 | -73.986111 |
| 2 | Grand Central Terminal | 40.752778 | -73.977222 |
| 3 | Theater District | 40.758889 | -73.985 |
| 4 | Rockefeller Center | 40.758611 | -73.979167 |


In [43]:
print('Number of tourist spots: {}'.format(top_attractions_df.shape[0]-1))

Number of tourist spots: 18
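Dataset-2 is meant to show which neighborhoods are closest to the popular destinations, and the `sin, cos, sqrt, atan2, radians` imports at the top suggest the haversine formula for that distance. A sketch under that assumption: `haversine_km` computes the great-circle distance between two points and `count_spots_within` counts the tourist spots inside a radius. The helper names, the 6371 km mean Earth radius and the 1 km radius are assumptions; the spot coordinates are the five rows of `top_attractions_df` shown above.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * atan2(sqrt(a), sqrt(1 - a))


def count_spots_within(lat, lon, spots, radius_km):
    """Count the (lat, lon) tourist spots lying within radius_km of a point."""
    return sum(haversine_km(lat, lon, s_lat, s_lon) <= radius_km
               for s_lat, s_lon in spots)


# Coordinates of the first five rows of top_attractions_df
spots = [
    (40.782222, -73.965278),  # Central Park
    (40.756944, -73.986111),  # Times Square
    (40.752778, -73.977222),  # Grand Central Terminal
    (40.758889, -73.985),     # Theater District
    (40.758611, -73.979167),  # Rockefeller Center
]

print(round(haversine_km(*spots[0], *spots[1]), 2))           # Central Park to Times Square, about 3.3 km
print(count_spots_within(40.894705, -73.847201, spots, 1.0))  # Wakefield: 0
```

Running `count_spots_within` for every row of `new_york` would give a per-neighborhood count that can be compared across boroughs.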


**Dataset-3**

The next dataset is obtained from the Foursquare API, where we retrieve the bus stops near each neighborhood to find out which neighborhoods are well connected to the public transport system.

In order to find the locations that have adequate bus shelters, we plot markers using the location data obtained from Foursquare and visualize which areas are well connected to public transport.

For now, let us retrieve just two bus stops in New York for demonstration purposes.
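The demonstration cells below search with `near=New+York`, which returns venues for the city as a whole. To look for venues around each neighborhood instead, the Foursquare v2 `venues/search` endpoint also accepts an `ll=latitude,longitude` point and a `radius` in metres. A sketch that builds such a URL with `urllib.parse.urlencode`, which also handles escaping; the helper name `venue_search_url`, the 500 m radius and the limit of 50 are assumptions, and since v2 is a legacy Foursquare API, parameter support should be checked against its current documentation.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def venue_search_url(latitude, longitude, category_id, client_id, client_secret,
                     version="20180605", radius_m=500, limit=50):
    """Build a venues/search URL centred on one neighborhood instead of the whole city."""
    params = {
        "ll": f"{latitude},{longitude}",  # search centre
        "categoryId": category_id,
        "radius": radius_m,               # metres around the centre
        "limit": limit,
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Wakefield, Bronx (first row of new_york) with the bus-stop category id used below
url = venue_search_url(40.894705, -73.847201, "52f2ab2ebcbc57f1066b8b4f", "abc", "def")
print(url)
```

The same helper works for the train-station and restaurant category ids; looping over `new_york` would produce one request per neighborhood.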

In [None]:
# Bus-stop category id 
bus_stop_categoryId = '52f2ab2ebcbc57f1066b8b4f'
foursquare_url = 'https://api.foursquare.com/v2/venues/search?categoryId={}&near=New+York&limit=2&client_id={}&client_secret={}&v={}'.format(bus_stop_categoryId,clientId,clientSecret,version)

response = requests.get(foursquare_url).json()


In [31]:
# Print the json having bus-stops details
response

{'meta': {'code': 200, 'requestId': '5f00bdb839be6735844dfe8e'},
 'response': {'confident': False,
  'geocode': {'feature': {'cc': 'US',
    'displayName': 'New York, NY, United States',
    'geometry': {'bounds': {'ne': {'lat': 40.882214, 'lng': -73.907},
      'sw': {'lat': 40.679548, 'lng': -74.047285}},
     'center': {'lat': 40.742185, 'lng': -73.992602}},
    'highlightedName': '<b>New York</b>, NY, United States',
    'id': 'geonameid:5128581',
    'longId': '72057594043056517',
    'matchedName': 'New York, NY, United States',
    'name': 'New York',
    'slug': 'new-york-city-new-york',
    'woeType': 7},
   'parents': [],
   'what': '',
   'where': 'new york'},
  'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/busstation_',
       'suffix': '.png'},
      'id': '52f2ab2ebcbc57f1066b8b4f',
      'name': 'Bus Stop',
      'pluralName': 'Bus Stops',
      'primary': True,
      'shortName': 'Bus Stop'}],
    'hasPerk': False,
    ...

In [34]:
# Convert the json to a dataframe
bus_stops = pd.json_normalize(response['response']['venues'])

Let us keep only the latitude and longitude of the bus stops, since the other fields are not required, and print a sample.

In [35]:
# Retrieve only location details
bus_stops = bus_stops[['location.lat', 'location.lng']]
bus_stops.columns = ['latitude', 'longitude']
bus_stops.head(2)

|   | latitude | longitude |
|---|---|---|
| 0 | 40.675333 | -73.91309 |
| 1 | 40.802801 | -73.9338 |
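The same normalize, select and rename steps are repeated for bus stops, train stations and restaurants. A pandas-free sketch of a reusable helper: `venue_records` reads `response['response']['venues']`, keeps each venue's `location.lat` and `location.lng`, and picks the primary category name where one exists. The helper name and the trimmed response are hypothetical; the coordinates are the two bus stops shown above, and the empty category list on the second venue is made up to show the fallback.

```python
def venue_records(payload):
    """Extract latitude, longitude and primary category from a venues/search response."""
    records = []
    for venue in payload["response"]["venues"]:
        categories = venue.get("categories", [])
        # Prefer the category flagged as primary, fall back to the first one, if any
        primary = next((c for c in categories if c.get("primary")),
                       categories[0] if categories else None)
        records.append({
            "latitude": venue["location"]["lat"],
            "longitude": venue["location"]["lng"],
            "category": primary["name"] if primary else None,
        })
    return records


# Hypothetical response trimmed to the fields the helper reads
payload = {"response": {"venues": [
    {"location": {"lat": 40.675333, "lng": -73.91309},
     "categories": [{"name": "Bus Stop", "primary": True}]},
    {"location": {"lat": 40.802801, "lng": -73.9338}, "categories": []},
]}}

for record in venue_records(payload):
    print(record)
```

`pd.DataFrame(venue_records(response))` would then give the same `latitude` and `longitude` columns as `bus_stops`, plus a `category` column.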


**Dataset-4**

Next, we retrieve the train stations in New York. We then use this dataset to plot markers on a map and visualize which locations have both a train station and a bus stop nearby, narrowing down the search for an ideal hotel location.

For now, let us search for only two train stations for demonstration purposes.

In [36]:
# Train station category id
train_station_categoryId = '4bf58dd8d48988d129951735'
foursquare_url = 'https://api.foursquare.com/v2/venues/search?categoryId={}&near=New+York&limit=2&client_id={}&client_secret={}&v={}'.format(train_station_categoryId,clientId,clientSecret,version)

train_stations_response = requests.get(foursquare_url).json()
train_stations_response

{'meta': {'code': 200, 'requestId': '5f00c02516b1975fb9bb847d'},
 'response': {'confident': False,
  'geocode': {'feature': {'cc': 'US',
    'displayName': 'New York, NY, United States',
    'geometry': {'bounds': {'ne': {'lat': 40.882214, 'lng': -73.907},
      'sw': {'lat': 40.679548, 'lng': -74.047285}},
     'center': {'lat': 40.742185, 'lng': -73.992602}},
    'highlightedName': '<b>New York</b>, NY, United States',
    'id': 'geonameid:5128581',
    'longId': '72057594043056517',
    'matchedName': 'New York, NY, United States',
    'name': 'New York',
    'slug': 'new-york-city-new-york',
    'woeType': 7},
   'parents': [],
   'what': '',
   'where': 'new york'},
  'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/trainstation_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d129951735',
      'name': 'Train Station',
      'pluralName': 'Train Stations',
      'primary': True,
      'shortName': 'Train Station'}],
    ...

Let us keep only the latitude and longitude of the train stations, since the other fields are not required, and print a sample.

In [40]:
train_stations = pd.json_normalize(train_stations_response['response']['venues'])
train_stations = train_stations[['location.lat','location.lng']]
train_stations.columns = ['latitude', 'longitude']
train_stations.head(2)

|   | latitude | longitude |
|---|---|---|
| 0 | 40.750437 | -73.993611 |
| 1 | 40.752647 | -73.977226 |


**Dataset-5**

Using the Foursquare API, we can find the restaurants in the vicinity of each neighborhood. This is crucial because the service provides only accommodation, so the location must offer as wide a variety of cuisines as possible to cater to visitors from different parts of the world.

For now, let us retrieve only two restaurants in New York for demonstration purposes.

In [41]:
# Restaurants category id
restaurants_categoryId = '4d4b7105d754a06374d81259'
foursquare_url = 'https://api.foursquare.com/v2/venues/search?categoryId={}&near=New+York&limit=2&client_id={}&client_secret={}&v={}'.format(restaurants_categoryId,clientId,clientSecret,version)

restaurants_response = requests.get(foursquare_url).json()
restaurants_response

{'meta': {'code': 200, 'requestId': '5f00bf517b9ef927ef349d42'},
 'response': {'confident': False,
  'geocode': {'feature': {'cc': 'US',
    'displayName': 'New York, NY, United States',
    'geometry': {'bounds': {'ne': {'lat': 40.882214, 'lng': -73.907},
      'sw': {'lat': 40.679548, 'lng': -74.047285}},
     'center': {'lat': 40.742185, 'lng': -73.992602}},
    'highlightedName': '<b>New York</b>, NY, United States',
    'id': 'geonameid:5128581',
    'longId': '72057594043056517',
    'matchedName': 'New York, NY, United States',
    'name': 'New York',
    'slug': 'new-york-city-new-york',
    'woeType': 7},
   'parents': [],
   'what': '',
   'where': 'new york'},
  'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/french_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d10c941735',
      'name': 'French Restaurant',
      'pluralName': 'French Restaurants',
      'primary': True,
      'shortName': 'French'}],
    ...

Form a DataFrame of each restaurant's location and category.

In [46]:
restaurants = pd.json_normalize(restaurants_response['response']['venues'])
# .copy() avoids pandas' SettingWithCopyWarning when the column is replaced below
restaurants = restaurants[['location.lat', 'location.lng', 'categories']].copy()
restaurants.columns = ['latitude', 'longitude', 'category']
# Keep only the name of the first listed category (None if a venue has no category)
restaurants['category'] = [item[0]['name'] if item else None for item in restaurants['category']]
restaurants.head(2)

|   | latitude | longitude | category |
|---|---|---|---|
| 0 | 40.719114 | -74.000202 | French Restaurant |
| 1 | 40.674224 | -73.999877 | Ramen Restaurant |
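Dataset-5 is meant to show where visitors can find the widest choice of cuisines. One simple proxy, sketched here as an assumption rather than the notebook's method, is the number of distinct restaurant categories returned around a neighborhood. `cuisine_summary` is a hypothetical helper, and the category list in the example is made up apart from the two categories shown above.

```python
from collections import Counter


def cuisine_summary(categories):
    """Return (number of distinct cuisines, up to three most common cuisines)."""
    counts = Counter(c for c in categories if c)  # ignore venues without a category
    return len(counts), counts.most_common(3)


# Hypothetical categories from one neighborhood's restaurant search
distinct, top = cuisine_summary(
    ["French Restaurant", "Ramen Restaurant", "French Restaurant", None]
)
print(distinct, top)  # 2 [('French Restaurant', 2), ('Ramen Restaurant', 1)]
```

Applied to the `category` column of each neighborhood's restaurant results, a higher distinct count would indicate more variety nearby.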


**Dataset-6**

Lastly, we can download a dataset from the URL https://data.cityofnewyork.us/resource/5ebm-myj7.json, which contains the median house sale prices in different neighborhoods. It gives an idea of the property costs in different areas, another important factor since we will be buying a property for the hotel service.

In [23]:
# New York property sales
!wget -q -O 'sales_data.json' https://data.cityofnewyork.us/resource/5ebm-myj7.json

In [26]:
# Convert json to dataframe
with open('sales_data.json') as sales_data:
    sales_data_df = pd.read_json(sales_data)

In [27]:
# View sample of the dataframe
sales_data_df.head()

|   | borough | neighborhood | type_of_home | number_of_sales | lowest_sale_price | average_sale_price | median_sale_price | highest_sale_price | year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MANHATTAN | ALPHABET CITY | 01 ONE FAMILY HOMES | 1 | 593362 | 593362 | 593362 | 593362 | 2010 |
| 1 | MANHATTAN | ALPHABET CITY | 02 TWO FAMILY HOMES | 1 | 1320000 | 1320000 | 1320000 | 1320000 | 2010 |
| 2 | MANHATTAN | ALPHABET CITY | 03 THREE FAMILY HOMES | 1 | 900000 | 900000 | 900000 | 900000 | 2010 |
| 3 | MANHATTAN | CHELSEA | 01 ONE FAMILY HOMES | 2 | 500000 | 2875000 | 2875000 | 5250000 | 2010 |
| 4 | MANHATTAN | CHELSEA | 02 TWO FAMILY HOMES | 2 | 1306213 | 2603107 | 2603107 | 3900000 | 2010 |
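The sales data has one row per neighborhood, home type and year, so it needs aggregating before neighborhoods can be compared. A sketch that takes the median of the `median_sale_price` values per neighborhood for one year, using the five sample rows above. Two caveats: a median of medians is only a rough summary (it ignores `number_of_sales`), and neighborhood names are upper-case here (`ALPHABET CITY`) but title-case in Dataset-1, so the hypothetical helper `median_price_by_neighborhood` uses `casefold()` keys to ease a join; names may still differ between the two datasets.

```python
from collections import defaultdict
from statistics import median


def median_price_by_neighborhood(rows, year):
    """Median of the per-home-type median sale prices for each neighborhood in one year."""
    prices = defaultdict(list)
    for row in rows:
        if row["year"] == year:
            # Case-insensitive key so it can be matched against Dataset-1 names
            key = (row["borough"].casefold(), row["neighborhood"].strip().casefold())
            prices[key].append(row["median_sale_price"])
    return {key: median(values) for key, values in prices.items()}


# The five sample rows shown above, reduced to the fields the helper reads
sample_rows = [
    {"borough": "MANHATTAN", "neighborhood": "ALPHABET CITY", "median_sale_price": 593362, "year": 2010},
    {"borough": "MANHATTAN", "neighborhood": "ALPHABET CITY", "median_sale_price": 1320000, "year": 2010},
    {"borough": "MANHATTAN", "neighborhood": "ALPHABET CITY", "median_sale_price": 900000, "year": 2010},
    {"borough": "MANHATTAN", "neighborhood": "CHELSEA", "median_sale_price": 2875000, "year": 2010},
    {"borough": "MANHATTAN", "neighborhood": "CHELSEA", "median_sale_price": 2603107, "year": 2010},
]

for (borough, neighborhood), price in median_price_by_neighborhood(sample_rows, 2010).items():
    print(borough, neighborhood, price)
# manhattan alphabet city 900000
# manhattan chelsea 2739053.5
```

In the notebook, `sales_data_df.to_dict('records')` could be passed in place of `sample_rows`.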


We will now plot markers using the location details obtained above and identify the suitable location(s) for the budget-friendly hotel.
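How the criteria are weighed is ultimately the hotel owners' decision. As one possible starting point (an assumption, not the notebook's method), neighborhoods could be filtered to those with both a bus stop and a train station nearby, then ordered by the number of nearby tourist spots, then by the number of distinct cuisines, and finally by median property price. The field names, the helper `rank_neighborhoods` and the three placeholder neighborhoods are hypothetical.

```python
def rank_neighborhoods(candidates):
    """Keep transit-connected neighborhoods and order them from most to least promising."""
    connected = [c for c in candidates
                 if c["bus_stops"] > 0 and c["train_stations"] > 0]
    # More nearby spots and cuisines rank higher; remaining ties go to the cheaper one
    return sorted(connected,
                  key=lambda c: (-c["nearby_spots"], -c["cuisines"], c["median_price"]))


# Placeholder figures, purely to show the ordering logic
candidates = [
    {"name": "A", "nearby_spots": 4, "cuisines": 10, "median_price": 900000,
     "bus_stops": 3, "train_stations": 1},
    {"name": "B", "nearby_spots": 4, "cuisines": 10, "median_price": 700000,
     "bus_stops": 2, "train_stations": 2},
    {"name": "C", "nearby_spots": 6, "cuisines": 5, "median_price": 1500000,
     "bus_stops": 0, "train_stations": 3},
]

print([c["name"] for c in rank_neighborhoods(candidates)])  # ['B', 'A']
```

A lexicographic order like this makes the priorities explicit; a weighted score would be an alternative if the owners prefer to trade price against proximity.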