# Coursera Capstone Project
## The Battle of Neighborhoods (Week 1)

## Table of contents

1 Introduction <br>
2 Data <br>
3 Methodology <br>
4 Results <br>
5 Discussion <br>
6 Conclusion

# 1 Introduction

## Description of the problem and a discussion of the background:

New York is one of the cultural and financial centers of the USA. It is also a flagship of the "American Way of Life" and is visited by millions of people every year. For this reason, New York is of crucial importance for tourism and for the worldwide reputation of the USA.

In 2018, New York City welcomed a record of more than 65 million visitors. According to the analysis company Smith Travel Research, the city's hotel occupancy rate rose to 87.3 percent, higher than in comparable cities such as Paris and Berlin (both around 75 percent).

Currently there are over 115,530 hotel rooms in over 630 hotels in the five boroughs of New York City. Most of these rooms, around 80 percent, are in Manhattan. Since 2010, the number of hotel rooms in New York City has grown by 42 percent. Most of this growth has happened in areas outside of Manhattan, recently creating well-established hotel districts in areas of Brooklyn and Queens.

In this environment, I act as the business analyst of a hotel company that is looking for a location for a new hotel in New York City.

## Business Problem:

As seen in the problem description, New York City continues to be a worthwhile environment in which to build and operate a new hotel. Since most hotels are in Manhattan and the costs for a new building there are very high, the goal is to analyze the possibilities for a hotel in Brooklyn. Legal issues and the availability of construction sites are not considered at this stage of planning.

Tourists and businesspeople have a number of requirements for a hotel. The following factors are the most important drivers of a successful hotel:

- Proximity to Manhattan 
- Access to public transportation
- Presence of services and amenities like restaurants and cultural sights in the neighborhood
- Significant office or commercial markets
- Number of existing hotels in the neighborhood


Furthermore, real estate prices play an important role in the success of a new hotel and will be part of the analysis.

Ultimately, the business problem is to determine in which neighborhood a new hotel would have the greatest success. Also, which neighborhood is the worst for a new hotel, and what hotel category would be the best? To answer this, we will look at the requirements for a successful hotel and analyze which neighborhood offers the best prerequisites for a new hotel. The interested audience includes hotel operators and investors, but also other businesses that benefit from or depend on hotels and want to plan for the future.

# 2 Data

## Description of the Data:

The following information is required to address the problem:
- List of neighborhoods of Brooklyn with their geodata (latitude and longitude)
- List of subway stations in Brooklyn with their address location
- List of real estate prices for each neighborhood of Brooklyn
- Proximity to the center of Manhattan for each neighborhood of Brooklyn
- Number of restaurants (if possible by category) and cultural sights for each neighborhood of Brooklyn
- Number of already existing hotels for each neighborhood of Brooklyn

Therefore we need the following data:

- New York City data that contains a list of boroughs and neighborhoods along with their latitude and longitude.
    - Data source : https://cocl.us/new_york_dataset
- Restaurants, hotels, subway stations and cultural sights.
    - Data source : Foursquare API
- Geospatial data with the New York borough boundaries, which will help us visualize a choropleth map.
    - Data source : https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
- New York real estate prices by neighborhood.
    - Data source : https://www.trulia.com/real_estate/New_York-New_York/
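
None of the sources above provides the proximity to the center of Manhattan directly, but it can be derived from the neighborhood coordinates. The following is a minimal sketch, assuming a great-circle (haversine) distance is a good enough proxy for proximity. The helper name `haversine_km` is hypothetical, and using the geocoded coordinates of 'New York City, NY' (obtained with Nominatim later in this notebook) as the "center of Manhattan" is an assumption. Wakefield is used only because its coordinates appear in the notebook output.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Assumption: reference point taken from the geocoder result for 'New York City, NY'.
REFERENCE_LAT, REFERENCE_LON = 40.7127281, -74.0060152

# Example neighborhood from the dataset preview (Wakefield, Bronx).
distance = haversine_km(REFERENCE_LAT, REFERENCE_LON, 40.894705, -73.847201)
print('Distance to reference point: {:.1f} km'.format(distance))
```

Straight-line distance ignores bridges, tunnels and subway lines, so travel time would be a better measure if such data were available.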


## How the data will be used to solve the problem 

The approach will be as follows:

- Collect the New York City data from https://cocl.us/new_york_dataset
- Use Foursquare and geopy data to find and map venues for all Brooklyn neighborhoods and cluster them into groups
- Select the venues that are important for tourists and businesspeople
- Find the rating and like count for restaurants and hotels using the Foursquare API
- Create a map that shows the average rental price for all Brooklyn neighborhoods
- Rank the Brooklyn neighborhoods by their suitability for a new hotel
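
The venue lookup with Foursquare could be set up roughly as follows. This is a minimal sketch that only builds the request URL. It assumes the legacy v2 `venues/explore` endpoint known from the Coursera labs, which may not be available for newer Foursquare accounts. The function name `build_explore_url`, the placeholder credentials, and the default `version`, `radius` and `limit` values are illustrative assumptions.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 endpoint; check the current API documentation.
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=500, limit=100):
    """Build the URL for a venue search around one (lat, lng) point."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                      # API version date (placeholder value)
        'll': '{},{}'.format(lat, lng),    # latitude,longitude of the neighborhood
        'radius': radius,                  # search radius in meters
        'limit': limit,                    # maximum number of venues returned
    }
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params))


# Placeholder credentials; real values come from a Foursquare developer account.
url = build_explore_url(40.894705, -73.847201, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
print(url)
```

The resulting URL could then be requested with `requests.get(url).json()`. The layout of the response depends on the API version and should be checked against the Foursquare documentation.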

The data will make it possible to answer the key questions behind the decision:

- What is the real estate cost in each Brooklyn neighborhood?
- Which Brooklyn neighborhood has the most hotels, restaurants and cultural sights?
- Which Brooklyn neighborhood has the best subway connectivity?
- Are there tradeoffs between proximity to the center of Manhattan, price and location?
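
One way to combine these questions into a single ranking is a weighted score over normalized criteria. The following is a minimal sketch and not the method used in this report. The criterion names, the weights and the sample rows are hypothetical, and the numbers are made up only to show the mechanics. The number of existing hotels is left out, because whether it counts for or against a neighborhood is a separate judgment call.

```python
# Hypothetical criterion weights (assumptions, not results of the analysis).
# A negative weight marks a criterion where a lower value is better.
WEIGHTS = {
    'venues': 0.3,           # restaurants and cultural sights nearby
    'subway_stations': 0.3,  # access to public transportation
    'distance_km': -0.2,     # distance to the center of Manhattan
    'price': -0.2,           # real estate price level
}


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(rows, weights):
    """Return copies of the rows with a 'score' key, sorted best first."""
    scores = [0.0] * len(rows)
    for column, weight in weights.items():
        scaled = min_max([row[column] for row in rows])
        for i, value in enumerate(scaled):
            # Invert the scaled value for "lower is better" criteria.
            contribution = value if weight > 0 else 1.0 - value
            scores[i] += abs(weight) * contribution
    ranked = [dict(row, score=round(s, 3)) for row, s in zip(rows, scores)]
    return sorted(ranked, key=lambda r: r['score'], reverse=True)


# Placeholder rows with made-up numbers, not real Brooklyn data.
sample = [
    {'name': 'Neighborhood A', 'venues': 80, 'subway_stations': 6, 'distance_km': 4.0, 'price': 900},
    {'name': 'Neighborhood B', 'venues': 40, 'subway_stations': 3, 'distance_km': 9.0, 'price': 600},
    {'name': 'Neighborhood C', 'venues': 10, 'subway_stations': 1, 'distance_km': 15.0, 'price': 400},
]

for row in rank(sample, WEIGHTS):
    print(row['name'], row['score'])
```

The weights make the tradeoff between proximity, price and amenities explicit, so different investor priorities can be tested by changing them.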

## Create Database

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [9]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [14]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)
    
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---------|--------------|----------|-----------|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [15]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [16]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [17]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(map_newyork)
    
map_newyork

# 3 Methodology

# 4 Results

# 5 Discussion

# 6 Conclusion