# Capstone Project - Battle of Neighborhoods - Opening a new Hotel in Toronto, Canada (Report)

## Introduction: Business Problem 


In this project we try to find the best location for building a new hotel in Toronto, Canada. 

Toronto is the capital city of the Canadian province of Ontario. With a recorded population of 2,731,571 in 2016, it is the most populous city in Canada and the fourth most populous city in North America. The city is the anchor of the Golden Horseshoe, an urban agglomeration of 9,245,438 people (as of 2016) surrounding the western end of Lake Ontario, while the Greater Toronto Area (GTA) proper had a 2016 population of 6,417,516. Toronto is an international centre of business, finance, arts, and culture, and is recognized as one of the most multicultural and cosmopolitan cities in the world.

Our goal is to find the best place to build a new hotel. We want a neighborhood where competition is low, meaning it is not already crowded with hotels. However, we also want to make sure that this place is close to a large number of entertainment venues.

This report can provide helpful information for stakeholders who are interested in opening a hotel in Toronto, Canada.

## Data

In this project, we fetch or extract data from the following sources:
1. https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
      This Wikipedia page lists Toronto postal codes with their boroughs and neighborhoods. We scrape the page, clean the data, and read it into a pandas dataframe so that it is in a structured format.
2. http://cocl.us/Geospatial_data
      A CSV file that has the geographical coordinates of each postal code.
3. Foursquare API
      Hotel and entertainment venue data for every neighborhood.

## Methodology


In this project we try to find the best place in Toronto, Canada to open a new hotel. We want to build this hotel in a neighborhood with a low hotel density.

First, we collected the location of every neighborhood in Toronto. Then we retrieved the location of all hotels in each neighborhood.

Second, we use a k-means model to group the hotels into clusters based on their locations, which shows how hotels are distributed across the neighborhoods of Toronto. By doing so, we found one cluster that contains just 1 hotel. We then look at the entertainment venues in that hotel's neighborhood that can attract travelers to the area.


## Analysis

First we install all the required libraries, including folium for map plotting.

Then we retrieve data from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. From this Wikipedia page, we collect the boroughs and neighborhoods of Toronto, Canada. We clean the data by removing every row whose borough is not assigned, and convert it to a structured format so that we can analyze it.

Then we collect data from https://cocl.us/Geospatial_data. This data set gives the latitude and longitude of each postal code in Toronto. We merge the two tables into one on the postal code. The code for this step is shown here, followed by a sample of the merged table.

In [7]:
import numpy as np
import pandas as pd

# Scrape the postal-code table from Wikipedia (read_html returns a list of tables)
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
# Treat 'Not assigned' as missing and drop rows without a borough
tor = df.replace('Not assigned', np.nan).dropna(subset=['Borough']).reset_index(drop=True)
tor = tor.rename(columns={"Neighbourhood": "Neighborhood"})
# Add the latitude and longitude of each postal code
addr = pd.read_csv('https://cocl.us/Geospatial_data')
toronto = pd.merge(tor, addr, how='inner', on='Postal Code')
toronto.head()

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |


Then we use the Foursquare API to fetch venue data, because we want to list all hotels in Toronto. Filtering the venues by category gives 43 hotels in Toronto. The code and a sample of the result are shown below.

In [10]:
# getNearbyVenues is a helper defined elsewhere in the notebook (not shown in this report);
# it returns one row per venue found near each neighborhood
toronto_venues = getNearbyVenues(names=toronto['Neighborhood'],
                                 latitudes=toronto['Latitude'],
                                 longitudes=toronto['Longitude'])
# Keep only the venues categorized as hotels
toronto_hotels = toronto_venues[toronto_venues['Venue Category'] == 'Hotel']
toronto_hotels = toronto_hotels.reset_index(drop=True)
toronto_hotels.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Regent Park, Harbourfront | 43.65426 | -79.360636 | Residence & Conference Centre | 43.65304 | -79.35704 | Hotel |
| 1 | Garden District, Ryerson | 43.657162 | -79.378937 | The Grand Hotel & Suites Toronto | 43.656449 | -79.37411 | Hotel |
| 2 | Garden District, Ryerson | 43.657162 | -79.378937 | Marriott Downtown at CF Toronto Eaton Centre | 43.654728 | -79.382422 | Hotel |
| 3 | St. James Town | 43.651494 | -79.375418 | Cambridge Suites Toronto | 43.651836 | -79.378107 | Hotel |
| 4 | St. James Town | 43.651494 | -79.375418 | One King West Hotel & Residence | 43.649139 | -79.377876 | Hotel |
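
The `getNearbyVenues` helper is not shown in this report. As a rough idea of what such a helper does, here is a minimal sketch that flattens one JSON payload per neighborhood into the venue table used above. Everything in it is an assumption: the function names, the `fetch` callable (which stands in for the actual HTTP request to Foursquare), and the payload layout, which is modeled on what a Foursquare "explore"-style response may look like. The fake payload is made-up data so that the sketch runs without network access or API credentials.

```python
import pandas as pd

COLUMNS = ["Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude",
           "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"]


def rows_from_payload(name, lat, lng, payload):
    """Flatten one payload into rows matching COLUMNS (payload layout is assumed)."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        # Fall back to a missing category if the venue has none
        categories = venue.get("categories") or [{"name": None}]
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


def get_nearby_venues(names, latitudes, longitudes, fetch):
    """Call fetch(lat, lng) once per neighborhood and collect all venues."""
    rows = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        rows.extend(rows_from_payload(name, lat, lng, fetch(lat, lng)))
    return pd.DataFrame(rows, columns=COLUMNS)


def fake_fetch(lat, lng):
    """Hypothetical stand-in for the API request; returns made-up venues."""
    return {"response": {"groups": [{"items": [
        {"venue": {"name": "Example Hotel",
                   "location": {"lat": lat + 0.001, "lng": lng - 0.001},
                   "categories": [{"name": "Hotel"}]}},
        {"venue": {"name": "Example Park",
                   "location": {"lat": lat - 0.001, "lng": lng + 0.001},
                   "categories": [{"name": "Park"}]}},
    ]}]}}


venues = get_nearby_venues(["Parkwoods", "Victoria Village"],
                           [43.753259, 43.725882],
                           [-79.329656, -79.315572],
                           fake_fetch)
hotels = venues[venues["Venue Category"] == "Hotel"].reset_index(drop=True)
print(hotels)
```

Passing the request function in as `fetch` keeps the parsing logic testable on its own; in a real run it would be replaced by a function that queries the API with valid credentials.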


### K-Means Clustering

Run _k_-means to group the hotels into 4 clusters. We can use the sum of squared distances and the silhouette score to evaluate _k_-means for different values of k, and see that 4 is the best number of clusters for this problem. Then, using folium, we plot a map on which all hotels in the same cluster have the same color.
You can find the map below. As you can see, the yellow cluster contains just one hotel.
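
The comparison of different k values mentioned above can be sketched as follows. This is a minimal illustration rather than the notebook's actual evaluation: the points are synthetic coordinates generated around four made-up centers (an assumption that stands in for the real hotel coordinates), and the range of k values tried is also an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the hotel coordinates: four tight groups of points
rng = np.random.default_rng(0)
centers = np.array([[43.65, -79.38], [43.74, -79.47],
                    [43.77, -79.26], [43.60, -79.55]])
xy = np.vstack([c + rng.normal(scale=0.005, size=(15, 2)) for c in centers])

# For each k, record the sum of squared distances (inertia) and the silhouette score
scores = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(xy)
    scores[k] = (km.inertia_, silhouette_score(xy, km.labels_))

for k, (sse, sil) in scores.items():
    print(f"k={k}  SSE={sse:.5f}  silhouette={sil:.3f}")

# Pick the k with the highest silhouette score
best_k = max(scores, key=lambda k: scores[k][1])
print("best k by silhouette:", best_k)
```

The sum of squared distances keeps shrinking as k grows, so it is usually read by looking for an "elbow", while the silhouette score can be compared directly across values of k.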

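In [14]:
from sklearn.cluster import KMeans

kclusters = 4

# cluster the hotels by their coordinates
xy = toronto_hotels[['Venue Latitude', 'Venue Longitude']].values
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(xy)
labels = kmeans.labels_

# map_clusters is the folium map of the hotels colored by cluster (its construction is not shown here)
map_clusters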

So we choose the "Downsview" neighborhood, which is in the yellow cluster, because a hotel there would have just one competitor. Many attractions are also near this location. Some of them are listed below.
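
One way to confirm which cluster is the sparse one is to count the hotels per cluster label. The snippet below is a minimal sketch on toy data: the hotel names and labels are made up, and the `Cluster Label` column is a hypothetical name for the k-means labels attached to the hotel table.

```python
import pandas as pd

# Toy stand-in for the hotel table with k-means labels attached (made-up values)
hotels = pd.DataFrame({
    "Venue": ["Hotel A", "Hotel B", "Hotel C", "Hotel D", "Hotel E", "Hotel F"],
    "Cluster Label": [0, 0, 0, 1, 2, 2],
})

# Number of hotels in each cluster
cluster_sizes = hotels.groupby("Cluster Label").size()
print(cluster_sizes)

# The cluster with the fewest hotels, and the hotels that belong to it
sparsest_label = cluster_sizes.idxmin()
sparsest_hotels = hotels[hotels["Cluster Label"] == sparsest_label]
print(sparsest_hotels)
```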

In [15]:
# all venues found in Downsview, sorted by category
toronto_venues2 = toronto_venues[toronto_venues['Neighborhood'] == 'Downsview'].sort_values(by=['Venue Category'])
toronto_venues2

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 945 | Downsview | 43.737473 | -79.464763 | Toronto Downsview Airport (YZD) | 43.738883 | -79.470111 | Airport |
| 1364 | Downsview | 43.761631 | -79.520999 | Driftwood community centre | 43.76568 | -79.519706 | Athletics & Sports |
| 1124 | Downsview | 43.739015 | -79.506944 | TD Canada Trust | 43.740236 | -79.51255 | Bank |
| 1256 | Downsview | 43.728496 | -79.495697 | Roding Park | 43.728655 | -79.492918 | Baseball Field |
| 1258 | Downsview | 43.728496 | -79.495697 | Blue Sail Energy Solutions | 43.731445 | -79.493787 | Business Service |
| 1259 | Downsview | 43.728496 | -79.495697 | Yummy Dogs | 43.726512 | -79.50128 | Food Truck |
| 1126 | Downsview | 43.739015 | -79.506944 | Win Farm Supermarket | 43.739193 | -79.512053 | Grocery Store |
| 1127 | Downsview | 43.739015 | -79.506944 | Price Chopper | 43.739908 | -79.512261 | Grocery Store |
| 1362 | Downsview | 43.761631 | -79.520999 | Durante's No Frills | 43.758178 | -79.51968 | Grocery Store |
| 1365 | Downsview | 43.761631 | -79.520999 | Planet Fitness | 43.757538 | -79.51961 | Gym / Fitness Center |


One airport is located in this neighborhood, so our hotel would be convenient for travelers who stop at this airport. There are also two parks close to our proposed location, our guests can shop at three grocery stores nearby, and a baseball field is close by as well.
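
The per-category counts quoted here can be read off the venue table with a simple tally. A minimal sketch, using only the ten venue categories listed in the Downsview table above (the full notebook table may contain more rows, so the counts here cover just that sample):

```python
import pandas as pd

# Venue categories copied from the ten Downsview rows shown in the table above
categories = [
    "Airport", "Athletics & Sports", "Bank", "Baseball Field",
    "Business Service", "Food Truck", "Grocery Store", "Grocery Store",
    "Grocery Store", "Gym / Fitness Center",
]
downsview = pd.DataFrame({"Venue Category": categories})

# Number of venues per category, most frequent first
counts = downsview["Venue Category"].value_counts()
print(counts)
```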

## Results and Discussions

Our analysis shows that although there is a great number of hotels in Toronto, most of them are concentrated downtown. After clustering, we found that one cluster contains only one hotel, so we chose this cluster for our hotel. We then examined the neighborhoods in this cluster to find the best location and selected the "Downsview" neighborhood. This neighborhood offers many activities for tourists. It has two parks (Ancaster Park, Giltspur Park), one shopping mall (Jane Sheppard Mall), three grocery stores (Win Farm Supermarket, Price Chopper, Durante's No Frills), one baseball field (Roding Park), and so on. It also has one airport, Toronto Downsview Airport (YZD), which can attract travelers who stop there. All in all, there is not much competition in this area and there are many attractions for tourists, so building a hotel here could generate good revenue. Other factors could also be taken into account, and they would help produce more accurate results.

## Conclusion

The purpose of this project was to find an area of Toronto with a low number of hotels in which to open a new hotel. After retrieving data from several data sources, converting it to a structured data frame, and applying the K-Means clustering algorithm, we picked the cluster with the fewest hotels and confirmed that entertainment activities are available nearby.

The final decision on the optimal hotel location will be made by stakeholders, based on specific characteristics of each neighborhood such as competition and nearby amenities like restaurants, airports, grocery stores, sports fields, and so on.




