<h1 align = "center"> Time-constrained Tourists + Immigrants Problems </h1>
<h3 align = "center"> Providing a City Guide for Foreigners for any purpose </h3>

Shun-Ping (Preston) Yu <br><br>
16th July 2021

## Introduction

A city hosts both local residents and foreign visitors. Foreign visitors in particular come for many purposes, such as business, travel, and immigration. In this capstone project, we take the Big Apple, New York City, as an example to demonstrate how to solve foreign visitors' common problems.

## Business Problem

Imagine you are running a tourist agency that is capable not only of helping immigrants find a suitable residential area to settle down in, but also of planning tour packages for every customer. The aim is to satisfy all your customers, whatever their purpose. One day, you get 2 different cases from different customers. <br> 
<li> Customer A - An immigrant from Taiwan : Searching for a suitable area to live in </li>
<li> Customer B - A group of tourists : Visiting as many top attractions as possible in 3 days (because they will move on to the next city on the 4th day) </li>

## Data Description

<h4> Geolocation data for New York City </h4>
I reused the data from the previous lab (https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json) and extracted the following information.<br>
<li> Borough </li>
<li> Neighborhood </li>	
<li> Latitude	</li>
<li> Longitude </li>

<h4> NYC Property Sales </h4>
I used the data from Kaggle (https://www.kaggle.com/new-york-city/nyc-property-sales) and extracted the following information.<br>
<li> Neighborhood </li>	
<li> Sale Price	</li>
This dataset is a record of every building or building unit (apartment, etc.) sold in the New York City property market over a 12-month period.

<h4> Google OR-Tools Traveling Salesperson Problem (TSP) </h4>
I used the open-source routing package from Google OR-Tools (https://developers.google.com/optimization/routing/tsp). <br>
TSP is used for finding the shortest route for a salesperson who needs to visit customers at different locations and return to the starting point. A TSP can be represented by a graph, in which the nodes correspond to the locations, and the edges (or arcs) denote direct travel between locations.

<h4> Foursquare API Data </h4>
We will need data about the venues in each neighbourhood. To get that information we will use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest. Such information includes venue names, locations, menus and even photos. The Foursquare location platform will therefore be used as the sole source of venue data, since all the required venue information can be obtained through its API.

After building the list of neighbourhoods, we connect to the Foursquare API to gather information about the venues in each neighbourhood. The data retrieved from Foursquare contains the venues within a specified distance of the latitude and longitude of each neighbourhood. The information obtained per venue is as follows:<br>

<li> Neighbourhood : Name of the Neighbourhood </li>
<li> Neighbourhood Latitude : Latitude of the Neighbourhood </li>
<li> Neighbourhood Longitude : Longitude of the Neighbourhood </li>
<li> Venue : Name of the Venue </li>
<li> Venue Latitude : Latitude of Venue </li>
<li> Venue Longitude : Longitude of Venue </li>
<li> Venue Category : Category of Venue </li>

## Methodology

We will build our model in Python, so we start by installing and importing all the packages that we will need.

In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means and DBSCAN for the clustering stages
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN

!pip install ortools
!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

Package breakdown:

<li> pandas : Collecting and manipulating data for analysis </li> 
<li> requests : Handling HTTP requests </li> 
<li> matplotlib : Supplying the color maps used on the generated maps </li> 
<li> folium : Generating maps of New York City </li> 
<li> sklearn : Providing KMeans and DBSCAN, the machine learning models that we are using </li> 
<li> ortools : OR-Tools is an open source software suite for optimization, tuned for tackling the world's toughest problems in vehicle routing, flows, integer and linear programming, and constraint programming. </li> 

The approach taken here is to explore the city, plot the map to show the neighbourhoods being considered and then build our model by clustering all of the similar neighbourhoods or nearby top attractions together, and finally plot the new map with the clustered results. We draw insights and then compare and discuss our findings.

--------------------------------------------------------------------------------------------------------------------------------------------------------

## Part A.   Immigrants Problem - Searching for suitable area to live in

We begin by collecting and refining the data needed for our business solution to work.

### 1) Data Collection - Download and Explore Dataset

In the data collection stage, we begin by collecting the required data: the boroughs and neighbourhoods of New York City and their coordinates.

In [None]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
neighborhoods_data = newyork_data['features']

Next, we arrange the neighborhood JSON data into a DataFrame:

In [None]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head()

<img src="https://imgur.com/EAjAtlS.png">

Then, let's create a map of New York with the neighborhoods superimposed on top.

In [None]:
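# look up the coordinates of New York City to center the map
# (assumed step: this excerpt never defines latitude/longitude; the user_agent string is arbitrary)
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode('New York City, NY')
latitude, longitude = location.latitude, location.longitude

# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)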

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

<img src="https://imgur.com/eQiADyG.png">

### 2) Data Processing - Data Cleaning and Wrangling + Exploratory Data Analysis + Feature Engineering

Using the Foursquare API, we can get the venues and venue categories around each neighbourhood in New York City. The venue categories are important for our analysis. The `getNearbyVenues` helper called below is not shown in this write-up; see the full notebook linked at the end.

In [None]:
# Getting the venues in New York City
nyc_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

nyc_venues.head()

<img src="https://imgur.com/fWi1LAK.png">

**2.1) Define the Nearest Neighborhood for each Venue**

It seems that some venues have been assigned to multiple neighborhoods. Let's calculate the approximate distance (in meters) between each venue and its candidate neighborhoods, and keep only the nearest neighborhood for each venue so that duplicates won't affect the analysis afterwards. 

In [None]:
import math # also used later for the routing distance matrix

# approximate distance in meters: Euclidean distance in degrees times ~111,320 m per degree
lat_diff = nyc_venues['Venue Latitude'] - nyc_venues['Neighborhood Latitude']
lon_diff = nyc_venues['Venue Longitude'] - nyc_venues['Neighborhood Longitude']
nyc_venues['distance_to_neighborhood_meters'] = (np.sqrt(lat_diff**2 + lon_diff**2) * 111320).round(2)

nyc_venues.head()

<img src="https://imgur.com/44gpyjJ.png">
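The distance above treats one degree of latitude and one degree of longitude as the same length (about 111,320 m). At New York's latitude a degree of longitude is noticeably shorter, so east-west distances are overstated. As a hedged alternative, the sketch below uses the haversine formula. The helper name `haversine_m`, the Earth-radius constant and the sample coordinates are assumptions for illustration and are not part of the original notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Illustrative coordinates only (roughly Lincoln Square and Times Square)
d = haversine_m(40.7735, -73.9850, 40.7580, -73.9855)
print(round(d))
```

For ranking which neighborhood is nearest to a venue, the simpler approximation usually gives the same ordering; the haversine version matters more when the distances themselves are reported.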

Keep only the nearest neighborhood for each venue and drop all the others.

In [None]:
unique_venue = nyc_venues[['Venue','Venue Latitude','Venue Longitude','Venue Category']].drop_duplicates()
unique_venue = unique_venue.reset_index(drop = True)
min_distance_to_neighborhood_per_venue = {}

for i in range(0,len(unique_venue)):
    min_distance_to_neighborhood_per_venue[i] = min(nyc_venues[nyc_venues['Venue'] == unique_venue['Venue'][i]]['distance_to_neighborhood_meters'])
    
min_distance_to_neighborhood_per_venue_df = pd.DataFrame.from_dict(min_distance_to_neighborhood_per_venue, orient='index') 
unique_venue2 = pd.concat([unique_venue,min_distance_to_neighborhood_per_venue_df], axis = 1)

nyc_venues_neighborhood = nyc_venues[['Neighborhood','Neighborhood Latitude','Neighborhood Longitude','distance_to_neighborhood_meters','Venue']]
nyc_venues_df = unique_venue2.merge(nyc_venues_neighborhood, left_on = [0,'Venue'], right_on = ['distance_to_neighborhood_meters','Venue'], how = 'inner')
nyc_venues_df = nyc_venues_df.drop([0], axis = 1)
nyc_venues_df.head()

<img src="https://imgur.com/fWrV7RS.png">

**2.2) Import House Sale Price Data + Data Cleaning**

Since our customer is searching for a suitable area to live in, let's add the house sale price data to help the customer choose the ideal place based on their concerns and budget.

In [None]:
import os, types
import pandas as pd
from botocore.client import Config
import ibm_boto3

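# import data from IBM Watson
# `body` is assumed to be the file-like object created by the platform's generated data-access code, which is omitted here
df_data_1 = pd.read_csv(body)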
df_data_1.head()

<img src="https://imgur.com/kMPm2AJ.png">

In [None]:
# Data Wrangling 
df_data = df_data_1.reset_index(drop = True) # work on the dataframe loaded in the previous cell
df_data = df_data[df_data['SALE PRICE'] != ' -  '] # drop rows whose sale price is missing
df_data = df_data.reset_index(drop = True)
df_data['SALE PRICE'] = df_data['SALE PRICE'].astype('int64')
df_data = df_data[['NEIGHBORHOOD','SALE PRICE']]
df_data_grouped = df_data.groupby('NEIGHBORHOOD').mean().reset_index()
df_data_grouped.head()

<img src="https://imgur.com/D21ng2k.png">

Now we have the average house sale price for each neighborhood in New York City!
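The cleaning step can be tried on a tiny made-up table. The sketch below is an alternative formulation, not the notebook's code: it assumes the same placeholder string for missing prices and uses `pd.to_numeric` with `errors='coerce'` so that any non-numeric entry becomes `NaN` before averaging. The rows are invented for illustration.

```python
import pandas as pd

# Made-up rows mimicking the raw 'SALE PRICE' strings
raw = pd.DataFrame({
    'NEIGHBORHOOD': ['CHELSEA', 'CHELSEA', 'HARLEM', 'HARLEM'],
    'SALE PRICE': ['1000000', ' -  ', '400000', '600000'],
})

# Non-numeric placeholders become NaN and are then dropped
raw['SALE PRICE'] = pd.to_numeric(raw['SALE PRICE'], errors='coerce')
clean = raw.dropna(subset=['SALE PRICE'])

# Average sale price per neighborhood
avg_price = clean.groupby('NEIGHBORHOOD')['SALE PRICE'].mean()
print(avg_price)
```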

**2.3) Define the Composition of Neighborhoods**

We use one-hot encoding on the venue categories so that they can be used as numeric features in the upcoming clustering steps.

In [None]:
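# one hot encoding
nyc_onehot = pd.get_dummies(nyc_venues_df[['Venue Category']], prefix="", prefix_sep="")
# drop the venue category that is itself named 'Neighborhood' so it cannot be confused with the neighborhood column
nyc_onehot = nyc_onehot.drop(['Neighborhood'], axis = 1)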

# add neighborhood column back to dataframe
nyc_onehot['_Neighborhood'] = nyc_venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [nyc_onehot.columns[-1]] + list(nyc_onehot.columns[:-1])
nyc_onehot = nyc_onehot[fixed_columns]

a = list(nyc_onehot.columns)
a[0] = 'Neighborhood'
nyc_onehot.columns = a

nyc_onehot.head()

Group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category.

In [None]:
nyc_grouped = nyc_onehot.groupby('Neighborhood').mean().reset_index()
nyc_grouped

<img src="https://imgur.com/d776Ibc.png">
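To see why the mean of the one-hot columns is a frequency, here is a minimal sketch on a made-up venue table. The names `toy`, `toy_onehot` and `toy_grouped` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Toy venue table (made-up rows, not the Foursquare results)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Cafe', 'Cafe', 'Bar', 'Park'],
})

# One indicator column per category
toy_onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# The mean of 0/1 indicators is the share of venues in each category
toy_grouped = toy_onehot.groupby('Neighborhood').mean()
print(toy_grouped)
```

Each row of the grouped table sums to 1, so neighborhoods with different numbers of venues become comparable.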

Create a new dataframe and display the top 20 venue categories for each neighborhood.

In [None]:
num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd use the suffixes listed in indicators
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # every later rank uses the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = nyc_grouped['Neighborhood']

for ind in np.arange(nyc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nyc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

<img src="https://imgur.com/PwEcsK4.png">

By doing so, we can have a clear picture of the composition of each neighborhood.
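The helper `return_most_common_venues` is called above but not shown in this write-up. The sketch below is one plausible definition, assuming each row starts with the neighborhood name followed by the per-category frequencies; the notebook's actual helper may differ.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    """Return the category names with the highest frequencies in one row.

    Assumes the first entry of `row` is the neighborhood name and the
    remaining entries are per-category frequencies.
    """
    frequencies = row.iloc[1:].astype(float)
    ranked = frequencies.sort_values(ascending=False)
    return ranked.index.values[:num_top_venues]

# Made-up frequencies for illustration
row = pd.Series({'Neighborhood': 'A', 'Bar': 0.1, 'Cafe': 0.6, 'Park': 0.3})
top = return_most_common_venues(row, 2)
print(top)
```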

**2.4) Merge nyc_grouped Data with House Sale Price Data**

In [None]:
# Switch to Upper Case (vectorized, avoids chained assignment)
nyc_grouped['Neighborhood'] = nyc_grouped['Neighborhood'].str.upper()

# Merge 2 datasets
nyc_grouped_house_price = nyc_grouped.merge(df_data_grouped, left_on='Neighborhood', right_on='NEIGHBORHOOD', how='left')
nyc_grouped_house_price.head()

In [None]:
# Data Cleaning and Fill the NAs 
nyc_grouped_house_price = nyc_grouped_house_price.drop(['NEIGHBORHOOD'], axis = 1)
nyc_grouped_house_price = nyc_grouped_house_price.fillna(0)  # Set the NA house price to 0 as a special outlier in order to prevent inaccurate clustering
nyc_grouped_house_price.head()

### 3) Clustering - Applying KMeans Machine Learning Algorithm

Run k-means to group the neighborhoods into 5 clusters.

In [None]:
# set number of clusters
kclusters = 5

nyc_grouped_house_price_clustering = nyc_grouped_house_price.drop(columns='Neighborhood') # positional axis arguments were removed in pandas 2.0

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(nyc_grouped_house_price_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

nyc_merged = neighborhoods

# merge neighborhoods with neighborhoods_venues_sorted to add latitude/longitude for each neighborhood
nyc_merged = nyc_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

nyc_merged.head() # check the last columns!

<img src="https://imgur.com/3VUnKTg.png">
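The clustering table mixes category frequencies (between 0 and 1) with sale prices (often in the hundreds of thousands of dollars or more), so the Euclidean distance used by k-means can be dominated by the price column. One common remedy, which the original notebook does not use, is to standardize the columns first. The sketch below illustrates the effect on synthetic data; the feature values are made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic features: two frequency columns plus a price column (made-up values)
features = np.array([
    [0.9, 0.1, 500_000.0],
    [0.8, 0.2, 2_000_000.0],
    [0.1, 0.9, 520_000.0],
    [0.2, 0.8, 1_900_000.0],
])

# Rescale every column to zero mean and unit variance
scaled = StandardScaler().fit_transform(features)

labels_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features).labels_
labels_scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled).labels_
print(labels_raw, labels_scaled)
```

On the raw values the rows are grouped by price alone; after scaling, the venue composition also influences the grouping. Whether scaling is appropriate depends on how much weight price should carry in the recommendation.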

Visualize the resulting clusters

<img src="https://imgur.com/ZVwjKhf.png">

The red dots mark the neighborhoods in cluster 1, which cover most of New York City. The light green dots mark the neighborhoods in cluster 4, which are mainly located in Manhattan. 

### 4) Data Visualization - Determine the differences among Clusters

In [None]:
# Switch to Upper Case (vectorized, avoids chained assignment)
neighborhoods_venues_sorted['Neighborhood'] = neighborhoods_venues_sorted['Neighborhood'].str.upper()

# Merge 2 datasets
neighborhoods_venues_sorted_house_price = neighborhoods_venues_sorted.merge(df_data_grouped, left_on='Neighborhood', right_on='NEIGHBORHOOD', how='left')

# Data Cleaning and Fill the NAs 
neighborhoods_venues_sorted_house_price = neighborhoods_venues_sorted_house_price.drop(['NEIGHBORHOOD'], axis = 1)
neighborhoods_venues_sorted_house_price = neighborhoods_venues_sorted_house_price.fillna(0)  # Set the NA house price to 0 as a special outlier in order to prevent inaccurate clustering
neighborhoods_venues_sorted_house_price.head()

***Cluster 1***

In [None]:
nyc_cluster_1_df = nyc_merged2[nyc_merged2['Cluster Labels'] == 0].reset_index(drop = True)
nyc_cluster_1_df.head()

In [None]:
import numpy as np
import matplotlib.pyplot as plt

Cluster1_Neighborhood = neighborhoods_venues_sorted_house_price[neighborhoods_venues_sorted_house_price['Cluster Labels'] == 0]['Neighborhood']
Cluster1_SalePrice = neighborhoods_venues_sorted_house_price[neighborhoods_venues_sorted_house_price['Cluster Labels'] == 0]['SALE PRICE']

x = np.arange(len(Cluster1_Neighborhood))
plt.bar(x, Cluster1_SalePrice, color=['blue'])
plt.xticks(x, Cluster1_Neighborhood)
plt.xlabel('Cluster1_Neighborhood')
plt.ylabel('Cluster1_SalePrice')
plt.title('Cluster1_Neighborhood House Sale Price')
plt.show()

print("The average price of the house in Cluster 1 region : $" , round(Cluster1_SalePrice.mean(),2))

<img src="https://imgur.com/e1Eh8jf.png">

***Cluster 2***

<img src="https://imgur.com/iUFJbms.png">

***Cluster 3***

<img src="https://imgur.com/0LoGZ3j.png">

***Cluster 4***

<img src="https://imgur.com/TxA2pEn.png">

***Cluster 5***

<img src="https://imgur.com/pThIg3u.png">

### 5) Results and Conclusion

According to the above analysis, we can conclude that the neighborhoods in Cluster 1 are probably residential areas, since the average house price is the lowest among the clusters and the venues consist mainly of restaurants, shops, and stores. We can strongly recommend Cluster 1 to our immigrant customers. Clusters 2 and 3 have the two highest average house prices in New York City and are where hotels and entertainment venues are mostly located, so we can probably see lots of people from different states and countries in these 2 areas. Neighborhoods in Cluster 4 are mostly located in Manhattan and also consist of restaurants, shops, and stores. We could also recommend Cluster 4 to customers who have a larger budget for renting or purchasing a home. Neighborhoods in Cluster 5 are probably the CBD (Central Business District), since the cluster consists of upscale venues such as jewelry stores, hotels and bars. <br><br>

The purpose of this problem is to explore a city at a glance, and also in a scientific way. We can even see that the house sale price might be related to the composition of the clusters. All in all, it is now up to our immigrant customers to choose where to reside.

---------------------------------------------------------------------------------------------------------------------------------------------------------------

## Part B. Time-constrained Tourists Problem - Visiting as many top attractions as possible in 3 days

### 1) Data Collection - Download and Explore Dataset

Assuming the tourists are staying somewhere around Lincoln Square, let's get the top 100 venues within a radius of 10 kilometers of Lincoln Square.

In [None]:
LIMIT = 100
radius = 10000

# select the single row for Lincoln Square (calling float() on a one-element Series is deprecated in recent pandas)
lincoln_square = neighborhoods[neighborhoods['Neighborhood'] == 'Lincoln Square'].iloc[0]

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    float(lincoln_square['Latitude']), # lat of Lincoln Square
    float(lincoln_square['Longitude']), # lon of Lincoln Square
    radius, 
    LIMIT)

url

In [None]:
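results = requests.get(url).json() # send the request (assumed step; this excerpt does not show where results is defined)
venues = results['response']['groups'][0]['items']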
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

<img src="https://imgur.com/xik7Jmw.png">
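The helper `get_category_type` used above is not shown in this write-up. The sketch below is one plausible definition, assuming each row carries a list of category dictionaries under `venue.categories`; the sample rows are made up and the notebook's actual helper may differ.

```python
import pandas as pd

def get_category_type(row):
    """Return the name of the first category attached to a venue row, or None."""
    categories = row['venue.categories']
    if not categories:
        return None
    return categories[0]['name']

# Made-up rows shaped like the flattened Foursquare response
sample = pd.DataFrame({
    'venue.name': ['Some Museum', 'Unnamed Spot'],
    'venue.categories': [[{'name': 'Art Museum'}], []],
})
sample['venue.categories'] = sample.apply(get_category_type, axis=1)
print(sample)
```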

### 2) Data Processing - Data Cleaning and Wrangling

Since the tourists have little time to explore the city, we will keep only the attractions that are most worth visiting.

In [None]:
major_sightseeing_spots = ['Opera House', 'Performing Arts Venue', 'Park','Fountain', 'Plaza', 'Theater','Garden', 'Art Museum', 'Concert Hall','Scenic Lookout', 'Waterfront', 'Church','Exhibit', 'Reservoir','Museum',  'Field', 'Art Gallery', 'Market']

major_sightseeing_venues = nearby_venues[nearby_venues['categories'].isin(major_sightseeing_spots)]
major_sightseeing_venues = major_sightseeing_venues.reset_index(drop = True)
major_sightseeing_venues

### 3) Clustering - Applying DBSCAN Machine Learning Algorithm

In [None]:
# Apply DBSCAN Algorithm
X = major_sightseeing_venues[['lat','lng']]
clustering = DBSCAN(eps=0.005,min_samples=3).fit(X)

# add clustering labels
major_sightseeing_venues.insert(0, 'Cluster Labels', clustering.labels_)
cluster_num = len(major_sightseeing_venues['Cluster Labels'].unique())

# create map
map_clusters2 = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(cluster_num)
ys = [i + x + (i*x)**2 for i in range(cluster_num)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(major_sightseeing_venues['lat'], major_sightseeing_venues['lng'], major_sightseeing_venues['name'], major_sightseeing_venues['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters2)
       
map_clusters2

<img src="https://imgur.com/BPB2F4p.png">

It looks like *Cluster 0 (Green dots)* should be excluded: these are the points DBSCAN labeled as noise (label -1, shown as Cluster 0 on the map) because they are too far away from the other attractions.  
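In the DBSCAN call above, `eps=0.005` is measured in degrees, which corresponds to roughly 557 m north-south under the 111,320 m-per-degree conversion used earlier and to a shorter distance east-west. As a hedged alternative, scikit-learn's DBSCAN also accepts the haversine metric, which lets `eps` be expressed in meters. The sketch below uses made-up coordinates; the 500 m radius and the Earth-radius constant are illustrative assumptions, not values from the original notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)
eps_m = 500                 # neighborhood radius in meters (illustrative choice)

# Made-up (lat, lng) points: three close together and one far away
coords_deg = np.array([
    [40.7700, -73.9800],
    [40.7705, -73.9805],
    [40.7710, -73.9795],
    [40.7000, -74.0100],
])

# The haversine metric expects [lat, lng] in radians and measures distance in radians
model = DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=3,
               metric='haversine', algorithm='ball_tree')
labels = model.fit(np.radians(coords_deg)).labels_
print(labels)
```

Points that do not belong to any dense group receive the label -1, which is the same noise label that the filtering step relies on.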

In [None]:
# DBSCAN labels noise points as -1; the map popup shows label + 1, so noise appears there as Cluster 0
final_sightseeing_venues = major_sightseeing_venues[major_sightseeing_venues['Cluster Labels'] > -1]
final_sightseeing_venues = final_sightseeing_venues.reset_index(drop = True)
final_sightseeing_venues

<img src="https://imgur.com/px8gqHu.png">

### 4) Routing - Applying Google OR-Tools : Traveling Salesperson Problem

Reference: https://developers.google.com/optimization/routing/tsp 

Now that we have the attraction clusters, let's determine the visiting order for the attractions in each cluster! 

In [None]:
sightseeing_cluster = [final_sightseeing_venues_cluster1, final_sightseeing_venues_cluster2, final_sightseeing_venues_cluster3]

from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp


for k in sightseeing_cluster:

    def build_distance_matrix(dff):
        """Distance matrix: approximate distance in meters between every pair of venues"""

        lats = [float(v) for v in dff['lat']]
        lngs = [float(v) for v in dff['lng']]
        # Euclidean distance in degrees times ~111,320 m per degree, rounded to an integer for OR-Tools
        return [[int(round(math.sqrt((lat_i - lat_j)**2 + (lng_i - lng_j)**2) * 111320))
                 for lat_j, lng_j in zip(lats, lngs)]
                for lat_i, lng_i in zip(lats, lngs)]

    def create_data_model():
        """Stores the data for the problem."""
        data = {}
        data['distance_matrix'] = build_distance_matrix(k)
        data['num_vehicles'] = 1
        data['depot'] = 0
        data['name'] =list(k['name'])
        return data

    def print_solution(data, manager, routing, solution):
        """Prints solution on console."""
        # the distance matrix is built in meters, so the objective is in meters too
        print('Objective Trip Distance: {} meters'.format(solution.ObjectiveValue()))
        index = routing.Start(0)
        plan_output = 'Route Order for Trip Cluster:\n'
        route_distance = 0
        
        while not routing.IsEnd(index):
            plan_output += str(data['name'][manager.IndexToNode(index)]) + ' -> '
            
            previous_index = index
            
            index = solution.Value(routing.NextVar(index))
            route_distance += routing.GetArcCostForVehicle(previous_index, index, 0)
        
        # the route ends back at the depot, so show its name rather than its node index
        plan_output += ' {}\n'.format(data['name'][manager.IndexToNode(index)])
        # add the total before printing so that it actually appears in the output
        plan_output += 'Route distance: {} meters\n'.format(route_distance)
        
        print(plan_output)


    def main():
        """Entry point of the program."""
        # Instantiate the data problem.
        data = create_data_model()

        # Create the routing index manager.
        manager = pywrapcp.RoutingIndexManager(len(data['distance_matrix']),
                                           data['num_vehicles'], data['depot'])

        # Create Routing Model.
        routing = pywrapcp.RoutingModel(manager)


        def distance_callback(from_index, to_index):
            """Returns the distance between the two nodes."""
            # Convert from routing variable Index to distance matrix NodeIndex.
            from_node = manager.IndexToNode(from_index)
            to_node = manager.IndexToNode(to_index)
            return data['distance_matrix'][from_node][to_node]

        transit_callback_index = routing.RegisterTransitCallback(distance_callback)

        # Define cost of each arc.
        routing.SetArcCostEvaluatorOfAllVehicles(transit_callback_index)

        # Setting first solution heuristic.
        search_parameters = pywrapcp.DefaultRoutingSearchParameters()
        search_parameters.first_solution_strategy = (
            routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)

        # Solve the problem.
        solution = routing.SolveWithParameters(search_parameters)

        # Print solution on console.
        if solution:
            print_solution(data, manager, routing, solution)


    if __name__ == '__main__':
        main()
        

<img src="https://imgur.com/jcjhnUH.png">
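`PATH_CHEAPEST_ARC` is a first-solution heuristic, so the route it returns is not guaranteed to be the shortest one. For a cluster with only a handful of stops, the result can be sanity-checked by trying every visiting order. The sketch below does this in plain Python; the function name `brute_force_tsp` and the distance values are illustrative assumptions and are not part of the original notebook or of OR-Tools.

```python
from itertools import permutations

def brute_force_tsp(distance_matrix, depot=0):
    """Return (best_distance, best_route) for a closed tour starting at depot."""
    n = len(distance_matrix)
    others = [i for i in range(n) if i != depot]
    best_distance, best_route = None, None
    for order in permutations(others):
        route = (depot, *order, depot)
        total = sum(distance_matrix[a][b] for a, b in zip(route, route[1:]))
        if best_distance is None or total < best_distance:
            best_distance, best_route = total, route
    return best_distance, best_route

# Made-up symmetric distances in meters between four stops
toy_matrix = [
    [0, 100, 250, 300],
    [100, 0, 120, 260],
    [250, 120, 0, 110],
    [300, 260, 110, 0],
]
best_distance, best_route = brute_force_tsp(toy_matrix)
print(best_distance, best_route)
```

The number of orders grows factorially with the number of stops, so this check is only practical for roughly ten stops or fewer; beyond that, a routing solver such as OR-Tools is the better tool.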

### 5) Results and Conclusion

Finally, we come up with the trip details and visiting order for these 3 days, so we are now able to make tour arrangements for our time-constrained customers. <br><br>

**Day 1** <br>
<li> Start from "Lincoln Square", and first we will visit "Museum of Modern Art (MoMA)"</li>
<li> After that, we will visit "St. Patrick's Cathedral".</li>
<li> Then, we will go to "Top of the Rock Observation Deck" and have lunch nearby.</li>
<li> After lunch, we will go to "Radio City Music Hall". </li>
<li> Then, we will go to "Winter Garden Theatre" and "Majestic Theatre".</li>
<li> Then, we will enjoy a show at "Gershwin Theatre".</li>
<li> Finally, let's have some food and drinks in the bars nearby.    </li>
<br>

**Day 2** <br>
<li> First of all, we will go to "Shakespeare Garden" and "Delacorte Theater". </li>
<li> After that, we will visit "The Metropolitan Museum of Art" and have lunch nearby. </li>
<li> After lunch, we will visit "Temple of Dendur". </li>
<li> Then, we will go to "Jacqueline Kennedy Onassis Reservoir" and "Central Park". </li>
<li> Finally, let's shop nearby. </li>
<br>

**Day 3** <br>
<li> First of all, we will visit "Gagosian Gallery". </li>
<li> After that, we will go to "Pier 63 Hudson River Park" and have lunch nearby. </li>
<li> After lunch, we will visit "David Zwirner Gallery". </li>
<li> Then, we will go to "High Line 10th Ave Amphitheatre" and "Chelsea Market". </li>
<li> Finally, let's go to "High Line" for our last stop in New York City. Enjoy the skyline of New York City! </li>
<br>
<br>
The purpose of this problem is to help foreign travellers, especially those who have never been to New York City before (like me), quickly arrange the visiting order of the top attractions in a scientific and reasonable way.

------------------------------------------------------------------------------------------------------------------------

The detailed code is available on <a href="https://github.com/PrestonYU/Coursera_Capstone/blob/master/Capstone_Project_wk4_and_5.ipynb">GitHub</a>. (<a href="https://nbviewer.jupyter.org/github/PrestonYU/Coursera_Capstone/blob/master/Capstone_Project_wk4_and_5.ipynb">nbviewer</a>) 