# Mini Project II Presentation

##### Chao Si
##### March 18, 2022



### Purpose
* The main goal of this mini-project is to build a database of restaurants within walking distance in the Kerrisdale area of Vancouver, and to compare which API has the best coverage.

* Combine and practice skills in:
    * APIs
    * Databases (SQL)
    * Pandas
    * Data wrangling

### Task
* Retrieve data about restaurants in the area through three different APIs (FOURSQUARE, YELP, GOOGLE).

* Create an SQLite database and store the data about the POIs (points of interest).

* Compare the results and determine which API has the best coverage of the area.

* Choose the top 10 POIs based on average rating.

* (Stretch) Have a bit of fun with the traveling salesman problem (TSP).
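
The storage and ranking tasks might look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The table name `pois`, its columns, the helper names `store_pois` and `top_n_by_avg_rating`, and the sample rows are hypothetical placeholders rather than the project's actual schema or data. Ratings are assumed to be on a common 5-star scale already, and places are matched across APIs by name only, which is a deliberately naive choice.

```python
import sqlite3

# Hypothetical schema: one row per (source API, place) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pois (
    source TEXT NOT NULL,   -- e.g. 'foursquare', 'yelp', 'google'
    name   TEXT NOT NULL,
    lat    REAL,
    lng    REAL,
    rating REAL,            -- assumed already on a 0-5 scale
    PRIMARY KEY (source, name)
)
"""


def store_pois(conn, rows):
    """Insert (source, name, lat, lng, rating) tuples, ignoring duplicates."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR IGNORE INTO pois (source, name, lat, lng, rating) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def top_n_by_avg_rating(conn, n=10):
    """Average each place's rating across sources and return the best n."""
    return conn.execute(
        "SELECT name, AVG(rating) AS avg_rating, COUNT(*) AS n_sources "
        "FROM pois WHERE rating IS NOT NULL "
        "GROUP BY name ORDER BY avg_rating DESC, name LIMIT ?",
        (n,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = [  # hypothetical rows, for illustration only
        ("google", "Cafe A", 49.2360, -123.1558, 4.6),
        ("yelp", "Cafe A", 49.2360, -123.1558, 4.0),
        ("google", "Bistro B", 49.2332, -123.1554, 4.5),
        ("foursquare", "Diner C", 49.2378, -123.1556, None),
    ]
    store_pois(conn, sample)
    for row in top_n_by_avg_rating(conn):
        print(row)
```

The composite primary key plus `INSERT OR IGNORE` keeps a re-run of the collection script from inserting the same place twice for the same API, while rows with no rating are left out of the ranking instead of being averaged as zero.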

### Results
* SQL tables created from the data collected from the different APIs
* Top 10 restaurants in Kerrisdale (from Google Places) plotted on a map
* TSP solved with Google's OR-Tools
    * Objective: 9303 meters  \
        Route for vehicle 0:  \
        Route order (complete loop): 0 -> 2 -> 3 -> 5 -> 9 -> 8 -> 6 -> 7 -> 4 -> 1 -> 0
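
The TSP result above came from OR-Tools, and the distance metric behind its 9303 m objective is not stated here. To show what such a solver computes without extra dependencies, the sketch below swaps in a different technique: an exact Held-Karp dynamic program over straight-line (haversine) distances, applied to the ten restaurant coordinates listed in the notebook. Because straight-line distances ignore the street network, its tour length and order are not expected to match the OR-Tools result. The helper names `haversine_m`, `distance_matrix` and `held_karp` are hypothetical.

```python
import math
from itertools import combinations


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lng) points."""
    r = 6_371_000  # mean Earth radius in metres
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))


def distance_matrix(points):
    """Pairwise haversine distances, indexed like the input list."""
    return [[haversine_m(p, q) for q in points] for p in points]


def held_karp(dist):
    """Exact TSP tour that starts and ends at node 0; O(n^2 * 2^n) time."""
    n = len(dist)
    if n == 1:
        return 0.0, [0, 0]
    # best[(mask, j)] = (cost, prev): cheapest path from 0 that visits exactly
    # the nodes in mask (a bitmask over nodes 1..n-1) and ends at node j
    best = {(1 << j, j): (dist[0][j], 0) for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = 0
            for j in subset:
                mask |= 1 << j
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, k)][0] + dist[k][j], k)
                    for k in subset if k != j
                )
    full = (1 << n) - 2  # every node except the depot 0
    cost, last = min((best[(full, j)][0] + dist[j][0], j) for j in range(1, n))
    # Follow predecessor links back to the depot, then reverse
    tour, mask, j = [0], full, last
    while j != 0:
        tour.append(j)
        mask, j = mask & ~(1 << j), best[(mask, j)][1]
    tour.append(0)
    return cost, tour[::-1]


if __name__ == "__main__":
    # Top-10 coordinates as listed in the notebook cells
    points = [
        (49.236033, -123.155774), (49.233200, -123.155448),
        (49.237795, -123.155628), (49.242562, -123.170108),
        (49.219346, -123.149315), (49.234776, -123.159282),
        (49.234182, -123.140192), (49.233933, -123.139953),
        (49.233898, -123.154094), (49.234785, -123.156730),
    ]
    cost, tour = held_karp(distance_matrix(points))
    print(f"Straight-line tour length: {cost:.0f} m")
    print(" -> ".join(map(str, tour)))
```

Exhaustive dynamic programming is practical here only because there are ten stops; OR-Tools' heuristics are the better fit once the number of stops grows or real walking distances come from a distance-matrix service.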

```python
import os

import gmaps
import pandas as pd

# Read the API key from the environment rather than hard-coding it
api_key = os.environ['GPLACE_KEY']
gmaps.configure(api_key)

# top10loc.csv is expected to hold at least 'name', 'lat', 'lng' and 'rating'
t10_df = pd.read_csv('top10loc.csv')
# 0-based rank, one per row (0-9 for the ten restaurants)
t10_df.insert(0, column='rank', value=list(range(len(t10_df))))

figure_layout = {
    'width': '1200px',
    'height': '900px',
    'border': '1px solid black',
    'padding': '1px'
}

locations = t10_df[['lat', 'lng']]
weights = t10_df['rating']

# Heatmap of the top 10 restaurants, weighted by rating
fig = gmaps.figure(map_type='HYBRID', layout=figure_layout)
fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
fig
```


```python
import folium

# Centre the map on Kerrisdale and drop one marker per top-10 restaurant
my_map = folium.Map(
    location=[49.234456, -123.155144],
    zoom_start=14
)
for _, loc in t10_df.iterrows():
    folium.Marker(
        location=[loc['lat'], loc['lng']],
        popup=loc['name'],          # shown on click
        tooltip=str(loc['rank']),   # shown on hover
    ).add_to(my_map)

my_map
```

### Discussion
* The YELP API did not do a great job of finding the best restaurants in the Kerrisdale community. (No website info?)

* FOURSQUARE returns the most results, 99 (compared with 66 from YELP and 60 from GOOGLE), and the quality of the results is good.

* Google seems to have the largest coverage of the area, but it is stuck with a hard limit of 60 results per query. It also returns only 20 results per request, so three requests are needed to collect them all. (Corporate greed at play!)

* I wish there were a unified data structure for all these APIs. Currently, it seems every provider builds its own.

* If I had more time, I would:
    * Refine my search criteria to get more accurate data;
    * Reorganize/recreate my SQL tables, since the data I chose to store may not be representative;
    * Make better comparisons between the APIs, from different perspectives;
    * Learn how to use visualization tools to present my results better.
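
Google's 20-results-per-request limit means the script has to follow a page token. A minimal standard-library sketch, assuming the legacy Places Nearby Search web service, which to my understanding returns a `next_page_token` and accepts it back as `pagetoken`; the endpoint, parameters and short pre-request delay should be checked against the current docs, and `google_fetch_page` and `fetch_all_pages` are hypothetical helper names. The page fetcher is passed in as a function so the loop can be tried offline.

```python
import json
import time
import urllib.parse
import urllib.request

# Legacy Places API Nearby Search endpoint (verify against current docs)
NEARBY_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def google_fetch_page(params):
    """Send one Nearby Search request and return the decoded JSON body."""
    url = NEARBY_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_all_pages(base_params, fetch_page=google_fetch_page,
                    max_pages=3, delay_s=2.0, sleep=time.sleep):
    """Collect results page by page by following next_page_token.

    Stops when a response carries no token or after max_pages requests
    (3 pages x 20 results = the 60-result cap).
    """
    results = []
    params = dict(base_params)
    for page in range(max_pages):
        data = fetch_page(params)
        results.extend(data.get("results", []))
        token = data.get("next_page_token")
        if not token or page == max_pages - 1:
            break
        # A fresh token is not usable straight away, so pause first
        sleep(delay_s)
        params = {**base_params, "pagetoken": token}
    return results


# Usage (needs a real key and network access):
# import os
# params = {
#     "location": "49.234456,-123.155144",  # Kerrisdale map centre from the notebook
#     "radius": 1000,                        # metres; illustrative value only
#     "type": "restaurant",
#     "key": os.environ["GPLACE_KEY"],
# }
# restaurants = fetch_all_pages(params)
```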

Presentation skills  \
Quality of data: relevance, domain knowledge, duplicate records  \
Compare apples to apples, even if some information is lost in the process (e.g., converting 10-point ratings to 5 stars)  \
Watch out for empty and NaN cells  \
Crowd-sourced databases are suspect; popularity is not a solid measure of relevance
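
The wish for a unified data structure and the notes about comparing apples to apples, empty/NaN cells and duplicates could all be handled by one normalization step before anything is stored. A minimal standard-library sketch, assuming raw records shaped like the Google Places, Yelp Fusion and Foursquare Places (v3) JSON responses; the nested field paths, the assumption that Foursquare uses a 0-10 rating scale, and the helper names `dig`, `is_missing` and `normalize` are my assumptions to verify against each API's docs, not the project's code.

```python
import math

# Assumed location of lat/lng inside each API's raw record
FIELD_PATHS = {
    "google": {"lat": ("geometry", "location", "lat"),
               "lng": ("geometry", "location", "lng")},
    "yelp": {"lat": ("coordinates", "latitude"),
             "lng": ("coordinates", "longitude")},
    "foursquare": {"lat": ("geocodes", "main", "latitude"),
                   "lng": ("geocodes", "main", "longitude")},
}
# Assumed maximum rating per source (10-point vs 5-star)
RATING_SCALE = {"google": 5.0, "yelp": 5.0, "foursquare": 10.0}


def dig(record, path):
    """Follow a tuple of keys into nested dicts; None if any key is missing."""
    for key in path:
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record


def is_missing(value):
    """Treat None and float NaN as empty cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def normalize(source, records):
    """Map raw API records to {source, name, lat, lng, rating on 0-5} rows,
    skipping empty names or ratings and repeated names."""
    rows, seen = [], set()
    for rec in records:
        name, rating = rec.get("name"), rec.get("rating")
        if is_missing(name) or is_missing(rating) or not str(name).strip():
            continue
        key = str(name).strip().lower()  # crude duplicate key
        if key in seen:
            continue
        seen.add(key)
        rows.append({
            "source": source,
            "name": str(name).strip(),
            "lat": dig(rec, FIELD_PATHS[source]["lat"]),
            "lng": dig(rec, FIELD_PATHS[source]["lng"]),
            # Apples to apples: rescale the rating to 5 stars
            "rating": float(rating) * 5.0 / RATING_SCALE[source],
        })
    return rows
```

The rescaling loses the finer granularity of a 10-point score, which is the trade-off the note above accepts in exchange for comparable numbers.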

1. Use APIs/tools in Python to <span style='color:red'>structure, store, and access data</span>.
2. Figure out things that are not working: read the docs, look for mentors, debug.

##### Summary functions in Seaborn

```python
import gmaps

# One record per restaurant, built from the DataFrame instead of hard-coded
# coordinates; ranks are 1-based here for display
t10_restaurants = [
    {
        'rank': i + 1,
        'rating': row['rating'],
        'name': row['name'],
        'location': (row['lat'], row['lng']),
    }
    for i, (_, row) in enumerate(t10_df.iterrows())
]

t10_locations = t10_df[['lat', 'lng']]
t10_names = t10_df['name'].tolist()

info_box_template = """
<dl>
<dt>Rank</dt><dd>{rank}</dd>
<dt>Name</dt><dd>{name}</dd>
<dt>Rating</dt><dd>{rating}</dd>
</dl>
"""
# str.format ignores the unused 'location' key
t10_info = [info_box_template.format(**rest) for rest in t10_restaurants]

# Markers that show the name on hover and an info box on click
marker_layer = gmaps.marker_layer(
    t10_locations,
    hover_text=t10_names,
    info_box_content=t10_info,
)
fig = gmaps.figure()
fig.add_layer(marker_layer)
fig
```
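
If a restaurant name contains `&`, `<` or quotes, formatting it straight into the HTML info-box template can produce broken markup. A minimal sketch using the standard library's `html.escape` to escape every value before filling the same template; the helper name `build_info_boxes` is hypothetical.

```python
import html

INFO_BOX_TEMPLATE = """
<dl>
<dt>Rank</dt><dd>{rank}</dd>
<dt>Name</dt><dd>{name}</dd>
<dt>Rating</dt><dd>{rating}</dd>
</dl>
"""


def build_info_boxes(records, template=INFO_BOX_TEMPLATE):
    """Fill the template once per record, HTML-escaping every value.

    Extra keys (such as 'location') are ignored by str.format.
    """
    return [
        template.format(**{k: html.escape(str(v)) for k, v in rec.items()})
        for rec in records
    ]
```

The resulting list can be passed as `info_box_content` in place of the unescaped strings.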