# CAPSTONE PROJECT FINAL NOTEBOOK 1

## 1) Business Problem

   It has been a year since I moved to İstanbul and started my life as an adult, fresh out of university. The city has a lot to offer. In many ways it resembles the other big cities of the world, yet it has managed to protect its cultural and historical fabric. As one of the most important cities in history, İstanbul was home to many great civilizations and integrated a part of each into its identity. I believe that is why it has always been such a colorful, diverse and whimsical city. Its cuisine, like its other cultural aspects, is a melting pot of cultures, and it is the source of inspiration for this project.
   
   In this final project of the IBM Data Science course, I will explore the Turkish restaurants in İstanbul according to their "likes" and price category using the Foursquare API. I will use k-means clustering to group these restaurants and gain insights into their profiles and popularity. This project may help visitors or residents of İstanbul get an idea of the restaurants serving Turkish food in this beautiful city. It may also be of use to those who would like to start a gastronomy business in İstanbul.
    
   The clusters will provide a general profile for each restaurant and can therefore serve as an indicator of the restaurants' branding and marketing communications. Because the restaurant business in İstanbul is widespread and varied, restaurants need to decide which consumers they will address and shape their brand accordingly. So, alongside profiling the restaurants, the clusters will also depict the consumers who prefer to go to them.

   Here I will try to identify several restaurants to recommend to visitors looking for a place to experience Turkish cuisine in İstanbul. Taking the clusters into account should speed up restaurant selection and lead to higher satisfaction, thanks to data science and machine learning algorithms!

## 2) Data 

### 2.1) Understanding

Let's start by introducing Istanbul. As you can see below, the city has two sides separated by the Bosphorus: Anatolian and European. Both sides have interesting places to visit and see. Our data will comprise restaurants from both sides of the city.

In [1]:
IST_LAT = 41.0082 #latitude
IST_LON = 28.9784 #longitude
print('The geographical coordinates of Istanbul are {}, {}.'.format(IST_LAT, IST_LON))

The geographical coordinates of Istanbul are 41.0082, 28.9784.


Let's have a look at Istanbul on a map with the help of the Folium library.

In [2]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium 
print('imported')



In [3]:
istanbul_map = folium.Map(location = [IST_LAT, IST_LON], zoom_start = 12)
folium.Marker([IST_LAT, IST_LON]).add_to(istanbul_map)
istanbul_map

Istanbul is such a big city! The coordinates above are those of the Historical Peninsula, so let's move the center to a more central place so that our radius can encompass restaurants from the boroughs most relevant to tourism. Taking the Gayrettepe neighborhood's latitude and longitude as the central point and setting the radius to 15 kilometers will enable us to reach all the beautiful coastal restaurants. I will use Foursquare's Places API to fetch the venues in Istanbul in the 'Turkish Restaurant' category.

In [4]:
# Execute the same code with Gayrettepe coordinates
IST_LAT_G = 41.0642 #latitude
IST_LON_G = 29.0067 #longitude
print('The geographical coordinates of Gayrettepe are {}, {}.'.format(IST_LAT_G, IST_LON_G))
istanbul_map = folium.Map(location = [IST_LAT_G, IST_LON_G], zoom_start = 12)
folium.Marker([IST_LAT_G, IST_LON_G]).add_to(istanbul_map)
istanbul_map

The geographical coordinates of Gayrettepe are 41.0642, 29.0067.
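
As a quick sanity check on the 15 km radius, the great-circle distance between the two reference points can be estimated with the haversine formula. This is only a minimal sketch: the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions added for illustration and are not part of the original notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in kilometres between two points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Historical Peninsula (first map) versus Gayrettepe (second map)
distance = haversine_km(41.0082, 28.9784, 41.0642, 29.0067)
print('Distance between the two reference points: {:.1f} km'.format(distance))
print('Inside the 15 km search radius:', distance < 15)
```

If the distance comes out well below 15 km, the Historical Peninsula stays inside the search area even after the center is moved to Gayrettepe.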


### 2.2) Preparation

In [6]:
#First things first, import the necessary libraries

import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json
import requests 
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#### The Foursquare API

The 'explore' endpoint of the Foursquare API accepts a price parameter. Its predefined values range from 1 to 4, with 1 being the cheapest. I will use this parameter to retrieve restaurants in each of the four price categories.

In [2]:
CLIENT_ID = 'VXH2UEH1GIUWSJ0RXLF3JAG4IAKYEVVOTPE5OFGANX33AVBK' 
CLIENT_SECRET = 'LAYNPG0QVGU3L1VCJRB5GIBBC5J1KJ2XKVEKV124WNRAQPKL'
VERSION = '20200512'
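
Hard-coding API keys in a shared notebook exposes them to every reader. One common alternative, sketched below, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are hypothetical, chosen only for this example.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment.')
    return client_id, client_secret


# Example with a stand-in mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
demo_id, demo_secret = load_credentials(demo_env)
print(demo_id)
```

In the notebook itself, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would then replace the literal strings.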

In [7]:
LIMIT = 100 

radius = 15000 

query = 'Turkish Restaurant'

IST_LAT_G = 41.0642 #latitude
IST_LON_G = 29.0067 #longitude

# The four requests differ only in the price category (1 = cheapest, 4 = most expensive)
explore_url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query={}&price={}'

url_1, url_2, url_3, url_4 = [
    explore_url.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        IST_LAT_G,
        IST_LON_G,
        radius,
        LIMIT,
        query,
        price)
    for price in ('1', '2', '3', '4')
]
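
The query string contains a space ('Turkish Restaurant'), so it can be safer to let the standard library encode the parameters. The sketch below builds the four 'explore' URLs with `urllib.parse.urlencode`. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lon,
                      radius, limit, query, price):
    """Build an 'explore' URL for one price category (1 to 4)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'radius': radius,
        'limit': limit,
        'query': query,
        'price': price,
    }
    # urlencode percent-encodes values, e.g. the space in the query
    return '{}?{}'.format(EXPLORE_ENDPOINT, urlencode(params))


# Placeholder credentials; substitute your own CLIENT_ID and CLIENT_SECRET.
urls = [
    build_explore_url('YOUR_ID', 'YOUR_SECRET', '20200512',
                      41.0642, 29.0067, 15000, 100, 'Turkish Restaurant', price)
    for price in range(1, 5)
]
for url in urls:
    print(url)
```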



In [8]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [10]:
filtered_columns = ['venue.id','venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

def fetch_venues(url, price_category):
    """Request one 'explore' result set and return it as a tidy dataframe."""
    results = requests.get(url).json()
    venues = results['response']['groups'][0]['items']
    df = json_normalize(venues)
    # copy so that adding columns does not touch a view of the original frame
    df = df.loc[:, filtered_columns].copy()
    df['price_category'] = price_category
    # filter the category for each row
    df['venue.categories'] = df.apply(get_category_type, axis=1)
    # clean columns
    df.columns = [col.split(".")[-1] for col in df.columns]
    return df

df_1 = fetch_venues(url_1, 1)
df_2 = fetch_venues(url_2, 2)
df_3 = fetch_venues(url_3, 3)
df_4 = fetch_venues(url_4, 4)
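
To see why the column names contain dots, and why keeping only the last part of each name tidies them, the sketch below flattens a single made-up item so that nested keys become dotted names. The `flatten` helper is a simplified stand-in written for this example, not pandas' implementation, and the sample venue is invented.

```python
def flatten(record, parent_key=''):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = '{}.{}'.format(parent_key, key) if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value  # lists (e.g. categories) are kept as-is
    return flat


# Invented sample shaped like one entry of results['response']['groups'][0]['items']
sample_item = {
    'venue': {
        'id': 'abc123',
        'name': 'Örnek Lokantası',
        'categories': [{'name': 'Turkish Restaurant'}],
        'location': {'lat': 41.05, 'lng': 29.0},
    }
}

flat_item = flatten(sample_item)
print(list(flat_item))

# Keeping only the last part of each dotted name gives the short column names.
short_names = [key.split('.')[-1] for key in flat_item]
print(short_names)
```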



In [11]:
df_concat = pd.concat([df_1, df_2, df_3, df_4], ignore_index=True)
df_concat.sort_values(by = ['name'])

| | id | name | categories | lat | lng | price_category |
|---|---|---|---|---|---|---|
| 322 | 5511a590498ed87a67c3ea2d | 1C1K BİSTRO RESTORAN | Turkish Restaurant | 40.987944 | 29.033867 | 4 |
| 204 | 5015839de4b0e3a6f51f7763 | 34 Restaurant | Turkish Restaurant | 41.040917 | 28.988215 | 3 |
| 281 | 4ba3def3f964a5209f6838e3 | 9 ECE AKSOY | Turkish Restaurant | 41.030552 | 28.97422 | 3 |
| 247 | 4dccfc9b1f6eb1227055833b | Abdulkadir Restaurant | Turkish Restaurant | 40.978485 | 28.876531 | 3 |
| 22 | 4db2ccd24b226b343d695ed8 | Abooov Kebap | Turkish Restaurant | 40.996031 | 29.036989 | 1 |
| 72 | 4cc96e24786e468834dc9309 | Abov Dürüm | Turkish Restaurant | 41.01651 | 29.164926 | 1 |
| 266 | 4d3326b6f8c9224b15b5b7d2 | Adana Dostlar | Turkish Restaurant | 40.985052 | 29.098628 | 3 |
| 58 | 4ef1da520e617e29c65ebf65 | Adana Ocakbaşı İslam Abi'nin Yeri | Turkish Restaurant | 40.978931 | 28.877016 | 1 |
| 271 | 4dea8b761f6e3ddebdcf7973 | Adana Özasmaaltı Kebap | Turkish Restaurant | 40.953525 | 29.094991 | 3 |
| 272 | 4dc4597045dd26455254b8c9 | Adanalı Hasan Kolcuoğlu | Turkish Restaurant | 40.988028 | 29.124572 | 3 |


In [13]:
venues_ID = df_concat['id'].values.tolist()
venues_ID[0]

'52155319bce65e6db080f832'

In [14]:
# get the number of likes for each restaurant in the df_concat dataframe
url_list = []
like_list = []

for i in venues_ID:
    venue_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(i, CLIENT_ID, CLIENT_SECRET, VERSION)
    url_list.append(venue_url)
for link in url_list:
    result = requests.get(link).json()
    likes = result['response']['likes']['count']
    like_list.append(likes)
print(like_list)


df_concat['likes'] = like_list
df_concat.head()

[121, 327, 277, 899, 45, 551, 694, 2121, 173, 686, 626, 11, 30, 60, 34, 220, 1118, 161, 70, 140, 121, 115, 234, 831, 41, 654, 215, 718, 164, 58, 2086, 765, 236, 5, 95, 63, 466, 601, 280, 28, 743, 57, 8, 28, 37, 12, 64, 19, 378, 6, 340, 50, 34, 7, 13, 56, 12, 8, 141, 190, 76, 8, 7, 33, 52, 142, 12, 760, 340, 17, 7, 7, 8, 35, 12, 15, 84, 636, 6, 433, 55, 84, 20, 9, 144, 408, 858, 10, 15, 58, 7, 244, 144, 31, 578, 11, 4, 15, 180, 15, 247, 500, 50, 107, 2512, 270, 75, 552, 69, 12, 51, 546, 83, 417, 865, 9, 31, 34, 448, 171, 170, 1012, 27, 508, 504, 250, 292, 28, 206, 37, 1416, 9, 14391, 880, 95, 17, 593, 130, 46, 16, 750, 21, 8, 92, 609, 55, 56, 177, 22, 235, 280, 78, 145, 271, 11, 499, 45, 13, 132, 143, 79, 13, 298, 60, 146, 329, 160, 1499, 28, 12, 4227, 175, 182, 446, 50, 22, 196, 8, 87, 320, 3043, 400, 88, 100, 302, 277, 328, 52, 443, 19, 85, 78, 159, 78, 53, 69, 13, 34, 1601, 49, 377, 1089, 87, 77, 41, 1924, 448, 249, 317, 2931, 1694, 83, 209, 1700, 30, 777, 1153, 588, 2439, 2127, 19, 

| | id | name | categories | lat | lng | price_category | likes |
|---|---|---|---|---|---|---|---|
| 0 | 52155319bce65e6db080f832 | Ciğerden Cengiz Usta | Turkish Restaurant | 41.044394 | 28.977855 | 1 | 121 |
| 1 | 4d5a69f88492a143e9cd3227 | Edirne Ciğercisi Naci Usta | Turkish Restaurant | 41.048101 | 29.002113 | 1 | 327 |
| 2 | 54fd86d2498ef2ecce6c0ed9 | Balkan Lokantası | Turkish Restaurant | 41.043625 | 29.006267 | 1 | 277 |
| 3 | 4dcc4a967d8b19012470f580 | Yes Kardeşler | Turkish Restaurant | 41.05309 | 29.000424 | 1 | 899 |
| 4 | 4e70d88688773f007b7030fb | Lagash | Turkish Restaurant | 41.050022 | 29.05371 | 1 | 45 |
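
The loop above assumes that every response contains `response.likes.count`. If a request fails for any reason and the payload lacks those keys, the lookup raises a `KeyError` and the whole cell stops. A defensive variant is sketched below; `extract_like_count` is a hypothetical helper, and both sample payloads are invented to mimic the shape used in the loop.

```python
def extract_like_count(payload):
    """Return the like count from a parsed response, or None if it is missing."""
    try:
        return payload['response']['likes']['count']
    except (KeyError, TypeError):
        return None


# Invented payloads: one shaped like a successful reply, one without the likes block
ok_payload = {'response': {'likes': {'count': 121}}}
failed_payload = {'meta': {'code': 429}, 'response': {}}

print(extract_like_count(ok_payload))      # 121
print(extract_like_count(failed_payload))  # None
```

Collecting `None` for failed requests keeps `like_list` the same length as `df_concat`, so the missing values can be inspected or retried afterwards.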


In [43]:
print('So we have {} venues in our dataframe.'.format(df_concat.shape[0]))

So we have 372 venues in our dataframe.


### 2.3) Mapping Restaurants

In [44]:
map_venues = folium.Map(location=[IST_LAT_G, IST_LON_G], zoom_start=10)

# add markers to map
for lat, lng, name in zip(df_concat['lat'], df_concat['lng'], df_concat['name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_venues)  
    
map_venues

## 3) Analysis