***

<h1 align="center"><font color=darkblue>BUDAPEST HARMONY</font></h1>

***

<h2 align="center">Where should you install your startup headquarter to guarantee a pleasant working and rest environment to your collaborators?</h2>

***

## Table of contents
* [1.Introduction: Business Problem](#introduction)
* [2. Data](#data)
 * [2.1 Building of the venues dataset](#buildvenues)
   * [2.1.1 Budapest city center](#budapest)
   * [2.1.2 Official (thermal) baths of Budapest](#baths)
   * [2.1.3 Vegetarian or vegan restaurants](#veg)
   * [2.1.4 Hungarian food restaurants](#hungarian)
   * [2.1.5 Fitness centers](#sport)
   * [2.1.6 Conference rooms](#conference)
   * [2.1.7 Libraries](#libraries)
   * [2.1.8 Concatenate venues datasets](#venuesdata)
 * [2.2 Building of the duration to airport dataset](#buildduration)
* [3. Methodology](#methodology)
* [4. Results](#results)
 * [4.1 Exploration of geospatial data](#geospatial)
 * [4.2 Clustering of geospatial, distance and duration to airport data](#clustering)
 * [4.3 Visualization of the suggested area and description of venues accessibilities](#kalvin)
* [5. Discussion](#discussion)
* [6. Conclusion](#conclusion)

*Libraries import and setup:*

In [1]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas.io.json import json_normalize # tranforming json file into a pandas dataframe library
from sklearn.cluster import DBSCAN # import DBSCAN from clustering stage
from sklearn.preprocessing import StandardScaler # standardize data
from unidecode import unidecode # Hungarian characters conversion to the closest representation in ascii text

import sklearn.utils # used to set seed
import json # work with JSON data
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import folium # library for maps
import matplotlib.cm as cm # library for plotting

# Observe maximum 100 rows:
pd.set_option('display.max_rows', 100)

## 1. Introduction: Business Problem <a name="introduction"></a>

An Hungarian startup is quickly growing. Its current headquarter is a small room in the countryside, located 4 hours from the capital city, Budapest. **The company needs to relocate its headquarters in Budapest** in a larger space as the collaborators are getting more numerous and they have to go abroad and host partners from abroad to maintain and increase the international growth of the company. The two founders of the company pays particular attention to the **well-being of their collaborators and their partners visiting the company**. Therefore, the choice of their new headquarter will be highly related to the accessibility of facilities allowing relaxing and/or facilitating the **daily harmony between professional and personal activities**.

The startup founders are thus **asking for suggestion about where to settle their new headquarter**. The location of the new headquarters must meet the following criteria:
- one of the famous **official (thermal) bath** from Budapest should be no more than **1km** (as the crow flies) from the headquarter
- a **conference center** should be no more than **1km** (as the crow flies) from the headquarter
- a **library** should be no more than **1km** (as the crow flies) from the headquarter
- at least one restaurant providing **vegetarian or vegan food** and one restaurant providing **hungarian food** should be no more than **500m** (as the crow flies) from the headquarter
- a **fitness center** should be no more than **500m** (as the crow flies) from the headquarter
- location from the Liszt Ferenc **International Airport** of Budapest is also important. Going to the airport should take **less than 45 minutes** by car/taxi.

In case of more than one location responding to all the criteria, the duration of the trip from/to the airport could be used to hierarchize these locations.

## 2. Data <a name="data"></a>

Based on the request from the startup we have to gather geospatial data as well as name and categories of venues in Budapest in order to find a solution which meets the given criteria.

**Geopy** will be used to obtain **latitude and longitude of Budapest city and Budapest international airport**. It will also be used to get coordinates of the suggested places for the headquarter if these places can be precise enough.

The given distance for the first 5 criteria are "as the crow flies" so we will use **Foursquare API** (https://developer.foursquare.com/) to get coordinates of the venues and **latitude and longitude associated with name and categories** of each of them. However, a maximum duration for the trip by car/taxi to the airport is given, this information cannot be obtain through Foursquare. So, to obtain distance and **trip duration to the airport** we will use **MapQuest API** (https://developer.mapquest.com/).

The startup founders precise that they wish to be close to an "official (thermal) bath" of Budapest. So we will consider only official baths, results from Foursquare will have to be filtered with this purpose. Based on the official website of Budapest baths (http://www.budapestgyogyfurdoi.hu/gyogyfurdok-es-strandok) we know that there are **12 official baths**: Széchenyi, Gellért, Rudas, Lukács, Király, Dandár, Paskál, Palatinus, Csillaghegy, Pesterzsébet, Római and Pünkösdfürdő.



### 2.1 Building of the venues dataset <a name="buildvenues"></a>


#### 2.1.1 Budapest city center <a name="budapest"></a>

**We use geopy library to get the coordinates of Budapest (Hungary)**

In [2]:
address = 'Budapest, Hungary'

geolocator = Nominatim(user_agent="bud_explorer")
location = geolocator.geocode(address)
latBUD = location.latitude
longBUD = location.longitude
print('The geograpical coordinate of Budapest are {}, {}.'.format(latBUD, longBUD))

The geograpical coordinate of Budapest are 47.4983815, 19.0404707.


#### 2.1.2 Official (thermal) baths of Budapest <a name="baths"></a>

We first set a hidden cell with our Foursquare API "client ID" and "Client secret"

In [3]:
# @hidden_cell
CLIENT_ID = 'KBS4CBNEO0UWR2RSVTJQBD4JXVC2UCCH2ZPXMFW103FFEEJT' # your Foursquare ID
CLIENT_SECRET = 'RNGPZ3WG34QOQQS30WEMKSDPQZ5OEOXHO5KG5BYZ3FJMQVRM' # your Foursquare Secret

**We search for the Budapest official thermal baths using Foursquare API**

In order to obtain data from all the official thermal baths of Budapest we need a radius of 12km. We give a limit number of results of 100 as many thermal baths, which are not considered as official city bath, exist in Budapest. Therefore, we expect to get the 12 baths we are looking for within this set of results.
We performed several tests in order to get better results from Foursquare, using different types of queries and combined queries, but none of them is giving the 12 baths at once. Indeed, these baths are classified in different categories, with no string in common in their name, etc., and it seems that the API in not very stable. Best results were obtained first searching for "Gyógyfürdő" (hungarian term for "thermal bath" or "spa") and then searching for "Strand" (hungarian term for "pool" or "water park").
We will keep the same radius and limit for the queries concerning the other venues.

In [4]:
VERSION = '20180605' # Foursquare API version
radius = 12000 #in meters
LIMIT = 100
search_query = 'Gyógyfürdő' #means thermal bath in Hungarian
categoryID = '4bf58dd8d48988d1ed941735'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&categoryID={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latBUD, longBUD, VERSION, search_query, categoryID, radius, LIMIT)

resultsTB = requests.get(url).json()

# assign relevant part of JSON to venues
venuesTB = resultsTB['response']['venues']

# tranform venues into a dataframe
dfTB = pd.json_normalize(venuesTB)

**Keep only columns that include venue name, and anything that is associated with location**

In [5]:
filtered_columns = ['name', 'categories'] + [col for col in dfTB.columns if col.startswith('location.')] + ['id']
dfTB_filtered = dfTB.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dfTB_filtered['categories'] = dfTB_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dfTB_filtered.columns = [column.split('.')[-1] for column in dfTB_filtered.columns]

**We extract only data with category "Spa"**

In [6]:
DataSpa = dfTB_filtered[dfTB_filtered['categories']=='Spa']
DataSpa.reset_index(drop=True, inplace=True)
DataSpa

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Rudas Gyógyfürdő és Uszoda,Spa,Döbrentei tér 9.,Szent Gellért rkp.,47.489188,19.047761,"[{'label': 'display', 'lat': 47.48918773695423...",1161,1013,HU,Budapest,Budapest,Magyarország,"[Budapest, Döbrentei tér 9. (Szent Gellért rkp...",,4b2a3b92f964a52058a624e3
1,Szent Gellért Gyógyfürdő és Uszoda,Spa,Kelenhegyi út 4.,,47.483917,19.052256,"[{'label': 'display', 'lat': 47.4839168757732,...",1838,1118,HU,Budapest,Budapest,Magyarország,"[Budapest, Kelenhegyi út 4., 1118, Magyarország]",,4b2a3f25f964a52079a624e3
2,Szent Lukács Gyógyfürdő és Uszoda,Spa,Frankel Leó u. 25-29.,,47.518829,19.08177,"[{'label': 'display', 'lat': 47.518829, 'lng':...",3850,1023,HU,Budapest,Budapest,Magyarország,"[Budapest, Frankel Leó u. 25-29., 1023, Magyar...",,4b7ec53ef964a52012fe2fe3
3,Dandár Gyógyfürdő,Spa,Dandár u. 3.,,47.476337,19.071061,"[{'label': 'display', 'lat': 47.47633741284682...",3364,1095,HU,Budapest,Budapest,Magyarország,"[Budapest, Dandár u. 3., 1095, Magyarország]",,4e4131aea809968085325a0f
4,Széchenyi Gyógyfürdő és Uszoda,Spa,Állatkerti körút 9-11.,,47.518302,19.082394,"[{'label': 'display', 'lat': 47.51830206635792...",3854,1146,HU,Budapest,Budapest,Magyarország,"[Budapest, Állatkerti körút 9-11., 1146, Magya...",Városliget,4bb6452c2f70c9b606278530
5,Király Gyógyfürdő,Spa,Fő u. 84.,Kacsa u.,47.510608,19.038185,"[{'label': 'display', 'lat': 47.51060830660980...",1371,1027,HU,Budapest,Budapest,Magyarország,"[Budapest, Fő u. 84. (Kacsa u.), 1027, Magyaro...",,4c756513ff1fb60c5915f6a7


We got 6 of the 12 official Budapest baths.

**Search for the Budapest official water park or pool**

In [7]:
VERSION = '20180605' # Foursquare API version
radius = 12000 #in meters
LIMIT = 100

search_query = 'Strand' 
categoryID = '4bf58dd8d48988d1ed941735'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&categoryID={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latBUD, longBUD, VERSION, search_query, categoryID, radius, LIMIT)

resultsST = requests.get(url).json()

# assign relevant part of JSON to venues
venuesST = resultsST['response']['venues']

# tranform venues into a dataframe
dfST = pd.json_normalize(venuesST)

In [8]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dfST.columns if col.startswith('location.')] + ['id']
dfST_filtered = dfST.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dfST_filtered['categories'] = dfST_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dfST_filtered.columns = [column.split('.')[-1] for column in dfST_filtered.columns]

pd.set_option('display.max_rows', None)

**We extract only data with category "Water Park" or "Pool"**

In [9]:
DataPoolTemp = dfST_filtered[dfST_filtered['categories'].isin(['Water Park', 'Pool']) ]
DataPoolTemp.reset_index(drop=True, inplace=True)
DataPoolTemp

Unnamed: 0,name,categories,lat,lng,labeledLatLngs,distance,cc,country,formattedAddress,postalCode,city,state,address,crossStreet,neighborhood,id
0,Palatinus Strandfürdő,Water Park,47.52917,19.046946,"[{'label': 'display', 'lat': 47.52916959486753...",3461,HU,Magyarország,"[Budapest, Margitsziget, 1138, Magyarország]",1138.0,Budapest,Budapest,Margitsziget,,,4bd2fdf8caff952130bcd3f0
1,"Dagály Termálfürdő, Strandfürdő és Uszoda",Water Park,47.538782,19.061464,"[{'label': 'display', 'lat': 47.53878223302512...",4766,HU,Magyarország,"[Budapest, Népfürdő u. 36., 1138, Magyarország]",1138.0,Budapest,Budapest,Népfürdő u. 36.,,,4c0e6d8098102d7f989ce306
2,"Budaörs Városi Uszoda, Sportcsarnok és Strand",Pool,47.455988,18.969628,"[{'label': 'display', 'lat': 47.45598830916391...",7119,HU,Magyarország,"[Budaörs, Hársfa u. 6., 2040, Magyarország]",2040.0,Budaörs,Pest megye,Hársfa u. 6.,,,4cc4ba8abde8f04d7440ad4b
3,Paskál Gyógy- és Strandfürdő,Water Park,47.520571,19.127469,"[{'label': 'display', 'lat': 47.52057142517493...",6992,HU,Magyarország,"[Budapest, Egressy út 178/f., 1141, Magyarország]",1141.0,Budapest,Budapest,Egressy út 178/f.,,,4c3c367586ce328f2069ab2d
4,BVSC strand,Pool,47.520624,19.089959,"[{'label': 'display', 'lat': 47.520624, 'lng':...",4469,HU,Magyarország,"[Budapest, 1142 Budapest, 1142, Magyarország]",1142.0,Budapest,Budapest,1142 Budapest,,,4dea07b7d22da22d4ec82ef0
5,Óbudai Strand,Pool,47.552394,19.027236,"[{'label': 'display', 'lat': 47.55239390869436...",6094,HU,Magyarország,[Magyarország],,,,,,,51ea7a70498ea0aa94e54bd6
6,Római Strandfürdő,Pool,47.574811,19.052087,"[{'label': 'display', 'lat': 47.57481146475501...",8552,HU,Magyarország,"[Budapest, Rozgonyi Piroska u., 1031, Magyaror...",1031.0,Budapest,Budapest,Rozgonyi Piroska u.,,,4c2f62663896e21ea1c9e390
7,Pünkösdfürdői Strand,Pool,47.594627,19.0679,"[{'label': 'display', 'lat': 47.59462716397096...",10910,HU,Magyarország,"[Budapest, Királyok útja 272. (Pünkösdfürdői ú...",1039.0,Budapest,Budapest,Királyok útja 272.,Pünkösdfürdői út,,4c3f41fd3735be9abb1b15a4
8,Csillaghegy strand,Pool,47.585284,19.041588,"[{'label': 'display', 'lat': 47.58528354752551...",9674,HU,Magyarország,[Magyarország],,,,,,,4ff811ebe4b0ae40ed63d1d8
9,Tungsram Strand,Pool,47.586423,19.077983,"[{'label': 'display', 'lat': 47.58642285955192...",10197,HU,Magyarország,[Magyarország],,,,,,,4f9d7a0fe4b0e5be40828f2e


We see that the 6 missing baths are in this table. We select them and build a new table:

In [10]:
DataPool = DataPoolTemp.iloc[[0,3,6,7,12,13],:]
DataPool

Unnamed: 0,name,categories,lat,lng,labeledLatLngs,distance,cc,country,formattedAddress,postalCode,city,state,address,crossStreet,neighborhood,id
0,Palatinus Strandfürdő,Water Park,47.52917,19.046946,"[{'label': 'display', 'lat': 47.52916959486753...",3461,HU,Magyarország,"[Budapest, Margitsziget, 1138, Magyarország]",1138,Budapest,Budapest,Margitsziget,,,4bd2fdf8caff952130bcd3f0
3,Paskál Gyógy- és Strandfürdő,Water Park,47.520571,19.127469,"[{'label': 'display', 'lat': 47.52057142517493...",6992,HU,Magyarország,"[Budapest, Egressy út 178/f., 1141, Magyarország]",1141,Budapest,Budapest,Egressy út 178/f.,,,4c3c367586ce328f2069ab2d
6,Római Strandfürdő,Pool,47.574811,19.052087,"[{'label': 'display', 'lat': 47.57481146475501...",8552,HU,Magyarország,"[Budapest, Rozgonyi Piroska u., 1031, Magyaror...",1031,Budapest,Budapest,Rozgonyi Piroska u.,,,4c2f62663896e21ea1c9e390
7,Pünkösdfürdői Strand,Pool,47.594627,19.0679,"[{'label': 'display', 'lat': 47.59462716397096...",10910,HU,Magyarország,"[Budapest, Királyok útja 272. (Pünkösdfürdői ú...",1039,Budapest,Budapest,Királyok útja 272.,Pünkösdfürdői út,,4c3f41fd3735be9abb1b15a4
12,Pesterzsébeti Jódos-Sós Gyógy- És Strandfürdő,Pool,47.435448,19.090965,"[{'label': 'display', 'lat': 47.435448, 'lng':...",7969,HU,Magyarország,"[Budapest, Vízisport utca 2., 1203, Magyarország]",1203,Budapest,Budapest,Vízisport utca 2.,,,5c2717110d8a0f002c27afd4
13,Csillaghegyi Strandfürdő és Uszoda,Pool,47.585979,19.042334,"[{'label': 'display', 'lat': 47.58597927264667...",9752,HU,Magyarország,"[Budapest, Pusztakúti út 2-6. (Fürdő utca), 10...",1038,Budapest,Budapest,Pusztakúti út 2-6.,Fürdő utca,,4cc82453a32bb1f7d25aaca8


**We can now build a dataset with the 12 offical Budapest baths**

In [11]:
WaterPoints = DataSpa.append(DataPool)
WaterPoints.reset_index(drop=True, inplace=True)
WaterPoints

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Rudas Gyógyfürdő és Uszoda,Spa,Döbrentei tér 9.,Szent Gellért rkp.,47.489188,19.047761,"[{'label': 'display', 'lat': 47.48918773695423...",1161,1013,HU,Budapest,Budapest,Magyarország,"[Budapest, Döbrentei tér 9. (Szent Gellért rkp...",,4b2a3b92f964a52058a624e3
1,Szent Gellért Gyógyfürdő és Uszoda,Spa,Kelenhegyi út 4.,,47.483917,19.052256,"[{'label': 'display', 'lat': 47.4839168757732,...",1838,1118,HU,Budapest,Budapest,Magyarország,"[Budapest, Kelenhegyi út 4., 1118, Magyarország]",,4b2a3f25f964a52079a624e3
2,Szent Lukács Gyógyfürdő és Uszoda,Spa,Frankel Leó u. 25-29.,,47.518829,19.08177,"[{'label': 'display', 'lat': 47.518829, 'lng':...",3850,1023,HU,Budapest,Budapest,Magyarország,"[Budapest, Frankel Leó u. 25-29., 1023, Magyar...",,4b7ec53ef964a52012fe2fe3
3,Dandár Gyógyfürdő,Spa,Dandár u. 3.,,47.476337,19.071061,"[{'label': 'display', 'lat': 47.47633741284682...",3364,1095,HU,Budapest,Budapest,Magyarország,"[Budapest, Dandár u. 3., 1095, Magyarország]",,4e4131aea809968085325a0f
4,Széchenyi Gyógyfürdő és Uszoda,Spa,Állatkerti körút 9-11.,,47.518302,19.082394,"[{'label': 'display', 'lat': 47.51830206635792...",3854,1146,HU,Budapest,Budapest,Magyarország,"[Budapest, Állatkerti körút 9-11., 1146, Magya...",Városliget,4bb6452c2f70c9b606278530
5,Király Gyógyfürdő,Spa,Fő u. 84.,Kacsa u.,47.510608,19.038185,"[{'label': 'display', 'lat': 47.51060830660980...",1371,1027,HU,Budapest,Budapest,Magyarország,"[Budapest, Fő u. 84. (Kacsa u.), 1027, Magyaro...",,4c756513ff1fb60c5915f6a7
6,Palatinus Strandfürdő,Water Park,Margitsziget,,47.52917,19.046946,"[{'label': 'display', 'lat': 47.52916959486753...",3461,1138,HU,Budapest,Budapest,Magyarország,"[Budapest, Margitsziget, 1138, Magyarország]",,4bd2fdf8caff952130bcd3f0
7,Paskál Gyógy- és Strandfürdő,Water Park,Egressy út 178/f.,,47.520571,19.127469,"[{'label': 'display', 'lat': 47.52057142517493...",6992,1141,HU,Budapest,Budapest,Magyarország,"[Budapest, Egressy út 178/f., 1141, Magyarország]",,4c3c367586ce328f2069ab2d
8,Római Strandfürdő,Pool,Rozgonyi Piroska u.,,47.574811,19.052087,"[{'label': 'display', 'lat': 47.57481146475501...",8552,1031,HU,Budapest,Budapest,Magyarország,"[Budapest, Rozgonyi Piroska u., 1031, Magyaror...",,4c2f62663896e21ea1c9e390
9,Pünkösdfürdői Strand,Pool,Királyok útja 272.,Pünkösdfürdői út,47.594627,19.0679,"[{'label': 'display', 'lat': 47.59462716397096...",10910,1039,HU,Budapest,Budapest,Magyarország,"[Budapest, Királyok útja 272. (Pünkösdfürdői ú...",,4c3f41fd3735be9abb1b15a4


**We visualize the location of the water points in Budapest**

In order for Folium to display correctly popups we need to remove special caracters from the hungarian names of the water points.

In [12]:
WaterPoints['name'] = WaterPoints.loc[:,'name'].apply(unidecode)

We can now create a map and display the baths with their labels in popups.

In [13]:
water_points_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# loop through the thermal bath and add each to the map
for lat, lng, label in zip(WaterPoints.lat, WaterPoints.lng, WaterPoints.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(water_points_map)

# show map
water_points_map

We observe that the coordinates of Szent Lukas thermal bath are wrong. We informed Foursquare about this mistake but we need to correct it here. To do so we searched for the real coordinates in Google map as geopy did not find this address. Google map returns: 47.517898, 19.036682.

**We correct the coordinates of Szent Lukacs in the dataframe**

In [14]:
WaterPoints.at[2, 'lat'] = 47.517898
WaterPoints.at[2, 'lng'] = 19.036682

**We plot the map with the corrected coordinates**

In [15]:
water_points_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# loop through the thermal bath and add each to the map
for lat, lng, label in zip(WaterPoints.lat, WaterPoints.lng, WaterPoints.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define a circle of 5 pixels diameter around the marker
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(water_points_map)

# show map
water_points_map

#### 2.1.3 Vegetarian or vegan restaurants <a name="veg"></a>

**We search for vegetarian or vegan restaurants using Foursquare API**

In [16]:
VERSION = '20180605' # Foursquare API version
radius = 12000 #in meters
LIMIT = 100
search_query = [['vega'],['veget']]
catID = '4bf58dd8d48988d1d3941735'

dfVeg = pd.DataFrame(columns=['id', 'name', 'categories', 'location.address', 'location.lat', 'location.lng'])

for i in search_query:
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&categoryID={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latBUD, longBUD, VERSION, search_query, catID, radius, LIMIT)
    
    print(i, '- OK')
    resultsVeg = requests.get(url).json()

    # assign relevant part of JSON to venues
    venuesVeg = resultsVeg['response']['venues']

    # tranform venues into a dataframe
    df = pd.json_normalize(venuesVeg)

    dfVeg=dfVeg.append(df[['id', 'name', 'categories','location.address', 'location.lat', 'location.lng']])
    search_query=search_query[+1:+1]

dfVeg.reset_index(drop=True, inplace=True)

['vega'] - OK
['veget'] - OK


**Keep only columns that include venue name, and anything that is associated with location**

In [17]:
filtered_columns = ['name', 'categories'] + [col for col in dfVeg.columns if col.startswith('location.')] + ['id']
dfVeg_filtered = dfVeg.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dfVeg_filtered['categories'] = dfVeg_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dfVeg_filtered.columns = [column.split('.')[-1] for column in dfVeg_filtered.columns]

pd.set_option('display.max_rows', None)

**We extract only data with category "Vegetarian / Vegan Restaurant"**

In [18]:
DataVeg = dfVeg_filtered[dfVeg_filtered['categories']=='Vegetarian / Vegan Restaurant']
DataVeg.reset_index(drop=True, inplace=True)
DataVeg

Unnamed: 0,name,categories,address,lat,lng,id
0,VegaCity,Vegetarian / Vegan Restaurant,Múzeum krt. 23-25.,47.491976,19.061325,53c686fe498eccdf095ff308
1,Édeni Vegán – Kézműves Vegetáriánus Étterem,Vegetarian / Vegan Restaurant,Iskola u. 31.,47.505884,19.037015,4bc9c8e0cc8cd13af71dbccf
2,Nemsüti Vega Ételbár,Vegetarian / Vegan Restaurant,Jászai Mari tér 4/b,47.513872,19.04841,4c45356fdcd61b8d8d797c56
3,Vegan Love,Vegetarian / Vegan Restaurant,Bartók Béla út 9.,47.482396,19.052456,56fe47ad498e04a4568bec96
4,Vegan Garden,Vegetarian / Vegan Restaurant,Dob u. 40.,47.499686,19.062519,5af2bf71f00a70002c55e720
5,Govinda Vega Sarok,Vegetarian / Vegan Restaurant,Papnövelde u. 1.,47.490787,19.056462,4bd56ee75631c9b65506a430
6,Vegan king,Vegetarian / Vegan Restaurant,,47.489668,19.055639,5cba006206fb600039dd4a4c
7,Vegazzi,Vegetarian / Vegan Restaurant,33.,47.501194,19.059412,5d8774986c23c0000725990a
8,Veganeria Bisztró,Vegetarian / Vegan Restaurant,Nagymező u. 51.,47.506752,19.055643,5abd202d0e5da853478d75d9
9,VegaKuckó,Vegetarian / Vegan Restaurant,Boraros Ter 3,47.481136,19.066939,59392a422bf9a934a4a386c3


Twenty-three vegetarian or vegan restaurants are found in Budapest.

**We visualize the location of the vegetarian and vegan restaurants in Budapest**

In order for Folium to display correctly popups we need to remove special caracters from the hungarian names.

In [19]:
DataVeg['name'] = DataVeg.loc[:,'name'].apply(unidecode)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


We can now create a map and display the vegetarian and vegan restaurants with their labels in popups.

In [20]:
veg_points_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# loop through the vegetarian/vegan restaurants and add each to the map
for lat, lng, label in zip(DataVeg.lat, DataVeg.lng, DataVeg.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define a circle of 5 pixels diameter around the marker
        color='yellow',
        fill=True,
        popup=folium.Popup(label, parse_html=True),
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(veg_points_map)  

# show map
veg_points_map

#### 2.1.4 Hungarian food restaurants <a name="hungarian"></a>

**We search for Hungarian food restaurants using Foursquare API**

In [22]:
VERSION = '20180605' # Foursquare API version
radius = 12000 #in meters
LIMIT = 100
search_query = [['restaurant'], ['Vendéglő'], ['Étterem']]
catID = '52e81612bcbc57f1066b79fa'

dfHun = pd.DataFrame(columns=['id', 'name', 'categories', 'location.address', 'location.lat', 'location.lng'])

for i in search_query:
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&categoryID={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latBUD, longBUD, VERSION, search_query, catID, radius, LIMIT)
    
    print(i, '- OK')
    resultsHun = requests.get(url).json()

    # assign relevant part of JSON to venues
    venuesHun = resultsHun['response']['venues']

    # tranform venues into a dataframe
    df = pd.json_normalize(venuesHun)

    dfHun=dfHun.append(df[['id', 'name', 'categories','location.address', 'location.lat', 'location.lng']])
    search_query=search_query[+1:+1]

dfHun.reset_index(drop=True, inplace=True)

['restaurant'] - OK
['Vendéglő'] - OK
['Étterem'] - OK


**Keep only columns that include venue name, and anything that is associated with location**

In [23]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dfHun.columns if col.startswith('location.')] + ['id']
dfHun_filtered = dfHun.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dfHun_filtered['categories'] = dfHun_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dfHun_filtered.columns = [column.split('.')[-1] for column in dfHun_filtered.columns]

pd.set_option('display.max_rows', None)

**We extract only data with category "Hungarian Restaurant"**

In [24]:
DataHun = dfHun_filtered[dfHun_filtered['categories']=='Hungarian Restaurant']
DataHun.reset_index(drop=True, inplace=True)
DataHun

Unnamed: 0,name,categories,address,lat,lng,id
0,Andreas Bistro,Hungarian Restaurant,Szarka u. 1.,47.488275,19.057433,4cadbd3a0e7bbfb76f785b83
1,21 Magyar Vendéglő,Hungarian Restaurant,Fortuna utca 21.,47.504124,19.031075,4b9a4973f964a520b7a935e3
2,Horgásztanya Vendéglő,Hungarian Restaurant,Fő u. 27.,47.501962,19.039235,4d137531d1848cfa680fc471
3,Alabárdos Étterem,Hungarian Restaurant,Országház u. 2.,47.501725,19.03272,4b96b40ef964a5200edf34e3
4,Halászbástya Restaurant,Hungarian Restaurant,Halászbástya - Északi Híradástorony,47.502534,19.034489,4c72c7ed4bc4236a4330cc7a
5,Tüköry Étterem,Hungarian Restaurant,Hold u. 15.,47.505113,19.052301,4b6ffd53f964a5207f022de3
6,Márkus Vendéglő,Hungarian Restaurant,Lövőház u. 17.,47.509562,19.025011,4be55025bcef2d7f60a403e5
7,Múzeum Kávéház és Étterem,Hungarian Restaurant,Múzeum krt. 12.,47.492046,19.061596,4e25ca49d164b6b74afe8927
8,Zöld Kapu Vendéglő,Hungarian Restaurant,Szőlő u. 42.,47.537197,19.038154,4bb47c1e0cbcef3b7bff582a
9,Kéhli Vendéglő,Hungarian Restaurant,Mókus u. 22.,47.5379,19.04337,4be6a354910020a1017cd414


Ten Hungarian restaurants are found in Budapest.

**We visualize the location of the Hungarian restaurants in Budapest**

In order for Folium to display correctly popups we need to remove special caracters from the Hungarian names.

In [25]:
DataHun['name'] = DataHun.loc[:,'name'].apply(unidecode)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


We can now create a map and display the Hungarian restaurants with their labels in popups.

In [26]:
hun_points_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# loop through the vegetarian/vegan restaurants and add each to the map
for lat, lng, label in zip(DataHun.lat, DataHun.lng, DataHun.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define a circle of 5 pixels diameter around the marker
        color='yellow',
        fill=True,
        popup=folium.Popup(label, parse_html=True),
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(hun_points_map)  

# show map
hun_points_map

#### 2.1.5 Fitness centers <a name="sport"></a>

**We search for fitness centers using Foursquare API**

In [27]:
VERSION = '20180605' # Foursquare API version
radius = 12000 #in meters
LIMIT = 100
search_query = 'Fitness'
categoryID = '4bf58dd8d48988d175941735'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&categoryID={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latBUD, longBUD, VERSION, search_query, categoryID, radius, LIMIT)

resultsFit = requests.get(url).json()

# assign relevant part of JSON to venues
venuesFit = resultsFit['response']['venues']

# tranform venues into a dataframe
dfFit = pd.json_normalize(venuesFit)

**Keep only columns that include venue name, and anything that is associated with location**

In [28]:
filtered_columns = ['name', 'categories'] + [col for col in dfFit.columns if col.startswith('location.')] + ['id']
dfFit_filtered = dfFit.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dfFit_filtered['categories'] = dfFit_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dfFit_filtered.columns = [column.split('.')[-1] for column in dfFit_filtered.columns]

**We extract only data with category "Gym / Fitness Center"**

In [29]:
DataFit = dfFit_filtered[dfFit_filtered['categories']=='Gym / Fitness Center']
DataFit.reset_index(drop=True, inplace=True)
DataFit

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,F&M Fitness and More,Gym / Fitness Center,Csörsz utca 14-16.,47.489801,19.022896,"[{'label': 'display', 'lat': 47.48980059796192...",1630,1123.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Csörsz utca 14-16., 1123, Magyarors...",,,5221e9d9498eba5714347373
1,Daily Fitness,Gym / Fitness Center,Pálya u. 9.,47.495484,19.028377,"[{'label': 'display', 'lat': 47.49548424379652...",965,1012.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Pálya u. 9. (Márvány u.), 1012, Mag...",Márvány u.,Krisztinaváros,4d235202dd6a236a62fd4e38
2,Get Fit River Fitness,Gym / Fitness Center,Révész u. 29.,47.527996,19.056998,"[{'label': 'display', 'lat': 47.52799626316895...",3523,1138.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Révész u. 29. (Dráva u.), 1138, Mag...",Dráva u.,,54b127e8498e6f78be8fa07f
3,Fitness,Gym / Fitness Center,Boscolo Budapest,47.498743,19.070396,"[{'label': 'display', 'lat': 47.49874304289823...",2250,,HU,Budapest,Budapest,Magyarország,"[Budapest, Boscolo Budapest (Erzsébet krt.), M...",Erzsébet krt.,,4feb4136e4b008a767ee16cb
4,Nova-Sun Fitness Club,Gym / Fitness Center,Kádár u. 6.,47.511663,19.055173,"[{'label': 'display', 'lat': 47.51166280485033...",1846,1132.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Kádár u. 6., 1132, Magyarország]",,,4cd15a816449a093908acfcf
5,Menta Fitness,Gym / Fitness Center,Szent István krt. 10.,47.512604,19.05,"[{'label': 'display', 'lat': 47.51260443754759...",1737,1137.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Szent István krt. 10., 1137, Magyar...",,,4ea68715b8f7b8a60b45a323
6,nr1 fitness -Kálvin tér,Gym / Fitness Center,Kecskeméti u. 14,47.48993,19.061149,"[{'label': 'display', 'lat': 47.48993034830719...",1817,1053.0,HU,Budapest V. kerület,Budapest,Magyarország,"[Budapest V. kerület, Kecskeméti u. 14 (Kalvin...",Kalvin ter,Budapest V. kerülete,5aa65d8ed8096e450c623b2d
7,Universum Fitness Récsei,Gym / Fitness Center,Istvánmezei út 6.,47.505183,19.092904,"[{'label': 'display', 'lat': 47.50518278440155...",4015,1146.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Istvánmezei út 6., 1146, Magyarország]",,,5076a96fe4b0eca51a17723b
8,SCITEC Gold Fitness Club,Gym / Fitness Center,Lurdy Ház fsz.,47.470829,19.083172,"[{'label': 'display', 'lat': 47.47082893042727...",4441,,HU,Budapest,Budapest,Magyarország,"[Budapest, Lurdy Ház fsz. (Mester utca), Magya...",Mester utca,,53a85b9c498e26d9ca5e3d1f
9,Fitness and Health Club by Mariott,Gym / Fitness Center,,47.494451,19.049471,"[{'label': 'display', 'lat': 47.4944506725873,...",806,1051.0,HU,Budapest,Budapest,Magyarország,"[Budapest, 1051, Magyarország]",,,58dcc9a4dfa6ff04ddad35fc


Twenty-eight fitness centers are found in Budapest.

**We visualize the location of the fitness centers in Budapest**

In order for Folium to display correctly popups we need to remove special caracters from the hungarian names.

In [30]:
DataFit['name'] = DataFit.loc[:,'name'].apply(unidecode)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


We can now create a map and display the fitness centers with their labels in popups.

In [31]:
fit_points_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# loop through the thermal bath and add each to the map
for lat, lng, label in zip(DataFit.lat, DataFit.lng, DataFit.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(fit_points_map)

# show map
fit_points_map

#### 2.1.6 Conference rooms <a name="conference"></a>

**We search for conference rooms using Foursquare API**

In [32]:
VERSION = '20180605' # Foursquare API version
radius = 12000 #in meters
LIMIT = 100
search_query = 'conference'
categoryID = '4bf58dd8d48988d127941735'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&categoryID={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latBUD, longBUD, VERSION, search_query, categoryID, radius, LIMIT)

resultsConf = requests.get(url).json()

# assign relevant part of JSON to venues
venuesConf = resultsConf['response']['venues']

# tranform venues into a dataframe
dfConf = pd.json_normalize(venuesConf)

**Keep only columns that include venue name, and anything that is associated with location**

In [33]:
filtered_columns = ['name', 'categories'] + [col for col in dfConf.columns if col.startswith('location.')] + ['id']
dfConf_filtered = dfConf.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dfConf_filtered['categories'] = dfConf_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dfConf_filtered.columns = [column.split('.')[-1] for column in dfConf_filtered.columns]

**We extract only data with category "Conference Room"**

In [34]:
DataConf = dfConf_filtered[dfConf_filtered['categories']=='Conference Room']
DataConf.reset_index(drop=True, inplace=True)
DataConf

Unnamed: 0,name,categories,lat,lng,labeledLatLngs,distance,cc,country,formattedAddress,address,postalCode,city,state,crossStreet,neighborhood,id
0,Quince Conference Room,Conference Room,47.492252,19.050831,"[{'label': 'display', 'lat': 47.492252, 'lng':...",1035,HU,Magyarország,"[Budapest, Marcius 15. ter 1 4. emelet, 1056, ...",Marcius 15. ter 1 4. emelet,1056.0,Budapest,Budapest,,,4cdd4c5622bd721e0a330048
1,CEU Conference Centre,Conference Room,47.505467,19.159598,"[{'label': 'display', 'lat': 47.50546709343197...",8993,HU,Magyarország,"[Budapest, Kerepesi út 87. (Pogány utca), 1106...",Kerepesi út 87.,1106.0,Budapest,Budapest,Pogány utca,,4bd70cc3304fce72ef1633ab
2,Avaya conference room 501,Conference Room,47.514064,19.058598,"[{'label': 'display', 'lat': 47.514064, 'lng':...",2214,HU,Magyarország,[Magyarország],,,,,,,507d2223e4b03d34a5c2d5fe
3,Boscolo Conference Rooms,Conference Room,47.498739,19.070873,"[{'label': 'display', 'lat': 47.498739, 'lng':...",2286,HU,Magyarország,"[Budapest VII. kerület, 1073, Magyarország]",,1073.0,Budapest VII. kerület,Budapest,,,59bb9c82a0215b132c7e12ff
4,Ibis Styles Conference Room,Conference Room,47.479287,19.068935,"[{'label': 'display', 'lat': 47.479287, 'lng':...",3017,HU,Magyarország,"[Budapest IX. kerület, 1095, Magyarország]",,1095.0,Budapest IX. kerület,Budapest,,,595e27d6f427de4acf7eed2a
5,Conference Room 1. Emelet 150,Conference Room,47.456741,18.935946,"[{'label': 'display', 'lat': 47.45674133300781...",9128,HU,Magyarország,[Magyarország],,,,,,,50a24821e4b0e8ebc19bfcf5
6,REC Conference Centre,Conference Room,47.491492,19.081253,"[{'label': 'display', 'lat': 47.49149166428705...",3161,HU,Magyarország,[Magyarország],,,,,,,4f93c84de4b08038d81ce63b
7,Lm Ericsson Hilda Conference Area,Conference Room,47.470818,19.062943,"[{'label': 'display', 'lat': 47.470818, 'lng':...",3503,HU,Magyarország,"[Budapest, 1117, Magyarország]",,1117.0,Budapest,Budapest,,Lágymányos,5b07ee774420d8002ca75bb5
8,"AB InBev, Samlesbury Conference Room",Conference Room,47.53322,19.059249,"[{'label': 'display', 'lat': 47.53322044813539...",4127,HU,Magyarország,"[Népfürdõ Utca 22, Magyarország]",Népfürdõ Utca 22,,,,,,50893de4e4b0b52a6b4574a3
9,Chello Sport Conference Room,Conference Room,47.537034,19.070936,"[{'label': 'display', 'lat': 47.537034, 'lng':...",4874,HU,Magyarország,[Magyarország],,,,,,,5375f93c498eacae73301125


Eleven conference rooms are found in Budapest.

**We visualize the location of the conference rooms in Budapest**

In order for Folium to display correctly popups we need to remove special caracters from the hungarian names.

In [35]:
DataConf['name'] = DataConf.loc[:,'name'].apply(unidecode)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


We can now create a map and display the conference rooms with their labels in popups.

In [36]:
conf_points_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# loop through the thermal bath and add each to the map
for lat, lng, label in zip(DataConf.lat, DataConf.lng, DataConf.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(conf_points_map)

# show map
conf_points_map

#### 2.1.7 Libraries <a name="libraries"></a>

**We search for libraries using Foursquare API**

In [37]:
VERSION = '20180605' # Foursquare API version
radius = 12000 #in meters
LIMIT = 100
search_query = 'library' #means thermal bath in Hungarian
categoryID = '4bf58dd8d48988d12f941735'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&categoryID={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latBUD, longBUD, VERSION, search_query, categoryID, radius, LIMIT)

resultsLib = requests.get(url).json()

# assign relevant part of JSON to venues
venuesLib = resultsLib['response']['venues']

# tranform venues into a dataframe
dfLib = pd.json_normalize(venuesLib)

**Keep only columns that include venue name, and anything that is associated with location**

In [38]:
filtered_columns = ['name', 'categories'] + [col for col in dfLib.columns if col.startswith('location.')] + ['id']
dfLib_filtered = dfLib.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dfLib_filtered['categories'] = dfLib_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dfLib_filtered.columns = [column.split('.')[-1] for column in dfLib_filtered.columns]

**We extract only data with category "Library" or "College Library"**

In [39]:
DataLib = dfLib_filtered[dfLib_filtered['categories'].isin(['Library', 'College Library']) ]
DataLib.reset_index(drop=True, inplace=True)
DataLib

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,ELTE Egyetemi Könyvtár és Levéltár / ELTE Univ...,College Library,Ferenciek tere 6.,47.492537,19.057242,"[{'label': 'display', 'lat': 47.49253702166014...",1419,1053.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Ferenciek tere 6., 1053, Magyarország]",,,4d4b580f2220b1f7e69a29d2
1,Central European University Library,College Library,Nádor utca 15,47.501188,19.049612,"[{'label': 'display', 'lat': 47.50118807096102...",755,1051.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Nádor utca 15 (Zrínyi utca), 1051, ...",Zrínyi utca,Lipótváros,4d8793ce5ad3a093ec880efe
2,CEU Library,College Library,Nádor u 15.,47.501186,19.049579,"[{'label': 'display', 'lat': 47.50118556542106...",752,1051.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Nádor u 15., 1051, Magyarország]",,,4d8792bb40a7a35d95d14abe
3,Street Library,Library,,47.4979,19.054249,"[{'label': 'display', 'lat': 47.4979, 'lng': 1...",1037,,HU,,,Magyarország,[Magyarország],,,5181575c498ea211c248c6dc
4,SEAS Library (Angol-amerikai Intézet könyvtára),College Library,ELTE BTK,47.492876,19.061628,"[{'label': 'display', 'lat': 47.49287573389571...",1705,,HU,Budapest,Budapest,Magyarország,"[Budapest, ELTE BTK, Magyarország]",,Budapest VII. kerülete,506376f2e4b07031a91ddce7
5,CEU Library of Medieval Studies,College Library,Múzeum krt. 6-8. I. em.,47.492969,19.061665,"[{'label': 'display', 'lat': 47.49296923102233...",1704,,HU,Budapest,Budapest,Magyarország,"[Budapest, Múzeum krt. 6-8. I. em., Magyarország]",,,4ecb82868b813b34fe278ad5
6,Budapest Christian Library,Library,Szent Gellért tér 3,47.483071,19.053922,"[{'label': 'display', 'lat': 47.48307118997125...",1982,,HU,Budapest,Budapest,Magyarország,"[Budapest, Szent Gellért tér 3, Magyarország]",,,4dcaab36e4cde9e42f8b8289
7,IBS Library,College Library,,47.500655,19.067055,"[{'label': 'display', 'lat': 47.50065465575044...",2015,,HU,,,Magyarország,[Magyarország],,,51656cbde4b073f745c96050
8,Semmelweis Orvostörténeti Könyvtár / Semmelwei...,Library,Török utca 12.,47.516056,19.036453,"[{'label': 'display', 'lat': 47.51605582974233...",1990,1023.0,HU,Budapest,Budapest,Magyarország,"[Budapest, Török utca 12. (Gül Baba utca), 102...",Gül Baba utca,,524d394dbce6d140699720f0
9,HIK Library,College Library,Reviczky u. 4.,47.489717,19.065528,"[{'label': 'display', 'lat': 47.48971696889715...",2117,,HU,Budapest,Budapest,Magyarország,"[Budapest, Reviczky u. 4. (Szentkirályi u.), M...",Szentkirályi u.,,4c1cbe018b3aa593f54c995f


Nineteen libraries are found in Budapest.

**We visualize the location of the libraries in Budapest**

In order for Folium to display correctly popups we need to remove special caracters from the hungarian names.

In [40]:
DataLib['name'] = DataLib.loc[:,'name'].apply(unidecode)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


We can now create a map and display the libraries with their labels in popups.

In [41]:
lib_points_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# loop through the thermal bath and add each to the map
for lat, lng, label in zip(DataLib.lat, DataLib.lng, DataLib.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(lib_points_map)

# show map
lib_points_map

#### 2.1.8 Concatenate venues datasets <a name="venuesdata"></a>

We have gathered all the geospatial and associated data we needed from Foursquare. We can now concatenate the 6 datasets containing the different venues of interest.

**Concatenate all dataframes containing the data about thermal bath and spa, vegetarian and vegan restaurants, Hungarian food restaurant, fitness centers, conference rooms and libraries.**

In [45]:
DataFull = pd.concat([WaterPoints, DataVeg, DataHun, DataFit, DataConf, DataLib], ignore_index=True)
DataFull.iloc[:,[0,1,4,5]]

Unnamed: 0,name,categories,lat,lng
0,Rudas Gyogyfurdo es Uszoda,Spa,47.489188,19.047761
1,Szent Gellert Gyogyfurdo es Uszoda,Spa,47.483917,19.052256
2,Szent Lukacs Gyogyfurdo es Uszoda,Spa,47.517898,19.036682
3,Dandar Gyogyfurdo,Spa,47.476337,19.071061
4,Szechenyi Gyogyfurdo es Uszoda,Spa,47.518302,19.082394
5,Kiraly Gyogyfurdo,Spa,47.510608,19.038185
6,Palatinus Strandfurdo,Water Park,47.52917,19.046946
7,Paskal Gyogy- es Strandfurdo,Water Park,47.520571,19.127469
8,Romai Strandfurdo,Pool,47.574811,19.052087
9,Punkosdfurdoi Strand,Pool,47.594627,19.0679


We save the dataset to a csv file:

In [46]:
DataFull.to_csv('DatFull.csv', index=False)

**We visualize the location of all the venues of interest**

In [65]:
full_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# set a color dictionnary
colors = {'Spa' : 'blue', 'Water Park' : 'blue', 'Pool' : 'blue', 'Vegetarian / Vegan Restaurant' : 'limegreen', 'Hungarian Restaurant' : 'red', 'Gym / Fitness Center' : 'black', 'Conference Room' : 'darkviolet', 'Library' : 'brown', 'College Library' : 'brown'}

DataFull.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lng"]],
                                              radius=5, # define how big you want the circle markers to be
                                              color=False,
                                              fill=True,
                                              fill_color=colors[row['categories']],
                                              popup=row['categories'],
                                              fill_opacity=1)
                                             .add_to(full_map), axis=1)

# show map
full_map

### 2.2 Building of the duration to airport dataset <a name="buildduration"></a>

**We first use geopy to get the latitude and longitude of the Budapest international airport**

In [48]:
address = 'Budapest Airport, Hungary'

geolocator = Nominatim(user_agent="bud_explorer")
location = geolocator.geocode(address)
latAIR = location.latitude
longAIR = location.longitude
print('The geograpical coordinate of Budapest Airport are {}, {}.'.format(latAIR, longAIR))

The geograpical coordinate of Budapest Airport are 47.433211, 19.262335.


**We use the Mapquest API to get the distance and duration from each venues of interest to the Budapest international airport**

Even if distance is not required, we search for this information in case area of interest to settle the headquarter would return the same trip duration to the airport. Distance may help in such cases.

We now run a hidden cell to give our mapquest key.

In [49]:
# @hidden_cell
mapquestkey='Feq0vnFrRt3o2LEMs5XAsO5fIYTF0G9Q'

As computing the distance and duration between each venue and the airport may take a while we use a function to print a progress bar.

In [50]:
import sys

def printProgressBar(i,max,postText):
    n_bar =10 #size of progress bar
    j= i/max
    sys.stdout.write('\r')
    sys.stdout.write(f"[{'=' * int(n_bar * j):{n_bar}s}] {int(100 * j)}%  {postText}")
    sys.stdout.flush()

And we run the queries:

In [51]:
dat = range(0,len(DataFull))
dfDuration = pd.DataFrame(columns=['distance', 'formattedTime'])

for i in dat:
    printProgressBar(i,len(DataFull),"Please wait!") # print progress bar
    loci = str(DataFull.loc[i,'lat'])+','+str(DataFull.loc[i,'lng'])
    locAIR = str(latAIR)+','+str(longAIR)
    loc = {"locations":[loci, locAIR]}
    url = 'https://www.mapquestapi.com/directions/v2/optimizedRoute?json={}&outFormat=json&key={}'.format(loc, mapquestkey)
    
    resultsDur = requests.get(url).json()
    
    # assign relevant part of JSON to venues
    duration = resultsDur['route']

    # tranform venues into a dataframe
    dfDur = pd.json_normalize(duration)
    dfDuration=dfDuration.append(dfDur[['distance', 'formattedTime']])
    
dfDuration.reset_index(drop=True, inplace=True)
distAirport=pd.concat([DataFull[['name']], dfDuration], axis=1)
distAirport.columns = ['from','Distance_to_airport_in_km','Duration_to_airport']
distAirport['Distance_to_airport_in_km'] = distAirport['Distance_to_airport_in_km'].mul(1.609344) #convert miles to km
distAirport



Unnamed: 0,from,Distance_to_airport_in_km,Duration_to_airport
0,Rudas Gyogyfurdo es Uszoda,21.684301,00:24:08
1,Szent Gellert Gyogyfurdo es Uszoda,20.799162,00:23:35
2,Szent Lukacs Gyogyfurdo es Uszoda,24.338109,00:29:37
3,Dandar Gyogyfurdo,18.787482,00:20:08
4,Szechenyi Gyogyfurdo es Uszoda,22.585534,00:25:20
5,Kiraly Gyogyfurdo,25.413151,00:27:43
6,Palatinus Strandfurdo,26.441522,00:29:48
7,Paskal Gyogy- es Strandfurdo,22.505066,00:26:37
8,Romai Strandfurdo,47.091015,00:36:15
9,Punkosdfurdoi Strand,44.725279,00:32:55


We save the dataset to a csv file:

In [None]:
distAirport.to_csv('distAirport.csv', index=False)

**We now have the two datasets necessary to find the best place for the new headquarter: DataFull for the venues and distAirport for the distance and duration to the airport**

## 3. Methodology <a name="methodology"></a>

We first display all the venues of interest using the **folium library** in order to **explore geospatial data** and observe their organisation in Budapest. We plot circles of a radius of 1km around each official bath in order to gain a better idea about the potential areas of interest for headquarter. Indeed, the headquarter should not be further than 1km from a bath. As official baths are spread at quite large distance from each other, as they are not numerous and as the startup founders gave it as the first criteria, we decided to first base our observation on this category of venue.

As the headquarter should be surrounded by few (at least 6 of different given categories) venues, we perform a **cluster analysis** to identify clusters of venues where the HQ would be close enough to everything. We use **DBSCAN algorithm** as this algorithm does not require to set a number of clusters a priori and do not constrain the shape of the clusters. DBSCAN is performed with the **Scikit-learn library**. Clusters are built using the distance of each venue to the airport and the corresponding trip duration as well as using the location (latitude and longitude) of each venue.



## 4. Results <a name="results"></a>

### 4.1 Exploration of geospatial data <a name="geospatial"></a>

**We first explore the organisation of the venues using folium**

In [52]:
full_map = folium.Map(location=[latBUD, longBUD], zoom_start=11)

# set a color dictionnary
colors = {'Spa' : 'blue', 'Water Park' : 'blue', 'Pool' : 'blue', 'Vegetarian / Vegan Restaurant' : 'limegreen', 'Hungarian Restaurant' : 'red', 'Gym / Fitness Center' : 'black', 'Conference Room' : 'darkviolet', 'Library' : 'brown', 'College Library' : 'brown'}

DataFull.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lng"]],
                                              radius=5, # define how big you want the circle markers to be
                                              color=False,
                                              fill=True,
                                              fill_color=colors[row['categories']],
                                              popup=folium.Popup(row['name'], parse_html=True),
                                              fill_opacity=0.6)
                                             .add_to(full_map), axis=1)

DataFullWater = DataFull[DataFull['categories'].isin(['Water Park', 'Pool', 'Spa']) ]
DataFullWater.apply(lambda row:folium.Circle(location=[row["lat"], row["lng"]],
                                        radius=1000,
                                        color='red',fill=False).add_to(full_map), axis=1)    

# show map
full_map

From this map, we see that most baths are on Buda side (West of the Danube) of Budapest while most venues of interest are on Pest side (East of the Danube). Also, half of the bath do not have any venue of interest in their surrounding (i.e. at less than 1km). In order to meet the criteria of the startup, we can already see that the best place for the headquarter is very likely to be in the city center. The surroundings of the Gellért and Rudas baths seem to be good candidates. Lukács and Király baths areas may also be acceptable locations.

### 4.2 Clustering of geospatial, distance and duration to airport data <a name="clustering"></a>

In order to reduce the number of possibilities and to include all the criteria given by the startup founders, we carry out a cluster analysis using DBSCAN on the latitude and longitude of each venue and their distance and trip duration to the airport.

**We first need to convert Python time format to float for use of duration data in clustering methods**

In [53]:
# Function to convert Python time format to float
def time_to_sec(time_str):
    return sum(x * int(t) for x, t in zip([1, 60, 3600], reversed(time_str.split(":"))))

In [54]:
HtoAir = distAirport['Duration_to_airport'].apply(time_to_sec)/60
dAir=pd.concat([distAirport['from'],distAirport['Distance_to_airport_in_km'], HtoAir], axis=1)
dAir.columns = ['from','Distance_to_airport_in_km','Duration_to_airport_in_minutes']

In [55]:
dAir.head()

Unnamed: 0,from,Distance_to_airport_in_km,Duration_to_airport_in_minutes
0,Rudas Gyogyfurdo es Uszoda,21.684301,24.133333
1,Szent Gellert Gyogyfurdo es Uszoda,20.799162,23.583333
2,Szent Lukacs Gyogyfurdo es Uszoda,24.338109,29.616667
3,Dandar Gyogyfurdo,18.787482,20.133333
4,Szechenyi Gyogyfurdo es Uszoda,22.585534,25.333333


**We concatenate name, category and location data of each venue with their distance and trip duration to the airport**

In [56]:
dataClust=pd.concat([DataFull['name'],DataFull['categories'],dAir['Distance_to_airport_in_km'],dAir['Duration_to_airport_in_minutes'],DataFull['lat'],DataFull['lng']], axis=1)
dataClust.head()

Unnamed: 0,name,categories,Distance_to_airport_in_km,Duration_to_airport_in_minutes,lat,lng
0,Rudas Gyogyfurdo es Uszoda,Spa,21.684301,24.133333,47.489188,19.047761
1,Szent Gellert Gyogyfurdo es Uszoda,Spa,20.799162,23.583333,47.483917,19.052256
2,Szent Lukacs Gyogyfurdo es Uszoda,Spa,24.338109,29.616667,47.517898,19.036682
3,Dandar Gyogyfurdo,Spa,18.787482,20.133333,47.476337,19.071061
4,Szechenyi Gyogyfurdo es Uszoda,Spa,22.585534,25.333333,47.518302,19.082394


**We run the DBSCAN analysis with an epsilon of 0.25 (best obtained results from several tests) and a minimum number of points per cluster of 6 (as at least 6 different categories of venue should be in the surrounding of the headquarter).**

In [57]:
sklearn.utils.check_random_state(1000)

Clus_dataSet = dataClust.iloc[:,2:6]
Clus_dataSet = StandardScaler().fit_transform(Clus_dataSet)

# Compute DBSCAN
db = DBSCAN(eps=0.28, min_samples=6).fit(Clus_dataSet)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
dataClust["Clusters"]=labels

realClusterNum=len(set(labels)) - (1 if -1 in labels else 0)
clusterNum = len(set(labels)) 

# A sample of clusters
dataClust

Unnamed: 0,name,categories,Distance_to_airport_in_km,Duration_to_airport_in_minutes,lat,lng,Clusters
0,Rudas Gyogyfurdo es Uszoda,Spa,21.684301,24.133333,47.489188,19.047761,0
1,Szent Gellert Gyogyfurdo es Uszoda,Spa,20.799162,23.583333,47.483917,19.052256,0
2,Szent Lukacs Gyogyfurdo es Uszoda,Spa,24.338109,29.616667,47.517898,19.036682,-1
3,Dandar Gyogyfurdo,Spa,18.787482,20.133333,47.476337,19.071061,-1
4,Szechenyi Gyogyfurdo es Uszoda,Spa,22.585534,25.333333,47.518302,19.082394,-1
5,Kiraly Gyogyfurdo,Spa,25.413151,27.716667,47.510608,19.038185,-1
6,Palatinus Strandfurdo,Water Park,26.441522,29.8,47.52917,19.046946,-1
7,Paskal Gyogy- es Strandfurdo,Water Park,22.505066,26.616667,47.520571,19.127469,-1
8,Romai Strandfurdo,Pool,47.091015,36.25,47.574811,19.052087,-1
9,Punkosdfurdoi Strand,Pool,44.725279,32.916667,47.594627,19.0679,-1


**We visualize the clusters using folium**

In [58]:
# create map
map_clusters = folium.Map(location=[latBUD, longBUD], zoom_start=12)

# set color scheme for the clusters
x = np.arange(clusterNum)
ys = [i + x + (i*x)**2 for i in range(clusterNum)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = ['darkviolet', 'red', 'blue']

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dataClust['lat'], dataClust['lng'], dataClust['categories'], dataClust['Clusters']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Two clusters are identified (numbered 0 in blue and 1 in purple). Numerous venues have been classified as -1 (red points), showing that numerous venues couldn't be associated to a cluster. These venues are two far from each other and/or to the airport by car to be considered as being of interest in our case. The cluster 1 is spread close to Kálvin tér, while cluster 0 is spread on Belváros, Lipótváros and the surrounding of the Opera. 

**We remove the venues of cluster -1 which can be considered as outliers. They are venues which does not have 6 neighbors in their close surrounding. Thus they are unlikely to be in an area meeting the criteria for the HQ.**

In [59]:
clusteredArea = dataClust[dataClust['Clusters'].isin([0, 1])]
clusteredArea.reset_index(inplace=True, drop=True)
clusteredArea

Unnamed: 0,name,categories,Distance_to_airport_in_km,Duration_to_airport_in_minutes,lat,lng,Clusters
0,Rudas Gyogyfurdo es Uszoda,Spa,21.684301,24.133333,47.489188,19.047761,0
1,Szent Gellert Gyogyfurdo es Uszoda,Spa,20.799162,23.583333,47.483917,19.052256,0
2,VegaCity,Vegetarian / Vegan Restaurant,21.911219,24.7,47.491976,19.061325,0
3,Vegan Garden,Vegetarian / Vegan Restaurant,21.373698,25.066667,47.499686,19.062519,0
4,Govinda Vega Sarok,Vegetarian / Vegan Restaurant,20.694554,23.516667,47.490787,19.056462,0
5,Vegan king,Vegetarian / Vegan Restaurant,21.405885,24.483333,47.489668,19.055639,0
6,Vegazzi,Vegetarian / Vegan Restaurant,21.552335,24.95,47.501194,19.059412,0
7,Veganeria Bisztro,Vegetarian / Vegan Restaurant,21.861329,25.466667,47.506752,19.055643,0
8,Zen vegan es vegetarianus konyha,Vegetarian / Vegan Restaurant,19.725729,21.216667,47.488925,19.060994,1
9,827 Vegan Kitchen 2.,Vegetarian / Vegan Restaurant,19.881836,21.766667,47.489105,19.061041,1


Only 35 venues are remaining.

**We display the remaining points of interests**

In [60]:
# create map and display it
full_map = folium.Map(location=[latBUD, longBUD], zoom_start=13)

# set a color dictionnary
colors = {'Spa' : 'blue', 'Water Park' : 'blue', 'Pool' : 'blue', 'Vegetarian / Vegan Restaurant' : 'limegreen', 'Hungarian Restaurant' : 'red', 'Gym / Fitness Center' : 'black', 'Conference Room' : 'darkviolet', 'Library' : 'brown', 'College Library' : 'brown'}

clusteredArea.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lng"]],
                                              radius=5, # define how big you want the circle markers to be
                                              color=False,
                                              fill=True,
                                              fill_color=colors[row['categories']],
                                              popup=folium.Popup(row['name'], parse_html=True),
                                              fill_opacity=1)
                                             .add_to(full_map), axis=1)

DataFullWater = clusteredArea[clusteredArea['categories'].isin(['Water Park', 'Pool', 'Spa']) ]
DataFullWater.apply(lambda row:folium.Circle(location=[row["lat"], row["lng"]],
                                        radius=1000,
                                        color='red',fill=False).add_to(full_map), axis=1)    

# show map
full_map

We observe that both Gellért and Rudas baths have all required facilities in a radius of 1000m, but all in the other side of the Danube. On the side of most facilities (Pest side), both circle of 1000m radius crossed each other close to Kálvin tér. Kálvin tér is also an area where venues are clustered (see cluster 1). If the headquarter would be close to this area, collaborators would have access to both baths and to all other categories of venue. Therefore, this seems to be the good place for the headquarter.

### 4.3 Visualization of the suggested area and description of venues accessibilities <a name="kalvin"></a>

**We first use geopy to get the coordinates of Kálvin tér.**

In [61]:
address = 'Kálvin tér, Budapest, Hungary'

geolocator = Nominatim(user_agent="bud_explorer")
location = geolocator.geocode(address)
latKalvin = location.latitude
longKalvin = location.longitude
print('The geograpical coordinate of Kálvin tér are {}, {}.'.format(latKalvin, longKalvin))

The geograpical coordinate of Kálvin tér are 47.489152149999995, 19.06170279927273.


**We search for the distance and duration of a trip by car from Kálvin tér to the airport.**

In [62]:
locKal = str(latKalvin)+','+str(longKalvin)
locAIR = str(latAIR)+','+str(longAIR)
loc = {"locations":[locKal, locAIR]}
url = 'https://www.mapquestapi.com/directions/v2/optimizedRoute?json={}&outFormat=json&key={}'.format(loc, mapquestkey)
print(url)

resultsDurKal = requests.get(url).json()
    
# assign relevant part of JSON to venues
durationKal = resultsDurKal['route']

# tranform venues into a dataframe
dfDurKal = pd.json_normalize(durationKal)

Kalvin = {'name': ['Kalvin Ter'], 'categories': ['Headquarter area'], 'Distance_to_airport_in_km': [dfDurKal.loc[0,'distance']*1.609344], 'Duration_to_airport_in_minutes': [time_to_sec(dfDurKal.loc[0,'formattedTime'])/60], 'lat': [latKalvin], 'lng': [longKalvin]}
dfKal = pd.DataFrame(data=Kalvin)
dfKal

https://www.mapquestapi.com/directions/v2/optimizedRoute?json={'locations': ['47.489152149999995,19.06170279927273', '47.433211,19.262335']}&outFormat=json&key=Feq0vnFrRt3o2LEMs5XAsO5fIYTF0G9Q


Unnamed: 0,name,categories,Distance_to_airport_in_km,Duration_to_airport_in_minutes,lat,lng
0,Kalvin Ter,Headquarter area,20.110363,22.233333,47.489152,19.061703


Kálvin tér is located around 20km from the airport and it takes less than 23 minutes to reach the airport by car.

**We observe the accessible venues within a distance of 500m and 1km (as the crow flies) from Kálvin tér. We also display a circle showing an area of 700m and 1200m around Kálvin tér in order to take into consideration that the headquarter may be few meters from the exact position of Kálvin tér. We suggest to search for a real estate in an area of 200m of radius around Kálvin tér.**

In [64]:
# create map and display it
full_map = folium.Map(location=[latKalvin, longKalvin], zoom_start=14)

# set a color dictionnary
colors = {'Spa' : 'blue', 'Vegetarian / Vegan Restaurant' : 'limegreen', 'Hungarian Restaurant' : 'red', 'Gym / Fitness Center' : 'black', 'Conference Room' : 'darkviolet', 'Library' : 'brown', 'College Library' : 'brown'}

dfKal.apply(lambda row:folium.Circle(location=[row["lat"], row["lng"]],
                                        radius=200,
                                        color='darkred',
                                        fill=True,
                                        fill_opacity=0.4,
                                        popup='Best area for headquarter')
                                       .add_to(full_map), axis=1)

dfKal.apply(lambda row:folium.Circle(location=[row["lat"], row["lng"]],
                                        radius=500,
                                        color='tomato',
                                        fill=False,
                                        popup='500m from Kalvin Ter')
                                       .add_to(full_map), axis=1)

dfKal.apply(lambda row:folium.Circle(location=[row["lat"], row["lng"]],
                                        radius=700,
                                        color='lightcoral',
                                        fill=False,
                                        popup='Extension of 200m depending on the exact location of the HQ, so 700m from Kalvin ter')
                                       .add_to(full_map), axis=1)

dfKal.apply(lambda row:folium.Circle(location=[row["lat"], row["lng"]],
                                        radius=1000,
                                        color='tomato',
                                        fill=False,
                                        popup='1000m from Kalvin Ter')
                                       .add_to(full_map), axis=1)    

dfKal.apply(lambda row:folium.Circle(location=[row["lat"], row["lng"]],
                                        radius=1200,
                                        color='lightcoral',
                                        fill=False,
                                        popup='Extension of 200m depending on the exact location of the HQ, so 1200m from Kalvin ter')
                                       .add_to(full_map), axis=1)    

clusteredArea.apply(lambda row:folium.CircleMarker(location=[row["lat"], row["lng"]],
                                              radius=5, # define how big you want the circle markers to be
                                              color=False,
                                              fill=True,
                                              fill_color=colors[row['categories']],
                                              popup=folium.Popup(row['name'], parse_html=True),
                                              fill_opacity=1)
                                             .add_to(full_map), axis=1)

# show map
full_map

If a real estate can be found at the direct vicinity of Kálvin tér: 
- Gellért bath, one of the famous official (thermal) bath from Budapest, will be less than 1km from the headquarter
- Quince conference room will be less than 1km from the headquarter
- 7 libraries will be less than 1km from the headquarter, 6 of them will be less than 500m away
- 5 restaurants providing vegetarian or vegan food and 2 restaurants providing hungarian food will be less than 500m from the headquarter
- 2 fitness centers will be less than 500m from the headquarter

**Kálvin tér is therefore the best candidate, as it meets all the given criteria and even overpass them.**

## 5. Discussion <a name="discussion"></a>

Our analysis shows that even if there are 12 **official (thermal) baths** in Budapest, they are very **far from the airport and/or isolated** without many venues in their surroundings. Both Gellért and Rudas baths could have been good candidates to be selected as the bath close to the headquarter. Howerever, they are both on the West side of the Danube while most of the required venues are located on the East side. Therefore, **this criteria in particular reduced drastically the candidate areas to host the new headquarter**.

**Combining exploration of the geospatial data and clustering using DBSCAN algorithm**, we could eliminate venues which were too far or isolated to have a chance to satisfy the criteria. Based on the results of the maps description and of the clustering, the surrounding of **Kálvin tér have been identified as the best candidate area to host the new headquarter**.

**Kálvin tér** is in **straight connection with the Gellért bath**, which would be located less than 1km from the HQ. Even if they are on different side of the Danube, they are connected with the Szabadság (Liberty) bridge. Going to the bath will be easy either walking or by public transportation. **One conference room and 7 libraries** are located less than 1km from Kálvin tér. Also, the collaborators will have the opportunity to choose between **5 vegetarian/vegan restaurants and 2 Hungarian restaurants**, located less than 500m from Kálvin tér. Fitness is also available with **2 fitness centers** located at less than 500m from Kálvin tér. 
Finally, Kálvin tér is one of the best location to go to the airport as it is on the main road joining the city center and the airport. Therefore, it will take **less than 23 minutes by car to reach the airport terminal**.
Even if not requested, we can add that Kalvin Tér has metro station for 2 lines and is connected by this way to 3 train stations and to the bus station which goes to the airport by public transportation.

As finding a real estate in the direct vicinity of Kálvin tér may be challenging, we also provided visualization for the venues accessibility considering that the HQ may be in an area of 200m of radius around Kálvin tér. In such case we recommend to locate the HQ rather in the west of Kálvin tér if it is desired to strictly keep a distance of less than 1km with an official bath.

## 6. Conclusion <a name="conclusion"></a>

Using Foursquare and Mapquest API, it has been possible to gather the necessary data to response to the given problem. Mapping and description as well as clustering of geospatial and itinerary data allow to propose a solution to the startup founders. We could point the best area in Budapest which meet all the criteria they had to settle their new headquarter.
In continuation, if they would like suggestion to optimize the organization of the time of their collaborators, we could search for the best venue for each category considering the duration to reach each of them from the HQ by walking and by public transportation.