# Coursera Capstone Final Project

## Project Description

The goal of this project is to find the best location in Budapest for a new pub. Finding the right location is crucial for a business; in most cases it is the deciding factor in the early life of the business.

I will use the Foursquare API to solve the problem. To find the best location, it is important to identify the venues that already exist, see which of them are popular, and check whether there is room for a new business.

There are also regulations on where a pub can be opened: in our case it has to be at least 150 meters from any public school.
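The 150-meter rule can be checked with a plain great-circle distance. The following is a minimal sketch, assuming a spherical Earth (haversine formula); the helper names `haversine_m` and `is_allowed` and the sample coordinates are hypothetical and not part of the original notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def is_allowed(candidate, schools, min_distance_m=150):
    """True if the candidate (lat, lon) is at least min_distance_m from every school."""
    return all(haversine_m(*candidate, *school) >= min_distance_m for school in schools)

# Hypothetical coordinates, for illustration only
schools = [(47.5000, 19.0600)]
too_close = is_allowed((47.5000, 19.0610), schools)   # roughly 75 m from the school
far_enough = is_allowed((47.5000, 19.0650), schools)  # roughly 375 m from the school
print(too_close, far_enough)
```

At this scale the spherical approximation differs from an ellipsoidal distance by well under a meter, so it should be adequate for a 150-meter threshold.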

In [163]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests # library to handle requests
import folium # map rendering library
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)
import re
import geopy.distance
import pickle
import math

In [2]:
# Fetch a URL and return the raw response body. A browser-like User-Agent header is sent because some websites check it on GET requests.
def parse(url):
    headers = {'User-Agent': 'Mozilla/5.0'}
    return requests.get(url, headers=headers).content

This page contains the GPS coordinates of all the districts: http://nepesseg.com/budapest/
The function below parses the coordinates of a single district:

In [3]:
def parseDistricts(num):
    url = "http://nepesseg.com/budapest/budapest-{:02d}-kerulet".format(num)
    content = parse(url)
    soup = BeautifulSoup(content, "lxml")
    tag = soup.find(lambda tag:tag.name=="p" and "GPS koordinátái:" in tag.text)
    arr = re.findall(r"[-+]?\d*\.\d+|\d+", tag.text)
    return (arr[-2], arr[-1])
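To see why the function takes the last two matches, here is a small sketch of the same regular expression applied to a sample string. The sample text is hypothetical; the wording on the real page may differ.

```python
import re

# Same pattern as in parseDistricts: signed decimals first, plain integers as a fallback
COORD_PATTERN = r"[-+]?\d*\.\d+|\d+"

# Hypothetical paragraph text, for illustration only
sample = "Budapest 1. kerület GPS koordinátái: 47.4968, 19.0375"

numbers = re.findall(COORD_PATTERN, sample)
print(numbers)  # every number in the text, including the district number

# The coordinates come last in the text, so index from the end
lat, lon = float(numbers[-2]), float(numbers[-1])
print(lat, lon)
```

Because the pattern also matches other numbers in the paragraph, such as the district number, indexing from the end is more robust than taking the first two matches.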

Get the coordinates for all 23 districts of Budapest:

In [4]:
budapest_coord = []
for i in range(1,24):
    coord = parseDistricts(i)
    budapest_coord.append(("Budapest " + str(i), float(coord[0]), float(coord[1])))

Checking the coordinates

In [5]:
budapest_coord

[('Budapest 1', 47.4968, 19.0375),
 ('Budapest 2', 47.5393, 18.9869),
 ('Budapest 3', 47.5672, 19.0369),
 ('Budapest 4', 47.5778, 19.0952),
 ('Budapest 5', 47.5002, 19.052),
 ('Budapest 6', 47.5081, 19.0678),
 ('Budapest 7', 47.5027, 19.0734),
 ('Budapest 8', 47.4887, 19.0845),
 ('Budapest 9', 47.4649, 19.0916),
 ('Budapest 10', 47.4821, 19.1575),
 ('Budapest 11', 47.4593, 19.0187),
 ('Budapest 12', 47.4991, 18.9905),
 ('Budapest 13', 47.5355, 19.0709),
 ('Budapest 14', 47.5225, 19.1147),
 ('Budapest 15', 47.5627, 19.1325),
 ('Budapest 16', 47.5183, 19.1919),
 ('Budapest 17', 47.4754, 19.2665),
 ('Budapest 18', 47.4281, 19.2098),
 ('Budapest 19', 47.4457, 19.143),
 ('Budapest 20', 47.4333, 19.1193),
 ('Budapest 21', 47.4244, 19.0661),
 ('Budapest 22', 47.4105, 19.0001),
 ('Budapest 23', 47.394, 19.1225)]

In [6]:
budapest_data = pd.DataFrame(list(budapest_coord), columns=['District', 'Latitude', 'Longitude'])

In [7]:
latitude = 47.50
longitude = 19.05

In [8]:
# create map of Budapest using latitude and longitude values
map_budapest = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(budapest_data['Latitude'], budapest_data['Longitude'], budapest_data['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_budapest)  
    
map_budapest

In [9]:

coords_1 = (47.4968, 19.0375)
coords_2 = (47.5393, 18.9869)

# Note: vincenty was removed in geopy 2.0; on newer versions use geopy.distance.geodesic instead
print(geopy.distance.vincenty(coords_1, coords_2).km)

6.070562527719248




In [10]:
CLIENT_ID = 'N3WFOT4ZNN400G3S3MD23HOSCMOLSY4IGXLEMDX4O0K5NEWC' # your Foursquare ID
CLIENT_SECRET = 'WJYGOWE2B5KPY30LFQ20IYVJ522BLYLAZRCPNMBJQZLRDFZT' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: N3WFOT4ZNN400G3S3MD23HOSCMOLSY4IGXLEMDX4O0K5NEWC
CLIENT_SECRET:WJYGOWE2B5KPY30LFQ20IYVJ522BLYLAZRCPNMBJQZLRDFZT


In [11]:
neighborhood_latitude = budapest_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = budapest_data.loc[0, 'Longitude'] # neighborhood longitude value

In [12]:
LIMIT = 100 
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c476fd5f594df0eecf80ebf'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4bc9f998937ca59338c2a692-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/beach_',
          'suffix': '.png'},
         'id': '50aaa49e4b90af0d42d5de11',
         'name': 'Castle',
         'pluralName': 'Castles',
         'primary': True,
         'shortName': 'Castle'}],
       'id': '4bc9f998937ca59338c2a692',
       'location': {'address': 'Budai Vár',
        'cc': 'HU',
        'city': 'Budapest',
        'country': 'Magyarország',
        'distance': 167,
        'formattedAddress': ['Budapest', 'Budai Vár', '1014', 'Magyarország'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 47.496197815204255,
          'lng': 19.0

In [13]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    LIMIT = 500
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request once and reuse the parsed JSON
        res_json = requests.get(url).json()
        if "response" in res_json:
            results = res_json["response"]['groups'][0]['items']
        else:
            venues_list.append([(
            '',
            name, 
            lat, 
            lng, 
            '', 
            0, 
            0,  
            '')])
            continue
        
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            v['venue']['id'], 
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Id', 
                  'Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
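The request URL above is assembled with a long `format` string. As an alternative, the sketch below builds the same query string from a dictionary with the standard library. The endpoint and parameter names are taken from the notebook; the helper name `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=100):
    """Return the explore URL with the same query parameters the notebook uses."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes special characters, e.g. the comma in "ll"
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Placeholder credentials, for illustration only
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605", 47.4968, 19.0375)
print(url)
```

With `requests`, the same dictionary can also be passed directly as `requests.get(EXPLORE_ENDPOINT, params=params)`, which avoids building the string by hand.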

In [15]:
budapest_venues = getNearbyVenues(names=budapest_data['District'],
                                   latitudes=budapest_data['Latitude'],
                                   longitudes=budapest_data['Longitude']
                                  )

Budapest 1
Budapest 2
Budapest 3
Budapest 4
Budapest 5
Budapest 6
Budapest 7
Budapest 8
Budapest 9
Budapest 10
Budapest 11
Budapest 12
Budapest 13
Budapest 14
Budapest 15
Budapest 16
Budapest 17
Budapest 18
Budapest 19
Budapest 20
Budapest 21
Budapest 22
Budapest 23


In [16]:
budapest_venues.groupby('Neighborhood').count()

Neighborhood,Id,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Budapest 1,100,100,100,100,100,100,100
Budapest 10,19,19,19,19,19,19,19
Budapest 11,51,51,51,51,51,51,51
Budapest 12,26,26,26,26,26,26,26
Budapest 13,99,99,99,99,99,99,99
Budapest 14,61,61,61,61,61,61,61
Budapest 15,31,31,31,31,31,31,31
Budapest 16,27,27,27,27,27,27,27
Budapest 17,24,24,24,24,24,24,24
Budapest 18,22,22,22,22,22,22,22


In [17]:
budapest_pubs = budapest_venues[(budapest_venues['Venue Category'] == 'Beer Bar') | 
                (budapest_venues['Venue Category'] == 'Bistro') | 
                (budapest_venues['Venue Category'] == 'Bar') | 
                (budapest_venues['Venue Category'] == 'Karaoke Bar') | 
                (budapest_venues['Venue Category'] == 'Gastropub') | 
                (budapest_venues['Venue Category'] == 'Cocktail Bar') | 
                (budapest_venues['Venue Category'] == 'Beer Garden') | 
                (budapest_venues['Venue Category'] == 'Brewery') | 
               (budapest_venues['Venue Category'] == 'Pub')]

In [18]:
budapest_pubs

Unnamed: 0,Id,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
11,4c24f4dff7ced13a74b6236d,Budapest 1,47.4968,19.0375,Pater Marcus Apátsági Söröző,47.499427,19.039832,Pub
18,4c5fba3bb36eb713f8049ad2,Budapest 1,47.4968,19.0375,Déryné Bisztró,47.497269,19.031708,Bistro
24,4c90fbf151d9b1f72bae7c46,Budapest 1,47.4968,19.0375,Belga Söröző,47.501695,19.039771,Pub
42,5469e1cf498e7f9936fc45dd,Budapest 1,47.4968,19.0375,Hunyadi Lakásbisztró,47.500601,19.037172,Bar
60,52f8bca2498e0fd293f508c3,Budapest 1,47.4968,19.0375,Mr&Mrs Columbo Pub,47.492552,19.042136,Pub
79,4ba383e1f964a520864338e3,Budapest 1,47.4968,19.0375,Dunaparti Matróz Kocsma,47.502023,19.039671,Pub
93,4bdc70b0c79cc928af8d86e9,Budapest 1,47.4968,19.0375,MÁK,47.501096,19.047983,Bistro
120,4cddc9b33644a0937df1479f,Budapest 3,47.5672,19.0369,Pók Cafe,47.569715,19.048675,Karaoke Bar
135,4dc455adc65b89d3ca521d0b,Budapest 3,47.5672,19.0369,Pók Roulette Cafe,47.569751,19.048679,Bar
176,4d6ea7152427224ba39ad04d,Budapest 5,47.5002,19.0520,Borkonyha,47.499439,19.052330,Bistro


In [183]:
budapest_pubs.groupby('Neighborhood')['Id'].agg(['count'])

Neighborhood,count
Budapest 1,7
Budapest 10,2
Budapest 11,3
Budapest 13,4
Budapest 14,1
Budapest 15,1
Budapest 17,1
Budapest 19,1
Budapest 21,3
Budapest 3,2


In [107]:
budapest_pubs[budapest_pubs['Neighborhood'] == 'Budapest 7']

Unnamed: 0,Id,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
386,4be992a362c0c92867ccdfd4,Budapest 7,47.5027,19.0734,Snaps Galéria Belga Söröző,47.506122,19.070342,Pub
388,50cf78e2e4b0a73ce6a7f069,Budapest 7,47.5027,19.0734,Kandalló Kézműves Pub,47.501484,19.065984,Gastropub
389,4d137b5bffa1224b88d3a0ad,Budapest 7,47.5027,19.0734,Csak a jó sör!,47.501792,19.065552,Beer Bar
391,54838153498e8cea44a5d027,Budapest 7,47.5027,19.0734,Fricska,47.501433,19.066282,Gastropub
408,4c8138ec74d7b60c219c74d8,Budapest 7,47.5027,19.0734,Kalicka Bistro,47.503204,19.076142,Bar
410,56659516498ef08574f4acc5,Budapest 7,47.5027,19.0734,Hopaholic - In Hop We Trust,47.500028,19.065813,Beer Bar
425,587ab4e36d349d17a0a46865,Budapest 7,47.5027,19.0734,Refuge Bistro,47.502462,19.064941,Bar
443,585478220a3d5419757ae1ea,Budapest 7,47.5027,19.0734,Beer Point,47.500424,19.068579,Beer Bar
450,57e4191d498e992e0a20d5b3,Budapest 7,47.5027,19.0734,Ogre Bácsi,47.506309,19.06725,Bar
454,4eebb0d9f7903296f89f9d74,Budapest 7,47.5027,19.0734,Kisüzem,47.499683,19.062764,Bar


In [20]:
def get_venue_details(venue_id):
    url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
        venue_id,
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION
    )
    return requests.get(url).json()

In [192]:
def checkResponses(responses):
    for response in responses:
        print(response['meta']['code'])

## Get all the pub details from the 5th district 

In [187]:
pub_details_5 = []
for pub_id in budapest_pubs[budapest_pubs['Neighborhood'] == 'Budapest 5']['Id']:
    json = get_venue_details(pub_id) 
    pub_details_5.append(json)
with open("pub_details_5.p", "wb") as f:
    pickle.dump(pub_details_5, f)

## Get all the pub details from the 7th district 

In [108]:
pub_details_7 = []
for pub_id in budapest_pubs[budapest_pubs['Neighborhood'] == 'Budapest 7']['Id']:
    json = get_venue_details(pub_id) 
    pub_details_7.append(json)
with open("pub_details_7.p", "wb") as f:
    pickle.dump(pub_details_7, f)

## Get all the pub details from the 6th district

In [110]:
pub_details_6 = []
for pub_id in budapest_pubs[budapest_pubs['Neighborhood'] == 'Budapest 6']['Id']:
    json = get_venue_details(pub_id) 
    pub_details_6.append(json)
with open("pub_details_6.p", "wb") as f:
    pickle.dump(pub_details_6, f)

## Get all the pub details from the 8th district 

In [111]:
pub_details_8 = []
for pub_id in budapest_pubs[budapest_pubs['Neighborhood'] == 'Budapest 8']['Id']:
    json = get_venue_details(pub_id) 
    pub_details_8.append(json)
with open("pub_details_8.p", "wb") as f:
    pickle.dump(pub_details_8, f)

In [195]:
checkResponses(pub_details_5)
checkResponses(pub_details_6)
checkResponses(pub_details_7)
checkResponses(pub_details_8)

200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200
200


In [166]:
def getDetail(detail, key):
    if key in detail:
        return detail[key]
    return 'Unknown'

def getPrice(venue):
    # use the venue passed in rather than relying on a global variable
    price = getDetail(venue, 'price')
    if price != 'Unknown':
        return price['tier']
    return -1

def getAttributes(venue):
    result = dict()
    for group in venue["attributes"]["groups"]:
        key = group["items"][0]["displayName"]
        val = group["items"][0]["displayValue"]
        result[key] = val
    return result

def getPostalCode(venue):
    location = getDetail(venue, 'location')
    if 'postalCode' in location:
        return location['postalCode']
    return 9999

def getNumberOfLikes(venue):
    likes = getDetail(venue, 'likes')
    if 'count' in likes:
        return likes['count']
    return 0

def merge_two_dicts(x, y):
    z = x.copy()   # start with x's keys and values
    z.update(y)    # modifies z with y's keys and values & returns None
    return z
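The helpers above each guard against one missing key. A more general pattern is to walk a path of keys and fall back to a default as soon as one is absent. The sketch below is a hypothetical alternative, not the notebook's code; the helper name `get_nested` and the sample payloads are assumptions, shaped like the fields the helpers above read.

```python
def get_nested(data, keys, default=None):
    """Walk nested dicts along keys; return default as soon as a key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Hypothetical venue payloads, for illustration only
full = {"name": "Example Pub", "price": {"tier": 2}, "likes": {"count": 42},
        "location": {"postalCode": "1073"}}
sparse = {"name": "Sparse Pub"}

print(get_nested(full, ["price", "tier"], default=-1))
print(get_nested(sparse, ["price", "tier"], default=-1))
print(get_nested(sparse, ["likes", "count"], default=0))
print(get_nested(full, ["location", "postalCode"], default=9999))
```

The defaults mirror the sentinel values used in the notebook (`-1` for price, `0` for likes, `9999` for postal code).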

In [167]:
merged_pub_details = pub_details_6 + pub_details_7 + pub_details_8
venues_data = []
for detail in merged_pub_details:
    if 'venue' not in detail['response']:
        print(detail['response'])
        continue
    venue = detail['response']['venue']
    name = getDetail(venue, 'name')
    rating = getDetail(venue, 'rating')
    price = getPrice(venue)
    postal_code = getPostalCode(venue)
    likes = getNumberOfLikes(venue)
    attributes = getAttributes(venue)
    venue_dict = {
        'Name': name,
        'Rating': rating,
        'PriceRate' : price,
        'PostalCode' : postal_code,
        'Likes' : likes
    }
    venue_merged = merge_two_dicts(venue_dict, attributes)
    venues_data.append(venue_merged)
venue_df = pd.DataFrame(venues_data) 

#reorder columns
cols = []
non_null_columns = [col for col in venue_df.columns if not(venue_df[col].isnull().any())]
cols.extend(non_null_columns)
cols.extend([x for x in venue_df.columns if x not in non_null_columns])
#venue_df = venue_df[cols]
venue_df = venue_df[cols].drop_duplicates()

{}


In [169]:
venue_df.sort_values(['Likes'], ascending=[0])

Unnamed: 0,Likes,Name,PostalCode,PriceRate,Rating,Bar Service,Beer,Breakfast,Brunch,Cocktails,Credit Cards,Delivery,Dinner,Full Bar,Groups Only,Happy Hour,Jukebox,Live Music,Lunch,Music,Outdoor Seating,Price,Private Room,Reservations,Restroom,Smoking,Street Parking,TVs,Table Service,Wheelchair Accessible,Wi-Fi,Wine
6,576,Kandalló Kézműves Pub,1073,2,9.0,,Beer,,Brunch,,Yes (incl. Visa & MasterCard),Delivery,,,,,,,,,Yes,$$,,Yes,,,,,,,Free,
7,486,360 Bar,1061,3,8.5,,Beer,,Brunch,,Yes (incl. American Express & MasterCard),,,,,,,Live Music,,,Yes,$$$,,Yes,Yes,Yes,Street,,,Yes,Yes,
23,482,STIKA Budapest,1072,2,8.3,,Beer,Breakfast,,,Yes (incl. Visa & MasterCard),,,,,,,,,,Yes,$$,,Yes,,,Street,,Table Service,,Yes,
28,456,A Grund,1082,2,8.2,,Beer,,,,Yes,,,,,,,Live Music,Lunch,,Yes,$$,Yes,Yes,Yes,,,Yes,,,Free,
22,430,Kisüzem,1077,2,9.0,,Beer,,,,Yes,,,,,,,Live Music,,,No,$$,,,,,,,,,Yes,
4,299,Kiadó Kocsma,1061,2,8.5,,Beer,,Brunch,,Yes,,,,Groups Only,,,,,Yes,No,$$,,,Yes,,,,,,Free,
5,261,Csak a jó sör!,1073,2,9.2,,Beer,,,,Yes,Delivery,,,,,,,,,No,$$,,,,,,,,,Free,
18,152,Hopaholic - In Hop We Trust,1072,-1,8.5,,Beer,,,,Yes,,,,,,,,,,No,,,Yes,,,,,,,Yes,
24,148,WarmUp,1077,3,8.7,,,,,,Yes (incl. Visa & MasterCard),,,Full Bar,Groups Only,,Jukebox,,,,No,$$$,,,Yes,,Street,,,,Free,
8,145,The Caledonia Budapest Scottish Pub & Shop,1066,2,8.3,,Beer,Breakfast,,,Yes (incl. Visa & MasterCard),,,,,,,,,,Yes,$$,,Yes,,,,Yes,,,Free,


In [170]:

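# select several columns with a list; tuple-style selection on a groupby is not supported by newer pandas
venue_df.groupby('PostalCode')[['Rating', 'Likes']].agg(np.mean)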

PostalCode,Rating,Likes
9999,8.0,71.5
1061,8.5,392.5
1066,8.35,99.5
1067,7.833333,43.666667
1071,8.4,124.0
1072,8.4,317.0
1073,8.54,210.8
1074,7.4,40.0
1077,8.85,289.0
1082,7.125,146.5


In [143]:
for col in venue_df.columns:
    print(col)
    if isinstance(col, str) and venue_df[col].dtype == 'object': 
        venue_df.loc[venue_df[col] == col, col] = True
        venue_df.loc[venue_df[col] == 'Yes', col] = True
        venue_df.loc[venue_df[col] == 'No', col] = False
        venue_df.loc[venue_df[col] == 'Yes (incl. American Express & MasterCard)', col] = True
        venue_df.loc[venue_df[col] == 'Yes (incl. Visa & MasterCard)', col] = True
        venue_df.loc[venue_df[col] == 'Yes (incl. NFC Payments & MasterCard)', col] = True
        venue_df.loc[venue_df[col] == 'Free', col] = True
        

Name
PostalCode
PriceRate
Rating
Bar Service
Beer
Breakfast
Brunch
Cocktails
Credit Cards
Delivery
Dinner
Full Bar
Groups Only
Happy Hour
Jukebox
Live Music
Lunch
Music
Outdoor Seating
Price
Private Room
Reservations
Restroom
Smoking
Street Parking
TVs
Table Service
Wheelchair Accessible
Wi-Fi
Wine


In [136]:
#venue_transformed = venue_df.fillna(False)


In [145]:
from IPython.display import display
pd.options.display.max_columns = None
venue_df

Unnamed: 0,Name,PostalCode,PriceRate,Rating,Bar Service,Beer,Breakfast,Brunch,Cocktails,Credit Cards,Delivery,Dinner,Full Bar,Groups Only,Happy Hour,Jukebox,Live Music,Lunch,Music,Outdoor Seating,Price,Private Room,Reservations,Restroom,Smoking,Street Parking,TVs,Table Service,Wheelchair Accessible,Wi-Fi,Wine
0,Snaps Galéria Belga Söröző,1071,2,8.4,,True,,,,True,,,,,,,,,,False,$$,,True,,,,,,,True,
1,Ogre Bácsi,1067,2,7.9,,True,,,,,,,,,,,,,,,$$,,,,,,,,,,
2,Ferdinánd Monarchia Cseh Sörház,1067,1,7.7,,True,,,,True,,,,,,,,True,,False,$,,True,,,,,,,True,
3,Jaromír '68 Cseh Sörpince,1067,1,7.9,,True,,,,True,,True,,,,,,,,False,$,,,,,,,,,True,
4,Kiadó Kocsma,1061,2,8.5,,True,,True,,True,,,,True,,,,,True,False,$$,,,True,,,,,,True,
5,Csak a jó sör!,1073,2,9.2,,True,,,,True,True,,,,,,,,,False,$$,,,,,,,,,True,
6,Kandalló Kézműves Pub,1073,2,9.0,,True,,True,,True,True,,,,,,,,,True,$$,,True,,,,,,,True,
7,360 Bar,1061,3,8.5,,True,,True,,True,,,,,,,True,,,True,$$$,,True,True,True,Street,,,True,True,
8,The Caledonia Budapest Scottish Pub & Shop,1066,2,8.3,,True,True,,,True,,,,,,,,,,True,$$,,True,,,,True,,,True,
9,Fricska,1073,3,8.7,,,,,,True,,,,,,,,True,,False,$$$,,True,,,,,,,True,True


In [30]:
budapest_other_venues = pd.concat([budapest_pubs, budapest_venues]).drop_duplicates(keep=False)

In [148]:
budapest_pubs_merged = budapest_pubs.set_index('Venue').join(venue_df[['Name', 'PriceRate', 'Rating']].set_index('Name'))

In [149]:
budapest_pubs_merged

Unnamed: 0,Id,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue Latitude,Venue Longitude,Venue Category,PriceRate,Rating
360 Bar,5374ddf2498e43fccfbc5e89,Budapest 6,47.5081,19.0678,47.503698,19.061546,Bar,3.0,8.5
A Grund,4b8c2928f964a5206fc132e3,Budapest 8,47.4887,19.0845,47.484834,19.076565,Beer Garden,2.0,8.2
Andersen 2,4ff756c6e4b01ead0237222c,Budapest 8,47.4887,19.0845,47.484224,19.075747,Bar,2.0,6.1
B.A.K. Hütte,4ffb2f30e4b08a4c7447bf45,Budapest 13,47.5355,19.0709,47.528392,19.073459,Pub,,
Beer Point,585478220a3d5419757ae1ea,Budapest 7,47.5027,19.0734,47.500424,19.068579,Beer Bar,-1.0,7.6
Belga Söröző,4c90fbf151d9b1f72bae7c46,Budapest 1,47.4968,19.0375,47.501695,19.039771,Pub,,
Bistro Fine,56e1c416498efa4a84d3be51,Budapest 5,47.5002,19.0520,47.500810,19.056635,Bistro,,
Bohemia Söröző,4ee3bea5e5faffd730fe5610,Budapest 8,47.4887,19.0845,47.492597,19.073258,Pub,1.0,6.8
Bohém Söröző,4be6c1aa477d9c74fdc8e62d,Budapest 11,47.4593,19.0187,47.465388,19.023178,Pub,,
Borkonyha,4d6ea7152427224ba39ad04d,Budapest 5,47.5002,19.0520,47.499439,19.052330,Bistro,,


In [171]:
# create map of Budapest using latitude and longitude values
pub_budapest_inner = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label, rating in zip(budapest_pubs_merged['Venue Latitude'], budapest_pubs_merged['Venue Longitude'], budapest_pubs_merged.index, budapest_pubs_merged['Rating']):
    label = folium.Popup(label, parse_html=True) 
    color = 'red'
    if rating>=8.5:
        color = 'green'
    elif rating > 7.7:
        color = 'yellow'
    elif math.isnan(rating):
        color = 'blue'
    
    print((lat, lng, label, rating))
    folium.Circle(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(pub_budapest_inner)  
    
pub_budapest_inner

(47.503697754917006, 19.06154599823155, <folium.map.Popup object at 0x000000000B4D7048>, 8.5)
(47.48483361786026, 19.0765654315334, <folium.map.Popup object at 0x000000000C529E80>, 8.2)
(47.48422350498602, 19.075747053584575, <folium.map.Popup object at 0x000000000B4D7550>, 6.1)
(47.52839171406426, 19.073459064246148, <folium.map.Popup object at 0x000000000C6D5A58>, nan)
(47.5004242051535, 19.068579061206155, <folium.map.Popup object at 0x000000000C6D3860>, 7.6)
(47.50169488330903, 19.03977086335183, <folium.map.Popup object at 0x000000000C6D58D0>, nan)
(47.50080995767804, 19.056635360343453, <folium.map.Popup object at 0x000000000C6EDC18>, nan)
(47.49259678550234, 19.073257906488436, <folium.map.Popup object at 0x000000000C6D5DD8>, 6.8)
(47.46538783551039, 19.023178057450455, <folium.map.Popup object at 0x000000000C529C50>, nan)
(47.499439422937684, 19.052329953576, <folium.map.Popup object at 0x000000000C6D32B0>, nan)
(47.49958811566566, 19.05601531647689, <folium.map.Popup object at
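The colour thresholds used in the map loop above can be factored into a small helper, which makes the rule easy to test and reuse. This is a sketch under the same thresholds as the notebook; the function name `rating_to_color` and the `missing_color` parameter are hypothetical.

```python
import math

def rating_to_color(rating, missing_color="blue"):
    """Map a venue rating to a marker colour using the notebook's thresholds."""
    # Handle missing ratings first so NaN never reaches the comparisons
    if rating is None or (isinstance(rating, float) and math.isnan(rating)):
        return missing_color
    if rating >= 8.5:
        return "green"
    if rating > 7.7:
        return "yellow"
    return "red"

for value in (8.5, 8.2, 6.1, float("nan")):
    print(value, rating_to_color(value))
```

Checking for a missing rating first is a design choice: comparisons with NaN are always false, so the original ordering also works, but the explicit check states the intent more directly.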

In [157]:
# create map of Budapest using latitude and longitude values
pub_budapest = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(budapest_pubs['Venue Latitude'], budapest_pubs['Venue Longitude'], budapest_pubs['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(pub_budapest)  
    
pub_budapest

In [None]:
budapest_venues['Venue Category'].unique()

## Schools

In [173]:
num = 20
url = "http://www.iskolaklistaja.eu/tipus/?regio=kozep-magyarorszag&kerulet=budapest&start={}".format(num)
content = parse(url)
soup = BeautifulSoup(content, "lxml")
names = soup.findAll("div", {"class": "school_name"})
infos = soup.findAll("div", {"class": "school_info"})
for name, info in zip(names, infos):
    print(name.text + " " + info.text.split(',')[1])


Csik Ferenc Általános Iskola és Gimnázium  Medve u. 5-7.
Szent Angéla Ferences Általános Iskola és Gimnázium  Ady Endre u. 3.
Kispesti Erkel Ferenc Általános Iskola  Hungária út 11.
Pestszentlőrinci Közgazdasági és Informatikai Szakközépiskola  Hengersor u. 34.
Pannonhalmi Béla Baptista Általános Iskola  Kőér u. 7/b
Sashalmi Tanoda Általános Iskola  Metró u. 3-7.
Budapest IX. Kerületi Kőrösi Csoma Sándor Kéttannyelvű Általános Iskola  Ifjúmunkás u. 13.
Greater Grace Nemzetközi Óvoda, Általános Iskola és Gimnázium  Szilágyi Erzsébet fasor 22/b.
EFEB Érettségizettek Szakgimnáziuma és Szakközépiskolája  FRANGEPÁN U. 19.
Pestszentlőrinc-Pestszentimrei Felnőttek Gimnáziuma  Kondor Béla sétány 10.
Budapesti Komplex Szakképzési Centrum Erzsébet Királyné Szépészeti Szakgimnáziuma  Kossuth Lajos u. 35.
Újlak Utcai Általános, Német Nemzetiségi és Magyar-Angol Két Tanítási Nyelvű Iskola  Újlak u. 110.
BMSZC Pataky István Híradásipari és Informatikai Szakgimnáziuma  Salgótarjáni u. 53/b
Budapest X

In [174]:
def get_school_data(ker):
    url = "https://holmivan.valami.info/budapest-{}-kerulet/iskola-93".format(ker)
    content = parse(url)
    soup = BeautifulSoup(content, "lxml")
    table = soup.find('table', attrs={'class':'itemlist table table-condensed table-striped'})
    data = []
    rows = table.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        res = []
        res.append(cols[0].text.strip())
        res.append(cols[1].text.strip())
        # The GPS coordinates are in the link in the last column
        coord_tag = cols[-1].find(lambda tag:tag.name=="a")
        arr = re.findall(r"[-+]?\d*\.\d+|\d+", coord_tag['onclick'])
        res.append(float(arr[-2]))
        res.append(float(arr[-1]))
        data.append([ele for ele in res if ele]) # Get rid of empty values
    school_data = pd.DataFrame(list(data), columns=['Name', 'Address', 'Latitude', 'Longitude'])
    return school_data

In [175]:
school_data = pd.concat([get_school_data(5),get_school_data(6),get_school_data(7),get_school_data(8)])

In [177]:
school_budapest = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(school_data['Latitude'], school_data['Longitude'], school_data['Name']):
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=50,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ffcccc',
        fill_opacity=0.7,
        parse_html=False).add_to(school_budapest) 
    
# add markers to map
for lat, lng, label, rating in zip(budapest_pubs_merged['Venue Latitude'], budapest_pubs_merged['Venue Longitude'], budapest_pubs_merged.index, budapest_pubs_merged['Rating']):
    label = folium.Popup(label, parse_html=True) 
    color = 'red'
    if rating>=8.5:
        color = 'green'
    elif rating > 7.7:
        color = 'yellow'
    elif math.isnan(rating):
        color = 'grey'
    
    print((lat, lng, label, rating))
    folium.Circle(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(school_budapest)  
    
for lat, lng, label in zip(budapest_other_venues['Venue Latitude'], budapest_other_venues['Venue Longitude'], budapest_other_venues['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=5,
        popup=label,
        color='grey',
        fill=True,
        fill_color='#00ffff',
        fill_opacity=0.7,
        parse_html=False).add_to(school_budapest)  
    
school_budapest

(47.503697754917006, 19.06154599823155, <folium.map.Popup object at 0x000000000EC652E8>, 8.5)
(47.48483361786026, 19.0765654315334, <folium.map.Popup object at 0x000000000ECA6CF8>, 8.2)
(47.48422350498602, 19.075747053584575, <folium.map.Popup object at 0x000000000EC6E898>, 6.1)
(47.52839171406426, 19.073459064246148, <folium.map.Popup object at 0x000000000ECA6048>, nan)
(47.5004242051535, 19.068579061206155, <folium.map.Popup object at 0x000000000EC904E0>, 7.6)
(47.50169488330903, 19.03977086335183, <folium.map.Popup object at 0x000000000ECA6EB8>, nan)
(47.50080995767804, 19.056635360343453, <folium.map.Popup object at 0x000000000EC90630>, nan)
(47.49259678550234, 19.073257906488436, <folium.map.Popup object at 0x000000000ECA6FD0>, 6.8)
(47.46538783551039, 19.023178057450455, <folium.map.Popup object at 0x000000000ECA66D8>, nan)
(47.499439422937684, 19.052329953576, <folium.map.Popup object at 0x000000000EC90198>, nan)
(47.49958811566566, 19.05601531647689, <folium.map.Popup object at

## Analyze Each Neighborhood

In [178]:
# one hot encoding
budapest_onehot = pd.get_dummies(budapest_venues[['Venue Category']], prefix="", prefix_sep="")

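# add neighborhood column back to dataframe
# Note: there is a venue category that is itself named 'Neighborhood', so this assignment
# overwrites that dummy column in place instead of appending a new last column
budapest_onehot['Neighborhood'] = budapest_venues['Neighborhood'] 

# move the last column to the first position (because of the overwrite above, the column
# that actually moves is 'Yoga Studio', not 'Neighborhood', as the output below shows)
fixed_columns = [budapest_onehot.columns[-1]] + list(budapest_onehot.columns[:-1])
budapest_onehot = budapest_onehot[fixed_columns]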

budapest_onehot.head()

Unnamed: 0,Yoga Studio,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Breakfast Spot,Brewery,Burger Joint,Bus Station,Bus Stop,Business Service,Café,Camera Store,Campground,Candy Store,Carpet Store,Castle,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,Comedy Club,Convenience Store,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Dive Shop,Dog Run,Eastern European Restaurant,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Flower Shop,Food,Food & Drink Shop,Food Court,Forest,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Health & Beauty Service,Heliport,Historic Site,History Museum,Hockey Arena,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Karaoke Bar,Kids Store,Lebanese Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Music Store,Music Venue,Neighborhood,Nightclub,Office,Opera House,Optical Shop,Outdoor Sculpture,Paintball Field,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Photography 
Studio,Pizza Place,Platform,Playground,Plaza,Pool,Pool Hall,Pub,Public Art,Ramen Restaurant,Record Shop,Restaurant,River,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shop & Service,Shopping Mall,Skating Rink,Ski Area,Smoke Shop,Soccer Field,Soccer Stadium,Social Club,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Squash Court,Stadium,Steakhouse,Student Center,Supermarket,Sushi Restaurant,Tanning Salon,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Track,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Budapest 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Budapest 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Budapest 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Budapest 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Budapest 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
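Averaging one-hot rows per neighborhood, as the next cell does with `groupby('Neighborhood').mean()`, gives the share of each venue category within that neighborhood. The sketch below reproduces the idea with the standard library only; the sample rows and the helper name `top_categories` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, category) pairs standing in for budapest_venues rows
venues = [
    ("Budapest 1", "Hotel"), ("Budapest 1", "Café"),
    ("Budapest 1", "Hotel"), ("Budapest 1", "Pub"),
    ("Budapest 10", "Bus Stop"), ("Budapest 10", "Park"),
]

# Count categories separately for each neighborhood
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

def top_categories(hood, n=5):
    """Return the n most common categories with their relative frequency."""
    total = sum(counts[hood].values())
    return [(cat, round(c / total, 2)) for cat, c in counts[hood].most_common(n)]

print(top_categories("Budapest 1"))
```

Each frequency is the category count divided by the number of venues in that neighborhood, which is exactly what the mean of a 0/1 column computes.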


In [179]:
budapest_grouped = budapest_onehot.groupby('Neighborhood').mean().reset_index()
num_top_venues = 5

for hood in budapest_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = budapest_grouped[budapest_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Budapest 1----
                  venue  freq
0                 Hotel  0.09
1                  Café  0.08
2                 Plaza  0.06
3        Scenic Lookout  0.06
4  Hungarian Restaurant  0.05


----Budapest 10----
                 venue  freq
0         Tram Station  0.16
1             Bus Stop  0.11
2        Grocery Store  0.11
3                 Park  0.11
4  Sporting Goods Shop  0.05


----Budapest 11----
         venue  freq
0     Bus Stop  0.14
1       Bakery  0.12
2      Dog Run  0.06
3     Platform  0.04
4  Bus Station  0.04


----Budapest 12----
                  venue  freq
0              Bus Stop  0.15
1        Scenic Lookout  0.12
2                  Park  0.08
3              Platform  0.08
4  Hungarian Restaurant  0.08


----Budapest 13----
                  venue  freq
0  Gym / Fitness Center  0.07
1           Coffee Shop  0.07
2                 Diner  0.05
3                  Café  0.04
4         Grocery Store  0.04


----Budapest 14----
                  venue  freq
0