## Berlin Restaurants Analysis

We would like to open a restaurant (Turkish or Italian) in Berlin, so we need to decide which neighborhood would suit it best. We focus on the central neighborhoods, where most of the tourist attractions are located, so that the restaurant can reach more customers.

The steps to be followed:
- downloading the boroughs and neighborhoods of Berlin using **Beautiful Soup** and **requests**
- getting the latitude and longitude of each neighborhood using a geolocator
- gathering data on the existing venues (food places) with a GET request to Foursquare
- grouping these places by type and location
- creating a dataframe that shows the ten most common types of places in each neighborhood
- clustering these places
- deciding on the best neighborhood to open our restaurant
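
The grouping and "most common types" steps can be sketched on a small made-up table. This is a minimal illustration, assuming a one-hot encoding of the venue categories followed by a per-neighborhood mean; the rows, the helper name `top_categories`, and the choice of `n=2` are hypothetical and only stand in for the real venue data.

```python
import pandas as pd

# Toy stand-in for the venue table (hypothetical rows, not real results).
venues = pd.DataFrame({
    "Neighborhood": ["Mitte", "Mitte", "Mitte", "Kreuzberg", "Kreuzberg"],
    "Venue Category": ["Italian Restaurant", "Italian Restaurant", "Café",
                       "Turkish Restaurant", "Café"],
})

def top_categories(venues, n=10):
    """Return, per neighborhood, up to n venue categories ordered by frequency."""
    # One indicator column per category, one row per venue.
    onehot = pd.get_dummies(venues["Venue Category"]).astype(int)
    onehot.insert(0, "Neighborhood", venues["Neighborhood"])
    # The mean of each indicator column is the category's share in the neighborhood.
    freq = onehot.groupby("Neighborhood").mean()
    result = {}
    for hood, shares in freq.iterrows():
        ranked = shares[shares > 0].sort_values(ascending=False)
        result[hood] = list(ranked.index[:n])
    return result

top = top_categories(venues, n=2)
print(top["Mitte"])
```

Working with shares rather than raw counts keeps neighborhoods with many venues comparable to those with few, which matters if the rows are later fed to a clustering step.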

In [104]:
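import requests  # HTTP requests
import pandas as pd  # data analysis
import numpy as np  # numerical arrays
import random  # random number generation
from bs4 import BeautifulSoup  # HTML parsing
from geopy.geocoders import Nominatim  # address to latitude/longitude

from IPython.display import Image, HTML  # rich notebook output
from pandas import json_normalize  # flatten JSON into a dataframe (top-level import, pandas >= 1.0)
import folium  # map rendering

# show all columns and rows when displaying dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('All the necessary Libraries imported.')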

All the necessary Libraries imported.


#### Importing the boroughs and neighborhoods of Berlin

In [2]:
URL="https://en.wikipedia.org/wiki/Boroughs_and_neighborhoods_of_Berlin"

In [3]:
response = requests.get(URL)

web_page = response.text

soup= BeautifulSoup(web_page, "html.parser")

In [4]:
def long_lat(address):
    """Return (latitude, longitude) for a place name in Berlin."""
    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(f"{address}, Berlin")
    if location is None:  # the geocoder found no match
        return (None, None)
    return (location.latitude, location.longitude)
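
Each call to `long_lat` sends a request to the Nominatim service. A small caching wrapper, sketched here under assumptions, avoids repeating a lookup for an address that was already resolved and pauses before each real request. The helper name `make_cached_geocoder`, the `min_delay` default, and the stand-in `fake_geocode` are hypothetical and not part of the notebook.

```python
import time

def make_cached_geocoder(geocode, min_delay=1.0, sleep=time.sleep):
    """Wrap a geocode callable so each address is looked up only once.

    `geocode` is any function mapping an address string to a (lat, lng)
    tuple, for example the notebook's long_lat. `min_delay` is the pause
    in seconds before each real lookup (an assumed, conservative value).
    """
    cache = {}

    def cached(address):
        if address not in cache:
            sleep(min_delay)  # throttle real lookups only
            cache[address] = geocode(address)
        return cache[address]

    return cached

# Demonstration with a stand-in geocoder, so no network access is needed.
calls = []

def fake_geocode(address):
    calls.append(address)
    return (52.5, 13.4)

lookup = make_cached_geocoder(fake_geocode, min_delay=0)
first = lookup("Kreuzberg")
second = lookup("Kreuzberg")
print(first, second, calls)
```

In the notebook, the wrapper would be built as `make_cached_geocoder(long_lat)` and used wherever `long_lat` is called.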

In [5]:
boroughs=[]
neighborhoods=[]
latitude=[]
longitude=[]
data = {"Boroughs":boroughs,"Neighborhoods":neighborhoods,"Latitude":latitude,"Longitude":longitude}
table_1= soup.findAll('table')
all_tr= table_1[-3].findAll("tr")

In [6]:
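for row in all_tr[1:-1]:
    # split the row text into pieces: the borough first, then its neighborhoods
    x = row.text.replace("\n", ",").split(",")[:-1]
    borough = x[0].strip(" ").split(" ")[0]
    for i in x[1:]:
        lat, lng = long_lat(i)  # geocode each neighborhood only once
        boroughs.append(borough)
        neighborhoods.append(i)
        latitude.append(lat)
        longitude.append(lng)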
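
To make the string handling in the loop above easier to follow, here is the same splitting logic applied to a made-up row text. The sample string and the helper name `split_row` are hypothetical; they do not reproduce the actual Wikipedia table.

```python
def split_row(row_text):
    """Split a table row's text into (borough, [neighborhoods])."""
    # Treat newlines like commas, then drop the last piece
    # (it is empty when the text ends with a separator).
    parts = row_text.replace("\n", ",").split(",")[:-1]
    # Keep only the first word of the first piece as the borough name.
    borough = parts[0].strip(" ").split(" ")[0]
    return borough, parts[1:]

sample = "Mitte (1)\nMitte\nMoabit\nWedding\n"
borough, hoods = split_row(sample)
print(borough, hoods)
```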

#### Creating a dataframe with boroughs, neighborhoods, and their latitudes and longitudes

In [107]:
df=pd.DataFrame(data)
df.drop([50,55,61,83,94],inplace=True)
df

| | Boroughs | Neighborhoods | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Charlottenburg-Wilmersdorf | Charlottenburg | 52.515747 | 13.309683 |
| 1 | Charlottenburg-Wilmersdorf | Charlottenburg-Nord | 52.540525 | 13.296266 |
| 2 | Charlottenburg-Wilmersdorf | Grunewald | 52.487347 | 13.263754 |
| 3 | Charlottenburg-Wilmersdorf | Halensee | 52.497226 | 13.292999 |
| 4 | Charlottenburg-Wilmersdorf | Schmargendorf | 52.478902 | 13.292996 |
| 5 | Charlottenburg-Wilmersdorf | Westend | 52.513399 | 13.255842 |
| 6 | Charlottenburg-Wilmersdorf | Wilmersdorf | 52.487115 | 13.32033 |
| 7 | Friedrichshain-Kreuzberg | Friedrichshain | 52.512215 | 13.45029 |
| 8 | Friedrichshain-Kreuzberg | Kreuzberg | 52.497644 | 13.411914 |
| 9 | Lichtenberg | Alt-Hohenschönhausen | 52.550409 | 13.502549 |


#### Showing the neighborhoods on the map

In [8]:
address = 'Berlin'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

52.5170365 13.3888599


In [9]:
map_berlin = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Boroughs'], df['Neighborhoods']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_berlin)
    
map_berlin

#### Gathering data on the existing venues (food places) with a GET request to Foursquare

In [10]:
CLIENT_ID = 'BWMTUZQYGU4TI1TFRSQBF5CNBKKROSW1OTOE4YR31QYUMAL1' # your Foursquare ID
CLIENT_SECRET = 'E1GDUMA3AMSI1JSIPZKA5UJ11AIVNBC2VMQX2L3L5IFMFLCU' # your Foursquare Secret
ACCESS_TOKEN = 'F0DUIXZZ3JI5JZP3PDRE10QU3SZGMM2IYYVXNNHZ3WUXXYEX' # your FourSquare Access Token
VERSION = '20210406'
LIMIT = 50
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BWMTUZQYGU4TI1TFRSQBF5CNBKKROSW1OTOE4YR31QYUMAL1
CLIENT_SECRET:E1GDUMA3AMSI1JSIPZKA5UJ11AIVNBC2VMQX2L3L5IFMFLCU


In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, section="food"):
    """Return a dataframe of Foursquare venues within `radius` meters of each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the query for the Foursquare explore endpoint
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'section': section,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one table
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
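
The function above indexes straight into the JSON response, so a response without a `groups` entry would raise a `KeyError` and a venue with an empty `categories` list would raise an `IndexError`. A more defensive way to flatten the response might look like the following sketch. The helper name `extract_venues` and the sample payload are hypothetical; the payload only mimics the nesting that `getNearbyVenues` reads and is not a real API response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten an explore-style response into row tuples, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows

# Made-up payload with one complete venue and one venue lacking a category.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Pizzeria",
               "location": {"lat": 52.51, "lng": 13.30},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "No Category",
               "location": {"lat": 52.52, "lng": 13.31},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Charlottenburg", 52.515747, 13.309683)
empty = extract_venues({"response": {}}, "Nowhere", 0.0, 0.0)
print(rows)
```

Skipping incomplete venues silently is a design choice; logging the skipped names would be a reasonable alternative if the number of dropped rows matters for the analysis.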


In [12]:
berlin_venues = getNearbyVenues(names=df['Neighborhoods'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

In [13]:
print(berlin_venues.shape)
berlin_venues.head()

(891, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Charlottenburg | 52.515747 | 13.309683 | Trattoria Rathaus Piazza | 52.516778 | 13.308748 | Trattoria/Osteria |
| 1 | Charlottenburg | 52.515747 | 13.309683 | Zur Mieze - Katzenmusikcafé | 52.515899 | 13.304765 | Pet Café |
| 2 | Charlottenburg | 52.515747 | 13.309683 | Sole d`Oro CUCINA ITALIANA | 52.51466 | 13.305146 | Pizza Place |
| 3 | Charlottenburg | 52.515747 | 13.309683 | Curry Station 36 | 52.514946 | 13.315115 | Fast Food Restaurant |
| 4 | Charlottenburg | 52.515747 | 13.309683 | Falafel, Schawarma & Halloumi | 52.512515 | 13.305099 | Falafel Restaurant |


In [87]:
berlin_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Grünau | 4 | 4 | 4 | 4 | 4 | 4 |
| Adlershof | 7 | 7 | 7 | 7 | 7 | 7 |
| Alt-Hohenschönhausen | 4 | 4 | 4 | 4 | 4 | 4 |
| Alt-Treptow | 17 | 17 | 17 | 17 | 17 | 17 |
| Baumschulenweg | 5 | 5 | 5 | 5 | 5 | 5 |
| Biesdorf | 5 | 5 | 5 | 5 | 5 | 5 |
| Blankenburg | 3 | 3 | 3 | 3 | 3 | 3 |
| Blankenfelde | 3 | 3 | 3 | 3 | 3 | 3 |
| Bohnsdorf | 2 | 2 | 2 | 2 | 2 | 2 |
| Borsigwalde | 4 | 4 | 4 | 4 | 4 | 4 |
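
Because `count()` repeats the same number in every column, a single count per neighborhood is often easier to read. The sketch below uses `groupby(...).size()` on a small made-up table and sorts the result so the busiest neighborhoods come first; the rows and the `Venue Count` label are hypothetical.

```python
import pandas as pd

# Toy stand-in for berlin_venues (hypothetical rows).
venues = pd.DataFrame({
    "Neighborhood": ["Alt-Treptow", "Alt-Treptow", "Alt-Treptow",
                     "Bohnsdorf", "Adlershof", "Adlershof"],
    "Venue": ["A", "B", "C", "D", "E", "F"],
})

# One number per neighborhood, largest first.
counts = (venues.groupby("Neighborhood").size()
          .rename("Venue Count")
          .sort_values(ascending=False))
print(counts)
```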
