# 1) Introduction
The problem that I will be investigating is determining the most suitable location for building a new gastro restaurant. A financial investor has hired me, as a freelancer, to investigate this problem. Not only will I be looking for the right area, but I will also be investigating what kinds of restaurants are already out there. The city of interest for the investor is Copenhagen, Denmark. Copenhagen is the capital of Denmark and has a population of roughly 700,000. With various industrial companies having offices around the central area, several tourist attractions, and a growing number of modern buildings that draw Danes from other cities to move here, Copenhagen is an attractive area for opening a restaurant. The investor has family connections to Denmark and also owns a house in the suburbs. Because of his passion for food, he has always wanted to open a restaurant. However, he wants to venture into new, unexplored areas of the food business and does not want to open a kind of place that already exists. This is why it is crucial to investigate what types of restaurants, and how many of each, are in the Copenhagen area. Lastly, money is not an issue for him, so a location in an area where rent is very high will not be a problem.

## 1.1) Data
To solve this problem, I will use Foursquare to retrieve data on how many restaurants are established in the Copenhagen area. This will help me start to determine the possible candidate locations. At the same time, I will use the data to see which venue categories exist in Copenhagen and how many venues fall into each. The data will be scraped, and a table will then be constructed using the pandas and BeautifulSoup libraries. Once the dataframe is made, the analysis can begin in order to determine the best area to establish this restaurant. The type and number of restaurants will be determined using https://foursquare.com/explore?mode=url&near=Copenhagen%2C%20Denmark&nearGeoId=72057594040546361&q=Food, and population/neighborhood data for Copenhagen will be found on https://www.opendata.dk/city-of-copenhagen/oversigtskort.
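
The scrape-then-tabulate step described above can be sketched on a tiny inline HTML snippet, so that it runs without network access. The snippet and the helper name `list_items_to_frame` are hypothetical; a real page will have a different structure, and the notebook cells below use `li.text` rather than `get_text(strip=True)`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li>Latin Quarter</li>
  <li>Frederiksstaden</li>
</ul>
<ul>
  <li>Nyboder</li>
</ul>
"""


def list_items_to_frame(html: str, column: str = "District") -> pd.DataFrame:
    """Collect the text of every <li> inside every <ul> into a one-column DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    items = [
        li.get_text(strip=True)  # strip surrounding whitespace from each item
        for ul in soup.find_all("ul")
        for li in ul.find_all("li")
    ]
    return pd.DataFrame(items, columns=[column])


df_scraped = list_items_to_frame(SAMPLE_HTML)
print(df_scraped)
```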

## 1.2) Report structure
The report will start off by providing some background information about Copenhagen and some practical information on the libraries that will be used in this project. The libraries will not be covered in detail, since it is assumed that most readers already have some knowledge of them. It should also be mentioned that the intention is not to write a long report, but rather a technical description of a problem followed by its solution/method. The report follows the data science methodology: the business problem (already defined in the previous section), followed by the methodology, including the required processing methods, modelling, testing, and finally ensuring that the problem is solved.

# 2) Background information
Copenhagen is, as mentioned in the introduction section, the capital of Denmark, with a population of roughly 700,000.

In [53]:
!pip install folium



In [3]:
!pip install geopy



In [54]:
!pip install geocoder



In [55]:

from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from pandas import json_normalize  # top-level location since pandas 1.0; pandas.io.json.json_normalize is deprecated
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
from geopy.geocoders import ArcGIS
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
import requests
import json
import geocoder
import pickle

In [56]:
url = 'https://en.wikipedia.org/wiki/Districts_of_Copenhagen'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

In [57]:
## scrape neighborhood list from page html stored in variable soup
list = []
for ultag in soup.find_all('ul'):
    for litag in ultag.find_all('li'):
        list.append(litag.text)

In [58]:
list = list[16:80]  # keep only the slice of <li> items that holds the district names
list

['Middelalderbyen\nLatin Quarter',
 'Latin Quarter',
 'New Copenhagen\nFrederiksstaden\nNyboder',
 'Frederiksstaden',
 'Nyboder',
 'Gammelholm',
 'Slotsholmen',
 'Nørrevold, Østervold and Vestervold',
 'Latin Quarter',
 'Frederiksstaden',
 'Nyboder',
 'Christianshavn\nAsiatisk Plads\nWilders Plads\nKrøyers Plads\nNordatlantisk Brygge',
 'Asiatisk Plads',
 'Wilders Plads',
 'Krøyers Plads',
 'Nordatlantisk Brygge',
 'Holmen',
 'Asiatisk Plads',
 'Wilders Plads',
 'Krøyers Plads',
 'Nordatlantisk Brygge',
 'Amager East\nAmagerbro\nSundbyøster',
 'Amagerbro',
 'Sundbyøster',
 'Amager West\nIslands Brygge\nØrestad\nSundbyvester\nEberts Villaby',
 'Islands Brygge',
 'Ørestad',
 'Sundbyvester\nEberts Villaby',
 'Eberts Villaby',
 'Amagerbro',
 'Sundbyøster',
 'Islands Brygge',
 'Ørestad',
 'Sundbyvester\nEberts Villaby',
 'Eberts Villaby',
 'Eberts Villaby',
 'The Meatpacking District',
 'Humleby',
 'Carlsberg',
 'Kalvebod Brygge',
 'Havneholmen',
 'Sydhavnen\nSluseholmen\nTeglholmen',
 'Slu

In [59]:
df = pd.DataFrame(list, columns = ['District'])
df.head()

|   | District |
|---|----------|
| 0 | Middelalderbyen\nLatin Quarter |
| 1 | Latin Quarter |
| 2 | New Copenhagen\nFrederiksstaden\nNyboder |
| 3 | Frederiksstaden |
| 4 | Nyboder |


In [60]:
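# drop the combined entries that join several sub-district names with '\n',
# plus other list items that are not single districts (hand-picked positions)
df1 = df.drop(df.index[[0,2,7,21,24,41,51,52,53,60]])
df1 = df1.drop(df.index[[11,27,33]])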
df1.head()

|   | District |
|---|----------|
| 1 | Latin Quarter |
| 3 | Frederiksstaden |
| 4 | Nyboder |
| 5 | Gammelholm |
| 6 | Slotsholmen |
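
The rows above were removed by hand-picked position. The combined and repeated entries most likely come from nested lists on the page: `find_all('ul')` also returns inner `<ul>` tags and `find_all('li')` searches recursively, so an inner item is collected once through its parent (whose `.text` joins the children with newlines) and once directly. A hedged alternative that does not depend on the page layout staying the same is to drop every entry containing a newline and then drop exact duplicates. This is not identical to the hand-picked drop (for example 'Nørrevold, Østervold and Vestervold' contains no newline and would be kept), and the de-duplication is an extra step the notebook does not perform. The sample list reuses a few entries from the scraped output above.

```python
import pandas as pd

# A few entries copied from the scraped list above.
raw = [
    "Middelalderbyen\nLatin Quarter",
    "Latin Quarter",
    "New Copenhagen\nFrederiksstaden\nNyboder",
    "Frederiksstaden",
    "Nyboder",
    "Latin Quarter",
    "Frederiksstaden",
]
df_raw = pd.DataFrame(raw, columns=["District"])

# Keep only single-name entries (no embedded newline), then remove repeats.
single_name = ~df_raw["District"].str.contains("\n", regex=False)
df_clean = df_raw[single_name].drop_duplicates().reset_index(drop=True)
print(df_clean)
```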


In [62]:
with open('df1.pkl', 'wb') as f:
    pickle.dump(df1, f)
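
pandas can also do this save step directly with `DataFrame.to_pickle` and reload it with `pd.read_pickle`, which avoids the explicit `open`/`pickle.dump` pair. A minimal round-trip sketch, assuming a temporary directory in place of the notebook's working directory and a small stand-in frame in place of `df1`:

```python
import os
import tempfile

import pandas as pd

# Small stand-in for df1, keeping non-contiguous index labels like the real one.
df_pickled = pd.DataFrame({"District": ["Latin Quarter", "Nyboder"]}, index=[1, 4])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df1.pkl")
    df_pickled.to_pickle(path)          # same effect as pickle.dump(df, f)
    df_loaded = pd.read_pickle(path)    # reload, e.g. in a later notebook session

# The values and the index labels survive the round trip.
print(df_loaded.equals(df_pickled))
```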

In [63]:
## getting latitudes and longitudes for all the neighborhoods
latitudes = [] #empty list
longitudes = [] #empty list
for neighborhood in df1['District'].tolist():
    g = geocoder.arcgis('{}, Copenhagen, Denmark'.format(str(neighborhood)))
    latitudes.append(g.latlng[0])
    longitudes.append(g.latlng[1])
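
The loop above assumes every lookup succeeds; if the geocoder finds nothing, `g.latlng` may be `None` and indexing it raises an error. The sketch below, using a hypothetical helper `geocode_districts`, skips failed lookups and stores each name together with its coordinates in the same row, so no later merge of separate lists is needed. The lookup function is passed in as a parameter (an assumption made so the example runs offline); with the real service one could pass `lambda q: geocoder.arcgis(q).latlng`. The stub coordinates are illustrative values, not real geocoder output.

```python
from typing import Callable, Iterable, Optional, Sequence

import pandas as pd

LatLng = Optional[Sequence[float]]


def geocode_districts(names: Iterable[str], lookup: Callable[[str], LatLng]) -> pd.DataFrame:
    """Geocode each district, keeping the name and its coordinates in one row.

    `lookup` takes a query string and returns [lat, lng], or None if nothing was found.
    """
    rows = []
    for name in names:
        latlng = lookup(f"{name}, Copenhagen, Denmark")
        if latlng is None:  # skip failed lookups instead of failing on latlng[0]
            print(f"no result for {name!r}")
            continue
        rows.append({"District": name, "Latitudes": latlng[0], "Longitudes": latlng[1]})
    return pd.DataFrame(rows, columns=["District", "Latitudes", "Longitudes"])


# Offline stand-in for the geocoder (illustrative coordinates only).
fake_results = {
    "Latin Quarter, Copenhagen, Denmark": [55.68, 12.57],
    "Nyboder, Copenhagen, Denmark": [55.69, 12.59],
}
df_geo = geocode_districts(["Latin Quarter", "Unknown Place", "Nyboder"], fake_results.get)
print(df_geo)
```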

In [64]:
len(longitudes), len(latitudes), len(df1)

(51, 51, 51)

In [65]:
#creating the dataframe
coord = pd.DataFrame({'Longitudes': longitudes, 'Latitudes': latitudes})
coord.head()

|   | Longitudes | Latitudes |
|---|------------|-----------|
| 0 | 12.56756 | 55.67567 |
| 1 | 12.547018 | 55.669152 |
| 2 | 12.56756 | 55.67567 |
| 3 | 12.577307 | 55.677352 |
| 4 | 12.582781 | 55.675469 |


In [66]:
#combine the df1 and coord datasets
df2 = pd.concat([df1, coord], axis = 1)
df2.head()

|   | District | Longitudes | Latitudes |
|---|----------|------------|-----------|
| 0 | NaN | 12.56756 | 55.67567 |
| 1 | Latin Quarter | 12.547018 | 55.669152 |
| 2 | NaN | 12.56756 | 55.67567 |
| 3 | Frederiksstaden | 12.577307 | 55.677352 |
| 4 | Nyboder | 12.582781 | 55.675469 |
| 5 | Gammelholm | 12.56756 | 55.67567 |
| 6 | Slotsholmen | 12.547018 | 55.669152 |
| 7 | NaN | 12.56756 | 55.67567 |
| 8 | Latin Quarter | 12.588922 | 55.674816 |
| 9 | Frederiksstaden | 12.594607 | 55.676142 |
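
Note that `pd.concat(..., axis=1)` pairs rows by index label, not by position. After the drops, `df1` still carries labels 1, 3, 4, 5, 6, 8, …, while `coord` was built from plain lists and has labels 0 to 50. The result therefore has NaN districts where `df1` lacks a label, NaN coordinates where `coord` lacks one, and coordinates attached to the wrong district: row 1 ('Latin Quarter') carries the coordinates geocoded for 'Frederiksstaden', the second name in `df1`. This is also why only 42 of the 51 districts reach the Foursquare step further down. The toy frames below reproduce the effect and show that resetting the index first makes the concatenation positional; using `df1.reset_index(drop=True)` in the cell above would be the corresponding change, though the outputs shown in this notebook were produced without it.

```python
import pandas as pd

# Index labels left over after dropping rows, as with df1 above.
names = pd.DataFrame({"District": ["A", "B", "C"]}, index=[1, 3, 4])
# Coordinates built from plain lists get a fresh 0..n-1 index, as with coord above.
coords = pd.DataFrame({"Latitudes": [10.0, 20.0, 30.0], "Longitudes": [1.0, 2.0, 3.0]})

# Label-aligned: labels 0..4 all appear, with NaN wherever one side lacks the label.
misaligned = pd.concat([names, coords], axis=1)
print(misaligned)

# Positional: drop the old labels first so row i of names meets row i of coords.
aligned = pd.concat([names.reset_index(drop=True), coords], axis=1)
print(aligned)
```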


In [67]:
#the concat aligned rows by index label, so some rows have no District and others no coordinates; remove them.
df2.dropna(inplace = True)
df2.head()

|   | District | Longitudes | Latitudes |
|---|----------|------------|-----------|
| 1 | Latin Quarter | 12.547018 | 55.669152 |
| 3 | Frederiksstaden | 12.577307 | 55.677352 |
| 4 | Nyboder | 12.582781 | 55.675469 |
| 5 | Gammelholm | 12.56756 | 55.67567 |
| 6 | Slotsholmen | 12.547018 | 55.669152 |


In [68]:
#saving the dataframe
with open('df2.pkl', 'wb') as f:
    pickle.dump(df2, f)

In [69]:
#making a copy of dataframe
df_clone = df2.copy(deep = True)

In [71]:
# create map of Copenhagen; folium expects location as [latitude, longitude]
from IPython.display import display
map_cph = folium.Map(location=[55.67594, 12.56553], zoom_start=10)

# add markers to map
for lat, lng,neighborhood in zip(df_clone['Latitudes'], df_clone['Longitudes'],df_clone['District']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cph)  
    
display(map_cph)
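
Instead of hard-coding the map center, one option is to center the map on the mean of the district coordinates. The helper name `map_center` is hypothetical; the column names follow `df_clone`, and the demo frame uses illustrative values. It returns the pair in [latitude, longitude] order, which is what `folium.Map(location=...)` expects.

```python
import pandas as pd


def map_center(df: pd.DataFrame, lat_col: str = "Latitudes", lng_col: str = "Longitudes") -> list:
    """Return [latitude, longitude] of the mean position, in the order folium.Map expects."""
    return [float(df[lat_col].mean()), float(df[lng_col].mean())]


demo = pd.DataFrame({"Latitudes": [55.66, 55.70], "Longitudes": [12.54, 12.60]})
center = map_center(demo)
print(center)
# With the notebook's data: folium.Map(location=map_center(df_clone), zoom_start=10)
```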

In [72]:
map_cph.save('map_cph.html')

In [73]:
#now we have our dataframe with the districts found in Copenhagen.
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare client secret
VERSION = '20200314'
radius = 500
LIMIT = 100

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
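
The two halves of `getNearbyVenues` (building the request and flattening the reply) can be tried offline. The sketch below, with hypothetical helper names `build_explore_url` and `parse_explore_items`, builds the same query with `urllib.parse.urlencode` (which percent-escapes values; the comma in `ll` becomes `%2C`, which URL decoding turns back into a comma) and reads a response along the same path the function uses (`response` → `groups[0]` → `items` → `venue`). The payload is a hand-made mock containing only those fields; the second venue is invented to show that an empty `categories` list gives `None` instead of an `IndexError`.

```python
from urllib.parse import parse_qs, urlencode, urlparse

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Same query parameters as the format string in getNearbyVenues."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_explore_items(payload, name, lat, lng):
    """Flatten one response into (neighborhood, lat, lng, venue, vlat, vlng, category) tuples."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None  # avoid IndexError
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Mock payload with only the fields read above (not a real API response).
mock = {"response": {"groups": [{"items": [
    {"venue": {"name": "Pizzeria MaMeMi",
               "location": {"lat": 55.667879, "lng": 12.5482},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 55.668, "lng": 12.544},
               "categories": []}},
]}]}}

url = build_explore_url("ID", "SECRET", "20200314", 55.669152, 12.547018)
rows = parse_explore_items(mock, "Latin Quarter", 55.669152, 12.547018)
print(parse_qs(urlparse(url).query)["ll"])  # decoded back to the plain "lat,lng" string
print(rows)
```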

In [75]:
cph_venues = getNearbyVenues(names=df_clone['District'],
                                   latitudes=df_clone['Latitudes'],
                                   longitudes=df_clone['Longitudes']
                                  )

Latin Quarter
Frederiksstaden
Nyboder
Gammelholm
Slotsholmen
Latin Quarter
Frederiksstaden
Nyboder
Asiatisk Plads
Wilders Plads
Krøyers Plads
Nordatlantisk Brygge
Holmen
Asiatisk Plads
Wilders Plads
Krøyers Plads
Nordatlantisk Brygge
Amagerbro
Sundbyøster
Islands Brygge
Ørestad
Eberts Villaby
Amagerbro
Sundbyøster
Islands Brygge
Ørestad
Eberts Villaby
Eberts Villaby
The Meatpacking District
Humleby
Carlsberg
Kalvebod Brygge
Havneholmen
Sluseholmen
Teglholmen
Sluseholmen
Teglholmen
Vigerslev
Amerika Plads
Nordhavn
Ryparken
Søndre Frihavn


In [76]:
print(cph_venues.shape)
cph_venues.head()

(1870, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|--------------|-----------------------|------------------------|-------|----------------|-----------------|----------------|
| 0 | Latin Quarter | 55.669152 | 12.547018 | Pizzeria MaMeMi | 55.667879 | 12.5482 | Pizza Place |
| 1 | Latin Quarter | 55.669152 | 12.547018 | Store VEGA | 55.668221 | 12.543882 | Music Venue |
| 2 | Latin Quarter | 55.669152 | 12.547018 | Osteria 16 | 55.667726 | 12.545811 | Italian Restaurant |
| 3 | Latin Quarter | 55.669152 | 12.547018 | Lille VEGA | 55.667878 | 12.544416 | Music Venue |
| 4 | Latin Quarter | 55.669152 | 12.547018 | VEGA | 55.668207 | 12.543911 | Music Venue |
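
Toward the "what type and how many" question from the Data section, one option is to count venue categories overall and then, per neighborhood, count only categories whose name contains "Restaurant". The toy frame below reuses the rows of the `head()` output above with the same column names as `cph_venues`. The substring filter is an assumption and is deliberately simple: restaurant-like categories that lack the word (for example "Pizza Place") are not counted.

```python
import pandas as pd

# A few rows in the same shape as cph_venues (taken from the head() output above).
venues = pd.DataFrame({
    "Neighborhood": ["Latin Quarter"] * 5,
    "Venue": ["Pizzeria MaMeMi", "Store VEGA", "Osteria 16", "Lille VEGA", "VEGA"],
    "Venue Category": ["Pizza Place", "Music Venue", "Italian Restaurant",
                       "Music Venue", "Music Venue"],
})

# How many venues of each category there are overall.
category_counts = venues["Venue Category"].value_counts()

# Restaurant-type categories only (simple substring match on the category name).
is_restaurant = venues["Venue Category"].str.contains("Restaurant", case=False)
restaurants_per_area = (
    venues[is_restaurant]
    .groupby(["Neighborhood", "Venue Category"])
    .size()
    .rename("Count")
)

print(category_counts)
print(restaurants_per_area)
```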
