Introduction

1.1 Background -

Trondheim is a city on the north-western coast of Norway, situated on the Trondheim Fjord. It is a highly student-oriented city: roughly one fifth of its population are students, and it is home to the Norwegian University of Science and Technology (NTNU). A technology company, Acorn Technologies, wants to set up its headquarters and begin trading in Trondheim in order to draw on this talent pool. The owner wants to evaluate all of the options available to him to minimise risk and build for the future. It is important to him that the new headquarters be situated near the university, which will allow him to attract bright new employees, share talent, and establish relationships with people there. He also has a limited budget and recognises that Trondheim is an expensive city, so he would like to maximise the value he gets from the neighbourhood he chooses, especially since cash flow tends to be the biggest challenge for a new business. Whilst these are his main priorities for the near future, he would also like to prepare for future growth and to retain talented employees. This means he is interested in the facilities a neighbourhood has to offer, such as transport links, cafes, and shops.

1.2 Problem -

The purpose of this project is to analyse what the different neighbourhoods in Trondheim have to offer Acorn Technologies, with respect to the two main areas the owner has highlighted as important to making the correct decision:

1.	Proximity to universities
2.	Neighbourhood facilities available

This breakdown, into the two areas of interest, will allow the report to build a detailed analysis of each individually and then enable the owner to make a value judgement on the neighbourhoods, considering his weighting of their importance.

1.3 Interest -

This report has been commissioned by the owner of Acorn Technologies and will therefore likely be kept private, shared only with whomever he decides can help in planning and in deciding which neighbourhood to base the headquarters in.

Data Description

2.1 Data Sources

Neighbourhood Source

The data on the neighbourhoods of Trondheim will be taken from the URL https://en.wikipedia.org/wiki/Lerkendal. The information is contained in a table towards the bottom of the page, which lists the four boroughs the city is divided into alongside their respective neighbourhoods. The data will have to be transformed because other data needs to be appended to it.

Property price source

It was initially planned to include property prices by region, but after searching for data it became clear that obtaining commercial property prices by neighbourhood for Trondheim is not going to be possible. The housing market in Trondheim is complex, and very little data is compiled in one place where it could be scraped with confidence in its accuracy. Few properties are bought and sold in the city, and it is especially rare to find listings for anything other than apartments, which would not be suitable for this business start-up. The availability of transport in certain areas is therefore considered as an alternative.

Foursquare API

The source of data to be taken from the Foursquare API includes:
•	availability of restaurants and stores etc. 
The client would like information to plan for the future, which includes the quality of life of future employees. For this, the Foursquare API will be used to analyse the number of services available in each borough.

In [1]:
!pip install geocoder
!pip install folium

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1


In [2]:
import pandas as pd
import folium
import geocoder
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import requests
import json
from pandas import json_normalize  #top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0
from bs4 import BeautifulSoup
import matplotlib

Scrape Wikipedia for Boroughs and Neighbourhoods

In [3]:
#List url to be scraped
url = 'https://en.wikipedia.org/wiki/Lerkendal'

#Scrape web page
trondheim_postal = requests.get(url)

#Print page text
trondheim_postal.text

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Lerkendal - Wikipedia</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"b0c8d838-3ea9-4fe8-8869-5fc696c7e397","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Lerkendal","wgTitle":"Lerkendal","wgCurRevisionId":983719234,"wgRevisionId":983719234,"wgArticleId":10523433,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["CS1 Norwegian-language sources (no)","Articles with short description","Short description is different from Wikidata","Coordinates on Wikidata","Pages using infobox settlemen

In [4]:
#Extract the table containing 'Heimdal' with pd.read_html and assign it to a data frame
trondheim_postal = pd.read_html(url, match = 'Heimdal', flavor='bs4')[0]
#Name the columns, then drop the unused third column
trondheim_postal.columns = ['Borough','Neighbourhood', 'N/a']
trondheim_postal_1 = trondheim_postal.drop('N/a', axis=1)
#Display the data frame
trondheim_postal_1

Unnamed: 0,Borough,Neighbourhood
0,Heimdal,Byåsen Kattem Klett Langørjan Ringvål Spongdal
1,Lerkendal,Bratsberg Dragvoll Gløshaugen Moholt Nardo Tyh...
2,Midtbyen,Dyrborg Elgeseter Ila Kalvskinnet Selsbakk Sin...
3,Østbyen,Bakklandet Charlottenlund Korsvika Lade Lademo...


Use geolocator to find latitudes and longitudes of dataframe

In [5]:
Borough = ['Heimdal','Heimdal','Heimdal','Heimdal','Heimdal','Heimdal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen']
Neighbourhood = ['Byåsen','Kattem','Klett','Langørjan','Ringvål','Spongdal','Bratsberg','Dragvoll','Gløshaugen','Moholt','Nardo','Tyholt','Valentinlyst','Fossegrenda','Dyrborg','Elgeseter','Ila','Kalvskinnet','Selsbakk','Singsaker','Stavne','Sverresborg','Trolla','Øya','Bakklandet','Charlottenlund','Korsvika','Lade','Lademoen','Leangen','Nedre Elvehavn','Ranheim','Rosenborg','Rotvoll','Strindheim']

In [6]:
#Create the geocoder used to look up each neighbourhood
geolocator = Nominatim(user_agent="toronto_explorer")

for neighbourhood in Neighbourhood:
  #Qualify the name with city and country so the lookup stays within Trondheim
  address = '{}, Trondheim, NORWAY'.format(neighbourhood)
  g = geolocator.geocode(address)

  if g is None:
    print('{} could not be geocoded'.format(address))
  else:
    print((g.latitude, g.longitude))

#Having printed the latitudes and longitudes and then plotted them, it was obvious that around 6 of the coordinates were inaccurate, so these were worked out manually and then added to the list
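
The manual check described in the comment above can be partly automated. The sketch below flags geocoded points that fall outside a rough bounding box around Trondheim. The box limits and the function name `outside_bbox` are assumptions chosen for illustration, not values from the original analysis, and a point inside the box can still be wrong, so plotting remains a useful second check.

```python
# Rough bounding box around Trondheim (assumed limits, adjust as needed)
LAT_MIN, LAT_MAX = 63.30, 63.47
LON_MIN, LON_MAX = 10.10, 10.60

def outside_bbox(points):
    """Return the names whose (lat, lon) falls outside the bounding box."""
    flagged = []
    for name, lat, lon in points:
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
            flagged.append(name)
    return flagged

# Two coordinates from this notebook plus one deliberately wrong point
sample = [
    ('Byåsen', 63.4015119, 10.3565066),
    ('Ranheim', 63.4273369, 10.5367671),
    ('Wrong match', 59.9139, 10.7522),  # hypothetical bad geocoder result
]
print(outside_bbox(sample))
```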

In [7]:
Trondheim = {'Borough': ['Heimdal','Heimdal','Heimdal','Heimdal','Heimdal','Heimdal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Lerkendal','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Midtbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen','Østbyen'],
            'Neighbourhood': ['Byåsen','Kattem','Klett','Langørjan','Ringvål','Spongdal','Bratsberg','Dragvoll','Gløshaugen','Moholt','Nardo','Tyholt','Valentinlyst','Fossegrenda','Dyrborg','Elgeseter','Ila','Kalvskinnet','Selsbakk','Singsaker','Stavne','Sverresborg','Trolla','Øya','Bakklandet','Charlottenlund','Korsvika','Lade','Lademoen','Leangen','Nedre Elvehavn','Ranheim','Rosenborg','Rotvoll','Strindheim'],
            'Latitude': [63.4015119, 63.3457974,63.3242803, 63.4202304, 63.3557764, 63.3556309, 63.3485196, 63.4074638, 63.4188267, 63.4116965, 63.4078848, 63.4228056, 63.4231516, 63.388526, 63.4260011, 63.4229378, 63.4304847, 63.4289294, 63.3893857, 63.4223721, 63.412196, 63.418393, 63.451897, 63.4229124, 63.4289706, 63.4208568, 63.4499749, 63.4434969, 63.4372254, 63.4354302, 63.4346796, 63.4273369, 63.4303011, 63.4366589, 63.4329183],
            'Longitude': [10.3565066, 10.3334348, 10.3028567, 10.1407379, 10.2571518, 10.1663252, 10.4826406, 10.4671194, 10.4027324, 10.4336338, 10.4195418, 10.4295849, 10.440585, 10.3992462, 10.3659654, 10.3943594, 10.3704598, 10.3873911, 10.3730983, 10.4111909, 10.3853556, 10.3563244, 10.309398, 10.3846496, 10.4031776, 10.4952621, 10.43178403052184, 10.4510982, 10.4166393, 10.4655544, 10.4101472, 10.5367671, 10.4207654, 10.480544, 10.4539733] 
            }
Trondheim = pd.DataFrame(Trondheim, columns = ['Borough','Neighbourhood','Latitude','Longitude'])
Trondheim

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Heimdal,Byåsen,63.401512,10.356507
1,Heimdal,Kattem,63.345797,10.333435
2,Heimdal,Klett,63.32428,10.302857
3,Heimdal,Langørjan,63.42023,10.140738
4,Heimdal,Ringvål,63.355776,10.257152
5,Heimdal,Spongdal,63.355631,10.166325
6,Lerkendal,Bratsberg,63.34852,10.482641
7,Lerkendal,Dragvoll,63.407464,10.467119
8,Lerkendal,Gløshaugen,63.418827,10.402732
9,Lerkendal,Moholt,63.411696,10.433634


Plot neighbourhoods onto map with boroughs in different colours

In [8]:
address = 'Trondheim, NO'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
Latitude = location.latitude
Longitude = location.longitude
print('The geographical coordinates of Trondheim are {}, {}.'.format(Latitude, Longitude))

The geographical coordinates of Trondheim are 63.4305658, 10.3951929.


In [9]:
map_trondheim = folium.Map(location=[Latitude, Longitude], zoom_start=11)
map_trondheim

In [10]:
Heimdal_data = Trondheim[Trondheim['Borough'] == 'Heimdal'].reset_index(drop=True)
Lerkendal_data = Trondheim[Trondheim['Borough'] == 'Lerkendal'].reset_index(drop=True)
Midtbyen_data = Trondheim[Trondheim['Borough'] == 'Midtbyen'].reset_index(drop=True)
Østbyen_data = Trondheim[Trondheim['Borough'] == 'Østbyen'].reset_index(drop=True)

In [11]:
#Marker outline colour for each borough
borough_colours = {'Heimdal': 'blue', 'Lerkendal': 'red', 'Midtbyen': 'green', 'Østbyen': 'pink'}

#Add one circle marker per neighbourhood, coloured by its borough
for lat, lng, borough, neighborhood in zip(Trondheim['Latitude'], Trondheim['Longitude'], Trondheim['Borough'], Trondheim['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=borough_colours[borough],
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_trondheim)
    
map_trondheim

Use geolocator to find location of NTNU in the city and append to map

In [12]:
address = 'NTNU, Trondheim, NORWAY'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
Latitude = location.latitude
Longitude = location.longitude
print('The geographical coordinates of NTNU are {}, {}.'.format(Latitude, Longitude))

The geographical coordinates of NTNU are 63.42477755, 10.471420563535554.


In [13]:
NTNU = {'Latitude':  [63.42477755, 63.41671],
        'Longitude': [10.471420563535554, 10.405302]}

In [14]:
for lat, lng in zip(NTNU['Latitude'], NTNU['Longitude']):
    #Label each university marker
    label = 'NTNU'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='orange',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1,
        parse_html=False).add_to(map_trondheim)
    
map_trondheim
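
Proximity to the university is one of the two criteria in the problem statement, and the map shows it only visually. A minimal sketch of how it could be quantified follows, using the haversine great-circle formula. The function names `haversine_km` and `distance_to_ntnu_km` are illustrative and not part of the original notebook, and the coordinates are copied from the cells above.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# NTNU points taken from the NTNU dictionary in this notebook
ntnu_points = [(63.42477755, 10.471420563535554), (63.41671, 10.405302)]

def distance_to_ntnu_km(lat, lon):
    """Distance to the nearest of the NTNU points."""
    return min(haversine_km(lat, lon, n_lat, n_lon) for n_lat, n_lon in ntnu_points)

# Coordinates from the Trondheim DataFrame
neighbourhoods = {
    'Gløshaugen': (63.4188267, 10.4027324),
    'Klett': (63.3242803, 10.3028567),
}
for name, (lat, lon) in neighbourhoods.items():
    print(name, round(distance_to_ntnu_km(lat, lon), 2), 'km')
```

Taking the minimum over both points treats either NTNU location as equally useful, which is a simplifying assumption; the owner may prefer to weight one campus over the other.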

Use foursquare API to retrieve location data for four separate boroughs

In [15]:
CLIENT_ID='11J1WYVO0NZAQ2DOWTTWMUFQUA1X1SWPDPRKTTQTNJ3KQTEO'
CLIENT_SECRET='A1OFHQPD04LWU4FMNQ2FQSC4CYCACJV0RKHEP2VQOMLTZG04'
ACCESS_TOKEN='C1ZRH1ED4NQJX5OJIIZVZPGHQRNWBTUYRFAEFMJHJ1G2OETA'
VERSION=20210317
limit=100
latitude=63.4264996
longitude=10.4604377

In [16]:
radius = 1000
LIMIT = 200

venues = []

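#One request is made per row of Trondheim; the loop variable named neighbourhood holds the borough name
for lat, long, neighbourhood in zip(Trondheim['Latitude'], Trondheim['Longitude'], Trondheim['Borough']):
    
    # create the API request URL
    # note: latitude and longitude are the fixed values set in the previous cell, not the row's lat and long,
    # so every request searches around the same point and returns the same venues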
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        latitude,
        longitude,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    
    for venue in results:
        venues.append((
            neighbourhood,
            lat, 
            long,
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],
            venue['venue']['categories'][0]['name']))
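
The loop above passes the fixed `latitude` and `longitude` values to every request. A sketch of how the request URL could instead be built from each row's own coordinates is shown below. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and query parameters are the ones already used in this notebook.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, client_id, client_secret, version,
                      radius=1000, limit=200):
    """Build an explore request centred on the given point."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # Keep the comma in ll readable instead of percent-encoding it
    return EXPLORE_URL + '?' + urlencode(params, safe=',')

# Two rows from the Trondheim DataFrame; credentials are placeholders
rows = [('Byåsen', 63.4015119, 10.3565066), ('Kattem', 63.3457974, 10.3334348)]
for name, lat, lng in rows:
    url = build_explore_url(lat, lng, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', 20210317)
    print(name, url)
```

Each URL now carries a different `ll` value, so the venues returned would differ by neighbourhood. Reading the credentials from environment variables rather than writing them into the notebook would also keep them out of shared copies.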

In [17]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighbourhood', 'Latitude', 'Longitude', 'VenueName', 'location.lat', 'location.lng', 'categories']

print(venues_df.shape)
venues_df

(525, 7)


Unnamed: 0,Neighbourhood,Latitude,Longitude,VenueName,location.lat,location.lng,categories
0,Heimdal,63.401512,10.356507,Trondheim Klatresenter,63.434459,10.462260,Climbing Gym
1,Heimdal,63.401512,10.356507,Leangen Ishall,63.427669,10.465469,Stadium
2,Heimdal,63.401512,10.356507,Vinmonopolet (Valentinlyst),63.425011,10.442527,Wine Shop
3,Heimdal,63.401512,10.356507,Plantasjen,63.431369,10.451830,Garden Center
4,Heimdal,63.401512,10.356507,IKEA,63.428764,10.473686,Furniture / Home Store
...,...,...,...,...,...,...,...
520,Østbyen,63.432918,10.453973,Kiwi,63.418663,10.464790,Grocery Store
521,Østbyen,63.432918,10.453973,SATS ELIXIA Valentinlyst SATS,63.424233,10.443001,Gym / Fitness Center
522,Østbyen,63.432918,10.453973,Mega,63.424333,10.441615,Grocery Store
523,Østbyen,63.432918,10.453973,Tregården Restaurant,63.431271,10.443668,Italian Restaurant


In [18]:
print('There are {} unique categories.'.format(len(venues_df['categories'].unique())))

There are 12 unique categories.


In [19]:
venues_df['categories'].unique()[:50]

array(['Climbing Gym', 'Stadium', 'Wine Shop', 'Garden Center',
       'Furniture / Home Store', 'Scandinavian Restaurant',
       'Fast Food Restaurant', 'Shopping Mall', 'Gas Station',
       'Grocery Store', 'Gym / Fitness Center', 'Italian Restaurant'],
      dtype=object)

In [20]:
venues_df.groupby(["Neighbourhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,location.lat,location.lng,categories
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Heimdal,90,90,90,90,90,90
Lerkendal,120,120,120,120,120,120
Midtbyen,150,150,150,150,150,150
Østbyen,165,165,165,165,165,165


In [21]:
Heimdal_venue = venues_df[venues_df['Neighbourhood'] == 'Heimdal'].reset_index(drop=True)
Lerkendal_venue = venues_df[venues_df['Neighbourhood'] == 'Lerkendal'].reset_index(drop=True)
Midtbyen_venue = venues_df[venues_df['Neighbourhood'] == 'Midtbyen'].reset_index(drop=True)
Østbyen_venue = venues_df[venues_df['Neighbourhood'] == 'Østbyen'].reset_index(drop=True)

In [34]:
pd.set_option("display.max_rows", None, "display.max_columns", None)
duplicateDFRow = Heimdal_venue[Heimdal_venue.duplicated(['categories'])]
duplicateDFRow

Unnamed: 0,Neighbourhood,Latitude,Longitude,VenueName,location.lat,location.lng,categories
7,Heimdal,63.401512,10.356507,Skeidar,63.420884,10.463564,Furniture / Home Store
12,Heimdal,63.401512,10.356507,Mega,63.424333,10.441615,Grocery Store
14,Heimdal,63.401512,10.356507,Coop Prix,63.433078,10.446825,Grocery Store
15,Heimdal,63.345797,10.333435,Trondheim Klatresenter,63.434459,10.46226,Climbing Gym
16,Heimdal,63.345797,10.333435,Leangen Ishall,63.427669,10.465469,Stadium
17,Heimdal,63.345797,10.333435,Vinmonopolet (Valentinlyst),63.425011,10.442527,Wine Shop
18,Heimdal,63.345797,10.333435,Plantasjen,63.431369,10.45183,Garden Center
19,Heimdal,63.345797,10.333435,IKEA,63.428764,10.473686,Furniture / Home Store
20,Heimdal,63.345797,10.333435,IKEA Restaurant og Café,63.42835,10.472956,Scandinavian Restaurant
21,Heimdal,63.345797,10.333435,Burger King,63.421104,10.460791,Fast Food Restaurant


Graphing most popular destinations around the four boroughs

In [23]:
#Select all grocery store rows; despite its name, Train_station1 holds grocery stores
Train_station1 = venues_df[venues_df['categories'] == 'Grocery Store'].reset_index(drop=True)

In [38]:
#Count how often each word appears in the Østbyen venue categories
pd.Series(' '.join(Østbyen_venue['categories']).lower().split()).value_counts()[:100]

store           55
restaurant      33
grocery         33
/               33
furniture       22
home            22
gym             22
center          22
shopping        11
garden          11
gas             11
scandinavian    11
fitness         11
climbing        11
mall            11
fast            11
shop            11
station         11
stadium         11
wine            11
italian         11
food            11
dtype: int64

In [28]:
Train_station1

Unnamed: 0,Neighbourhood,Latitude,Longitude,VenueName,location.lat,location.lng,categories
0,Heimdal,63.401512,10.356507,Kiwi,63.418663,10.464790,Grocery Store
1,Heimdal,63.401512,10.356507,Mega,63.424333,10.441615,Grocery Store
2,Heimdal,63.401512,10.356507,Coop Prix,63.433078,10.446825,Grocery Store
3,Heimdal,63.345797,10.333435,Kiwi,63.418663,10.464790,Grocery Store
4,Heimdal,63.345797,10.333435,Mega,63.424333,10.441615,Grocery Store
...,...,...,...,...,...,...,...
100,Østbyen,63.436659,10.480544,Mega,63.424333,10.441615,Grocery Store
101,Østbyen,63.436659,10.480544,Coop Prix,63.433078,10.446825,Grocery Store
102,Østbyen,63.432918,10.453973,Kiwi,63.418663,10.464790,Grocery Store
103,Østbyen,63.432918,10.453973,Mega,63.424333,10.441615,Grocery Store


In [47]:
Venue_data_chart = {'Borough': ['Heimdal','Lerkendal','Midtbyen','Østbyen'],
            'Restaurant': [18,24,30,33],
            'Grocery Store': [18,24,30,33],
            'Home Store': [12,16,20,22],
            'Gym': [12,16,20,22],
            'Wine Shop': [6,8,10,11]
            }
Venue_data_chart = pd.DataFrame(Venue_data_chart, columns = ['Borough','Restaurant','Grocery Store','Home Store','Gym','Wine Shop'])

In [48]:
Venue_data_chart

Unnamed: 0,Borough,Restaurant,Grocery Store,Home Store,Gym,Wine Shop
0,Heimdal,18,18,12,12,6
1,Lerkendal,24,24,16,16,8
2,Midtbyen,30,30,20,20,10
3,Østbyen,33,33,22,22,11
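
The counts in this table were typed in by hand. One way they could be derived from the venue data is sketched below, counting per borough the categories that contain each column label. Matching by substring and the function name `count_by_label` are assumptions, and the sample rows are the few (borough, category) pairs visible in the `venues_df` output above, not the full data set.

```python
from collections import Counter

# Labels used as columns in the chart; substring matching is an assumption
LABELS = ['Restaurant', 'Grocery Store', 'Home Store', 'Gym', 'Wine Shop']

def count_by_label(rows):
    """Count, per borough, the venues whose category contains each label."""
    counts = {}
    for borough, category in rows:
        per_borough = counts.setdefault(borough, Counter())
        for label in LABELS:
            if label in category:
                per_borough[label] += 1
    return counts

# (borough, category) pairs copied from the first and last rows of venues_df
sample = [
    ('Heimdal', 'Climbing Gym'),
    ('Heimdal', 'Stadium'),
    ('Heimdal', 'Wine Shop'),
    ('Heimdal', 'Garden Center'),
    ('Heimdal', 'Furniture / Home Store'),
    ('Østbyen', 'Grocery Store'),
    ('Østbyen', 'Gym / Fitness Center'),
    ('Østbyen', 'Grocery Store'),
    ('Østbyen', 'Italian Restaurant'),
]
for borough, per_borough in count_by_label(sample).items():
    print(borough, dict(per_borough))
```

With the full DataFrame available, the same matching could be expressed with `venues_df['categories'].str.contains(label)` grouped by borough.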


In [85]:
import plotly.graph_objects as go

# Get a convenient list of x-values, one per borough
boroughs = Venue_data_chart['Borough']
x = list(range(len(boroughs)))

# Specify the plots
bar_plots = [
    go.Bar(x=x, y=Venue_data_chart['Restaurant'], name='Restaurant', marker=go.bar.Marker(color='#0343df')),
    go.Bar(x=x, y=Venue_data_chart['Grocery Store'], name='Grocery Store', marker=go.bar.Marker(color='#e50000')),
    go.Bar(x=x, y=Venue_data_chart['Home Store'], name='Home Store', marker=go.bar.Marker(color='#ffff14')),
    go.Bar(x=x, y=Venue_data_chart['Gym'], name='Gym', marker=go.bar.Marker(color='#929591')),
    go.Bar(x=x, y=Venue_data_chart['Wine Shop'], name='Wine Shop', marker=go.bar.Marker(color='#ff4da6'))
]

# Place one tick per borough and label it with the borough name
layout = go.Layout(
    title=go.layout.Title(text="Most popular venues in each borough of Trondheim", x=0.5),
    yaxis_title="Number of Venues",
    xaxis_tickmode="array",
    xaxis_tickvals=x,
    xaxis_ticktext=tuple(boroughs.values)
)

# Make the multi-bar plot
fig = go.Figure(data=bar_plots, layout=layout)

# Tell Plotly to render it
fig.show()