# **Capstone Project - The Battle of the Neighborhoods**

# **1. Introduction**

The Nuremberg Metropolitan Region comprises 3.5 million people on 21,800 square kilometres. It consists of the cities of Nuremberg, Fürth, Erlangen, Bayreuth and Bamberg and is one of Germany’s strongest economic areas. Due to the decline of historically prevalent industries, such as consumer electronics, the area has lagged behind in economic development compared to other, more famous German regions, such as Munich or Stuttgart.

However, this also means that real estate prices and wages are lower than in those regions. Thus, potential investors find a large pool of well-educated workers and consumers as well as relatively cheap real estate.

The optimal location for an investor would maximize population density while minimizing real estate prices and competition. These values vary significantly from district to district and from city to city.
Therefore, we want to create a map that charts all districts according to their real estate values, population and venue density.
Afterwards, the districts are clustered according to the density of venues and business opportunities.


# **2. Data**

**2.1 Data description**

The following data sources were identified to tackle the business problem:
•	The number of venues within a certain radius of each district (Foursquare API)

•	The net income per citizen per district. Source: 
http://www.boeckler.de/pdf/wsi_vm_verfuegbare_einkommen.xlsx

•	The population and the population density of the district. Source: 
http://www.daten.statistik.nuernberg.de/geoinf/ia_bezirksatlas/atlas.html

•	The housing prices per district. Source: 
https://www.sollmann.de/infothek/preisspiegel-metropolregion/

•	The coordinates of each district. Source: Open Street Map 
https://nominatim.openstreetmap.org/ui/search.html?q=nuremberg


**2.2 Data Preparation**

In [1]:
#Importing and installing all necessary libraries

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (the pandas.io.json import path is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!pip -q install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


**Load the district data taken from Wikipedia**

In [4]:
df_pop_size = pd.read_excel('Districts.xlsx')
df_pop_size.rename(columns = {'Bezirk':'District','Name':'Borough', 'Fläche (ha)':'Size (ha)', 'Einwohner':'Population'}, inplace = True)
df_pop_size.head()

Unnamed: 0,District,Borough,Size (ha),Population
0,1,"Altstadt, St. Lorenz",86.7,5275
1,2,Marienvorstadt,60.0,1338
2,3,Tafelhof,64.7,1312
3,4,Gostenhof,51.8,9462
4,5,Himpfelshof,65.4,6193


**Load the location data that was scraped from Open Street Map**

In [5]:
df_location = pd.read_excel('District_Coordinates.xlsx')
df_location.rename(columns = {'Bezirk':'District'}, inplace = True)
df_location = df_location.drop(columns=['Name'])
df_location.head()

Unnamed: 0,District,Latitude,Longitude
0,1,49.447654,11.081863
1,2,49.449398,11.090167
2,3,49.444268,11.070317
3,4,49.449685,11.059096
4,5,49.451141,11.063438


**Load the public information taken from a public record**

In [6]:
df_gov = pd.read_excel('District_Government.xlsx')
df_gov.rename(columns = {'Bezirk':'District','Bevölkerung Unter 18 in %':'Population below 18 in %','Haushalte insgesamt':'Number of Households', 'Arbeitlose':'Unemployed', 'Wohnung Fertigstellung':'Finished Houses', 'Bevölkerung mit Beschäftigung':'Population with Employment'}, inplace = True)
df_gov.head()

Unnamed: 0,District,Population below 18 in %,Number of Households,Population with Employment,Unemployed,Finished Houses
0,1,75,3 605,2 334,227,14
1,2,111,919,591,57,11
2,3,178,676,557,72,-
3,4,164,5 166,3 525,593,-
4,5,136,3 616,2 614,196,-


**Merge it into a new dataframe and calculate population density**

In [7]:
Nürnberg = df_location.merge(df_pop_size, on='District', how='left')
Nürnberg = Nürnberg.merge(df_gov, on='District', how='left')
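
A left merge keeps every row of the left frame and silently fills missing partners with NaN, so a mistyped or missing district key can go unnoticed. The following is a minimal sketch, on small stand-in frames with illustrative values (not the real Excel files), of how the `validate` and `indicator` options of `DataFrame.merge` could be used to check the join.

```python
import pandas as pd

# Toy stand-ins for df_location and df_pop_size (illustrative values only).
locations = pd.DataFrame({'District': [1, 2, 3],
                          'Latitude': [49.447654, 49.449398, 49.444268]})
population = pd.DataFrame({'District': [1, 2],
                           'Population': [5275, 1338]})

# validate raises a MergeError if a key is duplicated on either side;
# indicator adds a '_merge' column that records where each row came from.
merged = locations.merge(population, on='District', how='left',
                         validate='one_to_one', indicator=True)

# Districts that found no partner in the right-hand frame.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'District'].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every district in the location table has a row in the population table.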

In [8]:
Nürnberg.dtypes

District                       object
Latitude                      float64
Longitude                     float64
Borough                        object
Size (ha)                     float64
Population                      int64
Population below 18 in %       object
Number of Households           object
Population with Employment     object
Unemployed                     object
Finished Houses                object
dtype: object
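
Several columns are reported as `object` because the source stores numbers as text with a space as thousands separator (`3 605`) and a dash for missing values. A possible way to convert them is sketched below; the helper name `to_number` is hypothetical, and the sample frame only copies a few values from the table shown earlier.

```python
import pandas as pd

def to_number(series):
    """Convert strings such as '3 605' or '-' to numbers ('-' becomes NaN)."""
    # Remove any whitespace used as a thousands separator.
    cleaned = series.astype(str).str.replace(r'\s+', '', regex=True)
    # Anything that still cannot be parsed (for example '-') is set to NaN.
    return pd.to_numeric(cleaned, errors='coerce')

sample = pd.DataFrame({'Number of Households': ['3 605', '919', '676'],
                       'Finished Houses': ['14', '11', '-']})
numeric = sample.apply(to_number)
print(numeric.dtypes)
```

Applied to the merged dataframe, the same helper could be used on each text column before any arithmetic is done with it.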

In [9]:
Nürnberg['Population'] = Nürnberg['Population'].astype('float')
Nürnberg['Population Density'] = round(Nürnberg['Population']/Nürnberg['Size (ha)'])

In [10]:
Nürnberg.shape
Nürnberg.head()

Unnamed: 0,District,Latitude,Longitude,Borough,Size (ha),Population,Population below 18 in %,Number of Households,Population with Employment,Unemployed,Finished Houses,Population Density
0,1,49.447654,11.081863,"Altstadt, St. Lorenz",86.7,5275.0,75,3 605,2 334,227,14,61.0
1,2,49.449398,11.090167,Marienvorstadt,60.0,1338.0,111,919,591,57,11,22.0
2,3,49.444268,11.070317,Tafelhof,64.7,1312.0,178,676,557,72,-,20.0
3,4,49.449685,11.059096,Gostenhof,51.8,9462.0,164,5 166,3 525,593,-,183.0
4,5,49.451141,11.063438,Himpfelshof,65.4,6193.0,136,3 616,2 614,196,-,95.0
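
Because the area is given in hectares, the `Population Density` column is expressed in inhabitants per hectare. The short sketch below repeats the calculation for the first district and shows the conversion to inhabitants per square kilometre (1 km² = 100 ha); the two function names are hypothetical.

```python
def density_per_ha(population, size_ha):
    """Inhabitants per hectare, rounded as in the cell above."""
    return round(population / size_ha)

def density_per_km2(population, size_ha):
    """Inhabitants per square kilometre (1 km² = 100 ha)."""
    return round(population / (size_ha / 100))

# Altstadt, St. Lorenz: 5275 inhabitants on 86.7 ha
print(density_per_ha(5275, 86.7))   # 61, matching the table
print(density_per_km2(5275, 86.7))  # 6084
```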


**Retrieve the latitude and longitude for Nürnberg in order to use them in Foursquare**

In [13]:
address = 'Nürnberg'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Nürnberg are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Nürnberg are 49.453872, 11.077298.
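
geopy's `geocode` returns `None` when an address cannot be resolved, in which case `location.latitude` fails with an `AttributeError`. A small guard such as the one sketched below makes that failure explicit. The helper `coordinates_or_raise` and the stub geocoder are hypothetical; the stub only stands in for `geolocator.geocode` so the example runs without network access.

```python
from types import SimpleNamespace

def coordinates_or_raise(geocode, address):
    """Call a geocode function and fail loudly if nothing is found.

    `geocode` is any callable shaped like geopy's Nominatim.geocode,
    which returns None when there is no match.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude

def fake_geocode(address):
    """Offline stub that only knows the address used in this notebook."""
    if address == 'Nürnberg':
        return SimpleNamespace(latitude=49.453872, longitude=11.077298)
    return None

print(coordinates_or_raise(fake_geocode, 'Nürnberg'))
```

In the notebook, the call would be `coordinates_or_raise(geolocator.geocode, address)`.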


**Define Foursquare Credentials and Version**

In [14]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted, do not publish real keys)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


**Create a function to explore all venues for all neighborhoods in Nürnberg**

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe with the venues Foursquare reports near each borough.

    Relies on the globals CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT,
    which must be defined before the function is called.
    """
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of recommended venues
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-borough lists into one row per venue;
    # passing the column names here also works when no venue was returned
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Borough',
                 'Borough Latitude',
                 'Borough Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
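
The function indexes straight into the response (`['response']['groups'][0]['items']` and `['categories'][0]`), so a response without groups or a venue without a category would raise an exception. The following sketch shows one possible, more defensive way to flatten such a response. The helper name `extract_venues` is hypothetical, and the payload is hand-made in the shape the function above expects; the second venue is invented to illustrate the empty-category case.

```python
def extract_venues(payload, borough):
    """Flatten one explore-style response into (borough, name, lat, lng, category) rows."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None instead of failing when no category is attached.
        category = categories[0]['name'] if categories else None
        rows.append((borough, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-made payload (assumption: same nesting as the real response).
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Sangam',
               'location': {'lat': 49.448187, 'lng': 11.081314},
               'categories': [{'name': 'Indian Restaurant'}]}},
    {'venue': {'name': 'Hypothetical venue without category',
               'location': {'lat': 49.448, 'lng': 11.081},
               'categories': []}},
]}]}}

rows = extract_venues(payload, 'Altstadt, St. Lorenz')
print(rows)
```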

**Save the new venues in a dataframe**

In [16]:
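LIMIT = 100 # maximum number of venues returned per request
radius = 500 # search radius in metres around each district centre
Nürnberg_venues = getNearbyVenues(names=Nürnberg['Borough'],
                                   latitudes=Nürnberg['Latitude'],
                                   longitudes=Nürnberg['Longitude'],
                                   radius=radius)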

Altstadt, St. Lorenz
Marienvorstadt
Tafelhof
Gostenhof
Himpfelshof
Altstadt, St. Sebald
St. Johannis
Pirckheimerstraße
Wöhrd
Ludwigsfeld
Glockenhof
Guntherstraße
Galgenhof
Hummelstein
Gugelstraße
Steinbühl
Gibitzenhof
Sandreuth
Schweinau
St. Leonhard
Sündersbühl
Bärenschanze
Sandberg
Bielingplatz
Uhlandstraße
Maxfeld
Veilhof
Tullnau
Gleißhammer
Dutzendteich
Rangierbahnhof-Siedlung
Langwasser Nordwest
Langwasser Nordost
Beuthener Straße
Altenfurt Nord
Langwasser Südost
Langwasser Südwest
Altenfurt, Moorenbrunn
Gewerbepark Nürnberg-Feucht
Hasenbuck
Rangierbahnhof
Katzwanger Straße
Dianastraße
Trierer Straße
Gartenstadt
Werderau
Maiach
Katzwang, Reichelsdorf Ost, Reichelsdorfer Keller
Kornburg, Worzeldorf
Hohe Marter
Röthenbach West
Röthenbach Ost
Eibach
Reichelsdorf
Krottenbach, Mühlhof
Großreuth bei Schweinau
Gebersdorf
Gaismannshof
Höfen
Eberhardshof
Muggenhof
Westfriedhof
Schniegling
Wetzendorf
Buch
Thon
Almoshof
Kraftshof
Neunhof
Boxdorf
Großgründlach
Schleifweg
Schoppershof
Schafhof

In [18]:
Nürnberg_venues.shape
Nürnberg_venues.head(50)

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Altstadt, St. Lorenz",49.447654,11.081863,Sangam,49.448187,11.081314,Indian Restaurant
1,"Altstadt, St. Lorenz",49.447654,11.081863,Kokoro,49.44769,11.080558,Sushi Restaurant
2,"Altstadt, St. Lorenz",49.447654,11.081863,Wurstdurst,49.447971,11.078928,Currywurst Joint
3,"Altstadt, St. Lorenz",49.447654,11.081863,Atelier-Bar,49.447632,11.082496,Hotel Bar
4,"Altstadt, St. Lorenz",49.447654,11.081863,Park Plaza,49.447502,11.083238,Hotel
5,"Altstadt, St. Lorenz",49.447654,11.081863,Drei Raben,49.448836,11.080116,Hotel
6,"Altstadt, St. Lorenz",49.447654,11.081863,Padelle d'Italia,49.449862,11.080408,Italian Restaurant
7,"Altstadt, St. Lorenz",49.447654,11.081863,Hotel Victoria,49.447668,11.080904,Hotel
8,"Altstadt, St. Lorenz",49.447654,11.081863,Hans im Glück - Burgergrill,49.448054,11.080534,Burger Joint
9,"Altstadt, St. Lorenz",49.447654,11.081863,Neues Museum,49.447691,11.080537,Art Museum


**Check the number of venues returned for each borough**

In [None]:
Nürnberg_venues.groupby('Borough').count()
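
A plain `groupby` only lists boroughs for which at least one venue was returned; boroughs without any venue drop out of the result. Since venue density is one of the quantities of interest, the sketch below shows one way to keep those boroughs with a count of zero. It uses toy data with placeholder names (`A`, `B`, `C`, `v1` ...), not the real Foursquare results.

```python
import pandas as pd

# Toy stand-ins: three boroughs, venues returned for only two of them.
boroughs = pd.DataFrame({'Borough': ['A', 'B', 'C']})
venues = pd.DataFrame({'Borough': ['A', 'A', 'A', 'B'],
                       'Venue': ['v1', 'v2', 'v3', 'v4']})

# Number of venue rows per borough.
counts = venues.groupby('Borough').size()

# Reindex on the full borough list so missing boroughs appear with 0.
venue_count = counts.reindex(boroughs['Borough'], fill_value=0)
print(venue_count)
```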