# **Capstone Project - The Battle of the Neighborhoods**

# **1. Introduction**

The Nuremberg Metropolitan Region is home to 3.5 million people across 21,800 square kilometres. It includes the cities of Nuremberg, Fürth, Erlangen, Bayreuth and Bamberg and is one of Germany’s strongest economic areas. Due to a decline in historically prevalent industries, such as consumer electronics, the area has lagged behind in economic development compared to better-known German regions such as Munich or Stuttgart.

However, this also means that real estate prices and wages are lower than in those regions. Potential investors therefore find a large pool of well-educated workers and consumers as well as relatively cheap real estate.

The optimal location for an investor would maximize population density while minimizing real estate prices and competition. These values vary significantly from district to district and from city to city.
Therefore, we want to create a map that charts all districts by real estate value, population and venue density.
Afterwards, the districts are clustered according to the density of venues and business opportunities.
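
As a rough illustration of the trade-off described above, the three criteria can be combined into a single score. This is only a sketch under stated assumptions: the districts `A`, `B` and `C`, their figures, the min-max scaling and the equal weights are all hypothetical and are not taken from the project data or its method.

```python
# Hypothetical figures for three made-up districts (not project data).
districts = {
    "A": {"density": 6000.0, "price": 4200.0, "venues": 40},
    "B": {"density": 18000.0, "price": 3500.0, "venues": 25},
    "C": {"density": 9000.0, "price": 3900.0, "venues": 10},
}


def min_max(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_districts(data):
    """Reward population density, penalise price and competition."""
    names = list(data)
    density = min_max([data[n]["density"] for n in names])
    price = min_max([data[n]["price"] for n in names])
    venues = min_max([data[n]["venues"] for n in names])
    # Equal weights are an assumption made for this sketch.
    return {n: d - p - v for n, d, p, v in zip(names, density, price, venues)}


scores = score_districts(districts)
best = max(scores, key=scores.get)
print(best, scores)
```

With real data, the weights would need to reflect what matters most to the investor in question.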


# **2. Data**

**2.1 Data description**

The following data sources were identified to tackle the business problem:
•	The number of venues within a certain radius of each district (Foursquare API)

•	The net income per citizen per district. Source: 
http://www.boeckler.de/pdf/wsi_vm_verfuegbare_einkommen.xlsx

•	The population and the population density of the district. Source: 
http://www.daten.statistik.nuernberg.de/geoinf/ia_bezirksatlas/atlas.html

•	The housing prices per district. Source: 
https://www.sollmann.de/infothek/preisspiegel-metropolregion/

•	The coordinates of each district. Source: Open Street Map 
https://nominatim.openstreetmap.org/ui/search.html?q=nuremberg
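
The venue counts in the first bullet come from the Foursquare API. The sketch below shows how a request URL for one district could be assembled. It assumes Foursquare's legacy v2 `venues/explore` endpoint; the helper name `build_explore_url`, the 500 m radius, the result limit, the version date and the placeholder credentials are assumptions for illustration and are not taken from this notebook. Newer Foursquare API versions use different endpoints and authentication.

```python
from urllib.parse import urlencode

# Assumed endpoint: Foursquare's legacy v2 "explore" API.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Build (but do not send) a venue search URL around one coordinate."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, assumed value
        "ll": f"{lat},{lon}",  # latitude,longitude of the district centre
        "radius": radius,      # search radius in metres, assumed value
        "limit": limit,        # maximum number of venues, assumed value
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Coordinates of "Altstadt, St. Lorenz"; the credentials are placeholders.
url = build_explore_url(49.447654, 11.081863, "YOUR_ID", "YOUR_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()` to retrieve the venues for that district.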


**2.2 Data Preparation**

In [7]:
# Importing and installing all necessary libraries

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip -q install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


**Load the district data from Wikipedia**

In [8]:
df_pop_size = pd.read_excel('Districts.xlsx')
df_pop_size.rename(columns = {'Bezirk':'District', 'Fläche (ha)':'Size (ha)', 'Einwohner':'Population'}, inplace = True)
df_pop_size.head()

Unnamed: 0,District,Name,Size (ha),Population
0,1,"Altstadt, St. Lorenz",86.7,5275
1,2,Marienvorstadt,60.0,1338
2,3,Tafelhof,64.7,1312
3,4,Gostenhof,51.8,9462
4,5,Himpfelshof,65.4,6193
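
Population density, one of the criteria named in the introduction, can be derived from the two numeric columns above. The following is a minimal sketch that rebuilds the five rows shown in the output; the frame name `df_sample` and the column name `Density (per km²)` are assumptions for illustration.

```python
import pandas as pd

# The five rows shown in the output above, re-entered by hand.
df_sample = pd.DataFrame({
    "District": [1, 2, 3, 4, 5],
    "Name": ["Altstadt, St. Lorenz", "Marienvorstadt", "Tafelhof",
             "Gostenhof", "Himpfelshof"],
    "Size (ha)": [86.7, 60.0, 64.7, 51.8, 65.4],
    "Population": [5275, 1338, 1312, 9462, 6193],
})

# 1 km² equals 100 ha, so divide the area by 100 before taking the ratio.
df_sample["Density (per km²)"] = (
    df_sample["Population"] / (df_sample["Size (ha)"] / 100)
)

# Name of the most densely populated district in the sample.
densest = df_sample.loc[df_sample["Density (per km²)"].idxmax(), "Name"]
print(df_sample[["Name", "Density (per km²)"]].round(0))
print(densest)
```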


**Load the location data that was scraped from Open Street Map**

In [14]:
df_location = pd.read_excel('District_Coordinates.xlsx')
df_location.rename(columns = {'Bezirk':'District'}, inplace = True)
df_location.head()

Unnamed: 0,District,Name,Latitude,Longitude
0,1,"Altstadt, St. Lorenz",49.447654,11.081863
1,2,Marienvorstadt,49.449398,11.090167
2,3,Tafelhof,49.444268,11.070317
3,4,Gostenhof,49.449685,11.059096
4,5,Himpfelshof,49.451141,11.063438
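
Both tables carry a `District` key, so they can be joined into one frame before mapping. The sketch below uses hand-entered copies of the rows shown above; the names `pop`, `loc` and `merged` are assumptions. In the actual notebook, the duplicated `Name` and `Unnamed: 0` columns would probably have to be dropped from one side first to avoid suffixed duplicates.

```python
import pandas as pd

# Population figures from the district table (first five rows).
pop = pd.DataFrame({
    "District": [1, 2, 3, 4, 5],
    "Name": ["Altstadt, St. Lorenz", "Marienvorstadt", "Tafelhof",
             "Gostenhof", "Himpfelshof"],
    "Population": [5275, 1338, 1312, 9462, 6193],
})

# Coordinates from the location table (first five rows).
loc = pd.DataFrame({
    "District": [1, 2, 3, 4, 5],
    "Latitude": [49.447654, 49.449398, 49.444268, 49.449685, 49.451141],
    "Longitude": [11.081863, 11.090167, 11.070317, 11.059096, 11.063438],
})

# validate="one_to_one" raises an error if a district number is duplicated.
merged = pop.merge(loc, on="District", how="inner", validate="one_to_one")
print(merged)
```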


**Load the public information that was scraped from a public record**

In [15]:
df_gov = pd.read_excel('District_Government.xlsx')
df_gov.rename(columns = {'Bezirk':'District','Fläche (in ha)':'Size (ha)','Bevölkerung Insgesamt':'Population', 'Arbeitlose':'Unemployed', 'Wohnung Fertigstellung':'Finished Houses'}, inplace = True)
df_gov.head()

Unnamed: 0,District,Size (ha),Population,Bevölkerung Unter 18 in %,Bevölkerung Über 65 in %,Ausländer,Veränderung zum Vorjahr in %,Haushalte insgesamt,Haushalte einzelpersonen,Bevölkerung mit Beschäftigung,Unemployed,Finished Houses
0,1,867,5 275,75,137,343,-08,3 605,2 573,2 334,227,14
1,2,600,1 338,111,149,212,20,919,606,591,57,11
2,3,647,1 312,178,85,479,83,676,388,557,72,-
3,4,518,9 462,164,94,460,-,5 166,2 996,3 525,593,-
4,5,654,6 193,136,173,253,16,3 616,2 107,2 614,196,-
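
The count columns in this output are still text: thousands are separated by a space (`5 275`) and missing values appear as `-`. A minimal cleaning sketch, assuming the values arrive as strings, is given below; the helper name `to_number` and the frame names are hypothetical. The area and percentage columns are left out on purpose, because they appear to have lost their decimal separator (867 here against 86.7 ha in the first table) and cannot be repaired by this step alone.

```python
import pandas as pd

# A few columns from the output above, re-entered as raw strings.
raw = pd.DataFrame({
    "District": [1, 2, 3, 4, 5],
    "Population": ["5 275", "1 338", "1 312", "9 462", "6 193"],
    "Unemployed": ["227", "57", "72", "593", "196"],
    "Finished Houses": ["14", "11", "-", "-", "-"],
})


def to_number(column):
    """Remove whitespace used as thousands separator and parse as numbers."""
    cleaned = column.astype(str).str.replace(r"\s+", "", regex=True)
    # A lone "-" cannot be parsed and is coerced to NaN (missing value).
    return pd.to_numeric(cleaned, errors="coerce")


clean = raw.copy()
for name in ["Population", "Unemployed", "Finished Houses"]:
    clean[name] = to_number(raw[name])

print(clean.dtypes)
print(clean)
```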
