# Coursera Capstone Project: The Battle of Neighborhoods - Week 1 & 2

**I will do the entire project description within one notebook where I will combine both parts of my project. Both sets will be indicated by Week 1 and Week 2. In the end, I will submit only this notebook, together with my written report as well as a blogpost on my GitHub Account (indicated accordingly).**

## Week 1: Description of the problem as well as the data used for the project

### Business Purpose

In the past few weeks, we introduced the notions of geolocation, spatial data analysis and clustering. Doing so enabled us to retrieve geographic data for a given location and segment it into subcategories according to pre-defined characteristics. Unsupervised learning through k-means clustering provided us with the information necessary to analyze and compare the cities of New York and Toronto based on existing venues taken from FourSquare. 

Throughout this period, we gathered information about which kinds of venues exist in each neighborhood and how prevalent they are, making it possible to group the dataset and, potentially, derive assumptions about the socio-economic and cultural reality of each subregion. 

In a next step, we can use this knowledge to support decisions based on economic considerations. Importantly, we can use the data to assess potential outcomes of opening certain venues, such as restaurants, cafés or bars, within a given neighborhood. Based on the data, we can see which subregions are potentially economically saturated and in which ones unmet demand may exist. 

To follow this idea, we assume that we are a medium-priced Japanese franchise chain operating in the food and beverage industry (comparable to the likes of Vapiano - the (now insolvent) German food chain that offered Italian food). As we already analyzed both New York and Toronto, we plan on opening the first hub in Vienna, the capital of Austria. 

Such decisions bear a wide range of important considerations about economic and social variables, which are great in number and sometimes impossible to assess in a quantifiable manner. Although the list is certainly not complete, one can define the following considerations as fulfilment criteria in order to derive a potential verdict: 

1. The neighborhood or district must not be saturated within the food or beverage market. This implies that we need to find a region which either does not yet offer what the company is trying to introduce or in which, although supply exists, demand for the product is not yet fully met. As we are unable to measure the latter (for now), we focus on the former. 

2. The neighborhood or district must be frequently visited. This implies that the region should be located in an amusement area which is frequented, preferably, by both local consumers and tourists. This can be measured by analyzing the density of restaurants, bars and other venues as well as tourist attractions. 

3. The price level of our offering must suit the average income of the respective region. In particular, we cannot introduce an offering priced far above what local residents are able to pay. Although this is especially hard to measure, a potential solution may lie in the analysis of average rental prices, if available. Also, the availability of services such as Uber or AirBnB may lead to a better understanding of the socio-economic status of the individual regions. 

### Data to be used

To make an adequate assessment, I will use data from Austrian public sources as well as FourSquare to capture both the socio-economic composition and the venues of each region. The socio-economic dataset will include indicators for income, household expenditure, tourism, real estate prices and growth, and demographics. Further, I will create a dataset that shows, for each region, average house prices (taken from local sources) and whether AirBnB is available (potentially also to what extent); both serve as indicators of average income. I will use the FourSquare data to assess which regions are potentially already saturated with Japanese food, and Asian food in general, by looking at both the existence and the prevalence of certain food types, and to identify tourist hotspots by looking at the existence of attractions and the general availability of venues. 

Once all factors are included, I will perform a k-means clustering analysis that shows the clusters in which an opening appears promising and, if time permits, perform a more detailed analysis of the respective regions. Importantly, we are looking for a region that: 

1. Is economically viable
2. Has a strong amusement area and is preferably located in a tourist area
3. Is frequently visited 
4. Does not already have a strong Japanese food scene
5. Has AirBnB available and fairly even rental prices 

In the end, I will deliver a graphical representation of the region and a clustering output on which I will base my assessment. 

## Week 2: Analysis and implementation of the code

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Conclusion](#results)

### Introduction: Business Problem

In this project, we will follow in the footsteps of a successful Japanese franchise chain, which opened its doors in Toronto, Canada, and has since expanded throughout its country of origin as well as the USA. Although established there, the owners plan on expanding business operations to Europe. Being originally from Austria, one of the owners proposes to open the first franchise location in Vienna, the capital of Austria. 

In order to assess the profitability of said strategy, we need to analyze the districts of Vienna to find the most promising geographic location for our venue. In particular, we are interested in a neighborhood that: 

1. Is economically viable
2. Has a strong amusement area and is preferably located in a tourist area
3. Is frequently visited 
4. Does not already have a strong Japanese food scene
5. Has AirBnB available and fairly even rental prices 

In order to make the correct assessment, we will use a geo-location approach and analyze each district of Vienna according to our pre-defined characteristics and requirements. 

### Data

For this project, we will use three different types of data sources. 

- First, we will require geo-location data from **FourSquare** that enables us to retrieve locations as well as types of venues for the coordinates of all 23 districts of Vienna. 
- Second, we will **segment the districts based on demographic, economic and sociographic characteristics**. The characteristics we are looking for include proxies and indicators for real estate developments, tourism, the general economic status of inhabitants, and the infrastructure and traffic situation.
- Lastly, we will require some useful **proxies for local and tourism demand** by looking at the availability and distribution of Google searches from Google analytics as well as AirBnB offerings throughout the 23 districts of Vienna. 

#### First, we will again load the required packages and features into our lab: 

In [1]:
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import folium
from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs  # the samples_generator submodule was removed in newer scikit-learn releases
from sklearn.preprocessing import StandardScaler
import json 
import urllib.request
import requests
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.display import HTML 
    
# transforming a JSON file into a pandas dataframe
from pandas import json_normalize

#### Next, we can load the csv file compiled from numerous public Austrian and Viennese sources as well as Google and AirBnB. It contains demographic and socio-economic characteristics per borough that are found in the official statistics of the city of Vienna. This will supply us with a dataframe consisting of: 

1. Rental prices per sqm 
2. Growth of housing in the last decade
3. Two factors of AirBnB availability: whether AirBnB is available at all (> 5 offers) and whether it is commonly used (> 50 offers)
4. Gross income median
5. Google searches via a real estate platform, grouped into five bins 
6. An indicator if the area is considered a tourist area, defined by the city of Vienna, tourist department 
7. A survey response for the frequent availability of public transport 
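
Because later cells rely on these columns by name and by position, a quick check right after loading can catch a renamed or missing column early. The following is a minimal sketch, assuming the column labels are spelled as in the merged output shown further down; the helper `missing_columns` and the stand-in frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Column labels as they appear in the notebook's merged output (spelling kept verbatim).
EXPECTED_COLUMNS = [
    "Rental Prices per sqm",
    "Growth last decade %",
    "AirBnb availability (>5)",
    "AirBnB availability (>50)",
    "Income Gross",
    "Google searches for rental flat ranked",
    "Tourist Area inidicator",
    "Good Public Transport indicator %",
]

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column labels that are absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]

# Tiny stand-in frame for illustration; the real data comes from the csv files.
sample = pd.DataFrame({"Rental Prices per sqm": [19.96], "Income Gross": [40116]})
print(missing_columns(sample))
```

An empty list means every expected indicator is present.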

In [199]:
vienna = pd.read_csv(r"/Users/nikolas.anic/Desktop/ML/Vienna.csv")
vienna_demographics = pd.read_csv(r"/Users/nikolas.anic/Desktop/ML/Vienna_Demographics.csv")

#### Then, we can get access to the FourSquare API of Vienna: 

We can get the Vienna-based coordinates from: 

http://download.geonames.org/export/zip/

We will again use the QGIS application to transform these coordinates into a geometric form of Vienna's geography. 

Unfortunately, these coordinates only cover the 23 boroughs (Bezirke) of Vienna and do not show the individual neighborhoods within each borough. However, several aspects speak in favour of a borough-level analysis: 

1. The data for rental prices is only available on a borough basis
2. Economic factors are only available on a borough basis 
3. Considering that the most promising boroughs (the more central and "trendier" ones) are at most 5-9 square km in area and very well accessible by public transport, their characteristics potentially do not differ substantially within the borough 

Merging both dataframes will result in a frame consisting of all demographics needed for our analysis. We will call it "vienna_total".

In [200]:
vienna_total = pd.merge(vienna, vienna_demographics, on = "Borough")
vienna_total

Unnamed: 0,Country,Postal Code,City,Borough,Latitude,Longitude,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Good Public Transport indicator %
0,Austria,1010,Wien,Innere Stadt,48.2077,16.3705,19.96,2.7,1,1,40116,5,1,91
1,Austria,1020,Wien,Leopoldstadt,48.2167,16.4,16.51,12.3,1,1,33189,4,1,63
2,Austria,1030,Wien,Landstrasse,48.1981,16.3948,16.59,8.6,1,1,35649,3,1,61
3,Austria,1040,Wien,Wieden,48.192,16.3671,16.36,9.2,1,1,38837,5,1,69
4,Austria,1050,Wien,Margareten,48.1865,16.3549,14.97,8.7,1,1,29306,3,0,58
5,Austria,1060,Wien,Mariahilf,48.1952,16.3503,16.23,8.3,1,1,35405,3,1,82
6,Austria,1070,Wien,Neubau,48.2,16.35,16.45,7.1,1,1,37601,4,1,95
7,Austria,1080,Wien,Josefstadt,48.2167,16.35,15.21,7.7,1,1,37745,4,1,97
8,Austria,1090,Wien,Alsergrund,48.2333,16.35,16.35,8.2,1,0,36738,4,0,92
9,Austria,1100,Wien,Favoriten,48.1521,16.3878,16.25,15.8,1,0,27246,1,1,63
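
An inner merge silently drops boroughs whose names do not match exactly in both files (for example because of a spelling variant). As a hedged sketch of how this could be checked, the snippet below uses the `validate` and `indicator` options of `pd.merge` on two small stand-in frames; the frames are illustrative and only reuse a few values from the table above.

```python
import pandas as pd

left = pd.DataFrame({"Borough": ["Innere Stadt", "Leopoldstadt"],
                     "Postal Code": [1010, 1020]})
right = pd.DataFrame({"Borough": ["Innere Stadt", "Leopoldstadt"],
                      "Income Gross": [40116, 33189]})

# validate raises a MergeError if a borough appears more than once on either side;
# how="outer" together with indicator=True exposes boroughs present in only one frame.
checked = pd.merge(left, right, on="Borough", how="outer",
                   validate="one_to_one", indicator=True)
unmatched = checked[checked["_merge"] != "both"]
print(len(unmatched))
```

If `unmatched` is empty, the inner merge used in the notebook loses no boroughs.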


#### Now we will access the FourSquare API to get information about the venues in Vienna by calling our function: 

In [3]:
def Vienna_venues(Borough, Latitude, Longitude, Postal_Code): 
    
    CLIENT_ID = 'JBREGZ4UNA53HX43WMAD4TQ2X2XJWMX5DPHEZEIZHQA0ACNP' # your Foursquare ID
    CLIENT_SECRET = 'VNS40KF3V4MGSWWAV0IGQINZIGIT1EQKNCWBFPOS3QF1JMOJ' # your Foursquare Secret
    VERSION = '20180605'
    LIMIT = 90
    radius = 500

    venues_list =  [] 
    
    for Bor, latitude, longitude, post in zip(Borough, Latitude, Longitude, Postal_Code): 
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude, 
            radius, 
            LIMIT)
        
        venue = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(Bor, 
                          latitude, 
                          longitude,
                          post,
                          v["venue"]["name"], 
                          v["venue"]["categories"][0]["name"],
                          v["venue"]["location"]["lat"],
                          v["venue"]["location"]["lng"]) for v in venue])
        
    # Build the dataframe once, after all boroughs have been queried
    pd_v = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    pd_v.columns = ['Neighborhood', 
                    'Neighborhood-Latitude', 
                    'Neighborhood-Longitude', 
                    'Postal Code',
                    'Venue', 
                    'Venue_Category',
                    'Venue_Latitude', 
                    'Venue_Longitude', 
                    ]
    return pd_v



In [4]:
Vienna = Vienna_venues(Postal_Code = vienna["Postal Code"],
Borough = vienna["Borough"],
Latitude = vienna["Latitude"],
Longitude = vienna["Longitude"])
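
The function above indexes directly into `["response"]['groups'][0]['items']` and `["categories"][0]`, so a failed request or a venue without a category would raise an exception. Whether such responses occur in practice is an assumption here. The sketch below shows one defensive way to flatten a response of that shape; `extract_venues` and the hand-written payload are hypothetical and only mimic the nesting used above.

```python
def extract_venues(payload, borough):
    """Flatten an explore-style response dict into row dicts (hypothetical helper)."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        location = venue.get("location", {})
        rows.append({
            "Neighborhood": borough,
            "Venue": venue.get("name"),
            # A venue may carry no category; record None instead of raising.
            "Venue_Category": categories[0]["name"] if categories else None,
            "Venue_Latitude": location.get("lat"),
            "Venue_Longitude": location.get("lng"),
        })
    return rows

# Hand-written payload for illustration only; this is not real API output.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Stephansplatz", "categories": [{"name": "Plaza"}],
               "location": {"lat": 48.208299, "lng": 16.371880}}},
    {"venue": {"name": "Unnamed spot", "categories": [], "location": {}}},
]}]}}

rows = extract_venues(fake_payload, "Innere Stadt")
empty = extract_venues({"response": {}}, "Innere Stadt")
```

Returning an empty list for a malformed response lets the loop over boroughs continue instead of stopping at the first failure.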

This will provide us with a dataframe consisting of up to 90 venues per Neighborhood.

In [5]:
Vienna

Unnamed: 0,Neighborhood,Neighborhood-Latitude,Neighborhood-Longitude,Postal Code,Venue,Venue_Category,Venue_Latitude,Venue_Longitude
0,Innere Stadt,48.2077,16.3705,1010,Stephansplatz,Plaza,48.208299,16.371880
1,Innere Stadt,48.2077,16.3705,1010,Stephansdom,Church,48.208626,16.372672
2,Innere Stadt,48.2077,16.3705,1010,Graben,Pedestrian Plaza,48.208915,16.369379
3,Innere Stadt,48.2077,16.3705,1010,DO & CO Restaurant,Restaurant,48.208240,16.371758
4,Innere Stadt,48.2077,16.3705,1010,COS,Clothing Store,48.209359,16.371591
...,...,...,...,...,...,...,...,...
604,Liesing,48.1433,16.2934,1230,Atzgersdorfer Platz,Plaza,48.146615,16.296017
605,Liesing,48.1433,16.2934,1230,Etsan,Grocery Store,48.143154,16.292838
606,Liesing,48.1433,16.2934,1230,Lichtenstöger,Austrian Restaurant,48.142088,16.295672
607,Liesing,48.1433,16.2934,1230,Quan Lounge,Asian Restaurant,48.141400,16.291759


#### In order to get a better overview for slicing in the methodology section, we add a non-string indicator called Group Index (GrpIdx). Based on this value, we can form indexes for later analysis:

In [8]:
Vienna['GrpIdx'] = Vienna['Neighborhood'].rank(method='dense').astype(int)
Vienna.sort_values("Neighborhood", inplace = True)
Vienna

Unnamed: 0,Neighborhood,Neighborhood-Latitude,Neighborhood-Longitude,Postal Code,Venue,Venue_Category,Venue_Latitude,Venue_Longitude,GrpIdx
465,Alsergrund,48.2333,16.3500,1090,Mozart & Meisl,Gastropub,48.235467,16.348887,1
467,Alsergrund,48.2333,16.3500,1090,Blaustern,Café,48.232030,16.354460,1
468,Alsergrund,48.2333,16.3500,1090,Eurogym Döbling,Gym,48.233808,16.352828,1
469,Alsergrund,48.2333,16.3500,1090,The Grey's,Restaurant,48.236285,16.349373,1
470,Alsergrund,48.2333,16.3500,1090,Hofer,Supermarket,48.230786,16.354078,1
...,...,...,...,...,...,...,...,...,...
233,Wieden,48.1920,16.3671,1040,Pub Klemo,Wine Bar,48.192732,16.360752,23
234,Wieden,48.1920,16.3671,1040,SPAR,Supermarket,48.190488,16.371944,23
235,Wieden,48.1920,16.3671,1040,Donatella,Italian Restaurant,48.195419,16.363587,23
218,Wieden,48.1920,16.3671,1040,denn's Biomarkt,Organic Grocery,48.195934,16.365115,23
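
Dense ranking of the neighborhood names assigns consecutive integers in alphabetical order, which is why Alsergrund receives 1 and Wieden 23. As an alternative sketch (not the notebook's method), `pd.factorize` with `sort=True` produces the same kind of numbering and additionally returns the lookup from index back to name; the short series below is illustrative.

```python
import pandas as pd

names = pd.Series(["Wieden", "Alsergrund", "Neubau", "Alsergrund"])

# factorize(sort=True) numbers the unique values in sorted order starting at 0,
# so adding 1 gives a dense, alphabetical, 1-based index.
codes, uniques = pd.factorize(names, sort=True)
grp_idx = codes + 1
print(list(grp_idx), list(uniques))
```

Here `uniques[grp_idx - 1]` recovers the original names, which can be handy when labelling clusters later.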


#### We now quickly visualize the locations of all our venues obtained from FourSquare: 

In [9]:
address = 'Vienna, AT'

geolocator = Nominatim(user_agent="foursquare_agent") # call the geolocator 

location = geolocator.geocode(address)
latitude_vienna = location.latitude
longitude_vienna = location.longitude

vienna_map = folium.Map(location = [latitude_vienna, longitude_vienna], zoom_start = 12)

folium.features.CircleMarker(
    [latitude_vienna, longitude_vienna],
    radius=10,
    color='red',
    popup='District Center',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6,
    
).add_to(vienna_map)

for lat, lng, label in zip(Vienna.Venue_Latitude, Vienna.Venue_Longitude, Vienna.Venue_Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        fill = True,
        fill_color='blue',
        fill_opacity=0.6,
        popup=folium.Popup(label, parse_html=True)
    ).add_to(vienna_map)


vienna_map

#### Now, we can merge the venue list with the total demographics list and obtain the following dataframe, which we again call Vienna: 

In [10]:
Vienna = pd.merge(Vienna, vienna_total, on = "Postal Code")
Vienna

Unnamed: 0,Neighborhood,Neighborhood-Latitude,Neighborhood-Longitude,Postal Code,Venue,Venue_Category,Venue_Latitude,Venue_Longitude,GrpIdx,Country,...,Latitude,Longitude,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Good Public Transport indicator %
0,Alsergrund,48.2333,16.3500,1090,Mozart & Meisl,Gastropub,48.235467,16.348887,1,Austria,...,48.2333,16.3500,16.35,8.2,1,0,36738,4,0,92
1,Alsergrund,48.2333,16.3500,1090,Blaustern,Café,48.232030,16.354460,1,Austria,...,48.2333,16.3500,16.35,8.2,1,0,36738,4,0,92
2,Alsergrund,48.2333,16.3500,1090,Eurogym Döbling,Gym,48.233808,16.352828,1,Austria,...,48.2333,16.3500,16.35,8.2,1,0,36738,4,0,92
3,Alsergrund,48.2333,16.3500,1090,The Grey's,Restaurant,48.236285,16.349373,1,Austria,...,48.2333,16.3500,16.35,8.2,1,0,36738,4,0,92
4,Alsergrund,48.2333,16.3500,1090,Hofer,Supermarket,48.230786,16.354078,1,Austria,...,48.2333,16.3500,16.35,8.2,1,0,36738,4,0,92
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
604,Wieden,48.1920,16.3671,1040,Pub Klemo,Wine Bar,48.192732,16.360752,23,Austria,...,48.1920,16.3671,16.36,9.2,1,1,38837,5,1,69
605,Wieden,48.1920,16.3671,1040,SPAR,Supermarket,48.190488,16.371944,23,Austria,...,48.1920,16.3671,16.36,9.2,1,1,38837,5,1,69
606,Wieden,48.1920,16.3671,1040,Donatella,Italian Restaurant,48.195419,16.363587,23,Austria,...,48.1920,16.3671,16.36,9.2,1,1,38837,5,1,69
607,Wieden,48.1920,16.3671,1040,denn's Biomarkt,Organic Grocery,48.195934,16.365115,23,Austria,...,48.1920,16.3671,16.36,9.2,1,1,38837,5,1,69


#### We now have to clean the dataset and delete some duplicated or unnecessary columns: 

In [11]:
Vienna.drop(["Borough", "Longitude", "Country", "City", "Latitude"], axis = 1, inplace = True)

In [12]:
Vienna

Unnamed: 0,Neighborhood,Neighborhood-Latitude,Neighborhood-Longitude,Postal Code,Venue,Venue_Category,Venue_Latitude,Venue_Longitude,GrpIdx,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Good Public Transport indicator %
0,Alsergrund,48.2333,16.3500,1090,Mozart & Meisl,Gastropub,48.235467,16.348887,1,16.35,8.2,1,0,36738,4,0,92
1,Alsergrund,48.2333,16.3500,1090,Blaustern,Café,48.232030,16.354460,1,16.35,8.2,1,0,36738,4,0,92
2,Alsergrund,48.2333,16.3500,1090,Eurogym Döbling,Gym,48.233808,16.352828,1,16.35,8.2,1,0,36738,4,0,92
3,Alsergrund,48.2333,16.3500,1090,The Grey's,Restaurant,48.236285,16.349373,1,16.35,8.2,1,0,36738,4,0,92
4,Alsergrund,48.2333,16.3500,1090,Hofer,Supermarket,48.230786,16.354078,1,16.35,8.2,1,0,36738,4,0,92
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
604,Wieden,48.1920,16.3671,1040,Pub Klemo,Wine Bar,48.192732,16.360752,23,16.36,9.2,1,1,38837,5,1,69
605,Wieden,48.1920,16.3671,1040,SPAR,Supermarket,48.190488,16.371944,23,16.36,9.2,1,1,38837,5,1,69
606,Wieden,48.1920,16.3671,1040,Donatella,Italian Restaurant,48.195419,16.363587,23,16.36,9.2,1,1,38837,5,1,69
607,Wieden,48.1920,16.3671,1040,denn's Biomarkt,Organic Grocery,48.195934,16.365115,23,16.36,9.2,1,1,38837,5,1,69


#### We can now define indicator variables which tell us what type of venue the individual borough, or neighborhood, has: 

In [13]:
vienna_coordinates = "/Users/nikolas.anic/Desktop/ML/GeoJSON/Vienna.geojson"

#### We are also interested in which neighborhoods already offer Asian cuisine. If many Asian restaurants are already operating within a given neighborhood, chances are higher that demand is already saturated. 

Doing so requires us to create a dummy variable that equals 1 if the respective venue is any type of Asian restaurant. Then we can extract all neighborhoods for which the condition is true and assign a second dummy to every row, with a 1 if Asian cuisine is currently offered within that neighborhood and a 0 otherwise. 

In [14]:
Vienna["Asian_restaurants_available"] = (Vienna["Venue_Category"].isin(["Chinese Restaurant", "Asian Restaurant", "Japanese Restaurant", "Sushi Restaurant"])).astype(int)
Vienna_asia_neighborhoods = Vienna.loc[Vienna["Asian_restaurants_available"]  == 1]["Neighborhood"]
Vienna_asia_neighborhoods

29                 Alsergrund
166              Innere Stadt
207                Josefstadt
210                Josefstadt
246               Landstrasse
321                   Liesing
355                 Mariahilf
361                 Mariahilf
376                 Mariahilf
434                    Neubau
498     Rudolfsheim-Fuenfhaus
503     Rudolfsheim-Fuenfhaus
520     Rudolfsheim-Fuenfhaus
552                    Wieden
557                    Wieden
567                    Wieden
587                    Wieden
597                    Wieden
599                    Wieden
608                    Wieden
Name: Neighborhood, dtype: object

In [15]:
Vienna["Asian_cuisine_available"] = (Vienna["Neighborhood"].isin(Vienna_asia_neighborhoods)).astype(int)
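
A 0/1 flag does not distinguish a neighborhood with one Asian restaurant from one with seven. As a hedged sketch, the per-neighborhood count can be derived alongside the flag with a grouped `transform`; the small frame and the column name `Asian_restaurant_count` are made up for illustration.

```python
import pandas as pd

# Made-up rows, for illustration only.
venues = pd.DataFrame({
    "Neighborhood": ["Wieden", "Wieden", "Wieden", "Neubau", "Hietzing"],
    "Venue_Category": ["Sushi Restaurant", "Supermarket", "Japanese Restaurant",
                       "Asian Restaurant", "Park"],
})
asian_categories = ["Chinese Restaurant", "Asian Restaurant",
                    "Japanese Restaurant", "Sushi Restaurant"]

is_asian = venues["Venue_Category"].isin(asian_categories)
# transform broadcasts each neighborhood's total back onto every one of its rows.
venues["Asian_restaurant_count"] = is_asian.groupby(venues["Neighborhood"]).transform("sum")
venues["Asian_cuisine_available"] = (venues["Asian_restaurant_count"] > 0).astype(int)
print(venues)
```

The count could serve as a rough saturation measure, while the flag matches the dummy defined above.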

In [16]:
Vienna

Unnamed: 0,Neighborhood,Neighborhood-Latitude,Neighborhood-Longitude,Postal Code,Venue,Venue_Category,Venue_Latitude,Venue_Longitude,GrpIdx,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Good Public Transport indicator %,Asian_restaurants_available,Asian_cuisine_available
0,Alsergrund,48.2333,16.3500,1090,Mozart & Meisl,Gastropub,48.235467,16.348887,1,16.35,8.2,1,0,36738,4,0,92,0,1
1,Alsergrund,48.2333,16.3500,1090,Blaustern,Café,48.232030,16.354460,1,16.35,8.2,1,0,36738,4,0,92,0,1
2,Alsergrund,48.2333,16.3500,1090,Eurogym Döbling,Gym,48.233808,16.352828,1,16.35,8.2,1,0,36738,4,0,92,0,1
3,Alsergrund,48.2333,16.3500,1090,The Grey's,Restaurant,48.236285,16.349373,1,16.35,8.2,1,0,36738,4,0,92,0,1
4,Alsergrund,48.2333,16.3500,1090,Hofer,Supermarket,48.230786,16.354078,1,16.35,8.2,1,0,36738,4,0,92,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
604,Wieden,48.1920,16.3671,1040,Pub Klemo,Wine Bar,48.192732,16.360752,23,16.36,9.2,1,1,38837,5,1,69,0,1
605,Wieden,48.1920,16.3671,1040,SPAR,Supermarket,48.190488,16.371944,23,16.36,9.2,1,1,38837,5,1,69,0,1
606,Wieden,48.1920,16.3671,1040,Donatella,Italian Restaurant,48.195419,16.363587,23,16.36,9.2,1,1,38837,5,1,69,0,1
607,Wieden,48.1920,16.3671,1040,denn's Biomarkt,Organic Grocery,48.195934,16.365115,23,16.36,9.2,1,1,38837,5,1,69,0,1


All set and done. We now have a dataset consisting of all venues that FourSquare provided, including their location and category. Further, we added economic indicators to refine our research and differentiated the neighborhoods according to the availability of Asian cuisine. 

Based on this data, we can now commence to look for an optimal location for our Japanese restaurant. 

### Methodology

In the project, I will assess the profitability of Vienna's neighborhoods with regard to our Japanese franchise restaurant. As indicated in the Business Problem Section, we focus on three requirements to make a verdict: 

1. The neighborhood or district cannot be saturated within the food or beverage market. 

2. The neighborhood or district must be frequently visited.  

3. The price level of our offering must suit the average income for the respective region. 

In a **first** step, we gathered the data to proxy the given focus areas and define dissimilarities and similarities between the neighborhoods. 

In a **second** step, we will analyze the FourSquare data to see which neighborhood offers what type of venues. We will use simple data analysis commands to group and summarize the data. Then, we will add the specific socio-economic and demographic characteristics to the newly established dataframe. 

**Third**, we will use the dataframe to assess which venues are most common in each neighborhood. As it is our aim to find a certain pattern for our location, we can use this dataframe to make more nuanced assessments of each neighborhood's amusement offerings. 

**Fourth**, we can use a k-means approach to assess the differences between each neighborhood and define clusters based on the location data. From this data we can visualize the patterns and also add the clusters to the additional characteristics of each neighborhood. 

**Ultimately**, we can then combine all information obtained from every resource we were able to access and start selecting the neighborhoods according to our pre-defined combinations of characteristics. I will narrow down the candidates according to how well each prerequisite is satisfied and compare the two most suitable neighborhoods. Besides additional graphic representation, I will lay out the reasoning I followed in assessing the neighborhoods.
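
The clustering step described above can be sketched as follows. This is a minimal illustration on made-up venue counts, assuming two clusters and standardized features; the number of clusters, the neighborhood labels and the counts are assumptions, not results of this project.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up venue counts per neighborhood, for illustration only.
toy = pd.DataFrame(
    {"Café": [9, 8, 1, 0], "Bus Stop": [0, 1, 7, 8], "Wine Bar": [5, 6, 0, 1]},
    index=["A", "B", "C", "D"],
)

# Standardize so that no single category dominates the distance computation.
scaled = StandardScaler().fit_transform(toy)

model = KMeans(n_clusters=2, n_init=10, random_state=0)
toy["Cluster"] = model.fit_predict(scaled)
print(toy)
```

Neighborhoods with a similar venue mix end up with the same label; the label numbers themselves carry no meaning.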

### Analysis

#### Now that we have analyzed which neighborhoods already offer Asian cuisine, we can start forming clusters

First, we define the dummy variables that indicate which venues are given in a certain neighborhood.

In [169]:
dummy_cat = pd.get_dummies(Vienna["Venue_Category"])
Vienna = pd.concat([Vienna, dummy_cat], axis=1)
Vienna

Unnamed: 0,Neighborhood,Neighborhood-Latitude,Neighborhood-Longitude,Postal Code,Venue,Venue_Category,Venue_Latitude,Venue_Longitude,GrpIdx,Rental Prices per sqm,...,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Winery,Yoga Studio
0,Alsergrund,48.2333,16.3500,1090,Mozart & Meisl,Gastropub,48.235467,16.348887,1,16.35,...,0,0,0,0,0,0,0,0,0,0
1,Alsergrund,48.2333,16.3500,1090,Blaustern,Café,48.232030,16.354460,1,16.35,...,0,0,0,0,0,0,0,0,0,0
2,Alsergrund,48.2333,16.3500,1090,Eurogym Döbling,Gym,48.233808,16.352828,1,16.35,...,0,0,0,0,0,0,0,0,0,0
3,Alsergrund,48.2333,16.3500,1090,The Grey's,Restaurant,48.236285,16.349373,1,16.35,...,0,0,0,0,0,0,0,0,0,0
4,Alsergrund,48.2333,16.3500,1090,Hofer,Supermarket,48.230786,16.354078,1,16.35,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
604,Wieden,48.1920,16.3671,1040,Pub Klemo,Wine Bar,48.192732,16.360752,23,16.36,...,0,0,0,0,0,0,1,0,0,0
605,Wieden,48.1920,16.3671,1040,SPAR,Supermarket,48.190488,16.371944,23,16.36,...,0,0,0,0,0,0,0,0,0,0
606,Wieden,48.1920,16.3671,1040,Donatella,Italian Restaurant,48.195419,16.363587,23,16.36,...,0,0,0,0,0,0,0,0,0,0
607,Wieden,48.1920,16.3671,1040,denn's Biomarkt,Organic Grocery,48.195934,16.365115,23,16.36,...,0,0,0,0,0,0,0,0,0,0


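The dummy variables can now be grouped by neighborhood and aggregated. The cell below sums them, which yields the number of venues per category in each neighborhood (taking the mean instead would yield each category's share of the neighborhood's venues). In this way, we create a dataframe of indicators for all venues each neighborhood offers, according to FourSquare. 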

In [170]:
Vienna_clusters = pd.concat([Vienna.iloc[:,19:], Vienna.iloc[:,8]], axis = 1)
Vienna_clusters = Vienna_clusters.groupby('GrpIdx').sum().reset_index()
Vienna_clusters

Unnamed: 0,GrpIdx,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Winery,Yoga Studio
0,1,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,1,0,0,0
1,2,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,2,0,1,0
3,4,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
5,6,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,1,1,0,0
6,7,0,0,0,0,0,0,0,0,1,...,1,0,0,0,0,0,0,0,0,0
7,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,9,0,0,1,0,0,1,0,1,0,...,0,1,0,0,0,0,1,0,0,0
9,10,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
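
The difference between summing and averaging the dummies can be seen on a small example. The sketch below uses made-up rows and selects columns by name rather than by position; the frequencies printed later in the notebook (values such as 0.09) look like the mean variant, although that is an inference from the output rather than something the cells above confirm.

```python
import pandas as pd

# Made-up rows, for illustration only.
venues = pd.DataFrame({
    "GrpIdx": [1, 1, 1, 2],
    "Venue_Category": ["Café", "Café", "Pub", "Bus Stop"],
})
dummies = pd.get_dummies(venues["Venue_Category"]).astype(int)
by_group = dummies.groupby(venues["GrpIdx"])

counts = by_group.sum()    # number of venues per category in each group
shares = by_group.mean()   # fraction of the group's venues in each category
print(counts)
print(shares.round(2))
```

Counts favor neighborhoods with many venues, whereas shares make neighborhoods of different size comparable.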


Next, we add the neighborhood names to the list.

In [171]:
Vienna_neighborhoods = pd.concat([Vienna.iloc[:,0], Vienna.iloc[:,8]], axis = 1).drop_duplicates()
Vienna_neighborhoods
Vienna_clustered = pd.merge(Vienna_neighborhoods, Vienna_clusters, on = "GrpIdx")
Vienna_clustered

Unnamed: 0,Neighborhood,GrpIdx,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Tram Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop,Winery,Yoga Studio
0,Alsergrund,1,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,1,0,0,0
1,Brigittenau,2,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,Doebling,3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,2,0,1,0
3,Donaustadt,4,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Favoriten,5,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Floridsdorf,6,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,1,1,0,0
6,Hernals,7,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
7,Hietzing,8,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Innere Stadt,9,0,0,1,0,0,1,0,1,...,0,1,0,0,0,0,1,0,0,0
9,Josefstadt,10,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


Third, we define a dataframe with all additional demographics for the respective neighborhoods, which we obtained earlier. 

In [172]:
Vienna_clusters_additional = pd.concat([Vienna.iloc[:,0], Vienna.iloc[:,3], Vienna.iloc[:,8:16], Vienna.iloc[:,18], Vienna.iloc[:,1:3]], axis = 1)
Vienna_clusters_additional = Vienna_clusters_additional.drop_duplicates()
Vienna_clusters_additional

Unnamed: 0,Neighborhood,Postal Code,GrpIdx,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Asian_cuisine_available,Neighborhood-Latitude,Neighborhood-Longitude
0,Alsergrund,1090,1,16.35,8.2,1,0,36738,4,0,1,48.2333,16.35
32,Brigittenau,1200,2,13.95,6.4,0,0,26130,3,0,0,48.2402,16.3773
52,Doebling,1190,3,16.44,6.1,0,0,42260,4,0,0,48.2591,16.3339
63,Donaustadt,1220,4,15.59,22.0,1,1,38125,2,1,0,48.219,16.495
69,Favoriten,1100,5,16.25,15.8,1,0,27246,1,1,0,48.1521,16.3878
75,Floridsdorf,1210,6,14.11,14.4,0,0,33274,1,0,0,48.2811,16.4113
82,Hernals,1170,7,14.14,9.1,1,0,32378,1,1,0,48.2338,16.2901
91,Hietzing,1130,8,14.88,6.0,1,0,44674,5,0,0,48.1773,16.2456
93,Innere Stadt,1010,9,19.96,2.7,1,1,40116,5,1,1,48.2077,16.3705
183,Josefstadt,1080,10,15.21,7.7,1,1,37745,4,1,1,48.2167,16.35


Lastly, we combine all dataframes that we just assembled, giving us the base for our clustering approach. As you can see, this dataframe consists of the Group Indexes, the demographics and the venues each district offers.

In [173]:
Vienna_clustered_total = pd.merge(Vienna_clustered, Vienna_clusters_additional, on = "Neighborhood")
Vienna_clustered_total

Unnamed: 0,Neighborhood,GrpIdx_x,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Asian_cuisine_available,Neighborhood-Latitude,Neighborhood-Longitude
0,Alsergrund,1,0,0,0,0,0,0,0,0,...,16.35,8.2,1,0,36738,4,0,1,48.2333,16.35
1,Brigittenau,2,0,0,0,0,0,0,0,0,...,13.95,6.4,0,0,26130,3,0,0,48.2402,16.3773
2,Doebling,3,0,0,0,0,0,0,0,0,...,16.44,6.1,0,0,42260,4,0,0,48.2591,16.3339
3,Donaustadt,4,0,1,0,0,0,0,0,0,...,15.59,22.0,1,1,38125,2,1,0,48.219,16.495
4,Favoriten,5,0,0,0,0,0,0,0,0,...,16.25,15.8,1,0,27246,1,1,0,48.1521,16.3878
5,Floridsdorf,6,0,0,0,0,0,0,0,0,...,14.11,14.4,0,0,33274,1,0,0,48.2811,16.4113
6,Hernals,7,0,0,0,0,0,0,0,0,...,14.14,9.1,1,0,32378,1,1,0,48.2338,16.2901
7,Hietzing,8,0,0,0,0,0,0,0,0,...,14.88,6.0,1,0,44674,5,0,0,48.1773,16.2456
8,Innere Stadt,9,0,0,1,0,0,1,0,1,...,19.96,2.7,1,1,40116,5,1,1,48.2077,16.3705
9,Josefstadt,10,0,0,0,0,0,0,0,1,...,15.21,7.7,1,1,37745,4,1,1,48.2167,16.35
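
The `GrpIdx_x` column in the merged output suggests that both inputs carry a `GrpIdx` column, so pandas appended its default suffixes. The following is a minimal sketch of that behaviour, assuming two hypothetical miniature frames (`venues` and `demographics`) that stand in for `Vienna_clustered` and `Vienna_clusters_additional`. It also shows one possible way to keep a single `GrpIdx` column by merging on both shared keys.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two frames merged above.
venues = pd.DataFrame({
    "Neighborhood": ["Alsergrund", "Brigittenau"],
    "GrpIdx": [1, 2],
    "Asian Restaurant": [0, 0],
})
demographics = pd.DataFrame({
    "Neighborhood": ["Alsergrund", "Brigittenau"],
    "GrpIdx": [1, 2],
    "Income Gross": [36738, 26130],
})

# Merging on the name alone duplicates the shared column as GrpIdx_x / GrpIdx_y.
by_name = pd.merge(venues, demographics, on="Neighborhood")

# Merging on both shared keys keeps a single GrpIdx column; validate="one_to_one"
# raises a MergeError if a key combination appears more than once on either side.
by_both = pd.merge(venues, demographics, on=["Neighborhood", "GrpIdx"],
                   validate="one_to_one")

print(by_name.columns.tolist())
print(by_both.columns.tolist())
```

Whether merging on both keys is appropriate depends on `GrpIdx` being identical in both frames, which is an assumption here.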


Great. We now have the combination of venues for each neighborhood in Vienna, which gives us a handy picture of the existence and prevalence of cultural offerings. In addition, as a workaround, we can define a function that returns the five most common venue categories per neighborhood and collect the results in a new dataframe.

In [118]:
Vienna_top5 = Vienna_clustered.drop(["Neighborhood-Latitude", "Neighborhood-Longitude"], axis = 1)
Vienna_top5.set_index("GrpIdx", inplace = True)

In [119]:
num_top_venues = 5

for hood in Vienna_top5['Neighborhood']:
    print("----"+hood+"----")
    temp = Vienna_top5[Vienna_top5['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Alsergrund----
         venue  freq
0  Supermarket  0.09
1          Pub  0.06
2         Café  0.06
3    BBQ Joint  0.06
4     Pharmacy  0.06


---- Brigittenau----
          venue  freq
0      Bus Stop  0.20
1         Plaza  0.15
2        Bakery  0.10
3   Supermarket  0.10
4  Tram Station  0.05


---- Doebling----
                 venue  freq
0  Austrian Restaurant  0.27
1             Wine Bar  0.18
2                 Park  0.09
3               Winery  0.09
4           Restaurant  0.09


---- Donaustadt----
                 venue  freq
0             Pharmacy  0.17
1  American Restaurant  0.17
2           Restaurant  0.17
3    Indian Restaurant  0.17
4             Bus Stop  0.17


---- Favoriten----
                venue  freq
0       Shopping Mall  0.17
1  Italian Restaurant  0.17
2       Grocery Store  0.17
3  Athletics & Sports  0.17
4          Smoke Shop  0.17


---- Floridsdorf----
                  venue  freq
0             Gastropub  0.29
1  Fast Food Restaurant  0.14
2      

In [120]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [174]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # no 'st'/'nd'/'rd' suffix left, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Vienna_clustered['Neighborhood']

for ind in np.arange(Vienna_clustered.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Vienna_top5.iloc[ind, 1:], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Alsergrund,Supermarket,Café,Pharmacy,Pub,BBQ Joint
1,Brigittenau,Bus Stop,Plaza,Supermarket,Bakery,Tram Station
2,Doebling,Austrian Restaurant,Wine Bar,Restaurant,Food,Park
3,Donaustadt,American Restaurant,Indian Restaurant,Food & Drink Shop,Restaurant,Bus Stop
4,Favoriten,Shopping Mall,Italian Restaurant,Smoke Shop,Athletics & Sports,Grocery Store
5,Floridsdorf,Gastropub,Wine Shop,Wine Bar,Vineyard,Fast Food Restaurant
6,Hernals,Italian Restaurant,Athletics & Sports,Construction & Landscaping,Bakery,Austrian Restaurant
7,Hietzing,Scenic Lookout,Yoga Studio,Electronics Store,Food Court,Food & Drink Shop
8,Innere Stadt,Restaurant,Café,Plaza,Austrian Restaurant,Italian Restaurant
9,Josefstadt,Café,Hotel,Supermarket,Bar,Restaurant
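
The ranking and the ordinal column labels can also be produced without a `try`/`except` block. The sketch below is one possible variant, not the notebook's own method. The helper names `ordinal` and `top_venues` and the frequency values are hypothetical and serve illustration only.

```python
import pandas as pd

def ordinal(n):
    """Return 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th', 11 -> '11th', ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def top_venues(freq, n):
    """freq: one row per neighborhood (index), one column per venue category."""
    rows = {
        hood: row.sort_values(ascending=False).index[:n].tolist()
        for hood, row in freq.iterrows()
    }
    columns = [f"{ordinal(i + 1)} Most Common Venue" for i in range(n)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Hypothetical venue frequencies, for illustration only.
freq = pd.DataFrame(
    {"Café": [0.30, 0.10], "Pub": [0.20, 0.50], "Park": [0.10, 0.40]},
    index=["A", "B"],
)
result = top_venues(freq, 2)
print(result)
```

The sketch expects a frame that contains only the venue frequency columns, so no columns have to be skipped by position. It does not control the order among categories with equal frequency.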


#### Now we can run the clustering. I chose k-means, as presented in the lecture. We cluster on the socio-economic and demographic characteristics of each neighborhood, on the notion that the cluster label summarizes these characteristics in a single indicative number, which simplifies our analysis. Further, k-means clustering on the venue distributions is too nuanced and does not yield distinctive results.

#### The entire approach consists of: 

1. Reading the data 
2. Normalizing the data
3. Partitioning the neighborhoods with k-means
4. Merging the clusters with the existing dataset.
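
The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the six rows are hypothetical toy values rather than the Vienna data, only two features are used, and `random_state=0` is an addition that the notebook's own call does not set.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# 1. Read the data (hypothetical toy values; the notebook uses the Vienna districts).
df = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D", "E", "F"],
    "Rental Prices per sqm": [14.0, 14.2, 16.4, 16.3, 19.9, 19.7],
    "Income Gross": [26000, 27000, 36000, 37000, 40000, 41000],
})

# 2. Normalize: zero mean and unit variance per column.
features = df.drop(columns=["Neighborhood"])
scaled = StandardScaler().fit_transform(features)

# 3. Cluster; random_state is added here so that repeated runs give the same labels.
model = KMeans(n_clusters=3, init="k-means++", n_init=12, random_state=0)
labels = model.fit_predict(scaled)

# 4. Merge the labels back, aligned on the frame's index rather than on position.
df["cluster"] = pd.Series(labels, index=features.index)
print(df[["Neighborhood", "cluster"]])
```

Without a fixed seed the cluster numbers can change between runs even when the grouping itself stays the same, so cluster labels quoted in the text may not match a later re-run.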

In [178]:
from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs  # the samples_generator submodule was removed in scikit-learn 0.24
from sklearn.preprocessing import StandardScaler

In [179]:
Vienna_kmeans = Vienna_clusters_additional.drop(["Neighborhood"], axis = 1)

In [180]:
Vienna_kmeans

Unnamed: 0,Postal Code,GrpIdx,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Asian_cuisine_available,Neighborhood-Latitude,Neighborhood-Longitude
0,1090,1,16.35,8.2,1,0,36738,4,0,1,48.2333,16.35
32,1200,2,13.95,6.4,0,0,26130,3,0,0,48.2402,16.3773
52,1190,3,16.44,6.1,0,0,42260,4,0,0,48.2591,16.3339
63,1220,4,15.59,22.0,1,1,38125,2,1,0,48.219,16.495
69,1100,5,16.25,15.8,1,0,27246,1,1,0,48.1521,16.3878
75,1210,6,14.11,14.4,0,0,33274,1,0,0,48.2811,16.4113
82,1170,7,14.14,9.1,1,0,32378,1,1,0,48.2338,16.2901
91,1130,8,14.88,6.0,1,0,44674,5,0,0,48.1773,16.2456
93,1010,9,19.96,2.7,1,1,40116,5,1,1,48.2077,16.3705
183,1080,10,15.21,7.7,1,1,37745,4,1,1,48.2167,16.35


In [181]:
x = Vienna_kmeans.values[:,2:]
x = np.nan_to_num(x)
clustered = StandardScaler().fit_transform(x)
clustered

array([[ 0.6050205 , -0.46285753,  0.45883147, -0.95742711,  0.42605815,
         0.48731592, -0.95742711,  1.24721913,  0.8155171 , -0.0148618 ],
       [-1.23504185, -0.90235986, -2.17944947, -0.95742711, -1.67295185,
        -0.21320072, -0.95742711, -0.80178373,  1.02768524,  0.45882964],
       [ 0.67402284, -0.97561025, -2.17944947, -0.95742711,  1.51869891,
         0.48731592, -0.95742711, -0.80178373,  1.60884144, -0.29421828],
       [ 0.02233409,  2.90666036,  0.45883147,  1.04446594,  0.7005045 ,
        -0.91371736,  1.04446594, -0.80178373,  0.37580632,  2.5010817 ],
       [ 0.52835124,  1.39281899,  0.45883147, -0.95742711, -1.4521284 ,
        -1.614234  ,  1.04446594, -0.80178373, -1.68130214,  0.64101865],
       [-1.11237103,  1.05098384, -2.17944947, -0.95742711, -0.25936518,
        -1.614234  , -0.95742711, -0.80178373,  2.28531956,  1.04877501],
       [-1.08937025, -0.24310636,  0.45883147, -0.95742711, -0.43665712,
        -1.614234  ,  1.04446594, -0.80178373
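
Each column of the array above is a z-score: the column mean is subtracted and the result is divided by the column's standard deviation. The sketch below reproduces that calculation with NumPy for the rental prices of the ten districts displayed earlier. Because this is only a subset of the full table, the resulting numbers differ from the notebook's output.

```python
import numpy as np

# Rental prices per sqm of the ten districts displayed above (a subset of the
# full table, so these z-scores differ from the notebook's output).
rent = np.array([16.35, 13.95, 16.44, 15.59, 16.25, 14.11, 14.14, 14.88, 19.96, 15.21])

# StandardScaler's default: subtract the column mean, then divide by the
# population standard deviation (ddof=0).
z = (rent - rent.mean()) / rent.std(ddof=0)

print(z.round(2))
```

Since the feature matrix `x` also contains the binary indicators and the coordinates, each of those columns enters the distance calculation with the same weight as rent or income after scaling.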

In [182]:
num_of_clstr = 6

In [183]:
k_means = KMeans(init = "k-means++", n_clusters = num_of_clstr, n_init = 12)
k_means.fit(clustered)
labels = k_means.labels_

neighborhoods_venues_sorted["cluster"] = labels

We now also create a new dataframe that consists of the most common venues, our demographics as well as the clusters. Based on this dataframe we can then commence our assessment.

In [191]:
neighborhoods_venues_sorted_clustered = pd.merge(neighborhoods_venues_sorted, Vienna_clusters_additional, on = "Neighborhood")
neighborhoods_venues_sorted_clustered["cluster"].value_counts()

0    7
5    4
2    4
3    3
1    3
4    2
Name: cluster, dtype: int64

In [185]:
vienna_coordinates = "/Users/nikolas.anic/Desktop/ML/GeoJSON/Vienna_Bezirke.geojson"

In [157]:
# create 5 linearly spaced integer thresholds between the smallest and the largest cluster label
threshold_scale = np.linspace(neighborhoods_venues_sorted_clustered['cluster'].min(),
                              neighborhoods_venues_sorted_clustered['cluster'].max(),5, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1

Vienna_map_clusters = folium.Map(location = [latitude_vienna, longitude_vienna], zoom_start = 11)

Vienna_map_clusters.choropleth(
    geo_data=vienna_coordinates,
    data=neighborhoods_venues_sorted_clustered,
    columns=['Postal Code', 'cluster'],
    key_on='feature.properties.DISTRICT_C',
    threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Clusters',
    reset=True
)

Vienna_map_clusters
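
With six clusters labelled 0 to 5, five linearly spaced integer thresholds define only four colour bins, so the highest labels share a colour. The sketch below reproduces the thresholds from the cell above and shows one possible alternative with a separate bin per label. The alternative is an assumption, not the notebook's code.

```python
import numpy as np

lo, hi = 0, 5  # smallest and largest cluster label when num_of_clstr = 6

# Edges as built in the cell above: five points, converted to integers.
edges = np.linspace(lo, hi, 5, dtype=int).tolist()
edges[-1] = edges[-1] + 1
print(edges)  # [0, 1, 2, 3, 6]

# Alternative (an assumption, not the notebook's code): one bin per label.
per_cluster_edges = list(range(lo, hi + 2))
print(per_cluster_edges)  # [0, 1, 2, 3, 4, 5, 6]
```

Note also that the cluster labels are categories without a natural order, so a sequential palette such as `YlOrRd` suggests a ranking that k-means does not provide.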

In [167]:
Vienna_nvsc

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,cluster,Postal Code,GrpIdx,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Asian_cuisine_available,Neighborhood-Latitude,Neighborhood-Longitude
0,Alsergrund,Supermarket,Café,Pharmacy,Pub,BBQ Joint,1,1090,1,16.35,8.2,1,0,36738,4,0,1,48.2333,16.35
1,Brigittenau,Bus Stop,Plaza,Supermarket,Bakery,Tram Station,5,1200,2,13.95,6.4,0,0,26130,3,0,0,48.2402,16.3773
2,Doebling,Austrian Restaurant,Wine Bar,Restaurant,Food,Park,1,1190,3,16.44,6.1,0,0,42260,4,0,0,48.2591,16.3339
3,Donaustadt,American Restaurant,Indian Restaurant,Food & Drink Shop,Restaurant,Bus Stop,3,1220,4,15.59,22.0,1,1,38125,2,1,0,48.219,16.495
4,Favoriten,Shopping Mall,Italian Restaurant,Smoke Shop,Athletics & Sports,Grocery Store,3,1100,5,16.25,15.8,1,0,27246,1,1,0,48.1521,16.3878
5,Floridsdorf,Gastropub,Wine Shop,Wine Bar,Vineyard,Fast Food Restaurant,5,1210,6,14.11,14.4,0,0,33274,1,0,0,48.2811,16.4113
6,Hernals,Italian Restaurant,Athletics & Sports,Construction & Landscaping,Bakery,Austrian Restaurant,2,1170,7,14.14,9.1,1,0,32378,1,1,0,48.2338,16.2901
7,Hietzing,Scenic Lookout,Yoga Studio,Electronics Store,Food Court,Food & Drink Shop,1,1130,8,14.88,6.0,1,0,44674,5,0,0,48.1773,16.2456
8,Innere Stadt,Restaurant,Café,Plaza,Austrian Restaurant,Italian Restaurant,0,1010,9,19.96,2.7,1,1,40116,5,1,1,48.2077,16.3705
9,Josefstadt,Café,Hotel,Supermarket,Bar,Restaurant,0,1080,10,15.21,7.7,1,1,37745,4,1,1,48.2167,16.35


**Now we have a first visualization of the clusters based on our indicators**. 

#### As we now have our final dataset, including the clusters obtained with k-means, we can carry out the analysis based on our criteria.


We can start with the fact that we would prefer a well-frequented area, preferably in a neighborhood that benefits from tourism. As a consequence, we can initially filter according to the tourist indicator. This can also be seen from the clustered visualization. Furthermore, we can combine this approach with the indicators for AirBnB availability, which we retrieved earlier. We may want to filter according to either five or 50 available offerings.

Next, we want to open the restaurant in an area that is not currently saturated with Asian food and where competition is therefore less fierce.

Third, we need to position ourselves within a given price range. As defined earlier, we want to offer a Japanese form of Vapiano, implying that prices must remain in an affordable, but not cheap, range. Further assuming that the taste for exotic food is more pronounced within trendier neighborhoods primarily inhabited by young professionals and families, we may require that gross income is not in the lower third and that rental prices are above the mean value.

Fourth, we can assess the remaining neighborhoods according to their most common venues. If, for example, a neighborhood already offers a wide range of restaurants, but of a different kind, then this may indicate, as we already filtered according to the "trendiness" of a neighborhood, that a well-known and much appreciated food scene is established in which a new kind of cuisine is likely to be welcomed. On the other hand, we could also profit in neighborhoods where restaurants are not among the most common venues, as we could obtain a first-mover advantage and, potentially, lower rental prices.

Lastly, if the previous steps do not leave a sufficiently small number of neighborhoods to choose from, we may filter according to the Google searches for the respective area.




In [159]:
Vienna_nvsc = neighborhoods_venues_sorted_clustered


# We now execute the first three steps:

Vienna_nvsc.loc[(Vienna_nvsc["GrpIdx"].isin([4,5,7,9,10,11,12,15,17,18,23])) 
                & (Vienna_nvsc["Asian_cuisine_available"] == 0) 
                & (Vienna_nvsc["Rental Prices per sqm"] >= Vienna_nvsc["Rental Prices per sqm"].mean())
                & (Vienna_nvsc["Income Gross"] >= 31000)]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,cluster,Postal Code,GrpIdx,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Asian_cuisine_available,Neighborhood-Latitude,Neighborhood-Longitude
3,Donaustadt,American Restaurant,Indian Restaurant,Food & Drink Shop,Restaurant,Bus Stop,3,1220,4,15.59,22.0,1,1,38125,2,1,0,48.219,16.495
11,Leopoldstadt,Theme Park Ride / Attraction,Restaurant,Café,Hotel,Museum,0,1020,12,16.51,12.3,1,1,33189,4,1,0,48.2167,16.4
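
The filter above can also be written with one named mask per criterion, which makes it easy to see how many districts each step removes. The sketch below uses six rows copied from the tables above, with column names shortened for readability. It filters on the tourist indicator directly, on the assumption that the hard-coded `GrpIdx` list corresponds to the districts flagged as tourist areas.

```python
import pandas as pd

# Six rows copied from the tables above; column names are shortened for the sketch.
df = pd.DataFrame({
    "Neighborhood": ["Brigittenau", "Donaustadt", "Favoriten",
                     "Hernals", "Josefstadt", "Leopoldstadt"],
    "rent": [13.95, 15.59, 16.25, 14.14, 15.21, 16.51],
    "income": [26130, 38125, 27246, 32378, 37745, 33189],
    "tourist": [0, 1, 1, 1, 1, 1],
    "asian": [0, 0, 0, 0, 1, 0],
})

# One named mask per criterion. The mean is taken over this sample only,
# whereas the notebook uses the mean over all districts.
is_tourist = df["tourist"] == 1
no_asian = df["asian"] == 0
rent_ok = df["rent"] >= df["rent"].mean()
income_ok = df["income"] >= 31000  # threshold used in the cell above

for name, mask in [("tourist", is_tourist), ("no asian", no_asian),
                   ("rent", rent_ok), ("income", income_ok)]:
    print(f"{name}: {int(mask.sum())} of {len(df)} pass")

selected = df[is_tourist & no_asian & rent_ok & income_ok]
print(selected["Neighborhood"].tolist())
```

On this sample the combined filter keeps Donaustadt and Leopoldstadt, in line with the notebook's result on the full table.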


#### As expected, the range of suitable neighborhoods declined dramatically. We are left with only two options: the 2nd district, Leopoldstadt, and the 22nd district, Donaustadt.

Both areas are shown below:

In [165]:
Vienna_selected = Vienna.loc[Vienna["Postal Code"].isin([1020, 1220])]

In [166]:
address = 'Vienna, AT'

geolocator = Nominatim(user_agent="foursquare_agent") # call the geolocator 

location = geolocator.geocode(address)
latitude_vienna = location.latitude
longitude_vienna = location.longitude

vienna_map = folium.Map(location = [latitude_vienna, longitude_vienna], zoom_start = 12)

folium.features.CircleMarker(
    [latitude_vienna, longitude_vienna],
    radius=10,
    color='red',
    popup='District Center',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6,
    
).add_to(vienna_map)

for lat, lng, label in zip(Vienna_selected.Venue_Latitude, Vienna_selected.Venue_Longitude, Vienna_selected.Venue_Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        fill = True,
        fill_color='blue',
        fill_opacity=0.6,
        popup=folium.Popup(label, parse_html=True)
    ).add_to(vienna_map)
vienna_map

In [197]:
Vienna_last = Vienna_nvsc.iloc[[3,11],:]
Vienna_last

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,cluster,Postal Code,GrpIdx,Rental Prices per sqm,Growth last decade %,AirBnb availability (>5),AirBnB availability (>50),Income Gross,Google searches for rental flat ranked,Tourist Area inidicator,Asian_cuisine_available,Neighborhood-Latitude,Neighborhood-Longitude
3,Donaustadt,American Restaurant,Indian Restaurant,Food & Drink Shop,Restaurant,Bus Stop,3,1220,4,15.59,22.0,1,1,38125,2,1,0,48.219,16.495
11,Leopoldstadt,Theme Park Ride / Attraction,Restaurant,Café,Hotel,Museum,0,1020,12,16.51,12.3,1,1,33189,4,1,0,48.2167,16.4


As we can see, Donaustadt is still a work in progress. Although parts of the district are well established, it has comparatively few venues and reviews on FourSquare, which may indicate that tourists visit the area less frequently. Further, Donaustadt falls into the same cluster as Favoriten in southern Vienna (cluster 3 in the table above). As these neighborhoods are commonly regarded as socio-economically burdened and as predominantly inhabited by people with an Islamic-Arabic background, a cultural scene may already exist in which demand for the proposed restaurant type is not yet established.

On the other hand, Leopoldstadt is both more centrally located and better equipped with venues. As the list above shows, its most common venues are theme park rides and attractions, restaurants, cafés, hotels and museums. Furthermore, rents grew by about 12 percent over the last decade, and the neighborhood sits in one of the higher Google search categories for rental flats (rank 4). Moreover, the recent rise in facilities for public-educational as well as private-economic purposes indicates that the area is frequented every day by thousands of people who come from an educated background, are at least partly young and, potentially, have a greater interest in foreign cultures and international cuisine. In essence, Leopoldstadt offers an interesting socio-economic, cultural and construction-oriented pattern from which we, as a Japanese restaurant chain, are likely to profit.

Therefore, let's look at the area more closely: 

In [78]:
Vienna_selected = Vienna.loc[Vienna["Postal Code"].isin([1020])]

Leo_map = folium.Map(location = [latitude_vienna, longitude_vienna], zoom_start = 14)

folium.features.CircleMarker(
    [latitude_vienna, longitude_vienna],
    radius=10,
    color='red',
    popup='District Center',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6,
    
).add_to(Leo_map)

for lat, lng, label in zip(Vienna_selected.Venue_Latitude, Vienna_selected.Venue_Longitude, Vienna_selected.Venue_Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        fill = True,
        fill_color='blue',
        fill_opacity=0.6,
        popup=folium.Popup(label, parse_html=True)
    ).add_to(Leo_map)


Leo_map

As we can see, the area contains an exhibition site, a university, three underground railway stations, parks and a wide range of attractions. This concludes our analysis: we have identified one district that appears promising for opening our Japanese-style restaurant.

### Results and Conclusion

The analysis revealed that Vienna has a wide range of cultural and gastronomic venues throughout the entire city. Although important, the prevalence of existing restaurants within a given area cannot be the sole basis of such an analysis; the socio-economic and cultural background of the respective neighborhood matters as well. To satisfy both constraints, a coherent analysis must include both factors. We did so by including indicators obtained from public as well as private analytics sites.

Working with the dataset, we first obtained a list of existing venues for each neighborhood (in our case: district). We cleaned the dataset and included the additional characteristics. Afterwards, we identified the existing venues per neighborhood and determined both the existence and the prevalence of venue categories within each neighborhood. In addition to geographic location, we then defined clusters based on socio-economic characteristics to portray the differences between neighborhoods deemed to have an impact on demand for the chain's services. This gave important insights into the socio-economic distribution within the city itself. Then, we combined said distribution with the additional characteristics as well as the prevalence indicators of Asian cuisine and restaurant density and narrowed down the choice to two sites, namely Donaustadt and Leopoldstadt.

Further using analytics data from Google searches and AirBnB offerings, we identified the location in which cultural sites are prevalent, which removed Donaustadt from our analysis. This step left us with Leopoldstadt, which appears to suit the requirements of the stakeholders best.

Please bear in mind that, although this area meets the respective requirements best, it cannot be guaranteed that the pre-selected criteria indeed form a valid basis for analysis. It was our aim to cluster according to certain socio-economic, touristic and gastronomic considerations. These considerations may, however, be based on incorrect assumptions. Consequently, the recommended area should be considered only as a starting point for more detailed analysis.

In a next step, the district of Leopoldstadt should be analyzed in a more nuanced fashion, by incorporating the stakeholders' views on economic considerations (e.g. rental prices, advertising prices, preferred location with infrastructural needs) and by refining the assessment of location attractiveness through further research.