# Coursera IBM Data Science Capstone Project - full report & notebook


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project I will try to find an optimal location for a new Greek restaurant. Specifically, this report is targeted at stakeholders interested in opening a **Greek restaurant** in the city of **Mannheim**, Germany. It is assumed that the current owner of a Greek restaurant in another town is about to open another Greek restaurant in Mannheim. To find a good spot for it, this owner, the main stakeholder of this report, asked me to analyze the current situation of Greek restaurants in the city of Mannheim.

The location should not be crowded with competitors. As there are quite a lot of restaurants in Mannheim, only Greek restaurants are treated as competitors here; all other restaurant categories are outside the scope of this report. Areas with no Greek restaurants nearby are especially interesting. The location should also be as close to the city center as possible: an area within a radius of 300 m around the center of Mannheim is considered close.

The city center of Mannheim is taken to be the 'Paradeplatz'.

I will use data science methods to identify a few of the most promising places based on these criteria. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen by the stakeholders.

## Data <a name="data"></a>

To sum up these criteria, based on the definition of the problem, the factors that will influence the decision are:
* number of existing restaurants in the center of Mannheim (any type of restaurant)
* number of and distance to greek restaurants in the neighborhood
* distance of neighborhood from city center

I decided to use a regularly spaced grid of locations, centered on the city center, to define the neighborhoods.

The following data sources will be needed to extract/generate the required information:
* the latitude & longitude (coordinates) of the center of Mannheim will be obtained by geocoding the address with the Nominatim geocoder (via the geopy library)
* the number of restaurants and their type and location will be obtained using the Foursquare API

### Find the center of Mannheim

Let's create latitude & longitude coordinates for the center of Mannheim. I'll create a grid of cells covering our area of interest, which lies within a radius of approx. 2000 meters around the center of Mannheim.
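The grid itself could be built along the following lines. This is only a minimal sketch under stated assumptions: the 300 m spacing, the flat-earth conversion factor of about 111,320 m per degree of latitude, and the helper name `make_grid` are illustrative choices, not part of the original notebook. The center coordinates are the ones geocoded for Paradeplatz in this report.

```python
import math

# Centre of Mannheim (Paradeplatz), as geocoded in this notebook.
CENTER_LAT, CENTER_LON = 49.487182250000004, 8.466282438198064
METERS_PER_DEG_LAT = 111_320.0  # rough average value; an assumption of this sketch


def make_grid(center_lat, center_lon, radius_m=2000, spacing_m=300):
    """Return (lat, lon) grid points lying within radius_m of the centre."""
    # One degree of longitude shrinks with the cosine of the latitude.
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(center_lat))
    steps = int(radius_m // spacing_m)
    points = []
    for i in range(-steps, steps + 1):          # north-south offsets
        for j in range(-steps, steps + 1):      # east-west offsets
            dy, dx = i * spacing_m, j * spacing_m
            if math.hypot(dx, dy) <= radius_m:  # keep the circular area only
                points.append((center_lat + dy / METERS_PER_DEG_LAT,
                               center_lon + dx / meters_per_deg_lon))
    return points


grid = make_grid(CENTER_LAT, CENTER_LON)
print(len(grid), "candidate cell centres")
```

A flat-earth approximation is adequate at this scale (a few kilometers); for larger areas a proper map projection would be the safer choice.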

Let's first find the latitude & longitude of Mannheim city center, using the specific, well-known address of Mannheim Paradeplatz and the Nominatim geocoder from the geopy library.

### Import necessary libraries

In [23]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import math


!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# JSON is flattened into a pandas dataframe with pd.json_normalize (no separate import needed)


! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

address = 'Paradeplatz, Mannheim, Germany'



Folium installed
Libraries imported.


In [16]:
# Foursquare credentials
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (never publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your Foursquare access token
VERSION = '20180604'
LIMIT = 30


#find coordinates of address
address = 'Paradeplatz, Mannheim, Germany'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
location_coordinates = [latitude, longitude]
print('latitude:',latitude, '\nlongitude:',longitude)

latitude: 49.487182250000004 
longitude: 8.466282438198064


OK, we now have the coordinates of the city center of Mannheim.

Let's now search Foursquare for venues matching 'Greek' around these coordinates.

In [3]:
search_query = 'Greek'
radius = 2000 #greek restaurant in a radius of 2000 m
print(search_query + ' .... OK!')
# build the request URL from the variables defined above
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json() # send the request and parse the JSON response

Greek .... OK!
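As an alternative to formatting the URL by hand, the query string can be assembled from named parameters. The following is a sketch only: the helper name `build_search_url` and the placeholder credentials are assumptions for illustration, while the endpoint and parameter names are the ones used in the request above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, oauth_token, lat, lon,
                     version, query, radius, limit):
    """Assemble the venue-search URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lon}",
        "oauth_token": oauth_token,
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "ll" readable instead of encoding it as %2C
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


# Placeholder credentials, for illustration only.
url = build_search_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "YOUR_ACCESS_TOKEN",
                       49.487182250000004, 8.466282438198064,
                       "20180604", "Greek", 2000, 30)
print(url)
```

Naming each parameter makes it harder to pass values in the wrong order, and `urlencode` takes care of escaping search terms that contain spaces or special characters.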


In [4]:
# get the relevant part of the JSON and transform it into a pandas dataframe
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
df = pd.json_normalize(venues)
df.head()


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress
0,540de7b9498eadd8beb91a85,Greek Food Bar -Restaurant,"[{'id': '4bf58dd8d48988d10e941735', 'name': 'G...",v-1617184544,False,"Q2, 14",49.488117,8.468732,"[{'label': 'display', 'lat': 49.48811654211074...",205,68161.0,DE,Mannheim,Baden-Württemberg,Deutschland,"[Q2, 14, 68161 Mannheim]"
1,5bbbf9e8b8fd9d002c902466,Ellin - Original Greek,"[{'id': '4bf58dd8d48988d10e941735', 'name': 'G...",v-1617184544,False,"E3, 1",49.489254,8.464833,"[{'label': 'display', 'lat': 49.489254, 'lng':...",253,68161.0,DE,Mannheim,Baden-Württemberg,Deutschland,"[E3, 1, 68161 Mannheim]"
2,5cc1b79604d1ae002ca44b99,Greek Food,"[{'id': '4bf58dd8d48988d10e941735', 'name': 'G...",v-1617184544,False,Im Zollhof 4,49.486038,8.447235,"[{'label': 'display', 'lat': 49.486038, 'lng':...",1383,,DE,Ludwigshafen am Rhein,Rheinland-Pfalz,Deutschland,"[Im Zollhof 4, Ludwigshafen am Rhein]"
3,5d092fe02d2fd9002cd9605e,Mykonos Greek Food,"[{'id': '4bf58dd8d48988d1df931735', 'name': 'B...",v-1617184544,False,Bahnhofstraße 22,49.4823,8.444235,"[{'label': 'display', 'lat': 49.4823, 'lng': 8...",1684,67059.0,DE,Ludwigshafen am Rhein,Rheinland-Pfalz,Deutschland,"[Bahnhofstraße 22, 67059 Ludwigshafen am Rhein]"


In [6]:
# Define information of interest and filter dataframe
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in df.columns if col.startswith('location.')] + ['id']
df_filtered = df.loc[:, filtered_columns].copy() # copy, so the assignment below does not write into a view of df

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_filtered['categories'] = df_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_filtered.columns = [column.split('.')[-1] for column in df_filtered.columns]

df_filtered

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id
0,Greek Food Bar -Restaurant,Greek Restaurant,"Q2, 14",49.488117,8.468732,"[{'label': 'display', 'lat': 49.48811654211074...",205,68161.0,DE,Mannheim,Baden-Württemberg,Deutschland,"[Q2, 14, 68161 Mannheim]",540de7b9498eadd8beb91a85
1,Ellin - Original Greek,Greek Restaurant,"E3, 1",49.489254,8.464833,"[{'label': 'display', 'lat': 49.489254, 'lng':...",253,68161.0,DE,Mannheim,Baden-Württemberg,Deutschland,"[E3, 1, 68161 Mannheim]",5bbbf9e8b8fd9d002c902466
2,Greek Food,Greek Restaurant,Im Zollhof 4,49.486038,8.447235,"[{'label': 'display', 'lat': 49.486038, 'lng':...",1383,,DE,Ludwigshafen am Rhein,Rheinland-Pfalz,Deutschland,"[Im Zollhof 4, Ludwigshafen am Rhein]",5cc1b79604d1ae002ca44b99
3,Mykonos Greek Food,BBQ Joint,Bahnhofstraße 22,49.4823,8.444235,"[{'label': 'display', 'lat': 49.4823, 'lng': 8...",1684,67059.0,DE,Ludwigshafen am Rhein,Rheinland-Pfalz,Deutschland,"[Bahnhofstraße 22, 67059 Ludwigshafen am Rhein]",5d092fe02d2fd9002cd9605e


Let's show the Greek restaurants in the area on a map.

In [24]:
# generate map centred around the Paradeplatz Mannheim
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# add a red circle marker to represent the center (Paradeplatz)
folium.CircleMarker(
    [latitude, longitude],
    radius=9,
    color='red',
    popup='Paradeplatz',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the greek restaurants as blue circle markers
for lat, lng, label in zip(df_filtered.lat, df_filtered.lng, df_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### Foursquare
With the city center located, I use the Foursquare API to get information on Greek restaurants around it.

Only Greek restaurants count as direct competitors in this report, so the search is restricted to venues matching the query 'Greek' within 2 km of Paradeplatz. Coffee shops, pizza places, bakeries and all other restaurant categories are ignored.

The search returned four venues. Foursquare categorizes three of them as 'Greek Restaurant' and one ('Mykonos Greek Food') as 'BBQ Joint'; all four are treated as Greek restaurants in what follows. Two of them are located in Mannheim and two are listed under Ludwigshafen am Rhein.

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on optimal locations for a new Greek restaurant!

## Methodology <a name="methodology"></a>

This analysis focuses on places with a low density of Greek restaurants that are as close as possible to the center of Mannheim, the Paradeplatz. As already mentioned, the area of interest extends about 2 km around the city center.

In the first step, I collected the required **data: location and type (category) of every venue returned by the Foursquare search for 'Greek' within 2 km of Mannheim center** (Paradeplatz), and thereby **identified the Greek restaurants** (according to Foursquare categorization).

The second step in this analysis is the calculation and exploration of the '**restaurant density**' of those four Greek restaurants. To this end, we will look at the distance of each restaurant from the Paradeplatz.

In the third and final step I will focus on the most promising areas within Mannheim city center. As established in the discussion with stakeholders:
* only locations with **no other Greek restaurant within a radius of 300 meters** are taken into consideration.

I will present a map of all such locations but also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for final 'street level' exploration and search for optimal venue location by stakeholders.
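To illustrate what k-means does on a data set this small, here is a minimal sketch that clusters the four Foursquare distances to Paradeplatz (reported in the Analysis section) into two groups. It is a plain one-dimensional implementation of Lloyd's algorithm written for illustration; the function name `kmeans_1d` and the deterministic initialization are assumptions of this sketch, and it requires `k >= 2`.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Plain Lloyd's algorithm on scalar values; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [float(ordered[round(i * (len(ordered) - 1) / (k - 1))])
                 for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids


distances_m = [205, 253, 1383, 1684]  # Foursquare distances to Paradeplatz
labels, centroids = kmeans_1d(distances_m, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [229.0, 1533.5]
```

With only four venues the grouping is obvious by eye, but the same procedure would scale to a larger set of venues or to two-dimensional coordinates.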

## Analysis <a name="analysis"></a>

OK, now let's analyze the **distance of each Greek restaurant from Paradeplatz** (all four restaurants, not only those within 300 m).

Let's read the distance from the Foursquare JSON response.

In [8]:
distance_to_center = df_filtered['distance']
distance_to_center

0     205
1     253
2    1383
3    1684
Name: distance, dtype: int64
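The distances reported by Foursquare can be cross-checked against the coordinates in the dataframe. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km (an assumption of this sketch; Foursquare's own method is not documented here), and the function name `haversine_m` is illustrative. The coordinates are the rounded values shown in the table above, so small deviations of a meter or two are to be expected.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


CENTER = (49.487182250000004, 8.466282438198064)  # Paradeplatz
# (name, lat, lng, distance reported by Foursquare in metres)
VENUES = [
    ("Greek Food Bar -Restaurant", 49.488117, 8.468732, 205),
    ("Ellin - Original Greek", 49.489254, 8.464833, 253),
    ("Greek Food", 49.486038, 8.447235, 1383),
    ("Mykonos Greek Food", 49.4823, 8.444235, 1684),
]

for name, lat, lng, reported in VENUES:
    computed = haversine_m(*CENTER, lat, lng)
    print(f"{name}: computed {computed:.0f} m, Foursquare {reported} m")
```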

In [9]:
# average distance of the greek restaurants from the center
print('Average distance of greek restaurants from center:', df_filtered['distance'].mean()) 

Average distance of greek restaurants from center: 881.25


OK, so I can see that two Greek restaurants are within roughly 250 meters of the center of the city of Mannheim (205 m and 253 m). We need to search carefully for a possible place for the new Greek restaurant.

An area of low Greek-restaurant density close to the city center appears to lie **south and south-east of Paradeplatz**. At this point, we only want to focus on Greek restaurants.
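The direction of each restaurant as seen from Paradeplatz can also be computed rather than read off the map. The following sketch uses the standard initial-bearing formula on a sphere; the function names `initial_bearing_deg` and `compass` and the eight-point compass split are illustrative assumptions, and the coordinates are the rounded values from the dataframe above.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2 (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    east = math.sin(dlmb) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.degrees(math.atan2(east, north)) % 360


def compass(bearing):
    """Map a bearing in degrees to one of eight compass points."""
    names = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return names[int((bearing + 22.5) // 45) % 8]


CENTER = (49.487182250000004, 8.466282438198064)  # Paradeplatz
VENUES = [
    ("Greek Food Bar -Restaurant", 49.488117, 8.468732),
    ("Ellin - Original Greek", 49.489254, 8.464833),
    ("Greek Food", 49.486038, 8.447235),
    ("Mykonos Greek Food", 49.4823, 8.444235),
]

for name, lat, lng in VENUES:
    bearing = initial_bearing_deg(*CENTER, lat, lng)
    print(f"{name}: {bearing:.0f} degrees ({compass(bearing)})")
```

With these coordinates, none of the four bearings falls in the sector between east and south, which is consistent with the observation above.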


I can conclude that near the center of Mannheim, within a radius of less than 300 m, two Greek restaurants already exist.
For this reason, we will focus on an area with a bigger radius around the center.
To do so, we take the average distance of the Greek restaurants from the center, which is **881 m**.
To conclude, we want to open our restaurant within a radius of 881 m around Paradeplatz, preferably south or south-east of it.
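These criteria can be turned into a simple test for any candidate spot. The sketch below is an illustration under assumptions: it combines the ring between 255 m and 881 m around Paradeplatz (the two radii drawn on the map in the next cell) with the 300 m no-competitor rule from the Methodology section, and it uses a flat local projection with about 111,195 m per degree of latitude. The names `to_local_m` and `is_candidate` are hypothetical helpers.

```python
import math

CENTER = (49.487182250000004, 8.466282438198064)  # Paradeplatz
GREEK = [(49.488117, 8.468732), (49.489254, 8.464833),
         (49.486038, 8.447235), (49.4823, 8.444235)]
M_PER_DEG = 111_195.0  # metres per degree of latitude on a 6371 km sphere (approx.)


def to_local_m(lat, lon):
    """Approximate (east, north) offset in metres from Paradeplatz."""
    east = (lon - CENTER[1]) * M_PER_DEG * math.cos(math.radians(CENTER[0]))
    north = (lat - CENTER[0]) * M_PER_DEG
    return east, north


RESTAURANTS_M = [to_local_m(lat, lon) for lat, lon in GREEK]


def is_candidate(east, north, inner=255, outer=881, min_gap=300):
    """True if the offset lies in the ring and no Greek restaurant is within min_gap."""
    in_ring = inner <= math.hypot(east, north) <= outer
    clear = all(math.hypot(east - e, north - n) >= min_gap for e, n in RESTAURANTS_M)
    return in_ring and clear


print(is_candidate(0, -500))  # 500 m due south of Paradeplatz -> True
print(is_candidate(0, 300))   # 300 m due north, too close to a competitor -> False
```

Offsets are given in meters east and north of Paradeplatz, so a negative `north` value means south of the center.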

OK. Now let's show our area of interest on a map.

In [46]:
# generate map centred around the Paradeplatz Mannheim
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# add a red circle marker to represent the center (Paradeplatz)
folium.CircleMarker(
    [latitude, longitude],
    radius=5,
    color='red',
    popup='Paradeplatz',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the greek restaurants as blue circle markers
for lat, lng, label in zip(df_filtered.lat, df_filtered.lng, df_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

    
# add two green circles marking our area of interest:
# the ring from 255 m to 881 m around the center
folium.Circle([latitude, longitude], radius=255, color='green', fill=False, fill_opacity=1).add_to(venues_map)
folium.Circle([latitude, longitude], radius=881, color='green', fill=False, fill_opacity=1).add_to(venues_map)

# display map
venues_map

Looking good. In the area between the two green circles, there are possible good locations to open a Greek restaurant.
This makes it easier for the owner of the new Greek restaurant to select a good spot.
Any location between the two green circles would be a reasonable fit, although the owner
will likely look for a spot as far as possible from the two competitors in the center of Mannheim.
Therefore, it is recommended to look for a spot south of the center of Mannheim.

 


This concludes my analysis.

## Results and Discussion <a name="results"></a>

This analysis shows that although there are already Greek restaurants within approximately 2 km of the center of Mannheim, there are areas of low Greek-restaurant density fairly close to the city center. In particular, south of Paradeplatz, which I take as the center of the city of Mannheim, there is a large area with possible spots for a new Greek restaurant.

For this analysis, the center of the city of Mannheim is assumed to be Paradeplatz with the following address and coordinates:
- address: 'Paradeplatz, Mannheim, Germany'
- coordinates: latitude (49.487182250000004) & longitude(8.466282438198064)

Within a radius of 2 km around the Paradeplatz, four Greek restaurants were found. The goal of this analysis is to find possible spots with a low density of Greek restaurants. These spots should be as close to the Paradeplatz as possible, where a radius of at most 300 m around the center counts as very close. The analysis found that two Greek restaurants already exist within roughly 250 m of the center (205 m and 253 m), whereas the other two are at distances of 1383 m and 1684 m. It can be concluded that the very center of the city of Mannheim is already served by two Greek restaurants.

Interestingly, reviewing the location map showing all four restaurants with their respective distance to the Paradeplatz, it can be concluded that there are basically two clusters of Greek restaurants within the total radius of 2 km:
- one cluster consists of the two restaurants close to the center
- another cluster consists of the two restaurants quite far away from the center.

In summary, the area between these two clusters, beyond the immediate center but well short of the distant pair, has a low density of Greek restaurants.

All in all, the chosen project did not yield as extensive a data set from the Foursquare API as assumed at first. Nevertheless, analyzing the current distribution of Greek restaurants produced interesting insights. One hypothesis is that the ongoing COVID-19 pandemic is a possible reason for the low number of Greek restaurants found; this report was written in March 2021. The hypothesis is of interest because I expected far more Greek restaurants in the center of the city of Mannheim when starting this final capstone project.


## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify spots in Mannheim close to the center with a low density of Greek restaurants, in order to aid the main stakeholder in narrowing down the search for an optimal location for a new Greek restaurant. First, the current distribution of Greek restaurants was identified from Foursquare data: four Greek restaurants were found within a radius of 2 km around the city center. Then the distance of each restaurant from the center was examined. The four restaurants fall into two clusters, as two are located very close to the center and two are located quite far from the center of Mannheim. From this, a zone of promising spots for a new Greek restaurant was derived and is presented to the stakeholder. This zone starts at a radius of 255 m around the Paradeplatz and reaches to the average distance of all four Greek restaurants from the center, which is a radius of 881 m.

The final decision on an optimal restaurant location will be made by the main stakeholder based on his specific requirements for a good spot. To assist this decision, a spot south of the city center is recommended, as the two Greek restaurants near the center are slightly north of the Paradeplatz.