# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

### 1.1 Background

Attica is the most populous region in Greece, home to more than a third of the country's population. The region includes Megaris as part of the regional unit West Attica, and the Saronic Islands and Cythera, as well as the municipality of Troizinia on the Peloponnesian mainland, as the regional unit Islands. Because the area of Attica is so wide, it is very difficult to determine with a traditional approach which location has the best prospects for a new restaurant.

We will explore latitude and longitude data in order to find the best place to open a restaurant, using Foursquare location data on the most frequent venues in each location.

### 1.2 Problem

In this project we use Foursquare data to find the locations in which restaurants are the most frequent venues.

It is also important to answer the following question: "Which neighborhoods exhibit the same characteristics based on their venues?" To answer it, a clustering method is applied. The neighborhoods are categorized into 5 clusters with similar characteristics. This project aims to choose the best location for restaurants in each cluster.

### 1.3 Interest

Many businesses in the food industry, as well as advertising companies, would be very interested in the problems stated above. For instance, a fast-food franchise wants to know where to open restaurants. The best strategy would be to open one restaurant in each cluster. Moreover, knowing the locations where restaurants are the most frequent venues is very important for the business plan of a company in the food industry.

## Data <a name="data"></a>

### 2.1 Data sources

First of all, we need data for all the areas of Greece. Note that Greece has 408 sub regions based on data from https://simplemaps.com. The data for this project was obtained from public websites, namely Foursquare and simplemaps. The initial dataset was downloaded from https://simplemaps.com/data/gr-cities in CSV format and then imported into Jupyter through pandas.

### 2.2 Data cleaning

First, the data was loaded into a dataframe. The initial dataset had 4 columns (city, lat, lng and Region). We kept only the rows whose region is Attiki (Attica) and renamed the columns to Neighborhood, Latitude, Longitude and Region.

Afterwards we can run folium.Map to visualize a map with all sample locations in Attica. Finally, we request data from the Foursquare API for every location to get the top 50 venues within a radius of 500 meters. Note that our dataset consists of 82 neighborhoods.
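
As a minimal sketch of the request described above, the query string for one location can be assembled with the standard library. The endpoint and parameter names mirror the `getNearbyVenues` function defined later in this notebook; `build_explore_url` is a hypothetical helper name and the credentials are placeholders.

```python
from urllib.parse import urlencode

# Endpoint used by the notebook's getNearbyVenues function.
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret,
                      version='20210605', radius=500, limit=50):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues returned
    }
    # safe=',' keeps the comma in "ll" readable instead of encoding it as %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


# Coordinates of Athens taken from the dataset; credentials are placeholders.
url = build_explore_url(37.9794, 23.7161, 'MY_CLIENT_ID', 'MY_CLIENT_SECRET')
print(url)
```

Letting `urlencode` build the query string avoids hand-formatting mistakes and escapes any special characters in the parameter values.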

### 2.3 Feature selection

In this project we apply an unsupervised machine learning algorithm to cluster the locations, so we only have features (no labeled data is needed). Our features are the venue categories, such as Restaurant, Coffee Shop, Wine Shop, Gym, etc. In this project the number of features was 197.
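
To illustrate how venue categories become numeric features, the sketch below turns the list of categories found in one location into relative frequencies. It uses only the standard library; `category_frequencies` is a hypothetical helper and the sample categories are made up for the example.

```python
from collections import Counter


def category_frequencies(categories):
    """Map each venue category to its share of the venues in one location."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # no venues were returned for this location
    return {category: count / total for category, count in counts.items()}


# Hypothetical venue categories returned for a single location.
sample = ['Café', 'Café', 'Greek Restaurant', 'Bakery']
features = category_frequencies(sample)
print(features)
```

Each location then gets one value per category, and the values of a location sum to 1, so locations with different numbers of venues remain comparable.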

#### Download and install all libraries and APIs we will need in this project

In [71]:
# Libraries

import pandas as pd
import numpy as np
from bs4 import BeautifulSoup 
import requests 
!pip -q install geopy
!pip -q install geocoder
import geocoder
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors

from sklearn.cluster import KMeans # import k-means from clustering stage

### Download and Explore Dataset

In [75]:
# Download latitude and longitude data for all sub regions in Greece
url_1 = 'https://raw.githubusercontent.com/Konstantinos-Anathreptakis/Coursera_Capstone/main/greece.csv'
df = pd.read_csv(url_1)

In [76]:
df.head()

Unnamed: 0,city,lat,lng,Region
0,Athens,37.9794,23.7161,Attiki
1,Piraeus,37.95,23.7,Attiki
2,Thessaloniki,40.6333,22.95,Kentriki Makedonia
3,Patra,38.25,21.7333,Dytiki Ellada
4,Larisa,39.6385,22.4131,Thessalia


### Clean Dataset

In [77]:
# Keep only the rows for the Attiki (Attica) region; copy so later changes do not touch df
df_attika = df[df['Region']=='Attiki'].copy()

In [82]:
# Count number of cities in the Attiki region
df_attika.count()

city      82
lat       82
lng       82
Region    82
dtype: int64

In [83]:
df_attika.head()

Unnamed: 0,city,lat,lng,Region
0,Athens,37.9794,23.7161,Attiki
1,Piraeus,37.95,23.7,Attiki
6,Peristeri,38.0167,23.6833,Attiki
7,Kallithea,37.95,23.7,Attiki
8,Nikaia,37.9667,23.6333,Attiki


In [85]:
# Rename columns: city to Neighborhood, lat to Latitude and lng to Longitude
df_attika.columns = ['Neighborhood','Latitude','Longitude','Region']

In [87]:
address = 'Athens'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Athens are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Athens are 37.9839412, 23.7283052.


In [88]:
# create map of Attica using latitude and longitude values
map_attika = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, Region, Neighborhood in zip(df_attika['Latitude'], df_attika['Longitude'], df_attika['Region'], df_attika['Neighborhood']):
    label = '{}, {}'.format(Region, Neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_attika)  
    
map_attika

### Foursquare Credentials and Version

In [476]:
CLIENT_ID = 'My_Cliend_id' # your Foursquare ID
CLIENT_SECRET = 'My_CLient_Secret' # your Foursquare Secret
VERSION = '20210605' # Foursquare API version
LIMIT = 50 # A default Foursquare API limit value
radius = 500 # define radius

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: My_Cliend_id
CLIENT_SECRET:My_CLient_Secret


In [96]:
# Function to collect venues data 
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
                    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [97]:
# Run the above function to create a new dataframe
Attika_venues = getNearbyVenues(names=df_attika['Neighborhood'],
                                   latitudes=df_attika['Latitude'],
                                   longitudes=df_attika['Longitude']
                                  )

In [103]:
Attika_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Athens,37.9794,23.7161,Η Ταράτσα Του Φοίβου,37.979165,23.715034,Music Venue
1,Athens,37.9794,23.7161,ΡΑΚΟΡ,37.980391,23.717082,Greek Restaurant
2,Athens,37.9794,23.7161,Elvis (Έλβις),37.981313,23.716325,Souvlaki Shop
3,Athens,37.9794,23.7161,Holy Moly,37.979533,23.714712,Street Food Gathering
4,Athens,37.9794,23.7161,λούης,37.980881,23.71536,Kafenio


In [104]:
Attika_venues.count()

Neighborhood              1772
Neighborhood Latitude     1772
Neighborhood Longitude    1772
Venue                     1772
Venue Latitude            1772
Venue Longitude           1772
Venue Category            1772
dtype: int64
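
`getNearbyVenues` indexes directly into the JSON response, so a location with no results or a venue without a category would raise an exception. The sketch below shows one defensive way to read the same fields. `extract_venues` is a hypothetical helper, and the sample payload is a hand-written stand-in that follows the structure the function above relies on, not real API output.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Fall back to a placeholder when a venue has no category
        category = categories[0]['name'] if categories else 'Uncategorized'
        venues.append((venue.get('name'), location.get('lat'),
                       location.get('lng'), category))
    return venues


# Hand-written stand-in payload (assumption), shaped like the fields used above.
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Venue A',
                   'location': {'lat': 37.98, 'lng': 23.72},
                   'categories': [{'name': 'Greek Restaurant'}]}},
        {'venue': {'name': 'Venue B',
                   'location': {'lat': 37.97, 'lng': 23.71},
                   'categories': []}},
    ]}]}
}

print(extract_venues(sample_payload))
print(extract_venues({'response': {}}))  # no groups, so the result is empty
```

With this approach a neighborhood without venues simply contributes no rows instead of stopping the whole collection loop.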

## Analysis <a name="analysis"></a>

Exploratory data analysis and additional information from our raw data.

We compute the frequency of venues per neighborhood and count the number of restaurants in every candidate area:

In [105]:
# one hot encoding
attika_onehot = pd.get_dummies(Attika_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
attika_onehot['Neighborhood'] = Attika_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [attika_onehot.columns[-1]] + list(attika_onehot.columns[:-1])
attika_onehot = attika_onehot[fixed_columns]

attika_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,...,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [106]:
# Mean frequency of each venue category per neighborhood in Attica
attika_grouped = attika_onehot.groupby('Neighborhood').mean()
attika_grouped.head()

Neighborhood,Yoga Studio,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,...,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Women's Store
Acharnes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Agia Paraskevi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0
Agia Varvara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Agkistri,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Aigaleo,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [284]:
# Number of venues of each category per neighborhood in Attica
attika_sum = attika_onehot.groupby('Neighborhood').sum()
attika_sum.head()

Neighborhood,Yoga Studio,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,...,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Women's Store
Acharnes,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Agia Paraskevi,0,0,0,0,0,0,0,1,0,0,...,1,0,0,0,0,0,0,0,1,0
Agia Varvara,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Agkistri,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Aigaleo,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [276]:
# Transpose the table and reset the index in order to count Restaurant, Taverna and Souvlaki venues
attika_sum_t = attika_sum.T
attika_sum_t = attika_sum_t.reset_index()
attika_sum_t.head()

Neighborhood,index,Acharnes,Agia Paraskevi,Agia Varvara,Agkistri,Aigaleo,Aigina,Anavyssos,Argyroupoli,Aspropyrgos,...,Zefyri,Zografos,agioi Anargyroi,agios Dimitrios,agios Stefanos,alimos,ano Liosia,anoixi,ilion,ydra
0,Yoga Studio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Accessories Store,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,American Restaurant,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Art Gallery,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Art Museum,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [282]:
# Search the dataset for venue categories that contain the words 'Restaurant', 'Taverna' or 'Souvlaki'

search = ['Restaurant','Taverna','Souvlaki']

Number_Restaurants = attika_sum_t[attika_sum_t['index'].str.contains('|'.join(search))]

In [323]:
# Output table with the neighborhoods that have the most restaurants

table = Number_Restaurants.sum()
table = pd.DataFrame(table)
table = table.drop('index')
table.columns=['Number_of_Restaurants']
table = table.sort_values('Number_of_Restaurants', ascending=False)
table.head(10)

Neighborhood,Number_of_Restaurants
ydra,21
Nea Filadelfeia,19
Argyroupoli,17
Kaisariani,15
Chalandri,15
Palaia Fokaia,14
Ilioupoli,14
Melissia,12
Aigina,12
alimos,12
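
The keyword filter above relies on `str.contains` treating the joined string as a regular expression. A standard-library sketch of the same idea is shown below. The category counts are hypothetical and `count_restaurants` is an illustrative name.

```python
import re

KEYWORDS = ['Restaurant', 'Taverna', 'Souvlaki']
# re.escape guards against keywords that contain regex metacharacters
PATTERN = re.compile('|'.join(re.escape(word) for word in KEYWORDS))


def count_restaurants(category_counts):
    """Sum the venue counts whose category name contains any keyword."""
    return sum(count for category, count in category_counts.items()
               if PATTERN.search(category))


# Hypothetical counts for one neighborhood.
counts = {'Greek Restaurant': 4, 'Fish Taverna': 2, 'Souvlaki Shop': 3,
          'Café': 5, 'Bakery': 1}
print(count_restaurants(counts))  # 4 + 2 + 3 = 9
```

Note that the match is a substring match, so categories such as `Fish Taverna` and `Souvlaki Shop` are counted alongside everything named `... Restaurant`.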


In [107]:
# Transpose the table in order to visualize the output
attika_grouped_t = attika_grouped.T

In [109]:
attika_grouped_t.head()

Neighborhood,Acharnes,Agia Paraskevi,Agia Varvara,Agkistri,Aigaleo,Aigina,Anavyssos,Argyroupoli,Aspropyrgos,Athens,...,Zefyri,Zografos,agioi Anargyroi,agios Dimitrios,agios Stefanos,alimos,ano Liosia,anoixi,ilion,ydra
Yoga Studio,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Accessories Store,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02
American Restaurant,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Art Gallery,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Art Museum,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [110]:
# Loop to print the most frequent venues per neighborhood
for name in (attika_grouped_t):
    top_10 = attika_grouped_t[[name]].sort_values(name, ascending=False)
    print(name)
    print(top_10.head(10))
    print("----------")

Acharnes
Neighborhood        Acharnes
Mobile Phone Shop   0.214286
Café                0.214286
Supermarket         0.142857
Plaza               0.071429
Creperie            0.071429
Cosmetics Shop      0.071429
Souvlaki Shop       0.071429
Taverna             0.071429
Seafood Restaurant  0.071429
Yoga Studio         0.000000
----------
Agia Paraskevi
Neighborhood            Agia Paraskevi
Bakery                            0.10
Pizza Place                       0.06
Clothing Store                    0.06
Pharmacy                          0.06
Café                              0.06
Coffee Shop                       0.04
Cosmetics Shop                    0.04
Supermarket                       0.04
Baby Store                        0.04
Furniture / Home Store            0.04
----------
Agia Varvara
Neighborhood      Agia Varvara
Fish Taverna          0.074074
Café                  0.074074
Grocery Store         0.074074
Greek Restaurant      0.074074
Optical Shop          0.074074
Restaur

In [111]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
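
Because every neighborhood has a frequency for all categories, sorting and taking the first ten can pad the list with categories whose frequency is zero (in the printed output above, Acharnes lists `Yoga Studio` at 0.000000 in tenth place). The following is a hedged sketch of a variant that keeps only categories that were actually observed. `top_observed_venues` is a hypothetical name, and the input is a plain dictionary rather than a DataFrame row.

```python
def top_observed_venues(frequencies, num_top_venues):
    """Return up to num_top_venues categories with non-zero frequency."""
    observed = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # Sort by descending frequency, then by name so ties are deterministic
    observed.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in observed[:num_top_venues]]


# Hypothetical frequencies for one neighborhood.
row = {'Café': 0.5, 'Supermarket': 0.25, 'Taverna': 0.25, 'Yoga Studio': 0.0}
print(top_observed_venues(row, 3))
```

A shorter list for sparse neighborhoods makes it easier to see which locations returned very few venues.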

In [112]:
attika_grouped_reindex = attika_grouped.reset_index()

In [115]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = attika_grouped_reindex['Neighborhood']

for ind in np.arange(attika_grouped_reindex.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(attika_grouped_reindex.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Acharnes,Café,Mobile Phone Shop,Supermarket,Creperie,Souvlaki Shop,Taverna,Seafood Restaurant,Plaza,Cosmetics Shop,Dance Studio
1,Agia Paraskevi,Bakery,Pharmacy,Pizza Place,Clothing Store,Café,Plaza,Baby Store,Cosmetics Shop,Coffee Shop,Furniture / Home Store
2,Agia Varvara,Grocery Store,Fish Taverna,Café,Optical Shop,Greek Restaurant,Restaurant,Bakery,Bus Stop,Fast Food Restaurant,Betting Shop
3,Agkistri,Hotel,Nightclub,Hotel Bar,Cocktail Bar,Harbor / Marina,Café,Greek Restaurant,Women's Store,Farmers Market,Food & Drink Shop
4,Aigaleo,Café,Bar,Meze Restaurant,Coffee Shop,Burger Joint,Souvlaki Shop,Mobile Phone Shop,Donut Shop,Bakery,Snack Place
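
The column labels above are built with a `try`/`except` around a three-item suffix list, which works for ten columns. As an alternative sketch, a small helper can produce the ordinal suffix for any positive integer. `ordinal` is a hypothetical name, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # covers the irregular 11th, 12th and 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, 11)]
print(columns)
```

This keeps the labels correct if `num_top_venues` is ever raised above 20, where the list-based approach would produce labels such as `21th`.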


In [129]:
# Find neighborhoods where a Restaurant, Taverna or Souvlaki category is the 1st, 2nd or 3rd most common venue

searchfor = ['Restaurant', 'Taverna','Souvlaki']

Output1 = neighborhoods_venues_sorted[neighborhoods_venues_sorted['1st Most Common Venue'].str.contains('|'.join(searchfor))]
Output2 = neighborhoods_venues_sorted[neighborhoods_venues_sorted['2nd Most Common Venue'].str.contains('|'.join(searchfor))]
Output3 = neighborhoods_venues_sorted[neighborhoods_venues_sorted['3rd Most Common Venue'].str.contains('|'.join(searchfor))]

In [130]:
# Neighborhoods where a restaurant category is the 1st most common venue
Output1[['Neighborhood','1st Most Common Venue']]

Unnamed: 0,Neighborhood,1st Most Common Venue
13,Dafni,Greek Restaurant
15,Elefsina,Cretan Restaurant
25,Kalyvia Thorikou,Taverna
30,Koropi,Meze Restaurant
36,Mandra,Taverna
41,Melissia,Greek Restaurant
50,Oropos,Greek Restaurant
52,Palaia Fokaia,Seafood Restaurant
60,Porto Rafti,Fish Taverna
71,Zografos,Greek Restaurant


In [131]:
# Neighborhoods where a restaurant category is the 2nd most common venue
Output2[['Neighborhood','2nd Most Common Venue']]

Unnamed: 0,Neighborhood,2nd Most Common Venue
2,Agia Varvara,Fish Taverna
5,Aigina,Greek Restaurant
10,Chaidari,Greek Restaurant
13,Dafni,Meze Restaurant
17,Galatas,Greek Restaurant
18,Galatsi,Greek Restaurant
19,Gerakas,Grilled Meat Restaurant
21,Ilioupoli,Meze Restaurant
23,Kaisariani,Meze Restaurant
24,Kallithea,Souvlaki Shop


In [132]:
# Neighborhoods where a restaurant category is the 3rd most common venue
Output3[['Neighborhood','3rd Most Common Venue']]

Unnamed: 0,Neighborhood,3rd Most Common Venue
4,Aigaleo,Meze Restaurant
6,Anavyssos,Greek Restaurant
7,Argyroupoli,Souvlaki Shop
21,Ilioupoli,Kebab Restaurant
25,Kalyvia Thorikou,Souvlaki Shop
26,Kapandriti,Seafood Restaurant
29,Kitsi,Greek Restaurant
36,Mandra,Souvlaki Shop
42,Metamorfosi,Greek Restaurant
44,Nea Filadelfeia,Meze Restaurant


## Methodology <a name="methodology"></a>

In this project we cluster all the locations with venues in Attica into 5 categories. We limit our analysis to a radius of 500 meters and a maximum of 50 venues per location.

The clustering method we apply is k-means, with k set to 5.

As the final step we present a map of all locations as well as maps of the clusters of those locations.
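
To make the clustering step concrete, here is a dependency-free sketch of the k-means idea (Lloyd's algorithm): assign every point to its nearest centroid, move each centroid to the mean of its members, and repeat until the assignment stops changing. This is a simplified illustration with random initial centroids, not the scikit-learn implementation used in this notebook, and the toy points are hypothetical two-feature frequency vectors.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point
        new_labels = [min(range(k),
                          key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids


# Hypothetical frequency vectors: two restaurant-heavy and two café-heavy places.
points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The numeric label values themselves carry no meaning; only the grouping matters, which is why cluster numbering can differ between runs with different seeds.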

In [312]:
# set number of clusters
kclusters = 5

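# drop the label column so only numeric features remain (keyword form works in current pandas)
attika_grouped_clustering = attika_grouped_reindex.drop(columns='Neighborhood')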

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(attika_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 0, 0, 1, 1, 1, 0, 1, 0])

In [316]:
# add clustering labels
#neighborhoods_venues_sorted.drop(['Cluster Labels'], axis=1, inplace=True)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [317]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Acharnes,Café,Mobile Phone Shop,Supermarket,Creperie,Souvlaki Shop,Taverna,Seafood Restaurant,Plaza,Cosmetics Shop,Dance Studio
1,0,Agia Paraskevi,Bakery,Pharmacy,Pizza Place,Clothing Store,Café,Plaza,Baby Store,Cosmetics Shop,Coffee Shop,Furniture / Home Store
2,0,Agia Varvara,Grocery Store,Fish Taverna,Café,Optical Shop,Greek Restaurant,Restaurant,Bakery,Bus Stop,Fast Food Restaurant,Betting Shop
3,0,Agkistri,Hotel,Nightclub,Hotel Bar,Cocktail Bar,Harbor / Marina,Café,Greek Restaurant,Women's Store,Farmers Market,Food & Drink Shop
4,1,Aigaleo,Café,Bar,Meze Restaurant,Coffee Shop,Burger Joint,Souvlaki Shop,Mobile Phone Shop,Donut Shop,Bakery,Snack Place


In [318]:
# Cluster 1 shape and head
print(neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 0].shape)
neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 0]

(54, 12)


Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,0,Agia Paraskevi,Bakery,Pharmacy,Pizza Place,Clothing Store,Café,Plaza,Baby Store,Cosmetics Shop,Coffee Shop,Furniture / Home Store
2,0,Agia Varvara,Grocery Store,Fish Taverna,Café,Optical Shop,Greek Restaurant,Restaurant,Bakery,Bus Stop,Fast Food Restaurant,Betting Shop
3,0,Agkistri,Hotel,Nightclub,Hotel Bar,Cocktail Bar,Harbor / Marina,Café,Greek Restaurant,Women's Store,Farmers Market,Food & Drink Shop
7,0,Argyroupoli,Bar,Café,Souvlaki Shop,Meze Restaurant,Beer Bar,Greek Restaurant,Snack Place,Coffee Shop,Fast Food Restaurant,Italian Restaurant
9,0,Athens,Bar,Nightclub,Café,Cocktail Bar,Greek Restaurant,Movie Theater,Mediterranean Restaurant,Restaurant,Souvlaki Shop,Spa
11,0,Chalandri,Café,Cocktail Bar,Bakery,Bar,Souvlaki Shop,Dessert Shop,Meze Restaurant,Coffee Shop,Greek Restaurant,Italian Restaurant
12,0,Cholargos,Supermarket,Plaza,Coffee Shop,Burger Joint,Café,Meze Restaurant,Shopping Mall,Farmers Market,Salon / Barbershop,Movie Theater
13,0,Dafni,Greek Restaurant,Meze Restaurant,Café,Bar,Pharmacy,Clothing Store,Breakfast Spot,Taverna,Bakery,Basketball Stadium
14,0,Dionysos,Whisky Bar,Pedestrian Plaza,Women's Store,Event Space,Fried Chicken Joint,Forest,Food & Drink Shop,Food,Flower Shop,Fish Taverna
15,0,Elefsina,Cretan Restaurant,Gym / Fitness Center,Pool,Café,Meze Restaurant,Fast Food Restaurant,Greek Restaurant,Beach,Bar,Grocery Store


In [319]:
# Cluster 2 shape and head
print(neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 1].shape)
neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 1]

(23, 12)


Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Acharnes,Café,Mobile Phone Shop,Supermarket,Creperie,Souvlaki Shop,Taverna,Seafood Restaurant,Plaza,Cosmetics Shop,Dance Studio
4,1,Aigaleo,Café,Bar,Meze Restaurant,Coffee Shop,Burger Joint,Souvlaki Shop,Mobile Phone Shop,Donut Shop,Bakery,Snack Place
5,1,Aigina,Café,Greek Restaurant,Ice Cream Shop,Fish Taverna,Dessert Shop,Seafood Restaurant,Kafenio,Hotel,Cocktail Bar,Nightclub
6,1,Anavyssos,Café,Supermarket,Greek Restaurant,Bakery,Cosmetics Shop,Souvlaki Shop,Convenience Store,Coffee Shop,Forest,Food & Drink Shop
8,1,Aspropyrgos,Café,Boutique,Coffee Shop,Shipping Store,Bakery,Mobile Phone Shop,Fast Food Restaurant,Kafenio,Dessert Shop,Souvlaki Shop
10,1,Chaidari,Café,Greek Restaurant,Dessert Shop,Grilled Meat Restaurant,Movie Theater,Motorcycle Shop,Mobile Phone Shop,Meze Restaurant,Supermarket,Fast Food Restaurant
17,1,Galatas,Café,Greek Restaurant,Boat or Ferry,Ice Cream Shop,Souvenir Shop,Taverna,Seafood Restaurant,Food & Drink Shop,Food,Flower Shop
18,1,Galatsi,Café,Greek Restaurant,Coffee Shop,Supermarket,Souvlaki Shop,Dessert Shop,Grilled Meat Restaurant,Bakery,Burger Joint,Breakfast Spot
22,1,Irakleio,Café,Coffee Shop,Martial Arts School,Bus Stop,Drugstore,Park,Gym / Fitness Center,Light Rail Station,Supermarket,Snack Place
26,1,Kapandriti,Plaza,Pizza Place,Seafood Restaurant,Café,Women's Store,Electronics Store,Food & Drink Shop,Food,Flower Shop,Fish Taverna


In [320]:
# Cluster 3 shape and head
print(neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 2].shape)
neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 2]

(1, 12)


Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,2,Lavrio,Shipping Store,Women's Store,Electronics Store,Fried Chicken Joint,Forest,Food & Drink Shop,Food,Flower Shop,Fish Taverna,Fish Market


In [321]:
# Cluster 4 shape and head
print(neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 3].shape)
neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 3]

(1, 12)


Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
65,3,Spetses,Mountain,Women's Store,Farmers Market,Frozen Yogurt Shop,Fried Chicken Joint,Forest,Food & Drink Shop,Food,Flower Shop,Fish Taverna


In [322]:
# Cluster 5 shape and head
print(neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 4].shape)
neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Cluster Labels'] == 4]

(1, 12)


Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,4,Oropos,Greek Restaurant,Women's Store,Farmers Market,Frozen Yogurt Shop,Fried Chicken Joint,Forest,Food & Drink Shop,Food,Flower Shop,Fish Taverna
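
The shapes printed above can be summarized to see how unbalanced the clustering is. The sizes in this sketch are copied from those outputs. Note that the labels are 0-based, while the cell comments count clusters from 1.

```python
# Cluster sizes as printed by the .shape calls above (first element of each tuple).
cluster_sizes = {0: 54, 1: 23, 2: 1, 3: 1, 4: 1}

total = sum(cluster_sizes.values())
singletons = [label for label, size in cluster_sizes.items() if size == 1]

for label, size in cluster_sizes.items():
    print('Cluster label {}: {} neighborhoods ({:.1%})'.format(
        label, size, size / total))
print('Labeled neighborhoods:', total)
print('Single-member clusters:', singletons)
```

Three of the five clusters hold a single neighborhood, so a different value of k or a check of how many venues those locations returned may be worth exploring.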


### Finally, let's visualize the resulting clusters

In [326]:
# merge neighborhoods_venues_sorted with df_attika to add latitude/longitude for each neighborhood
attika_merged = df_attika

attika_merged = attika_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

attika_merged.head() 

Unnamed: 0,Neighborhood,Latitude,Longitude,Region,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Athens,37.9794,23.7161,Attiki,0.0,Bar,Nightclub,Café,Cocktail Bar,Greek Restaurant,Movie Theater,Mediterranean Restaurant,Restaurant,Souvlaki Shop,Spa
1,Piraeus,37.95,23.7,Attiki,0.0,Café,Souvlaki Shop,Dairy Store,Grocery Store,Gym,Supermarket,Dessert Shop,Hookah Bar,Convenience Store,Coffee Shop
6,Peristeri,38.0167,23.6833,Attiki,0.0,Bus Stop,Souvlaki Shop,Supermarket,Greek Restaurant,Tennis Court,Gym,Taverna,Park,Café,Volleyball Court
7,Kallithea,37.95,23.7,Attiki,0.0,Café,Souvlaki Shop,Dairy Store,Grocery Store,Gym,Supermarket,Dessert Shop,Hookah Bar,Convenience Store,Coffee Shop
8,Nikaia,37.9667,23.6333,Attiki,1.0,Café,Park,Playground,Fruit & Vegetable Store,Coffee Shop,Snack Place,Theater,Gym,Grilled Meat Restaurant,Food
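
`Cluster Labels` prints as `0.0` and `1.0` after the join, which in pandas usually indicates that some rows received no label and were filled with NaN. This is consistent with the 82 rows in `df_attika` versus the 80 labeled neighborhoods across the five clusters. The sketch below illustrates the same left-join behaviour with plain dictionaries. The unlabeled neighborhood name is hypothetical.

```python
# Labels for two neighborhoods as shown in the notebook output.
labels = {'Athens': 0, 'Nikaia': 1}

# 'Example Town' is a hypothetical neighborhood with no venues, hence no label.
neighborhoods = ['Athens', 'Nikaia', 'Example Town']

# Left join: keep every neighborhood and use None where no label exists
merged = [(name, labels.get(name)) for name in neighborhoods]

# Drop unlabeled rows before plotting or filtering by cluster
labeled = [(name, label) for name, label in merged if label is not None]
print(merged)
print(labeled)
```

In pandas, a comparable cleanup would be `attika_merged.dropna(subset=['Cluster Labels'])` followed by casting the column back to integers with `astype(int)`.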


In [474]:
# create merged dataframes for the first two clusters (clusters 3, 4 and 5 contain only a small number of neighborhoods)
attika_merged_1  = attika_merged[attika_merged['Cluster Labels'] == 0]
attika_merged_2  = attika_merged[attika_merged['Cluster Labels'] == 1]

* Map Cluster 1

In [470]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to the map
for lat, lon, poi, cluster in zip(attika_merged_1['Latitude'], attika_merged_1['Longitude'], attika_merged_1['Neighborhood'], attika_merged_1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_clusters)       
        
map_clusters

* Map Cluster 2

In [472]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to the map
for lat, lon, poi, cluster in zip(attika_merged_2['Latitude'], attika_merged_2['Longitude'], attika_merged_2['Neighborhood'], attika_merged_2['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_clusters)       
        
map_clusters

## Results and Discussion <a name="results"></a>

Through our analysis we are in a position to answer important questions about venue frequency in each location in Attica, Greece. We found that Restaurant, Taverna and Souvlaki venues are very common in most of our sample sub regions. The island of Ydra and the sub regions Nea Filadelfeia and Argyroupoli have the highest numbers of restaurants, which makes them potential candidates, but they will probably not be the best choice because competition in those locations is very high. To find the best location for a new restaurant we should also take into account the neighborhoods where a restaurant category is the most common venue, such as Melissia, Elefsina and Zografos.

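Our sample of around 80 sub regions is large, so grouping it into 5 clusters with approximately similar characteristics makes it easier to handle. We found that the first two clusters consist of many neighborhoods (54 and 23), while clusters 3, 4 and 5 each contain a single sub region (Lavrio, Spetses and Oropos). Of these only Spetses is an island, so location alone probably does not explain the split; their venue profiles simply differ markedly from the rest of the sample.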
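
As one possible way to combine the two results, the sketch below picks, for each cluster, the neighborhood with the highest restaurant count. The three rows use cluster labels and counts that appear in the tables of this notebook. Ranking by the highest count is an assumption for illustration only, since a high count may also signal strong competition.

```python
# (neighborhood, cluster label, restaurant-type venue count) from the tables above
rows = [
    ('Argyroupoli', 0, 17),
    ('Chalandri', 0, 15),
    ('Aigina', 1, 12),
]

best_per_cluster = {}
for name, cluster, count in rows:
    current = best_per_cluster.get(cluster)
    # Keep the neighborhood with the largest count seen so far in this cluster
    if current is None or count > current[1]:
        best_per_cluster[cluster] = (name, count)

print(best_per_cluster)
```

Swapping the comparison to `count < current[1]` would instead favor the least saturated neighborhood in each cluster.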

There is room for future development of this project: we can add more variables such as economic development, demographic data and financial data about the potential customers in each sub region. Moreover, we can run the same analysis for more locations rather than a single point per neighborhood.

Note that we chose to analyze the sub regions of Attica and not the whole country of Greece in order to stay within the Foursquare API call limits.
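
The effect of this choice on request volume can be checked with simple arithmetic. The figures below come from this notebook, and the sketch assumes one API request per location, as in `getNearbyVenues`.

```python
locations_attica = 82     # neighborhoods in the Attica sample
locations_greece = 408    # sub regions in the full simplemaps dataset
limit_per_call = 50       # LIMIT used in the requests
venues_returned = 1772    # rows in Attika_venues

calls_saved = locations_greece - locations_attica
max_venues = locations_attica * limit_per_call
avg_per_location = venues_returned / locations_attica

print(calls_saved)                  # 326 fewer requests than a country-wide run
print(max_venues)                   # upper bound of 4100 venues
print(round(avg_per_location, 1))   # about 21.6 venues per location
```

The average is well below the limit of 50, which suggests that many locations have fewer than 50 venues within 500 meters.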

## Conclusion <a name="conclusion"></a>

Identifying the locations in which restaurants are the most frequent venue is very important information for many stakeholders. It is not an easy problem, as many factors can affect the decision. In this project we identified the neighborhoods that appear more favorable for opening a new restaurant. Finally, with the clustering algorithm we applied, we can categorize all sub regions in Attica into 5 categories based on their most common venues. The clustering analysis is very useful, as a franchise restaurant company can use these categories to open restaurants in all clusters.