# <center>APPLIED DATA SCIENCE CAPSTONE</center>
https://www.coursera.org/learn/applied-data-science-capstone
# ASSIGNMENT: THE BATTLE OF NEIGHBORHOODS
** **
## **Starting a new cinema in Athens, Greece**

### **Dimitra Dionysiou**
** **

### Finding the best locations for starting an indoor cinema in Athens, Greece, based on a number of predefined requirements related to a) the scale and number of competitor cinemas in an area, b) the existence of nearby leisure facilities, and c) the availability of transportation options. 

### A candidate location is defined as a circular area with a radius of 250 m, centered on an existing cinema. 
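To make the 250 m definition concrete, membership of a point in a candidate area can be tested with the great-circle distance to the central cinema. The sketch below uses the haversine formula under a spherical-Earth assumption (mean radius 6,371 km); the function names are hypothetical, and in the notebook itself the radius filtering is delegated to Foursquare's `radius` search parameter rather than computed locally.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_candidate_area(center, point, radius_m=250):
    """True if `point` lies inside the circular area of `radius_m` metres around `center`."""
    return haversine_m(*center, *point) <= radius_m


# Coordinates taken from the geocoded cinema table later in this notebook
astor = (37.979521, 23.732192)
asty = (37.979778, 23.732302)
aavora = (37.988, 23.746283)

print(in_candidate_area(astor, asty))    # True: the two cinemas are about 30 m apart
print(in_candidate_area(astor, aavora))  # False: about 1.5 km apart
```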

In [3]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv
import numpy as np

**Getting the list of cinemas in Athens from the web page of the local cinema guide 'athinorama'**

In [2]:
source = requests.get('https://www.athinorama.gr/cinema/guide.aspx?show=1&seltab=1&sec=2').text

**Cinema names**

In [3]:
soup = BeautifulSoup(source, 'lxml')

cinemas = soup.find_all('h2', class_="placename")
#print(cinemas)

#get the results in a list
my_cinemas=[]
for c in cinemas:
    my_cinemas.append(c.text)

In [4]:
#store cinema names in a list
cinema_names=[]
for c in my_cinemas:
    cinema_names.append(c.split('\n')[0])
#cinema_names

**Cinema addresses**

In [5]:
#store cinema addresses in a list
cinema_addresses=[]
for c in my_cinemas:
    cinema_addresses.append(c.split('\n')[2])
#cinema_addresses

**Create a dataframe for the cinemas**

In [6]:
data = pd.DataFrame({'cinema_name': cinema_names, 'cinema_address': cinema_addresses}, columns=['cinema_name', 'cinema_address'])
data.head()

|   | cinema_name | cinema_address |
|---|---|---|
| 0 | Ααβόρα | Ιπποκράτους 180 (ΜΕΤΡΟ Αμπελόκηποι), Νεάπολη |
| 1 | Άστορ | Σταδίου 28 (είσοδος από στοά Κοραή) (ΜΕΤΡΟ Παν... |
| 2 | Άστυ | Κοραή 4 (ΜΕΤΡΟ Πανεπιστήμιο) |
| 3 | Έλλη | Ακαδημίας 64 (ΜΕΤΡΟ Πανεπιστήμιο) |
| 4 | Έμπασσυ Novacinema Odeon | Πατρ. Ιωακείμ 5 & Ηροδότου, Κολωνάκι (ΜΕΤΡΟ Ευ... |


In [7]:
len(data)

58

**Drop large multiplexes with more than 4 screens as well as two cinemas that are located very far away from the center of Athens**

In [8]:
data.drop(data.index[[21,23,31,33,35,38,41,45,47,48,50,55,56,57]], inplace=True)

In [9]:
data.reset_index(drop=True,inplace=True)

In [10]:
data.head(10)

|   | cinema_name | cinema_address |
|---|---|---|
| 0 | Ααβόρα | Ιπποκράτους 180 (ΜΕΤΡΟ Αμπελόκηποι), Νεάπολη |
| 1 | Άστορ | Σταδίου 28 (είσοδος από στοά Κοραή) (ΜΕΤΡΟ Παν... |
| 2 | Άστυ | Κοραή 4 (ΜΕΤΡΟ Πανεπιστήμιο) |
| 3 | Έλλη | Ακαδημίας 64 (ΜΕΤΡΟ Πανεπιστήμιο) |
| 4 | Έμπασσυ Novacinema Odeon | Πατρ. Ιωακείμ 5 & Ηροδότου, Κολωνάκι (ΜΕΤΡΟ Ευ... |
| 5 | Ιντεάλ | Πανεπιστημίου 46 (στάση μετρό Πανεπιστήμιο) |
| 6 | Odeon Όπερα | Ακαδημίας 57 |
| 7 | Ταινιοθήκη της Ελλάδος | Ιερά Οδός 48 & Μεγ.Αλεξάνδρου 134-136, (ΜΕΤΡΟ ... |
| 8 | Αθήναιον | Β. Σοφίας 124 (ΜΕΤΡΟ Αμπελόκηποι) |
| 9 | Ανδόρα | Σεβαστουπόλεως 117, Ερ. Σταυρός (ΜΕΤΡΟ Πανόρμου) |


**Store the dataframe in a CSV file so that the Greek addresses can be edited manually, since Nominatim returned no latitude/longitude for some of them**

In [11]:
data.to_csv('athens_cinemas2.csv', index=False)

**Load the manually processed cinema info (saved as an Excel file) into a new dataframe**

In [12]:
cinemas2 = pd.ExcelFile('cinemas.xlsx')
ath_cinemas = cinemas2.parse('Sheet1')
ath_cinemas.drop('Unnamed: 0', axis=1, inplace=True)
ath_cinemas.head()

|   | cinema name | cinema address |
|---|---|---|
| 0 | Ααβόρα | Ααβόρα Ιπποκράτους 180 Νεάπολη |
| 1 | Άστορ | Άστορ Σταδίου 28 |
| 2 | Άστυ | Asti Korai 4 |
| 3 | Έλλη | Έλλη Ακαδημίας 64 |
| 4 | Έμπασσυ Novacinema Odeon | Πατριάρχου Ιωακείμ 5 κολωνάκι |


In [13]:
len(ath_cinemas)

44

**So at this stage there are 44 candidate locations, each centered on a cinema**

In [14]:
#rename some columns to facilitate further processing
ath_cinemas.rename(columns={'cinema name': 'cinema_name', 'cinema address': 'cinema_address'}, inplace=True)
#ath_cinemas

**Get latitude longitude values for each cinema and add them to the dataframe**

In [151]:
from geopy.geocoders import Nominatim 
geolocator = Nominatim(user_agent="*********@gmail.com")   

longitudes = np.empty(len(ath_cinemas), dtype=float)
latitudes  = np.empty(len(ath_cinemas), dtype=float)

for i in range(len(ath_cinemas)):
    cinema_address = ath_cinemas['cinema_address'][i]
    cinema_location = geolocator.geocode(cinema_address)

    # geocode() returns None when Nominatim finds no match: store NaN so the address can be fixed manually
    if cinema_location is None:
        latitudes[i] = longitudes[i] = np.nan
        continue

    latitudes[i] = cinema_location.latitude
    longitudes[i] = cinema_location.longitude

ath_cinemas['latitude'] = latitudes
ath_cinemas['longitude'] = longitudes


In [154]:
ath_cinemas.head(10)

|   | cinema_name | cinema_address | latitude | longitude |
|---|---|---|---|---|
| 0 | Ααβόρα | Ααβόρα Ιπποκράτους 180 Νεάπολη | 37.988 | 23.746283 |
| 1 | Άστορ | Άστορ Σταδίου 28 | 37.979521 | 23.732192 |
| 2 | Άστυ | Asti Korai 4 | 37.979778 | 23.732302 |
| 3 | Έλλη | Έλλη Ακαδημίας 64 | 37.982776 | 23.733563 |
| 4 | Έμπασσυ Novacinema Odeon | Πατριάρχου Ιωακείμ 5 κολωνάκι | 37.977705 | 23.74204 |
| 5 | Ιντεάλ | Ideal, 46, Ελευθερίου Βενιζέλου, Exarcheia | 37.982464 | 23.731459 |
| 6 | Odeon Όπερα | akadimias 57 | 37.982289 | 23.733577 |
| 7 | Ταινιοθήκη της Ελλάδος | 48, iera odos, gkazi | 37.980923 | 23.712603 |
| 8 | Αθήναιον | αθήναιον, 124, Βασιλίσσης Σοφίας | 37.985962 | 23.761325 |
| 9 | Ανδόρα | Σεβαστουπόλεως 117 | 37.995862 | 23.769469 |


In [155]:
ath_cinemas.to_csv('candidate_cinemas.csv', index=False)   #44 candidate cinemas

In [15]:
ath_cinemas = pd.read_csv('candidate_cinemas.csv')   #when starting work from the 44 initial cinemas

**Define Foursquare credentials**

In [1]:
# sensitive code containing Foursquare credentials was removed for sharing

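**Check how many cinemas are within 500 m of each candidate cinema (a wider radius than the 250 m candidate area is used specifically for competing cinemas)**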

In [16]:
categoryId = '4bf58dd8d48988d17f941735'  #MOVIE THEATER
check=[]
LIMIT=50

for j in range(0, len(ath_cinemas), 1):
    lat= ath_cinemas.latitude[j]
    lng= ath_cinemas.longitude[j]
            
    #url for movie theatres
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&categoryId={}&v={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET,
    lat,
    lng,
    categoryId,
    VERSION, 
    500, #SPECIFICALLY FOR MOVIE THEATERS 
    LIMIT)
        
    results0 = requests.get(url).json() #movie theaters
        
    check.append(ath_cinemas.cinema_name[j])
    for i in range(0,len(results0['response']['venues']),1):
        check.append(results0['response']['venues'][i]['name'])
        
    check.append(len(results0['response']['venues']))
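Formatting the query string by hand, as above, works but is easy to get wrong (for example, a secret containing a reserved character would not be escaped). Below is a minimal sketch of the same Foursquare v2 search URL built with the standard library's `urllib.parse.urlencode`. The parameter names are copied from the cell above; the helper name and the placeholder credentials and version string are hypothetical. Passing the dictionary as `requests.get(endpoint, params=...)` would have the same effect.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, version, lat, lng, category_id, radius, limit):
    """Assemble a venue-search URL with the same parameters as the hand-formatted string above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',        # urlencode escapes the comma as %2C
        'categoryId': category_id,
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials and version (hypothetical); the real ones were removed from this notebook
url = build_search_url('MY_ID', 'MY_SECRET', '20180605', 37.988, 23.746283,
                       '4bf58dd8d48988d17f941735', 500, 50)
print(url)
```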
  

**The results needed some manual processing to derive the final number of neighboring cinemas, since Foursquare sometimes returns the same venue multiple times under slightly different names. After inspecting the results, the number of cinemas in each candidate area was corrected as follows:**
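The duplicates were merged by hand here. One possible automated alternative, shown only as a sketch: strip Greek accents and letter case, then treat two names as the same venue when `difflib.SequenceMatcher` reports a similarity ratio of at least a threshold. The 0.8 threshold and the function names are assumptions, and fuzzy matching can both miss duplicates and over-merge distinct venues, so a manual check would still be advisable.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, strip accents (e.g. the Greek tonos) and collapse whitespace."""
    decomposed = unicodedata.normalize('NFD', name)
    no_marks = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return ' '.join(no_marks.casefold().split())


def dedupe_venue_names(names, threshold=0.8):
    """Keep the first occurrence of each group of near-identical venue names."""
    kept = []
    for name in names:
        key = normalize(name)
        if all(SequenceMatcher(None, key, normalize(k)).ratio() < threshold for k in kept):
            kept.append(name)
    return kept


print(dedupe_venue_names(['Άστυ', 'Αστυ', 'ΑΣΤΥ', 'Ιντεάλ']))  # ['Άστυ', 'Ιντεάλ']
```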

In [17]:
#manually corrected number of neighboring cinemas for each of the 44 candidate areas (same order as ath_cinemas)
movie_theaters=[3,7,7,10,2,10,9,1,5,2,5,2,5,0,0,0,1,2,1,3,1,2,3,3,0,2,4,4,2,1,2,1,0,1,3,1,0,1,1,0,0,0,0,0]

**Exclude from the subsequent analysis all candidate areas with more than 5 additional cinemas**

In [18]:
indices=[index for index,value in enumerate(movie_theaters) if value > 5]
#indices holds the indices of the cinemas to be excluded
ath_cinemas.drop(ath_cinemas.index[indices], inplace=True)
ath_cinemas.reset_index(drop=True, inplace=True)
ath_cinemas.head(10)

|   | cinema_name | cinema_address | latitude | longitude |
|---|---|---|---|---|
| 0 | Ααβόρα | Ααβόρα Ιπποκράτους 180 Νεάπολη | 37.988 | 23.746283 |
| 1 | Έμπασσυ Novacinema Odeon | Πατριάρχου Ιωακείμ 5 κολωνάκι | 37.977705 | 23.74204 |
| 2 | Ταινιοθήκη της Ελλάδος | 48, iera odos, gkazi | 37.980923 | 23.712603 |
| 3 | Αθήναιον | αθήναιον, 124, Βασιλίσσης Σοφίας | 37.985962 | 23.761325 |
| 4 | Ανδόρα | Σεβαστουπόλεως 117 | 37.995862 | 23.769469 |
| 5 | Γαλαξίας | Γαλαξίας, Μεσογείων 6 | 37.985015 | 23.761063 |
| 6 | Δαναός | δαναός 109 κηφισίας | 37.993468 | 23.766729 |
| 7 | Νιρβάνα 1 & 2 Cinemax | νιρβάνα αλεξάνδρας 192 | 37.986809 | 23.75829 |
| 8 | Αβάνα | Αβάνα Λυκούργου 3 | 38.014793 | 23.785307 |
| 9 | Athena | αθηνά σολωμού 18 χαλάνδρι | 38.023326 | 23.794781 |


In [19]:
len(ath_cinemas)

39

**So the final number of candidate areas, each one centered on an already existing cinema, is 39**
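The exclusion rule can also be written as a boolean mask instead of a list of positional indices. A small sketch using the corrected counts listed above; the variable names are illustrative. In the notebook the mask would then be applied as `ath_cinemas[keep.values].reset_index(drop=True)`.

```python
import pandas as pd

# Corrected number of neighboring cinemas per candidate area (copied from the cell above)
movie_theaters = [3, 7, 7, 10, 2, 10, 9, 1, 5, 2, 5, 2, 5, 0, 0, 0, 1, 2, 1, 3, 1, 2,
                  3, 3, 0, 2, 4, 4, 2, 1, 2, 1, 0, 1, 3, 1, 0, 1, 1, 0, 0, 0, 0, 0]

counts = pd.Series(movie_theaters)
keep = counts <= 5                       # True for areas with at most 5 other cinemas
excluded = counts.index[~keep].tolist()  # positions of the areas that are dropped

print(excluded)         # [1, 2, 3, 5, 6]
print(int(keep.sum()))  # 39 remaining candidate areas
```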

In [20]:
ath_cinemas.to_csv('39_candidate_cinemas.csv', index=False)   #39 candidate cinemas

In [4]:
ath_cinemas = pd.read_csv('39_candidate_cinemas.csv')   #when starting work from the 39 cinemas

**Create a folium map centered around Athens**

In [5]:
# map centered around Athens
import folium
ath_map = folium.Map(location=[37.98, 23.73], zoom_start=10)
ath_map

**Display the candidate cinemas on the map**

In [6]:
# instantiate a feature group for the cinemas 
cin = folium.map.FeatureGroup()

# loop through the candidate cinemas and add a circle marker for each one to the feature group
for lat, lng in zip(ath_cinemas.latitude, ath_cinemas.longitude):
    cin.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=4, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )
     
ath_map.add_child(cin)

In [7]:
latitudes = list(ath_cinemas.latitude)
longitudes = list(ath_cinemas.longitude)
labels = list(ath_cinemas.cinema_name)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(ath_map)    
    
ath_map

**Define a function that uses Foursquare location data to count the venues near each cinema in six categories: restaurants, nightlife spots, shopping malls, metro stations, bus stops and parking places**

In [26]:
import requests # library to handle requests

radius = 250
LIMIT = 50    #50 is currently the maximum possible number of venues returned by Foursqaure when using the search query 

categoryId1 = '4d4b7105d754a06374d81259'  #food
categoryId2 = '4d4b7105d754a06376d81259'  #nightlife spot
categoryId3 = '4bf58dd8d48988d1fd931735'  #metro station
categoryId4 = '52f2ab2ebcbc57f1066b8b4f'  #bus stop
categoryId5 = '4c38df4de52ce0d596b336e1'  #parking
categoryId6 = '4bf58dd8d48988d1fd941735'  #shopping mall
   
def getNearbyVenues(names, latitudes, longitudes):  #tha klithei me tis geitonies tis athinas apo to dataframe test
    
    #movie_theaters=[]  
    restaurants_list=[]
    night=[]
    stations_list=[]
    bus_list=[]
    parking_list=[]
    shopping_mals_list=[]
    
    
    for name, lat, lng in zip(names, latitudes, longitudes):  #gia kathe geitonia
        print(name)
                  
        #url for restaurants etc.
        url1 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&categoryId={}&v={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            lat,
            lng,
            categoryId1,
            VERSION, 
            radius, 
            LIMIT)
        
        #url for nightlife spots
        url2 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&categoryId={}&v={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            lat,
            lng,
            categoryId2,
            VERSION, 
            radius, 
            LIMIT)   
        
        #url for metro stations
        url3 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&categoryId={}&v={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            lat,
            lng,
            categoryId3,
            VERSION, 
            radius, 
            LIMIT)  
        
        #url for bus stops
        url4 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&categoryId={}&v={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            lat,
            lng,
            categoryId4,
            VERSION, 
            radius, 
            LIMIT)
        
        #url for private parking places
        url5 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&categoryId={}&v={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            lat,
            lng,
            categoryId5,
            VERSION, 
            radius, 
            LIMIT)
        
        #url for shopping malls
        url6 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&categoryId={}&v={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            lat,
            lng,
            categoryId6,
            VERSION, 
            radius, 
            LIMIT)
        
               
        # make the GET requests
               
        results1 = requests.get(url1).json()
        results2 = requests.get(url2).json()
        results3 = requests.get(url3).json()
        results4 = requests.get(url4).json()
        results5 = requests.get(url5).json()
        results6 = requests.get(url6).json()
                  
        restaurants_list.append(len(results1['response']['venues']))
        night.append(len(results2['response']['venues']))
        stations_list.append(len(results3['response']['venues']))
        bus_list.append(len(results4['response']['venues']))
        parking_list.append(len(results5['response']['venues']))
        shopping_mals_list.append(len(results6['response']['venues']))
          
    return(restaurants_list, night, stations_list, bus_list, parking_list, shopping_mals_list)


**Retrieve the location data for the candidate cinemas**

In [29]:
restaurants, night_spots, stations, bus_stops, parkings, shopping_malls = getNearbyVenues(names=ath_cinemas['cinema_name'],
                                     latitudes=ath_cinemas['latitude'],
                                     longitudes=ath_cinemas['longitude']
                                    )

Ααβόρα
Έμπασσυ Novacinema Odeon
Ταινιοθήκη της Ελλάδος
Αθήναιον
Ανδόρα
Γαλαξίας
Δαναός
Νιρβάνα 1 & 2 Cinemax
Αβάνα
Athena
Αίγλη 3D Digital
Σινέ Χολαργός
Διάνα
Κηφισιά Cinemax
Κηφισιά Cinemax 3
Novacinema Odeon Μαρούσι
Τρία Αστέρια 3D Digital
Αλεξάνδρα Europa Cinemas Digital
Ίλιον Cinema & Stage
Όσκαρ Digital
Studio new star art cinema
Τριανόν
Πάλας
Πτι-Παλαι
Ατλαντίς Classic Cinemas
Σοφία HD DIGITAL
Αλεξάνδρα Digital Cinema
Μικρόκοσμος
Cinerama Digital cinema
Σπόρτιγκ Digital Cinema
Άνοιξη Digital Cinema (δημ. Κιν/φος) 2+1
Λάμπρος Κωνσταντάρας - Ρένα Βλαχοπούλου
Φοίβος Digital Cinema
Μαρία Έλενα-Όναρ Digital Cinema (Δημ. Κιν/φος)
Novacinema Odeon Γλυφάδα
Δημ. Κιν. Όνειρο Ρέντη
Δημ. Κιν. Σινεάκ
Ζέα Digital Cinema
Cine Παράδεισος 2+1 (Δημ. Κιν/φος)


**Add the number of retrieved venues for each category to the dataframe**

In [31]:
ath_cinemas['restaurants']=restaurants
ath_cinemas['nightlife_spots']=night_spots
ath_cinemas['shopping_malls']=shopping_malls
ath_cinemas['metro_stations']=stations
ath_cinemas['bus_stops']=bus_stops
ath_cinemas['parkings']=parkings

In [34]:
ath_cinemas.head(10)

|   | cinema_name | cinema_address | latitude | longitude | restaurants | nightlife_spots | shopping_malls | metro_stations | bus_stops | parkings |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ααβόρα | Ααβόρα Ιπποκράτους 180 Νεάπολη | 37.988 | 23.746283 | 39 | 9 | 0 | 0 | 4 | 0 |
| 1 | Έμπασσυ Novacinema Odeon | Πατριάρχου Ιωακείμ 5 κολωνάκι | 37.977705 | 23.74204 | 50 | 50 | 1 | 0 | 2 | 4 |
| 2 | Ταινιοθήκη της Ελλάδος | 48, iera odos, gkazi | 37.980923 | 23.712603 | 50 | 50 | 0 | 1 | 0 | 1 |
| 3 | Αθήναιον | αθήναιον, 124, Βασιλίσσης Σοφίας | 37.985962 | 23.761325 | 50 | 12 | 0 | 0 | 3 | 9 |
| 4 | Ανδόρα | Σεβαστουπόλεως 117 | 37.995862 | 23.769469 | 49 | 6 | 1 | 0 | 2 | 3 |
| 5 | Γαλαξίας | Γαλαξίας, Μεσογείων 6 | 37.985015 | 23.761063 | 50 | 8 | 0 | 0 | 4 | 9 |
| 6 | Δαναός | δαναός 109 κηφισίας | 37.993468 | 23.766729 | 50 | 45 | 1 | 1 | 5 | 4 |
| 7 | Νιρβάνα 1 & 2 Cinemax | νιρβάνα αλεξάνδρας 192 | 37.986809 | 23.75829 | 50 | 21 | 0 | 1 | 4 | 4 |
| 8 | Αβάνα | Αβάνα Λυκούργου 3 | 38.014793 | 23.785307 | 16 | 3 | 0 | 0 | 1 | 0 |
| 9 | Athena | αθηνά σολωμού 18 χαλάνδρι | 38.023326 | 23.794781 | 43 | 13 | 0 | 0 | 3 | 2 |


In [35]:
ath_cinemas.to_csv('final_dataframe.csv', index=False)

In [8]:
reload=pd.read_csv('final_dataframe.csv')
#reload.head()

**Create a new feature, named 'transport', by adding the numbers of metro stations, bus stops and private parking areas for each cinema**

In [9]:
reload['transport'] = reload['metro_stations'] + reload['bus_stops'] + reload['parkings']
columns = ['metro_stations', 'bus_stops', 'parkings']
reload_new = reload.drop(columns, axis=1)
reload_new.head(5)

|   | cinema_name | cinema_address | latitude | longitude | restaurants | nightlife_spots | shopping_malls | transport |
|---|---|---|---|---|---|---|---|---|
| 0 | Ααβόρα | Ααβόρα Ιπποκράτους 180 Νεάπολη | 37.988 | 23.746283 | 39 | 9 | 0 | 4 |
| 1 | Έμπασσυ Novacinema Odeon | Πατριάρχου Ιωακείμ 5 κολωνάκι | 37.977705 | 23.74204 | 50 | 50 | 1 | 6 |
| 2 | Ταινιοθήκη της Ελλάδος | 48, iera odos, gkazi | 37.980923 | 23.712603 | 50 | 50 | 0 | 2 |
| 3 | Αθήναιον | αθήναιον, 124, Βασιλίσσης Σοφίας | 37.985962 | 23.761325 | 50 | 12 | 0 | 12 |
| 4 | Ανδόρα | Σεβαστουπόλεως 117 | 37.995862 | 23.769469 | 49 | 6 | 1 | 5 |
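An equivalent, slightly more compact way to build the `transport` feature is to sum the three columns row-wise. The sketch below checks this on the metro, bus and parking counts of the first five cinemas shown earlier; it uses a self-contained toy frame (`toy` is an illustrative name), not the notebook's `reload` object.

```python
import pandas as pd

# metro_stations, bus_stops and parkings for the first five candidate cinemas (from the venue-count table above)
toy = pd.DataFrame({
    'cinema_name': ['Ααβόρα', 'Έμπασσυ Novacinema Odeon', 'Ταινιοθήκη της Ελλάδος', 'Αθήναιον', 'Ανδόρα'],
    'metro_stations': [0, 0, 1, 0, 0],
    'bus_stops': [4, 2, 0, 3, 2],
    'parkings': [0, 4, 1, 9, 3],
})

transport_cols = ['metro_stations', 'bus_stops', 'parkings']
toy['transport'] = toy[transport_cols].sum(axis=1)  # row-wise total of transport options
toy = toy.drop(columns=transport_cols)              # keep only the combined feature

print(toy['transport'].tolist())  # [4, 6, 2, 12, 5]
```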


** **
**RECOMMENDATION AT THE CLUSTER LEVEL**

**Standardize the features of interest (zero mean and unit variance per feature) before applying k-means clustering**

In [10]:
import warnings
from sklearn.preprocessing import StandardScaler

from sklearn.exceptions import DataConversionWarning
warnings.filterwarnings(action='ignore', category=DataConversionWarning)

X = reload_new.values[:,4:]  # columns restaurants, nightlife_spots, shopping_malls, transport
X = np.nan_to_num(X) #Returns an array or scalar replacing Not a Number (NaN) with zero, (positive) infinity with a very large number and negative infinity with a very small (or negative) number.
cluster_dataset = StandardScaler().fit_transform(X)
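For reference, `StandardScaler` rescales every column to zero mean and unit variance, z = (x - mean) / std, using the population standard deviation (ddof = 0). A NumPy-only sketch of that transformation (the function name is hypothetical), applied to the restaurants and transport counts of the first five cinemas as an illustrative subset rather than the full 39-row matrix:

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores, mirroring what StandardScaler().fit_transform computes."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)   # population standard deviation (ddof=0)
    std[std == 0] = 1.0   # constant columns become 0 instead of dividing by zero
    return (X - mean) / std


# restaurants and transport for the first five candidate cinemas
X_demo = np.array([[39, 4], [50, 6], [50, 2], [50, 12], [49, 5]])
Z = standardize(X_demo)
print(Z.mean(axis=0).round(6))  # each column now has mean 0
print(Z.std(axis=0).round(6))   # and standard deviation 1
```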


**Use k-means clustering in order to detect similar cinemas (candidate locations) in terms of the studied features in their vicinity**

In [11]:
# import k-means
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 3

# run k-means clustering (n_init=10 matches the default of older scikit-learn releases, keeping results reproducible across versions)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(cluster_dataset)
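To show what the `KMeans` call does internally, here is a minimal NumPy sketch of Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. It is a simplification, not scikit-learn's implementation: it uses a deterministic farthest-point initialisation instead of k-means++ and a single run instead of several `n_init` restarts; the function name is hypothetical.

```python
import numpy as np


def lloyd_kmeans(X, k, max_iter=100):
    """Minimal k-means (Lloyd's algorithm). Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    # farthest-point initialisation: start from the first point, then repeatedly
    # add the point whose squared distance to the nearest chosen centroid is largest
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d2)])
    centroids = np.array(centroids)

    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for every point
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: assignments did not change
        labels = new_labels
        # update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids


# two well-separated toy blobs
X_toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids = lloyd_kmeans(X_toy, k=2)
print(labels)  # the first three points share one label, the last three the other
```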


In [12]:
# add clustering labels
reload_new.insert(0, 'Cluster Labels', kmeans.labels_)

reload_new.head()

|   | Cluster Labels | cinema_name | cinema_address | latitude | longitude | restaurants | nightlife_spots | shopping_malls | transport |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Ααβόρα | Ααβόρα Ιπποκράτους 180 Νεάπολη | 37.988 | 23.746283 | 39 | 9 | 0 | 4 |
| 1 | 1 | Έμπασσυ Novacinema Odeon | Πατριάρχου Ιωακείμ 5 κολωνάκι | 37.977705 | 23.74204 | 50 | 50 | 1 | 6 |
| 2 | 1 | Ταινιοθήκη της Ελλάδος | 48, iera odos, gkazi | 37.980923 | 23.712603 | 50 | 50 | 0 | 2 |
| 3 | 0 | Αθήναιον | αθήναιον, 124, Βασιλίσσης Σοφίας | 37.985962 | 23.761325 | 50 | 12 | 0 | 12 |
| 4 | 0 | Ανδόρα | Σεβαστουπόλεως 117 | 37.995862 | 23.769469 | 49 | 6 | 1 | 5 |


**Creating a Folium map to visualize the clusters**

In [13]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[37.98, 23.73], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(reload_new['latitude'], reload_new['longitude'], reload_new['cinema_name'], reload_new['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],  # maps cluster 0 to the last color (red), 1 to purple, 2 to light green
        fill=True,
        #fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Red: Cluster 0, Purple: Cluster 1, Light Green: Cluster 2

## **Inspecting the clusters and providing recommendations**

In [42]:
cluster0=reload_new.loc[reload_new['Cluster Labels'] == 0, reload_new.columns[[0] +[1]+list(range(5, reload_new.shape[1]))]]
cluster0

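|   | Cluster Labels | cinema_name | restaurants | nightlife_spots | shopping_malls | transport |
|---|---|---|---|---|---|---|
| 0 | 0 | Ααβόρα | 39 | 9 | 0 | 4 |
| 3 | 0 | Αθήναιον | 50 | 12 | 0 | 12 |
| 4 | 0 | Ανδόρα | 49 | 6 | 1 | 5 |
| 5 | 0 | Γαλαξίας | 50 | 8 | 0 | 13 |
| 7 | 0 | Νιρβάνα 1 & 2 Cinemax | 50 | 21 | 0 | 9 |
| 9 | 0 | Athena | 43 | 13 | 0 | 5 |
| 12 | 0 | Διάνα | 50 | 21 | 0 | 5 |
| 17 | 0 | Αλεξάνδρα Europa Cinemas Digital | 38 | 11 | 1 | 4 |
| 19 | 0 | Όσκαρ Digital | 37 | 5 | 0 | 4 |
| 21 | 0 | Τριανόν | 40 | 10 | 0 | 5 |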


**Computing mean values of the considered features for cluster 0**

In [43]:
cluster0_rest_mean=round(cluster0['restaurants'].mean(),2)
cluster0_night_mean=round(cluster0['nightlife_spots'].mean(),2)
cluster0_malls_mean=round(cluster0['shopping_malls'].mean(),2)
cluster0_transp_mean=round(cluster0['transport'].mean(),2)

#print(round(cluster0['restaurants'].mean(),2))
#print(round(cluster0['nightlife_spots'].mean(),2))
#print(round(cluster0['shopping_malls'].mean(),2))
#print(round(cluster0['transport'].mean(),2))

In [44]:
cluster0.to_csv('cluster0.csv', index=False)

In [45]:
cluster1=reload_new.loc[reload_new['Cluster Labels'] == 1, reload_new.columns[[0] +[1]+list(range(5, reload_new.shape[1]))]].copy()  # copy so that new columns can be added later without SettingWithCopyWarning
cluster1

|   | Cluster Labels | cinema_name | restaurants | nightlife_spots | shopping_malls | transport |
|---|---|---|---|---|---|---|
| 1 | 1 | Έμπασσυ Novacinema Odeon | 50 | 50 | 1 | 6 |
| 2 | 1 | Ταινιοθήκη της Ελλάδος | 50 | 50 | 0 | 2 |
| 6 | 1 | Δαναός | 50 | 45 | 1 | 10 |
| 11 | 1 | Σινέ Χολαργός | 41 | 11 | 2 | 5 |
| 14 | 1 | Κηφισιά Cinemax 3 | 50 | 24 | 3 | 8 |
| 22 | 1 | Πάλας | 50 | 34 | 1 | 1 |
| 27 | 1 | Μικρόκοσμος | 49 | 35 | 0 | 6 |
| 29 | 1 | Σπόρτιγκ Digital Cinema | 49 | 27 | 1 | 2 |
| 32 | 1 | Φοίβος Digital Cinema | 50 | 43 | 0 | 1 |
| 36 | 1 | Δημ. Κιν. Σινεάκ | 50 | 33 | 3 | 9 |


**Computing mean values of the considered features for cluster 1**

In [46]:
cluster1_rest_mean=round(cluster1['restaurants'].mean(),2)
cluster1_night_mean=round(cluster1['nightlife_spots'].mean(),2)
cluster1_malls_mean=round(cluster1['shopping_malls'].mean(),2)
cluster1_transp_mean=round(cluster1['transport'].mean(),2)

#print(round(cluster1['restaurants'].mean(),2))
#print(round(cluster1['nightlife_spots'].mean(),2))
#print(round(cluster1['shopping_malls'].mean(),2))
#print(round(cluster1['transport'].mean(),2))

In [47]:
cluster1.to_csv('cluster1.csv', index=False)

In [48]:
cluster2=reload_new.loc[reload_new['Cluster Labels'] == 2, reload_new.columns[[0] +[1]+list(range(5, reload_new.shape[1]))]]
cluster2

|   | Cluster Labels | cinema_name | restaurants | nightlife_spots | shopping_malls | transport |
|---|---|---|---|---|---|---|
| 8 | 2 | Αβάνα | 16 | 3 | 0 | 1 |
| 10 | 2 | Αίγλη 3D Digital | 15 | 5 | 0 | 1 |
| 13 | 2 | Κηφισιά Cinemax | 17 | 5 | 1 | 2 |
| 15 | 2 | Novacinema Odeon Μαρούσι | 5 | 4 | 0 | 2 |
| 16 | 2 | Τρία Αστέρια 3D Digital | 29 | 4 | 0 | 3 |
| 18 | 2 | Ίλιον Cinema & Stage | 25 | 8 | 0 | 0 |
| 20 | 2 | Studio new star art cinema | 21 | 4 | 0 | 4 |
| 26 | 2 | Αλεξάνδρα Digital Cinema | 23 | 5 | 0 | 0 |
| 30 | 2 | Άνοιξη Digital Cinema (δημ. Κιν/φος) 2+1 | 13 | 2 | 0 | 0 |
| 31 | 2 | Λάμπρος Κωνσταντάρας - Ρένα Βλαχοπούλου | 14 | 1 | 0 | 1 |


**Computing mean values of the considered features for cluster 2**

In [49]:
cluster2_rest_mean=round(cluster2['restaurants'].mean(),2)
cluster2_night_mean=round(cluster2['nightlife_spots'].mean(),2)
cluster2_malls_mean=round(cluster2['shopping_malls'].mean(),2)
cluster2_transp_mean=round(cluster2['transport'].mean(),2)

#print(round(cluster2['restaurants'].mean(),2))
#print(round(cluster2['nightlife_spots'].mean(),2))
#print(round(cluster2['shopping_malls'].mean(),2))
#print(round(cluster2['transport'].mean(),2))

In [50]:
cluster2.to_csv('cluster2.csv', index=False)

**Inspecting the mean values of the three clusters**

In [51]:
clustering_result = pd.DataFrame()

data=[ ["Cluster 0", cluster0_rest_mean, cluster0_night_mean, cluster0_malls_mean, cluster0_transp_mean],\
      ["Cluster 1", cluster1_rest_mean, cluster1_night_mean, cluster1_malls_mean, cluster1_transp_mean],\
      ["Cluster 2", cluster2_rest_mean, cluster2_night_mean, cluster2_malls_mean, cluster2_transp_mean] ]

clustering_result = pd.DataFrame(data,columns=['Cluster','Mean no of restaur','Mean no of night_spots','Mean no of shopp_malls', 'Mean no of transport_options'])
clustering_result.set_index('Cluster', inplace=True)
clustering_result

| Cluster | Mean no of restaur | Mean no of night_spots | Mean no of shopp_malls | Mean no of transport_options |
|---|---|---|---|---|
| Cluster 0 | 45.44 | 12.19 | 0.12 | 5.38 |
| Cluster 1 | 48.9 | 35.2 | 1.2 | 5.0 |
| Cluster 2 | 16.46 | 3.38 | 0.08 | 1.08 |
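The per-cluster means can also be obtained in a single step with `groupby`, without building one dataframe per cluster. A sketch on a toy frame holding only the ten cluster 1 rows listed above, so it reproduces the Cluster 1 line of this table; in the notebook itself the equivalent call would be `reload_new.groupby('Cluster Labels')[features].mean().round(2)`.

```python
import pandas as pd

# the ten cinemas of cluster 1 (values from the cluster 1 table above)
cluster1_rows = pd.DataFrame({
    'Cluster Labels': [1] * 10,
    'restaurants':     [50, 50, 50, 41, 50, 50, 49, 49, 50, 50],
    'nightlife_spots': [50, 50, 45, 11, 24, 34, 35, 27, 43, 33],
    'shopping_malls':  [1, 0, 1, 2, 3, 1, 0, 1, 0, 3],
    'transport':       [6, 2, 10, 5, 8, 1, 6, 2, 1, 9],
})

features = ['restaurants', 'nightlife_spots', 'shopping_malls', 'transport']
cluster_means = cluster1_rows.groupby('Cluster Labels')[features].mean().round(2)
print(cluster_means)  # restaurants 48.9, nightlife_spots 35.2, shopping_malls 1.2, transport 5.0
```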


**Recommendation at the cluster level:**

**Based on the inspection of the clusters, cinemas in cluster 1 are generally preferable: they have the largest mean numbers of restaurants, nightlife spots and shopping malls, while their transport options are good (slightly fewer than in cluster 0).**

**So, based on the k-means cluster analysis, we would recommend starting a new cinema in the vicinity (250 m-radius circular areas) of any of the cinemas in cluster 1 (recommendation at the cluster level).**

** **
**RECOMMENDATION AT THE SINGLE CINEMA LEVEL**

**If our client requests a recommendation at the cinema (location) level, we can perform some basic statistical processing on the best cluster: normalize the features of interest in Cluster 1 (restaurants, nightlife spots, shopping malls and transportation options) and compute scores that emphasize different aspects of the performance of each candidate location.**

In [52]:
#normalize values in Cluster 1: divide each feature by its maximum within the cluster

max_rest = cluster1['restaurants'].max()              #50
max_night = cluster1['nightlife_spots'].max()         #50
max_transport = cluster1['transport'].max()           #10
max_shopping_malls = cluster1['shopping_malls'].max() #3

#get normalized values for restaurants, nightlife spots, transport and shopping malls
cluster1['restaurants_normal'] = round(cluster1['restaurants']/max_rest,2)
cluster1['nightlife_spots_normal'] = round(cluster1['nightlife_spots']/max_night,2)
cluster1['transport_normal'] = round(cluster1['transport']/max_transport,2)
cluster1['shopping_malls_normal'] = round(cluster1['shopping_malls']/max_shopping_malls,2)

In [53]:
cluster1[['Cluster Labels','cinema_name','restaurants_normal','nightlife_spots_normal','shopping_malls_normal','transport_normal']]

|   | Cluster Labels | cinema_name | restaurants_normal | nightlife_spots_normal | shopping_malls_normal | transport_normal |
|---|---|---|---|---|---|---|
| 1 | 1 | Έμπασσυ Novacinema Odeon | 1.0 | 1.0 | 0.33 | 0.6 |
| 2 | 1 | Ταινιοθήκη της Ελλάδος | 1.0 | 1.0 | 0.0 | 0.2 |
| 6 | 1 | Δαναός | 1.0 | 0.9 | 0.33 | 1.0 |
| 11 | 1 | Σινέ Χολαργός | 0.82 | 0.22 | 0.67 | 0.5 |
| 14 | 1 | Κηφισιά Cinemax 3 | 1.0 | 0.48 | 1.0 | 0.8 |
| 22 | 1 | Πάλας | 1.0 | 0.68 | 0.33 | 0.1 |
| 27 | 1 | Μικρόκοσμος | 0.98 | 0.7 | 0.0 | 0.6 |
| 29 | 1 | Σπόρτιγκ Digital Cinema | 0.98 | 0.54 | 0.33 | 0.2 |
| 32 | 1 | Φοίβος Digital Cinema | 1.0 | 0.86 | 0.0 | 0.1 |
| 36 | 1 | Δημ. Κιν. Σινεάκ | 1.0 | 0.66 | 1.0 | 0.9 |


In [54]:
#Score Neutral
#Restaurants, nightlife spots, shopping malls and transportation options in the area around a cinema are of equal importance
cluster1['score_neutral']=round((cluster1['restaurants_normal'] + cluster1['nightlife_spots_normal'] + cluster1['transport_normal']+cluster1['shopping_malls_normal'])/4,2)

#Score Leisure Activities
# Leisure activity options are considered more important than transportation options (the magnitude of the weights, which sum to 1, can be adjusted)
cluster1['score_leisure']=round((0.3*cluster1['restaurants_normal'] + 0.3*cluster1['nightlife_spots_normal'] + 0.3*cluster1['shopping_malls_normal']+ 0.1*cluster1['transport_normal']),2)

#Score Transportation
# Transportation options are considered more important than leisure activity options (the magnitude of the weights, which sum to 1, can be adjusted)
cluster1['score_transport']=round((0.2*cluster1['restaurants_normal'] + 0.2*cluster1['nightlife_spots_normal'] + 0.2*cluster1['shopping_malls_normal']+  0.4*cluster1['transport_normal']),2)


Let's now inspect the candidate cinemas again, this time with their scores:

In [55]:
cluster1[['Cluster Labels','cinema_name','score_neutral','score_leisure','score_transport']]

|   | Cluster Labels | cinema_name | score_neutral | score_leisure | score_transport |
|---|---|---|---|---|---|
| 1 | 1 | Έμπασσυ Novacinema Odeon | 0.73 | 0.76 | 0.71 |
| 2 | 1 | Ταινιοθήκη της Ελλάδος | 0.55 | 0.62 | 0.48 |
| 6 | 1 | Δαναός | 0.81 | 0.77 | 0.85 |
| 11 | 1 | Σινέ Χολαργός | 0.55 | 0.56 | 0.54 |
| 14 | 1 | Κηφισιά Cinemax 3 | 0.82 | 0.82 | 0.82 |
| 22 | 1 | Πάλας | 0.53 | 0.61 | 0.44 |
| 27 | 1 | Μικρόκοσμος | 0.57 | 0.56 | 0.58 |
| 29 | 1 | Σπόρτιγκ Digital Cinema | 0.51 | 0.57 | 0.45 |
| 32 | 1 | Φοίβος Digital Cinema | 0.49 | 0.57 | 0.41 |
| 36 | 1 | Δημ. Κιν. Σινεάκ | 0.89 | 0.89 | 0.89 |
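The normalization and the three weighted scores follow one pattern: divide each feature by its maximum within the cluster, then take a weighted sum whose weights add up to 1. A compact sketch of that pattern on the cluster 1 counts (copied from the cluster 1 table above). The helper name and the `WEIGHTS` dictionary are illustrative; unlike the notebook, the sketch does not round the normalized values before scoring, so individual scores may differ from the table in the second decimal.

```python
import pandas as pd

features = ['restaurants', 'nightlife_spots', 'shopping_malls', 'transport']

# weights per score, in the order of `features`; each set sums to 1
WEIGHTS = {
    'score_neutral':   [0.25, 0.25, 0.25, 0.25],
    'score_leisure':   [0.3, 0.3, 0.3, 0.1],
    'score_transport': [0.2, 0.2, 0.2, 0.4],
}


def add_scores(df):
    """Max-normalize the features and add one weighted score column per entry in WEIGHTS."""
    out = df.copy()
    normal = out[features] / out[features].max()  # each column divided by its maximum
    for score, weights in WEIGHTS.items():
        out[score] = (normal * weights).sum(axis=1).round(2)  # weights applied column by column
    return out


cluster1_counts = pd.DataFrame({
    'cinema_name': ['Έμπασσυ Novacinema Odeon', 'Ταινιοθήκη της Ελλάδος', 'Δαναός', 'Σινέ Χολαργός',
                    'Κηφισιά Cinemax 3', 'Πάλας', 'Μικρόκοσμος', 'Σπόρτιγκ Digital Cinema',
                    'Φοίβος Digital Cinema', 'Δημ. Κιν. Σινεάκ'],
    'restaurants':     [50, 50, 50, 41, 50, 50, 49, 49, 50, 50],
    'nightlife_spots': [50, 50, 45, 11, 24, 34, 35, 27, 43, 33],
    'shopping_malls':  [1, 0, 1, 2, 3, 1, 0, 1, 0, 3],
    'transport':       [6, 2, 10, 5, 8, 1, 6, 2, 1, 9],
}, index=[1, 2, 6, 11, 14, 22, 27, 29, 32, 36])

scored = add_scores(cluster1_counts)
for score in WEIGHTS:
    print(score, scored.loc[scored[score].idxmax(), 'cinema_name'])
```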


In [56]:
print('Cinema with the best Score Neutral:', cluster1.cinema_name[cluster1['score_neutral'].idxmax()])

Cinema with the best Score Neutral: Δημ. Κιν. Σινεάκ


In [57]:
print('Cinema with the best Score Leisure:', cluster1.cinema_name[cluster1['score_leisure'].idxmax()])

Cinema with the best Score Leisure: Δημ. Κιν. Σινεάκ


In [58]:
print('Cinema with the best Score Transport:', cluster1.cinema_name[cluster1['score_transport'].idxmax()])

Cinema with the best Score Transport: Δημ. Κιν. Σινεάκ


In [59]:
#cluster1.to_csv('cluster1_modified.csv', index=False)

**Recommendation at the cinema (location) level:**

**Based on this statistical processing, the best candidate location is the area centered on the cinema with index 36 in our dataframe (Greek name: ‘Δημοτικός Κινηματογράφος Σινεάκ’, i.e. Sineak Municipal Cinema; latitude: 37.941869, longitude: 23.647198), which has the highest value for all three scores defined in section 3.5 of the methodology.**