# Coursera Capstone project: identification of the best places for a café-bookshop in Geneva (Switzerland)

# Table of contents

* [0. Introduction: business problem](#introduction)
* [1. Data](#data)
* [2. Coordinates transformation](#transformation)
* [3. Grid creation with Google Maps data](#grid)
* [4. Integration of Foursquare and TPG data](#integration)
* [5. Calculation of best neighborhoods](#bestNeighborhoods)
* [6. Calculation of clusters of neighborhoods](#clusters)
* [7. Conclusion](#conclusion)

## 0. Introduction: business problem <a name="introduction"></a>

With this project, we intend to support investors looking for the best place in Geneva (Switzerland) to open a new concept of **Café-Bookshop**: a kind of place that does not exist in this city yet, where visitors can savor a hot chocolate while discovering newly published fiction, thrillers or non-fiction.

To this end, we need to find a neighborhood that meets the following criteria: i) it is in the city center, ii) it has no (or very few) bookshops and iii) no (or very few) cafés in the close vicinity, and finally iv) it has a bus stop nearby.


## 1. Data <a name="data"></a>

In order to decide which location is the best, the following figures need to be produced:
* distance of the neighborhood from the city center
* number of existing cafés in the neighborhood
* number of existing bookshops in the neighborhood
* number of existing bus stops in the neighborhood.


We decided to use a regularly spaced (hexagonal) grid of locations, centered on the city center, to define our neighborhoods.
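Section 3 builds this grid inline. As a sketch of the underlying idea, here is a standalone version written as a reusable function; `hex_grid` is a hypothetical helper, and the 600 m spacing and 6 km radius in the example are the values used in section 3. In a hexagonal layout every point has six neighbours at the same distance, so the 300 m circles around the centers cover the area more evenly than on a square grid.

```python
import math

def hex_grid(cx, cy, spacing, radius):
    """Points of a hexagonal grid around (cx, cy), kept if within `radius`.

    Neighbouring points are `spacing` apart: rows are spacing*sqrt(3)/2 apart
    and every other row is shifted by half a spacing.
    """
    row_step = spacing * math.sqrt(3) / 2
    n_half_rows = int(radius // row_step) + 1
    n_half_cols = int(radius // spacing) + 1
    points = []
    for i in range(-n_half_rows, n_half_rows + 1):
        y = cy + i * row_step
        x_shift = spacing / 2 if i % 2 else 0.0   # odd rows are shifted by half a cell
        for j in range(-n_half_cols, n_half_cols + 1):
            x = cx + j * spacing + x_shift
            if math.hypot(x - cx, y - cy) <= radius:
                points.append((x, y))
    return points

# Projected coordinates in metres, centered on (0, 0) for the example
centers = hex_grid(0.0, 0.0, spacing=600, radius=6000)
print(len(centers), 'grid points')
```

The function works in projected (metric) coordinates, which is why the notebook first converts latitude/longitude to UTM before laying out the grid.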


The following data sources will be needed to extract or generate the required information:<br>
1. number of cafés and bookshops, and their locations, in every neighborhood:<br>
* will be obtained using the **Foursquare API** (https://developer.foursquare.com/docs/places-api/)<br>
* the data are updated yearly and available through direct queries<br>

2. number of bus stops and their locations: <br>
* will be extracted from the **Geneva public transportation (TPG)** data (https://opendata.swiss/fr/dataset/__274)<br>
* the data are updated yearly and available as a CSV file<br>

3. coordinates of the Geneva city center:<br>
* will be obtained using **Google Maps API geocoding** (https://developers.google.com/maps/documentation/geocoding/overview)<br>

4. centers of the candidate areas (generated algorithmically) and their approximate addresses:<br>
* the addresses will be obtained using **Google Maps API reverse geocoding** (https://developers.google.com/maps/documentation/geocoding/overview#ReverseGeocoding)<br>





## 2. Coordinates transformation<a name="transformation"></a>

In [None]:
import math

#!pip install pyproj
from pyproj import Transformer

# Geneva (about 6.14°E) lies in UTM zone 32N (6°E to 12°E), i.e. EPSG:32632.
# Both transformers are built once and reused by the helpers below;
# always_xy=True means coordinates are passed as (lon, lat) and (x, y).
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32632", always_xy=True)
to_lonlat = Transformer.from_crs("EPSG:32632", "EPSG:4326", always_xy=True)

def lonlat_to_xy(lon, lat):
    """Convert WGS84 longitude/latitude (degrees) to UTM x/y (metres)."""
    x, y = to_utm.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    """Convert UTM x/y (metres) back to WGS84 longitude/latitude (degrees)."""
    lon, lat = to_lonlat.transform(x, y)
    return lon, lat

def calc_xy_distance(x1, y1, x2, y2):
    """Euclidean distance in metres between two points in UTM coordinates."""
    return math.hypot(x2 - x1, y2 - y1)

# Round-trip check on Plaine de Plainpalais (coordinates given in section 3);
# geneve_center itself is only defined later, in section 3.
check_lat, check_lon = 46.1983927, 6.1405447
print('Coordinate transformation check')
print('-------------------------------')
print('Geneva center longitude={}, latitude={}'.format(check_lon, check_lat))
x, y = lonlat_to_xy(check_lon, check_lat)
print('Geneva center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Geneva center longitude={}, latitude={}'.format(lo, la))
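As an independent sanity check of `calc_xy_distance`, distances can also be computed directly from latitude/longitude with the haversine formula. A minimal sketch assuming a spherical Earth with mean radius 6,371,008.8 m; `haversine_m` is a hypothetical helper. Because the sphere only approximates the ellipsoid (errors of up to about 0.5%), the two methods should agree closely but not exactly.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371008.8):
    """Great-circle distance in metres on a sphere of radius `radius_m`."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Example: a point about 600 m north of Plaine de Plainpalais
# (one degree of latitude is roughly 111,195 m on this sphere)
d = haversine_m(46.1983927, 6.1405447, 46.1983927 + 600 / 111195, 6.1405447)
print(round(d))
```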

In [None]:
# Adapted from: https://github.com/ValentinMinder/Swisstopo-WGS84-LV03/blob/master/scripts/py/wgs84_ch1903.py

# Approximate swisstopo formulas from the Swiss grid to WGS84.
# The offsets 2,600,000 (east) and 1,200,000 (north) are those of CH1903+/LV95.
# Both functions also work element-wise on pandas Series and NumPy arrays.

def CHtoWGSlng(e, n):
    """Convert Swiss east/north coordinates (m) to WGS84 longitude (degrees)."""
    # Auxiliary values relative to Bern, in units of 1000 km
    y_aux = (e - 2600000) / 1000000
    x_aux = (n - 1200000) / 1000000
    lng = (2.6779094
           + 4.728982 * y_aux
           + 0.791484 * y_aux * x_aux
           + 0.1306 * y_aux * x_aux ** 2
           - 0.0436 * y_aux ** 3)
    # Convert from units of 10000" to degrees
    return lng * 100 / 36


def CHtoWGSlat(e, n):
    """Convert Swiss east/north coordinates (m) to WGS84 latitude (degrees)."""
    # Auxiliary values relative to Bern, in units of 1000 km
    y_aux = (e - 2600000) / 1000000
    x_aux = (n - 1200000) / 1000000
    lat = (16.9023892
           + 3.238272 * x_aux
           - 0.270978 * y_aux ** 2
           - 0.002528 * x_aux ** 2
           - 0.0447 * y_aux ** 2 * x_aux
           - 0.0140 * x_aux ** 3)
    # Convert from units of 10000" to degrees
    return lat * 100 / 36
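To spot-check the TPG conversion, the inverse swisstopo approximation (WGS84 to CH1903+/LV95) can map a converted stop back to its grid coordinates; the approximate formulas are accurate to about a metre. A sketch under that assumption; `wgs84_to_lv95` is a hypothetical name.

```python
def wgs84_to_lv95(lat, lon):
    """Approximate conversion WGS84 (degrees) -> CH1903+/LV95 (E, N in metres)."""
    # Auxiliary values: sexagesimal seconds relative to Bern, in units of 10000"
    phi_aux = (lat * 3600 - 169028.66) / 10000
    lam_aux = (lon * 3600 - 26782.5) / 10000
    e = (2600072.37
         + 211455.93 * lam_aux
         - 10938.51 * lam_aux * phi_aux
         - 0.36 * lam_aux * phi_aux ** 2
         - 44.54 * lam_aux ** 3)
    n = (1200147.07
         + 308807.95 * phi_aux
         + 3745.25 * lam_aux ** 2
         + 76.63 * phi_aux ** 2
         - 194.56 * lam_aux ** 2 * phi_aux
         + 119.79 * phi_aux ** 3)
    return e, n

e, n = wgs84_to_lv95(46.1983927, 6.1405447)   # Plaine de Plainpalais
print(round(e), round(n))
```

For a converted stop, `wgs84_to_lv95(CHtoWGSlat(e, n), CHtoWGSlng(e, n))` should land within a few metres of `(e, n)`; a larger gap would suggest the input is not in LV95.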


## 3. Grid creation with Google Maps data<a name="grid"></a>

#### Geneva city center

As city center we chose a place known for its central position in Geneva: Plaine de Plainpalais, Genève, Switzerland, at [46.1983927, 6.1405447] (latitude, longitude).

In [None]:
import requests

google_api_key = 'YOUR_GOOGLE_API_KEY'  # replace with your own Google Maps API key

def get_coordinates(api_key, address, verbose=False):
    """Geocode an address; return [lat, lon], or [None, None] on failure."""
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    try:
        # Passing the query as params lets requests URL-encode the address (e.g. 'è')
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        location = response['results'][0]['geometry']['location']  # geographical coordinates
        return [location['lat'], location['lng']]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        return [None, None]

address = 'Plaine de Plainpalais, Genève, Switzerland'
geneve_center = get_coordinates(google_api_key, address)
print('Coordinates of {}: {}'.format(address, geneve_center))
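The Geocoding API reports problems through a `status` field (for example `ZERO_RESULTS` or `REQUEST_DENIED`), which `get_coordinates` hides behind `[None, None]`. A minimal sketch that surfaces the status instead; `parse_geocode_response` is a hypothetical helper, and the sample payload is hand-written to mimic the documented response shape, not real API output.

```python
def parse_geocode_response(payload):
    """Return [lat, lng] from a Geocoding API JSON payload, or raise ValueError."""
    status = payload.get('status')
    if status != 'OK':
        # e.g. 'ZERO_RESULTS', 'REQUEST_DENIED', 'OVER_QUERY_LIMIT'
        raise ValueError('Geocoding failed: {} {}'.format(status, payload.get('error_message', '')))
    location = payload['results'][0]['geometry']['location']
    return [location['lat'], location['lng']]

# Hand-written payload with the fields read above (not real API output)
sample = {'status': 'OK',
          'results': [{'geometry': {'location': {'lat': 46.1983927, 'lng': 6.1405447}}}]}
print(parse_geocode_response(sample))
```

In the notebook, `parse_geocode_response(requests.get(url, params=...).json())` would replace the silent `except` with an explicit error message.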

In [None]:
def get_address(api_key, latitude, longitude, verbose=False):
    """Reverse-geocode a point; return its formatted address, or None on failure."""
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    try:
        params = {'key': api_key, 'latlng': '{},{}'.format(latitude, longitude)}
        response = requests.get(url, params=params).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        return response['results'][0]['formatted_address']
    except (requests.RequestException, ValueError, KeyError, IndexError):
        return None

addr = get_address(google_api_key, geneve_center[0], geneve_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(geneve_center[0], geneve_center[1], addr))

#### Grid creation around the city center

In [None]:
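# City center in projected (UTM, metres) coordinates
geneve_center_x, geneve_center_y = lonlat_to_xy(geneve_center[1], geneve_center[0])

# Hexagonal grid: points 600 m apart within a row, rows 600*sqrt(3)/2 m apart,
# every other row shifted by half a step (300 m)
k = math.sqrt(3) / 2           # row-spacing factor of a hexagonal grid
n_cols = 21                    # 21 columns x 600 m span about 12 km
n_rows = int(n_cols / k)       # more rows, since rows are closer together than columns
x_min = geneve_center_x - 6000
x_step = 600
y_step = 600 * k
# Offset the first row so that the block of rows is (approximately) centered on the city center
y_min = geneve_center_y - 6000 - (n_rows * y_step - 12000) / 2

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(n_rows):
    y = y_min + i * y_step
    x_offset = 300 if i % 2 == 0 else 0
    for j in range(n_cols):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(geneve_center_x, geneve_center_y, x, y)
        # Keep only points within ~6 km of the center (the extra metre absorbs rounding)
        if distance_from_center <= 6001:
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')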

In [None]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    addresses.append(address)
    print(' .', end='')
print(' done.')
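Reverse geocoding every grid point costs one Google Maps request per candidate on each run, whereas the Foursquare results below are cached with pickle. A minimal sketch of an equivalent cache for addresses, assuming a JSON file is acceptable; `load_cache`, `save_cache`, `lookup_all` and the file name in the usage comment are hypothetical.

```python
import json
import os

def load_cache(path):
    """Load a {"lat,lon": address} dict from a JSON file, or start empty."""
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            return json.load(f)
    return {}

def save_cache(cache, path):
    """Write the cache to disk in UTF-8, keeping accented street names readable."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(cache, f, ensure_ascii=False)

def lookup_all(points, fetch, cache):
    """Return fetch(lat, lon) for every point, calling fetch only for uncached points."""
    results = []
    for lat, lon in points:
        key = '{:.6f},{:.6f}'.format(lat, lon)
        if key not in cache:
            cache[key] = fetch(lat, lon)
        results.append(cache[key])
    return results

# Possible use in this notebook ('addresses_cache.json' is a hypothetical file name):
# cache = load_cache('addresses_cache.json')
# addresses = lookup_all(zip(latitudes, longitudes),
#                        lambda lat, lon: get_address(google_api_key, lat, lon) or 'NO ADDRESS',
#                        cache)
# save_cache(cache, 'addresses_cache.json')
```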

In [None]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance_from_center': distances_from_center})

df_locations

In [None]:
#!pip install folium

import folium

map_geneve = folium.Map(location=geneve_center, zoom_start=12)
folium.Marker(geneve_center, popup='Plaine Plainpalais').add_to(map_geneve)
for lat, lon in zip(latitudes, longitudes):
    latlon=str(lat)+" " +str(lon)
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_geneve) 
    folium.Circle([lat, lon],  popup=latlon, radius=300, color='blue', fill=False).add_to(map_geneve)
    
map_geneve


## 4. Integration of Foursquare and TPG data<a name="integration"></a>

### Integration of Foursquare data

In [None]:
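foursquare_client_id = 'YOUR_CLIENT_ID'          # replace with your Foursquare client ID
foursquare_client_secret = 'YOUR_CLIENT_SECRET'  # replace with your Foursquare client secret
foursquare_version = '20210101'
foursquare_limit = 300
print('Your credentials:')
print('foursquare_client_id: ' + foursquare_client_id)
print('foursquare_client_secret: ' + '*' * len(foursquare_client_secret))  # do not echo the secret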
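Rather than hard-coding credentials in the notebook, they can be read from environment variables and masked before printing. A sketch assuming the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; `read_credentials` and `mask` are illustrative helpers.

```python
import os

def read_credentials(id_var='FOURSQUARE_CLIENT_ID', secret_var='FOURSQUARE_CLIENT_SECRET'):
    """Read Foursquare credentials from environment variables (names are assumptions)."""
    client_id = os.environ.get(id_var)
    client_secret = os.environ.get(secret_var)
    if not client_id or not client_secret:
        raise RuntimeError('Set {} and {} before querying Foursquare'.format(id_var, secret_var))
    return client_id, client_secret

def mask(secret, visible=4):
    """Hide a secret for logging, showing only its last `visible` characters."""
    if len(secret) <= visible:
        return '*' * len(secret)
    return '*' * (len(secret) - visible) + secret[-visible:]

# Usage: foursquare_client_id, foursquare_client_secret = read_credentials()
#        print('foursquare_client_secret:', mask(foursquare_client_secret))
```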


In [None]:
#identification of venue categories

food_category = '4d4b7105d754a06374d81259'


cafe_categories = ['4bf58dd8d48988d16d941735', '4bf58dd8d48988d1e0931735', '4bf58dd8d48988d1f0941735',
                   '4bf58dd8d48988d18d941735']
#café, coffee shop, internet café, gaming café, tea room, bar

shops_category = '4d4b7105d754a06378d81259'

bookshops_categories = ['4bf58dd8d48988d114951735', '52f2ab2ebcbc57f1066b8b30', '4bf58dd8d48988d1b1941735' ]
#bookstore, used bookstore, college bookstore


def is_cafe(categories, specific_filter=None):
    cafe_words = ['café']
    cafe = False
    specific = False
    
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in cafe_words:
            if r in category_name:
                cafe = True

        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            cafe = True
    return cafe, specific

def is_shop(categories, specific_filter=None):
    bookshop_words = ['book', 'librairie']  # English and French bookshop category names
    bookshop = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in bookshop_words:
            if r in category_name:
                bookshop = True
      
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            bookshop = True
    return bookshop, specific
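Both `is_cafe` and `is_shop` return a `(match, specific)` tuple, and any non-empty tuple is truthy in Python, so testing the returned tuple directly would accept every venue. A sketch of a single matcher that returns a plain boolean; `matches_category` is a hypothetical helper, and the category id in the example is taken from `bookshops_categories` above.

```python
def matches_category(categories, words=(), ids=()):
    """True if any (name, id) pair contains one of `words` or has an id in `ids`."""
    for name, cat_id in categories:
        if cat_id in ids or any(w in name.lower() for w in words):
            return True
    return False

# Pitfall: a (flag, flag) tuple is truthy even when both flags are False
print(bool((False, False)))  # True

print(matches_category([('Bookstore', '4bf58dd8d48988d114951735')], words=('book',)))  # True
print(matches_category([('Italian Restaurant', 'x')], words=('café',)))               # False
```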


def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Switzerland', '')
    address = address.replace(', Schweiz', '')
    address = address.replace(', Suisse', '')
    address = address.replace(', Svizzera', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius, limit=100):
    """Query Foursquare v2 /venues/explore and return venue tuples
    (id, name, categories, (lat, lng), address, distance); [] on failure."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {'client_id': client_id,
              'client_secret': client_secret,
              'v': foursquare_version,
              'll': '{},{}'.format(lat, lon),
              'categoryId': category,
              'radius': radius,
              'limit': limit}
    try:
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        venues = []
    return venues
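Separating the JSON parsing from the HTTP request makes the Foursquare logic testable offline. A sketch assuming the v2 `/venues/explore` response layout already relied on above (`response.groups[0].items[].venue`); `parse_explore_items` is a hypothetical helper, and the sample payload is hand-written with only the fields read here, not real API output.

```python
def parse_explore_items(payload):
    """Turn an explore payload into (id, name, categories, (lat, lng), address, distance) tuples."""
    groups = payload.get('response', {}).get('groups') or [{}]
    venues = []
    for item in groups[0].get('items', []):
        venue = item['venue']
        location = venue['location']
        venues.append((venue['id'],
                       venue['name'],
                       [(c['name'], c['id']) for c in venue.get('categories', [])],
                       (location['lat'], location['lng']),
                       ', '.join(location.get('formattedAddress', [])),
                       location.get('distance')))
    return venues

# Hand-written payload (not real API output)
sample = {'response': {'groups': [{'items': [{'venue': {
    'id': 'v1', 'name': 'Example Café',
    'categories': [{'name': 'Café', 'id': '4bf58dd8d48988d16d941735'}],
    'location': {'lat': 46.2, 'lng': 6.14, 'distance': 120,
                 'formattedAddress': ['Rue Exemple 1', '1205 Genève', 'Switzerland']}}}]}]}}
print(parse_explore_items(sample))
```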




In [None]:
# Let's now go over our neighborhood locations and get nearby bookshops
import pickle


def get_bookshops(lats, lons):
    """Query Foursquare shops around every candidate and keep the bookshops.

    Returns (bookshops, location_bookshops): a dict of all bookshops found, keyed by
    venue id, and for each candidate the list of bookshops within 300 m.
    """
    bookshops = {}
    location_bookshops = []
    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, shops_category, foursquare_client_id,
                                          foursquare_client_secret, radius=350, limit=100)
        area_bookshops = []
        for venue_id, venue_name, venue_categories, venue_latlon, venue_address, venue_distance in venues:
            # is_shop returns a (match, specific) tuple: unpack it before testing
            is_bookshop, _ = is_shop(venue_categories, specific_filter=bookshops_categories)
            if is_bookshop:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                bookshop = (venue_id, venue_name, venue_latlon[0], venue_latlon[1],
                            venue_address, venue_distance, x, y)
                if venue_distance <= 300:
                    area_bookshops.append(bookshop)
                bookshops[venue_id] = bookshop
        location_bookshops.append(area_bookshops)
        print(' .', end='')
    print(' done.')
    return bookshops, location_bookshops

# Try to load from the local file system in case we did this before
bookshops = {}
location_bookshops = []
loaded = False
try:
    with open('bookshops_350.pkl', 'rb') as f:
        bookshops = pickle.load(f)
    with open('location_bookshops_350.pkl', 'rb') as f:
        location_bookshops = pickle.load(f)
    print('Bookshops data loaded.')
    loaded = True
except (OSError, EOFError, pickle.UnpicklingError):
    pass

# If loading failed, use the Foursquare API to get the data and persist it
if not loaded:
    bookshops, location_bookshops = get_bookshops(latitudes, longitudes)
    with open('bookshops_350.pkl', 'wb') as f:
        pickle.dump(bookshops, f)
    with open('location_bookshops_350.pkl', 'wb') as f:
        pickle.dump(location_bookshops, f)

bookshops
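The load-or-query pattern above is repeated for bookshops, cafés and bus stops. It could be factored into one helper; a sketch assuming one pickle file per dataset is acceptable (`load_or_compute` and the file name in the usage comment are hypothetical).

```python
import os
import pickle

def load_or_compute(path, compute):
    """Return the object pickled at `path`, or call compute() and pickle its result there."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    result = compute()
    with open(path, 'wb') as f:
        pickle.dump(result, f)
    return result

# Possible use in this notebook (one pickle instead of two):
# bookshops, location_bookshops = load_or_compute(
#     'bookshops_and_locations_350.pkl', lambda: get_bookshops(latitudes, longitudes))
```

Passing a zero-argument callable means the expensive API calls only run when no cached file exists.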

In [None]:

def get_cafes(lats, lons):
    cafes = {}
    location_cafes = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
       
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, foursquare_client_secret,
                                          radius=350, limit=100)
        area_cafes = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            # is_cafe returns a (match, specific) tuple; unpack it, since a tuple is always truthy
            is_ca, _ = is_cafe(venue_categories, specific_filter=cafe_categories)
            if is_ca:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                cafe = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, x, y)
                if venue_distance<=300:
                    area_cafes.append(cafe)
                cafes[venue_id] = cafe
        location_cafes.append(area_cafes)
        print(' .', end='')
    print(' done.')
    return cafes, location_cafes


cafes = {}
location_cafes = []
loaded = False
try:
    with open('cafes_350.pkl', 'rb') as f:
        cafes = pickle.load(f)
    with open('location_cafes_350.pkl', 'rb') as f:
        location_cafes = pickle.load(f)
    print('Cafes data loaded.')
    loaded = True
except (OSError, EOFError, pickle.UnpicklingError):
    pass

# If loading failed, use the Foursquare API to get the data
if not loaded:
    cafes, location_cafes = get_cafes(latitudes, longitudes)

    # Persist the results in the local file system
    with open('cafes_350.pkl', 'wb') as f:
        pickle.dump(cafes, f)
    with open('location_cafes_350.pkl', 'wb') as f:
        pickle.dump(location_cafes, f)

cafes

### Integration of TPG data

In [None]:
# Read the TPG stop list (tab-separated, no header row); malformed lines are skipped.
# on_bad_lines replaces the deprecated error_bad_lines and requires pandas >= 1.3.
df_tpg = pd.read_csv('arrets_utf.csv', sep="\t", on_bad_lines='skip', header=None)
df_tpg.drop([df_tpg.columns[3], df_tpg.columns[4]], axis=1, inplace=True)

# Rename columns: stop name and its two Swiss grid coordinates
df_tpg.rename(columns={0: "bus_stop_name", 1: "east_CH1903_coordinates", 2: "north_CH1903_coordinates"}, inplace=True)

# Convert the Swiss grid coordinates to WGS84 latitude/longitude
df_tpg["latitude"] = CHtoWGSlat(df_tpg["east_CH1903_coordinates"], df_tpg["north_CH1903_coordinates"])
df_tpg["longitude"] = CHtoWGSlng(df_tpg["east_CH1903_coordinates"], df_tpg["north_CH1903_coordinates"])

# Add x, y in the same UTM system as the candidate grid (lonlat_to_xy accepts arrays)
df_tpg["x"], df_tpg["y"] = lonlat_to_xy(df_tpg["longitude"].values, df_tpg["latitude"].values)

df_tpg.head()


In [None]:
import numpy as np

def get_stops_near_location_tpg(x, y, source_tpg, radius, limit=None):
    """Return the TPG stops within `radius` metres of the projected point (x, y).

    Each stop is a tuple (stop_index, name, latitude, longitude, distance, stop_x, stop_y),
    sorted by increasing distance and truncated to `limit` entries if given.
    """
    distances = np.hypot(source_tpg['x'].values - x, source_tpg['y'].values - y)
    stops = []
    for idx in np.flatnonzero(distances <= radius):
        row = source_tpg.iloc[idx]
        stops.append((source_tpg.index[idx], row['bus_stop_name'], row['latitude'],
                      row['longitude'], distances[idx], row['x'], row['y']))
    stops.sort(key=lambda stop: stop[4])
    return stops if limit is None else stops[:limit]


In [None]:
def get_bus(lats, lons):
    """For each candidate location, collect the TPG stops within 100 m.

    Returns (buses, location_buses): a dict of every stop found within 350 m of
    any candidate, keyed by stop index, and per candidate the list of stops within 100 m.
    """
    buses = {}
    location_buses = []
    print('Obtaining bus stops around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        x, y = lonlat_to_xy(lon, lat)
        stops = get_stops_near_location_tpg(x, y, df_tpg, radius=350, limit=100)
        area_bus = [stop for stop in stops if stop[4] <= 100]
        for stop in stops:
            buses[stop[0]] = stop
        location_buses.append(area_bus)
        print(' .', end='')
    print(' done.')
    return buses, location_buses
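Because the TPG stops and the candidate centers are both in metres, all candidate-to-stop distances can also be computed at once with NumPy broadcasting instead of one candidate at a time. A sketch of that alternative for counting stops within 100 m; `count_within` is a hypothetical helper, and the full distance matrix is only reasonable while (candidates × stops) stays small enough to fit in memory.

```python
import numpy as np

def count_within(cand_xy, stop_xy, radius):
    """For each candidate point, count the stops at distance <= `radius` (same units)."""
    cand = np.asarray(cand_xy, dtype=float)[:, None, :]    # shape (n_cand, 1, 2)
    stops = np.asarray(stop_xy, dtype=float)[None, :, :]   # shape (1, n_stops, 2)
    dist = np.sqrt(((cand - stops) ** 2).sum(axis=2))       # shape (n_cand, n_stops)
    return (dist <= radius).sum(axis=1)

# Possible use in this notebook:
# bus_counts = count_within(df_locations[['X', 'Y']].values, df_tpg[['x', 'y']].values, 100)
```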
     

In [None]:
    
# Try to load from the local file system in case we did this before
buses = {}
location_buses = []
loaded = False
try:
    with open('buses_350.pkl', 'rb') as f:
        buses = pickle.load(f)
    with open('location_buses_350.pkl', 'rb') as f:
        location_buses = pickle.load(f)
    print('Bus data loaded.')
    loaded = True
except (OSError, EOFError, pickle.UnpicklingError):
    pass

# If loading failed, use the TPG file to get the data
if not loaded:
    buses, location_buses = get_bus(latitudes, longitudes)

    # Persist the results in the local file system
    with open('buses_350.pkl', 'wb') as f:
        pickle.dump(buses, f)
    with open('location_buses_350.pkl', 'wb') as f:
        pickle.dump(location_buses, f)


### Data Recapitulation

In [None]:
import numpy as np

print('Total number of cafes:', len(cafes))
print('Average number of cafes in neighborhood:', np.array([len(r) for r in location_cafes]).mean())

print('Total number of bookshops:', len(bookshops))
print('Average number of bookshops in neighborhood:', np.array([len(r) for r in location_bookshops]).mean())

print('Total number of bus stops:', len(buses))
print('Average number of bus stops in neighborhood:', np.array([len(r) for r in location_buses]).mean())

In [None]:
map_geneve_with_markers = folium.Map(location=geneve_center, zoom_start=13)
folium.Marker(geneve_center, popup='Plaine de Plainpalais',  icon=folium.Icon(color="gray", icon="info", prefix='fa')).add_to(map_geneve_with_markers)

for res in cafes.values():
    lat = res[2]; lon = res[3]    
    color = 'red'
    folium.Marker([lat, lon], icon=folium.Icon(color="red", icon="coffee", prefix='fa')).add_to(map_geneve_with_markers)
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_geneve_with_markers)
    
for res in bookshops.values():
    lat = res[2]; lon = res[3]    
    color = 'blue'
    folium.Marker([lat, lon], icon=folium.Icon(color="blue", icon="book", prefix='fa')).add_to(map_geneve_with_markers)
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_geneve_with_markers)

for res in buses.values():
    lat = res[2]; lon = res[3]    
    color = 'green'
    folium.Marker([lat, lon], icon=folium.Icon(color="green", icon="bus", prefix='fa')).add_to(map_geneve_with_markers)
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_geneve_with_markers)
    
for lat, lon in zip(latitudes, longitudes):
    latlon=str(lat)+" " +str(lon)
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_geneve_with_markers) 
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_geneve_with_markers)
map_geneve_with_markers

### Data analysis

In [None]:

location_cafes_count = [len(res) for res in location_cafes]
location_bookshops_count = [len(res) for res in location_bookshops]
location_bus_count = [len(res) for res in location_buses]

df_locations['Cafe_in_area'] = location_cafes_count
df_locations['Bookshop_in_area'] = location_bookshops_count
df_locations['Bus_in_area'] = location_bus_count

print('Average number of cafes in every area with radius=300m:', np.array(location_cafes_count).mean())
print('Average number of bookshops in every area with radius=300m:', np.array(location_bookshops_count).mean())
print('Average number of bus stops in every area with radius=100m:', np.array(location_bus_count).mean())


## 5. Calculation of best neighborhoods<a name="bestNeighborhoods"></a>

In [None]:
# Best neighborhoods: combine the 4 criteria
good_cafe_count = np.array(df_locations['Cafe_in_area'] <= 4)
print('Locations with no more than four cafes within 300m:', good_cafe_count.sum())

good_bookshop_count = np.array(df_locations['Bookshop_in_area'] <= 2)
print('Locations with no more than two bookshops within 300m:', good_bookshop_count.sum())

good_bus_distance = np.array(df_locations['Bus_in_area'] >= 1)
print('Locations with at least one bus stop within 100m:', good_bus_distance.sum())

good_distance_from_center = np.array(df_locations['Distance_from_center'] <= 3000)
print('Locations with a distance of max. 3000m from the city center:', good_distance_from_center.sum())

good_locations = good_cafe_count & good_bookshop_count & good_bus_distance & good_distance_from_center
print('Locations with all 4 conditions met:', good_locations.sum())

df_good_locations = df_locations[good_locations]


good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

cafes_latlons = [[res[2], res[3]] for res in cafes.values()]
bookshops_latlons = [[res[2], res[3]] for res in bookshops.values()]
buses_latlons = [[res[2], res[3]] for res in buses.values()]
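The four thresholds used above (at most 4 cafés and 2 bookshops within 300 m, at least one bus stop within 100 m, at most 3 km from the center) are looser than the strict wording of the introduction. Making them parameters eases a sensitivity check; a sketch with a hypothetical `filter_candidates` helper whose defaults reproduce the values used here.

```python
import pandas as pd

def filter_candidates(df, max_cafes=4, max_bookshops=2, min_bus=1, max_distance_m=3000):
    """Apply the four location criteria as one boolean mask and return the matching rows."""
    mask = ((df['Cafe_in_area'] <= max_cafes)
            & (df['Bookshop_in_area'] <= max_bookshops)
            & (df['Bus_in_area'] >= min_bus)
            & (df['Distance_from_center'] <= max_distance_m))
    return df[mask]

# Possible use: the strict "no café, no bookshop" reading of the criteria
# strict = filter_candidates(df_locations, max_cafes=0, max_bookshops=0)
```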

In [None]:
df_good_locations

In [None]:
# Map the good locations inside the 3 km radius around the city center
map_geneve_good_locations = folium.Map(location=geneve_center, zoom_start=13)
folium.Circle(geneve_center, radius=3000, color='white', fill=True, fill_opacity=0.4).add_to(map_geneve_good_locations)
folium.Marker(geneve_center, popup='{} {}'.format(geneve_center[0], geneve_center[1]),
              icon=folium.Icon(color='black')).add_to(map_geneve_good_locations)

for index, row in df_good_locations.iterrows():
    latlon = '{} {}'.format(row['Latitude'], row['Longitude'])
    folium.Marker([row['Latitude'], row['Longitude']], popup=latlon).add_to(map_geneve_good_locations)
map_geneve_good_locations



## 6. Calculation of clusters of neighborhoods <a name="clusters"></a>

In [None]:
# Group the good locations into clusters and display them
from sklearn.cluster import KMeans

number_of_clusters = 5

# Cluster in projected x/y (metres) so that distances are the same in every direction
good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=1, n_init=10).fit(good_xys)

# Cluster centers converted back to (lon, lat)
cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

df_cluster_centers = pd.DataFrame(cluster_centers, columns=['lon', 'lat'])
df_cluster_centers['color'] = np.arange(len(df_cluster_centers))

# Cluster label of every good location (data_index refers to the index of df_locations)
cluster_map = pd.DataFrame({'data_index': df_good_locations.index.values,
                            'cluster': kmeans.labels_})

map_geneve_clusters = folium.Map(location=geneve_center, zoom_start=14)
colors = ['purple', 'darkblue', 'orange', 'red', 'blue', 'green']

folium.Circle(geneve_center, radius=3000, color='white', fill=True, fill_opacity=0.4).add_to(map_geneve_clusters)
folium.Marker(geneve_center, popup='{} {}'.format(geneve_center[0], geneve_center[1]),
              icon=folium.Icon(color='black')).add_to(map_geneve_clusters)

# One marker per cluster center
for index, row in df_cluster_centers.iterrows():
    latlon = '{} {}'.format(row['lat'], row['lon'])
    folium.Marker([row['lat'], row['lon']], icon=folium.Icon(color=colors[int(row['color'])]),
                  popup=latlon).add_to(map_geneve_clusters)

# One 300 m circle per good location, colored by cluster
for index, row in cluster_map.iterrows():
    location = df_good_locations.loc[row['data_index']]
    color = colors[row['cluster']]
    latlon = '{} {}'.format(location.Latitude, location.Longitude)
    folium.Circle([location.Latitude, location.Longitude], radius=300, color=color, fill=True,
                  fill_color=color, fill_opacity=0.4).add_to(map_geneve_clusters)
    folium.CircleMarker([location.Latitude, location.Longitude], popup=latlon, radius=2, color=color,
                        fill=True, fill_color=color, fill_opacity=1).add_to(map_geneve_clusters)

map_geneve_clusters
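The number of clusters (5) is fixed by hand. One common way to choose it is to compare silhouette scores for several values of k; a sketch assuming scikit-learn's `silhouette_score`, with `best_k_by_silhouette` as a hypothetical helper. The silhouette score is only defined for 2 ≤ k ≤ (number of points − 1).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(points, k_values, random_state=1):
    """Return (best_k, scores), where scores maps each k to its silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(points)
        scores[k] = silhouette_score(points, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Possible use on the filtered locations:
# best_k, scores = best_k_by_silhouette(df_good_locations[['X', 'Y']].values, range(2, 9))
```

A higher silhouette score means points sit closer to their own cluster than to the nearest other one; it is a heuristic, so the map should still be checked by eye.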


In [None]:
candidate_area_addresses = []
print('==============================================================')
print('Cluster center addresses')
print('==============================================================\n')
for lon, lat in cluster_centers:
    # get_address returns None on failure, so fall back to a placeholder
    addr = get_address(google_api_key, lat, lon) or 'NO ADDRESS'
    addr = addr.replace("'", '')
    candidate_area_addresses.append(addr)
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, geneve_center_x, geneve_center_y)
    print('{}{} => {:.1f}km from the city center'.format(addr, ' '*(10-len(addr)), d/1000))



## 7. Conclusion <a name="conclusion"></a>

Our goal was to explore the opportunities for an investor to open a café-bookshop in the city of Geneva (Switzerland), as this kind of place does not exist in the city yet.

We chose to rely on data from Google Maps, Foursquare and TPG to identify the best locations. Since Foursquare lacked data on bus stops, we used the TPG dataset for them; in a future iteration, we could also cross-check the café and bookshop venues against another data source to assess whether Foursquare should remain the source for these venues.

As further directions, we could:

* remove locations on Lake Geneva (Lac Léman) and in gardens
* integrate the mean income of each neighborhood
* try another clustering method

For more details on this project, look at the **Full Report** in the same repository.