<h1> Coursera Capstone Project - Week 2 </h1>

This notebook will be used for Coursera Capstone Project by Syl. L.

<h2> Introduction / Business Problem </h2>

Toulouse is one of the youngest and most dynamic large cities in France. It is the country's second-largest city by number of students and the fourth largest by population. Besides its students, its economy relies mostly on the aeronautics and aerospace industries, which employ many young engineers. Moreover, Toulouse is located in the south of France: its inhabitants enjoy the easygoing lifestyle typical of Mediterranean towns, and people like to go out, drink beer and have fun.

There are now plenty of bars and pubs in the city and its surroundings, and they are usually full in the afternoon and at night.

This project aims to identify the best location to open a new bar dedicated to beer in the city center of Toulouse.

The goal is therefore to determine the best location in the city center of Toulouse and its surroundings for a bar dedicated to beer, accessible to both students and young workers who want a drink after work. The choice of location must be justified by the number of potential customers working or studying nearby, and also by the accessibility of the place (nearby transportation). Indeed, the bar is meant to serve customers after their workday and before they go home. The place must therefore be easy to reach from their workplace and close to public transportation. The aim is to give a comprehensive insight into where it is best to open a new venue to maximize value for money.

<h2> Data </h2>

The following data will be used:
- Foursquare API : in order to find existing bars, companies and universities
- Toulouse Metropole API : to locate the transportation in the area of interest


Two datasets from the Toulouse Metropole Open Data site will be used:
- Bus stops location (https://data.toulouse-metropole.fr/explore/dataset/arrets-de-bus/information/)
- Subway stops location (https://data.toulouse-metropole.fr/explore/dataset/stations-de-metro/api/?rows=1)

These data should help locate the best place to set up the new bar, taking into account the proximity of both customers and transportation services.

In [23]:
CLIENT_ID = 'your Foursquare ID'
CLIENT_SECRET = 'your Foursquare Secret'
VERSION = 'Foursquare API version'
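
Rather than pasting credentials into the notebook, they can be read from the environment. The following is only a sketch: the function name and the environment variable names are hypothetical, and the default version string is an assumed example value (the v2 API expects a date in YYYYMMDD form).

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping (by default the environment).

    The variable names below are hypothetical; use whatever fits your setup.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    # Assumed example default; the v2 API takes a YYYYMMDD date string.
    version = env.get("FOURSQUARE_VERSION", "20180605")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set")
    return client_id, client_secret, version
```

With the variables exported in the shell, `CLIENT_ID, CLIENT_SECRET, VERSION = load_foursquare_credentials()` would replace the three assignments above.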

<h2> Methodology </h2>

The analysis is divided into several parts:
 - retrieval of the data of interest
 - definition of a criterion of interest
 - analysis of the data
 
 The data of interest are:
  - the location of the potential customers and competitors. The customers are likely to be students (studying at universities) and professionals working in nearby shops and companies. The location of those points of interest will be retrieved from Foursquare.
  - the transportation easily accessible from the location. The two main modes of public transport in Toulouse are buses and the subway. The information about the bus stops and subway stations is retrieved from the Toulouse Metropole Open Data website.
  
A first analysis will give the spatial location of the potential customers and of the current competitors (existing bars).
A second analysis will provide heat maps in order to better locate the best spots to open a new venue. Several criteria could be taken into account. We will focus on the proximity of customers and the proximity of public transportation.

<h3> Imports </h3>

In [2]:
from IPython.display import Image
import pickle
import json
import requests
import folium
import pandas as pd
!pip install shapely
import shapely.geometry
#!pip install pyproj
import pyproj
import math
import warnings
import sys
import numpy as np
warnings.simplefilter("ignore")

Collecting shapely
  Downloading https://files.pythonhosted.org/packages/9d/18/557d4f55453fe00f59807b111cc7b39ce53594e13ada88e16738fb4ff7fb/Shapely-1.7.1-cp36-cp36m-manylinux1_x86_64.whl (1.0MB)
Installing collected packages: shapely
Successfully installed shapely-1.7.1


<h3> Basic functions </h3>
These functions convert coordinates (from latitude/longitude to Cartesian) and compute the distance between two points.

In [3]:
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def latlon_to_xy(lat, lon):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

def calc_distance(a1,a2):
    x1, y1 = a1[0], a1[1]
    x2, y2 = a2[0], a2[1]
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
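
As a cross-check on the projected distances, a great-circle (haversine) distance can be computed directly from latitude and longitude. This is a minimal sketch and not part of the original analysis: `haversine_m` and `EARTH_RADIUS_M` are hypothetical names, and the Earth is assumed to be a sphere with a mean radius of 6,371 km. Such a check can be useful here because the projection above uses UTM zone 33, while Toulouse (about 1.44°E) lies in zone 31 (0° to 6°E), so planar distances may be slightly stretched.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius, in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Capitole square (city center) and the Jean Jaurès subway station,
# using the coordinates that appear in this notebook.
capitole = (43.6044499, 1.444494)
jean_jaures = (43.605803, 1.448634)
d = haversine_m(capitole[0], capitole[1], jean_jaures[0], jean_jaures[1])
print(round(d))
```

For these two points the result is roughly 366 m, a few metres below the projected distance reported for JEAN JAURES in the stops table.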

The center of the city is defined manually. It is the center of the main square of Toulouse, "la place du Capitole", where the town hall is.

In [4]:
coordinates_center = [43.6044499, 1.444494]
xy_center = lonlat_to_xy(coordinates_center[1], coordinates_center[0])

Datasets about the public transportation are retrieved.

In [5]:
url_bus = 'https://data.toulouse-metropole.fr/api/records/1.0/search/?dataset=arrets-de-bus&q=&rows=2053'
url_subway= 'https://data.toulouse-metropole.fr/api/records/1.0/search/?dataset=stations-de-metro&q=&rows=44&facet=ligne'

response_sub = requests.get(url_subway).json()
response_bus = requests.get(url_bus).json()

The stops (bus stops and subway stations) are extracted and their distance to the city center is calculated.

In [6]:
stops_latlon = []

for stop in response_bus['records']:
    field = stop.get('fields')
    geo_coord = field.get('geo_point_2d')
    xy = latlon_to_xy(geo_coord[0], geo_coord[1])
    dist = calc_distance(xy_center, xy)
    stops_latlon.append([field.get('champ_calcule'), 'bus', geo_coord[0], geo_coord[1], dist])
    
for stop in response_sub['records']:
    field = stop.get('fields')
    geo_coord = field.get('geo_point_2d')
    xy = latlon_to_xy(geo_coord[0], geo_coord[1])
    dist = calc_distance(xy_center, xy)
    stops_latlon.append([field.get('nom'), 'subway', geo_coord[0], geo_coord[1], dist])

clm_values = ['Stop_name', 'Stop_type', 'Latitude', 'Longitude','Distance']
df = pd.DataFrame(data=stops_latlon, columns=clm_values)
df

Unnamed: 0,Stop_name,Stop_type,Latitude,Longitude,Distance
0,Arrêt: 1er RTP\n58,bus,43.545091,1.385569,8250.724060
1,Arrêt: 8 mai 1945\n116 315,bus,43.509677,1.173544,24650.282926
2,Arrêt: A. Perdiguier\n55,bus,43.583701,1.289223,12933.839973
3,Arrêt: Acacias\n42,bus,43.657098,1.470549,6304.923205
4,Arrêt: Achiary\n23 37 51 L01 SCOL7,bus,43.600891,1.474156,2462.146594
...,...,...,...,...,...
2092,RANGUEIL,subway,43.574843,1.461701,3622.216345
2093,SAINT-MICHEL - MARCEL LANGER,subway,43.586359,1.447146,2050.405329
2094,JEAN JAURES,subway,43.605803,1.448634,371.743288
2095,LA VACHE,subway,43.633717,1.435560,3378.538007


Only the stops within 1.5 km of the city center are kept. The neighbourhood analysis will be performed on a circle with a radius of 1 km, but nearby stations outside this circle can still be useful to customers.

In [7]:
is_close = df['Distance']<1500
df2 = df[is_close]
df2.shape

(85, 5)

<h3> Competitors </h3>
We are now looking for competitors, i.e. bars, pubs, etc.
Their category IDs were retrieved from the Foursquare site in order to narrow down the search.

The next function checks whether a selected nightlife place is a bar or not. We do not consider nightclubs as competitors, for instance.

In [8]:
# Category IDs corresponding to bars were taken from Foursquare web site 
# (https://developer.foursquare.com/docs/resources/categories):
bar_word = ['Irish Pub', 'Juice Bar', 'Bar', 'Beer Bar', 'Brewery', 'Sports Bar', 'Pub']
bar_categories = ['52e81612bcbc57f1066b7a06','4bf58dd8d48988d112941735', '4bf58dd8d48988d116941735', '56aa371ce4b08b9a8d57356c', '50327c8591d4c4b30a586d5d', '4bf58dd8d48988d11d941735', '4bf58dd8d48988d11b941735']

def is_bar(categories, specific_filter=None):
    # Returns (bar, specific): bar is True when a category name contains a bar keyword
    # or its id is in specific_filter; specific is True only in the latter case
    bar = False
    specific = False
    for c in categories:
        category_name = c['name'].lower()
        category_id = c['id']
        for r in bar_word:
            if r.lower() in category_name:
                bar = True
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            bar = True
    return bar, specific
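
To illustrate how a keyword filter of this kind behaves, here is a small self-contained sketch. The helper `looks_like_bar` and the sample venues are hypothetical; only the keyword list is copied from `bar_word`, and the items are merely shaped like the entries returned by the Foursquare explore endpoint.

```python
# Keywords copied from bar_word above, lower-cased once.
BAR_WORDS = [w.lower() for w in
             ['Irish Pub', 'Juice Bar', 'Bar', 'Beer Bar', 'Brewery', 'Sports Bar', 'Pub']]

def looks_like_bar(categories):
    """True if any category name contains one of the bar keywords."""
    return any(word in c['name'].lower() for c in categories for word in BAR_WORDS)

# Hypothetical items, shaped like the 'items' entries of an explore response.
sample_items = [
    {'venue': {'name': 'Venue A', 'categories': [{'id': 'x', 'name': 'Beer Bar'}]}},
    {'venue': {'name': 'Venue B', 'categories': [{'id': 'y', 'name': 'Nightclub'}]}},
    {'venue': {'name': 'Venue C', 'categories': [{'id': 'z', 'name': 'Cocktail Bar'}]}},
]

bars = [item['venue']['name'] for item in sample_items
        if looks_like_bar(item['venue']['categories'])]
print(bars)
```

Because the match is a substring test, any category whose name contains "bar" or "pub" passes, including ones that are not listed explicitly (here "Cocktail Bar"). Matching on category IDs, as `specific_filter` allows, is stricter.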

In [9]:
LIMIT = 300 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius

nightlife_spots = '4d4b7105d754a06376d81259'

def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            nightlife_spots,
            radius, 
            LIMIT)
            
        # make the GET request
        #print(requests.get(url).json()["response"]["groups"])
        results = requests.get(url).json()["response"]["groups"][0]['items']
            
        for v in results:
            venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'])])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    
    
    return(nearby_venues)
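
The URL building and the parsing of the response can be separated from the network call, which makes them reusable for other categories and testable offline. The sketch below is one possible arrangement, not the notebook's actual code: `build_explore_url` and `parse_explore_items` are hypothetical helper names, the endpoint and query parameters are those used in `getNearbyVenues`, and the payload in the usage note is invented for illustration.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      category_id, radius=1000, limit=100):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

def parse_explore_items(payload, name, lat, lng):
    """Turn a decoded explore response into rows; tolerate empty results."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # A venue may come back without any category
        categories = venue.get('categories') or [{}]
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0].get('name')))
    return rows
```

A call such as `parse_explore_items(requests.get(url).json(), 'Zone center', lat, lng)` would then return the rows for one zone, and an empty or unexpected payload yields an empty list instead of raising a `KeyError`.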

All the venues within 1 km of the city center are retrieved, and only the bars are kept. There are 85 competitors in the area.

In [10]:
# names must be a list: zipping over the bare string would only yield its first letter, 'Z'
toulouse_venues = getNearbyVenues(['Zone center'],
                                   latitudes=[coordinates_center[0]],
                                   longitudes=[coordinates_center[1]]
                                  )
toulouse_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Zone center,43.60445,1.444494,Fat Cat,43.605105,1.443847,Cocktail Bar
1,Zone center,43.60445,1.444494,Cale Sèche,43.603780,1.442684,Bar
2,Zone center,43.60445,1.444494,Le Caribe,43.606175,1.446982,Bar
3,Zone center,43.60445,1.444494,The Classroom,43.604448,1.438613,Bar
4,Zone center,43.60445,1.444494,Snapper Rock,43.605572,1.449896,Bar
...,...,...,...,...,...,...,...
81,Zone center,43.60445,1.444494,Chez Authié,43.600035,1.453760,Bar
82,Zone center,43.60445,1.444494,l'Autan,43.611772,1.438892,Bar
83,Zone center,43.60445,1.444494,Bar Le Cactus,43.611154,1.437068,Pub
84,Zone center,43.60445,1.444494,la suite,43.600512,1.455420,Bar


In order to better visualize their location, we plot a map with a marker for each venue.

In [11]:
map_toulouse = folium.Map(location=coordinates_center, zoom_start=15)
folium.Marker(coordinates_center, popup='Capitole').add_to(map_toulouse)
for index,res in toulouse_venues.iterrows():
    lat = res['Venue Latitude']; lon = res['Venue Longitude']
    color = 'red'
    label = '{}'.format(res['Venue'])  # popup shows the venue name
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color,                         
                        popup=label, fill_opacity=1, parse_html=False).add_to(map_toulouse)
map_toulouse

![Distribution of competitors](1.PNG)

It can be noted that the competitors are not evenly distributed around the city center. Many are near the Garonne river (probably because it is pleasant in the summer). Quite a few are located south of the Esquirol neighbourhood, and about a third are east of Jean Jaurès square. Only a few are located right at the city center.

<h3> Customers </h3>

Let's now have a look at the spatial distribution of the potential customers.

<h4> Universities </h4>
Students will probably be the first customers, so let's see where they spend their day.
The process is similar to the one used for competitors; the university category ID helps to narrow the search.

In [12]:
# Category ID corresponding to colleges and universities was taken from Foursquare web site 
# (https://developer.foursquare.com/docs/resources/categories):

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
def getNearbyUny(names, latitudes, longitudes, radius=1000):
    
    college_code = '4d4b7105d754a06372d81259'
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            college_code,
            radius, 
            LIMIT)
            
        # make the GET request
        #print(requests.get(url).json()["response"]["groups"])
        results = requests.get(url).json()["response"]["groups"][0]['items']
            
        for v in results:
            venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'])])
        

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    
    
    return(nearby_venues)

In [13]:
# names must be a list: zipping over the bare string would only yield its first letter, 'Z'
toulouse_uni = getNearbyUny(['Zone center'],
                                   latitudes=[coordinates_center[0]],
                                   longitudes=[coordinates_center[1]]
                                  )
toulouse_uni

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Zone center,43.60445,1.444494,IAE Toulouse - Institut D'Administration Des E...,43.607385,1.439512,University
1,Zone center,43.60445,1.444494,Université Toulouse 1 Capitole,43.606611,1.436766,University
2,Zone center,43.60445,1.444494,ISEG Toulouse,43.61107,1.444988,University
3,Zone center,43.60445,1.444494,INP - ENSEEIHT,43.602033,1.454597,University
4,Zone center,43.60445,1.444494,IAE Toulouse,43.607419,1.43939,General College & University
5,Zone center,43.60445,1.444494,institut supérieur des arts De Toulouse,43.600148,1.440199,General College & University
6,Zone center,43.60445,1.444494,Sciences Po Toulouse,43.607253,1.437964,General College & University
7,Zone center,43.60445,1.444494,Institut d'Études Politiques de Toulouse,43.60722,1.437882,General College & University
8,Zone center,43.60445,1.444494,Toulouse School Of Management,43.606128,1.437816,School
9,Zone center,43.60445,1.444494,Conseillère d’orientation Tonavenir Toulouse L...,43.610888,1.440128,Student Center


As before, we plot a map with the locations. It is interesting to note that most universities are north-west of the city center.

In [14]:
map_toulouse = folium.Map(location=coordinates_center, zoom_start=15)
folium.Marker(coordinates_center, popup='Capitole').add_to(map_toulouse)
for index,res in toulouse_uni.iterrows():
    lat = res['Venue Latitude']; lon = res['Venue Longitude']
    color = 'yellow'
    label = '{}'.format(res['Venue'])  # popup shows the venue name
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color,                         
                        popup=label, fill_opacity=1, parse_html=False).add_to(map_toulouse)
map_toulouse

![Distribution of universities](2.PNG)

<h4> Companies </h4>
The next potential customers of a bar are working people who want a drink right after work.
Using the same process as before, we retrieve the places of work (companies, shops, etc.).

In [15]:
# Category IDs corresponding to workplaces (companies, shops, etc.) were taken from Foursquare web site 
# (https://developer.foursquare.com/docs/resources/categories):
companies_categories = ['4d4b7105d754a06375d81259', '4d4b7105d754a06378d81259', '4d4b7105d754a06379d81259',
                     '4d4b7104d754a06370d81259']

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
def getNearbyCompanies(names, latitudes, longitudes, radius=1000):
      
    venues_list=[]
    
    for company_id in companies_categories:
        for name, lat, lng in zip(names, latitudes, longitudes):
            #print(name)

            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                company_id,
                radius, 
                LIMIT)

            # make the GET request
            #print(requests.get(url).json()["response"]["groups"])
            results = requests.get(url).json()["response"]["groups"][0]['items']

            for v in results:
                venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name'])])
        

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    
    
    return(nearby_venues)

In [16]:
# names must be a list: zipping over the bare string would only yield its first letter, 'Z'
toulouse_companies = getNearbyCompanies(['Zone center'],
                                   latitudes=[coordinates_center[0]],
                                   longitudes=[coordinates_center[1]]
                                  )
toulouse_companies

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Zone center,43.60445,1.444494,Crowne Plaza Toulouse,43.603772,1.443034,Hotel
1,Zone center,43.60445,1.444494,Pepit Toulouse - Expert comptable,43.606139,1.448600,Financial or Legal Service
2,Zone center,43.60445,1.444494,Apec,43.602433,1.448693,Office
3,Zone center,43.60445,1.444494,Artibazar Headquarter,43.600903,1.448083,Office
4,Zone center,43.60445,1.444494,TOTEM numérique,43.599927,1.448723,Office
...,...,...,...,...,...,...,...
220,Zone center,43.60445,1.444494,Théâtre Les 3T,43.606037,1.455694,Comedy Club
221,Zone center,43.60445,1.444494,Galerie Sourillan,43.596305,1.447424,Art Gallery
222,Zone center,43.60445,1.444494,Auditorium Saint Pierre Des Cuisines,43.603480,1.437056,Concert Hall
223,Zone center,43.60445,1.444494,Galerie SOURILLAN,43.596312,1.447389,Art Gallery


Once again, we plot the distribution of the shops and companies. They appear to be mostly east and somewhat south of the city center.

In [17]:
map_toulouse = folium.Map(location=coordinates_center, zoom_start=15)
folium.Marker(coordinates_center, popup='Capitole').add_to(map_toulouse)
for index,res in toulouse_companies.iterrows():
    lat = res['Venue Latitude']; lon = res['Venue Longitude']
    color = 'green'
    label = '{}'.format(res['Venue'])  # popup shows the venue name
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color,                         
                        popup=label, fill_opacity=1, parse_html=False).add_to(map_toulouse)
map_toulouse

![Distribution of potential customers](3.PNG)

<h4> All in all </h4>
We now plot the competitors, the customers (universities and shops) and the bus stops / subway stations together in order to get a sense of the best location.

- blue: bus stops / subway stations
- yellow: universities
- red: competitors
- green: shops & companies

In [18]:
map_toulouse = folium.Map(location=coordinates_center, zoom_start=15)
folium.Marker(coordinates_center, popup='Capitole').add_to(map_toulouse)

def add_markers(frame, lat_col, lon_col, label_col, color):
    # Draw one small filled circle per row, with the row's name as popup
    for index, res in frame.iterrows():
        label = folium.Popup('{}'.format(res[label_col]), parse_html=True)
        folium.CircleMarker([res[lat_col], res[lon_col]], radius=3, color=color,
                            fill=True, fill_color=color, popup=label,
                            fill_opacity=1).add_to(map_toulouse)

add_markers(toulouse_companies, 'Venue Latitude', 'Venue Longitude', 'Venue', 'green')
add_markers(toulouse_uni, 'Venue Latitude', 'Venue Longitude', 'Venue', 'yellow')
add_markers(toulouse_venues, 'Venue Latitude', 'Venue Longitude', 'Venue', 'red')
add_markers(df2, 'Latitude', 'Longitude', 'Stop_name', 'blue')
    
map_toulouse

![Distribution of competitors, bus stop and consumers](4.PNG)

It can be seen that near the city center there are a lot of companies but no bars! That area could be interesting.

<h4> HeatMaps </h4>
In order to better visualize the distribution of customers, we plot a heat map.
All customers are gathered in a single dataframe.

In [19]:
# Customer = uni + companies
# Category IDs corresponding to universities and workplaces were taken from Foursquare web site 
# (https://developer.foursquare.com/docs/resources/categories):

categories = ['4d4b7105d754a06372d81259','4d4b7105d754a06375d81259', '4d4b7105d754a06378d81259', '4d4b7105d754a06379d81259',
                     '4d4b7104d754a06370d81259']

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
def getNearbyCustomers(names, latitudes, longitudes, radius=1000):
      
    venues_list=[]
    
    for customer_id in categories:
        for name, lat, lng in zip(names, latitudes, longitudes):
            #print(name)

            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                customer_id,
                radius, 
                LIMIT)

            # make the GET request
            #print(requests.get(url).json()["response"]["groups"])
            results = requests.get(url).json()["response"]["groups"][0]['items']

            for v in results:
                venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name'])])
        

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    
    
    return(nearby_venues)

# names must be a list: zipping over the bare string would only yield its first letter, 'Z'
toulouse_customers = getNearbyCustomers(['Zone center'],
                                   latitudes=[coordinates_center[0]],
                                   longitudes=[coordinates_center[1]]
                                  )

The heat map displays the best areas in terms of concentration of potential customers.

The best place seems to be just a bit east of the main square, in Montardy Street. There are other interesting places as well:
- near the train station (Matabiau, north east)
- near Jean Jaurès square (east)
- around the capitole square (city center)

In [20]:
from folium import plugins
from folium.plugins import HeatMap
import numpy as np

map_toulouse = folium.Map(location=coordinates_center, zoom_start=16)
folium.TileLayer('cartodbpositron').add_to(map_toulouse) #cartodbpositron cartodbdark_matter
HeatMap(np.array(toulouse_customers[['Venue Latitude', 'Venue Longitude']]).tolist()).add_to(map_toulouse)

# folium.Marker(Versailles_center).add_to(map_versailles)
folium.Circle(coordinates_center, radius=100, fill=False, color='white').add_to(map_toulouse)
folium.Circle(coordinates_center, radius=250, fill=False, color='white').add_to(map_toulouse)
folium.Circle(coordinates_center, radius=500, fill=False, color='white').add_to(map_toulouse)
folium.Circle(coordinates_center, radius=750, fill=False, color='white').add_to(map_toulouse)
map_toulouse

![HeatMap](5.PNG)

This map, however, does not reflect the distance to the bus stops and subway stations.
We therefore introduce a weight, which gives a better estimate of the interest of each location according to its distance to the nearest bus stop or subway station.
As people are generally lazy, and by analogy with physical processes such as gravity, the inverse of the square of the distance (1/distance^2) is used as the weight.
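
The weighting rule can be sketched on plain Cartesian coordinates as follows. The function names are hypothetical; the scale factor of 1000 mirrors the one used in the cell below, while the 10 m floor on the distance is an added assumption to avoid dividing by zero when a venue sits exactly on a stop.

```python
import math

def nearest_distance(point_xy, stops_xy):
    """Distance from a point to the closest stop (same units as the inputs)."""
    return min(math.hypot(point_xy[0] - sx, point_xy[1] - sy) for sx, sy in stops_xy)

def inverse_square_weight(point_xy, stops_xy, scale=1000.0, min_dist=10.0):
    """Weight = scale / d^2, where d is the distance to the nearest stop.

    min_dist is an assumed floor (in metres) that keeps the weight finite.
    """
    d = max(nearest_distance(point_xy, stops_xy), min_dist)
    return scale / d ** 2

# Two hypothetical stops, coordinates in metres
stops = [(0.0, 0.0), (300.0, 400.0)]
w_near = inverse_square_weight((0.0, 100.0), stops)   # 100 m from the first stop
w_far = inverse_square_weight((0.0, 200.0), stops)    # 200 m from the first stop
```

Doubling the distance to the nearest stop divides the weight by four, which is why venues far from any stop almost vanish from the weighted heat map.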

In [21]:
weight_customers = pd.DataFrame(np.array(toulouse_customers[['Venue Latitude', 'Venue Longitude']]))
weight_dist = []

for index, res in weight_customers.iterrows():
    lon = res[1]
    lat = res[0]
    xy = lonlat_to_xy(lon, lat)
    
    # Distance to the nearest bus stop / subway station
    min_dist = 10000
    
    for stop_index, res2 in df2.iterrows():
        lat_stop = res2['Latitude']; lon_stop = res2['Longitude']
        xy_stop = lonlat_to_xy(lon_stop, lat_stop)
        
        dist = calc_distance(xy, xy_stop)
        
        if (dist < min_dist):
            min_dist = dist
    
    # Weight by the nearest stop, not by the last stop visited in the loop
    weight_dist.append(1000. / min_dist**2)
    
weight_customers['2'] = np.array(weight_dist)

In [22]:
map_toulouse = folium.Map(location=coordinates_center, zoom_start=16)
folium.TileLayer('cartodbpositron').add_to(map_toulouse) #cartodbpositron cartodbdark_matter
HeatMap(np.array(weight_customers).tolist()).add_to(map_toulouse)

# folium.Marker(Versailles_center).add_to(map_versailles)
folium.Circle(coordinates_center, radius=100, fill=False, color='white').add_to(map_toulouse)
folium.Circle(coordinates_center, radius=250, fill=False, color='white').add_to(map_toulouse)
folium.Circle(coordinates_center, radius=500, fill=False, color='white').add_to(map_toulouse)
folium.Circle(coordinates_center, radius=750, fill=False, color='white').add_to(map_toulouse)
map_toulouse

![HeatMap weight with public transportation](6.PNG)

This weighted heat map clearly highlights one spot where a bar could be opened: Jean Jaurès square. There are a lot of customers there and transportation nearby. It is interesting to note that, despite offering a lot of potential customers, the universities (in the north west) are too far from transportation.

<h3> Results and discussion </h3>
This analysis shows that choosing the place for a new bar is not an easy task and that several criteria must be considered. The location of the competitors and of the employers of potential customers is not necessarily enough.

The selected area (Jean Jaurès) is really interesting, but there is not much space available there: premises are scarce and really expensive because the area is at the center of the transportation system.

It could be interesting to fetch data on the cost of buildings in each area, and perhaps also on population density.

<h3> Conclusion </h3>

This project could be reused for other cities or for other neighbourhoods in the same city. It could be made more precise with more data, but I was unable to find more relevant data (precise population density, cost of buildings).