# Capstone Project - Potential Fitness Facility in Paris
### Applied Data Science Capstone by IBM/Coursera

## Table of Contents:
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

This project aims to find a suitable location for a new fitness facility in the city of Paris. As society continues to thrive, people put more and more attention and effort into leading a healthy lifestyle and taking care of their bodies. Like any other big city, Paris has plenty of fitness facilities, such as gyms, swimming pools, and yoga studios, spread across its neighborhoods, so competition for a share of the market is intense. Among other factors, the location of a fitness center is crucial to its long-term success. The goal of this project is to provide a way to address this need and help those who are interested in entering the industry.

## Data <a name = 'data'></a>

* Our ideal candidate is located within a radius of **4 km** from the city centre. Given the number of sports facilities already available in the city, we also want to select a location that has as few other fitness clubs nearby as possible.

* The existing target venues in the city are obtained through **Foursquare API** calls. These calls are made around a regularly spaced grid of locations centered on the city center.

* The **Nominatim** geocoder (via geopy) is used to look up the coordinates of several locations.

* Information regarding Paris' neighborhood borders is obtained from [Paris Data](https://opendata.paris.fr/page/home/).

## Neighborhood Candidates

The first step is to use Nominatim to find the coordinates of the center of Paris. Around this position we then create the latitude and longitude coordinates of the centroids of our candidate neighborhoods, all enclosed by a circle with a radius of 4 km.

In [1]:
from geopy.geocoders import Nominatim

address = 'Paris, France'

geolocator = Nominatim(user_agent = 'Paris')
location = geolocator.geocode(address)
paris_center = [location.latitude, location.longitude]

print('The location of {} is {}, {}.'.format(address, paris_center[0], paris_center[1]))

The location of Paris, France is 48.8566101, 2.3514992.


Before finding the neighborhood candidates, we need the following helper functions.
They calculate the distance between two points; to do that, we first project the latitude and longitude coordinates into a 2D Cartesian system.

In [387]:
#!pip install shapely
import shapely.geometry

#!pip install pyproj
import pyproj

import math

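def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude to UTM x/y (meters).
    # Note: Paris (about 2.35° E) falls in UTM zone 31; zone 33 is kept here
    # so the X/Y values match the outputs recorded in this notebook.
    # pyproj.transform is deprecated in recent pyproj releases in favor of pyproj.Transformer.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy: UTM x/y (meters) back to WGS84 longitude/latitude.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]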

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

x, y = lonlat_to_xy(paris_center[1], paris_center[0])

lo, la = xy_to_lonlat(x, y)
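
As a projection-free sanity check on distances like these, the great-circle (haversine) formula can be compared against a simple local planar approximation. The sketch below is not part of the original notebook: the helper names `haversine_m` and `local_xy_m` and the mean Earth radius of 6,371 km are assumptions made for illustration, and only the two coordinate pairs come from outputs in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def local_xy_m(lat, lon, lat0, lon0):
    """Equirectangular approximation: meters east/north of (lat0, lon0)."""
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y

paris = (48.8566101, 2.3514992)          # city center printed above
luxembourg = (48.84672285, 2.33641303)   # Jardin du Luxembourg, geocoded later in this notebook

d_sphere = haversine_m(*paris, *luxembourg)
x, y = local_xy_m(*luxembourg, *paris)
d_plane = math.hypot(x, y)
print(round(d_sphere), round(d_plane))
```

At city scale the two distances should agree to within a few meters, which is why a planar coordinate system is adequate for laying out the grid.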

Now let's create a grid of equally spaced candidate areas, centered on the city center and within 4 km of it. Our neighborhoods are defined as circular areas with a radius of 300 meters, so the neighborhood centers are 600 meters apart.

The code below generates grid points over the area around the city center and keeps only those that lie within the 4 km radius circle.

In [388]:
paris_center_x, paris_center_y = lonlat_to_xy(paris_center[1], paris_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = paris_center_x - 4000
x_step = 600
y_min = paris_center_y - 4000 - (16*k*600 - 8000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []

for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(paris_center_x, paris_center_y, x, y)
        if (distance_from_center <= 4001):  # keep only points inside the 4 km circle (1 m tolerance)
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

161 candidate neighborhood centers generated.
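
The grid logic can also be checked in isolation. The sketch below repeats the same loop with the city center placed at the origin, so it needs no projection library; `hex_grid_centers` is a hypothetical helper name, and the constants are the ones used in the cell above. It also checks that the closest pair of centers is one spacing (600 m) apart, which is what makes neighboring 300 m circles touch.

```python
import math

def hex_grid_centers(cx=0.0, cy=0.0, radius=4000, spacing=600):
    """Hexagonally packed points within `radius` of (cx, cy), mirroring the loop above.

    The constants 16 and 21 are taken from the notebook and are tuned to the
    default radius and spacing.
    """
    k = math.sqrt(3) / 2                      # row height factor for a hexagonal grid
    x_min = cx - radius
    y_min = cy - radius - (16 * k * spacing - 2 * radius) / 2
    points = []
    for i in range(int(21 / k)):
        y = y_min + i * spacing * k
        x_offset = spacing / 2 if i % 2 == 0 else 0   # shift every other row by half a cell
        for j in range(21):
            x = x_min + j * spacing + x_offset
            if math.hypot(x - cx, y - cy) <= radius + 1:  # same 1 m tolerance as above
                points.append((x, y))
    return points

centers = hex_grid_centers()

# Smallest distance between any two distinct centers
nearest = min(
    math.hypot(x1 - x2, y1 - y2)
    for idx, (x1, y1) in enumerate(centers)
    for (x2, y2) in centers[idx + 1:]
)
print(len(centers), round(nearest, 6))
```

Because the count depends only on offsets from the center, it should match the number of candidates reported above.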


Now let's visualize our neighborhoods on a folium map. As we can see, the cells effectively cover the circle.

In [25]:
import folium

map_paris = folium.Map(location=paris_center, zoom_start=12)
folium.Marker(paris_center, popup='Paris').add_to(map_paris)

folium.Circle(paris_center,
             radius = 4000,
             color = 'red').add_to(map_paris)

for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_paris)

map_paris

Next, we convert the obtained information of our neighborhoods into a dataframe.

In [389]:
import pandas as pd

df = pd.DataFrame({'Latitude':latitudes,
                   'Longitude':longitudes,
                   'X':xs,
                   'Y':ys,
                   'Dist_from_center':distances_from_center})
Neigh = []
for i in range(0, df.shape[0]):
    neigh = 'Neighborhood ' + str(i)
    Neigh.append(neigh)

df['Neighborhood'] = Neigh

df = df[[df.columns[-1]] + df.columns.values[0:-1].tolist()]

df.head()

| | Neighborhood | Latitude | Longitude | X | Y | Dist_from_center |
|---|---|---|---|---|---|---|
| 0 | Neighborhood 0 | 48.822311 | 2.338401 | -428335.065403 | 5485502.0 | 3973.663297 |
| 1 | Neighborhood 1 | 48.823201 | 2.346376 | -427735.065403 | 5485502.0 | 3772.267223 |
| 2 | Neighborhood 2 | 48.824091 | 2.354352 | -427135.065403 | 5485502.0 | 3659.234893 |
| 3 | Neighborhood 3 | 48.82498 | 2.362328 | -426535.065403 | 5485502.0 | 3642.80112 |
| 4 | Neighborhood 4 | 48.825869 | 2.370304 | -425935.065403 | 5485502.0 | 3724.24489 |


## Foursquare API

Having divided our target area of the city into neighborhoods, let's move on to obtaining the target venues for each of them. We make calls for venues that belong to the **'Gym/Fitness Center'** category for every neighborhood, with a search radius of 300 meters.

First we define our credentials for the Foursquare API.

In [30]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


The following function goes through the dataframe and makes calls to retrieve the target venues for each neighborhood.

Our initial goal is to find the location for a high-end sports/fitness center that offers all sorts of facilities, such as a weight room, swimming pool, indoor basketball court, yoga sessions, etc. These all fall under the Gym/Fitness Center category in the Venue Categories list on [Foursquare's website](https://developer.foursquare.com/docs/resources/categories). Even though we specify the venue category by adding the target category id to the URL, we still get venues that do not belong to the categories listed on Foursquare's website. For this reason, a target list containing our desired venue categories is used to filter the venues.
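
Passing the category id as a query parameter can be sketched with the standard library instead of string formatting, which also takes care of URL encoding. The helper name `build_explore_url` and the placeholder credentials are hypothetical; the endpoint, parameter names, and category id are the ones used in the request in this notebook.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'
GYM_CATEGORY_ID = '4bf58dd8d48988d175941735'  # category id used in this notebook

def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=300, limit=10):
    """Build the venues/explore URL for one neighborhood center."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': GYM_CATEGORY_ID,
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, for illustration only
url = build_explore_url(48.822311, 2.338401, 'MY_ID', 'MY_SECRET')

# Parse the query string back to confirm the parameters survived encoding
query = parse_qs(urlsplit(url).query)
print(query['categoryId'], query['ll'])
```

With `requests`, the same dictionary could instead be passed through the `params` argument of `requests.get`.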

In [390]:
import json
import requests

target_cat = ['Gym / Fitness Center', 'Gym', 'Martial Arts Dojo',
              'Gymnastics Gym','Yoga Studio', 'Gym Pool', 
              'Boxing Gym', 'Climbing Gym', 'Athletics & Sports', 
              'Pilates Studio', 'Massage Studio', 'Outdoor Gym', 'Cycle Studio', 'Spa', 'Pool']

def getNearbyVenues(names, latitudes, longitudes, radius=300):
    LIMIT=10
    gym_id = '4bf58dd8d48988d175941735'
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            gym_id,
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        for v in results:
            # Keep only the venues whose primary category is in our target list
            if v['venue']['categories'][0]['name'] in target_cat:
                
                venues_list.append([(name, 
                                     lat, 
                                     lng, 
                                     v['venue']['name'], 
                                     v['venue']['location']['lat'], 
                                     v['venue']['location']['lng'], 
                                     v['venue']['location']['distance'],
                                     v['venue']['categories'][0]['name'])])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude', 
                             'Venue',
                             'Venue Latitude', 
                             'Venue Longitude', 
                             'Venue Distance',
                             'Venue Category']
    
    return(nearby_venues)
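
The category filter inside `getNearbyVenues` can be tried on its own without calling the API. The sketch below uses a made-up response fragment shaped like the `items` list the function iterates over; the venue names are invented, `filter_items` is a hypothetical helper, and the target set is shortened for brevity. Unlike the loop above, it also skips venues that come back with an empty `categories` list, which would otherwise raise an `IndexError`.

```python
TARGET_CATEGORIES = {'Gym / Fitness Center', 'Gym', 'Yoga Studio', 'Pool'}  # shortened list

def filter_items(items, target_categories):
    """Return (venue name, primary category) pairs whose category is in the target set."""
    kept = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue  # no category information: cannot be matched
        primary = categories[0]['name']
        if primary in target_categories:
            kept.append((venue['name'], primary))
    return kept

# Invented sample data, mimicking the structure of the API response
sample_items = [
    {'venue': {'name': 'Example Gym', 'categories': [{'name': 'Gym'}]}},
    {'venue': {'name': 'Example Bakery', 'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Uncategorized Place', 'categories': []}},
    {'venue': {'name': 'Example Yoga', 'categories': [{'name': 'Yoga Studio'}]}},
]

print(filter_items(sample_items, TARGET_CATEGORIES))
```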

In [375]:
paris_venues = getNearbyVenues(names=df['Neighborhood'],
                               latitudes=df['Latitude'], 
                               longitudes=df['Longitude'])

Neighborhood 0
Neighborhood 1
Neighborhood 2
Neighborhood 3
Neighborhood 4
...

## Methodology<a name="methodology"></a>

In this project we direct our efforts toward finding spots with a low density of fitness-related facilities within a circle of 4 km radius around the center of Paris.

We use the Foursquare API to find the target venues around the neighborhoods defined above and use them to create a folium heatmap, in order to visualize the situation and detect suitable location candidates. Once we have narrowed the search down to a few location candidates, we leave the final decision to the stakeholders.

## Analysis <a name="analysis"></a>

In [392]:
print('A total of {} venues are obtained.'.format(paris_venues.shape[0]))

A total of 383 venues are obtained.


We plot the venues obtained on a folium map to visualize them.

In [380]:
map_paris = folium.Map(paris_center, zoom_start = 12)

folium.Circle(paris_center,
             radius=4000,
             color='red').add_to(map_paris)

for lat, lng, name in zip(paris_venues['Venue Latitude'],paris_venues['Venue Longitude'],paris_venues['Venue']):
    label = folium.Popup(name)
    folium.CircleMarker([lat, lng],
                       color='blue',
                       radius=5,
                       popup=label,  # show the venue name when the marker is clicked
                       fill=True).add_to(map_paris)
map_paris

It looks good. A first approximation suggests that the left bank of the Seine, the river that divides the city, has a much lower venue density than the right bank. To improve this visualization, let's create a **heatmap** showing the density of venues and also add the borders of Paris' neighborhoods on top of it.

First, we get the data for the borders of Paris' neighborhoods.

In [382]:
#path_arrond = 'https://opendata.paris.fr/explore/dataset/arrondissements/download/?format=geojson&timezone=Europe/Berlin'
path_quart = 'https://opendata.paris.fr/explore/dataset/quartier_paris/download/?format=geojson&timezone=Europe/Berlin'
paris_quartiers = requests.get(path_quart).json()

Next, we create a list containing the geographical locations of all the venues in order to build the heatmap. We also manually create a list of locations with few venues and then use Nominatim to obtain their coordinates.

In [412]:
# Latitude/longitude pairs of every venue, selected by column name for the heatmap
hm_list = paris_venues[['Venue Latitude', 'Venue Longitude']].values.tolist()

    
addresses=['Ambassade de Suisse',
           'Jardin du Luxembourg',
           'Jardin des Plantes',
           'Cimetière du Père Lachaise',
           'Parc de Bercy',
           'Croulebarbe']
lat_list=[]
lng_list=[]

geolocator = Nominatim(user_agent = 'Paris')
for i in addresses:
    address = i + ', Paris, France'
    print(address)
    location = geolocator.geocode(address)
    lat_list.append(location.latitude)
    lng_list.append(location.longitude)

print(lat_list,lng_list)

Ambassade de Suisse, Paris, France
Jardin du Luxembourg, Paris, France
Jardin des Plantes, Paris, France
Cimetière du Père Lachaise, Paris, France
Parc de Bercy, Paris, France
Croulebarbe, Paris, France
[48.85859385, 48.84672285, 48.8432224, 48.8612168, 48.83564465, 48.833974] [2.31595145195593, 2.33641303020225, 2.35950895709484, 2.39392926381064, 2.38171930578623, 2.34763417199232]
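
One caveat with this loop: geopy's `geocode` returns `None` when an address cannot be resolved, in which case `location.latitude` raises an `AttributeError`. A defensive variant could look like the sketch below; `geocode_all` is a hypothetical helper, and the stand-in `fake_geocode` function exists only so the example runs without network access (its single coordinate pair is the Parc de Bercy result printed above).

```python
from collections import namedtuple

Location = namedtuple('Location', ['latitude', 'longitude'])

def geocode_all(addresses, geocode, suffix=', Paris, France'):
    """Geocode each address, returning (found, missing) without failing on unknown places."""
    found, missing = {}, []
    for name in addresses:
        location = geocode(name + suffix)
        if location is None:
            missing.append(name)  # unresolved address: record it instead of crashing
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing

def fake_geocode(query):
    """Stand-in for geolocator.geocode: knows a single address, returns None otherwise."""
    known = {'Parc de Bercy, Paris, France': Location(48.83564465, 2.38171930578623)}
    return known.get(query)

found, missing = geocode_all(['Parc de Bercy', 'Nowhere Street'], fake_geocode)
print(found, missing)
```

With geopy, `geolocator.geocode` would be passed as the `geocode` argument in place of the stand-in.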


In [413]:
from folium import plugins
from folium.plugins import HeatMap

map_paris = folium.Map(location=paris_center, zoom_start=12)
folium.CircleMarker(paris_center,
                    radius=10,
                    fill=True,
                    fill_color='#3186cc',
                    fill_opacity=0.7).add_to(map_paris)
folium.Circle(paris_center,
             radius=4000,
             color='red').add_to(map_paris)

folium.TileLayer('cartodbpositron').add_to(map_paris) #cartodbpositron cartodbdark_matter

HeatMap(hm_list).add_to(map_paris)#adding heatmap to folium map

folium.GeoJson(paris_quartiers, name='geojson').add_to(map_paris)#Paris' neighborhood borders

for lat, lng, label in zip(lat_list, lng_list, addresses):
    folium.Marker([lat, lng], popup = label).add_to(map_paris)

map_paris
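
The heatmap gives a visual impression of density; a numeric complement is to count venues per candidate neighborhood and list the emptiest cells first. The sketch below does this with plain Python on invented rows; `rank_by_venue_count` is a hypothetical helper, and in the notebook the labels would come from `df['Neighborhood']` and `paris_venues['Neighborhood']`.

```python
from collections import Counter

def rank_by_venue_count(all_neighborhoods, venue_neighborhoods):
    """Sort neighborhoods by number of venues, fewest first (zero-venue cells included)."""
    counts = Counter(venue_neighborhoods)
    return sorted(((name, counts.get(name, 0)) for name in all_neighborhoods),
                  key=lambda pair: pair[1])

# Invented data: one label per candidate cell, and one label per venue found
cells = ['Neighborhood 0', 'Neighborhood 1', 'Neighborhood 2']
venue_rows = ['Neighborhood 1', 'Neighborhood 1', 'Neighborhood 0']

ranking = rank_by_venue_count(cells, venue_rows)
print(ranking)
```

The full list of cells is passed in separately because neighborhoods with no venues never appear in the venue table, and those are exactly the ones of interest here.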

## Results and Discussion <a name="discussion"></a>
Now this is a much better way to visualize the density of target venues in Paris. As shown in the map, Paris is pretty 'hot' when it comes to fitness facilities, though a few 'cooler' areas can be spotted with ease:

On the left bank of the Seine:
* The area around the 7th Arrondissement
* The center of the 5th Arrondissement and the Jardin du Luxembourg
* The area near the Jardin des Plantes
* The area around the Croulebarbe neighborhood

On the right bank of the river, the two main blank spots are:
* The Cimetière du Père Lachaise
* The area around the Parc de Bercy

Many of the areas listed above can be eliminated as location candidates right away, because most of the space within them is already in use. For instance, *Jardin du Luxembourg*, *Jardin des Plantes*, and *Cimetière du Père Lachaise* are areas in which no space is available for a new facility.

This leaves us with three locations to choose from: the area around the 7th Arrondissement, the area around the Parc de Bercy, and the Croulebarbe neighbourhood.

**The 7th Arrondissement** is probably the most famous borough in Paris, with world-class attractions such as the Champ de Mars, the Eiffel Tower, and numerous museums. Finding a spot for a sports facility in the middle of an area with these characteristics, however, isn't always easy.

In the 18th century, the area of **Bercy** was known especially for its wine warehouses, thanks to its location right beside the Seine. For two hundred years, the area was the thriving center of the Paris wine trade and a place with a unique life and culture. The area was later included in Paris' plan to revitalize the eastern part of the city; for Bercy specifically, the program includes:

* [A 12.5 hectare park;](http://stephanekirkland.com/the-new-bercy-neighborhood/)
* [1,500 units of housing, mixing market and subsidized units;](http://stephanekirkland.com/the-new-bercy-neighborhood/)
* [113,000 square meters of office space;](http://stephanekirkland.com/the-new-bercy-neighborhood/)
* [40,000 square meters for wholesale activities.](http://stephanekirkland.com/the-new-bercy-neighborhood/)

The AccorHotels Arena located in Bercy is the [third busiest arena](https://en.wikipedia.org/wiki/AccorHotels_Arena) in the world.

Situated in the 13th Arrondissement, the area around **Croulebarbe** is mainly a residential neighbourhood with a population of ~20,000 inhabitants.

Having made a brief analysis of the location candidates, we leave the decision to the stakeholders.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Paris close to its geographical center with a low density of fitness facilities, in order to help stakeholders narrow down the search for the optimal location for a new fitness center.

We first divided the city into circular neighborhoods so that Foursquare API calls could be made for each of them. We then obtained information about the fitness facilities around every neighborhood and used a heatmap to identify the areas with the lowest venue density. A few location candidates were quickly spotted, and some were eliminated due to lack of available space. We were left with three areas to choose from, each with its own characteristics.

The final decision on the optimal location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to major roads, real estate availability, prices) and the social and economic dynamics of every neighborhood.