# Gym opening in Paris

![gym](https://images.unsplash.com/photo-1571902943202-507ec2618e8f?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=968&q=80)

Capstone Data Science Project

## Table of contents
- [Introduction](#intro)
- [Business problem](#problem)
- [Data](#data)
- [Define neighborhoods](#neighbors)

## Introduction <a name="intro"></a>

This project is part of the **Data Science Certification** created by the company **IBM** on the online courses platform **Coursera**.  
The **requirements** for this project are :
- Answer a question related to venues in a big city
- Use Foursquare API venues data
- Use k-means clustering algorithm

Before starting to run any code, make sure to have the following libraries installed in your working environnement !

In [0]:
import pandas as pd
import numpy as np
import requests
import folium
import utm
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

As you can see, I am using the **'utm'** library for this project.  
It is a library to **convert latitude and longitude values into 2D coordinates.**  
The reason for this is that I needed to get distances between two specific points and working with coordinates is easier.  
To do so, I checked the **following map** to see if the **whole city of Paris was located in only one area** so I don't have any problems using UTM package.  
Since **the city is only inside one zone (31U)**, I can use coordinates instead of latitude and longitude to get distances between points.

![ParisUTM](https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/LA2-Europe-UTM-zones.png/800px-LA2-Europe-UTM-zones.png)

## Business problem <a name="problem"></a>

In crowded cities, going to the **gym after your working days** can become a real **challenge** since a lot of people want to exercise at the **same time as you.**  

**It's not rare to wait** for some weights or machines **at peak hours** and this waiting time can transform in a **lack of motivation** to train in a gym during the weekdays.  

The goal of this project is to **find the best places in Paris to open a new gym** in order to reduce the amount of people going to one specific gym.

In fact, since some areas don't have any gym available, some workers are going to a gym relatively far from their workplace and with this project, we will try to find **which places can be of interested for gym companies**, so workers will stay around their working place to workout.

## Data <a name="data"></a>

For this project, data comes from two sources :
- Foursquare API to get venues data
- GeoPy data for longitude and latitude values
- Folium for interactive maps

To define candidates for clustering, I decided that I will not rely on Paris districts since they are too big for this analysis and they are not really relevant groups for gym facilities.  

You can find more details in the following section.

## Define neighborhoods <a name="neighbors"></a>

In [245]:
# Let's say that 'Quai Saint-Michel' is the middle of Paris
address = 'Quai Saint-Michel'
geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
paris_latitude = location.latitude
paris_longitude = location.longitude
print('The geographical coordinate of Paris are {}, {}.'.format(paris_latitude, paris_longitude))
print(geolocator.reverse((paris_latitude, paris_longitude)))

The geographical coordinate of Paris are 48.8533342, 2.3459126.
Saint-Michel, Rue Xavier Privas, Îlot Saint-Séverin, Quartier de la Sorbonne, Paris, Île-de-France, France métropolitaine, 75005, France


In [237]:
def to_latlon(x, y):
  # Zone variables for Paris
  zone_number = 31
  zone_letter = 'U'
  transform = utm.to_latlon(x, y, zone_number, zone_letter)
  return transform[0], transform[1]

def to_xy(latitude, longitude):
  transform = utm.from_latlon(latitude, longitude)
  return transform[0], transform[1]

lat, lon = to_latlon(452428.46166680905, 5411728.410196534)
print(lat, lon)
x, y = to_xy(lat, lon)
print(x, y)

48.85669690208804 2.3514615868388007
452428.4607033939 5411728.410436873


In [0]:
def get_centers():
  latitudes = []
  longitudes = []
  paris_x, paris_y = to_xy(paris_latitude, paris_longitude)

  # Define approximately the borders of Paris
  min_x = int(paris_x - 6000)
  max_x = int(paris_x + 5000)
  min_y = int(paris_y - 3500)
  max_y = int(paris_y + 5500)

  for x in range(min_x, max_x, 700):
    for y in range(min_y, max_y, 700):
      d = np.linalg.norm(np.array((x, y)) - np.array((paris_x, paris_y)))
      if d <= 6000:
        lat, lon = to_latlon(x, y)
        latitudes.append(lat)
        longitudes.append(lon)
  return latitudes, longitudes

In [239]:
lat, lon = get_centers()
print(len(lat))

185


In [253]:
map_paris = folium.Map([paris_latitude, paris_longitude], zoom_start = 12, tiles='Stamen Toner')
folium.Marker([paris_latitude, paris_longitude], popup = 'Paris center').add_to(map_paris)
for lati, longi in zip(lat, lon):
    folium.Circle([lati, longi], radius = 350, color = '#006666', fill = False).add_to(map_paris)
map_paris

In [0]:

limit = 100 # limit of number of venues returned by Foursquare API
radius = 700 # define radius
gym_category = '4bf58dd8d48988d175941735'#'52e81612bcbc57f1066b7a0c'

In [0]:
def getNearbyVenues(latitudes, longitudes):
    
    venues_list=[]
    for lat, lng in zip(latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, version, lat, lng, gym_category, radius, limit)
            
        # make the GET request
        r = requests.get(url).json()
        print(r)
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [233]:
paris_venues = getNearbyVenues(lat, lon)

{'meta': {'code': 429, 'errorType': 'quota_exceeded', 'errorDetail': 'Quota exceeded', 'requestId': '5eb551efb4b684001b598dc0'}, 'response': {}}


KeyError: ignored

In [215]:
paris_venues.head(105)

Unnamed: 0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,48.827714,2.274030,Fitness parc issy,48.826920,2.280927,Gym / Fitness Center
1,48.827714,2.274030,Forest Hill Aquaboulevard,48.831278,2.276187,Gym / Fitness Center
2,48.827714,2.274030,Palais des Sports Robert Charpentier,48.827561,2.268412,Stadium
3,48.827714,2.274030,Salle de Sport Sequana,48.833085,2.269177,Gym
4,48.827714,2.274030,Envido Mairie D'Issy,48.823738,2.272897,Gym Pool
...,...,...,...,...,...,...
239,48.878264,2.301940,CMG Sports Club One Monceau,48.880695,2.306001,Gym / Fitness Center
241,48.878264,2.301940,Espace Bikram,48.883052,2.304334,Yoga Studio
242,48.878264,2.301940,Le Studio du 17ème - Pilates,48.881928,2.303942,Pilates Studio
245,48.884561,2.301853,Eurowin Consulting Group,48.886954,2.304271,Gym / Fitness Center


In [208]:
paris_venues.shape

(1298, 6)

In [0]:
paris_venues.drop_duplicates('Venue Latitude', inplace = True)

In [210]:
paris_venues.shape

(454, 6)

In [211]:
map_paris = folium.Map(location=[paris_latitude, paris_longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(paris_venues['Venue Latitude'], paris_venues['Venue Longitude'], paris_venues['Venue Category']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=1,
        parse_html=False).add_to(map_paris)  
    
map_paris