<h1 align="center"><img src="https://res.cloudinary.com/adagio/image/upload/s--yMJPmZTl--/c_thumb%2Cdn_72%2Cf_auto%2Ch_380%2Cq_auto%2Cw_1280/v1/destinations/France/03_Photo_villes/Nantes/1_Nantes.jpg?itok=ywOuoAUu" width="2000"></h1>

<h1 align="center"><font size = 20><strong><span style="color: #0579ab;">The Battle for the bus - Nantes</span></strong></font></h1>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

1. [Introduction](#0)<br>
2. [Data requirements](#2) <br>
3. [Methodology](#4) <br>
4. [Results and discussion](#6) <br>
5. [Conclusion](#8) <br>
</div>
<hr>

## 1. Introduction <a id="0"></a>

#### Nantes is the fifth biggest city in France by number of inhabitants. The city benefits from its proximity to the Atlantic, its international airport and a 2-hour train line connecting it to Paris. Consequently, Nantes hosts more than a million visitors every year (**1,735,000** in 2016, the equivalent of 50% of its population). However, public transportation in Nantes is still being developed, and the rapidly growing city suffers from increasingly congested traffic.

#### We will explore Nantes’ public transport service and identify areas where coverage appears to be lacking. In doing so, we will also focus on venues and places that attract tourists and check that each of them is within a sufficiently short distance of a transport stop. This will allow us to make recommendations to public bodies to improve connections for commuters as well as specific connections for tourists.
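
One way to make "sufficiently close" concrete is to compute the great-circle distance from a venue to its nearest stop. The sketch below uses the haversine formula with the standard library only. The helper names `haversine_m` and `nearest_stop` and the 500 m walking threshold are illustrative assumptions, not part of the original analysis; the sample coordinates are copied from the stop and venue tables shown further down in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_stop(venue, stops):
    """Return (stop_name, distance_m) for the stop closest to `venue`.

    `venue` is a (lat, lon) tuple; `stops` is a list of (name, lat, lon) tuples.
    """
    return min(
        ((name, haversine_m(venue[0], venue[1], lat, lon)) for name, lat, lon in stops),
        key=lambda pair: pair[1],
    )

# Three stops and one venue taken from the tables in this notebook
stops = [
    ("Angle Chaillou", 47.26976827, -1.5721161),
    ("Aiguillon", 47.25447241, -1.49652294),
    ("Américains", 47.23612591, -1.5682026),
]
venue = (47.205913, -1.545194)  # Jardin des Fonderies

MAX_WALK_M = 500  # hypothetical "close enough" threshold
name, dist = nearest_stop(venue, stops)
print(name, round(dist), "m", "OK" if dist <= MAX_WALK_M else "too far")
```

With only three sample stops the result says nothing about real coverage; the same two functions would need to be run against the full stop list.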

------------

### Importing the necessary libraries and modules

In [80]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium # map rendering library

# To get coordinates
!pip install geocoder
import geocoder




## 2. Data requirements <a id="2"></a>

To answer this question, we will leverage several data sources:

#### 2.1. We need data about **existing public transportation** in Nantes. 

Public transport in Nantes includes buses, tramways and a public bike-share service. Thankfully, Nantes Metropole provides open datasets about the bus/tram stops and their locations, as well as the bike stations:

   + _Bus and tram stops:_ [Datasource](https://data.nantesmetropole.fr/explore/dataset/244400404_tan-arrets/table/)



In [105]:
transport_stops = pd.read_csv('https://data.nantesmetropole.fr/explore/dataset/244400404_tan-arrets/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_for_header=true&csv_separator=%3B', sep=';', header=0)
filtered_columns = ['ID', 'Name', 'Coordinates']
# .copy() returns an independent dataframe, so adding columns below does not raise SettingWithCopyWarning
stops = transport_stops[filtered_columns].copy()
# split the "lat,lon" string into two columns (the values remain strings)
stops[['Latitude', 'Longitude']] = stops.Coordinates.str.split(",", expand=True)
stops.head()


| | ID | Name | Coordinates | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | ACHA | Angle Chaillou | 47.26976827,-1.5721161 | 47.26976827 | -1.5721161 |
| 1 | AIGU | Aiguillon | 47.25447241,-1.49652294 | 47.25447241 | -1.49652294 |
| 2 | ALCH | Champ de l'Alouette | 47.28282777,-1.5800863 | 47.28282777 | -1.5800863 |
| 3 | AMER | Américains | 47.23612591,-1.5682026 | 47.23612591 | -1.5682026 |
| 4 | APAV | Apave | 47.22539944,-1.64741053 | 47.22539944 | -1.64741053 |


   + _Bike stations:_ [Datasource](https://data.nantesmetropole.fr/explore/dataset/244400404_stations-velos-libre-service-nantes-metropole-disponibilites/table/)




In [110]:
bike_stations = pd.read_csv('https://data.nantesmetropole.fr/explore/dataset/244400404_stations-velos-libre-service-nantes-metropole-disponibilites/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_for_header=true&csv_separator=%3B', sep=";", header=0)
# Let's drop the stations that are closed
bike_stations = bike_stations[bike_stations.status == 'OPEN']

filtered_columns = ['number', 'name', 'position', 'Bike Stands']
# .copy() returns an independent dataframe, so adding columns below does not raise SettingWithCopyWarning
bike = bike_stations[filtered_columns].copy()
# split the "lat,lon" string into two columns (the values remain strings)
bike[['Latitude', 'Longitude']] = bike.position.str.split(",", expand=True)
bike.head()


| | number | name | position | Bike Stands | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 1 | 124 | #00124 - EPHEMERE SCOPITONE | 47.202084,-1.554154 | 15 | 47.202084 | -1.554154 |
| 2 | 1012 | 01012 - BORNE TEST NANTES 2 | 47.195299,-1.557559 | 1 | 47.195299 | -1.557559 |
| 3 | 16 | 016-EDOUARD NORMAND | 47.2190275447,-1.56341948405 | 15 | 47.2190275447 | -1.56341948405 |
| 4 | 33 | 033-RACINE | 47.2135226137,-1.56314087443 | 15 | 47.2135226137 | -1.56314087443 |
| 5 | 43 | 043 - MACHINE DE L'ÎLE | 47.2069189587,-1.56480679078 | 36 | 47.2069189587 | -1.56480679078 |
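
In both tables, `str.split` leaves `Latitude` and `Longitude` as strings, so they need to be converted to numbers before any distance can be computed. The sketch below shows the parsing step on a single value using the standard library; `parse_coordinates` is a hypothetical helper name. On the dataframes themselves, appending `.astype(float)` to the result of the split should achieve the same thing.

```python
def parse_coordinates(text):
    """Split a 'lat,lon' string into a (latitude, longitude) tuple of floats."""
    lat_text, lon_text = text.split(",")
    return float(lat_text), float(lon_text)

# Coordinates of the "Angle Chaillou" stop, copied from the table above
lat, lon = parse_coordinates("47.26976827,-1.5721161")
print(type(lat).__name__, lat, lon)
```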


#### 2.2. We need data about **venues that are popular** among tourists.

   + For that part we will leverage the _Foursquare API_ to collect data about venues in Nantes. [A study published in 2011](https://www.blogdumoderateur.com/portrait-des-utilisateurs-de-foursquare-en-france/), when Foursquare was at its most popular, found that most French users of Foursquare were from Paris and its region (56%), while only 3% were from Pays de la Loire, Nantes' region. We will therefore assume (for the purpose of this rapid study) that the venues we find on Foursquare are frequented by tourists.


_Define Foursquare Credentials and Version_

In [87]:
CLIENT_ID = 'LMUV1AVDGR53S5CNXBW52G1EMRFMVCWQ5WGZ4MOCZQE1ZQ0C' # my Foursquare ID
CLIENT_SECRET = 'G22Q3KB55LRBBCHYJXRYO1Y4MGQ4KPNW0QRNGNJ35GYQ3CI4' # my Foursquare Secret
VERSION = '20180605' # Foursquare API version
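
Credentials written directly into a shared notebook are visible to every reader. As an alternative, the sketch below reads them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumed conventions for this example, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The variable names are an assumed convention for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

# Demonstration with a plain dict standing in for the real environment
demo_id, demo_secret = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
)
print(demo_id, demo_secret)
```

In the notebook itself, calling `load_foursquare_credentials()` without arguments would read the real environment and could be assigned to `CLIENT_ID` and `CLIENT_SECRET`.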

_Let's define the function to explore all the neighborhoods in Nantes using Foursquare_

In [89]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    """Collect up to LIMIT Foursquare venues within `radius` metres of each location."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
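
The list comprehension in `getNearbyVenues` indexes `categories[0]`, which would fail for a venue returned without any category. The sketch below isolates the extraction step and runs it on a hand-written payload that mimics the response shape used above (`response`, then `groups[0]`, then `items`). The payload, the `extract_venues` helper and the `"Uncategorized"` fallback are illustrative assumptions, not real API output.

```python
def extract_venues(payload, fallback_category="Uncategorized"):
    """Flatten an explore-style payload into (name, lat, lng, category) tuples."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        # tolerate a missing or empty category list
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else fallback_category
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Hand-written sample; the second venue is hypothetical and has no category
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Jardin des Fonderies",
                   "location": {"lat": 47.205913, "lng": -1.545194},
                   "categories": [{"name": "Garden"}]}},
        {"venue": {"name": "Hypothetical venue without category",
                   "location": {"lat": 47.2, "lng": -1.55},
                   "categories": []}},
    ]}]}
}

rows = extract_venues(sample_payload)
for row in rows:
    print(row)
```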

_We will download the venues borough by borough, so first let's import the GeoJSON for the Nantes boroughs_

In [112]:
# download boroughs geojson file
import urllib.request, json 
with urllib.request.urlopen("https://data.nantesmetropole.fr/explore/dataset/244400404_quartiers-nantes/download/?format=geojson&timezone=Europe/Berlin&lang=fr") as url:
    nantes_boroughs = json.loads(url.read().decode())

_From that GeoJSON, let's create a dataframe containing the boroughs names and coordinates_

In [113]:
nantes_data = pd.DataFrame(columns=['idobj','Borough','Latitude','Longitude'])
# iterate over every borough in the file instead of hard-coding their number
for i, feature in enumerate(nantes_boroughs['features']):
    properties = feature['properties']
    nantes_data.loc[i,'idobj'] = properties['idobj']
    nantes_data.loc[i,'Borough'] = properties['nom']
    # geo_point_2d is read as [latitude, longitude]
    nantes_data.loc[i,'Latitude'] = properties['geo_point_2d'][0]
    nantes_data.loc[i,'Longitude'] = properties['geo_point_2d'][1]

# pd.to_numeric returns a new Series, so the result has to be assigned back
nantes_data['idobj'] = pd.to_numeric(nantes_data['idobj'])
nantes_data

| | idobj | Borough | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 6 | Ile de Nantes | 47.2052 | -1.54675 |
| 1 | 4 | Hauts Pavés - Saint Félix | 47.2281 | -1.56349 |
| 2 | 2 | Bellevue - Chantenay - Sainte Anne | 47.1981 | -1.60245 |
| 3 | 7 | Breil - Barberie | 47.2361 | -1.57703 |
| 4 | 11 | Nantes Sud | 47.1916 | -1.53045 |
| 5 | 10 | Doulon - Bottière | 47.237 | -1.50657 |
| 6 | 1 | Centre Ville | 47.2137 | -1.55637 |
| 7 | 9 | Nantes Erdre | 47.2654 | -1.52394 |
| 8 | 8 | Nantes Nord | 47.2575 | -1.56547 |
| 9 | 3 | Dervallières - Zola | 47.2163 | -1.58938 |
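
The loop above reads `geo_point_2d` as `[latitude, longitude]`. Standard GeoJSON `geometry.coordinates` use the opposite order, `[longitude, latitude]`, which is easy to mix up when the same file later feeds a map. The sketch below checks both orders on a hand-written feature. Only the property names (`idobj`, `nom`, `geo_point_2d`) follow the notebook's code; the `Point` geometry is a simplification made up for this example (real borough features would carry polygons), and `feature_to_row` is a hypothetical helper.

```python
import json

# Hand-written sample feature, not actual Nantes Metropole data
sample_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"idobj": "6", "nom": "Ile de Nantes",
                     "geo_point_2d": [47.2052, -1.54675]},
      "geometry": {"type": "Point", "coordinates": [-1.54675, 47.2052]}
    }
  ]
}
""")

def feature_to_row(feature):
    """Build one borough row, checking that both coordinate orders agree."""
    props = feature["properties"]
    lat, lon = props["geo_point_2d"]                       # [latitude, longitude]
    geo_lon, geo_lat = feature["geometry"]["coordinates"]  # GeoJSON: [longitude, latitude]
    if (lat, lon) != (geo_lat, geo_lon):
        raise ValueError("coordinate order mismatch")
    return {"idobj": int(props["idobj"]), "Borough": props["nom"],
            "Latitude": lat, "Longitude": lon}

rows = [feature_to_row(f) for f in sample_geojson["features"]]
print(rows)
```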


_Now let's run the above function on each borough and create a new dataframe called **nantes_venues**._

In [114]:
nantes_venues = getNearbyVenues(names=nantes_data['Borough'],
                                latitudes=nantes_data['Latitude'],
                                longitudes=nantes_data['Longitude'])



Ile de Nantes
Hauts Pavés - Saint Félix
Bellevue - Chantenay - Sainte Anne
Breil - Barberie
Nantes Sud
Doulon - Bottière
Centre Ville
Nantes Erdre
Nantes Nord
Dervallières - Zola
Malakoff - Saint-Donatien


#### Let's check the size of the resulting dataframe

In [115]:
print(nantes_venues.shape)
nantes_venues.head()

(147, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Ile de Nantes | 47.20522 | -1.546747 | Pizza Tradition | 47.201625 | -1.546358 | Pizza Place |
| 1 | Ile de Nantes | 47.20522 | -1.546747 | Jardin des Fonderies | 47.205913 | -1.545194 | Garden |
| 2 | Ile de Nantes | 47.20522 | -1.546747 | Les Fonderies | 47.205864 | -1.545335 | French Restaurant |
| 3 | Ile de Nantes | 47.20522 | -1.546747 | Mangin Beaulieu | 47.204752 | -1.543296 | Basketball Stadium |
| 4 | Ile de Nantes | 47.20522 | -1.546747 | Horizon Vert | 47.20424 | -1.5502 | Health Food Store |
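
A quick way to see how well each borough is covered is to count the venues returned per borough. The sketch below does this with `collections.Counter` on the five rows shown above, so the counts reflect this sample only and not the full 147-row result. The `MIN_VENUES` threshold is a hypothetical value. On the full dataframe, `nantes_venues['Neighborhood'].value_counts()` would give the equivalent counts.

```python
from collections import Counter

# (Neighborhood, Venue Category) pairs copied from the first five rows above
sample_rows = [
    ("Ile de Nantes", "Pizza Place"),
    ("Ile de Nantes", "Garden"),
    ("Ile de Nantes", "French Restaurant"),
    ("Ile de Nantes", "Basketball Stadium"),
    ("Ile de Nantes", "Health Food Store"),
]
boroughs = ["Ile de Nantes", "Centre Ville"]  # two borough names from nantes_data
MIN_VENUES = 3  # hypothetical threshold below which a borough counts as sparse

venues_per_borough = Counter(borough for borough, _ in sample_rows)

# Counter returns 0 for boroughs with no rows, so they are flagged as sparse too.
# Here "Centre Ville" is flagged only because the sample contains none of its rows.
sparse = [b for b in boroughs if venues_per_borough[b] < MIN_VENUES]
print(venues_per_borough, sparse)
```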



   + In addition, in case Foursquare data is too scarce, we might want to use other open data from Nantes Metropole, such as:
     - Cultural places (museums, galleries, libraries...) from this [datasource](https://data.nantesmetropole.fr/explore/dataset/244400404_equipements-publics-nantes-metropole/table/?disjunctive.libtheme&disjunctive.libcategorie&disjunctive.libtype&disjunctive.statut&disjunctive.commune)
     - Swimming pools from this [datasource](https://data.nantesmetropole.fr/explore/dataset/244400404_piscines-nantes-metropole/table/?disjunctive.commune&disjunctive.acces_pmr_equipt&disjunctive.bassin_sportif&disjunctive.pataugeoire&disjunctive.toboggan&disjunctive.bassin_apprentissage&disjunctive.plongeoir&disjunctive.solarium&disjunctive.bassin_loisir&disjunctive.accessibilite_handicap&disjunctive.libre_service)


In [120]:
cultural_places = pd.read_csv('https://data.nantesmetropole.fr/explore/dataset/244400404_equipements-publics-nantes-metropole/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_for_header=true&csv_separator=%3B', sep=";", header=0)
# keep only the facilities whose theme is culture
cultural_places = cultural_places[cultural_places['Thème'] == 'CULTURE']
filtered_columns = ['Equipement', 'Catégorie', 'Géolocalisation']
# .copy() returns an independent dataframe, so adding columns below does not raise SettingWithCopyWarning
cult = cultural_places[filtered_columns].copy()
# split the "lat,lon" string into two columns (the values remain strings)
cult[['Latitude', 'Longitude']] = cult['Géolocalisation'].str.split(",", expand=True)
cult.head()


| | Equipement | Catégorie | Géolocalisation | Latitude | Longitude |
|---|---|---|---|---|---|
| 5 | La Scène Michelet | Salle de spectacle | 47.2328598142,-1.55689374333 | 47.2328598142 | -1.55689374333 |
| 13 | Mémorial de l'Abolition de l'Esclavage | Musée, Château | 47.209459254,-1.56458520111 | 47.209459254 | -1.56458520111 |
| 46 | Fonds Régional d'Art Contemporain des Pays de ... | Salle d'exposition | 47.3003926347,-1.504202425 | 47.3003926347 | -1.504202425 |
| 49 | Bibliothèque Mauves sur Loire | Médiathèque | 47.2970772767,-1.3925876738 | 47.2970772767 | -1.3925876738 |
| 52 | Centre Chorégraphique National de Nantes, J. G... | Salle de spectacle | 47.2259393408,-1.56268266219 | 47.2259393408 | -1.56268266219 |


In [123]:
swimming_pools = pd.read_csv('https://data.nantesmetropole.fr/explore/dataset/244400404_piscines-nantes-metropole/download/?format=csv&timezone=Europe/Berlin&lang=fr&use_labels_for_header=true&csv_separator=%3B', sep=";", header=0)
filtered_columns = ['Identifiant', 'Nom', 'Géolocalisation']
# .copy() returns an independent dataframe, so adding columns below does not raise SettingWithCopyWarning
swim = swimming_pools[filtered_columns].copy()
# split the "lat,lon" string into two columns (the values remain strings)
swim[['Latitude', 'Longitude']] = swim['Géolocalisation'].str.split(",", expand=True)
swim.head()


| | Identifiant | Nom | Géolocalisation | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 2445 | B. Lefèvre | 47.213864369,-1.720392994 | 47.213864369 | -1.720392994 |
| 1 | 2087 | Piscine | 47.171742955,-1.621296297 | 47.171742955 | -1.621296297 |
| 2 | 2567 | Piscine | 47.164908833,-1.469961172 | 47.164908833 | -1.469961172 |
| 3 | 3634 | Petite Amazonie | 47.216392657,-1.531710057 | 47.216392657 | -1.531710057 |
| 4 | 1846 | Bourgonnière | 47.207785237,-1.65677255 | 47.207785237 | -1.65677255 |
