# Predicting the best location for *Gilets Jaunes* demonstrations to limit material damage in Paris

*Morgane Nadal - PSL University & Ecole Normale Supérieure Student, Paris*

## **Business Plan**

  The *Gilets Jaunes* have now been demonstrating in Paris for 5 months. Some demonstrators are peaceful, but an increasing number of "black blocs" are devastating the old neighborhoods of the city, trashing streets and buildings and specifically targeting gastronomic restaurants, luxury shops and other so-called symbolic places. The overall cost for France currently exceeds one hundred million euros.

  In this project, we try to identify the neighborhoods that are most likely to be subject to vandalism, and to find neighborhoods toward which the demonstrators could be directed in order to avoid as much damage as possible.
  
  We believe this can help the French government and Paris citizens estimate the damage a march could cause. It is also essential for the *Gilets Jaunes* who genuinely want to make their voices heard during a planned, government-approved march, and who want to demonstrate without the violence and destruction that have accompanied the marches every Saturday.
  
  We are making this project public, knowing that there are plenty of other factors to take into account on this very sensitive subject. This project is only a preliminary step toward further analyses.

## Imports

In [None]:
import numpy as np 

import pandas as pd

import json 

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

## Data Cleaning

We will use the borough and neighborhood data to look up the venues in each neighborhood with Foursquare. For that, we need a table with the borough, neighborhood, coordinates and venues. Let's go!

***First, you will have to download the JSON file of the Paris neighborhoods at this address:***

    https://opendata.paris.fr/explore/dataset/quartier_paris/export/?location=12,48.85889,2.34692&basemap=jawg.streets&dataChart=eyJxdWVyaWVzIjpbeyJjb25maWciOnsiZGF0YXNldCI6InF1YXJ0aWVyX3BhcmlzIiwib3B0aW9ucyI6e319LCJjaGFydHMiOlt7ImFsaWduTW9udGgiOnRydWUsInR5cGUiOiJjb2x1bW4iLCJmdW5jIjoiQVZHIiwieUF4aXMiOiJuX3NxX3F1Iiwic2NpZW50aWZpY0Rpc3BsYXkiOnRydWUsImNvbG9yIjoiIzI2Mzg5MiJ9XSwieEF4aXMiOiJuX3NxX3F1IiwibWF4cG9pbnRzIjo1MCwic29ydCI6IiJ9XSwidGltZXNjYWxlIjoiIiwiZGlzcGxheUxlZ2VuZCI6dHJ1ZSwiYWxpZ25Nb250aCI6dHJ1ZX0%3D
    
***OR you can find it on my Github repository:*** https://github.com/Smaragdy/Coursera-Project-Gilets-Jaunes
    
    
We then extract a pandas dataframe from it:

In [None]:
with open('YOUR_DIRECTORY/quartier_paris.json') as json_data:
    quartier_paris = json.load(json_data)

In [None]:
#Have a look at the data
quartier_paris[0]

In [1]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude','Longitude'] 

# instantiate the dataframe
neigh = pd.DataFrame(columns=column_names)

In [None]:
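# collect one row per neighborhood, then build the dataframe in one go
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in quartier_paris:
    borough = data['fields']['c_ar']            # borough number
    neighborhood_name = data['fields']['l_qu']  # neighborhood name

    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neigh = pd.DataFrame(rows, columns=column_names)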

In [None]:
neigh.head()

In [None]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neigh['Borough'].unique()),
        neigh.shape[0]
    )
)

Now that we have the coordinates associated with each neighborhood, we will find the venues using Foursquare. Once we have the venues, we will cluster the neighborhoods in order to identify which ones are most likely to suffer heavy damage.

***But FIRST, we have to take into account the kind of population living in these areas: who might or might not join the movement, who would be more affected, etc.***

*If the data were easily available, it would be nice to add to this table a crime index for each neighborhood, as well as an index of the kind of population (student, residential, ...).*

Instead, we will use the poverty rate and the median standard of living known for each Paris borough (INSEE 2015). ***Please download the file from my Github repository:***
https://github.com/Smaragdy/Coursera-Project-Gilets-Jaunes

In [None]:
Pov_df = pd.read_excel('YOUR_DIRECTORY/base-cc-filosofi-2015.xls',header = 4)

In [None]:
Pov_df.head()

In [None]:
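# keep the borough name as well: it is renamed to 'Borough_name' and used for filtering further down
PRC = Pov_df[["Code géographique","Libellé géographique","Taux de pauvreté-Ensemble (%)","Médiane du niveau vie (€)"]].copy()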

In [None]:
PRC = PRC.drop([0], axis=0)

In [None]:
PRC.rename(columns={'Code géographique':'Borough',
                          'Libellé géographique':'Borough_name',
                          'Taux de pauvreté-Ensemble (%)':'Poverty',
                          'Médiane du niveau vie (€)':'Median_Life_level'}, 
                 inplace=True)

In [None]:
PRC.head()

In [None]:
#PRC = PRC.drop([0], axis=0)

In [None]:
# keep the rows whose geographic code starts with 751 (the Paris boroughs, 75101 to 75120)
PRCC = PRC[PRC['Borough'].astype(str).str.startswith('751')]
PRCC.head()

In [None]:
# .copy() so that the column assignment in the next cells does not act on a view of PRCC
P = PRCC[PRCC['Borough_name'].str.contains('Paris', na=False)].copy()
P

In [None]:
Bor = list(range(1,21))

In [None]:
P['Borough'] = Bor
P = P.drop(['Borough_name'], axis=1)
P

In [None]:
Bo = list(P.iloc[:,0])
Pov = list(P.iloc[:,1])
Med = list(P.iloc[:,2])

In [None]:
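# look up the poverty rate (L) and median standard of living (M) of each neighborhood's borough
pov_by_borough = dict(zip(Bo, Pov))
med_by_borough = dict(zip(Bo, Med))
L = [pov_by_borough[int(b)] for b in neigh['Borough']]
M = [med_by_borough[int(b)] for b in neigh['Borough']]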

In [None]:
neigh['Poverty'] = L
neigh['Median_Life_Level'] = M

In [None]:
neigh.head()
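
The borough-level figures can also be attached to the neighborhoods with a single merge instead of an explicit loop. The sketch below is only an illustration on made-up values: the frames `neigh_demo` and `borough_stats` are hypothetical, and it assumes both hold the borough number as an integer in a `Borough` column.

```python
import pandas as pd

# Made-up neighborhoods, for illustration only
neigh_demo = pd.DataFrame({
    "Borough": [1, 1, 2],
    "Neighborhood": ["A", "B", "C"],
})

# Made-up borough-level statistics, for illustration only
borough_stats = pd.DataFrame({
    "Borough": [1, 2],
    "Poverty": [10.0, 15.0],
    "Median_Life_Level": [30000.0, 25000.0],
})

# A left merge keeps every neighborhood and copies its borough's values onto it;
# validate raises an error if a borough appears more than once in borough_stats
merged = neigh_demo.merge(borough_stats, on="Borough", how="left",
                          validate="many_to_one")
print(merged)
```

With a left merge, a neighborhood whose borough is missing from the statistics table would get `NaN` rather than being silently dropped, which makes such gaps easy to spot.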

Done! We have finished the data cleaning!

## Exploration of the venues in each neighborhood

In [None]:
address = 'Paris, France'

geolocator2 = Nominatim(user_agent="paris_explorer") # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator2.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_paris = folium.Map(location=[latitude, longitude], zoom_start=11.5)

# add markers to map
for lat, lng, label in zip(neigh['Latitude'], neigh['Longitude'], neigh['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.3,
        parse_html=False).add_to(map_paris)  
    
map_paris

Let's explore the first neighborhood:

In [None]:
#First enter your Foursquare IDs :

CLIENT_ID = 'your-client-ID' # your Foursquare ID
CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

In [None]:
neigh.loc[0, 'Neighborhood']

In [1]:
neighborhood_latitude = neigh.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neigh.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neigh.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

In [None]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

In [None]:
results = requests.get(url).json()

In [None]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # the column is named 'venue.categories' in the flattened dataframe
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [None]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

In [None]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
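
The parsing step can be tried offline, without credentials, on a hand-written response. The dictionary below is a made-up stand-in that only mimics the fields read in the cells above; it is not real Foursquare output, and `extract_venues` is a hypothetical helper written for this illustration. It also shows one way to handle a venue that comes back without any category.

```python
import pandas as pd

# Hand-written stand-in for the parts of the response read above (not real API output)
mock_results = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Venue A",
                   "categories": [{"name": "French Restaurant"}],
                   "location": {"lat": 48.86, "lng": 2.34}}},
        {"venue": {"name": "Venue B",
                   "categories": [],  # a venue without any category
                   "location": {"lat": 48.87, "lng": 2.35}}},
    ]}]}
}

def extract_venues(results):
    """Hypothetical helper: flatten the venue items into a dataframe."""
    rows = []
    for item in results["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append({
            "name": venue["name"],
            # use the first category name, or None when the list is empty
            "categories": categories[0]["name"] if categories else None,
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return pd.DataFrame(rows, columns=["name", "categories", "lat", "lng"])

demo_venues = extract_venues(mock_results)
print(demo_venues)
```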

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [None]:
paris_venues = getNearbyVenues(names=neigh['Neighborhood'],
                                   latitudes=neigh['Latitude'],
                                   longitudes=neigh['Longitude'])

In [None]:
print(paris_venues.shape)
paris_venues.head()

In [None]:
paris_venues.groupby('Neighborhood').count().head()

In [None]:
print('There are {} unique categories.'.format(len(paris_venues['Venue Category'].unique())))

In [None]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

In [None]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped.head()
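
To see what the grouped table contains, here is a toy example on made-up venues (the names `toy_venues`, `toy_onehot` and `toy_grouped` are hypothetical). After one-hot encoding, the per-neighborhood mean of each column is the share of that neighborhood's venues that fall in the category, so each row sums to 1.

```python
import pandas as pd

# Made-up venues, for illustration only
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Bakery", "Hotel", "Bar", "Hotel"],
})

# One indicator column per category (1.0 if the venue belongs to it, else 0.0)
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Averaging the indicators gives the frequency of each category per neighborhood
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

In this toy table, neighborhood A has two bars out of four venues, so its `Bar` frequency is 0.5.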

In [None]:
A = neigh.groupby('Neighborhood')['Poverty'].mean().reset_index()


In [None]:
X1 = neigh[['Poverty']].to_numpy()
Y1 = (X1-min(X1))/(max(X1)-min(X1))

In [None]:
X2 = neigh[['Median_Life_Level']].to_numpy()
Y2 = (X2-min(X2))/(max(X2)-min(X2))
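
The two cells above rescale each indicator with min-max normalization: the smallest value maps to 0 and the largest to 1, which puts the indicators on the same 0 to 1 scale as the venue frequencies. A minimal sketch of the same computation on made-up numbers follows; `min_max_scale` is a hypothetical helper, and the guard for a constant column is an addition that the cells above do not have.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D array linearly so its minimum maps to 0 and its maximum to 1."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # all values are identical: avoid dividing by zero
        return np.zeros_like(values)
    return (values - values.min()) / span

# Made-up poverty rates, for illustration only
poverty_demo = [8.0, 12.0, 20.0]
print(min_max_scale(poverty_demo))
```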

In [None]:
paris_grouped.shape

In [None]:
num_top_venues = 10

for hood in paris_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = paris_grouped[paris_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

In [None]:
paris_grouped.drop('Neighborhood', axis=1).head()

In [None]:
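# paris_grouped is sorted by neighborhood name, so align the indices on 'Neighborhood' rather than by position
hoods = neigh['Neighborhood'].to_numpy()
paris_grouped['Poverty_Index'] = paris_grouped['Neighborhood'].map(dict(zip(hoods, Y1.ravel())))
paris_grouped['Median_Life_Level_Index'] = paris_grouped['Neighborhood'].map(dict(zip(hoods, Y2.ravel())))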

In [None]:
paris_grouped.head()

In [None]:
# set number of clusters
kclusters = 5

paris_grouped_clustering = paris_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
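
The number of clusters is fixed at 5 above. One common heuristic for checking such a choice is the elbow method: fit k-means for several values of k and look at where the inertia (the within-cluster sum of squared distances) stops dropping sharply. The sketch below is not part of the original analysis and runs on random made-up data, not on the Paris table.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up feature matrix: three well-separated blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for k = 1..6 and record the inertia of each fit
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this, the inertia falls steeply up to k = 3 and only slightly afterwards; on real data the elbow is usually less clear-cut, so it is a guide rather than a rule.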

In [None]:
paris_grouped.head()

In [None]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

paris_merged = neigh

# merge neighborhoods_venues_sorted with neigh to add latitude/longitude for each neighborhood
paris_merged = paris_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

paris_merged.head() # check the last columns!

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_merged['Latitude'], paris_merged['Longitude'], paris_merged['Neighborhood'], paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
paris_merged.loc[paris_merged['Cluster Labels'] == 0, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

In [None]:
# TODO: choropleth map

## Clustering the neighborhoods

## Conclusion

## Discussion