# Capstone Project - The Battle of the Neighborhoods

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

_Nothing in what follows has been professionally fact-checked (only by me), so it may be wrong, exaggerated, or incomplete. The objective here is less to be strictly exact than to tell an interesting and compelling story for a data science project in the context of the IBM course on Coursera.  
So don't take anything said here too seriously, as the subject can get controversial._  

Following a recent rise in the crime rate in Paris, **the Paris police department** has contracted us to report on the situation across the city,  
so that it can redeploy its officers accordingly.  

We'll first **list the neighborhoods with the highest crime rates** and attempt to **find a correlation between the crime rate and the characteristics of the vicinity**.  
The final objective is a **visual, intuitive representation of the crime rate across the areas of Paris**, and thus of where the police should be posted in greater numbers.  

## Data <a name="data"></a>

Here we'll try to answer this question using two sources:  
**Foursquare API**: https://fr.foursquare.com/  
**Paris anomalies dataset**: https://opendata.paris.fr/explore/dataset/dans-ma-rue/information/?disjunctive.type&disjunctive.soustype&disjunctive.code_postal&disjunctive.ville&disjunctive.arrondissement&disjunctive.prefixe&disjunctive.conseilquartier

We'll use the anomalies dataset and the Foursquare API to evaluate the demographics of a given area.  
The anomalies dataset consists of reports submitted by Parisian residents.  
Foursquare's API will allow us to look at the vicinity of any given location and check the users' profiles. Combined with the anomalies dataset, this should let us confirm a correlation or find stronger ones.    
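
As a rough illustration of the vicinity lookup, the sketch below only builds the request URL for Foursquare's legacy v2 `venues/explore` endpoint (the one used in the course labs); it doesn't send anything. The credentials, the version date, the radius and the limit are placeholder assumptions, and the helper name `build_explore_url` is made up for this example.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Return the URL that lists venues around (lat, lon); hypothetical helper."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, YYYYMMDD
        "ll": f"{lat},{lon}",    # latitude,longitude of the point of interest
        "radius": radius,        # search radius in meters
        "limit": limit,          # maximum number of venues returned
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; the point is one report location from the dataset.
url = build_explore_url(48.8625069516, 2.37882691468, "YOUR_ID", "YOUR_SECRET")
print(url)
```

Keeping the URL construction separate from the actual HTTP call makes it easy to check the parameters before spending any of the API quota.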

For example, the anomalies include the label "Graffitis, tags, affiches et autocollants" (graffiti, tags, posters and stickers). We can hypothesize that offenders or activists hang around in the vicinity; the two are not mutually exclusive, and both are often associated with young people.

More generally, when an area has a lot of anomalies, we'll consider it prone to crime.  

In [13]:
data = "data/dans-ma-rue.csv"
geo = "data/dans-ma-rue.geojson"

In [14]:
import pandas as pd

# The file uses ';' as its field separator rather than the default comma.
# on_bad_lines='skip' (pandas >= 1.3) replaces the deprecated error_bad_lines=False.
df = pd.read_csv(data, sep=';', on_bad_lines='skip')

print(df.shape)
df.head()

(593446, 16)


Unnamed: 0,TYPE,SOUSTYPE,ADRESSE,CODE_POSTAL,VILLE,ARRONDISSEMENT,DATEDECL,ANNEE DECLARATION,MOIS DECLARATION,NUMERO,PREFIXE,INTERVENANT,CONSEIL DE QUARTIER,OBJECTID,geo_shape,geo_point_2d
0,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont","7 rue du général guilhem, 75011 PARIS",75011,Paris 11,11,2014-11-18,2014,11,2514.0,S,graffitis,LEON BLUM - FOLIE-REGNAULT,85010,"{""type"": ""Point"", ""coordinates"": [2.3788269146...","48.8625069516, 2.37882691468"
1,Voirie et espace public,"Trottoirs:Affaissement, trou, bosse, pavé arraché","20 rue des canettes, 75006 PARIS",75006,Paris 6,6,2014-11-18,2014,11,2523.0,S,DVD,ODEON,85019,"{""type"": ""Point"", ""coordinates"": [2.3336679947...","48.8518369959, 2.33366799471"
2,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont","123 rue de turenne, 75003 PARIS",75003,Paris 3,3,2014-11-18,2014,11,2524.0,S,graffitis,TEMPLE,85020,"{""type"": ""Point"", ""coordinates"": [2.3645561681...","48.8637476716, 2.36455616813"
3,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont","1 rue armand gauthier, 75018 PARIS",75018,Paris 18,18,2014-11-18,2014,11,2531.0,S,graffitis,GRANDES CARRIERES - CLICHY,85027,"{""type"": ""Point"", ""coordinates"": [2.3331972935...","48.8893591496, 2.33319729352"
4,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont","9-31 rue des fossés saint-bernard, 75005 PARIS",75005,Paris 5,5,2014-11-18,2014,11,2532.0,S,graffitis,SAINT-VICTOR,85028,"{""type"": ""Point"", ""coordinates"": [2.3542989997...","48.8481139968, 2.35429899972"


In [15]:
df.shape

(593446, 16)

In [16]:
list(df['TYPE'].unique())

['Graffitis, tags, affiches et autocollants',
 'Voirie et espace public',
 'Mobiliers urbains',
 'Propreté',
 'Éclairage / Électricité',
 'Objets abandonnés',
 'Arbres, végétaux et animaux',
 'Eau',
 'Autos, motos, vélos...',
 'Activités commerciales et professionnelles',
 'Du vert près de chez moi',
 'Problème sur un chantier']

As we can see, there are 12 primary categories of anomalies.  
We'll summarize each category to give more context, and mark the useful ones with a star.   
Note that we assume a correlation between dirtiness and vandalism.  

* **Graffitis, tags, affiches et autocollants**: public vandalism *
* **Voirie et espace public**: street deterioration *
* **Mobiliers urbains**: urban furniture deterioration *
* **Propreté**: homeless people, illegal immigrants *
* **Éclairage / Électricité**: malfunctioning public lights
* **Objets abandonnés**: cumbersome objects left on the street *
* **Arbres, végétaux et animaux**: dangerous trees or presence of rats, maintenance issues
* **Eau**: floods, water-related issues
* **Autos, motos, vélos...**: abandoned vehicles *
* **Activités commerciales et professionnelles**: flyers making fraudulent use of Paris' logo or colors *
* **Du vert près de chez moi**: plants
* **Problème sur un chantier**: a single case, most likely not impactful
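
The starred selection above can be sketched as a simple filter followed by a count per neighborhood. The toy rows below are made up and stand in for the real data, and treating the raw count as a rough proxy for how exposed a neighborhood is remains this notebook's working assumption, not an established result.

```python
import pandas as pd

# Categories marked with a star in the list above.
STARRED_TYPES = [
    "Graffitis, tags, affiches et autocollants",
    "Voirie et espace public",
    "Mobiliers urbains",
    "Propreté",
    "Objets abandonnés",
    "Autos, motos, vélos...",
    "Activités commerciales et professionnelles",
]

# Toy sample standing in for the real dataset (made-up rows).
sample = pd.DataFrame({
    "TYPE": ["Propreté", "Eau", "Mobiliers urbains", "Propreté"],
    "CONSEIL DE QUARTIER": ["ODEON", "ODEON", "TEMPLE", "TEMPLE"],
})

# Keep only the starred categories, then count anomalies per neighborhood.
starred = sample[sample["TYPE"].isin(STARRED_TYPES)]
counts = starred.groupby("CONSEIL DE QUARTIER").size().sort_values(ascending=False)
print(counts)
```

On the real data, the same two lines would run on `df` instead of `sample`.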


We'll also drop some columns and the rows with missing data.  
The dropped columns are mostly redundant or do not serve our purpose.  

In [17]:
# Drop the columns that are redundant or irrelevant here, then the rows with missing values.
df.drop(['ADRESSE', 'CODE_POSTAL', 'VILLE', 'DATEDECL', 'NUMERO', 'OBJECTID', 'PREFIXE', 'INTERVENANT', 'geo_shape'], axis=1, inplace=True)
df = df.dropna()

print(df.shape)
df.head()

(593401, 7)


Unnamed: 0,TYPE,SOUSTYPE,ARRONDISSEMENT,ANNEE DECLARATION,MOIS DECLARATION,CONSEIL DE QUARTIER,geo_point_2d
0,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont",11,2014,11,LEON BLUM - FOLIE-REGNAULT,"48.8625069516, 2.37882691468"
1,Voirie et espace public,"Trottoirs:Affaissement, trou, bosse, pavé arraché",6,2014,11,ODEON,"48.8518369959, 2.33366799471"
2,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont",3,2014,11,TEMPLE,"48.8637476716, 2.36455616813"
3,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont",18,2014,11,GRANDES CARRIERES - CLICHY,"48.8893591496, 2.33319729352"
4,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont",5,2014,11,SAINT-VICTOR,"48.8481139968, 2.35429899972"


In [21]:
# Translate the French column names to English.
new_names = {
    "TYPE" : "TYPE",
    "SOUSTYPE" : "SUBTYPE",
    "ARRONDISSEMENT" : "BOROUGH",
    "ANNEE DECLARATION" : "YEAR",
    "MOIS DECLARATION" : "MONTH",
    "CONSEIL DE QUARTIER" : "NEIGHBORHOOD",
    "geo_point_2d" : "LOCATION"
}

df.rename(columns = new_names, inplace=True)
df.head()

Unnamed: 0,TYPE,SUBTYPE,BOROUGH,YEAR,MONTH,NEIGHBORHOOD,LOCATION
0,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont",11,2014,11,LEON BLUM - FOLIE-REGNAULT,"48.8625069516, 2.37882691468"
1,Voirie et espace public,"Trottoirs:Affaissement, trou, bosse, pavé arraché",6,2014,11,ODEON,"48.8518369959, 2.33366799471"
2,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont",3,2014,11,TEMPLE,"48.8637476716, 2.36455616813"
3,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont",18,2014,11,GRANDES CARRIERES - CLICHY,"48.8893591496, 2.33319729352"
4,"Graffitis, tags, affiches et autocollants","Graffitis sur mur, façade sur rue, pont",5,2014,11,SAINT-VICTOR,"48.8481139968, 2.35429899972"
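
For mapping, the `LOCATION` strings will probably need to become numeric coordinates. Here is a minimal sketch, assuming every value follows the "latitude, longitude" pattern seen in the preview; the `LATITUDE` and `LONGITUDE` column names are introduced here for illustration, and the two rows are copied from the output above.

```python
import pandas as pd

# Two rows copied from the preview above.
sample = pd.DataFrame({
    "NEIGHBORHOOD": ["LEON BLUM - FOLIE-REGNAULT", "ODEON"],
    "LOCATION": ["48.8625069516, 2.37882691468", "48.8518369959, 2.33366799471"],
})

# Split "lat, lon" on the comma, trim the whitespace, and convert to floats.
parts = sample["LOCATION"].str.split(",", expand=True)
sample["LATITUDE"] = parts[0].str.strip().astype(float)
sample["LONGITUDE"] = parts[1].str.strip().astype(float)
print(sample[["NEIGHBORHOOD", "LATITUDE", "LONGITUDE"]])
```

Separate numeric columns are easier to pass to a mapping library or to a distance computation than the combined string.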


## Methodology <a name="methodology"></a>

## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>