# 911 Emergencies 

A US county would like to know which types of emergencies it should focus on to protect its citizens, and has hired you to make recommendations. To help, it provides a dataset of all the 911 calls it received over the past years, including the location of each call.

1. Import common libraries (including plotly) 

In [None]:
import pandas as pd 
import numpy as np 

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "iframe_connected"

2. Import the dataset here 👉👉 <a href="https://full-stack-bigdata-datasets.s3.eu-west-3.amazonaws.com/Machine+Learning+non+Supervis%C3%A9/DBSCAN/Datasets/911.csv" target="_blank">911.csv</a>

In [None]:
data = pd.read_csv("https://full-stack-bigdata-datasets.s3.eu-west-3.amazonaws.com/Machine+Learning+non+Supervis%C3%A9/DBSCAN/Datasets/911.csv")
data.head()

| | lat | lng | desc | zip | title | timeStamp | twp | addr | e |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 40.297876 | -75.581294 | REINDEER CT & DEAD END; NEW HANOVER; Station ... | 19525.0 | EMS: BACK PAINS/INJURY | 2015-12-10 17:10:52 | NEW HANOVER | REINDEER CT & DEAD END | 1 |
| 1 | 40.258061 | -75.26468 | BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... | 19446.0 | EMS: DIABETIC EMERGENCY | 2015-12-10 17:29:21 | HATFIELD TOWNSHIP | BRIAR PATH & WHITEMARSH LN | 1 |
| 2 | 40.121182 | -75.351975 | HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... | 19401.0 | Fire: GAS-ODOR/LEAK | 2015-12-10 14:39:21 | NORRISTOWN | HAWS AVE | 1 |
| 3 | 40.116153 | -75.343513 | AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;... | 19401.0 | EMS: CARDIAC EMERGENCY | 2015-12-10 16:47:36 | NORRISTOWN | AIRY ST & SWEDE ST | 1 |
| 4 | 40.251492 | -75.60335 | CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... | NaN | EMS: DIZZINESS | 2015-12-10 16:56:52 | LOWER POTTSGROVE | CHERRYWOOD CT & DEAD END | 1 |


3. The dataset is quite big: take a sample of 10,000 observations.

In [None]:
data_sample = data.sample(10000)
data_sample.head()

| | lat | lng | desc | zip | title | timeStamp | twp | addr | e |
|---|---|---|---|---|---|---|---|---|---|
| 76432 | 40.024967 | -75.282905 | LEE AVE & SAN MARINO AVE; LOWER MERION; Stati... | NaN | EMS: CHOKING | 2016-06-27 13:55:00 | LOWER MERION | LEE AVE & SAN MARINO AVE | 1 |
| 213846 | 40.24132 | -75.242828 | BETHLEHEM PIKE & NORTH WALES RD; MONTGOMERY; 2... | 19454.0 | Fire: BUILDING FIRE | 2017-06-19 12:58:54 | MONTGOMERY | BETHLEHEM PIKE & NORTH WALES RD | 1 |
| 601910 | 40.119876 | -75.391906 | S SCHUYLKILL AVE & PORT INDIAN RD; WEST NORRIT... | 19403.0 | Traffic: VEHICLE ACCIDENT - | 2020-01-27 17:26:36 | WEST NORRITON | S SCHUYLKILL AVE & PORT INDIAN RD | 1 |
| 611362 | 40.199028 | -75.476592 | 5TH AVE & W MAIN ST; TRAPPE; Station 324; 202... | 19426.0 | EMS: ALTERED MENTAL STATUS | 2020-02-21 16:14:12 | TRAPPE | 5TH AVE & W MAIN ST | 1 |
| 266197 | 40.096753 | -75.365676 | HENDERSON RD & PRINCE FREDERICK ST; UPPER MERI... | 19406.0 | Fire: FIRE ALARM | 2017-11-01 07:59:17 | UPPER MERION | HENDERSON RD & PRINCE FREDERICK ST | 1 |
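`data.sample(10000)` draws a different random subset on every run, so the clusters found later can change from one run to the next. A minimal sketch, on a small hypothetical DataFrame standing in for the 911 data, showing that passing `random_state` to `DataFrame.sample` makes the draw reproducible (the seed value 42 is an arbitrary assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 911 dataset: 1,000 rows of random coordinates
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "lat": 40.0 + rng.random(1000),
    "lng": -75.5 + rng.random(1000),
})

# Same random_state -> same rows, in the same order
sample_a = toy.sample(100, random_state=42)
sample_b = toy.sample(100, random_state=42)

print(sample_a.index.equals(sample_b.index))    # True
print(len(sample_a), sample_a.index.is_unique)  # 100 True (sampling is without replacement)
```

In the notebook this would amount to something like `data.sample(10000, random_state=42)`; the output shown above comes from an unseeded run, so a seeded run will display different rows.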


5. Using plotly's `scatter_mapbox`, visualize your data points on a map, with a different color for each `title`.

In [None]:
fig = px.scatter_mapbox(
        data_sample, 
        lat="lat", 
        lon="lng",
        color="title",
        mapbox_style="carto-positron"
)

fig.show()

6. The dataset is quite big, so let's use DBSCAN to help us out. First, keep only the `lat`, `lng` and `title` columns, which will be used to build the feature matrix `X`.

In [None]:
data_sample = data_sample.loc[:, ["lat", "lng", "title"]]
data_sample.head()

| | lat | lng | title |
|---|---|---|---|
| 76432 | 40.024967 | -75.282905 | EMS: CHOKING |
| 213846 | 40.24132 | -75.242828 | Fire: BUILDING FIRE |
| 601910 | 40.119876 | -75.391906 | Traffic: VEHICLE ACCIDENT - |
| 611362 | 40.199028 | -75.476592 | EMS: ALTERED MENTAL STATUS |
| 266197 | 40.096753 | -75.365676 | Fire: FIRE ALARM |


7. Standardize `lat` and `lng`, and create dummy variables for the `title` column.

In [None]:
numeric_features = [0, 1]  # Positions of the numeric columns (lat, lng) in data_sample
numeric_transformer = StandardScaler()

# Transformer for the categorical variable
categorical_features = [2]  # Position of the categorical column (title) in data_sample
categorical_transformer = OneHotEncoder(drop='first')

# Combine both transformers in a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Preprocess the dataset
print("Preprocessing the dataset...")
print(data_sample.head())
X = preprocessor.fit_transform(data_sample)  # fit and transform in a single call
print('...Done.')
print(X[0:5, :])
print()

Preprocessing the dataset...
              lat        lng                        title
76432   40.024967 -75.282905                 EMS: CHOKING
213846  40.241320 -75.242828          Fire: BUILDING FIRE
601910  40.119876 -75.391906  Traffic: VEHICLE ACCIDENT -
611362  40.199028 -75.476592   EMS: ALTERED MENTAL STATUS
266197  40.096753 -75.365676             Fire: FIRE ALARM
...Done.
  (0, 0)	-1.2476723742850728
  (0, 1)	0.17125621318127493
  (0, 13)	1.0
  (1, 0)	0.7815226030383629
  (1, 1)	0.39811079730637344
  (1, 54)	1.0
  (2, 0)	-0.35751113470017426
  (2, 1)	-0.44574331152155167
  (2, 83)	1.0
  (3, 0)	0.38486233376422724
  (3, 1)	-0.925106287709338
  (3, 3)	1.0
  (4, 0)	-0.5743794562618597
  (4, 1)	-0.29727092872775446
  (4, 60)	1.0



8. Let's start using DBSCAN: import the module and fit DBSCAN to your data, using `eps=0.2`, `min_samples=100` and `metric="manhattan"` as parameters.

In [None]:
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.2, min_samples=100, metric="manhattan")

db.fit(X)

DBSCAN(eps=0.2, metric='manhattan', min_samples=100)
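Why these parameters shape the result: with `drop='first'`, two calls with different titles differ by at least 1 in the one-hot part of `X` (1 if one of them is the dropped first category, 2 otherwise), so their Manhattan distance is at least 1, far above `eps=0.2`. DBSCAN can therefore only group calls that share the same `title` and lie close together. A small check on hypothetical standardized rows, using `sklearn.metrics.pairwise.manhattan_distances`:

```python
import numpy as np
from sklearn.metrics.pairwise import manhattan_distances

eps = 0.2

# Hypothetical rows of X: [scaled lat, scaled lng, dummy_title_B, dummy_title_C]
# (title A is the dropped first category, so its dummies are all zeros)
a  = [0.10, 0.10, 0.0, 0.0]  # title A
a2 = [0.15, 0.12, 0.0, 0.0]  # title A, very close to `a`
b  = [0.10, 0.10, 1.0, 0.0]  # title B, same location as `a`
c  = [0.10, 0.10, 0.0, 1.0]  # title C, same location as `a`

D = manhattan_distances(np.array([a, a2, b, c]))

print(D[0, 1] <= eps)    # True: same title and nearby -> neighbors
print(D[0, 2], D[2, 3])  # 1.0 2.0: different titles are never within eps
```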

9. Find out how many clusters DBSCAN created. 

In [None]:
np.unique(db.labels_)

array([-1,  0,  1,  2])
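The label `-1` marks noise (outliers), so `[-1, 0, 1, 2]` means DBSCAN found 3 clusters. A short sketch that derives the number of clusters, the number of outliers and the cluster sizes from a labels array (the array below is hypothetical, not the notebook's result):

```python
import numpy as np
import pandas as pd

# Hypothetical DBSCAN labels; -1 means "noise" (outlier)
labels = np.array([-1, 0, 0, 1, 2, -1, 1, 0])

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())

print(n_clusters)  # 3
print(n_noise)     # 2
print(pd.Series(labels).value_counts().sort_index())  # points per label
```

On the notebook's data, the same lines with `db.labels_` in place of `labels` would give the actual counts.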

10. Add a new column `"cluster"` to `data_sample` holding, for each observation, the label of the cluster it belongs to.

In [None]:
data_sample["cluster"] = db.labels_
data_sample.head()

| | lat | lng | title | cluster |
|---|---|---|---|---|
| 76432 | 40.024967 | -75.282905 | EMS: CHOKING | -1 |
| 213846 | 40.24132 | -75.242828 | Fire: BUILDING FIRE | -1 |
| 601910 | 40.119876 | -75.391906 | Traffic: VEHICLE ACCIDENT - | 0 |
| 611362 | 40.199028 | -75.476592 | EMS: ALTERED MENTAL STATUS | -1 |
| 266197 | 40.096753 | -75.365676 | Fire: FIRE ALARM | -1 |
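To make the clusters actionable, each one can be summarized by its most frequent `title`, its number of calls and its mean position. A sketch with `DataFrame.groupby` and named aggregation, on a hypothetical frame shaped like `data_sample` after step 10 (columns `lat`, `lng`, `title`, `cluster`); all values are invented for illustration:

```python
import pandas as pd

# Hypothetical rows shaped like data_sample after step 10
df = pd.DataFrame({
    "lat":     [40.12, 40.13, 40.11, 40.24, 40.25, 40.02],
    "lng":     [-75.39, -75.38, -75.40, -75.24, -75.25, -75.28],
    "title":   ["Traffic: VEHICLE ACCIDENT -"] * 3
               + ["Fire: FIRE ALARM"] * 2
               + ["EMS: CHOKING"],
    "cluster": [0, 0, 0, 1, 1, -1],
})

summary = (
    df[df["cluster"] != -1]  # ignore outliers
    .groupby("cluster")
    .agg(
        main_title=("title", lambda s: s.mode().iloc[0]),  # most frequent title
        n_calls=("title", "size"),
        mean_lat=("lat", "mean"),
        mean_lng=("lng", "mean"),
    )
)
print(summary)
```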


11. Visualize all the clusters on a map, leaving out the points DBSCAN labeled as outliers (`-1`).

In [None]:
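clustered = data_sample[data_sample.cluster != -1]  # drop DBSCAN outliers (label -1)
fig = px.scatter_mapbox(
        clustered,
        lat="lat",
        lon="lng",
        color=clustered["cluster"].astype(str),  # strings give one discrete color per cluster
        mapbox_style="carto-positron"
)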

fig.show()

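12. Using plotly, visualize all data points except the outliers on a map, with a different color for each `title`.

In [None]:
fig = px.scatter_mapbox(
    data_sample.loc[data_sample.cluster != -1, :],
    lat="lat",
    lon="lng",
    color="title",
    mapbox_style="carto-positron"
)

fig.show()

13. What would your recommendations be for this US county's politicians?

**The map shows the main call types to focus on and the main areas where these calls occur: the remaining points belong to dense clusters of calls of the same type in the same area (for example, cluster 0 contains `Traffic: VEHICLE ACCIDENT -` calls). These call types and areas are where the county's politicians should concentrate their prevention efforts and emergency resources.**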