# 911 Emergencies 

A US county would like to know which main types of cases it needs to focus on to protect its citizens. It hired you to make recommendations. In addition, it gives you a dataset of all the 911 calls it received over the past years, with their locations.

1. Import common libraries (including plotly) 

In [1]:
import pandas as pd 
import numpy as np 

from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "iframe_connected"

2. Import the dataset here 👉👉 <a href="https://full-stack-bigdata-datasets.s3.eu-west-3.amazonaws.com/Machine+Learning+non+Supervis%C3%A9/DBSCAN/Datasets/911.csv" target="_blank">911.csv</a>

In [2]:
data = pd.read_csv("https://full-stack-bigdata-datasets.s3.eu-west-3.amazonaws.com/Machine+Learning+non+Supervis%C3%A9/DBSCAN/Datasets/911.csv")
data.head()

| | lat | lng | desc | zip | title | timeStamp | twp | addr | e |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 40.297876 | -75.581294 | REINDEER CT & DEAD END; NEW HANOVER; Station ... | 19525.0 | EMS: BACK PAINS/INJURY | 2015-12-10 17:10:52 | NEW HANOVER | REINDEER CT & DEAD END | 1 |
| 1 | 40.258061 | -75.26468 | BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... | 19446.0 | EMS: DIABETIC EMERGENCY | 2015-12-10 17:29:21 | HATFIELD TOWNSHIP | BRIAR PATH & WHITEMARSH LN | 1 |
| 2 | 40.121182 | -75.351975 | HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... | 19401.0 | Fire: GAS-ODOR/LEAK | 2015-12-10 14:39:21 | NORRISTOWN | HAWS AVE | 1 |
| 3 | 40.116153 | -75.343513 | AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;... | 19401.0 | EMS: CARDIAC EMERGENCY | 2015-12-10 16:47:36 | NORRISTOWN | AIRY ST & SWEDE ST | 1 |
| 4 | 40.251492 | -75.60335 | CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... | | EMS: DIZZINESS | 2015-12-10 16:56:52 | LOWER POTTSGROVE | CHERRYWOOD CT & DEAD END | 1 |
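A first look at which cases dominate could come from counting calls per title. The sketch below is a minimal example on the five `title` values printed above; the `category` (the part of `title` before the colon) is a hypothetical derived field that the notebook itself does not use. On the full dataset, the per-title version of the same idea would be `data["title"].value_counts()`.

```python
import pandas as pd

# The five `title` values printed by data.head() above
titles = pd.Series([
    "EMS: BACK PAINS/INJURY", "EMS: DIABETIC EMERGENCY",
    "Fire: GAS-ODOR/LEAK", "EMS: CARDIAC EMERGENCY", "EMS: DIZZINESS",
])

# Hypothetical `category`: the text before the colon (e.g. "EMS", "Fire")
category = titles.str.split(":").str[0]

# Number of calls per category, most frequent first
counts = category.value_counts()
print(counts)
```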


3. The dataset is quite big; take a sample of 10,000 observations.

In [3]:
data_sample = data.sample(10000)
data_sample.head()

| | lat | lng | desc | zip | title | timeStamp | twp | addr | e |
|---|---|---|---|---|---|---|---|---|---|
| 557588 | 40.235141 | -75.224847 | STUMP RD; MONTGOMERY; Station 345A; 2019-10-1... | | EMS: VEHICLE ACCIDENT | 2019-10-12 11:34:28 | MONTGOMERY | STUMP RD | 1 |
| 349097 | 40.229362 | -75.274025 | OAKLAND AVE & LAUREL LN; LANSDALE; Station 34... | 19446.0 | EMS: FALL VICTIM | 2018-05-20 10:02:34 | LANSDALE | OAKLAND AVE & LAUREL LN | 1 |
| 104595 | 40.155362 | -75.30196 | DEKALB PIKE & YOST RD; WHITPAIN; Station 385;... | 19422.0 | EMS: ALTERED MENTAL STATUS | 2016-09-08 11:51:23 | WHITPAIN | DEKALB PIKE & YOST RD | 1 |
| 648672 | 40.024967 | -75.282905 | ROSEMONT AVE & DEAD END; LOWER MERION; Statio... | | EMS: DIABETIC EMERGENCY | 2020-06-16 23:16:35 | LOWER MERION | ROSEMONT AVE & DEAD END | 1 |
| 404047 | 40.274137 | -75.660469 | UPLAND SQUARE DR & SELL RD; WEST POTTSGROVE; ... | 19464.0 | EMS: FALL VICTIM | 2018-10-03 16:32:33 | WEST POTTSGROVE | UPLAND SQUARE DR & SELL RD | 1 |
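`data.sample(10000)` draws a different random sample on every run, so rerunning the notebook will not reproduce the rows (or the clusters) shown here. A minimal sketch, assuming a fixed seed is acceptable for this analysis: passing `random_state` to `DataFrame.sample` makes the draw repeatable. The small frame and the seed `42` are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical small frame standing in for `data`
df = pd.DataFrame({"call_id": range(100)})

# With the same fixed seed, two draws return exactly the same rows
first = df.sample(10, random_state=42)
second = df.sample(10, random_state=42)
print(first.index.equals(second.index))
```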


5. Using plotly's `scatter_mapbox`, visualize your data points on a map, with a different color for each `title`.

In [4]:
fig = px.scatter_mapbox(
        data_sample, 
        lat="lat", 
        lon="lng",
        color="title",
        mapbox_style="carto-positron"
)

fig.show()

6. There are too many points to read the map easily, so let's use DBSCAN to find dense groups of calls. First, keep only the `lat`, `lng` and `title` columns; the feature matrix `X` will be built from them.

In [12]:
data_sample = data_sample.loc[:, ["lat", "lng", "title"]]
data_sample.head()

| | lat | lng | title |
|---|---|---|---|
| 557588 | 40.235141 | -75.224847 | EMS: VEHICLE ACCIDENT |
| 349097 | 40.229362 | -75.274025 | EMS: FALL VICTIM |
| 104595 | 40.155362 | -75.30196 | EMS: ALTERED MENTAL STATUS |
| 648672 | 40.024967 | -75.282905 | EMS: DIABETIC EMERGENCY |
| 404047 | 40.274137 | -75.660469 | EMS: FALL VICTIM |


7. Create dummy variables for the `title` column and standardize `lat` and `lng`.

In [13]:
# Transformer for the quantitative variables
numeric_features = [0, 1] # Positions of the quantitative columns (lat, lng) in data_sample
numeric_transformer = StandardScaler()

# Transformer for the categorical variable
categorical_features = [2] # Position of the categorical column (title) in data_sample
categorical_transformer = OneHotEncoder(drop='first') # drop the first dummy, which is redundant

# Combine the transformers in a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Preprocess the sample
print("Preprocessing the sample...")
print(data_sample.head())
X = preprocessor.fit_transform(data_sample) # fit and transform in one step
print('...Done.')
print(X[0:5, :])
print()

Preprocessing the sample...
              lat        lng                       title
557588  40.235141 -75.224847       EMS: VEHICLE ACCIDENT
349097  40.229362 -75.274025            EMS: FALL VICTIM
104595  40.155362 -75.301960  EMS: ALTERED MENTAL STATUS
648672  40.024967 -75.282905     EMS: DIABETIC EMERGENCY
404047  40.274137 -75.660469            EMS: FALL VICTIM
...Done.
  (0, 0)	0.4005301586942308
  (0, 1)	0.04554618819250998
  (0, 55)	1.0
  (1, 0)	0.3704042592327946
  (1, 1)	0.015551998121577768
  (1, 23)	1.0
  (2, 0)	-0.015372303014613526
  (2, 1)	-0.0014859275203084365
  (2, 3)	1.0
  (3, 0)	-0.6951453848746587
  (3, 1)	0.01013633472639139
  (3, 17)	1.0
  (4, 0)	0.6038252417145258
  (4, 1)	-0.2201449991145194
  (4, 23)	1.0
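The transformer above selects columns by position (`[0, 1]` and `[2]`), which only works because `data_sample` was reduced to `lat`, `lng` and `title` in that order. A sketch of an alternative, assuming the input stays a pandas DataFrame: `ColumnTransformer` also accepts column names, which keeps working if the column order changes. The toy frame copies the five rows printed above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy copy of the five rows printed above (lat, lng, title)
sample = pd.DataFrame({
    "lat": [40.235141, 40.229362, 40.155362, 40.024967, 40.274137],
    "lng": [-75.224847, -75.274025, -75.301960, -75.282905, -75.660469],
    "title": ["EMS: VEHICLE ACCIDENT", "EMS: FALL VICTIM",
              "EMS: ALTERED MENTAL STATUS", "EMS: DIABETIC EMERGENCY",
              "EMS: FALL VICTIM"],
})

# Columns are selected by name, so the result does not depend on column order
preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["lat", "lng"]),
    ("cat", OneHotEncoder(drop="first"), ["title"]),
])
X_toy = preprocessor.fit_transform(sample)

# 2 scaled coordinates + (4 distinct titles - 1 dropped) dummy columns
print(X_toy.shape)  # (5, 5)
```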

8. Cluster `X` with DBSCAN.

In [14]:
from sklearn.cluster import DBSCAN

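# eps: maximum Manhattan distance for two points to be neighbors
# min_samples: minimum neighborhood size (point itself included) for a core point
db = DBSCAN(eps=0.2, min_samples=100, metric="manhattan")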

db.fit(X)

DBSCAN(eps=0.2, metric='manhattan', min_samples=100)
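Why `eps=0.2` with the Manhattan metric matters here: each differing dummy column adds 1 to the distance, so two calls with different titles are at least 1 apart (exactly 1 in the dummy part when one of them has the dropped first title, 2 otherwise), which is larger than `eps`. As a result, each DBSCAN cluster should only contain calls that share the same `title`. The sketch below checks this on hypothetical encoded rows (two scaled coordinates plus two dummy columns); the values are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import manhattan_distances

# Hypothetical encoded rows: [lat_scaled, lng_scaled, dummy_A, dummy_B]
same_title = np.array([[0.40, 0.05, 1.0, 0.0],
                       [0.45, 0.02, 1.0, 0.0]])
diff_title = np.array([[0.40, 0.05, 1.0, 0.0],
                       [0.40, 0.05, 0.0, 1.0]])

d_same = manhattan_distances(same_title)[0, 1]  # only the coordinates differ
d_diff = manhattan_distances(diff_title)[0, 1]  # two dummy columns differ
eps = 0.2

# Same title, nearby: neighbors. Different title, same place: not neighbors.
print(d_same <= eps, d_diff <= eps)
```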

9. Find out how many clusters DBSCAN created. 

In [15]:
np.unique(db.labels_)

array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9], dtype=int64)
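The labels above run from `-1` to `9`; since `-1` marks noise (outliers) rather than a cluster, DBSCAN found 10 clusters. A minimal sketch of counting clusters and outliers from any `labels_` array, shown on a hypothetical toy set of 2D points rather than on the 911 data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy points: two tight groups and one isolated point
points = np.array([[0.0, 0.0], [0.0, 0.05], [0.05, 0.0],   # group A
                   [5.0, 5.0], [5.0, 5.05], [5.05, 5.0],   # group B
                   [10.0, 0.0]])                            # isolated

labels = DBSCAN(eps=0.2, min_samples=3, metric="manhattan").fit(points).labels_

# -1 is the noise label, so it is not counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(labels, n_clusters, n_noise)
```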

10. Add a new column `"cluster"` to `data_sample` containing, for each observation, the label of its cluster.

In [16]:
data_sample["cluster"] = db.labels_
data_sample.head()

| | lat | lng | title | cluster |
|---|---|---|---|---|
| 557588 | 40.235141 | -75.224847 | EMS: VEHICLE ACCIDENT | -1 |
| 349097 | 40.229362 | -75.274025 | EMS: FALL VICTIM | -1 |
| 104595 | 40.155362 | -75.30196 | EMS: ALTERED MENTAL STATUS | -1 |
| 648672 | 40.024967 | -75.282905 | EMS: DIABETIC EMERGENCY | -1 |
| 404047 | 40.274137 | -75.660469 | EMS: FALL VICTIM | -1 |


11. Visualize all the clusters on a map, excluding the points that DBSCAN labeled as outliers (label `-1`).

In [17]:
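fig = px.scatter_mapbox(
        # keep clustered points only; string labels give one discrete color per cluster
        data_sample[data_sample.cluster != -1].astype({"cluster": str}), 
        lat="lat", 
        lon="lng",
        color="cluster",
        mapbox_style="carto-positron"
)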

fig.show()

12. Visualize all data points on a map except outliers using plotly. You should have different colors per `title`. 

13. What would then be your recommendations for this US county's politicians?

In [18]:
px.scatter_mapbox(
    data_sample.loc[data_sample.cluster != -1, :],
    lat="lat",
    lon="lng",
    color="title",
    mapbox_style="carto-positron"
)

**The map shows the main types of emergencies and the areas where these events occur. Because DBSCAN only keeps dense groups of calls, these are the recurring emergencies and locations that the county's politicians should focus on.**
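To make these recommendations more concrete, the clusters could be ranked by size together with their dominant title. A minimal sketch on a hypothetical labelled frame with the same `title` and `cluster` columns as `data_sample` (the rows are invented for illustration); in the notebook it would be applied to `data_sample` directly.

```python
import pandas as pd

# Hypothetical labelled sample with the same columns as data_sample after step 10
df = pd.DataFrame({
    "title": ["EMS: FALL VICTIM", "EMS: FALL VICTIM", "Fire: GAS-ODOR/LEAK",
              "EMS: FALL VICTIM", "EMS: CARDIAC EMERGENCY", "Fire: GAS-ODOR/LEAK"],
    "cluster": [0, 0, 1, 0, -1, 1],
})

# Drop outliers, then size and most frequent title of each cluster
clustered = df[df["cluster"] != -1]
summary = (
    clustered.groupby("cluster")
    .agg(n_calls=("title", "size"),
         main_title=("title", lambda s: s.mode().iloc[0]))
    .sort_values("n_calls", ascending=False)
)
print(summary)
```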