# Predictive (Spatial) Model of Traffic Crashes on the Streets of Santiago
- UDD - MDS18 - BDA
- Final Delivery 
- 30_Final_Geo_Project_Final Datasets_Creation
- 09 August 2019

**CHANGE KERNEL:** geo_p_3_7

## Datasets (Landing)

###  CONASET
http://mapas-conaset.opendata.arcgis.com/search?groupIds=fca1f61c6556499db843c09cc80c70c0

These layers contain the geocoded traffic accidents recorded in the Metropolitan Region (RM) between 2013 and 2018. Each record includes the date (in most cases only day/month), the accident type, the root cause of the accident, the address where it occurred, and the number of fatalities and injuries by severity.
- [Siniestros RM - 2013](http://mapas-conaset.opendata.arcgis.com/datasets/12cb58c27a2846dfa60cf629a14d611a_0)
- [Siniestros RM - 2014](http://mapas-conaset.opendata.arcgis.com/datasets/aa5b5322bc564b809aa29c70658b9cf9_0)
- [Siniestros RM - 2015](http://mapas-conaset.opendata.arcgis.com/datasets/dafa26dbce99467985596d8a58216b79_0)
- [Siniestros RM - 2016](http://mapas-conaset.opendata.arcgis.com/datasets/32ee49c703b840b885b9c80b37ae72d0_0)
- [Siniestros RM - 2017](http://mapas-conaset.opendata.arcgis.com/datasets/907addac92b74e3fa30d40edb72d1813_0)
- [Siniestros RM - 2018](http://mapas-conaset.opendata.arcgis.com/datasets/3a084373b58b45d0ae01d9c14a231cf8_0)

### OpenStreetMap
- POI and streets of Santiago
http://download.geofabrik.de/south-america/chile.html

## Main Libraries

In [1]:
from math import radians, cos, sin, asin, sqrt, atan2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from pylab import rcParams
import seaborn as sns
import folium
import networkx as nx
import osmnx as ox
import geopandas as gpd
from shapely.geometry import Point, Polygon, LineString, MultiLineString
import shapefile as shp
import gpd_lite_toolbox as glt
from gpd_lite_toolbox.utils_carto import m_plot_dataframe, m_plot_multipolygon
import warnings
warnings.filterwarnings('ignore')

In [2]:
sns.set(context='paper', style='ticks', palette='inferno')
mpl.rc("figure", figsize=(10, 6))
mpl.rcParams['figure.dpi'] = 150

## Creating Final Train and Test datasets 

In [3]:
!ls ../data

CONASET                              geo_stgo_100_crash_test_dataset.csv
OSM_Chile                            geo_stgo_100_crash_train_dataset.csv
final_test_dataset_grid_100.csv      geo_stgo_100_estatic_dataset.csv
final_train_dataset_grid_100.csv


In [4]:
Static_features = pd.read_csv("../data/geo_stgo_100_estatic_dataset.csv")
Static_features.shape

(63029, 37)

In [5]:
Static_features.head(2)

Unnamed: 0.1,Unnamed: 0,X,Y,bank,bench,beverages,bus_stop,bus_stop_100,cafe,convenience,...,restaurant_100,school,school_100,school_200,stop,stop_100,taxi,traffic_signals,traffic_signals_100,turning_circle
0,0,-70.824738,-33.557697,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,-70.823831,-33.556787,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [6]:
Static_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63029 entries, 0 to 63028
Data columns (total 37 columns):
Unnamed: 0             63029 non-null int64
X                      63029 non-null float64
Y                      63029 non-null float64
bank                   63029 non-null int64
bench                  63029 non-null int64
beverages              63029 non-null int64
bus_stop               63029 non-null int64
bus_stop_100           63029 non-null int64
cafe                   63029 non-null int64
convenience            63029 non-null int64
convenience_100        63029 non-null int64
convenience_200        63029 non-null int64
crossing               63029 non-null int64
crossing_100           63029 non-null int64
fast_food              63029 non-null int64
fast_food_100          63029 non-null int64
fast_food_200          63029 non-null int64
fuel                   63029 non-null int64
intercect              63029 non-null int64
kindergarten           63029 non-null int64
motorwa

In [7]:
train_dyna_features = pd.read_csv("../data/geo_stgo_100_crash_train_dataset.csv")
train_dyna_features.shape

(63029, 28)

In [8]:
test_dyna_features = pd.read_csv("../data/geo_stgo_100_crash_test_dataset.csv")
test_dyna_features.shape

(63029, 28)

In [9]:
test_dyna_features.head(2)

Unnamed: 0.1,Unnamed: 0,ATROPELLO,ATROPELLO_100,ATROPELLO_200,CAIDA,CAIDA_100,CAIDA_200,CHOQUE,CHOQUE_100,CHOQUE_200,...,OTRO TIPO_100,OTRO TIPO_200,SEV_Index_1,SEV_Index_100,SEV_Index_200,VOLCADURA,VOLCADURA_100,VOLCADURA_200,X,Y
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,-70.824738,-33.557697
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,-70.823831,-33.556787


In [10]:
test_dyna_features.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63029 entries, 0 to 63028
Data columns (total 28 columns):
Unnamed: 0       63029 non-null int64
ATROPELLO        63029 non-null int64
ATROPELLO_100    63029 non-null int64
ATROPELLO_200    63029 non-null int64
CAIDA            63029 non-null int64
CAIDA_100        63029 non-null int64
CAIDA_200        63029 non-null int64
CHOQUE           63029 non-null int64
CHOQUE_100       63029 non-null int64
CHOQUE_200       63029 non-null int64
COLISION         63029 non-null int64
COLISION_100     63029 non-null int64
COLISION_200     63029 non-null int64
FID              63029 non-null int64
INCENDIO         63029 non-null int64
INCENDIO_100     63029 non-null int64
INCENDIO_200     63029 non-null int64
OTRO TIPO        63029 non-null int64
OTRO TIPO_100    63029 non-null int64
OTRO TIPO_200    63029 non-null int64
SEV_Index_1      63029 non-null int64
SEV_Index_100    63029 non-null int64
SEV_Index_200    63029 non-null int64
VOLCADURA        

In [11]:
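def create_dep_var(row):
    """Return 1 if the grid cell has at least one crash of the types below, else 0."""
    # 'CHOQUE' is not included in this sum.
    crash_types = ['ATROPELLO', 'CAIDA', 'COLISION', 'INCENDIO',
                   'OTRO TIPO', 'VOLCADURA']
    return 0 if sum(row[c] for c in crash_types) == 0 else 1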

In [12]:
train_dyna_features['SINIESTRO'] = train_dyna_features.apply(create_dep_var, axis=1)
test_dyna_features['SINIESTRO'] = test_dyna_features.apply(create_dep_var, axis=1)
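
The row-wise `apply` with `create_dep_var` can also be written as a single vectorized expression, which is usually faster on a grid of this size. The sketch below is illustrative only: `toy` and `crash_cols` are hypothetical names, the values are made up, and the column list simply mirrors the six types summed in `create_dep_var`.

```python
import pandas as pd

# Toy frame with the same crash-type columns used by create_dep_var (made-up values).
toy = pd.DataFrame({
    'ATROPELLO': [0, 1, 0],
    'CAIDA': [0, 0, 0],
    'COLISION': [0, 0, 2],
    'INCENDIO': [0, 0, 0],
    'OTRO TIPO': [0, 0, 0],
    'VOLCADURA': [0, 0, 0],
})

crash_cols = ['ATROPELLO', 'CAIDA', 'COLISION', 'INCENDIO', 'OTRO TIPO', 'VOLCADURA']

# 1 where the row-wise sum is non-zero, 0 otherwise: the same rule as create_dep_var.
toy['SINIESTRO'] = (toy[crash_cols].sum(axis=1) != 0).astype(int)
print(toy['SINIESTRO'].tolist())
```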

In [13]:
train_dyna_features.head(2)

Unnamed: 0.1,Unnamed: 0,ATROPELLO,ATROPELLO_100,ATROPELLO_200,CAIDA,CAIDA_100,CAIDA_200,CHOQUE,CHOQUE_100,CHOQUE_200,...,OTRO TIPO_200,SEV_Index_1,SEV_Index_100,SEV_Index_200,VOLCADURA,VOLCADURA_100,VOLCADURA_200,X,Y,SINIESTRO
0,0,0,0,0,0,0,0,0,0,0,...,0,0.0,0.0,0.0,0,0,0,-70.824738,-33.557697,0
1,1,0,0,0,0,0,0,0,0,0,...,0,0.0,0.0,0.0,0,0,0,-70.823831,-33.556787,0


## Creating and Saving Train Dataset

In [14]:
data = Static_features.copy()

In [15]:
data = data.rename(columns={'Unnamed: 0':'id'})

In [16]:
data.head(2)

Unnamed: 0,id,X,Y,bank,bench,beverages,bus_stop,bus_stop_100,cafe,convenience,...,restaurant_100,school,school_100,school_200,stop,stop_100,taxi,traffic_signals,traffic_signals_100,turning_circle
0,0,-70.824738,-33.557697,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,-70.823831,-33.556787,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
train_dyna_features.columns

Index(['Unnamed: 0', 'ATROPELLO', 'ATROPELLO_100', 'ATROPELLO_200', 'CAIDA',
       'CAIDA_100', 'CAIDA_200', 'CHOQUE', 'CHOQUE_100', 'CHOQUE_200',
       'COLISION', 'COLISION_100', 'COLISION_200', 'FID', 'INCENDIO',
       'INCENDIO_100', 'INCENDIO_200', 'OTRO TIPO', 'OTRO TIPO_100',
       'OTRO TIPO_200', 'SEV_Index_1', 'SEV_Index_100', 'SEV_Index_200',
       'VOLCADURA', 'VOLCADURA_100', 'VOLCADURA_200', 'X', 'Y', 'SINIESTRO'],
      dtype='object')

In [18]:
data_train = train_dyna_features[[
    'ATROPELLO_100', 'ATROPELLO_200', 'CAIDA_100', 'CAIDA_200', 'CHOQUE_100',
    'CHOQUE_200', 'COLISION_100', 'COLISION_200', 'INCENDIO_100',
    'INCENDIO_200', 'OTRO TIPO_100', 'OTRO TIPO_200', 'SEV_Index_100',
    'SEV_Index_200', 'VOLCADURA_100', 'VOLCADURA_200', 'SINIESTRO'
]]

In [19]:
train = pd.concat([data,data_train], axis=1)
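
`pd.concat(..., axis=1)` aligns rows by index, so the result is only meaningful if both frames list the grid cells in the same order. A quick sanity check before concatenating might look like the sketch below; `same_grid` is a hypothetical helper and the toy frames stand in for the static and dynamic feature tables, both of which carry `X` and `Y` columns.

```python
import numpy as np
import pandas as pd

def same_grid(left, right, cols=('X', 'Y'), atol=1e-6):
    """True when both frames have equal length and matching coordinates row by row."""
    if len(left) != len(right):
        return False
    a = left[list(cols)].to_numpy()
    b = right[list(cols)].to_numpy()
    # Absolute tolerance only, so neighbouring grid cells are never treated as equal.
    return bool(np.allclose(a, b, rtol=0, atol=atol))

# Made-up stand-ins for the static and dynamic feature tables.
static_toy = pd.DataFrame({'X': [-70.82, -70.81], 'Y': [-33.55, -33.54], 'bank': [0, 1]})
dyna_toy = pd.DataFrame({'X': [-70.82, -70.81], 'Y': [-33.55, -33.54], 'SINIESTRO': [0, 1]})

assert same_grid(static_toy, dyna_toy)
merged = pd.concat([static_toy, dyna_toy[['SINIESTRO']]], axis=1)
```

If the check fails, merging on the coordinate columns (or on a shared cell id) would be safer than a positional concat.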

In [20]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63029 entries, 0 to 63028
Data columns (total 54 columns):
id                     63029 non-null int64
X                      63029 non-null float64
Y                      63029 non-null float64
bank                   63029 non-null int64
bench                  63029 non-null int64
beverages              63029 non-null int64
bus_stop               63029 non-null int64
bus_stop_100           63029 non-null int64
cafe                   63029 non-null int64
convenience            63029 non-null int64
convenience_100        63029 non-null int64
convenience_200        63029 non-null int64
crossing               63029 non-null int64
crossing_100           63029 non-null int64
fast_food              63029 non-null int64
fast_food_100          63029 non-null int64
fast_food_200          63029 non-null int64
fuel                   63029 non-null int64
intercect              63029 non-null int64
kindergarten           63029 non-null int64
motorwa

In [21]:
train.sample(5)

Unnamed: 0,id,X,Y,bank,bench,beverages,bus_stop,bus_stop_100,cafe,convenience,...,COLISION_200,INCENDIO_100,INCENDIO_200,OTRO TIPO_100,OTRO TIPO_200,SEV_Index_100,SEV_Index_200,VOLCADURA_100,VOLCADURA_200,SINIESTRO
23386,23386,-70.672922,-33.567697,0,0,0,0,3,0,0,...,25,0,0,0,1,1.0,1.0,0,0,0
45267,45267,-70.58474,-33.39406,0,0,0,0,0,0,0,...,2,0,0,0,1,1.0,1.0,0,0,0
5798,5798,-70.748376,-33.476787,0,0,0,0,2,0,0,...,37,0,0,1,2,1.0,1.0,0,0,0
22290,22290,-70.676558,-33.506787,0,0,0,0,3,0,0,...,6,0,0,0,1,1.0,1.0,0,0,0
18147,18147,-70.692013,-33.358606,0,0,0,0,0,0,0,...,1,0,0,0,0,0.0,1.0,0,0,0


In [22]:
train.to_csv("../data/final_train_dataset_grid_100.csv")
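
By default `to_csv` also writes the DataFrame index, which comes back as an `Unnamed: 0` column when the file is read without `index_col`. The sketch below shows the difference on a made-up two-row frame, using in-memory buffers instead of real files; whether to pass `index=False` here depends on what the downstream notebooks expect.

```python
import io
import pandas as pd

df = pd.DataFrame({'id': [0, 1], 'SINIESTRO': [0, 1]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```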

## Creating and Saving Test Dataset

In [23]:
data.head(2)

Unnamed: 0,id,X,Y,bank,bench,beverages,bus_stop,bus_stop_100,cafe,convenience,...,restaurant_100,school,school_100,school_200,stop,stop_100,taxi,traffic_signals,traffic_signals_100,turning_circle
0,0,-70.824738,-33.557697,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,-70.823831,-33.556787,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
test_dyna_features.columns

Index(['Unnamed: 0', 'ATROPELLO', 'ATROPELLO_100', 'ATROPELLO_200', 'CAIDA',
       'CAIDA_100', 'CAIDA_200', 'CHOQUE', 'CHOQUE_100', 'CHOQUE_200',
       'COLISION', 'COLISION_100', 'COLISION_200', 'FID', 'INCENDIO',
       'INCENDIO_100', 'INCENDIO_200', 'OTRO TIPO', 'OTRO TIPO_100',
       'OTRO TIPO_200', 'SEV_Index_1', 'SEV_Index_100', 'SEV_Index_200',
       'VOLCADURA', 'VOLCADURA_100', 'VOLCADURA_200', 'X', 'Y', 'SINIESTRO'],
      dtype='object')

In [25]:
data_test = test_dyna_features[[
    'ATROPELLO_100', 'ATROPELLO_200', 'CAIDA_100', 'CAIDA_200', 'CHOQUE_100',
    'CHOQUE_200', 'COLISION_100', 'COLISION_200', 'INCENDIO_100',
    'INCENDIO_200', 'OTRO TIPO_100', 'OTRO TIPO_200', 'SEV_Index_100',
    'SEV_Index_200', 'VOLCADURA_100', 'VOLCADURA_200', 'SINIESTRO'
]]

In [26]:
test = pd.concat([data,data_test], axis=1)

In [27]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63029 entries, 0 to 63028
Data columns (total 54 columns):
id                     63029 non-null int64
X                      63029 non-null float64
Y                      63029 non-null float64
bank                   63029 non-null int64
bench                  63029 non-null int64
beverages              63029 non-null int64
bus_stop               63029 non-null int64
bus_stop_100           63029 non-null int64
cafe                   63029 non-null int64
convenience            63029 non-null int64
convenience_100        63029 non-null int64
convenience_200        63029 non-null int64
crossing               63029 non-null int64
crossing_100           63029 non-null int64
fast_food              63029 non-null int64
fast_food_100          63029 non-null int64
fast_food_200          63029 non-null int64
fuel                   63029 non-null int64
intercect              63029 non-null int64
kindergarten           63029 non-null int64
motorwa

In [28]:
test.sample(5)

Unnamed: 0,id,X,Y,bank,bench,beverages,bus_stop,bus_stop_100,cafe,convenience,...,COLISION_200,INCENDIO_100,INCENDIO_200,OTRO TIPO_100,OTRO TIPO_200,SEV_Index_100,SEV_Index_200,VOLCADURA_100,VOLCADURA_200,SINIESTRO
16864,16864,-70.697467,-33.515878,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
47409,47409,-70.578376,-33.62406,0,0,0,0,0,0,0,...,2,0,0,0,0,1,1,0,0,0
40955,40955,-70.601103,-33.530424,0,0,0,0,0,0,0,...,9,0,0,0,0,0,1,0,0,0
59373,59373,-70.526558,-33.484969,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
44106,44106,-70.589285,-33.554969,0,0,0,0,1,0,0,...,1,0,0,0,0,1,1,0,0,0


In [29]:
test.SINIESTRO.value_counts()

0    56342
1     6687
Name: SINIESTRO, dtype: int64
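
These counts show that the target is imbalanced: most grid cells have no crash. As a small worked example, the share of positive cells can be computed directly from the two numbers above (the Series below just re-enters them by hand).

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({0: 56342, 1: 6687}, name='SINIESTRO')

# Fraction of grid cells labelled 1 (at least one crash).
positive_share = counts.loc[1] / counts.sum()
print(f"cells with at least one crash: {positive_share:.1%}")
```

An imbalance of this size is worth keeping in mind when choosing evaluation metrics, since plain accuracy would look high for a model that always predicts 0.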

In [30]:
test.to_csv("../data/final_test_dataset_grid_100.csv")

---