## Data Preparation (Accidents) Part 3: Accidents by Time of Day

The dataset provided by the mayor's office has a column called *HORA_ACCIDENTE* that yields little information in its raw form. As part of the data mining process, extracting patterns is much easier when the hours are grouped into times of day, specifically morning (*Mañana*), midday (*Medio dia*), afternoon (*Tarde*), night (*Noche*), and early morning (*Madrugada*).

In [1]:
import numpy as np
import pandas as pd

df = pd.read_csv('Accidentes_coordinates.csv')
df.head(10)

Unnamed: 0,FECHA_ACCIDENTE,AÑO_ACCIDENTE,MES_ACCIDENTE,DIA_ACCIDENTE,HORA_ACCIDENTE,GRAVEDAD_ACCIDENTE,CLASE_ACCIDENTE,SITIO_EXACTO_ACCIDENTE,CANT_HERIDOS_EN _SITIO_ACCIDENTE,CANT_MUERTOS_EN _SITIO_ACCIDENTE,CANTIDAD_ACCIDENTES,LATITUD,LONGITUD
0,12/24/2017 12:00:00 AM,2017,12,Dom,09:30:00:PM,Con heridos,Choque,CR 6 CL 94,1.0,0.0,1,10.946586,-74.826513
1,01/01/2015 12:00:00 AM,2015,1,Jue,02:10:00:PM,Con heridos,Choque,VIA 40 CON 77,1.0,0.0,1,11.016189,-74.795327
2,01/01/2015 12:00:00 AM,2015,1,Jue,02:15:00:PM,Solo daños,Choque,CALLE 14 CR 13,0.0,0.0,1,10.952965,-74.771882
3,01/01/2015 12:00:00 AM,2015,1,Jue,02:20:00:PM,Solo daños,Choque,CL 74 CR 38C,0.0,0.0,1,10.985707,-74.81242
4,01/01/2015 12:00:00 AM,2015,1,Jue,03:30:00:PM,Con heridos,Choque,CL 45 CR 19,2.0,0.0,1,10.958396,-74.79471
5,01/01/2015 12:00:00 AM,2015,1,Jue,04:20:00:AM,Solo daños,Choque,CRA 15 CLLE 21,0.0,0.0,1,10.953501,-74.776203
6,01/01/2015 12:00:00 AM,2015,1,Jue,04:40:00:PM,Con heridos,Choque,CRA 14 CLLE 35,2.0,0.0,1,10.951457,-74.788801
7,01/01/2015 12:00:00 AM,2015,1,Jue,04:50:00:PM,Con heridos,Atropello,CRA 6 CLLE 90,1.0,0.0,1,10.944943,-74.823909
8,01/01/2015 12:00:00 AM,2015,1,Jue,06:00:00:AM,Solo daños,Choque,CRA 6 CLLE 92,0.0,0.0,1,10.944446,-74.825528
9,01/01/2015 12:00:00 AM,2015,1,Jue,07:50:00:PM,Solo daños,Choque,CALLE 99 CR 56,0.0,0.0,1,11.016135,-74.826478


By default, the *FECHA_ACCIDENTE* column holds values of type str. Since its time component is only a placeholder (12:00:00 AM in the rows shown), the date part is first joined with *HORA_ACCIDENTE*. For the conversion we rely on the Pandas function *to_datetime*, which here receives a Series of strings and a format string and returns *datetime* values, which make times much easier to manipulate than str variables do.
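
As a small illustration of the format string, the sketch below parses two combined date and time strings built from rows shown above. The variable names `raw` and `parsed` are only for this example, and it assumes an English locale for the AM/PM marker.

```python
import pandas as pd

# Date part of FECHA_ACCIDENTE joined with HORA_ACCIDENTE,
# using values from the first rows of the dataset.
raw = pd.Series(["12/24/2017 09:30:00:PM", "01/01/2015 04:20:00:AM"])

# %I is the hour on a 12-hour clock and %p the AM/PM marker; the literal ':'
# before %p matches the 'HH:MM:SS:PM' layout of HORA_ACCIDENTE.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S:%p")

print(parsed.dt.hour.tolist())  # [21, 4]
```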

In [2]:
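# Keep only the date part of FECHA_ACCIDENTE (drop the placeholder time).
dates = df['FECHA_ACCIDENTE'].apply(lambda x: x.split(' ')[0].strip())
times = df['HORA_ACCIDENTE']
# Join date and time into a single string, e.g. '12/24/2017 09:30:00:PM'.
dates = dates + ' ' + times
# %I is the 12-hour clock hour and %p the AM/PM marker.
df['FECHA_ACCIDENTE'] = pd.to_datetime(dates, format="%m/%d/%Y %I:%M:%S:%p")
df.head(10)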

Unnamed: 0,FECHA_ACCIDENTE,AÑO_ACCIDENTE,MES_ACCIDENTE,DIA_ACCIDENTE,HORA_ACCIDENTE,GRAVEDAD_ACCIDENTE,CLASE_ACCIDENTE,SITIO_EXACTO_ACCIDENTE,CANT_HERIDOS_EN _SITIO_ACCIDENTE,CANT_MUERTOS_EN _SITIO_ACCIDENTE,CANTIDAD_ACCIDENTES,LATITUD,LONGITUD
0,2017-12-24 21:30:00,2017,12,Dom,09:30:00:PM,Con heridos,Choque,CR 6 CL 94,1.0,0.0,1,10.946586,-74.826513
1,2015-01-01 14:10:00,2015,1,Jue,02:10:00:PM,Con heridos,Choque,VIA 40 CON 77,1.0,0.0,1,11.016189,-74.795327
2,2015-01-01 14:15:00,2015,1,Jue,02:15:00:PM,Solo daños,Choque,CALLE 14 CR 13,0.0,0.0,1,10.952965,-74.771882
3,2015-01-01 14:20:00,2015,1,Jue,02:20:00:PM,Solo daños,Choque,CL 74 CR 38C,0.0,0.0,1,10.985707,-74.81242
4,2015-01-01 15:30:00,2015,1,Jue,03:30:00:PM,Con heridos,Choque,CL 45 CR 19,2.0,0.0,1,10.958396,-74.79471
5,2015-01-01 04:20:00,2015,1,Jue,04:20:00:AM,Solo daños,Choque,CRA 15 CLLE 21,0.0,0.0,1,10.953501,-74.776203
6,2015-01-01 16:40:00,2015,1,Jue,04:40:00:PM,Con heridos,Choque,CRA 14 CLLE 35,2.0,0.0,1,10.951457,-74.788801
7,2015-01-01 16:50:00,2015,1,Jue,04:50:00:PM,Con heridos,Atropello,CRA 6 CLLE 90,1.0,0.0,1,10.944943,-74.823909
8,2015-01-01 06:00:00,2015,1,Jue,06:00:00:AM,Solo daños,Choque,CRA 6 CLLE 92,0.0,0.0,1,10.944446,-74.825528
9,2015-01-01 19:50:00,2015,1,Jue,07:50:00:PM,Solo daños,Choque,CALLE 99 CR 56,0.0,0.0,1,11.016135,-74.826478


The *FECHA_ACCIDENTE* column is now of type *datetime64*. In this format the hours can be filtered easily through attributes such as *hour* and *minute*. The hours are converted according to the following table:

![distribution_of_hours](https://i.imgur.com/LadbQGU.jpg "Distribution of hours by category")

In [3]:
def convertion(time):
    """Map a timestamp to its time-of-day category."""
    # Tuples compare element by element, so each test reads as a
    # half-open time range [start, end).
    if (6, 0) <= (time.hour, time.minute) < (11, 0):
        return 'Mañana'
    elif (11, 0) <= (time.hour, time.minute) < (14, 0):
        return 'Medio dia'
    elif (14, 0) <= (time.hour, time.minute) < (19, 0):
        return 'Tarde'
    elif (19, 0) <= (time.hour, time.minute) < (24, 0):
        return 'Noche'
    else:
        # Everything else, i.e. 00:00 to 05:59.
        return 'Madrugada'

# Apply the function to every timestamp in the column.
df['MOMENTO_DIA'] = df['FECHA_ACCIDENTE'].apply(convertion)
df.head(10)

Unnamed: 0,FECHA_ACCIDENTE,AÑO_ACCIDENTE,MES_ACCIDENTE,DIA_ACCIDENTE,HORA_ACCIDENTE,GRAVEDAD_ACCIDENTE,CLASE_ACCIDENTE,SITIO_EXACTO_ACCIDENTE,CANT_HERIDOS_EN _SITIO_ACCIDENTE,CANT_MUERTOS_EN _SITIO_ACCIDENTE,CANTIDAD_ACCIDENTES,LATITUD,LONGITUD,MOMENTO_DIA
0,2017-12-24 21:30:00,2017,12,Dom,09:30:00:PM,Con heridos,Choque,CR 6 CL 94,1.0,0.0,1,10.946586,-74.826513,Noche
1,2015-01-01 14:10:00,2015,1,Jue,02:10:00:PM,Con heridos,Choque,VIA 40 CON 77,1.0,0.0,1,11.016189,-74.795327,Tarde
2,2015-01-01 14:15:00,2015,1,Jue,02:15:00:PM,Solo daños,Choque,CALLE 14 CR 13,0.0,0.0,1,10.952965,-74.771882,Tarde
3,2015-01-01 14:20:00,2015,1,Jue,02:20:00:PM,Solo daños,Choque,CL 74 CR 38C,0.0,0.0,1,10.985707,-74.81242,Tarde
4,2015-01-01 15:30:00,2015,1,Jue,03:30:00:PM,Con heridos,Choque,CL 45 CR 19,2.0,0.0,1,10.958396,-74.79471,Tarde
5,2015-01-01 04:20:00,2015,1,Jue,04:20:00:AM,Solo daños,Choque,CRA 15 CLLE 21,0.0,0.0,1,10.953501,-74.776203,Madrugada
6,2015-01-01 16:40:00,2015,1,Jue,04:40:00:PM,Con heridos,Choque,CRA 14 CLLE 35,2.0,0.0,1,10.951457,-74.788801,Tarde
7,2015-01-01 16:50:00,2015,1,Jue,04:50:00:PM,Con heridos,Atropello,CRA 6 CLLE 90,1.0,0.0,1,10.944943,-74.823909,Tarde
8,2015-01-01 06:00:00,2015,1,Jue,06:00:00:AM,Solo daños,Choque,CRA 6 CLLE 92,0.0,0.0,1,10.944446,-74.825528,Mañana
9,2015-01-01 19:50:00,2015,1,Jue,07:50:00:PM,Solo daños,Choque,CALLE 99 CR 56,0.0,0.0,1,11.016135,-74.826478,Noche
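
Because every category boundary falls on a whole hour, the same grouping could also be sketched with `pd.cut` on the hour alone. This is an alternative to the function above, not the notebook's method; the sample timestamps and the names `stamps`, `edges`, `labels` and `momento` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample timestamps; in the notebook this would be
# df['FECHA_ACCIDENTE'] after the datetime conversion.
stamps = pd.Series(pd.to_datetime([
    "2017-12-24 21:30:00", "2015-01-01 14:10:00", "2015-01-01 04:20:00",
    "2015-01-01 06:00:00", "2015-01-01 11:00:00", "2015-01-01 18:59:00",
]))

# Bin edges in hours; right=False makes each bin half-open: [0, 6), [6, 11), ...
edges = [0, 6, 11, 14, 19, 24]
labels = ["Madrugada", "Mañana", "Medio dia", "Tarde", "Noche"]

momento = pd.cut(stamps.dt.hour, bins=edges, labels=labels, right=False)
print(momento.astype(str).tolist())
# ['Noche', 'Tarde', 'Madrugada', 'Mañana', 'Medio dia', 'Tarde']
```

The result is a categorical column, which keeps the five labels in chronological order when grouping or plotting.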


Finally, the columns of the current dataframe are reordered and the result, including the new column, is written to a CSV file.

In [4]:
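# Reorder by position: date fields and HORA_ACCIDENTE, then MOMENTO_DIA,
# then severity/class/site, then coordinates, then the count columns.
df = pd.concat([df.iloc[:, :5], df.iloc[:, 13:14], df.iloc[:, 5:8],
                df.iloc[:, 11:13], df.iloc[:, 8:11]], axis='columns')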
df.head(10)

Unnamed: 0,FECHA_ACCIDENTE,AÑO_ACCIDENTE,MES_ACCIDENTE,DIA_ACCIDENTE,HORA_ACCIDENTE,MOMENTO_DIA,GRAVEDAD_ACCIDENTE,CLASE_ACCIDENTE,SITIO_EXACTO_ACCIDENTE,LATITUD,LONGITUD,CANT_HERIDOS_EN _SITIO_ACCIDENTE,CANT_MUERTOS_EN _SITIO_ACCIDENTE,CANTIDAD_ACCIDENTES
0,2017-12-24 21:30:00,2017,12,Dom,09:30:00:PM,Noche,Con heridos,Choque,CR 6 CL 94,10.946586,-74.826513,1.0,0.0,1
1,2015-01-01 14:10:00,2015,1,Jue,02:10:00:PM,Tarde,Con heridos,Choque,VIA 40 CON 77,11.016189,-74.795327,1.0,0.0,1
2,2015-01-01 14:15:00,2015,1,Jue,02:15:00:PM,Tarde,Solo daños,Choque,CALLE 14 CR 13,10.952965,-74.771882,0.0,0.0,1
3,2015-01-01 14:20:00,2015,1,Jue,02:20:00:PM,Tarde,Solo daños,Choque,CL 74 CR 38C,10.985707,-74.81242,0.0,0.0,1
4,2015-01-01 15:30:00,2015,1,Jue,03:30:00:PM,Tarde,Con heridos,Choque,CL 45 CR 19,10.958396,-74.79471,2.0,0.0,1
5,2015-01-01 04:20:00,2015,1,Jue,04:20:00:AM,Madrugada,Solo daños,Choque,CRA 15 CLLE 21,10.953501,-74.776203,0.0,0.0,1
6,2015-01-01 16:40:00,2015,1,Jue,04:40:00:PM,Tarde,Con heridos,Choque,CRA 14 CLLE 35,10.951457,-74.788801,2.0,0.0,1
7,2015-01-01 16:50:00,2015,1,Jue,04:50:00:PM,Tarde,Con heridos,Atropello,CRA 6 CLLE 90,10.944943,-74.823909,1.0,0.0,1
8,2015-01-01 06:00:00,2015,1,Jue,06:00:00:AM,Mañana,Solo daños,Choque,CRA 6 CLLE 92,10.944446,-74.825528,0.0,0.0,1
9,2015-01-01 19:50:00,2015,1,Jue,07:50:00:PM,Noche,Solo daños,Choque,CALLE 99 CR 56,11.016135,-74.826478,0.0,0.0,1


In [5]:
df.to_csv('Accidentes_momento_dia.csv', index=False)
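
The positional reordering in cell 4 depends on the exact column positions. A sketch of the same reordering by column name is shown below. It assumes the dataframe has the 14 named columns seen in the output headers, with the leading unlabeled column being the index; the one-row `toy` frame and its values are placeholders.

```python
import pandas as pd

original_order = [
    "FECHA_ACCIDENTE", "AÑO_ACCIDENTE", "MES_ACCIDENTE", "DIA_ACCIDENTE",
    "HORA_ACCIDENTE", "GRAVEDAD_ACCIDENTE", "CLASE_ACCIDENTE",
    "SITIO_EXACTO_ACCIDENTE", "CANT_HERIDOS_EN _SITIO_ACCIDENTE",
    "CANT_MUERTOS_EN _SITIO_ACCIDENTE", "CANTIDAD_ACCIDENTES",
    "LATITUD", "LONGITUD", "MOMENTO_DIA",
]
# One placeholder row, only to have a dataframe to reorder.
toy = pd.DataFrame([list(range(len(original_order)))], columns=original_order)

new_order = [
    "FECHA_ACCIDENTE", "AÑO_ACCIDENTE", "MES_ACCIDENTE", "DIA_ACCIDENTE",
    "HORA_ACCIDENTE", "MOMENTO_DIA", "GRAVEDAD_ACCIDENTE", "CLASE_ACCIDENTE",
    "SITIO_EXACTO_ACCIDENTE", "LATITUD", "LONGITUD",
    "CANT_HERIDOS_EN _SITIO_ACCIDENTE", "CANT_MUERTOS_EN _SITIO_ACCIDENTE",
    "CANTIDAD_ACCIDENTES",
]
# Selecting with a list of names returns the columns in that order.
by_name = toy[new_order]

# The positional version from cell 4, applied to the same toy frame.
by_position = pd.concat([toy.iloc[:, :5], toy.iloc[:, 13:14], toy.iloc[:, 5:8],
                         toy.iloc[:, 11:13], toy.iloc[:, 8:11]], axis="columns")
print(by_name.equals(by_position))  # True
```

Selecting by name raises a `KeyError` if a column is missing, whereas the positional version would silently pick whatever sits at that position.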