# Basic Data Exploration of Tourist Crimes in Bogotá (2019-2021)

Let's do an exploratory data analysis of the high-impact crimes committed against tourists in Bogotá:

1. Has the number of crimes increased from 2019 to 2021?
2. During which hours are crimes most likely to occur?
3. Are there more crimes on weekdays than weekends?
4. What are the top 5 neighborhoods where the crimes occur?
5. For each neighborhood, during which hours are crimes most likely to occur?
6. What are the top 5 modalities used for crimes? 
7. What are the top 5 places where the crimes occur?
8. What types of weapons are most involved in crimes per town?

In [1]:
!pip install -r requirements.txt --user



In [2]:
import numpy                 as np
import pandas                as pd
import seaborn               as sns
import matplotlib.pyplot as plt
import plotly.express as px

In [3]:
# Pass the path directly so that the encoding argument is actually applied when reading the file.
df_crimes = pd.read_csv('data/DelitosLocalidades.csv', delimiter=',', encoding="UTF-8")
df_crimes.shape

(177, 28)

In [4]:
df_crimes.head()

Unnamed: 0,JURISMETROPOLITANADEPTO,MUNICIPIO_HECHO,COMUNAS_ZONAS_DESCRIPCION,DESCRIPCION_CONDUCTA,BARRIOS_HECHO,FECHA_HECHO,MES_LARGO,DIA_SEMANA,INTERVALOS_HORA,HORA_HECHO,...,AGRUPA_EDAD_PERSONA,GRADO_INSTRUCCION_PERSONA,JURISESTACIONAREA,JURISCAI,MES,ANIO,DIA,LATITUD_Y,LONGITUD_X,Localidad
0,M. BOGOTÁ,BOGOTÁ D.C. (CT),UPZ No. 93 LAS NIEVES ENO REPORTADO3,ARTÍCULO 239. HURTO PERSONAS,VERACRUZ ENO REPORTADO3,9/8/2019,Agosto,Viernes,18:00 NO REPORTADO 23:59,9:30:00 p. m.,...,ADULTOS,SECUNDARIA,ESTACION ENO REPORTADO03 SANTA FE,CAI COLSEGUROS ENO REPORTADO3,8,2019,9,4.601635,-74.069705,Santa Fe
1,M. BOGOTÁ,BOGOTÁ D.C. (CT),UPZ No. 94 LA CANDELARIA ENO REPORTADO17,ARTÍCULO 239. HURTO PERSONAS,LA CATEDRAL ENO REPORTADO17,1/4/2019,Abril,Lunes,18:00 NO REPORTADO 23:59,8:30:00 p. m.,...,ADULTOS,SUPERIOR,ESTACION ENO REPORTADO17 CANDELARIA,CAI ROSARIO ENO REPORTADO17,4,2019,1,4.598116,-74.074853,La Candelaria
2,M. BOGOTÁ,BOGOTÁ D.C. (CT),UPZ No. 94 LA CANDELARIA ENO REPORTADO17,ARTÍCULO 239. HURTO PERSONAS,CANDELARIA ENO REPORTADO17,9/8/2019,Agosto,Viernes,00:00 NO REPORTADO 05:59,5:00:00 a. m.,...,ADULTOS,SECUNDARIA,ESTACION ENO REPORTADO17 CANDELARIA,CAI ROSARIO ENO REPORTADO17,8,2019,9,4.596988,-74.069571,La Candelaria
3,M. BOGOTÁ,BOGOTÁ D.C. (CT),UPZ No. 94 LA CANDELARIA ENO REPORTADO17,ARTÍCULO 239. HURTO PERSONAS,LA CONCORDIA ENO REPORTADO17,9/8/2019,Agosto,Viernes,18:00 NO REPORTADO 23:59,9:00:00 p. m.,...,ADULTOS,SUPERIOR,ESTACION ENO REPORTADO17 CANDELARIA,CAI ROSARIO ENO REPORTADO17,8,2019,9,4.596988,-74.069571,La Candelaria
4,M. BOGOTÁ,BOGOTÁ D.C. (CT),UPZ No. 94 LA CANDELARIA ENO REPORTADO17,ARTÍCULO 239. HURTO PERSONAS,CANDELARIA ENO REPORTADO17,17/1/2019,Enero,Jueves,12:00 NO REPORTADO 17:59,4:30:00 p. m.,...,ADULTOS,SUPERIOR,ESTACION ENO REPORTADO17 CANDELARIA,CAI BOLIVIA ENO REPORTADO17,1,2019,17,4.595336,-74.072607,La Candelaria


In [5]:
df_crimes.columns

Index(['JURISMETROPOLITANADEPTO', 'MUNICIPIO_HECHO',
       'COMUNAS_ZONAS_DESCRIPCION', 'DESCRIPCION_CONDUCTA', 'BARRIOS_HECHO',
       'FECHA_HECHO', 'MES_LARGO', 'DIA_SEMANA', 'INTERVALOS_HORA',
       'HORA_HECHO', 'GENERO', 'MODALIDAD', 'PAIS_PERSONA', 'CARGO_PERSONA',
       'CLASE_SITIO', 'ZONA', 'ARMAS_MEDIOS', 'EDAD', 'AGRUPA_EDAD_PERSONA',
       'GRADO_INSTRUCCION_PERSONA', 'JURISESTACIONAREA', 'JURISCAI', 'MES',
       'ANIO', 'DIA', 'LATITUD_Y', 'LONGITUD_X', 'Localidad'],
      dtype='object')

The following table shows descriptive statistics that support a few conclusions:
1. At least 50% of the crimes occurred in 2019 (the median of ANIO is 2019).
2. Between 2019 and 2021, 177 crimes against tourists were reported in Bogotá.
3. The people who reported the crimes were between 19 and 70 years old.

In [6]:
df_crimes.describe()

Unnamed: 0,EDAD,MES,ANIO,DIA,LATITUD_Y,LONGITUD_X
count,177.0,177.0,177.0,177.0,177.0,177.0
mean,34.435028,4.655367,2019.457627,16.225989,4.613876,-74.071433
std,13.133818,2.656443,0.714828,8.764261,0.026223,0.013636
min,19.0,1.0,2019.0,1.0,4.584104,-74.133795
25%,25.0,2.0,2019.0,9.0,4.598116,-74.074732
50%,30.0,5.0,2019.0,15.0,4.6018,-74.070762
75%,41.0,7.0,2020.0,25.0,4.613952,-74.067457
max,70.0,12.0,2021.0,31.0,4.70244,-74.013396
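
The conclusions drawn from the descriptive statistics can also be checked directly instead of being read off the table. The sketch below is a minimal illustration that assumes only the ANIO and EDAD columns; the helper name `summarize_crimes` and the `sample` frame are hypothetical, and the rows in `sample` are made up for the example (they are not the real data).

```python
import pandas as pd

def summarize_crimes(df: pd.DataFrame) -> dict:
    """Return the record count, the share of records per year, and the age range."""
    return {
        "total": len(df),
        "share_by_year": df["ANIO"].value_counts(normalize=True).sort_index(),
        "age_range": (df["EDAD"].min(), df["EDAD"].max()),
    }

# Made-up rows for illustration only; in the notebook, pass df_crimes instead.
sample = pd.DataFrame({
    "ANIO": [2019, 2019, 2019, 2020, 2021],
    "EDAD": [19, 30, 41, 25, 70],
})
summary = summarize_crimes(sample)
print(summary["total"])
print(summary["share_by_year"])
print(summary["age_range"])
```

In the notebook, `summarize_crimes(df_crimes)` would give the exact share of records per year rather than the bound implied by the median.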


In [7]:
df_crimes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 177 entries, 0 to 176
Data columns (total 28 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   JURISMETROPOLITANADEPTO    177 non-null    object 
 1   MUNICIPIO_HECHO            177 non-null    object 
 2   COMUNAS_ZONAS_DESCRIPCION  177 non-null    object 
 3   DESCRIPCION_CONDUCTA       177 non-null    object 
 4   BARRIOS_HECHO              177 non-null    object 
 5   FECHA_HECHO                177 non-null    object 
 6   MES_LARGO                  177 non-null    object 
 7   DIA_SEMANA                 177 non-null    object 
 8   INTERVALOS_HORA            177 non-null    object 
 9   HORA_HECHO                 177 non-null    object 
 10  GENERO                     177 non-null    object 
 11  MODALIDAD                  177 non-null    object 
 12  PAIS_PERSONA               177 non-null    object 
 13  CARGO_PERSONA              177 non-null    object 

In [8]:
def crimesbymonth(df):

    df['FECHA_HECHO'] = pd.to_datetime(df['FECHA_HECHO'], format="%d/%m/%Y", errors='coerce')
    monthly_crimes = df.groupby(df.FECHA_HECHO.dt.to_period('M')).size()
    return monthly_crimes

a = crimesbymonth(df_crimes)

For our next step, let's compare the crime frequency between 2019 and 2021.

In [99]:
monthly_crimes_df = crimesbymonth(df_crimes).to_frame().reset_index()
monthly_crimes_df.columns = ["PERIODO",'FRECUENCIA']
monthly_crimes_df['ANO'] = monthly_crimes_df['PERIODO'].dt.year
monthly_crimes_df['MES'] = monthly_crimes_df['PERIODO'].dt.month
fig = px.line(monthly_crimes_df, x="MES", y="FRECUENCIA", title='Delitos Contra Turistas entre 2019 y 2021', 
              color='ANO',width=800, 
              height=500,
              color_discrete_sequence=px.colors.diverging.Portland,
              category_orders={"MES": [1,2,3,4,5,6,7,8,9,10,11,12]})
fig.update_xaxes(type='category')
fig.show()

The plot above shows that crimes against tourists in Bogotá decreased in 2020 and 2021 compared to 2019. There is also a pronounced peak between June and September 2019.
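
The year-over-year comparison can be backed by totals per year. The following is a minimal sketch, assuming FECHA_HECHO holds day/month/year strings as in the rows shown earlier; the function name `crimes_per_year` and the `sample` rows are hypothetical and made up for illustration.

```python
import pandas as pd

def crimes_per_year(df: pd.DataFrame, date_col: str = "FECHA_HECHO") -> pd.Series:
    """Count records per calendar year from day/month/year date strings."""
    # Parse into a local variable so the input frame is left unchanged.
    dates = pd.to_datetime(df[date_col], format="%d/%m/%Y", errors="coerce")
    return dates.dt.year.value_counts().sort_index()

# Made-up dates for illustration only; in the notebook, pass the raw df_crimes.
sample = pd.DataFrame(
    {"FECHA_HECHO": ["9/8/2019", "1/4/2019", "17/1/2019", "3/2/2020", "21/6/2021"]}
)
yearly = crimes_per_year(sample)
print(yearly)
print(yearly.pct_change())  # relative change from one year to the next
```

Unlike crimesbymonth, this sketch does not overwrite the FECHA_HECHO column, so it expects the original strings rather than an already converted column.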

In [12]:
def crimesbyhour(df):

    df['HORA'] = pd.to_datetime(df['HORA_HECHO'], errors='coerce').dt.hour
    hourly_crimes = df.groupby('HORA').size()
    
    return hourly_crimes
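
The helper above relies on `pd.to_datetime` inferring the format of HORA_HECHO, whose values look like "9:30:00 p. m.". Because `errors='coerce'` turns any value that fails to parse into a missing hour, it may be worth parsing the field explicitly. The sketch below is one possible approach, assuming every value follows the "h:mm:ss a. m." / "h:mm:ss p. m." pattern seen in the sample rows; `parse_hour` and the regular expression are hypothetical helpers, not part of the original notebook.

```python
import re
import pandas as pd

# Assumed pattern: hour, minutes, optional seconds, then "a. m." or "p. m."
# (dots and spaces optional; \s also matches non-breaking spaces).
_TIME_RE = re.compile(
    r"^\s*(\d{1,2}):(\d{2})(?::\d{2})?\s*([ap])\.?\s*m\.?\s*$", re.IGNORECASE
)

def parse_hour(text):
    """Return the 24-hour clock hour for strings like '9:30:00 p. m.', or None."""
    match = _TIME_RE.match(str(text))
    if match is None:
        return None
    hour = int(match.group(1)) % 12  # maps 12 to 0 before the p. m. shift
    if match.group(3).lower() == "p":
        hour += 12
    return hour

# Made-up values for illustration only.
sample = pd.Series(["9:30:00 p. m.", "5:00:00 a. m.", "12:00:00 a. m.", "bad value"])
hours = sample.map(parse_hour)
print(hours.isna().sum(), "value(s) could not be parsed")
```

Counting the unparsed values makes it visible when rows would otherwise silently drop out of the hourly counts.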

The plot below shows the frequency of crimes by hour of the day.

In [13]:
# Name the count column while resetting the index instead of editing columns.values in place.
hourly_crimes_df = crimesbyhour(df_crimes).reset_index(name="FRECUENCIA")
fig = px.bar(hourly_crimes_df, x="HORA", y="FRECUENCIA", 
             title='Crímenes por hora del día',
             width=600, 
             height=400,
             color_discrete_sequence=px.colors.sequential.YlGnBu_r,
             labels={'FRECUENCIA':'Cantidad de delitos','HORA':'Hora'})
fig.update_xaxes(type='category')
fig.show()

In [41]:
def crimesperhourperloc(df):

    df['HORA'] = pd.to_datetime(df['HORA_HECHO'], errors='coerce').dt.hour
    # Count crimes per (hour, locality) pair and name the count column directly.
    hourly_crimes = df.groupby(['HORA','Localidad']).size().reset_index(name="FRECUENCIA")
    return hourly_crimes

In [16]:
hourly_loc_crimes_df = crimesperhourperloc(df_crimes).sort_values(by='HORA', ascending=True)

fig = px.bar(hourly_loc_crimes_df, x="HORA", y="FRECUENCIA",
             color='Localidad',
             title='Crímenes por Hora del día por Localidad',
             category_orders={"HORA": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]},
             width=800, 
             height=600,
             color_discrete_sequence=px.colors.sequential.Viridis,
             labels={'FRECUENCIA':'Cantidad de delitos','HORA':'Hora'})
fig.update_xaxes(type='category')
fig.show()

In [113]:
# Count crimes per modality and name the count column directly.
df_crimes_by_mod = df_crimes.groupby('MODALIDAD').size().reset_index(name="FRECUENCIA")
fig = px.treemap(df_crimes_by_mod, path=['MODALIDAD'], 
                 values='FRECUENCIA',
                 title='Frecuencia Tipo de hurtos',
                 color_discrete_sequence=px.colors.sequential.Viridis
                )
fig.show()

The hourly plots above show that most crimes occur between 11:00 and 15:00, while the fewest occur during the morning.
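
The size of that peak can be quantified as the share of crimes that fall inside the window. This is a minimal sketch, assuming an HORA column of integer hours like the one built by crimesbyhour; the helper `share_in_window` and the sample values are hypothetical and made up for illustration.

```python
import pandas as pd

def share_in_window(hours: pd.Series, start: int, end: int) -> float:
    """Fraction of records whose hour lies in [start, end], ignoring missing hours."""
    valid = hours.dropna()
    return float(valid.between(start, end).mean())

# Made-up hours for illustration only; in the notebook, pass df_crimes['HORA'].
sample_hours = pd.Series([11, 12, 13, 15, 21, 5, None, 14])
share = share_in_window(sample_hours, 11, 15)
print(f"{share:.1%} of the sample falls between 11:00 and 15:59")
```

Both bounds are inclusive, so a window of 11 to 15 covers every record from 11:00 up to 15:59.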

In [114]:
def crimesbyweekdays(df):

    weekday_crimes = df.groupby(['DIA_SEMANA','MODALIDAD']).size()
    return weekday_crimes

a = crimesbyweekdays(df_crimes)

In [115]:
weekday_crimes_df = crimesbyweekdays(df_crimes).to_frame().reset_index()
weekday_crimes_df.columns = ["DIA",'MODALIDAD','FRECUENCIA']
weekDayOrder = ['Lunes', 'Martes', 'Miércoles', 'Jueves', 'Viernes', 'Sábado', 'Domingo']
fig = px.bar(weekday_crimes_df, x= "DIA", y="FRECUENCIA", title='Frecuencia de Delitos por Día de la semana',
             category_orders={"DIA": weekDayOrder},
             color='MODALIDAD',
             color_discrete_sequence=px.colors.sequential.Viridis,
             width=800, 
             height=500,
             labels={'FRECUENCIA':'Cantidad de delitos','DIA':'Dia'})
fig.show()

The previous plot shows a general decrease in the number of crimes from Monday to Sunday, with Saturday having the fewest. Most crimes happen on weekdays, from Monday to Friday.
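
Since there are five weekdays and only two weekend days, comparing raw totals favors weekdays by construction. The sketch below compares totals and per-day averages; it assumes DIA_SEMANA uses the same Spanish day names as the weekDayOrder list, and the helper `weekday_weekend_summary` and the sample rows are hypothetical and made up for illustration.

```python
import pandas as pd

# Assumed spellings, matching the weekDayOrder list used for the plot.
WEEKEND_DAYS = {"Sábado", "Domingo"}

def weekday_weekend_summary(df: pd.DataFrame, day_col: str = "DIA_SEMANA") -> pd.DataFrame:
    """Total and per-day average number of records for weekdays and weekends."""
    # Missing or unexpected day names are counted as weekdays in this sketch.
    is_weekend = df[day_col].str.strip().isin(WEEKEND_DAYS)
    totals = pd.Series(
        {"weekday": int((~is_weekend).sum()), "weekend": int(is_weekend.sum())}
    )
    days = pd.Series({"weekday": 5, "weekend": 2})
    return pd.DataFrame({"total": totals, "per_day": totals / days})

# Made-up rows for illustration only; in the notebook, pass df_crimes.
sample = pd.DataFrame(
    {"DIA_SEMANA": ["Lunes", "Martes", "Viernes", "Sábado", "Lunes", "Domingo", "Jueves"]}
)
summary = weekday_weekend_summary(sample)
print(summary)
```

The per_day column is the fairer basis for answering whether weekdays see more crimes than weekends.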

In [20]:
def crimesbyzone(df):
    
    # Count crimes per zone and name the count column directly.
    zone_df = df.groupby('COMUNAS_ZONAS_DESCRIPCION').size().reset_index(name="FRECUENCIA")
    
    return zone_df

In [21]:
top5_zone_crimes_df = crimesbyzone(df_crimes).sort_values(by='FRECUENCIA', ascending=False).head(5)
fig = px.bar(top5_zone_crimes_df, x= "FRECUENCIA", y="COMUNAS_ZONAS_DESCRIPCION", title='Las 5 Zonas más peligrosas para turistas',
             width=800, 
             height=400,
             color_discrete_sequence=px.colors.sequential.YlGnBu_r,
             labels={'FRECUENCIA':'Cantidad de delitos en la zona','COMUNAS_ZONAS_DESCRIPCION':'Zona'})
fig.update_yaxes(categoryorder='total ascending')
fig.show()

The plot shows that the zone with the most crimes is La Candelaria, with almost three times as many as the second-ranked zone (Las Nieves). It also appears to account for more than 50% of the crimes committed in the top 5 zones.
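
Both figures in that reading (the ratio to the second zone and the share within the top 5) can be computed rather than estimated from the bars. This is a minimal sketch; the helper `top_zone_concentration` is hypothetical, and the zone names and counts in `sample_counts` are made up for illustration.

```python
import pandas as pd

def top_zone_concentration(counts: pd.Series, n: int = 5) -> dict:
    """Compare the leading zone with the runner-up and with the top-n total."""
    top = counts.sort_values(ascending=False).head(n)
    return {
        "leader": top.index[0],
        "ratio_to_second": top.iloc[0] / top.iloc[1],
        "share_of_top_n": top.iloc[0] / top.sum(),
    }

# Made-up counts for illustration only.
sample_counts = pd.Series(
    {"Zone A": 60, "Zone B": 20, "Zone C": 10, "Zone D": 6, "Zone E": 4, "Zone F": 1}
)
result = top_zone_concentration(sample_counts)
print(result)
```

In the notebook, the input could be `df_crimes['COMUNAS_ZONAS_DESCRIPCION'].value_counts()`.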

In [83]:
top5_zone_crimes_df = crimesbyzone(df_crimes).sort_values(by='FRECUENCIA', ascending=False).head(5)
fig = px.pie(top5_zone_crimes_df, values='FRECUENCIA', 
             names='COMUNAS_ZONAS_DESCRIPCION', 
             labels={'FRECUENCIA':'Cantidad de delitos en la zona','COMUNAS_ZONAS_DESCRIPCION':'Zona'},
             title='Distribución porcentual de delitos en las 5 zonas más peligrosas',
             color_discrete_sequence=px.colors.sequential.YlGnBu_r,
             width=700, 
             height=400)
fig.show()

In [23]:
def crimesbyloc(df):
    
    # Count crimes per locality and name the count column directly.
    loc_df = df.groupby('Localidad').size().reset_index(name="FRECUENCIA")
    
    return loc_df

In [107]:
loc_crimes_df = crimesbyloc(df_crimes).sort_values(by='FRECUENCIA', ascending=False)
fig = px.bar(loc_crimes_df, x='FRECUENCIA', 
             y='Localidad', 
             title='Delitos por Localidad',
             labels={'FRECUENCIA':'Delitos por Localidad'},
             color_discrete_sequence=px.colors.sequential.YlGnBu_r,
             width=800, 
             height=500)
fig.update_yaxes(categoryorder='total ascending')
fig.show()

In [25]:
def crimesbytypeoftourist(df):

    # Count crimes per type of tourist and name the count column directly.
    nat_tourist_df = df.groupby('CARGO_PERSONA').size().reset_index(name="FRECUENCIA")
    
    return nat_tourist_df

In [105]:
typeoftourist_crimes_df = crimesbytypeoftourist(df_crimes).sort_values(by='FRECUENCIA', ascending=False)
fig = px.pie(typeoftourist_crimes_df, values='FRECUENCIA', 
             names='CARGO_PERSONA', 
             title='Distribución porcentual de Delitos por tipo de turista',
             labels={'FRECUENCIA':'Cantidad de delitos','CARGO_PERSONA':'Tipo de turista'},
             color_discrete_sequence=px.colors.diverging.Portland,
             width=800, 
             height=400)

fig.show()

In [78]:
def crimesbymonthtypOfTourist(df):

    df['FECHA_HECHO'] = pd.to_datetime(df['FECHA_HECHO'], format="%d/%m/%Y", errors='coerce')
    monthly_crimes = df.groupby([df.FECHA_HECHO.dt.to_period('M'),'CARGO_PERSONA']).size()
    return monthly_crimes

a = crimesbymonthtypOfTourist(df_crimes)


In [77]:
#typeoftourist_crimes_df = crimesbymonthtypOfTourist(df_crimes).to_frame().reset_index()
#typeoftourist_crimes_df.columns = ["PERIODO",'CARGO_PERSONA','FRECUENCIA']
#typeoftourist_crimes_df['PERIODO'] = typeoftourist_crimes_df['PERIODO'].astype(str)
#fig = px.line(typeoftourist_crimes_df, x="PERIODO", y="FRECUENCIA", title='Delitos Contra Turistas entre 2019 Y 2021',color='CARGO_PERSONA',width=800, height=500,)
#fig.show()