## Crimes
In this notebook, a dataset about crimes in the City of Buenos Aires (CABA) is prepared for use in the final visualization. It contains instances of non-violent theft, violent robbery, physical harm, and even road-related crimes.

### Source
The source is the [Delitos 2022](https://data.buenosaires.gob.ar/dataset/delitos/resource/3fbc3808-14c7-4559-8ba5-f68e919fee40) dataset published by the Government of the City of Buenos Aires.

### Details
Pandas is the main tool used for this. The main normalization step is translating some terms from Spanish to English. Columns that are not relevant to our use case are also deleted.

I normalized the dataset by modifying the original DataFrame directly: `rename` and `drop` are called with *inplace=True*, and translated values are assigned back to the same columns. This avoids keeping a second, normalized DataFrame alongside the original, although pandas may still make temporary copies internally, so the memory savings are not guaranteed. The disadvantage is that reverting to the original, unnormalized data requires either keeping a separate copy or re-loading the original dataset.
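As a minimal sketch of this trade-off, the snippet below renames columns in place on a small, made-up DataFrame while keeping an explicit copy to revert to. The toy data and the variable name `original` are assumptions for illustration only and are not part of the notebook.

```python
import pandas as pd

# Toy data standing in for the real dataset (hypothetical values).
df = pd.DataFrame({"anio": [2022, 2022], "mes": ["ENERO", "MAYO"]})

# Keep an explicit copy so the unnormalized data can be recovered later.
original = df.copy()

# inplace=True mutates df and returns None instead of a new DataFrame.
result = df.rename(columns={"anio": "year", "mes": "month"}, inplace=True)

print(result)                  # None
print(list(df.columns))        # ['year', 'month']
print(list(original.columns))  # ['anio', 'mes']
```

Whether the extra copy is worth its memory cost depends on the dataset size. For a single CSV file, re-loading it is usually just as practical.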

In [1]:
import pandas as pd

In [2]:
crimes_dataset = pd.read_csv('delitos_2022.csv')

As shown below, most terms are in Spanish.

In [3]:
crimes_dataset.head()

Unnamed: 0,id-mapa,anio,mes,dia,fecha,franja,tipo,subtipo,uso_arma,uso_moto,barrio,comuna,latitud,longitud,cantidad
0,1,2022,OCTUBRE,VIERNES,2022-10-14,3.0,Robo,Robo total,NO,NO,CHACARITA,15.0,-34.584136,-58.454704,1
1,2,2022,OCTUBRE,JUEVES,2022-10-27,5.0,Robo,Robo total,NO,NO,BARRACAS,4.0,-34.645043,-58.373194,1
2,3,2022,NOVIEMBRE,MARTES,2022-11-29,0.0,Robo,Robo total,NO,NO,CHACARITA,15.0,-34.589982,-58.446471,1
3,4,2022,NOVIEMBRE,LUNES,2022-11-28,0.0,Robo,Robo total,NO,NO,CHACARITA,15.0,-34.58832,-58.441232,1
4,5,2022,NOVIEMBRE,MIERCOLES,2022-11-30,3.0,Robo,Robo total,NO,NO,RECOLETA,2.0,-34.596748,-58.413609,1


### Translation of terms
#### Column names

In [4]:
crimes_dataset.rename(columns={'anio': 'year',
                               'dia': 'day',
                               'mes': 'month',
                               'fecha': 'date',
                               'franja': 'at_around',
                               'tipo': 'type',
                               'subtipo': 'subtype',
                               'uso_arma': 'used_gun',
                               'uso_moto': 'used_motorbike',
                               'latitud': 'lat',
                               'barrio': 'neighborhood',
                               'longitud': 'long',
                               'id-mapa': 'id-map'}, inplace=True)
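By default, `rename` silently skips keys that do not match any column, so a typo in a source name would go unnoticed. The sketch below, on a made-up two-column DataFrame, shows the `errors="raise"` option of `DataFrame.rename`, which turns such a mismatch into a `KeyError`. The toy data and the misspelled key are assumptions for illustration.

```python
import pandas as pd

# Toy data (hypothetical); the real dataset has many more columns.
df = pd.DataFrame({"anio": [2022], "mes": ["ENERO"]})

# A correct mapping renames as usual.
renamed = df.rename(columns={"anio": "year", "mes": "month"}, errors="raise")
print(list(renamed.columns))  # ['year', 'month']

# A misspelled source name raises KeyError instead of being skipped.
try:
    df.rename(columns={"messs": "month"}, errors="raise")
except KeyError as exc:
    print("rename failed:", exc)
```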

#### Months
Now for the names of the months in which the crimes are recorded as having happened. I first looked at the distinct values in the month column, in order to
1. See if there are no duplicates due to errors (e.g. a typo, having both 'January' and 'Jaunary')
2. See how many months are to be translated
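The first check can also be automated rather than done by eye. The following is a minimal sketch, assuming the month names are expected to be the twelve upper-case Spanish names. The helper `unexpected_values` and the toy column are hypothetical and not part of the notebook.

```python
import pandas as pd

# The twelve month names as they appear in the source (upper-case Spanish).
EXPECTED_MONTHS = {
    "ENERO", "FEBRERO", "MARZO", "ABRIL", "MAYO", "JUNIO",
    "JULIO", "AGOSTO", "SEPTIEMBRE", "OCTUBRE", "NOVIEMBRE", "DICIEMBRE",
}

def unexpected_values(series: pd.Series, expected: set) -> set:
    """Return the distinct non-null values of `series` that are not in `expected`."""
    return set(series.dropna().unique()) - expected

# Toy column with a deliberate typo (hypothetical data).
months = pd.Series(["ENERO", "MAYO", "ENREO", "MAYO"])
print(unexpected_values(months, EXPECTED_MONTHS))  # {'ENREO'}
```

An empty set means every value is one of the expected names, so no typo-induced duplicates are present.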

In [None]:
print(crimes_dataset['month'].unique())

['OCTUBRE' 'NOVIEMBRE' 'MAYO' 'AGOSTO' 'ENERO' 'SEPTIEMBRE' 'DICIEMBRE'
 'MARZO' 'FEBRERO' 'JUNIO' 'ABRIL' 'JULIO']


In [6]:
crimes_dataset.loc[crimes_dataset["month"] == "DICIEMBRE", "month"] = "December"
crimes_dataset.loc[crimes_dataset["month"] == "NOVIEMBRE", "month"] = "November"
crimes_dataset.loc[crimes_dataset["month"] == "OCTUBRE", "month"] = "October"
crimes_dataset.loc[crimes_dataset["month"] == "SEPTIEMBRE", "month"] = "September"
crimes_dataset.loc[crimes_dataset["month"] == "AGOSTO", "month"] = "August"
crimes_dataset.loc[crimes_dataset["month"] == "JULIO", "month"] = "July"
crimes_dataset.loc[crimes_dataset["month"] == "JUNIO", "month"] = "June"
crimes_dataset.loc[crimes_dataset["month"] == "MAYO", "month"] = "May"
crimes_dataset.loc[crimes_dataset["month"] == "ABRIL", "month"] = "April"
crimes_dataset.loc[crimes_dataset["month"] == "MARZO", "month"] = "March"
crimes_dataset.loc[crimes_dataset["month"] == "FEBRERO", "month"] = "February"
crimes_dataset.loc[crimes_dataset["month"] == "ENERO", "month"] = "January"

In [None]:
print(crimes_dataset['month'].unique())

['October' 'November' 'May' 'August' 'January' 'September' 'December'
 'March' 'February' 'June' 'April' 'July']
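The twelve assignments above could also be written as a single dictionary lookup. This is only a sketch of an alternative, shown on a made-up three-row DataFrame. The name `MONTHS_ES_TO_EN` is an assumption. Note that `Series.map` turns values missing from the dictionary into NaN, whereas the `.loc` assignments leave them untouched.

```python
import pandas as pd

# Spanish-to-English month names, keyed by the values found in the source.
MONTHS_ES_TO_EN = {
    "ENERO": "January", "FEBRERO": "February", "MARZO": "March",
    "ABRIL": "April", "MAYO": "May", "JUNIO": "June",
    "JULIO": "July", "AGOSTO": "August", "SEPTIEMBRE": "September",
    "OCTUBRE": "October", "NOVIEMBRE": "November", "DICIEMBRE": "December",
}

# Toy data (hypothetical) standing in for the 'month' column.
df = pd.DataFrame({"month": ["OCTUBRE", "ENERO", "MAYO"]})

# Look each value up in the dictionary and assign the result back.
df["month"] = df["month"].map(MONTHS_ES_TO_EN)

print(df["month"].tolist())  # ['October', 'January', 'May']
```

The same pattern would apply to the days of the week, crime types, and sub-types, each with its own dictionary.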


#### Days of the week
The exact same process is applied to the names of the days of the week.

In [None]:
print(crimes_dataset['day'].unique())

['VIERNES' 'JUEVES' 'MARTES' 'LUNES' 'MIERCOLES' 'DOMINGO' 'SABADO']


In [9]:
crimes_dataset.loc[crimes_dataset["day"] == "SABADO", "day"] = "Saturday"
crimes_dataset.loc[crimes_dataset["day"] == "VIERNES", "day"] = "Friday"
crimes_dataset.loc[crimes_dataset["day"] == "JUEVES", "day"] = "Thursday"
crimes_dataset.loc[crimes_dataset["day"] == "MIERCOLES", "day"] = "Wednesday"
crimes_dataset.loc[crimes_dataset["day"] == "MARTES", "day"] = "Tuesday"
crimes_dataset.loc[crimes_dataset["day"] == "LUNES", "day"] = "Monday"
crimes_dataset.loc[crimes_dataset["day"] == "DOMINGO", "day"] = "Sunday"

In [10]:
print(crimes_dataset['day'].unique())

['Friday' 'Thursday' 'Tuesday' 'Monday' 'Wednesday' 'Sunday' 'Saturday']


#### Crime types
This translation is less straightforward. Since Spanish is my mother tongue, I tried to be as specific as possible when translating.

In [11]:
print(crimes_dataset['type'].unique())

['Robo' 'Hurto' 'Vialidad' 'Homicidios' 'Lesiones' 'Amenazas']


In [12]:
crimes_dataset.loc[crimes_dataset["type"] == "Robo", "type"] = "Violent robbery"
crimes_dataset.loc[crimes_dataset["type"] == "Hurto", "type"] = "Non-violent theft"
crimes_dataset.loc[crimes_dataset["type"] == "Vialidad", "type"] = "Road-related"
crimes_dataset.loc[crimes_dataset["type"] == "Homicidios", "type"] = "Homicide"
crimes_dataset.loc[crimes_dataset["type"] == "Lesiones", "type"] = "Physical harm"
crimes_dataset.loc[crimes_dataset["type"] == "Amenazas", "type"] = "Threats"

In [13]:
print(crimes_dataset['type'].unique())

['Violent robbery' 'Non-violent theft' 'Road-related' 'Homicide'
 'Physical harm' 'Threats']


#### Crime sub-types
The exact same process is applied to the sub-types.

In [14]:
print(crimes_dataset['subtype'].unique())

['Robo total' 'Hurto total' 'Robo automotor' 'Hurto automotor'
 'Lesiones por siniestros viales' 'Homicidios dolosos' 'Femicidios'
 'Lesiones Dolosas' 'Amenazas' 'Muertes por siniestros viales']


In [15]:
crimes_dataset.loc[crimes_dataset["subtype"] == "Robo total", "subtype"] = "Total Robbery"
crimes_dataset.loc[crimes_dataset["subtype"] == "Hurto total", "subtype"] = "Total Non-violent theft"
crimes_dataset.loc[crimes_dataset["subtype"] == "Hurto automotor", "subtype"] = "Non-violent theft of a vehicle"
crimes_dataset.loc[crimes_dataset["subtype"] == "Robo automotor", "subtype"] = "Robbery of a vehicle"
crimes_dataset.loc[crimes_dataset["subtype"] == "Lesiones por siniestros viales", "subtype"] = "Injuries due to road accident"
crimes_dataset.loc[crimes_dataset["subtype"] == "Homicidios dolosos", "subtype"] = "Intentional Homicide"
crimes_dataset.loc[crimes_dataset["subtype"] == "Lesiones Dolosas", "subtype"] = "Intentional Harm"
crimes_dataset.loc[crimes_dataset["subtype"] == "Femicidios", "subtype"] = "Femicide"
crimes_dataset.loc[crimes_dataset["subtype"] == "Amenazas", "subtype"] = "Threats"
crimes_dataset.loc[crimes_dataset["subtype"] == "Muertes por siniestros viales", "subtype"] = "Death due to road accident"

In [16]:
print(crimes_dataset['subtype'].unique())

['Total Robbery' 'Total Non-violent theft' 'Robbery of a vehicle'
 'Non-violent theft of a vehicle' 'Injuries due to road accident'
 'Intentional Homicide' 'Femicide' 'Intentional Harm' 'Threats'
 'Death due to road accident']


### Data-type changes
#### Boolean indicating the presence or absence of gun usage in a crime.
Replacing the string values "NO" and "SI" (yes) with their boolean equivalents. The same mapping is applied to the column indicating motorbike usage.
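Because `Series.map` returns NaN for any value missing from the dictionary, a wrong source column or an unexpected spelling can silently produce an empty column. Below is a minimal sketch of a guarded conversion, using made-up data. The helper `to_bool` and the constant `YES_NO` are hypothetical names, not part of the notebook.

```python
import pandas as pd

YES_NO = {"NO": False, "SI": True}

def to_bool(series: pd.Series) -> pd.Series:
    """Map 'SI'/'NO' strings to booleans, failing loudly on anything else."""
    mapped = series.map(YES_NO)
    # map() yields NaN for values missing from the dict; only flag those
    # that were not already missing in the input.
    bad = mapped.isna() & series.notna()
    if bad.any():
        raise ValueError(f"Unexpected values: {series[bad].unique().tolist()}")
    return mapped

# Toy data (hypothetical) with the two yes/no columns.
df = pd.DataFrame({"used_gun": ["NO", "SI"], "used_motorbike": ["SI", "NO"]})
for column in ["used_gun", "used_motorbike"]:
    # Each column is converted from its own values.
    df[column] = to_bool(df[column])

print(df["used_gun"].tolist())        # [False, True]
print(df["used_motorbike"].tolist())  # [True, False]
```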

In [17]:
crimes_dataset['used_gun'] = crimes_dataset['used_gun'].map({'NO': False, 'SI': True})
crimes_dataset['used_motorbike'] = crimes_dataset['used_motorbike'].map({'NO': False, 'SI': True})

#### Deleting the redundant attribute "cantidad"
"Cantidad" means "quantity" or "count" in Spanish. Since each row describes a single instance of a crime, the attribute carries no information. Let's first verify that it is always 1 (one) for every row.

In [18]:
print(crimes_dataset['cantidad'].unique())

[1]


In [19]:
crimes_dataset.drop(columns=['cantidad'], inplace=True)

### Last look and export
Before exporting, let's take a look at the DataFrame in its final form.

In [None]:
crimes_dataset.head()

Unnamed: 0,id-map,year,month,day,date,at_around,type,subtype,used_gun,used_motorbike,neighborhood,comuna,lat,long
0,1,2022,October,Friday,2022-10-14,3.0,Violent robbery,Total Robbery,False,False,CHACARITA,15.0,-34.584136,-58.454704
1,2,2022,October,Thursday,2022-10-27,5.0,Violent robbery,Total Robbery,False,False,BARRACAS,4.0,-34.645043,-58.373194
2,3,2022,November,Tuesday,2022-11-29,0.0,Violent robbery,Total Robbery,False,False,CHACARITA,15.0,-34.589982,-58.446471
3,4,2022,November,Monday,2022-11-28,0.0,Violent robbery,Total Robbery,False,False,CHACARITA,15.0,-34.58832,-58.441232
4,5,2022,November,Wednesday,2022-11-30,3.0,Violent robbery,Total Robbery,False,False,RECOLETA,2.0,-34.596748,-58.413609


In [21]:
crimes_dataset.to_csv("crimes_2022.csv", encoding='utf-8', index=False)
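A quick way to confirm that an export like this survives a round trip is to read the CSV back and inspect it. The sketch below uses a made-up two-row DataFrame and an in-memory buffer instead of a file on disk. Both are assumptions for illustration.

```python
import io

import pandas as pd

# Toy data (hypothetical) mirroring a few of the final columns.
df = pd.DataFrame({
    "year": [2022, 2022],
    "month": ["October", "November"],
    "used_gun": [False, True],
})

# Write to an in-memory text buffer, without the index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Read it back: the 'True'/'False' text is parsed as booleans again.
restored = pd.read_csv(buffer)
print(list(restored.columns))        # ['year', 'month', 'used_gun']
print(restored["used_gun"].tolist()) # [False, True]
```

Passing `index=False` keeps the row index out of the file, so no extra unnamed column appears when the CSV is loaded again.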