# Matching Datasets PGJ Delitos y Zonas Patrullaje
We are using two separate datasets and have been performing ETL operations on them in order to be able to link a crime incident to Police Patrol Zones. We are doing this by measuring the distance from the reported crime to the center of the patrol zone. By doing this, we hope to get insights in the effectiveness of police patrolling, to see if we need more police cars, or whatever other crime statistics we can find.

(We have taken a list of distance from the center of the patrol zone to each point in the perimeter of the zone - Look at Jupyter Notebook ETL_ZonasPatrullaje_Transforming)

In [1]:
# We are importing the necessary libraries to start working
import pandas as pd
import numpy as np
from os import path
from ast import literal_eval
from vincenty import vincenty

In [2]:
# Setting filename and location
file_zonas = path.join("..","Clean","after_transform","transformed_ZonasPatrullaje.csv")
file_delitos_lite = path.join("..","Clean","after_transform","transformed_lite_CarpetasInvestigacion.csv")

# Read Purchasing File and store into Pandas data frame
patrullaje_df = pd.read_csv(file_zonas, sep=';')
delitos_df = pd.read_csv(file_delitos_lite, sep=';')

In [4]:
# Making sure that this is correct clean dataset
patrullaje_df.head()

Unnamed: 0,Geopoint,Geoshape,Alcaldía,Sector 18,Área km2,x,y,mean,standard_Deviation,distances
0,"19.4559485754, -99.1339187632","{""type"": ""Polygon"", ""coordinates"": [[[-99.1373...",CUAUHTEMOC,TLATELOLCO,0.599031,-99.133437,19.455969,0.527474,0.014291,"[0.534167, 0.437955, 0.262072, 0.536854, 0.648..."
1,"19.4489311584, -99.1492549723","{""type"": ""Polygon"", ""coordinates"": [[[-99.1529...",CUAUHTEMOC,BUENAVISTA,0.542691,-99.149153,19.448988,0.453007,0.006291,"[0.526167, 0.320717, 0.468643, 0.542789, 0.444..."
2,"19.4466167038, -99.1372059309","{""type"": ""Polygon"", ""coordinates"": [[[-99.1388...",CUAUHTEMOC,BUENAVISTA,0.139906,-99.136628,19.446343,0.309541,0.007696,"[0.403358, 0.375808, 0.340076, 0.304342, 0.299..."
3,"19.4345863188, -99.1559474685","{""type"": ""Polygon"", ""coordinates"": [[[-99.1587...",CUAUHTEMOC,REVOLUCION,0.263673,-99.156147,19.434677,0.362777,0.025262,"[0.443341, 0.16611, 0.507736, 0.496656, 0.2699..."
4,"19.4287224255, -99.1566989128","{""type"": ""Polygon"", ""coordinates"": [[[-99.1544...",CUAUHTEMOC,REVOLUCION,0.294612,-99.156832,19.428315,0.37827,0.007552,"[0.469645, 0.426766, 0.407346, 0.198526, 0.457..."


In [7]:
# Making sure that this is correct clean dataset
delitos_df.head()

Unnamed: 0,0,"19.35177196, -99.220583382"
0,1,"19.5085668561, -99.2083000919"
1,2,"19.4663394581, -99.1243321831"
2,3,"19.4628200485, -99.1163214933"
3,4,"19.4503050199, -99.2199589505"
4,5,"19.3819955868, -99.2307361552"


In [11]:
print(delitos_df.columns)
print(delitos_df.count())

Index(['0', '19.35177196, -99.220583382'], dtype='object')
0                             245159
19.35177196, -99.220583382    245159
dtype: int64


In [15]:
# Prepare the delitos_df dataframe
delitos_y = list()
delitos_x = list()
for delito in delitos_df['19.35177196, -99.220583382']:
    data = delito.split(',')
    delitos_y.append(data[0])
    delitos_x.append(data[1])

delitos_df = pd.DataFrame({'y':delitos_y, 'x':delitos_x})
delitos_df.head()

Unnamed: 0,y,x
0,19.5085668561,-99.2083000919
1,19.4663394581,-99.1243321831
2,19.4628200485,-99.1163214933
3,19.4503050199,-99.2199589505
4,19.3819955868,-99.2307361552


# Measuring the mean distance (with a 2 standard Deviation Margin of error)
To see if a crime was committed near a Patrol Zone we are calculating the distance between each row's x & y in delitos_df against each geopoint in patrullaje_df. To do this, we are using Vicenty's Formula for distance in an elipsis to measure the distance in KM.


In [48]:
counter = 0

#This list will contain all the crimes committed near each patrol zone
all_crimes_near = list()

# We are comparing each patrol zone (about 698) against each crime reported in 2018 (about 250K)
# Not very efficient, will optimize better in the future
for index, row in patrullaje_df.iterrows():
    
    # This is the Patrol Zone's info
    geopoint = row['Geopoint'].split(",")   
    x_origin = float(geopoint[1])
    y_origin = float(geopoint[0])
    mean = row['mean']
    standard_deviation = row['standard_Deviation']
    
    # This list will contain all the crimes committed near this partcular row's geopoint
    crimes_near = list()
    
    # Measuring the distance
    for index_2 in range(len(delitos_y)):
        distance = vincenty([x_origin, y_origin],[float(delitos_x[index_2]), float(delitos_y[index_2])])
        
        #if(index_2 < 20):
            #print(f"{counter}) distance: {distance}, mean: {mean}, sd: {standard_deviation}")
        
        # I only care for crimes committed near the patrol zone
        if(distance <= ( mean + (2*standard_deviation) ) ):
            crimes_near.append(index_2)
    
    #We are now finished with this particular row's geopoint
    all_crimes_near.append(crimes_near)
    print(f"Finished with Patrol Zone #{counter} -> crimes: {len(crimes_near)}")
    counter +=1

    
print("Finished")

Finished with Patrol Zone #0 -> crimes: 5814
Finished with Patrol Zone #1 -> crimes: 5769
Finished with Patrol Zone #2 -> crimes: 2333
Finished with Patrol Zone #3 -> crimes: 4370
Finished with Patrol Zone #4 -> crimes: 3420
Finished with Patrol Zone #5 -> crimes: 4859
Finished with Patrol Zone #6 -> crimes: 5421
Finished with Patrol Zone #7 -> crimes: 1761
Finished with Patrol Zone #8 -> crimes: 1933
Finished with Patrol Zone #9 -> crimes: 2355
Finished with Patrol Zone #10 -> crimes: 2733
Finished with Patrol Zone #11 -> crimes: 2697
Finished with Patrol Zone #12 -> crimes: 1141
Finished with Patrol Zone #13 -> crimes: 2136
Finished with Patrol Zone #14 -> crimes: 1310
Finished with Patrol Zone #15 -> crimes: 963
Finished with Patrol Zone #16 -> crimes: 1202
Finished with Patrol Zone #17 -> crimes: 2199
Finished with Patrol Zone #18 -> crimes: 1530
Finished with Patrol Zone #19 -> crimes: 1663
Finished with Patrol Zone #20 -> crimes: 1357
Finished with Patrol Zone #21 -> crimes: 1650

Finished with Patrol Zone #177 -> crimes: 35723
Finished with Patrol Zone #178 -> crimes: 21691
Finished with Patrol Zone #179 -> crimes: 7521
Finished with Patrol Zone #180 -> crimes: 2858
Finished with Patrol Zone #181 -> crimes: 5528
Finished with Patrol Zone #182 -> crimes: 1947
Finished with Patrol Zone #183 -> crimes: 4659
Finished with Patrol Zone #184 -> crimes: 4970
Finished with Patrol Zone #185 -> crimes: 6749
Finished with Patrol Zone #186 -> crimes: 2598
Finished with Patrol Zone #187 -> crimes: 4415
Finished with Patrol Zone #188 -> crimes: 1506
Finished with Patrol Zone #189 -> crimes: 1192
Finished with Patrol Zone #190 -> crimes: 2680
Finished with Patrol Zone #191 -> crimes: 1186
Finished with Patrol Zone #192 -> crimes: 3494
Finished with Patrol Zone #193 -> crimes: 5954
Finished with Patrol Zone #194 -> crimes: 3240
Finished with Patrol Zone #195 -> crimes: 5575
Finished with Patrol Zone #196 -> crimes: 520
Finished with Patrol Zone #197 -> crimes: 10326
Finished wi

Finished with Patrol Zone #352 -> crimes: 3669
Finished with Patrol Zone #353 -> crimes: 2225
Finished with Patrol Zone #354 -> crimes: 2148
Finished with Patrol Zone #355 -> crimes: 885
Finished with Patrol Zone #356 -> crimes: 1258
Finished with Patrol Zone #357 -> crimes: 8481
Finished with Patrol Zone #358 -> crimes: 925
Finished with Patrol Zone #359 -> crimes: 388
Finished with Patrol Zone #360 -> crimes: 2474
Finished with Patrol Zone #361 -> crimes: 1601
Finished with Patrol Zone #362 -> crimes: 1707
Finished with Patrol Zone #363 -> crimes: 2208
Finished with Patrol Zone #364 -> crimes: 2357
Finished with Patrol Zone #365 -> crimes: 3228
Finished with Patrol Zone #366 -> crimes: 1763
Finished with Patrol Zone #367 -> crimes: 7827
Finished with Patrol Zone #368 -> crimes: 2533
Finished with Patrol Zone #369 -> crimes: 1561
Finished with Patrol Zone #370 -> crimes: 1695
Finished with Patrol Zone #371 -> crimes: 1532
Finished with Patrol Zone #372 -> crimes: 1854
Finished with Pa

Finished with Patrol Zone #527 -> crimes: 16049
Finished with Patrol Zone #528 -> crimes: 4493
Finished with Patrol Zone #529 -> crimes: 2024
Finished with Patrol Zone #530 -> crimes: 5000
Finished with Patrol Zone #531 -> crimes: 3097
Finished with Patrol Zone #532 -> crimes: 3618
Finished with Patrol Zone #533 -> crimes: 8987
Finished with Patrol Zone #534 -> crimes: 2133
Finished with Patrol Zone #535 -> crimes: 964
Finished with Patrol Zone #536 -> crimes: 496
Finished with Patrol Zone #537 -> crimes: 396
Finished with Patrol Zone #538 -> crimes: 2150
Finished with Patrol Zone #539 -> crimes: 196
Finished with Patrol Zone #540 -> crimes: 22989
Finished with Patrol Zone #541 -> crimes: 10143
Finished with Patrol Zone #542 -> crimes: 514
Finished with Patrol Zone #543 -> crimes: 1443
Finished with Patrol Zone #544 -> crimes: 1634
Finished with Patrol Zone #545 -> crimes: 4384
Finished with Patrol Zone #546 -> crimes: 1778
Finished with Patrol Zone #547 -> crimes: 982
Finished with Pa

In [49]:
print(len(all_crimes_near))
print(all_crimes_near[0])

698
[27, 68, 205, 285, 340, 423, 447, 448, 494, 633, 691, 737, 739, 784, 806, 835, 864, 933, 1014, 1022, 1149, 1193, 1212, 1267, 1295, 1301, 1306, 1403, 1462, 1580, 1617, 1765, 1850, 1889, 1921, 1972, 1976, 2027, 2184, 2208, 2226, 2246, 2322, 2399, 2405, 2422, 2495, 2580, 2606, 2611, 2655, 2665, 2675, 2730, 2884, 2897, 2928, 2931, 2960, 3006, 3053, 3115, 3128, 3131, 3142, 3206, 3246, 3451, 3457, 3468, 3491, 3536, 3570, 3645, 3711, 3724, 3749, 3775, 3797, 3854, 3856, 3876, 3969, 4003, 4024, 4161, 4183, 4192, 4203, 4210, 4216, 4231, 4235, 4243, 4270, 4288, 4299, 4338, 4400, 4410, 4450, 4455, 4576, 4648, 4649, 4685, 4750, 4775, 4787, 4801, 4803, 4831, 4836, 4850, 4856, 4877, 4910, 4934, 4945, 4971, 4985, 5100, 5148, 5165, 5188, 5244, 5272, 5313, 5348, 5424, 5436, 5474, 5486, 5499, 5513, 5522, 5707, 5728, 5782, 5844, 5858, 5894, 5930, 6035, 6056, 6141, 6161, 6171, 6178, 6181, 6224, 6550, 6569, 6583, 6587, 6648, 6782, 6811, 6821, 6826, 7025, 7033, 7167, 7229, 7257, 7278, 7280, 7407, 7457, 7

In [50]:
# Creating a CSV with all the crimes reported per Patrol Zone for backup
backup_all_crimes_near = pd.DataFrame({'patrol_zone_index': range(len(all_crimes_near)), 'crimes_near':all_crimes_near})
backup_all_crimes_near.head()

Unnamed: 0,patrol_zone_index,crimes_near
0,0,"[27, 68, 205, 285, 340, 423, 447, 448, 494, 63..."
1,1,"[8, 23, 49, 71, 184, 201, 206, 249, 269, 314, ..."
2,2,"[96, 458, 633, 691, 854, 864, 925, 933, 950, 1..."
3,3,"[75, 139, 158, 170, 273, 313, 358, 425, 435, 4..."
4,4,"[10, 75, 139, 158, 170, 273, 313, 358, 425, 43..."


In [51]:
#Exporting clean dataset version 1 of Zonas de Patrullaje
fileExport = path.join("..","Clean","after_transform","transformed_crimes_near_patrol_Zones.csv")
backup_all_crimes_near.to_csv(fileExport, sep=';', index=False)

In [52]:
#Checking the exported file
exportFile_df = pd.read_csv(fileExport, sep=';')

exportFile_df.head()

Unnamed: 0,patrol_zone_index,crimes_near
0,0,"[27, 68, 205, 285, 340, 423, 447, 448, 494, 63..."
1,1,"[8, 23, 49, 71, 184, 201, 206, 249, 269, 314, ..."
2,2,"[96, 458, 633, 691, 854, 864, 925, 933, 950, 1..."
3,3,"[75, 139, 158, 170, 273, 313, 358, 425, 435, 4..."
4,4,"[10, 75, 139, 158, 170, 273, 313, 358, 425, 43..."


In [53]:
# Separating each row's crimes list (from all_crimes_near) to have a single pair of Crime_index & patrol_zone_index per row
# This is the data that is going to populate the mysql database table called "delitos_zonas_patrullaje"
crime_pair = list()
patrol_pair = list()

for index in range(len(all_crimes_near)):
    for crime_index in all_crimes_near[index]:
        crime_pair.append(crime_index)
        patrol_pair.append(index)
    print(f"Finished separating Patrol Zone {index}'s crimes'")
print(len(crime_pair))

Finished separating Patrol Zone 0's crimes'
Finished separating Patrol Zone 1's crimes'
Finished separating Patrol Zone 2's crimes'
Finished separating Patrol Zone 3's crimes'
Finished separating Patrol Zone 4's crimes'
Finished separating Patrol Zone 5's crimes'
Finished separating Patrol Zone 6's crimes'
Finished separating Patrol Zone 7's crimes'
Finished separating Patrol Zone 8's crimes'
Finished separating Patrol Zone 9's crimes'
Finished separating Patrol Zone 10's crimes'
Finished separating Patrol Zone 11's crimes'
Finished separating Patrol Zone 12's crimes'
Finished separating Patrol Zone 13's crimes'
Finished separating Patrol Zone 14's crimes'
Finished separating Patrol Zone 15's crimes'
Finished separating Patrol Zone 16's crimes'
Finished separating Patrol Zone 17's crimes'
Finished separating Patrol Zone 18's crimes'
Finished separating Patrol Zone 19's crimes'
Finished separating Patrol Zone 20's crimes'
Finished separating Patrol Zone 21's crimes'
Finished separating 

Finished separating Patrol Zone 268's crimes'
Finished separating Patrol Zone 269's crimes'
Finished separating Patrol Zone 270's crimes'
Finished separating Patrol Zone 271's crimes'
Finished separating Patrol Zone 272's crimes'
Finished separating Patrol Zone 273's crimes'
Finished separating Patrol Zone 274's crimes'
Finished separating Patrol Zone 275's crimes'
Finished separating Patrol Zone 276's crimes'
Finished separating Patrol Zone 277's crimes'
Finished separating Patrol Zone 278's crimes'
Finished separating Patrol Zone 279's crimes'
Finished separating Patrol Zone 280's crimes'
Finished separating Patrol Zone 281's crimes'
Finished separating Patrol Zone 282's crimes'
Finished separating Patrol Zone 283's crimes'
Finished separating Patrol Zone 284's crimes'
Finished separating Patrol Zone 285's crimes'
Finished separating Patrol Zone 286's crimes'
Finished separating Patrol Zone 287's crimes'
Finished separating Patrol Zone 288's crimes'
Finished separating Patrol Zone 28

Finished separating Patrol Zone 459's crimes'
Finished separating Patrol Zone 460's crimes'
Finished separating Patrol Zone 461's crimes'
Finished separating Patrol Zone 462's crimes'
Finished separating Patrol Zone 463's crimes'
Finished separating Patrol Zone 464's crimes'
Finished separating Patrol Zone 465's crimes'
Finished separating Patrol Zone 466's crimes'
Finished separating Patrol Zone 467's crimes'
Finished separating Patrol Zone 468's crimes'
Finished separating Patrol Zone 469's crimes'
Finished separating Patrol Zone 470's crimes'
Finished separating Patrol Zone 471's crimes'
Finished separating Patrol Zone 472's crimes'
Finished separating Patrol Zone 473's crimes'
Finished separating Patrol Zone 474's crimes'
Finished separating Patrol Zone 475's crimes'
Finished separating Patrol Zone 476's crimes'
Finished separating Patrol Zone 477's crimes'
Finished separating Patrol Zone 478's crimes'
Finished separating Patrol Zone 479's crimes'
Finished separating Patrol Zone 48

Finished separating Patrol Zone 673's crimes'
Finished separating Patrol Zone 674's crimes'
Finished separating Patrol Zone 675's crimes'
Finished separating Patrol Zone 676's crimes'
Finished separating Patrol Zone 677's crimes'
Finished separating Patrol Zone 678's crimes'
Finished separating Patrol Zone 679's crimes'
Finished separating Patrol Zone 680's crimes'
Finished separating Patrol Zone 681's crimes'
Finished separating Patrol Zone 682's crimes'
Finished separating Patrol Zone 683's crimes'
Finished separating Patrol Zone 684's crimes'
Finished separating Patrol Zone 685's crimes'
Finished separating Patrol Zone 686's crimes'
Finished separating Patrol Zone 687's crimes'
Finished separating Patrol Zone 688's crimes'
Finished separating Patrol Zone 689's crimes'
Finished separating Patrol Zone 690's crimes'
Finished separating Patrol Zone 691's crimes'
Finished separating Patrol Zone 692's crimes'
Finished separating Patrol Zone 693's crimes'
Finished separating Patrol Zone 69

In [54]:
#Creating Dataframe of crime_patrol_pair
crime_patrol_pair_df = pd.DataFrame({'crime_index': crime_pair, 'patrol_index':patrol_pair})
crime_patrol_pair_df.head()

Unnamed: 0,crime_index,patrol_index
0,27,0
1,68,0
2,205,0
3,285,0
4,340,0


In [56]:
#Exporting final dataset of crimes linked to patrol zones
fileExport = path.join("..","Clean","final_version","final_crime_patrol_pair.csv")
crime_patrol_pair_df.to_csv(fileExport, sep=';', index=False)

#Exporting cinal dataset of Zonas de Patrullaje
fileExport = path.join("..","Clean","final_version","final_zonas_patrullaje_df.csv")
patrullaje_df.to_csv(fileExport, sep=';', index=False)

In [58]:
#Checking the exported file
exportFile_df = pd.read_csv(path.join("..","Clean","final_version","final_database.csv"), sep=';')

exportFile_df.head()

Unnamed: 0,Año,Mes,Alcaldía,Categoría de delito,Delito,Unidad de investigación,Fecha inicio,Colonia,Geopoint
0,2018,Octubre,ALVARO OBREGON,DELITO DE BAJO IMPACTO,ABUSO DE CONFIANZA,UI-3CD,2018-10-17 13:00:24,LAS AGUILAS 3ER PARQUE,"19.35177196, -99.220583382"
1,2018,Octubre,AZCAPOTZALCO,ROBO DE VEHÍCULO CON Y SIN VIOLENCIA,ROBO DE VEHICULO DE SERVICIO PÚBLICO CON VIOLE...,UI-3SD,2018-10-17 13:03:39,EL ROSARIO,"19.5085668561, -99.2083000919"
2,2018,Octubre,GUSTAVO A MADERO,DELITO DE BAJO IMPACTO,ROBO DE OBJETOS,UI-3CD,2018-10-17 13:05:09,GUADALUPE TEPEYAC,"19.4663394581, -99.1243321831"
3,2018,Octubre,GUSTAVO A MADERO,DELITO DE BAJO IMPACTO,ABUSO DE CONFIANZA,UI-3SD,2018-10-17 13:09:04,EMILIANO ZAPATA,"19.4628200485, -99.1163214933"
4,2018,Octubre,MIGUEL HIDALGO,ROBO A REPARTIDOR CON Y SIN VIOLENCIA,ROBO A REPARTIDOR SIN VIOLENCIA,UI-1SD,2018-10-17 13:09:22,PERIODISTA,"19.4503050199, -99.2199589505"
