# Crime data
#### Description
The data are provided by the German Federal Criminal Police Office (Bundeskriminalamt, BKA). 
They are taken from Table 50, covering "Kreise und kreisfreie Städte" (districts and independent cities, referred to as regions below), 
of the annual BKA publications for the years 2014 to 2018. The 2013 data set was not available. The data 
were retrieved on 2019-08-07 from the BKA website: https://www.bka.de/DE/AktuelleInformationen/StatistikenLagebilder/PolizeilicheKriminalstatistik/pks_node.html. 
- 2018: https://www.bka.de/SharedDocs/Downloads/DE/Publikationen/PolizeilicheKriminalstatistik/2018/BKATabellen/TatverdaechtigeLaenderKreiseStaedte/BKA-LKS-TV-11-T50-Kreise-TV-nichtdeutsch_excel.xlsx?__blob=publicationFile&v=3
- 2017: https://www.bka.de/SharedDocs/Downloads/DE/Publikationen/PolizeilicheKriminalstatistik/2017/BKATabellen/TatverdaechtigeLaenderKreiseStaedte/BKA-LKS-TV-11-T50-Kreise-TV-nichtdeutsch_excel.xlsx?__blob=publicationFile&v=3
- 2016: https://www.bka.de/SharedDocs/Downloads/DE/Publikationen/PolizeilicheKriminalstatistik/2016/BKATabellen/TatverdaechtigeLaenderKreiseStaedte/BKA-LKS-TV-11-T50-Kreise-TV-nichtdeutsch_excel.xlsx?__blob=publicationFile&v=3
- 2015: https://www.bka.de/SharedDocs/Downloads/DE/Publikationen/PolizeilicheKriminalstatistik/2015/BKATabellen/TatverdaechtigeLaenderKreiseStaedte/tb50_TatverdaechtigeNichtdeutschAlterKreise_excel.xlsx?__blob=publicationFile&v=3
- 2014: https://www.bka.de/SharedDocs/Downloads/DE/Publikationen/PolizeilicheKriminalstatistik/2014/BKATabellen/TatverdaechtigeLaenderKreiseStaedte/tb50_TatverdaechtigeNichtdeutschAlterLaender_excel.xlsx?__blob=publicationFile&v=2

In [1]:
%matplotlib notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import re
import camelot

In [2]:
# header=8: the column names sit in the ninth row, everything above it is ignored
bka14 = pd.read_excel('2014tb50_TatverdaechtigeNichtdeutschAlterKreise_excel.xlsx', header=8)
bka14 = bka14[bka14.iloc[:,4]=='X']
bka14 = bka14.iloc[:,[0,1,2,3,5]]
bka14.columns = ['crime_key', 'crime', 'region_key', 'region', 'NSuspects14']
bka14.head()

Unnamed: 0,crime_key,crime,region_key,region,NSuspects14
2,------,Straftaten insgesamt,1001,Flensburg,1256
5,------,Straftaten insgesamt,1002,Kiel,1480
8,------,Straftaten insgesamt,1003,Lübeck,1718
11,------,Straftaten insgesamt,1004,Neumünster,4570
14,------,Straftaten insgesamt,1051,Dithmarschen,263


In [3]:
def osterodeGoettingen(bka, target):
    '''
    2014, 2015 and 2016: the two regions Göttingen and Osterode am Harz need to be merged,
    because from 2017 onwards they are reported as a single region, Göttingen.
    '''
    bka = bka.copy()  # work on a copy so the caller's frame is not modified
    osterode = bka[bka.region=='Osterode am Harz']
    for key in osterode.crime_key:
        oster_val = osterode.loc[osterode.crime_key==key,target].values[0]
        goett_val = bka.loc[(bka.crime_key==key)&(bka.region=='Göttingen'),target].values[0]
        # add the Osterode count to the Göttingen row of the same crime
        bka.loc[(bka.crime_key==key)&(bka.region=='Göttingen'),target] = oster_val+goett_val
    # explicit copy avoids the SettingWithCopyWarning on the assignment below
    bka = bka[bka.region!='Osterode am Harz'].copy()
    bka.loc[bka.region=='Göttingen','region_key'] = 3159
    return bka
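
The same region merge can also be written without an explicit loop. The following is a minimal sketch on toy data, not the notebook's method: the helper name `merge_regions`, the counts and the two old region keys are placeholders, and only the new key 3159 is taken from the function above.

```python
import pandas as pd

# Toy data: two crime keys, three regions (counts and old keys are placeholders)
toy = pd.DataFrame({
    'crime_key': ['A', 'A', 'A', 'B', 'B', 'B'],
    'region': ['Göttingen', 'Osterode am Harz', 'Kiel'] * 2,
    'region_key': [9001, 9002, 1002] * 2,
    'NSuspects14': [100, 20, 50, 10, 2, 5],
})

def merge_regions(df, old, new, new_key, target):
    """Hypothetical helper: fold region `old` into `new` and sum `target`."""
    out = df.copy()
    out.loc[out.region == old, 'region'] = new          # relabel the old region
    out.loc[out.region == new, 'region_key'] = new_key  # one key for the united region
    # rows that now share crime and region collapse into one summed row
    return out.groupby(['crime_key', 'region_key', 'region'], as_index=False)[target].sum()

merged = merge_regions(toy, 'Osterode am Harz', 'Göttingen', 3159, 'NSuspects14')
print(merged)
```

Unlike the loop, this version does not fail when a crime key is missing for one of the two regions, because the grouping simply sums whatever rows exist.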

In [4]:
bka14 = osterodeGoettingen(bka14, 'NSuspects14')



In [5]:
bka15 = pd.read_excel('2015tb50_TatverdaechtigeNichtdeutschAlterKreise_excel.xlsx', header=8)
bka15 = bka15[bka15.iloc[:,5]=='X']
bka15 = bka15.iloc[:,[0,1,2,3,6]]
bka15.columns = ['crime_key', 'crime', 'region_key', 'region', 'NSuspects15']
bka15 = osterodeGoettingen(bka15, 'NSuspects15')

In [6]:
bka16 = pd.read_excel('2016tb50_TatverdaechtigeNichtdeutschAlterKreise_excel.xlsx', header=8)
bka16 = bka16[bka16.iloc[:,5]=='X']
bka16 = bka16.iloc[:,[0,1,2,3,6]]
bka16.columns = ['crime_key', 'crime', 'region_key', 'region', 'NSuspects16']
bka16 = osterodeGoettingen(bka16, 'NSuspects16')
bka16.head()
bka16.shape

(16441, 5)

In [7]:
bka17 = pd.read_excel('2017tb50_TatverdaechtigeNichtdeutschAlterKreise_excel.xlsx', 
                      header=8, usecols = [0,1,2,3,4,5,6])
bka17 = bka17[bka17.iloc[:,5]=='X']
bka17 = bka17.iloc[:,[0,1,2,3,6]]
bka17.columns = ['crime_key', 'crime', 'region_key', 'region', 'NSuspects17']

bka17.head()

Unnamed: 0,crime_key,crime,region_key,region,NSuspects17
802,------,Straftaten insgesamt,1001,Flensburg,1182
803,------,Straftaten insgesamt,1002,Kiel,2194
804,------,Straftaten insgesamt,1003,Lübeck,1958
805,------,Straftaten insgesamt,1004,Neumünster,2496
806,------,Straftaten insgesamt,1051,Dithmarschen,445


In [8]:
bka18 = pd.read_excel('2018tb50_TatverdaechtigeNichtdeutschAlterKreise_excel.xlsx', 
                      header=8, usecols = [0,1,2,3,4,5,6])
bka18 = bka18[bka18.iloc[:,5]=='X']
bka18 = bka18.iloc[:,[0,1,2,3,6]]
bka18.columns = ['crime_key', 'crime', 'region_key', 'region', 'NSuspects18']

bka18.head()

Unnamed: 0,crime_key,crime,region_key,region,NSuspects18
802,------,Straftaten insgesamt,1001,Flensburg,1178
803,------,Straftaten insgesamt,1002,Kiel,2251
804,------,Straftaten insgesamt,1003,Lübeck,2068
805,------,Straftaten insgesamt,1004,Neumünster,2465
806,------,Straftaten insgesamt,1051,Dithmarschen,577


In [20]:
bka = bka14.merge(bka15
                  ,left_on=['crime_key', 'crime', 'region_key', 'region'], 
                  right_on=['crime_key', 'crime', 'region_key', 'region'],
                 how='left', indicator=True)
# the 2015 table has additional rows for crimes excluding offences against asylum laws

In [21]:
# eliminating indicator columns for new merger
bka = bka.iloc[:,[0,1,2,3,4,5]]

In [22]:
bka = bka.merge(bka16
                  ,left_on=['crime_key', 'crime', 'region_key', 'region'], 
                  right_on=['crime_key', 'crime', 'region_key', 'region'],
                 how='outer')#, indicator=True)

In [23]:
# eliminating indicator columns for new merger
bka = bka.iloc[:,[0,1,2,3,4,5,6]]

In [24]:
bka = bka.merge(bka17
                  ,left_on=['crime_key', 'crime', 'region_key', 'region'], 
                  right_on=['crime_key', 'crime', 'region_key', 'region'],
                 how='outer')#, indicator=True)

In [25]:
bka.head()
print(bka14.shape)
print(bka15.shape)
print(bka16.shape)
print(bka17.shape)
print(bka18.shape)
bka.shape

(15639, 5)
(16040, 5)
(16441, 5)
(16441, 5)
(16842, 5)


(20050, 8)

In [26]:
bka = bka.merge(bka18
                  ,left_on=['crime_key', 'crime', 'region_key', 'region'], 
                  right_on=['crime_key', 'crime', 'region_key', 'region'],
                 how='outer')#, indicator=True)
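
The chain of merges above can be condensed with `functools.reduce`. This is a sketch on made-up stand-in tables, assuming every yearly frame shares the four key columns; the helper `toy_year` and the crime key `'890000'` are hypothetical.

```python
from functools import reduce
import pandas as pd

keys = ['crime_key', 'crime', 'region_key', 'region']

def toy_year(col, rows):
    """Build a tiny stand-in for one yearly table (values are made up)."""
    return pd.DataFrame(rows, columns=keys + [col])

y14 = toy_year('NSuspects14', [('------', 'total', 1001, 'Flensburg', 10)])
y15 = toy_year('NSuspects15', [('------', 'total', 1001, 'Flensburg', 12),
                               ('890000', 'without_asylum', 1001, 'Flensburg', 9)])
y16 = toy_year('NSuspects16', [('------', 'total', 1001, 'Flensburg', 11)])

# Outer merges keep rows that exist in only some years; the gaps become NaN.
combined = reduce(lambda left, right: left.merge(right, on=keys, how='outer'),
                  [y14, y15, y16])
print(combined)
```

Note that the first merge in the notebook uses `how='left'`, which keeps only rows already present in 2014, whereas this sketch uses `how='outer'` throughout.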

In [29]:
# write an Excel file for translation and relabelling. Due to changes in criminal law the German 
# labels have changed over time, e.g., some paragraphs extended the definition of 
# criminal law covering sexual violence.
(bka.groupby('crime').agg(np.sum)).to_excel('labels_ueber_zeit.xlsx')

In [31]:
# read translations and regrouped labels back in
labels_new = pd.read_excel('labels_translation.xlsx')
labels_new.head()

Unnamed: 0,crime,description,english_label
0,Beförderungserschleichung,carrier approximation,approximation
1,"Begünstigung, Strafvereitelung (ohne Strafvere...","Advantage, prevention of punishment (without p...",advantage
2,"Betrug §§ 263, 263a, 264, 264a, 265, 265a, 265...","Fraud §§ 263, 263a, 264, 264a, 265, 265a, 265b...",fraud
3,"Betrug §§ 263, 263a, 264, 264a, 265, 265a-e StGB","Fraud §§ 263, 263a, 264, 264a, 265, 265a-e StG...",fraud
4,Brandstiftung und Herbeiführen einer Brandgefa...,"Arson and a fire hazard §§ 306-306d, 306f StGB...",arson


In [34]:
# mapping them into the dataframe
bka['crime_eng']=bka['crime'].map(dict(zip(labels_new.crime, labels_new.english_label)))
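
A quick check for labels that the translation file does not cover can be useful at this point. The sketch below uses made-up frames (`labels`, `data`) in place of `labels_new` and `bka`; the label `'Unbekannt'` is a placeholder for an untranslated entry.

```python
import pandas as pd

# Stand-ins for the translation table and the merged data (toy values)
labels = pd.DataFrame({
    'crime': ['Straftaten insgesamt', 'Brandstiftung'],
    'english_label': ['total', 'arson'],
})
data = pd.DataFrame({'crime': ['Straftaten insgesamt', 'Brandstiftung', 'Unbekannt']})

lookup = dict(zip(labels.crime, labels.english_label))
data['crime_eng'] = data['crime'].map(lookup)

# Labels missing from the lookup come back as NaN
unmapped = data.loc[data['crime_eng'].isna(), 'crime'].unique()
print(unmapped)
```

Rows with a NaN label are dropped silently by a later `groupby` on `crime_eng`, since pandas excludes NaN group keys by default, so an empty `unmapped` array is worth confirming before aggregating.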

In [57]:
bka = bka.groupby(['region', 'crime_eng']).agg(np.sum)
bka.reset_index(inplace=True)
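# preview of getting the regional key back: the merge result is not assigned,
# so bka itself is unchanged here; the keys are recovered further below
bka.merge(bka14.iloc[:,[2,3]], left_on='region',right_on='region')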


One of the most important series, foreign suspects excluding offences against immigration and asylum law, 
is reported by the BKA only for 2016, 2017 and 2018. For 2014 and 2015 it is computed here as the total minus the asylum-related suspects.
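
The subtraction can be sketched on toy data as follows. This variant aligns the values by region name instead of by row position; the counts are made up, and only the labels `total`, `asylum_free_movement` and `without_asylum` are taken from the notebook.

```python
import pandas as pd

nan = float('nan')
toy = pd.DataFrame({
    'region': ['Aachen', 'Aachen', 'Aachen', 'Kiel', 'Kiel', 'Kiel'],
    'crime_eng': ['total', 'asylum_free_movement', 'without_asylum'] * 2,
    'NSuspects14': [100.0, 30.0, nan, 50.0, 5.0, nan],
})

# One row per region, one column per category
wide = toy.pivot(index='region', columns='crime_eng', values='NSuspects14')
diff = wide['total'] - wide['asylum_free_movement']

# Write the difference into the 'without_asylum' rows, matched by region name
mask = toy['crime_eng'] == 'without_asylum'
toy.loc[mask, 'NSuspects14'] = toy.loc[mask, 'region'].map(diff)
print(toy[mask])
```

Matching by region name stays correct even if a region is missing one of the categories or the rows are sorted differently.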

In [87]:
aux_total1415 = bka.loc[bka['crime_eng']=='total',['NSuspects14','NSuspects15']]
aux_asylum1415 = bka.loc[bka['crime_eng']=='asylum_free_movement',['NSuspects14','NSuspects15']]
# positional subtraction: relies on both selections listing the regions in the same order
aux_total_minus_asylum1415 = aux_total1415.values - aux_asylum1415.values
bka.loc[bka['crime_eng']=='without_asylum',['NSuspects14','NSuspects15']] = aux_total_minus_asylum1415


In [93]:
# computing change rates relative to the 2014 baseline
bka['cr_chg_15'] = (bka.NSuspects15-bka.NSuspects14)/bka.NSuspects14
bka['cr_chg_16'] = (bka.NSuspects16-bka.NSuspects14)/bka.NSuspects14
bka['cr_chg_17'] = (bka.NSuspects17-bka.NSuspects14)/bka.NSuspects14
bka['cr_chg_18'] = (bka.NSuspects18-bka.NSuspects14)/bka.NSuspects14
# drop categories whose label contains 'cyb' or '_attack'; ~ negates the boolean mask
bka = bka[~bka.crime_eng.str.contains('cyb|_attack')]
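
The rates in this cell all use 2014 as the denominator. The sketch below contrasts that with a change relative to the preceding year, using three counts from the Aachen `advantage` row of the output; the frame `counts` and both result names are illustrative.

```python
import pandas as pd

counts = pd.DataFrame(
    {'NSuspects14': [87.0], 'NSuspects15': [90.0], 'NSuspects16': [99.0]},
    index=['advantage'],
)

# Change relative to the 2014 baseline (what the cell above computes)
base = counts['NSuspects14']
vs_2014 = counts.sub(base, axis=0).div(base, axis=0)

# Change relative to the preceding year: divide by the column to the left
year_on_year = counts.div(counts.shift(axis=1)) - 1

print(vs_2014.round(4))
print(year_on_year.round(4))
```

The two definitions agree for 2015 and diverge afterwards, so it matters which one a downstream reader assumes.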


In [101]:
#bka[['crime_eng','cr_chg_15', 'cr_chg_16', 'cr_chg_17', 'cr_chg_18']].groupby('crime_eng').median()
bka.head()

Unnamed: 0,region,crime_eng,NSuspects14,NSuspects15,NSuspects16,NSuspects17,NSuspects18,cr_chg_15,cr_chg_16,cr_chg_17,cr_chg_18
0,Aachen,advantage,87.0,90.0,99.0,98.0,92.0,0.034483,0.137931,0.126437,0.057471
1,Aachen,approximation,443.0,561.0,531.0,648.0,514.0,0.266366,0.198646,0.462754,0.160271
2,Aachen,arson,12.0,11.0,15.0,9.0,22.0,-0.083333,0.25,-0.25,0.833333
3,Aachen,asylum_free_movement,2993.0,3348.0,1824.0,2287.0,2199.0,0.11861,-0.390578,-0.235884,-0.265286
4,Aachen,computer,43.0,43.0,59.0,46.0,36.0,0.0,0.372093,0.069767,-0.162791


In [132]:
# recovering regional keys
bka14_aux = bka14[bka14['crime']=='Straftaten insgesamt']
bka14_aux = bka14_aux.loc[:,['region_key','region']]
bka14_aux.head()

Unnamed: 0,region_key,region
2,1001,Flensburg
5,1002,Kiel
8,1003,Lübeck
11,1004,Neumünster
14,1051,Dithmarschen


### Preparing dataframe for export
- reshaping the dataframe so that each row is one region
- maximum number of suspects per crime across the years 2014 to 2018
- maximum change in total suspects (excluding asylum offences) relative to 2014
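
The reshaping step can be sketched with a pivot on toy data. This is an alternative to merging one crime at a time, shown under the assumption that each region and crime pair occurs once; the counts and the helper column `peak` are made up.

```python
import pandas as pd

years = ['NSuspects14', 'NSuspects15']
toy = pd.DataFrame({
    'region': ['Aachen', 'Aachen', 'Kiel', 'Kiel'],
    'crime_eng': ['arson', 'total', 'arson', 'total'],
    'NSuspects14': [12.0, 100.0, 3.0, 50.0],
    'NSuspects15': [11.0, 120.0, 4.0, 45.0],
})

# Maximum over the yearly counts for every region and crime pair
toy['peak'] = toy[years].max(axis=1)

# One row per region, one column per crime
wide = toy.pivot(index='region', columns='crime_eng', values='peak').reset_index()
wide.columns.name = None
print(wide)
```

`pivot` raises an error on duplicate region and crime pairs, which doubles as a sanity check on the aggregated frame.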


In [155]:
# explicit copy so that adding a column does not trigger a SettingWithCopyWarning
bka_new = bka[bka.crime_eng=='without_asylum'].copy()
bka_new['max_perc_chg']=bka_new[['cr_chg_15','cr_chg_16','cr_chg_17','cr_chg_18']].max(axis=1)
bka_new = bka_new[['region','crime_eng','max_perc_chg']]
bka_new = bka_new.merge(bka14_aux, on='region')
bka_new = bka_new[['region_key','region','max_perc_chg']]
bka_new.head()


Unnamed: 0,region_key,region,max_perc_chg
0,5334,Aachen,0.1904
1,7131,Ahrweiler,0.062016
2,9771,Aichach-Friedberg,0.284689
3,8425,Alb-Donau-Kreis,0.146371
4,16077,Altenburger Land,1.041667


In [163]:
for crime in bka.crime_eng.unique():
    bka_aux = bka[bka.crime_eng==crime].copy()
    # maximum over the yearly counts only; the text column 'region' must not be part of max()
    bka_aux[crime]=bka_aux[['NSuspects14','NSuspects15','NSuspects16','NSuspects17','NSuspects18']].max(axis=1)
    bka_aux = bka_aux.loc[:,['region',crime]]
    bka_new = bka_new.merge(bka_aux, on='region')
bka_new.head()


Unnamed: 0,region_key,region,max_perc_chg,advantage,approximation,arson,asylum_free_movement,computer,damage_property,daylight_burglary,...,theft_aggrevated,theft_bicycle,theft_cars,theft_motor_incl_out_of,theft_motorcycle,theft_simple,theft_total,total,violent_crime,without_asylum
0,5334,Aachen,0.1904,99.0,648.0,22.0,3348.0,59.0,213.0,59.0,...,434.0,58.0,70.0,84.0,39.0,1467.0,1770.0,8446.0,633.0,5952.0
1,7131,Ahrweiler,0.062016,11.0,80.0,5.0,44.0,9.0,28.0,15.0,...,96.0,6.0,16.0,21.0,2.0,130.0,201.0,839.0,73.0,822.0
2,9771,Aichach-Friedberg,0.284689,4.0,46.0,3.0,521.0,10.0,29.0,7.0,...,42.0,9.0,9.0,4.0,0.0,77.0,119.0,955.0,46.0,537.0
3,8425,Alb-Donau-Kreis,0.146371,14.0,66.0,6.0,50.0,11.0,44.0,15.0,...,84.0,11.0,8.0,14.0,2.0,216.0,288.0,958.0,93.0,932.0
4,16077,Altenburger Land,1.041667,5.0,33.0,4.0,254.0,4.0,22.0,4.0,...,40.0,1.0,6.0,6.0,1.0,92.0,125.0,536.0,36.0,343.0


In [164]:
bka_new.to_excel('bka.xlsx')