# Which variables affect crime convictions, and how?

This project uses data from Stats SA to predict the number of crime convictions from household data (household income, distance to the nearest police station, living conditions, etc.).

1) Determine the relationship between the number of crime convictions (conviction of theft of personal property, conviction of fraud, conviction of assault, etc.) and household living conditions (type of toilet facility, access to or use of electricity, victim of other crimes, etc.). Does having no access to electricity lead to a person committing more crimes? Does a conviction of assault result from being a victim of assault?

2) Determine whether there is a statistical relationship between any crime conviction and a person's living conditions, place of dwelling, and overall community development. Apply statistical modeling methods to model that relationship.



## Introduction

The general level of crime, as estimated by the Victims of Crime Survey (VOCS), had been declining during the past five years but increased in 2016/17 and 2017/18.

Household crimes increased by 5% to a total of 1,5 million incidences of crime, while individual crime also increased by 5% to a total of 1,6 million incidences, affecting 1,4 million individuals aged 16 and above. Northern Cape had the highest increase in both household and individual crimes. Housebreaking or burglary was the dominant crime category (54%) among crimes measured by the Victims of Crime Survey (VOCS). An estimated 830 thousand incidences of housebreaking occurred in 2017/18, affecting 4,25% of all South African households. Nearly 32% of items stolen during housebreaking were clothes, followed by cellphones (24%) and food (22%).




### Data Description 

Link to dataset: http://nesstar.statssa.gov.za:8282/webview/

* Collection method: Survey of 23380 households across all 9 provinces
* Date collected: April 10, 2017
* Date Downloaded: April 07, 2021
* Data size: 23380 rows, 307 columns

### Data Wrangling

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [10]:
DF = pd.read_csv('LCS-2014-2015-HOUSEHOLD/LCS-2014-2015-HOUSEHOLD_F1.csv')
DF.head(10)

Unnamed: 0,UQNO,SURVEYDATE,Q11CPARTHH,Q11CMANY,Q15OTHPERS,Q116PRESENT,Q116PERSONNO,Q51MAIN,Q51OTHER,Q51NOOTH,...,income_pcp_quintile,expend_inkind_decile,expend_inkind_quintile,income_inkind_decile,income_inkind_quintile,Expenditure_weighted,Expenditure_inkind_weighted,Income_weighted,Income_inkind_weighted,hholds_wgt
0,813004940000021702,1032015,2,88,2,5,1,2,88,1,...,1,1,1,1,1,1516301.0,1516301.0,0.0,0.0,624.763809
1,607001710000011901,2042015,2,88,2,6,1,1,88,1,...,1,1,1,1,1,2596009.0,2596009.0,0.0,0.0,509.986828
2,774017360000010201,4102015,2,88,2,6,1,7,88,1,...,1,1,1,1,1,5079606.0,5079606.0,0.0,0.0,828.185078
3,236006020000000901,1022015,2,88,2,6,1,2,2,8,...,1,1,1,1,1,2105227.0,2105227.0,0.0,0.0,341.725292
4,773024020070002701,3072015,1,99,2,5,2,1,99,9,...,1,1,1,1,1,6038148.0,6038148.0,0.0,0.0,928.854284
5,918000650000031201,3082015,2,88,2,6,1,1,88,1,...,1,1,1,1,1,3378416.0,3378416.0,0.0,0.0,504.494338
6,238012490000017801,1032015,2,88,9,5,1,1,88,1,...,1,1,1,1,1,5501461.0,5501461.0,0.0,0.0,766.18958
7,619001110000002601,2042015,1,1,2,4,1,1,88,1,...,1,1,1,1,1,4127847.0,4127847.0,0.0,0.0,542.361023
8,776020700000010001,1032015,2,88,2,1,1,8,88,1,...,1,1,1,1,1,23385200.0,23385200.0,0.0,0.0,2790.044188
9,171023800000018001,2062015,2,88,2,6,1,6,88,1,...,1,1,1,1,1,5586046.0,5586046.0,0.0,0.0,628.512698


In [42]:
for i in DF.columns:
    if i[1:4] == '224':
        print(i)


Q224ANOMONEY
Q224BNOMONEY5
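The loop above compares the slice `i[1:4]` with `'224'`, which is the same as asking whether a name starts with `Q224`. A minimal sketch of that lookup as a reusable helper is shown below; the function name `columns_with_prefix` is hypothetical, and the sample list only reuses column names that appear elsewhere in this notebook.

```python
def columns_with_prefix(columns, prefix):
    """Return the names in `columns` that start with `prefix`.

    `columns` can be any iterable of strings, for example DF.columns.
    """
    return [name for name in columns if name.startswith(prefix)]


# Sample header built from column names used in this notebook.
sample_columns = ["Q224ANOMONEY", "Q224BNOMONEY5", "Q2341SAVING", "Q524ELECT", "hhsize"]

matches = columns_with_prefix(sample_columns, "Q224")
print(matches)
```

On the real frame the same call would be `columns_with_prefix(DF.columns, "Q224")`.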


In [81]:
# Select the columns that we will use for our models.
DATA = DF[['Q116PRESENT','Q51MAIN','Q51NOOTH','Q52AWALLS','Q52AROOF','Q53WALLS',
           'Q53ROOF','Q54ADWELLING','Q161CPUBATT','Q161CPRIVATT','Q54AOTHER','Q56DRINK','Q56OTHER','Q518TOILET',
          'Q524ELECT','Q531POLICE','Q532FOOD','Q62OWNSHIP','Q651TOTROOMS','Q658VALUE',
          'Q61035POLICE','Q224BNOMONEY5','Q2341SAVING','province_code','SETTLEMENT_TYPE',
           'Ageofhead','hhsize','income']]
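Selecting a long list of names with `DF[[...]]` raises a `KeyError` if any one of them is misspelled or absent from the file. The sketch below shows one way to check a list of names against a header before selecting; the helper name `split_present_missing` and the example lists are hypothetical and are not taken from the dataset.

```python
def split_present_missing(wanted, available):
    """Split `wanted` names into those found in `available` and those not found."""
    available = set(available)
    present = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]
    return present, missing


# Hypothetical example: a few requested names checked against a made-up header.
wanted = ["Q524ELECT", "Q531POLICE", "hhsize", "Q999NOTREAL"]
available = ["UQNO", "Q524ELECT", "Q531POLICE", "hhsize", "income"]

present, missing = split_present_missing(wanted, available)
print("present:", present)
print("missing:", missing)
```

In the notebook, `available` would be `DF.columns` and `wanted` the list passed to `DF[[...]]`.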

In [87]:
# The crime-victim columns are columns 142-159 (positions 141-158 with zero-based indexing).
# Their row total is added to DATA further down.
# Take a copy so the recoding below does not act on a slice of DF.
victims = DF.iloc[:, 141:159].copy()

# Survey coding: 1 = yes, 2 = no, 9 = unspecified.
# Recode "no" and "unspecified" to 0 so that each column is a 0/1 indicator.
victims = victims.replace({2: 0, 9: 0})

# Total_IOPBV: total number of Instances Of Person Being a Victim,
# i.e. the row-wise sum of the individual victim indicators.
victims['Total_IOPBV'] = victims.sum(axis=1)
victims

Unnamed: 0,Q7101ASSAULT,Q7102ROBBERY,Q7103CARHIJACK,Q7104THEFTPROP,Q7105THEFTBIKE,Q7106FRAUD,Q7107CORRUPT,Q7108THEFTVEH,Q7109VEHVAND,Q7110HOUSEBR,Q7111HOMEROB,Q7112THEFTLIVES,Q7113THEFTCROPS,Q7114MURDER,Q7115DELIBERATE,Q7116CARTHEFT,Q7117SEXUAL,Q7118OTHER,Total_IOPBV
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23375,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
23376,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1
23377,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,3
23378,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
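The recoding in this cell folds code 9 (unspecified) into 0, so an unanswered question counts the same as "not a victim". The sketch below is one possible alternative, not the notebook's method: it keeps unspecified answers as missing values so that their number can be inspected next to the total. The toy values are invented; only the column names and the 1/2/9 coding come from the notebook.

```python
import pandas as pd

# Toy responses using the survey coding: 1 = yes, 2 = no, 9 = unspecified.
# The values are invented for illustration.
toy = pd.DataFrame({
    "Q7101ASSAULT":   [1, 2, 9, 2],
    "Q7102ROBBERY":   [2, 2, 1, 9],
    "Q7104THEFTPROP": [1, 2, 9, 9],
})

# "yes" becomes 1.0 and every other code becomes 0.0; cells coded 9 are then
# masked to NaN instead of being left at 0.
recoded = toy.eq(1).astype(float).where(toy.ne(9))

summary = pd.DataFrame({
    # Sum of "yes" answers per row; NaN is skipped by default.
    "Total_IOPBV": recoded.sum(axis=1).astype(int),
    # Number of questions left unspecified in each row.
    "n_unspecified": recoded.isna().sum(axis=1),
})
print(summary)
```

The total is the same as with the 9-to-0 recoding; the extra column only makes visible how many answers behind each total were unspecified.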


In [88]:
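# The crime-conviction columns are columns 160-177 (positions 159-176 with zero-based indexing).
# Their row total is added to DATA further down.
# Take a copy so the recoding below does not act on a slice of DF.
convictions = DF.iloc[:, 159:177].copy()

# Survey coding: 1 = yes, 2 = no, 9 = unspecified.
# Recode "no" and "unspecified" to 0 so that each column is a 0/1 indicator.
convictions = convictions.replace({2: 0, 9: 0})

# The total number of convictions is the row-wise sum of the individual conviction indicators.
convictions['Total_number_of_convictions'] = convictions.sum(axis=1)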
convictions

Unnamed: 0,Q7201ASSAULT,Q7202ROBBERY,Q7203CARHIJACK,Q7204THEFTPROP,Q7205THEFTBIKE,Q7206FRAUD,Q7207CORRUPT,Q7208THEFTVEH,Q7209VEHVAND,Q7210HOUSEBR,Q7211HOMEROB,Q7212THEFTLIVE,Q7213THEFTCROPS,Q7214MURDER,Q7215DELIBERATE,Q7216CARTHEFT,Q7217SEXUAL,Q7218OTHER,Total_number_of_convictions
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23375,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
23376,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
23377,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
23378,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
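Positional slices such as `DF.iloc[:,159:177]` depend on the column order of this particular CSV file. The sketch below selects the conviction block by name instead, assuming (from the output above) that every conviction column is named `Q72` followed by a two-digit item number and a label. The toy frame and its values are invented for illustration.

```python
import pandas as pd

# Toy frame mixing conviction columns (Q72xx...) with other columns.
# Values are invented; the column names are taken from this notebook.
toy = pd.DataFrame({
    "Q7118OTHER":   [2, 2, 1],
    "Q7201ASSAULT": [2, 1, 2],
    "Q7206FRAUD":   [2, 1, 9],
    "Q7218OTHER":   [9, 2, 2],
    "hhsize":       [1, 4, 2],
})

# Keep only columns named "Q72" + two digits, wherever they sit in the frame.
conviction_cols = toy.filter(regex=r"^Q72\d{2}").columns.tolist()

# Same recoding as the notebook: 2 (no) and 9 (unspecified) both become 0.
convictions = toy[conviction_cols].replace({2: 0, 9: 0})
convictions["Total_number_of_convictions"] = convictions.sum(axis=1)
print(convictions)
```

Selecting by name keeps working if columns are added or reordered upstream, at the cost of relying on the naming pattern holding for every conviction column.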


In [89]:
# Append both totals to the selected columns in a single concat (rows are aligned on the index).
DATA = pd.concat([DATA, victims['Total_IOPBV'], convictions['Total_number_of_convictions']], axis=1)
DATA

Unnamed: 0,Q116PRESENT,Q51MAIN,Q51NOOTH,Q52AWALLS,Q52AROOF,Q53WALLS,Q53ROOF,Q54ADWELLING,Q161CPUBATT,Q161CPRIVATT,...,Q61035POLICE,Q224BNOMONEY5,Q2341SAVING,province_code,SETTLEMENT_TYPE,Ageofhead,hhsize,income,Total_IOPBV,Total_number_of_convictions
0,5,2,1,10,3,1,1,2,0,0,...,2,1,2,8,4,35,1,0.000000e+00,0,0
1,6,1,1,1,3,3,3,2,0,0,...,2,1,2,6,5,51,1,0.000000e+00,1,0
2,6,7,1,1,3,3,3,2,9,9,...,1,8,2,7,1,53,2,0.000000e+00,0,0
3,6,2,8,7,3,4,4,2,0,0,...,2,2,2,2,4,54,1,0.000000e+00,0,0
4,5,1,9,1,3,1,1,2,0,9,...,2,1,2,7,2,55,2,0.000000e+00,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23375,3,1,1,1,9,4,4,2,0,0,...,1,8,2,5,1,40,3,3.224941e+06,0,0
23376,3,1,8,1,9,1,3,2,0,0,...,1,8,2,7,1,55,2,2.299141e+06,1,0
23377,2,1,1,1,3,5,5,2,0,0,...,1,8,2,1,1,44,1,1.687435e+06,3,0
23378,2,1,1,1,9,5,5,2,0,0,...,1,8,2,7,1,49,4,7.943237e+06,1,0
