<a href="https://colab.research.google.com/github/alby1976/Data607608Project/blob/master/notebook/Data607Project_OffenderProfile_NN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Data 607 Project:**
**Members: Graeme Kempthorne, Li Lam, Albert Leung, Yu Nakamura**

# Can we predict an offender's profile using neural networks?

# Multilayer Perceptron (MLP) Neural Network

The multilayer perceptron (MLP) is a simple, classic type of neural network. Variables enter through the input layer, and every connection to the next layer carries a weight that is learned during training. Each hidden layer also has a bias term, and its output passes through an activation function; here, the hidden layers use the rectified linear unit (ReLU). Dropout layers can optionally be added to reduce overfitting: during training, they randomly set a fraction of a layer's outputs to zero. The last dense layer produces the final output. An MLP is a feed-forward network, so data moves forward from one layer to the next and never loops back to a layer it has already passed. Training is supervised: the network computes predictions in a forward pass, and the weights are then updated using gradients of the loss obtained by backpropagation.
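
A minimal NumPy sketch of the forward pass described above may help. It mirrors the layer sizes used later in this notebook (100 and 28 ReLU units, dropout of 0.2 after the first hidden layer, and 3 softmax outputs), but the weights are random, the input width of 113 is an illustrative assumption, and the helper names (`relu`, `softmax`, `dropout`, `forward`) are hypothetical. Dropout is written in the common "inverted" form, which rescales the surviving activations during training and does nothing at inference time. Weight updates by backpropagation are left to Keras and are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # rectified linear unit: max(0, z), applied elementwise
    return np.maximum(0.0, z)

def softmax(z):
    # subtract each row's max for numerical stability; each row then sums to 1
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dropout(a, rate, training):
    # inverted dropout: while training, zero a random fraction `rate` of the
    # activations and scale the survivors by 1 / (1 - rate); at inference time
    # the activations pass through unchanged
    if not training or rate == 0.0:
        return a
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)

def init_layer(n_in, n_out):
    # random weights (one per input-unit connection) and a zero bias per unit
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

n_features = 113  # illustrative input width (assumption)
W1, b1 = init_layer(n_features, 100)
W2, b2 = init_layer(100, 28)
W3, b3 = init_layer(28, 3)

def forward(X, training=False):
    # data flows strictly forward: input -> 100 ReLU -> dropout -> 28 ReLU -> 3 softmax
    h1 = dropout(relu(X @ W1 + b1), rate=0.2, training=training)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)  # one row of class probabilities per sample

X = rng.random((5, n_features))
probs = forward(X)
print(probs.shape)  # (5, 3)
```

Because the output layer is a softmax, each row of `probs` can be read as the predicted probability of each of the three classes.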

When training a neural network, the loss function assesses the quality of fit, and training tries to minimize it. In this project, mean squared error is used for regression problems and categorical cross-entropy for k-class classification problems.
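
To make the two loss functions concrete, the sketch below computes them by hand with NumPy on made-up numbers. It also shows that categorical cross-entropy on one-hot targets equals the "sparse" version on integer class labels, so the choice between Keras's `categorical_crossentropy` and `sparse_categorical_crossentropy` depends only on how the targets are encoded. The function names are illustrative, and the small clipping constant is an assumption used to avoid taking the log of zero.

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    # mean over samples of -sum_k y_k * log(p_k); clip to avoid log(0)
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(np.sum(y_onehot * np.log(p), axis=1)))

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # same loss, but each target is an integer class index instead of a one-hot row
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels])))

def mean_squared_error(y_true, y_pred):
    # average squared difference, used for regression targets
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# toy example with three classes (e.g. Female, Male, Unknown)
y = np.array([[0, 1, 0], [1, 0, 0]])
p = np.array([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])
print(categorical_crossentropy(y, p))                        # -(ln 0.8 + ln 0.6) / 2
print(sparse_categorical_crossentropy(np.array([1, 0]), p))  # same value
print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))            # 0.25
```

Only the probability assigned to the true class contributes to the cross-entropy, which is why confident wrong predictions are penalized heavily.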
The optimizer has an important tunable parameter, the learning rate, which controls how large a step the algorithm takes each time it updates the weights. A higher learning rate is often useful early in training, and a lower learning rate later in training helps the model pick up finer details without overshooting.
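
One common way to lower the learning rate over the course of training is exponential decay, sketched below in plain Python. The settings (start at 0.01 and halve the rate every 1,000 optimizer steps) are illustrative assumptions, not the settings used in this notebook, which compiles with Adam's default learning rate.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate):
    # learning rate after `step` optimizer updates:
    #   lr = initial_lr * decay_rate ** (step / decay_steps)
    return initial_lr * decay_rate ** (step / decay_steps)

# illustrative settings: start at 0.01 and halve the rate every 1,000 steps
for step in (0, 1000, 2000, 3000):
    print(step, exponential_decay(0.01, step, decay_steps=1000, decay_rate=0.5))
```

Keras provides the same schedule as `tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate, decay_steps, decay_rate)`, and a schedule object can be passed as the `learning_rate` of an optimizer such as `tf.keras.optimizers.Adam`.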
A model is compiled once it is equipped with a loss function and an optimizer (and, optionally, metrics to report during training).


The dataset contains three offender characteristics that we can try to predict: sex, race, and age. The variables used to address this question are:



*   “Solved”: whether or not an offender was identified at the time of the report
*   “FIPS”: numeric code to represent state and county
*   “Year”: year of the incident or when the body was found
*   “Month”: month of the incident or when the body was found
*   “VicAge”: age of victim
*   “VicSex”: victim’s sex (male, female, unknown)
*   “VicRace”: victim’s race (American Indian/Alaskan Native, Asian, Black, Native Hawaiian/Pacific Islander, White or Unknown)
*   “VicCount”: the number of additional victims excluding the primary victim
*   “OffAge”: age of offender
*   “OffSex”: offender’s sex (male, female, unknown)
*   “OffRace”: offender’s race (American Indian/Alaskan Native, Asian, Black, Native Hawaiian/Pacific Islander, White or Unknown)
*   “OffCount”: the number of additional offenders excluding the primary offender
*   “Weapon”: weapon used in committing the homicide
*   “Relationship”: relationship between victim and offender
*   “Circumstance”: motivation of the crime


**Let's begin by building a model to predict an offender's sex.**

In [None]:
#import some libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Conv2D, MaxPool2D, Activation, Dropout, Dense, Flatten, Input
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split

In [None]:
#read csv
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/SHR76_19.csv")
df

Unnamed: 0,ID,FIPS,CNTYFIPS,Ori,State,Agency,Agentype,Source,Solved,Year,Month,Incident,ActionType,Homicide,Situation,VicAge,VicSex,VicRace,VicEthnic,OffAge,OffSex,OffRace,OffEthnic,Weapon,Relationship,Circumstance,Subcircum,VicCount,OffCount,FileDate
0,197603001AK00101,2020,"Anchorage, AK",AK00101,Alaska,Anchorage,Municipal police,FBI,Yes,1976,March,1,Normal update,Murder and non-negligent manslaughter,Single victim/single offender,48,Male,Unknown,Unknown or not reported,68,Male,Black,Unknown or not reported,"Handgun - pistol, revolver, etc",Relationship not determined,Other arguments,,0,0,30180.0
1,197604001AK00101,2020,"Anchorage, AK",AK00101,Alaska,Anchorage,Municipal police,FBI,Yes,1976,April,1,Normal update,Murder and non-negligent manslaughter,Single victim/single offender,33,Female,White,Unknown or not reported,44,Male,White,Unknown or not reported,"Handgun - pistol, revolver, etc",Girlfriend,Other arguments,,0,0,30180.0
2,197606001AK00101,2020,"Anchorage, AK",AK00101,Alaska,Anchorage,Municipal police,FBI,Yes,1976,June,1,Normal update,Murder and non-negligent manslaughter,Single victim/single offender,38,Male,White,Unknown or not reported,27,Male,Black,Unknown or not reported,"Handgun - pistol, revolver, etc",Stranger,Other,,0,0,30180.0
3,197606002AK00101,2020,"Anchorage, AK",AK00101,Alaska,Anchorage,Municipal police,FBI,Yes,1976,June,2,Normal update,Murder and non-negligent manslaughter,Single victim/single offender,41,Male,White,Unknown or not reported,34,Male,White,Unknown or not reported,"Handgun - pistol, revolver, etc",Other - known to victim,Other arguments,,0,0,30180.0
4,197607001AK00101,2020,"Anchorage, AK",AK00101,Alaska,Anchorage,Municipal police,FBI,Yes,1976,July,1,Normal update,Murder and non-negligent manslaughter,Single victim/single offender,33,Male,American Indian or Alaskan Native,Unknown or not reported,37,Female,American Indian or Alaskan Native,Unknown or not reported,Knife or cutting instrument,Brother,Other arguments,,0,0,30180.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
804746,201511001WYDI050,56013,"Fremont, WY",WYDI050,Wyoming,Wind River Agency,Tribal,FBI,Yes,2015,November,1,Normal update,Murder and non-negligent manslaughter,Single victim/single offender,37,Female,Unknown,Unknown or not reported,51,Male,American Indian or Alaskan Native,Not of Hispanic origin,"Handgun - pistol, revolver, etc",Common-law husband,Other - not specified,,0,0,32316.0
804747,201707001WYDI050,56013,"Fremont, WY",WYDI050,Wyoming,Wind River Agency,Tribal,FBI,Yes,2017,July,1,Normal update,Murder and non-negligent manslaughter,Single victim/single offender,43,Male,American Indian or Alaskan Native,Not of Hispanic origin,38,Male,American Indian or Alaskan Native,Not of Hispanic origin,Knife or cutting instrument,Friend,Brawl due to influence of alcohol,,0,0,102418.0
804748,201711001WYDI050,56013,"Fremont, WY",WYDI050,Wyoming,Wind River Agency,Tribal,FBI,Yes,2017,November,1,Normal update,Murder and non-negligent manslaughter,Single victim/single offender,36,Male,American Indian or Alaskan Native,Not of Hispanic origin,39,Male,American Indian or Alaskan Native,Not of Hispanic origin,"Blunt object - hammer, club, etc",Acquaintance,Brawl due to influence of alcohol,,0,0,102418.0
804749,201808001WYDI050,56013,"Fremont, WY",WYDI050,Wyoming,Wind River Agency,Tribal,FBI,No,2018,August,1,Normal update,Murder and non-negligent manslaughter,Single victim/unknown offender(s),29,Male,American Indian or Alaskan Native,Not of Hispanic origin,999,Unknown,Unknown,Unknown or not reported,Shotgun,Other - known to victim,Narcotic drug laws,,0,0,93019.0


The data must be preprocessed before it can be fed into the neural network. First, the columns are split into categorical columns and numeric columns.

Each categorical variable is converted to dummy (one-hot) columns, with one column per possible category and values of zero or one. For example, the weapon variable is split into 17 columns. This matters because the categories have no natural order: encoding them as separate 0/1 columns keeps the model from treating them as ordered or related values.
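
A small illustration of what `pd.get_dummies` does, using a made-up four-row frame with category labels borrowed from the dataset: each categorical column becomes one 0/1 column per category, named `<column>_<category>`, and every row has exactly one 1 within each original variable. `dtype=int` is passed only so the output shows 0/1 rather than True/False, which recent pandas versions produce by default.

```python
import pandas as pd

# made-up rows using category labels that appear in the dataset
toy = pd.DataFrame({
    "VicSex": ["Male", "Female", "Male", "Unknown"],
    "Weapon": ["Shotgun", "Knife or cutting instrument", "Shotgun", "Shotgun"],
})

# one 0/1 column per category; categories are ordered alphabetically
dummies = pd.get_dummies(toy, dtype=int)
print(dummies.columns.tolist())
print(dummies)
```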

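The numeric variables are scaled because neural networks tend to train more efficiently when their inputs are on comparable ranges. The code uses scikit-learn's `MinMaxScaler`, which rescales each numeric column to the range [0, 1] using the minimum and maximum of the training data. (Standardizing each column to mean zero and variance one, as `StandardScaler` does, is a common alternative.)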
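
The min-max formula is x' = (x - min) / (max - min), with min and max taken from the training rows only. The NumPy sketch below reimplements that formula (it is not scikit-learn's code) on a few VicAge and OffAge values from the rows shown above. It illustrates two points: test values can fall outside [0, 1], and a code such as OffAge = 999, which appears to mark an unknown offender age in the last row above, stretches the column's range so that real ages are squeezed close to 0.

```python
import numpy as np

def minmax_fit(train):
    # learn the per-column minimum and maximum from the training rows only
    return train.min(axis=0), train.max(axis=0)

def minmax_transform(X, col_min, col_max):
    # x' = (x - min) / (max - min); a constant column would divide by zero,
    # so its range is replaced by 1 in this sketch
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span

# columns: VicAge, OffAge (999 appears to encode an unknown age)
train = np.array([[48.0, 68.0], [33.0, 44.0], [29.0, 999.0]])
test = np.array([[60.0, 38.0]])

col_min, col_max = minmax_fit(train)
print(minmax_transform(train, col_min, col_max))
print(minmax_transform(test, col_min, col_max))  # can fall outside [0, 1]
```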

Once the categorical and numeric variables are processed, they are combined into one dataframe that can be fed into the neural network.

In [None]:
#dataframe with only categorical columns

dfc = df[["Solved", "Month", "VicSex", "VicRace", "OffRace","Weapon","Relationship","Circumstance"]]
dfc = pd.get_dummies(dfc)
dfc

Unnamed: 0,Solved_No,Solved_Yes,Month_April,Month_August,Month_December,Month_February,Month_January,Month_July,Month_June,Month_March,Month_May,Month_November,Month_October,Month_September,VicSex_Female,VicSex_Male,VicSex_Unknown,VicRace_American Indian or Alaskan Native,VicRace_Asian,VicRace_Black,VicRace_Native Hawaiian or Pacific Islander,VicRace_Unknown,VicRace_White,OffRace_American Indian or Alaskan Native,OffRace_Asian,OffRace_Black,OffRace_Native Hawaiian or Pacific Islander,OffRace_Unknown,OffRace_White,Weapon_Asphyxiation - includes death by gas,"Weapon_Blunt object - hammer, club, etc",Weapon_Drowning,Weapon_Explosives,Weapon_Fire,"Weapon_Firearm, type not stated","Weapon_Handgun - pistol, revolver, etc",Weapon_Knife or cutting instrument,"Weapon_Narcotics or drugs, sleeping pills",Weapon_Other gun,Weapon_Other or type unknown,...,Relationship_Sister,Relationship_Son,Relationship_Stepdaughter,Relationship_Stepfather,Relationship_Stepmother,Relationship_Stepson,Relationship_Stranger,Relationship_Wife,Circumstance_Abortion,Circumstance_All other manslaughter by negligence,Circumstance_All suspected felony type,Circumstance_Argument over money or property,Circumstance_Arson,Circumstance_Brawl due to influence of alcohol,Circumstance_Brawl due to influence of narcotics,Circumstance_Burglary,Circumstance_Child killed by babysitter,Circumstance_Children playing with gun,Circumstance_Circumstances undetermined,Circumstance_Felon killed by police,Circumstance_Felon killed by private citizen,Circumstance_Gambling,Circumstance_Gangland killings,Circumstance_Gun-cleaning death - other than self,Circumstance_Institutional killings,Circumstance_Juvenile gang killings,Circumstance_Larceny,Circumstance_Lovers triangle,Circumstance_Motor vehicle theft,Circumstance_Narcotic drug laws,Circumstance_Other,Circumstance_Other - not specified,Circumstance_Other arguments,Circumstance_Other negligent handling of gun,Circumstance_Other sex offense,Circumstance_Prostitution and commercialized vice,Circumstance_Rape,Circumstance_Robbery,Circumstance_Sniper attack,Circumstance_Victim shot in hunting accident
0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
804746,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
804747,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804748,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804749,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [None]:
#from the original dataframe, retrieve the numeric columns
dfn = df[["FIPS", "Year", "VicAge", "OffAge", "VicCount", "OffCount"]]
dfn

Unnamed: 0,FIPS,Year,VicAge,OffAge,VicCount,OffCount
0,2020,1976,48,68,0,0
1,2020,1976,33,44,0,0
2,2020,1976,38,27,0,0
3,2020,1976,41,34,0,0
4,2020,1976,33,37,0,0
...,...,...,...,...,...,...
804746,56013,2015,37,51,0,0
804747,56013,2017,43,38,0,0
804748,56013,2017,36,39,0,0
804749,56013,2018,29,999,0,0


In [None]:
#combine the dataframes
df2=pd.concat([dfn,dfc], axis=1)
df2

Unnamed: 0,FIPS,Year,VicAge,OffAge,VicCount,OffCount,Solved_No,Solved_Yes,Month_April,Month_August,Month_December,Month_February,Month_January,Month_July,Month_June,Month_March,Month_May,Month_November,Month_October,Month_September,VicSex_Female,VicSex_Male,VicSex_Unknown,VicRace_American Indian or Alaskan Native,VicRace_Asian,VicRace_Black,VicRace_Native Hawaiian or Pacific Islander,VicRace_Unknown,VicRace_White,OffRace_American Indian or Alaskan Native,OffRace_Asian,OffRace_Black,OffRace_Native Hawaiian or Pacific Islander,OffRace_Unknown,OffRace_White,Weapon_Asphyxiation - includes death by gas,"Weapon_Blunt object - hammer, club, etc",Weapon_Drowning,Weapon_Explosives,Weapon_Fire,...,Relationship_Sister,Relationship_Son,Relationship_Stepdaughter,Relationship_Stepfather,Relationship_Stepmother,Relationship_Stepson,Relationship_Stranger,Relationship_Wife,Circumstance_Abortion,Circumstance_All other manslaughter by negligence,Circumstance_All suspected felony type,Circumstance_Argument over money or property,Circumstance_Arson,Circumstance_Brawl due to influence of alcohol,Circumstance_Brawl due to influence of narcotics,Circumstance_Burglary,Circumstance_Child killed by babysitter,Circumstance_Children playing with gun,Circumstance_Circumstances undetermined,Circumstance_Felon killed by police,Circumstance_Felon killed by private citizen,Circumstance_Gambling,Circumstance_Gangland killings,Circumstance_Gun-cleaning death - other than self,Circumstance_Institutional killings,Circumstance_Juvenile gang killings,Circumstance_Larceny,Circumstance_Lovers triangle,Circumstance_Motor vehicle theft,Circumstance_Narcotic drug laws,Circumstance_Other,Circumstance_Other - not specified,Circumstance_Other arguments,Circumstance_Other negligent handling of gun,Circumstance_Other sex offense,Circumstance_Prostitution and commercialized vice,Circumstance_Rape,Circumstance_Robbery,Circumstance_Sniper attack,Circumstance_Victim shot in hunting accident
0,2020,1976,48,68,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,2020,1976,33,44,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,2020,1976,38,27,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,2020,1976,41,34,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,2020,1976,33,37,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
804746,56013,2015,37,51,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
804747,56013,2017,43,38,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804748,56013,2017,36,39,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804749,56013,2018,29,999,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [None]:
#check that the dummy columns start at position 6, after the six numeric columns
df2.iloc[:, 6:]

Unnamed: 0,Solved_No,Solved_Yes,Month_April,Month_August,Month_December,Month_February,Month_January,Month_July,Month_June,Month_March,Month_May,Month_November,Month_October,Month_September,VicSex_Female,VicSex_Male,VicSex_Unknown,VicRace_American Indian or Alaskan Native,VicRace_Asian,VicRace_Black,VicRace_Native Hawaiian or Pacific Islander,VicRace_Unknown,VicRace_White,OffRace_American Indian or Alaskan Native,OffRace_Asian,OffRace_Black,OffRace_Native Hawaiian or Pacific Islander,OffRace_Unknown,OffRace_White,Weapon_Asphyxiation - includes death by gas,"Weapon_Blunt object - hammer, club, etc",Weapon_Drowning,Weapon_Explosives,Weapon_Fire,"Weapon_Firearm, type not stated","Weapon_Handgun - pistol, revolver, etc",Weapon_Knife or cutting instrument,"Weapon_Narcotics or drugs, sleeping pills",Weapon_Other gun,Weapon_Other or type unknown,...,Relationship_Sister,Relationship_Son,Relationship_Stepdaughter,Relationship_Stepfather,Relationship_Stepmother,Relationship_Stepson,Relationship_Stranger,Relationship_Wife,Circumstance_Abortion,Circumstance_All other manslaughter by negligence,Circumstance_All suspected felony type,Circumstance_Argument over money or property,Circumstance_Arson,Circumstance_Brawl due to influence of alcohol,Circumstance_Brawl due to influence of narcotics,Circumstance_Burglary,Circumstance_Child killed by babysitter,Circumstance_Children playing with gun,Circumstance_Circumstances undetermined,Circumstance_Felon killed by police,Circumstance_Felon killed by private citizen,Circumstance_Gambling,Circumstance_Gangland killings,Circumstance_Gun-cleaning death - other than self,Circumstance_Institutional killings,Circumstance_Juvenile gang killings,Circumstance_Larceny,Circumstance_Lovers triangle,Circumstance_Motor vehicle theft,Circumstance_Narcotic drug laws,Circumstance_Other,Circumstance_Other - not specified,Circumstance_Other arguments,Circumstance_Other negligent handling of gun,Circumstance_Other sex offense,Circumstance_Prostitution and commercialized vice,Circumstance_Rape,Circumstance_Robbery,Circumstance_Sniper attack,Circumstance_Victim shot in hunting accident
0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
804746,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
804747,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804748,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804749,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [None]:
#process the crime attributes: scale the numeric columns and combine them
#with the dummy columns (the df argument is unused but kept for existing calls)

def process_crime_attributes(df, train, test):

  #min-max scale the numeric columns, fitting the scaler on the training set only
  continuous = ["FIPS", "Year", "VicAge", "OffAge", "VicCount", "OffCount"]
  cs = MinMaxScaler()
  trainContinuous = cs.fit_transform(train[continuous])
  testContinuous = cs.transform(test[continuous])

  #retrieve the categorical (dummy) columns, which follow the six numeric columns in df2
  trainCategorical = train.iloc[:, 6:]
  testCategorical = test.iloc[:, 6:]

  #combine the categorical and scaled numeric columns into the training and testing X
  trainX = np.hstack([trainCategorical, trainContinuous])
  testX = np.hstack([testCategorical, testContinuous])
  return (trainX, testX)

In [None]:
#one-hot encode the y variable (offender's sex) into dummy columns

y=pd.get_dummies(df.OffSex)
print(y)

        Female  Male  Unknown
0            0     1        0
1            0     1        0
2            0     1        0
3            0     1        0
4            1     0        0
...        ...   ...      ...
804746       0     1        0
804747       0     1        0
804748       0     1        0
804749       0     0        1
804750       0     1        0

[804751 rows x 3 columns]


In [None]:
#create the MLP sequential model
#hidden layers have a relu activation function
#the dropout layer helps reduce overfitting
#final layer has one unit per offender's sex category (3) with
#a softmax activation function, so the outputs are class probabilities

def create_mlp(dim):
  model = Sequential()
  model.add(Dense(100, input_dim=dim, activation="relu"))
  model.add(Dropout(0.2))
  model.add(Dense(28, activation="relu"))
  model.add(Dense(3, activation="softmax"))
  return model


In [None]:
#from original data, split into training and testing sets

splits = train_test_split(df2, y, test_size=0.2, random_state=42)
(trainAttrX, testAttrX, trainY, testY) = splits

In [None]:
#scale the numeric columns (fitting on the training set only) and build the X matrices
(trainAttrX, testAttrX) = process_crime_attributes(df2,trainAttrX, testAttrX)

In [None]:
#create the model
model = create_mlp(trainAttrX.shape[1])

In [None]:
#compile model
#categorical cross-entropy is the appropriate loss function
#because the y's are already one-hot encoded
#(integer class labels would call for sparse_categorical_crossentropy instead)

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 100)               11400     
_________________________________________________________________
dropout (Dropout)            (None, 100)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 28)                2828      
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 87        
Total params: 14,315
Trainable params: 14,315
Non-trainable params: 0
_________________________________________________________________
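
The parameter counts in the summary can be checked by hand: a dense layer with n inputs and u units has n·u weights plus u biases, and a dropout layer has none. The input width of 113 used below is inferred from the first layer's 11,400 parameters (11,400 / 100 - 1); it is not printed in this section, and the helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    # one weight per input-unit connection plus one bias per unit
    return n_inputs * n_units + n_units

# input width implied by the first layer: 11,400 / 100 - 1 = 113 (inferred)
n_features = 11400 // 100 - 1
layers = [(n_features, 100), (100, 28), (28, 3)]  # Dropout adds no parameters
counts = [dense_params(n_in, units) for n_in, units in layers]
print(counts, sum(counts))  # [11400, 2828, 87] 14315
```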


In [None]:
#fit the model
model.fit(x=trainAttrX, y=trainY, validation_data=(testAttrX, testY), epochs=10, batch_size=32)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f965da4ea50>

In [None]:
#accuracy rate for the test data: share of rows where the most probable
#predicted class matches the true class
pred = model.predict(testAttrX)
testY = np.asarray(testY)
np.mean(testY.argmax(axis=1) == pred.argmax(axis=1))


0.9538244558903021

After 10 epochs, the model predicts the offender's sex correctly for approximately 95% of the test (validation) data.
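
An accuracy figure is easier to interpret next to a simple baseline, such as always predicting the most common class. The sketch below, on made-up numbers, computes the same argmax accuracy as the cell above together with a majority-class baseline; the helper names are hypothetical.

```python
import numpy as np

def argmax_accuracy(y_onehot, probs):
    # a prediction counts as correct when the most probable class is the true class
    return float(np.mean(np.argmax(y_onehot, axis=1) == np.argmax(probs, axis=1)))

def majority_baseline(y_onehot):
    # accuracy of always predicting the most frequent class
    return float(np.max(np.asarray(y_onehot).mean(axis=0)))

# made-up one-hot targets for (Female, Male, Unknown) and model probabilities
y = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
probs = np.array([[0.1, 0.8, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.6, 0.1],
                  [0.1, 0.2, 0.7]])
print(argmax_accuracy(y, probs))  # 0.75
print(majority_baseline(y))       # 0.5
```

In this notebook, `majority_baseline(testY)` would give the share of the most common offender-sex category in the test set, which the 95% figure can be compared against.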

**Let's continue by building an MLP model to predict the offender's race.**

The code below repeats the steps above, with the offender's race as the target and the offender's sex moved into the input features. Please move ahead if you wish to skip the code.

In [None]:
#read data
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/SHR76_19.csv")


In [None]:
#dataframe with only categorical columns
dfc = df[["Solved", "Month", "VicSex", "VicRace", "OffSex","Weapon","Relationship","Circumstance"]]
dfc = pd.get_dummies(dfc)

#from the original dataframe, retrieve the numeric columns
dfn = df[["FIPS", "Year", "VicAge", "OffAge", "VicCount", "OffCount"]]

#combine the dataframes
df2=pd.concat([dfn,dfc], axis=1)
df2

Unnamed: 0,FIPS,Year,VicAge,OffAge,VicCount,OffCount,Solved_No,Solved_Yes,Month_April,Month_August,Month_December,Month_February,Month_January,Month_July,Month_June,Month_March,Month_May,Month_November,Month_October,Month_September,VicSex_Female,VicSex_Male,VicSex_Unknown,VicRace_American Indian or Alaskan Native,VicRace_Asian,VicRace_Black,VicRace_Native Hawaiian or Pacific Islander,VicRace_Unknown,VicRace_White,OffSex_Female,OffSex_Male,OffSex_Unknown,Weapon_Asphyxiation - includes death by gas,"Weapon_Blunt object - hammer, club, etc",Weapon_Drowning,Weapon_Explosives,Weapon_Fire,"Weapon_Firearm, type not stated","Weapon_Handgun - pistol, revolver, etc",Weapon_Knife or cutting instrument,...,Relationship_Sister,Relationship_Son,Relationship_Stepdaughter,Relationship_Stepfather,Relationship_Stepmother,Relationship_Stepson,Relationship_Stranger,Relationship_Wife,Circumstance_Abortion,Circumstance_All other manslaughter by negligence,Circumstance_All suspected felony type,Circumstance_Argument over money or property,Circumstance_Arson,Circumstance_Brawl due to influence of alcohol,Circumstance_Brawl due to influence of narcotics,Circumstance_Burglary,Circumstance_Child killed by babysitter,Circumstance_Children playing with gun,Circumstance_Circumstances undetermined,Circumstance_Felon killed by police,Circumstance_Felon killed by private citizen,Circumstance_Gambling,Circumstance_Gangland killings,Circumstance_Gun-cleaning death - other than self,Circumstance_Institutional killings,Circumstance_Juvenile gang killings,Circumstance_Larceny,Circumstance_Lovers triangle,Circumstance_Motor vehicle theft,Circumstance_Narcotic drug laws,Circumstance_Other,Circumstance_Other - not specified,Circumstance_Other arguments,Circumstance_Other negligent handling of gun,Circumstance_Other sex offense,Circumstance_Prostitution and commercialized vice,Circumstance_Rape,Circumstance_Robbery,Circumstance_Sniper attack,Circumstance_Victim shot in hunting accident
0,2020,1976,48,68,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,2020,1976,33,44,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,2020,1976,38,27,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,2020,1976,41,34,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,2020,1976,33,37,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
804746,56013,2015,37,51,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
804747,56013,2017,43,38,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804748,56013,2017,36,39,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804749,56013,2018,29,999,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [None]:
#check that the dummy columns start at position 6, after the six numeric columns
df2.iloc[:, 6:]

Unnamed: 0,Solved_No,Solved_Yes,Month_April,Month_August,Month_December,Month_February,Month_January,Month_July,Month_June,Month_March,Month_May,Month_November,Month_October,Month_September,VicSex_Female,VicSex_Male,VicSex_Unknown,VicRace_American Indian or Alaskan Native,VicRace_Asian,VicRace_Black,VicRace_Native Hawaiian or Pacific Islander,VicRace_Unknown,VicRace_White,OffSex_Female,OffSex_Male,OffSex_Unknown,Weapon_Asphyxiation - includes death by gas,"Weapon_Blunt object - hammer, club, etc",Weapon_Drowning,Weapon_Explosives,Weapon_Fire,"Weapon_Firearm, type not stated","Weapon_Handgun - pistol, revolver, etc",Weapon_Knife or cutting instrument,"Weapon_Narcotics or drugs, sleeping pills",Weapon_Other gun,Weapon_Other or type unknown,"Weapon_Personal weapons, includes beating",Weapon_Poison - does not include gas,Weapon_Pushed or thrown out window,...,Relationship_Sister,Relationship_Son,Relationship_Stepdaughter,Relationship_Stepfather,Relationship_Stepmother,Relationship_Stepson,Relationship_Stranger,Relationship_Wife,Circumstance_Abortion,Circumstance_All other manslaughter by negligence,Circumstance_All suspected felony type,Circumstance_Argument over money or property,Circumstance_Arson,Circumstance_Brawl due to influence of alcohol,Circumstance_Brawl due to influence of narcotics,Circumstance_Burglary,Circumstance_Child killed by babysitter,Circumstance_Children playing with gun,Circumstance_Circumstances undetermined,Circumstance_Felon killed by police,Circumstance_Felon killed by private citizen,Circumstance_Gambling,Circumstance_Gangland killings,Circumstance_Gun-cleaning death - other than self,Circumstance_Institutional killings,Circumstance_Juvenile gang killings,Circumstance_Larceny,Circumstance_Lovers triangle,Circumstance_Motor vehicle theft,Circumstance_Narcotic drug laws,Circumstance_Other,Circumstance_Other - not specified,Circumstance_Other arguments,Circumstance_Other negligent handling of gun,Circumstance_Other sex offense,Circumstance_Prostitution and commercialized vice,Circumstance_Rape,Circumstance_Robbery,Circumstance_Sniper attack,Circumstance_Victim shot in hunting accident
0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
804746,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
804747,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804748,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804749,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [None]:
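#process the crime attributes: scale the continuous columns and stack them with the one-hot columns
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def process_crime_attributes(df, train, test):
  # `df` is not used; it is kept so existing calls still work

  #scale the numeric columns, fitting the scaler on the training rows only
  continuous=["FIPS", "Year", "VicAge", "OffAge", "VicCount", "OffCount"]
  cs = MinMaxScaler()
  trainContinuous=cs.fit_transform(train[continuous])
  testContinuous=cs.transform(test[continuous])
  
  #retrieve the categorical (one-hot) columns; assumes the first 6 columns of df2 are the numeric ones
  trainCategorical = train.iloc[:, 6:]
  testCategorical = test.iloc[:, 6:]

  #combine the categorical and numeric columns into the training and testing X
  trainX = np.hstack([trainCategorical, trainContinuous])
  testX = np.hstack([testCategorical, testContinuous])
  return trainX, testX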
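MinMaxScaler rescales each continuous column to (x - min) / (max - min), with the minimum and maximum learned from the training rows only; `transform` then reuses those training statistics on the test rows, so a test value outside the training range lands outside [0, 1]. A minimal numpy sketch of that fit/transform split and of the column stacking above, using a tiny made-up array (`minmax_fit` and `minmax_transform` are hypothetical helpers, not scikit-learn functions):

```python
import numpy as np

def minmax_fit(train_cols):
    """Learn per-column min and range from the training rows only."""
    col_min = train_cols.min(axis=0)
    col_range = train_cols.max(axis=0) - col_min
    # A constant column would divide by zero; MinMaxScaler also treats a zero range as 1
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def minmax_transform(cols, col_min, col_range):
    """Apply the training statistics to any split (train or test)."""
    return (cols - col_min) / col_range

# Hypothetical continuous features (Year, VicAge) for 3 training rows and 1 test row
train_cont = np.array([[1976.0, 20.0], [1998.0, 40.0], [2019.0, 60.0]])
test_cont = np.array([[2020.0, 30.0]])
# Hypothetical one-hot columns for the same rows
train_cat = np.array([[1, 0], [0, 1], [1, 0]])
test_cat = np.array([[0, 1]])

col_min, col_range = minmax_fit(train_cont)
# Same layout as process_crime_attributes: one-hot columns first, scaled numeric columns last
train_X = np.hstack([train_cat, minmax_transform(train_cont, col_min, col_range)])
test_X = np.hstack([test_cat, minmax_transform(test_cont, col_min, col_range)])

print(test_X)  # Year 2020 is outside the training range, so its scaled value exceeds 1
```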

In [None]:
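#one-hot encode the target variable (offender's race): one 0/1 indicator column per category
#dtype=int keeps 0/1 values (newer pandas versions default to booleans)

y=pd.get_dummies(df.OffRace, dtype=int)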
print(y)

        American Indian or Alaskan Native  Asian  ...  Unknown  White
0                                       0      0  ...        0      0
1                                       0      0  ...        0      1
2                                       0      0  ...        0      0
3                                       0      0  ...        0      1
4                                       1      0  ...        0      0
...                                   ...    ...  ...      ...    ...
804746                                  1      0  ...        0      0
804747                                  1      0  ...        0      0
804748                                  1      0  ...        0      0
804749                                  0      0  ...        1      0
804750                                  1      0  ...        0      0

[804751 rows x 6 columns]
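`pd.get_dummies` creates one indicator column per category, ordered alphabetically, which is why the six columns above run from "American Indian or Alaskan Native" to "White". The network's 6-unit softmax output lines up with these columns, and `argmax` maps a row back to its label. A numpy sketch of that encode/decode round trip on hypothetical labels (not the SHR data):

```python
import numpy as np

labels = np.array(["White", "Black", "Unknown", "White", "Asian"])

# Sorted unique categories give the column order (np.unique returns them sorted)
categories, codes = np.unique(labels, return_inverse=True)
one_hot = np.eye(len(categories), dtype=int)[codes]

# Decoding: the position of the 1 in each row recovers the original label
decoded = categories[one_hot.argmax(axis=1)]
print(categories)
print(one_hot)
```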


In [None]:
#multilayer perceptron model creation
def create_mlp(dim):
  model=Sequential()
  model.add(Dense(128, input_dim=dim, activation="relu"))
  
  model.add(Dropout(0.2))
  model.add(Dense(28, activation="relu"))
  model.add(Dense(6, activation="softmax"))
  return model
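Conceptually, the model above computes Dense (ReLU), then Dropout, then Dense (ReLU), then Dense (softmax). The numpy sketch below traces one forward pass at prediction time, assuming random weights in place of trained ones and an input width of 110 (the width implied by the first layer's parameter count in the model summary). Dropout is left out because Keras only applies it during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability; each row then sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dense(n_in, n_out):
    """Random weights and zero bias, standing in for trained parameters."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

dim = 110  # assumed input width
W1, b1 = dense(dim, 128)
W2, b2 = dense(128, 28)
W3, b3 = dense(28, 6)

def forward(X):
    h1 = relu(X @ W1 + b1)          # Dense(128, relu); Dropout(0.2) is inactive at inference
    h2 = relu(h1 @ W2 + b2)         # Dense(28, relu)
    return softmax(h2 @ W3 + b3)    # Dense(6, softmax): one probability per race category

X = rng.random((4, dim))            # 4 hypothetical rows of scaled features
probs = forward(X)
print(probs.shape, probs.sum(axis=1))
```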
                 

In [None]:
#split data into training (80%) and testing (20%) sets

splits = train_test_split(df2, y, test_size=0.2, random_state=42)
(trainAttrX, testAttrX, trainY, testY) = splits
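`train_test_split` with `test_size=0.2` shuffles the rows and holds out a fifth of them; scikit-learn rounds the test count up (n_test = ceil(0.2 × n)), and `random_state=42` makes the shuffle reproducible. A simplified sketch of the same idea with a seeded shuffle of row indices; it illustrates the mechanism only and will not reproduce scikit-learn's exact row assignment (`shuffled_split` is a hypothetical helper):

```python
import math
import random

def shuffled_split(n_rows, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) after a seeded shuffle of row indices."""
    n_test = math.ceil(test_size * n_rows)   # round the test count up
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)         # same seed -> same shuffle
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = shuffled_split(10)
print(len(train_idx), len(test_idx))  # 8 2
```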

In [None]:
#scale the continuous columns and build the final feature matrices
(trainAttrX, testAttrX) = process_crime_attributes(df2,trainAttrX, testAttrX)

In [None]:
#compile the model
mlp=create_mlp(trainAttrX.shape[1])
model=mlp
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 128)               14208     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 28)                3612      
_________________________________________________________________
dense_5 (Dense)              (None, 6)                 174       
Total params: 17,994
Trainable params: 17,994
Non-trainable params: 0
_________________________________________________________________
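The parameter counts in the summary follow from the Dense layer formula: each output unit has one weight per input plus one bias, so a layer holds (n_in + 1) × n_out parameters, and Dropout adds none. Working backwards, 14,208 / 128 = 111, so the first layer sees 110 input features. A quick check:

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return (n_in + 1) * n_out

n_features = 14208 // 128 - 1          # 110 input columns
layers = [(n_features, 128), (128, 28), (28, 6)]
counts = [dense_params(a, b) for a, b in layers]
print(n_features, counts, sum(counts))  # 110 [14208, 3612, 174] 17994
```

The same formula gives the age model's 18,489 parameters with 115 inputs: the age model drops OffAge from the features and adds the six OffRace indicator columns (110 - 1 + 6 = 115).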


In [None]:
model.fit(x=trainAttrX, y=trainY, validation_data=(testAttrX, testY), epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f96562ecdd0>
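The loss minimized during these epochs, categorical cross-entropy, is the average over rows of -Σ_k y_k log(p_k); with one-hot targets this reduces to minus the log of the probability assigned to the true class. A numpy sketch; the small clipping constant that avoids log(0) is an assumption here, similar in spirit to what Keras does internally:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over rows of -sum_k y_k * log(p_k) for one-hot y_true."""
    p = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(p), axis=1)))

# Two hypothetical rows, 3 classes
y_true = np.array([[0, 1, 0], [1, 0, 0]])
y_pred = np.array([[0.2, 0.7, 0.1], [0.5, 0.25, 0.25]])
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # (-log 0.7 - log 0.5) / 2
```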

In [None]:
#prediction accuracy on the test set: compare predicted and true class indices
pred=model.predict(testAttrX)
testY = np.asarray(testY)
np.mean(testY.argmax(axis=1) == pred.argmax(axis=1))


0.9070151785325968

After 10 epochs, the model predicts the offender's race with approximately 90% accuracy on both the training and validation data; on the held-out test set (which also serves as the validation data here) the accuracy is 0.907.
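The accuracy above compares the arg-max of each one-hot target row with the arg-max of each predicted probability row. When reading a figure such as 0.907, it can help to set it against the accuracy of always predicting the most frequent training class. The sketch below computes both on hypothetical arrays; `argmax_accuracy` and `majority_baseline` are helpers introduced here, not part of the original notebook.

```python
import numpy as np

def argmax_accuracy(y_onehot, probs):
    """Share of rows where the predicted class matches the true class."""
    return float(np.mean(y_onehot.argmax(axis=1) == probs.argmax(axis=1)))

def majority_baseline(train_onehot, test_onehot):
    """Accuracy of always predicting the most common training class."""
    majority = train_onehot.sum(axis=0).argmax()
    return float(np.mean(test_onehot.argmax(axis=1) == majority))

# Hypothetical one-hot targets and predicted probabilities, 3 classes
train_y = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]])
test_y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
probs = np.array([[0.8, 0.1, 0.1], [0.3, 0.6, 0.1], [0.5, 0.2, 0.3], [0.9, 0.05, 0.05]])

print(argmax_accuracy(test_y, probs), majority_baseline(train_y, test_y))  # 0.75 0.5
```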

**Let's continue the prediction by building an MLP model of the offender's age.**

In [None]:
import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/SHR76_19.csv")

#dataframe with only the categorical columns, one-hot encoded
#(OffRace is now a feature; OffAge is the target and is left out)
dfc = df[["Solved", "Month", "VicSex", "VicRace", "OffSex", "OffRace", "Weapon","Relationship","Circumstance"]]
dfc = pd.get_dummies(dfc)

#from the original dataframe, retrieve the numeric columns (OffAge excluded)
dfn=df[["FIPS", "Year", "VicAge", "VicCount", "OffCount"]]

#combine the dataframes: numeric columns first, then the one-hot columns
df2=pd.concat([dfn,dfc], axis=1)
df2

Unnamed: 0,FIPS,Year,VicAge,VicCount,OffCount,Solved_No,Solved_Yes,Month_April,Month_August,Month_December,Month_February,Month_January,Month_July,Month_June,Month_March,Month_May,Month_November,Month_October,Month_September,VicSex_Female,VicSex_Male,VicSex_Unknown,VicRace_American Indian or Alaskan Native,VicRace_Asian,VicRace_Black,VicRace_Native Hawaiian or Pacific Islander,VicRace_Unknown,VicRace_White,OffSex_Female,OffSex_Male,OffSex_Unknown,OffRace_American Indian or Alaskan Native,OffRace_Asian,OffRace_Black,OffRace_Native Hawaiian or Pacific Islander,OffRace_Unknown,OffRace_White,Weapon_Asphyxiation - includes death by gas,"Weapon_Blunt object - hammer, club, etc",Weapon_Drowning,...,Relationship_Sister,Relationship_Son,Relationship_Stepdaughter,Relationship_Stepfather,Relationship_Stepmother,Relationship_Stepson,Relationship_Stranger,Relationship_Wife,Circumstance_Abortion,Circumstance_All other manslaughter by negligence,Circumstance_All suspected felony type,Circumstance_Argument over money or property,Circumstance_Arson,Circumstance_Brawl due to influence of alcohol,Circumstance_Brawl due to influence of narcotics,Circumstance_Burglary,Circumstance_Child killed by babysitter,Circumstance_Children playing with gun,Circumstance_Circumstances undetermined,Circumstance_Felon killed by police,Circumstance_Felon killed by private citizen,Circumstance_Gambling,Circumstance_Gangland killings,Circumstance_Gun-cleaning death - other than self,Circumstance_Institutional killings,Circumstance_Juvenile gang killings,Circumstance_Larceny,Circumstance_Lovers triangle,Circumstance_Motor vehicle theft,Circumstance_Narcotic drug laws,Circumstance_Other,Circumstance_Other - not specified,Circumstance_Other arguments,Circumstance_Other negligent handling of gun,Circumstance_Other sex offense,Circumstance_Prostitution and commercialized vice,Circumstance_Rape,Circumstance_Robbery,Circumstance_Sniper attack,Circumstance_Victim shot in hunting accident
0,2020,1976,48,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,2020,1976,33,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,2020,1976,38,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,2020,1976,41,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,2020,1976,33,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
804746,56013,2015,37,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
804747,56013,2017,43,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804748,56013,2017,36,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804749,56013,2018,29,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [None]:
#sanity check: the one-hot columns start at position 5, after the five numeric columns
df2.iloc[:, 5:]

Unnamed: 0,Solved_No,Solved_Yes,Month_April,Month_August,Month_December,Month_February,Month_January,Month_July,Month_June,Month_March,Month_May,Month_November,Month_October,Month_September,VicSex_Female,VicSex_Male,VicSex_Unknown,VicRace_American Indian or Alaskan Native,VicRace_Asian,VicRace_Black,VicRace_Native Hawaiian or Pacific Islander,VicRace_Unknown,VicRace_White,OffSex_Female,OffSex_Male,OffSex_Unknown,OffRace_American Indian or Alaskan Native,OffRace_Asian,OffRace_Black,OffRace_Native Hawaiian or Pacific Islander,OffRace_Unknown,OffRace_White,Weapon_Asphyxiation - includes death by gas,"Weapon_Blunt object - hammer, club, etc",Weapon_Drowning,Weapon_Explosives,Weapon_Fire,"Weapon_Firearm, type not stated","Weapon_Handgun - pistol, revolver, etc",Weapon_Knife or cutting instrument,...,Relationship_Sister,Relationship_Son,Relationship_Stepdaughter,Relationship_Stepfather,Relationship_Stepmother,Relationship_Stepson,Relationship_Stranger,Relationship_Wife,Circumstance_Abortion,Circumstance_All other manslaughter by negligence,Circumstance_All suspected felony type,Circumstance_Argument over money or property,Circumstance_Arson,Circumstance_Brawl due to influence of alcohol,Circumstance_Brawl due to influence of narcotics,Circumstance_Burglary,Circumstance_Child killed by babysitter,Circumstance_Children playing with gun,Circumstance_Circumstances undetermined,Circumstance_Felon killed by police,Circumstance_Felon killed by private citizen,Circumstance_Gambling,Circumstance_Gangland killings,Circumstance_Gun-cleaning death - other than self,Circumstance_Institutional killings,Circumstance_Juvenile gang killings,Circumstance_Larceny,Circumstance_Lovers triangle,Circumstance_Motor vehicle theft,Circumstance_Narcotic drug laws,Circumstance_Other,Circumstance_Other - not specified,Circumstance_Other arguments,Circumstance_Other negligent handling of gun,Circumstance_Other sex offense,Circumstance_Prostitution and commercialized vice,Circumstance_Rape,Circumstance_Robbery,Circumstance_Sniper attack,Circumstance_Victim shot in hunting accident
0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
4,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
804746,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
804747,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804748,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
804749,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [None]:
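#process the crime attributes for the age model: OffAge is now the target, so only 5 numeric columns
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def process_crime_attributes(df, train, test):
  # `df` is not used; it is kept so existing calls still work

  #scale the numeric columns, fitting the scaler on the training rows only
  continuous=["FIPS", "Year", "VicAge", "VicCount", "OffCount"]
  cs = MinMaxScaler()
  trainContinuous=cs.fit_transform(train[continuous])
  testContinuous=cs.transform(test[continuous])
  
  #retrieve the categorical (one-hot) columns; assumes the first 5 columns of df2 are the numeric ones
  trainCategorical = train.iloc[:, 5:]
  testCategorical = test.iloc[:, 5:]

  #combine the categorical and numeric columns into the training and testing X
  trainX = np.hstack([trainCategorical, trainContinuous])
  testX = np.hstack([testCategorical, testContinuous])
  return trainX, testX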

In [None]:
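#y variable: OffAge, min-max scaled to [0, 1]
#note: this scaler is fit on all rows, before the train/test split

mms = MinMaxScaler()
y = df.OffAge.to_numpy().reshape(-1, 1)
y = mms.fit_transform(y)

#y=df.OffAge  # unscaled alternative
print(y)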

[[0.06806807]
 [0.04404404]
 [0.02702703]
 ...
 [0.03903904]
 [1.        ]
 [0.03003003]]
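Because OffAge is scaled before training, the network's predictions come out on the same [0, 1] scale and have to be mapped back with y = y_scaled × (max - min) + min; scikit-learn's `MinMaxScaler.inverse_transform` does this for a fitted scaler such as `mms`. The printed values are multiples of 1/999 (for example 0.06806807 ≈ 68/999), which suggests a target range (max - min) of 999; if the minimum is 0, the row scaled to exactly 1 holds an age of 999, which is not a plausible age and may be a code for unknown age, so it is worth checking in the raw data. A numpy sketch of the round trip on hypothetical ages:

```python
import numpy as np

# Hypothetical ages, including a 999 value like the one implied by the output above
ages = np.array([[68.0], [44.0], [27.0], [999.0], [0.0]])

lo, hi = ages.min(), ages.max()
scaled = (ages - lo) / (hi - lo)      # same mapping as MinMaxScaler.fit_transform (up to rounding)
restored = scaled * (hi - lo) + lo    # same mapping as MinMaxScaler.inverse_transform

print(scaled.ravel())
```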


In [None]:
#multilayer perceptron
def create_mlp(dim):
  model=Sequential()
  model.add(Dense(128, input_dim=dim, activation="relu"))
  
  model.add(Dropout(0.2))
  model.add(Dense(28, activation="relu"))

  #because this is now a regression problem, the output layer has
  #one unit with a linear activation
  model.add(Dense(1, activation="linear"))
  return model
                 

In [None]:
#splitting the data into training and testing sets

splits = train_test_split(df2, y, test_size=0.2, random_state=42)
(trainAttrX, testAttrX, trainY, testY) = splits
(trainAttrX, testAttrX) = process_crime_attributes(df2,trainAttrX, testAttrX)

In [None]:
mlp=create_mlp(trainAttrX.shape[1])
model=mlp
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
#model.compile(loss="mean_absolute_error", optimizer="adam", metrics=["mse"])
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_15 (Dense)             (None, 128)               14848     
_________________________________________________________________
dropout_5 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_16 (Dense)             (None, 28)                3612      
_________________________________________________________________
dense_17 (Dense)             (None, 1)                 29        
Total params: 18,489
Trainable params: 18,489
Non-trainable params: 0
_________________________________________________________________


In [None]:
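#experiment: non-scaled y, mean squared error
#note: each fit cell below uses whatever trainY/testY and compiled loss are current, and calling
#fit() again continues training the same model from its current weights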
model.fit(x=trainAttrX, y=trainY, validation_data=(testAttrX, testY), epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f9653986410>

In [None]:
#experiment: non-scaled y, mean absolute error
model.fit(x=trainAttrX, y=trainY, validation_data=(testAttrX, testY), epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f96599fb110>

In [None]:
#experiment: scaled y, mean squared error
model.fit(x=trainAttrX, y=trainY, validation_data=(testAttrX, testY), epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f9656aae810>

In [None]:
#scaled y, mean absolute error
model.fit(x=trainAttrX, y=trainY, validation_data=(testAttrX, testY), epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f965304b250>
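The four runs above mix scaled and unscaled targets with MSE and MAE losses, so their loss values are not on a common scale: min-max scaling divides every error by the range (max - min), which divides MAE by the range and MSE by the range squared. A numpy check on hypothetical values, assuming a range of 0 to 999:

```python
import numpy as np

def mse(y, p):
    return float(np.mean((y - p) ** 2))

def mae(y, p):
    return float(np.mean(np.abs(y - p)))

y = np.array([20.0, 35.0, 50.0, 999.0])   # hypothetical true ages
p = np.array([25.0, 30.0, 45.0, 40.0])    # hypothetical predictions
lo, hi = 0.0, 999.0                        # assumed scaler range
width = hi - lo

y_s, p_s = (y - lo) / width, (p - lo) / width   # min-max scaled versions

print(mse(y, p), mse(y_s, p_s) * width ** 2)    # equal: MSE scales with width**2
print(mae(y, p), mae(y_s, p_s) * width)         # equal: MAE scales with width
```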

After 10 epochs, the age model has a mean squared error of approximately 0.03 on both the training and validation data. Since the age target was MinMax-scaled to [0, 1], this value is in scaled units rather than years.
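The 0.03 figure can be translated into years. Assuming it is the MSE on the MinMax-scaled age, and that the target range (max - min) is 999 as the printed scaled values in the target-scaling cell suggest, multiplying by the squared range undoes the scaling, and the square root gives a root mean squared error in years:

$$
\mathrm{RMSE}_{\text{years}} = (y_{\max} - y_{\min})\sqrt{\mathrm{MSE}_{\text{scaled}}} \approx 999 \times \sqrt{0.03} \approx 999 \times 0.173 \approx 173
$$

An error of roughly 173 years is far larger than typical offender ages, which hints that extreme target values (such as 999, if it is a placeholder) dominate the loss; an error measure on the original ages is a useful companion to the scaled MSE.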

In conclusion, the three offender characteristics of sex, race and age were modelled using multilayer perceptron neural networks. Varying the dense-layer parameters and the dropout layer did not noticeably change the results. Over successive epochs of model fitting, the accuracy and mean squared error remained stable, with minimal fluctuations. Since the quality measures on the training and testing sets are similar, there is no indication of overfitting. The three models yield high accuracy and low mean squared error, suggesting that with this dataset the murderer's profile can be predicted using an MLP.