## Homicide Reports, 1980-2014 Data Visualization


Importing pandas and the Plotly libraries

In [None]:
import pandas as pd #pandas for loading, cleaning, and preprocessing the data
import plotly.express as px #Plotly Express for quick visualizations
import plotly.graph_objects as go #graph objects for lower-level figure control
import plotly.figure_factory as ff #figure factory

Importing the data and checking it for issues

https://www.kaggle.com/datasets/murderaccountability/homicide-reports

In [None]:
df = pd.read_csv('database.csv') #read the CSV file into a DataFrame and assign it to df
pd.set_option('display.max_columns', None) #show all columns instead of truncating the display
df.head(5) #display the first 5 rows of df

In [None]:
df.shape #checking the number of rows and columns

In [None]:
df.columns #Double check all the columns

In [None]:
unique_values = df.apply(lambda x: x.unique()) # Checking the unique values for each column
print(unique_values)

## Data Preprocessing

Changing the Data Type of 'Perpetrator Age' from Object to Numeric Values


In [None]:
df.dtypes #checking the types: 'Perpetrator Age' is object and needs to be converted to a numeric type

In [None]:
#astype could not convert 'Perpetrator Age' from object to int, so pd.to_numeric is used instead.
df['Perpetrator Age'] = pd.to_numeric(df['Perpetrator Age'], errors='coerce') #values that cannot be parsed become NaN
df['Victim Age'] = df['Victim Age'].astype(float) #both age columns are now float
df.dtypes
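
The failure of `astype` and the effect of `errors='coerce'` can be reproduced on a small series. This is a minimal sketch: the sample values, including the blank string standing in for a missing age, are assumptions for illustration and are not taken from `database.csv`.

```python
import pandas as pd

# Hypothetical sample: ages stored as text, with one blank entry
ages = pd.Series(['27', '15', ' ', '42'])

# astype(int) raises as soon as it meets a value it cannot parse
try:
    ages.astype(int)
    astype_failed = False
except ValueError:
    astype_failed = True

# errors='coerce' turns unparseable values into NaN instead of raising
numeric_ages = pd.to_numeric(ages, errors='coerce')

print(astype_failed)
print(numeric_ages)
```

Because NaN is a float, the coerced column ends up as float rather than int, which is why both age columns are float afterwards.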

Checking if there are Null Values in the Data Set & Correcting it

In [None]:
df.isnull().sum().sum() 

# one null value

In [None]:
#locating where the null value is

null_columns = df.columns[df.isnull().any()]
print(null_columns)
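
To see how many values are missing in each column, and which rows they sit in, the same `isnull` mask can be summed per column and used as a row filter. The sketch below uses a small made-up frame; the column names mirror the dataset but the values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a single missing perpetrator age
sample = pd.DataFrame({
    'Victim Age': [30.0, 22.0, 45.0],
    'Perpetrator Age': [27.0, np.nan, 0.0],
})

# Null count per column, keeping only columns that have at least one null
null_counts = sample.isnull().sum()
null_counts = null_counts[null_counts > 0]

# Rows that contain at least one null value
rows_with_nulls = sample[sample.isnull().any(axis=1)]

print(null_counts)
print(rows_with_nulls)
```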

In [None]:
#removing null values

df = df.dropna()


In [None]:
df.isnull().sum().sum()  

#Double checking if null value == 0


Identifying Incorrect Values in 'Victim Age' and Removing the Affected Rows

In [None]:
#'Victim Age' contains implausible values.
counts = df.groupby('Victim Age').size()
counts #99 occurs 9281 times and 998 occurs 974 times. Both values are irregular.

In [None]:
df = df.loc[~df['Victim Age'].isin([998, 99])]

In [None]:
counts = df.groupby('Victim Age').size()
counts #checking if the data rows have been removed

Dropping Columns Which are Irrelevant in The Data Frame


In [None]:
#removing unwanted columns

df.drop(['Agency Code', 'Record ID', 'Record Source'], axis = 1, inplace=True)
df.head()

Arranging Data Frame Attributes Correctly

In [None]:
# indexing the columns for easy analyzing based on my preference

df = df.reindex(columns=['Year', 'Month', 'City', 'Agency Type', 'Agency Name', 'Crime Type', 'Crime Solved', 'Victim Sex', 'Victim Age', 'Victim Race', 'Victim Ethnicity', 'Victim Count', 'Perpetrator Sex', 'Perpetrator Age', 'Perpetrator Race', 'Perpetrator Ethnicity', 'Perpetrator Count', 'Weapon', 'Relationship'])
df.head()

New Measurement to Calculate the Age Difference between 'Victim Age' and 'Perpetrator Age'


In [None]:
#creating a new column for the victim and perpetrator age difference

def age_difference(row):
    return row['Victim Age'] - row['Perpetrator Age']

df['age_difference'] = df.apply(age_difference, axis=1)

df.head()

In [None]:
df['age_difference'] = df['age_difference'].apply(abs) #making the values absolute
df.head()

In [None]:
df['age_difference'] = df['age_difference'].round().astype(int) #Drop the decimals while keeping the column numeric, so it still plots on a numeric axis
df.head()

In [None]:
# Rename the column 'age_difference' to 'Age Difference'. So the final data frame looks clean

df = df.rename(columns={'age_difference': 'Age Difference'})
df.head()
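
The row-wise steps above can also be written as a single vectorized expression, which is usually much faster on a large frame. This is a sketch on made-up ages, assuming the null rows have already been dropped so that the integer cast is safe.

```python
import pandas as pd

# Hypothetical sample ages (no missing values)
sample = pd.DataFrame({
    'Victim Age': [30.0, 22.0, 45.0],
    'Perpetrator Age': [27.0, 40.0, 45.0],
})

# Subtract the columns directly, take the absolute value, and store whole years
sample['Age Difference'] = (sample['Victim Age'] - sample['Perpetrator Age']).abs().astype(int)

print(sample['Age Difference'].tolist())  # [3, 18, 0]
```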

## Histograms of Victim Age and Perpetrator Age

Visualizing victim age and perpetrator age against the number of murders to see whether the distributions differ.

In [None]:
fig = px.histogram(df,
                   x='Victim Age',
                   nbins=10,          # Upper limit of 10 bins, aiming for roughly 10-year age groups.
                   color = 'Victim Sex', 
                   title='Victim Murder Count by Age Group', 
                   template="plotly_dark")
# Update the y-axis label
fig.update_layout(yaxis_title="Number of Murders")
fig.show()

Roughly 175,000 victims are male and aged 20 to 29.
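
Plotly treats `nbins` as an upper bound and chooses the final bin width itself, so the bars are not guaranteed to line up with exact decades. One way to cross-check a figure such as the 20 to 29 count is to bin the ages explicitly with `pd.cut`. The sketch below uses a handful of made-up ages; the `age_group` labels are an assumption introduced for illustration.

```python
import pandas as pd

# Hypothetical sample of victim ages
ages = pd.Series([5, 19, 20, 24, 29, 30, 61])

# Explicit 10-year bins: [0, 10), [10, 20), ..., [90, 100)
bins = list(range(0, 101, 10))
labels = [f'{low}-{low + 9}' for low in bins[:-1]]

# right=False makes each bin include its lower edge and exclude its upper edge
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)
counts = age_group.value_counts().sort_index()

print(counts)
```

On the real data, the same `age_group` column could be combined with `'Victim Sex'` in a `groupby` to get the count per sex and decade.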

In [None]:
# Histogram on Murder count by age group for perpetrators
fig = px.histogram(df,
                   x='Perpetrator Age', 
                   nbins=10,  #upper limit of 10 bins, aiming for roughly 10-year age groups
                   color = 'Perpetrator Sex', 
                   title='Perpetrator Murder Count by Age Group', 
                   template="plotly_dark")
#adding annotation
annotation = {'x': 5, 'y': 150000, 'showarrow': True, 'arrowhead': 4, 'font': {'color': 'white'}, 'text': 'Over 185,000 Unknown Perpetrators'} 
fig.update_layout({'annotations': [annotation]})
fig.update_layout(yaxis_title="Number of Murders") #update y axis title
fig.show()

Over 184,000 murders were committed by unknown perpetrators. These killers have not been identified, so they may or may not have been prosecuted.

## Sunburst Chart on Victim and Perpetrator Race and Weapon Type
Identifying patterns in killings between white and black victims and perpetrators, including the weapons most commonly used by perpetrators. (Sunburst Chart)

In [None]:
labels = ['Victim Race', 'Perpetrator Race', 'Weapon'] #creating labels list
fig = px.sunburst(df, 
                  path=labels, 
                  values='Victim Count',
                  title='Examining the Impact of Race on Victims and Perpetrator Race in Violent Incidents Involving Weapon Use', 
                  template="plotly_dark")
fig.update_traces(textinfo='label+value', insidetextorientation='radial')
fig.show()

Of the victims, 64% are white, and of the perpetrators, 67% are white. Most of the killings were committed using handguns.
Of the 25742 black victims, 15450 were killed by perpetrators using handguns.

Based on the above visualization, the crimes committed do not appear to be racially motivated. However, handguns are used to commit most of the crimes.

Visualizing the deaths of victims by year and month to see if there is a pattern in the months when killings occur.


In [None]:
fig = px.histogram(data_frame=df, x='Year', y = 'Victim Count', color='Month', nbins = 20, template="plotly_dark")
fig.update_layout(bargap=0.1) # creating a gap between bins
high_victims_april = {'x': 1995, 'y': 2000, 'showarrow': True, 'arrowhead': 4, 'font': {'color': 'white'}, 'text': 'High Victim Counts'}
fig.update_layout({'annotations': [high_victims_april]}) #adding annotation
fig.update_layout(yaxis_title="Victim Count") #update y axis title
fig.show()

There is a spike in deaths in April 1995. The next cells investigate the reason.


In [None]:
#create a new data frame containing only the rows where Year is 1995 and Month is April
df_1995_april = df[(df['Year'] == 1995) & (df['Month'] == 'April')]
df_1995_april.head()


Scatter plot to see which weapon type caused the spike during this period


In [None]:
fig = px.scatter(df_1995_april, x='Victim Age', y= 'Perpetrator Age', color='Weapon', size='Victim Count', template="plotly_dark")
fig.show()
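
As a numerical cross-check on the scatter plot, the victim counts for the filtered month can be summed per weapon. This is a sketch on a small made-up frame: the column names follow the dataset, but every value is an assumption.

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({
    'Year': [1995, 1995, 1995, 1994],
    'Month': ['April', 'April', 'May', 'April'],
    'Weapon': ['Explosives', 'Handgun', 'Handgun', 'Knife'],
    'Victim Count': [10, 1, 2, 1],
})

# Same filter as in the notebook: April 1995 only
subset = sample[(sample['Year'] == 1995) & (sample['Month'] == 'April')]

# Total victim count per weapon, largest first
by_weapon = subset.groupby('Weapon')['Victim Count'].sum().sort_values(ascending=False)

print(by_weapon)
```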

In [None]:
df_1995_april_explosives = df_1995_april[(df_1995_april['Weapon'] == 'Explosives')] #keeping only the rows where the weapon is explosives
df_1995_april_explosives.head()

A large number of victims were killed by explosives. The perpetrator who carried out the explosion is recorded as a 27-year-old white male.

Further investigation with a Google search shows that this was the Oklahoma City bombing.

https://www.fbi.gov/history/famous-cases/oklahoma-city-bombing

In [None]:
fig = px.scatter(df, x='Victim Count', y= 'Perpetrator Count', color='Weapon', size='Victim Count', hover_data=['City'], template="plotly_dark")
high_victims = {'x': 10, 'y': 2, 'showarrow': True, 'arrowhead': 4, 'font': {'color': 'white'}, 'text': 'High Victim Counts due to Explosions'}
fig.update_layout({'annotations': [high_victims]})
fig.show()

Box Plots of Perpetrator Age and Victim Age by Weapon Type

In [None]:
fig = px.box(data_frame=df, y='Perpetrator Age', color='Weapon', title='Perpetrator Age against Weapon Type',template="plotly_dark") #creating a box plot for perpetrator age
fig.show()

In [None]:
fig = px.box(data_frame=df, y='Victim Age', color='Weapon', title='Victim Age against Weapon Type', template="plotly_dark") #creating a box plot for victim age
fig.show()

Crime Solved Against Victim Race (Bar Plot)



In [None]:
fig = px.histogram(data_frame=df, x='Victim Race', color='Crime Solved', title='Crimes Solved by Victim Race', template="plotly_dark") #creating a histogram of solved and unsolved cases by race
fig.show()

There is a high number of unsolved cases with Black victims, followed by unsolved cases with white victims. Is this due to a racial reason?
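
Raw counts depend on how many victims each group has, so comparing the share of unsolved cases within each group may be more informative. The sketch below uses `pd.crosstab` with `normalize='index'` on a small made-up frame; the values are assumptions and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({
    'Victim Race': ['White', 'White', 'White', 'White', 'Black', 'Black'],
    'Crime Solved': ['Yes', 'Yes', 'Yes', 'No', 'Yes', 'No'],
})

# normalize='index' converts each row of counts into proportions that sum to 1
rates = pd.crosstab(sample['Victim Race'], sample['Crime Solved'], normalize='index')

print(rates)
```

The `'No'` column then gives the unsolved rate per victim race, which can be compared directly across groups of different sizes.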

In [None]:
fig = px.scatter(df, x='Victim Age', y= 'Perpetrator Age', color='Crime Solved', template = 'plotly_dark')
layout = go.Layout(
# Add a square shape around the area of the chart for high unsolved cases
    shapes=[
        {
            'type': 'rect',
            'x0': 14,
            'y0': 9,
            'x1': 32,
            'y1': 37,
            'line': {
                'color': 'white',
                'width': 3
            }
        }
    ]
)
fig.update_layout(layout)
fig.show()

There is a high number of unsolved cases where the victims are between 16 and 30 and the perpetrators are between 18 and 30 years of age.

In [None]:
df_victims_16_30 = df.loc[(df['Victim Age'] >= 16) & (df['Victim Age'] <= 30) & (df['Crime Solved'] == 'No')] # creating a data frame of unsolved cases with a victim age between 16 and 30
df_victims_16_30.head()

Creating a heatmap to visualize which weapon types are associated with a high number of unsolved crimes. Does the weapon type play a role in whether a crime goes unsolved?

In [None]:
# heat map for the subset data frame to visualize the type of weapon across the years.
fig = px.density_heatmap(df_victims_16_30, 
                         x="Year", 
                         y="Weapon",
                         marginal_x="histogram", 
                         marginal_y="histogram", 
                         title='Heat Map of Weapon type used through the Years', 
                         template="plotly_dark")
fig.show()

There is a high number of murders in 1993 involving handguns.


In [None]:
fig = px.density_heatmap(df_victims_16_30, 
                         x="Perpetrator Sex", 
                         y="Weapon", 
                         facet_row="Crime Solved", 
                         title='Weapon Type vs Sex of Perpetrator', 
                         template="plotly_dark")
fig.show()

In [None]:
fig = px.density_heatmap(df_victims_16_30, 
                         x="Perpetrator Sex", 
                         y="Weapon", 
                         facet_row="Victim Sex", 
                         facet_col="Month", 
                         template="plotly_dark")
fig.show()

## Checking Why These Murders Are Unsolved

In [None]:
victim_trace = go.Histogram(x=df['Victim Age'], name='Victim Ages')

# Trace for the perpetrator ages
perpetrator_trace = go.Histogram(x=df['Perpetrator Age'], name='Perpetrator Ages')

# figure with both histogram traces
fig = go.Figure(data=[victim_trace, perpetrator_trace],
                layout=go.Layout(barmode='stack'))
#adding annotation
high_perpetrator_age = {'x': 0, 'y': 210000, 'showarrow': True, 'arrowhead': 4, 'font': {'color': 'white'}, 'text': 'High Perpetrator Age Count'}
fig.update_layout({'annotations': [high_perpetrator_age]})
fig.update_layout(template='plotly_dark')


fig.show()

There is a large spike at a perpetrator age of 0. The count is very high and there is no obvious reason why.

In [None]:
df_Perperater_0 = df.loc[df['Perpetrator Age'] == 0]
df_Perperater_0.head()
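
One way to probe the age-0 rows is to compare how often they are unsolved against the rest of the data. This is only a sketch on made-up values: it assumes nothing about what an age of 0 means in the real dataset, and the `age_is_zero` labels are introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({
    'Perpetrator Age': [0, 0, 0, 27, 35],
    'Crime Solved': ['No', 'No', 'Yes', 'Yes', 'Yes'],
})

# Label each row by whether the recorded perpetrator age is 0
age_is_zero = sample['Perpetrator Age'].eq(0).map({True: 'age 0', False: 'age > 0'})

# Count solved and unsolved cases in each group
solved_by_group = pd.crosstab(age_is_zero, sample['Crime Solved'])

print(solved_by_group)
```

If most age-0 rows in the real data turn out to be unsolved, that would suggest the value marks an unknown perpetrator rather than a real age, but that has to be checked against the data.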

In [None]:
fig = px.sunburst(df_Perperater_0, path=['Perpetrator Race', 'Perpetrator Ethnicity'], values = 'Perpetrator Count', template="plotly_dark")
fig.show()

In [None]:
df_filtered = df[df['Relationship'].isin(['Wife', 'Brother', 'Husband', 'Sister', 'Family','Father', 'Son', 'Mother','Daughter'])]
fig = px.scatter(df_filtered, x='Victim Age', 
                 y= 'Perpetrator Age', 
                 color='Relationship', 
                 size = 'Victim Count', 
                 template="plotly_dark",
                 title = 'Relationship among Immediate family Members between 1980 to 2014',
                 animation_frame = 'Year',
                 log_x=True,
                 log_y=True,
                 size_max=100,
                 )
fig.show()

## Checking if there is a correlation among the attributes

In [None]:
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Assuming 'df' is your DataFrame
# Select only numeric columns for correlation
numeric_df = df.select_dtypes(include=[np.number])

# Create a correlation matrix
correlation = numeric_df.corr(method='pearson')

# Set up the correlation plot
fig = go.Figure(go.Heatmap(
    z=correlation.values,
    x=correlation.columns,
    y=correlation.columns,
    colorscale='rainbow', 
    zmin=-1, zmax=1))

fig.update_layout(template='plotly_dark')

# Show the plot
fig.show()
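
The heatmap only covers columns that pandas considers numeric, so text columns such as the weapon are left out of the correlation. The sketch below shows this on a small made-up frame; the values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two numeric columns and one text column
sample = pd.DataFrame({
    'Victim Age': [20, 30, 40, 50],
    'Perpetrator Age': [22, 31, 43, 49],
    'Weapon': ['Handgun', 'Knife', 'Handgun', 'Rifle'],
})

# Keep only numeric columns, then compute the Pearson correlation matrix
numeric_sample = sample.select_dtypes(include=[np.number])
correlation = numeric_sample.corr(method='pearson')

print(correlation)
```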


Creating a scatter plot based on the correlation shown above

In [None]:
fig = px.scatter(data_frame=df,x='Victim Age',y='Perpetrator Age',color='Crime Solved', template="plotly_dark")
fig.show()


Based on the above visualization, many unsolved cases fall in the 16 to 30 age group.

In [None]:
fig = px.scatter(data_frame=df,x='Victim Count',y='Perpetrator Count',color='Crime Solved', template="plotly_dark")
fig.show()


Identifying trends in the weapon types used by perpetrators

In [None]:
df_filtered = df[df['Weapon'].isin(['Poison', 'Drowning'])]
fig = px.scatter(df_filtered, 
                 x='Victim Age', 
                 y= 'Perpetrator Age', 
                 color='Year', 
                 size='Victim Count', 
                 animation_frame='Weapon', 
                 hover_data=['Relationship', 'Month'], 
                 title='Trends in Poison and Drowning Homicides Over the Years')
fig.update_layout(template='plotly_dark')
fig.show()

In [None]:
fig = px.scatter(df, x='Perpetrator Age',
                 y='Age Difference', 
                 color='Crime Solved', 
                 size='Victim Count', 
                 template="plotly_dark",
                 title = 'Assessing the Age Gap Between Perpetrators and Victims in Homicide Cases')

# px.scatter creates one trace per 'Crime Solved' value, so the visibility masks are built from the trace names
show_all = [True] * len(fig.data)
show_yes = [trace.name == 'Yes' for trace in fig.data]
show_no = [trace.name == 'No' for trace in fig.data]

# Add buttons to filter data by whether the crime was solved or not
fig.update_layout(updatemenus=[dict(buttons=list([
    dict(label='All',
         method='update',
         args=[{'visible': show_all},
               {'title': 'Perpetrator Age vs Age Difference'}]),
    dict(label='Yes',
         method='update',
         args=[{'visible': show_yes},
               {'title': 'Perpetrator Age vs Age Difference (Crime Solved = Yes)'}]),
    dict(label='No',
         method='update',
         args=[{'visible': show_no},
               {'title': 'Perpetrator Age vs Age Difference (Crime Solved = No)'}])
]))])
fig.show()

The chart shows no known perpetrators between the ages of 58 and 98.
