# Analysis of Crimes Reported in Chicago, New York and the United States

# Author: Brijesh Taunk

# Primary dataset summary

The dataset describes crimes reported in the City of Chicago from 2001 to the present. Its columns record the date of the crime, the primary type (for example theft, assault, or robbery), a description of the crime (for example pocket-picking or retail theft), a description of the location where the crime took place (such as a residence or a bus), and whether an arrest was made. Location information includes the district, ward, X and Y coordinates, latitude, and longitude, and a separate column gives the year in which the crime took place. Identity columns include the case number and ID, and another column indicates whether the crime was domestic. The dataset is updated regularly, and one column records when a particular case was last updated. The "Beat" column identifies the area that a particular police officer patrols at that time. In total the dataset has 22 columns, and the data is in CSV format.

Next, we perform the operations needed on the dataset to build an interactive dashboard.

In [1]:
#importing necessary libraries
import ipywidgets
import matplotlib.pyplot as plt
import matplotlib
import bqplot
%matplotlib inline
import bqplot.pyplot
import numpy as np
import pandas as pd
from PIL import Image

In [None]:
# loading the data directly from the City of Chicago data portal URL
chicago_crime = pd.read_csv('https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD')

In [None]:
# displaying the dataset
chicago_crime

In [None]:
# Cleaning the data by dropping rows with null values;
# .copy() avoids a SettingWithCopyWarning when columns are modified later
chicago_crime_new = chicago_crime.dropna().copy()
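
Calling `dropna()` with no arguments removes a row if any of its columns is missing, including columns the dashboard never uses. The sketch below, on a small hypothetical frame (the `toy` data and its values are made up for illustration), shows one way to measure how many rows are lost and how the `subset` argument limits the check to chosen columns. It is an optional variation, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Chicago crime table
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT"],
    "Location Description": ["STREET", None, "RESIDENCE", "STREET"],
    "Latitude": [41.88, 41.90, np.nan, 41.75],
})

# Drop rows with a missing value in any column (what the cell above does)
dropped_any = toy.dropna()

# Drop rows only when a column used by the analysis is missing
dropped_subset = toy.dropna(subset=["Primary Type", "Location Description"])

rows_lost_any = len(toy) - len(dropped_any)
rows_lost_subset = len(toy) - len(dropped_subset)
print(rows_lost_any, rows_lost_subset)
```

Comparing the two counts gives a quick sense of how much data the stricter cleaning step discards.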

In [None]:
# Using NumPy to replace the True and False values in the Arrest column with 'Y' and 'N'
chicago_crime_new['Arrest'] = np.where(chicago_crime_new['Arrest'], 'Y', 'N')
chicago_crime_new

In [None]:
# creating the dataframe using a pivot table in pandas: rows are crime types, columns are locations, cells are case counts
pivot_data = pd.pivot_table(chicago_crime_new, index = 'Primary Type', columns = 'Location Description', values = 'ID', aggfunc = 'count', fill_value=0)
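
To make the shape of this pivot table concrete, here is a minimal sketch on a hypothetical five-row frame (the `toy` data is invented for illustration). It uses the same `pd.pivot_table` arguments as the cell above, so each cell holds the number of case IDs for one crime type at one location, with `fill_value=0` filling combinations that never occur.

```python
import pandas as pd

# Hypothetical toy data with the three columns the pivot table uses
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Primary Type": ["THEFT", "THEFT", "BATTERY", "THEFT", "BATTERY"],
    "Location Description": ["STREET", "STREET", "RESIDENCE",
                             "RESIDENCE", "RESIDENCE"],
})

# Count case IDs per (crime type, location); missing combinations become 0
toy_pivot = pd.pivot_table(toy, index="Primary Type",
                           columns="Location Description",
                           values="ID", aggfunc="count", fill_value=0)
print(toy_pivot)
```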

In [None]:
# displaying the new dataframe
pivot_data

In [None]:
# defining the scales, setting them as ordinal and linear
x_scale = bqplot.OrdinalScale()
y_scale = bqplot.LinearScale()

# setting the axes, x and y and assigning labels and scales
x_axis = bqplot.Axis(label='Arrest Made or Not', scale=x_scale)
y_axis = bqplot.Axis(label='Count', scale=y_scale, 
                   orientation = 'vertical')

# creating the bar mark directly with bqplot.Bars and assigning scales
# (bqplot.pyplot.bar would also attach the mark to a separate pyplot figure)
plot = bqplot.Bars(x = ['Y', 'N'], 
                   y = [0, 0],
                   scales={'x':x_scale, 'y':y_scale})

graph = bqplot.Figure(marks=[plot], axes=[x_axis,y_axis])

In [None]:
# defining the color scale for the heatmap, using NumPy nanmin and nanmax on the pivot values for its range
col_sc = bqplot.ColorScale(scheme='Reds', min=float(np.nanmin(pivot_data.values)), max=float(np.nanmax(pivot_data.values)))
x_sc = bqplot.OrdinalScale()
y_sc = bqplot.OrdinalScale() 

# defining the axes for our heatmap and assigning the labels and scales
# (heatmap columns are locations, so they go on the x axis; rows are crime types, on the y axis)
col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
x_ax = bqplot.Axis(scale=x_sc, label='Location Description')
y_ax = bqplot.Axis(scale=y_sc, orientation='vertical', label='Primary Type')

In [None]:
# Creating the heat map   
heat_map = bqplot.GridHeatMap(color=pivot_data.values, 
                             row = pivot_data.index, column = pivot_data.columns, 
                             scales={'color':col_sc, 'row':y_sc, 'column':x_sc}, 
                             interactions={'click':'select'}, 
                             anchor_style={'fill':'blue'})

In [None]:
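# x is the selected crime type and y is the selected location description;
# the function returns the number of cases per Arrest value for that combination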
def barplot(x, y):
    print('X: ' + x)
    print('Y: ' + y)
    df = chicago_crime_new[(chicago_crime_new['Primary Type'] == x) & (chicago_crime_new['Location Description'] == y)]
    # Counting the groupby of the Arrest column data with case ID
    df = df.groupby('Arrest')['ID'].count()
    return df
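
One detail of this grouping is that a combination with only arrests, or only non-arrests, yields a single row, so the bar chart would then receive one bar instead of two. The sketch below shows one possible way to always return both categories, using `reindex` with `fill_value=0`. The helper name `arrest_counts` and the `toy` data are hypothetical and are not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data with the columns the function relies on
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Primary Type": ["THEFT", "THEFT", "THEFT", "BATTERY"],
    "Location Description": ["STREET", "STREET", "STREET", "RESIDENCE"],
    "Arrest": ["Y", "N", "N", "N"],
})

def arrest_counts(data, crime_type, location):
    """Count cases per Arrest value, always returning both 'Y' and 'N'."""
    mask = ((data["Primary Type"] == crime_type)
            & (data["Location Description"] == location))
    counts = data[mask].groupby("Arrest")["ID"].count()
    # Add any missing category with a count of zero, in a fixed order
    return counts.reindex(["Y", "N"], fill_value=0)

theft_street = arrest_counts(toy, "THEFT", "STREET")
battery_home = arrest_counts(toy, "BATTERY", "RESIDENCE")
print(theft_street.to_list(), battery_home.to_list())
```

With a fixed order of categories, the bars keep the same positions no matter which cell is selected.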

# Interactive Visualization using Primary dataset

In [None]:
# creating a label that shows the value of the cell selected by the user
mySelectedLabel = ipywidgets.Label()

def on_select(change):
    # act only when exactly one cell is selected
    selected = change['owner'].selected
    if selected is not None and len(selected) == 1:
        i, j = selected[0]
        v = pivot_data.values[i, j]
        # showing the number of cases in the selected cell
        mySelectedLabel.value = 'Cases in selected cell: ' + str(v)
        dataset_latest = barplot(pivot_data.index[i], pivot_data.columns[j])
        print(dataset_latest)
        # updating the bar plot with the arrest counts for the selected cell
        plot.x = dataset_latest.index.to_list()
        plot.y = dataset_latest.to_list()

heat_map.observe(on_select, 'selected')

heatmap_figure = bqplot.Figure(marks=[heat_map], axes=[col_ax, x_ax, y_ax], fig_margin={'top':0, 'bottom':100, 'left':250, 'right':100})

heatmap_figure.layout.min_width = '1200px'
graph.layout.min_width = '1200px'
myDashboard1 = ipywidgets.HBox([heatmap_figure, graph])
myDashboard = ipywidgets.VBox([mySelectedLabel, myDashboard1])
myDashboard

The interactive dashboard above presents a heat map built from two columns of the dataset: 'Primary Type', the type of crime committed, and 'Location Description', the category of place where the crime was committed. Each cell shows the number of cases for one combination of the two. Clicking a cell updates the bar plot on the right, which shows how many of those cases led to an arrest and how many did not. The bar plot uses a third column, 'Arrest', recoded as 'Y' or 'N' for each case. It has Count on the Y axis and the arrest categories 'Y' and 'N' on the X axis.
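
The step from a clicked cell to its labels can be sketched without any widgets. The example below assumes, as in the callback above, that the selection arrives as a list of `[row, column]` index pairs. The helper `describe_selection` and the `toy_pivot` values are hypothetical, used only to illustrate the lookup.

```python
import pandas as pd

# Hypothetical pivot table: rows are crime types, columns are locations
toy_pivot = pd.DataFrame(
    [[2, 0], [1, 2]],
    index=pd.Index(["BATTERY", "THEFT"], name="Primary Type"),
    columns=pd.Index(["RESIDENCE", "STREET"], name="Location Description"),
)

def describe_selection(pivot, selected):
    """Return (crime type, location, count) for one selected cell, else None."""
    if selected is None or len(selected) != 1:
        return None
    i, j = selected[0]
    return pivot.index[i], pivot.columns[j], int(pivot.values[i, j])

single = describe_selection(toy_pivot, [[1, 1]])
multiple = describe_selection(toy_pivot, [[0, 0], [1, 1]])
print(single, multiple)
```

Returning `None` for an empty or multi-cell selection mirrors the single-cell constraint in the callback.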

# Second Visualization using Primary dataset

In [None]:
#Using GroupBy and count on Primary Type column
crime_type = chicago_crime_new.groupby('Primary Type')['Primary Type'].count()

In [None]:
crime_type

In [None]:
crime_type.plot.bar(figsize = (25, 15), xlabel= "Primary Type", ylabel = "Count", title = "Different types of Crime committed")

The bar chart above shows the crimes that occurred in Chicago from 2001 to 2021, by crime type and by the number of times each was committed. Counts are shown in millions. The X axis shows the crime type and the Y axis shows the number of times that crime was committed. 'Theft' was the most committed crime in the city, and 'Battery', a type of assault, was the second most committed. 'Criminal damage' and 'Narcotics' also occur in large numbers. By contrast, 'Kidnapping', 'Liquor law violation', 'Intimidation' and a few other crimes have very few cases compared with 'Theft' and 'Assault'.
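
Rankings such as "most committed" and "second most committed" are easier to read when the counts are sorted. As a hedged sketch on hypothetical data (the `toy_types` series is invented for illustration), `value_counts` returns counts in descending order, and dividing by the total gives each type's share of all cases.

```python
import pandas as pd

# Hypothetical toy series standing in for the 'Primary Type' column
toy_types = pd.Series(
    ["THEFT", "BATTERY", "THEFT", "NARCOTICS", "THEFT", "BATTERY"],
    name="Primary Type",
)

# Counts per crime type, sorted from most to least frequent
ranked = toy_types.value_counts()

# Share of all cases for each crime type, in percent
share = (ranked / ranked.sum() * 100).round(1)
print(ranked)
print(share)
```

Plotting `ranked` instead of the alphabetically ordered counts would put the most frequent crime types first.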

# Contextual Visualization 1

In [None]:
viz1 = Image.open("viz1.png") 

In [None]:
viz1

The visualization above shows the rates of different types of crime in the United States from 1991 to 2017, expressed as the number of cases per 100,000 people. It gives national context for my primary dataset, which covers crime cases in Chicago. The line charts cover the robbery rate, burglary rate, aggravated assault rate, violent crime rate, larceny-theft rate, motor vehicle theft rate, property crime rate, and murder rate. Every rate has fallen by more than half since 1991. The property crime rate is much higher than the other rates. As in our analysis of the Chicago dataset, theft and assault rates are comparatively high, which suggests that theft and assault account for a large share of crimes in the US.
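
A rate per 100,000 people is the number of cases divided by the population and then scaled by 100,000. The sketch below shows that arithmetic. The function name `rate_per_100k` and the figures passed to it are hypothetical and are not taken from the visualization.

```python
def rate_per_100k(cases, population):
    """Convert a raw case count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole numbers
    return cases * 100_000 / population

# Hypothetical figures: 1,500 cases in a population of 300,000
example_rate = rate_per_100k(cases=1_500, population=300_000)
print(example_rate)
```

Expressing counts as rates makes places with very different populations, such as a single city and the whole country, comparable.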

# Contextual Visualization 2

In [None]:
viz2 = Image.open("viz2.png")

In [None]:
viz2

The bar chart above shows crime cases in New York City from 2006 to 2019. I chose this visualization to compare New York City's crime pattern with Chicago's. The X axis shows the number of cases and the Y axis shows the types of crime reported in the city. Larceny is the most committed crime in New York, and in my analysis of the Chicago data, theft was likewise the most frequently committed crime. Assault and drug offenses are the other most reported crimes in New York, and the same pattern appeared in my Chicago analysis. We can conclude that there are major similarities in the types of crimes committed in Chicago and New York City.

References: My Final Project Part 2. 

Link to dataset: data.cityofchicago.org. (2022, May 5). Crimes - 2001 to Present. Retrieved May 6, 2022, from https://catalog.data.gov/dataset/crimes-2001-to-present 

Citation of my first contextual visualization: Routley, N. (2019, March 12). The Crime Rate Perception Gap. Visual Capitalist. Retrieved May 6, 2022, from https://www.visualcapitalist.com/crime-rate-perception-gap/ 

Citation of my second contextual visualization: Mendes, B. (2021, February 27). Analysis of NYC reported crime data using pandas. Medium. Retrieved May 6, 2022, from https://towardsdatascience.com/analysis-of-nyc-reported-crime-data-using-pandas-821753cd7e22 