<a href="https://colab.research.google.com/github/bradleymclellan/stc510/blob/main/Python_Transformations_Basic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The following script analyzes a dataset of crime incidents in Phoenix, AZ. It begins by importing the required libraries, such as pandas, seaborn, and matplotlib. The script then downloads the crime data from a URL and loads it into a pandas DataFrame.

Next, the script explores the data from several perspectives. It builds a DataFrame that holds each UCR crime category and the number of crimes recorded in it. It then counts the crimes recorded on each date from 2015 onward and plots the trend over time as a line plot.

The script then defines lists of violent and non-violent crime categories. It creates a new column in the DataFrame that labels each crime incident as violent or non-violent, groups the incidents by that label, counts the incidents in each group, and plots the results as a color-coded bar chart.

Next, the script converts the 'OCCURRED ON' column to datetime format, filters the data to the past 30 days, groups the incidents by premise type, and counts the crimes in each group. It then plots the results as a bar chart.

The script goes on to group the last 30 days of data by crime type, count the incidents in each group, and plot the results as a bar chart. It then groups the same data by zip code to show how crime is distributed by location. Finally, it sorts the zip codes by crime count and plots the highest-crime areas as a bar chart.


In [218]:
# Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import datetime as dt
import seaborn as sns
import requests
import io

In [219]:
# Load the crime data from the URL into a pandas dataframe
url = 'https://www.phoenixopendata.com/dataset/cc08aace-9ca9-467f-b6c1-f0879ab1a358/resource/0ce3411a-2fc6-4302-a33f-167f68608a20/download/crimestat.csv'
response = requests.get(url)
response.raise_for_status()  # Stop here if the download failed
crime_data = pd.read_csv(io.StringIO(response.text))


Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
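
The warning above is pandas reporting that the first column holds values of more than one type. One way to avoid it is to read that column as strings. The sketch below does this on a small in-memory CSV rather than the real download; the sample rows, and the assumption that the first column is 'INC NUMBER', are illustrative only.

```python
import io
import pandas as pd

# Hypothetical sample standing in for crimestat.csv. The first column mixes
# purely numeric identifiers with identifiers that contain letters.
sample_csv = io.StringIO(
    "INC NUMBER,UCR CRIME CATEGORY\n"
    "201500001,BURGLARY\n"
    "2015A0002,ROBBERY\n"
)

# Reading the identifier column as str gives every row the same type.
sample = pd.read_csv(sample_csv, dtype={"INC NUMBER": str})
print(sample["INC NUMBER"].tolist())
```

Passing `low_memory=False`, as the warning suggests, is the other option; it lets pandas infer one type from the whole column at the cost of more memory.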



In [220]:
# Create a dataframe that contains the UCR Crime Categories and the number of crimes associated with each category
Number_crimes = crime_data['UCR CRIME CATEGORY'].value_counts()
values = Number_crimes.values
categories = pd.DataFrame(data=Number_crimes.index, columns=["UCR CRIME CATEGORY"])
categories['values'] = values
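
The same category table can probably be built in a single chained expression. This is a sketch on toy data, not the notebook's own method; `toy` and `categories_alt` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for crime_data
toy = pd.DataFrame({"UCR CRIME CATEGORY": ["BURGLARY", "ROBBERY", "BURGLARY"]})

# value_counts() returns counts indexed by category, sorted descending.
# rename_axis() names the index, and reset_index() turns it into a column.
categories_alt = (
    toy["UCR CRIME CATEGORY"]
    .value_counts()
    .rename_axis("UCR CRIME CATEGORY")
    .reset_index(name="values")
)
print(categories_alt)
```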

In [None]:
# Count the crimes recorded on each date, from 2015 onward
crime_data['OCCURRED ON'] = pd.to_datetime(crime_data['OCCURRED ON'])
crime_by_date = crime_data.groupby(crime_data['OCCURRED ON'].dt.date).size().reset_index(name='counts')
crime_by_date.set_index('OCCURRED ON', inplace=True)

# Plot the daily crime counts over time
fig, ax = plt.subplots(figsize=(15,8))
crime_by_date.plot(kind='line', ax=ax)
ax.set(xlabel='Date', ylabel='Number of Crimes', title='Number of Crimes Over Time')
plt.show()
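
The cell above counts crimes per day. For a year-over-year view, one option is to group by calendar year instead. The sketch below uses a few made-up dates; `toy`, `dated`, and `crime_by_year` are hypothetical names, and rows with a missing date are dropped first so the year stays an integer.

```python
import pandas as pd

# Toy stand-in for crime_data with one missing date
toy = pd.DataFrame(
    {"OCCURRED ON": ["2015-11-01 00:00", "2016-03-15 13:30", "2016-07-04 09:00", None]}
)
toy["OCCURRED ON"] = pd.to_datetime(toy["OCCURRED ON"])

# Drop rows without a date, then count incidents per calendar year
dated = toy.dropna(subset=["OCCURRED ON"])
crime_by_year = dated.groupby(dated["OCCURRED ON"].dt.year).size()
print(crime_by_year)
```

Note that the first and last years of a dataset are usually partial, so their totals are not directly comparable with full years.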

In [None]:
# Define lists of violent and non-violent crime categories
violent_crimes = ['HOMICIDE', 'RAPE', 'ROBBERY', 'AGGRAVATED ASSAULT']
non_violent_crimes = ['BURGLARY', 'THEFT', 'MOTOR VEHICLE THEFT', 'ARSON', 'DRUG OFFENSE']

# Create a new column that labels each crime incident as violent or non-violent.
# Categories in neither list are labeled 'OTHER' instead of defaulting to 'VIOLENT'.
crime_data['CRIME TYPE'] = 'OTHER'
crime_data.loc[crime_data['UCR CRIME CATEGORY'].isin(violent_crimes), 'CRIME TYPE'] = 'VIOLENT'
crime_data.loc[crime_data['UCR CRIME CATEGORY'].isin(non_violent_crimes), 'CRIME TYPE'] = 'NON-VIOLENT'

# Group the crime incidents by crime type and count the number of incidents in each group
crime_grouped = crime_data.groupby('CRIME TYPE')['INC NUMBER'].count()

# Plot the counts as a bar chart, with the bar color keyed to the crime type
type_colors = {'VIOLENT': 'red', 'NON-VIOLENT': 'blue', 'OTHER': 'gray'}
fig = go.Figure(data=[go.Bar(x=crime_grouped.index, y=crime_grouped.values,
                             marker_color=[type_colors[t] for t in crime_grouped.index])])
fig.update_layout(title_text='Non-Violent vs. Violent Crime Incidents')
fig.show()
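
The labeling step depends on the names in the two lists matching the values in 'UCR CRIME CATEGORY' exactly. A quick check for category names that match neither list might look like the sketch below. It runs on a toy series; 'LARCENY-THEFT' is a hypothetical value chosen only because it appears in neither list.

```python
import pandas as pd

violent_crimes = ['HOMICIDE', 'RAPE', 'ROBBERY', 'AGGRAVATED ASSAULT']
non_violent_crimes = ['BURGLARY', 'THEFT', 'MOTOR VEHICLE THEFT', 'ARSON', 'DRUG OFFENSE']

# Toy stand-in for crime_data['UCR CRIME CATEGORY']
toy_categories = pd.Series(['ROBBERY', 'BURGLARY', 'LARCENY-THEFT', 'ROBBERY'])

# Any category present in the data but absent from both lists
known = set(violent_crimes) | set(non_violent_crimes)
unmatched = sorted(set(toy_categories.dropna()) - known)
print(unmatched)
```

Running the same check against the real column would show whether either list needs to be extended.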

In [223]:
# Convert the 'OCCURRED ON' column to datetime format
crime_data['OCCURRED ON'] = pd.to_datetime(crime_data['OCCURRED ON'])

In [224]:
# Filter data to only include the past 30 days
recent_crimes = crime_data[crime_data['OCCURRED ON'] >= (dt.datetime.now() - dt.timedelta(days=30))]
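
The filter above measures 30 days back from the moment the cell runs. If the published file lags behind the current date, the window may contain few or no rows. An alternative, sketched here on toy data with hypothetical names, is to anchor the window to the most recent date in the data itself.

```python
import pandas as pd

# Toy stand-in for crime_data
toy = pd.DataFrame(
    {"OCCURRED ON": pd.to_datetime(["2023-01-01", "2023-02-20", "2023-03-01"])}
)

# Anchor the 30-day window to the latest recorded incident, not to today
latest = toy["OCCURRED ON"].max()
cutoff = latest - pd.Timedelta(days=30)
recent = toy[toy["OCCURRED ON"] >= cutoff]
print(cutoff, len(recent))
```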

In [None]:
# Group the recent crimes by premise type and count the number of crimes in each group
fig, ax = plt.subplots(figsize=(20, 10))
premise_group = recent_crimes.groupby('PREMISE TYPE')['INC NUMBER'].count().reset_index()
premise_group = premise_group.sort_values(by='INC NUMBER', ascending=False)

# Plot the crimes grouped by premise type
ax.bar(premise_group['PREMISE TYPE'], premise_group['INC NUMBER'], color='red')
ax.set_xlabel('Premise Type')
ax.set_ylabel('Number of Crimes')
ax.set_title('Crimes by Premise in the past 30 days')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

In [None]:
# Group the last 30 days of data by crime type, count the incidents, and keep the top 10
crime_type = recent_crimes.groupby('UCR CRIME CATEGORY').size().reset_index(name='counts')
crime_type = crime_type.sort_values(by='counts', ascending=False)
crime_type = crime_type.head(10)

# Plot the data using a bar chart
crime_type.plot.bar(x='UCR CRIME CATEGORY', y='counts', rot=90)
plt.title('Top Crime Types in the past 30 days')
plt.xlabel('Crime Type')
plt.ylabel('Number of Crimes')
plt.show()

In [None]:
# Group the last 30 days of data by zip code and count the crimes in each
crime_trends = recent_crimes.groupby("ZIP").size().reset_index(name='counts')

# Sort the grouped data by the count of crimes in each zip code
crime_trends.sort_values(by='counts', ascending=False, inplace=True)

# Plot the data using a bar chart.
# ZIP is cast to str so each zip code is a category, not a position on a numeric axis.
fig, ax = plt.subplots(figsize=(15,8))
ax.bar(crime_trends['ZIP'].astype(str), crime_trends['counts'])
ax.set_title('Crime Counts by Zip Code in the Past 30 Days')
ax.set_xlabel('Zip Code')
ax.set_ylabel('Number of Crimes')
plt.xticks(rotation=90)
plt.show()
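
Zip codes are labels rather than quantities, so it can help to normalize them to strings before grouping. The sketch below assumes the ZIP column loads as floats with some missing values, which is common when a numeric column has gaps; that assumption has not been checked against the real file, and the names are hypothetical.

```python
import pandas as pd

# Toy stand-in for recent_crimes['ZIP'], assumed to load as float with gaps
zips = pd.Series([85003.0, 85003.0, 85021.0, None])

# Drop missing values, remove the trailing '.0', and keep zip codes as text
zip_labels = zips.dropna().astype(int).astype(str)
zip_counts = zip_labels.value_counts()
print(zip_counts)
```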

In [None]:
# Group the data by zip code to see which zip codes have the highest number of crimes.
crime_by_zip = recent_crimes.groupby('ZIP').size().reset_index(name='counts')
crime_by_zip.sort_values(by='counts', ascending=False, inplace=True)

# Plot the data using a bar chart
fig, ax = plt.subplots(figsize=(15,8))
sns.barplot(x='ZIP', y='counts', data=crime_by_zip.head(10), palette='Blues_d', ax=ax)
ax.set_title('Top 10 Zip Codes by Number of Crimes in the Past 30 Days')
ax.set_xlabel('Zip Code')
ax.set_ylabel('Number of Crimes')
plt.show()