In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from google.colab import data_table

data_table.disable_dataframe_formatter()

In [None]:
# read data
df = pd.read_csv('/content/drive/MyDrive/BD & AI/week14/Data/Data.csv')

In [None]:
# show data 
df.head()

In [None]:
# No. of rows and columns
df.shape

In [None]:
# some information about data
df.info()

In [None]:
# basic descriptive statistics for numerical features 
df.describe()

In [None]:
# basic descriptive statistics for categorical features
df.describe(include='O')

In [None]:
# correlate numeric columns only; text columns cannot be correlated
df.corr(numeric_only=True).style.background_gradient(cmap="Spectral")

## Percentage of the number of entries in each year

In [None]:
plt.figure(figsize=(15,7))
colors=['#832b5a','#a42a5b','#c33e56','#d76154','#e0876a','#e7aa8b','#edcbb4']
# value_counts() sorts by frequency, so sort by year and take the labels from the index
year_counts = df['Year'].value_counts().sort_index()
plt.pie(year_counts, labels=year_counts.index, colors=colors, autopct='%.0f%%')
plt.title('Percentage of the number of entries in each year')
plt.show()

This pie chart shows the share of recorded crimes in each year. The first four years (2016 to 2019) each account for about 16%, while 2020 and 2021 have lower shares of 11% and 13% respectively. The year with the fewest recorded crimes is 2022, because its records run only through October.
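
The yearly percentages can also be checked numerically instead of being read off the chart. The following is a minimal sketch, assuming a `Year` column like the one plotted above; the `year_share` helper and the small `sample` frame are hypothetical and only stand in for `df`.

```python
import pandas as pd

def year_share(frame: pd.DataFrame, column: str = "Year") -> pd.Series:
    """Return the percentage of rows per year, in chronological order."""
    shares = frame[column].value_counts(normalize=True).sort_index() * 100
    return shares.round(1)

# Hypothetical sample data; in the notebook, pass `df` instead.
sample = pd.DataFrame({"Year": [2016, 2016, 2017, 2017, 2017, 2022]})
shares = year_share(sample)
print(shares)
```

Because 2022 covers only part of a year, its share is not directly comparable with the full years.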

## Data distribution ratio in the day and night.

In [None]:
plt.figure(figsize=(15,7))
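# take the labels from the counts so that each slice carries the right label
am_pm_counts = df['AM/PM'].value_counts()
plt.pie(am_pm_counts, labels=am_pm_counts.index, colors=['#9f395c','#ecc4ac'], autopct='%.0f%%')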
plt.title('AM/PM')
plt.show()

This pie chart shows whether crimes occurred in the morning or the evening. Most of the crimes clearly occurred in the evening, between 12 noon and 12 midnight.
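
For reference, an AM/PM label of this kind can be derived from a 24-hour value. This is only a sketch, assuming the hour runs from 0 to 23; the `am_pm_label` helper and the sample hours are hypothetical, and the dataset's own `AM/PM` column may have been built differently.

```python
import pandas as pd

def am_pm_label(hours: pd.Series) -> pd.Series:
    """Label hours 0-11 as 'AM' and hours 12-23 as 'PM'."""
    return hours.map(lambda h: "AM" if h < 12 else "PM")

# Hypothetical sample hours; in the notebook, pass df['Hour'] instead.
sample_hours = pd.Series([0, 9, 12, 18, 23])
labels = am_pm_label(sample_hours)
print(labels.value_counts())
```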

## Crimes for which the convicts have been arrested

In [None]:
plt.figure(figsize=(15,7))
plt.pie(df['Arrest'].value_counts(), labels =['No','Yes'],colors=['#9f395c','#ecc4ac'] , autopct='%.0f%%')
plt.title('An Arrest was Made')
plt.show()

This pie chart shows whether an arrest was made. In more than 80% of cases no arrest was made. These cases may involve violations, assault, theft, or other offenses whose seriousness may increase if the offender remains free.

## Domestic crimes under the Illinois Domestic Violence Act

In [None]:
plt.figure(figsize=(15,7))
plt.pie(df['Domestic'].value_counts(), labels = ['No','Yes'],colors=['#9f395c','#ecc4ac'] ,autopct='%.0f%%')
plt.title('Domestic crimes under the Illinois Domestic Violence Act')
plt.show()

This pie chart shows whether the crimes were domestic-related as defined by the Illinois Domestic Violence Act. A domestic crime is one committed against a family or household member, such as harassment or interference with personal liberty. Domestic cases accounted for 18% of all offenses over the seven years. That is a high percentage for a place like home, which should be safe and secure.

## The Most Frequent Primary Type



In [None]:
Primary_Type = df.groupby(['Primary Type']).size().sort_values(ascending=False)
plt.figure(figsize=(15,6))
sns.barplot(x=Primary_Type[:10].index, y=Primary_Type[:10],palette="rocket")
plt.xticks(rotation=70)
plt.xlabel('Primary Type')
plt.ylabel('Frequency')
plt.title('Most Frequent Primary Type')
plt.show()

This bar graph shows the most common primary types of crime. Theft is the largest category, with more than 350,000 occurrences in just seven years. Vehicle theft is counted in a separate category and occurred over 75,000 times. Theft is followed by assault-type crimes: Assault and Battery are both forms of assault, and together they occurred approximately 440,000 times.
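
The combined Assault and Battery figure can be computed directly rather than added up from the bars. A minimal sketch follows; it assumes the category labels are spelled `ASSAULT` and `BATTERY` in the `Primary Type` column, which should be checked against the data, and the `combined_count` helper and `sample` frame are hypothetical.

```python
import pandas as pd

def combined_count(frame: pd.DataFrame, categories, column: str = "Primary Type") -> int:
    """Count the rows whose category is any of `categories`."""
    return int(frame[column].isin(categories).sum())

# Hypothetical sample data; in the notebook, pass `df` instead.
sample = pd.DataFrame(
    {"Primary Type": ["THEFT", "ASSAULT", "BATTERY", "BATTERY", "THEFT"]}
)
total = combined_count(sample, ["ASSAULT", "BATTERY"])
print(total)
```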

## The Most Frequent Location

In [None]:
Location_Description = df.groupby(['Location Description']).size().sort_values(ascending=False)
plt.figure(figsize=(15,6))
sns.barplot(x=Location_Description[:10].index, y=Location_Description[:10],palette="rocket")
plt.xticks(rotation=70)
plt.xlabel('Location')
plt.ylabel('Frequency')
plt.title('Most Frequent Location')
plt.show()

This bar graph shows where crimes occur. Streets, sidewalks, and alleys recorded the most crimes, with more than 500,000. They are followed by residences and apartments, and finally by small retail stores and restaurants, with fewer than 70,000 crimes.

## The crime update rate

In [None]:
Description = df.groupby(['Case Number']).size().sort_values(ascending=False)
plt.figure(figsize=(10,6))

sns.barplot(x=Description[:10], y=Description[:10].index,palette="rocket")
plt.xticks(rotation=70)
plt.xlabel('No. of Updates')
plt.ylabel('Case Number')
plt.title('The crime update rate')
plt.show()
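
The chart above counts rows per `Case Number`, treating repeated rows as updates to the same case. To list only the cases that actually appear more than once, something like the following sketch could be used; the `repeated_cases` helper and the sample case numbers are hypothetical.

```python
import pandas as pd

def repeated_cases(frame: pd.DataFrame, column: str = "Case Number") -> pd.Series:
    """Return row counts for case numbers that appear more than once."""
    counts = frame[column].value_counts()
    return counts[counts > 1]

# Hypothetical sample data; in the notebook, pass `df` instead.
sample = pd.DataFrame({"Case Number": ["A1", "A1", "B2", "C3", "C3", "C3"]})
repeats = repeated_cases(sample)
print(repeats)
```

If the result is empty, every case number is unique and the bar chart carries no information about updates.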

## The FBI Codes with the Highest Number of Crime Reports

In [None]:
FBI_Code= df.groupby(['FBI Code']).size().sort_values(ascending=False)
plt.figure(figsize=(10,6))
sns.barplot(x=FBI_Code[:10], y=FBI_Code[:10].index,palette="rocket")
plt.xticks(rotation=70)
plt.xlabel('Frequency')
plt.ylabel('FBI Code')
plt.title('The FBI Codes with the Highest Number of Crime Reports')
plt.show()

This bar graph shows which FBI codes have the highest number of crime reports. The `FBI Code` column holds the FBI's crime classification code for each offense rather than an office identifier. Code 06 accounts for over 350,000 offenses over the past seven years, which matches the theft count in the primary-type chart, while code 08B accounts for over 250,000. These categories may call for stepped-up prevention efforts and stricter enforcement to reduce potential crime.

## The distribution of crime incidence during the hours of the day in each block.

In [None]:
block=df['Block'].value_counts().keys()[:10].tolist()
df1=df[df['Block'].isin(block)]
plt.figure(figsize=(16,8))
sns.boxplot(x='Hour', y='Block', data = df1,palette="rocket")
plt.title('The distribution of crime incidence during the hours of the day in each block')
plt.xlabel('Hour')
plt.xticks(rotation=70)
plt.show()

This box plot shows the distribution of crime incidence over the hours of the day for the ten blocks with the most crimes. On most of these blocks, the bulk of crime occurs during daylight hours, from 11 am to 3 pm, which is a short window. The exceptions are blocks 0064 and 0000, where the window is about twice as long, so crime there is spread over more of the day.
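
The width of each box is the interquartile range of `Hour`, so the comparison between blocks can be made exact by computing the quartiles. A minimal sketch, assuming `Block` and `Hour` columns as used in the plot; the `hour_quartiles` helper, the block names `X` and `Y`, and the sample hours are hypothetical.

```python
import pandas as pd

def hour_quartiles(frame: pd.DataFrame, group_col: str = "Block",
                   hour_col: str = "Hour") -> pd.DataFrame:
    """Return q1, median, q3, and interquartile range of the hour per group."""
    q = frame.groupby(group_col)[hour_col].quantile([0.25, 0.5, 0.75]).unstack()
    q.columns = ["q1", "median", "q3"]
    q["iqr"] = q["q3"] - q["q1"]
    return q

# Hypothetical sample data; in the notebook, pass `df1` instead.
sample = pd.DataFrame({
    "Block": ["X"] * 5 + ["Y"] * 5,
    "Hour": [11, 12, 13, 14, 15, 2, 7, 12, 17, 22],
})
quartiles = hour_quartiles(sample)
print(quartiles)
```

A larger `iqr` means the crimes are spread over more hours; it says nothing by itself about how many crimes occurred.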

## The distribution of crime incidence during the hours of the day by the most common type of crimes.

In [None]:
Primary_Type=df['Primary Type'].value_counts().keys()[:10].tolist()
df1=df[df['Primary Type'].isin(Primary_Type)]
plt.figure(figsize=(16,8))
sns.boxplot(x='Hour', y='Primary Type', data = df1,palette="rocket")
plt.title('The distribution of crime incidence during the hours of the day by the most common type of crimes')
plt.ylabel('Primary Type')
plt.xlabel('Hour')
plt.xticks(rotation=70)
plt.show()

This box plot shows the distribution of crime incidence over the hours of the day for the most common types of crime. Most types of crime occur during daylight hours, from 6 am to 6 pm.