### NAME :- ROHAN KUMAR
### REGISTRATION No. :- 12016504
### SECTION :- K20KT
### ROLL No. :- RK20KTB59

## CRIMES IN CHICAGO 2012-2017

#### CONTEXT

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2012 to 2017. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified.

Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited.
The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data are updated daily. The dataset contains more than 1,000,000 records/rows of data and cannot be viewed in full in Microsoft Excel. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e

#### CONTENT

ID           - Unique identifier for the record.

Case Number  - The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.

Date         - Date when the incident occurred. This is sometimes a best estimate.

Block        - The partially redacted address where the incident occurred, placing it on the same block as the actual address.

IUCR         - The Illinois Uniform Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.

Primary Type - The primary description of the IUCR code.

Description  - The secondary description of the IUCR code, a subcategory of the primary description.

Location Description - Description of the location where the incident occurred.

Arrest - Indicates whether an arrest was made.

Domestic - Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.

Beat - Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car. Three to five beats make up a police sector, and three sectors make up a police district. The Chicago Police Department has 22 police districts. See the beats at https://data.cityofchicago.org/d/aerh-rz74.

District - Indicates the police district where the incident occurred. See the districts at https://data.cityofchicago.org/d/fthy-xz3r.

Ward - The ward (City Council district) where the incident occurred. See the wards at https://data.cityofchicago.org/d/sp34-6z76.

Community Area - Indicates the community area where the incident occurred. Chicago has 77 community areas. See the community areas at https://data.cityofchicago.org/d/cauq-8yn6.

FBI Code - Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS). See the Chicago Police Department listing of these classifications at http://gis.chicagopolice.org/clearmap_crime_sums/crime_types.html.

X Coordinate - The x coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.

Y Coordinate - The y coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.

Year - Year the incident occurred.

Updated On - Date and time the record was last updated.

Latitude - The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.

Longitude - The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.

Location - The location where the incident occurred in a format that allows for creation of maps and other geographic operations on this data portal. This location is shifted from the actual location for partial redaction but falls on the same block.
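
Since `Date` arrives as text, it usually has to be parsed before any time-based analysis. The sketch below uses a tiny made-up frame, and the timestamp layout (`MM/DD/YYYY hh:mm:ss AM/PM`) is an assumption about the CSV export, so it should be checked against the actual file first.

```python
import pandas as pd

# Tiny made-up sample; the timestamp layout is an assumption about the CSV export.
sample = pd.DataFrame({
    "Date": ["05/03/2016 11:40:00 PM", "01/15/2012 09:05:00 AM"],
    "Arrest": [True, False],
})

# Parse with an explicit format so month and day cannot be swapped.
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Derived fields that are convenient for grouping later on.
sample["Hour"] = sample["Date"].dt.hour
sample["Month"] = sample["Date"].dt.month
print(sample)
```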

#### IMPORTING LIBRARIES

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from matplotlib.pyplot import figure
import plotly.graph_objects as go
import plotly.offline as py
import plotly.express as px
from datetime import datetime

import warnings
warnings.filterwarnings('ignore')

#### IMPORTING DATASET

In [None]:
crime_df = pd.read_csv('Chicago_Crimes_2012_to_2017.csv')

## DATA CLEANING

In [None]:
crime_df.head()

In [None]:
crime_df.tail()

In [None]:
crime_df.shape

#### The dataset consists of 1,456,714 rows and 23 columns.

In [None]:
crime_df.info()

#### It consists of 4 columns of data type INT64, 10 columns of data type OBJECT, 7 columns of data type FLOAT64, and 2 columns of data type BOOL.
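
Rather than counting the types by eye in the `info()` output, the tally can be produced directly. This is a minimal sketch on a made-up four-column frame (`demo` is hypothetical); the same `dtypes.value_counts()` call should work on `crime_df`.

```python
import pandas as pd

# Hypothetical miniature frame with one column of each kind.
demo = pd.DataFrame({
    "ID": [1, 2],
    "Block": ["A", "B"],
    "Latitude": [41.8, 41.9],
    "Arrest": [True, False],
})

# Count how many columns share each dtype.
dtype_counts = demo.dtypes.astype(str).value_counts()
print(dtype_counts)
```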

#### DELETING THE UNNAMED COLUMN

In [None]:
crime_df.columns.str.match("Unnamed")

In [None]:
crime_df.loc[:,~crime_df.columns.str.match("Unnamed")]

In [None]:
# drop() returns a new DataFrame, so assign the result back to keep the change.
crime_df = crime_df.drop("Unnamed: 0", axis=1)
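
An "Unnamed: 0" column typically appears when a CSV was saved with its index and the first header cell is empty. As an alternative sketch, the column can be consumed as the index while reading; the inline CSV text below is made up for illustration.

```python
import io
import pandas as pd

# Made-up CSV whose first header cell is empty, as happens when an index is saved.
csv_text = ",ID,Case Number\n0,101,AA000001\n1,102,AA000002\n"

# Default read: pandas names the headerless column "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# Treat the first column as the index so it never becomes a data column.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))
```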

### Missing Value Analysis

In [None]:
missing_values = crime_df.isnull().sum().sort_values(ascending = False)
missing_values

In [None]:
percentage_missing_values = (crime_df.isnull().sum() * 100 / len(crime_df)).round(2).sort_values(ascending = False)
percentage_missing_values

In [None]:
missing_data = pd.concat([missing_values, percentage_missing_values], axis = 1, keys = ['Total', 'Percent'])
missing_data
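
With 23 columns, most rows of this table are likely to be zeros. A possible refinement, sketched here on a made-up frame (`demo` and its values are hypothetical), keeps only the columns that actually have missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two columns with gaps, one complete column.
demo = pd.DataFrame({
    "Ward": [1.0, np.nan, 3.0, np.nan],
    "Latitude": [41.8, 41.9, np.nan, 41.7],
    "ID": [1, 2, 3, 4],
})

total = demo.isnull().sum()
percent = (total * 100 / len(demo)).round(2)
summary = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])

# Drop fully populated columns and list the worst offenders first.
summary = summary[summary["Total"] > 0].sort_values("Total", ascending=False)
print(summary)
```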

In [None]:
f, ax = plt.subplots(figsize = (15, 5))
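# rotation must be numeric; a string such as '45' is rejected by recent matplotlib versions.
plt.xticks(font = "Times new roman", fontsize = 15, rotation = 45)
cplot = sns.barplot(x = missing_data.index, y = missing_data['Percent'], palette = 'GnBu_r', lw = 2, ec = 'Black')
cplot.set_title('Missing Value Inspection', font = 'Algerian', fontsize = 25, weight = 'bold')
# The bars show the 'Percent' column, so label the axis accordingly.
cplot.set_ylabel('Percent missing', font = 'Algerian', fontsize = 20)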
cplot.set_xlabel('Features', font = 'Algerian', fontsize = 20);

### Duplicate Values Analysis (In progress)

In [None]:
#if (len(crime_df[crime_df.duplicated()]) > 0):
#    print('No. of duplicated entries: ', len(crime_df[crime_df.duplicated()]))
#    print(crime_df[crime_df.duplicated(keep = False)].sort_values(by = list(crime_df.columns)).head())
#else:
#    print('No duplicated entries found.')

In [None]:
crime_df.nunique().to_frame(name = 'Unique Values')

#### There are 1456598 unique Case Numbers, while the dataset has 1456714 rows in total. That means 116 rows repeat a 'Case Number' that already appears in another row.
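
The same figure can be computed directly instead of subtracting by hand. A minimal sketch on made-up case numbers (`demo` is hypothetical): `duplicated()` flags every repeat after the first occurrence, so its sum equals rows minus unique values when the column has no missing entries.

```python
import pandas as pd

# Made-up case numbers: "AA1" appears three times.
demo = pd.DataFrame({"Case Number": ["AA1", "AA2", "AA1", "AA3", "AA1"]})

n_rows = len(demo)
n_unique = demo["Case Number"].nunique()

# duplicated() marks each repeat after the first occurrence as True.
n_extra = int(demo["Case Number"].duplicated().sum())
print(n_rows, n_unique, n_extra)
```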

### Checking whether rows that share a duplicated "Case Number" are identical in every other column

In [None]:
crime_df['Case Number'].value_counts()

In [None]:
crime_df.loc[crime_df['Case Number'] == 'HZ140230']

In [None]:
crime_df.loc[crime_df['Case Number'] == 'HZ554936']

In [None]:
crime_df.loc[crime_df['Case Number'] == 'HY346207']

In [None]:
crime_df.loc[crime_df['Case Number'] == 'HZ403466']

In [None]:
crime_df.loc[crime_df['Case Number'] == 'HV217424']

#### In the five cases inspected above, the rows sharing a Case Number carry the same data.
#### The repeated rows can therefore be deleted, keeping one row per Case Number.
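
The spot checks above cover five case numbers out of the duplicated set. One way to check every duplicated case at once is sketched below on a made-up frame (`demo` and its values are hypothetical): group the duplicated rows by case number and look for any column with more than one distinct value.

```python
import pandas as pd

# Hypothetical data: "AA1" rows agree, "AA2" rows disagree on Primary Type.
demo = pd.DataFrame({
    "Case Number": ["AA1", "AA1", "AA2", "AA2", "AA3"],
    "Primary Type": ["THEFT", "THEFT", "BATTERY", "ASSAULT", "THEFT"],
    "Arrest": [False, False, True, True, False],
})

# Keep every row whose Case Number occurs more than once.
dupes = demo[demo["Case Number"].duplicated(keep=False)]

# Distinct values per column within each case; dropna=False counts NaN as a value.
distinct = dupes.groupby("Case Number").nunique(dropna=False)

# A count above 1 in any column means the repeated rows are not identical.
conflicting = distinct[(distinct > 1).any(axis=1)].index.tolist()
print(conflicting)
```

An empty list would indicate that every repeated case number has identical rows, which is the condition under which keeping only the first occurrence loses no information.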

## Deleting the duplicate rows

In [None]:
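# drop_duplicates() returns a new DataFrame, so assign the result back to keep the change.
crime_df = crime_df.drop_duplicates(subset='Case Number', keep="first")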