# Brainstation
# Data Science
# Author: Viktor Stepanyuk
# Date: Oct 10, 2023
# Chicago Crimes 
# Part 1 Data Cleaning

**Introduction:** This dataset contains reported criminal incidents (excluding murders for which victim data is available) that took place in Chicago from 2001 to the present, minus the most recent week. The information is sourced from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.

# Table of Contents
- [Preliminary Data Exploration](#Preliminary-Data-Exploration)
- [Targeted Data Frame](#Targeted-Data-Frame)
- [Data Dictionary](#Data-Dictionary)
- [Checking for Duplicate Data](#Checking-for-Duplicate-Data)
- [Dealing with Missing Data](#Dealing-with-Missing-Data)
- [Saving the Data](#Saving-the-Data)

## Preliminary data exploration <a id='Preliminary-Data-Exploration'></a>

In [1]:
# import relevant packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from scipy import stats
from scipy.stats import norm

In [4]:
# reading file 
initial_df = pd.read_csv('/Users/vikst/Desktop/data/Crimes_-_2001_to_Present.csv')
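
The read step above relies on pandas' type inference. As a minimal sketch, assuming the same column names but a tiny made-up sample instead of the real file, code-like columns such as `IUCR` and `FBI Code` can be read explicitly as strings so that leading zeros are kept, and `usecols` can limit the columns that are loaded. The rows below are hypothetical and only illustrate the call.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV (hypothetical rows, only four of the columns).
sample_csv = io.StringIO(
    "ID,IUCR,FBI Code,Year\n"
    "1,0486,08B,2022\n"
    "2,0820,06,2023\n"
)

# Load only the listed columns and keep the code columns as text,
# so values such as "0820" and "06" are not turned into numbers.
sample_df = pd.read_csv(
    sample_csv,
    usecols=["ID", "IUCR", "FBI Code", "Year"],
    dtype={"IUCR": str, "FBI Code": str},
)

print(sample_df.dtypes)
```

On the full file the same `dtype` mapping could be passed alongside the original path; whether it changes anything depends on what pandas would have inferred on its own.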

In [5]:
# sanity check
initial_df.head()

| | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5741943 | HN549294 | 08/25/2007 09:22:18 AM | 074XX N ROGERS AVE | 560 | ASSAULT | SIMPLE | OTHER | False | False | ... | 49.0 | 1.0 | 08A | NaN | NaN | 2007 | 08/17/2015 03:03:40 PM | NaN | NaN | NaN |
| 1 | 1930689 | HH109118 | 01/05/2002 09:24:00 PM | 007XX E 103 ST | 820 | THEFT | $500 AND UNDER | GAS STATION | True | False | ... | NaN | NaN | 06 | NaN | NaN | 2002 | 02/04/2016 06:33:39 AM | NaN | NaN | NaN |
| 2 | 13203321 | JG415333 | 09/06/2023 05:00:00 PM | 002XX N Wells st | 1320 | CRIMINAL DAMAGE | TO VEHICLE | PARKING LOT / GARAGE (NON RESIDENTIAL) | False | False | ... | 42.0 | 32.0 | 14 | NaN | NaN | 2023 | 09/14/2023 03:43:09 PM | NaN | NaN | NaN |
| 3 | 13210088 | JG423627 | 08/31/2023 12:00:00 PM | 023XX W JACKSON BLVD | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | STREET | False | False | ... | 27.0 | 28.0 | 11 | 1160870.0 | 1898642.0 | 2023 | 09/16/2023 03:41:56 PM | 41.877565 | -87.684791 | (41.877565108, -87.68479102) |
| 4 | 13210004 | JG422532 | 07/24/2023 09:45:00 PM | 073XX S JEFFERY BLVD | 281 | CRIMINAL SEXUAL ASSAULT | NON-AGGRAVATED | APARTMENT | False | False | ... | 7.0 | 43.0 | 02 | 1190812.0 | 1856743.0 | 2023 | 09/16/2023 03:41:56 PM | 41.761919 | -87.576209 | (41.7619185, -87.576209245) |


In [7]:
# getting the shape of the data
initial_df.shape

(7898456, 22)

In [8]:
print("The original data has {} rows".format(initial_df.shape[0]))
print("The original data has {} columns".format(initial_df.shape[1]))

The original data has 7898456 rows
The original data has 22 columns


In [9]:
# Quickly check the data types
initial_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7898456 entries, 0 to 7898455
Data columns (total 22 columns):
 #   Column                Dtype  
---  ------                -----  
 0   ID                    int64  
 1   Case Number           object 
 2   Date                  object 
 3   Block                 object 
 4   IUCR                  object 
 5   Primary Type          object 
 6   Description           object 
 7   Location Description  object 
 8   Arrest                bool   
 9   Domestic              bool   
 10  Beat                  int64  
 11  District              float64
 12  Ward                  float64
 13  Community Area        float64
 14  FBI Code              object 
 15  X Coordinate          float64
 16  Y Coordinate          float64
 17  Year                  int64  
 18  Updated On            object 
 19  Latitude              float64
 20  Longitude             float64
 21  Location              object 
dtypes: bool(2), float64(7), int64(3), object(10)

In [10]:
# Getting a general understanding of the number of crimes per year
initial_df['Year'].value_counts()

2002    486807
2001    485888
2003    475985
2004    469423
2005    453774
2006    448178
2007    437090
2008    427184
2009    392827
2010    370517
2011    351993
2012    336321
2013    307540
2014    275792
2016    269834
2017    269109
2018    268912
2015    264793
2019    261352
2022    238945
2020    212218
2021    208870
2023    185104
Name: Year, dtype: int64

## Targeted data frame <a id ='Targeted-Data-Frame'></a>

Creating a new dataframe that keeps only the records from 2021 to the present.

In [11]:
# Keep records from 2021 onward and reset the index on the new dataframe
df = initial_df[initial_df['Year'] >= 2021].reset_index(drop=True)

In [12]:
df.head()

| | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13203321 | JG415333 | 09/06/2023 05:00:00 PM | 002XX N Wells st | 1320 | CRIMINAL DAMAGE | TO VEHICLE | PARKING LOT / GARAGE (NON RESIDENTIAL) | False | False | ... | 42.0 | 32.0 | 14 | NaN | NaN | 2023 | 09/14/2023 03:43:09 PM | NaN | NaN | NaN |
| 1 | 13210088 | JG423627 | 08/31/2023 12:00:00 PM | 023XX W JACKSON BLVD | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | STREET | False | False | ... | 27.0 | 28.0 | 11 | 1160870.0 | 1898642.0 | 2023 | 09/16/2023 03:41:56 PM | 41.877565 | -87.684791 | (41.877565108, -87.68479102) |
| 2 | 13210004 | JG422532 | 07/24/2023 09:45:00 PM | 073XX S JEFFERY BLVD | 281 | CRIMINAL SEXUAL ASSAULT | NON-AGGRAVATED | APARTMENT | False | False | ... | 7.0 | 43.0 | 2 | 1190812.0 | 1856743.0 | 2023 | 09/16/2023 03:41:56 PM | 41.761919 | -87.576209 | (41.7619185, -87.576209245) |
| 3 | 13210062 | JG423596 | 08/27/2023 07:00:00 AM | 034XX N LAWNDALE AVE | 820 | THEFT | $500 AND UNDER | APARTMENT | False | False | ... | 30.0 | 21.0 | 6 | 1151117.0 | 1922554.0 | 2023 | 09/16/2023 03:41:56 PM | 41.943379 | -87.719974 | (41.943378528, -87.7199738) |
| 4 | 13210107 | JG411849 | 09/04/2023 09:30:00 PM | 053XX S HOMAN AVE | 1310 | CRIMINAL DAMAGE | TO PROPERTY | RESIDENCE - GARAGE | False | False | ... | 14.0 | 63.0 | 14 | 1154617.0 | 1869046.0 | 2023 | 09/16/2023 03:41:56 PM | 41.796477 | -87.708541 | (41.796477414, -87.708540915) |


In [13]:
df.shape

(632919, 22)

In [11]:
print('Original data has {} rows'.format(initial_df.shape[0]))
print('Updated dataframe has {} rows'.format(df.shape[0]))

Original data has 7898456 rows
Updated dataframe has 632919 rows


Upon initial review, I noticed that some columns hold redundant information. Location, X Coordinate, and Y Coordinate describe the same position as the Latitude and Longitude columns. The Case Number column also adds little, since the ID column already identifies each record. I will drop these four columns.
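
Before dropping `Location`, its redundancy with `Latitude` and `Longitude` can be checked rather than assumed. The sketch below is one possible check, not part of the original notebook: it uses a three-row sample whose values are copied from the preview above, and the helper column names `lat_from_location` and `lon_from_location` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Small sample in the same shape as the real columns (last row has no coordinates).
sample = pd.DataFrame({
    "Latitude": [41.877565108, 41.7619185, np.nan],
    "Longitude": [-87.68479102, -87.576209245, np.nan],
    "Location": [
        "(41.877565108, -87.68479102)",
        "(41.7619185, -87.576209245)",
        np.nan,
    ],
})

# Pull the two numbers out of the "(lat, lon)" text; rows without a match stay NaN.
pattern = r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"
parsed = sample["Location"].str.extract(pattern).astype(float)
parsed.columns = ["lat_from_location", "lon_from_location"]

# Compare with the numeric columns, treating NaN in both places as a match.
lat_match = np.isclose(parsed["lat_from_location"], sample["Latitude"], equal_nan=True)
lon_match = np.isclose(parsed["lon_from_location"], sample["Longitude"], equal_nan=True)
location_is_redundant = bool((lat_match & lon_match).all())

print(location_is_redundant)
```

If the same check returned `False` on the full dataframe, the mismatching rows would be worth inspecting before the column is dropped.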

In [14]:
df2 = df.drop(columns=['Case Number','X Coordinate', 'Y Coordinate','Location'])

In [15]:
# sanity check 
df2.head()

| | ID | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | Year | Updated On | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13203321 | 09/06/2023 05:00:00 PM | 002XX N Wells st | 1320 | CRIMINAL DAMAGE | TO VEHICLE | PARKING LOT / GARAGE (NON RESIDENTIAL) | False | False | 122 | 1.0 | 42.0 | 32.0 | 14 | 2023 | 09/14/2023 03:43:09 PM | NaN | NaN |
| 1 | 13210088 | 08/31/2023 12:00:00 PM | 023XX W JACKSON BLVD | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | STREET | False | False | 1225 | 12.0 | 27.0 | 28.0 | 11 | 2023 | 09/16/2023 03:41:56 PM | 41.877565 | -87.684791 |
| 2 | 13210004 | 07/24/2023 09:45:00 PM | 073XX S JEFFERY BLVD | 281 | CRIMINAL SEXUAL ASSAULT | NON-AGGRAVATED | APARTMENT | False | False | 333 | 3.0 | 7.0 | 43.0 | 2 | 2023 | 09/16/2023 03:41:56 PM | 41.761919 | -87.576209 |
| 3 | 13210062 | 08/27/2023 07:00:00 AM | 034XX N LAWNDALE AVE | 820 | THEFT | $500 AND UNDER | APARTMENT | False | False | 1732 | 17.0 | 30.0 | 21.0 | 6 | 2023 | 09/16/2023 03:41:56 PM | 41.943379 | -87.719974 |
| 4 | 13210107 | 09/04/2023 09:30:00 PM | 053XX S HOMAN AVE | 1310 | CRIMINAL DAMAGE | TO PROPERTY | RESIDENCE - GARAGE | False | False | 822 | 8.0 | 14.0 | 63.0 | 14 | 2023 | 09/16/2023 03:41:56 PM | 41.796477 | -87.708541 |


In [16]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 632919 entries, 0 to 632918
Data columns (total 18 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   ID                    632919 non-null  int64  
 1   Date                  632919 non-null  object 
 2   Block                 632919 non-null  object 
 3   IUCR                  632919 non-null  object 
 4   Primary Type          632919 non-null  object 
 5   Description           632919 non-null  object 
 6   Location Description  630092 non-null  object 
 7   Arrest                632919 non-null  bool   
 8   Domestic              632919 non-null  bool   
 9   Beat                  632919 non-null  int64  
 10  District              632919 non-null  float64
 11  Ward                  632895 non-null  float64
 12  Community Area        632919 non-null  float64
 13  FBI Code              632919 non-null  object 
 14  Year                  632919 non-null  int64  
 15  

## Data Dictionary <a id='Data-Dictionary'></a>
- **ID**: Unique identifier for the record
- **Date**: Date when the incident occurred
- **Block**: The partially redacted address where the incident occurred, placing it on the same block as the actual address
- **IUCR**: The Illinois Uniform Crime Reporting code
- **Primary Type**: The primary description of the IUCR code
- **Description**: The secondary description of the IUCR code, a subcategory of the primary type
- **Location Description**: Description of the location where the incident occurred
- **Arrest**: Indicates whether an arrest was made
- **Domestic**: Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act
- **Beat**: Indicates the beat where the incident occurred. A beat is the smallest police geographic area; each beat has a dedicated police beat car
- **District**: Indicates the police district where the incident occurred
- **Ward**: The ward (City Council district) where the incident occurred
- **Community Area**: Indicates the community area where the incident occurred
- **FBI Code**: Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS)
- **Year**: Year the incident occurred
- **Updated On**: Date and time the record was last updated
- **Latitude**: The latitude of the location where the incident occurred
- **Longitude**: The longitude of the location where the incident occurred

In [62]:
# Converting Date column to a datetime format
df2['Date'] = pd.to_datetime(df2['Date'])
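
The conversion above lets pandas infer the timestamp layout. As a small sketch on two values copied from the preview, the layout can also be stated explicitly with `format`, which removes any ambiguity between month-first and day-first dates and is usually faster on large columns. The names `raw_dates` and `parsed_dates` are only for this illustration.

```python
import pandas as pd

# Two timestamps in the same layout as the Date column (month/day/year, 12-hour clock).
raw_dates = pd.Series(["09/06/2023 05:00:00 PM", "08/31/2023 12:00:00 PM"])

# %I is the 12-hour clock hour and %p the AM/PM marker.
parsed_dates = pd.to_datetime(raw_dates, format="%m/%d/%Y %I:%M:%S %p")

print(parsed_dates.dt.hour.tolist())  # [17, 12]
```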

In [63]:
# sanity check
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 632919 entries, 0 to 632918
Data columns (total 18 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   ID                    632919 non-null  int64         
 1   Date                  632919 non-null  datetime64[ns]
 2   Block                 632919 non-null  object        
 3   IUCR                  632919 non-null  object        
 4   Primary Type          632919 non-null  object        
 5   Description           632919 non-null  object        
 6   Location Description  630092 non-null  object        
 7   Arrest                632919 non-null  bool          
 8   Domestic              632919 non-null  bool          
 9   Beat                  632919 non-null  int64         
 10  District              632919 non-null  float64       
 11  Ward                  632895 non-null  float64       
 12  Community Area        632919 non-null  float64       
 13 

## Checking for duplicate data <a id ='Checking-for-Duplicate-Data'></a>

In [64]:
# checking for duplicated rows 
df2.duplicated().sum()

0

Based on the result above, there are no duplicated rows.

In [18]:
# checking for duplicated columns
df2.T.duplicated()

ID                      False
Date                    False
Block                   False
IUCR                    False
Primary Type            False
Description             False
Location Description    False
Arrest                  False
Domestic                False
Beat                    False
District                False
Ward                    False
Community Area          False
FBI Code                False
Year                    False
Updated On              False
Latitude                False
Longitude               False
dtype: bool
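
Transposing a dataframe with hundreds of thousands of rows is expensive. A lighter alternative, sketched here on a toy dataframe, is to compare columns pairwise with `Series.equals`. The function name `find_duplicate_columns` is hypothetical, and the result is not identical to `df.T.duplicated()` in every case, because `Series.equals` also requires matching dtypes.

```python
from itertools import combinations

import pandas as pd


def find_duplicate_columns(frame):
    """Return pairs of column names whose values are identical."""
    duplicates = []
    for left, right in combinations(frame.columns, 2):
        # Series.equals treats NaN values in the same position as equal.
        if frame[left].equals(frame[right]):
            duplicates.append((left, right))
    return duplicates


# Toy example: column "b" repeats column "a", column "c" does not.
sample = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [3, 2, 1]})
print(find_duplicate_columns(sample))
```

With 18 columns this amounts to 153 column comparisons, each of which can stop as soon as a difference is found.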

Creating a copy of the dataframe as a backup, since the duplicated-columns check takes a very long time to compute.

In [19]:
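# .copy() creates an independent dataframe; plain assignment would only add a second name for df2
df3 = df2.copy()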
# sanity check 
df3.head()

| | ID | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | Year | Updated On | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13203321 | 09/06/2023 05:00:00 PM | 002XX N Wells st | 1320 | CRIMINAL DAMAGE | TO VEHICLE | PARKING LOT / GARAGE (NON RESIDENTIAL) | False | False | 122 | 1.0 | 42.0 | 32.0 | 14 | 2023 | 09/14/2023 03:43:09 PM | NaN | NaN |
| 1 | 13210088 | 08/31/2023 12:00:00 PM | 023XX W JACKSON BLVD | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | STREET | False | False | 1225 | 12.0 | 27.0 | 28.0 | 11 | 2023 | 09/16/2023 03:41:56 PM | 41.877565 | -87.684791 |
| 2 | 13210004 | 07/24/2023 09:45:00 PM | 073XX S JEFFERY BLVD | 281 | CRIMINAL SEXUAL ASSAULT | NON-AGGRAVATED | APARTMENT | False | False | 333 | 3.0 | 7.0 | 43.0 | 2 | 2023 | 09/16/2023 03:41:56 PM | 41.761919 | -87.576209 |
| 3 | 13210062 | 08/27/2023 07:00:00 AM | 034XX N LAWNDALE AVE | 820 | THEFT | $500 AND UNDER | APARTMENT | False | False | 1732 | 17.0 | 30.0 | 21.0 | 6 | 2023 | 09/16/2023 03:41:56 PM | 41.943379 | -87.719974 |
| 4 | 13210107 | 09/04/2023 09:30:00 PM | 053XX S HOMAN AVE | 1310 | CRIMINAL DAMAGE | TO PROPERTY | RESIDENCE - GARAGE | False | False | 822 | 8.0 | 14.0 | 63.0 | 14 | 2023 | 09/16/2023 03:41:56 PM | 41.796477 | -87.708541 |


## Dealing with missing data <a id ='Dealing-with-Missing-Data'></a>

In [20]:
df3.isna().sum()

ID                          0
Date                        0
Block                       0
IUCR                        0
Primary Type                0
Description                 0
Location Description     2827
Arrest                      0
Domestic                    0
Beat                        0
District                    0
Ward                       24
Community Area              0
FBI Code                    0
Year                        0
Updated On                  0
Latitude                12828
Longitude               12828
dtype: int64

In [21]:
# Percentage of missing values in each column
df3.isna().sum() / df3.shape[0]*100

ID                      0.000000
Date                    0.000000
Block                   0.000000
IUCR                    0.000000
Primary Type            0.000000
Description             0.000000
Location Description    0.446661
Arrest                  0.000000
Domestic                0.000000
Beat                    0.000000
District                0.000000
Ward                    0.003792
Community Area          0.000000
FBI Code                0.000000
Year                    0.000000
Updated On              0.000000
Latitude                2.026800
Longitude               2.026800
dtype: float64

Four columns have missing information:
- Ward 0.004%
- Location Description 0.45%
- Latitude 2.03%
- Longitude 2.03%
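
A summary like the list above can be produced directly, which avoids copying percentages by hand. The sketch below uses a made-up four-row dataframe; `isna().mean()` gives the share of missing values per column, and the filter keeps only the columns that actually have gaps.

```python
import numpy as np
import pandas as pd

# Toy data: Ward is missing once, Latitude twice, Year never.
sample = pd.DataFrame({
    "Ward": [1.0, np.nan, 3.0, 4.0],
    "Latitude": [41.9, np.nan, np.nan, 41.8],
    "Year": [2021, 2022, 2023, 2023],
})

# Share of missing values per column, expressed as a percentage.
missing_pct = sample.isna().mean().mul(100)

# Keep only columns with missing values, largest share first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)

print(missing_pct.round(2))
```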


In [22]:
# The Ward column has a very low percentage of missing values, so I will drop all rows where Ward is NaN
df3 = df3.dropna(subset=['Ward'])

In [23]:
# sanity check
df3.isna().sum()

ID                          0
Date                        0
Block                       0
IUCR                        0
Primary Type                0
Description                 0
Location Description     2826
Arrest                      0
Domestic                    0
Beat                        0
District                    0
Ward                        0
Community Area              0
FBI Code                    0
Year                        0
Updated On                  0
Latitude                12827
Longitude               12827
dtype: int64

In [31]:
# exploring the rows with NaN values in Location Description
nan_rows = df3[pd.isna(df3['Location Description'])]
nan_rows

| | ID | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | Year | Updated On | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 387 | 13207457 | 09/06/2023 02:20:00 PM | 032XX N OCONTO AVE | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | NaN | False | False | 1631 | 16.0 | 29.0 | 17.0 | 11 | 2023 | 09/14/2023 03:43:09 PM | 41.938826 | -87.807986 |
| 856 | 13209655 | 05/30/2023 09:30:00 AM | 003XX W WASHINGTON ST | 1242 | DECEPTIVE PRACTICE | COMPUTER FRAUD | NaN | False | False | 122 | 1.0 | 42.0 | 32.0 | 11 | 2023 | 09/16/2023 03:41:56 PM | 41.883203 | -87.635823 |
| 1094 | 13203348 | 09/07/2023 02:35:00 PM | 052XX S NATCHEZ AVE | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | NaN | False | False | 811 | 8.0 | 23.0 | 56.0 | 11 | 2023 | 09/15/2023 03:42:23 PM | 41.796829 | -87.784389 |
| 1471 | 13209657 | 02/02/2023 08:15:00 AM | 018XX N HONORE ST | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | NaN | False | False | 1434 | 14.0 | 32.0 | 22.0 | 11 | 2023 | 09/15/2023 03:41:25 PM | 41.914764 | -87.673903 |
| 1753 | 13211803 | 08/31/2023 11:14:00 AM | 020XX W DIVISION ST | 1154 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT $300 AND UNDER | NaN | False | False | 1212 | 12.0 | 1.0 | 24.0 | 11 | 2023 | 09/16/2023 03:42:58 PM | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 631781 | 13042362 | 04/14/2023 06:45:00 PM | 033XX W BYRON ST | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | NaN | False | False | 1733 | 17.0 | 33.0 | 16.0 | 11 | 2023 | 08/19/2023 03:40:26 PM | 41.952047 | -87.712185 |
| 632057 | 13078297 | 02/01/2023 12:00:00 PM | 031XX S MAY ST | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | NaN | False | False | 913 | 9.0 | 11.0 | 60.0 | 11 | 2023 | 08/19/2023 03:40:26 PM | 41.836678 | -87.654768 |
| 632488 | 13048752 | 04/19/2023 10:35:00 PM | 112XX S EDBROOKE AVE | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | NaN | False | False | 531 | 5.0 | 9.0 | 49.0 | 11 | 2023 | 08/19/2023 03:40:26 PM | 41.690268 | -87.619568 |
| 632655 | 13048696 | 04/19/2023 01:20:00 PM | 004XX N WABASH AVE | 1153 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300 | NaN | False | False | 1834 | 18.0 | 42.0 | 8.0 | 11 | 2023 | 08/19/2023 03:40:26 PM | 41.888994 | -87.626935 |


In [36]:
# Filling NaN values in Location Description with 'Unknown'
# DataFrame.fillna with a dict returns a new dataframe, which avoids the SettingWithCopyWarning
# raised when assigning to a column of a dataframe produced by dropna
df3 = df3.fillna({'Location Description': 'Unknown'})


In [37]:
# sanity check 
df3.isna().sum()

ID                          0
Date                        0
Block                       0
IUCR                        0
Primary Type                0
Description                 0
Location Description        0
Arrest                      0
Domestic                    0
Beat                        0
District                    0
Ward                        0
Community Area              0
FBI Code                    0
Year                        0
Updated On                  0
Latitude                12827
Longitude               12827
dtype: int64

In [39]:
# selecting the rows where both Latitude and Longitude are NaN
rows_with_nan_values = df3[df3['Latitude'].isna() & df3['Longitude'].isna()]

In [41]:
# sanity check
rows_with_nan_values

| | ID | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | District | Ward | Community Area | FBI Code | Year | Updated On | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13203321 | 09/06/2023 05:00:00 PM | 002XX N Wells st | 1320 | CRIMINAL DAMAGE | TO VEHICLE | PARKING LOT / GARAGE (NON RESIDENTIAL) | False | False | 122 | 1.0 | 42.0 | 32.0 | 14 | 2023 | 09/14/2023 03:43:09 PM | NaN | NaN |
| 18 | 12955336 | 11/16/2022 10:23:00 AM | 087XX S KINGSTON AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | RESIDENCE | False | True | 423 | 4.0 | 7.0 | 46.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | NaN | NaN |
| 19 | 12978205 | 08/01/2022 10:00:00 PM | 028XX S KEELER AVE | 1562 | SEX OFFENSE | AGGRAVATED CRIMINAL SEXUAL ABUSE | APARTMENT | False | True | 1031 | 10.0 | 22.0 | 30.0 | 17 | 2022 | 09/14/2023 03:41:59 PM | NaN | NaN |
| 20 | 13041906 | 10/15/2022 01:30:00 AM | 051XX W WRIGHTWOOD AVE | 0486 | BATTERY | DOMESTIC BATTERY SIMPLE | APARTMENT | True | True | 2521 | 25.0 | 31.0 | 19.0 | 08B | 2022 | 09/14/2023 03:41:59 PM | NaN | NaN |
| 147 | 13209581 | 08/01/2021 12:00:00 AM | 012XX E 78TH ST | 1563 | SEX OFFENSE | CRIMINAL SEXUAL ABUSE | APARTMENT | False | False | 411 | 4.0 | 8.0 | 45.0 | 17 | 2021 | 09/14/2023 03:43:09 PM | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 632192 | 13062643 | 04/12/2022 12:00:00 AM | 074XX S WABASH AVE | 1154 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT $300 AND UNDER | RESIDENCE | False | False | 323 | 3.0 | 6.0 | 69.0 | 11 | 2022 | 05/05/2023 03:41:14 PM | NaN | NaN |
| 632242 | 13092803 | 08/23/2021 12:00:00 AM | 036XX W SHAKESPEARE AVE | 1563 | SEX OFFENSE | CRIMINAL SEXUAL ABUSE | RESIDENCE | False | True | 2525 | 25.0 | 35.0 | 22.0 | 17 | 2021 | 06/02/2023 03:41:42 PM | NaN | NaN |
| 632471 | 13099981 | 03/01/2022 12:00:00 AM | 019XX S WABASH AVE | 0460 | BATTERY | SIMPLE | APARTMENT | False | False | 131 | 1.0 | 3.0 | 33.0 | 08B | 2022 | 06/08/2023 03:42:44 PM | NaN | NaN |
| 632549 | 13168008 | 08/05/2023 02:30:00 AM | 017XX W Van Buren St | 0810 | THEFT | OVER $500 | HOSPITAL BUILDING / GROUNDS | False | False | 1231 | 12.0 | 27.0 | 28.0 | 06 | 2023 | 08/12/2023 03:41:48 PM | NaN | NaN |


In [42]:
# Counting how often each block appears among the rows with missing coordinates
unique_blocks = rows_with_nan_values['Block'].value_counts()

print(unique_blocks)

033XX W FILLMORE ST         428
006XX W OHARE ST            136
100XX W OHARE ST             71
014XX S MUSEUM CAMPUS DR     70
003XX E RANDOLPH ST          62
                           ... 
017XX W 48TH ST               1
088XX S STONY ISLAND AVE      1
002XX W 31ST ST               1
035XX W SCHRAEDER DR          1
051XX S Cornell Ave           1
Name: Block, Length: 7571, dtype: int64


At this point 12,827 rows were still missing latitude and longitude, spread across 7,571 distinct blocks in Chicago. Filling these gaps with mean values or interpolation would be possible, but I judged those approaches too inaccurate for location data. Since only about 2% of the rows were affected, I opted to remove the rows with missing coordinates.
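
For comparison, here is a sketch of what a block-level fill could look like. It is not the approach taken in this notebook, and the four-row sample is made up. The idea is to reuse the median coordinates of other records on the same block and to count how many gaps that would actually close.

```python
import numpy as np
import pandas as pd

# Toy data: block "A" has one known and one missing position, block "B" has none known.
sample = pd.DataFrame({
    "Block": ["A", "A", "B", "C"],
    "Latitude": [41.90, np.nan, np.nan, 41.80],
    "Longitude": [-87.60, np.nan, np.nan, -87.70],
})

# Median coordinates per block, aligned to the original rows.
block_median = sample.groupby("Block")[["Latitude", "Longitude"]].transform("median")

# Fill each coordinate column from the block median where a value is missing.
imputed = sample.copy()
for column in ["Latitude", "Longitude"]:
    imputed[column] = sample[column].fillna(block_median[column])

# Rows that were missing but have a usable block median.
recoverable = sample["Latitude"].isna() & block_median["Latitude"].notna()
print(int(recoverable.sum()), "of", int(sample["Latitude"].isna().sum()), "missing rows recoverable")
```

Running the same count on the real data would show how much a block-level fill could recover, which is one way to weigh it against simply dropping the rows.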

In [51]:
# dropping rows where Latitude or Longitude is NaN
df3 = df3.dropna(subset=['Latitude','Longitude'])

In [53]:
#sanity check 
df3.isna().sum()

ID                      0
Date                    0
Block                   0
IUCR                    0
Primary Type            0
Description             0
Location Description    0
Arrest                  0
Domestic                0
Beat                    0
District                0
Ward                    0
Community Area          0
FBI Code                0
Year                    0
Updated On              0
Latitude                0
Longitude               0
dtype: int64

## Saving the data <a id='Saving-the-Data'></a>

In [55]:
# Saving the cleaned data for future analysis 
df3.to_csv('/Users/vikst/Desktop/data/Chicago_Crime-cleaned.csv', index=False)
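
CSV files do not store dtypes, so the datetime `Date` column comes back as text on a plain reload. The sketch below shows a round trip that restores it, under the assumption that the next notebook reads the file with `pd.read_csv`. It writes a two-row made-up dataframe to a temporary directory instead of the real output path.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the cleaned dataframe (hypothetical rows).
cleaned = pd.DataFrame({
    "ID": [1, 2],
    "Date": pd.to_datetime(["2023-09-06 17:00:00", "2023-08-31 12:00:00"]),
    "IUCR": ["0486", "0820"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned_sample.csv")
    cleaned.to_csv(path, index=False)
    # parse_dates restores the datetime column; dtype=str keeps leading zeros in IUCR.
    reloaded = pd.read_csv(path, parse_dates=["Date"], dtype={"IUCR": str})

print(reloaded.dtypes)
```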