# Prepare Chicago Crime Data for a GitHub Repository

- Original Notebook Source: https://github.com/coding-dojo-data-science/preparing-chicago-crime-data
- Updated 11/17/22

>- This notebook will process a "Crimes - 2001 to Preset.csv" crime file in your Downloads folder and save it as smaller .csv's in a new "Data/Chicago/" folder inside this notebook's folder/repo.

# INSTRUCTIONS

- 1) Go to the Chicago Data Portal's page for ["Crimes - 2001 to Preset"](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2).

- 2) Click on the Export button on the top right and select CSV. 
    - Save the file to your Downloads folder instead of your repository. **The file is too big for a repository.**
    
    
    
- 3) Wait for the full file to download. 
    - It is very large (over >1.7GB and may take several minutes to fully download.)
    
    
- 4) Once the download is complete, change `RAW_FILE` variable below to match the filepath to the downloaded file.

## 🚨 Set the correct `RAW_FILE` path

- The cell below will attempt to check your Downloads folder for any file with a name that contains "Crimes_-_2001_to_Present".
    - If you know the file path already, you can skip the next cell and just manually set the RAW_FILE variable in the following code cell.

In [1]:
## Run the cell below to attempt to programmatically find your crime file
import os,glob

## Getting the home folder from environment variables
home_folder = os.environ['HOME']
# print("- Your Home Folder is: " + home_folder)

## Check for downloads folder
if 'Downloads' in os.listdir(home_folder):
    
    
    # Print the Downloads folder path
    dl_folder = os.path.abspath(os.path.join(home_folder,'Downloads'))
    print(f"- Your Downloads folder is '{dl_folder}/'\n")
    
    ## checking for crime files using glob
    crime_files = sorted(glob.glob(dl_folder+'/**/Crimes_-_2001_to_Present*',recursive=True))
    
    # If more than 
    if len(crime_files)==1:
        RAW_FILE = crime_files[0]
        
    elif len(crime_files)>1:
        print('[i] The following files were found:')
        
        for i, fname in enumerate(crime_files):
            print(f"\tcrime_files[{i}] = '{fname}'")
        print(f'\n- Please fill in the RAW_FILE variable in the code cell below with the correct filepath.')

else:
    print(f'[!] Could not programmatically find your downloads folder.')
    print('- Try using Finder (on Mac) or File Explorer (Windows) to navigate to your Downloads folder.')


- Your Downloads folder is 'C:\Users\nicke\Downloads/'



In [2]:
## (Required) MAKE SURE TO CHANGE THIS VARIABLE TO MATCH YOUR LOCAL FILE NAME
RAW_FILE = r"C:\Users\nicke\Downloads\Crimes_-_2001_to_Present.csv" #(or slice correct index from the crime_files list)

if RAW_FILE == "YOUR FILEPATH HERE":
	raise Exception("You must update the RAW_FILE variable to match your local filepath.")
	
RAW_FILE

'C:\\Users\\nicke\\Downloads\\Crimes_-_2001_to_Present.csv'

In [3]:
## (Optional) SET THE FOLDER FOR FINAL FILES
OUTPUT_FOLDER = 'Data/Chicago/'
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

# 🔄 Full Workflow

- Now that your RAW_FILE variable is set either:
    - On the toolbar, click on the Kernel menu > "Restart and Run All".
    - OR click on this cell first, then on the toolbar click on the "Cell" menu > "Run All Below"

In [4]:
import pandas as pd
pd.set_option('display.max_columns', 100)
pd.set_option('display.float_format',lambda x: f"{x:,.2f}")

In [5]:
chicago_full = pd.read_csv(RAW_FILE)
chicago_full

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10224738,HY411648,09/05/2015 01:30:00 PM,043XX S WOOD ST,0486,BATTERY,DOMESTIC BATTERY SIMPLE,RESIDENCE,False,True,924,9.00,12.00,61.00,08B,1165074.00,1875917.00,2015,02/10/2018 03:50:01 PM,41.82,-87.67,"(41.815117282, -87.669999562)"
1,10224739,HY411615,09/04/2015 11:30:00 AM,008XX N CENTRAL AVE,0870,THEFT,POCKET-PICKING,CTA BUS,False,False,1511,15.00,29.00,25.00,06,1138875.00,1904869.00,2015,02/10/2018 03:50:01 PM,41.90,-87.77,"(41.895080471, -87.765400451)"
2,11646166,JC213529,09/01/2018 12:01:00 AM,082XX S INGLESIDE AVE,0810,THEFT,OVER $500,RESIDENCE,False,True,631,6.00,8.00,44.00,06,,,2018,04/06/2019 04:04:43 PM,,,
3,10224740,HY411595,09/05/2015 12:45:00 PM,035XX W BARRY AVE,2023,NARCOTICS,POSS: HEROIN(BRN/TAN),SIDEWALK,True,False,1412,14.00,35.00,21.00,18,1152037.00,1920384.00,2015,02/10/2018 03:50:01 PM,41.94,-87.72,"(41.937405765, -87.716649687)"
4,10224741,HY411610,09/05/2015 01:00:00 PM,0000X N LARAMIE AVE,0560,ASSAULT,SIMPLE,APARTMENT,False,True,1522,15.00,28.00,25.00,08A,1141706.00,1900086.00,2015,02/10/2018 03:50:01 PM,41.88,-87.76,"(41.881903443, -87.755121152)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7744317,12936285,JF526139,06/27/2022 10:05:00 AM,025XX N HALSTED ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,1935,19.00,43.00,7.00,11,1170513.00,1917030.00,2022,01/03/2023 03:46:28 PM,41.93,-87.65,"(41.927817456, -87.648845932)"
7744318,12936301,JF526810,12/22/2022 06:00:00 PM,020XX W CORNELIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,1921,19.00,32.00,5.00,14,1161968.00,1923233.00,2022,01/03/2023 03:46:28 PM,41.95,-87.68,"(41.945021752, -87.680071764)"
7744319,12938501,JF523997,12/26/2022 10:30:00 PM,021XX W DEVON AVE,0915,MOTOR VEHICLE THEFT,"TRUCK, BUS, MOTOR HOME",PARKING LOT / GARAGE (NON RESIDENTIAL),False,False,2413,24.00,50.00,2.00,07,1160681.00,1942466.00,2022,01/03/2023 03:46:28 PM,42.00,-87.68,"(41.997824802, -87.684266677)"
7744320,12936397,JF526745,12/19/2022 02:00:00 PM,044XX N ROCKWELL ST,0620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,False,False,1911,19.00,47.00,4.00,05,1158237.00,1929586.00,2022,01/03/2023 03:46:28 PM,41.96,-87.69,"(41.962531969, -87.693611152)"


In [6]:
# explicitly setting the format to speed up pd.to_datetime
date_format = "%m/%d/%Y %H:%M:%S %p"


### Demonstrating/testing date_format
example = chicago_full.loc[0,'Date']
display(example)
pd.to_datetime(example,format=date_format)

'09/05/2015 01:30:00 PM'

Timestamp('2015-09-05 01:30:00')

In [7]:
# this cell can take up to 1 min to run
chicago_full['Datetime'] = pd.to_datetime(chicago_full['Date'], format=date_format)
chicago_full = chicago_full.sort_values('Datetime')
chicago_full = chicago_full.set_index('Datetime')
chicago_full

Unnamed: 0_level_0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
Datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
2001-01-01 01:00:00,1312557,G001203,01/01/2001 01:00:00 PM,109XX S MICHIGAN AV,0910,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,513,5.00,,,07,1178895.00,1832264.00,2001,08/17/2015 03:03:40 PM,41.70,-87.62,"(41.695024578, -87.620628832)"
2001-01-01 01:00:00,1311626,G001009,01/01/2001 01:00:00 AM,023XX S TROY ST,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,1033,10.00,,,14,1155692.00,1888116.00,2001,08/17/2015 03:03:40 PM,41.85,-87.70,"(41.848786421, -87.704086603)"
2001-01-01 01:00:00,1309918,G000412,01/01/2001 01:00:00 AM,032XX N SHEFFIELD AV,0820,THEFT,$500 AND UNDER,TAVERN/LIQUOR STORE,False,False,1924,19.00,,,06,1169005.00,1921458.00,2001,08/17/2015 03:03:40 PM,41.94,-87.65,"(41.940000996, -87.654258339)"
2001-01-01 01:00:00,1314713,G001044,01/01/2001 01:00:00 PM,017XX W AUGUSTA BV,0520,ASSAULT,AGGRAVATED:KNIFE/CUTTING INSTR,RESIDENCE,True,True,1322,12.00,,,04A,1164232.00,1906669.00,2001,08/17/2015 03:03:40 PM,41.90,-87.67,"(41.899521453, -87.672219588)"
2001-01-01 01:00:00,1310288,G000636,01/01/2001 01:00:00 AM,075XX S UNION AV,1310,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,False,False,621,6.00,,,14,1172985.00,1854673.00,2001,08/17/2015 03:03:40 PM,41.76,-87.64,"(41.756650158, -87.641607815)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-02-22 12:50:00,12992297,JG161855,02/22/2023 12:50:00 PM,046XX S LAFLIN ST,4651,OTHER OFFENSE,SEX OFFENDER - FAIL TO REGISTER NEW ADDRESS,APARTMENT,False,False,924,9.00,15.00,61.00,26,1167115.00,1874007.00,2023,03/01/2023 03:48:50 PM,41.81,-87.66,"(41.809832554, -87.662567546)"
2023-02-22 12:50:00,12990273,JG161395,02/22/2023 12:50:00 AM,003XX W MONTROSE HARBOR DR,143A,WEAPONS VIOLATION,UNLAWFUL POSSESSION - HANDGUN,STREET,True,False,1915,19.00,46.00,3.00,15,1174133.00,1929342.00,2023,03/01/2023 03:48:50 PM,41.96,-87.64,"(41.961521895, -87.635175446)"
2023-02-22 12:54:00,12990326,JG161430,02/22/2023 12:54:00 AM,013XX W COLUMBIA AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,2432,24.00,49.00,1.00,14,1166106.00,1944933.00,2023,03/01/2023 03:48:50 PM,42.00,-87.66,"(42.004479725, -87.664239187)"
2023-02-22 12:54:00,12990251,JG161401,02/22/2023 12:54:00 AM,094XX S HALSTED ST,051A,ASSAULT,AGGRAVATED - HANDGUN,SIDEWALK,False,False,2223,22.00,21.00,73.00,04A,1172633.00,1842466.00,2023,03/01/2023 03:48:50 PM,41.72,-87.64,"(41.72316028, -87.643256557)"


In [8]:
(chicago_full.isna().sum()/len(chicago_full)).round(2)

ID                     0.00
Case Number            0.00
Date                   0.00
Block                  0.00
IUCR                   0.00
Primary Type           0.00
Description            0.00
Location Description   0.00
Arrest                 0.00
Domestic               0.00
Beat                   0.00
District               0.00
Ward                   0.08
Community Area         0.08
FBI Code               0.00
X Coordinate           0.01
Y Coordinate           0.01
Year                   0.00
Updated On             0.00
Latitude               0.01
Longitude              0.01
Location               0.01
dtype: float64

## Separate the Full Dataset by Years

In [9]:
# save the years for every crime
chicago_full["Year"] = chicago_full.index.year
chicago_full["Year"] = chicago_full["Year"].astype(str)
chicago_full["Year"].value_counts()

2002    486800
2001    485872
2003    475979
2004    469420
2005    453768
2006    448174
2007    437081
2008    427159
2009    392816
2010    370492
2011    351959
2012    336259
2013    307460
2014    275724
2016    269777
2017    269044
2018    268744
2015    264736
2019    261213
2022    237113
2020    212039
2021    208427
2023     34266
Name: Year, dtype: int64

In [10]:
## Dropping unneeded columns to reduce file size
drop_cols = ["X Coordinate","Y Coordinate", "Community Area","FBI Code",
             "Case Number","Updated On",'Block','Location','IUCR']

In [11]:
# save final df
chicago_final = chicago_full.drop(columns=drop_cols).sort_index()#.reset_index()
chicago_final

Unnamed: 0_level_0,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Year,Latitude,Longitude
Datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
2001-01-01 01:00:00,1312557,01/01/2001 01:00:00 PM,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,513,5.00,,2001,41.70,-87.62
2001-01-01 01:00:00,1311626,01/01/2001 01:00:00 AM,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,1033,10.00,,2001,41.85,-87.70
2001-01-01 01:00:00,1309918,01/01/2001 01:00:00 AM,THEFT,$500 AND UNDER,TAVERN/LIQUOR STORE,False,False,1924,19.00,,2001,41.94,-87.65
2001-01-01 01:00:00,1314713,01/01/2001 01:00:00 PM,ASSAULT,AGGRAVATED:KNIFE/CUTTING INSTR,RESIDENCE,True,True,1322,12.00,,2001,41.90,-87.67
2001-01-01 01:00:00,1310288,01/01/2001 01:00:00 AM,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,False,False,621,6.00,,2001,41.76,-87.64
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-02-22 12:50:00,12992297,02/22/2023 12:50:00 PM,OTHER OFFENSE,SEX OFFENDER - FAIL TO REGISTER NEW ADDRESS,APARTMENT,False,False,924,9.00,15.00,2023,41.81,-87.66
2023-02-22 12:50:00,12990273,02/22/2023 12:50:00 AM,WEAPONS VIOLATION,UNLAWFUL POSSESSION - HANDGUN,STREET,True,False,1915,19.00,46.00,2023,41.96,-87.64
2023-02-22 12:54:00,12990326,02/22/2023 12:54:00 AM,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,2432,24.00,49.00,2023,42.00,-87.66
2023-02-22 12:54:00,12990251,02/22/2023 12:54:00 AM,ASSAULT,AGGRAVATED - HANDGUN,SIDEWALK,False,False,2223,22.00,21.00,2023,41.72,-87.64


In [12]:
chicago_final.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7744322 entries, 2001-01-01 01:00:00 to 2023-02-22 12:56:00
Data columns (total 13 columns):
 #   Column                Dtype  
---  ------                -----  
 0   ID                    int64  
 1   Date                  object 
 2   Primary Type          object 
 3   Description           object 
 4   Location Description  object 
 5   Arrest                bool   
 6   Domestic              bool   
 7   Beat                  int64  
 8   District              float64
 9   Ward                  float64
 10  Year                  object 
 11  Latitude              float64
 12  Longitude             float64
dtypes: bool(2), float64(4), int64(2), object(5)
memory usage: 723.8+ MB


In [13]:
chicago_final.memory_usage(deep=True).astype(float)

Index                   61,954,576.00
ID                      61,954,576.00
Date                   611,801,438.00
Primary Type           519,106,897.00
Description            566,784,871.00
Location Description   527,827,362.00
Arrest                   7,744,322.00
Domestic                 7,744,322.00
Beat                    61,954,576.00
District                61,954,576.00
Ward                    61,954,576.00
Year                   472,403,642.00
Latitude                61,954,576.00
Longitude               61,954,576.00
dtype: float64

In [14]:
# unique # of year bins
year_bins = chicago_final['Year'].astype(str).unique()
year_bins

array(['2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008',
       '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016',
       '2017', '2018', '2019', '2020', '2021', '2022', '2023'],
      dtype=object)

In [15]:
FINAL_DROP = ['Datetime','Year']#,'Location Description']

In [16]:
## set save location 

os.makedirs(OUTPUT_FOLDER, exist_ok=True)
print(f"[i] Saving .csv's to {OUTPUT_FOLDER}")
## loop through years
for year in year_bins:
    
    ## save temp slices of dfs to save.
    temp_df = chicago_final.loc[ year]
    temp_df = temp_df.reset_index(drop=False)
    temp_df = temp_df.drop(columns=FINAL_DROP)

    # save as csv to output folder
    fname_temp = f"{OUTPUT_FOLDER}Chicago-Crime_{year}.csv"#.gz
    temp_df.to_csv(fname_temp,index=False)

    print(f"- Succesfully saved {fname_temp}")

[i] Saving .csv's to Data/Chicago/
- Succesfully saved Data/Chicago/Chicago-Crime_2001.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2002.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2003.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2004.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2005.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2006.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2007.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2008.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2009.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2010.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2011.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2012.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2013.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2014.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2015.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2016.csv
- Succesfully saved Data/Chicago/Chicago-Crime_2017.csv
- Succesfully

In [17]:
saved_files = sorted(glob.glob(OUTPUT_FOLDER+'*.*csv'))
saved_files

['Data/Chicago\\Chicago-Crime_2001.csv',
 'Data/Chicago\\Chicago-Crime_2002.csv',
 'Data/Chicago\\Chicago-Crime_2003.csv',
 'Data/Chicago\\Chicago-Crime_2004.csv',
 'Data/Chicago\\Chicago-Crime_2005.csv',
 'Data/Chicago\\Chicago-Crime_2006.csv',
 'Data/Chicago\\Chicago-Crime_2007.csv',
 'Data/Chicago\\Chicago-Crime_2008.csv',
 'Data/Chicago\\Chicago-Crime_2009.csv',
 'Data/Chicago\\Chicago-Crime_2010.csv',
 'Data/Chicago\\Chicago-Crime_2011.csv',
 'Data/Chicago\\Chicago-Crime_2012.csv',
 'Data/Chicago\\Chicago-Crime_2013.csv',
 'Data/Chicago\\Chicago-Crime_2014.csv',
 'Data/Chicago\\Chicago-Crime_2015.csv',
 'Data/Chicago\\Chicago-Crime_2016.csv',
 'Data/Chicago\\Chicago-Crime_2017.csv',
 'Data/Chicago\\Chicago-Crime_2018.csv',
 'Data/Chicago\\Chicago-Crime_2019.csv',
 'Data/Chicago\\Chicago-Crime_2020.csv',
 'Data/Chicago\\Chicago-Crime_2021.csv',
 'Data/Chicago\\Chicago-Crime_2022.csv',
 'Data/Chicago\\Chicago-Crime_2023.csv']

In [18]:
## create a README.txt for the zip files
readme = """Source URL: 
- https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2
- Filtered for years 2000-Present.

Downloaded 07/18/2022
- Files are split into 1 year per file.

EXAMPLE USAGE:
>> import glob
>> import pandas as pd
>> folder = "Data/Chicago/"
>> crime_files = sorted(glob.glob(folder+"*.csv"))
>> df = pd.concat([pd.read_csv(f) for f in crime_files])
"""
print(readme)


with open(f"{OUTPUT_FOLDER}README.txt",'w') as f:
    f.write(readme)

Source URL: 
- https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2
- Filtered for years 2000-Present.

Downloaded 07/18/2022
- Files are split into 1 year per file.

EXAMPLE USAGE:
>> import glob
>> import pandas as pd
>> folder = "Data/Chicago/"
>> crime_files = sorted(glob.glob(folder+"*.csv"))
>> df = pd.concat([pd.read_csv(f) for f in crime_files])



## Confirmation

- Follow the example usage above to test if your files were created successfully.

In [19]:
# get list of files from folder
crime_files = sorted(glob.glob(OUTPUT_FOLDER+"*.csv"))
df = pd.concat([pd.read_csv(f) for f in crime_files])
df

Unnamed: 0,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Latitude,Longitude
0,1312557,01/01/2001 01:00:00 PM,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,513,5.00,,41.70,-87.62
1,1311626,01/01/2001 01:00:00 AM,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,1033,10.00,,41.85,-87.70
2,1309918,01/01/2001 01:00:00 AM,THEFT,$500 AND UNDER,TAVERN/LIQUOR STORE,False,False,1924,19.00,,41.94,-87.65
3,1314713,01/01/2001 01:00:00 PM,ASSAULT,AGGRAVATED:KNIFE/CUTTING INSTR,RESIDENCE,True,True,1322,12.00,,41.90,-87.67
4,1310288,01/01/2001 01:00:00 AM,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,False,False,621,6.00,,41.76,-87.64
...,...,...,...,...,...,...,...,...,...,...,...,...
34261,12992297,02/22/2023 12:50:00 PM,OTHER OFFENSE,SEX OFFENDER - FAIL TO REGISTER NEW ADDRESS,APARTMENT,False,False,924,9.00,15.00,41.81,-87.66
34262,12990273,02/22/2023 12:50:00 AM,WEAPONS VIOLATION,UNLAWFUL POSSESSION - HANDGUN,STREET,True,False,1915,19.00,46.00,41.96,-87.64
34263,12990326,02/22/2023 12:54:00 AM,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,2432,24.00,49.00,42.00,-87.66
34264,12990251,02/22/2023 12:54:00 AM,ASSAULT,AGGRAVATED - HANDGUN,SIDEWALK,False,False,2223,22.00,21.00,41.72,-87.64


In [20]:
years = df['Date'].map(lambda x: x.split()[0].split('/')[-1])
years.value_counts().sort_index()

2001    485872
2002    486800
2003    475979
2004    469420
2005    453768
2006    448174
2007    437081
2008    427159
2009    392816
2010    370492
2011    351959
2012    336259
2013    307460
2014    275724
2015    264736
2016    269777
2017    269044
2018    268744
2019    261213
2020    212039
2021    208427
2022    237113
2023     34266
Name: Date, dtype: int64

## Summary

- The chicago crime dataset has now been saved to your repository as csv files. 
- You should save your notebook, commit your work and push to GitHub using GitHub desktop.

# Comparing Police Districts

In [23]:
eda_df = df.copy()

In [24]:
eda_df['Datetime'] = pd.to_datetime(df['Date'])

In [25]:
eda_df['District'].value_counts()

8.00     520648
11.00    497433
6.00     452660
7.00     448623
25.00    440727
4.00     439718
3.00     392798
12.00    383209
9.00     378632
2.00     366421
19.00    347361
18.00    347269
5.00     343421
10.00    333642
15.00    332425
1.00     312420
14.00    299813
16.00    258917
22.00    253817
24.00    233782
17.00    223493
20.00    136810
31.00       232
21.00         4
Name: District, dtype: int64

The most crimes seem to happen in district 8, and the least in district 21

# Crimes Across the Years

In [30]:
eda_df['Year'] = eda_df['Datetime'].dt.year
eda_df['Year'].value_counts()

2002    486800
2001    485872
2003    475979
2004    469420
2005    453768
2006    448174
2007    437081
2008    427159
2009    392816
2010    370492
2011    351959
2012    336259
2013    307460
2014    275724
2016    269777
2017    269044
2018    268744
2015    264736
2019    261213
2022    237113
2020    212039
2021    208427
2023     34266
Name: Year, dtype: int64

Crimes appear to have drastically decreased over the years.

# Comparing Months

In [40]:
eda_df['Month'] = eda_df['Datetime'].dt.month
eda_df['Month'].value_counts()

7     716977
8     710143
5     682728
6     681499
10    676010
9     667931
3     629513
4     626950
1     621470
11    608684
12    579401
2     543016
Name: Month, dtype: int64

July is the month with the most crime, and February is the month with the least crime.