# Prepare Chicago Crime Data for a GitHub Repository

- Original Notebook Source: https://github.com/coding-dojo-data-science/preparing-chicago-crime-data
- Updated 11/17/22

>- This notebook processes the "Crimes - 2001 to Present" CSV file in your Downloads folder and saves it as smaller .csv files in a new "Data/Chicago/" folder inside this notebook's folder/repo.

# INSTRUCTIONS

- 1) Go to the Chicago Data Portal's page for ["Crimes - 2001 to Present"](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2).

- 2) Click on the Export button on the top right and select CSV. 
    - Save the file to your Downloads folder instead of your repository. **The file is too big for a repository.**
    
    
    
- 3) Wait for the full file to download. 
    - It is very large (over 1.7 GB) and may take several minutes to fully download.

- 4) Once the download is complete, change the `RAW_FILE` variable below to match the filepath to the downloaded file.

## 🚨 Set the correct `RAW_FILE` path

- The cell below will attempt to check your Downloads folder for any file with a name that contains "Crimes_-_2001_to_Present".
    - If you know the file path already, you can skip the next cell and just manually set the RAW_FILE variable in the following code cell.

In [1]:
## Run the cell below to attempt to programmatically find your crime file
import os,glob

## Getting the home folder (expanduser works even when the HOME variable is not set, e.g. on some Windows setups)
home_folder = os.path.expanduser('~')
# print("- Your Home Folder is: " + home_folder)

## Check for downloads folder
if 'Downloads' in os.listdir(home_folder):
    
    
    # Print the Downloads folder path
    dl_folder = os.path.abspath(os.path.join(home_folder,'Downloads'))
    print(f"- Your Downloads folder is '{dl_folder}/'\n")
    
    ## checking for crime files using glob
    crime_files = sorted(glob.glob(dl_folder+'/**/Crimes_-_2001_to_Present*',recursive=True))
    
    # If exactly one matching file was found, use it as RAW_FILE
    if len(crime_files)==1:
        RAW_FILE = crime_files[0]
        
    elif len(crime_files)>1:
        print('[i] The following files were found:')
        
        for i, fname in enumerate(crime_files):
            print(f"\tcrime_files[{i}] = '{fname}'")
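        print('\n- Please fill in the RAW_FILE variable in the code cell below with the correct filepath.')

    else:
        print('[!] No files matching "Crimes_-_2001_to_Present*" were found in your Downloads folder.')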

else:
    print(f'[!] Could not programmatically find your downloads folder.')
    print('- Try using Finder (on Mac) or File Explorer (Windows) to navigate to your Downloads folder.')


- Your Downloads folder is 'C:\Users\16024\Downloads/'
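
The same search can also be sketched with `pathlib`, which avoids building paths by string concatenation. The helper name `find_crime_files` and its parameters are hypothetical and not part of the original notebook. The default folder assumes the file was saved to `~/Downloads`.

```python
from pathlib import Path

def find_crime_files(folder=None, pattern="Crimes_-_2001_to_Present*"):
    """Return sorted paths under `folder` whose names match `pattern`.

    `folder` defaults to ~/Downloads (an assumption about where the file was saved).
    """
    folder = Path(folder) if folder is not None else Path.home() / "Downloads"
    if not folder.is_dir():
        return []
    # rglob searches the folder and all of its subfolders
    return sorted(str(p) for p in folder.rglob(pattern))
```

If exactly one path comes back, it could be assigned to `RAW_FILE`. Otherwise, pick the correct entry by hand as described above.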



In [2]:
## (Required) MAKE SURE TO CHANGE THIS VARIABLE TO MATCH YOUR LOCAL FILE NAME
RAW_FILE = "C:/Users/16024/Downloads/Crimes_-_2001_to_Present.csv" #(or slice correct index from the crime_files list)

if RAW_FILE == "YOUR FILEPATH HERE":
	raise Exception("You must update the RAW_FILE variable to match your local filepath.")
	
RAW_FILE

'C:/Users/16024/Downloads/Crimes_-_2001_to_Present.csv'

In [3]:
## (Optional) SET THE FOLDER FOR FINAL FILES
OUTPUT_FOLDER = 'Data/Chicago/'
os.makedirs(OUTPUT_FOLDER, exist_ok=True)

# 🔄 Full Workflow

- Now that your RAW_FILE variable is set, do one of the following:
    - On the toolbar, click on the Kernel menu > "Restart and Run All".
    - OR click on this cell first, then on the toolbar click on the "Cell" menu > "Run All Below".

In [4]:
import pandas as pd
pd.set_option('display.max_columns', 100)
pd.set_option('display.float_format',lambda x: f"{x:,.2f}")

In [5]:
chicago_full = pd.read_csv(RAW_FILE)
chicago_full

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,11646166,JC213529,9/1/2018 0:01,082XX S INGLESIDE AVE,810,THEFT,OVER $500,RESIDENCE,False,True,631,6.00,8.00,44.00,6,,,2018,4/6/2019 16:04,,,
1,11645836,JC212333,5/1/2016 0:25,055XX S ROCKWELL ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,824,8.00,15.00,63.00,11,,,2016,4/6/2019 16:04,,,
2,11243268,JB167760,1/1/2017 0:01,047XX N CLARK ST,1562,SEX OFFENSE,AGG CRIMINAL SEXUAL ABUSE,APARTMENT,False,False,1913,19.00,47.00,3.00,17,,,2017,9/13/2018 15:56,,,
3,1896258,G749215,12/15/2001 2:00,011XX N STATE ST,460,BATTERY,SIMPLE,STREET,False,False,1824,18.00,,,08B,,,2001,8/17/2015 15:03,,,
4,11645527,JC212744,2/2/2015 10:00,069XX W ARCHER AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,811,8.00,23.00,56.00,11,,,2015,4/6/2019 16:04,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048570,11379778,JB349079,7/14/2018 5:20,021XX N MAJOR AVE,479,BATTERY,AGG: HANDS/FIST/FEET SERIOUS INJURY,SIDEWALK,False,False,2515,25.00,29.00,19.00,04B,1138038.00,1913794.00,2018,7/21/2018 15:49,41.92,-87.77,"(41.919586896, -87.768258642)"
1048571,11379779,JB349258,7/6/2018 10:00,007XX E 130TH PL,890,THEFT,FROM BUILDING,RESIDENCE,False,False,533,5.00,9.00,54.00,6,1183459.00,1819069.00,2018,7/18/2018 15:53,41.66,-87.60,"(41.658710798, -87.604327498)"
1048572,11379781,JB349137,7/14/2018 3:30,014XX N LUNA AVE,486,BATTERY,DOMESTIC BATTERY SIMPLE,APARTMENT,False,True,2532,25.00,37.00,25.00,08B,1139073.00,1909046.00,2018,7/21/2018 15:49,41.91,-87.76,"(41.906539063, -87.764571529)"
1048573,11379782,JB349267,7/14/2018 2:45,036XX W DICKENS AVE,1365,CRIMINAL TRESPASS,TO RESIDENCE,RESIDENTIAL YARD (FRONT/BACK),False,False,2525,25.00,26.00,22.00,26,1151707.00,1913710.00,2018,7/21/2018 15:49,41.92,-87.72,"(41.919098255, -87.718038515)"
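
Because the raw file is over 1.7 GB, reading all of it at once may strain machines with limited memory. Below is a minimal sketch that reads only selected columns in chunks, using the `usecols` and `chunksize` parameters of `pd.read_csv`. The helper name `read_in_chunks` is hypothetical, and the small in-memory CSV is a made-up stand-in for `RAW_FILE`.

```python
import io
import pandas as pd

def read_in_chunks(path_or_buffer, usecols, chunksize=100_000):
    """Read a CSV piece by piece, keeping only `usecols`, then combine the pieces."""
    chunks = pd.read_csv(path_or_buffer, usecols=usecols, chunksize=chunksize)
    return pd.concat(chunks, ignore_index=True)

# Tiny stand-in for RAW_FILE so the sketch runs anywhere
sample_csv = io.StringIO(
    "ID,Date,Primary Type,Block\n"
    "1,9/1/2018 0:01,THEFT,082XX S INGLESIDE AVE\n"
    "2,5/1/2016 0:25,DECEPTIVE PRACTICE,055XX S ROCKWELL ST\n"
    "3,1/1/2017 0:01,SEX OFFENSE,047XX N CLARK ST\n"
)
sample = read_in_chunks(sample_csv, usecols=["ID", "Date", "Primary Type"], chunksize=2)
print(sample.shape)
```

With the real file, `usecols` could list only the columns that are kept later in this notebook, so the dropped columns never need to be loaded.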


In [7]:
# explicitly setting the format to speed up pd.to_datetime
date_format = "%m/%d/%Y %H:%M"


### Demonstrating/testing date_format
example = chicago_full.loc[0,'Date']
display(example)
pd.to_datetime(example,format=date_format)

'9/1/2018 0:01'

Timestamp('2018-09-01 00:01:00')
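
Before converting the whole column, it can help to check whether any strings do not match `date_format`. Here is a small sketch with made-up values, using the `errors="coerce"` option of `pd.to_datetime`. The third string is deliberately in a different format.

```python
import pandas as pd

date_format = "%m/%d/%Y %H:%M"

# Made-up examples: two strings in the expected format and one that is not
dates = pd.Series(["9/1/2018 0:01", "12/15/2001 2:00", "2018-09-01 00:01:00"])

# errors="coerce" turns non-matching strings into NaT instead of raising an error
parsed = pd.to_datetime(dates, format=date_format, errors="coerce")

# Rows that failed to parse can then be inspected by hand
bad_rows = dates[parsed.isna()]
print(bad_rows)
```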

In [8]:
# this cell can take up to 1 min to run
chicago_full['Datetime'] = pd.to_datetime(chicago_full['Date'], format=date_format)
chicago_full = chicago_full.sort_values('Datetime')
chicago_full = chicago_full.set_index('Datetime')
chicago_full

Datetime,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
2001-01-01 00:00:00,9755072,HX403533,1/1/2001 0:00,031XX W DOUGLAS BLVD,1562,SEX OFFENSE,AGG CRIMINAL SEXUAL ABUSE,CHURCH/SYNAGOGUE/PLACE OF WORSHIP,False,False,1022,10.00,24.00,29.00,17,,,2001,8/17/2015 15:03,,,
2001-01-01 00:00:00,9755147,HX403543,1/1/2001 0:00,031XX W DOUGLAS BLVD,1562,SEX OFFENSE,AGG CRIMINAL SEXUAL ABUSE,CHURCH/SYNAGOGUE/PLACE OF WORSHIP,False,False,1022,10.00,24.00,29.00,17,,,2001,8/17/2015 15:03,,,
2001-01-01 00:00:00,11950657,JD114742,1/1/2001 0:00,061XX S FAIRFIELD AVE,1753,OFFENSE INVOLVING CHILDREN,SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER,RESIDENCE,True,True,825,8.00,16.00,66.00,2,,,2001,11/7/2020 15:51,,,
2001-01-01 00:00:00,1311351,G002096,1/1/2001 0:00,003XX W 40 PL,620,BURGLARY,UNLAWFUL ENTRY,FACTORY/MANUFACTURING BUILDING,False,False,925,9.00,,,5,,,2001,8/17/2015 15:03,,,
2001-01-01 00:00:00,11513580,JB524424,1/1/2001 0:00,030XX W WARREN BLVD,1753,OFFENSE INVOLVING CHILDREN,SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER,RESIDENCE,True,True,1222,12.00,27.00,27.00,2,,,2001,7/1/2023 16:45,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-06-29 23:30:00,13124495,JG321816,6/29/2023 23:30,008XX N HOMAN AVE,910,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,1121,11.00,27.00,23.00,7,1153562.00,1905270.00,2023,7/6/2023 16:48,41.90,-87.71,"(41.895901406, -87.71144781)"
2023-06-29 23:33:00,13124229,JG321563,6/29/2023 23:33,033XX W FULLERTON AVE,486,BATTERY,DOMESTIC BATTERY SIMPLE,SIDEWALK,True,True,1413,14.00,35.00,22.00,08B,1153528.00,1915754.00,2023,7/6/2023 16:48,41.92,-87.71,"(41.924671116, -87.711293436)"
2023-06-29 23:45:00,13124350,JG321720,6/29/2023 23:45,041XX W ARTHINGTON ST,920,MOTOR VEHICLE THEFT,ATTEMPT - AUTOMOBILE,STREET,False,False,1132,11.00,24.00,26.00,7,1148938.00,1895793.00,2023,7/6/2023 16:48,41.87,-87.73,"(41.86998618, -87.728676148)"
2023-06-29 23:50:00,13124208,JG321567,6/29/2023 23:50,036XX W IRVING PARK RD,502P,OTHER OFFENSE,FALSE / STOLEN / ALTERED TRP,STREET,False,False,1723,17.00,45.00,16.00,26,1151492.00,1926339.00,2023,7/6/2023 16:48,41.95,-87.72,"(41.953757479, -87.718495677)"


In [9]:
(chicago_full.isna().sum()/len(chicago_full)).round(2)

ID                     0.00
Case Number            0.00
Date                   0.00
Block                  0.00
IUCR                   0.00
Primary Type           0.00
Description            0.00
Location Description   0.01
Arrest                 0.00
Domestic               0.00
Beat                   0.00
District               0.00
Ward                   0.02
Community Area         0.02
FBI Code               0.00
X Coordinate           0.08
Y Coordinate           0.08
Year                   0.00
Updated On             0.00
Latitude               0.08
Longitude              0.08
Location               0.08
dtype: float64
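
The proportion of missing values above is the count of nulls divided by the number of rows. Since the mean of a boolean column is the fraction of `True` values, `isna().mean()` should give the same result. Here is a toy check with made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data: half of the Ward values are missing, no Beat values are missing
toy = pd.DataFrame({"Ward": [8.0, np.nan, 47.0, np.nan],
                    "Beat": [631, 824, 1913, 1824]})

by_sum = (toy.isna().sum() / len(toy)).round(2)   # approach used in the cell above
by_mean = toy.isna().mean().round(2)              # shorter equivalent
print(by_mean)
```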

## Separate the Full Dataset by Years

In [10]:
# save the years for every crime
chicago_full["Year"] = chicago_full.index.year
chicago_full["Year"] = chicago_full["Year"].astype(str)
chicago_full["Year"].value_counts()

2016    268970
2017    268358
2015    218052
2018    142263
2002     20800
2008     12282
2009     11501
2003      9405
2001      9128
2005      9057
2022      8066
2021      8028
2006      7912
2004      7710
2007      6713
2014      6274
2020      6032
2010      5214
2019      5087
2013      5043
2012      5041
2011      4991
2023      2648
Name: Year, dtype: int64

In [11]:
## Dropping unneeded columns to reduce file size
drop_cols = ["X Coordinate","Y Coordinate", "Community Area","FBI Code",
             "Case Number","Updated On",'Block','Location','IUCR']

In [12]:
# save final df
chicago_final = chicago_full.drop(columns=drop_cols).sort_index()#.reset_index()
chicago_final

Datetime,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Year,Latitude,Longitude
2001-01-01 00:00:00,9755072,1/1/2001 0:00,SEX OFFENSE,AGG CRIMINAL SEXUAL ABUSE,CHURCH/SYNAGOGUE/PLACE OF WORSHIP,False,False,1022,10.00,24.00,2001,,
2001-01-01 00:00:00,9755147,1/1/2001 0:00,SEX OFFENSE,AGG CRIMINAL SEXUAL ABUSE,CHURCH/SYNAGOGUE/PLACE OF WORSHIP,False,False,1022,10.00,24.00,2001,,
2001-01-01 00:00:00,11950657,1/1/2001 0:00,OFFENSE INVOLVING CHILDREN,SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER,RESIDENCE,True,True,825,8.00,16.00,2001,,
2001-01-01 00:00:00,1311351,1/1/2001 0:00,BURGLARY,UNLAWFUL ENTRY,FACTORY/MANUFACTURING BUILDING,False,False,925,9.00,,2001,,
2001-01-01 00:00:00,11513580,1/1/2001 0:00,OFFENSE INVOLVING CHILDREN,SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER,RESIDENCE,True,True,1222,12.00,27.00,2001,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-06-29 23:30:00,13124495,6/29/2023 23:30,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,1121,11.00,27.00,2023,41.90,-87.71
2023-06-29 23:33:00,13124229,6/29/2023 23:33,BATTERY,DOMESTIC BATTERY SIMPLE,SIDEWALK,True,True,1413,14.00,35.00,2023,41.92,-87.71
2023-06-29 23:45:00,13124350,6/29/2023 23:45,MOTOR VEHICLE THEFT,ATTEMPT - AUTOMOBILE,STREET,False,False,1132,11.00,24.00,2023,41.87,-87.73
2023-06-29 23:50:00,13124208,6/29/2023 23:50,OTHER OFFENSE,FALSE / STOLEN / ALTERED TRP,STREET,False,False,1723,17.00,45.00,2023,41.95,-87.72


In [13]:
chicago_final.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1048575 entries, 2001-01-01 00:00:00 to 2023-06-29 23:57:00
Data columns (total 13 columns):
 #   Column                Non-Null Count    Dtype  
---  ------                --------------    -----  
 0   ID                    1048575 non-null  int64  
 1   Date                  1048575 non-null  object 
 2   Primary Type          1048575 non-null  object 
 3   Description           1048575 non-null  object 
 4   Location Description  1042634 non-null  object 
 5   Arrest                1048575 non-null  bool   
 6   Domestic              1048575 non-null  bool   
 7   Beat                  1048575 non-null  int64  
 8   District              1048574 non-null  float64
 9   Ward                  1031879 non-null  float64
 10  Year                  1048575 non-null  object 
 11  Latitude              960711 non-null   float64
 12  Longitude             960711 non-null   float64
dtypes: bool(2), float64(4), int64(2), object(5)
memory usa

In [14]:
chicago_final.memory_usage(deep=True).astype(float)

Index                   8,388,600.00
ID                      8,388,600.00
Date                   75,110,792.00
Primary Type           70,452,524.00
Description            77,176,005.00
Location Description   71,484,470.00
Arrest                  1,048,575.00
Domestic                1,048,575.00
Beat                    8,388,600.00
District                8,388,600.00
Ward                    8,388,600.00
Year                   63,963,075.00
Latitude                8,388,600.00
Longitude               8,388,600.00
dtype: float64
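
The object (string) columns account for most of the memory shown above. For columns with few distinct values, converting to the `category` dtype may reduce in-memory size, although it does not change the size of the saved CSV files. A toy comparison with made-up data:

```python
import pandas as pd

# Made-up column with only three distinct values repeated many times
primary_type = pd.Series(["THEFT", "BATTERY", "BURGLARY"] * 1000, dtype=object)

as_object = primary_type.memory_usage(deep=True)
as_category = primary_type.astype("category").memory_usage(deep=True)
print(f"object: {as_object:,} bytes, category: {as_category:,} bytes")
```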

In [15]:
# unique # of year bins
year_bins = chicago_final['Year'].astype(str).unique()
year_bins

array(['2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008',
       '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016',
       '2017', '2018', '2019', '2020', '2021', '2022', '2023'],
      dtype=object)

In [16]:
FINAL_DROP = ['Datetime','Year']#,'Location Description']

In [17]:
## set save location 

os.makedirs(OUTPUT_FOLDER, exist_ok=True)
print(f"[i] Saving .csv's to {OUTPUT_FOLDER}")
## loop through years
for year in year_bins:
    
    ## select this year's rows (a year string selects all matching rows of the DatetimeIndex)
    temp_df = chicago_final.loc[year]
    temp_df = temp_df.reset_index(drop=False)
    temp_df = temp_df.drop(columns=FINAL_DROP)

    # save as csv to output folder
    fname_temp = f"{OUTPUT_FOLDER}Chicago-Crime_{year}.csv"#.gz
    temp_df.to_csv(fname_temp,index=False)

    print(f"- Successfully saved {fname_temp}")

[i] Saving .csv's to Data/Chicago/
- Successfully saved Data/Chicago/Chicago-Crime_2001.csv
- Successfully saved Data/Chicago/Chicago-Crime_2002.csv
- Successfully saved Data/Chicago/Chicago-Crime_2003.csv
- Successfully saved Data/Chicago/Chicago-Crime_2004.csv
- Successfully saved Data/Chicago/Chicago-Crime_2005.csv
- Successfully saved Data/Chicago/Chicago-Crime_2006.csv
- Successfully saved Data/Chicago/Chicago-Crime_2007.csv
- Successfully saved Data/Chicago/Chicago-Crime_2008.csv
- Successfully saved Data/Chicago/Chicago-Crime_2009.csv
- Successfully saved Data/Chicago/Chicago-Crime_2010.csv
- Successfully saved Data/Chicago/Chicago-Crime_2011.csv
- Successfully saved Data/Chicago/Chicago-Crime_2012.csv
- Successfully saved Data/Chicago/Chicago-Crime_2013.csv
- Successfully saved Data/Chicago/Chicago-Crime_2014.csv
- Successfully saved Data/Chicago/Chicago-Crime_2015.csv
- Successfully saved Data/Chicago/Chicago-Crime_2016.csv
- Successfully saved Data/Chicago/Chicago-Crime_2017.csv
- Successfully
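
The loop above selects each year by passing a year string to `.loc` on the DatetimeIndex. An alternative sketch is to group by the year of the index with `groupby`, which avoids building the list of year strings first. The toy DataFrame and the temporary output folder below are made up for illustration.

```python
import os
import tempfile
import pandas as pd

# Made-up rows spanning two years, indexed by datetime like chicago_final
toy = pd.DataFrame(
    {"ID": [1, 2, 3], "Primary Type": ["THEFT", "BATTERY", "THEFT"]},
    index=pd.to_datetime(["2001-01-01 00:00", "2001-06-15 12:30", "2002-03-04 08:15"]),
)
toy.index.name = "Datetime"

out_dir = tempfile.mkdtemp()  # temporary stand-in for OUTPUT_FOLDER
written = {}
for year, year_df in toy.groupby(toy.index.year):
    fname = os.path.join(out_dir, f"Chicago-Crime_{year}.csv")
    # drop the Datetime index before saving, as the loop above does
    year_df.reset_index(drop=True).to_csv(fname, index=False)
    written[int(year)] = len(year_df)
print(written)
```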

In [18]:
saved_files = sorted(glob.glob(OUTPUT_FOLDER+'*.*csv'))
saved_files

['Data/Chicago\\Chicago-Crime_2001.csv',
 'Data/Chicago\\Chicago-Crime_2002.csv',
 'Data/Chicago\\Chicago-Crime_2003.csv',
 'Data/Chicago\\Chicago-Crime_2004.csv',
 'Data/Chicago\\Chicago-Crime_2005.csv',
 'Data/Chicago\\Chicago-Crime_2006.csv',
 'Data/Chicago\\Chicago-Crime_2007.csv',
 'Data/Chicago\\Chicago-Crime_2008.csv',
 'Data/Chicago\\Chicago-Crime_2009.csv',
 'Data/Chicago\\Chicago-Crime_2010.csv',
 'Data/Chicago\\Chicago-Crime_2011.csv',
 'Data/Chicago\\Chicago-Crime_2012.csv',
 'Data/Chicago\\Chicago-Crime_2013.csv',
 'Data/Chicago\\Chicago-Crime_2014.csv',
 'Data/Chicago\\Chicago-Crime_2015.csv',
 'Data/Chicago\\Chicago-Crime_2016.csv',
 'Data/Chicago\\Chicago-Crime_2017.csv',
 'Data/Chicago\\Chicago-Crime_2018.csv',
 'Data/Chicago\\Chicago-Crime_2019.csv',
 'Data/Chicago\\Chicago-Crime_2020.csv',
 'Data/Chicago\\Chicago-Crime_2021.csv',
 'Data/Chicago\\Chicago-Crime_2022.csv',
 'Data/Chicago\\Chicago-Crime_2023.csv']
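
Since the goal is to keep every file small enough for a repository, it may be worth checking the saved file sizes before committing. GitHub blocks files larger than 100 MB. The helper below is a hypothetical sketch, not part of the original notebook.

```python
import os

def files_over_limit(paths, limit_mb=100):
    """Return (path, size_in_mb) for every file larger than `limit_mb` megabytes."""
    too_big = []
    for path in paths:
        size_mb = os.path.getsize(path) / (1024 ** 2)
        if size_mb > limit_mb:
            too_big.append((path, round(size_mb, 2)))
    return too_big
```

Calling `files_over_limit(saved_files)` should return an empty list when every yearly file is under the limit.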

In [19]:
## create a README.txt for the zip files
readme = """Source URL: 
- https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2
- Filtered for years 2000-Present.

Downloaded 07/18/2022
- Files are split into 1 year per file.

EXAMPLE USAGE:
>> import glob
>> import pandas as pd
>> folder = "Data/Chicago/"
>> crime_files = sorted(glob.glob(folder+"*.csv"))
>> df = pd.concat([pd.read_csv(f) for f in crime_files])
"""
print(readme)


with open(f"{OUTPUT_FOLDER}README.txt",'w') as f:
    f.write(readme)

Source URL: 
- https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2
- Filtered for years 2000-Present.

Downloaded 07/18/2022
- Files are split into 1 year per file.

EXAMPLE USAGE:
>> import glob
>> import pandas as pd
>> folder = "Data/Chicago/"
>> crime_files = sorted(glob.glob(folder+"*.csv"))
>> df = pd.concat([pd.read_csv(f) for f in crime_files])



## Confirmation

- Follow the example usage above to test if your files were created successfully.

In [20]:
# get list of files from folder
crime_files = sorted(glob.glob(OUTPUT_FOLDER+"*.csv"))
df = pd.concat([pd.read_csv(f) for f in crime_files])
df

Unnamed: 0,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Latitude,Longitude
0,9755072,1/1/2001 0:00,SEX OFFENSE,AGG CRIMINAL SEXUAL ABUSE,CHURCH/SYNAGOGUE/PLACE OF WORSHIP,False,False,1022,10.00,24.00,,
1,9755147,1/1/2001 0:00,SEX OFFENSE,AGG CRIMINAL SEXUAL ABUSE,CHURCH/SYNAGOGUE/PLACE OF WORSHIP,False,False,1022,10.00,24.00,,
2,11950657,1/1/2001 0:00,OFFENSE INVOLVING CHILDREN,SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER,RESIDENCE,True,True,825,8.00,16.00,,
3,1311351,1/1/2001 0:00,BURGLARY,UNLAWFUL ENTRY,FACTORY/MANUFACTURING BUILDING,False,False,925,9.00,,,
4,11513580,1/1/2001 0:00,OFFENSE INVOLVING CHILDREN,SEXUAL ASSAULT OF CHILD BY FAMILY MEMBER,RESIDENCE,True,True,1222,12.00,27.00,,
...,...,...,...,...,...,...,...,...,...,...,...,...
2643,13124495,6/29/2023 23:30,MOTOR VEHICLE THEFT,AUTOMOBILE,STREET,False,False,1121,11.00,27.00,41.90,-87.71
2644,13124229,6/29/2023 23:33,BATTERY,DOMESTIC BATTERY SIMPLE,SIDEWALK,True,True,1413,14.00,35.00,41.92,-87.71
2645,13124350,6/29/2023 23:45,MOTOR VEHICLE THEFT,ATTEMPT - AUTOMOBILE,STREET,False,False,1132,11.00,24.00,41.87,-87.73
2646,13124208,6/29/2023 23:50,OTHER OFFENSE,FALSE / STOLEN / ALTERED TRP,STREET,False,False,1723,17.00,45.00,41.95,-87.72


In [21]:
years = df['Date'].map(lambda x: x.split()[0].split('/')[-1])
years.value_counts().sort_index()

2001      9128
2002     20800
2003      9405
2004      7710
2005      9057
2006      7912
2007      6713
2008     12282
2009     11501
2010      5214
2011      4991
2012      5041
2013      5043
2014      6274
2015    218052
2016    268970
2017    268358
2018    142263
2019      5087
2020      6032
2021      8028
2022      8066
2023      2648
Name: Date, dtype: int64
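
The check above extracts the year by splitting the date strings. When the data is reloaded for analysis, the `Date` column could instead be parsed back into a datetime index. Below is a minimal sketch: the helper name `load_crime_files` is hypothetical, and the two tiny yearly files are made up so the example runs without the real data.

```python
import glob
import os
import tempfile
import pandas as pd

def load_crime_files(folder, date_format="%m/%d/%Y %H:%M"):
    """Combine the yearly CSVs in `folder` and index the rows by parsed datetime."""
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    df = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
    df["Datetime"] = pd.to_datetime(df["Date"], format=date_format)
    return df.set_index("Datetime").sort_index()

# Two tiny made-up yearly files standing in for Data/Chicago/
demo_dir = tempfile.mkdtemp()
pd.DataFrame({"ID": [1], "Date": ["1/1/2001 0:00"]}).to_csv(
    os.path.join(demo_dir, "Chicago-Crime_2001.csv"), index=False)
pd.DataFrame({"ID": [2, 3], "Date": ["6/29/2023 23:30", "6/29/2023 23:45"]}).to_csv(
    os.path.join(demo_dir, "Chicago-Crime_2023.csv"), index=False)

demo = load_crime_files(demo_dir)
year_counts = demo.index.year.value_counts().sort_index()
print(year_counts)
```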

## Summary

- The Chicago crime dataset has now been saved to your repository as CSV files. 
- Save your notebook, commit your work, and push to GitHub using GitHub Desktop.