# **Collect Crime and Weather Datasets for August 2023 and Merge with January 2010 through July 2023 Datasets**
* Create features 
    * 'week' 
    * 'month' 
    * 'year'
    * 'mon_year' 
    * 'season' 
    * 'is_holiday'
    * 'is_weekend'
* Sum 'OffenseCount' by the aggregated NIBRS class and by 'RMSOccurrenceDate'
* Combine the daily totals with the weather dataset
* Merge the result with the dataset from January 2010 through July 2023
* Save the daily totals as 'daily crime numbers and weather data for time series analysis_082003.csv'
* Save the crime dataset as 'all_crime_features_jan2010_aug2023_w_nibrs_class.csv'

## **Import Module**

In [1]:
#### Import the libraries needed
import pandas as pd
import numpy as np

import warnings
# warnings.simplefilter('ignore', ConvergenceWarning)
warnings.filterwarnings('ignore')
%matplotlib inline

## **Set Environment**

In [2]:
# Show every row and column in notebook output (long tables become scrollable)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', 10)
# pd.set_option('display.max_columns', 10)

## **Load Data**

* Load "NIBRSPublicViewAug23.xlsx"

In [3]:
df_crime_aug2023 = pd.read_excel("data/NIBRSPublicViewAug23.xlsx", 
                   parse_dates=['RMSOccurrenceDate'], 
                   index_col='RMSOccurrenceDate',
                   dtype={'RMSOccurrenceHour': int, 'NIBRSClass': str, 'OffenseCount': int, 'ZIPCode': str})

In [4]:
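# Keep only occurrences dated on or after 2023-08-01.
# .copy() makes the result an independent frame, so the columns added later do not write into a slice.
df_crime_aug2023 = df_crime_aug2023.loc['2023-08-01':].copy()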

In [5]:
df_crime_aug2023.head(3)

RMSOccurrenceDate,Incident,RMSOccurrenceHour,NIBRSClass,NIBRSDescription,OffenseCount,Beat,Premise,StreetNo,StreetName,StreetType,Suffix,City,ZIPCode,MapLongitude,MapLatitude
2023-08-01,109054223,7,90J,Trespass of real property,1,150000000000.0,Park/Playground,8800,MULLINS,DR,,HOUSTON,77096,-95.486595,29.685225
2023-08-01,109587223,9,290,"Destruction, damage, vandalism",1,150000000000.0,"Residence, Home (Includes Apartment)",5900,BRAESWOOD,BLVD,N,HOUSTON,77074,-95.494935,29.679116
2023-08-01,110834923,9,23C,Shoplifting,1,150000000000.0,"Department, Discount Store",300,MEYERLAND PLAZA,,,HOUSTON,77096,-95.461402,29.686654


In [6]:
df_crime_aug2023.tail(3)

RMSOccurrenceDate,Incident,RMSOccurrenceHour,NIBRSClass,NIBRSDescription,OffenseCount,Beat,Premise,StreetNo,StreetName,StreetType,Suffix,City,ZIPCode,MapLongitude,MapLatitude
2023-08-31,124835223,15,90Z,All other offenses,1,HCSO,"Residence, Home (Includes Apartment)",9100,MILLS,RD,,HARRIS CO,77070.0,-95.555181,29.955979
2023-08-31,124980523,20,290,"Destruction, damage, vandalism",1,OOJ,"Parking Lot, Garage",1415,GUADALUPE,RD,,STAFFORD,77477.0,-95.55546,29.595597
2023-08-31,124913123,18,35A,"Drug, narcotic violations",1,,"Highway, Road, Street, Alley",13000,OLD HUMBLE,RD,,HARRIS CO,,,


In [7]:
df_crime_aug2023.shape

(20298, 15)

In [8]:
df_crime_aug2023.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 20298 entries, 2023-08-01 to 2023-08-31
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Incident           20298 non-null  int64  
 1   RMSOccurrenceHour  20298 non-null  int32  
 2   NIBRSClass         20298 non-null  object 
 3   NIBRSDescription   20298 non-null  object 
 4   OffenseCount       20298 non-null  int32  
 5   Beat               20288 non-null  object 
 6   Premise            20298 non-null  object 
 7   StreetNo           20231 non-null  object 
 8   StreetName         20298 non-null  object 
 9   StreetType         18866 non-null  object 
 10  Suffix             3091 non-null   object 
 11  City               20298 non-null  object 
 12  ZIPCode            20027 non-null  object 
 13  MapLongitude       20079 non-null  float64
 14  MapLatitude        20079 non-null  float64
dtypes: float64(2), int32(2), int64(1), object(10)
memory 

* Load 'houston, tx 2023-08-01 to 2023-10-12.csv'

In [9]:
# Load the weather dataset, drop the constant 'name' column, and keep August 2023 only
df_weather_aug2023 = pd.read_csv('data/houston, tx 2023-08-01 to 2023-10-12.csv', parse_dates=['datetime'], index_col='datetime')
df_weather_aug2023.drop('name', axis=1, inplace=True)
df_weather_aug2023 = df_weather_aug2023.loc['2023-08-01':'2023-08-31', ]

In [10]:
df_weather_aug2023.drop('severerisk', axis=1, inplace=True)
df_weather_aug2023.head(3)

datetime,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,precip,precipprob,precipcover,preciptype,snow,snowdepth,windgust,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,sunrise,sunset,moonphase,conditions,description,icon,stations
2023-08-01,99.6,82.4,88.7,106.5,89.6,96.6,73.4,63.0,0.0,0,0.0,,0,0,21.9,14.4,193.8,1016.5,38.6,9.9,186.7,16.2,9,2023-08-01T06:40:54,2023-08-01T20:14:28,0.5,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-02,98.9,80.4,88.4,102.4,85.7,94.4,71.6,61.2,0.0,0,0.0,,0,0,23.0,17.6,164.5,1016.4,21.5,9.9,217.7,18.6,9,2023-08-02T06:41:30,2023-08-02T20:13:44,0.53,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-03,98.5,82.2,88.1,107.2,90.5,96.8,74.7,66.7,0.0,0,0.0,,0,0,31.7,19.7,174.1,1015.3,19.6,9.9,202.8,17.5,10,2023-08-03T06:42:06,2023-08-03T20:12:59,0.57,Clear,Clear conditions throughout the day.,clear-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."


In [11]:
df_weather_aug2023.shape

(31, 30)

In [12]:
df_weather_aug2023.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 31 entries, 2023-08-01 to 2023-08-31
Data columns (total 30 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   tempmax           31 non-null     float64
 1   tempmin           31 non-null     float64
 2   temp              31 non-null     float64
 3   feelslikemax      31 non-null     float64
 4   feelslikemin      31 non-null     float64
 5   feelslike         31 non-null     float64
 6   dew               31 non-null     float64
 7   humidity          31 non-null     float64
 8   precip            31 non-null     float64
 9   precipprob        31 non-null     int64  
 10  precipcover       31 non-null     float64
 11  preciptype        2 non-null      object 
 12  snow              31 non-null     int64  
 13  snowdepth         31 non-null     int64  
 14  windgust          31 non-null     float64
 15  windspeed         31 non-null     float64
 16  winddir           31 non-n

## **Feature Engineering**

* Create features 
    * 'week' 
    * 'month' 
    * 'year'
    * 'mon_year' 
    * 'season' 
    * 'is_holiday'
    * 'is_weekend'

In [13]:
# Day-of-week name (e.g. 'Tuesday'); used later for weekly patterns in 'Offense Count'
df_crime_aug2023['week'] = df_crime_aug2023.index.map(lambda m: m.day_name())

# Month number (1-12)
df_crime_aug2023['month'] = df_crime_aug2023.index.map(lambda m: m.month)

# Four-digit year
df_crime_aug2023['year'] = df_crime_aug2023.index.map(lambda m: m.year)

# Month-year label (e.g. 'Aug-2023')
df_crime_aug2023['mon_year'] = df_crime_aug2023.index.map(lambda m: m.strftime('%b-%Y'))

# Meteorological season: Dec-Feb -> 1 (Winter), Mar-May -> 2 (Spring), Jun-Aug -> 3 (Summer), Sep-Nov -> 4 (Autumn)
df_crime_aug2023['season'] = df_crime_aug2023.index.map(lambda m: (m.month % 12) // 3 + 1)
df_crime_aug2023['season'] = df_crime_aug2023['season'].map({1: 'Winter', 2: 'Spring', 3: 'Summer', 4: 'Autumn'})

In [14]:
# importing holidays module
import holidays

In [15]:
# Create holiday object for Texas, USA for the years 2010 through 2023
us_tx_holidays = holidays.US(years=range(2010, 2024), state='TX')

In [16]:
# Keep the dates of the holidays listed below, including their observed variants.
# The list is the federal holidays plus 'Friday After Thanksgiving', which is a Texas state holiday.
# Names must match the strings returned by the holidays package exactly, apostrophes included.
fed_holidays = ["New Year's Day", "Independence Day", "Thanksgiving", "Martin Luther King Jr. Day",
                "Washington's Birthday", "Memorial Day", "Labor Day", "Columbus Day", "Veterans Day",
                "Friday After Thanksgiving", "Christmas Day", "New Year's Day (Observed)", "Independence Day (Observed)",
                "Thanksgiving (Observed)", "Martin Luther King Jr. Day (Observed)", "Washington's Birthday (Observed)",
                "Memorial Day (Observed)", "Labor Day (Observed)", "Columbus Day (Observed)", "Veterans Day (Observed)",
                "Friday After Thanksgiving (Observed)", "Christmas Day (Observed)"]

holidays_dates = [k for k, v in us_tx_holidays.items() if v in fed_holidays]

In [17]:
# Flag each row whose occurrence date falls on one of the selected holidays
result = [1 if day.date() in holidays_dates else 0 for day in df_crime_aug2023.index]

# Create a new variable 'is_holiday'
df_crime_aug2023['is_holiday'] = result

In [18]:
# Create a variable 'is_weekend'
is_weekend = [1 if day.isoweekday() > 5 else 0 for day in df_crime_aug2023.index]
df_crime_aug2023['is_weekend'] = is_weekend
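
The calendar features above can also be derived from the `DatetimeIndex` attributes directly, without row-by-row lambdas. The following is a minimal sketch on a toy four-day index; the `features` frame and `season_names` mapping are illustrative names, not part of the notebook, and the holiday flag is left out because it depends on the `holidays` package.

```python
import pandas as pd

# Toy daily index standing in for df_crime_aug2023.index (illustrative only)
idx = pd.date_range("2023-08-04", periods=4, freq="D")
features = pd.DataFrame(index=idx)

features["week"] = idx.day_name()              # day-of-week name, as in the 'week' column
features["month"] = idx.month
features["year"] = idx.year
features["mon_year"] = idx.strftime("%b-%Y")   # e.g. 'Aug-2023'

# Same season formula as the notebook: Dec-Feb -> 1, Mar-May -> 2, Jun-Aug -> 3, Sep-Nov -> 4
season_names = {1: "Winter", 2: "Spring", 3: "Summer", 4: "Autumn"}
features["season"] = ((idx.month % 12) // 3 + 1).map(season_names)

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend
features["is_weekend"] = (idx.dayofweek >= 5).astype(int)

print(features)
```

The vectorized form produces the same columns as the `map(lambda ...)` calls and tends to be faster on the full multi-year dataset.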

* Aggregate 'NIBRSClass' to its parent level (e.g., 13A, 13B, 13C --> 13); the 90-series codes (90B, 90J, 90Z, ...) keep their full code
    * 'aggregate_cleaned_class'

In [19]:
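# Reduce each NIBRS class to its parent code by keeping only the digits (13A -> 13).
# The 90-series codes would all collapse to '90', so they keep their full code (90J -> 90J).
aggregate_cleaned_class = []
for code in df_crime_aug2023['NIBRSClass']:
    parent = "".join(ch for ch in code if ch.isnumeric())
    if parent == '90':
        parent = code
    aggregate_cleaned_class.append(parent)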
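
The same rule can be wrapped in a small function, which makes it easy to check against a few known codes before applying it to the whole column. This is a sketch; `to_parent_class` is a hypothetical helper name, not something defined elsewhere in the notebook.

```python
def to_parent_class(code: str) -> str:
    """Return the parent NIBRS code: digits only, except 90-series codes, which stay as-is."""
    digits = "".join(ch for ch in code if ch.isnumeric())
    return code if digits == "90" else digits


# Spot checks using codes that appear in the August 2023 data
for code in ["23C", "90J", "290", "35A"]:
    print(code, "->", to_parent_class(code))
```

With the helper in place, the column could be built with `df_crime_aug2023['NIBRSClass'].map(to_parent_class)`.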

In [20]:
# Add 'aggregate_cleaned_class' to df_crime_aug2023
df_crime_aug2023['aggregate_cleaned_class'] = aggregate_cleaned_class

In [21]:
df_crime_aug2023.head(3)

RMSOccurrenceDate,Incident,RMSOccurrenceHour,NIBRSClass,NIBRSDescription,OffenseCount,Beat,Premise,StreetNo,StreetName,StreetType,Suffix,City,ZIPCode,MapLongitude,MapLatitude,week,month,year,mon_year,season,is_holiday,is_weekend,aggregate_cleaned_class
2023-08-01,109054223,7,90J,Trespass of real property,1,150000000000.0,Park/Playground,8800,MULLINS,DR,,HOUSTON,77096,-95.486595,29.685225,Tuesday,8,2023,Aug-2023,Summer,0,0,90J
2023-08-01,109587223,9,290,"Destruction, damage, vandalism",1,150000000000.0,"Residence, Home (Includes Apartment)",5900,BRAESWOOD,BLVD,N,HOUSTON,77074,-95.494935,29.679116,Tuesday,8,2023,Aug-2023,Summer,0,0,290
2023-08-01,110834923,9,23C,Shoplifting,1,150000000000.0,"Department, Discount Store",300,MEYERLAND PLAZA,,,HOUSTON,77096,-95.461402,29.686654,Tuesday,8,2023,Aug-2023,Summer,0,0,23


In [22]:
df_crime_aug2023.tail(3)

RMSOccurrenceDate,Incident,RMSOccurrenceHour,NIBRSClass,NIBRSDescription,OffenseCount,Beat,Premise,StreetNo,StreetName,StreetType,Suffix,City,ZIPCode,MapLongitude,MapLatitude,week,month,year,mon_year,season,is_holiday,is_weekend,aggregate_cleaned_class
2023-08-31,124835223,15,90Z,All other offenses,1,HCSO,"Residence, Home (Includes Apartment)",9100,MILLS,RD,,HARRIS CO,77070.0,-95.555181,29.955979,Thursday,8,2023,Aug-2023,Summer,0,0,90Z
2023-08-31,124980523,20,290,"Destruction, damage, vandalism",1,OOJ,"Parking Lot, Garage",1415,GUADALUPE,RD,,STAFFORD,77477.0,-95.55546,29.595597,Thursday,8,2023,Aug-2023,Summer,0,0,290
2023-08-31,124913123,18,35A,"Drug, narcotic violations",1,,"Highway, Road, Street, Alley",13000,OLD HUMBLE,RD,,HARRIS CO,,,,Thursday,8,2023,Aug-2023,Summer,0,0,35


## **Create Daily Total Offense Count Dataset**

In [23]:
# Extract 'aggregate_cleaned_class' and 'OffenseCount'
df_crime_aug2023_daily_total = df_crime_aug2023.loc[:, ['aggregate_cleaned_class', 'OffenseCount']]

In [24]:
# Get daily 'OffenseCount' total by the aggregated NIBRSClass 
df_crime_aug2023_daily_total = df_crime_aug2023_daily_total.groupby([df_crime_aug2023_daily_total.index, 'aggregate_cleaned_class'])['OffenseCount'].sum().reset_index()

In [25]:
# Create a pivot table with NIBRS class
df_crime_aug2023_daily_total = df_crime_aug2023_daily_total.pivot(index='RMSOccurrenceDate', columns='aggregate_cleaned_class', values='OffenseCount')

In [26]:
df_crime_aug2023_daily_total = df_crime_aug2023_daily_total.fillna(0)
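
The group, pivot, and fill steps above can be collapsed into one `pivot_table` call. The sketch below uses a four-row toy frame (made-up counts) with the same column names; it is an alternative formulation under those assumptions, not the notebook's own code. One difference to keep in mind: with `fill_value=0` the counts stay integers instead of becoming floats.

```python
import pandas as pd

# Toy incident-level data: one row per offense record (values are illustrative)
toy = pd.DataFrame(
    {
        "aggregate_cleaned_class": ["13", "23", "13", "90J"],
        "OffenseCount": [1, 2, 3, 1],
    },
    index=pd.DatetimeIndex(
        ["2023-08-01", "2023-08-01", "2023-08-01", "2023-08-02"],
        name="RMSOccurrenceDate",
    ),
)

# One row per day, one column per class, summed counts, 0 where a class did not occur
daily = toy.reset_index().pivot_table(
    index="RMSOccurrenceDate",
    columns="aggregate_cleaned_class",
    values="OffenseCount",
    aggfunc="sum",
    fill_value=0,
)
print(daily)
```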

In [27]:
df_crime_aug2023_daily_total.head()

RMSOccurrenceDate,09,100,11,120,13,200,210,220,23,240,250,26,270,280,290,35,36,370,40,510,520,64,720,90B,90C,90D,90E,90F,90G,90I,90J,90Z
2023-08-01,0.0,0.0,4.0,13.0,157.0,0.0,2.0,28.0,209.0,69.0,13.0,17.0,1.0,4.0,74.0,17.0,0.0,3.0,6.0,0.0,7.0,1.0,1.0,0.0,7.0,5.0,0.0,1.0,0.0,0.0,27.0,36.0
2023-08-02,2.0,1.0,4.0,15.0,151.0,0.0,0.0,30.0,194.0,81.0,9.0,13.0,0.0,1.0,72.0,20.0,0.0,0.0,3.0,0.0,4.0,0.0,2.0,0.0,5.0,7.0,0.0,1.0,0.0,0.0,31.0,57.0
2023-08-03,0.0,0.0,6.0,11.0,152.0,1.0,0.0,29.0,193.0,56.0,7.0,24.0,0.0,1.0,55.0,21.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.0,8.0,8.0,0.0,1.0,1.0,0.0,23.0,49.0
2023-08-04,1.0,0.0,4.0,17.0,161.0,0.0,0.0,39.0,212.0,65.0,6.0,27.0,1.0,2.0,52.0,19.0,0.0,1.0,1.0,0.0,6.0,0.0,2.0,0.0,8.0,15.0,0.0,0.0,0.0,0.0,32.0,58.0
2023-08-05,1.0,2.0,9.0,21.0,203.0,3.0,0.0,22.0,185.0,66.0,3.0,11.0,0.0,1.0,69.0,21.0,0.0,1.0,0.0,0.0,11.0,0.0,2.0,0.0,4.0,18.0,0.0,2.0,0.0,0.0,21.0,51.0


In [28]:
# Load Label_meaning_crime_code_offense categories.csv
label_names = pd.read_csv('Label_meaning_crime_code_offense categories.csv', index_col='aggregate_cleaned_class', dtype={'aggregate_cleaned_class': str})
label_names.rename(index={'9': '09'}, inplace=True)

In [29]:
# Look up the offense category label for each NIBRS class code, in column order
offense_categories = [label_names.loc[idx, 'offense categories'] for idx in df_crime_aug2023_daily_total.columns]

In [30]:
# Rename codes of NIBRSClass to label categories
df_crime_aug2023_daily_total.columns = offense_categories
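
A code that is missing from the lookup table makes the list comprehension above fail with a bare `KeyError`. The following sketch checks for unmapped codes first and then renames through a dictionary. The three-row `label_names` frame is a hypothetical excerpt whose labels are copied from the output shown in this notebook; the real CSV is not reproduced here.

```python
import pandas as pd

# Hypothetical excerpt of the code-to-label lookup table
label_names = pd.DataFrame(
    {"offense categories": ["Homicide Offenses", "Assault Offenses", "Trespass of Real Property"]},
    index=pd.Index(["09", "13", "90J"], name="aggregate_cleaned_class"),
)

# Toy daily totals keyed by class code
daily = pd.DataFrame(
    [[0, 157, 27]],
    columns=["09", "13", "90J"],
    index=pd.to_datetime(["2023-08-01"]),
)

# Fail early, and say which codes are the problem
missing = [code for code in daily.columns if code not in label_names.index]
if missing:
    raise KeyError(f"No offense category for NIBRS codes: {missing}")

daily = daily.rename(columns=label_names["offense categories"].to_dict())
print(daily.columns.tolist())
```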

In [31]:
# Create 'Offense Count' to hold the daily total
df_crime_aug2023_daily_total['Offense Count'] = df_crime_aug2023_daily_total.sum(axis=1)

In [32]:
df_crime_aug2023_daily_total.head()

RMSOccurrenceDate,Homicide Offenses,Kidnapping/Abduction,"Sex Offenses, Forcible",Robbery,Assault Offenses,Arson,Extortion/Blackmail,Burglary/Breaking & Entering,Larceny/Theft Offenses,Motor Vehicle Theft,Counterfeiting/Forgery,Fraud Offenses,Embezzlement,Stolen Property Offenses,Destruction/Damage/Vandalism of Property,Drug/Narcotic Offenses,"Sex Offenses, Nonforcible",Pornography/Obscene Material,Prostitution Offenses,Bribery,Weapon Law Violations,Human Trafficking or Kidnapping/Abduction,Animal Cruelty,Curfew/Loitering/Vagrancy Violations,Disorderly Conduct,Driving Under the Influence,Drunkenness,"Family Offenses, Nonviolent",Liquor Law Violations,Runaway,Trespass of Real Property,All Other Offenses,Offense Count
2023-08-01,0.0,0.0,4.0,13.0,157.0,0.0,2.0,28.0,209.0,69.0,13.0,17.0,1.0,4.0,74.0,17.0,0.0,3.0,6.0,0.0,7.0,1.0,1.0,0.0,7.0,5.0,0.0,1.0,0.0,0.0,27.0,36.0,702.0
2023-08-02,2.0,1.0,4.0,15.0,151.0,0.0,0.0,30.0,194.0,81.0,9.0,13.0,0.0,1.0,72.0,20.0,0.0,0.0,3.0,0.0,4.0,0.0,2.0,0.0,5.0,7.0,0.0,1.0,0.0,0.0,31.0,57.0,703.0
2023-08-03,0.0,0.0,6.0,11.0,152.0,1.0,0.0,29.0,193.0,56.0,7.0,24.0,0.0,1.0,55.0,21.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.0,8.0,8.0,0.0,1.0,1.0,0.0,23.0,49.0,654.0
2023-08-04,1.0,0.0,4.0,17.0,161.0,0.0,0.0,39.0,212.0,65.0,6.0,27.0,1.0,2.0,52.0,19.0,0.0,1.0,1.0,0.0,6.0,0.0,2.0,0.0,8.0,15.0,0.0,0.0,0.0,0.0,32.0,58.0,729.0
2023-08-05,1.0,2.0,9.0,21.0,203.0,3.0,0.0,22.0,185.0,66.0,3.0,11.0,0.0,1.0,69.0,21.0,0.0,1.0,0.0,0.0,11.0,0.0,2.0,0.0,4.0,18.0,0.0,2.0,0.0,0.0,21.0,51.0,727.0


## **Merge1-1: Daily Total by NIBRS Class for Aug 2023 and Weather for Aug 2023**

* df_crime_aug2023_daily_total and df_weather_aug2023
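
`pd.concat(..., axis=1)` aligns the two frames on their index and silently inserts `NaN` where a date exists in only one of them. A quick check that both frames cover the same days can catch that before merging. This is a sketch on three days of toy data (the values are taken from the outputs shown in this notebook); the frame names are illustrative.

```python
import pandas as pd

days = pd.date_range("2023-08-01", periods=3, freq="D")
crime = pd.DataFrame({"Offense Count": [702.0, 703.0, 654.0]}, index=days)
weather = pd.DataFrame({"tempmax": [99.6, 98.9, 98.5]}, index=days)

# Identical indexes: the column-wise concat introduces no missing values
if not crime.index.equals(weather.index):
    raise ValueError("crime and weather frames do not cover the same dates")
combined = pd.concat([crime, weather], axis=1)

# If one day of weather were missing, the outer join would keep the row and fill NaN
with_gap = pd.concat([crime, weather.iloc[:2]], axis=1)
print(with_gap["tempmax"].isna().sum())
```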

In [33]:
df_crime_daily_total_weather_aug2023 = pd.concat([df_crime_aug2023_daily_total, df_weather_aug2023], axis=1)
df_crime_daily_total_weather_aug2023.head(3)

Unnamed: 0,Homicide Offenses,Kidnapping/Abduction,"Sex Offenses, Forcible",Robbery,Assault Offenses,Arson,Extortion/Blackmail,Burglary/Breaking & Entering,Larceny/Theft Offenses,Motor Vehicle Theft,Counterfeiting/Forgery,Fraud Offenses,Embezzlement,Stolen Property Offenses,Destruction/Damage/Vandalism of Property,Drug/Narcotic Offenses,"Sex Offenses, Nonforcible",Pornography/Obscene Material,Prostitution Offenses,Bribery,Weapon Law Violations,Human Trafficking or Kidnapping/Abduction,Animal Cruelty,Curfew/Loitering/Vagrancy Violations,Disorderly Conduct,Driving Under the Influence,Drunkenness,"Family Offenses, Nonviolent",Liquor Law Violations,Runaway,Trespass of Real Property,All Other Offenses,Offense Count,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,precip,precipprob,precipcover,preciptype,snow,snowdepth,windgust,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,sunrise,sunset,moonphase,conditions,description,icon,stations
2023-08-01,0.0,0.0,4.0,13.0,157.0,0.0,2.0,28.0,209.0,69.0,13.0,17.0,1.0,4.0,74.0,17.0,0.0,3.0,6.0,0.0,7.0,1.0,1.0,0.0,7.0,5.0,0.0,1.0,0.0,0.0,27.0,36.0,702.0,99.6,82.4,88.7,106.5,89.6,96.6,73.4,63.0,0.0,0,0.0,,0,0,21.9,14.4,193.8,1016.5,38.6,9.9,186.7,16.2,9,2023-08-01T06:40:54,2023-08-01T20:14:28,0.5,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-02,2.0,1.0,4.0,15.0,151.0,0.0,0.0,30.0,194.0,81.0,9.0,13.0,0.0,1.0,72.0,20.0,0.0,0.0,3.0,0.0,4.0,0.0,2.0,0.0,5.0,7.0,0.0,1.0,0.0,0.0,31.0,57.0,703.0,98.9,80.4,88.4,102.4,85.7,94.4,71.6,61.2,0.0,0,0.0,,0,0,23.0,17.6,164.5,1016.4,21.5,9.9,217.7,18.6,9,2023-08-02T06:41:30,2023-08-02T20:13:44,0.53,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-03,0.0,0.0,6.0,11.0,152.0,1.0,0.0,29.0,193.0,56.0,7.0,24.0,0.0,1.0,55.0,21.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.0,8.0,8.0,0.0,1.0,1.0,0.0,23.0,49.0,654.0,98.5,82.2,88.1,107.2,90.5,96.8,74.7,66.7,0.0,0,0.0,,0,0,31.7,19.7,174.1,1015.3,19.6,9.9,202.8,17.5,10,2023-08-03T06:42:06,2023-08-03T20:12:59,0.57,Clear,Clear conditions throughout the day.,clear-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."


In [34]:
# Extract unique 'date', 'week', 'month', 'year', 'mon_year' , 'season' ,'is_holiday', 'is_weekend'
df_features_to_add = df_crime_aug2023.reset_index(names='date')[['date', 'week', 'month', 'year', 'mon_year' , 'season' ,'is_holiday', 'is_weekend']].drop_duplicates()
df_features_to_add = df_features_to_add.set_index('date')
df_features_to_add.head()

date,week,month,year,mon_year,season,is_holiday,is_weekend
2023-08-01,Tuesday,8,2023,Aug-2023,Summer,0,0
2023-08-02,Wednesday,8,2023,Aug-2023,Summer,0,0
2023-08-03,Thursday,8,2023,Aug-2023,Summer,0,0
2023-08-04,Friday,8,2023,Aug-2023,Summer,0,0
2023-08-05,Saturday,8,2023,Aug-2023,Summer,0,1


In [35]:
# Add the calendar features to df_crime_daily_total_weather_aug2023
df_crime_daily_total_weather_aug2023 = pd.concat([df_crime_daily_total_weather_aug2023, df_features_to_add], axis=1)
df_crime_daily_total_weather_aug2023.head()

Unnamed: 0,Homicide Offenses,Kidnapping/Abduction,"Sex Offenses, Forcible",Robbery,Assault Offenses,Arson,Extortion/Blackmail,Burglary/Breaking & Entering,Larceny/Theft Offenses,Motor Vehicle Theft,Counterfeiting/Forgery,Fraud Offenses,Embezzlement,Stolen Property Offenses,Destruction/Damage/Vandalism of Property,Drug/Narcotic Offenses,"Sex Offenses, Nonforcible",Pornography/Obscene Material,Prostitution Offenses,Bribery,Weapon Law Violations,Human Trafficking or Kidnapping/Abduction,Animal Cruelty,Curfew/Loitering/Vagrancy Violations,Disorderly Conduct,Driving Under the Influence,Drunkenness,"Family Offenses, Nonviolent",Liquor Law Violations,Runaway,Trespass of Real Property,All Other Offenses,Offense Count,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,precip,precipprob,precipcover,preciptype,snow,snowdepth,windgust,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,sunrise,sunset,moonphase,conditions,description,icon,stations,week,month,year,mon_year,season,is_holiday,is_weekend
2023-08-01,0.0,0.0,4.0,13.0,157.0,0.0,2.0,28.0,209.0,69.0,13.0,17.0,1.0,4.0,74.0,17.0,0.0,3.0,6.0,0.0,7.0,1.0,1.0,0.0,7.0,5.0,0.0,1.0,0.0,0.0,27.0,36.0,702.0,99.6,82.4,88.7,106.5,89.6,96.6,73.4,63.0,0.0,0,0.0,,0,0,21.9,14.4,193.8,1016.5,38.6,9.9,186.7,16.2,9,2023-08-01T06:40:54,2023-08-01T20:14:28,0.5,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301...",Tuesday,8,2023,Aug-2023,Summer,0,0
2023-08-02,2.0,1.0,4.0,15.0,151.0,0.0,0.0,30.0,194.0,81.0,9.0,13.0,0.0,1.0,72.0,20.0,0.0,0.0,3.0,0.0,4.0,0.0,2.0,0.0,5.0,7.0,0.0,1.0,0.0,0.0,31.0,57.0,703.0,98.9,80.4,88.4,102.4,85.7,94.4,71.6,61.2,0.0,0,0.0,,0,0,23.0,17.6,164.5,1016.4,21.5,9.9,217.7,18.6,9,2023-08-02T06:41:30,2023-08-02T20:13:44,0.53,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301...",Wednesday,8,2023,Aug-2023,Summer,0,0
2023-08-03,0.0,0.0,6.0,11.0,152.0,1.0,0.0,29.0,193.0,56.0,7.0,24.0,0.0,1.0,55.0,21.0,0.0,1.0,0.0,0.0,6.0,0.0,1.0,0.0,8.0,8.0,0.0,1.0,1.0,0.0,23.0,49.0,654.0,98.5,82.2,88.1,107.2,90.5,96.8,74.7,66.7,0.0,0,0.0,,0,0,31.7,19.7,174.1,1015.3,19.6,9.9,202.8,17.5,10,2023-08-03T06:42:06,2023-08-03T20:12:59,0.57,Clear,Clear conditions throughout the day.,clear-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301...",Thursday,8,2023,Aug-2023,Summer,0,0
2023-08-04,1.0,0.0,4.0,17.0,161.0,0.0,0.0,39.0,212.0,65.0,6.0,27.0,1.0,2.0,52.0,19.0,0.0,1.0,1.0,0.0,6.0,0.0,2.0,0.0,8.0,15.0,0.0,0.0,0.0,0.0,32.0,58.0,729.0,100.4,82.3,88.4,106.7,90.7,97.0,74.3,65.8,0.0,0,0.0,,0,0,30.0,23.1,191.5,1013.9,26.5,9.9,206.0,17.8,9,2023-08-04T06:42:41,2023-08-04T20:12:12,0.6,Partially cloudy,Clearing in the afternoon.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301...",Friday,8,2023,Aug-2023,Summer,0,0
2023-08-05,1.0,2.0,9.0,21.0,203.0,3.0,0.0,22.0,185.0,66.0,3.0,11.0,0.0,1.0,69.0,21.0,0.0,1.0,0.0,0.0,11.0,0.0,2.0,0.0,4.0,18.0,0.0,2.0,0.0,0.0,21.0,51.0,727.0,99.7,80.7,88.3,106.2,87.4,96.8,74.4,66.3,0.0,0,0.0,,0,0,27.7,20.7,195.1,1014.3,25.4,9.8,202.2,17.7,10,2023-08-05T06:43:17,2023-08-05T20:11:25,0.63,Partially cloudy,Becoming cloudy in the afternoon.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301...",Saturday,8,2023,Aug-2023,Summer,0,1


## **Merge1-2: Daily Total and Weather for Aug 2023 and Daily Total and Weather for Jan 2010 through Jul 2023**

* Load 'daily crime numbers and weather data for time series analysis.csv' (January 2010 through July 2023)
* Append df_crime_daily_total_weather_aug2023 to it

#### **Load 'daily crime numbers and weather data for time series analysis.csv'**

In [36]:
# Load 'daily crime numbers and weather data for time series analysis.csv'
df_crime_daily_total_weather_jan2010_jul2023 = pd.read_csv('data/daily crime numbers and weather data for time series analysis.csv', index_col='date', parse_dates=['date'])

#### **Missing Value Treatment**

* Calculate the calendar-day average of 'solarradiation', 'solarenergy', and 'uvindex' from 2010 through 2013 and from 2015 through July 2023
* Fill the 2014 values of those three columns with the calendar-day averages

In [37]:
df_crime_daily_total_weather_jan2010_jul2023.isna().sum()

Kidnapping/Abduction                           0
Sex Offenses, Forcible                         0
Robbery                                        0
Assault Offenses                               0
Arson                                          0
Extortion/Blackmail                            0
Burglary/Breaking & Entering                   0
Larceny/Theft Offenses                         0
Motor Vehicle Theft                            0
Counterfeiting/Forgery                         0
Fraud Offenses                                 0
Embezzlement                                   0
Stolen Property Offenses                       0
Destruction/Damage/Vandalism of Property       0
Drug/Narcotic Offenses                         0
Sex Offenses, Nonforcible                      0
Pornography/Obscene Material                   0
Gambling Offenses                              0
Prostitution Offenses                          0
Bribery                                        0
Weapon Law Violation

In [38]:
# Build the reference data for imputation: every year except 2014
solarradiation_energy_uvindex = pd.concat([df_crime_daily_total_weather_jan2010_jul2023.loc['2010-01-01':'2013-12-31', ['solarradiation', 'solarenergy', 'uvindex']],
                                           df_crime_daily_total_weather_jan2010_jul2023.loc['2015-01-01':'2023-07-31', ['solarradiation', 'solarenergy', 'uvindex']]],
                                           axis=0)

# Create a 'mon_day' key for each date, e.g. '10-15'
solarradiation_energy_uvindex['mon_day'] = solarradiation_energy_uvindex.index.map(lambda m: m.strftime('%m-%d'))

# Average per calendar day; drop Feb 29 because 2014 is not a leap year
groupby_month_solarradiation_energy_uvindex = solarradiation_energy_uvindex.groupby('mon_day')[['solarradiation', 'solarenergy', 'uvindex']].mean()
groupby_month_solarradiation_energy_uvindex.drop(index='02-29', inplace=True)

In [39]:
# Fill the 2014 values with the calendar-day averages computed above
cols_have_null = ['solarradiation', 'solarenergy', 'uvindex']
dates_2014 = df_crime_daily_total_weather_jan2010_jul2023.loc['2014-01-01':'2014-12-31'].index
mon_day_2014 = dates_2014.strftime('%m-%d')
for col in cols_have_null:
    # Assign through .loc on the original frame so the write is not lost on a temporary copy
    df_crime_daily_total_weather_jan2010_jul2023.loc[dates_2014, col] = mon_day_2014.map(groupby_month_solarradiation_energy_uvindex[col]).to_numpy()
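
The imputation idea, filling a missing year with the average of the same calendar day in the other years, can be checked on a small synthetic example. In this sketch the frame `df`, the constant values 4.0 and 8.0, and the three-year range are all made up so that the expected result (6.0 on every 2014 day) is obvious.

```python
import numpy as np
import pandas as pd

# Synthetic data: uvindex is 4.0 in 2013 and 8.0 in 2015; 2014 is missing
idx = pd.date_range("2013-01-01", "2015-12-31", freq="D")
df = pd.DataFrame({"uvindex": np.where(idx.year == 2013, 4.0, 8.0)}, index=idx)
df.loc["2014-01-01":"2014-12-31", "uvindex"] = np.nan

# Calendar-day ('MM-DD') average over the years that have data
reference = df[df.index.year != 2014]
daily_mean = reference.groupby(reference.index.strftime("%m-%d"))["uvindex"].mean()

# Look up each 2014 date's 'MM-DD' key in the averages and write the result back
gap = df.loc["2014-01-01":"2014-12-31"].index
df.loc[gap, "uvindex"] = gap.strftime("%m-%d").map(daily_mean).to_numpy()

print(df.loc["2014-01-01":"2014-01-03"])
```

Matching on the 'MM-DD' key rather than on row position means the fill still lines up if a day is absent from the gap year.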

In [40]:
# Verify there is no missing data
df_crime_daily_total_weather_jan2010_jul2023.isna().any().sum()

0

In [41]:
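# Append August 2023 to the history. fillna(0) zero-fills offense categories that exist in only one period;
# it also replaces missing non-numeric values (e.g. 'preciptype' in August 2023) with 0.
df_crime_daily_total_weather_jan2010_aug2023 = pd.concat([df_crime_daily_total_weather_jan2010_jul2023, df_crime_daily_total_weather_aug2023], axis=0).fillna(0)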
df_crime_daily_total_weather_jan2010_aug2023.head(3)

Unnamed: 0,Kidnapping/Abduction,"Sex Offenses, Forcible",Robbery,Assault Offenses,Arson,Extortion/Blackmail,Burglary/Breaking & Entering,Larceny/Theft Offenses,Motor Vehicle Theft,Counterfeiting/Forgery,Fraud Offenses,Embezzlement,Stolen Property Offenses,Destruction/Damage/Vandalism of Property,Drug/Narcotic Offenses,"Sex Offenses, Nonforcible",Pornography/Obscene Material,Gambling Offenses,Prostitution Offenses,Bribery,Weapon Law Violations,Human Trafficking or Kidnapping/Abduction,Animal Cruelty,Homicide Offenses,Bad Checks,Curfew/Loitering/Vagrancy Violations,Disorderly Conduct,Driving Under the Influence,Drunkenness,"Family Offenses, Nonviolent",Liquor Law Violations,Peeping Tom,Runaway,Trespass of Real Property,All Other Offenses,Offense Count,week,month,year,mon_year,season,is_holiday,is_weekend,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,precip,precipprob,precipcover,preciptype,snow,snowdepth,windgust,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,sunrise,sunset,moonphase,conditions,description,icon,stations
2010-01-01,0.0,13.0,31.0,58.0,0.0,0.0,77.0,215.0,21.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,416.0,Friday,1,2010,Jan-2010,Winter,1,0,53.9,41.3,46.9,53.9,33.0,42.3,34.0,63.0,0.0,0,0.0,none,0.0,0.0,27.8,18.0,359.3,1028.0,47.7,9.8,174.8,15.0,7.0,07:16:59,17:33:24,0.53,Partially cloudy,Clearing in the afternoon.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2010-01-02,0.0,2.0,32.0,15.0,0.0,0.0,67.0,200.0,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,344.0,Saturday,1,2010,Jan-2010,Winter,0,1,53.7,40.1,46.3,53.7,34.9,43.1,30.9,56.9,0.0,0,0.0,none,0.0,0.0,16.1,11.6,93.7,1025.9,9.1,9.6,156.6,13.4,6.0,07:17:13,17:34:07,0.57,Clear,Clear conditions throughout the day.,clear-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2010-01-03,0.0,0.0,38.0,35.0,0.0,0.0,52.0,179.0,35.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,339.0,Sunday,1,2010,Jan-2010,Winter,0,1,47.4,41.1,44.8,46.5,33.8,40.8,35.6,70.2,0.0,0,0.0,none,0.0,0.0,18.3,13.4,55.4,1023.2,78.6,8.8,80.2,6.7,3.0,07:17:25,17:34:50,0.6,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."


In [42]:
df_crime_daily_total_weather_jan2010_aug2023.tail(3)

Unnamed: 0,Kidnapping/Abduction,"Sex Offenses, Forcible",Robbery,Assault Offenses,Arson,Extortion/Blackmail,Burglary/Breaking & Entering,Larceny/Theft Offenses,Motor Vehicle Theft,Counterfeiting/Forgery,Fraud Offenses,Embezzlement,Stolen Property Offenses,Destruction/Damage/Vandalism of Property,Drug/Narcotic Offenses,"Sex Offenses, Nonforcible",Pornography/Obscene Material,Gambling Offenses,Prostitution Offenses,Bribery,Weapon Law Violations,Human Trafficking or Kidnapping/Abduction,Animal Cruelty,Homicide Offenses,Bad Checks,Curfew/Loitering/Vagrancy Violations,Disorderly Conduct,Driving Under the Influence,Drunkenness,"Family Offenses, Nonviolent",Liquor Law Violations,Peeping Tom,Runaway,Trespass of Real Property,All Other Offenses,Offense Count,week,month,year,mon_year,season,is_holiday,is_weekend,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,precip,precipprob,precipcover,preciptype,snow,snowdepth,windgust,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,sunrise,sunset,moonphase,conditions,description,icon,stations
2023-08-29,1.0,6.0,18.0,145.0,0.0,0.0,37.0,159.0,47.0,8.0,20.0,2.0,2.0,51.0,23.0,0.0,0.0,0.0,0.0,0.0,9.0,1.0,3.0,0.0,0.0,0.0,6.0,7.0,0.0,0.0,0.0,0.0,0.0,27.0,52.0,624.0,Tuesday,8,2023,Aug-2023,Summer,0,0,93.8,79.7,87.2,92.6,79.7,88.7,64.3,49.2,0.0,0,0.0,0,0.0,0.0,31.3,19.6,33.0,1010.1,32.5,9.9,165.8,14.2,9.0,2023-08-29T06:56:57,2023-08-29T19:47:31,0.45,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-30,0.0,1.0,20.0,164.0,0.0,2.0,36.0,137.0,46.0,4.0,10.0,1.0,1.0,37.0,22.0,0.0,2.0,0.0,11.0,0.0,7.0,0.0,0.0,2.0,0.0,0.0,4.0,3.0,0.0,4.0,0.0,0.0,0.0,22.0,54.0,590.0,Wednesday,8,2023,Aug-2023,Summer,0,0,96.7,78.0,87.4,95.0,78.0,86.5,57.1,37.4,0.0,0,0.0,0,0.0,0.0,30.1,15.2,35.9,1009.9,15.9,9.9,195.6,16.9,9.0,2023-08-30T06:57:30,2023-08-30T19:46:22,0.5,Clear,Clear conditions throughout the day.,clear-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-31,0.0,6.0,14.0,148.0,2.0,2.0,20.0,140.0,52.0,4.0,9.0,1.0,2.0,55.0,25.0,0.0,2.0,0.0,1.0,0.0,14.0,0.0,1.0,2.0,0.0,0.0,3.0,8.0,0.0,2.0,0.0,0.0,0.0,20.0,35.0,568.0,Thursday,8,2023,Aug-2023,Summer,0,0,97.1,80.2,87.6,95.0,80.7,87.5,59.6,40.8,0.0,0,0.0,0,0.0,0.0,23.0,12.5,69.9,1010.9,24.2,9.8,181.2,15.8,9.0,2023-08-31T06:58:02,2023-08-31T19:45:13,0.52,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."


In [43]:
# Verify there are no missing values (counts the columns that contain any NaN)
df_crime_daily_total_weather_jan2010_aug2023.isna().any().sum()

0
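
`isna().any().sum()` returns the number of columns that contain at least one missing value, so `0` means every column is complete. If the count were ever non-zero, a per-column breakdown would show where the gaps are. A minimal sketch on a toy frame (the frame `toy` and its values are made up for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the merged daily table (illustrative values only)
toy = pd.DataFrame(
    {"Robbery": [31.0, np.nan, 38.0], "tempmax": [53.9, 53.7, 47.4]},
    index=pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-03"]),
)

# Number of columns with at least one missing value (same check as the cell above)
n_cols_with_na = toy.isna().any().sum()

# Per-column count of missing values, keeping only the columns that have any
na_per_col = toy.isna().sum()
na_per_col = na_per_col[na_per_col > 0]

print(n_cols_with_na)
print(na_per_col)
```

The same two expressions can be applied to `df_crime_daily_total_weather_jan2010_aug2023` directly.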

#### **Save Daily Crime Count**
* Save df_crime_daily_total_weather_jan2010_aug2023 as 'daily crime numbers and weather data for time series analysis_082003_ver2.csv'

In [54]:
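# Move the date index into a 'date' column, then save as CSV
# (forward slash in the path: a backslash inside a normal string literal starts an escape sequence)
df_crime_daily_total_weather_jan2010_aug2023.reset_index(names='date', inplace=True)
df_crime_daily_total_weather_jan2010_aug2023.to_csv("data/daily crime numbers and weather data for time series analysis_082003_ver2.csv", index=False)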
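
A backslash inside an ordinary string literal starts an escape sequence, so file paths are safer written with forward slashes or built with `pathlib`. The sketch below shows one possible way to write and re-read a daily table with a portable path. The frame `daily`, the temporary directory, and the file name `daily_example.csv` are hypothetical stand-ins, not the notebook's actual objects.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical stand-in for the merged daily table
daily = pd.DataFrame(
    {"Offense Count": [416, 344]},
    index=pd.to_datetime(["2010-01-01", "2010-01-02"]),
)

# A temporary directory stands in for the notebook's data folder
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir(parents=True, exist_ok=True)
out_path = data_dir / "daily_example.csv"  # hypothetical file name

# Write the index as a 'date' column without mutating the original frame
daily.rename_axis("date").reset_index().to_csv(out_path, index=False)

# Read the file back with the date column restored as the index
restored = pd.read_csv(out_path, parse_dates=["date"], index_col="date")
print(restored.head())
```

Chaining `rename_axis(...).reset_index()` instead of using `inplace=True` leaves the in-memory frame untouched, so the cell can be re-run without adding a second `date` column.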

## **Merge2: Crime Dataset for Aug 2023 and Crime Dataset from Jan 2010 through Jul 2023**

* Load 'all_crime_features_2010_2023_w_nibrs_class.csv' (January 2010 through July 2023)
* Rename the columns and index of df_crime_aug2023 to match, then append it to the loaded data

In [57]:
df_crime_jan2010_jul2023 = pd.read_csv('all_crime_features_2010_2023_w_nibrs_class.csv', parse_dates=['date'], index_col='date')
df_crime_jan2010_jul2023.head()

Unnamed: 0_level_0,Offense Count,Beat,Premise,Block Range,Street Name,Street Type,Suffix,Incident,City,ZIP Code,Street No,MapLongitude,MapLatitude,cleaned_occurence_hour,week,month,year,mon_year,season,is_holiday,is_weekend,cleaned_description,cleaned_class,aggregate_cleaned_class
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
2010-01-01,1.0,3B10,20R,4900-4999,POINCIANA,DR,-,,,,,,,8,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23
2010-01-01,1.0,5F20,20D,8700-8799,HAMMERLY,-,-,,,,,,,18,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23
2010-01-01,1.0,1A10,05O,400-499,MAIN,ST,-,,,,,,,0,Friday,1,2010,Jan-2010,Winter,1,0,"burglary, breaking and entering",220,220
2010-01-01,1.0,7C10,20R,1900-1999,LOCKWOOD,DR,-,,,,,,,0,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23
2010-01-01,1.0,18F20,18A,3300-3399,MCCUE,RD,-,,,,,,,10,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23


In [58]:
# Rename columns
df_crime_aug2023 = df_crime_aug2023.rename(columns={'RMSOccurrenceHour': 'cleaned_occurence_hour', 
                                                    'NIBRSDescription': 'cleaned_description', 
                                                    'NIBRSClass': 'cleaned_class',
                                                    'StreetNo': 'Street No',
                                                    'StreetName': 'Street Name', 
                                                    'StreetType': 'Street Type',
                                                    'ZIPCode': 'ZIP Code',
                                                    'OffenseCount': 'Offense Count'})
                                                    

In [59]:
# Change index name
df_crime_aug2023.rename_axis('date', inplace=True)

In [60]:
df_crime_aug2023.head(3)

Unnamed: 0_level_0,Incident,cleaned_occurence_hour,cleaned_class,cleaned_description,Offense Count,Beat,Premise,Street No,Street Name,Street Type,Suffix,City,ZIP Code,MapLongitude,MapLatitude,week,month,year,mon_year,season,is_holiday,is_weekend,aggregate_cleaned_class
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1
2023-08-01,109054223,7,90J,Trespass of real property,1,150000000000.0,Park/Playground,8800,MULLINS,DR,,HOUSTON,77096,-95.486595,29.685225,Tuesday,8,2023,Aug-2023,Summer,0,0,90J
2023-08-01,109587223,9,290,"Destruction, damage, vandalism",1,150000000000.0,"Residence, Home (Includes Apartment)",5900,BRAESWOOD,BLVD,N,HOUSTON,77074,-95.494935,29.679116,Tuesday,8,2023,Aug-2023,Summer,0,0,290
2023-08-01,110834923,9,23C,Shoplifting,1,150000000000.0,"Department, Discount Store",300,MEYERLAND PLAZA,,,HOUSTON,77096,-95.461402,29.686654,Tuesday,8,2023,Aug-2023,Summer,0,0,23
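
Before concatenating, it can help to confirm that the renamed August columns line up with the historical ones. The sketch below compares the two sets of column names as they appear in the outputs above (24 columns in the historical frame, 23 in the August frame). The list names `hist_cols` and `aug_cols` are introduced here for illustration only.

```python
# Column names copied from the two head() outputs above
hist_cols = [
    "Offense Count", "Beat", "Premise", "Block Range", "Street Name", "Street Type",
    "Suffix", "Incident", "City", "ZIP Code", "Street No", "MapLongitude",
    "MapLatitude", "cleaned_occurence_hour", "week", "month", "year", "mon_year",
    "season", "is_holiday", "is_weekend", "cleaned_description", "cleaned_class",
    "aggregate_cleaned_class",
]
aug_cols = [
    "Incident", "cleaned_occurence_hour", "cleaned_class", "cleaned_description",
    "Offense Count", "Beat", "Premise", "Street No", "Street Name", "Street Type",
    "Suffix", "City", "ZIP Code", "MapLongitude", "MapLatitude", "week", "month",
    "year", "mon_year", "season", "is_holiday", "is_weekend",
    "aggregate_cleaned_class",
]

# Names present in one frame but not in the other
only_in_hist = sorted(set(hist_cols) - set(aug_cols))
only_in_aug = sorted(set(aug_cols) - set(hist_cols))

print("only in historical:", only_in_hist)
print("only in August:", only_in_aug)
```

With live frames, the same comparison can be run on `df_crime_jan2010_jul2023.columns` and `df_crime_aug2023.columns`. Any column that exists in only one frame is filled with NaN for the other frame's rows after `pd.concat`.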


In [61]:
# Concatenate df_crime_jan2010_jul2023 and df_crime_aug2023 row-wise (columns are aligned by name)
df_crime_jan2010_aug2023 = pd.concat([df_crime_jan2010_jul2023, df_crime_aug2023], axis=0)
df_crime_jan2010_aug2023.head(3)

Unnamed: 0_level_0,Offense Count,Beat,Premise,Block Range,Street Name,Street Type,Suffix,Incident,City,ZIP Code,Street No,MapLongitude,MapLatitude,cleaned_occurence_hour,week,month,year,mon_year,season,is_holiday,is_weekend,cleaned_description,cleaned_class,aggregate_cleaned_class
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
2010-01-01,1.0,3B10,20R,4900-4999,POINCIANA,DR,-,,,,,,,8,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23
2010-01-01,1.0,5F20,20D,8700-8799,HAMMERLY,-,-,,,,,,,18,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23
2010-01-01,1.0,1A10,05O,400-499,MAIN,ST,-,,,,,,,0,Friday,1,2010,Jan-2010,Winter,1,0,"burglary, breaking and entering",220,220


In [62]:
df_crime_jan2010_aug2023.tail(3)

Unnamed: 0_level_0,Offense Count,Beat,Premise,Block Range,Street Name,Street Type,Suffix,Incident,City,ZIP Code,Street No,MapLongitude,MapLatitude,cleaned_occurence_hour,week,month,year,mon_year,season,is_holiday,is_weekend,cleaned_description,cleaned_class,aggregate_cleaned_class
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
2023-08-31,1.0,HCSO,"Residence, Home (Includes Apartment)",,MILLS,RD,,124835223.0,HARRIS CO,77070.0,9100,-95.555181,29.955979,15,Thursday,8,2023,Aug-2023,Summer,0,0,All other offenses,90Z,90Z
2023-08-31,1.0,OOJ,"Parking Lot, Garage",,GUADALUPE,RD,,124980523.0,STAFFORD,77477.0,1415,-95.55546,29.595597,20,Thursday,8,2023,Aug-2023,Summer,0,0,"Destruction, damage, vandalism",290,290
2023-08-31,1.0,,"Highway, Road, Street, Alley",,OLD HUMBLE,RD,,124913123.0,HARRIS CO,,13000,,,18,Thursday,8,2023,Aug-2023,Summer,0,0,"Drug, narcotic violations",35A,35
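
Two quick sanity checks after a row-wise concatenation are that the row counts add up and that the date index is still in order. The following is a minimal sketch on toy frames. `hist` and `aug` are hypothetical stand-ins for `df_crime_jan2010_jul2023` and `df_crime_aug2023`, with made-up rows.

```python
import pandas as pd

# Toy stand-ins for the historical and August frames (illustrative values only)
hist = pd.DataFrame(
    {"Offense Count": [1.0, 1.0], "Block Range": ["4900-4999", "8700-8799"]},
    index=pd.DatetimeIndex(["2010-01-01", "2010-01-01"], name="date"),
)
aug = pd.DataFrame(
    {"Offense Count": [1], "Street No": ["8800"]},
    index=pd.DatetimeIndex(["2023-08-01"], name="date"),
)

combined = pd.concat([hist, aug], axis=0)

# Row counts add up, and the dates stay in non-decreasing order
rows_ok = len(combined) == len(hist) + len(aug)
sorted_ok = combined.index.is_monotonic_increasing

print(rows_ok, sorted_ok)
# Columns missing from one input show NaN in that input's rows
print(combined)
```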


* Save df_crime_jan2010_aug2023 as 'all_crime_features_jan2010_aug2023_w_nibrs_class_ver2.csv'

In [63]:
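# Move the date index into a 'date' column, then save under data/ (the folder the verification cell reads from)
df_crime_jan2010_aug2023.reset_index(names='date', inplace=True)
df_crime_jan2010_aug2023.to_csv('data/all_crime_features_jan2010_aug2023_w_nibrs_class_ver2.csv', index=False)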

## **Verify1: 'data/daily crime numbers and weather data for time series analysis_082003_ver2.csv'**

* Load 'daily crime numbers and weather data for time series analysis_082003_ver2.csv'

In [55]:
# 'Offense Count' is the only dtype override that matches a column in this file
df_total_2023 = pd.read_csv("data/daily crime numbers and weather data for time series analysis_082003_ver2.csv",
                       parse_dates=['date'],
                       index_col='date',
                       dtype={'Offense Count': int})

In [51]:
df_total_2023.head()

Unnamed: 0_level_0,Kidnapping/Abduction,"Sex Offenses, Forcible",Robbery,Assault Offenses,Arson,Extortion/Blackmail,Burglary/Breaking & Entering,Larceny/Theft Offenses,Motor Vehicle Theft,Counterfeiting/Forgery,Fraud Offenses,Embezzlement,Stolen Property Offenses,Destruction/Damage/Vandalism of Property,Drug/Narcotic Offenses,"Sex Offenses, Nonforcible",Pornography/Obscene Material,Gambling Offenses,Prostitution Offenses,Bribery,Weapon Law Violations,Human Trafficking or Kidnapping/Abduction,Animal Cruelty,Homicide Offenses,Bad Checks,Curfew/Loitering/Vagrancy Violations,Disorderly Conduct,Driving Under the Influence,Drunkenness,"Family Offenses, Nonviolent",Liquor Law Violations,Peeping Tom,Runaway,Trespass of Real Property,All Other Offenses,Offense Count,week,month,year,mon_year,season,is_holiday,is_weekend,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,precip,precipprob,precipcover,preciptype,snow,snowdepth,windgust,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,sunrise,sunset,moonphase,conditions,description,icon,stations
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1
2010-01-01,0.0,13.0,31.0,58.0,0.0,0.0,77.0,215.0,21.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,416,Friday,1,2010,Jan-2010,Winter,1,0,53.9,41.3,46.9,53.9,33.0,42.3,34.0,63.0,0.0,0,0.0,none,0.0,0.0,27.8,18.0,359.3,1028.0,47.7,9.8,174.8,15.0,7.0,07:16:59,17:33:24,0.53,Partially cloudy,Clearing in the afternoon.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2010-01-02,0.0,2.0,32.0,15.0,0.0,0.0,67.0,200.0,28.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,344,Saturday,1,2010,Jan-2010,Winter,0,1,53.7,40.1,46.3,53.7,34.9,43.1,30.9,56.9,0.0,0,0.0,none,0.0,0.0,16.1,11.6,93.7,1025.9,9.1,9.6,156.6,13.4,6.0,07:17:13,17:34:07,0.57,Clear,Clear conditions throughout the day.,clear-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2010-01-03,0.0,0.0,38.0,35.0,0.0,0.0,52.0,179.0,35.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,339,Sunday,1,2010,Jan-2010,Winter,0,1,47.4,41.1,44.8,46.5,33.8,40.8,35.6,70.2,0.0,0,0.0,none,0.0,0.0,18.3,13.4,55.4,1023.2,78.6,8.8,80.2,6.7,3.0,07:17:25,17:34:50,0.6,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2010-01-04,0.0,1.0,24.0,25.0,0.0,0.0,94.0,211.0,29.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,384,Monday,1,2010,Jan-2010,Winter,0,0,46.4,36.2,41.8,42.1,28.7,35.2,29.4,62.0,0.0,0,0.0,none,0.0,0.0,25.0,18.4,346.2,1028.6,47.6,9.6,167.4,14.5,7.0,07:17:35,17:35:35,0.64,Partially cloudy,Clearing in the afternoon.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2010-01-05,0.0,3.0,19.0,22.0,0.0,0.0,88.0,183.0,20.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,337,Tuesday,1,2010,Jan-2010,Winter,0,0,48.2,31.7,38.7,47.5,24.0,34.3,23.6,56.1,0.0,0,0.0,none,0.0,0.0,16.1,9.8,22.5,1029.7,4.4,9.8,174.1,15.0,7.0,07:17:44,17:36:20,0.67,Clear,Clear conditions throughout the day.,clear-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."


In [52]:
df_total_2023.tail()

Unnamed: 0_level_0,Kidnapping/Abduction,"Sex Offenses, Forcible",Robbery,Assault Offenses,Arson,Extortion/Blackmail,Burglary/Breaking & Entering,Larceny/Theft Offenses,Motor Vehicle Theft,Counterfeiting/Forgery,Fraud Offenses,Embezzlement,Stolen Property Offenses,Destruction/Damage/Vandalism of Property,Drug/Narcotic Offenses,"Sex Offenses, Nonforcible",Pornography/Obscene Material,Gambling Offenses,Prostitution Offenses,Bribery,Weapon Law Violations,Human Trafficking or Kidnapping/Abduction,Animal Cruelty,Homicide Offenses,Bad Checks,Curfew/Loitering/Vagrancy Violations,Disorderly Conduct,Driving Under the Influence,Drunkenness,"Family Offenses, Nonviolent",Liquor Law Violations,Peeping Tom,Runaway,Trespass of Real Property,All Other Offenses,Offense Count,week,month,year,mon_year,season,is_holiday,is_weekend,tempmax,tempmin,temp,feelslikemax,feelslikemin,feelslike,dew,humidity,precip,precipprob,precipcover,preciptype,snow,snowdepth,windgust,windspeed,winddir,sealevelpressure,cloudcover,visibility,solarradiation,solarenergy,uvindex,sunrise,sunset,moonphase,conditions,description,icon,stations
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1
2023-08-27,4.0,5.0,22.0,186.0,0.0,0.0,25.0,157.0,36.0,4.0,12.0,0.0,2.0,50.0,20.0,0.0,0.0,0.0,4.0,0.0,13.0,0.0,0.0,1.0,0.0,0.0,6.0,30.0,0.0,4.0,3.0,0.0,0.0,19.0,44.0,647,Sunday,8,2023,Aug-2023,Summer,0,1,106.1,80.6,90.7,106.8,85.4,94.4,67.8,54.0,0.0,0,0.0,0,0.0,0.0,37.7,23.1,298.9,1009.6,21.0,9.8,200.6,17.5,9.0,2023-08-27T06:55:52,2023-08-27T19:49:48,0.39,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-28,2.0,1.0,18.0,185.0,2.0,1.0,38.0,152.0,55.0,4.0,13.0,2.0,1.0,50.0,16.0,0.0,1.0,0.0,3.0,0.0,7.0,1.0,1.0,1.0,0.0,0.0,8.0,13.0,1.0,0.0,0.0,0.0,1.0,27.0,43.0,647,Monday,8,2023,Aug-2023,Summer,0,0,95.5,78.9,87.0,100.7,78.9,90.9,69.3,57.2,0.0,0,0.0,0,0.0,0.0,21.9,14.6,21.8,1009.4,30.5,9.9,218.2,19.0,10.0,2023-08-28T06:56:25,2023-08-28T19:48:40,0.42,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-29,1.0,6.0,18.0,145.0,0.0,0.0,37.0,159.0,47.0,8.0,20.0,2.0,2.0,51.0,23.0,0.0,0.0,0.0,0.0,0.0,9.0,1.0,3.0,0.0,0.0,0.0,6.0,7.0,0.0,0.0,0.0,0.0,0.0,27.0,52.0,624,Tuesday,8,2023,Aug-2023,Summer,0,0,93.8,79.7,87.2,92.6,79.7,88.7,64.3,49.2,0.0,0,0.0,0,0.0,0.0,31.3,19.6,33.0,1010.1,32.5,9.9,165.8,14.2,9.0,2023-08-29T06:56:57,2023-08-29T19:47:31,0.45,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-30,0.0,1.0,20.0,164.0,0.0,2.0,36.0,137.0,46.0,4.0,10.0,1.0,1.0,37.0,22.0,0.0,2.0,0.0,11.0,0.0,7.0,0.0,0.0,2.0,0.0,0.0,4.0,3.0,0.0,4.0,0.0,0.0,0.0,22.0,54.0,590,Wednesday,8,2023,Aug-2023,Summer,0,0,96.7,78.0,87.4,95.0,78.0,86.5,57.1,37.4,0.0,0,0.0,0,0.0,0.0,30.1,15.2,35.9,1009.9,15.9,9.9,195.6,16.9,9.0,2023-08-30T06:57:30,2023-08-30T19:46:22,0.5,Clear,Clear conditions throughout the day.,clear-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."
2023-08-31,0.0,6.0,14.0,148.0,2.0,2.0,20.0,140.0,52.0,4.0,9.0,1.0,2.0,55.0,25.0,0.0,2.0,0.0,1.0,0.0,14.0,0.0,1.0,2.0,0.0,0.0,3.0,8.0,0.0,2.0,0.0,0.0,0.0,20.0,35.0,568,Thursday,8,2023,Aug-2023,Summer,0,0,97.1,80.2,87.6,95.0,80.7,87.5,59.6,40.8,0.0,0,0.0,0,0.0,0.0,23.0,12.5,69.9,1010.9,24.2,9.8,181.2,15.8,9.0,2023-08-31T06:58:02,2023-08-31T19:45:13,0.52,Partially cloudy,Partly cloudy throughout the day.,partly-cloudy-day,"KHOU,72059400188,KIAH,KMCJ,72244012918,7224301..."


In [None]:
df_total_2023.shape

(4991, 73)
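
The row count can be checked against the calendar: 1 January 2010 through 31 August 2023, inclusive, spans 4991 days, which matches the 4991 rows reported above. The sketch below builds that expected range and lists any days absent from a given index. The helper `missing_days` and the toy index are hypothetical names introduced for illustration.

```python
import pandas as pd

# Every calendar day from 1 Jan 2010 through 31 Aug 2023, inclusive
expected_days = pd.date_range("2010-01-01", "2023-08-31", freq="D")
print(len(expected_days))


def missing_days(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return the days in the expected range that are absent from `index`."""
    return expected_days.difference(index)


# Toy index with one day removed, standing in for df_total_2023.index
toy_index = expected_days.delete(1)
print(missing_days(toy_index))
```

Calling `missing_days(df_total_2023.index)` on the loaded frame should return an empty index if no day is missing. Matching row counts alone would not rule out a duplicated date offsetting a missing one, so `df_total_2023.index.is_unique` is a useful companion check.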

In [None]:
df_total_2023.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4991 entries, 2010-01-01 to 2023-08-31
Data columns (total 73 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   Kidnapping/Abduction                       4991 non-null   float64
 1   Sex Offenses, Forcible                     4991 non-null   float64
 2   Robbery                                    4991 non-null   float64
 3   Assault Offenses                           4991 non-null   float64
 4   Arson                                      4991 non-null   float64
 5   Extortion/Blackmail                        4991 non-null   float64
 6   Burglary/Breaking & Entering               4991 non-null   float64
 7   Larceny/Theft Offenses                     4991 non-null   float64
 8   Motor Vehicle Theft                        4991 non-null   float64
 9   Counterfeiting/Forgery                     4991 non-null   float64
 10  Fraud 

## **Verify2: 'data/all_crime_features_jan2010_aug2023_w_nibrs_class_ver2.csv'**

* Load 'all_crime_features_jan2010_aug2023_w_nibrs_class_ver2.csv'

In [64]:
# Only columns that exist in this file need a dtype override
df_crime_2023 = pd.read_csv('data/all_crime_features_jan2010_aug2023_w_nibrs_class_ver2.csv',
                       parse_dates=['date'],
                       index_col='date',
                       dtype={'Offense Count': int, 'ZIP Code': str})

In [65]:
df_crime_2023.head()

Unnamed: 0_level_0,Offense Count,Beat,Premise,Block Range,Street Name,Street Type,Suffix,Incident,City,ZIP Code,Street No,MapLongitude,MapLatitude,cleaned_occurence_hour,week,month,year,mon_year,season,is_holiday,is_weekend,cleaned_description,cleaned_class,aggregate_cleaned_class
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
2010-01-01,1,3B10,20R,4900-4999,POINCIANA,DR,-,,,,,,,8,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23
2010-01-01,1,5F20,20D,8700-8799,HAMMERLY,-,-,,,,,,,18,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23
2010-01-01,1,1A10,05O,400-499,MAIN,ST,-,,,,,,,0,Friday,1,2010,Jan-2010,Winter,1,0,"burglary, breaking and entering",220,220
2010-01-01,1,7C10,20R,1900-1999,LOCKWOOD,DR,-,,,,,,,0,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23
2010-01-01,1,18F20,18A,3300-3399,MCCUE,RD,-,,,,,,,10,Friday,1,2010,Jan-2010,Winter,1,0,all other larceny,23H,23


In [66]:
df_crime_2023.tail()

Unnamed: 0_level_0,Offense Count,Beat,Premise,Block Range,Street Name,Street Type,Suffix,Incident,City,ZIP Code,Street No,MapLongitude,MapLatitude,cleaned_occurence_hour,week,month,year,mon_year,season,is_holiday,is_weekend,cleaned_description,cleaned_class,aggregate_cleaned_class
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
2023-08-31,1,HCSO,"Residence, Home (Includes Apartment)",,MILLS,RD,,124835223.0,HARRIS CO,77070.0,9100,-95.555181,29.955979,15,Thursday,8,2023,Aug-2023,Summer,0,0,"Drug, narcotic violations",35A,35
2023-08-31,1,HCSO,"Residence, Home (Includes Apartment)",,MILLS,RD,,124835223.0,HARRIS CO,77070.0,9100,-95.555181,29.955979,15,Thursday,8,2023,Aug-2023,Summer,0,0,Weapon law violations,520,520
2023-08-31,1,HCSO,"Residence, Home (Includes Apartment)",,MILLS,RD,,124835223.0,HARRIS CO,77070.0,9100,-95.555181,29.955979,15,Thursday,8,2023,Aug-2023,Summer,0,0,All other offenses,90Z,90Z
2023-08-31,1,OOJ,"Parking Lot, Garage",,GUADALUPE,RD,,124980523.0,STAFFORD,77477.0,1415,-95.55546,29.595597,20,Thursday,8,2023,Aug-2023,Summer,0,0,"Destruction, damage, vandalism",290,290
2023-08-31,1,,"Highway, Road, Street, Alley",,OLD HUMBLE,RD,,124913123.0,HARRIS CO,,13000,,,18,Thursday,8,2023,Aug-2023,Summer,0,0,"Drug, narcotic violations",35A,35


In [67]:
df_crime_2023.shape

(1446301, 24)
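
Since the daily table is described as the sum of 'Offense Count' per occurrence date, one further cross-check is to re-aggregate the incident-level rows and compare them with the stored daily totals. The sketch below does this on toy data. `incidents` and `daily` are hypothetical stand-ins for `df_crime_2023` and `df_total_2023`, and whether the real files agree exactly depends on how the daily table was built earlier in the notebook.

```python
import pandas as pd

# Toy incident-level rows standing in for df_crime_2023 (illustrative values only)
incidents = pd.DataFrame(
    {"Offense Count": [1, 1, 2, 1]},
    index=pd.DatetimeIndex(
        ["2010-01-01", "2010-01-01", "2010-01-02", "2010-01-02"], name="date"
    ),
)

# Toy daily table standing in for df_total_2023
daily = pd.DataFrame(
    {"Offense Count": [2, 3]},
    index=pd.DatetimeIndex(["2010-01-01", "2010-01-02"], name="date"),
)

# Sum the incident-level counts per day
recomputed = incidents.groupby(level="date")["Offense Count"].sum()

# Days where the recomputed total differs from the stored total
# (fill_value=0 treats a day missing from either side as a zero count)
diff = recomputed.sub(daily["Offense Count"], fill_value=0)
mismatch = diff[diff != 0]

print(mismatch.empty)
```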

In [68]:
df_crime_2023.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1446301 entries, 2010-01-01 to 2023-08-31
Data columns (total 24 columns):
 #   Column                   Non-Null Count    Dtype  
---  ------                   --------------    -----  
 0   Offense Count            1446301 non-null  int32  
 1   Beat                     1445993 non-null  object 
 2   Premise                  1423084 non-null  object 
 3   Block Range              1424090 non-null  object 
 4   Street Name              1446300 non-null  object 
 5   Street Type              1413585 non-null  object 
 6   Suffix                   1094630 non-null  object 
 7   Incident                 264341 non-null   float64
 8   City                     264341 non-null   object 
 9   ZIP Code                 261040 non-null   object 
 10  Street No                20231 non-null    object 
 11  MapLongitude             20079 non-null    float64
 12  MapLatitude              20079 non-null    float64
 13  cleaned_occurence_hour   14