Data Source: [Data.gov](https://catalog.data.gov/dataset/nypd-calls-for-service), a non-federal dataset for public use.

License: City of New York - NYC.gov [Terms of Use](https://www.nyc.gov/home/terms-of-use.page)

# Data Collection and Preparation

Import the required libraries (install NumPy and pandas first if they are not already available).

In [1]:
import numpy as np
import pandas as pd

After downloading the dataset from the source, load it with pandas into a DataFrame named df, then verify its fields and records.

In [2]:
# Import and load the dataset
nypdData = '/home/jupyter-raphrivers/Dataset/RAW/NYPD_Calls_for_Service__Year_to_Date_.csv'
df = pd.read_csv(nypdData)

In [3]:
# Verify the total number of rows
totals = len(df)
print(totals)

7050127


In [4]:
# Preview dataframe 
df.head()

Unnamed: 0,CAD_EVNT_ID,CREATE_DATE,INCIDENT_DATE,INCIDENT_TIME,NYPD_PCT_CD,BORO_NM,PATRL_BORO_NM,GEO_CD_X,GEO_CD_Y,RADIO_CODE,TYP_DESC,CIP_JOBS,ADD_TS,DISP_TS,ARRIVD_TS,CLOSNG_TS,Latitude,Longitude
0,91250176,01/01/2023,12/31/2022,23:24:39,67.0,BROOKLYN,PATROL BORO BKLYN SOUTH,1001878,175994,53I,VEHICLE ACCIDENT: INJURY,Non CIP,01/01/2023 01:08:21 AM,01/01/2023 01:09:57 AM,,01/01/2023 01:57:44 AM,40.64973,-73.936475
1,91250180,01/01/2023,12/31/2022,23:24:47,75.0,BROOKLYN,PATROL BORO BKLYN NORTH,1017204,180778,11C4,ALARMS: COMMERCIAL/BURGLARY,Non CIP,01/01/2023 12:38:00 AM,01/01/2023 12:38:34 AM,01/01/2023 12:44:33 AM,01/01/2023 01:45:21 AM,40.662817,-73.881221
2,91250681,01/01/2023,12/31/2022,23:55:56,114.0,QUEENS,PATROL BORO QUEENS NORTH,1008573,217117,11R4,ALARMS: RESIDENTIAL/BURGLARY,Non CIP,01/01/2023 12:01:26 AM,01/01/2023 12:06:18 AM,,01/01/2023 12:06:27 AM,40.762587,-73.912199
3,91250683,01/01/2023,12/31/2022,23:55:59,66.0,BROOKLYN,PATROL BORO BKLYN SOUTH,993234,161780,11R4,ALARMS: RESIDENTIAL/BURGLARY,Non CIP,01/01/2023 12:01:34 AM,01/01/2023 12:37:14 AM,01/01/2023 01:09:32 AM,01/01/2023 01:21:14 AM,40.610729,-73.967644
4,91250700,01/01/2023,12/31/2022,23:57:08,115.0,QUEENS,PATROL BORO QUEENS NORTH,1014264,211852,11C4,ALARMS: COMMERCIAL/BURGLARY,Non CIP,01/01/2023 12:01:29 AM,01/01/2023 12:14:28 AM,01/01/2023 12:21:59 AM,01/01/2023 01:24:22 AM,40.748119,-73.891679


In [5]:
# Inspect columns and data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7050127 entries, 0 to 7050126
Data columns (total 18 columns):
 #   Column         Dtype  
---  ------         -----  
 0   CAD_EVNT_ID    int64  
 1   CREATE_DATE    object 
 2   INCIDENT_DATE  object 
 3   INCIDENT_TIME  object 
 4   NYPD_PCT_CD    float64
 5   BORO_NM        object 
 6   PATRL_BORO_NM  object 
 7   GEO_CD_X       int64  
 8   GEO_CD_Y       int64  
 9   RADIO_CODE     object 
 10  TYP_DESC       object 
 11  CIP_JOBS       object 
 12  ADD_TS         object 
 13  DISP_TS        object 
 14  ARRIVD_TS      object 
 15  CLOSNG_TS      object 
 16  Latitude       float64
 17  Longitude      float64
dtypes: float64(3), int64(3), object(12)
memory usage: 968.2+ MB


The info output shows that several columns are stored with types that do not suit our objective (for example, every date and timestamp field is loaded as object), so we have to restructure the dataset. See the source dataset ***<a href = "https://github.com/RaphRivers/Analysis-of-NYPD-911-Calls-Data-for-Enhanced-Public-Safety/blob/main/dataset-desc.ipynb" target="_blank">column description</a>.***

# Data Preprocessing and Cleaning
To prepare the dataset for analysis, let's begin by structuring and cleaning the data: converting data types, standardizing formats, handling missing values, and removing duplicates.

## Structuring Dataset

The analysis and visualization tasks ahead include Exploratory Data Analysis (EDA), Temporal Analysis, Trend Analysis, Geospatial Analysis, Response Time Analysis, and Predictive Modeling. To support them, and guided by the dataset's column descriptions, we convert columns to their correct data types and create new columns where they help the analysis.

#### 1. Convert Date and Time Columns
Convert ***CREATE_DATE, INCIDENT_DATE, INCIDENT_TIME, ADD_TS, DISP_TS, ARRIVD_TS, CLOSNG_TS***. These columns will be converted to datetime format to facilitate temporal analysis. Note that the ***INCIDENT_TIME*** column contains time information without a date. Therefore we want to combine it with ***INCIDENT_DATE*** to create a new column ***INCIDENT_DATETIME*** for analyses that will require precise timing.
#### 2. Ensure Geographic Information is Correct
According to the source data description, the ***NYPD_PCT_CD*** column represents the "precinct call is in" and should be numeric (integer or float) for mapping and analysis. It is already stored as float64, so no conversion is needed. The ***BORO_NM and PATRL_BORO_NM*** columns are categorical data representing borough names and patrol borough names, respectively; we need to ensure they are consistent and correctly spelled. The ***GEO_CD_X, GEO_CD_Y, Latitude, Longitude*** columns hold geographic coordinates. ***GEO_CD_X and GEO_CD_Y*** might need conversion depending on the coordinate system (e.g., converting to latitude and longitude if those were not already available).
#### 3. Convert Other Relevant Columns
***RADIO_CODE, TYP_DESC, CIP_JOBS*** columns also contain categorical data that can be converted to category dtype to save memory and improve performance.
#### 4. Add and Calculate Additional Columns for Analysis
***For Response Time Calculation***, we compute each 911 call's response time by subtracting ***DISP_TS*** (Dispatch Timestamp) from ***ARRIVD_TS*** (Arrived Timestamp). Calls with a missing timestamp produce a missing response time, which we must handle appropriately during cleaning.

Next, we extract the ***Day of Week, Month, and Year*** from the incident date for trend analysis.

Finally, we create an ***Incident Duration*** column to store the difference between CLOSNG_TS (incident closing time) and ARRIVD_TS (response team arrival time), which provides insight into how long incidents last.
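
The two derived time columns described in step 4 can be sketched on a two-row sample. The timestamp values are copied from the first rows of the preview above; the DataFrame name `calls` is only for illustration. The second row has no arrival time, so both derived values come out missing.

```python
import pandas as pd

# Two calls taken from the preview above; the second has no arrival timestamp.
calls = pd.DataFrame({
    "DISP_TS": pd.to_datetime(["2023-01-01 00:38:34", "2023-01-01 00:06:18"]),
    "ARRIVD_TS": pd.to_datetime(["2023-01-01 00:44:33", None]),
    "CLOSNG_TS": pd.to_datetime(["2023-01-01 01:45:21", "2023-01-01 00:06:27"]),
})

# Response time in minutes: arrival minus dispatch.
calls["RESPONSE_TIME_MINUTES"] = (
    (calls["ARRIVD_TS"] - calls["DISP_TS"]).dt.total_seconds() / 60
)

# Incident duration: closing minus arrival, kept as a timedelta.
calls["INCIDENT_DURATION"] = calls["CLOSNG_TS"] - calls["ARRIVD_TS"]

print(calls[["RESPONSE_TIME_MINUTES", "INCIDENT_DURATION"]])
```

Because subtraction involving a missing timestamp yields NaN or NaT, missing arrivals surface naturally in the derived columns and can be dealt with in the cleaning step.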

## Why Use Categorical Data Type?
Converting RADIO_CODE, TYP_DESC, CIP_JOBS to a categorical data type can lead to significant performance improvements because we are working with a very large dataset and will be performing operations that are sensitive to the datatype, such as sorting, grouping, and plotting. It will also help reduce memory usage substantially, as Pandas uses an optimized storage format for categorical data.

#### Benefits of Using Categorical Data Type
***Performance:*** Operations on categorical data are often faster than their equivalent operations on string data, especially for large datasets.

***Memory Efficiency:*** Categorical data uses less memory by storing data as references to a limited set of categories rather than repeating the strings.

***Semantic Meaning:*** Converting a column to categorical explicitly informs anyone working with this dataset that the column's values have a specific categorical interpretation.

***Ease of Analysis:*** Certain Pandas operations, like groupby(), work more efficiently with categorical data, and plotting libraries may provide better support for labels and legends when working with categorical axes.
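
A quick way to check the memory claim is to compare the same low-cardinality column stored as object and as category. This is a minimal sketch on synthetic data: the five labels and the row count are assumptions chosen for illustration, not values measured from the NYPD dataset.

```python
import numpy as np
import pandas as pd

# Synthetic low-cardinality column: many rows, only five distinct labels.
rng = np.random.default_rng(0)
labels = ["BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND"]
as_object = pd.Series(rng.choice(labels, size=200_000), dtype="object")
as_category = as_object.astype("category")

# deep=True counts the string objects themselves, not just the pointers.
object_mb = as_object.memory_usage(deep=True) / 1e6
category_mb = as_category.memory_usage(deep=True) / 1e6

print(f"object:   {object_mb:.2f} MB")
print(f"category: {category_mb:.2f} MB")
```

The exact figures depend on the pandas version and platform, but the category column should be markedly smaller because each row stores a small integer code instead of a full string.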

#### Limitations
While there are significant benefits, there are also situations where converting to a category may not help. For instance, if the number of categories is nearly as large as the number of observations, converting to categorical won't save much memory and might even use more. Converting a column with many unique strings can also be slow, because pandas has to identify and encode each unique string.

Let's begin data preprocessing and cleaning...

In [6]:
# Convert CREATE_DATE and INCIDENT_DATE to datetime (source format is MM/DD/YYYY)
df['CREATE_DATE'] = pd.to_datetime(df['CREATE_DATE'], format='%m/%d/%Y')
df['INCIDENT_DATE'] = pd.to_datetime(df['INCIDENT_DATE'], format='%m/%d/%Y')

In [7]:
# Note that INCIDENT_TIME is stored in 24-hour format (HH:MM:SS)
# Parse INCIDENT_TIME into a temporary datetime column
df['INCIDENT_TIME_TEMP'] = pd.to_datetime(df['INCIDENT_TIME'], format='%H:%M:%S')
# Reformat INCIDENT_TIME as 12-hour time with AM/PM
df['INCIDENT_TIME'] = df['INCIDENT_TIME_TEMP'].dt.strftime('%I:%M:%S %p')

# Drop the temporary column because it is no longer needed
df = df.drop(columns='INCIDENT_TIME_TEMP')

In [8]:
# To facilitate temporal analysis that requires precise timing, combine INCIDENT_DATE and INCIDENT_TIME to form INCIDENT_DATETIME column
df['INCIDENT_DATETIME'] = pd.to_datetime(df['INCIDENT_DATE'].dt.strftime('%m/%d/%Y') + ' ' + df['INCIDENT_TIME'], format='%m/%d/%Y %I:%M:%S %p')
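
As an alternative sketch, the date and time can be combined without round-tripping through strings, assuming INCIDENT_TIME is still in its raw 24-hour HH:MM:SS form (that is, before the 12-hour reformatting above). The sample values come from the preview rows; the DataFrame name `sample` is illustrative.

```python
import pandas as pd

sample = pd.DataFrame({
    "INCIDENT_DATE": ["12/31/2022", "01/01/2023"],
    "INCIDENT_TIME": ["23:24:39", "00:00:12"],  # raw 24-hour strings
})

# Parse the date, treat the time of day as an offset, and add the two.
incident_date = pd.to_datetime(sample["INCIDENT_DATE"], format="%m/%d/%Y")
time_of_day = pd.to_timedelta(sample["INCIDENT_TIME"])
sample["INCIDENT_DATETIME"] = incident_date + time_of_day

print(sample["INCIDENT_DATETIME"])
```

This avoids formatting 7 million dates back into strings, which may matter for run time on a dataset of this size.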

In [9]:
# Convert the timestamp columns ADD_TS, DISP_TS, ARRIVD_TS, and CLOSNG_TS to datetime (date plus 12-hour time with AM/PM)
df['ADD_TS'] = pd.to_datetime(df['ADD_TS'], format='%m/%d/%Y %I:%M:%S %p')
df['DISP_TS'] = pd.to_datetime(df['DISP_TS'], format='%m/%d/%Y %I:%M:%S %p')
df['ARRIVD_TS'] = pd.to_datetime(df['ARRIVD_TS'], format='%m/%d/%Y %I:%M:%S %p')
df['CLOSNG_TS'] = pd.to_datetime(df['CLOSNG_TS'], format='%m/%d/%Y %I:%M:%S %p')
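
With an explicit format, a malformed timestamp raises an error by default. If the raw file might contain such entries, one option is `errors='coerce'`, which turns anything unparseable into NaT so it can be counted afterwards. The sketch below uses a made-up malformed value; nothing in the source indicates the real file contains one.

```python
import pandas as pd

raw = pd.Series([
    "01/01/2023 12:44:33 AM",  # valid 12-hour timestamp
    None,                      # missing value, as in ARRIVD_TS
    "not a timestamp",         # hypothetical malformed entry
])

parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p", errors="coerce")

# Values that were present in the raw data but could not be parsed.
unparsed = int((parsed.isna() & raw.notna()).sum())
print(parsed)
print("unparsed non-null values:", unparsed)
```

Comparing the NaT count with the original null count separates genuinely missing timestamps from formatting problems.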

Notice that TYP_DESC (incident type and description) is a single string whose two parts are separated by a colon (:). Before converting it to the category type, let's split the incident type and description into separate columns.

In [10]:
# Split 'TYP_DESC' on the first ':' into the incident type and the incident description.
# n=1 keeps the result to two columns even if a description contains another ':'.
df[['INCIDENT_TYPE', 'INCIDENT_DESC']] = df['TYP_DESC'].str.split(':', n=1, expand=True)

# Optionally, use .str.strip() to remove any leading/trailing whitespace
#df['INCIDENT_TYPE'] = df['INCIDENT_TYPE'].str.strip()
#df['INCIDENT_DESC'] = df['INCIDENT_DESC'].str.strip()
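
To see how the split behaves on edge cases, here is a small sketch. The first two strings appear in the preview output; the third, with two colons, is a hypothetical value added to show what `n=1` does.

```python
import pandas as pd

typ_desc = pd.Series([
    "VEHICLE ACCIDENT: INJURY",
    "STATION INSPECTION BY TRANSIT BUREAU PERSONNEL",  # no colon at all
    "EXAMPLE TYPE: PART ONE: PART TWO",                 # hypothetical, two colons
])

# Split on the first colon only, then trim surrounding whitespace.
parts = typ_desc.str.split(":", n=1, expand=True)
incident_type = parts[0].str.strip()
incident_desc = parts[1].str.strip()

print(pd.DataFrame({"INCIDENT_TYPE": incident_type, "INCIDENT_DESC": incident_desc}))
```

Rows without a colon end up with a missing description, which explains the NaN values that appear in INCIDENT_DESC later on.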


In [11]:
# Convert categorical columns to 'category' dtype using for loop
cat_col = ['RADIO_CODE', 'INCIDENT_TYPE', 'INCIDENT_DESC', 'CIP_JOBS', 'BORO_NM', 'PATRL_BORO_NM']
for col in cat_col:
    df[col] = df[col].astype('category')

In [12]:
# Create new column and calculate response time in minutes
df['RESPONSE_TIME_MINUTES'] = (df['ARRIVD_TS'] - df['DISP_TS']).dt.total_seconds() / 60

In [13]:
# Extract day of week, month, and year from INCIDENT_DATE for trend analysis
df['DAY_OF_WEEK'] = df['INCIDENT_DATE'].dt.day_name()
df['MONTH'] = df['INCIDENT_DATE'].dt.month
df['YEAR'] = df['INCIDENT_DATE'].dt.year
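
Day names sort alphabetically by default, which scrambles weekday charts. One possible refinement, shown here as a sketch rather than as part of the original pipeline, is to store the day name as an ordered categorical so that counts and plots follow calendar order.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2023-01-01", "2022-12-31", "2023-01-02"]))

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Ordered categorical: comparisons and sorting follow weekday_order.
day_of_week = pd.Categorical(dates.dt.day_name(),
                             categories=weekday_order, ordered=True)

# Sorting the index orders the counts Monday through Sunday.
counts = pd.Series(day_of_week).value_counts().sort_index()
print(counts)
```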

In [14]:
# Calculate INCIDENT_DURATION (closing time minus arrival time) and store it in a new column to show how long each incident lasts
df['INCIDENT_DURATION'] = df['CLOSNG_TS'] - df['ARRIVD_TS']

In [15]:
# Review the DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7050127 entries, 0 to 7050126
Data columns (total 26 columns):
 #   Column                 Dtype          
---  ------                 -----          
 0   CAD_EVNT_ID            int64          
 1   CREATE_DATE            datetime64[ns] 
 2   INCIDENT_DATE          datetime64[ns] 
 3   INCIDENT_TIME          object         
 4   NYPD_PCT_CD            float64        
 5   BORO_NM                category       
 6   PATRL_BORO_NM          category       
 7   GEO_CD_X               int64          
 8   GEO_CD_Y               int64          
 9   RADIO_CODE             category       
 10  TYP_DESC               object         
 11  CIP_JOBS               category       
 12  ADD_TS                 datetime64[ns] 
 13  DISP_TS                datetime64[ns] 
 14  ARRIVD_TS              datetime64[ns] 
 15  CLOSNG_TS              datetime64[ns] 
 16  Latitude               float64        
 17  Longitude              float64        
 18  IN

Now that we have restructured the DataFrame with relevant data types, let's reorder the columns to improve readability before cleaning the data and exporting it to CSV for analysis.

In [16]:
# Define the new order of columns
new_col_order = ['CAD_EVNT_ID', 'CREATE_DATE', 'INCIDENT_DATE', 'INCIDENT_TIME',
             'INCIDENT_DATETIME', 'RESPONSE_TIME_MINUTES', 'DAY_OF_WEEK', 'MONTH', 'YEAR', 'INCIDENT_DURATION', 
             'NYPD_PCT_CD', 'BORO_NM', 'PATRL_BORO_NM', 'GEO_CD_X', 'GEO_CD_Y', 
             'RADIO_CODE', 'TYP_DESC', 'INCIDENT_TYPE', 'INCIDENT_DESC', 
             'CIP_JOBS', 'ADD_TS', 'DISP_TS', 'ARRIVD_TS', 'CLOSNG_TS', 
             'Latitude', 'Longitude']
# Reorder the DataFrame columns according to the new column order
df = df[new_col_order]

In [17]:
# Preview the DataFrame
df.head(10)

Unnamed: 0,CAD_EVNT_ID,CREATE_DATE,INCIDENT_DATE,INCIDENT_TIME,INCIDENT_DATETIME,RESPONSE_TIME_MINUTES,DAY_OF_WEEK,MONTH,YEAR,INCIDENT_DURATION,...,TYP_DESC,INCIDENT_TYPE,INCIDENT_DESC,CIP_JOBS,ADD_TS,DISP_TS,ARRIVD_TS,CLOSNG_TS,Latitude,Longitude
0,91250176,2023-01-01,2022-12-31,11:24:39 PM,2022-12-31 23:24:39,,Saturday,12,2022,NaT,...,VEHICLE ACCIDENT: INJURY,VEHICLE ACCIDENT,INJURY,Non CIP,2023-01-01 01:08:21,2023-01-01 01:09:57,NaT,2023-01-01 01:57:44,40.64973,-73.936475
1,91250180,2023-01-01,2022-12-31,11:24:47 PM,2022-12-31 23:24:47,5.983333,Saturday,12,2022,0 days 01:00:48,...,ALARMS: COMMERCIAL/BURGLARY,ALARMS,COMMERCIAL/BURGLARY,Non CIP,2023-01-01 00:38:00,2023-01-01 00:38:34,2023-01-01 00:44:33,2023-01-01 01:45:21,40.662817,-73.881221
2,91250681,2023-01-01,2022-12-31,11:55:56 PM,2022-12-31 23:55:56,,Saturday,12,2022,NaT,...,ALARMS: RESIDENTIAL/BURGLARY,ALARMS,RESIDENTIAL/BURGLARY,Non CIP,2023-01-01 00:01:26,2023-01-01 00:06:18,NaT,2023-01-01 00:06:27,40.762587,-73.912199
3,91250683,2023-01-01,2022-12-31,11:55:59 PM,2022-12-31 23:55:59,32.3,Saturday,12,2022,0 days 00:11:42,...,ALARMS: RESIDENTIAL/BURGLARY,ALARMS,RESIDENTIAL/BURGLARY,Non CIP,2023-01-01 00:01:34,2023-01-01 00:37:14,2023-01-01 01:09:32,2023-01-01 01:21:14,40.610729,-73.967644
4,91250700,2023-01-01,2022-12-31,11:57:08 PM,2022-12-31 23:57:08,7.516667,Saturday,12,2022,0 days 01:02:23,...,ALARMS: COMMERCIAL/BURGLARY,ALARMS,COMMERCIAL/BURGLARY,Non CIP,2023-01-01 00:01:29,2023-01-01 00:14:28,2023-01-01 00:21:59,2023-01-01 01:24:22,40.748119,-73.891679
5,91250736,2023-01-01,2022-12-31,11:59:09 PM,2022-12-31 23:59:09,,Saturday,12,2022,NaT,...,ALARMS: COMMERCIAL/BURGLARY,ALARMS,COMMERCIAL/BURGLARY,Non CIP,2023-01-01 00:01:35,2023-01-01 02:40:24,NaT,2023-01-01 14:26:19,40.849889,-73.916483
6,91250746,2023-01-01,2023-01-01,12:00:12 AM,2023-01-01 00:00:12,,Sunday,1,2023,NaT,...,SEE COMPLAINANT: OTHER/INSIDE,SEE COMPLAINANT,OTHER/INSIDE,Non CIP,2023-01-01 00:00:12,2023-01-01 00:00:16,NaT,2023-01-01 02:06:53,40.716344,-74.001253
7,91250747,2023-01-01,2023-01-01,12:00:15 AM,2023-01-01 00:00:15,,Sunday,1,2023,NaT,...,INVESTIGATE/POSSIBLE CRIME: SERIOUS/OTHER,INVESTIGATE/POSSIBLE CRIME,SERIOUS/OTHER,Non CIP,2023-01-01 00:01:03,2023-01-01 03:06:30,NaT,2023-01-01 03:06:42,40.757317,-73.987881
8,91250747,2023-01-01,2023-01-01,12:00:15 AM,2023-01-01 00:00:15,,Sunday,1,2023,NaT,...,INVESTIGATE/POSSIBLE CRIME: SERIOUS/OTHER,INVESTIGATE/POSSIBLE CRIME,SERIOUS/OTHER,Non CIP,2023-01-01 00:00:15,2023-01-01 02:34:21,NaT,2023-01-01 03:20:26,40.757317,-73.987881
9,91250748,2023-01-01,2023-01-01,12:00:40 AM,2023-01-01 00:00:40,0.0,Sunday,1,2023,0 days 00:23:24,...,STATION INSPECTION BY TRANSIT BUREAU PERSONNEL,STATION INSPECTION BY TRANSIT BUREAU PERSONNEL,,Non CIP,2023-01-01 00:00:40,2023-01-01 00:00:40,2023-01-01 00:00:40,2023-01-01 00:24:04,40.825472,-73.892941


## Clean the Data

#### 1. Review Data Type(s): 
Inspect and ensure that each column is of the correct data type (e.g., converting timestamps to datetime objects, etc.).
#### 2. Handle Missing Values: 
Identify missing values and decide on a strategy for handling them (e.g., imputation or removal).
#### 3. Remove Duplicates:
Check for and remove any duplicate records to ensure data integrity.

In [18]:
# Review converted data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7050127 entries, 0 to 7050126
Data columns (total 26 columns):
 #   Column                 Dtype          
---  ------                 -----          
 0   CAD_EVNT_ID            int64          
 1   CREATE_DATE            datetime64[ns] 
 2   INCIDENT_DATE          datetime64[ns] 
 3   INCIDENT_TIME          object         
 4   INCIDENT_DATETIME      datetime64[ns] 
 5   RESPONSE_TIME_MINUTES  float64        
 6   DAY_OF_WEEK            object         
 7   MONTH                  int32          
 8   YEAR                   int32          
 9   INCIDENT_DURATION      timedelta64[ns]
 10  NYPD_PCT_CD            float64        
 11  BORO_NM                category       
 12  PATRL_BORO_NM          category       
 13  GEO_CD_X               int64          
 14  GEO_CD_Y               int64          
 15  RADIO_CODE             category       
 16  TYP_DESC               object         
 17  INCIDENT_TYPE          category       
 18  IN

In [19]:
# Check for missing values to identify and summarize the number of missing data
dfNULL = df.isnull().sum()
print(dfNULL)

CAD_EVNT_ID                    0
CREATE_DATE                    0
INCIDENT_DATE                  0
INCIDENT_TIME                  0
INCIDENT_DATETIME              0
RESPONSE_TIME_MINUTES    1503133
DAY_OF_WEEK                    0
MONTH                          0
YEAR                           0
INCIDENT_DURATION        1503149
NYPD_PCT_CD                    4
BORO_NM                        0
PATRL_BORO_NM                  0
GEO_CD_X                       0
GEO_CD_Y                       0
RADIO_CODE                     0
TYP_DESC                       0
INCIDENT_TYPE                  0
INCIDENT_DESC            1593127
CIP_JOBS                       0
ADD_TS                         0
DISP_TS                        0
ARRIVD_TS                1503133
CLOSNG_TS                     20
Latitude                       0
Longitude                      0
dtype: int64
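
Raw counts are easier to judge as a share of all rows; for example, the 1,503,133 missing ARRIVD_TS values above are roughly 21% of the 7,050,127 records. The sketch below builds such a summary on a small toy frame (the name `toy` and its four rows are illustrative, with values borrowed from the preview).

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "ARRIVD_TS": pd.to_datetime(
        ["2023-01-01 00:44:33", None, None, "2023-01-01 01:09:32"]
    ),
    "NYPD_PCT_CD": [75.0, 114.0, np.nan, 66.0],
    "BORO_NM": ["BROOKLYN", "QUEENS", "QUEENS", "BROOKLYN"],
})

missing = toy.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(toy) * 100).round(2),
})

# Keep only columns that actually have gaps, largest first.
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```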


In [20]:
# Rows whose TYP_DESC had no ':' separator have NaN in INCIDENT_DESC (the incident description column).
# Fill these with 'Not Specified'. The column is categorical, so first add 'Not Specified' to its categories.
if 'Not Specified' not in df['INCIDENT_DESC'].cat.categories:
    df['INCIDENT_DESC'] = df['INCIDENT_DESC'].cat.add_categories(['Not Specified'])

# Fill NaN values with 'Not Specified' (assign the result back rather than using inplace on a column)
df['INCIDENT_DESC'] = df['INCIDENT_DESC'].fillna('Not Specified')


In [21]:
# Handle missing values in RESPONSE_TIME_MINUTES, INCIDENT_DURATION, ARRIVD_TS, CLOSNG_TS, and NYPD_PCT_CD by filling each with the column median
df['RESPONSE_TIME_MINUTES'] = df['RESPONSE_TIME_MINUTES'].fillna(df['RESPONSE_TIME_MINUTES'].median())
df['INCIDENT_DURATION'] = df['INCIDENT_DURATION'].fillna(df['INCIDENT_DURATION'].median())
df['ARRIVD_TS'] = df['ARRIVD_TS'].fillna(df['ARRIVD_TS'].median())
df['CLOSNG_TS'] = df['CLOSNG_TS'].fillna(df['CLOSNG_TS'].median())
df['NYPD_PCT_CD'] = df['NYPD_PCT_CD'].fillna(df['NYPD_PCT_CD'].median())
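
Once the gaps are filled, imputed rows can no longer be told apart from observed ones. A possible variation, sketched here under the assumption that later analysis may want to exclude imputed values, is to record a flag before filling. The column name `RESPONSE_TIME_IMPUTED` is hypothetical; the three observed values come from the preview.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "RESPONSE_TIME_MINUTES": [5.983333, np.nan, 32.3, 7.516667, np.nan],
})

# Remember which rows were missing before imputation (hypothetical flag column).
toy["RESPONSE_TIME_IMPUTED"] = toy["RESPONSE_TIME_MINUTES"].isna()

# Median of the observed values only; NaN is skipped by default.
median_minutes = toy["RESPONSE_TIME_MINUTES"].median()
toy["RESPONSE_TIME_MINUTES"] = toy["RESPONSE_TIME_MINUTES"].fillna(median_minutes)

# Statistics can still be computed on observed rows alone.
observed_mean = toy.loc[~toy["RESPONSE_TIME_IMPUTED"], "RESPONSE_TIME_MINUTES"].mean()
print(toy)
print("mean of observed values:", round(observed_mean, 3))
```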

In [22]:
# Check for duplicate rows 
df.duplicated().sum()

0
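
`duplicated()` with no arguments only flags rows that match in every column. In the preview above, two rows share CAD_EVNT_ID 91250747 but differ in their timestamps, so they are not counted. The sketch below reproduces that situation on a toy frame and shows how a `subset` check surfaces repeated event IDs; whether such rows should be treated as duplicates is a judgment call the source does not settle.

```python
import pandas as pd

toy = pd.DataFrame({
    "CAD_EVNT_ID": [91250746, 91250747, 91250747, 91250748],
    "ADD_TS": pd.to_datetime([
        "2023-01-01 00:00:12", "2023-01-01 00:01:03",
        "2023-01-01 00:00:15", "2023-01-01 00:00:40",
    ]),
})

# Exact duplicates across all columns.
full_row_dupes = int(toy.duplicated().sum())

# Rows sharing an event ID; keep=False marks every member of each group.
id_dupes = toy[toy.duplicated(subset=["CAD_EVNT_ID"], keep=False)]

print("full-row duplicates:", full_row_dupes)
print(id_dupes)
```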

Validate the geospatial data to ensure that Latitude and Longitude fall within valid ranges (Latitude between -90 and 90, Longitude between -180 and 180). Outliers or incorrect values could indicate data entry errors.

In [23]:
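# Keep only rows with valid Latitude and Longitude.
# .copy() avoids a SettingWithCopyWarning when columns are modified later.
df = df[(df['Latitude'].between(-90, 90)) & (df['Longitude'].between(-180, 180))].copy()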
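
The global range check will pass any point on Earth. Since every call should be inside New York City, a tighter check against a bounding box may catch more errors. The box limits below are approximate values assumed for this sketch, not taken from the dataset documentation, and the third row is a made-up outlier.

```python
import pandas as pd

# Approximate bounding box around New York City (assumed, not authoritative).
NYC_LAT = (40.4, 41.0)
NYC_LON = (-74.3, -73.6)

toy = pd.DataFrame({
    "Latitude": [40.64973, 40.762587, 0.0],      # last row is a made-up outlier
    "Longitude": [-73.936475, -73.912199, 0.0],
})

in_city = toy["Latitude"].between(*NYC_LAT) & toy["Longitude"].between(*NYC_LON)
print(f"{int((~in_city).sum())} of {len(toy)} rows fall outside the box")

# Keep the rows inside the box as an independent copy.
checked = toy[in_city].copy()
```

A point at (0, 0) passes the global check but fails this one, which is the kind of placeholder coordinate such a filter is meant to catch.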

In [24]:
# Standardize text columns to lowercase
text_cols = ['TYP_DESC', 'INCIDENT_TYPE', 'INCIDENT_DESC']
for col in text_cols:
    df[col] = df[col].str.lower()

In [25]:
# Re-check for missing values to confirm that none remain after cleaning
dfNULL = df.isnull().sum()
print(dfNULL)

CAD_EVNT_ID              0
CREATE_DATE              0
INCIDENT_DATE            0
INCIDENT_TIME            0
INCIDENT_DATETIME        0
RESPONSE_TIME_MINUTES    0
DAY_OF_WEEK              0
MONTH                    0
YEAR                     0
INCIDENT_DURATION        0
NYPD_PCT_CD              0
BORO_NM                  0
PATRL_BORO_NM            0
GEO_CD_X                 0
GEO_CD_Y                 0
RADIO_CODE               0
TYP_DESC                 0
INCIDENT_TYPE            0
INCIDENT_DESC            0
CIP_JOBS                 0
ADD_TS                   0
DISP_TS                  0
ARRIVD_TS                0
CLOSNG_TS                0
Latitude                 0
Longitude                0
dtype: int64


In [26]:
df.head()

Unnamed: 0,CAD_EVNT_ID,CREATE_DATE,INCIDENT_DATE,INCIDENT_TIME,INCIDENT_DATETIME,RESPONSE_TIME_MINUTES,DAY_OF_WEEK,MONTH,YEAR,INCIDENT_DURATION,...,TYP_DESC,INCIDENT_TYPE,INCIDENT_DESC,CIP_JOBS,ADD_TS,DISP_TS,ARRIVD_TS,CLOSNG_TS,Latitude,Longitude
0,91250176,2023-01-01,2022-12-31,11:24:39 PM,2022-12-31 23:24:39,0.0,Saturday,12,2022,0 days 00:19:04,...,vehicle accident: injury,vehicle accident,injury,Non CIP,2023-01-01 01:08:21,2023-01-01 01:09:57,2023-06-21 22:59:37.500,2023-01-01 01:57:44,40.64973,-73.936475
1,91250180,2023-01-01,2022-12-31,11:24:47 PM,2022-12-31 23:24:47,5.983333,Saturday,12,2022,0 days 01:00:48,...,alarms: commercial/burglary,alarms,commercial/burglary,Non CIP,2023-01-01 00:38:00,2023-01-01 00:38:34,2023-01-01 00:44:33.000,2023-01-01 01:45:21,40.662817,-73.881221
2,91250681,2023-01-01,2022-12-31,11:55:56 PM,2022-12-31 23:55:56,0.0,Saturday,12,2022,0 days 00:19:04,...,alarms: residential/burglary,alarms,residential/burglary,Non CIP,2023-01-01 00:01:26,2023-01-01 00:06:18,2023-06-21 22:59:37.500,2023-01-01 00:06:27,40.762587,-73.912199
3,91250683,2023-01-01,2022-12-31,11:55:59 PM,2022-12-31 23:55:59,32.3,Saturday,12,2022,0 days 00:11:42,...,alarms: residential/burglary,alarms,residential/burglary,Non CIP,2023-01-01 00:01:34,2023-01-01 00:37:14,2023-01-01 01:09:32.000,2023-01-01 01:21:14,40.610729,-73.967644
4,91250700,2023-01-01,2022-12-31,11:57:08 PM,2022-12-31 23:57:08,7.516667,Saturday,12,2022,0 days 01:02:23,...,alarms: commercial/burglary,alarms,commercial/burglary,Non CIP,2023-01-01 00:01:29,2023-01-01 00:14:28,2023-01-01 00:21:59.000,2023-01-01 01:24:22,40.748119,-73.891679


In [27]:
# Export the cleaned DataFrame to a CSV file
df.to_csv('/home/jupyter-raphrivers/Dataset/CLEANED/cleaned_df.csv', index=False)

# `index=False` means the DataFrame's index will not be written to the file.
# To include the index, set `index=True`.
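
CSV stores everything as text, so the datetime, timedelta, and category dtypes set above are not preserved in the exported file. A minimal sketch of restoring them on reload follows; it writes to an in-memory buffer instead of the real path, and the three columns shown are just a sample of the full set.

```python
import io
import pandas as pd

cleaned = pd.DataFrame({
    "ADD_TS": pd.to_datetime(["2023-01-01 00:38:00", "2023-01-01 00:01:34"]),
    "BORO_NM": pd.Series(["BROOKLYN", "BROOKLYN"], dtype="category"),
    "INCIDENT_DURATION": pd.to_timedelta(["0 days 01:00:48", "0 days 00:11:42"]),
})

# Round-trip through CSV text (a buffer stands in for the file on disk).
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)

# Restore dtypes while reading, then convert the duration column explicitly.
reloaded = pd.read_csv(buffer, parse_dates=["ADD_TS"],
                       dtype={"BORO_NM": "category"})
reloaded["INCIDENT_DURATION"] = pd.to_timedelta(reloaded["INCIDENT_DURATION"])

print(reloaded.dtypes)
```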