# Exploratory Analysis

Conduct an exploratory analysis of the data. The analysis should include:

- Missing values: if your dataset has any, identify and explain them. If your analysis requires you to
handle them, describe your strategy for doing so.
- Numeric variables:
    - Mean, min, max, median
    - Correlations between all continuous variables
    - Visualize data distribution, noting outlier values
- Categorical variables: Value counts with bar charts
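
The checklist above can be sketched as a few small pandas helpers. This is a minimal illustration, not the notebook's actual implementation: the helper names (`missing_summary`, `numeric_summary`, `category_counts`) are hypothetical, the `sample` frame only copies a few columns from the five rows shown by `df.head()`, and treating `Latitude` and `Longitude` as the continuous variables is an assumption.

```python
import pandas as pd

# Tiny sample copied from the five rows shown by df.head() in this notebook.
# The real analysis would run on the full DataFrame loaded from the CSV.
sample = pd.DataFrame({
    "KY_CD": [109.0, 344.0, 126.0, 106.0, 118.0],
    "LAW_CAT_CD": ["F", "M", "F", "F", "F"],
    "ARREST_BORO": ["M", "M", "K", "Q", "M"],
    "Latitude": [0.0, 40.794755, 0.0, 40.680086, 40.715526],
    "Longitude": [0.0, -73.942348, 0.0, -73.775931, -74.001238],
})


def missing_summary(frame):
    """Count and percentage of missing values per column, worst first."""
    counts = frame.isna().sum()
    pct = (counts / len(frame) * 100).round(2)
    out = pd.DataFrame({"missing": counts, "percent": pct})
    return out.sort_values("missing", ascending=False)


def numeric_summary(frame, columns):
    """Mean, min, max and median for the chosen columns (one row per column)."""
    return frame[columns].agg(["mean", "min", "max", "median"]).T


def category_counts(frame, column):
    """Value counts for one categorical column, keeping NaN visible."""
    return frame[column].value_counts(dropna=False)


# Assumption: codes such as KY_CD are identifiers, not measurements,
# so only the coordinates are treated as continuous here.
continuous = ["Latitude", "Longitude"]

missing = missing_summary(sample)
stats = numeric_summary(sample, continuous)
corr = sample[continuous].corr()          # pairwise Pearson correlations
boro_counts = category_counts(sample, "ARREST_BORO")
# boro_counts.plot.bar() would draw the bar chart (requires matplotlib).
```

On this sample the correlation between `Latitude` and `Longitude` is driven almost entirely by the two rows whose coordinates are 0, which is a reminder to decide how such values are handled before interpreting correlations.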

In [1]:
import pandas as pd

df = pd.read_csv("../dataset/old/arrests_data.csv")
df.head()

Unnamed: 0,ARREST_KEY,ARREST_DATE,PD_CD,PD_DESC,KY_CD,OFNS_DESC,LAW_CODE,LAW_CAT_CD,ARREST_BORO,ARREST_PRECINCT,JURISDICTION_CODE,AGE_GROUP,PERP_SEX,PERP_RACE,X_COORD_CD,Y_COORD_CD,Latitude,Longitude,New Georeferenced Column
0,298874520,01/04/2025,439,"LARCENY,GRAND FROM OPEN AREAS, UNATTENDED",109.0,GRAND LARCENY,PL 1553004,F,M,7,0,25-44,M,BLACK,0,0,0.0,0.0,POINT (0 0)
1,298799078,01/02/2025,101,ASSAULT 3,344.0,ASSAULT 3 & RELATED OFFENSES,PL 1200001,M,M,23,0,25-44,F,BLACK,1000213,228833,40.794755,-73.942348,POINT (-73.9423482609703 40.79475532416718)
2,298921520,01/05/2025,779,"PUBLIC ADMINISTRATION,UNCLASSI",126.0,MISCELLANEOUS PENAL LAW,PL 215510B,F,K,76,0,45-64,M,WHITE,0,0,0.0,0.0,POINT (0 0)
3,299008265,01/07/2025,105,STRANGULATION 1ST,106.0,FELONY ASSAULT,PL 1211200,F,Q,113,0,45-64,M,BLACK,1046399,187126,40.680086,-73.775931,POINT (-73.775931 40.680086)
4,298969999,01/06/2025,793,WEAPONS POSSESSION 3,118.0,DANGEROUS WEAPONS,PL 2650201,F,M,5,73,25-44,M,WHITE,983907,199958,40.715526,-74.001238,POINT (-74.001238 40.715526)


In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71242 entries, 0 to 71241
Data columns (total 19 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ARREST_KEY                71242 non-null  int64  
 1   ARREST_DATE               71242 non-null  object 
 2   PD_CD                     71242 non-null  int64  
 3   PD_DESC                   71242 non-null  object 
 4   KY_CD                     71238 non-null  float64
 5   OFNS_DESC                 71242 non-null  object 
 6   LAW_CODE                  71242 non-null  object 
 7   LAW_CAT_CD                70881 non-null  object 
 8   ARREST_BORO               71242 non-null  object 
 9   ARREST_PRECINCT           71242 non-null  int64  
 10  JURISDICTION_CODE         71242 non-null  int64  
 11  AGE_GROUP                 71242 non-null  object 
 12  PERP_SEX                  71242 non-null  object 
 13  PERP_RACE                 71242 non-null  object 
 14  X_COOR
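
The non-null counts in the `df.info()` output above already reveal which of the visible columns have gaps. As a small worked example, the arithmetic below uses only the numbers printed above; columns after index 13 are cut off in the output, so nothing is assumed about them.

```python
# Figures copied from the df.info() output above.
total_rows = 71242
non_null = {"KY_CD": 71238, "LAW_CAT_CD": 70881}

# Missing count = total rows minus non-null count.
missing = {col: total_rows - n for col, n in non_null.items()}

# Share of rows affected, as a percentage.
missing_pct = {col: round(100 * m / total_rows, 3) for col, m in missing.items()}

for col in missing:
    print(f"{col}: {missing[col]} missing ({missing_pct[col]}%)")
```

On the full DataFrame, `df.isna().sum()` would give the same counts directly for every column.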

In [4]:
df['ARREST_DATE'] = pd.to_datetime(df['ARREST_DATE'], format='%m/%d/%Y')
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print("\n\n")
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71242 entries, 0 to 71241
Data columns (total 19 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   ARREST_KEY                71242 non-null  int64         
 1   ARREST_DATE               71242 non-null  datetime64[ns]
 2   PD_CD                     71242 non-null  int64         
 3   PD_DESC                   71242 non-null  object        
 4   KY_CD                     71238 non-null  float64       
 5   OFNS_DESC                 71242 non-null  object        
 6   LAW_CODE                  71242 non-null  object        
 7   LAW_CAT_CD                70881 non-null  object        
 8   ARREST_BORO               71242 non-null  object        
 9   ARREST_PRECINCT           71242 non-null  int64         
 10  JURISDICTION_CODE         71242 non-null  int64         
 11  AGE_GROUP                 71242 non-null  object        
 12  PERP_SEX          

Unnamed: 0,ARREST_KEY,ARREST_DATE,PD_CD,PD_DESC,KY_CD,OFNS_DESC,LAW_CODE,LAW_CAT_CD,ARREST_BORO,ARREST_PRECINCT,JURISDICTION_CODE,AGE_GROUP,PERP_SEX,PERP_RACE,X_COORD_CD,Y_COORD_CD,Latitude,Longitude,New Georeferenced Column
0,298874520,2025-01-04,439,"LARCENY,GRAND FROM OPEN AREAS, UNATTENDED",109.0,GRAND LARCENY,PL 1553004,F,M,7,0,25-44,M,BLACK,0,0,0.0,0.0,POINT (0 0)
1,298799078,2025-01-02,101,ASSAULT 3,344.0,ASSAULT 3 & RELATED OFFENSES,PL 1200001,M,M,23,0,25-44,F,BLACK,1000213,228833,40.794755,-73.942348,POINT (-73.9423482609703 40.79475532416718)
2,298921520,2025-01-05,779,"PUBLIC ADMINISTRATION,UNCLASSI",126.0,MISCELLANEOUS PENAL LAW,PL 215510B,F,K,76,0,45-64,M,WHITE,0,0,0.0,0.0,POINT (0 0)
3,299008265,2025-01-07,105,STRANGULATION 1ST,106.0,FELONY ASSAULT,PL 1211200,F,Q,113,0,45-64,M,BLACK,1046399,187126,40.680086,-73.775931,POINT (-73.775931 40.680086)
4,298969999,2025-01-06,793,WEAPONS POSSESSION 3,118.0,DANGEROUS WEAPONS,PL 2650201,F,M,5,73,25-44,M,WHITE,983907,199958,40.715526,-74.001238,POINT (-74.001238 40.715526)
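
Two of the five rows shown above carry 0 in every coordinate field (`POINT (0 0)`), while the other rows sit near latitude 40.7 and longitude -74. `df.info()` counts those zeros as non-null, so they may be missing values in disguise. The sketch below shows one possible way to surface them, assuming that a latitude and longitude of exactly 0 is a placeholder rather than a real location; the helper name `mask_zero_coordinates` is hypothetical, and the `coords` frame is copied from the rows shown above.

```python
import numpy as np
import pandas as pd

# Coordinates copied from the five rows shown above.
coords = pd.DataFrame({
    "ARREST_KEY": [298874520, 298799078, 298921520, 299008265, 298969999],
    "Latitude": [0.0, 40.794755, 0.0, 40.680086, 40.715526],
    "Longitude": [0.0, -73.942348, 0.0, -73.775931, -74.001238],
})


def mask_zero_coordinates(frame, lat="Latitude", lon="Longitude"):
    """Return a copy with (0, 0) coordinates replaced by NaN, plus the row count affected.

    Assumption: a row whose latitude and longitude are both exactly 0
    holds a placeholder, not a real location.
    """
    out = frame.copy()
    placeholder = (out[lat] == 0) & (out[lon] == 0)
    out.loc[placeholder, [lat, lon]] = np.nan
    return out, int(placeholder.sum())


cleaned, n_masked = mask_zero_coordinates(coords)
print(f"{n_masked} of {len(coords)} rows had placeholder coordinates")
```

Converting the placeholders to `NaN` keeps the rows available for non-spatial analysis while excluding them from coordinate statistics, since pandas skips `NaN` in `mean`, `median`, and `corr` by default.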
