# WEATHER AND CRIME IN NEW YORK CITY

At the time I conducted this analysis in May 2025, spring was well under way and summer was approaching in New York City. Sure, the flowers bloom at this time of year, but I had also heard that hot weather raises tensions and can bring out the worst in people. Using MTA crime data, could I find a correlation between the weather and crime? I set out to answer that question.

I first downloaded the MTA Major Felonies data from the MTA open data sets at data.ny.gov, along with temperature and other weather data from NOAA (the US federal agency) for its Central Park station. The MTA crime data only goes back to 2019, while the weather data goes back to at least 1980, so this Jupyter Notebook limits the analysis to 2019 onward.
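
Restricting the weather data to the period covered by the crime data comes down to a date filter. The sketch below is a minimal illustration on made-up stand-in frames, not the notebook's own code. It assumes the NOAA file has one row per day in its `DATE` column, and it takes the window from the first reported crime month through the last day of the last reported month.

```python
import pandas as pd

# Made-up stand-ins for the real files: monthly crime rows keyed by the first
# day of each month, and (assumed) daily weather rows keyed by DATE.
crime = pd.DataFrame({
    "Month": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01"]),
    "Felony Count": [1, 3, 2],
})
weather = pd.DataFrame({
    "DATE": pd.date_range("2018-12-30", "2019-04-02", freq="D"),
})

# The crime data defines the window: from the first reported month through
# the last day of the last reported month (MonthEnd(0) rolls forward to it).
start = crime["Month"].min()
end = crime["Month"].max() + pd.offsets.MonthEnd(0)

# Keep only the weather rows that fall inside that window (both ends inclusive).
weather_in_range = weather[weather["DATE"].between(start, end)]

print(weather_in_range["DATE"].min().date(), weather_in_range["DATE"].max().date())
```

On the real data, the same two bounds computed from `crime["Month"]` would be applied to `weather["DATE"]` after loading. `Series.between` includes both endpoints by default, so the first and last days of the window are kept.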

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Parse the date columns so they load as datetimes rather than strings
crime = pd.read_csv("./MTA_Major_Felonies_20250509.csv", parse_dates=["Month"])
weather = pd.read_csv("./weather_noaa.csv", parse_dates=["DATE"])

# Have a look at what I've loaded to make sure it makes sense
crime.head(), weather.head()
```

```
(       Month Agency Police Force     Felony Type  Felony Count  \
 0 2019-01-01    MNR        MTAPD        Burglary             1
 1 2019-01-01    MNR        MTAPD  Felony Assault             3
 2 2019-01-01    MNR        MTAPD          Murder             0
 3 2019-01-01    MNR        MTAPD            Rape             0
 4 2019-01-01    MNR        MTAPD         Robbery             1

    Crimes per Million Riders
 0                       0.15
 1                       0.44
 2                       0.00
 3                       0.00
 4                       0.15  ,
        STATION                         NAME  LATITUDE  LONGITUDE  ELEVATION  \
 0  USW00094728  NY CITY CENTRAL PARK, NY US  40.77898  -73.96925       42.7
 1  USW00094728  NY CITY CENTRAL PARK, NY US  40.77898  -73.96925       42.7
 2  USW00094728  NY CITY CENTRAL PARK, NY US  40.77898  -73.96925       42.7
 3  USW00094728  NY CITY CENTRAL PARK, NY US  40.77898  -73.96925       42.7
 4
```

Since I'm taking a holistic view of the data (at least initially), I'm going to collapse all of these felony types, across every agency, into a single "Total Crimes" figure for each month.

```python
# Sum the felony counts for each month, across all felony types and agencies
monthly_crime = crime.groupby("Month")["Felony Count"].sum().reset_index()
monthly_crime.columns = ["Date", "Total_Crimes"]
# Let's see how it looks so far
monthly_crime.head()
```

|   | Date       | Total_Crimes |
|---|------------|--------------|
| 0 | 2019-01-01 | 214          |
| 1 | 2019-02-01 | 207          |
| 2 | 2019-03-01 | 183          |
| 3 | 2019-04-01 | 194          |
| 4 | 2019-05-01 | 215          |
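
To make explicit what the monthly aggregation does, here is a small sketch on made-up rows shaped like the MTA file (the agencies, felony types and counts are invented for illustration). It uses pandas named aggregation, an alternative to reassigning `.columns`, and produces the same `Date`/`Total_Crimes` layout.

```python
import pandas as pd

# Made-up rows mirroring the MTA file's layout: several agencies and
# felony types reported for each month.
crime = pd.DataFrame({
    "Month": pd.to_datetime(["2019-01-01"] * 3 + ["2019-02-01"] * 3),
    "Agency": ["MNR", "MNR", "LIRR", "MNR", "LIRR", "LIRR"],
    "Felony Type": ["Burglary", "Robbery", "Robbery", "Burglary", "Burglary", "Murder"],
    "Felony Count": [1, 2, 4, 0, 3, 1],
})

# Summing "Felony Count" within each month collapses every agency and felony
# type into one monthly total; named aggregation names the new column directly.
monthly_crime = (
    crime.groupby("Month", as_index=False)
    .agg(Total_Crimes=("Felony Count", "sum"))
    .rename(columns={"Month": "Date"})
)
print(monthly_crime)
```

Because every row for a month is summed regardless of `Agency` or `Felony Type`, January's total here is 1 + 2 + 4 = 7 and February's is 0 + 3 + 1 = 4.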
