# NYC Weather Data
The NYC weather data comes from the National Centers for Environmental Information: https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USW00094728/detail. It was recorded at the `NY CITY CENTRAL PARK, NY US` weather station and spans `1869-01-01` to `2023-02-27`. For the period we need, 2006-01-01 to 2021-12-31, there are no null values, so processing is minimal: convert the date string to datetime, filter to the time range, and select the columns we want.

## Data Dictionary
The data dictionary is available at the link above, under `Available Data Types` in `Station Data Inventory, Access & History`. It shows that `TAVG`, `TMIN`, and `TMAX` are the average, minimum, and maximum temperature, respectively.

## Purpose
The only weather columns I'll use are average temperature, minimum temperature, maximum temperature, precipitation, and snowfall. I think these columns will be meaningful for examining the correlation between NYC crime rates and the weather.
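As a rough sketch of what that comparison could look like, the snippet below joins monthly weather to a monthly crime count and computes Pearson correlations. The crime table, its `CRIME_COUNT` column, and the `correlate_with_crime` helper are all hypothetical, and the numbers are toy values for illustration only; no crime data is loaded in this notebook.

```python
import pandas as pd

WEATHER_COLUMNS = ["TAVG", "TMAX", "TMIN", "PRCP", "SNOW"]


def correlate_with_crime(weather: pd.DataFrame, crime: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each weather column with a monthly crime count.

    Both frames are assumed to have one row per month keyed by DATE.
    CRIME_COUNT is a hypothetical column name.
    """
    # Keep only the months present in both tables.
    merged = weather.merge(crime, on="DATE", how="inner")
    columns = [c for c in WEATHER_COLUMNS if c in merged.columns]
    return merged[columns].corrwith(merged["CRIME_COUNT"])


# Toy values for illustration only (not real data).
toy_weather = pd.DataFrame({
    "DATE": pd.to_datetime(["2006-01-01", "2006-02-01", "2006-03-01", "2006-04-01"]),
    "TAVG": [1.0, 2.0, 3.0, 4.0],
})
toy_crime = pd.DataFrame({
    "DATE": toy_weather["DATE"],
    "CRIME_COUNT": [10, 20, 30, 40],
})
correlations = correlate_with_crime(toy_weather, toy_crime)
print(correlations)
```

A correlation computed this way only describes a linear association between monthly values; it says nothing about cause.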

In [1]:
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio

pio.renderers.default = "iframe"

In [2]:
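def read_weather_data(fname: str = "USW00094728.csv") -> pd.DataFrame:
    """Load the station CSV and keep 2006-2021 rows for the columns of interest."""
    df = pd.read_csv(fname, parse_dates=["DATE"])
    # Keep only the date plus the temperature, precipitation, and snowfall columns.
    subset_df = df[["DATE", "TAVG", "TMAX", "TMIN", "PRCP", "SNOW"]]
    # between() is inclusive on both ends by default.
    timerange = subset_df.query("DATE.between('2006-01-01', '2021-12-31')")
    return timerange.reset_index(drop=True)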

In [3]:
df = read_weather_data()
df

,DATE,TAVG,TMAX,TMIN,PRCP,SNOW
0,2006-01-01,4.93,8.64,1.23,126.8,51.0
1,2006-02-01,2.07,5.78,-1.63,73.1,683.0
2,2006-03-01,6.15,10.33,1.98,20.3,33.0
3,2006-04-01,13.15,18.48,7.82,141.2,3.0
4,2006-05-01,17.27,21.97,12.57,117.5,0.0
...,...,...,...,...,...,...
187,2021-08-01,25.30,28.81,21.80,262.1,0.0
188,2021-09-01,21.27,24.79,17.76,254.8,0.0
189,2021-10-01,16.67,19.93,13.41,133.7,0.0
190,2021-11-01,7.91,11.54,4.29,28.6,0.0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 192 entries, 0 to 191
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   DATE    192 non-null    datetime64[ns]
 1   TAVG    192 non-null    float64       
 2   TMAX    192 non-null    float64       
 3   TMIN    192 non-null    float64       
 4   PRCP    192 non-null    float64       
 5   SNOW    192 non-null    float64       
dtypes: datetime64[ns](1), float64(5)
memory usage: 9.1 KB


In [5]:
df.describe()

,TAVG,TMAX,TMIN,PRCP,SNOW
count,192.0,192.0,192.0,192.0,192.0
mean,13.373802,17.307656,9.439583,111.375521,66.770833
std,8.664113,8.979786,8.384961,61.889651,161.756101
min,-4.41,0.08,-8.9,9.2,0.0
25%,5.4075,9.2875,1.5725,71.0,0.0
50%,13.31,17.595,9.225,102.75,0.0
75%,21.67,25.8625,17.615,138.675,34.25
max,27.41,32.27,22.97,481.3,937.0


In [6]:
df.isna().any()

DATE    False
TAVG    False
TMAX    False
TMIN    False
PRCP    False
SNOW    False
dtype: bool
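A null check only covers rows that are present; it would not notice a month missing from the file altogether. A minimal sketch of a complementary check follows, assuming one row per month dated on the first of the month, as in the output above. The helper name `missing_months` is made up for this example.

```python
import pandas as pd


def missing_months(dates: pd.Series, start: str, end: str) -> pd.DatetimeIndex:
    """Return the month-start dates in [start, end] that do not appear in `dates`."""
    expected = pd.date_range(start=start, end=end, freq="MS")  # MS = month start
    return expected.difference(pd.DatetimeIndex(dates))


# Toy example: February is absent.
toy_dates = pd.Series(pd.to_datetime(["2006-01-01", "2006-03-01"]))
gaps = missing_months(toy_dates, "2006-01-01", "2006-03-01")
print(gaps)
```

On this notebook's frame the call would be `missing_months(df.DATE, "2006-01-01", "2021-12-31")`; an empty result would mean every month in the range has a row.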

In [7]:
fig = go.Figure()


average_temperature = go.Scatter(
    x=df.DATE,
    y=df.TAVG,
    hovertemplate="<i>Date</i>: %{x}"
          "<br><i>Temperature</i>: %{y}°C<br>"
          "<extra></extra>",
    mode='lines',
    name="Average Temperature"
)

max_temperature = go.Scatter(
    x=df.DATE,
    y=df.TMAX,
    hovertemplate="<i>Date</i>: %{x}"
          "<br><i>Temperature</i>: %{y}°C<br>"
          "<extra></extra>",
    mode='lines',
    name="Max Temperature"
    
)

min_temperature = go.Scatter(
    x=df.DATE,
    y=df.TMIN,
    hovertemplate="<i>Date</i>: %{x}"
          "<br><i>Temperature</i>: %{y}°C<br>"
          "<extra></extra>",
    mode='lines',
    name="Min Temperature"
    
)

fig.add_trace(average_temperature)
fig.add_trace(max_temperature)
fig.add_trace(min_temperature)

fig.update_layout(
    title="Average, Max, and Min Temperature from 2006-2021",
    xaxis_title="Date",
    yaxis_title="Temperature (°C)",
)
fig.show()

Here we see seasonality: it's hotter in the summer months (June, July, August) and colder in the winter months (December, January, February).
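One way to put numbers on that seasonal pattern is to average each calendar month across all years. The snippet below is a small sketch of that idea; the helper name `mean_by_calendar_month` is made up here, and the toy frame holds illustrative values rather than real measurements.

```python
import pandas as pd


def mean_by_calendar_month(frame: pd.DataFrame, column: str = "TAVG") -> pd.Series:
    """Average `column` over all years for each calendar month (1 = January)."""
    return frame.groupby(frame["DATE"].dt.month)[column].mean()


# Toy values for illustration only (not real measurements).
toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2006-01-01", "2006-07-01", "2007-01-01", "2007-07-01"]),
    "TAVG": [0.0, 24.0, 2.0, 26.0],
})
by_month = mean_by_calendar_month(toy)
print(by_month)
```

Calling `mean_by_calendar_month(df)` on the weather frame would give twelve values, one per calendar month, which can be compared directly against the plot.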

In [None]:
# https://harris-ippp.github.io/weather.html