# NOAA Weather Data Scrape

In this section, we retrieve and query hourly weather data for New York City from NOAA (National Oceanic and Atmospheric Administration).

2021 Hourly Data Collected at Central Park NY : [NY CENTRAL PARK 2021 WEATHER DATA](https://www.ncei.noaa.gov/data/global-hourly/access/2021/72505394728.csv)

2022 Hourly Data Collected at Central Park NY : [NY CENTRAL PARK 2022 WEATHER DATA](https://www.ncei.noaa.gov/data/global-hourly/access/2022/72505394728.csv)


### Aim: 
- Join this data to the hourly pickup data to predict future hourly taxi ride demand.

### Data dictionary:
- Can be retrieved from this link: [FEDERAL CLIMATE COMPLEX DATA DOCUMENTATION FOR INTEGRATED SURFACE DATA](https://www.ncei.noaa.gov/data/global-hourly/doc/isd-format-document.pdf) 
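
As a rough illustration of how the fields in this file are encoded: columns such as `TMP`, `DEW` and `SLP` hold comma-separated strings in which the numeric part is an integer scaled by 10, followed by a quality code, and `9999` (or `99999` for `SLP`) marks a missing value. The helper name `parse_scaled_field` and the sample strings below are hypothetical and only meant to show the idea.

```python
def parse_scaled_field(raw, position=0, scale=10, missing=9999):
    """Return the scaled numeric part of a comma-separated field, or None if missing.

    Hypothetical helper for illustration; not part of any NOAA tooling.
    """
    value = int(raw.split(",")[position])
    if value == missing:
        return None
    return value / scale


# Illustrative strings in the same shape as the TMP and SLP columns
print(parse_scaled_field("+0161,1"))                 # 16.1
print(parse_scaled_field("+9999,9"))                 # None
print(parse_scaled_field("10215,1", missing=99999))  # 1021.5
```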

In [243]:
import requests
import pandas as pd
import json
import numpy as np
import datetime as dt

# TOKEN = 'prFURygHhcjchMdwFdWXiQwJyTzpWoDf'
# STATION_ID = 'GHCND:USW00094728'

In [223]:
df2021 = pd.read_csv("https://www.ncei.noaa.gov/data/global-hourly/access/2021/72505394728.csv")
df2022 = pd.read_csv("https://www.ncei.noaa.gov/data/global-hourly/access/2022/72505394728.csv")

In [224]:
df = pd.concat([df2021, df2022])

In [212]:
(31+30+31+31+28+31+30)*24

5088
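
The sum above appears to count the hours in the study window, October 2021 through April 2022 (212 days). Assuming that is the intent, the same number can be obtained from the dates directly, which avoids typing the month lengths by hand:

```python
from datetime import date

start = date(2021, 10, 1)
end = date(2022, 5, 1)  # exclusive upper bound, i.e. through 2022-04-30

# Number of whole days in the window times 24 hours per day
expected_hours = (end - start).days * 24
print(expected_hours)  # 5088
```

Comparing this figure with the number of rows in the processed table is a quick way to see whether any hours are missing.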

In [251]:
def preprocess(hourly_data):
    # Keep only the hourly reports (FM-15). Copy the selection so the
    # assignments below do not trigger SettingWithCopyWarning
    df = hourly_data.loc[hourly_data['REPORT_TYPE'] == 'FM-15', :].copy()

    # Parse each comma-separated field, undo the scaling factor of 10 and
    # replace the missing-value codes (9999, or 99999 for SLP) with NaN
    df['WND'] = df['WND'].apply(lambda x: int(x.split(',')[-2])/10).replace(999.9, np.nan)
    df['TMP'] = df['TMP'].apply(lambda x: int(x.split(',')[0])/10).replace(999.9, np.nan)
    df['DEW'] = df['DEW'].apply(lambda x: int(x.split(',')[0])/10).replace(999.9, np.nan)
    df['SLP'] = df['SLP'].apply(lambda x: int(x.split(',')[0])/10).replace(9999.9, np.nan)
    df['AA1'] = df['AA1'].apply(lambda x: np.nan if pd.isna(x) else int(x.split(',')[1])/10).replace(999.9, np.nan)

    # Impute missing data using the most recent earlier report
    df = df.ffill()

    # Filter data to the period 2021-10-01 through 2022-04-30.
    # DATE is still an ISO 8601 string here, so the comparison is lexicographic
    processed_data = df.loc[(df['DATE'] <= '2022-05-01') & (df['DATE'] >= '2021-10-01'), :].copy()

    # Extract date and hour from the datetime column
    timestamps = pd.to_datetime(processed_data['DATE'])
    processed_data['HOUR'] = timestamps.dt.hour
    processed_data['DATE'] = timestamps.dt.date

    return processed_data[['DATE',
                           'HOUR',
                           'TMP',
                           'DEW',
                           'SLP',
                           'AA1']]
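
As an alternative to the row-by-row `apply` calls in `preprocess`, the same parsing can be sketched with pandas string methods, which operate on the whole column at once. The sample frame below is hypothetical and only mimics the string format of the `TMP` column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same string format as the TMP column
sample = pd.DataFrame({"TMP": ["+0161,1", "-0083,1", "+9999,9"]})

# Take the text before the first comma, convert to float and undo the scaling
tmp = sample["TMP"].str.split(",").str[0].astype(float) / 10

# 9999 is the missing-value code, which becomes 999.9 after scaling
sample["TMP"] = tmp.replace(999.9, np.nan)
print(sample)
```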
    

In [252]:
df2 = preprocess(df)

In [254]:
df2.to_csv("/Users/oliver/Downloads/weather.csv")

In [253]:
df2

| | DATE | HOUR | TMP | DEW | SLP | AA1 |
|---|---|---|---|---|---|---|
| 8835 | 2021-10-01 | 0 | 16.1 | 6.7 | 1021.5 | 0.0 |
| 8836 | 2021-10-01 | 1 | 16.1 | 7.2 | 1022.1 | 0.0 |
| 8837 | 2021-10-01 | 2 | 14.4 | 7.8 | 1022.4 | 0.0 |
| 8838 | 2021-10-01 | 3 | 13.3 | 7.2 | 1022.3 | 0.0 |
| 8839 | 2021-10-01 | 4 | 12.8 | 7.8 | 1022.3 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 3871 | 2022-04-30 | 19 | 19.4 | -8.3 | 1016.8 | 0.0 |
| 3872 | 2022-04-30 | 20 | 18.9 | -9.4 | 1016.9 | 0.0 |
| 3873 | 2022-04-30 | 21 | 18.9 | -8.3 | 1016.9 | 0.0 |
| 3874 | 2022-04-30 | 22 | 16.1 | -3.9 | 1017.4 | 0.0 |
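
To connect this table back to the aim of joining it with the hourly pickup data, here is a minimal sketch of the join on `DATE` and `HOUR`. The `pickups` frame and its `pickups` column are assumptions made up for illustration, since the pickup data is not shown in this section.

```python
import pandas as pd

day = pd.Timestamp("2021-10-01").date()

# A few weather rows shaped like the processed table
weather = pd.DataFrame({
    "DATE": [day, day],
    "HOUR": [0, 1],
    "TMP": [16.1, 16.1],
})

# Hypothetical hourly pickup counts; the column names are assumptions
pickups = pd.DataFrame({
    "DATE": [day, day, day],
    "HOUR": [0, 1, 2],
    "pickups": [120, 95, 80],
})

# A left join keeps every pickup hour, even when no weather report matches
joined = pickups.merge(weather, on=["DATE", "HOUR"], how="left")
print(joined)
```

Hours without a matching weather report end up with `NaN` in the weather columns, so they can be inspected or imputed before modelling.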
