# NOAA Weather Station Data Analysis Recreation
By: Aaron Padilla
Date Start: 5/15/2025
Date Completed: ___

This is conceptually based on a project I did while at Northrop Grumman Co. 

Link to Database: https://www.ncei.noaa.gov/data/global-summary-of-the-day/access/
^ This is apparently a legacy application that has since been replaced by a series of APIs and AWS hosting services.
That said, for the sake of familiarity, I'll engineer the data much as I did back then and then load it into the notebook.
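
For reference, a minimal sketch of how the per-station files in that directory could be addressed. It assumes the listing is organized as `<year>/<station number>.csv`, which should be checked against the live directory before relying on it. `gsod_csv_url` is a hypothetical helper name, and nothing here touches the network.

```python
# Base of the legacy GSOD directory linked above.
GSOD_BASE_URL = "https://www.ncei.noaa.gov/data/global-summary-of-the-day/access"


def gsod_csv_url(station_id: str, year: int) -> str:
    """Build the URL of one station-year CSV.

    Assumption: files live at <base>/<year>/<station_id>.csv.
    """
    return f"{GSOD_BASE_URL}/{year}/{station_id}.csv"


# One URL per year of the study window (2020 through 2025) for the New York station.
urls = [gsod_csv_url("74486094789", year) for year in range(2020, 2026)]
print(urls[0])
```

Each URL could then be downloaded once and saved locally, so the notebook reads from disk rather than from NOAA on every run.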

---

Goal: 

Overall Goal: Provide a proof of concept by gathering critical weather observations that contribute to feasibility studies for various clientele (aviation/aerospace, agriculture, climate research, etc.)

Sample Cities/Weather Stations:

* New York
* Los Angeles
* Tokyo
* Beijing
* Paris
* London


Objectives:

[ ] Display timeseries weather trends at international weather stations (<b><u>line plots</u></b>)

[ ] Plot <b><u>histograms</u></b> of various series parameters within the dataset

    [ ] Wind Speed

    [ ] Gust

    [ ] Precipitation

    etc

--- Further Objectives Listed Below ---


## Initial Data Intake & Engineering

In [12]:
# Initial Library Imports

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import numpy as np
import itertools

In [64]:
# Data Intake & Initialization:

# Dictionary of the weather stations for this study's major cities, keyed by city name
# Format is:
#
# KEY (city name) : STATION NUMBER (NOAA ADJUSTED)
#
# This is meant to be iterable later on. Also helps research with 

locationDict = {
    "New York":"74486094789",
    "Los Angeles":"72295023174", 
    "Tokyo":"47662099999",
    "Beijing":"54511099999",
    "Paris":"71560099999",
    "London":"03772099999"}

# List of the locations matching the keys in the above dictionary, in case I need it to iterate. (THIS IS REDUNDANCY)
# Wrapped in list() so it can be indexed; a bare .keys() view cannot.
locationList = list(locationDict.keys())

# --------------------------------------------------------------------------------
# For Loops to Assemble Dataframes from File

# What is this intended to do?

# 1) Iterate through a list or dictionary to:
#     a) assemble all dataframes of annualized data
#        I) by location
#       II) from earliest to latest (2020-2025)
#
# "How" would I do that?
#  
# 2) Structure of the loops:
#     a) Instantiate each dataframe as its own variable (as a blank dataframe)
#     b) Grab Keys from a dictionary or the list above
#         I) Have the location keys match the variable designators (for redundancy)
#
#     # This is where the looping starts
#     c) PSEUDOCODE:
#    -----
#    
#    for loc, year in itertools.product(locationList, np.arange(2020, 2026)):
#        
#        [locationDataFrame] = pd.concat([locationDataFrame, pd.read_csv(f"./Weather Data Analysis/{loc}/{loc}/{loc}-{year}-{locationDict[loc]}.csv")])
#        
#    -----
#     
# Remember, flat is better than nested, but I'll add: "Separated is better than integrated if possible"

# --------------------------------------------------------------------------------

# Initializing all dataframes to be iterated onto
nyData = pd.DataFrame()
laData = pd.DataFrame()
tokyoData = pd.DataFrame()
beijingData = pd.DataFrame()
parisData = pd.DataFrame()
londonData = pd.DataFrame()
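
The loop sketched in the pseudocode above can be tried out without reading any files. This is a minimal sketch that only builds the expected file paths, assuming the `./Weather Data/<location>/<Location>-<year>-<station>.csv` layout. `station_year_path` and `paths_by_location` are hypothetical names, and only two stations are included to keep it short.

```python
import itertools
from pathlib import Path

# Subset of the station dictionary, for illustration only.
location_dict = {
    "New York": "74486094789",
    "Los Angeles": "72295023174",
}

# Assumed root folder of the downloaded CSV files.
DATA_ROOT = Path("./Weather Data")


def station_year_path(location: str, year: int, station_id: str) -> Path:
    """Return the assumed path of one station-year file.

    The folder keeps the space in the city name; the file name drops it.
    """
    file_name = f"{location.replace(' ', '')}-{year}-{station_id}.csv"
    return DATA_ROOT / location / file_name


# One flat loop over every (location, year) pair instead of two nested loops.
paths_by_location = {location: [] for location in location_dict}
for location, year in itertools.product(location_dict, range(2020, 2026)):
    path = station_year_path(location, year, location_dict[location])
    paths_by_location[location].append(path)

print(paths_by_location["Los Angeles"][0].name)
```

Each list of paths could then be read with `pd.read_csv` and combined with a single `pd.concat`, which keeps one dataframe per location without a separately named variable for each.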

In [152]:
def appendAllYears(df, location):

    # The incoming dataframe is ignored; the result is always rebuilt from the files on disk
    frames = []

    # Station number for this location, looked up once outside the loop
    station = locationDict[location]

    # File names drop the spaces from the location, e.g. "LosAngeles-2020-72295023174.csv"
    fileLocation = location.replace(" ", "")

    # Reads each year's file from earliest to latest (2020-2025) for the given location
    for year in np.arange(2020, 2026):
        # Path: Location (file directory)/"Location-year-[station number]"
        frames.append(pd.read_csv(f"./Weather Data/{location}/{fileLocation}-{year}-{station}.csv"))

    # Concatenates all years, then replaces 999.9 (the missing-value marker in fields such as GUST) with 0.0
    df = pd.concat(frames).replace(999.9, 0.0)

    df['YEAR'] = df.DATE.str[0:4]
    df['MONTH-DAY'] = df.DATE.str[5:]
    df['LOCATION'] = location

    # Moves the three new columns to the front
    df = df[['LOCATION', 'YEAR', 'MONTH-DAY'] + list(df.columns[:-3])]

    return df
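
One thing to keep in mind for the histograms: replacing 999.9 with 0.0 makes a missing gust reading look like a calm day. Below is a minimal sketch of an alternative that marks missing readings as NaN instead. The per-column placeholder values are my assumption and should be verified against the GSOD documentation; `MISSING_SENTINELS` and `mask_missing` are hypothetical names.

```python
import pandas as pd

# Assumed missing-value placeholders per column (verify against the GSOD docs).
MISSING_SENTINELS = {
    "WDSP": 999.9,
    "MXSPD": 999.9,
    "GUST": 999.9,
    "SNDP": 999.9,
    "TEMP": 9999.9,
    "MAX": 9999.9,
    "MIN": 9999.9,
    "PRCP": 99.99,
}


def mask_missing(df: pd.DataFrame, sentinels=MISSING_SENTINELS) -> pd.DataFrame:
    """Return a copy of df with placeholder values replaced by NaN."""
    out = df.copy()
    for column, sentinel in sentinels.items():
        if column in out.columns:
            # Series.mask sets entries to NaN wherever the condition is True.
            out[column] = out[column].mask(out[column] == sentinel)
    return out


# Tiny made-up frame: the second row has a missing gust and precipitation value.
sample = pd.DataFrame(
    {"GUST": [26.0, 999.9], "PRCP": [0.04, 99.99], "TEMP": [39.8, 44.1]}
)
cleaned = mask_missing(sample)
print(cleaned)
```

NaN values are skipped by pandas aggregations such as `mean()`, so missing days would not pull averages toward zero.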


In [155]:
# Populating all dataframes with time series data
nyData = appendAllYears(nyData, "New York")
laData = appendAllYears(laData, "Los Angeles")
tokyoData = appendAllYears(tokyoData, "Tokyo")
beijingData = appendAllYears(beijingData, "Beijing")
parisData = appendAllYears(parisData, "Paris")
londonData = appendAllYears(londonData, "London")

# This list of dataframes is meant only for iteration purposes and not for transforming the dataframes themselves.
# The list holds references to the same DataFrame objects, not copies: in-place changes made through the list
# will show up in nyData, laData, etc., while reassigning a list element only changes the list.
# Therefore: Do not use the values in this list to transform the data in the dataframes themselves.

itterableLocations = [nyData, laData, tokyoData, beijingData, parisData, londonData]
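
A short sketch of how a list of dataframes behaves, since it is easy to get wrong. Python lists store references to objects rather than copies, so the outcome depends on whether a change is made in place or by reassignment. The frames and names below (`aData`, `bData`, `frames`) are made up for illustration.

```python
import pandas as pd

aData = pd.DataFrame({"TEMP": [39.8, 37.9]})
bData = pd.DataFrame({"TEMP": [60.1, 61.3]})

# The list stores references to the two existing objects, not copies.
frames = [aData, bData]
same_object = frames[0] is aData

# In-place change through the list: aData gains the column too.
frames[0]["FLAG"] = 1

# Reassignment: the list slot now points at a new frame, bData is untouched.
frames[1] = frames[1].assign(FLAG=1)

print(same_object, "FLAG" in aData.columns, "FLAG" in bData.columns)
```

If the frames ever need to be transformed in a loop, one option would be a dictionary keyed by location and reassigned by key, so the container itself is the single source of truth.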

In [154]:
# Sanity check on New York
nyData.head()

# Features that should not be present:
# 1) Other locations' data
# 2) Repeated year data
# 3) 
# 

# print("Columns: \n")
# for col in nyData.columns: print(col)
# print("\n" + "Unique values for each column: ")


| | LOCATION | YEAR | MONTH-DAY | STATION | DATE | LATITUDE | LONGITUDE | ELEVATION | NAME | TEMP | ... | MXSPD | GUST | MAX | MAX_ATTRIBUTES | MIN | MIN_ATTRIBUTES | PRCP | PRCP_ATTRIBUTES | SNDP | FRSHTT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | New York | 2020 | 01-01 | 74486094789 | 2020-01-01 | 40.63915 | -73.7639 | 2.7 | JFK INTERNATIONAL AIRPORT, NY US | 39.8 | ... | 20.0 | 26.0 | 44.1 | | 36.0 | | 0.0 | G | 0.0 | 0 |
| 1 | New York | 2020 | 01-02 | 74486094789 | 2020-01-02 | 40.63915 | -73.7639 | 2.7 | JFK INTERNATIONAL AIRPORT, NY US | 37.9 | ... | 15.0 | 22.0 | 48.0 | | 30.9 | | 0.0 | G | 0.0 | 0 |
| 2 | New York | 2020 | 01-03 | 74486094789 | 2020-01-03 | 40.63915 | -73.7639 | 2.7 | JFK INTERNATIONAL AIRPORT, NY US | 44.1 | ... | 9.9 | 0.0 | 48.0 | | 30.9 | | 0.04 | G | 0.0 | 10000 |
| 3 | New York | 2020 | 01-04 | 74486094789 | 2020-01-04 | 40.63915 | -73.7639 | 2.7 | JFK INTERNATIONAL AIRPORT, NY US | 47.1 | ... | 8.9 | 0.0 | 51.1 | | 42.1 | | 0.23 | G | 0.0 | 110000 |
| 4 | New York | 2020 | 01-05 | 74486094789 | 2020-01-05 | 40.63915 | -73.7639 | 2.7 | JFK INTERNATIONAL AIRPORT, NY US | 41.3 | ... | 28.0 | 41.0 | 51.1 | | 32.0 | | 0.02 | G | 0.0 | 10000 |


## Comparing Temperatures Across All Dates Across All Previous Years

The reason the 

In [164]:
def compareDateTempsPerYear(df):

    # Empty frame with one column per year, excluding the most recent year
    plotDf = pd.DataFrame(columns = list(df.YEAR.unique()[:-1]))

    


In [163]:
nyData.YEAR.unique()[:-1]

array(['2020', '2021', '2022', '2023', '2024'], dtype=object)
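
One possible way to line up temperatures by calendar date across years is a pivot on the `MONTH-DAY` and `YEAR` columns created earlier. This is only a sketch of that idea on made-up values, not the finished comparison function. `pivot_temps_by_year` is a hypothetical name.

```python
import pandas as pd


def pivot_temps_by_year(df: pd.DataFrame, value_col: str = "TEMP") -> pd.DataFrame:
    """Return one row per calendar date and one column per year."""
    # DataFrame.pivot raises if a (MONTH-DAY, YEAR) pair appears more than once,
    # which doubles as a check for repeated year data.
    return df.pivot(index="MONTH-DAY", columns="YEAR", values=value_col)


# Made-up sample in the same shape as the engineered dataframes.
sample = pd.DataFrame(
    {
        "YEAR": ["2020", "2020", "2021", "2021"],
        "MONTH-DAY": ["01-01", "01-02", "01-01", "01-02"],
        "TEMP": [39.8, 37.9, 41.0, 35.2],
    }
)

wide = pivot_temps_by_year(sample)
print(wide)
```

With matplotlib installed, calling `wide.plot()` would draw one line per year over the same calendar axis. Dates that exist only in some years, such as 02-29, would show up as NaN in the others.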