<center>
    course: <b>Data Analytics with Statistics</b> | lecturer: <b>Prof. Dr. Jan Kirenz</b> | Date: <b>29.12.2023</b>  | Name: <b>Julian Erath, Furkan Saygin, Sofie Pischl</b> | Group: <b> Group B</b> 
</center>

# Final Report
# Weather Data Analysis: A Regression and Classification Approach on the ERA5 Dataset
---

Group name:  Group iBm

---

## 1. Introduction and data

### 1.1 Motivation

Weather, a phenomenon as old as Earth itself, has long been a subject of human fascination. The complex interplay of temperature, wind, and precipitation shapes our environment, affects our lives, and challenges our understanding of the natural world [^1]. With the advent of modern technology and data analysis techniques, a deeper exploration of these complex processes is now possible [^2]. As discussed in the systematic review "The contribution of weather forecast information to agriculture, water, and energy sectors in East and West Africa," accurate weather prediction is crucial for effective agriculture, disaster management, and urban planning, particularly in the context of climate change risks [^3]. This project, titled "Weather Data Analysis: A Regression and Classification Approach on the ERA5 Dataset", aims to contribute to this ongoing exploration by examining how different variables interact to create complex weather phenomena.

This project leverages ERA5, a high-quality global atmospheric reanalysis dataset covering multiple decades [^4]. This data, examined for its comprehensiveness in the "Evaluation of spatial-temporal variation performance of ERA5 precipitation data in China," is a resource for researchers seeking to understand weather patterns and atmospheric dynamics [^5]. The analysis focuses on the region of Bancroft in Ontario, Canada. The geographic location of Bancroft can be seen under "../references/images/complete_map.png". This decision is rooted in the unique climatic and meteorological characteristics of the region, which offer a distinctive case study for understanding complex weather phenomena. The geographical and climatic characteristics of Bancroft are strongly influenced by the so-called 'lake effect', a phenomenon in which cold winds move over warmer lake waters and cause significant changes in nearby weather patterns [^6]. This makes the region an excellent case study for analyzing the relationships among different atmospheric elements, which has also drawn interest in the area from IBM and IBM clients [^7]. The work presented is limited to the years 2015 to 2022 in order to include only the most recent climate developments.
Focusing on Bancroft enables a targeted application of regression and classification techniques, facilitating a detailed analysis of localized weather patterns and their impacts [^8], [^9].

### 1.2 Data

### 1.2.1 Data description of sample
The ERA5 dataset, sourced from the European Centre for Medium-Range Weather Forecasts (ECMWF), comprises atmospheric reanalysis data spanning multiple decades. The sample used here covers the years 2015 to 2022 at hourly intervals, with a spatial resolution of approximately 31 km. Various meteorological parameters such as temperature, precipitation, wind speed, and atmospheric pressure, grouped by average, minimum, and maximum values for the observed hour, are included in the dataset. The data, labeled by meteorologists and data scientists from IBM and The Weather Company, offers comprehensive global-scale atmospheric information, with each observation representing a set of meteorological parameters at a specific location and time. Recognized for its high quality and precision, the ERA5 dataset's enhanced spatial and temporal resolution makes it well-suited for detailed analyses and modeling across diverse applications, including climate research, environmental monitoring, and weather forecasting [^10].

### 1.2.2 Variables
The dataset, crucial for this analysis, encompasses key variables such as air temperature, wind speed and direction, precipitation (rainfall and snowfall), atmospheric pressure, snow density, cumulative snow, cumulative ice, and weather events. Temperature is measured in Kelvin, while wind information includes zonal and meridional components. Precipitation data is essential for hydrology and agriculture, and atmospheric pressure variations are associated with weather patterns. The dataset also includes categorical weather events such as Blue Sky Day, Mild Snowfall, and Storm with Freezing Rain. These variables form the foundation for the assignment's comprehensive analysis [^11].

### 1.2.3 Overview of data 
Initially, the .csv file is loaded, and the data's head is printed for an initial overview of columns (variables) and rows (observations). The dataset comprises 65,345 observations and 184 columns, including unique predictor variables and a response variable. Identifier variables like "Unnamed: 0" are identified and dropped due to redundancy, while 'run_datetime' and 'valid_datetime' are transformed into datetime format. A new column, 'avg_temp_celsius,' is created by converting temperatures from Kelvin to Celsius. Wind directions in 'avg_winddir' are then categorized into cardinal directions using a function, resulting in the 'wind_direction_label' column. Subsequently, a new dataframe is formed by selecting specific columns for optimized resource usage. This dataframe is later split into training, testing, and validation sets, underlining the foundational role of proper data splitting for reliable machine learning model development and generalization to new data [^12][^13].

In [2]:
# import dependencies 
import pandas as pd
import numpy as np
import datetime
import math
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings("ignore")

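# import data
df_full = pd.read_csv("../data/external/feature_data_substation_bancroft_labelled.csv")
# drop the redundant identifier column mentioned above (no-op if it is absent)
df_full = df_full.drop(columns=["Unnamed: 0"], errors="ignore")
df_full.head()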


In [None]:
# Change datetime columns to datetime format
df_full['run_datetime'] = pd.to_datetime(df_full['run_datetime'])
df_full['valid_datetime'] = pd.to_datetime(df_full['valid_datetime'])

# Add temperature column for degree in celsius
df_full['avg_temp_celsius'] = df_full['avg_temp'] - 273.15

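# Translate wind directions (in degrees) into eight cardinal directions
def wind_direction_label(degrees):
    """Map a wind direction in degrees to a cardinal direction label."""
    if pd.isna(degrees):
        return np.nan  # keep missing directions missing instead of labelling them 'North'
    degrees = degrees % 360  # normalize values outside [0, 360)
    if 22.5 < degrees <= 67.5: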
        return 'Northeast'
    elif 67.5 < degrees <= 112.5:
        return 'East'
    elif 112.5 < degrees <= 157.5:
        return 'Southeast'
    elif 157.5 < degrees <= 202.5:
        return 'South'
    elif 202.5 < degrees <= 247.5:
        return 'Southwest'
    elif 247.5 < degrees <= 292.5:
        return 'West'
    elif 292.5 < degrees <= 337.5:
        return 'Northwest'
    else:
        return 'North'
    
df_full['wind_direction_label'] = df_full['avg_winddir'].apply(wind_direction_label)

### 1.3 Research Questions

**Regression Analysis:**  
Is it possible to accurately predict the temperature and its relationship with wind characteristics using historical data? This question is inspired by the current understanding of atmospheric variables and their interactions and specified as follows:  

*1. Temperature Prediction:* Is it possible to build an accurate regression model to predict temperature based on historical data?   
*2. Temperature and Wind Modeling:* Is it possible to find a correlation or causation between the temperature and the wind features windspeed, windgust or winddirection using regression techniques?   
*3. Multivariate Temperature Prediction / Linear Regression of Temperature with Multiple Predictors:* How does the incorporation of multiple atmospheric predictors, such as windspeed, windgust, winddirection, air pressure, snow and ice parameters enhance the accuracy of temperature prediction compared to a model solely based on windspeed?   
*4. Extreme Weather Event Prediction by Temperature in Logistic Regression:* Can logistic regression effectively classify and predict the occurrence of extreme or normal weather events based on temperature (or alternatively windspeed) ranges?   

*Regression Hypothesis:* There exists a significant correlation between temperature and wind characteristics, which can be modeled to predict future temperature trends and variations. This hypothesis is based on the premise that atmospheric variables are interconnected and can be analyzed to forecast weather conditions.


**Classification Analysis:**  
Can effective classification and prediction of weather events, including extreme occurrences, be achieved based on multivariate weather data? This aspect of the research aims to develop methods for accurate prediction of weather events, acknowledging the complexity and variability of weather patterns and is specified as follows:

*1. Extreme Weather Events in Binary Classification:* Is it possible to classify and predict extreme weather events such as storms? This involves training a binary classification model to identify patterns indicative of extreme events. The result of this classification analysis is a prediction of extreme weather events from current weather data, using a model trained on historical weather data.  
*2. Weather Event and Pattern Classification in Multiclass Classification:* Is it possible to categorize and predict different extreme weather events based on multivariate weather data? This involves using multiclass classification algorithms. The result of this classification analysis is a prediction of specific weather events from current weather data, using a model trained on historical weather data.

*Classification Hypothesis:* Specific patterns in the weather data can accurately predict various weather events, including extreme conditions. This hypothesis is informed by the need for effective prediction models in the face of increasingly frequent and severe weather events.

### 1.4 Exploratory Data Analysis (EDA)

!EDA of response variable!

In [None]:
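import matplotlib.pyplot as plt
import seaborn as sns

# Line plot of the numerical response over time (temperature is stored in Kelvin)
plt.figure(figsize=(12, 6))
sns.lineplot(x='run_datetime', y='avg_temp', data=df_full)
plt.title('Average Temperature Over Time')
plt.xlabel('Time')
plt.ylabel('Average Temperature (K)')
plt.show()

# Box plot of the numerical response; the line inside the box marks the median
plt.figure(figsize=(12, 6))
sns.boxplot(x='avg_temp', data=df_full)
plt.title('Box Plot with Median for Average Temperature')
plt.xlabel('Average Temperature (K)')
plt.show()

# Counts of the categorical response per year
# (assumes 'wep' is the categorical response, as listed in the data dictionary;
# counting per hourly timestamp would yield one bar per observation)
plt.figure(figsize=(12, 6))
sns.countplot(x='year', hue='wep', data=df_full.assign(year=df_full['run_datetime'].dt.year))
plt.title('Count of Weather Event Types per Year')
plt.xlabel('Year')
plt.ylabel('Count')
plt.show()

# Overall distribution of the categorical response, most frequent class first
plt.figure(figsize=(12, 6))
sns.countplot(y='wep', data=df_full, order=df_full['wep'].value_counts().index)
plt.title('Distribution of Weather Event Types')
plt.xlabel('Count')
plt.ylabel('Weather Event Type')
plt.show()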


[^1]: Liljequist, G.H. / Cehak, K. (1984): Allgemeine Meteorologie. 3. Auflage, Springer-Verlag.  
[^2]: Fathi, M. / Haghi Kashani, M. / Jameii, S. M. / Mahdipour, E. (2022): Big Data Analytics in Weather Forecasting: A Systematic Review, in: Archives of Computational Methods in Engineering 29.2 (2022, Springer): 1247–1275  
[^3]: [The contribution of weather forecast information to agriculture, water, and energy sectors in East and West Africa](https://www.frontiersin.org/articles/10.3389/fenvs.2022.935696/full)  
[^4]: ECMWF (2023a): ERA5: data documentation. URL: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation  
[^5]: [Evaluation of spatial-temporal variation performance of ERA5 precipitation data in China](https://www.nature.com/articles/s41598-021-97432-y)  
[^6]: [A Hybrid Dataset of Historical Cool-Season Lake Effects From the Eastern Great Lakes of North America](https://www.frontiersin.org/journals/water/articles/10.3389/frwa.2022.788493/full)   
[^7]: Hjelmfelt, M.R. (1990): Numerical study of the influence of environmental conditions on lake-effect snowstorms over Lake Michigan, in: Monthly Weather Review, 118(1), pp.138-150.  
[^8]: Ghirardelli, J.E. (2005): An Overview of the Redeveloped Localized Aviation Mos Program (Lamp) For Short-Range Forecasting.     
[^9]: de Lima, Glauston, R.T. / Stephan, S. (2013): A new classification approach for detecting severe weather patterns, in: Computers & geosciences 57 (2013): 158-165.  
[^10]: ECMWF (2023a): ERA5: data documentation. URL: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation
[^11]: ECMWF (2023c): ERA5: data documentation parameterlistings. URL: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Parameterlistings
[^12]: Scikit-learn (2023a): https://scikit-learn.org/stable/documentation.html
[^13]: Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning.

## 2. Methodology


### 2.1 Methodological Insights and Impact
This project employs a range of statistical and machine learning techniques to analyze the ERA5 dataset. The outcomes of this research are expected to not only enhance meteorological understanding but also provide practical tools for weather prediction, with implications for environmental policy and public safety.
By harnessing the power of the ERA5 dataset [^10] and advanced analytical techniques [^11], this project aims to contribute valuable insights to the field of meteorology and support informed decision-making in a world increasingly affected by weather-related challenges.

### 2.2 Methodology Overview
This project uses a multifaceted approach, primarily utilizing Design Science Research (DSR) [^12] in line with Hevner's framework [^13]. This methodological framework focuses on the creation and critical evaluation of artifacts to address specific problems. In this DSR approach, complex weather phenomena are identified as the problems to be addressed. Additionally, iterative prototyping [^14] is employed, enabling systematic refinement of models and methods based on continuous evaluation and integration of data-driven insights [^15]. The project combines the DSR cycle by Gregor / Hevner with iterative prototyping by Wilde / Hess and Goldman / Narayanaswamy. This integration fosters a dynamic environment in which each prototype's development and evaluation inform subsequent cycles of design and analysis. The result is a cycle of artifact creation (in this case, models and algorithms) specifically tailored to analyzing weather patterns in Bancroft using the ERA5 dataset. These models are refined continuously through iterative prototyping, with each iteration's outcomes informing the next cycle so that they become increasingly effective and accurate. The artifacts are then rigorously evaluated against the research questions. The results are evaluated in every iteration using cross-validation following Shao (1993) and Browne (2000) [^16].
This process is enriched by a comprehensive literature review following Webster / Watson [^17], conducted before and during the implementation, ensuring the methods and analyses remain aligned with current meteorological and data science advancements. The combination of DSR, iterative prototyping, cross-validation and literature research forms the foundation of this approach, ensuring a thorough and robust analysis that is well-suited to address the complexities in atmospheric data analysis.
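
The cross-validation step mentioned above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the feature matrix, the target, the choice of `LinearRegression`, and the five folds are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the weather features (assumption, not the ERA5 sample)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Five folds; each fold serves exactly once as the held-out evaluation set
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print("R^2 per fold:", np.round(scores, 3))
print("Mean R^2:", round(scores.mean(), 3))
```

For time-ordered observations such as hourly weather data, scikit-learn's `TimeSeriesSplit` may be a more suitable splitter than a shuffled `KFold`, because it never evaluates on data that precedes the training window.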

For the analysis the data was split into training and testing datasets according to [^22] and [^23].
[^22]: Scikit-learn (2023a): https://scikit-learn.org/stable/documentation.html
[^23]: Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning.
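
A split of this kind could look like the sketch below. The miniature dataframe is synthetic; the column names are borrowed from the data dictionary, while the 80/20 ratio and the random seed are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical miniature frame; the values are random, not ERA5 data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "avg_windspd": rng.uniform(0, 15, size=100),
    "avg_temp": rng.uniform(250, 300, size=100),
})

X = df[["avg_windspd"]]  # predictor(s)
y = df["avg_temp"]       # response

# Hold out 20 % of the rows for testing (ratio and seed are assumptions)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```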


## Results



## Discussion + Conclusion


In this exploratory data analysis project, titled "Weather Data Analysis: A Regression and Classification Approach on the ERA5 Dataset," an endeavor was made to understand and predict weather patterns in the Bancroft region of Ontario, Canada, utilizing the ERA5 dataset from 2015 to 2022. The project was motivated by the significant role that accurate weather prediction plays in various sectors and the unique climate characteristics of the region, influenced by the lake effect. The complex task of weather prediction was approached through regression and classification analyses, with the aim of modeling weather parameters and predicting weather events, including extreme conditions.

## Regression Analysis Findings
The regression analyses in chapter 5.3.4. aimed to forecast the temperature from historical data. The research question "Is it possible to build an accurate regression model to predict temperature based on historical data?" can be answered with yes. The results indicated that, while an exact daily weather forecast remains challenging, the general temperature trend over the year can be predicted with satisfactory accuracy using regression models. The Support Vector Regressor emerged as the most effective model in this context.

However, predicting the temperature from the wind speed variable in a simple linear regression analysis (5.3.1.) or from a mix of different variables in a multiple predictor linear regression analysis (5.3.2.) did not prove successful. The research question "Is it possible to find a correlation or causation between the temperature and windspeed, windgust or winddirection using regression techniques?" can therefore be answered with no. With the given data, linear regression models cannot predict the temperature from the wind variables, or from a mix of variables, with satisfactory accuracy. The analyses showed that one cause is the mostly non-linear and considerably more complex relationship between the parameters. In addition, the correlation between the parameters was not high enough for a linear regression analysis. For these reasons, the analysis was continued with logistic regression and classification techniques.
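
The following sketch illustrates, on synthetic data, why a low linear correlation leads to a poor linear regression fit even when the variables are related. The symmetric, non-linear dependence between the two variables is an assumption chosen for illustration and does not describe the actual Bancroft data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
windspeed = rng.uniform(0, 15, size=500)

# Assumed non-linear (parabolic) dependence plus noise, for illustration only
temperature = 280 + (windspeed - 7.5) ** 2 / 5 + rng.normal(scale=1.0, size=500)

# Pearson correlation only captures the linear part of the relationship
r = np.corrcoef(windspeed, temperature)[0, 1]

# For a simple linear regression, the training R^2 equals r squared
model = LinearRegression().fit(windspeed.reshape(-1, 1), temperature)
r2 = model.score(windspeed.reshape(-1, 1), temperature)

print(f"Pearson r: {r:.3f}, linear R^2: {r2:.3f}")
```

Both values stay close to zero here although temperature clearly depends on wind speed in the synthetic data, which is the situation in which non-linear models or classification approaches become the more promising route.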

Also, the SARIMAX model used for temperature and wind modeling showed a systematic bias, consistently overestimating temperatures. This highlighted the limitations of using SARIMAX for predicting temperature variations based on wind, prompting the need for further refinement and exploration of alternative modeling approaches.

The last regression analysis used logistic regression to separate extreme weather from blue sky day events in a binary classification. The research question was "Can logistic regression effectively classify and predict the occurrence of extreme or normal weather events based on temperature (or alternatively windspeed) ranges?". Predicting extreme weather events solely from temperature with logistic regression (5.3.5) proved insufficient, often misclassifying extreme events as normal. This underscored the need for more complex or multivariate approaches to anticipate hazardous weather conditions accurately. Instead of further optimizing the logistic regression approach (e.g., hyperparameter tuning or multiple predictor regression), the more promising route was to evaluate further binary classifiers, as presented in the classification analysis chapters.
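
The misclassification pattern described above can be reproduced in a small sketch. The data is synthetic: the class sizes (900 normal, 100 extreme) and the heavily overlapping temperature distributions are assumptions, chosen to show how a single weak predictor pushes logistic regression towards the majority class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score

rng = np.random.default_rng(7)

# Synthetic temperatures in degrees Celsius; both classes overlap heavily
temp_normal = rng.normal(loc=5.0, scale=10.0, size=900)
temp_extreme = rng.normal(loc=2.0, scale=10.0, size=100)

X = np.concatenate([temp_normal, temp_extreme]).reshape(-1, 1)
y = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])  # 1 = extreme

clf = LogisticRegression().fit(X, y)
y_pred = clf.predict(X)  # evaluated on the training data to keep the sketch short

# Rows: true class, columns: predicted class
cm = confusion_matrix(y, y_pred, labels=[0, 1])
recall_extreme = recall_score(y, y_pred)

print(cm)
print("Recall for extreme events:", recall_extreme)
```

Overall accuracy looks acceptable in such a setting because of the class imbalance, which is why recall on the extreme class is the more informative metric.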

## Classification Analysis Findings
In binary classification, the goal was to predict whether an observation was an extreme weather or a blue sky day event (5.4.1.). The research question was "Is it possible to classify and predict extreme weather events such as storms?". Extreme weather events can indeed be separated very accurately from blue sky day events, and both classes can be predicted with very high accuracy, precision and recall. "ExtraTreesClassifier," "XGBClassifier," "RandomForestClassifier," and "LGBMClassifier" were the top-performing classifiers in LazyClassifier's assessment. Each demonstrated high accuracy, with XGBoost slightly ahead. These models proved effective in categorizing and predicting weather events from the given data, providing valuable tools for future weather prediction endeavors. The results of this analysis could then be used in multiclass classification to determine the specific type of extreme weather event.
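
As a minimal sketch of how accuracy, precision and recall can be computed for one such binary classifier, the example below trains scikit-learn's `RandomForestClassifier` on synthetic data. The dataset from `make_classification`, the split ratio, and the hyperparameters are assumptions and do not reproduce the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic two-class problem standing in for "extreme" vs. "blue sky day"
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, class_sep=2.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
print(metrics)
```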

The multiclass classification refined the understanding of the various weather events further (5.4.2.). The goal was to determine and classify the specific type of extreme weather event, answering the research question "Is it possible to categorize and predict different extreme weather events based on multivariate weather data?". The research question can be answered with yes: various extreme weather events can be predicted and categorized with very high accuracy, precision and recall. Gradient boosting emerged as a particularly potent method, achieving high precision, recall, and F1-scores across all classes. This success illustrates the potential of sophisticated classification algorithms in deciphering complex weather patterns and predicting diverse weather events. This knowledge can also be used by scientists in further research for governmental institutions, e.g., when taking countermeasures to prevent damage from certain extreme weather events and to minimize risks and dangers.
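
Per-class precision, recall and F1-scores of the kind reported above can be obtained as in the following sketch. It uses scikit-learn's `GradientBoostingClassifier` on a synthetic three-class problem; the data, the number of classes, and the hyperparameters are assumptions, and the gradient boosting implementation used in the project may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for different weather event types
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=6, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Dictionary with per-class metrics plus overall accuracy and averages
report = classification_report(y_test, y_pred, output_dict=True)
print(classification_report(y_test, y_pred))
```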

## Critical reflection and outlook
This project's journey through regression and classification analyses of weather data has provided a deeper insight into the atmospheric dynamics of Bancroft, Ontario. While in-depth analyses of temperature trend prediction and weather event classification were carried out, the challenges and inaccuracies encountered are a reminder of the complexities inherent in meteorological studies. Predicting the weather accurately remains a daunting task, demanding continual refinement of models and methods.

Reflecting critically on the methodology and procedure, the regression analyses of wind and temperature could have been discontinued after the EDA, which showed that these variables neither correlate nor show association. However, since the scientific literature describes such analyses as valuable and other researchers have already applied these techniques successfully to this data, the approach was continued in order to further evaluate patterns and characteristics of temperature and wind parameters and their correlations. The multiple predictor analysis in particular promised interesting results, and in the end the research question and hypotheses could be refuted, which is itself a valuable contribution to science.

Furthermore, the original ERA5 dataset was reduced using PCA and forward feature selection / backward feature elimination to analyze feature relevance. Only a reduced set of parameters was used in the final analyses. Further research could analyze other variables and their association / correlation. For example, literature suggests that cloud cover has a strong influence on air temperature. Using the same methodology and implementation as in this assignment and simply switching the parameters, additional research could provide valuable insights into meteorological data and potentially yield better results.
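
The two reduction techniques named above can be sketched with scikit-learn as follows. The synthetic features, the 95 % variance threshold for PCA, the estimator, and the number of features to select are all assumptions for illustration, not the settings used in the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
# Only the first two columns drive the target (assumption for illustration)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Standardize first so that no feature dominates because of its scale
X_scaled = StandardScaler().fit_transform(X)

# PCA: keep as many components as needed to explain 95 % of the variance
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_scaled)

# Forward selection: greedily add the feature that improves the CV score most
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=5
)
selector.fit(X_scaled, y)
selected = np.flatnonzero(selector.get_support())

print("PCA components kept:", X_pca.shape[1])
print("Selected feature indices:", selected)
```

Setting `direction="backward"` in the same selector starts from all features and removes them one at a time instead.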

Moreover, the generalizability of the results has to be scrutinized, as the analyses are based on regional data, and meteorological data is always strongly shaped by regional effects. Conducting the same analysis on data for other regions, e.g., in African countries, might produce completely different results.

The unpredictability and irregularity of weather and temperature patterns is a crucial aspect that adds complexity to the analysis. Acknowledging the irregular nature of meteorological phenomena emphasizes the inherent challenges in making precise predictions, even with advanced analytical techniques.

While the analyses presented valuable insights, the potential for further optimization, including hyperparameter tuning, remains. Exploring these avenues could enhance the performance of the models and provide more accurate predictions, especially in the context of fine-tuning parameters for machine learning algorithms. 
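
A hyperparameter search of the kind suggested here could be set up as in the sketch below. The classifier, the parameter grid, the F1 scoring, and the synthetic data are assumptions chosen to keep the example small and are not a recommendation for the actual models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary problem (assumption, not the ERA5 sample)
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

# Small illustrative grid; a real search would cover more values
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="f1"
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", round(search.best_score_, 3))
```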

The exploration of weather patterns could be expanded to include a dedicated analysis of trends related to climate change. Understanding the long-term impacts on temperature and weather events could provide valuable insights into the broader environmental context.

Acknowledging that certain factors, such as climate change and omitted variable bias, might not have been explicitly addressed in the analysis, highlights the potential sources of variance and errors. Future research could delve deeper into these factors to refine models and improve predictive accuracy.
It's essential to recognize that certain aspects, possibly including external factors or variables not considered in the analysis, might influence the weather and temperature trends. Acknowledging what falls out of the scope of the current study adds a layer of humility to the findings and encourages future researchers to explore additional dimensions.


In summary, while the project has contributed significantly to understanding and predicting weather patterns, there exist additional dimensions and challenges that warrant exploration. The complexities of meteorological studies, coupled with the acknowledgment of unpredictable weather dynamics, call for continual refinement, optimization, and consideration of broader environmental factors for a comprehensive understanding of weather phenomena.
Our findings contribute to the broader discourse on weather prediction, emphasizing the need for multidimensional approaches and the potential of machine learning techniques. As climate variability continues to present profound challenges, the insights garnered here offer a stepping stone towards more accurate, reliable, and comprehensive weather forecasting methods. Moving forward, integrating more diverse datasets, refining models, and exploring new methodologies will be crucial in enhancing the predictive capabilities and understanding of weather patterns. This endeavor not only aids in better forecasting but also in strategic planning and preparedness for the diverse impacts of weather and climate change across sectors.


## Appendix

### Data Dictionary

In [None]:
data = {
    'Name': [
        'run_datetime',
        'wep',
        'avg_temp',
        'min_wet_bulb_temp',
        'avg_dewpoint',
        'avg_temp_change',
        'avg_windspd',
        'max_windgust',
        'avg_winddir',
        'wind_direction_label',
        'max_cumulative_precip',
        'max_snow_density_6',
        'max_cumulative_snow',
        'max_cumulative_ice',
        'avg_pressure_change'
        ],
    'Description': [
        'Date and time when the weather observations were recorded.',
        'Weather Event Type (WEP) is a categorization of weather conditions based on specific parameters at a given location and time.',
        'The average temperature measured at two meters above ground level, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Minimum wet bulb temperature recorded during the observation period, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Average dewpoint temperature observed during the recording period, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Average change in temperature during the observation period, obtained by calculating the difference between this observation and the following.',
        'Average wind speed measured during the recording period, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Maximum wind gust observed during the recording period, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Average wind direction (in degree) observed during the recording period, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Wind direction (in cardinal direction) observed during the recording period, obtained by recalculating the avg_winddir parameter.',
        'Maximum cumulative precipitation recorded, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Maximum snow density at a depth of 6 inches, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Maximum cumulative snow recorded, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Maximum cumulative ice recorded, considering all sensors, for the entire duration of one hour in the Bancroft region.',
        'Average change in atmospheric pressure during the observation period, considering all sensors, for the entire duration of one hour in the Bancroft region.'
        ],
    'Role': [
        'ID / predictor', 
        'response', 
        'response / predictor', 
        'predictor',
        'predictor',
        'predictor',
        'predictor',
        'predictor',
        'predictor',
        'predictor',
        'predictor',
        'predictor',
        'predictor',
        'predictor',
        'predictor'
        ],
    'Type': [
        'numerical continuous / ID', 
        'categorical nominal', 
        'numerical continuous', 
        'numerical continuous', 
        'numerical continuous', 
        'numerical continuous', 
        'numerical continuous', 
        'numerical continuous', 
        'numerical continuous', 
        'categorical nominal', 
        'numerical continuous',
        'numerical continuous',
        'numerical continuous',
        'numerical continuous',
        'numerical continuous'
        ],
    'Format': [
        type(df_full['run_datetime'][0]),
        type(df_full['wep'][0]), 
        type(df_full['avg_temp'][0]), 
        type(df_full['min_wet_bulb_temp'][0]),
        type(df_full['avg_dewpoint'][0]), 
        type(df_full['avg_temp_change'][0]), 
        type(df_full['avg_windspd'][0]), 
        type(df_full['max_windgust'][0]),
        type(df_full['avg_winddir'][0]), 
        type(df_full['wind_direction_label'][0]), 
        type(df_full['max_cumulative_precip'][0]), 
        type(df_full['max_snow_density_6'][0]),
        type(df_full['max_cumulative_snow'][0]), 
        type(df_full['max_cumulative_ice'][0]), 
        type(df_full['avg_pressure_change'][0])
]
}

data_dict_df = pd.DataFrame(data)

# Display the data dictionary
data_dict_df

### Sources

[^1]: de Lima, Glauston, R.T. / Stephan, S. (2013): A new classification approach for detecting severe weather patterns, in: Computers & geosciences 57 (2013): 158-165.
[^2]: ECMWF (2023a): ERA5: data documentation. URL: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation
[^3]: Liljequist, G.H. / Cehak, K. (1984): Allgemeine Meteorologie. 3. Auflage, Springer-Verlag.
[^4]: Fathi, M. / Haghi Kashani, M. / Jameii, S. M. / Mahdipour, E. (2022): Big Data Analytics in Weather Forecasting: A Systematic Review, in: Archives of Computational Methods in Engineering 29.2 (2022, Springer): 1247–1275
[^5]: Hjelmfelt, M.R. (1990): Numerical study of the influence of environmental conditions on lake-effect snowstorms over Lake Michigan, in: Monthly Weather Review, 118(1), pp.138-150.
[^6]: Ghirardelli, J.E. (2005): An Overview of the Redeveloped Localized Aviation Mos Program (Lamp) For Short-Range Forecasting.
[^7]: [The contribution of weather forecast information to agriculture, water, and energy sectors in East and West Africa](https://www.frontiersin.org/articles/10.3389/fenvs.2022.935696/full)
[^8]: [A Hybrid Dataset of Historical Cool-Season Lake Effects From the Eastern Great Lakes of North America](https://www.frontiersin.org/journals/water/articles/10.3389/frwa.2022.788493/full)
[^9]: [Evaluation of spatial-temporal variation performance of ERA5 precipitation data in China](https://www.nature.com/articles/s41598-021-97432-y)
[^10]: ECMWF (2023b): ERA5: reanalysis datasets for forecasts. URL: https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5
[^11]: Fathi, M. / Haghi Kashani, M. / Jameii, S. M. / Mahdipour, E. (2022): Big Data Analytics in Weather Forecasting: A Systematic Review, in: Archives of Computational Methods in Engineering 29.2 (2022, Springer): 1247–1275
[^12]: Gregor, S. / Hevner, A.R. (2013): Positioning and Presenting Design Science Research for Maximum Impact, in: MIS Quarterly, Jg. 37, Nr. 2, S. 337-355; Hevner, A. / Chatterjee, S. (2010): Design Research in Information Systems, Theory and Practice. Hrsg. von R. Sharda/S. Voß. Bd. 22. Integrated Series in Information Systems. New York, NY, USA: Springer New York, NY.; Hevner, A. / March, S.T. / Park, J. / Ram, S. (2004): Design Science in Information Systems Research, in: MIS Quarterly 28.1, S. 75–105.
[^13]: [Design Science in Information Systems Research.](https://www.researchgate.net/publication/201168946_Design_Science_in_Information_Systems_Research)
[^14]: Wilde, T. and Hess, T., 2007. Forschungsmethoden der wirtschaftsinformatik. Wirtschaftsinformatik, 4(49), pp.280-287.; Goldman, N. and Narayanaswamy, K., 1992, June. Software evolution through iterative prototyping. In Proceedings of the 14th international conference on Software engineering (pp. 158-172).
[^15]: [Reflective physical prototyping through integrated design, test, and analysis](https://www.researchgate.net/publication/220877433_Reflective_physical_prototyping_through_integrated_design_test_and_analysis)
[^16]: Shao, J., 1993. Linear model selection by cross-validation. Journal of the American statistical Association, pp.486-494.; Browne, M.W., 2000. Cross-validation methods. Journal of mathematical psychology, 44(1), pp.108-132.
[^17]: Webster, J. / Watson, R.T. (2002): Analyzing the past to prepare for the future: Writing a literature review, in: MIS Quarterly. Jun 1: xiii-xxiii.
[^18]: ECMWF (2023c): ERA5: data documentation parameterlistings. URL: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Parameterlistings
[^19]: ECMWF (2023a): ERA5: data documentation. URL: https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation
