# Weather Wiz: An AI-Based Weather Forecasting Project

Final project for the course Machine Learning in Earth and Environmental Sciences (70938) by Prof. Efrat Morin  
Authors:  
Noam Shabat – Noam.Shabat@mail.huji.ac.il  
Tomer Vagenfeld – Tomer.vagenfeld@mail.huji.ac.il

---

## Abstract
Weather Wiz is an AI-based project to forecast ground temperature using 25 years of multi-station meteorological data provided by the Israel Meteorological Service (IMS). This project explores a range of machine learning approaches—from traditional regularized regression models to deep learning architectures such as Long Short-Term Memory networks (LSTM) and Graph Neural Networks (GNN). By capturing both temporal trends and spatial correlations, the system produces short-term forecasts.

In this paper, we describe the problem formulation, data characteristics, preprocessing and exploratory analyses, modeling methodologies, and experimental results, and discuss insights obtained in the project.

---

## 1. Introduction

Accurate weather forecasting is important in many fields, including agriculture, energy, transportation, and emergency management. The Weather Wiz project focuses on predicting ground temperature (TG) using a dataset spanning 25 years of high-resolution weather measurements. Given the complex nature of meteorological phenomena—which are influenced by humidity, wind patterns, and precipitation—this project employs a combination of linear models, ensemble methods, and deep learning techniques. In particular, the use of Graph Neural Networks enables the integration of spatial relationships among weather stations, thereby improving prediction performance.

---

## 2. Data Description

### 2.1 Data Source, Station Map, and Nature
The dataset is obtained from the [Israel Meteorological Service (IMS)](https://ims.gov.il), which records weather data from multiple stations across Israel. Observations are recorded every 10 minutes and subsequently aggregated into hourly or daily summaries. This extensive temporal coverage and multi-station design provide a detailed view of regional weather dynamics.

**Station Map:**  
The station locations were extracted directly from the IMS API, and the map was created using ArcGIS.  
![Station Map](images/station_map.png)  
*Figure 2.1 – Station Map: This map displays the geographic locations of the IMS stations across Israel.*

### 2.2 Data Extraction from the IMS API
Data extraction was performed by interfacing with the IMS API, which provides secure access to detailed meteorological data in JSON format. By specifying station identifiers and a defined date range, the system automatically queries and aggregates data on a monthly basis to address API limitations. The extracted data is then converted into structured tabular formats for further processing and analysis, ensuring complete coverage of the 25-year period.
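
The month-by-month querying described above can be sketched as follows. This is a minimal illustration of splitting a date range into calendar-month windows; the helper name `month_windows` is hypothetical, and the actual IMS request (endpoint, authentication, parameters) is deliberately left out because it is not specified here.

```python
from datetime import date, timedelta

def month_windows(start: date, end: date):
    """Yield (first_day, last_day) pairs covering [start, end], one calendar month at a time."""
    cursor = start
    while cursor <= end:
        # First day of the month that follows the cursor's month.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        # Clip the last window so it never runs past the requested end date.
        window_end = min(next_month - timedelta(days=1), end)
        yield cursor, window_end
        cursor = next_month

# Hypothetical date range, chosen only to show partial first and last months.
windows = list(month_windows(date(2000, 1, 15), date(2000, 3, 10)))
for first, last in windows:
    print(first.isoformat(), "->", last.isoformat())
```

Each window would then be passed to one API request, and the per-month results combined into a single table.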

### 2.3 Features and Target Variable
- **Features:**
  - **Meteorological Variables:** Measurements such as relative humidity (RH), wind speed (WS), wind direction (WD), and rainfall (Rain).
  - **Engineered Features:**  
    - *Temporal Attributes:* Hour of day along with sine and cosine transformations to capture cyclical patterns.  
    - *Wind Vectors:* Components derived from wind speed and wind direction that quantify directional wind influence.
- **Target:**  
  - **Ground Temperature (TG):** The variable to be predicted, representing the ground-level temperature.

### 2.4 Data Visualization
- **Feature Explanation:**  
  An illustration of the features and their descriptions is provided below.  
  ![Features explanation](images/features_explain.png)  
  *Figure 2.2 – Features Explanation: This diagram details the various meteorological variables. Image provided by the IMS.*

- **Ground Temperature Analysis:**  
  The distribution and time series of ground temperature (TG) are visualized as follows:  
  ![TG Distribution](images/TG_distribution.png)  
  *Figure 2.3 – TG Distribution: This histogram shows the distribution of ground temperature values across the dataset.*  

  ![TG Series Full](images/TG_series%20full.png)  
  *Figure 2.4 – TG Series (Full): This plot shows the complete time series of ground temperature measurements over the 25-year period.*  

  ![TG Series](images/TG_series.png)  
  *Figure 2.5 – TG Series: This plot provides a detailed view of short-term fluctuations in ground temperature.*

### 2.5 Sample Data
The table below presents sample rows extracted from the IMS API, including the main features used later in training. *Note that the stations record measurements at 10-minute intervals, and that the rows are shown as extracted, so some values (e.g., TG above 50) fall outside the plausibility ranges applied in Section 3.1.*

| Region ID | Station Name           | station_id | datetime                       | RH | Rain | TG   | WD  | WS  | Latitude | Longitude | Min Date                       | Max Date                       | Date Range (days approx.) |
|-----------|------------------------|------------|--------------------------------|----|------|------|-----|-----|----------|-----------|--------------------------------|--------------------------------|---------------------------|
| 8         | TAVOR KADOORIE        | 13         | 2010-11-30T04:00:00+02:00      | 25 | 0    | 9.1  | 306 | 1.8 | 32.7053  | 35.4069   | 2000-01-31T00:00:00+02:00      | 2024-09-30T23:50:00+03:00      | 9000                      |
| 9         | ZEMAH                 | 8          | 2022-03-31T19:50:00+03:00      | 77 | 0    | 13.8 | 201 | 0.1 | 32.7024  | 35.5839   | 2000-01-31T00:00:00+02:00      | 2024-10-31T23:50:00+02:00      | 9000                      |
| 8         | TAVOR KADOORIE        | 13         | 2017-07-31T13:20:00+03:00      | 38 | 0    | 52.8 | 302 | 2.9 | 32.7053  | 35.4069   | 2000-01-31T00:00:00+02:00      | 2024-09-30T23:50:00+03:00      | 9000                      |
| 9         | GILGAL                | 30         | 2018-06-30T21:10:00+03:00      | 45 | 0    | 27.9 | 254 | 0.8 | 31.9973  | 35.4509   | 2000-01-31T00:00:00+02:00      | 2024-12-31T23:50:00+02:00      | 9000                      |
| 9         | GILGAL                | 30         | 2022-04-30T09:20:00+03:00      | 27 | 0    | 40.4 | 121 | 0.8 | 31.9973  | 35.4509   | 2000-01-31T00:00:00+02:00      | 2024-12-31T23:50:00+02:00      | 9000                      |
| 9         | GILGAL                | 30         | 2001-07-31T12:10:00+03:00      | 39 | 0    | 51.3 | 147 | 1.8 | 31.9973  | 35.4509   | 2000-01-31T00:00:00+02:00      | 2024-12-31T23:50:00+02:00      | 9000                      |
| 8         | MEROM GOLAN PICMAN    | 10         | 2013-04-30T18:30:00+03:00      | 38 | 0    | 18.3 | 306 | 2.5 | 33.1288  | 35.8045   | 2000-01-31T00:00:00+02:00      | 2024-12-31T23:50:00+02:00      | 9000                      |
| 8         | TAVOR KADOORIE        | 13         | 2011-12-31T06:20:00+02:00      | 87 | 0    | 6.3  | 320 | 1.1 | 32.7053  | 35.4069   | 2000-01-31T00:00:00+02:00      | 2024-09-30T23:50:00+03:00      | 9000                      |
| 10        | YOTVATA               | 36         | 2012-05-31T09:40:00+03:00      | 31 | 0    | 38.8 | 6   | 2.9 | 29.8851  | 35.0771   | 2000-02-29T00:00:00+02:00      | 2024-12-31T23:50:00+02:00      | 9000                      |
| 9         | GILGAL                | 30         | 2011-02-28T05:00:00+02:00      | 80 | 0    | 8.2  | 232 | 1.6 | 31.9973  | 35.4509   | 2000-01-31T00:00:00+02:00      | 2024-12-31T23:50:00+02:00      | 9000                      |

*Table 2.1 – Sample Data: This table presents 10 rows from the dataset extracted via the IMS API, including station identifiers, timestamps, and key meteorological measurements, along with additional metadata (e.g., latitude, longitude, date range).*

---

## 3. Data Preprocessing and Exploratory Data Analysis

### 3.1 Data Cleaning and Preparation
Ensuring high data quality is essential for reliable model training. Our data cleaning process included:
- **Removal of Unrealistic Values:**  
  Data entries were filtered to enforce physical plausibility (e.g., \(0 \leq \text{RH} \leq 100\) and \(-15 \leq \text{TG} \leq 50\)). Outliers and values outside known meteorological ranges were removed.
- **Handling Missing Values:**  
  Missing or null values in critical variables were identified and addressed. Depending on the variable, rows with missing data were either dropped or imputed.
- **Datetime Conversion and Sorting:**  
  Timestamp strings were converted into datetime objects to ensure proper chronological ordering, which is critical for time series analysis.
- **Normalization and Scaling:**  
  To ensure that all input features contribute equally during model training, the data was scaled using a MinMaxScaler. The scaling transformation is defined as:
  $$
  x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},
  $$
  where \(x\) is the original feature value, and \(x_{\min}\) and \(x_{\max}\) are the minimum and maximum values of that feature. This process is crucial for models sensitive to input scale, as it improves convergence speed and stability.
- **Feature Engineering:**  
  Additional features were computed to capture underlying patterns in the data.
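
As a rough sketch of the range filtering and min-max scaling described in this list, the snippet below uses plain Python rather than the project's actual pipeline. The function names are hypothetical, and the four rows are an illustrative subset of the values in Table 2.1. Fitting the minimum and maximum on the training portion only is an assumption here (it is a common way to avoid leakage), not something the list above states.

```python
def is_plausible(row):
    """Range checks quoted in the text: 0 <= RH <= 100 and -15 <= TG <= 50."""
    return 0 <= row["RH"] <= 100 and -15 <= row["TG"] <= 50

def fit_min_max(values):
    """Return the (min, max) pair that defines the scaling."""
    return min(values), max(values)

def min_max_scale(values, lo, hi):
    """Apply x' = (x - x_min) / (x_max - x_min); constant features map to 0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

rows = [
    {"RH": 25, "TG": 9.1},
    {"RH": 77, "TG": 13.8},
    {"RH": 38, "TG": 52.8},  # TG above 50, so this row is filtered out
    {"RH": 45, "TG": 27.9},
]

clean = [r for r in rows if is_plausible(r)]
tg = [r["TG"] for r in clean]
lo, hi = fit_min_max(tg)          # in practice: fit on training data only (assumption)
scaled = min_max_scale(tg, lo, hi)
print(scaled)
```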

#### Mathematical Operations in Feature Engineering
To represent cyclical and directional information effectively, the following mathematical transformations were applied:
- **Cyclical Time Features:**  
  The hour of the day was transformed using sine and cosine functions:
  $$
  \text{hour}_{\sin} = \sin\left(\frac{2\pi \times \text{hour}}{24}\right),
  $$
  $$
  \text{hour}_{\cos} = \cos\left(\frac{2\pi \times \text{hour}}{24}\right).
  $$
  These transformations allow the model to capture the periodic nature of time.
  
- **Wind Vector Components:**  
  Wind speed (\(WS\)) and wind direction (\(WD\)) were converted from polar to Cartesian coordinates:
  $$
  \text{wind}_x = WS \times \cos\left(\frac{\pi \times WD}{180}\right),
  $$
  $$
  \text{wind}_y = WS \times \sin\left(\frac{\pi \times WD}{180}\right).
  $$
  This conversion provides a richer representation of wind dynamics.
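
The two transformations above translate directly into code. The sketch below follows the formulas as written; the function names are hypothetical, and the wind components use the same angle convention as the equations (direction in degrees converted to radians, with no additional meteorological sign convention applied).

```python
import math

def cyclical_hour(hour):
    """Map an hour in [0, 24) to a point on the unit circle."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def wind_components(ws, wd_deg):
    """Convert wind speed and direction (degrees) to Cartesian components."""
    theta = math.pi * wd_deg / 180
    return ws * math.cos(theta), ws * math.sin(theta)

def circle_distance(h1, h2):
    """Euclidean distance between two hours after the sin/cos encoding."""
    s1, c1 = cyclical_hour(h1)
    s2, c2 = cyclical_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)

# 23:00 and 00:00 are adjacent on the circle, unlike in the raw hour value.
print(circle_distance(23, 0), circle_distance(12, 0))

# A row from Table 2.1: WS = 1.8, WD = 306.
print(wind_components(1.8, 306))
```

The encoding places 23:00 next to 00:00, which a raw hour value (23 versus 0) would not, and the wind components avoid the artificial jump between 359° and 0°.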

### 3.2 Exploratory Data Analysis (EDA)
EDA was conducted to understand the structure and variability of the dataset:
- **Distribution Analysis:**  
  Histograms and density plots illustrate the distributions of temperature, humidity, and rainfall.
- **Time Series Visualization:**  
  Time series plots highlight both short-term fluctuations and long-term trends in ground temperature.
- **Correlation Analysis:**  
  Correlation matrices help identify significant relationships among features, guiding further feature engineering.
- **Missing Value Patterns:**  
  Visualizations of missing data assess data quality and inform imputation strategies.

Additional visualizations, such as zoomed-in seasonal trends and scatter plots comparing multiple variables, provided further insights that informed model development.

---

## 4. Methodology

### 4.1 Modeling Approaches and Rationale
A multi-model strategy was implemented to address different aspects of the data:
- **Lasso Regression:**  
  Used as a baseline model for its simplicity and interpretability. Its L1 regularization promotes sparsity, thereby highlighting the most relevant features.
- **Random Forest:**  
  An ensemble method that captures complex nonlinear interactions among features. Its robustness makes it well-suited for high-dimensional data.
- **Long Short-Term Memory (LSTM) Networks:**  
  LSTMs are effective at capturing long-term dependencies in sequential data, making them ideal for modeling temporal dynamics.
- **Graph Neural Networks (GNN):**  
  GNNs integrate spatial information by treating weather stations as nodes in a graph, leveraging both spatial and temporal correlations to capture localized weather patterns.

### 4.2 Explanation of Graph Neural Networks (GNN)
Graph Neural Networks (GNNs) are neural architectures designed to operate on graph-structured data. Unlike traditional neural networks that work on fixed grids or sequences, GNNs handle irregular, interconnected data. In a GNN, each node (e.g., a weather station) gathers information from its neighboring nodes through a process called message passing. During this process, each node updates its representation by combining its features with those of its neighbors via learnable functions. This approach enables the network to capture both local interactions and the overall structure of the graph, making it particularly effective for modeling spatial relationships.
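
To make the message-passing idea concrete, here is a toy sketch of a single round with mean aggregation on scalar node features. It is not the project's GNN: the node names, the graph, and the fixed weights `w_self` and `w_neigh` are all hypothetical, and in a real model the weights would be learned matrices acting on feature vectors.

```python
def message_passing_step(features, neighbors, w_self, w_neigh):
    """One round of mean-aggregation message passing followed by a ReLU."""
    updated = {}
    for node, x in features.items():
        nbrs = neighbors.get(node, [])
        # Aggregate: average the current features of the neighboring nodes.
        mean_nbr = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        # Update: combine own feature with the aggregated message.
        updated[node] = max(0.0, w_self * x + w_neigh * mean_nbr)
    return updated

# Three hypothetical stations in a chain: A - B - C.
features = {"A": 10.0, "B": 20.0, "C": 30.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

out = message_passing_step(features, neighbors, w_self=0.5, w_neigh=0.5)
print(out)  # {'A': 15.0, 'B': 20.0, 'C': 25.0}
```

After one round each station's value has moved toward its neighbors; stacking several rounds lets information travel across more than one hop.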

**Unique Edge Technique Attempt:**  
We initially attempted a “unique edge” technique to incorporate additional station-to-station relationships or domain-specific spatial distances. However, this approach required more RAM than we had available, making it impractical on our current hardware. We expect that a memory-optimized unique edge strategy would further improve the GNN’s ability to capture spatial dependencies, since it would provide more granular connectivity between stations.
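
One simple way to derive distance-based edges between stations is sketched below. This is an assumption for illustration only and is not the "unique edge" technique described above: it computes great-circle (haversine) distances from the coordinates in Table 2.1 and keeps station pairs closer than a hypothetical 100 km threshold.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Coordinates (latitude, longitude) taken from Table 2.1.
stations = {
    "TAVOR KADOORIE": (32.7053, 35.4069),
    "ZEMAH": (32.7024, 35.5839),
    "GILGAL": (31.9973, 35.4509),
    "YOTVATA": (29.8851, 35.0771),
}

def build_edges(stations, max_km):
    """Return (station_a, station_b, distance_km) for every pair within max_km."""
    names = sorted(stations)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = haversine_km(*stations[a], *stations[b])
            if d <= max_km:
                edges.append((a, b, d))
    return edges

edges = build_edges(stations, max_km=100.0)  # threshold is a hypothetical choice
for a, b, d in edges:
    print(f"{a} <-> {b}: {d:.1f} km")
```

Because the number of candidate pairs grows quadratically with the number of stations, thresholding (or keeping only the nearest neighbors) is one way to limit the memory footprint of the graph.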

### 4.3 Training and Optimization

#### 4.3.1 Optimization Functions in the Models
Each model uses an optimization strategy tailored to its structure:
- **LSTM Models – Adam Optimizer:**  
  The LSTM networks are trained using the Adam algorithm, which adapts the learning rate for each parameter based on estimates of the first and second moments of the gradients:
  $$
  m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t,
  $$
  $$
  v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2,
  $$
  $$
  \hat{m}_t = \frac{m_t}{1-\beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1-\beta_2^t},
  $$
  $$
  w_{t+1} = w_t - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},
  $$
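  where \(w_t\) denotes the model parameters and \(g_t\) the gradient at step \(t\), \(\alpha\) is the learning rate, \(\beta_1\) and \(\beta_2\) are the decay rates of the moment estimates, and \(\epsilon\) is a small constant for numerical stability.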
  
- **GNN Models – AdamW Optimizer:**  
  The GNN models use the AdamW optimizer, which decouples weight decay from the gradient update to improve generalization:
  $$
  w_{t+1} = w_t - \alpha \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda w_t \right),
  $$
  where \(\lambda\) is the weight decay coefficient.
  
- **Lasso Regression and Random Forest:**  
  Lasso Regression uses coordinate descent to solve:
  $$
  \min_{w} \; \|y - Xw\|_2^2 + \alpha \|w\|_1,
  $$
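  promoting sparsity in \(w\); here \(\alpha\) is the regularization strength, not the learning rate used in the optimizer updates above. Random Forests are not trained by gradient-based optimization: each decision tree is grown by recursive partitioning that minimizes an impurity criterion (e.g., mean squared error), and the forest averages the predictions of its trees.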
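
The Adam and AdamW updates above can be written as a single scalar step, shown here as a minimal sketch rather than the project's training code. The default values for `lr`, `beta1`, `beta2`, and `eps` are commonly used defaults, not values reported by this project; setting `weight_decay` to zero gives the Adam update, and a positive value gives the AdamW update.

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.0):
    """One parameter update; returns the new (w, m, v). t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: lambda * w is added outside the adaptive term.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# First step from w = 1.0 with gradient 2.0 (illustrative numbers).
w_adam, _, _ = adam_step(1.0, 2.0, m=0.0, v=0.0, t=1)
w_adamw, _, _ = adam_step(1.0, 2.0, m=0.0, v=0.0, t=1, weight_decay=0.01)
print(w_adam, w_adamw)
```

On the first step the bias-corrected ratio is approximately the sign of the gradient, so the Adam update moves the weight by roughly `lr`, while AdamW additionally shrinks it by `lr * weight_decay * w`.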

#### 4.3.2 Hyperparameter Tuning
Hyperparameters such as the number of hidden units, dropout rates, learning rates, and the number of epochs were tuned within predefined ranges (e.g., LSTM hidden units between 32 and 128, GNN dimensions between 32 and 256) to ensure efficient convergence and robust performance.

### 4.4 Cross-Validation and Data Folds
To evaluate model performance reliably, a custom time series cross-validation strategy was employed:
- **Chronological Splitting:**  
  The dataset is divided into sequential folds, where each fold consists of a training set (earlier time periods) and a test set (later time periods), reflecting a realistic forecasting scenario.
- **Multiple Folds:**  
  Evaluating across several folds provides insights into the stability and consistency of the model’s predictions over different time intervals.
- **Avoiding Data Leakage:**  
  Preserving the temporal order prevents future information from influencing the training process, ensuring an unbiased evaluation.
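
The exact fold layout of the custom strategy is not detailed here, so the following is a sketch of one common chronological scheme, an expanding window, in which each fold trains on everything that precedes its test block. The function name and the sample counts are hypothetical.

```python
def chronological_folds(n_samples, n_folds):
    """Expanding-window splits over time-ordered sample indices."""
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    block = n_samples // (n_folds + 1)
    folds = []
    for k in range(1, n_folds + 1):
        train_end = block * k
        # The last fold absorbs any leftover samples at the end of the series.
        test_end = block * (k + 1) if k < n_folds else n_samples
        folds.append((range(0, train_end), range(train_end, test_end)))
    return folds

folds = chronological_folds(n_samples=10, n_folds=3)
for train_idx, test_idx in folds:
    print(list(train_idx), "->", list(test_idx))
```

Every training index is strictly earlier than every test index in the same fold, which is the property that prevents future observations from leaking into training.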

---

## 5. Experimental Results

### 5.1 Evaluation Metrics
Models were assessed using the following metrics:
- **Mean Absolute Error (MAE):** The average absolute difference between predicted and actual temperatures.
- **Mean Squared Error (MSE):** The average squared difference between predicted and actual temperatures.
- **Coefficient of Determination (R² Score):** The proportion of variance in the target variable explained by the model.
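- **Median Absolute Percentage Error (MdAPE):** The median of the absolute errors expressed as a percentage of the actual values, a relative measure that is less sensitive to outliers than a mean-based percentage error.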
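
The four metrics can be computed as in the sketch below, which uses only the Python standard library and made-up numbers rather than project results. Skipping zero-valued targets in the MdAPE is an assumption made here, since the percentage error is undefined when the actual value is zero (and unstable for temperatures near zero).

```python
from statistics import mean, median

def mae(y_true, y_pred):
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])

def mse(y_true, y_pred):
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def r2(y_true, y_pred):
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def mdape(y_true, y_pred):
    # Zero targets are skipped (assumption) to avoid division by zero.
    ape = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100 * median(ape)

# Illustrative values only, not measurements or model outputs from the project.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 40.0]

print(mae(y_true, y_pred))    # 1.75
print(mse(y_true, y_pred))    # 4.25
print(r2(y_true, y_pred))     # about 0.966
print(mdape(y_true, y_pred))  # about 10 (percent)
```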

### 5.2 Visualizations of Model Performance
Several plots were generated to illustrate model performance:
- **Random Forest Results:**  
  ![Random Forest](images/random_forest.png)  
  *Figure 5.1 – Random Forest Results: This figure illustrates the performance of the Random Forest model in predicting ground temperature.*
- **Lasso Regression Results:**  
  ![Lasso](images/lasso.png)  
  *Figure 5.2 – Lasso Regression Results: This figure shows the performance of the Lasso Regression model as a baseline.*
- **LSTM Performance:**  
  ![LSTM](images/lstm.png)  
  *Figure 5.3 – LSTM Performance: This figure depicts the LSTM model's ability to capture temporal dependencies in the data.*
- **Ground Temperature Predictions:**  
  ![TG Predictions](images/TG_pred.png)  
  *Figure 5.4 – Ground Temperature Predictions: This plot compares the actual and forecasted ground temperature values over time.*  
  ![TG Predictions 2](images/TG_pred_2.png)  
  *Figure 5.5 – Ground Temperature Predictions (Alternate View): This plot provides an alternative visualization of the forecasted ground temperature.*
- **Loss Curve for GNN:**  
  ![Loss Curve](images/loss.png)  
  *Figure 5.6 – Loss Curve for GNN: This graph shows the training and validation loss trends during GNN model training.*
- **Evaluation Metrics:**  
  ![MAE](images/mae.png)  
  *Figure 5.7 – MAE: This figure displays the Mean Absolute Error for the models across the test folds.*  
  ![MSE](images/mse.png)  
  *Figure 5.8 – MSE: This figure shows the Mean Squared Error for the models.*  
  ![R²](images/r2.png)  
  *Figure 5.9 – R² Score: This figure illustrates the R² scores, indicating the proportion of variance explained by the models.*
- **Cross-Validation Summary:**  
  ![Cross-Validation Summary](images/cross.png)  
  *Figure 5.10 – Cross-Validation Summary: This bar chart summarizes the performance of different models across multiple cross-validation folds.*

### 5.3 Discussion of Results
Results indicate that:
- **Lasso Regression** provides a strong, interpretable baseline.
- **Random Forest** effectively captures nonlinear interactions, though its predictions may exhibit higher variance.
- **LSTM Networks** excel in modeling temporal dependencies, particularly for short-term forecasts.
- **GNN Models** enhance prediction accuracy by incorporating spatial correlations, especially in regions with a dense network of weather stations.

---

## 6. Discussion

The multi-model approach of Weather Wiz highlights the importance of integrating various techniques when working with complex weather data. Traditional methods, while valuable for their interpretability, are complemented by deep learning approaches that more effectively capture temporal and spatial patterns. Our custom time series cross-validation strategy further ensures that the models are rigorously evaluated on unseen data. Although further refinements in feature engineering and hyperparameter tuning may improve performance, the current results support the overall approach for academic research in weather forecasting.

---

## 7. Conclusion

Weather Wiz provides a comprehensive approach to forecasting ground temperature by combining traditional and advanced machine learning methods. By leveraging both temporal and spatial features, the project achieves a high level of predictive accuracy. Future work will involve extending the framework to additional weather parameters, refining spatial modeling techniques, and exploring longer forecasting horizons.

---

## 8. Future Work and Recommendations

- **Additional Target Variables:** Extend the predictions to other weather variables such as radiation, precipitation, and wind speed.  
- **Enhanced Spatial Modeling:** Investigate further refinements to the GNN architecture, including dynamic edge weights based on real-time data.  
- **Extended Forecast Horizons:** Experiment with longer sequence lengths and alternative temporal models to support medium- and long-term forecasts.  
- **Integration of External Data:** Consider incorporating satellite or remote sensing data to enhance the feature set and overall model robustness.

---

## Acknowledgments
This research was made possible by the data provided by the [Israel Meteorological Service (IMS)](https://ims.gov.il). We acknowledge the contributions of researchers and practitioners in machine learning and meteorology whose work has informed this project.

---

*Note: This paper emphasizes the scientific and methodological aspects of the project. For detailed technical implementation, code, and installation instructions, please refer to the supplementary documentation available in the project repository.*
