## Introduction

In this regression project, we use **AutoGluon**, a high-level AutoML framework, to predict **Temperature** based on various weather-related features. The goal is to develop a model that can estimate temperature using attributes like humidity, wind speed, UV index, cloud cover, and more.

AutoGluon simplifies the modeling process by automatically handling preprocessing, feature engineering, model selection, and ensembling. Hyperparameter tuning is also available but is off by default. This makes AutoGluon well suited to rapid experimentation.


## Data Definitions

Here are the feature columns used in this regression task:

- `Humidity`: Humidity percentage.
- `Wind Speed`: Wind speed in km/h.
- `Precipitation (%)`: Percentage likelihood of precipitation.
- `Cloud Cover`: Cloud cover category (e.g., clear, overcast).
- `Atmospheric Pressure`: Atmospheric pressure in hPa.
- `UV Index`: Intensity of UV radiation.
- `Season`: Season in which the observation was recorded.
- `Visibility (km)`: Visibility range in kilometers.
- `Location`: Type of location (e.g., inland, coastal, mountain).

The target variable is:
- `Temperature`: Temperature in degrees Celsius, to be predicted using the features above.
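AutoGluon infers feature types from the data, so it is worth confirming that the CSV actually contains the columns listed above before training. The sketch below records those names as plain Python lists and reports any that are missing. `FEATURES`, `CATEGORICAL`, `NUMERIC`, and `missing_columns` are illustrative names. Treating `Cloud Cover`, `Season`, and `Location` as categorical is an assumption based on the descriptions above.

```python
# Column names copied from the Data Definitions; helper names are illustrative.
FEATURES = [
    "Humidity",
    "Wind Speed",
    "Precipitation (%)",
    "Cloud Cover",
    "Atmospheric Pressure",
    "UV Index",
    "Season",
    "Visibility (km)",
    "Location",
]
TARGET = "Temperature"

# Assumption: these three columns hold category labels rather than numbers.
CATEGORICAL = {"Cloud Cover", "Season", "Location"}
NUMERIC = [c for c in FEATURES if c not in CATEGORICAL]


def missing_columns(columns):
    """Return the expected feature/target columns that are absent from `columns`."""
    present = set(columns)
    return [c for c in FEATURES + [TARGET] if c not in present]
```

In the notebook, `missing_columns(df.columns)` should return an empty list once the CSV has loaded with the expected schema.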


## Model Training

We use the `TabularPredictor` from AutoGluon, specifying `regression` as the problem type and `rmse` (Root Mean Squared Error) as the evaluation metric.
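RMSE is the square root of the mean squared difference between true and predicted values. It is therefore expressed in the target's own unit, which here is °C. The minimal sketch below computes it in plain Python. The helper names `rmse` and `autogluon_style_score` are illustrative and not part of AutoGluon. The second helper only mimics the higher-is-better sign convention that AutoGluon uses when it reports error metrics.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt(mean((y - y_hat) ** 2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def autogluon_style_score(y_true, y_pred):
    """RMSE with its sign flipped, so that a higher score means a better model."""
    return -rmse(y_true, y_pred)


# Errors of 2, 1 and 3 degrees give sqrt((4 + 1 + 9) / 3), about 2.16 degrees
print(rmse([20.0, 25.0, 30.0], [22.0, 24.0, 27.0]))
```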

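AutoGluon trains several model families: gradient-boosted trees (LightGBM, XGBoost, CatBoost), random forests and extra trees, a FastAI neural network, and k-nearest neighbors. It then combines them into a weighted ensemble (`WeightedEnsemble_L2`). The `leaderboard` method scores and ranks every model. AutoGluon reports all scores so that higher is better, which means RMSE appears as a negative number. For example, a `score_test` of -10.77 corresponds to a test RMSE of about 10.77 °C.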
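The weighted ensemble combines the base models' predictions using weights learned on validation data. As far as I understand, AutoGluon chooses those weights by greedy ensemble selection. The procedure starts from an empty ensemble and repeatedly adds, with replacement, whichever model most reduces the validation error of the averaged prediction. Each model's selection count then becomes its weight. The sketch below is a simplified, hypothetical re-implementation of that idea on toy data, not AutoGluon's actual code. `greedy_ensemble_weights` and `n_iter` are illustrative names.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def greedy_ensemble_weights(preds, y_true, n_iter=10):
    """Greedy ensemble selection with replacement (simplified sketch).

    preds: dict mapping model name -> list of validation predictions.
    Returns a dict mapping each selected model name -> weight (weights sum to 1).
    """
    names = list(preds)
    counts = {name: 0 for name in names}
    running_sum = [0.0] * len(y_true)  # sum of the predictions selected so far
    for step in range(1, n_iter + 1):
        best_name, best_loss = None, math.inf
        for name in names:
            # Candidate ensemble: average of the models already chosen plus this one
            candidate = [(s + p) / step for s, p in zip(running_sum, preds[name])]
            loss = rmse(y_true, candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, preds[best_name])]
    return {name: c / n_iter for name, c in counts.items() if c > 0}


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    toy_preds = {
        "A": [2.0, 3.0, 4.0, 5.0],  # overpredicts by 1
        "B": [0.0, 1.0, 2.0, 3.0],  # underpredicts by 1
    }
    print(greedy_ensemble_weights(toy_preds, y))  # {'A': 0.5, 'B': 0.5}
```

In the toy case the two models have opposite biases, so averaging them cancels the error and the weight is split evenly. The same effect lets an ensemble beat every individual model, as `WeightedEnsemble_L2` does in the leaderboard below.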

This AutoML approach keeps the workflow short. The `fit` call takes care of data preparation and model training, and the `leaderboard` call takes care of evaluation.


```python
from autogluon.tabular import TabularPredictor
import pandas as pd
from sklearn.model_selection import train_test_split
```

```python
# Load the dataset (adjust the path to wherever the CSV is stored)
df = pd.read_csv(r"C:\Program Files\python\weather_classification_data.csv")
```

```python
# Drop the row ID and the original classification target (Weather Type)
df = df.drop(columns=["CaseId", "Weather Type"])
```

```python
# Define the regression target column
target = "Temperature"
```

```python
# Hold out 20% of the rows for testing; random_state fixes the shuffle
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)
```
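`test_size=0.2` reserves 20% of the rows for testing. To my knowledge, scikit-learn rounds a fractional test size up to a whole number of rows and assigns the remaining rows to training. The hypothetical helper below restates that rule so the resulting set sizes are easy to check. It is an illustration, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size=0.2):
    """Hypothetical helper: (n_train, n_test) for a fractional test_size.

    Mirrors the rule described above: the test count is rounded up,
    and training gets whatever is left.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a float strictly between 0 and 1")
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(1000))  # (800, 200)
print(split_sizes(13))    # (10, 3): 2.6 test rows rounds up to 3
```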

```python
# Train AutoGluon regression models, optimizing RMSE
predictor = TabularPredictor(
    label=target,
    problem_type="regression",
    eval_metric="rmse",
).fit(train_data=train_data)
```

```text
No path specified. Models will be saved in: "AutogluonModels\ag-20250416_031214"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.2
Python Version:     3.11.0
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.19045
CPU Count:          4
Memory Avail:       5.91 GB / 15.85 GB (37.3%)
Disk Space Avail:   47.90 GB / 237.93 GB (20.1%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets=
```

```python
# Score every trained model on the held-out test set
predictor.leaderboard(test_data, silent=True)
```



Test-set leaderboard (times in seconds):

| | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---:|---|---:|---:|---|---:|---:|---:|---:|---:|---:|---:|---|---:|
| 0 | WeightedEnsemble_L2 | -10.769362 | -11.733382 | root_mean_squared_error | 0.566389 | 0.104777 | 21.668882 | 0.015838 | 0.0 | 0.048856 | 2 | True | 11 |
| 1 | XGBoost | -10.841651 | -11.815297 | root_mean_squared_error | 0.033496 | 0.012964 | 0.759967 | 0.033496 | 0.012964 | 0.759967 | 1 | True | 9 |
| 2 | ExtraTreesMSE | -10.847286 | -12.070217 | root_mean_squared_error | 0.443971 | 0.080368 | 2.919394 | 0.443971 | 0.080368 | 2.919394 | 1 | True | 7 |
| 3 | LightGBM | -10.876443 | -12.01182 | root_mean_squared_error | 0.014207 | 0.008019 | 0.693834 | 0.014207 | 0.008019 | 0.693834 | 1 | True | 4 |
| 4 | LightGBMXT | -10.914173 | -12.060823 | root_mean_squared_error | 0.02546 | 0.010044 | 1.055236 | 0.02546 | 0.010044 | 1.055236 | 1 | True | 3 |
| 5 | CatBoost | -10.917013 | -11.876443 | root_mean_squared_error | 0.031247 | 0.004025 | 11.15875 | 0.031247 | 0.004025 | 11.15875 | 1 | True | 6 |
| 6 | LightGBMLarge | -11.014016 | -12.005856 | root_mean_squared_error | 0.01701 | 0.015621 | 1.087857 | 0.01701 | 0.015621 | 1.087857 | 1 | True | 10 |
| 7 | RandomForestMSE | -11.046298 | -11.986148 | root_mean_squared_error | 0.485807 | 0.087788 | 9.70131 | 0.485807 | 0.087788 | 9.70131 | 1 | True | 5 |
| 8 | NeuralNetFastAI | -11.163615 | -12.196396 | root_mean_squared_error | 0.094757 | 0.023085 | 12.1703 | 0.094757 | 0.023085 | 12.1703 | 1 | True | 8 |
| 9 | KNeighborsUnif | -12.971853 | -14.331438 | root_mean_squared_error | 0.040568 | 0.020629 | 3.052582 | 0.040568 | 0.020629 | 3.052582 | 1 | True | 1 |
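To read the leaderboard in the target's own unit, flip the sign of `score_test` back to an RMSE in °C and measure each model's gap to the best one. The sketch below does this in plain Python, using the `score_test` values copied from the table. `rmse_table` is a hypothetical helper. In the notebook, the same numbers could instead be read from the pandas DataFrame that `leaderboard` returns.

```python
# score_test values copied from the leaderboard above (AutoGluon negates RMSE)
score_test = {
    "WeightedEnsemble_L2": -10.769362,
    "XGBoost": -10.841651,
    "ExtraTreesMSE": -10.847286,
    "LightGBM": -10.876443,
    "LightGBMXT": -10.914173,
    "CatBoost": -10.917013,
    "LightGBMLarge": -11.014016,
    "RandomForestMSE": -11.046298,
    "NeuralNetFastAI": -11.163615,
    "KNeighborsUnif": -12.971853,
}


def rmse_table(scores):
    """Return (model, RMSE in °C, gap to best in °C) rows, sorted best first."""
    rmse = {name: -s for name, s in scores.items()}
    best = min(rmse.values())
    rows = [(name, r, r - best) for name, r in rmse.items()]
    return sorted(rows, key=lambda row: row[1])


for name, r, gap in rmse_table(score_test):
    print(f"{name:<20} RMSE {r:6.3f} °C  (+{gap:.3f})")
```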


## Conclusion

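AutoGluon let us build and compare a range of regression models for predicting temperature with very little code. `WeightedEnsemble_L2` was the best model by validation RMSE, and it also scored best on the held-out test set, with an RMSE of about 10.77 °C. The strongest single model, `XGBoost`, was close behind at about 10.84 °C. Together these results give a baseline for estimating temperature from the other recorded weather conditions.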

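These results required no manual tuning: training used the default `'medium'` preset. Every single model except k-nearest neighbors reached a test RMSE between about 10.84 °C and 11.16 °C, and k-nearest neighbors trailed at about 12.97 °C. This shows how quickly AutoML can produce and rank a set of reasonable models. Whether an error of roughly 11 °C is good enough depends on how widely temperatures vary in the dataset, so the next check is to compare it against a simple baseline such as always predicting the mean. Future improvements may involve retraining with additional features, performing a model interpretability analysis, or deploying the model as an API.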
