## Introduction

In this project, we use **AutoGluon**, a powerful AutoML framework, to build a multi-class classification model on a weather dataset. Our goal is to predict the **weather type** (e.g., Rainy, Cloudy, Sunny, Snowy) based on various atmospheric and environmental conditions such as temperature, humidity, wind speed, and more.

AutoGluon simplifies the modeling process by automatically preprocessing data, training a wide range of algorithms, and selecting or ensembling the best-performing models. This makes it an excellent tool for rapid and reliable model development.


## Data Definitions

Here is a brief explanation of the dataset columns:

- `Temperature`: Temperature in degrees Celsius.
- `Humidity`: Percentage of humidity in the air.
- `Wind Speed`: Wind speed in kilometers per hour.
- `Precipitation (%)`: The chance of precipitation as a percentage.
- `Cloud Cover`: The type of cloud coverage (e.g., clear, partly cloudy, overcast).
- `Atmospheric Pressure`: Pressure measured in hPa.
- `UV Index`: Strength of ultraviolet radiation.
- `Season`: Season when data was collected (e.g., Winter, Spring).
- `Visibility (km)`: How far ahead one can see clearly.
- `Location`: Type of geographic location (e.g., inland, coastal, mountain).
- `Weather Type`: **Target variable** — classifies the weather as Rainy, Sunny, Cloudy, or Snowy.
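
Before training, it can help to confirm that a loaded table actually contains these columns. The following is a minimal sketch: the helper name `missing_columns` is hypothetical (it is not part of pandas or AutoGluon), and the column names are copied from the list above.

```python
# Column names as listed in the data definitions above.
EXPECTED_COLUMNS = [
    "Temperature",
    "Humidity",
    "Wind Speed",
    "Precipitation (%)",
    "Cloud Cover",
    "Atmospheric Pressure",
    "UV Index",
    "Season",
    "Visibility (km)",
    "Location",
    "Weather Type",
]


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`.

    `columns` can be any iterable of names, for example `df.columns`.
    """
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Illustrative call with a deliberately incomplete header.
print(missing_columns(["Temperature", "Humidity", "Weather Type"]))
```

In the notebook this could be called as `missing_columns(df.columns)` right after loading the CSV. An empty list means every documented column is present.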


## Model Training

We use `TabularPredictor` from `autogluon.tabular` with `accuracy` as the evaluation metric to build our classification model. AutoGluon automatically trains several model families, including LightGBM, XGBoost, Random Forest, Extra Trees, CatBoost, KNN, and neural networks.

The final model chosen was a **Weighted Ensemble** (`WeightedEnsemble_L2`), which combines top models to optimize performance. It achieved a **validation accuracy of 93.75%** and an accuracy of about 91.6% on the held-out test set, indicating strong predictive capability.
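
To make the idea of a weighted ensemble concrete, here is a minimal sketch of the averaging step only. The probabilities and weights below are made up for illustration; AutoGluon fits its own ensemble weights, and that fitting procedure is not shown here. The function name `weighted_ensemble` and the class order are assumptions.

```python
# Assumed class order for the illustration.
CLASSES = ["Cloudy", "Rainy", "Snowy", "Sunny"]


def weighted_ensemble(probabilities, weights):
    """Blend per-model class probabilities with a weighted average."""
    total = sum(weights.values())
    blended = [0.0] * len(CLASSES)
    for model, probs in probabilities.items():
        share = weights[model] / total  # normalise so the weights sum to 1
        for i, p in enumerate(probs):
            blended[i] += share * p
    return blended


# Made-up class probabilities from three base models for a single row.
probabilities = {
    "NeuralNetFastAI": [0.10, 0.70, 0.15, 0.05],
    "LightGBM": [0.30, 0.50, 0.10, 0.10],
    "CatBoost": [0.20, 0.60, 0.10, 0.10],
}
# Made-up ensemble weights (not the ones AutoGluon learned).
weights = {"NeuralNetFastAI": 0.5, "LightGBM": 0.3, "CatBoost": 0.2}

blended = weighted_ensemble(probabilities, weights)
predicted = CLASSES[blended.index(max(blended))]
print(predicted)
```

The blended vector is still a probability distribution, and the predicted class is the one with the highest blended probability.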


In [1]:
from autogluon.tabular import TabularPredictor
import pandas as pd
from sklearn.model_selection import train_test_split

In [2]:
# Load the weather dataset (adjust the path to where the CSV is stored)
df = pd.read_csv('C:\\Program Files\\python\\weather_classification_data.csv')

In [3]:
# Drop the CaseId column so it is not used as a feature
df = df.drop(columns=['CaseId'])

In [4]:
# Hold out 20% of the rows as a test set (fixed seed for reproducibility)
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)

In [5]:
# Train with default settings, predicting 'Weather Type' and optimizing accuracy
predictor = TabularPredictor(label='Weather Type', eval_metric='accuracy').fit(train_data=train_data)

No path specified. Models will be saved in: "AutogluonModels\ag-20250416_025747"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.2
Python Version:     3.11.0
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.19045
CPU Count:          4
Memory Avail:       6.06 GB / 15.85 GB (38.2%)
Disk Space Avail:   48.09 GB / 237.93 GB (20.2%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	presets=

In [6]:
# Compare all trained models on the held-out test set
predictor.leaderboard(test_data)

If you only need to load model weights and optimizer state, use the safe `Learner.load` instead.
  warn("load_learner` uses Python's insecure pickle module, which can execute malicious arbitrary code when loading. Only load files you trust.\nIf you only need to load model weights and optimizer state, use the safe `Learner.load` instead.")


| | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NeuralNetFastAI | 0.915909 | 0.932765 | accuracy | 0.118682 | 0.046874 | 13.850857 | 0.118682 | 0.046874 | 13.850857 | 1 | True | 3 |
| 1 | WeightedEnsemble_L2 | 0.915909 | 0.9375 | accuracy | 0.686166 | 0.274891 | 21.204796 | 0.009973 | 0.005309 | 0.190673 | 2 | True | 13 |
| 2 | LightGBM | 0.91553 | 0.928977 | accuracy | 0.153589 | 0.049867 | 3.940462 | 0.153589 | 0.049867 | 3.940462 | 1 | True | 5 |
| 3 | LightGBMXT | 0.914015 | 0.928977 | accuracy | 0.300197 | 0.141629 | 6.318104 | 0.300197 | 0.141629 | 6.318104 | 1 | True | 4 |
| 4 | RandomForestEntr | 0.913636 | 0.91572 | accuracy | 0.164557 | 0.097736 | 2.430492 | 0.164557 | 0.097736 | 2.430492 | 1 | True | 7 |
| 5 | LightGBMLarge | 0.911364 | 0.924242 | accuracy | 0.572469 | 0.133645 | 7.14988 | 0.572469 | 0.133645 | 7.14988 | 1 | True | 12 |
| 6 | RandomForestGini | 0.910985 | 0.913826 | accuracy | 0.193486 | 0.08577 | 2.007129 | 0.193486 | 0.08577 | 2.007129 | 1 | True | 6 |
| 7 | CatBoost | 0.910606 | 0.920455 | accuracy | 0.021941 | 0.005996 | 45.802598 | 0.021941 | 0.005996 | 45.802598 | 1 | True | 8 |
| 8 | ExtraTreesEntr | 0.910227 | 0.918561 | accuracy | 0.218417 | 0.070396 | 1.549639 | 0.218417 | 0.070396 | 1.549639 | 1 | True | 10 |
| 9 | XGBoost | 0.90947 | 0.925189 | accuracy | 0.18351 | 0.030917 | 3.691319 | 0.18351 | 0.030917 | 3.691319 | 1 | True | 11 |
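
In the leaderboard above, `score_val` is measured on AutoGluon's internal validation split, while `score_test` is measured on the `test_data` passed to `leaderboard`. The sketch below computes the difference between the two for a few rows. The scores are copied from the leaderboard; the helper name `validation_gap` is hypothetical.

```python
# (model, score_test, score_val) copied from the leaderboard above.
leaderboard = [
    ("NeuralNetFastAI", 0.915909, 0.932765),
    ("WeightedEnsemble_L2", 0.915909, 0.9375),
    ("LightGBM", 0.91553, 0.928977),
    ("CatBoost", 0.910606, 0.920455),
    ("XGBoost", 0.90947, 0.925189),
]


def validation_gap(rows):
    """Return (model, score_val - score_test) pairs, largest gap first."""
    gaps = [(model, round(val - test, 6)) for model, test, val in rows]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


for model, gap in validation_gap(leaderboard):
    print(f"{model:<22} {gap:.4f}")
```

Every model listed here scores higher on validation than on the test set, which is why the test column is the safer estimate of performance on unseen data.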


## Conclusion

Using AutoGluon, we quickly developed a high-performing classification model to predict weather types from environmental data. The weighted ensemble had the highest validation accuracy (93.75%) and tied with NeuralNetFastAI for the best accuracy on the held-out test set (about 91.6%). This demonstrates the power of AutoML for real-world datasets: fast, accurate model building with minimal manual intervention.

Future work could include feature importance analysis, deploying the model, or retraining with more diverse data to improve generalizability.
