This project focuses on predicting weather conditions based on meteorological data. Using a dataset of daily weather observations, the project applies various Machine Learning algorithms to classify the weather into categories such as Rain, Sun, Drizzle, Snow, and Fog.
- `Model.ipynb`: The Jupyter Notebook containing the end-to-end workflow: data preparation, visualization, preprocessing, and model training.
- `seattle-weather.csv`: The dataset containing historical weather data used for training and testing.

The dataset consists of daily records with the following columns:
| Column | Description |
|---|---|
| date | The date of the observation |
| precipitation | Amount of precipitation |
| temp_max | Maximum temperature recorded |
| temp_min | Minimum temperature recorded |
| wind | Wind speed |
| weather | Target Variable (drizzle, rain, sun, snow, fog) |
The project is built using Python and the following libraries:
- Data Manipulation: `pandas`, `numpy`
- Visualization: `seaborn`, `matplotlib`
- Machine Learning (Scikit-Learn): `RandomForestClassifier`, `GradientBoostingClassifier`, `AdaBoostClassifier`, `LogisticRegression`, `SVC`, `DecisionTreeClassifier`
- Preprocessing: `imblearn` (`RandomOverSampler`), `StandardScaler`
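As a rough illustration of how the libraries listed above can fit together, the sketch below fits each listed classifier on the same split and records its test accuracy. It is not taken from the notebook: the data comes from `make_classification` as a stand-in for the weather features, the split size and `random_state` values are arbitrary, and wrapping `LogisticRegression` and `SVC` in a `StandardScaler` pipeline is an assumption about where scaling would be useful.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 5 numeric features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=2,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The six classifiers named in the library list. Scaling is applied only to
# the two models that are sensitive to feature scale (an assumption).
models = {
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=0),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=0),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVC": make_pipeline(StandardScaler(), SVC()),
}

# Fit every model on the same training split and record test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {score:.3f}")
```

Putting the scaler inside a pipeline means it is fitted on the training split only, so no statistics from the test rows leak into the model.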
To improve model performance, the following preprocessing steps were applied:
- Feature Engineering:
  - Extracted the month from the `date` column to capture seasonal trends.
  - Dropped the original `date` column after extraction.
- Handling Imbalanced Data:
  - Analyzed the target distribution and identified class imbalance (e.g., significantly more "Sun" or "Rain" days than "Snow").
  - Applied `RandomOverSampler` to balance the dataset, ensuring the model learns equally from all weather types.
- Data Splitting:
  - Split the data into training and testing sets to evaluate performance on unseen data.
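The preprocessing steps above can be sketched on a tiny made-up table with the same columns as `seattle-weather.csv`. This is a minimal illustration, not the notebook's code: the rows are invented, the `test_size` and `random_state` values are assumptions, and the oversampling is written by hand with pandas to show the idea. The notebook uses `RandomOverSampler` from `imblearn`, whose `fit_resample(X, y)` call does this duplication for you.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented rows that mimic the dataset's columns (values are illustrative).
df = pd.DataFrame({
    "date": ["2012-01-01", "2012-01-02", "2012-01-03", "2012-01-04",
             "2012-07-04", "2012-07-05", "2012-07-06", "2012-12-18"],
    "precipitation": [0.0, 10.9, 0.8, 20.3, 0.0, 0.0, 0.0, 3.3],
    "temp_max": [12.8, 10.6, 11.7, 12.2, 24.4, 26.1, 27.8, 3.9],
    "temp_min": [5.0, 2.8, 7.2, 5.6, 12.2, 13.3, 14.4, -0.5],
    "wind": [4.7, 4.5, 2.3, 4.7, 2.1, 2.6, 3.0, 5.1],
    "weather": ["rain", "rain", "rain", "rain", "sun", "sun", "sun", "snow"],
})

# Feature engineering: keep the month as a seasonal signal, drop the raw date.
df["month"] = pd.to_datetime(df["date"]).dt.month
df = df.drop(columns="date")

# Random oversampling by hand: keep every original row and add randomly
# duplicated rows until each class matches the largest class.
largest = df["weather"].value_counts().max()
parts = []
for _, group in df.groupby("weather"):
    extra = group.sample(n=largest - len(group), replace=True, random_state=0)
    parts.append(pd.concat([group, extra]))
balanced = pd.concat(parts, ignore_index=True)

# Split features and target, then hold out a test set (sizes are assumptions).
X = balanced.drop(columns="weather")
y = balanced["weather"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

print(balanced["weather"].value_counts())
print(X_train.shape, X_test.shape)
```

The order here follows the list above. Because oversampling happens before the split, copies of the same row can land in both the training and the test set, which may make test scores look better than they would on genuinely new data.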
The notebook explores multiple classification algorithms. The primary workflow includes:
- Training: Fitting models (like Random Forest) on the oversampled training data.
- Evaluation: Using metrics such as Accuracy Score, Confusion Matrix, and Classification Report to assess performance.
- Prediction: Testing the model with custom inputs (e.g., specific month, temp, and wind conditions) to predict the weather.
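The train, evaluate, and predict loop described above might look like the following sketch. Everything about the data is an assumption: the feature values are random, and the labelling rule (more than 5 units of precipitation means "rain") is invented so the example runs without the CSV. The hyperparameters and the custom input values are likewise illustrative rather than the notebook's.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Random stand-in features with the same names as the engineered dataset.
rng = np.random.default_rng(0)
n = 300
temp_max = rng.uniform(0, 30, n).round(1)
X = pd.DataFrame({
    "precipitation": rng.uniform(0, 20, n).round(1),
    "temp_max": temp_max,
    "temp_min": (temp_max - rng.uniform(2, 10, n)).round(1),
    "wind": rng.uniform(0, 8, n).round(1),
    "month": rng.integers(1, 13, n),
})
# Toy labelling rule, invented for this sketch only.
y = np.where(X["precipitation"] > 5, "rain", "sun")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Training: fit a Random Forest on the training split.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluation: accuracy, confusion matrix, and per-class report.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
print(f"accuracy: {accuracy:.3f}")
print(cm)
print(classification_report(y_test, y_pred))

# Prediction: a single custom observation, with columns in training order.
sample = pd.DataFrame(
    [{"precipitation": 12.0, "temp_max": 9.0, "temp_min": 4.0,
      "wind": 3.5, "month": 11}],
    columns=X.columns,
)
prediction = model.predict(sample)[0]
print("predicted weather:", prediction)
```

Passing `labels=model.classes_` to `confusion_matrix` fixes the row and column order, which makes the matrix easier to read alongside the classification report.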
- Install the required dependencies: `pip install pandas numpy seaborn matplotlib scikit-learn imbalanced-learn`
- Ensure `seattle-weather.csv` is in the project directory.
- Open and run `Model.ipynb` to see the analysis and predictions.

