# Introduction

The goal of this project is to classify weather conditions — **Rainy, Snowy, Cloudy, or Sunny** — using supervised machine learning techniques. 

PyCaret, a low-code machine learning library in Python, was used to automate the model training and selection process. This approach allows for rapid experimentation and comparison across multiple models with minimal coding effort.

The dataset includes various meteorological factors such as temperature, wind speed, humidity, and cloud cover. By using these features, we aim to accurately predict the **weather type**, which is valuable for environmental planning, automated alert systems, and more.


# Data Dictionary

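Below is a summary of the features in the weather dataset used for classification. The loaded file has 12 columns: the 11 listed here plus a `CaseId` column, which appears among the numeric features in the fitted pipeline.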

| Feature               | Type         | Description                                                       |
|-----------------------|--------------|-------------------------------------------------------------------|
| Temperature           | Numeric      | Temperature in degrees Celsius                                    |
| Humidity              | Numeric      | Humidity percentage (may include values >100 due to outliers)     |
| Wind Speed            | Numeric      | Wind speed measured in kilometers per hour                        |
| Precipitation (%)     | Numeric      | Percentage chance of precipitation                                |
| UV Index              | Numeric      | Strength of ultraviolet radiation                                 |
| Atmospheric Pressure  | Numeric      | Pressure in hPa (hectopascals)                                    |
| Visibility (km)       | Numeric      | Distance visible in kilometers                                    |
| Cloud Cover           | Categorical  | Description of cloud coverage (e.g., Clear, Overcast)             |
| Season                | Categorical  | Season when data was recorded (e.g., Winter, Summer)              |
| Location              | Categorical  | Area or type of location where data was collected                 |
| **Weather Type**      | Categorical  | **Target variable**: Rainy, Snowy, Cloudy, or Sunny               |
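
The note about humidity values above 100 suggests a quick range check before modeling. The following is a minimal sketch, not part of the original notebook: the sample rows are made up, the helper name `out_of_range` is hypothetical, and it assumes that both percentage columns should normally fall between 0 and 100.

```python
# Hypothetical sample rows; the values are made up for illustration only.
rows = [
    {"Humidity": 73, "Precipitation (%)": 82.0, "Weather Type": "Rainy"},
    {"Humidity": 109, "Precipitation (%)": 12.0, "Weather Type": "Sunny"},
    {"Humidity": 55, "Precipitation (%)": 104.0, "Weather Type": "Cloudy"},
]

# Assumption: columns documented as percentages should lie in [0, 100].
PERCENT_COLUMNS = ["Humidity", "Precipitation (%)"]


def out_of_range(rows, columns, low=0, high=100):
    """Return (row_index, column, value) for every value outside [low, high]."""
    return [
        (i, col, row[col])
        for i, row in enumerate(rows)
        for col in columns
        if not (low <= row[col] <= high)
    ]


flagged = out_of_range(rows, PERCENT_COLUMNS)
for i, col, value in flagged:
    print(f"row {i}: {col} = {value}")
```

Whether flagged rows should be clipped, dropped, or kept is a modeling decision; this sketch only reports them.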


# Show Your Built Model

In this notebook, **PyCaret** was used to streamline the model-building process for weather classification.

After setting up the classification environment with the dataset, PyCaret automatically compared multiple machine learning models, including:

- Logistic Regression  
- K-Nearest Neighbors  
- Decision Trees  
- Random Forest  
- Naive Bayes  
- Gradient Boosting  
- and more

Using `compare_models()`, PyCaret ranked the **Gradient Boosting Classifier** first by cross-validated accuracy (0.9124), narrowly ahead of the Random Forest Classifier (0.9122) and the Light Gradient Boosting Machine (0.9117).
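
The ranking step can be illustrated by hand. The sketch below is plain Python rather than a PyCaret call: it sorts the top five accuracy scores copied from the comparison table in this notebook and reports the gap between the first two models. The variable names are illustrative.

```python
# Mean cross-validated accuracy for the top five models, copied from the
# compare_models() output in this notebook.
cv_accuracy = {
    "gbc": 0.9124,
    "rf": 0.9122,
    "lightgbm": 0.9117,
    "et": 0.9052,
    "dt": 0.9016,
}

# Sort model IDs from highest to lowest accuracy.
ranking = sorted(cv_accuracy, key=cv_accuracy.get, reverse=True)
best, runner_up = ranking[0], ranking[1]
gap = cv_accuracy[best] - cv_accuracy[runner_up]

print(ranking)
print(f"{best} leads {runner_up} by {gap:.4f}")
```

A gap this small is likely within fold-to-fold variation, so the top three models are reasonable candidates to carry forward, especially given the differences in training time.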

Key steps performed:

1. **setup()**: Initialized the PyCaret environment with `normalize=True` and three declared categorical features; imputation and encoding were handled automatically.
2. **compare_models()**: Ranked all available classifiers by cross-validated accuracy.
3. **evaluate_model()**: Visualized and interpreted performance of the top model.
4. **finalize_model()**: Refit the best model on the full dataset, including the hold-out set.
5. **predict_model()**: Applied the finalized model to the full dataset, which includes the rows it was trained on.
6. **save_model()**: Exported the model for future use.

This workflow demonstrates the power and simplicity of PyCaret in building robust classification models quickly.


# Show All Metrics

PyCaret automatically evaluates models using multiple metrics during comparison and final evaluation.

---

### ✅ Accuracy
**Accuracy** measures the percentage of correctly classified instances.

\[
\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}
\]

It’s a good starting point for overall model performance, especially when classes are balanced.
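
As a worked example of the formula, the sketch below computes accuracy by hand in plain Python. The labels are made up for illustration and do not come from the weather dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, made up for illustration.
y_true = ["Rainy", "Sunny", "Cloudy", "Snowy", "Sunny", "Rainy", "Cloudy", "Snowy"]
y_pred = ["Rainy", "Sunny", "Rainy", "Snowy", "Sunny", "Cloudy", "Cloudy", "Snowy"]

print(accuracy(y_true, y_pred))  # 6 of 8 correct -> 0.75
```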

---

### 📉 Confusion Matrix
The **confusion matrix** shows how many instances were correctly or incorrectly classified per class.

For each class, treated as one-vs-rest, it breaks down:
- **True Positives (TP)**: Instances of the class predicted as that class  
- **False Positives (FP)**: Instances of other classes predicted as the class  
- **False Negatives (FN)**: Instances of the class predicted as another class  
- **True Negatives (TN)**: Instances of other classes not predicted as the class

This helps assess performance on a per-class basis.
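
The per-class breakdown can be sketched in plain Python. This is an illustration with made-up labels, not PyCaret's implementation; the helper names `confusion_counts` and `one_vs_rest` are hypothetical.

```python
from collections import Counter

LABELS = ["Cloudy", "Rainy", "Snowy", "Sunny"]


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def one_vs_rest(counts, label, labels=LABELS):
    """Return (TP, FP, FN, TN) for one class, treating all others as negative."""
    tp = counts[(label, label)]
    fp = sum(counts[(other, label)] for other in labels if other != label)
    fn = sum(counts[(label, other)] for other in labels if other != label)
    tn = sum(counts.values()) - tp - fp - fn
    return tp, fp, fn, tn


# Hypothetical labels, made up for illustration.
y_true = ["Rainy", "Sunny", "Cloudy", "Snowy", "Sunny", "Rainy", "Cloudy", "Snowy"]
y_pred = ["Rainy", "Sunny", "Rainy", "Snowy", "Sunny", "Cloudy", "Cloudy", "Snowy"]

counts = confusion_counts(y_true, y_pred)

# Rows are true labels, columns are predicted labels, both in LABELS order.
for true_label in LABELS:
    print(true_label.ljust(7), [counts[(true_label, p)] for p in LABELS])

print(one_vs_rest(counts, "Rainy"))  # (1, 1, 1, 5)
```

In this toy data, one Cloudy instance is predicted as Rainy and one Rainy instance as Cloudy, which is why Rainy has one false positive and one false negative.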

---

### 🧠 Classification Report
The **classification report** in PyCaret includes:

- **Precision**: Accuracy of positive predictions  
- **Recall**: Coverage of actual class instances  
- **F1-Score**: Balance between precision and recall  

Each class (Rainy, Snowy, Cloudy, Sunny) is evaluated individually. This gives a clearer picture than accuracy alone, especially in multi-class classification.
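
To make the three definitions concrete, the sketch below computes precision, recall, and F1 for each class by hand. It uses made-up labels and a hypothetical helper, `per_class_report`; it is not PyCaret's own report code.

```python
def per_class_report(y_true, y_pred, labels):
    """Compute precision, recall, and F1 for each label (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero when a class is never predicted
        # or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical labels, made up for illustration.
y_true = ["Rainy", "Sunny", "Cloudy", "Snowy", "Sunny", "Rainy", "Cloudy", "Snowy"]
y_pred = ["Rainy", "Sunny", "Rainy", "Snowy", "Sunny", "Cloudy", "Cloudy", "Snowy"]

report = per_class_report(y_true, y_pred, ["Cloudy", "Rainy", "Snowy", "Sunny"])
for label, scores in report.items():
    print(label, scores)
```

Here overall accuracy is 0.75, but the per-class view shows that all of the errors are confusions between Rainy and Cloudy, while Snowy and Sunny are classified perfectly.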

---

PyCaret makes it easy to visualize these metrics with `evaluate_model()` and view predictions with `predict_model()`.


In [1]:
# Step 1: Importing necessary libraries
import pandas as pd
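# Import only the PyCaret functions used in this notebook
from pycaret.classification import (
    setup,
    compare_models,
    evaluate_model,
    finalize_model,
    predict_model,
    save_model,
)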

In [2]:
# Step 2: Load the dataset
df = pd.read_csv('C:\\Program Files\\python\\weather_classification_data.csv')

In [3]:
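# Step 3: Set up the PyCaret environment
# Note: CaseId is not excluded here, so PyCaret treats it as a numeric feature.
# Passing ignore_features=['CaseId'] to setup() would drop it.
clf = setup(data=df, target='Weather Type', session_id=123, normalize=True,
            categorical_features=['Season', 'Location', 'Cloud Cover'])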

|   | Description                 | Value                                   |
|---|-----------------------------|-----------------------------------------|
| 0 | Session id                  | 123                                     |
| 1 | Target                      | Weather Type                            |
| 2 | Target type                 | Multiclass                              |
| 3 | Target mapping              | Cloudy: 0, Rainy: 1, Snowy: 2, Sunny: 3 |
| 4 | Original data shape         | (13200, 12)                             |
| 5 | Transformed data shape      | (13200, 20)                             |
| 6 | Transformed train set shape | (9240, 20)                              |
| 7 | Transformed test set shape  | (3960, 20)                              |
| 8 | Numeric features            | 8                                       |
| 9 | Categorical features        | 3                                       |


In [4]:
# Step 4: Compare models
best_model = compare_models()

|          | Model                           | Accuracy | AUC    | Recall | Prec.  | F1     | Kappa  | MCC    | TT (Sec) |
|----------|---------------------------------|----------|--------|--------|--------|--------|--------|--------|----------|
| gbc      | Gradient Boosting Classifier    | 0.9124   | 0.0    | 0.9124 | 0.913  | 0.9125 | 0.8833 | 0.8834 | 3.732    |
| rf       | Random Forest Classifier        | 0.9122   | 0.9933 | 0.9122 | 0.9129 | 0.9123 | 0.883  | 0.8831 | 0.515    |
| lightgbm | Light Gradient Boosting Machine | 0.9117   | 0.9932 | 0.9117 | 0.9121 | 0.9117 | 0.8823 | 0.8823 | 0.685    |
| et       | Extra Trees Classifier          | 0.9052   | 0.991  | 0.9052 | 0.9071 | 0.9056 | 0.8736 | 0.874  | 0.456    |
| dt       | Decision Tree Classifier        | 0.9016   | 0.9344 | 0.9016 | 0.902  | 0.9016 | 0.8688 | 0.869  | 0.099    |
| knn      | K Neighbors Classifier          | 0.8863   | 0.9679 | 0.8863 | 0.8882 | 0.8867 | 0.8483 | 0.8487 | 0.153    |
| lr       | Logistic Regression             | 0.8679   | 0.0    | 0.8679 | 0.8684 | 0.8677 | 0.8238 | 0.824  | 0.598    |
| ada      | Ada Boost Classifier            | 0.8613   | 0.0    | 0.8613 | 0.8669 | 0.8621 | 0.815  | 0.8163 | 0.387    |
| svm      | SVM - Linear Kernel             | 0.8404   | 0.0    | 0.8404 | 0.8437 | 0.8404 | 0.7872 | 0.7882 | 0.14     |
| lda      | Linear Discriminant Analysis    | 0.8248   | 0.0    | 0.8248 | 0.8424 | 0.8256 | 0.7664 | 0.7708 | 0.114    |


In [5]:
# Step 5: Evaluate the best model
evaluate_model(best_model)

In [6]:
# Step 6: Finalize the best model (refits it on the full dataset, including the hold-out set)
final_model = finalize_model(best_model)

In [7]:
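# Step 7: Predict on the full dataset.
# These rows were used to fit final_model, so the scores below are optimistic;
# pass unseen data to predict_model() for an honest estimate.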
predictions = predict_model(final_model, data=df)

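|   | Model                        | Accuracy | AUC    | Recall | Prec.  | F1     | Kappa  | MCC    |
|---|------------------------------|----------|--------|--------|--------|--------|--------|--------|
| 0 | Gradient Boosting Classifier | 0.9432   | 0.9964 | 0.9432 | 0.9435 | 0.9432 | 0.9242 | 0.9243 |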


In [8]:
# Step 8: Save the model
save_model(final_model, 'best_pycaret_weather_model')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('label_encoding',
                  TransformerWrapperWithInverse(exclude=None, include=None,
                                                transformer=LabelEncoder())),
                 ('numerical_imputer',
                  TransformerWrapper(exclude=None,
                                     include=['CaseId', 'Temperature',
                                              'Humidity', 'Wind Speed',
                                              'Precipitation (%)',
                                              'Atmospheric Pressure', 'UV Index',
                                              'Visibility (km)'],
                                     transformer=SimpleImputer...
                                             criterion='friedman_mse', init=None,
                                             learning_rate=0.1, loss='log_loss',
                                             max_depth=3, max_features=None,
                

# Summary

In this notebook, we leveraged **PyCaret** to build a machine learning model that classifies weather conditions into categories: **Rainy, Snowy, Cloudy, or Sunny**.

By using PyCaret's automated machine learning workflow, we were able to:

- Efficiently preprocess and encode the data
- Automatically compare multiple classification models
- Select the best model, the **Gradient Boosting Classifier**, based on cross-validated accuracy
- Evaluate the model using built-in visualization tools and metrics
- Save the trained model for future use

---

### Key Insights:

- PyCaret significantly reduced the manual effort required to build a strong baseline model.
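- The selected model reached about 91.2% mean cross-validated accuracy. The 94.3% reported by `predict_model()` was measured on data that includes the training rows, so it overstates performance on unseen data.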
- The project demonstrates the power of low-code tools for fast prototyping and reliable results in classification tasks.

This model could be expanded in the future with real-time data integration, hyperparameter tuning, or deployment as part of a larger weather prediction system.

---

🎯 **End of PyCaret Notebook**
