# Introduction

The purpose of this project is to classify weather conditions into one of four categories: **Rainy, Snowy, Cloudy, or Sunny** using supervised machine learning. 

The dataset includes a variety of meteorological features such as:
- Temperature
- Humidity
- Wind Speed
- Precipitation
- UV Index
- Cloud Cover
- Season
- Location

By training a classification model, we aim to predict the **weather type** based on these inputs. This model can be helpful in weather prediction systems, environmental monitoring, and decision-making tools that depend on weather conditions.


# Data Dictionary

Below is a summary of the key features used in this weather classification project:

| Feature               | Type         | Description                                                       |
|-----------------------|--------------|-------------------------------------------------------------------|
| Temperature           | Numeric      | Temperature in degrees Celsius                                    |
| Humidity              | Numeric      | Humidity percentage (may include values >100 due to outliers)     |
| Wind Speed            | Numeric      | Wind speed measured in kilometers per hour                        |
| Precipitation (%)     | Numeric      | Percentage of precipitation likelihood                            |
| UV Index              | Numeric      | Intensity of ultraviolet radiation                                |
| Atmospheric Pressure  | Numeric      | Pressure in hPa (hectopascals)                                    |
| Visibility (km)       | Numeric      | Distance visible in kilometers                                    |
| Cloud Cover           | Categorical  | Description of cloud coverage (e.g., Clear, Partly Cloudy)        |
| Season                | Categorical  | Season when the data was recorded (e.g., Winter, Summer)          |
| Location              | Categorical  | Type or area of location where the data was collected             |
| **Weather Type**      | Categorical  | **Target variable**: Rainy, Snowy, Cloudy, or Sunny               |


# Show Your Built Model

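For this classification task, the **Random Forest Classifier** was chosen for its strong performance on data that mixes numeric and categorical features. Because the scikit-learn implementation expects numeric input, the categorical features are one-hot encoded before training.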

Random Forest is an ensemble learning method that builds multiple decision trees during training and outputs the mode of the classes for classification. It helps reduce overfitting and improves accuracy by combining predictions from multiple trees.
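
As a minimal illustration of "outputs the mode of the classes", the sketch below takes one prediction per tree and returns the most common label. The function name `majority_vote` and the five votes are hypothetical, made up for this example. Note that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than counting hard votes, so this shows the classical idea, not the library's internals.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common class label among the per-tree predictions."""
    # most_common(1) returns a list with one (label, count) pair
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Hypothetical votes from five trees for a single observation
votes = ["Rainy", "Cloudy", "Rainy", "Snowy", "Rainy"]
prediction = majority_vote(votes)
print(prediction)  # Rainy
```

Three of the five trees agree on "Rainy", so the two dissenting trees are outvoted; this is how combining many trees smooths out the errors of individual ones.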

The model was implemented using **scikit-learn** with the following steps:

1. Data preprocessing:  
   - One-hot encoding of categorical variables  
   - Standardization using `StandardScaler`  
2. Data split:  
   - 80% for training, 20% for testing  
3. Model building:  
   - `RandomForestClassifier(random_state=123)`  
4. Model training and prediction on the test set

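Evaluating on the held-out 20% gives an estimate of how well the model generalizes to unseen data. Note that in the code below the scaler is fit on the full dataset before the split. Tree-based models are insensitive to feature scaling, so the effect here is negligible, but fitting preprocessing steps on the training set only is the safer habit.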
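
The steps above can also be sketched as a single scikit-learn `Pipeline` that is fit after the split, so the scaler and encoder only ever see training rows. This is an alternative arrangement, not the code used in this notebook: it swaps `pd.get_dummies` for `OneHotEncoder`, and the small DataFrame is synthetic data made up for illustration, with only three of the real feature columns.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic stand-in for the weather data (values are made up)
df = pd.DataFrame({
    "Temperature": [30, 28, -2, -5, 12, 10, 15, 14, 31, -3, 11, 16],
    "Humidity": [40, 45, 85, 90, 70, 75, 95, 92, 38, 88, 72, 96],
    "Season": ["Summer", "Summer", "Winter", "Winter", "Autumn", "Autumn",
               "Spring", "Spring", "Summer", "Winter", "Autumn", "Spring"],
    "Weather Type": ["Sunny", "Sunny", "Snowy", "Snowy", "Cloudy", "Cloudy",
                     "Rainy", "Rainy", "Sunny", "Snowy", "Cloudy", "Rainy"],
})

X = df.drop("Weather Type", axis=1)
y = df["Weather Type"]

# Split first so the test rows never influence the fitted preprocessing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    # handle_unknown="ignore" avoids errors on categories unseen in training
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(random_state=123)),
])

pipeline.fit(X_train, y_train)     # scaler and encoder are fit on training rows only
y_pred = pipeline.predict(X_test)  # the same fitted transforms are applied to test rows
```

Keeping preprocessing inside the pipeline also means the fitted object can be applied to new data in one call, without repeating the encoding and scaling steps by hand.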


# Show All Metrics

The performance of the Random Forest Classifier was evaluated using several key classification metrics:

---

### ✅ Accuracy
**Accuracy** measures the proportion of correct predictions out of all predictions.

$$
\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}
$$

This gives a quick overall picture of how well the model is doing.
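
The formula can be written out in a few lines of plain Python. The helper name `accuracy` and the five labels below are hypothetical, chosen so the arithmetic is easy to follow. In the notebook itself this calculation is done by `accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 4 of the 5 predictions are correct
y_true = ["Rainy", "Sunny", "Snowy", "Cloudy", "Sunny"]
y_pred = ["Rainy", "Sunny", "Snowy", "Rainy", "Sunny"]
print(accuracy(y_true, y_pred))  # 0.8
```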

---

### 📉 Confusion Matrix
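A **confusion matrix** provides a breakdown of actual vs. predicted classes. In scikit-learn's output, each row is an actual class and each column is a predicted class, so the diagonal holds the correct predictions and every off-diagonal cell counts one specific kind of mistake (for example, actual Cloudy predicted as Rainy). With four classes, the usual binary counts are read per class, treating that class as positive and the other three as negative:

- **True Positives (TP)**: Instances of the class that were predicted as that class
- **True Negatives (TN)**: Instances of other classes that were not predicted as that class
- **False Positives (FP)**: Instances of other classes incorrectly predicted as that class
- **False Negatives (FN)**: Instances of the class incorrectly predicted as another class

This helps in evaluating how well the model distinguishes between classes.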
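
For a problem with more than two classes, these four counts are computed per class by treating that class as positive and every other class as negative. The sketch below does this for the confusion matrix printed in this notebook's output, where rows are actual classes and columns are predicted classes. The helper name `one_vs_rest_counts` is hypothetical.

```python
# Confusion matrix from this notebook's output.
# Rows are actual classes, columns are predicted classes,
# both in the order Cloudy, Rainy, Snowy, Sunny.
labels = ["Cloudy", "Rainy", "Snowy", "Sunny"]
cm = [
    [611, 28, 10, 18],
    [30, 595, 8, 14],
    [18, 21, 629, 17],
    [23, 19, 13, 586],
]


def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class index k, with all other classes as negative."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # actual k, predicted as something else
    fp = sum(row[k] for row in cm) - tp  # predicted k, actually something else
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


for k, name in enumerate(labels):
    print(name, one_vs_rest_counts(cm, k))
```

For Cloudy, for instance, this gives 611 true positives, 71 false positives, 56 false negatives, and 1902 true negatives.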

---

### 🧠 Classification Report
This report includes:

- **Precision**: Of all instances predicted as a class, how many were correct?
- **Recall**: Of all actual instances of a class, how many did we catch?
- **F1-Score**: The harmonic mean of precision and recall. Best when both are important.

Each class (Rainy, Snowy, Cloudy, Sunny) gets its own set of scores to show how well the model performs across categories.
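
To make the three definitions concrete, here is a small sketch that computes them from one class's counts. The function name `precision_recall_f1` and the counts (8 true positives, 2 false positives, 4 false negatives) are hypothetical values picked for easy arithmetic.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from one class's counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts for a single class
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.8 0.67 0.73
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values (0.73 here, against an arithmetic mean of about 0.73 only when the two are close; the gap widens as precision and recall diverge).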

---

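Together, these metrics give complementary views of the model’s performance: accuracy summarizes the overall result, while the confusion matrix and per-class scores show where the errors occur.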


In [1]:
# Step 1: Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

In [2]:
# Step 2: Load dataset
df = pd.read_csv('C:\\Program Files\\python\\weather_classification_data.csv')

In [3]:
# Step 3: Encode categorical features and separate target
X = df.drop('Weather Type', axis=1)
y = df['Weather Type']

# One-hot encode categorical variables
X = pd.get_dummies(X, drop_first=True)

In [4]:
# Step 4: Feature scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [5]:
# Step 5: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=123)

In [6]:
# Step 6: Train Random Forest model
model = RandomForestClassifier(random_state=123)
model.fit(X_train, y_train)

In [7]:
# Step 7: Predictions and evaluation
y_pred = model.predict(X_test)

In [8]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Accuracy: 0.9170454545454545

Classification Report:
               precision    recall  f1-score   support

      Cloudy       0.90      0.92      0.91       667
       Rainy       0.90      0.92      0.91       647
       Snowy       0.95      0.92      0.94       685
       Sunny       0.92      0.91      0.92       641

    accuracy                           0.92      2640
   macro avg       0.92      0.92      0.92      2640
weighted avg       0.92      0.92      0.92      2640


Confusion Matrix:
 [[611  28  10  18]
 [ 30 595   8  14]
 [ 18  21 629  17]
 [ 23  19  13 586]]
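
The summary rows of the report can be reproduced from the confusion matrix alone. The sketch below, written in plain Python as an illustrative check rather than part of the original workflow, recomputes the accuracy and the `macro avg` and `weighted avg` F1-scores: the macro average weights every class equally, while the weighted average weights each class by its support.

```python
# Confusion matrix printed above (rows: actual, columns: predicted),
# in the order Cloudy, Rainy, Snowy, Sunny
cm = [
    [611, 28, 10, 18],
    [30, 595, 8, 14],
    [18, 21, 629, 17],
    [23, 19, 13, 586],
]
n = len(cm)

support = [sum(row) for row in cm]                          # actual instances per class
predicted = [sum(row[k] for row in cm) for k in range(n)]   # predictions per class

f1_scores = []
for k in range(n):
    tp = cm[k][k]
    precision = tp / predicted[k]
    recall = tp / support[k]
    f1_scores.append(2 * precision * recall / (precision + recall))

accuracy = sum(cm[k][k] for k in range(n)) / sum(support)
macro_f1 = sum(f1_scores) / n
weighted_f1 = sum(f * s for f, s in zip(f1_scores, support)) / sum(support)

print(round(accuracy, 4), round(macro_f1, 2), round(weighted_f1, 2))  # 0.917 0.92 0.92
```

The two averages agree here because the four classes have similar support (641 to 685 test samples each); on an imbalanced dataset they can differ noticeably.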


In [None]:
# GitHub link to the data profile report page "dfWeatherProfileReport":
# https://github.com/ThomasTosin/Pycaret-and-SKlearn-Classification/blob/main/dfWeatherProfileReport.html

# Summary

In this project, we successfully built a weather classification model using a supervised learning approach.

Using the **Random Forest Classifier**, we reached about 91.7% accuracy on the held-out test set (2,640 samples) by leveraging a diverse set of weather-related features, including temperature, humidity, precipitation, and cloud cover. The model was trained on 80% of the data and evaluated on the remaining 20% using accuracy, a confusion matrix, and a detailed classification report.

---

### Key Takeaways:

- **Random Forest** was selected due to its high performance in PyCaret and proven effectiveness in scikit-learn.
- **Accuracy** (about 0.92) and class-specific metrics (precision, recall, and F1-score, all between 0.90 and 0.95) show that the model performs consistently across the four weather types.
- The project showcases the complete machine learning workflow: from data preparation to model evaluation and explanation.

This model can be expanded further by incorporating more real-time or historical weather data, fine-tuning hyperparameters, or deploying it as part of a forecasting tool.

---

🎯 **End of Notebook**
