# **OPEN-ARC**
---

### Project 2: Weather Type Classification Model:
**Challenge:** Create an AI model capable of classifying the weather type based on 10 different features. Note that this data is synthetic and does not represent real-world data. This project is part of OPEN-ARC, a collaborative research project that aims to improve AI solutions for everyone.


### Terms and Use:
Learn more about the project's [LICENSE](https://github.com/Infinitode/OPEN-ARC/blob/main/LICENSE) and read our [CODE_OF_CONDUCT](https://github.com/Infinitode/OPEN-ARC/blob/main/CODE_OF_CONDUCT) before contributing to the project. You can contribute to this project from here: [https://github.com/Infinitode/OPEN-ARC/](https://github.com/Infinitode/OPEN-ARC/).

---

Please fill out this performance sheet to help others quickly see your model's performance **(optional)**:

### Performance Sheet:
| Contributor | Architecture Type | Platform | Base Model | Dataset | Accuracy | Link |
|-------------|-------------------|----------|------------|---------|----------|------|
| Infinitode  | RandomForestClassifier  | Kaggle   | ✗  | Weather Type Classification | 91.2%    | [Notebook](https://github.com/Infinitode/OPEN-ARC/Project-2-WTC/project-2-wtc.ipynb) |
| Username  | Unknown  | Kaggle   | ✗/✔  | Weather Type Classification | Score    | [Notebook](https://github.com) |

---

### Model: Decision Tree Classifier:
This model uses **Grid Search** to tune the model's hyperparameters during training. Grid Search takes a predefined grid of parameter values, trains a model for every combination in the grid, and keeps the combination with the highest cross-validated accuracy score.
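To make the idea of a grid concrete, here is a minimal sketch that enumerates the candidates a grid search would try. It uses only the standard library and copies the `param_grid` and the `cv=5` setting from the Decision Tree cell in this notebook; the names `candidates`, `cv_folds`, and `total_fits` are illustrative and not part of scikit-learn.

```python
from itertools import product

# Same grid as the Decision Tree cell in this notebook
param_grid = {
    'max_depth': [3, 5, 7, None],
    'min_samples_leaf': [1, 2, 4],
    'criterion': ['gini', 'entropy'],
}

# Each candidate takes exactly one value per parameter
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# With 5-fold cross-validation, every candidate is trained once per fold
cv_folds = 5
total_fits = len(candidates) * cv_folds

print(f"Candidates: {len(candidates)}")          # 4 * 3 * 2 = 24
print(f"Cross-validation fits: {total_fits}")    # 24 * 5 = 120
print(f"First candidate: {candidates[0]}")
```

The count does not include the final refit of the best candidate on the full training set, which `GridSearchCV` performs by default. Adding values to the grid multiplies the number of fits, so larger grids get expensive quickly.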

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
data = pd.read_csv('/kaggle/input/weather-type-classification/weather_classification_data.csv')

# Preprocess the data using a LabelEncoder to transform text data into numerical data
label_encoder = LabelEncoder()
data['Cloud Cover'] = label_encoder.fit_transform(data['Cloud Cover'])
data['Season'] = label_encoder.fit_transform(data['Season'])
data['Location'] = label_encoder.fit_transform(data['Location'])
data['Weather Type'] = label_encoder.fit_transform(data['Weather Type'])

X = data.drop('Weather Type', axis=1)
y = data['Weather Type']

# Scale the numerical features to improve the accuracy score
numerical_cols = X.select_dtypes(include=['float64', 'int64']).columns
scaler = MinMaxScaler()
X[numerical_cols] = scaler.fit_transform(X[numerical_cols])

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Tune the decision tree model using GridSearchCV
param_grid = {
    'max_depth': [3, 5, 7, None],
    'min_samples_leaf': [1, 2, 4],
    'criterion': ['gini', 'entropy']
}

dt_grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5, scoring='accuracy')
dt_grid_search.fit(X_train, y_train)

best_params = dt_grid_search.best_params_
best_dt_model = dt_grid_search.best_estimator_

# Evaluate the best decision tree model on the test set
y_pred = best_dt_model.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
print(f"Decision Tree Test Accuracy: {test_acc}")
```

Decision Tree Test Accuracy: 0.9053030303030303


Not bad: a testing accuracy of about 90.5%, but we can do better.

### Model: Random Forest Classifier:
Let's now use a **Random Forest Classifier**. This implementation does not use **Feature Selection**, because the features are roughly correlated with each other. This is one of those cases where Feature Selection makes the model perform worse (89% testing accuracy).
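If you want to check the effect of feature selection yourself, the sketch below shows one way to set up the comparison. It makes several assumptions: the data comes from `make_classification` as a stand-in and is not the weather dataset, the selection method is `SelectKBest` with `f_classif` (the notebook does not say which method was tried), and `k=6` and the helper `fit_and_score` are chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the weather dataset): 10 features, 4 classes
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=6,
    n_redundant=2, n_classes=4, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

def fit_and_score(X_tr, X_te):
    """Train a Random Forest on X_tr and return its accuracy on X_te."""
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_tr, y_train)
    return accuracy_score(y_test, model.predict(X_te))

# Baseline: all 10 features
acc_all = fit_and_score(X_train, X_test)

# Keep the 6 highest-scoring features; fit the selector on training data only
selector = SelectKBest(score_func=f_classif, k=6)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)
acc_selected = fit_and_score(X_train_sel, X_test_sel)

print(f"All features:      {acc_all:.3f}")
print(f"Selected features: {acc_selected:.3f}")
```

Which variant scores higher depends on the data, so run the same comparison on the weather dataset before deciding whether to keep or drop features.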

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.metrics import accuracy_score

# Load the dataset
data = pd.read_csv('/kaggle/input/weather-type-classification/weather_classification_data.csv')

# Initialize the LabelEncoders for the categorical features
label_encoders = {
    'Cloud Cover': LabelEncoder(),
    'Season': LabelEncoder(),
    'Location': LabelEncoder(),
    'Weather Type': LabelEncoder()
}

# Fit the LabelEncoders
for feature in ['Cloud Cover', 'Season', 'Location', 'Weather Type']:
    data[feature] = label_encoders[feature].fit_transform(data[feature])

X = data.drop('Weather Type', axis=1)
y = data['Weather Type']

# Identify numerical columns for scaling
numerical_cols = X.select_dtypes(include=['float64', 'int64']).columns

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and fit the scaler on the training data
scaler = MinMaxScaler()
X_train[numerical_cols] = scaler.fit_transform(X_train[numerical_cols])
X_test[numerical_cols] = scaler.transform(X_test[numerical_cols])

features = X_train.columns

# Create the model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42, min_samples_split=5, min_samples_leaf=1)

# Train the Random Forest model
rf_model.fit(X_train, y_train)

# Evaluate the model on the test set
y_pred = rf_model.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
print(f"Random Forest Test Accuracy: {test_acc}")

# Function to take user input and predict the type of weather
def predict_weather_type():
    input_data = {}
    for feature in features:
        if feature == "Temperature":
            print("Temperature in degrees Celsius, example 17")
        if feature == "Humidity":
            print("Humidity, ranging from 0 to above 100 (outliers), example 45")
        if feature == "Wind Speed":
            print("Wind speed in kilometers per hour, example 15")
        if feature == "Precipitation (%)":
            print("Precipitation, ranging from 0 to above 100 (outliers), example 45")
        if feature == "Cloud Cover":
            print("Cloud Coverage, (overcast, partly cloudy, clear, cloudy), example cloudy")
        if feature == "Atmospheric Pressure":
            print("Atmospheric Pressure in hPa, typically ranges from 800 to 1200, example 956")
        if feature == "UV Index":
            print("UV Index, typically ranges from 0 to 14, example 3")
        if feature == "Season":
            print("Season, (Winter, Summer, Spring, Autumn), example Winter")
        if feature == "Visibility (km)":
            print("Visibility in kilometers, ranges from 0 to 20, example 15")
        if feature == "Location":
            print("Location, (inland, mountain, coastal), example inland")
        value = input(f"Enter the value for '{feature}': ")
        if feature in label_encoders:
            # Check if the input value is within the known categories
            if value in label_encoders[feature].classes_:
                # Encode categorical features
                value = label_encoders[feature].transform([value])[0]
            else:
                print(f"Error: '{value}' is not a known category for '{feature}'. Known categories are: {label_encoders[feature].classes_}")
                return
        input_data[feature] = [float(value) if feature in numerical_cols else value]

    input_df = pd.DataFrame(input_data)

    # Scale the input data
    input_df[numerical_cols] = scaler.transform(input_df[numerical_cols])

    # Make prediction
    prediction = rf_model.predict(input_df[features])
    weather_type = label_encoders['Weather Type'].inverse_transform(prediction)
    print(f"Predicted type of weather: {weather_type[0]}")
```

Random Forest Test Accuracy: 0.9128787878787878


That is slightly better: about 91% instead of 90%. We don't want to spoil all of the fun, so have at it! You can also run the cell below to test the model yourself on your own data or on data from other datasets and sources.

```python
# Call the prediction function to test out the model
predict_weather_type()
```

### The End:
This is the end of this project notebook. Make sure to experiment and contribute to help improve the model and implementation. You can browse more free, open-source projects in our GitHub repository: [https://github.com/Infinitode/OPEN-ARC](https://github.com/Infinitode/OPEN-ARC). If you like this project, star the repo and contribute your implementation, or help others in the community.

~ Infinitode