<a href="https://colab.research.google.com/github/IfeoluwaRuth/Deep-Learning/blob/main/climater_simulation_outcome_NN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The goal of this project is to compare two libraries, scikit-learn's MLPClassifier and TensorFlow, using different activation functions and optimizers on the Climate Model Simulation Crashes dataset (https://archive.ics.uci.edu/dataset/252/climate+model+simulation+crashes), in order to predict climate model simulation outcomes (fail or succeed) from the scaled values of the climate model input parameters.

This dataset contains records of simulation crashes encountered during climate model uncertainty quantification (UQ) ensembles.
Column 1: Latin hypercube study ID (study 1 to study 3)
Column 2: simulation ID (run 1 to run 180)
Columns 3-20: values of 18 climate model parameters scaled in the interval [0, 1]
Column 21: simulation outcome (0 = failure, 1 = success)
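
The column layout described above can be checked programmatically before any modelling. The sketch below is a minimal example, assuming the data has been loaded into a pandas DataFrame ordered as study ID, run ID, 18 scaled parameters, and outcome. The helper name `check_layout`, the `param_*` column names, and the synthetic frame used to exercise it are hypothetical and are not part of the dataset.

```python
import numpy as np
import pandas as pd


def check_layout(df: pd.DataFrame) -> dict:
    """Return simple layout checks for a frame shaped like the description above.

    Assumes (hypothetically) that columns are ordered as: study ID, run ID,
    18 scaled parameters, outcome.
    """
    params = df.iloc[:, 2:-1]   # columns 3-20 in the 1-based description
    outcome = df.iloc[:, -1]    # column 21
    return {
        "n_columns_ok": df.shape[1] == 21,
        "n_params_ok": params.shape[1] == 18,
        "params_in_unit_interval": bool(((params >= 0) & (params <= 1)).all().all()),
        "outcome_is_binary": bool(outcome.isin([0, 1]).all()),
    }


# Synthetic stand-in data, only to exercise the helper (not real simulation results).
rng = np.random.default_rng(0)
n_rows = 12
demo = pd.DataFrame(
    rng.random((n_rows, 18)),
    columns=[f"param_{i}" for i in range(1, 19)],
)
demo.insert(0, "Run", np.arange(1, n_rows + 1))
demo.insert(0, "Study", 1)
demo["outcome"] = rng.integers(0, 2, size=n_rows)

layout_checks = check_layout(demo)
print(layout_checks)
```

Once the real file is loaded, the same helper could be applied to that DataFrame, provided its columns follow the order assumed here.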

MLPClassifier is a class in the sklearn.neural_network module of scikit-learn, a general-purpose machine learning library that includes simple neural networks (multi-layer perceptrons). TensorFlow, by contrast, is a deep learning framework designed for building more complex and scalable neural network models.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

import tensorflow as tf


In [None]:
## import dataset
data = pd.read_csv("pop_failures.csv")

# Print column names
print("data column name:")
print(data.columns)

# Print dimensions of the dataset
print("The dimension of the dataset is:")
print(data.shape)

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
# Count the occurrences of each outcome label to check how balanced the dataset is
data_label_count = data['outcome'].value_counts()
print(data_label_count)

# Plot the label counts as a bar chart
plt.figure(figsize=(8, 5))
data_label_count.plot(kind='bar', color='skyblue')
plt.title("Distribution of climate model simulation outcomes")
plt.xlabel("Outcome (0 = failure, 1 = success)")
plt.ylabel("Count")
plt.xticks(rotation=0)
plt.savefig("data_label_hist.png")  #save the plot
plt.show()
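
Since the bar chart is meant to reveal class imbalance, it helps to know what accuracy a trivial classifier would reach by always predicting the most frequent label. The sketch below computes that majority-class baseline. The function name `majority_baseline_accuracy` and the toy labels are hypothetical and chosen only for illustration; they are not the dataset's actual counts.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Toy labels for illustration only: 9 successes (1) and 1 failure (0).
toy_labels = [1] * 9 + [0]
baseline = majority_baseline_accuracy(toy_labels)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

In this notebook the helper could be called as `majority_baseline_accuracy(data['outcome'])`. A trained model is only informative if its accuracy clearly exceeds this value.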

In [None]:
# Separate target and features
X = data.drop(columns=['outcome','Study', 'Run'])
y = data['outcome']


# Label-encode the target variable (outcome)
le = LabelEncoder()
y_encoded = le.fit_transform(y)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.3, random_state=123)
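
The split above does not pass `stratify`, so the share of failures in the training and test sets can drift from the share in the full dataset, which matters when one class is rare. A hedged variant is sketched below on synthetic labels, assuming stratification is wanted here (the original notebook does not use it). `X_demo` and `y_demo` are hypothetical stand-ins for the real features and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 18 features, 10% minority class (label 0).
rng = np.random.default_rng(123)
X_demo = rng.random((100, 18))
y_demo = np.array([0] * 10 + [1] * 90)

# stratify=y_demo keeps the class proportions (as closely as possible) in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=123, stratify=y_demo
)

print("Minority share (train):", np.mean(y_tr == 0))
print("Minority share (test): ", np.mean(y_te == 0))
```

Applied to the notebook's variables, the only change would be adding `stratify=y_encoded` to the existing `train_test_split` call.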

In [None]:
models = {
    'model_sgd': MLPClassifier(max_iter=500, hidden_layer_sizes=(300,), activation='relu', solver='sgd'),
    'model_adam': MLPClassifier(max_iter=500, hidden_layer_sizes=(300,), activation='relu', solver='adam'),
    'model_lbfgs': MLPClassifier(max_iter=500, hidden_layer_sizes=(300,), activation='relu', solver='lbfgs'),
}
# Train and evaluate each model
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, output_dict=True)
    results[name] = report

# Display the evaluation results for each model
for model_name, result in results.items():
    print(f"Model: {model_name}")
    print(f"Accuracy: {result['accuracy']:.2f}")
    print(f"Weighted F1-score: {result['weighted avg']['f1-score']:.2f}")
    print("\n")
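
Accuracy and the weighted F1-score are both dominated by the majority class, so they can look good even when failures are rarely detected. The small sketch below, assuming the failure class is encoded as 0 as in the dataset description, shows how recall for that class can be read alongside a confusion matrix. The toy arrays `y_true_demo` and `y_pred_demo` are hypothetical and are not model output from this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels: 3 true failures (0) and 7 true successes (1).
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
# Toy predictions: only one of the three failures is detected.
y_pred_demo = np.array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# Rows are true labels, columns are predicted labels, in the order given by `labels`.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
failure_recall = recall_score(y_true_demo, y_pred_demo, pos_label=0)
demo_accuracy = float(np.mean(y_true_demo == y_pred_demo))

print(cm)
print(f"Accuracy: {demo_accuracy:.2f}")
print(f"Recall for failures (class 0): {failure_recall:.2f}")
```

In this toy case the accuracy is 0.80 while only one in three failures is caught. The same comparison could be made for each trained model by passing `y_test` and `y_pred` to these functions.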