<a href="https://colab.research.google.com/github/aimee-annabelle/Peer_Group_10_Water_Quality_Model/blob/Armand/Armand_Kayiranga_formative_II.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise - Creating our own custom Model

This notebook gives a quick overview of how to create your own custom model. You will build a simple binary classifier using Keras and TensorFlow.

## Water Quality Dataset

This dataset contains water quality measurements and assessments related to potability, which is the suitability of water for human consumption. The dataset's primary objective is to provide insights into water quality parameters and assist in determining whether the water is potable or not. Each row in the dataset represents a water sample with specific attributes, and the "Potability" column indicates whether the water is suitable for consumption.

https://www.kaggle.com/datasets/uom190346a/water-quality-and-potability?select=water_potability.csv

In [None]:
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load the data
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/aimee-annabelle/Peer_Group_10_Water_Quality_Model/main/water_potability.csv")
df.info()  # info() prints its summary directly and returns None, so no print() is needed
print(df.head())

In [None]:
df.isnull().sum()

In [None]:
missing_percentage = df.isnull().mean() * 100
print(missing_percentage)

In [None]:
# Fill missing values in each column with that column's median
df.fillna(df.median(), inplace=True)
df.isnull().sum()

In [None]:
df.describe()

Plot the Data Appropriately
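
The heading above has no accompanying cell. One possible way to fill it is sketched here, assuming `df` is the imputed DataFrame from the earlier cells and that per-feature histograms split by `Potability` are a suitable view. `plot_feature_histograms` is a hypothetical helper name, not part of the original notebook.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_feature_histograms(frame: pd.DataFrame, target: str = "Potability", bins: int = 30):
    """Draw one histogram per feature, overlaying the target classes.

    Hypothetical helper: `frame` is assumed to hold numeric feature columns
    plus a binary `target` column coded 0/1.
    """
    features = [col for col in frame.columns if col != target]
    n_cols = 3
    n_rows = max(1, math.ceil(len(features) / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(5 * n_cols, 3.5 * n_rows),
                             squeeze=False)

    for ax, col in zip(axes.ravel(), features):
        # One semi-transparent histogram per class so the distributions can be compared.
        for label, group in frame.groupby(target):
            ax.hist(group[col].dropna(), bins=bins, alpha=0.5,
                    label=f"{target}={label}")
        ax.set_title(col)
        ax.legend()

    # Hide any unused axes in the grid.
    for ax in axes.ravel()[len(features):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig
```

In the notebook this could be called as `plot_feature_histograms(df)` followed by `plt.show()`.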

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X = df.drop("Potability", axis=1)
y = df["Potability"]
# Split first, then fit the scaler on the training rows only,
# so that test-set statistics do not leak into preprocessing.
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
trainX = scaler.fit_transform(trainX)
testX = scaler.transform(testX)

### Model Definition by Armand Kayiranga

In [None]:
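def model_armand_kayiranga():
    """Build and compile a small feed-forward binary classifier with L1 regularization."""
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras import regularizers
    from tensorflow.keras.optimizers import Adam

    # Two hidden ReLU layers with L1 weight penalties, followed by a sigmoid
    # output that yields the predicted probability of potability.
    model = Sequential([
        Dense(32, activation='relu', input_shape=(X.shape[1],),
              kernel_regularizer=regularizers.l1(0.001)),
        Dense(16, activation='relu',
              kernel_regularizer=regularizers.l1(0.001)),
        Dense(1, activation='sigmoid')
    ])

    # Binary cross-entropy matches the 0/1 "Potability" target.
    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model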

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
es = EarlyStopping(monitor='val_loss', patience=10)
model = model_armand_kayiranga()
history = model.fit(trainX, trainy,
                    validation_data=(testX, testy),
                    epochs=100,
                    batch_size=32,
                    verbose=1,
                    callbacks=[es])

In [None]:
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train Accuracy: %.3f, Test Accuracy: %.3f' % (train_acc, test_acc))

In [None]:
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.legend()
plt.title("Training and Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.grid(True)
plt.show()
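
Because the model is compiled with `metrics=['accuracy']`, `history.history` should also hold `accuracy` and `val_accuracy` series, which can be plotted the same way as the loss. The sketch below assumes that dictionary layout. `plot_metric` is a hypothetical helper name, not part of the original notebook.

```python
import matplotlib.pyplot as plt


def plot_metric(history_dict, metric="accuracy"):
    """Plot a training metric and, if present, its validation counterpart.

    Hypothetical helper: `history_dict` is assumed to be `history.history`,
    a dict mapping metric names to per-epoch lists.
    """
    fig, ax = plt.subplots()
    ax.plot(history_dict[metric], label=f"train {metric}")

    # Keras stores validation series under the "val_" prefix.
    val_key = f"val_{metric}"
    if val_key in history_dict:
        ax.plot(history_dict[val_key], label=f"val {metric}")

    ax.set_title(f"Training and Validation {metric.capitalize()}")
    ax.set_xlabel("Epoch")
    ax.set_ylabel(metric.capitalize())
    ax.grid(True)
    ax.legend()
    return fig
```

In the notebook this could be called as `plot_metric(history.history)` followed by `plt.show()`.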