
# MLP Classifier for Imbalanced Binary Classification

**Author**: Ammar Yousuf Abrahani  
**Course/Project**: A Novel Deep Q Learning Anomaly Detection  

---

### Description:
This notebook demonstrates how to build and evaluate a simple Multi-Layer Perceptron (MLP) with TensorFlow/Keras for binary classification on an imbalanced dataset. The dataset is generated synthetically with `make_classification` from scikit-learn, with the positive class weighted to roughly 10% of the samples to mimic real-world anomaly or fraud detection scenarios.

The model is evaluated with accuracy, per-class precision and recall, and the macro-averaged F1-score, because accuracy alone can look high when one class dominates.
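
To make the point about accuracy concrete, consider a trivial baseline that always predicts the majority class. The sketch below uses illustrative counts (900 normal and 100 anomalous samples, chosen to mirror a 90/10 split; they are an assumption, not the notebook's data) and computes the metrics by hand with a hypothetical helper `f1_for_class`.

```python
# Illustrative counts (assumed, not taken from the notebook's data):
# 900 "normal" samples (class 0) and 100 "anomaly" samples (class 1).
y_true = [0] * 900 + [1] * 100
# Trivial baseline: always predict the majority class.
y_pred = [0] * len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """Return the F1-score for one class, treating it as the positive label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2

print(f"accuracy = {accuracy:.4f}")  # accuracy = 0.9000
print(f"macro F1 = {macro_f1:.4f}")  # macro F1 = 0.4737
```

The baseline reaches 90% accuracy without detecting a single anomaly, while the macro F1-score drops below 0.5 because the anomaly class contributes an F1 of zero.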

---

### References and Source Links:
- [TensorFlow Keras Sequential Model](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)  
- [Keras Dense Layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)  
- [scikit-learn make_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html)  
- [scikit-learn train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)  
- [scikit-learn classification_report](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html)  
- [scikit-learn F1-score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html)  

---

> **Disclaimer**: This notebook is intended for academic and educational use. Please cite the sources if reused or adapted.


In [1]:

# Imports
import pandas as pd
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, f1_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam


## Step 1: Generate Synthetic Imbalanced Dataset

In [2]:

# Generate synthetic classification dataset
X, y = make_classification(
    n_samples=1000,
    n_features=4,
    n_informative=3,
    n_redundant=1,
    n_clusters_per_class=1,
    weights=[0.9, 0.1],  # Imbalanced dataset
    flip_y=0.01,
    random_state=42
)
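
Before splitting, it can help to confirm how imbalanced the generated labels actually are. The `weights` argument is only a target: `flip_y` reassigns a small fraction of labels at random, so the realised proportions may differ slightly from 90/10. A minimal check, regenerating the data with the same settings as the cell above:

```python
import numpy as np
from sklearn.datasets import make_classification

# Same generator settings as the notebook cell.
X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, weights=[0.9, 0.1], flip_y=0.01, random_state=42,
)

# Count samples per class and convert the counts to proportions.
counts = np.bincount(y)
proportions = counts / counts.sum()
for label, (n, frac) in enumerate(zip(counts, proportions)):
    print(f"class {label}: {n} samples ({frac:.1%})")
```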


## Step 2: Split Data into Training and Test Sets

In [3]:

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
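
The split above is purely random, so the anomaly share in the two subsets can drift apart. One common alternative, shown here as a sketch rather than as what this notebook does, is to pass `stratify=y` and to derive "balanced" class weights from the training labels with scikit-learn's `compute_class_weight`. The metrics reported later in this notebook come from the unstratified, unweighted setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Same generator settings as the notebook cell.
X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, weights=[0.9, 0.1], flip_y=0.01, random_state=42,
)

# stratify=y keeps the class ratio (nearly) identical in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print("anomaly share (train):", round(float(y_train.mean()), 4))
print("anomaly share (test): ", round(float(y_test.mean()), 4))

# "balanced" weights follow n_samples / (n_classes * count_per_class),
# so the rare class receives the larger weight.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print("class weights:", class_weight)

# Keras accepts such a dict via model.fit(..., class_weight=class_weight).
```

Weighting the loss this way usually trades some precision on the rare class for higher recall, so it is worth comparing both setups on the same test set.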


## Step 3: Define and Train MLP Model

In [4]:

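# Build and compile the MLP: two hidden ReLU layers and one sigmoid output unit
from tensorflow.keras import Input

model = Sequential([
    Input(shape=(X_train.shape[1],)),  # declare the input shape explicitly
    Dense(32, activation='relu'),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')     # outputs P(class = 1)
])
model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

# Train for 20 epochs; validation_split holds out the last 20% of the training data
model.fit(X_train, y_train, epochs=20, batch_size=16, validation_split=0.2, verbose=1)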


Epoch 1/20
35/35 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.9045 - loss: 0.5618 - val_accuracy: 0.8929 - val_loss: 0.3799
Epoch 2/20
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9293 - loss: 0.3067 - val_accuracy: 0.8929 - val_loss: 0.2790
Epoch 3/20
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9115 - loss: 0.2390 - val_accuracy: 0.9071 - val_loss: 0.2356
Epoch 4/20
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9344 - loss: 0.2057 - val_accuracy: 0.9143 - val_loss: 0.2042
Epoch 5/20
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9450 - loss: 0.1578 - val_accuracy: 0.9214 - val_loss: 0.1821
Epoch 6/20
35/35 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9516 - loss: 0.1489 - val_accuracy: 0.9286 - val_loss: 0.1683
Epoch 7/20
35/35 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x2310c547020>
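
The step counts in the logs and the size of the network can be checked by hand. The sketch below assumes a 700/300 train/test split (implied by `test_size=0.3` on 1000 samples), that `validation_split=0.2` removes 20% of the training rows before batching, and that `model.predict` uses Keras' default batch size of 32. These are assumptions about library defaults, stated here rather than read from the notebook.

```python
import math

# Split sizes (assumed from test_size=0.3 on 1000 samples).
n_train_total, n_test = 700, 300

# validation_split=0.2 holds out 20% of the training rows.
n_val = n_train_total * 20 // 100   # 140
n_fit = n_train_total - n_val       # 560

# One step per batch; a final partial batch still counts as a step.
steps_per_epoch = math.ceil(n_fit / 16)   # batch_size=16 in model.fit
predict_steps = math.ceil(n_test / 32)    # assumed default batch size of 32

# Dense layer parameters: inputs * units weights plus one bias per unit.
layer_shapes = [(4, 32), (32, 16), (16, 1)]
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total_params = sum(params_per_layer)

print("steps per epoch:", steps_per_epoch)      # 35
print("predict steps:  ", predict_steps)        # 10
print("params per layer:", params_per_layer)    # [160, 528, 17]
print("total params:   ", total_params)         # 705
```

The computed 35 training steps and 10 prediction steps agree with the `35/35` and `10/10` progress counters in the notebook's output.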

## Step 4: Evaluate the Model Performance

In [5]:

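# Evaluate model: threshold the sigmoid outputs at 0.5 to obtain hard 0/1 labels
y_prob = model.predict(X_test).ravel()  # flatten (n, 1) probabilities to shape (n,)
y_pred = (y_prob > 0.5).astype(int)
report = classification_report(y_test, y_pred, target_names=['Normal', 'Anomaly'], digits=4)
f1_macro = f1_score(y_test, y_pred, average='macro')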

print("📊 MLP Classifier Performance:\n")
print(report)
print(f"Macro Average F1-Score: {f1_macro:.4f}")


10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
📊 MLP Classifier Performance:

              precision    recall  f1-score   support

      Normal     0.9560    0.9962    0.9757       262
     Anomaly     0.9630    0.6842    0.8000        38

    accuracy                         0.9567       300
   macro avg     0.9595    0.8402    0.8879       300
weighted avg     0.9569    0.9567    0.9534       300

Macro Average F1-Score: 0.8879
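
The report can be read more directly as a confusion matrix. The counts below are inferred from the rounded figures above (26 of 38 anomalies detected, 1 false alarm among 262 normal samples); they are a reconstruction, not output produced by the notebook. The hypothetical helper `prf` recomputes each metric so the inferred counts can be checked against the report.

```python
# Counts inferred from the rounded report (an assumption, not notebook output).
tn, fp = 261, 1    # Normal samples: correctly passed / falsely flagged
fn, tp = 12, 26    # Anomaly samples: missed / correctly flagged


def prf(tp, fp, fn):
    """Precision, recall, and F1 for the class counted as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


normal = prf(tp=tn, fp=fn, fn=fp)    # treat Normal as the positive class
anomaly = prf(tp=tp, fp=fp, fn=fn)   # treat Anomaly as the positive class
accuracy = (tp + tn) / (tp + tn + fp + fn)
macro_f1 = (normal[2] + anomaly[2]) / 2

print("Normal  P/R/F1:", [round(v, 4) for v in normal])    # [0.956, 0.9962, 0.9757]
print("Anomaly P/R/F1:", [round(v, 4) for v in anomaly])   # [0.963, 0.6842, 0.8]
print("accuracy:", round(accuracy, 4))                     # 0.9567
print("macro F1:", round(macro_f1, 4))                     # 0.8879
```

Read this way, the model rarely raises a false alarm but misses roughly a third of the anomalies, which the 95.67% accuracy does not reveal and the macro-averaged recall of 0.8402 does.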
