Link: https://drive.google.com/drive/folders/1zKTc2r2Dp-cOVKOQ0K0h3Ug2AbgIKA9a?usp=drive_link

# Diabetes Classification with FNN

In this notebook, we use the **Diabetes Dataset** to build a simple feedforward neural network (FNN)
for a binary classification task: predicting whether a patient has diabetes or not.

## Step 1: Import Libraries
We will use:
- **Pandas** for data handling
- **Scikit-learn** for data splitting and standardization
- **TensorFlow Keras** to build and train the neural network

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

## Step 2: Load Dataset
The dataset contains 768 samples with 8 input features (such as glucose, blood pressure, BMI, etc.)
and one target column `Outcome` (0 = No Diabetes, 1 = Diabetes).

In [2]:
data = pd.read_csv("./data/diabetes.csv")
data.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


## Step 3: Features and Labels
We use all columns except `Outcome` as features (X).
The target label (y) is the `Outcome` column.

In [3]:
X = data.drop("Outcome", axis=1)
y = data["Outcome"]

## Step 4: Train Test Split
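We split the dataset into 80% training and 20% testing, with `random_state=42` so the split is reproducible.
The call below does not pass `stratify=y`, so the class proportions in the two sets can differ slightly;
passing `stratify=y` would keep them equal.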

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
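
To make the idea of stratification concrete, here is a minimal pure-Python sketch of a stratified split. The function `stratified_split` and the toy labels are hypothetical and only illustrate the principle; this is not how scikit-learn implements `stratify`.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with the same class proportions in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within each class
        n_test = round(len(indices) * test_size)   # same fraction from every class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels: 65 negatives and 35 positives (made up, not the real dataset).
labels = [0] * 65 + [1] * 35
train_idx, test_idx = stratified_split(labels)

train_pos = sum(labels[i] for i in train_idx) / len(train_idx)
test_pos = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_pos, test_pos)
```

Because the same fraction is drawn from each class, the share of positives is identical in the training and test indices for this toy input.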

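Then set aside the last two test rows as prediction data and remove them from the test set.

In [5]:
# hold out the last two test rows for the prediction demo
X_pred = X_test.iloc[-2:]
y_pred = y_test.iloc[-2:]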

X_test = X_test[:-2]
y_test = y_test[:-2]

## Step 5: Scaling
Since features have very different ranges (e.g., glucose vs pregnancies),
we standardize each feature to zero mean and unit variance.
The scaler is fit on the training set only and then applied to the test set.


In [6]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
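
As a rough illustration of what standardization does, the sketch below works on a single feature in pure Python. The helper names `fit_standardizer` and `standardize` are hypothetical, the training values are the glucose readings from the five rows shown in Step 2, and the two test values are made up.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Return (mean, std) computed from the training values only."""
    mean = fmean(values)
    std = pstdev(values)                      # population standard deviation
    return mean, (std if std > 0 else 1.0)    # guard against division by zero

def standardize(values, mean, std):
    """Apply z = (x - mean) / std with statistics learned elsewhere."""
    return [(v - mean) / std for v in values]

# Glucose values from the five rows displayed in Step 2 (toy training set).
train_glucose = [148.0, 85.0, 183.0, 89.0, 137.0]
test_glucose = [120.0, 160.0]  # hypothetical unseen values

mean, std = fit_standardizer(train_glucose)
train_scaled = standardize(train_glucose, mean, std)
test_scaled = standardize(test_glucose, mean, std)  # reuses the training statistics

print(round(mean, 1), [round(v, 2) for v in test_scaled])
```

The unseen values are transformed with the training mean and standard deviation, which mirrors calling `fit_transform` on the training data and `transform` on everything else.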

## Step 6: Build Feedforward Neural Network
- Input layer: 8 features
- Hidden layer 1: 64 neurons, ReLU activation
- Hidden layer 2: 32 neurons, ReLU activation
- Output layer: 1 neuron, Sigmoid activation (probability of diabetes)

In [7]:
model = Sequential([
    Dense(64, input_dim=X_train.shape[1], activation="relu"),
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid")
])

model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

model.summary()
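
As a sanity check on the parameter count that `model.summary()` reports, the number of trainable parameters can be worked out by hand. This is a small sketch assuming 8 input features and the layer widths listed above; `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """One weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# input features, hidden layer 1, hidden layer 2, output layer
layer_sizes = [8, 64, 32, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [576, 2080, 33] 2689
```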


## Step 7: Train the Model
We train for 30 epochs with a batch size of 16.
10% of the training data is used as a validation set.

In [13]:
history = model.fit(
    X_train, y_train,
    validation_split=0.1,
    epochs=30,
    batch_size=16,
    verbose=0
)
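
The following bookkeeping sketch estimates how many rows and weight updates this training call involves. It assumes the 768-row dataset from Step 2, that the test set size is rounded up, and that the fitting portion of the training data is rounded down; these rounding rules are assumptions about library behavior and are worth verifying against the installed versions.

```python
import math

n_samples = 768           # dataset size stated in Step 2
test_size = 0.2
validation_split = 0.1
batch_size = 16
epochs = 30

# Assumption: the test set size is rounded up, the remainder is used for training.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

# Assumption: the fitting portion is rounded down, the rest becomes validation data.
n_fit = math.floor(n_train * (1.0 - validation_split))
n_val = n_train - n_fit

steps_per_epoch = math.ceil(n_fit / batch_size)   # last batch may be smaller
total_updates = steps_per_epoch * epochs

print(n_train, n_test, n_fit, n_val, steps_per_epoch, total_updates)
```

Under these assumptions the model sees 552 rows per epoch in 35 batches, with 62 rows held back for validation.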

## Step 8: Evaluate on Test Data
Finally, we evaluate the trained model on the test set.

In [14]:
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {acc:.4f}")

Test Accuracy: 0.7368
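
For reference, the two numbers returned by `model.evaluate` correspond to the loss and metric chosen in `model.compile`. The sketch below shows how binary cross-entropy and accuracy are defined, using made-up labels and probabilities; the clipping constant `eps` is an assumption and may differ from what Keras uses internally.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip so log() never receives 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

# Toy values for illustration only, not model output.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

loss = binary_cross_entropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print(f"loss={loss:.4f} accuracy={acc:.2f}")
```

Accuracy only looks at which side of the threshold a probability falls on, while the loss also penalizes confident mistakes, so the two can move independently.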


In [15]:
predictions = model.predict(X_pred)

# reshape for neat printing
probs = predictions.reshape(-1)
classes = (predictions > 0.5).astype(int).reshape(-1)
true_labels = y_pred.to_numpy().reshape(-1)

# build a comparison table
results = pd.DataFrame({
    "True Label": true_labels,
    "Predicted Probability": probs,
    "Predicted Class": classes
})

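print(results)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
   True Label  Predicted Probability  Predicted Class
0           0                    1.0                1
1           0                    1.0                1

Both rows are predicted as class 1 with a probability of exactly 1.0, although their true label is 0.
The most likely cause is that `X_pred` was split off before Step 5 and never passed through `scaler.transform`,
so the network receives raw feature values instead of the standardized inputs it was trained on.
Calling `model.predict(scaler.transform(X_pred))` should give meaningful probabilities.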
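
To see how a sigmoid output can come out as exactly 1.0, the sketch below pushes the same row through a single sigmoid neuron twice, once in raw units and once standardized. The weights, bias, means, and standard deviations are made up for illustration and are not taken from the trained model or the fitted scaler.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for two features (glucose, BMI); not from the trained model.
weights = [0.8, 0.5]
bias = -0.2

# Hypothetical training statistics, as a fitted scaler would store them.
means = [120.0, 32.0]
stds = [30.0, 8.0]

raw_row = [85.0, 26.6]  # values in their original units
scaled_row = [(x - m) / s for x, m, s in zip(raw_row, means, stds)]

z_raw = sum(w * x for w, x in zip(weights, raw_row)) + bias
z_scaled = sum(w * x for w, x in zip(weights, scaled_row)) + bias

p_raw = sigmoid(z_raw)        # large pre-activation, saturates at 1.0
p_scaled = sigmoid(z_scaled)  # moderate pre-activation, stays inside (0, 1)
print(p_raw, round(p_scaled, 3))
```

With raw inputs the pre-activation is so large that the sigmoid rounds to 1.0 in floating point, whereas the standardized row yields a probability well inside the open interval.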
