# ASSIGNMENT
# Chapter 2 – Group Exercise 3: Neural Networks
## Group Members

| Name | Matriculation Number |
|------|----------------------|
| Arya Shinde | 100006646 (co-ordinator) |
| Mirang Bhandari | 100007049 |
| Yash Annapure | 100006547 |
| Anushka Sawant | 100006644 |

# Importing Packages

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, confusion_matrix, classification_report,
    roc_curve, auc
)

from imblearn.over_sampling import SMOTE

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

from ucimlrepo import fetch_ucirepo

In [2]:
pip install ucimlrepo



Please run this notebook in Google Colab, since TensorFlow may not be compatible with Python 3.13 on some local setups. If `ucimlrepo` is not installed yet, run the `pip install` cell before the imports.

# Data Loading

In [3]:
dataset = fetch_ucirepo(id=572)

df = pd.DataFrame(data=dataset.data.features, columns=dataset.data.feature_names)
df['Bankrupt?'] = dataset.data.targets.values

In [4]:
df

| | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | Continuous interest rate (after tax) | ... | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Net Income Flag | Equity to Liability | Bankrupt? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.370594 | 0.424389 | 0.405750 | 0.601457 | 0.601457 | 0.998969 | 0.796887 | 0.808809 | 0.302646 | 0.780985 | ... | 0.009219 | 0.622879 | 0.601453 | 0.827890 | 0.290202 | 0.026601 | 0.564050 | 1 | 0.016469 | 1 |
| 1 | 0.464291 | 0.538214 | 0.516730 | 0.610235 | 0.610235 | 0.998946 | 0.797380 | 0.809301 | 0.303556 | 0.781506 | ... | 0.008323 | 0.623652 | 0.610237 | 0.839969 | 0.283846 | 0.264577 | 0.570175 | 1 | 0.020794 | 1 |
| 2 | 0.426071 | 0.499019 | 0.472295 | 0.601450 | 0.601364 | 0.998857 | 0.796403 | 0.808388 | 0.302035 | 0.780284 | ... | 0.040003 | 0.623841 | 0.601449 | 0.836774 | 0.290189 | 0.026555 | 0.563706 | 1 | 0.016474 | 1 |
| 3 | 0.399844 | 0.451265 | 0.457733 | 0.583541 | 0.583541 | 0.998700 | 0.796967 | 0.808966 | 0.303350 | 0.781241 | ... | 0.003252 | 0.622929 | 0.583538 | 0.834697 | 0.281721 | 0.026697 | 0.564663 | 1 | 0.023982 | 1 |
| 4 | 0.465022 | 0.538432 | 0.522298 | 0.598783 | 0.598783 | 0.998973 | 0.797366 | 0.809304 | 0.303475 | 0.781550 | ... | 0.003878 | 0.623521 | 0.598782 | 0.839973 | 0.278514 | 0.024752 | 0.575617 | 1 | 0.035490 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6814 | 0.493687 | 0.539468 | 0.543230 | 0.604455 | 0.604462 | 0.998992 | 0.797409 | 0.809331 | 0.303510 | 0.781588 | ... | 0.000466 | 0.623620 | 0.604455 | 0.840359 | 0.279606 | 0.027064 | 0.566193 | 1 | 0.029890 | 0 |
| 6815 | 0.475162 | 0.538269 | 0.524172 | 0.598308 | 0.598308 | 0.998992 | 0.797414 | 0.809327 | 0.303520 | 0.781586 | ... | 0.001959 | 0.623931 | 0.598306 | 0.840306 | 0.278132 | 0.027009 | 0.566018 | 1 | 0.038284 | 0 |
| 6816 | 0.472725 | 0.533744 | 0.520638 | 0.610444 | 0.610213 | 0.998984 | 0.797401 | 0.809317 | 0.303512 | 0.781546 | ... | 0.002840 | 0.624156 | 0.610441 | 0.840138 | 0.275789 | 0.026791 | 0.565158 | 1 | 0.097649 | 0 |
| 6817 | 0.506264 | 0.559911 | 0.554045 | 0.607850 | 0.607850 | 0.999074 | 0.797500 | 0.809399 | 0.303498 | 0.781663 | ... | 0.002837 | 0.623957 | 0.607846 | 0.841084 | 0.277547 | 0.026822 | 0.565302 | 1 | 0.044009 | 0 |


In [5]:
print(f"\n Dataset Shape: {df.shape}")


 Dataset Shape: (6819, 96)


In [6]:
print(f"\n Dataset Description: {df.describe()}")


 Dataset Description:         ROA(C) before interest and depreciation before interest  \
count                                        6819.000000          
mean                                            0.505180          
std                                             0.060686          
min                                             0.000000          
25%                                             0.476527          
50%                                             0.502706          
75%                                             0.535563          
max                                             1.000000          

        ROA(A) before interest and % after tax  \
count                              6819.000000   
mean                                  0.558625   
std                                   0.065620   
min                                   0.000000   
25%                                   0.535543   
50%                                   0.559802   
75%                    

In [7]:
# Strip stray whitespace from column names and record the target column
# (Bankrupt? was appended as the last column during loading)
df.columns = df.columns.str.strip()
target_col = "Bankrupt?"

In [8]:
# Target Distribution
print(f" Target Distribution:\n{df[target_col].value_counts()}")
print(f"\n  Class Imbalance Ratio: {df[target_col].value_counts()[0] / df[target_col].value_counts()[1]:.1f}:1 (Non-Bankrupt:Bankrupt)")

 Target Distribution:
Bankrupt?
0    6599
1     220
Name: count, dtype: int64

  Class Imbalance Ratio: 30.0:1 (Non-Bankrupt:Bankrupt)


The target labels are heavily imbalanced: only 220 of the 6,819 companies (about 3.2%) are bankrupt. This imbalance has to be treated before training.
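To illustrate why this imbalance matters, here is a minimal sketch that uses only the class counts printed above. It shows that a trivial classifier which always predicts "non-bankrupt" would already score a high accuracy while detecting no bankruptcies at all. The variable names are illustrative and not part of the notebook.

```python
# Class counts taken from the value_counts() output above.
n_non_bankrupt = 6599
n_bankrupt = 220
total = n_non_bankrupt + n_bankrupt

# A trivial model that always predicts class 0 (non-bankrupt).
baseline_accuracy = n_non_bankrupt / total
baseline_recall = 0 / n_bankrupt  # it never flags a bankrupt company

print(f"Baseline accuracy: {baseline_accuracy:.4f}")
print(f"Baseline recall  : {baseline_recall:.4f}")
```

Accuracy alone is therefore a weak yardstick on this dataset; Precision, Recall, and F1-score on the bankrupt class are more informative.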

# Data Preprocessing

In [9]:
# Separate features and target
X = df.drop(columns=[target_col])
y = df[target_col]

In [10]:
# Check and handle missing values
missing = X.isnull().sum().sum()
print(missing)

0


No missing values were found, so no imputation step is needed.

## Train Test Split

In [11]:
# Train/Test Split (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

## Scaling

In [12]:
# Feature scaling with StandardScaler (zero mean, unit variance).
# The scaler is fit on the training split only and then reused on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)
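As a rough illustration of what the scaling step does, the sketch below reproduces the standardisation by hand with NumPy on toy arrays (hypothetical values, not the bankruptcy data). The mean and standard deviation come from the training rows only, and the same statistics are reused for the test rows, which mirrors the `fit_transform` / `transform` pattern above.

```python
import numpy as np

# Toy data standing in for X_train / X_test (hypothetical values).
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_toy = np.array([[4.0, 40.0]])

# Statistics come from the training split only.
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_train_toy_scaled = (X_train_toy - mean) / std
X_test_toy_scaled = (X_test_toy - mean) / std  # reuses the training mean/std

print(X_train_toy_scaled.mean(axis=0))  # approximately 0 per column
print(X_train_toy_scaled.std(axis=0))   # approximately 1 per column
```

Fitting on the training split only keeps information about the test set out of the preprocessing.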

## SMOTE

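SMOTE (Synthetic Minority Over-sampling Technique) is used to treat the label imbalance. It is applied to the training set only, so the test set keeps its original class distribution.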

In [13]:
smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train_scaled, y_train)

In [14]:
print(pd.Series(y_train_res).value_counts().sort_index())

Bankrupt?
0    5279
1    5279
Name: count, dtype: int64
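The core idea behind SMOTE can be sketched in a few lines: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The function below is a simplified illustration under that assumption, not the `imblearn` implementation; `smote_like_sample`, `X_min_toy`, and `k` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(42)

def smote_like_sample(X_minority, k=2):
    """Return one synthetic point between a random minority sample
    and one of its k nearest minority neighbours."""
    i = rng.integers(len(X_minority))
    x = X_minority[i]
    # Euclidean distance from x to every minority sample.
    dists = np.linalg.norm(X_minority - x, axis=1)
    # Position 0 of the sorted order is x itself (distance 0), so skip it.
    neighbour_idx = np.argsort(dists)[1:k + 1]
    x_nn = X_minority[rng.choice(neighbour_idx)]
    gap = rng.random()  # interpolation factor in [0, 1)
    return x + gap * (x_nn - x)

# Toy minority-class points (hypothetical values).
X_min_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_sample(X_min_toy)
print(synthetic)
```

Because every synthetic point lies between two real minority points, SMOTE adds no information from outside the region already covered by the minority class.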


# Feedforward Neural Network (FNN / MLP) Model

The Taiwanese Bankruptcy dataset is structured tabular data with 95 numerical financial features and no spatial or sequential structure, so a feedforward neural network (multilayer perceptron) is a suitable architecture.

In [15]:
def build_model(input_dim):
    model = Sequential([
        # Input: one value per feature
        tf.keras.Input(shape=(input_dim,)),

        # Hidden Layer 1
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(0.3),

        # Hidden Layer 2
        Dense(64, activation='relu'),
        BatchNormalization(),
        Dropout(0.3),

        # Hidden Layer 3
        Dense(32, activation='relu'),
        Dropout(0.2),

        # Output Layer (sigmoid gives a bankruptcy probability for binary classification)
        Dense(1, activation='sigmoid')
    ])

    model.compile(
        optimizer=Adam(learning_rate=0.001),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    return model

model = build_model(input_dim=X_train_res.shape[1])
model.summary()
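The parameter counts that `model.summary()` reports can be cross-checked by hand. The sketch below assumes 95 input features and the layer sizes defined above; `dense_params` and `batchnorm_params` are helper names introduced here for illustration. Dropout layers have no parameters.

```python
def dense_params(n_in, n_out):
    # One weight per input/output pair plus one bias per output unit.
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable).
    return 4 * n_features

input_dim = 95  # assumed number of features
layers = [
    ("Dense 128", dense_params(input_dim, 128)),
    ("BatchNorm 128", batchnorm_params(128)),
    ("Dense 64", dense_params(128, 64)),
    ("BatchNorm 64", batchnorm_params(64)),
    ("Dense 32", dense_params(64, 32)),
    ("Dense 1", dense_params(32, 1)),
]

total_params = sum(n for _, n in layers)
non_trainable = 2 * 128 + 2 * 64  # moving statistics of both BatchNorm layers
trainable = total_params - non_trainable

for name, n in layers:
    print(f"{name:<14} {n:>6}")
print(f"Total: {total_params}, trainable: {trainable}, non-trainable: {non_trainable}")
```

With only about 23 thousand parameters the network is small, which suits a dataset of a few thousand rows.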

## Train the Neural Network

In [16]:
# Callbacks
early_stop = EarlyStopping(
    monitor='val_loss', patience=15, restore_best_weights=True, verbose=1
)
lr_scheduler = ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=7, verbose=1, min_lr=1e-6
)

history = model.fit(
    X_train_res, y_train_res,
    validation_split=0.2,
    epochs=100,
    batch_size=64,
    callbacks=[early_stop, lr_scheduler],
    verbose=1
)

Epoch 1/100
132/132 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.7903 - loss: 0.4352 - val_accuracy: 0.9697 - val_loss: 0.2409 - learning_rate: 0.0010
Epoch 2/100
132/132 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.8866 - loss: 0.2730 - val_accuracy: 0.9607 - val_loss: 0.2371 - learning_rate: 0.0010
Epoch 3/100
132/132 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9069 - loss: 0.2242 - val_accuracy: 0.9920 - val_loss: 0.1347 - learning_rate: 0.0010
Epoch 4/100
132/132 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.9202 - loss: 0.2003 - val_accuracy: 0.9891 - val_loss: 0.1280 - learning_rate: 0.0010
Epoch 5/100
132/132 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.9281 - loss: 0.1778 - val_accuracy: 0.9986 - val_loss: 0.1008 - learning_rate: 0.0010
Epoch 6/100
132/132 ━━━━━━━━━━━━━━━━━━━━

## Make Predictions

In [17]:
y_pred_prob = model.predict(X_test_scaled).flatten()
y_pred      = (y_pred_prob >= 0.5).astype(int)

[1m43/43[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


## Evaluation Metrics (Classification)

In [18]:
print('Accuracy :', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall   :', recall_score(y_test, y_pred))
print('F1-score :', f1_score(y_test, y_pred))

Accuracy : 0.9655425219941349
Precision: 0.4482758620689655
Recall   : 0.29545454545454547
F1-score : 0.3561643835616438
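The four metrics above are consistent with a single set of confusion-matrix counts on the 1,364 test rows. The counts below are inferred from the printed values (they are an assumption, not notebook output), and the sketch recomputes each metric from them.

```python
# Counts inferred from the metrics printed above (an assumption, not notebook output).
tp, fp, fn, tn = 13, 16, 31, 1304

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)           # share of predicted bankruptcies that are real
recall = tp / (tp + fn)              # share of real bankruptcies that are caught
f1 = 2 * precision * recall / (precision + recall)

print("Accuracy :", accuracy)
print("Precision:", precision)
print("Recall   :", recall)
print("F1-score :", f1)
```

Read this way, the model would catch 13 of 44 bankrupt companies and miss 31, even though overall accuracy is above 96%.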


## Conclusion
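- A Feedforward Neural Network was trained on the Taiwanese Bankruptcy Prediction dataset to classify companies as bankrupt or non-bankrupt from 95 financial ratios, with SMOTE applied to the training set to counter the severe class imbalance (about 30:1).
- The model was evaluated using Accuracy, Precision, Recall, and F1-Score. On the test set it reached about 96.6% Accuracy, but only 44.8% Precision, 29.5% Recall, and an F1-Score of 0.36. Recall holds the highest importance in financial risk prediction, because missing a true bankruptcy case is the costliest error. By that measure the model in its current form still misses most bankruptcies, and the high Accuracy mainly reflects the dominant non-bankrupt class.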