In [1]:
import pandas as pd
df = pd.read_csv("../data/german_credit_numeric_clean.csv")
df.head()

,feat_1,feat_2,feat_3,feat_4,feat_5,feat_6,feat_7,feat_8,feat_9,feat_10,...,feat_17,feat_18,feat_19,feat_20,feat_21,feat_22,feat_23,feat_24,target,BadCredit
0,1,6,4,12,5,5,3,4,1,67,...,0,1,0,0,1,0,0,1,1,0
1,2,48,2,60,1,3,2,2,1,22,...,0,1,0,0,1,0,0,1,2,1
2,4,12,4,21,1,4,3,3,1,49,...,0,1,0,0,1,0,1,0,1,0
3,1,42,2,79,1,4,3,4,2,45,...,0,0,0,0,0,0,0,1,1,0
4,1,24,3,49,1,3,3,4,4,53,...,0,1,0,0,0,0,0,1,2,1


In [2]:
#Info about this clean dataset
df.info()
print(df.columns)
print("\nBadCredit value counts:")
print(df["BadCredit"].value_counts())
print("\nBadCredit proportions:")
print(df["BadCredit"].value_counts(normalize=True))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 26 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   feat_1     1000 non-null   int64
 1   feat_2     1000 non-null   int64
 2   feat_3     1000 non-null   int64
 3   feat_4     1000 non-null   int64
 4   feat_5     1000 non-null   int64
 5   feat_6     1000 non-null   int64
 6   feat_7     1000 non-null   int64
 7   feat_8     1000 non-null   int64
 8   feat_9     1000 non-null   int64
 9   feat_10    1000 non-null   int64
 10  feat_11    1000 non-null   int64
 11  feat_12    1000 non-null   int64
 12  feat_13    1000 non-null   int64
 13  feat_14    1000 non-null   int64
 14  feat_15    1000 non-null   int64
 15  feat_16    1000 non-null   int64
 16  feat_17    1000 non-null   int64
 17  feat_18    1000 non-null   int64
 18  feat_19    1000 non-null   int64
 19  feat_20    1000 non-null   int64
 20  feat_21    1000 non-null   int64
 21  feat_22    1000

In [3]:
# Features: all feat_1 ... feat_24
feature_cols = [col for col in df.columns if col.startswith("feat_")]

X = df[feature_cols]
y = df["BadCredit"]

print("X Shape is: ", X.shape)
print("y Shape is: ", y.shape)
X.head()


X Shape is:  (1000, 24)
y Shape is:  (1000,)


,feat_1,feat_2,feat_3,feat_4,feat_5,feat_6,feat_7,feat_8,feat_9,feat_10,...,feat_15,feat_16,feat_17,feat_18,feat_19,feat_20,feat_21,feat_22,feat_23,feat_24
0,1,6,4,12,5,5,3,4,1,67,...,1,0,0,1,0,0,1,0,0,1
1,2,48,2,60,1,3,2,2,1,22,...,1,0,0,1,0,0,1,0,0,1
2,4,12,4,21,1,4,3,3,1,49,...,1,0,0,1,0,0,1,0,1,0
3,1,42,2,79,1,4,3,4,2,45,...,1,0,0,0,0,0,0,0,0,1
4,1,24,3,49,1,3,3,4,4,53,...,1,1,0,1,0,0,0,0,0,1


In [4]:
# After creating the DataFrame, split the data into train and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    stratify=y,
    random_state=42
)

print("Training Data size: ", X_train.shape[0])
print("Testing Data size: ", X_test.shape[0])

print("\nTrain class distribution:")
print(y_train.value_counts(normalize=True))

print("\nTest class distribution:")
print(y_test.value_counts(normalize=True))




Training Data size:  800
Testing Data size:  200

Train class distribution:
BadCredit
0    0.7
1    0.3
Name: proportion, dtype: float64

Test class distribution:
BadCredit
0    0.7
1    0.3
Name: proportion, dtype: float64


## Logistic Regression Baseline Model

In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    classification_report,
)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_pred_log = log_reg.predict(X_test)

acc_log = accuracy_score(y_test, y_pred_log)
cm_log = confusion_matrix(y_test, y_pred_log)

print("Logistic Regression's Accuracy is: ", round(acc_log, 3))
print("\nConfusion Matrix of the model: ", cm_log)

print("\n----------------------")
print("\nClassification report(BadCredit = 1 is the positive class): ")
print(classification_report(y_test, y_pred_log, digits=3))



Logistic Regression's Accuracy is:  0.77

Confusion Matrix of the model:  [[126  14]
 [ 32  28]]

----------------------

Classification report(BadCredit = 1 is the positive class): 
              precision    recall  f1-score   support

           0      0.797     0.900     0.846       140
           1      0.667     0.467     0.549        60

    accuracy                          0.770       200
   macro avg      0.732     0.683     0.697       200
weighted avg      0.758     0.770     0.757       200



### Logistic Regression Baseline Model Results

- **Train/test split:** 80% train, 20% test (stratified), 1000 total rows.
- **Features:** 24 numeric features (`feat_1` ... `feat_24`)
- **Target:** `BadCredit` (0 = good, 1 = bad)

**Test performance:**

- Accuracy: **0.77**
- Confusion matrix (rows = true, cols = predicted):

| | Predicted 0 | Predicted 1 |
|---|---|---|
| True 0 | 126 | 14 |
| True 1 | 32 | 28 |

**Class 1 (BadCredit = 1) metrics:**

- Precision: **0.667**
- Recall: **0.467**
- F1-score: **0.549**

**Interpretation:**

- A naive model that always predicts **good** would achieve **70%** accuracy on this test set (140 of 200 cases are good).
- Logistic regression improves this to **77%** and correctly flags 28 of the 60 bad customers.
- However, recall for bad customers (**≈47%**) is low: the model still misses more than half of the truly bad credit cases (32 out of 60).
- This makes it a reasonable **baseline**, but there is room for improvement, especially in detecting bad credit risks.
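
The naive reference point can be checked with scikit-learn's `DummyClassifier`. The sketch below is not part of the original run: it uses stand-in labels (`y_demo`, a hypothetical name) with the same 140/60 class balance as the test set, since an always-good model ignores the features anyway.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Stand-in labels (assumption): same class balance as the test set above,
# 140 good (0) and 60 bad (1). Features are placeholders; a majority-class
# model never looks at them.
y_demo = np.array([0] * 140 + [1] * 60)
X_demo = np.zeros((len(y_demo), 1))

# Always predict the most frequent class, i.e. "good" (BadCredit = 0).
naive_clf = DummyClassifier(strategy="most_frequent")
naive_clf.fit(X_demo, y_demo)
y_pred_naive = naive_clf.predict(X_demo)

naive_acc = accuracy_score(y_demo, y_pred_naive)
naive_recall = recall_score(y_demo, y_pred_naive, pos_label=1)

print("Naive accuracy:", round(naive_acc, 3))
print("Naive recall for BadCredit = 1:", round(naive_recall, 3))
```

In the notebook itself, the same classifier could presumably be fitted on `X_train, y_train` and scored on `X_test, y_test` to get the reference numbers directly.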


## Decision Tree Model

In [8]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, classification_report)

# Quick Decision Tree model (default settings, unrestricted depth)
tree_clf = DecisionTreeClassifier(
    random_state=42
)

tree_clf.fit(X_train, y_train)

y_pred_tree = tree_clf.predict(X_test)

acc_tree = accuracy_score(y_test, y_pred_tree)
cm_tree = confusion_matrix(y_test, y_pred_tree)

print("*****Evaluation**** \n")

print("Decision Tree - Accuracy:", round(acc_tree, 3))
print("\nConfusion matrix (rows = true, cols = predicted):")
print(cm_tree)

print("\nClassification report (BadCredit = 1 is the positive class):")
print(classification_report(y_test, y_pred_tree, digits=3))


*****Evaluation**** 

Decision Tree - Accuracy: 0.72

Confusion matrix (rows = true, cols = predicted):
[[110  30]
 [ 26  34]]

Classification report (BadCredit = 1 is the positive class):
              precision    recall  f1-score   support

           0      0.809     0.786     0.797       140
           1      0.531     0.567     0.548        60

    accuracy                          0.720       200
   macro avg      0.670     0.676     0.673       200
weighted avg      0.726     0.720     0.722       200



### Decision Tree Baseline Model Results

- Accuracy: **0.72** (slightly above the naive baseline of 0.70).
- Compared to logistic regression (0.77 accuracy), the tree:
  - Improves recall for bad customers (class 1) from **0.467** to **0.567**.
  - Reduces precision for bad customers from **0.667** to **0.531**.
  - Increases the number of false positives (good customers flagged as bad) from **14** to **30**.

**Interpretation:**

- The decision tree learns non-linear rules but is only slightly better than a
  naive always-good model (0.72 vs. 0.70 accuracy).
- It catches more bad credit cases than logistic regression (34 vs. 28 of 60),
  but at the cost of more false alarms and lower overall accuracy.
- A single unrestricted tree is unstable and prone to overfitting, so the next
  step is to try ensemble methods (Random Forest, Gradient Boosting), which
  usually improve performance and robustness.
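
One way to see the overfitting concern is to compare train and test accuracy for an unrestricted tree and a depth-limited one. The sketch below is an illustration only: it runs on synthetic data from `make_classification` (an assumption, chosen to mimic 1000 rows, 24 features, and a roughly 70/30 class balance), and `max_depth=4` is an arbitrary example value, not a tuned setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 1000 rows, 24 features, ~70/30 classes.
X_syn, y_syn = make_classification(
    n_samples=1000,
    n_features=24,
    n_informative=8,
    weights=[0.7, 0.3],
    flip_y=0.05,
    random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=42
)

# Compare an unrestricted tree (max_depth=None) with a shallow one.
results = {}
for depth in (None, 4):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_tr, y_tr)
    results[depth] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, "
          f"test={results[depth][1]:.3f}")
```

A large gap between train and test accuracy for the unrestricted tree is the usual sign of memorized training data. On the credit data, the same loop could be run with `X_train`, `X_test`, `y_train`, and `y_test` from the split above.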


## Random Forest Model

In [10]:
from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier(
    n_estimators=200,
    max_depth=None,
    min_samples_leaf=5,
    max_features='sqrt',
    class_weight='balanced',
    n_jobs=-1,
    random_state=42
)

rf_clf.fit(X_train,y_train)
y_pred_rf = rf_clf.predict(X_test)

acc_rf = accuracy_score(y_test,y_pred_rf)
cm_rf = confusion_matrix(y_test,y_pred_rf)


print("Random Forest - Accuracy:", round(acc_rf, 3))
print("\nConfusion matrix (rows = true, cols = predicted):")
print(cm_rf)

print("\nClassification report (BadCredit = 1 is the positive class):")
print(classification_report(y_test, y_pred_rf, digits=3))


Random Forest - Accuracy: 0.755

Confusion matrix (rows = true, cols = predicted):
[[111  29]
 [ 20  40]]

Classification report (BadCredit = 1 is the positive class):
              precision    recall  f1-score   support

           0      0.847     0.793     0.819       140
           1      0.580     0.667     0.620        60

    accuracy                          0.755       200
   macro avg      0.714     0.730     0.720       200
weighted avg      0.767     0.755     0.759       200



### Model comparison so far

**Naive baseline (always predict good, BadCredit = 0)**
- Accuracy: **0.70**
- Recall for BadCredit=1: **0.00** (never catches bad credit)

**Logistic Regression**
- Accuracy: **0.77**
- BadCredit=1:
  - Precision: **0.667**
  - Recall: **0.467**
  - F1: **0.549**
- Interpretation: better than naive, but misses more than half of the bad credit cases.

**Decision Tree (unrestricted depth)**
- Accuracy: **0.72**
- BadCredit=1:
  - Precision: **0.531**
  - Recall: **0.567**
  - F1: **0.548**
- Interpretation: catches more bad cases than logistic regression (higher recall),
  but overall accuracy is lower and it produces more false alarms.

**Random Forest (200 trees, min_samples_leaf=5, max_features="sqrt", class_weight="balanced")**
- Accuracy: **0.755**
- BadCredit=1:
  - Precision: **0.580**
  - Recall: **0.667**
  - F1: **0.620**
- Interpretation:
  - Raises recall for bad credit to about **67%** (40 of 60 cases) while keeping precision at **0.58**.
  - Best F1 for the bad class so far.
  - More suitable for a credit risk use case than a single tree, because it catches
    more bad customers with slightly fewer false positives (29 vs. 30).
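
The figures in this comparison can be recomputed from the three confusion matrices reported above. This is a small sketch, not part of the original run; the `summary` table and its column names are illustrative choices.

```python
import numpy as np
import pandas as pd

# Confusion matrices copied from the outputs above (rows = true, cols = predicted).
confusion_matrices = {
    "Logistic Regression": np.array([[126, 14], [32, 28]]),
    "Decision Tree": np.array([[110, 30], [26, 34]]),
    "Random Forest": np.array([[111, 29], [20, 40]]),
}

rows = []
for name, cm in confusion_matrices.items():
    # For a 2x2 matrix with labels [0, 1], ravel() yields TN, FP, FN, TP.
    tn, fp, fn, tp = cm.ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    rows.append({
        "model": name,
        "accuracy": (tp + tn) / cm.sum(),
        "precision_bad": precision,
        "recall_bad": recall,
        "f1_bad": 2 * precision * recall / (precision + recall),
        "false_positives": int(fp),
        "missed_bad": int(fn),
    })

summary = pd.DataFrame(rows).set_index("model").round(3)
print(summary)
```

Keeping the false-positive and missed-bad counts next to the rates makes the trade-off explicit: each model can be judged on how many bad customers it misses against how many good customers it flags.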
