
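# 🔍 Bank Customer Churn Prediction

This project builds classification models with Logistic Regression and Random Forest to predict whether a bank customer is likely to churn, and then analyzes the factors that drive that prediction.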


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report



## 📦 Data Loading and Preprocessing

In [None]:

from pathlib import Path
data_path = Path('data/Bank Customer Churn Prediction.csv')
if not data_path.exists():
    raise FileNotFoundError(f'Dataset not found at {data_path.resolve()}')
df = pd.read_csv(data_path)

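# Label-encode the categorical columns (one encoder per column, so each mapping stays recoverable)
gender_encoder = LabelEncoder()
country_encoder = LabelEncoder()
df['gender'] = gender_encoder.fit_transform(df['gender'])
df['country'] = country_encoder.fit_transform(df['country'])

# Split features and target variable
X = df.drop(['churn'], axis=1)
y = df['churn']

# Split into training and test sets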
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
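
Label encoding assigns integers to categories, which a linear model such as Logistic Regression may read as an ordering. A minimal sketch of one alternative is shown below: one-hot encoding with `pd.get_dummies` plus a stratified split. The toy DataFrame and its values are made up for illustration and are not taken from the project's dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the real dataset; all values here are illustrative only.
toy = pd.DataFrame({
    "country": ["France", "Spain", "Germany", "France", "Spain",
                "Germany", "France", "Spain", "Germany", "France"],
    "gender": ["Female", "Male", "Male", "Female", "Male",
               "Female", "Male", "Female", "Male", "Female"],
    "age": [42, 35, 51, 29, 60, 44, 38, 47, 33, 55],
    "churn": [1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
})

# One-hot encode the nominal columns so no artificial ordering is implied.
# drop_first=True drops one level per column to avoid redundant columns.
encoded = pd.get_dummies(toy, columns=["country", "gender"], drop_first=True)

X_toy = encoded.drop(columns=["churn"])
y_toy = encoded["churn"]

# stratify=y_toy keeps the churn ratio similar in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(encoded.columns.tolist())
print(len(X_tr), len(X_te))
```

Whether one-hot encoding actually improves results on this dataset would need to be checked; tree-based models such as Random Forest are usually less sensitive to the encoding choice.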


## 🤖 Model Training

In [None]:

# Logistic Regression
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)

# Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
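
Logistic Regression above is fit on unscaled features. When columns differ greatly in magnitude, standardizing them first often helps the solver converge and makes coefficients comparable. The sketch below wraps `StandardScaler` and `LogisticRegression` in a pipeline. It uses synthetic data from `make_classification` as a stand-in, so the printed score says nothing about the bank dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with roughly 20% positives (an assumption, not the real churn rate).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=42
)
# Exaggerate the scale of one feature to mimic a large-valued monetary column.
X_demo[:, 0] *= 100_000

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The scaler is fit on the training data only, then applied to the test data.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X_tr, y_tr)

test_accuracy = scaled_lr.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means the same object can be passed to cross-validation without leaking test statistics into training.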


## 📊 Model Evaluation

In [None]:

# Confusion matrices and classification reports
print("Logistic Regression:")
print(confusion_matrix(y_test, y_pred_lr))
print(classification_report(y_test, y_pred_lr))

print("Random Forest:")
print(confusion_matrix(y_test, y_pred_rf))
print(classification_report(y_test, y_pred_rf))
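
Churn targets are typically imbalanced, so accuracy alone can be misleading. The sketch below collects precision, recall, F1, and ROC AUC for both model types into one table for side-by-side comparison. It runs on synthetic data from `make_classification`, so the numbers it prints are not results of this project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (illustrative only).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

rows = []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Probability of the positive (churn) class, needed for ROC AUC.
    proba = model.predict_proba(X_te)[:, 1]
    rows.append({
        "model": name,
        "precision": precision_score(y_te, pred, zero_division=0),
        "recall": recall_score(y_te, pred, zero_division=0),
        "f1": f1_score(y_te, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_te, proba),
    })

summary = pd.DataFrame(rows).set_index("model")
print(summary.round(3))
```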


## 📈 Model Comparison and Visualizations


### Model comparison:
![](./images/model_comparison.png)

### Logistic Regression confusion matrix:
![](./images/confusion_logistic.png)

### Random Forest confusion matrix:
![](./images/confusion_rf.png)

### Feature importance (using Logistic Regression as an example):
![](./images/feature_importance_logistic.png)
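
One common way to read feature importance from Logistic Regression is to standardize the features and rank them by the absolute value of the fitted coefficients. The sketch below illustrates that idea; it is not necessarily how the figure above was produced. The column names (`active_member`, `age`, `balance`) and the rule generating the synthetic target are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 1000

# Hypothetical feature columns with made-up distributions.
demo = pd.DataFrame({
    "active_member": rng.integers(0, 2, n),
    "age": rng.normal(40, 10, n),
    "balance": rng.normal(75_000, 30_000, n),
})

# Made-up rule: inactive and older customers churn more often; balance has no effect.
logit = -1.0 - 1.5 * demo["active_member"] + 0.08 * (demo["age"] - 40)
churn = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(demo)
model = LogisticRegression(max_iter=1000).fit(X_scaled, churn)

# Signed coefficients, ordered by absolute magnitude (largest first).
coefs = pd.Series(model.coef_[0], index=demo.columns)
importance = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(importance.round(3))
```

The sign indicates direction (a negative coefficient lowers the predicted churn odds), while the magnitude is only a rough importance proxy when features are correlated.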



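## ✅ Conclusions and Applications

- Simple models can quickly predict whether a customer will churn.
- Feature importance highlights whether the customer is active, gender, and country as key factors.
- The models can support a bank's CRM strategy and the construction of a churn early-warning system.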
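
As a rough illustration of the early-warning idea, the sketch below ranks customers by predicted churn probability and keeps those above a cutoff. The data is synthetic, and `ALERT_THRESHOLD` and the `customer_idx` column are hypothetical names; a real cutoff would depend on the cost of outreach versus the cost of a lost customer.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (illustrative only).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

ALERT_THRESHOLD = 0.5  # hypothetical cutoff; tune to business needs

# Score every held-out customer with the probability of the churn class.
scores = pd.DataFrame({
    "customer_idx": range(len(X_te)),
    "churn_probability": model.predict_proba(X_te)[:, 1],
})

# Keep customers at or above the cutoff, highest risk first.
watchlist = (
    scores[scores["churn_probability"] >= ALERT_THRESHOLD]
    .sort_values("churn_probability", ascending=False)
)
print(watchlist.head())
```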
