XGBoost：需要將類別型變數轉換為數值型（如 One-Hot 編碼或 Label 編碼）  
LightGBM 和 CatBoost：能自動處理類別型變數(只需設置 categorical_feature)

### LightGBM

In [13]:
import lightgbm as lgb
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

card = pd.read_csv("card_clear2.csv")
card = card.drop("Unnamed: 0",axis=1)
X = card.drop("Credit_Score",axis=1)
y = card["Credit_Score"]

label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
classes = label_encoder.classes_
for i, class_label in enumerate(classes):
    print(f"{class_label} -> {i}")


X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=2)
numerical_cols = X.select_dtypes(include=["number"]).columns 
categorical_cols = ['Occupation','Payment_of_Min_Amount','Credit_Mix']

scale = StandardScaler()
X_train[numerical_cols] = scale.fit_transform(X_train[numerical_cols])
X_test[numerical_cols] = scale.transform(X_test[numerical_cols])

X_train1 = X_train[list(numerical_cols)+categorical_cols]

for col in categorical_cols:
    X_train1[col] = X_train1[col].astype('category')
# LightGBM Dataset 格式
lgb_data = lgb.Dataset(X_train1, label=y_train, categorical_feature=categorical_cols)

# 訓練模型
params = {
    'objective': 'multiclass',  # 多分類
    'num_class': len(np.unique(y_train)),  # 類別數
    'boosting_type': 'gbdt',
    'metric': 'multi_logloss',
    'learning_rate': 0.1,
    'num_leaves': 31,
    'feature_fraction': 0.8,
    'categorical_feature': categorical_cols
}

gbdt_model = lgb.train(params, lgb_data, num_boost_round=100)

# 提取特徵重要性
feature_importances = gbdt_model.feature_importance(importance_type='gain')
feature_names = X_train1.columns
importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': feature_importances
}).sort_values(by='Importance', ascending=False)

# 查看最不重要的特徵
print("最不重要的特徵：")
print(importance_df.tail())

importance_df = pd.DataFrame({
    'Feature': X_train1.columns,
    'Importance': feature_importances
}).sort_values(by='Importance', ascending=False)

print(importance_df)


Good -> 0
Poor -> 1
Standard -> 2
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003527 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 2270
[LightGBM] [Info] Number of data points in the train set: 80000, number of used features: 18
[LightGBM] [Info] Start training from score -1.723727
[LightGBM] [Info] Start training from score -1.238996
[LightGBM] [Info] Start training from score -0.631253


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train1[col] = X_train1[col].astype('category')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train1[col] = X_train1[col].astype('category')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train1[col] = X_train1[col].astype('category')
Please use categorical_feature argument of the Dataset con

最不重要的特徵：
                     Feature   Importance
0                        Age  6082.533182
13   Amount_invested_monthly  4136.549455
6                Num_of_Loan  3286.195930
14           Monthly_Balance  1892.656028
10  Credit_Utilization_Ratio   962.478561
                     Feature    Importance
17                Credit_Mix  99954.025783
9           Outstanding_Debt  82863.438280
5              Interest_Rate  45585.298458
4            Num_Credit_Card  19087.964773
16     Payment_of_Min_Amount  17954.439403
7        Delay_from_due_date  15618.453998
11        Credit_History_Age  10864.401712
12       Total_EMI_per_month  10136.986126
15                Occupation   9370.076226
1              Annual_Income   8963.548150
2      Monthly_Inhand_Salary   8883.359397
3          Num_Bank_Accounts   7855.303784
8       Num_Credit_Inquiries   6826.400855
0                        Age   6082.533182
13   Amount_invested_monthly   4136.549455
6                Num_of_Loan   3286.195930
14      

In [15]:
import xgboost as xgb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

# 讀取數據
card = pd.read_csv("card_clear2.csv")
card = card.drop("Unnamed: 0",axis=1)
X = card.drop("Credit_Score",axis=1)
y = card["Credit_Score"]

# Label Encoding
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
classes = label_encoder.classes_
for i, class_label in enumerate(classes):
    print(f"{class_label} -> {i}")

# 分割數據集
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=2)

# 標準化數值型特徵
numerical_cols = X.select_dtypes(include=["number"]).columns 
categorical_cols = ['Occupation','Payment_of_Min_Amount','Credit_Mix']

scale = StandardScaler()
X_train[numerical_cols] = scale.fit_transform(X_train[numerical_cols])
X_test[numerical_cols] = scale.transform(X_test[numerical_cols])

# 編碼類別型特徵
X_train1 = X_train[list(numerical_cols) + categorical_cols]
X_test1 = X_test[list(numerical_cols) + categorical_cols]

# 類別特徵需要轉換為類別型（XGBoost 支援此方式，但需要手動處理）
for col in categorical_cols:
    X_train1[col] = X_train1[col].astype('category')
    X_test1[col] = X_test1[col].astype('category')

# 構建 XGBoost 數據集
dtrain = xgb.DMatrix(X_train1, label=y_train)
dtest = xgb.DMatrix(X_test1, label=y_test)

# 設置 XGBoost 參數
params = {
    'objective': 'multi:softmax',  # 多分類問題
    'num_class': len(np.unique(y_train)),  # 類別數
    'eval_metric': 'mlogloss',  # 計算多類別對數損失
    'learning_rate': 0.1,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 0.8
}

# 訓練 XGBoost 模型
num_round = 100
bst = xgb.train(params, dtrain, num_round)

# 提取特徵重要性
feature_importances = bst.get_fscore(fmap='auto')

# 轉換成 DataFrame 並按重要性排序
importance_df = pd.DataFrame({
    'Feature': list(feature_importances.keys()),
    'Importance': list(feature_importances.values())
}).sort_values(by='Importance', ascending=False)

# 查看最不重要的特徵
print("最不重要的特徵：")
print(importance_df.tail())

# 打印所有特徵的重要性
print(importance_df)


Good -> 0
Poor -> 1
Standard -> 2


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train1[col] = X_train1[col].astype('category')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_test1[col] = X_test1[col].astype('category')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train1[col] = X_train1[col].astype('category')
A value is trying to be set on a copy of a slice from a Data

ValueError: DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, the experimental DMatrix parameter`enable_categorical` must be set to `True`.  Invalid columns:Occupation: category, Payment_of_Min_Amount: category, Credit_Mix: category

In [None]:
import catboost as cb
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

# 讀取數據
card = pd.read_csv("card_clear2.csv")
card = card.drop("Unnamed: 0",axis=1)
X = card.drop("Credit_Score",axis=1)
y = card["Credit_Score"]

# Label Encoding
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
classes = label_encoder.classes_
for i, class_label in enumerate(classes):
    print(f"{class_label} -> {i}")

# 分割數據集
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=2)

# 標準化數值型特徵
numerical_cols = X.select_dtypes(include=["number"]).columns 
categorical_cols = ['Occupation','Payment_of_Min_Amount','Credit_Mix']

scale = StandardScaler()
X_train[numerical_cols] = scale.fit_transform(X_train[numerical_cols])
X_test[numerical_cols] = scale.transform(X_test[numerical_cols])

# 編碼類別型特徵
X_train1 = X_train[list(numerical_cols) + categorical_cols]
X_test1 = X_test[list(numerical_cols) + categorical_cols]

# 在 CatBoost 中，類別型特徵可以直接使用類別資料類型
for col in categorical_cols:
    X_train1[col] = X_train1[col].astype('category')
    X_test1[col] = X_test1[col].astype('category')

# 設置 CatBoost 的參數
params = {
    'iterations': 1000,
    'learning_rate': 0.1,
    'depth': 6,
    'loss_function': 'MultiClass',  # 多類別分類
    'cat_features': categorical_cols,  # 指定類別型特徵
    'verbose': 200  # 顯示訓練過程
}

# 訓練 CatBoost 模型
model = cb.CatBoostClassifier(**params)
model.fit(X_train1, y_train)

# 提取特徵重要性
feature_importances = model.get_feature_importance(type='FeatureImportance')

# 轉換成 DataFrame 並按重要性排序
importance_df = pd.DataFrame({
    'Feature': list(X_train1.columns),
    'Importance': feature_importances
}).sort_values(by='Importance', ascending=False)

# 查看最不重要的特徵
print("最不重要的特徵：")
print(importance_df.tail())

# 打印所有特徵的重要性
print(importance_df)
