### LogisticRegression (Logistic Regression Model)

In [68]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score

In [69]:
df = pd.read_csv('diabetes.csv')
df

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1


In [70]:
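X = df.drop(columns=['Outcome'])  # features: every column except the target
y = df[['Outcome']]  # one-column DataFrame; df['Outcome'] (a Series) would avoid the warning at fit time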

In [71]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=33)
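
As a quick check of what `test_size=0.3` and `random_state=33` do, the sketch below splits a stand-in array of row indices. The row count of 768 is an assumption, inferred from the 231 test predictions in the confusion matrix further down; `rows`, `train_rows`, and `test_rows` are illustrative names, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the dataset: one index per row (768 rows is an assumption).
rows = np.arange(768)

# Same split settings as the notebook cell above.
train_rows, test_rows = train_test_split(rows, test_size=0.3, random_state=33)
print(len(train_rows), len(test_rows))

# A fixed random_state makes the split reproducible across runs.
train_again, test_again = train_test_split(rows, test_size=0.3, random_state=33)
print(np.array_equal(test_rows, test_again))
```

With a fractional `test_size`, the test set size is rounded up, so 30% of 768 rows gives 231 test rows and 537 training rows.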

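### Parameter details

1. **`max_iter=500`**:
   - Maximum number of solver iterations. The scikit-learn default is 100; a higher cap gives the solver more room to converge.

2. **`random_state=33`**:
   - Seeds the random number generator so that repeated runs give the same result. According to the scikit-learn documentation, it is used only when the solver is `sag`, `saga`, or `liblinear`, so it has no effect with the default `lbfgs` solver.

3. **`C=1`**:
   - `C` is the inverse of the regularization strength.
   - The smaller `C` is, the stronger the regularization (a heavier penalty).
   - The larger `C` is, the weaker the regularization (a lighter penalty).
   - If the model overfits, lower `C`; if it underfits, raise `C`.
   - The default is `1.0`.

4. **`tol=0.0001`**:
   - Tolerance for the stopping criterion. Once the solver's convergence measure falls below `tol`, iteration stops; together with `max_iter`, this is one of the conditions that ends training.
   - The default is `1e-4` (0.0001).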
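
To make the effect of `C` concrete, here is a minimal sketch that fits the same model at three values of `C` and compares the size of the learned coefficients. It uses synthetic data from `make_classification` as a stand-in for the diabetes features, so the numbers will not match the notebook; `X_demo`, `y_demo`, and `coef_norms` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption for illustration, not the diabetes set).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=33
)

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=500, tol=1e-4)
    model.fit(X_demo, y_demo)
    # L2 norm of the coefficient vector: stronger regularization shrinks it.
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:>6}: ||coef|| = {coef_norms[C]:.3f}")
```

The coefficient norm should grow as `C` increases, which is the "smaller `C`, stronger penalty" relationship described above.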

In [72]:
lr = LogisticRegression(max_iter=500, random_state=33, C=1, tol=0.0001)

In [73]:
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
train_score = lr.score(X_train, y_train)  # mean accuracy on the training set
test_score = lr.score(X_test, y_test)  # mean accuracy on the test set
# Store the result as `cm` so the imported confusion_matrix function is not overwritten
cm = confusion_matrix(y_test, y_pred)

  y = column_or_1d(y, warn=True)


In [74]:
print(cm)  # confusion matrix
print(f'train score :{train_score}')
print(f'test score :{test_score}')

[[131  15]
 [ 41  44]]
train score :0.7783985102420856
test score :0.7575757575757576


### Confusion Matrix
|            | Predicted NO | Predicted YES |
|------------|--------------|---------------|
| Actual NO  | TN           | FP            |
| Actual YES | FN           | TP            |
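
The layout above matches what scikit-learn returns for binary labels 0 and 1: rows are actual classes, columns are predicted classes. A minimal sketch with made-up labels (`y_true_demo` and `y_pred_demo` are illustrative, not the notebook's data) shows how to unpack the four cells:

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only.
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 1, 0, 1, 0, 1, 1, 0]

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)

# ravel() flattens row by row, giving TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm_demo.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```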

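### Interpreting the Metrics
- Accuracy: the proportion of samples that are classified correctly.
    -   Accuracy = (TP + TN) / (TP + FP + TN + FN)
- Recall (sensitivity): how well the model detects the positive class; the higher it is, the more of the true positives are found.
    -   Recall = TP / (TP + FN)
- Precision: how reliable the model's positive predictions are; the higher it is, the fewer false positives occur.
    -   Precision = TP / (TP + FP)
- F1 Score: the harmonic mean of precision and recall, which balances the two in a single summary of model performance; 1 is the best possible value and 0 the worst.
    -   F1 = 2 × Precision × Recall / (Precision + Recall)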
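
As a worked example, the formulas can be applied by hand to the confusion matrix printed earlier (TN = 131, FP = 15, FN = 41, TP = 44). This is a plain-Python sketch; F1 is computed with the standard harmonic-mean definition, 2 × Precision × Recall / (Precision + Recall).

```python
# Counts taken from the confusion matrix printed earlier in the notebook.
tn, fp, fn, tp = 131, 15, 41, 44

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy : {accuracy:.6f}")
print(f"Recall   : {recall:.6f}")
print(f"Precision: {precision:.6f}")
print(f"F1 Score : {f1:.6f}")
```

The recall of about 0.52 means the model misses roughly half of the actual positive cases (41 of 85), even though overall accuracy is about 0.76.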

In [75]:
accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
score = {
    "Metric": ["Accuracy", "Recall", "Precision", "F1 Score"],
    "Value": [accuracy, recall, precision, f1]
}
metrics_df = pd.DataFrame(score)  # separate name so the dataset in `df` is not overwritten
metrics_df

,Metric,Value
0,Accuracy,0.757576
1,Recall,0.517647
2,Precision,0.745763
3,F1 Score,0.611111
