# Cross-Validation, Evaluation Metrics, Overfitting, Generalization, and Regularization

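## 1. Introduction
Cross-validation is a technique for estimating a model's generalization performance, that is, how well it performs on unseen data.
Common evaluation metrics for classification include accuracy, precision, recall, and the F1 score.
Overfitting occurs when a model becomes too complex relative to its training data and starts fitting noise, which lowers generalization performance.
Regularization counters overfitting by penalizing large model weights.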
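
As a rough sketch of how such a penalty enters training (the symbols $L$, $w$, and $\lambda$ are notation introduced here for illustration), a regularized model minimizes the training loss $L(w)$ plus a penalty on the weight vector $w$:

$$
J_{\mathrm{L2}}(w) = L(w) + \lambda \lVert w \rVert_2^2,
\qquad
J_{\mathrm{L1}}(w) = L(w) + \lambda \lVert w \rVert_1
$$

A larger $\lambda$ shrinks the weights more strongly, and the L1 penalty can set some weights exactly to zero. scikit-learn's `LogisticRegression` controls this through `C`, the inverse of the regularization strength, so a smaller `C` means a stronger penalty.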

## 2. Importing the Required Libraries
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.preprocessing import StandardScaler
```

## 3. Preparing the Dataset
```python
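# Iris: 150 samples, 4 numeric features, 3 classes.
iris = load_iris()
X = iris.data
y = iris.target
# Hold out 30% of the samples as a test set; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)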
```
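
Iris has 50 samples per class, but a plain random split need not preserve that balance in the test set. As a small sketch (the `stratify=y` option is an addition here, not part of the split used in this notebook), the class counts of the two kinds of split can be compared like this:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Plain random split: class counts in the test set depend on the shuffle.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.3, random_state=42)

# Stratified split (assumption for this sketch): each class keeps its share.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

plain_counts = np.bincount(y_test_plain)
strat_counts = np.bincount(y_test_strat)
print("Test class counts (plain):     ", plain_counts)
print("Test class counts (stratified):", strat_counts)
```

Stratification matters more for small or imbalanced datasets, where an unlucky split can leave a class underrepresented in the test set.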

## 4. Standardization (Preprocessing)
```python
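scaler = StandardScaler()
# Fit on the training split only, so no test-set statistics leak into preprocessing.
X_train_scaled = scaler.fit_transform(X_train)
# Transform the test split with the mean and standard deviation learned from training data.
X_test_scaled = scaler.transform(X_test)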
```
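
A quick check, written as a self-contained sketch, shows what the scaler does: the training features end up with mean 0 and unit standard deviation, while the test features, transformed with the training statistics, are only approximately centered.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from the training split
X_test_scaled = scaler.transform(X_test)        # reuses those training statistics

train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
test_means = X_test_scaled.mean(axis=0)

print("Train means:", np.round(train_means, 6))
print("Train stds: ", np.round(train_stds, 6))
print("Test means: ", np.round(test_means, 3))
```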

## 5. Training a Logistic Regression Model
```python
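# By default, LogisticRegression already applies an L2 penalty with C=1.0.
model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)
# Predict class labels for the held-out test set.
y_pred = model.predict(X_test_scaled)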
```

## 6. Model Evaluation Metrics
```python
print("Accuracy:", accuracy_score(y_test, y_pred))
# average='macro' computes the metric per class, then takes the unweighted mean.
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall:", recall_score(y_test, y_pred, average='macro'))
print("F1 Score:", f1_score(y_test, y_pred, average='macro'))
# Rows are true classes, columns are predicted classes.
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
```
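
To make the macro-averaged metrics less of a black box, the following sketch recomputes them from the confusion matrix and compares them with scikit-learn's values. It repeats the split and model from the earlier sections so that it runs on its own, and it assumes every class is predicted at least once (otherwise the per-class precision would divide by zero).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=200).fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))

cm = confusion_matrix(y_test, y_pred)       # rows: true class, columns: predicted class
tp = np.diag(cm).astype(float)              # correct predictions per class
precision_per_class = tp / cm.sum(axis=0)   # TP / (TP + FP)
recall_per_class = tp / cm.sum(axis=1)      # TP / (TP + FN)
f1_per_class = (2 * precision_per_class * recall_per_class
                / (precision_per_class + recall_per_class))

# Macro averaging: unweighted mean over the classes.
macro_precision = precision_per_class.mean()
macro_recall = recall_per_class.mean()
macro_f1 = f1_per_class.mean()

print("Macro precision:", macro_precision, precision_score(y_test, y_pred, average="macro"))
print("Macro recall:   ", macro_recall, recall_score(y_test, y_pred, average="macro"))
print("Macro F1:       ", macro_f1, f1_score(y_test, y_pred, average="macro"))
```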

## 7. Cross-Validation
```python
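from sklearn.pipeline import make_pipeline
# Scale inside each fold: the scaler is fit on that fold's training part only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring='accuracy')
print("Cross-validation scores:", scores)
print("Mean CV score:", np.mean(scores))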
```
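
`cross_val_score` runs one fit-and-score cycle per fold. Written out by hand, with the scaler refit on each fold's training part (an addition in this sketch), the loop might look like the following. The comparison at the end uses a `make_pipeline` wrapper, which is one way to get the same per-fold scaling from `cross_val_score`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, val_idx in cv.split(X):
    # Fit the scaler and the model on the training part of this fold only.
    fold_scaler = StandardScaler().fit(X[train_idx])
    fold_model = LogisticRegression(max_iter=200)
    fold_model.fit(fold_scaler.transform(X[train_idx]), y[train_idx])
    # Score on the held-out part, scaled with the training statistics.
    fold_scores.append(fold_model.score(fold_scaler.transform(X[val_idx]), y[val_idx]))

# The same procedure expressed with a pipeline and cross_val_score.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipeline_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

print("Manual fold scores:  ", np.round(fold_scores, 4))
print("Pipeline fold scores:", np.round(pipeline_scores, 4))
```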

## 8. Checking for Overfitting and Generalization
```python
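# model.score returns accuracy; a train score far above the test score suggests overfitting.
train_score = model.score(X_train_scaled, y_train)
test_score = model.score(X_test_scaled, y_test)
print("Train score:", train_score)
print("Test score:", test_score)
print("Gap (train - test):", train_score - test_score)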
```
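
A single train/test split gives only one estimate of the gap, and on a dataset this small it can be noisy. One possible refinement, sketched here with `cross_validate` and `return_train_score=True`, averages the training and validation scores over several folds. The pipeline and fold settings are choices made for this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# return_train_score=True also records accuracy on each fold's training part.
results = cross_validate(
    pipeline, X, y, cv=cv, scoring="accuracy", return_train_score=True
)

mean_train = results["train_score"].mean()
mean_val = results["test_score"].mean()  # "test_score" here means the validation folds
gap = mean_train - mean_val

print("Mean train accuracy:     ", mean_train)
print("Mean validation accuracy:", mean_val)
print("Gap (train - validation):", gap)
```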

## 9. Effect of Regularization
```python
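# Both fits use the default C=1.0 (C is the inverse regularization strength).
# saga supports L1 and L2 for multiclass targets; liblinear handles multiclass only
# one-vs-rest, which recent scikit-learn releases warn about.
model_l1 = LogisticRegression(penalty='l1', solver='saga', max_iter=5000, random_state=42)
model_l1.fit(X_train_scaled, y_train)
print("L1 regularization test score:", model_l1.score(X_test_scaled, y_test))

model_l2 = LogisticRegression(penalty='l2', solver='saga', max_iter=5000, random_state=42)
model_l2.fit(X_train_scaled, y_train)
print("L2 regularization test score:", model_l2.score(X_test_scaled, y_test))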
```
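
The two fits above differ in penalty type but not in penalty strength. To see the strength at work, the sketch below sweeps `C` for an L1-penalized model and counts how many coefficients remain nonzero. The `saga` solver, the grid of `C` values, and `random_state=0` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

c_values = [0.01, 0.1, 1.0, 10.0]  # illustrative grid; smaller C = stronger penalty
nonzero_counts = {}
test_scores = {}
for c in c_values:
    clf = LogisticRegression(
        penalty="l1", solver="saga", C=c, max_iter=5000, random_state=0
    )
    clf.fit(X_train_scaled, y_train)
    # coef_ has one row per class and one column per feature.
    nonzero_counts[c] = int(np.count_nonzero(clf.coef_))
    test_scores[c] = clf.score(X_test_scaled, y_test)
    print(f"C={c}: nonzero coefficients={nonzero_counts[c]}, test accuracy={test_scores[c]:.3f}")
```

With a very small `C` the penalty dominates and the model can collapse toward predicting a single class, which is underfitting rather than overfitting. In practice `C` is usually chosen by cross-validation.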

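## 10. Summary
- Cross-validation estimates generalization performance
- A large gap between training and test scores signals overfitting
- Regularization helps curb overfitting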