# Module imports

1. Dataset processing and generation
2. Model definition
3. Training module (k-fold cross-validation)
4. Model evaluation

In [1]:
from sklearn.model_selection import StratifiedKFold, cross_val_score

from data_process import data_full
from models import model
from metrics import evaluate_model

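# Hyperparameters passed to the model constructor
params = {
    'C': 1.0,
    'kernel': 'rbf',
    'gamma': 'auto'
}

# Note: this rebinds the name `model`, shadowing the callable imported above
model = model(params)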

# Training

1. Load the data into memory
2. Run k-fold cross-validation
3. Train on each fold's training subset

In [4]:
X_train, X_test, y_train, y_test = data_full()
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')

Loading baseball data: 100%|██████████| 995/995 [00:00<00:00, 1457.15it/s]
Loading hockey data: 100%|██████████| 996/996 [00:00<00:00, 1077.39it/s]
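
To make the splitting step concrete, here is a minimal standard-library sketch of stratified k-fold index generation. It is a simplified illustration, not scikit-learn's `StratifiedKFold` algorithm: the helper names `stratified_folds` and `train_validation_splits` and the toy labels are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=42):
    """Assign each sample index to one of n_splits folds, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so every fold receives
        # a near-equal share of this class.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def train_validation_splits(labels, n_splits=5, seed=42):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = stratified_folds(labels, n_splits, seed)
    for k, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield sorted(train), sorted(validation)


# Toy labels: 10 samples per class (hypothetical, not the baseball/hockey data).
toy_labels = ["baseball"] * 10 + ["hockey"] * 10
splits = list(train_validation_splits(toy_labels))
for train, validation in splits:
    print(len(train), len(validation))
```

Each sample lands in exactly one validation fold, and every fold keeps roughly the class ratio of the full label list, which is the property that stratification is meant to guarantee.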



# Per-fold accuracy and mean accuracy

In [5]:
cv_scores

array([0.4984326 , 0.4984326 , 0.80188679, 0.82389937, 0.76415094])

In [1]:
sum([0.4984326 , 0.4984326 , 0.80188679, 0.82389937, 0.76415094]) / 5

0.6773604599999999
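
The mean alone hides how far the folds disagree, so a small standard-library sketch that reports the spread as well may be useful. The variable names are chosen for this example, and the scores are copied from the `cv_scores` output above.

```python
from statistics import mean, stdev

# Per-fold accuracies copied from the cv_scores output above.
fold_scores = [0.4984326, 0.4984326, 0.80188679, 0.82389937, 0.76415094]

mean_accuracy = mean(fold_scores)
# Sample standard deviation across folds; a large value means the
# folds disagree strongly with one another.
accuracy_spread = stdev(fold_scores)

print(f"mean accuracy: {mean_accuracy:.4f}")
print(f"std across folds: {accuracy_spread:.4f}")
```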

# Retrain on the full training set and evaluate on the test set

- Accuracy: 81.70%
- Precision: 86.61%
- Recall: 81.70%
- F1-score: 81.08%

In [6]:
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
metrics_result = evaluate_model(y_test, y_pred)
print(metrics_result)

{'accuracy': 0.8170426065162907, 'precision': 0.8661451422674332, 'recall': 0.8170426065162907, 'f1_score': 0.8107989836684859}
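
The reported recall matches the accuracy to every digit. Support-weighted recall always equals accuracy, so this suggests, though it does not confirm, that `evaluate_model` uses weighted averaging. The sketch below shows that identity with a hypothetical `weighted_metrics` helper and toy labels; it is not the notebook's `evaluate_model`.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    total = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predicted samples per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n_true
        f = 2 * p * r / (p + r) if p + r else 0.0
        # Weight each class by its share of the true labels.
        weight = n_true / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(true_pos.values()) / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1_score": f1}


# Toy labels (hypothetical, not the notebook's test set).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
result = weighted_metrics(y_true, y_pred)
print(result)
```

The identity holds because each class's recall is multiplied by that class's share of the samples, so the weighted sum reduces to the number of correct predictions divided by the total.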
