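This small bug is probably one that few people run into. When building a ScoreCard, most users keep the defaults pdo=60, rate=2, base_odds=35, base_score=750 and never change them. These values play a large role in computing the score of each bin.
ScoreCard.export saves the bin scores but does not save pdo, rate, base_odds, or base_score at all. ScoreCard().load() then restores the bin scores, but self.factor and self.offset on the new ScoreCard() are still computed from the default values. As a result, predictions before export and after load are inconsistent.
The reproduction below uses code from the pipeline branch.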
import os
import numpy as np
import pytest
import pandas as pd
from os.path import join
from toad.pipeline import Toad_Pipeline
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

lb = load_breast_cancer()
X = pd.DataFrame(lb['data'], columns=lb['feature_names'])
Y = lb['target']
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.3, random_state=42)

# Note: `params` (the pipeline settings) is not defined in this report
toad_pipe = Toad_Pipeline()
toad_pipe = toad_pipe.set_params(**params)
toad_pipe = toad_pipe.fit(Xtrain, Ytrain)
Xtrain_ = toad_pipe.transform(Xtrain)

from toad import ScoreCard

# Build a card with non-default scaling parameters
card1 = ScoreCard(
    combiner=toad_pipe.combiner,
    transer=toad_pipe.woe,
    base_score=1000,
    pdo=10,
    rate=5,
    base_odds=50
)
card1 = card1.fit(Xtrain_, Ytrain)
card1_result = card1.export()
print(card1_result['mean texture'])
print(card1.predict(Xtrain)[:5])
[910.61408963 891.17084246 960.33222975 900.2856913 932.18954537]
# Load the exported ScoreCard
card2 = ScoreCard().load(card1_result)
print(card2.predict(Xtrain_)[:5])
[910.63 891.17 960.33 900.3 932.21]
# self.offset and self.factor clearly differ between the two cards,
# so score_to_proba and proba_to_score give very different results
card1.score_to_proba(970), card2.score_to_proba(970)
(0.7142857142857154, 0.002244808514989028)
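The two probabilities above can be reproduced without toad. This is a minimal sketch under assumed formulas: factor = pdo / ln(rate), offset = base_score - factor * ln(base_odds), and proba = 1 / (1 + exp((score - offset) / factor)). The formulas are an assumption about what ScoreCard computes internally, chosen because they match both reported values. The helper names `scaling` and `score_to_proba` are illustrative only.

```python
import math

def scaling(pdo, rate, base_odds, base_score):
    """Return (factor, offset) under the assumed scorecard scaling."""
    factor = pdo / math.log(rate)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def score_to_proba(score, factor, offset):
    """Map a score to a probability under the assumed logistic relation."""
    return 1 / (1 + math.exp((score - offset) / factor))

# Parameters card1 was built with
custom = scaling(pdo=10, rate=5, base_odds=50, base_score=1000)
# Defaults that ScoreCard() falls back to when nothing is passed
default = scaling(pdo=60, rate=2, base_odds=35, base_score=750)

p_custom = score_to_proba(970, *custom)
p_default = score_to_proba(970, *default)
print(p_custom, p_default)
```

Under these assumptions the same score of 970 maps to about 0.714 with the custom parameters and about 0.0022 with the defaults, which is the gap seen between card1 and card2.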
Proposed solution
# Proposed export format: store the card parameters alongside the bin scores
{
    'card_params': {
        'pdo': 60,
        'rate': 2,
        'base_odds': 35,
        'base_score': 750
    },
    'bins_scores': {
        ...
    }
}
# Proposed load: rebuild the card from the saved parameters, then load the bins
@classmethod
def load(cls, json_dict):
    params = json_dict['card_params']
    bins = json_dict['bins_scores']
    return cls(**params)._load(bins)
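A rough sketch of how the proposed round trip might behave, including a fallback for dictionaries exported in the older format. `MiniCard` is a hypothetical stand-in, not toad's ScoreCard. The factor/offset formulas and the assumption that an old export is a plain feature-to-bin-scores mapping are mine, and the bin score in the usage lines is a made-up placeholder.

```python
import math

# Defaults quoted in the report
DEFAULTS = {"pdo": 60, "rate": 2, "base_odds": 35, "base_score": 750}

class MiniCard:
    """Hypothetical stand-in for ScoreCard; just enough to show the round trip."""

    def __init__(self, pdo=60, rate=2, base_odds=35, base_score=750):
        self.params = {"pdo": pdo, "rate": rate,
                       "base_odds": base_odds, "base_score": base_score}
        # Assumed scaling formulas (not taken from the toad source)
        self.factor = pdo / math.log(rate)
        self.offset = base_score - self.factor * math.log(base_odds)
        self.bins_scores = {}

    def export(self):
        # New format: parameters travel with the bin scores
        return {"card_params": dict(self.params),
                "bins_scores": dict(self.bins_scores)}

    @classmethod
    def load(cls, json_dict):
        if "card_params" in json_dict and "bins_scores" in json_dict:
            params = json_dict["card_params"]
            bins = json_dict["bins_scores"]
        else:
            # Assumed old format: the dict itself maps feature -> bin scores
            params, bins = DEFAULTS, json_dict
        card = cls(**params)
        card.bins_scores = dict(bins)
        return card

# Usage with a placeholder bin score
card1 = MiniCard(pdo=10, rate=5, base_odds=50, base_score=1000)
card1.bins_scores = {"mean texture": {"bin_0": 12.5}}
card2 = MiniCard.load(card1.export())
legacy = MiniCard.load({"mean texture": {"bin_0": 12.5}})
print(card2.factor == card1.factor, legacy.params == DEFAULTS)
```

Detecting the format by key presence would misfire if a feature were literally named `card_params` or `bins_scores`; an explicit version field in the export would avoid that.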
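@FrankDataAnalystPython Good suggestion. I have been thinking about changing this part for a while, but I held off because changing the exported JSON format would cause compatibility problems between versions. I will look into designing a new JSON export format as an upgrade, and also think about how to handle compatibility.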
Agreed, agreed.