Monotonic-WOE-Binning-Algorithm #7

Open
Leo-Lee15 opened this issue Sep 15, 2018 · 22 comments

@Leo-Lee15

Hello,

I just discovered a GitHub repo, jstephenj14/Monotonic-WOE-Binning-Algorithm, which provides a Python implementation of a variable binning algorithm that optimizes information value (IV) monotonicity and representativeness.

I think it would be great to include this algorithm in your fantastic scorecard package. Since the author provides a Python version, I wonder if it could be incorporated into your scorecard R package.

Thanks!

@ShichenXie
Owner

Thank you for your suggestion. I will read the repo and the referenced article. If it is reasonable, I will add it into the package. This might take some time.

In my experience, some variables are not monotonic after WOE binning. For example, the default rate by hour of day always peaks at midnight and in the afternoon.
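
To make the point about non-monotonic variables concrete, here is a minimal Python sketch (not code from the scorecard package) that computes WOE and IV per bin and checks whether the WOE sequence is monotonic. The bin counts are made up for illustration, the helper names `woe_iv` and `is_monotonic` are hypothetical, and the sign convention `ln(bad share / good share)` is an assumption; some sources use the inverse ratio.

```python
import math


def woe_iv(bins):
    """Compute WOE per bin and total IV.

    `bins` is a list of (good_count, bad_count) tuples in bin order.
    Assumes every bin has at least one good and one bad observation,
    otherwise the logarithm is undefined.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes = []
    iv = 0.0
    for good, bad in bins:
        good_share = good / total_good
        bad_share = bad / total_bad
        # Assumed convention: WOE = ln(bad share / good share).
        woe = math.log(bad_share / good_share)
        woes.append(woe)
        iv += (bad_share - good_share) * woe
    return woes, iv


def is_monotonic(values):
    """True if the sequence is entirely non-decreasing or non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Hypothetical (good, bad) counts for four time-of-day bins:
# night, morning, afternoon, evening. Bad rate peaks at night and afternoon.
hour_bins = [(80, 20), (95, 5), (85, 15), (97, 3)]
hour_woes, hour_iv = woe_iv(hour_bins)
print([round(w, 2) for w in hour_woes], round(hour_iv, 3))
print("monotonic:", is_monotonic(hour_woes))
```

With counts like these the WOE values alternate in sign from bin to bin, so no ordering of the cut points alone makes them monotonic; the variable still carries information, which is why forcing monotonicity is not always desirable.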

@Leo-Lee15
Author

Yes, it is too difficult to get a monotonic result for some variables. But at least this algorithm provides a less troublesome way to achieve the desired result.

Anyway, thanks for your work on this nice package!

@monicamn

I used 'woebin' to bin the variables in my data and got the error 'You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat'. I compared my variable types with those in your data, and your data also contains 'object' columns. Why does your data run without errors while mine fails? Do you have any suggestions?

@ShichenXie
Owner

Are you using the Python version of the package? If so, please open an issue in the scorecardpy repo and provide a reproducible example.

@monicamn

monicamn commented Nov 27, 2018

Thank you for your answer. I use Python 3.7 to run the code. The error is as follows:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 4, in
    positive="bad|1", no_cores=None, print_step=1, method="tree")
  File "C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy-0.1.7.1-py3.7.egg\scorecardpy\woebin.py", line 877, in woebin
    bins = dict(zip(xs, pool.starmap(woebin2, args)))
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat

@ShichenXie
Owner

Please open a new issue in the scorecardpy project and provide a reproducible example; otherwise I have no way of knowing what problem you ran into.

@6yuan789

Hi ShichenXie,
In woebin.py the variable binning_tree is not initialized, which sometimes raises an error. Adding "binning_tree = None" solves the problem.
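
The thread does not show the relevant part of woebin.py, so the following is only a hypothetical Python illustration of this class of bug, not scorecardpy's actual code. It assumes the variable is assigned only inside some branches; the function names and the `method` values are invented for the example.

```python
def pick_binning_buggy(method):
    """Assigns `result` only in one branch, so other inputs fail."""
    if method == "tree":
        result = "tree bins"
    # For any other `method`, `result` was never assigned and this
    # line raises UnboundLocalError.
    return result


def pick_binning_fixed(method):
    """Same logic, but with a default assigned before the branches."""
    result = None  # analogous to adding `binning_tree = None`
    if method == "tree":
        result = "tree bins"
    return result


try:
    pick_binning_buggy("other")
except UnboundLocalError as exc:
    print("buggy version failed:", exc)

print("fixed version returned:", pick_binning_fixed("other"))
```

Initializing to `None` removes the crash, but callers then need to handle a `None` result, so it may be worth checking which input leaves the variable unassigned in the first place.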

@ShichenXie
Owner

I'll take a look at this problem.

@wgx711

wgx711 commented Mar 5, 2019

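Dear ShichenXie,
When I run the woebin function, an error pops up saying that there is no "data.table" function. Running library(data.table) afterwards still gives the same message, and I don't know why. See the screenshot below:
[image: screenshot of the error]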

@ShichenXie
Owner

Restart R and try again. If you are on Windows, check whether Rtools is installed.

@wgx711

wgx711 commented Mar 5, 2019

May I ask what Rtools is?
I am indeed on 64-bit Windows 10, with 64-bit R and 64-bit RStudio installed; R is version 3.5.2.
Following the woebin help file, running bins2_tree = woebin(germancredit, y="creditability",x=c("credit.amount","housing"), method="tree") works correctly, but running bins_germ = woebin(germancredit, y = "creditability") reports that there is no "data.table" function.

@ShichenXie
Owner

ShichenXie commented Mar 5, 2019

Look at the Download R for Windows page on the CRAN website; it is the fourth item listed there.

@wgx711

wgx711 commented Mar 5, 2019

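Thanks, I'll look into it. The current situation, though, is that everything from bins2_tree = woebin(germancredit, y="creditability",x=c("credit.amount","housing"), method="tree") through bins_width = woebin(germancredit, y="creditability", x=numeric_cols, method="width") runs normally; only bins_germ = woebin(germancredit, y = "creditability") raises that error.

Do you suggest that I first uninstall the scorecard package and then install the latest version from GitHub?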

@wgx711

wgx711 commented Mar 5, 2019

On another computer, where R is version 3.4.2, all the functions run normally. I don't know what the cause is.

@ShichenXie
Owner

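  1. If you have never installed Rtools, that is the cause; installing it will solve the problem.
  2. If R was upgraded from a version before 3.5 to the current 3.5.2, all packages need to be reinstalled.
    If neither of these solves it, I am out of ideas.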

@ShichenXie
Owner

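To everyone arriving later: please do not post further questions in this issue. If you have a problem, open a new issue. This issue has not been closed only because the original request is still unresolved.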

@ddzr

ddzr commented Jul 19, 2019

This GitHub repo by Wensui Liu also has some MonotonicBinning implementations in R.

@longhua8800w

I also hope the scorecard package will add monotonic binning as a binning option.

@shlid007

If woebin doesn't return monotonic bins, does that compromise the interpretability of the WOE/IV values? Is it up to the user to rebin?

@Blanket58

When will the monotonic binning feature be added?

@ShichenXie
Owner

> If woebin doesn't return monotonic bins, does that compromise the interpretability of the WOE/IV values? Is it up to the user to rebin?

The woebin_adj function provides an interface to adjust the binning results manually.
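
For readers wondering what a manual adjustment could amount to in practice, here is a minimal Python sketch of one simple strategy: merge adjacent bins until the bad rate is monotonic. This is not how woebin_adj works internally and not the algorithm from the repo linked at the top of this issue; the function names and counts are hypothetical. It relies on the fact that, for a fixed dataset, a bin's WOE is a monotone function of its bad rate, so a monotonic bad rate implies monotonic WOE.

```python
def bad_rate(bin_counts):
    """Bad rate of a (good_count, bad_count) bin; assumes a non-empty bin."""
    good, bad = bin_counts
    return bad / (good + bad)


def merge_until_monotonic(bins, increasing=True):
    """Greedily merge adjacent bins that violate the requested direction.

    `bins` is a list of (good_count, bad_count) tuples in bin order.
    Returns a new list whose bad rates are non-decreasing when
    `increasing` is True, or non-increasing otherwise.
    """
    bins = list(bins)
    i = 0
    while i < len(bins) - 1:
        left = bad_rate(bins[i])
        right = bad_rate(bins[i + 1])
        violates = left > right if increasing else left < right
        if violates:
            merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
            bins[i:i + 2] = [merged]
            # The merged bin may now conflict with its left neighbour.
            i = max(i - 1, 0)
        else:
            i += 1
    return bins


# Hypothetical counts: bad rates 0.10, 0.30, 0.20, 0.40.
original = [(90, 10), (70, 30), (80, 20), (60, 40)]
adjusted = merge_until_monotonic(original, increasing=True)
print(adjusted)
print([round(bad_rate(b), 2) for b in adjusted])
```

Merging always trades resolution for monotonicity, so the resulting bins are worth checking for IV loss and for whether the merged ranges still make business sense.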

@ShichenXie
Owner

> When will the monotonic binning feature be added?

I'll look into it further when I get a chance.

9 participants