Column ID Disappears #16
Comments
I have updated some arguments to keep or skip columns in the modeling process; see the code below. You can update to the latest version of the package from GitHub.
library(scorecard)
library(data.table)
# data ------
data("germancredit")
dat = setDT(germancredit)[,rowid := .I] # add a rowid column
dt_f = var_filter(dat, y="creditability", var_kp = 'rowid')
dt_list = split_df(dt_f, y="creditability", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$creditability)
# woe binning ------
bins = woebin(dt_f, y="creditability", var_skip = 'rowid')
dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins))
# glm ------
m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)
m_step = step(m1, direction="both", trace = FALSE)
m2 = eval(m_step$call)
# score ------
card = scorecard(bins, m2)
score_list = lapply(dt_list, function(x) scorecard_ply(x, card, var_kp = 'rowid'))
perf_psi(score = score_list, label = label_list, var_skip = 'rowid')
It could be easier to flag our own ID (e.g. person_id in my data) from our data,
There is also a problem with this step:
bins = woebin(dt_f, y="creditability", var_skip = 'rowid')
Error in checkForRemoteErrors(val) :
The issue should be solved now. Try restarting your R session and running the code again.
The following part of the code does not work and gives the error below:
m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)
Error in terms.formula(formula, data = data) :
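The error message above is cut off, so its actual cause cannot be read from this thread. As a general point about R formulas, though, `creditability ~ .` uses every other column in the data as a predictor, which would include an ID column if one is still present. A minimal sketch with toy data (the names `train`, `x1`, `x2`, and `y` are made up for illustration and are not from the package) shows one way to keep the ID in the data but out of the model:

```r
set.seed(1)

# Toy training data with an ID column left in place (hypothetical names).
train <- data.frame(
  rowid = 1:200,
  x1    = rnorm(200),
  x2    = rnorm(200)
)
train$y <- rbinom(200, 1, plogis(0.8 * train$x1 - 0.5 * train$x2))

# `y ~ .` uses every other column, including rowid, as a predictor.
m_all <- glm(y ~ ., family = binomial(), data = train)

# Subtracting the ID in the formula keeps it in the data but out of the model.
m_noid <- glm(y ~ . - rowid, family = binomial(), data = train)

print(names(coef(m_all)))
print(names(coef(m_noid)))
```

Whether this applies here depends on whether the ID column is still present in dt_woe_list$train after woebin_ply, which this thread does not confirm.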
Our model data has an ID column (e.g. PERSON_ID), the main distinct ID for which we need the scores.
The package drops the ID column.
How can we identify it in the code? How does the code know which column is the ID?
(It rejects that column at the beginning, assuming it is a feature.)
I could not see anywhere in the code to specify the ID column.
The ID disappears after this call:
dt_sel = var_filter(germancredit, "creditability")
As a result, we do not know which score belongs to which PERSON_ID
(it just gives rows and scores...).
I hope my question is clear :)
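One generic workaround for the situation described above is to set the ID aside before scoring and reattach it afterwards by row position. The sketch below uses toy data and a made-up stand-in, `toy_score`, in place of the package's scoring step; it assumes the scoring step returns exactly one row per input row in the same order, which is not confirmed here for scorecard_ply.

```r
# Toy data: PERSON_ID is an identifier, not a feature.
dat <- data.frame(
  PERSON_ID = c("p01", "p02", "p03", "p04"),
  age       = c(23, 45, 31, 52),
  income    = c(1800, 3200, 2500, 4100)
)

# Hypothetical stand-in for a scoring step that only sees the features
# and returns one score per input row, in the same order.
toy_score <- function(features) {
  data.frame(score = round(300 + 2 * features$age + 0.05 * features$income))
}

# Score the features only, so the ID never enters the model.
features <- dat[, setdiff(names(dat), "PERSON_ID"), drop = FALSE]
scores   <- toy_score(features)

# Reattach the ID by position. This is only valid under the assumption
# that the scoring step neither drops nor reorders rows.
stopifnot(nrow(scores) == nrow(dat))
scored <- cbind(PERSON_ID = dat$PERSON_ID, scores)
print(scored)
```

If rows could be dropped or reordered along the way, carrying a key column through the pipeline and merging on it would be the safer choice.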
Maybe the final scorecard code should include the ID column,
or the var_filter call could take an ID column, like:
var_filter(germancredit, "creditability", "person_id")
# credit score, only_total_score = FALSE
score_list2 = lapply(dt_list, function(x) scorecard_ply(x, card, only_total_score = FALSE))
Thanks for that great work!