Column ID Disappears #16

islander22 · 2019-01-29T13:15:40Z

In our model data we have an ID column
(e.g. PERSON_ID), the main distinct ID that we need the scores for.

The package drops the ID column.
How can we identify it in the code? How does the code know which column is the ID?
(It removes that column at the beginning, treating it as a feature.)

I could not find anywhere in the code to specify the ID column.
The ID disappears after this code:
dt_sel = var_filter(germancredit, "creditability")

So we cannot tell which score belongs to which PERSON_ID
(it just returns rows and scores...).

I hope my question is clear :)

Maybe the final scorecard code should include the ID column,
or var_filter could accept an ID column, like:
var_filter(germancredit, "creditability", "person_id")

For the credit score, with only_total_score = FALSE:
score_list2 = lapply(dt_list, function(x) scorecard_ply(x, card, only_total_score = FALSE))

Thanks for that great work!

ShichenXie · 2019-01-29T15:30:46Z

I have updated some arguments to keep or skip columns in the modeling process; see the code below. You can update to the latest version of the package from GitHub via devtools::install_github('shichenxie/scorecard').

library(scorecard)
library(data.table)

# data ------
data("germancredit")
dat = setDT(germancredit)[,rowid := .I] # add a rowid column
dt_f = var_filter(dat, y="creditability", var_kp = 'rowid') 
dt_list = split_df(dt_f, y="creditability", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$creditability)

# woe binning ------
bins = woebin(dt_f, y="creditability", var_skip = 'rowid')
dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins))

# glm ------
m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)
m_step = step(m1, direction="both", trace = FALSE)
m2 = eval(m_step$call)

# score ------
card = scorecard(bins, m2)
score_list = lapply(dt_list, function(x) scorecard_ply(x, card, var_kp = 'rowid'))

perf_psi(score = score_list, label = label_list, var_skip = 'rowid')

islander22 · 2019-01-29T16:16:18Z

I think it would be easier to flag our own ID (e.g. person_id in my data)
than to create a new ROW_ID column,
because at the end we will have to join it back to our original IDs anyway. Also, in the intermediate steps,
to keep the ROW_IDs we have to export the unsplit (train, test) version and then join it with the final table...
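A minimal base-R sketch of the join described above, assuming a hypothetical `person_id` column in the original data and a score table keyed by `rowid` (the kind of output you get when `rowid` is kept through scoring). The IDs and score values are made-up placeholders, not real scorecard output; the point is only that a shared `rowid` key lets you map scores back to your own IDs even after the rows have been split and reordered.

```r
# Hypothetical original data: one row per person, identified by person_id
orig <- data.frame(
  person_id = c("P001", "P002", "P003", "P004"),
  age = c(35, 52, 28, 41),
  stringsAsFactors = FALSE
)
# Add a row index before modelling, mirroring `rowid := .I` in the thread
orig$rowid <- seq_len(nrow(orig))

# Hypothetical score table for one split (e.g. test), rows in shuffled order;
# placeholder numbers, NOT real scorecard output
scores <- data.frame(rowid = c(4L, 2L), score = c(512, 468))

# Map each score back to its person_id through the shared rowid key
id_map <- orig[, c("rowid", "person_id")]
scored <- merge(scores, id_map, by = "rowid", sort = TRUE)
print(scored)
```

Because `merge()` matches on the key rather than on row position, the result stays correct regardless of how the train/test split reordered the rows.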

islander22 · 2019-01-29T16:38:27Z

bins = woebin(dt_f, y="creditability", var_skip = 'rowid')

This step also has a problem; it gives the following error:

Error in checkForRemoteErrors(val) :
75 nodes produced errors; first error: Error in data.table(y = dt[[y]], variable = x_i, value = dt[[x_i]]) :
"data.table" not found
In addition: Warning message:
In e$fun(obj, substitute(ex), parent.frame(), e$data) :
already exporting variable(s): dt, xs, y, breaks_list, special_values, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, method

ShichenXie · 2019-01-29T23:48:21Z

The issue should now be fixed. Try restarting your R session and running the code again.

islander22 · 2019-02-09T10:40:06Z

The following part of the code does not work; it gives the error below:

m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)

Error in terms.formula(formula, data = data) :
duplicated name 'NA' in data frame using '.'
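The error means the data frame passed to glm() contains a column name that appears more than once (here 'NA'), and the `.` shorthand in the formula refuses to expand duplicated names. A minimal base-R sketch for spotting this before fitting; `check_names` is a hypothetical helper (not part of scorecard) and the toy data frame is made up, so it only reproduces the symptom, not the reason the WOE table ended up with such names.

```r
# Hypothetical helper (not part of scorecard): list column names that
# would break the `y ~ .` shorthand, i.e. missing (NA) or duplicated names
check_names <- function(df) {
  nms <- names(df)
  unique(nms[is.na(nms) | duplicated(nms)])
}

# Toy data frame reproducing the problem: two columns both named "NA"
df <- data.frame(y = c(0, 1, 0, 1), a = 1:4, b = c(2, 7, 1, 8))
names(df) <- c("y", "NA", "NA")
print(check_names(df))  # [1] "NA"

# Fitting with `.` fails with the same message as in the thread
res <- tryCatch(
  glm(y ~ ., family = binomial(), data = df),
  error = function(e) conditionMessage(e)
)
print(res)

# Making the names unique removes the duplicate; this is only a
# diagnostic step, the real fix is upstream of glm()
df_fixed <- df
names(df_fixed) <- make.unique(names(df_fixed))
print(check_names(df_fixed))  # character(0)
```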

ShichenXie · 2019-02-10T16:32:57Z

It works in my local environment. Make sure you have installed version 0.2.3 of the package, which was uploaded to CRAN today and can be installed via install.packages('scorecard').

ShichenXie closed this as completed Jan 30, 2019

ShichenXie mentioned this issue Feb 11, 2019

Error in Code #17

Closed
