Column ID Disappears #16

Closed
islander22 opened this issue Jan 29, 2019 · 6 comments

Comments

islander22 commented Jan 29, 2019

Our model data has an ID column (e.g. PERSON_ID), the distinct identifier for which we need the scores.

The package drops the ID column. How can we tell the code which column is the ID? It seems to reject that column at the very beginning, assuming it is a feature.

I could not find anywhere in the code to specify the ID column. The ID disappears after this call:
dt_sel = var_filter(germancredit, "creditability")

As a result, we cannot tell which score belongs to which PERSON_ID; the output only gives rows and scores.

I hope my question is clear :)

Maybe the final scorecard code could include the ID column, or var_filter could accept an ID column argument, like:
var_filter(germancredit, "creditability", "person_id")

# credit score, only_total_score = FALSE
score_list2 = lapply(dt_list, function(x) scorecard_ply(x, card, only_total_score = FALSE))

Thanks for the great work!

ShichenXie (Owner) commented

I have updated some arguments to keep or skip columns in the modeling process; see the code below. You can update to the latest version of the package from GitHub via devtools::install_github('shichenxie/scorecard').

library(scorecard)
library(data.table)

# data ------
data("germancredit")
dat = setDT(germancredit)[,rowid := .I] # add a rowid column
dt_f = var_filter(dat, y="creditability", var_kp = 'rowid') 
dt_list = split_df(dt_f, y="creditability", ratio = 0.6, seed = 30)
label_list = lapply(dt_list, function(x) x$creditability)

# woe binning ------
bins = woebin(dt_f, y="creditability", var_skip = 'rowid')
dt_woe_list = lapply(dt_list, function(x) woebin_ply(x, bins))

# glm ------
m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)
m_step = step(m1, direction="both", trace = FALSE)
m2 = eval(m_step$call)

# score ------
card = scorecard(bins, m2)
score_list = lapply(dt_list, function(x) scorecard_ply(x, card, var_kp = 'rowid'))

perf_psi(score = score_list, label = label_list, var_skip = 'rowid')

islander22 commented Jan 29, 2019

I think it would be easier to flag our own ID column (e.g. person_id in my data) rather than creating a new ROW_ID column. Otherwise we have to join ROW_ID back to our original IDs at the end anyway. The same applies in the intermediate steps: to keep the ROW_IDs we have to export the unsplit (train and test) version and then join it with the final table.
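
The join described above can be sketched in base R. This is only an illustration, not something the package does for you: the `person_id` values are made up, and the `scores` data frame is a hypothetical stand-in for scored output that kept `rowid` (for example via `var_kp = 'rowid'`).

```r
# Hypothetical input: person_id is the user's own ID column (made-up values).
dat <- data.frame(person_id = c("P001", "P002", "P003", "P004"),
                  x = c(1.2, 0.4, 3.1, 2.2),
                  stringsAsFactors = FALSE)
dat$rowid <- seq_len(nrow(dat))   # surrogate key, same idea as rowid := .I

# Lookup table set aside before modeling.
id_map <- dat[, c("rowid", "person_id")]

# Stand-in for scored output that retained rowid, in shuffled order
# as it might look after a train/test split.
scores <- data.frame(rowid = c(3L, 1L, 4L, 2L),
                     score = c(512, 640, 455, 598))

# Attach the original ID to each score by joining on the surrogate key.
scored <- merge(scores, id_map, by = "rowid")
print(scored)
```

If `var_kp` accepts any column name, as the argument name suggests, passing the existing ID column directly (e.g. `var_kp = 'person_id'`) may make the surrogate key unnecessary; that is an assumption and is not verified here.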

islander22 (Author) commented

There is also a problem with this step:

bins = woebin(dt_f, y="creditability", var_skip = 'rowid')

Error in checkForRemoteErrors(val) :
75 nodes produced errors; first error: Error in data.table(y = dt[[y]], variable = x_i, value = dt[[x_i]]) :
"data.table" not found
In addition: Warning message:
In e$fun(obj, substitute(ex), parent.frame(), e$data) :
already exporting variable(s): dt, xs, y, breaks_list, special_values, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, method

ShichenXie (Owner) commented

The issue should be solved now. Try restarting your R session and running the code again.

islander22 (Author) commented

The following part of the code does not work; it gives the error below:

m1 = glm( creditability ~ ., family = binomial(), data = dt_woe_list$train)

Error in terms.formula(formula, data = data) :
duplicated name 'NA' in data frame using '.'
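
This error suggests that the data frame passed to glm has more than one column whose name is NA, so the `.` in the formula cannot be expanded. A small base-R check for that, as a sketch: `find_bad_names` is a hypothetical helper (not part of scorecard), and the name vector below is made up to stand in for `names(dt_woe_list$train)`.

```r
# Hypothetical helper: positions of column names that are missing,
# the literal string "NA", or repeats of an earlier name.
find_bad_names <- function(nm) {
  which(is.na(nm) | nm %in% "NA" | duplicated(nm))
}

# Made-up names standing in for names(dt_woe_list$train).
nm <- c("creditability", "a_woe", NA, NA)
print(find_bad_names(nm))   # 3 4
```

Running `find_bad_names(names(dt_woe_list$train))` on the real data would show which columns lost their names, which helps narrow down whether the problem comes from the binning step or from the installed package version.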

ShichenXie (Owner) commented

It works well in my local environment. Make sure you have installed version 0.2.3 of the package, which was uploaded to CRAN today and can be installed via install.packages('scorecard').

[Attached screenshot: screen shot 2019-02-11 at 12 23 34 am]
