one_hot_encoder in Test #35

Closed
dotRData opened this issue Jan 11, 2018 · 5 comments

@dotRData

dotRData commented Jan 11, 2018

How do we use one_hot_encoder on test data?
Let's say a new value appears in some column:
it will add an extra column to the test dataset, which is not a problem.

But let's say some values are missing from the test data:
one_hot_encoder will then not create the corresponding column,
and that might create a problem while scoring.

@ELToulemonde
Owner

Hi,

That's a good one.

A quick fix: I would recommend using sameShape, which allows you to control the columns of your test set.

Beyond that, I don't know what the best approach is. Do you have an example of another package that lets you have the same columns in train and test?

@dotRData
Author

Currently I am using this:
testData[, setdiff(names(trainData), names(testData)):=0]

I thought you might have some better way.
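
For readers following along, the workaround above can be spelled out as a small runnable sketch. The toy tables and their column names are made up for illustration. Dropping the test-only columns is an extra step not in the one-liner, added on the assumption that the scoring model expects exactly the training columns.

```r
library(data.table)

# Toy one-hot encoded sets (hypothetical column names, for illustration only).
trainData <- data.table(color.blue  = c(1, 0, 0),
                        color.green = c(0, 1, 0),
                        color.red   = c(0, 0, 1))
testData  <- data.table(color.blue   = c(1, 0),
                        color.yellow = c(0, 1))

# Columns seen in train but absent from test: create them, filled with 0.
missing_cols <- setdiff(names(trainData), names(testData))
if (length(missing_cols) > 0) {
  testData[, (missing_cols) := 0]
}

# Assumption: columns that exist only in test (levels unseen in train) are dropped.
extra_cols <- setdiff(names(testData), names(trainData))
if (length(extra_cols) > 0) {
  testData[, (extra_cols) := NULL]
}

# Put the test columns in the same order as the train columns.
setcolorder(testData, names(trainData))
```

After this, testData has the same column names, in the same order, as trainData, and the columns created for missing levels contain only zeros.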

@ELToulemonde
Owner

I guess a future modification would be to make one_hot_encoder work the way fastScale does, for example...

First, a buildEncoding function would build the encoding parameters, which one_hot_encoder could then apply to both train and test.

The feature should be developed in the next version.
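
To make the two-step idea concrete, here is a minimal sketch: learn the levels once on train, then apply the same encoding to both sets. The helpers build_encoding_sketch and apply_encoding_sketch are hypothetical names invented for this illustration. They are not the package's functions and do not claim to match the signature of the feature mentioned above.

```r
library(data.table)

# Hypothetical helper: record, per column, the levels observed in the training set.
build_encoding_sketch <- function(dt, cols) {
  lapply(setNames(cols, cols), function(col) {
    sort(unique(as.character(dt[[col]])))
  })
}

# Hypothetical helper: emit one 0/1 column per recorded level,
# whether or not that level occurs in the data being encoded.
apply_encoding_sketch <- function(dt, encoding) {
  out <- copy(dt)  # leave the input table untouched
  for (col in names(encoding)) {
    values <- as.character(out[[col]])
    for (lvl in encoding[[col]]) {
      set(out, j = paste(col, lvl, sep = "."),
          value = as.integer(!is.na(values) & values == lvl))
    }
    set(out, j = col, value = NULL)  # drop the original categorical column
  }
  out
}

train <- data.table(id = 1:4, color = c("red", "green", "blue", "red"))
test  <- data.table(id = 5:6, color = c("red", "yellow"))

encoding      <- build_encoding_sketch(train, "color")
train_encoded <- apply_encoding_sketch(train, encoding)
test_encoded  <- apply_encoding_sketch(test, encoding)
```

Because the column list comes from the stored encoding and not from the data being encoded, test_encoded gets the same columns as train_encoded. A level unseen at training time ("yellow" here) is simply encoded as all zeros.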

@dotRData
Author

Yes, buildEncoding might also take as input a minimum frequency for the levels present in the features. That way we would have control over the final dimension of the dataset.
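
A rough sketch of what such a filter could look like when choosing the levels to encode. frequent_levels_sketch and its min_frequency argument are hypothetical, and treating the threshold as a relative frequency (share of rows) rather than an absolute count is an assumption.

```r
# Hypothetical helper: keep only the levels whose share of the training
# column is at least min_frequency (a proportion between 0 and 1).
frequent_levels_sketch <- function(x, min_frequency = 0) {
  freq <- prop.table(table(as.character(x)))
  names(freq)[as.vector(freq) >= min_frequency]
}

color <- c("red", "red", "red", "green", "green", "blue")

all_levels  <- frequent_levels_sketch(color)       # every observed level
kept_levels <- frequent_levels_sketch(color, 0.3)  # "blue" (1/6) falls below the threshold
```

Only the kept levels would get a column, so raising the threshold shrinks the encoded dataset.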

@ELToulemonde ELToulemonde self-assigned this Jan 17, 2018
@ELToulemonde
Owner

Good idea. I added it. It is implemented in branch v0.3.5, which will be merged soon.
