[Feature] Basic utils to handle raw data features #2102
Conversation
The term `parse` is a bit confusing. Maybe a submodule called `transform`, with all functions named `encode_XXX`, would be better?
    else:
        return feat

def parse_category_multi_feat(category_inputs, norm=None):
Would users need the labels corresponding to the multi-hot vector (i.e., that the first element is for label 'A')?
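To illustrate the question, here is a minimal sketch of an encoder that returns the label order together with the multi-hot matrix. The name `encode_multi_hot_with_labels` and its return shape are hypothetical and are not part of this PR; the sketch only assumes the input is a list of label lists.

```python
import numpy as np


def encode_multi_hot_with_labels(category_inputs):
    """Hypothetical helper (not the PR's API): return (labels, matrix)."""
    # Sorting the unique labels fixes the column order,
    # so column i of the matrix always corresponds to labels[i].
    labels = sorted({c for row in category_inputs for c in row})
    index = {label: i for i, label in enumerate(labels)}

    mat = np.zeros((len(category_inputs), len(labels)), dtype=np.float32)
    for r, row in enumerate(category_inputs):
        for c in row:
            mat[r, index[c]] = 1.0
    return labels, mat


labels, feat = encode_multi_hot_with_labels([["A", "B"], ["C"], ["A"]])
```

With the labels returned alongside the array, a caller can map any column back to its category without recomputing the vocabulary.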
manager = Manager()
d = manager.dict()
job = []
for i in range(num_process):
You can use a multiprocessing pool (`Pool.map`) here.
As for converting features into one-hot/multi-hot vectors: if the category information is needed, I think it might be better to design this as a class, as sklearn does.
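A rough sketch of what the class-based design could look like, in the fit/transform style. `CategoryEncoder`, its methods, and the `categories_` attribute are hypothetical names chosen for illustration; they are neither the PR's API nor sklearn's implementation.

```python
import numpy as np


class CategoryEncoder:
    """Hypothetical one-hot encoder that remembers its categories."""

    def fit(self, values):
        # Store the sorted vocabulary so the column order is reproducible.
        self.categories_ = sorted(set(values))
        self._index = {c: i for i, c in enumerate(self.categories_)}
        return self

    def transform(self, values):
        out = np.zeros((len(values), len(self.categories_)), dtype=np.float32)
        for row, v in enumerate(values):
            # An unseen category raises KeyError in this sketch.
            out[row, self._index[v]] = 1.0
        return out


enc = CategoryEncoder().fit(["cat", "dog", "cat"])
one_hot = enc.transform(["dog", "cat"])
```

Keeping the vocabulary on the object means the same encoder can be reused on new data, and `enc.categories_` tells the user which column belongs to which category.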
LGTM
This reverts commit 33a8bb9.
Description
We provide a list of APIs in `data.utils` to convert raw data into numpy arrays:
#2088
Checklist
Please feel free to remove inapplicable items for your PR.
or have been fixed to be compatible with this change
Changes