[Feature] Basic utils to handle raw data features #2102

classicsong · 2020-08-25T01:47:40Z

Description

We provide a list of APIs in data.utils to convert raw data features into numpy arrays:

parse_word2vec_node_feature
parse_category_single_feat
parse_category_multi_feat
parse_numerical_feat
parse_numerical_multihot_feat

#2088

Checklist

Please feel free to remove inapplicable items for your PR.

The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change,
or have been fixed to be compatible with this change
Related issue is referred in this PR

Changes

VoVAllen

The word "parse" is a bit confusing. Maybe a submodule called transform, with all functions named encode_XXX, would be better?

VoVAllen · 2020-08-25T07:10:14Z

python/dgl/data/utils.py

+    else:
+        return feat
+
+def parse_category_multi_feat(category_inputs, norm=None):


Would users need the labels corresponding to each position of the multi-hot vector (e.g., that the first element is for label 'A')?

VoVAllen · 2020-08-25T07:16:30Z

python/dgl/data/utils.py

+    manager = Manager()
+    d = manager.dict()
+    job=[]
+    for i in range(num_process):


You can use a multiprocessing pool (pool.map) here.
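A minimal sketch of that suggestion, not the code from this PR: the worker `encode_chunk`, the helper `split_chunks`, and the wrapper `parallel_encode` are hypothetical names, and the per-chunk work (counting tokens) is only a stand-in for the real feature computation. It shows how `Pool.map` can replace a manually managed `Manager().dict()` and job list while keeping results in input order.

```python
from multiprocessing import Pool


def split_chunks(items, num_chunks):
    """Split items into at most num_chunks contiguous, near-equal chunks."""
    size, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def encode_chunk(sentences):
    """Hypothetical per-chunk worker: counts whitespace-separated tokens."""
    return [len(s.split()) for s in sentences]


def parallel_encode(sentences, num_process=2):
    """Apply encode_chunk to each chunk with Pool.map and flatten the results."""
    chunks = split_chunks(sentences, num_process)
    with Pool(processes=num_process) as pool:
        # map() returns results in the same order as the input chunks,
        # so no shared dict keyed by process index is needed.
        results = pool.map(encode_chunk, chunks)
    return [x for chunk in results for x in chunk]
```

On platforms that start workers with "spawn", the call to `parallel_encode` should sit under an `if __name__ == "__main__":` guard, and the worker must be a module-level function so it can be pickled.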

VoVAllen · 2020-08-25T07:31:02Z

And for converting features into one-hot/multi-hot vectors, if the category information is needed, I think it might be better to design it as a class, just like sklearn does.
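To make the class-based idea concrete, here is a rough sketch, assuming an sklearn-like `fit`/`transform` split. `MultiHotEncoder`, its `categories_` attribute, and the `norm="row"` option are all hypothetical and are not part of this PR or of DGL. The point is that the fitted object keeps the column-to-category mapping, so the caller can tell which label each position of the multi-hot vector stands for.

```python
import numpy as np


class MultiHotEncoder:
    """Hypothetical sklearn-style encoder that remembers the category order."""

    def __init__(self, norm=None):
        # norm: None, or "row" to divide each row by its sum (assumed option).
        self.norm = norm
        self.categories_ = None

    def fit(self, category_inputs):
        # Collect the distinct categories and fix a sorted column order.
        seen = {c for row in category_inputs for c in row}
        self.categories_ = sorted(seen)
        self._index = {c: i for i, c in enumerate(self.categories_)}
        return self

    def transform(self, category_inputs):
        if self.categories_ is None:
            raise RuntimeError("call fit() before transform()")
        shape = (len(category_inputs), len(self.categories_))
        feat = np.zeros(shape, dtype=np.float32)
        for r, row in enumerate(category_inputs):
            for c in row:
                # Raises KeyError for categories not seen during fit().
                feat[r, self._index[c]] = 1.0
        if self.norm == "row":
            sums = feat.sum(axis=1, keepdims=True)
            sums[sums == 0] = 1.0  # leave all-zero rows unchanged
            feat = feat / sums
        return feat

    def fit_transform(self, category_inputs):
        return self.fit(category_inputs).transform(category_inputs)
```

For example, after `enc = MultiHotEncoder()` and `feat = enc.fit_transform([["A", "B"], ["B"], ["C", "A"]])`, `enc.categories_` is `["A", "B", "C"]`, so column 0 of `feat` corresponds to label 'A'.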

VoVAllen

LGTM

Ubuntu and others added 5 commits August 23, 2020 11:07

add feature utils and add test for feature norm

28b3864

Add docstring and test

e2eee12

upd

873ba90

dis able some test

d66cb8f

Merge branch 'master' into graph-loader

a8f2005

VoVAllen reviewed Aug 25, 2020

Ubuntu added 2 commits August 25, 2020 16:14

Update

3023d25

Merge branch 'graph-loader' of https://github.com/classicsong/dgl into graph-loader

98afa32

classicsong requested a review from VoVAllen August 26, 2020 02:13

classicsong added 3 commits August 26, 2020 10:18

Merge branch 'master' into graph-loader

51ca663

Merge branch 'master' into graph-loader

19670db

Merge branch 'master' into graph-loader

f690fa9

VoVAllen approved these changes Aug 26, 2020

Ubuntu and others added 5 commits August 26, 2020 14:54

update doc string

1101ddf

Merge branch 'graph-loader' of https://github.com/classicsong/dgl into graph-loader

59e6d16

Merge branch 'master' into graph-loader

bd1523a

update

fda13f0

Merge branch 'graph-loader' of https://github.com/classicsong/dgl into graph-loader

578b702

classicsong merged commit 33a8bb9 into dmlc:master Aug 27, 2020

classicsong deleted the graph-loader branch August 27, 2020 03:36

BarclayII mentioned this pull request Aug 29, 2020

[Patch Release] 0.5.1 #2125

Closed

classicsong added a commit that referenced this pull request Sep 3, 2020

Revert "[Feature] Basic utils to handle raw data features (#2102)"

2156238

This reverts commit 33a8bb9.

classicsong mentioned this pull request Sep 3, 2020

Revert "[Feature] Basic utils to handle raw data features" #2147

Merged

jermainewang added a commit that referenced this pull request Sep 8, 2020

Revert "[Feature] Basic utils to handle raw data features (#2102)" (#2147)

c9c6171

This reverts commit 33a8bb9. Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

kingmbc pushed a commit to kingmbc/dgl that referenced this pull request Sep 10, 2020

Revert "[Feature] Basic utils to handle raw data features (dmlc#2102)" (dmlc#2147)

19517fa

This reverts commit 33a8bb9. Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>

zhjwy9343 pushed a commit to zhjwy9343/dgl that referenced this pull request Sep 17, 2020

Revert "[Feature] Basic utils to handle raw data features (dmlc#2102)" (dmlc#2147)

81a5e73

This reverts commit 33a8bb9. Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
