
Line 530 in utils.py is too slow with huge datasets #71

Closed
andreybondarb opened this issue Dec 15, 2021 · 1 comment

Comments

@andreybondarb

andreybondarb commented Dec 15, 2021

Line 530, in the `construct_bucket_vb_wc` function of utils.py, is too slow on large datasets. It even freezes when the dataset has more than 300k objects.

I propose changing the line

```python
forw_corpus = [pad_char_feature] + list(reduce(lambda x, y: x + [pad_char_feature] + y, forw_features)) + [pad_char_feature]
```

to

```python
forw_corpus = [pad_char_feature]
for forw_feature in forw_features:
    # append the feature followed by one separator, in place
    forw_corpus.extend(forw_feature + [pad_char_feature])
```

This version is considerably faster and does not freeze.
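Why the original line stalls: inside `reduce`, every `x + [pad_char_feature] + y` step allocates a new list and copies the entire accumulated corpus into it, so total work grows roughly quadratically with the number of features. `list.extend` appends in place in amortized linear time. The minimal standalone sketch below checks that both versions build the same padded corpus. The function names `corpus_with_reduce` and `corpus_with_extend` and the toy data are hypothetical and not part of utils.py; it assumes `forw_features` is a list of lists, as the original expression implies.

```python
from functools import reduce
from typing import Any, List, Sequence


def corpus_with_reduce(forw_features: Sequence[List[Any]], pad_char_feature: Any) -> List[Any]:
    # Original approach (hypothetical wrapper): each `+` builds a brand-new
    # list, copying the accumulated corpus every time (quadratic overall).
    return [pad_char_feature] + list(
        reduce(lambda x, y: x + [pad_char_feature] + y, forw_features)
    ) + [pad_char_feature]


def corpus_with_extend(forw_features: Sequence[List[Any]], pad_char_feature: Any) -> List[Any]:
    # Proposed approach (hypothetical wrapper): append each feature plus one
    # separator in place (amortized linear in the total length).
    forw_corpus = [pad_char_feature]
    for forw_feature in forw_features:
        forw_corpus.extend(forw_feature + [pad_char_feature])
    return forw_corpus


if __name__ == "__main__":
    import timeit

    # Toy data: hypothetical character-index features, one list per sentence.
    features = [[1, 2, 3], [4, 5], [6]]
    pad = 0
    assert corpus_with_reduce(features, pad) == corpus_with_extend(features, pad)
    print(corpus_with_extend(features, pad))  # [0, 1, 2, 3, 0, 4, 5, 0, 6, 0]

    # Rough timing on a larger synthetic input; absolute numbers vary by machine.
    big = [[i % 50] * 10 for i in range(2000)]
    for fn in (corpus_with_reduce, corpus_with_extend):
        seconds = timeit.timeit(lambda: fn(big, pad), number=1)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

One small behavioral difference: with an empty `forw_features`, `reduce` without an initial value raises `TypeError`, while the loop simply returns `[pad_char_feature]`.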

LiyuanLucasLiu added a commit that referenced this issue Dec 16, 2021
@LiyuanLucasLiu
Owner

Thanks, fixed in 4f35e0a.
PS: a more up-to-date library is available at https://github.com/LiyuanLucasLiu/Vanilla_NER
