Tokenization benchmark miscalculates word-level metrics #268

Closed
p16i opened this issue Sep 6, 2019 · 5 comments

@p16i
Contributor

p16i commented Sep 6, 2019

Reported by @korakot. As described in the benchmark document, the correct word-level counting should be ✗✗✓✗, not ✓✗✓✗.
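
For context, here is a minimal sketch of how word-level marks like these can be computed. It assumes a reference word counts as correct only when both its start and end boundaries also appear in the predicted segmentation. The function names and the sample strings are hypothetical; this is neither the library's benchmark code nor the example from the benchmark document.

```python
from typing import List, Tuple


def word_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """Convert a token list into (start, end) character offsets."""
    spans = []
    start = 0
    for token in tokens:
        end = start + len(token)
        spans.append((start, end))
        start = end
    return spans


def word_level_marks(reference: List[str], prediction: List[str]) -> List[bool]:
    """Return one mark per reference word.

    Assumption: a word is correct only if the prediction contains a token
    with exactly the same start and end offsets.
    """
    if "".join(reference) != "".join(prediction):
        raise ValueError("reference and prediction must cover the same text")
    predicted = set(word_spans(prediction))
    return [span in predicted for span in word_spans(reference)]


# Hypothetical sample: reference "ab|c|de|fg" vs. prediction "a|bc|de|f|g"
marks = word_level_marks(["ab", "c", "de", "fg"], ["a", "bc", "de", "f", "g"])
print("".join("✓" if m else "✗" for m in marks))  # ✗✗✓✗
```

Under this assumption the first word is marked ✗ even though its start offset matches, because its end offset does not.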

@p16i p16i added benchmark benchmarking tools in the library good first issue help wanted no contributor yet labels Sep 6, 2019
@p16i
Contributor Author

p16i commented Sep 6, 2019

It turns out that the code is correct, and I've added a test case for this issue. The PR refactors some redundant parts of the code.

After merging, please also rebuild the documentation, as I've updated the tokenization benchmark figure.

@wannaphong
Member

@heytitle The documentation is built and deployed automatically after a PR is merged.

@p16i
Contributor Author

p16i commented Sep 7, 2019

@wannaphongcom Great, I didn't know that. Could you please review it?

@p16i p16i self-assigned this Sep 7, 2019
@wannaphong
Member

I have now merged this PR.

@bact
Member

bact commented Sep 9, 2019

"Fixed" with #269

@bact bact closed this as completed Sep 9, 2019
@bact bact added this to the 2.1 milestone Oct 5, 2019