This repository has been archived by the owner on Dec 16, 2022. It is now read-only.
Making Token class a "slots" class (#4312)
* ensure linting and typechecking ran on all code
* make Token a __slots__ class
* add benchmarks
* update CHANGELOG
* fix test with custom token subclass
* Update allennlp/data/tokenizers/token.py

Co-authored-by: Matt Gardner <mattg@allenai.org>
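For readers unfamiliar with the technique named in the commit title, here is a minimal sketch of what a `__slots__` class does in plain Python. The classes `SlottedToken`, `DictToken`, and `CustomToken` are hypothetical stand-ins, not the actual `Token` class from `allennlp/data/tokenizers/token.py`, and the two fields shown are an assumption chosen for illustration. The last class shows why a custom subclass may behave differently from its slotted parent, which is plausibly related to the "fix test with custom token subclass" item above.

```python
import sys


class DictToken:
    """Ordinary class: every instance carries a per-instance __dict__."""

    def __init__(self, text=None, idx=None):
        self.text = text
        self.idx = idx


class SlottedToken:
    """Slotted class: attributes live in fixed slots, so there is no __dict__."""

    __slots__ = ("text", "idx")

    def __init__(self, text=None, idx=None):
        self.text = text
        self.idx = idx


class CustomToken(SlottedToken):
    """Subclass that does not declare __slots__: it gets a __dict__ back,
    so arbitrary extra attributes are allowed again."""


slotted = SlottedToken("a", 0)
plain = DictToken("a", 0)

# A slotted instance rejects attributes that are not listed in __slots__.
try:
    slotted.extra = 1
except AttributeError as err:
    print("slotted instance rejected new attribute:", err)

# The subclass without __slots__ accepts them.
custom = CustomToken("a", 0)
custom.extra = 1

# Rough size comparison; the exact numbers depend on the Python version.
print("plain instance + dict:", sys.getsizeof(plain) + sys.getsizeof(plain.__dict__))
print("slotted instance:", sys.getsizeof(slotted))
```

The trade-off is that slotted instances are smaller and cheaper to create, which matters when a tokenizer produces one object per token, but code that attaches ad hoc attributes to instances has to go through a subclass or a declared slot.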
1 parent 32bccfb, commit 11a08ae
Showing 12 changed files with 115 additions and 21 deletions.
@@ -0,0 +1,16 @@
from allennlp.data.tokenizers import CharacterTokenizer


tokenizer = CharacterTokenizer()
passage = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor "
    "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis "
    "nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. "
    "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu "
    "fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in "
    "culpa qui officia deserunt mollit anim id est laborum."
)


def bench_character_tokenizer(benchmark):
    benchmark(tokenizer.tokenize, passage)
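The benchmark above hands `tokenizer.tokenize` and `passage` to a `benchmark` fixture, which calls the function repeatedly and records timings. As a rough, dependency-free sketch of the same idea, the snippet below times a character-level tokenizer with the standard library's `timeit`. `CharToken`, `tokenize_chars`, and `time_tokenizer` are hypothetical names invented here; they are not part of AllenNLP or of the benchmarking plugin, and the measured numbers will not match those of the real `CharacterTokenizer`.

```python
import timeit


class CharToken:
    """Hypothetical slotted token holding one character and its offset."""

    __slots__ = ("text", "idx")

    def __init__(self, text, idx):
        self.text = text
        self.idx = idx


def tokenize_chars(text):
    """Stand-in for a character tokenizer: one token object per character."""
    return [CharToken(ch, i) for i, ch in enumerate(text)]


def time_tokenizer(text, repeats=5, number=200):
    """Return the best observed time (in seconds) for a single tokenize call."""
    timings = timeit.repeat(lambda: tokenize_chars(text), repeat=repeats, number=number)
    # The minimum is the least noisy estimate; divide by `number` for a per-call figure.
    return min(timings) / number


if __name__ == "__main__":
    sample = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    print(f"best time per call: {time_tokenizer(sample) * 1e6:.2f} microseconds")
```

A dedicated benchmarking plugin adds calibration, statistics, and comparison between runs on top of this basic loop, which is presumably why the commit uses one rather than hand-rolled timing.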
@@ -0,0 +1,8 @@
# We use pytest to run benchmarks, which is weird, but so far the best benchmarking
# framework we've found is only available as a pytest plugin.
# That said, we like to organize our benchmarks separately and with different naming
# conventions from our tests, which requires using a separate pytest configuration.
[pytest]
python_files = *_bench.py
python_functions = bench_* *_bench
python_classes =
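To make the effect of this configuration concrete, the sketch below parses the same three settings with `configparser` and checks a few names against them using `fnmatch`. This only approximates pytest's collection rules (pytest applies its own matching logic and other criteria as well), and the helper names `load_patterns` and `matches_any` are hypothetical.

```python
from configparser import ConfigParser
from fnmatch import fnmatch

# Same settings as the configuration above, embedded so the sketch is self-contained.
INI_TEXT = """
[pytest]
python_files = *_bench.py
python_functions = bench_* *_bench
python_classes =
"""


def load_patterns(ini_text):
    """Return each [pytest] option as a list of whitespace-separated patterns."""
    parser = ConfigParser()
    parser.read_string(ini_text)
    section = parser["pytest"]
    return {key: section[key].split() for key in section}


def matches_any(name, patterns):
    """Approximate pytest's name matching with shell-style globbing."""
    return any(fnmatch(name, pattern) for pattern in patterns)


patterns = load_patterns(INI_TEXT)

for filename in ("character_tokenizer_bench.py", "test_token.py"):
    print(filename, "->", matches_any(filename, patterns["python_files"]))

for function in ("bench_character_tokenizer", "tokenizer_bench", "test_tokenize"):
    print(function, "->", matches_any(function, patterns["python_functions"]))

# An empty python_classes value yields no patterns, so no class matches by name.
print("class patterns:", patterns["python_classes"])
```

Under this approximation, files ending in `_bench.py` and functions named `bench_*` or `*_bench` are picked up, while ordinary `test_*` names are ignored, which keeps benchmarks and tests from being collected together.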