Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests and update docstring to tokenizer matching #1398

Merged
merged 3 commits into from Feb 10, 2023
Merged

Add tests and update docstring to tokenizer matching #1398

merged 3 commits into from Feb 10, 2023

Conversation

jackapbutler
Copy link
Contributor

@jackapbutler jackapbutler commented Feb 9, 2023

This PR does performs some minor updates;

  • Adds tests for the matching logic within get_tokenizer
  • Updates docstring with an example to make the logic clear

@jackapbutler jackapbutler changed the title Draft: Fix tokenizer matching and add tests Fix tokenizer matching and add tests Feb 9, 2023
@@ -37,7 +37,7 @@ class TokenizerConfig(NamedTuple):

def match_tokenizer_name(model_name: str) -> TokenizerConfig:
"""Match a partial model name to a tokenizer configuration"""
tokenizer_config_matches = [config for name, config in TOKENIZER_CONFIGS.items() if name in model_name]
tokenizer_config_matches = [config for name, config in TOKENIZER_CONFIGS.items() if model_name in name]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already correct, no? We call match_tokenizer_name based on the name of the model not the keys of TOKENIZER_CONFIGS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sanagno, yes you're right actually. I've just updated it back and added some docstrings / tests to make it clearer.

@jackapbutler jackapbutler changed the title Fix tokenizer matching and add tests Add tests and update docstring to tokenizer matching Feb 10, 2023
Copy link
Collaborator

@sanagno sanagno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great :)!

@sanagno sanagno merged commit 4dd0d67 into LAION-AI:main Feb 10, 2023
@jackapbutler jackapbutler deleted the fix-tokenizer-match branch February 21, 2023 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants