-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Address the design review feedback #7513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses design review feedback for the ML.NET tokenizers library by improving API consistency and usability. The changes focus on standardizing parameter types, adding constructor overloads, and improving property visibility.
- Replaced tuple-based vocabulary parameters with
KeyValuePair<string, int>
for better API consistency - Added a new
BpeOptions
constructor that accepts file paths for vocabulary and merges files - Updated parameter naming and property visibility for better API design
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
test/Microsoft.ML.Tokenizers.Tests/BpeTests.cs | Updated test code to use new KeyValuePair vocabulary format and added tests for new constructor |
src/Microsoft.ML.Tokenizers/PreTokenizer/CompositePreTokenizer.cs | Added missing namespace declaration |
src/Microsoft.ML.Tokenizers/Model/SentencePieceTokenizer.cs | Renamed parameter from addBeginOfSentence to addBeginningOfSentence for consistency |
src/Microsoft.ML.Tokenizers/Model/BpeOptions.cs | Added new constructor accepting file paths and changed vocabulary type from tuples to KeyValuePair |
src/Microsoft.ML.Tokenizers/Model/BPETokenizer.cs | Updated to work with new vocabulary format and made BeginningOfSentenceToken property public |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple small questions, but code changes look fine.
Co-authored-by: Eric StJohn <ericstj@microsoft.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7513 +/- ##
==========================================
- Coverage 69.01% 69.00% -0.01%
==========================================
Files 1482 1482
Lines 273999 274048 +49
Branches 28258 28266 +8
==========================================
+ Hits 189093 189113 +20
- Misses 77520 77540 +20
- Partials 7386 7395 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Fixes #7512