@tarekgh tarekgh commented Nov 18, 2025

There is an open issue requesting the same support in the official Tiktoken library: openai/tiktoken#464.

Copilot AI review requested due to automatic review settings November 18, 2025 21:51

Copilot AI left a comment

Pull Request Overview

This PR adds support for the gpt-5.1 model in the Tiktoken tokenizer implementation, aligning with an open feature request in the official Tiktoken library.

Key changes:

  • Added gpt-5.1 model mapping to the O200kBase encoding in the tokenizer configuration
  • Extended test coverage to include the new gpt-5.1 model across multiple test scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | Added gpt-5.1 model entries to prefix and exact match lookup tables for O200kBase encoding |
| test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | Added GPT5_1 tokenizer instance and included it in encoding tests and test data parameters |
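
To illustrate what "prefix and exact match lookup tables" means in practice, here is a minimal sketch of a two-stage model-name lookup. It is not the code from TiktokenTokenizer.cs: the table names, the function `encoding_for_model`, the extra table entries, and the suffixed model name in the usage lines are all hypothetical and chosen only for illustration. The only mapping taken from this PR is gpt-5.1 to O200kBase (written `o200k_base` below).

```python
# Illustrative sketch only: table contents and names are assumptions,
# not the actual tables in Microsoft.ML.Tokenizers.

# Exact-match table: the full model name maps directly to an encoding.
EXACT_MODEL_TO_ENCODING = {
    "gpt-5.1": "o200k_base",   # the mapping this PR describes
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
}

# Prefix table: covers versioned or suffixed names such as "<model>-<suffix>".
# Entries are checked in order, so more specific prefixes should come first.
PREFIX_MODEL_TO_ENCODING = [
    ("gpt-5.1-", "o200k_base"),
    ("gpt-4o-", "o200k_base"),
    ("gpt-4-", "cl100k_base"),
]


def encoding_for_model(model_name: str) -> str:
    """Return the encoding name for a model: exact match first, then prefix."""
    # Stage 1: exact match on the full model name.
    if model_name in EXACT_MODEL_TO_ENCODING:
        return EXACT_MODEL_TO_ENCODING[model_name]

    # Stage 2: fall back to the first matching prefix.
    for prefix, encoding in PREFIX_MODEL_TO_ENCODING:
        if model_name.startswith(prefix):
            return encoding

    raise KeyError(f"No encoding registered for model '{model_name}'")


if __name__ == "__main__":
    print(encoding_for_model("gpt-5.1"))                 # exact match
    print(encoding_for_model("gpt-5.1-example-suffix"))  # hypothetical suffixed name
```

Under this design, supporting a new model family typically needs one entry in each table: the exact entry handles the bare model name, and the prefix entry handles any suffixed variants without further changes.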

@tarekgh tarekgh added this to the ML.NET Future milestone Nov 18, 2025

codecov bot commented Nov 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.02%. Comparing base (8f9674f) to head (aace8b8).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7556      +/-   ##
==========================================
- Coverage   69.02%   69.02%   -0.01%     
==========================================
  Files        1482     1482              
  Lines      274093   274096       +3     
  Branches    28266    28266              
==========================================
+ Hits       189183   189184       +1     
- Misses      77527    77528       +1     
- Partials     7383     7384       +1     
| Flag | Coverage Δ |
| --- | --- |
| Debug | 69.02% <100.00%> (-0.01%) ⬇️ |
| production | 63.30% <100.00%> (-0.01%) ⬇️ |
| test | 89.47% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | 79.90% <100.00%> (+0.04%) ⬆️ |
| ...est/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | 99.08% <100.00%> (+<0.01%) ⬆️ |

... and 5 files with indirect coverage changes

ericstj commented Nov 19, 2025

The macos-13 runners are being retired. I'll update those in a separate PR.

@ericstj ericstj merged commit de4eba2 into dotnet:main Nov 19, 2025
19 of 25 checks passed