RBS signatures #16

Closed
wants to merge 3 commits into from
1 change: 1 addition & 0 deletions .github/workflows/build.yml
@@ -35,3 +35,4 @@ jobs:
ruby-version: 3.1
bundler-cache: true
- run: bundle exec rake standard
- run: bundle exec rake steep:check
1 change: 1 addition & 0 deletions .gitignore
@@ -16,3 +16,4 @@ target/

# rspec failure tracking
.rspec_status
TAGS
2 changes: 2 additions & 0 deletions Gemfile
@@ -14,3 +14,5 @@ gem "rspec", "~> 3.0"
gem "standard", "~> 1.3"

gem "yard-doctest", "~> 0.1.17"

gem "steep", "~> 1.5"
53 changes: 53 additions & 0 deletions Gemfile.lock
@@ -7,19 +7,49 @@ PATH
GEM
remote: https://rubygems.org/
specs:
activesupport (7.1.1)
base64
bigdecimal
concurrent-ruby (~> 1.0, >= 1.0.2)
connection_pool (>= 2.2.5)
drb
i18n (>= 1.6, < 2)
minitest (>= 5.1)
mutex_m
tzinfo (~> 2.0)
ast (2.4.2)
base64 (0.1.1)
bigdecimal (3.1.4)
concurrent-ruby (1.2.2)
connection_pool (2.4.1)
csv (3.2.7)
diff-lcs (1.5.0)
drb (2.1.1)
ruby2_keywords
ffi (1.16.3)
fileutils (1.7.1)
i18n (1.14.1)
concurrent-ruby (~> 1.0)
json (2.6.3)
language_server-protocol (3.17.0.3)
listen (3.8.0)
rb-fsevent (~> 0.10, >= 0.10.3)
rb-inotify (~> 0.9, >= 0.9.10)
logger (1.5.3)
minitest (5.18.0)
mutex_m (0.1.2)
parallel (1.22.1)
parser (3.2.1.1)
ast (~> 2.4.1)
rainbow (3.1.1)
rake (13.0.6)
rake-compiler (1.2.1)
rake
rb-fsevent (0.11.2)
rb-inotify (0.10.1)
ffi (~> 1.0)
rb_sys (0.9.68)
rbs (3.2.2)
regexp_parser (2.7.0)
rexml (3.2.5)
rspec (3.12.0)
@@ -51,10 +81,32 @@ GEM
rubocop (>= 1.7.0, < 2.0)
rubocop-ast (>= 0.4.0)
ruby-progressbar (1.13.0)
ruby2_keywords (0.0.5)
securerandom (0.2.2)
standard (1.25.1)
language_server-protocol (~> 3.17.0.2)
rubocop (= 1.48.1)
rubocop-performance (= 1.16.0)
steep (1.5.3)
activesupport (>= 5.1)
concurrent-ruby (>= 1.1.10)
csv (>= 3.0.9)
fileutils (>= 1.1.0)
json (>= 2.1.0)
language_server-protocol (>= 3.15, < 4.0)
listen (~> 3.0)
logger (>= 1.3.0)
parser (>= 3.1)
rainbow (>= 2.2.2, < 4.0)
rbs (>= 3.1.0)
securerandom (>= 0.1)
strscan (>= 1.0.0)
terminal-table (>= 2, < 4)
strscan (3.0.7)
terminal-table (3.0.2)
unicode-display_width (>= 1.1.1, < 3)
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
unicode-display_width (2.4.2)
webrick (1.7.0)
yard (0.9.28)
@@ -74,6 +126,7 @@ DEPENDENCIES
rake-compiler
rspec (~> 3.0)
standard (~> 1.3)
steep (~> 1.5)
tiktoken_ruby!
yard-doctest (~> 0.1.17)

8 changes: 7 additions & 1 deletion README.md
@@ -2,7 +2,7 @@
# tiktoken_ruby

[Tiktoken](https://github.com/openai/tiktoken) is a BPE tokenizer from OpenAI used with their GPT models.
-This is a wrapper around it aimed primarily at enabling accurate counts of GPT model tokens used.
+This is a wrapper around it aimed primarily at enabling accurate counts of GPT model tokens used.

## Installation

@@ -33,6 +33,12 @@ enc = Tiktoken.encoding_for_model("gpt-4")
enc.encode("hello world").length #=> 2
```

## RBS support

Tiktoken comes with Ruby type signatures (RBS).

To use them with Steep, add `library "tiktoken_ruby"` to your `Steepfile`.

## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
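
As a rough illustration of the "RBS support" section above, a consuming project's `Steepfile` might look like the following. The target name `:app` and the `sig` and `lib` paths are assumptions about a hypothetical application; only the `library "tiktoken_ruby"` line comes from the README text.

```ruby
# Steepfile of a hypothetical application that depends on tiktoken_ruby.
# The target name and directory layout are illustrative assumptions.
target :app do
  signature "sig"          # the application's own RBS files
  check "lib"              # Ruby sources to type check

  library "tiktoken_ruby"  # load the signatures shipped with the gem
end
```

With a file like this in place, `bundle exec steep check` would type check the application against the gem's signatures.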
9 changes: 8 additions & 1 deletion Rakefile
@@ -18,6 +18,13 @@ task :native, [:platform] do |_t, platform:|
sh "bundle", "exec", "rb-sys-dock", "--platform", platform, "--build"
end

namespace :steep do
desc "Check RBS signatures"
task "check" do
sh "bundle", "exec", "steep", "check"
end
end

task build: :compile

-task default: %i[compile spec standard]
+task default: %i[compile spec standard steep:check]
5 changes: 5 additions & 0 deletions Steepfile
@@ -0,0 +1,5 @@
target :lib do
signature "sig"

check "lib"
end
26 changes: 24 additions & 2 deletions sig/tiktoken_ruby.rbs
@@ -1,4 +1,26 @@
-module TiktokenRuby
+module Tiktoken
VERSION: String
# See the writing guide of rbs: https://github.com/ruby/rbs#guides

def self.get_encoding: (String) -> (Encoding|nil)
def self.encoding_for_model: (String) -> (Encoding|nil)
def self.list_encoding_names: () -> Array[Symbol]
def self.list_model_names: () -> Array[String|Symbol]

SUPPORTED_ENCODINGS: Array[Symbol]
MODEL_TO_ENCODING_NAME: Hash[(String|Symbol), String]
PREFIX_MODELS: Array[String]

class Encoding
def encode: (String) -> Array[Integer]
def initialize: (BpeFactory, String|Symbol) -> void
def self.for_name_cached: (String|Symbol) -> Encoding
def self.for_name: (String|Symbol) -> Encoding
end

class BpeFactory
def self.r50k_base: (Array[Integer]) -> String
def self.p50k_base: (Array[Integer]) -> String
def self.p50k_edit: (Array[Integer]) -> String
def self.cl100k_base: (Array[Integer]) -> String
end
end
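
The signatures above declare `get_encoding` and `encoding_for_model` as returning `(Encoding|nil)`, so callers checked by Steep would need to narrow the result before using it. The sketch below illustrates that caller-side guard. `FakeTiktoken` and `encoding_name_or_raise` are hypothetical stand-ins so the example runs without the gem installed, and the stand-in returns an encoding name rather than an `Encoding` object.

```ruby
# Hypothetical stand-in for the gem; only the nilable return shape from the
# signature is modelled here, not the real implementation.
module FakeTiktoken
  # Illustrative subset of a model-to-encoding mapping.
  MODEL_TO_ENCODING_NAME = {"gpt-4" => "cl100k_base"}.freeze

  # Mirrors the `(String) -> (Encoding|nil)` shape: nil for unknown models.
  def self.encoding_for_model(model)
    MODEL_TO_ENCODING_NAME[model]
  end
end

# Caller-side guard: handle the nil branch explicitly so the value is
# known to be non-nil afterwards.
def encoding_name_or_raise(model)
  enc = FakeTiktoken.encoding_for_model(model)
  raise ArgumentError, "no encoding for model #{model.inspect}" if enc.nil?

  enc
end
```

Raising is only one option; returning a default encoding or propagating `nil` would satisfy the type checker equally well, depending on what the caller needs.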