
Add support for using GGUF tokenizer #345

Merged
EricLBuehler merged 21 commits into master from gguf_to_hf_tokenizer on May 28, 2024

Conversation

EricLBuehler (Owner) commented May 25, 2024

This adds support for using a GGUF tokenizer as documented here:
https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#tokenizer

Supported tokenizer models, and the algorithm each maps to (see the dispatch sketch below):

  • llama: Unigram
  • replit: Unigram
  • gpt: BPE
  • rwkv: RWKV
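
As a minimal sketch (my illustration, not the PR's actual code), dispatching on the GGUF `tokenizer.ggml.model` metadata value could look like the following; the enum, the function name, and the `gpt2` alias are assumptions on my part:

```rust
/// Hypothetical sketch: map the GGUF `tokenizer.ggml.model` metadata value
/// to the tokenizer algorithm it implies (names here are illustrative only).
enum TokenizerKind {
    Unigram, // SentencePiece-style; used by `llama` and `replit`
    Bpe,     // byte-pair encoding; used by `gpt`
    Rwkv,    // RWKV world tokenizer
}

fn tokenizer_kind(ggml_model: &str) -> Result<TokenizerKind, String> {
    match ggml_model {
        "llama" | "replit" => Ok(TokenizerKind::Unigram),
        "gpt" | "gpt2" => Ok(TokenizerKind::Bpe), // assuming some GGUFs write `gpt2`
        "rwkv" => Ok(TokenizerKind::Rwkv),
        other => Err(format!("unsupported GGUF tokenizer model `{other}`")),
    }
}
```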

github-actions bot commented May 25, 2024

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    5            9            9            0            0
 Python                 21          741          622           21           98
 TOML                   15          388          351            1           36
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               15         1028            0          761          267
 |- BASH                 6          205          192            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1539          474          770          295
-------------------------------------------------------------------------------
 Rust                   84        27992        25630          365         1997
 |- Markdown            41          426            0          414           12
 (Total)                          28418        25630          779         2009
===============================================================================
 Total                 144        30634        27006         1148         2480
===============================================================================

Jeadie (Contributor) commented May 28, 2024

What remains on this PR? I need GGUF tokenizer support, so I'm happy to contribute.

EricLBuehler (Owner, Author)

> What remains on this PR? I need GGUF tokenizer support, so I'm happy to contribute.

Currently, it doesn't work. In this PR I tried to convert the GGUF tokenizer to an HF tokenizer for easy integration with the rest of mistral.rs, but I ran into some problems with how the decoder, post-processor, and normalizer parts of the HF tokenizer are being set up. Additionally, it looks like the Mistral GGUF doesn't contain any merges, while the HF tokenizer itself does. I'm not sure whether there are sensible defaults I can use, or a way to calculate the merges from the token types.

So, the current state of this PR is that it is half working. If you could perhaps take a look and see if you can get it to work, that would be amazing!
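
One possible direction for the missing merges (an assumption of mine, not something the GGUF spec promises): since a BPE vocabulary lists merged tokens roughly in the order they were learned, merges could be reconstructed heuristically from the rank-ordered `tokenizer.ggml.tokens` array. A self-contained sketch:

```rust
use std::collections::HashMap;

/// Heuristic sketch: recover BPE merges from a rank-ordered vocabulary,
/// assuming merge priority follows token rank. For each multi-character
/// token, pick the split into two known tokens whose worse-ranked half is
/// earliest, and record that pair as a merge.
fn merges_from_vocab(tokens: &[String]) -> Vec<(String, String)> {
    let rank: HashMap<&str, usize> =
        tokens.iter().enumerate().map(|(i, t)| (t.as_str(), i)).collect();
    let mut merges = Vec::new();
    for tok in tokens {
        if tok.chars().count() < 2 {
            continue; // single symbols are base vocabulary, never merges
        }
        let mut best: Option<(usize, &str, &str)> = None;
        for (i, _) in tok.char_indices().skip(1) {
            let (l, r) = tok.split_at(i);
            if let (Some(&lr), Some(&rr)) = (rank.get(l), rank.get(r)) {
                let score = lr.max(rr);
                if best.map_or(true, |(s, _, _)| score < s) {
                    best = Some((score, l, r));
                }
            }
        }
        if let Some((_, l, r)) = best {
            merges.push((l.to_string(), r.to_string()));
        }
    }
    merges
}
```

Whether such a reconstruction actually matches the merge list shipped with the HF tokenizer would need to be verified against the original.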

Jeadie (Contributor) commented May 28, 2024

What example GGUF are you using for Mistral? I don't see any reference to Mistral in ggerganov/ggml.

EricLBuehler (Owner, Author)

I'm using this one: https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main

Mistral uses a llama tokenizer, which ggerganov/ggml says should be a SentencePiece tokenizer. In this PR, however, I use a BPE tokenizer, because the HF tokenizer is BPE (maybe that is the problem). Perhaps we can use this crate?
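
For reference, SentencePiece's Unigram model chooses the segmentation that maximizes the sum of per-token scores, and GGUF ships exactly those scores in `tokenizer.ggml.scores`. A toy, self-contained Viterbi sketch (illustrative only; a real implementation builds a proper lattice and handles byte fallback):

```rust
use std::collections::HashMap;

/// Toy Unigram (Viterbi) segmentation over a token -> log-probability map
/// built from `tokenizer.ggml.tokens` and `tokenizer.ggml.scores`. Assumes
/// every single character is in `vocab` so a segmentation always exists.
fn unigram_segment(text: &str, vocab: &HashMap<String, f64>) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();
    // best[i]: (best score for the prefix of length i, start of its last piece)
    let mut best = vec![(f64::NEG_INFINITY, 0usize); n + 1];
    best[0].0 = 0.0;
    for end in 1..=n {
        for start in 0..end {
            let piece: String = chars[start..end].iter().collect();
            if let Some(score) = vocab.get(&piece) {
                let cand = best[start].0 + score;
                if cand > best[end].0 {
                    best[end] = (cand, start);
                }
            }
        }
    }
    // Backtrack through the table to recover the chosen pieces.
    let (mut pieces, mut end) = (Vec::<String>::new(), n);
    while end > 0 {
        let start = best[end].1;
        pieces.push(chars[start..end].iter().collect());
        end = start;
    }
    pieces.reverse();
    pieces
}
```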

EricLBuehler (Owner, Author)

@Jeadie, I made some progress! It mostly works now, and I think there is just one small bug left. With this PR you can run models fully locally, specifying paths for the chat template and GGUF file:

./mistralrs-server --chat-template <chat_template> gguf -m . -f Phi-3-mini-128k-instruct-q4_K_M.gguf

@EricLBuehler EricLBuehler merged commit 34275f4 into master May 28, 2024
11 checks passed
@EricLBuehler EricLBuehler deleted the gguf_to_hf_tokenizer branch May 28, 2024 19:05