Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SentencePiece 0.1.97 changed API to take std::vector<string_view> "everywhere", breaking SetVocabulary here #323

Open
Micket opened this issue Mar 16, 2023 · 2 comments

Comments

@Micket
Copy link

Micket commented Mar 16, 2023

here

void SentencePiece::set_vocabulary(const std::vector<std::string>& vocabulary,
const Tokenizer::Options* options)
{
if (options && (options->joiner_annotate || options->spacer_new))
throw std::invalid_argument("SentencePiece vocabulary restriction requires the tokenization "
"to use \"spacer_annotate\" (same as spm_encode)");
auto status = _processor->SetVocabulary(vocabulary);

either a quick conversion

    auto status = _processor->SetVocabulary(ToPieceArray(vocabulary));

is needed or possible the string_view should switch should be propagated up the abstractions.

ToPieceArray was added in the same release that made this switch to views.
google/sentencepiece@631420b#diff-77e6a3b3bfda73d84fe1fef8205f2a2ec1d46b8f232100041f7135505f8adcefR52

@guillaumekln
Copy link
Collaborator

guillaumekln commented Mar 21, 2023

We still want to support version 0.1.96 (which is included as a Git submodule in the third_party directory) so we should wrap this code with a preprocessor condition #if ... #else ... #endif.

However, I'm not sure what the condition should be. SentencePiece exposes a version string but I don't know how to use it in a preprocessor comparison.

Do you have any suggestions?

@Micket
Copy link
Author

Micket commented Mar 22, 2023

Unfortunately, I just happened to encounter this helping @boegel with how to fix this when packaging Tokenizer for easybuild (where we just patch it for since we know the version used). I wanted to make change known to everyone else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants