
Major llama.cpp API Change #185

Merged: 12 commits into SciSharp:master from wip_major_api_change, Oct 18, 2023

Conversation

@martindevans (Member) commented Sep 29, 2023

ggerganov/llama.cpp#3228 just got merged into llama.cpp with major changes to the API. See #184 for some notes on this.

This PR has basic changes to match the underlying API changes. Things to do:

  • LLamaContextParams and LLamaModelParams are now two separate structs on the native side; we should split IModelParams the same way (see the sketch after this list)
  • Fix context cloning (support for it was removed upstream)
  • Fix OutOfContext test
  • Add binaries for all platforms
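
For reference, a rough sketch of what that split might look like on the C# side. The interface and property names below are illustrative guesses mirroring the native split, not the final LLamaSharp API:

```csharp
// Hypothetical shape of the split, mirroring the native llama_model_params /
// llama_context_params separation. All names here are illustrative only.
public interface IModelParams
{
    string ModelPath { get; }      // path to the weights on disk
    int GpuLayerCount { get; }     // layers to offload to the GPU
    bool UseMemoryLock { get; }    // mlock the weights
    bool UseMemorymap { get; }     // mmap the weights
}

public interface IContextParams
{
    uint ContextSize { get; }      // n_ctx
    uint BatchSize { get; }        // n_batch
    uint Seed { get; }             // RNG seed for sampling
    int ThreadCount { get; }       // n_threads
}

// A context is created from a loaded model, so a "full" parameter set is just both:
public interface ILLamaParams : IModelParams, IContextParams { }
```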

@martindevans (Member Author)

@saddam213 I've made some changes which have broken some of the LLama.Web stuff at the moment; it looks like it's all in places that construct a single context by loading an entirely new set of model weights (probably legacy code?). I can probably throw in some workarounds if you want, or do you want to do something more complete and push it into this PR?

@saddam213 (Collaborator) commented Oct 3, 2023

> @saddam213 I've made some changes which have broken some of the LLama.Web stuff at the moment; it looks like it's all in places that construct a single context by loading an entirely new set of model weights (probably legacy code?). I can probably throw in some workarounds if you want, or do you want to do something more complete and push it into this PR?

No probs man, I'll get it done tonight; won't take long to get it up to date with your changes :)

The web project has fallen behind a bit and I need to get it up to date; I got lost in the world of stable diffusion the last few weeks, lol

@martindevans (Member Author)

@saddam213 I actually just pushed up some workarounds to at least get it building, but I left some todos in place.

@martindevans (Member Author)

> The web project has fallen behind a bit and I need to get it up to date; I got lost in the world of stable diffusion the last few weeks, lol

haha I know how that is. I bounce back and forth between LLMs and stable diffusion all the time 😆

@saddam213 (Collaborator) commented Oct 3, 2023

Wondering if we should merge the Web and WebApi into one project; there is a lot of code that will need to be shared to get them both up to date, and we can just use an appsettings config to decide what it starts up, API or Web (rough sketch below).

Or we could add a shared project, but that seems like overkill for examples.
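
A minimal sketch of that appsettings idea, assuming a hypothetical "HostMode" key; nothing below is existing LLama.Web or LLama.WebAPI code:

```csharp
// Program.cs sketch: choose between the MVC web UI and the bare API at startup,
// driven by a hypothetical "HostMode" value ("Web" or "Api") in appsettings.json.
var builder = WebApplication.CreateBuilder(args);
var hostMode = builder.Configuration["HostMode"] ?? "Web";

if (hostMode == "Api")
    builder.Services.AddControllers();           // API only
else
    builder.Services.AddControllersWithViews();  // full web UI

var app = builder.Build();

if (hostMode == "Api")
    app.MapControllers();
else
    app.MapDefaultControllerRoute();

app.Run();
```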

I can't even port the LLamaStack API or web because I went and made custom executors like an idiot :/

@saddam213 (Collaborator)

OK, so the changes needed to get the web up to date with version 0.5.1 are bigger than I thought. (I've done an untested preliminary change on my master branch; see the commits there.)

I think it's too big to mash into this PR, so I'll give it its own PR; then we can merge that in, and then merge this PR's changes on top.

That work for you?

@martindevans (Member Author)

> That work for you?

That's fine by me; there are workarounds in place in this PR so that it at least builds, and we can get your PR in right after this one is done.

I'll try to get this one finished and merged soon so we can get on with that :)

@martindevans (Member Author)

The only thing blocking this PR is proper testing on macOS.

@martindevans marked this pull request as ready for review on October 14, 2023, 17:59
@SignalRT (Collaborator)

I will be at home tomorrow. I could check this PR this week.

@SignalRT (Collaborator)

@martindevans

At least with today's llama.cpp binaries it fails in two tests, both related to the tokenizer.

This could be related to the latest changes in the tokenizer (ggerganov/llama.cpp#3538), but I cannot test it on Windows today.

These are the errors:

Xunit.Sdk.EqualException
Assert.Equal() Failure: Collections differ
↓ (pos 1)
Expected: [1, 450, 4996, 17354, 1701, ···]
Actual: [1, 1576, 4996, 17354, 1701, ···]
↑ (pos 1)
at LLama.Unittest.LLamaContextTests.Tokenize() in ...LLamaContextTests.cs:line 40

And

Xunit.Sdk.EqualException
Assert.Equal() Failure: Collections differ
↓ (pos 0)
Expected: [450, 4996, 17354, 1701, 29916]
Actual: [1576, 4996, 17354, 1701, 29916]
↑ (pos 0)
at LLama.Unittest.LLamaContextTests.TokenizeWithoutBOS() in ../LLamaContextTests.cs:line 48

@martindevans (Member Author)

Thanks for testing that. I'll re-trigger the GitHub action to recompile all of the binaries and fix the tokenizer tests.

Did you test with the included binaries at all to confirm that everything else works?

@SignalRT (Collaborator)

I haven't tested it with the included binaries yet; I used my daily-build binaries. Tomorrow I will test the binaries generated by the GitHub build and run some tests with additional projects and some Semantic Kernel examples.

@staviq commented Oct 17, 2023

> This could be related to the latest changes in the tokenizer (ggerganov/llama.cpp#3538), but I cannot test it on Windows today.

Hi, that PR only applies if you explicitly pass "special=true" to llama_tokenize; otherwise tokenizer behavior remains unchanged.

Using "special=true" is meant only for "internal" use and should not be exposed to the end user (the application user).

Enabling special token processing causes the following:

  • No space prefix is added to the input string. This is important when the prompt format expects "word" instead of " word", for example when the system prompt expects "system" instead of " system", which are different tokens.
  • Special (control) tokens such as <s>, <|im_start|>, etc. are explicitly matched first and directly converted to the appropriate token id.
  • The remaining string segments, between previously matched special tokens, are tokenized normally.
  • The results are merged into one chain of token ids.

This addition to the API is meant for processing things like prompt templates, and allows including BOS/EOS directly in the input string.

User text should be tokenized normally, with "special=false".


Clarification on "special" tokens:

The model vocabulary naturally has a tree-like structure: every normal token can be represented by two other, shorter tokens, down to single characters.

The tokenizer picks neighboring characters from the input string, finds a token that represents that substring, and repeats merging neighbors until no valid, longer token can be matched.

Special tokens break that rule: they are chosen to never be "reachable" by merging normal tokens, preventing the user from inserting control tokens in the prompt.

Previously, special tokens were simply tokenized as plaintext (<s> would become something like <, s, >), and various prompt formats weren't exactly working with llama.cpp.

This change allows using special (control) tokens directly in the input string, removing the need to fiddle with querying the vocabulary manually to make prompt templates work.
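
To make the two modes concrete, here is a rough C# sketch of calling the tokenizer with and without special-token processing. The extern declaration mirrors the llama.cpp C signature after the "special" flag was added; the actual LLamaSharp binding, its marshalling, and the helper names here are assumptions for illustration:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

internal static class TokenizeSketch
{
    // Mirrors llama.cpp's llama_tokenize after ggerganov/llama.cpp#3538 added the
    // trailing "special" flag. The real LLamaSharp declaration may differ in
    // naming and marshalling details.
    [DllImport("libllama", CallingConvention = CallingConvention.Cdecl)]
    private static extern int llama_tokenize(
        IntPtr model,                                   // const struct llama_model *
        byte[] text, int textLen,                       // UTF-8 input
        int[] tokens, int nMaxTokens,                   // output buffer
        [MarshalAs(UnmanagedType.U1)] bool addBos,
        [MarshalAs(UnmanagedType.U1)] bool special);

    public static int[] Tokenize(IntPtr model, string text, bool addBos, bool special)
    {
        var utf8 = Encoding.UTF8.GetBytes(text);
        var buffer = new int[utf8.Length + 8];          // a token never covers less than one byte
        var count = llama_tokenize(model, utf8, utf8.Length, buffer, buffer.Length, addBos, special);
        if (count < 0)
            throw new InvalidOperationException($"Token buffer too small, need {-count}");
        return buffer.AsSpan(0, count).ToArray();
    }
}

// Prompt template: special=true, so "<s>", "<|im_start|>" etc. are matched directly
// to their single control-token ids and no space prefix is added.
//   var template = TokenizeSketch.Tokenize(model, "<s><|im_start|>system\n...", addBos: false, special: true);
//
// User text: special=false, plain tokenization; a pasted "<s>" stays ordinary text.
//   var userTokens = TokenizeSketch.Tokenize(model, input, addBos: true, special: false);
```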

 - Fixed stateless executor out-of-context handling
 - Fixed token tests
@martindevans (Member Author)

@SignalRT I've added new binaries and fixed a few small things (e.g. I think out-of-context handling should be about 100x faster now).

@staviq thanks very much for the extra info! I think the problem in this case was that SignalRT was using newer binaries than this PR was built for, and I hadn't added that new parameter yet on the C# side! I'm not quite sure what happens in that case (the call was still successful, so I guess whatever junk was on the stack gets used). Once I added the parameter and passed special=false, the results went back to the usual values.

@SignalRT (Collaborator) commented Oct 18, 2023

@martindevans, all my tests are working. I made some changes to the SK packages, but I see that master is now on SK 1.0 beta, so it makes no sense to change it in this branch.

@martindevans (Member Author)

Thanks for testing that!

@martindevans merged commit d8434ea into SciSharp:master on Oct 18, 2023
4 checks passed
@martindevans deleted the wip_major_api_change branch on October 18, 2023, 19:50
@AsakusaRinne (Collaborator)

Great work! Is it a good time to publish a new release?

@AsakusaRinne (Collaborator)

And maybe I should add an auto-release to the CI; I'll do it this weekend.

@martindevans (Member Author)

> Great work! Is it a good time to publish a new release?

I think so, yeah. This has made a few breaking changes to the API, but nothing massive. Good time to push out a new version 👍

@AsakusaRinne (Collaborator)

> I think so, yeah. This has made a few breaking changes to the API, but nothing massive. Good time to push out a new version 👍

I'm making the auto-release CI now and there's something confusing me. There are two supported Mac backends, CPU and Metal, but only one libllama.dylib is included in LLama/runtimes. Is it used for both CPU and Metal?

@martindevans (Member Author)

@AsakusaRinne It was removed in this PR: #163

From the discussion there it looks like one binary supports CPU and Metal inference (but I don't have a Mac so I can't confirm that).

@SignalRT (Collaborator)

@AsakusaRinne, one binary supports CPU and GPU.

@AsakusaRinne (Collaborator)

@martindevans @SignalRT Thank you for your help! I've opened PR #204 for the auto-release CI. 😊
