
33B and 65B weights? #94

Closed
trevtravtrev opened this issue Mar 21, 2023 · 16 comments

Comments

@trevtravtrev

What would it take to use 33B and 65B weights?

Also, 7B seems to work better than 13B right now.

@tjthejuggler

https://huggingface.co/Pi3141/alpaca-30B-ggml

@EfogDev

EfogDev commented Mar 21, 2023

https://huggingface.co/Pi3141/alpaca-30B-ggml

Is it gonna work just with ./chat -m ggml-model-q4_0.bin or do I need anything else? Thanks!

@sowa705

sowa705 commented Mar 21, 2023

Doesn't seem to work.

llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
llama_model_load: memory_size =  6240.00 MB, n_mem = 122880
llama_model_load: loading model part 1/4 from 'ggml-model-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from 'ggml-model-q4_0.bin'

@1octopus1

1octopus1 commented Mar 21, 2023

(screenshot)

=( Help

@joaops

joaops commented Mar 21, 2023

To work with the 30B model, it is necessary to change lines 34 and 35 of the main.cpp file to the value 1. Originally, the 30B file was divided into 4 parts, just as the 13B file was divided into 2 parts.

// determine number of model parts based on the dimension
static const std::map<int, int> LLAMA_N_PARTS = {
    { 4096, 1 },
    { 5120, 1 },
    { 6656, 4 },
    { 8192, 8 },
};

Change it to:

// determine number of model parts based on the dimension
static const std::map<int, int> LLAMA_N_PARTS = {
    { 4096, 1 },
    { 5120, 1 },
    { 6656, 1 },
    { 8192, 1 },
};

After that, just recompile and run it again.

Credit to user ItsPi3141, who gave this answer in issue #83.
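
In case it helps, the rebuild-and-run sequence looks roughly like this (a sketch assuming the Makefile's chat target and the quantized filename used above):

# rebuild the chat binary after editing LLAMA_N_PARTS
make chat
# run against the single-file 30B quantized model
./chat -m ggml-model-q4_0.bin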

@Green-Sky

Green-Sky commented Mar 21, 2023

You don't need to touch any code for this.

./main -h gives you the following.

  --n_parts N           number of model parts (default: -1 = determine from dimensions)

edit: Actually I was assuming llama.cpp (not this fork)

@trevtravtrev
Author

You don't need to touch any code for this.

./main -h gives you the following.


  --n_parts N           number of model parts (default: -1 = determine from dimensions)

What value do you put?

@Green-Sky

Actually I am assuming llama.cpp (not this fork)

What value do you put?

If it is a single model file, use 1.
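
For example (a sketch for upstream llama.cpp, not this fork; the model path is illustrative):

# single-file model: tell the loader not to infer the part count from the dimension
./main -m ggml-model-q4_0.bin --n_parts 1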

@thatblend

Doesn't seem to work.

llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
llama_model_load: memory_size =  6240.00 MB, n_mem = 122880
llama_model_load: loading model part 1/4 from 'ggml-model-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from 'ggml-model-q4_0.bin'

Getting the same issue :(

@trevtravtrev
Author

Actually I am assuming llama.cpp (not this fork)

What value do you put?

If it is a single model file, 1

Are you saying this method is valid for llama.cpp but not alpaca.cpp?

@Green-Sky

Are you saying this method is valid for llama.cpp but not alpaca.cpp?

yea.

@antimatter15 are there any changes left in your fork that haven't been upstreamed yet?

@trevtravtrev
Author

To work with the 30B model, it is necessary to change lines 34 and 35 of the main.cpp file to the value 1. Originally, the 30B file was divided into 4 parts, just as the 13B file was divided into 2 parts.

// determine number of model parts based on the dimension
static const std::map<int, int> LLAMA_N_PARTS = {
    { 4096, 1 },
    { 5120, 1 },
    { 6656, 4 },
    { 8192, 8 },
};

Change it to:

// determine number of model parts based on the dimension
static const std::map<int, int> LLAMA_N_PARTS = {
    { 4096, 1 },
    { 5120, 1 },
    { 6656, 1 },
    { 8192, 1 },
};

After that, just recompile and run it again.

Credits to the user ItsPi3141, who gave the answer here: Issues 83

Has anyone gotten this 30B model working with the method above yet? If so, how does it compare to the current 7B and 13B weights?

I haven't had a chance to check the implications of the hotfix above on the rest of the source code. Is this a change we could push to main to add support for these larger models?

(I will be testing this method when I get home later)

@trevtravtrev
Author

static const std::map<int, int> LLAMA_N_PARTS = {
    { 4096, 1 },
    { 5120, 1 },
    { 6656, 1 },
    { 8192, 1 },
};

I do believe the author is referring to the chat.cpp file, not main.cpp.

@trevtravtrev
Author

trevtravtrev commented Mar 21, 2023

I've had a chance to implement this method with the 30B weight and test it. It works! Upon initial testing, this model seems very impressive. While I don't have a baseline to test against, I suspect it is performing better than the currently supported 7B and 13B models.
This model is very memory- and CPU-intensive and requires a beefy PC/server to run. It is using roughly 65% of my CPU and 77% of my memory, and it writes output roughly 2-3x slower than the 7B weight, if I had to guess.

My specs are:
CPU: 12th Gen Intel(R) Core(TM) i7-12700KF 3.61 GHz
RAM: 32.0 GB

I don't see any reason why the hotfix above to run the 30B weight, together with documentation in the README, should not be pushed to main.

@antimatter15 if I forked, implemented this feature (support for the 30B weight) including README documentation, and submitted a PR, would you accept it?

@trevtravtrev
Author

The code snippet to add support for the 30B weight has already been merged in #104.

I've just submitted a PR #108 to add support/instructions to the README on how to get the 30B weight running.

@antimatter15 it would be very helpful if you would accept PR #108. A lot of people would love this :)

@MasMedIm

Doesn't seem to work.

llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
llama_model_load: memory_size =  6240.00 MB, n_mem = 122880
llama_model_load: loading model part 1/4 from 'ggml-model-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from 'ggml-model-q4_0.bin'

Getting the same issue :(

For this issue, try recompiling the chat binary with: make chat. For me it's working.
