
Update quantize command to properly suffix input file #9

Conversation

@rizenfrmtheashes
Contributor

I believe the original code repeatedly quantizes the first file instead of every input file for models that have more than one quantize input. This update should fix that (a sketch of the idea is below).
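A minimal sketch of the idea (not the exact diff from this PR), assuming llama.cpp's shard naming, where the f16 weights for larger models are split across `ggml-model-f16.bin`, `ggml-model-f16.bin.1`, `ggml-model-f16.bin.2`, ... and each part needs its own quantize invocation with a matching output suffix:

```js
const fs = require("fs");
const path = require("path");
const { execSync } = require("child_process");

function quantizeAll(modelDir) {
  const base = "ggml-model-f16.bin";
  // Pick up every shard: "ggml-model-f16.bin", "ggml-model-f16.bin.1", ...
  const parts = fs.readdirSync(modelDir).filter((f) => f.startsWith(base));
  for (const part of parts) {
    // "" for the first file, ".1", ".2", ... for the rest
    const suffix = part.slice(base.length);
    const input = path.join(modelDir, part);
    const output = path.join(modelDir, "ggml-model-q4_0.bin" + suffix);
    // llama.cpp's quantize CLI: <input> <output> <type>, where 2 = q4_0
    execSync(`./quantize ${input} ${output} 2`, { stdio: "inherit" });
  }
}
```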

@fermigas

This is one of the reasons that only the 7B model runs.

Another reason may be line 124 in dalai/index.js:

```js
model: `./models/${req.model || "7B"}/ggml-model-q4_0.bin`
```

It doesn't seem to be aware that the larger models have more than one q4 file.

Also, in the form (index.ejs, 57-60), pressing autocomplete results in this request:

```js
socket.emit('request', {
  prompt: input.textContent,
  n_predict: 256
})
```

which does not specify a model, so the server defaults to 7B.
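A minimal sketch of what the form would need to send instead (the `modelSelect` element here is hypothetical; the form has no model picker yet):

```js
// Hypothetical: a model <select> in index.ejs whose value rides along with
// the request, so the server's `req.model || "7B"` fallback only applies
// when nothing was chosen.
socket.emit('request', {
  model: modelSelect.value, // e.g. "13B"
  prompt: input.textContent,
  n_predict: 256
})
```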

@rizenfrmtheashes
Contributor Author

rizenfrmtheashes commented Mar 13, 2023

> Another reason may be line 124 in dalai/index.js:
>
> ```js
> model: `./models/${req.model || "7B"}/ggml-model-q4_0.bin`
> ```
>
> It doesn't seem to be aware that the larger models have more than one q4 file.

Actually, this is fine. The underlying llama.cpp package only requires referencing the first model weights file, even when there are multiple, so this should be okay. This is based on my manual testing of the underlying package, not an end-to-end test.
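For illustration, the invocation stays single-file even for sharded models (a sketch, not dalai's exact exec code; my understanding is that llama.cpp locates the extra shards on its own):

```js
// llama.cpp is handed only the first shard; it works out how many parts the
// model has and opens ggml-model-q4_0.bin.1, .2, ... by itself.
const cmd = `./main -m ./models/${req.model || "7B"}/ggml-model-q4_0.bin -p "${prompt}" -n 256`;
```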

As for your latter comment, yeah, with the quickstart setup there is no way to use anything other than the 7B model. I ended up manually modifying the source to work around this. I'm not confident enough in my JS skills to make the model a selectable option in the web UI and propagate that choice through the server request down to the underlying cpp package call.

@cocktailpeanut cocktailpeanut merged commit 8093d76 into cocktailpeanut:main Mar 13, 2023
@marcuswestin
Contributor

marcuswestin commented Mar 13, 2023

@rizenfrmtheashes I implemented that web UI in #16; would love a code review if you're up for it.

And I just verified that the requested model gets propagated all the way down to the model execution call.

@fermigas

@marcuswestin a lot of great usability fixes/adds in those commits! LGTM
