Multiple improvements: #16

marcuswestin · 2023-03-13T11:10:50Z

Make dalai llama script idempotent:
1: Don't re-download files that have already been downloaded
2: Don't re-create models that have already been created
Allow for specifying which model to run in the UI
- show an error message if the chosen model hasn't been downloaded and created yet
- and show instructions for how to download and create it if it doesn't exist (dalai llama <MODEL>)
Better log statements during download, model creation, and request serving
Create yarn scripts yarn dalai:llama <optional MODEL> and yarn start
Put the dalai llama directory in cwd by default, and allow for specifying where to put it with the LLAMA_DIR environment variable
Remove express url encoded warning message on server startup by making the extended option explicit (see https://github.com/expressjs/body-parser#extended for more info)
Allow for the server to optionally end a response with a "\n\n<end>" message when the response is finished
Add ./dalai script in dalai repo to allow for easier local development testing of the CLI
Exit with success code 0 when dalai llama succeeds
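
The LLAMA_DIR behavior listed above could look roughly like the sketch below. This is an illustration, not the code from this PR: the helper name `resolveLlamaDir` and the default folder name `llama` are assumptions.

```javascript
const path = require('path')

// Hypothetical helper: decide where the llama directory lives.
// LLAMA_DIR wins when it is set and non-empty; otherwise fall back to a
// folder under the current working directory. The default folder name
// 'llama' is an assumption made for this example.
function resolveLlamaDir(env = process.env, cwd = process.cwd(), defaultName = 'llama') {
  const override = env.LLAMA_DIR
  if (override && override.trim() !== '') {
    // path.resolve turns a relative override into an absolute path under cwd
    return path.resolve(cwd, override)
  }
  return path.resolve(cwd, defaultName)
}

console.log(resolveLlamaDir())
```

Passing `env` and `cwd` as parameters keeps the helper easy to test without touching the real process environment.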


fermigas · 2023-03-13T14:31:53Z

I've confirmed that download and quantization fixes work. The UI does, too.
I'm running 65B!

marcuswestin · 2023-03-13T15:12:23Z

Thanks @fermigas

I think this patch should be a meaningful improvement for everyone who is playing with this.

@cocktailpeanut would love a code review and merge, or pointers for what to update :)

fermigas

I manually made the changes I reviewed below and tested them.
Downloads and quantization code changes work fine. Approved.
All 4 models work with the new UI changes. Approved.

I did not apply the changes in the files I haven't marked "viewed".

fermigas · 2023-03-13T15:30:18Z

bin/web/index.js

@@ -9,14 +9,14 @@ const start = (port) => {
  dalai.http(httpServer)
  app.use(express.static(path.resolve(__dirname, 'public')))
  app.use(express.json());
-  app.use(express.urlencoded());
+  app.use(express.urlencoded({ extended: true }));


Good. Fixes a bug where the server emits this line on startup:

body-parser deprecated undefined extended: provide extended option .npm/_npx/3c737cbb02d79cc9/node_modules/dalai/bin/web/index.js:12:19
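
For context on what the option selects: per the body-parser documentation, `extended: false` parses URL-encoded bodies with Node's `querystring` module, while `extended: true` uses the `qs` library, which supports nested objects. The sketch below uses only the built-in `querystring` module to show what the flat style does with bracketed keys. The sample body is made up for illustration.

```javascript
const querystring = require('querystring')

// A made-up form body using bracket syntax for nesting.
const body = 'user[name]=ada&user[lang]=js'

// Flat parsing: bracketed keys are kept as literal strings, not nested objects.
const flat = querystring.parse(body)

console.log(Object.keys(flat))
console.log(flat['user[name]'])
```

With `extended: true`, the same body would be expected to come out as a nested `user` object instead of two flat keys.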

fermigas · 2023-03-13T15:35:30Z

bin/web/views/index.ejs

+  <select id="model">
+    <!-- options: 7B, 13B, 30B, 65B -->
+    <option value="7B">7B</option>
+    <option value="13B">13B</option>
+    <option value="30B">30B</option>
+    <option value="65B">65B</option>
+  </select>


Really useful, and works fine.
Might consider adding a text input box for # of threads, and temperature too. I'm finding the model pretty sensitive to temperature settings.
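
On the server side, the selected model still has to be checked before running it, since the PR description mentions returning an error with instructions when the model hasn't been downloaded and created. A minimal sketch of such a check follows. The function name `modelError`, the exact messages, and the choice of `ggml-model-q4_0.bin` as the "ready" marker are assumptions, not the PR's actual code.

```javascript
const fs = require('fs')
const path = require('path')

// The four options offered by the <select> in the UI.
const KNOWN_MODELS = ['7B', '13B', '30B', '65B']

// Hypothetical check: returns null when the model is ready, or an error string.
// Assumption: a model counts as ready once models/<MODEL>/ggml-model-q4_0.bin exists.
function modelError(home, model) {
  // Rejecting unknown names also keeps arbitrary input out of the file path.
  if (!KNOWN_MODELS.includes(model)) {
    return `Unknown model "${model}". Expected one of: ${KNOWN_MODELS.join(', ')}`
  }
  const quantized = path.resolve(home, 'models', model, 'ggml-model-q4_0.bin')
  if (!fs.existsSync(quantized)) {
    return `Model ${model} has not been downloaded and created yet. Run: dalai llama ${model}`
  }
  return null
}
```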

fermigas · 2023-03-13T15:39:20Z

index.js

+        console.log(`Skip file download, it already exists: ${file}`)
+        continue;
+      }
+


Very useful. The 6th parameter file of the 65B download kept failing for me. I had to pull it from a torrent to get it to work, then had to do the quantization manually. This and the quantization file check below improve error recovery a lot.
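
For readers without the full diff open, the skip-if-present idea can be sketched on its own as below. The helper name `filesToDownload` and the sample file names are assumptions for illustration. Only the log message is taken from the hunk above.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: return only the files that still need downloading.
function filesToDownload(dir, files) {
  return files.filter((file) => {
    const target = path.resolve(dir, file)
    if (fs.existsSync(target)) {
      console.log(`Skip file download, it already exists: ${file}`)
      return false
    }
    return true
  })
}

// Example call with made-up names; nothing is downloaded here.
console.log(filesToDownload('./models/7B', ['consolidated.00.pth', 'params.json']))
```

An existence check can't tell a complete file from a partially downloaded one, so a stricter variant might also compare file size or a checksum.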

fermigas · 2023-03-13T15:41:14Z

index.js

@@ -117,12 +132,19 @@ class Dalai {
    }
  }
  async query(req, cb) {
+    console.log(`> query:`, req)


Very useful. You might consider logging the model's output to the console, too.

fermigas · 2023-03-13T15:42:23Z

index.js

+      const outputFile1 = `./models/${model}/ggml-model-f16.bin${suffix}`
+      const outputFile2 = `./models/${model}/ggml-model-q4_0.bin${suffix}`
+      if (fs.existsSync(path.resolve(this.home, outputFile1)) && fs.existsSync(path.resolve(this.home, outputFile2))) {
+        console.log(`Skip quantization, files already exist: ${outputFile1} and ${outputFile2}`)
+        continue
+      }
+      await this.exec(`./quantize ${outputFile1} ${outputFile2} 2`, this.home)


As mentioned above, this is a big improvement.
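
The hunk above skips only when both output files exist. A slightly finer-grained variation, sketched here as an idea rather than as what the PR does, decides per step, so an existing f16 file is not converted again when only the q4_0 file is missing. The function name `pendingSteps` and the step labels are assumptions. The file names follow the hunk.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical planner: which steps are still needed for one model part?
function pendingSteps(home, model, suffix = '') {
  const f16 = path.resolve(home, `./models/${model}/ggml-model-f16.bin${suffix}`)
  const q4 = path.resolve(home, `./models/${model}/ggml-model-q4_0.bin${suffix}`)
  const hasF16 = fs.existsSync(f16)
  const hasQ4 = fs.existsSync(q4)
  if (hasF16 && hasQ4) return [] // both outputs present: nothing to do
  if (hasF16) return ['quantize'] // converted but not yet quantized
  return ['convert', 'quantize'] // no f16 file: start from conversion
}
```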

alcalawil

LGTM

marcuswestin · 2023-03-13T20:07:21Z

👍 🙏

marcuswestin added 15 commits March 13, 2023 04:28

Use yarn and add scripts

e8ca492

Better log statement on server run

590be3e

Remove unused line of code with syntax error

d138a8f

Add ./dalai script which runs the cli script for test running, and fix yarn scripts

926303a

Add .prettierignore file to avoid auto-formatting files (and complicate PR merges); but also add .prettierrc and yarn script to auto-format the entire repo to encourage the decision to enable this ability

e62611f

Dont download files if they already exist

83093ac

Skip conversion and quantizing if they have already been done

f65e03e

Exit with success exit code success

17f9bb1

Allow for specifying which model to run in the UI, and if the model has not been downloaded and processed yet then return an error message with instructions for how to

2732d36

Log every query that comes in on the server

731d422

Allow for the server to optionally end a response with "\n\n<end>"

a701a66

Log every execution statement during download and process

f5467f7

Allow for "yarn just:run <MODEL>"

2fa081d

Put the llama/dalai directory in CWD by default, but allow for specifying where to put it with "LLAMA_DIR" environment variable

15c7713

Rename "yarn make-model" to "yarn dalai:llama", since that it what it does

7f763e5

This was referenced Mar 13, 2023

skip downloading model weights #1

Closed

Autocomplete not working due to Socket.io issues #3

Closed

Remove warning on server startup

54e2457

See https://github.com/expressjs/body-parser#extended for more info

marcuswestin mentioned this pull request Mar 13, 2023

Update quantize command to properly suffix input file #9

Merged

fermigas reviewed Mar 13, 2023

alcalawil approved these changes Mar 13, 2023

This was referenced Mar 13, 2023

npx dalai serve not working, no error output (Ubuntu 18.04, dalai@0.0.13) #7

Open

npx dalai llama: FileNotFoundError: [Errno 2] No such file or directory: 'models/7B//consolidated.00.pth' #13

Open

cocktailpeanut merged commit 9dcc96d into cocktailpeanut:main Mar 13, 2023

ekp1k80 mentioned this pull request Mar 13, 2023

main: failed to quantize model from './models/7B/ggml-model-f16.bin' #18

Open

nsudhanva mentioned this pull request Mar 13, 2023

Fix: Issue #1 - skip downloading model weights #5

Closed

fermigas mentioned this pull request Mar 13, 2023

How to serve another model than 7B #28

Open

mirroredkube pushed a commit to mirroredkube/dalai that referenced this pull request Mar 26, 2023

Fix a typo in model name (cocktailpeanut#16)

6b2cb63
