Add grammar-based sampling (for webui, llamacpp, and koboldcpp) #293

cpacker · 2023-11-04T00:45:59Z

Closes #273

Adds koboldcpp and llama.cpp backend support
Adds grammar-based sampling support for koboldcpp, llama.cpp, and webui backends
Adds basic grammar (json.gbnf) and MemGPT-specific grammar (json_func_calls_with_inner_thoughts.gbnf)
- Thank you @Drake-AI!
If you are using a backend that supports grammars, MemGPT will run the airoboros wrapper with the grammar by default

Adds grammar-based sampling to prevent JSON-related parsing errors when using local LLMs, see llama.cpp.
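
As a rough illustration of what grammar-based sampling looks like from the client side, the sketch below builds a request for the llama.cpp example server. It assumes the server's `/completion` endpoint with its `prompt`, `n_predict`, and `grammar` request fields and a `content` field in the response; the helper names and the toy grammar are hypothetical and are not the code added in this PR.

```python
import json
import urllib.request


def build_completion_request(prompt, grammar_text, n_predict=512):
    """Build the JSON body for a grammar-constrained completion call."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        # GBNF grammar text; sampling is restricted to tokens the grammar allows
        "grammar": grammar_text,
    }


def complete(base_url, payload, timeout=60):
    """POST the payload to <base_url>/completion and return the generated text."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["content"]


# Toy grammar (hypothetical): the model may only emit one of the JSON booleans.
TOY_GRAMMAR = 'root ::= "true" | "false"'
payload = build_completion_request("Is the sky blue? Answer: ", TOY_GRAMMAR, n_predict=4)
# text = complete("http://localhost:8080", payload)  # requires a running server
```

The network call is left commented out so the sketch can be read and run without a server; in practice the grammar text would come from one of the `.gbnf` files rather than an inline string.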

Example

Starting the server (llama.cpp on macOS):

llama.cpp % ./server -m ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf -c 8000

Starting the server (koboldcpp on macOS):

koboldcpp % ./koboldcpp.py ~/models/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/dolphin-2.2.1-mistral-7b.Q6_K.gguf --contextsize 8192

Specifying a wrapper with a grammar:

python main.py --model airoboros-l2-70b-2.1-grammar --first

Drake-AI · 2023-11-04T13:58:43Z

I'm working on a grammar to avoid errors. It is still in progress, but it looks like this and includes the function names plus the names and types of their params:

root ::= Function
Function ::= SendMessage | PauseHeartbeats
SendMessage ::= "{" ws "\"function\":" ws "\"send_message\"," ws "\"params\":" ws SendMessageParams "}"
PauseHeartbeats ::= "{" ws "\"function\":" ws "\"pause_heartbeats\"," ws "\"params\":" ws PauseHeartbeatsParams "}"
SendMessageParams ::= "{}" | "{" ws Message ("," ws Message)* ws "}"
PauseHeartbeatsParams ::= "{}" | "{" ws Minutes ("," ws Minutes)* ws "}"
Message ::= "\"inner_thoughts\":" ws string | "\"message\":" ws string | "\"request_heartbeat\":" ws boolean
Minutes ::= "\"minutes\":" ws number
string ::= "\"" ([^"]*) "\""
boolean ::= "true" | "false"
ws ::= [ \t\n]*
number ::= [0-9]+ "."? [0-9]*

With this, it should be possible to use any model without fine-tuning.
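
Since the grammar encodes function names plus parameter names and types, one way to keep it in sync with the function set would be to generate the rules from a small spec. The following is only a sketch under that assumption: `FUNCTIONS`, `rule_name`, `key_literal`, and `build_grammar` are hypothetical names, and the output uses a fixed parameter order rather than the repeated-alternatives form shown above.

```python
# Hypothetical function specs: parameter name -> name of a primitive grammar rule.
FUNCTIONS = {
    "send_message": {"inner_thoughts": "string", "message": "string"},
    "pause_heartbeats": {"inner_thoughts": "string", "minutes": "number"},
}

# Primitive rules appended to every generated grammar.
PRIMITIVES = [
    r'string ::= "\"" ([^"]*) "\""',
    r'number ::= [0-9]+',
    r'ws ::= [ \t\n]*',
]


def rule_name(function_name):
    """Turn send_message into SendMessage."""
    return "".join(part.capitalize() for part in function_name.split("_"))


def key_literal(key):
    # GBNF literal that matches a quoted JSON key followed by its colon.
    return '"\\"' + key + '\\":"'


def build_grammar(functions):
    """Emit one call rule and one params rule per function, then the primitives."""
    names = [rule_name(name) for name in functions]
    call_rules, param_rules = [], []
    for name, params in functions.items():
        rule = rule_name(name)
        call_rules.append(
            rule + ' ::= "{" ws ' + key_literal("function") + ' ws "\\"' + name + '\\"," ws '
            + key_literal("params") + " ws " + rule + 'Params "}"'
        )
        fields = ' "," ws '.join(key_literal(k) + " ws " + kind for k, kind in params.items())
        param_rules.append(rule + 'Params ::= "{" ws ' + fields + ' ws "}"')
    head = ["root ::= Function", "Function ::= " + " | ".join(names)]
    return "\n".join(head + call_rules + param_rules + PRIMITIVES)


grammar = build_grammar(FUNCTIONS)
print(grammar)
```

Adding a function then means adding one entry to the spec instead of hand-writing two rules.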

Drake-AI · 2023-11-04T15:32:22Z

It is working now with all the basic functions and is easy to extend with extra functions. Feel free to use it or adapt it to your needs.

root ::= Function
Function ::= SendMessage | PauseHeartbeats | CoreMemoryAppend | CoreMemoryReplace | ConversationSearch | ConversationSearchDate | ArchivalMemoryInsert | ArchivalMemorySearch
SendMessage ::= "{"   ws   "\"function\":"   ws   "\"send_message\","   ws   "\"params\":"   ws   SendMessageParams   "}"
PauseHeartbeats ::= "{"   ws   "\"function\":"   ws   "\"pause_heartbeats\","   ws   "\"params\":"   ws   PauseHeartbeatsParams   "}"
CoreMemoryAppend ::= "{"   ws   "\"function\":"   ws   "\"core_memory_append\","   ws   "\"params\":"   ws   CoreMemoryAppendParams   "}"
CoreMemoryReplace ::= "{"   ws   "\"function\":"   ws   "\"core_memory_replace\","   ws   "\"params\":"   ws   CoreMemoryReplaceParams   "}"
ConversationSearch  ::= "{"   ws   "\"function\":"   ws   "\"conversation_search\","   ws   "\"params\":"   ws   ConversationSearchParams   "}"
ConversationSearchDate  ::= "{"   ws   "\"function\":"   ws   "\"conversation_search_date\","   ws   "\"params\":"   ws   ConversationSearchDateParams   "}"
ArchivalMemoryInsert  ::= "{"   ws   "\"function\":"   ws   "\"archival_memory_insert\","   ws   "\"params\":"   ws   ArchivalMemoryInsertParams   "}"
ArchivalMemorySearch  ::= "{"   ws   "\"function\":"   ws   "\"archival_memory_search\","   ws   "\"params\":"   ws   ArchivalMemorySearchParams   "}"
SendMessageParams ::= "{"   ws   InnerThoughtsParam   ","   ws   "\"message\":"   ws   string   ws   "}"
PauseHeartbeatsParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"minutes\":"   ws   number   ws   "}"
CoreMemoryAppendParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"name\":"   ws   namestring   ","   ws   "\"content\":"   ws   string   ws   ","   ws   RequestHeartbeatParam   ws   "}"
CoreMemoryReplaceParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"name\":"   ws   namestring   ","   ws   "\"old_content\":"   ws   string   ","   ws   "\"new_content\":"   ws   string   ws   ","   ws   RequestHeartbeatParam   ws   "}"
ConversationSearchParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"query\":"   ws   string   ws   ","   ws   "\"page\":"   ws   number   ws   ","   ws   RequestHeartbeatParam   ws   "}"
ConversationSearchDateParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"start_date\":"   ws   string   ws   ","      ws   "\"end_date\":"   ws   string   ws   ","   ws   "\"page\":"   ws   number   ws   ","   ws   RequestHeartbeatParam   ws   "}"
ArchivalMemoryInsertParams ::= "{"   ws   InnerThoughtsParam    ","   ws   "\"content\":"   ws   string   ws   ","   ws   RequestHeartbeatParam   ws   "}"
ArchivalMemorySearchParams ::= "{"   ws   InnerThoughtsParam   ","  ws   "\"query\":"   ws   string   ws   ","   ws   "\"page\":"   ws   number   ws   ","   ws   RequestHeartbeatParam   ws   "}"
InnerThoughtsParam ::= "\"inner_thoughts\":"   ws   string
RequestHeartbeatParam ::= "\"request_heartbeat\":"   ws   boolean
namestring ::= "\"human\"" | "\"persona\""
string ::= "\""   ([^"\[\]{}]*)   "\""
boolean ::= "true" | "false"
ws ::= [ \\t\\n]*
number ::= [0-9]+

cpacker · 2023-11-04T17:21:43Z

This is really awesome, thank you @Drake-AI ! I'll merge it into this PR as an additional grammar file.

Out of curiosity @Drake-AI while testing this what backend are you using? llama.cpp? web UI?

Drake-AI · 2023-11-04T17:35:19Z

koboldcpp, which is a fork of llama.cpp, so it should work with both.

cpacker · 2023-11-04T17:36:42Z

awesome, I guess I should probably also add "official" support for kobold with a catch for BACKEND_TYPE=kobold, though if it's a fork of llama.cpp I wouldn't be surprised if BACKEND_TYPE=llamacpp works for it

Drake-AI · 2023-11-04T17:42:53Z

No, it needs these variables:

"stop_sequence": [
    "\nUSER:",
    "\nASSISTANT:",
    "\nFUNCTION RETURN:",
    # '\n' +
    # '</s>',
    # '<|',
    # '\n#',
    # '\n\n\n',
],
"max_context_length": 4096,
"max_length": 512,

You can also send all the other parameters, such as temperature, via the API.
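
To make the settings above concrete, here is a minimal sketch of how a koboldcpp request body might be assembled. It assumes the context field is spelled `max_context_length`, as in the KoboldAI-style generate API, and that the backend accepts a `grammar` field alongside the prompt; `build_kobold_payload` and its arguments are hypothetical, not the contents of `memgpt/local_llm/koboldcpp/api.py`.

```python
# Stop sequences taken from the settings listed above.
STOP_SEQUENCES = ["\nUSER:", "\nASSISTANT:", "\nFUNCTION RETURN:"]


def build_kobold_payload(prompt, grammar_text=None, context_window=4096, **sampler_overrides):
    """Assemble a generate request body (field names are assumptions, see lead-in)."""
    payload = {
        "prompt": prompt,
        "stop_sequence": list(STOP_SEQUENCES),
        "max_context_length": context_window,
        "max_length": 512,
    }
    if grammar_text is not None:
        # Only attach a grammar when one was requested.
        payload["grammar"] = grammar_text
    # Extra sampler parameters (for example temperature) ride along in the same body.
    payload.update(sampler_overrides)
    return payload


payload = build_kobold_payload(
    "USER: hi\nASSISTANT:",
    grammar_text='root ::= "true" | "false"',
    temperature=0.7,
)
```

Keeping the stop sequences and length limits in one place, and layering per-call overrides on top, mirrors the idea that everything is sent through the API in a single request.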

Drake-AI · 2023-11-04T17:44:23Z

I found a bug in my grammar. Please change it to this new version before merging:

root ::= Function
Function ::= SendMessage | PauseHeartbeats | CoreMemoryAppend | CoreMemoryReplace | ConversationSearch | ConversationSearchDate | ArchivalMemoryInsert | ArchivalMemorySearch
SendMessage ::= "{"   ws   "\"function\":"   ws   "\"send_message\","   ws   "\"params\":"   ws   SendMessageParams   "}"
PauseHeartbeats ::= "{"   ws   "\"function\":"   ws   "\"pause_heartbeats\","   ws   "\"params\":"   ws   PauseHeartbeatsParams   "}"
CoreMemoryAppend ::= "{"   ws   "\"function\":"   ws   "\"core_memory_append\","   ws   "\"params\":"   ws   CoreMemoryAppendParams   "}"
CoreMemoryReplace ::= "{"   ws   "\"function\":"   ws   "\"core_memory_replace\","   ws   "\"params\":"   ws   CoreMemoryReplaceParams   "}"
ConversationSearch  ::= "{"   ws   "\"function\":"   ws   "\"conversation_search\","   ws   "\"params\":"   ws   ConversationSearchParams   "}"
ConversationSearchDate  ::= "{"   ws   "\"function\":"   ws   "\"conversation_search_date\","   ws   "\"params\":"   ws   ConversationSearchDateParams   "}"
ArchivalMemoryInsert  ::= "{"   ws   "\"function\":"   ws   "\"archival_memory_insert\","   ws   "\"params\":"   ws   ArchivalMemoryInsertParams   "}"
ArchivalMemorySearch  ::= "{"   ws   "\"function\":"   ws   "\"archival_memory_search\","   ws   "\"params\":"   ws   ArchivalMemorySearchParams   "}"
SendMessageParams ::= "{"   ws   InnerThoughtsParam   ","   ws   "\"message\":"   ws   string   ws   "}"
PauseHeartbeatsParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"minutes\":"   ws   number   ws   "}"
CoreMemoryAppendParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"name\":"   ws   namestring   ","   ws   "\"content\":"   ws   string   ws   ","   ws   RequestHeartbeatParam   ws   "}"
CoreMemoryReplaceParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"name\":"   ws   namestring   ","   ws   "\"old_content\":"   ws   string   ","   ws   "\"new_content\":"   ws   string   ws   ","   ws   RequestHeartbeatParam   ws   "}"
ConversationSearchParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"query\":"   ws   string   ws   ","   ws   "\"page\":"   ws   number   ws   ","   ws   RequestHeartbeatParam   ws   "}"
ConversationSearchDateParams ::= "{"   ws   InnerThoughtsParam   ","      ws   "\"start_date\":"   ws   string   ws   ","      ws   "\"end_date\":"   ws   string   ws   ","   ws   "\"page\":"   ws   number   ws   ","   ws   RequestHeartbeatParam   ws   "}"
ArchivalMemoryInsertParams ::= "{"   ws   InnerThoughtsParam    ","   ws   "\"content\":"   ws   string   ws   ","   ws   RequestHeartbeatParam   ws   "}"
ArchivalMemorySearchParams ::= "{"   ws   InnerThoughtsParam   ","  ws   "\"query\":"   ws   string   ws   ","   ws   "\"page\":"   ws   number   ws   ","   ws   RequestHeartbeatParam   ws   "}"
InnerThoughtsParam ::= "\"inner_thoughts\":"   ws   string
RequestHeartbeatParam ::= "\"request_heartbeat\":"   ws   boolean
namestring ::= "\"human\"" | "\"persona\""
string ::= "\""   ([^"\[\]{}]*)   "\""
boolean ::= "true" | "false"
ws ::= ""
number ::= [0-9]+
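
Because `ws` is now the empty string, this version only admits compact output with no whitespace between tokens. As a rough way to see what the `SendMessage` branch accepts, the sketch below mirrors that one branch as a regular expression; the regex and the helper name are illustrative assumptions, not a substitute for the grammar itself.

```python
import json
import re

# Mirrors: string ::= "\"" ([^"\[\]{}]*) "\""
STRING = r'"[^"\[\]{}]*"'

# Mirrors the SendMessage and SendMessageParams rules with ws ::= "".
SEND_MESSAGE = re.compile(
    r'\{"function":"send_message","params":'
    r'\{"inner_thoughts":' + STRING + r',"message":' + STRING + r'\}\}'
)


def matches_send_message(text):
    """True if text is exactly one send_message call in the grammar's shape."""
    return SEND_MESSAGE.fullmatch(text) is not None


ok = '{"function":"send_message","params":{"inner_thoughts":"Greet the user.","message":"Hello!"}}'
spaced = '{"function": "send_message", "params": {"inner_thoughts": "x", "message": "y"}}'
braces = '{"function":"send_message","params":{"inner_thoughts":"x","message":"{y}"}}'

print(matches_send_message(ok), matches_send_message(spaced), matches_send_message(braces))
print(json.loads(ok)["params"]["message"])
```

The compact form parses cleanly with `json.loads`, while the spaced variant and the one with braces inside a string fall outside the grammar. Note that the `string` rule still admits a raw newline inside a string, which strict JSON parsers reject, so the grammar narrows the failure modes without removing every one.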

cpacker · 2023-11-04T17:56:44Z

updated, thank you for the patch! also I added kobold support, though i haven't tested it yet

memgpt/local_llm/koboldcpp/settings.py

memgpt/local_llm/koboldcpp/api.py

cpacker · 2023-11-04T18:24:11Z

Tested on kobold.cpp, seems to be working:

MemGPT % python main.py --model airoboros-l2-70b-2.1-grammar --first

cpacker · 2023-11-04T18:27:16Z

Will merge pending extra review from @vivi. @Drake-AI, let me know if you catch any bugs in the grammar in the meantime.

memgpt/local_llm/chat_completion_proxy.py

vivi

LGTM 😈

vivi · 2023-11-04T18:58:01Z

Thanks for your help @Drake-AI! Will merge this in and add you as a co-author.

Add grammar-based sampling (for webui, llamacpp, and koboldcpp) (cpacker#293)
* add llamacpp server support
* use gbnf loader
* cleanup and warning about grammar when not using llama.cpp
* added memgpt-specific grammar file
* add grammar support to webui api calls
* black
* typo
* add koboldcpp support
* no more defaulting to webui, should error out instead
* fix grammar
* patch kobold (testing, now working) + cleanup log messages

Co-Authored-By: Drake-AI <drake-ai@users.noreply.github.com>
Update local_llm.md * Update local_llm.md * Update autogen.md * Update autogen.md * Update autogen.md * refreshed the autogen examples + notebook (notebook is untested) * unrelated patch of typo I noticed * poetry remove pyautogen, then manually removed autogen extra in .toml * add pdf dependency --------- Co-authored-by: Sarah Wooders <sarahwooders@gmail.com> * Update local_llm.md (cpacker#542) * Documentation update (cpacker#541) * Update autogen.md * Update autogen.md * clean docs (cpacker#543) * Update autogen.md (cpacker#544) * update docs (cpacker#547) * update admonitions * Update local_llm.md * Update webui.md * Update autogen.md * Update storage.md * Update example_chat.md * Update example_data.md * Update example_chat.md * Update example_data.md * added vLLM doc page since we support it (cpacker#545) * added vLLM doc page since we support it * capitalization * updated documentation * Update vllm.md * Update ollama.md * Update ollama.md * Update ollama.md * Update autogen.md * Fix vLLM endpoint to have correct suffix (cpacker#548) * minor fix * fix vllm endpoint * fix docs * Add documentation for using Hugging Face models for embeddings (cpacker#549) * Update README.md * bump version (cpacker#551) * Add docs file for customizing embedding mode (cpacker#554) * minor fix * forgot to add embedding file * Upgrade to `llama_index=0.9.10` (cpacker#556) * minor fix * forgot to add embedding file * upgrade llama index * fix cannot import name 'EmptyIndex' from 'llama_index' (cpacker#558) * Update README.md * Update storage.md (cpacker#564) fix typo * use a consistent warning prefix across codebase (cpacker#569) * Update autogen.md to include Azure config example + patch for `pyautogen>=0.2.0` (cpacker#555) * Update autogen.md * in groupchat example add an azure elif * fixed missing azure mappings + corrected the gpt-4-turbo one * Updated MemGPT AutoGen agent to take credentials and store them in the config (allows users to use memgpt+autogen without running memgpt 
configure), also patched api_base kwarg for autogen >=v0.2 * add note about 0.2 testing * added overview to autogen integration page * default examples to openai, sync config header between the two main examples, change speaker mode to round-robin in 2-way chat to supress warning * sync config header on last example (not used in docs) * refactor to make sure we use existing config when writing out extra credentials * fixed bug in local LLM where we need to comment out api_type (for pyautogen>=0.2.0) * Update autogen.md * Update autogen.md (cpacker#571) Update example config to match `pyautogen==0.2.0` * Fix crash from bad key access into response_message without function_call (cpacker#437) Signed-off-by: Claudio Cambra <developer@claudiocambra.com> * sort agents by directory-last-modified time (cpacker#574) * sort agents by directory-last-modified time * only save agent config when agent is saved --------- Co-authored-by: Sarah Wooders <sarahwooders@gmail.com> * Add safety check to pop (cpacker#575) * Add safety check to pop * typo * Add `pyyaml` package to `pyproject.toml` (cpacker#557) * add back dotdict for backcompat (cpacker#572) * Bump version to 0.2.6 (cpacker#573) * Update cli_faq.md * Update cli_faq.md * Update cli_faq.md * allow passing `skip_verify` to autogen constructors (cpacker#581) * allow passing skip_verify to autogen constructors * added flag to examples with a NOTE, also added to docs * Chroma storage integration (cpacker#285) * Fix `pyproject.toml` chroma version (cpacker#582) * mark depricated API section * add readme * add readme * add readme * add readme * add readme * add readme * add readme * add readme * add readme * CLI bug fixes for azure * check azure before running * Update README.md * Update README.md * bug fix with persona loading * remove print * make errors for cli flags more clear * format * add initial postgres implementation * working chroma loading * add postgres tests * working initial load into postgres and chroma * add load 
index command * semi working load index * disgusting import code thanks to llama index's nasty APIs * add postgres connector * working postgres integration * working local storage (changed saving) * implement /attach * remove old code * split up storage conenctors into multiple files * remove unused code * cleanup * implement vector db loading * cleanup state savign * add chroma * minor fix * fix up chroma integration * fix list error * update dependencies * update docs * format * cleanup * forgot to add embedding file * upgrade llama index * fix data source naming bug * remove legacy * os import * upgrade chroma version * fix chroma package * Remove broken tests from chroma merge (cpacker#584) * fix runtime error (cpacker#586) * Patch azure embeddings + handle azure deployments properly (cpacker#594) * Fix bug where embeddings endpoint was getting set to deployment, upgraded pinned llama-index to use new version that has azure endpoint * updated documentation * added memgpt example for openai * change wording to match configure --------- Signed-off-by: Claudio Cambra <developer@claudiocambra.com> Co-authored-by: danx0r <danbmil99@gmail.com> Co-authored-by: cpacker <packercharles@gmail.com> Co-authored-by: Drake-AI <drake-ai@users.noreply.github.com> Co-authored-by: Vivian Fang <hi@vivi.sh> Co-authored-by: Robin Goetz <35136007+goetzrobin@users.noreply.github.com> Co-authored-by: Sarah Wooders <sarahwooders@gmail.com> Co-authored-by: Dividor <matthew@regolith.org> Co-authored-by: borewik <borewik@gmail.com> Co-authored-by: Hans Raaf <hara@oderwat.de> Co-authored-by: Jirito0 <jirito0@users.noreply.github.com> Co-authored-by: Mo Nuaimat <nuaimat2002@yahoo.com> Co-authored-by: MSZ-MGS <65172063+MSZ-MGS@users.noreply.github.com> Co-authored-by: Bob Kerns <1154903+BobKerns@users.noreply.github.com> Co-authored-by: Anjalee Sudasinghe <42403668+anjaleeps@users.noreply.github.com> Co-authored-by: Anjalee Sudasinghe <anjalee@codegen.net> Co-authored-by: Wes 
<wryanmedford@gmail.com> Co-authored-by: Oliver Smith <oliver@kfs.org> Co-authored-by: Oliver Smith <oliver.smith@superevilmegacorp.com> Co-authored-by: Prashant Dixit <54981696+PrashantDixit0@users.noreply.github.com> Co-authored-by: sahusiddharth <112792547+sahusiddharth@users.noreply.github.com> Co-authored-by: Max Blackmer, CSM <max@agiletechnologist.us> Co-authored-by: Paul Asquin <paul.asquin@gmail.com> Co-authored-by: Claudio Cambra <developer@claudiocambra.com> Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com> Co-authored-by: Alex Perez <alexperezdev@gmail.com>

add llamacpp server support

064f67c

cpacker marked this pull request as draft November 4, 2023 00:46

cpacker added 2 commits November 4, 2023 00:57

use gbnf loader

be59221

cleanup and warning about grammar when not using llama.cpp

caa9c5d
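
A minimal sketch of what the two commits above describe: load a `.gbnf` grammar file, attach it to the request only when the backend has a grammar field, and warn otherwise. The function names, the `GRAMMAR_FIELD` mapping, and the per-backend field names are assumptions for illustration and are not taken from this PR's diff.

```python
import warnings
from pathlib import Path

# Hypothetical mapping from backend name to the request field that carries
# the grammar text. These field names are assumptions, not read from the PR.
GRAMMAR_FIELD = {
    "llamacpp": "grammar",
    "koboldcpp": "grammar",
    "webui": "grammar_string",
}


def load_grammar(path):
    """Read a .gbnf grammar file and return its contents as a string."""
    return Path(path).read_text(encoding="utf-8")


def build_payload(backend, prompt, grammar=None):
    """Build a request body, adding the grammar only where a field is known."""
    payload = {"prompt": prompt}
    if grammar is None:
        return payload
    field = GRAMMAR_FIELD.get(backend)
    if field is None:
        # Mirrors the "warning about grammar" idea: do not fail, just tell the user.
        warnings.warn(f"backend {backend!r} has no known grammar field; grammar ignored")
        return payload
    payload[field] = grammar
    return payload
```

Called as `build_payload("llamacpp", prompt, load_grammar("json.gbnf"))`, this would return a dict ready to be sent as the JSON body of a completion request; the grammar file name is the one listed in the PR description.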

cpacker added 4 commits November 4, 2023 10:25

added memgpt-specific grammar file

44834be

add grammar support to webui api calls

ce83965

Merge branch 'main' into grammar-based-sampling

bb84bbd

black

0d4e2f4

typo

50ee5e5

cpacker added 3 commits November 4, 2023 10:53

add koboldcpp support

b0a6910

no more defaulting to webui, should error out instead

12a6510

fix grammar

2aefdce

cpacker marked this pull request as ready for review November 4, 2023 17:57

cpacker changed the title ~~[Draft] Add grammar-based sampling (initially via llama.cpp server directly)~~ Add grammar-based sampling (initially via llama.cpp server directly) Nov 4, 2023

cpacker commented Nov 4, 2023

memgpt/local_llm/koboldcpp/settings.py (resolved)

cpacker requested a review from vivi November 4, 2023 17:58

cpacker assigned vivi, cpacker and Drake-AI Nov 4, 2023

cpacker added the local-llm label Nov 4, 2023

cpacker changed the title ~~Add grammar-based sampling (initially via llama.cpp server directly)~~ Add grammar-based sampling (for webui, llamacpp, and koboldcpp) Nov 4, 2023

Drake-AI reviewed Nov 4, 2023

memgpt/local_llm/koboldcpp/api.py (outdated, resolved)

patch kobold (testing, now working) + cleanup log messages

1ba2590

vivi reviewed Nov 4, 2023

memgpt/local_llm/chat_completion_proxy.py (resolved)

vivi reviewed Nov 4, 2023

memgpt/local_llm/chat_completion_proxy.py (resolved)

vivi approved these changes Nov 4, 2023

vivi merged commit 1524d6a into main Nov 4, 2023
2 checks passed

cpacker deleted the grammar-based-sampling branch November 5, 2023 05:25
