local exllamav2 (TabbyAPI) KeyError: 'stop' #44

Open
BarfingLemurs opened this issue May 16, 2024 · 5 comments

Comments

BarfingLemurs commented May 16, 2024

Thanks for sharing the project! The interrupt feature is really impressive! :)

I'm getting an error on Ubuntu 22.04 when trying a different backend, with a fresh install of tabbyAPI:

whisper_init_state: compute buffer (decode) =   98.31 MB
2024-05-15 21:25:31.326 | SUCCESS  | __main__:__init__:139 - TTS text: All neural network modules are now loaded. No network access detected. How very annoying. System Operational.
2024-05-15 21:25:31.344 | SUCCESS  | __main__:start_listen_event_loop:191 - Audio Modules Operational
2024-05-15 21:25:31.344 | SUCCESS  | __main__:start_listen_event_loop:192 - Listening...
2024-05-15 21:25:55.877 | SUCCESS  | __main__:_process_detected_audio:291 - ASR text: 'Please tell me a joke.'
Exception in thread Thread-1 (process_LLM):
Traceback (most recent call last):
  File "/home/user/miniconda3/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/home/user/miniconda3/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/projects/GlaDOS/glados.py", line 486, in process_LLM
    next_token = self._process_line(line)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/projects/GlaDOS/glados.py", line 523, in _process_line
    if not line["stop"]:
           ~~~~^^^^^^^^
KeyError: 'stop'
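For context: the failing check assumes the llama.cpp server's streaming format, where each data line carries a boolean "stop" field, while TabbyAPI serves an OpenAI-compatible stream whose chunks report a `finish_reason` instead, so `line["stop"]` raises `KeyError`. Roughly what one streamed chunk looks like from each server type (a sketch of the two formats; ids, timestamps, and other extra fields omitted):

```python
# llama.cpp server /completion stream line -- what glados.py's _process_line expects:
llama_cpp_chunk = {"content": " Hello", "stop": False}  # final chunk has "stop": True

# OpenAI-compatible stream chunk (TabbyAPI, LM Studio): there is no "stop" key at all,
# so line["stop"] raises KeyError
openai_chunk = {
    "choices": [
        {"delta": {"content": " Hello"}, "finish_reason": None}  # "stop" on the final chunk
    ]
}
```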
dnhkng (Owner) commented May 16, 2024

Hmmm, I'll try using tabbyAPI to replicate the bug.

ncharron commented May 27, 2024

I am getting the same thing with LM Studio running locally; however, I wonder if it has to do with the template.

EDIT: It actually has to do with the response returned by the local AI. In my case, I integrated the openai Python package and declared the client like so: `client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")`.
I then had to modify process_LLM to work with the streamed chunks so that the tokens were loaded properly.

system- commented May 29, 2024

Same here with LM Studio. The problem is that it uses `"finish_reason": null` and `"finish_reason": "stop"` instead of `"stop": false`.
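A minimal sketch of a `_process_line` that tolerates both shapes (the OpenAI-style field names are assumptions based on that format; the llama.cpp branch mirrors the existing `line["stop"]` check in glados.py):

```python
def _process_line(self, line):
    """Return the next token from a parsed stream chunk, or None once generation stops (sketch)."""
    if "choices" in line:  # OpenAI-compatible servers (TabbyAPI, LM Studio, ...)
        choice = line["choices"][0]
        if choice.get("finish_reason") is None:  # still generating
            return choice.get("delta", {}).get("content", "")
        return None
    if not line.get("stop", False):  # llama.cpp server format (original behavior)
        return line["content"]
    return None
```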

ncharron commented May 29, 2024

I had to modify process_LLM to the following to get it to work. Mind you, I added debug statements, and I am pretty sure a lot of the if statements around next_token can be removed. I also had to check for empty messages, because for whatever reason many messages with no content were being appended, so I stripped all of those out. The reason there is no PR is that I have no idea whether this would break the other implementations:
```python
# Assumes `client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")`
# is defined at module level, as described in the earlier comment.
def process_LLM(self):
    """
    Processes the detected text using the LLM model.
    """
    while not self.shutdown_event.is_set():
        try:
            detected_text = self.llm_queue.get(timeout=0.1)

            self.messages.append({"role": "user", "content": detected_text})
            # LM Studio chokes on messages with empty content, so strip those out
            filtered_msg = [msg for msg in self.messages if msg["content"].strip()]
            prompt = self.template.render(
                messages=filtered_msg,
                bos_token="<|begin_of_text|>",
                add_generation_prompt=True,
            )
            logger.debug(f"{prompt=}")
            logger.debug(f"starting request on {filtered_msg=}")
            logger.debug("Performing request to LLM server...")

            # Perform the request and process the stream
            completion = client.chat.completions.create(
                model="QuantFactory/Meta-Llama-3-70B-Instruct-GGUF",
                messages=filtered_msg,
                temperature=0.7,
                stream=True,
            )
            sentence = []
            for chunk in completion:
                if self.processing is False:
                    break  # Stop flag set by new voice input; halt processing
                next_token = chunk.choices[0].delta.content
                if next_token:  # Filter out empty keep-alive chunks
                    sentence.append(next_token)
                    # If there is a pause token, send the sentence to the TTS queue
                    if next_token in [
                        ".",
                        "!",
                        "?",
                        ":",
                        ";",
                        "?!",
                        "\n",
                        "\n\n",
                    ]:
                        self._process_sentence(sentence)
                        sentence = []
            if self.processing:
                if sentence:
                    self._process_sentence(sentence)
                # Queue the end-of-stream token once the whole reply has streamed
                self.tts_queue.put("<EOS>")

        except queue.Empty:
            time.sleep(PAUSE_TIME)
```
