local exllamav2 (TabbyAPI) KeyError: 'stop' #44

Open
BarfingLemurs opened this issue May 16, 2024 · 5 comments

Comments

BarfingLemurs commented May 16, 2024

Thanks for sharing the project! The interrupt feature is really impressive! :)

I'm getting an error on Ubuntu 22.04 when trying a different backend, with a fresh install of tabbyAPI:

whisper_init_state: compute buffer (decode) =   98.31 MB
2024-05-15 21:25:31.326 | SUCCESS  | __main__:__init__:139 - TTS text: All neural network modules are now loaded. No network access detected. How very annoying. System Operational.
2024-05-15 21:25:31.344 | SUCCESS  | __main__:start_listen_event_loop:191 - Audio Modules Operational
2024-05-15 21:25:31.344 | SUCCESS  | __main__:start_listen_event_loop:192 - Listening...
2024-05-15 21:25:55.877 | SUCCESS  | __main__:_process_detected_audio:291 - ASR text: 'Please tell me a joke.'
Exception in thread Thread-1 (process_LLM):
Traceback (most recent call last):
  File "/home/user/miniconda3/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/home/user/miniconda3/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/projects/GlaDOS/glados.py", line 486, in process_LLM
    next_token = self._process_line(line)
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/projects/GlaDOS/glados.py", line 523, in _process_line
    if not line["stop"]:
           ~~~~^^^^^^^^
KeyError: 'stop'
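For context: the failing check assumes the llama.cpp server's streaming format, where each data line carries a boolean "stop" field, while TabbyAPI serves an OpenAI-compatible stream whose chunks report a `finish_reason` instead, so `line["stop"]` raises `KeyError`. Roughly what one streamed chunk looks like from each server type (a sketch of the two formats; ids, timestamps, and other extra fields omitted):

```python
# llama.cpp server /completion stream line -- what glados.py's _process_line expects:
llama_cpp_chunk = {"content": " Hello", "stop": False}  # final chunk has "stop": True

# OpenAI-compatible stream chunk (TabbyAPI, LM Studio): there is no "stop" key at all,
# so line["stop"] raises KeyError
openai_chunk = {
    "choices": [
        {"delta": {"content": " Hello"}, "finish_reason": None}  # "stop" on the final chunk
    ]
}
```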
dnhkng (Owner) commented May 16, 2024

Hmmm, I'll try using tabbyAPI to replicate the bug.

ncharron commented May 27, 2024

I am getting the same thing with LM Studio running locally; however, I wonder if it has to do with the template.

EDIT: It actually has to do with the response returned by the local AI. In my case, I integrated the openai Python package and declared the client like so: `client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")`.
I then had to modify process_LLM to work with the streamed chunks so that the tokens were loaded properly.

system- commented May 29, 2024

Same here with LM Studio. The problem is that it uses `"finish_reason": null` and `"finish_reason": "stop"` instead of `"stop": false`.
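A minimal sketch of a `_process_line` that tolerates both shapes (the OpenAI-style field names are assumptions based on that format; the llama.cpp branch mirrors the existing `line["stop"]` check in glados.py):

```python
def _process_line(self, line):
    """Return the next token from a parsed stream chunk, or None once generation stops (sketch)."""
    if "choices" in line:  # OpenAI-compatible servers (TabbyAPI, LM Studio, ...)
        choice = line["choices"][0]
        if choice.get("finish_reason") is None:  # still generating
            return choice.get("delta", {}).get("content", "")
        return None
    if not line.get("stop", False):  # llama.cpp server format (original behavior)
        return line["content"]
    return None
```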

ncharron commented May 29, 2024

I had to modify process_LLM to the following to get it to work. Mind you, I added debug statements, and I am pretty sure a lot of the if statements around next_token can be removed. I also had to check for empty messages, because for whatever reason many messages with no content were being appended, so I stripped all of those out. The reason there is no PR is that I have no idea whether this would break the other implementations:
```python
# Assumes `client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")`
# is defined at module level, as described in the earlier comment.
def process_LLM(self):
    """
    Processes the detected text using the LLM model.
    """
    while not self.shutdown_event.is_set():
        try:
            detected_text = self.llm_queue.get(timeout=0.1)

            self.messages.append({"role": "user", "content": detected_text})
            # LM Studio chokes on messages with empty content, so strip those out
            filtered_msg = [msg for msg in self.messages if msg["content"].strip()]
            prompt = self.template.render(
                messages=filtered_msg,
                bos_token="<|begin_of_text|>",
                add_generation_prompt=True,
            )
            logger.debug(f"{prompt=}")
            logger.debug(f"starting request on {filtered_msg=}")
            logger.debug("Performing request to LLM server...")

            # Perform the request and process the stream
            completion = client.chat.completions.create(
                model="QuantFactory/Meta-Llama-3-70B-Instruct-GGUF",
                messages=filtered_msg,
                temperature=0.7,
                stream=True,
            )
            sentence = []
            for chunk in completion:
                if self.processing is False:
                    break  # Stop flag set by new voice input; halt processing
                next_token = chunk.choices[0].delta.content
                if next_token:  # Filter out empty keep-alive chunks
                    sentence.append(next_token)
                    # If there is a pause token, send the sentence to the TTS queue
                    if next_token in [
                        ".",
                        "!",
                        "?",
                        ":",
                        ";",
                        "?!",
                        "\n",
                        "\n\n",
                    ]:
                        self._process_sentence(sentence)
                        sentence = []
            if self.processing:
                if sentence:
                    self._process_sentence(sentence)
                # Queue the end-of-stream token once the whole reply has streamed
                self.tts_queue.put("<EOS>")

        except queue.Empty:
            time.sleep(PAUSE_TIME)
```
