AllTalk and Text-Gen-WebUI-ST weird interaction: generates the image and the .wav file, but it doesn't play the audio #18

Open
guispfilho opened this issue May 18, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@guispfilho

I've just installed Text-Gen-WebUI-ST, and it seems to be working fine. The problem is the interaction between it and AllTalk.

Here is an example response I received:
Chiharu's eyes widen in surprise at your response, but she quickly recovers and smiles reassuringly Don't worry, I won't bite! She chuckles Let's start with the basics. What got you interested in computers?
[image_correctly_generated]

AllTalk correctly generates the .wav file, but it doesn't send it alongside the text response and the image.
And as seen in the example above, the text outputs the .wav file path instead of the "play button".
The correct output would be:
[PLAY BUTTON]
Chiharu's eyes widen in surprise at your response, but she quickly recovers and smiles reassuringly Don't worry, I won't bite! She chuckles Let's start with the basics. What got you interested in computers?
[image_correctly_generated]

I'm running AllTalk as a Text-Gen-WebUI extension, and its entire output in the command window is:

Llama.generate: prefix-match hit
Output generated in 6.15 seconds (10.09 tokens/s, 62 tokens, context 384, seed 1337)
[AllTalk TTSGen] Chiharu's eyes widen in surprise at your response, but she quickly recovers and smiles reassuringly Don't worry, I won't bite! She chuckles Let's start with the basics. What got you interested in computers?
[AllTalk TTSGen] 4.09 seconds. LowVRAM: False DeepSpeed: True
18:46:26-837540 INFO     [SD WebUI Integration] Using stable-diffusion-webui to generate images.
                           Prompt: She chuckles Let's start with the basics. What got you interested in computers, but she
                         quickly recovers and smiles reassuringly Don't worry, <audio
                         src=file/extensions/alltalk_tts/outputs/Chiharu Yamada_1716068782.wav controls
                         autoplay></audio>Chiharu's eyes widen in surprise at your response, I won't bite, sscore_9,
                         score_8_up, score_7_up, score_6_up, realistic, realism, source_anime, High-Res, High Quality,
                         (masterpiece, best quality, highly detailed, realistic, beautiful eyes, detailed face), BREAK red
                         eyes, red glasses, brown hair, pony tails, long hair, bangs, smiling, lab coat, black boots, ankle
                         boots, blue nails, blue lipstick, cowboy shot, upper body shot
                           Negative Prompt: score_4_up, score_5_up,low detailed, ugly face, bad hands, bad fingers, mutated
                         hands, low res, blurry face, monochrome, words, artist signature, close up

And here is the Stable Diffusion output:

Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  10816.615869522095
[Memory Management] Model Memory (MB) =  2144.3546981811523
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  7648.261171340942
Moving model(s) has taken 1.81 seconds
To load target model SDXL
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  9031.173444747925
[Memory Management] Model Memory (MB) =  4897.086494445801
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  3110.086950302124
Moving model(s) has taken 3.24 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [00:05<00:00,  2.93it/s]
Memory cleanup has taken 3.36 seconds██████████████████████████████████████████████████| 15/15 [00:04<00:00,  2.97it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 15/15 [00:08<00:00,  1.77it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 15/15 [00:08<00:00,  2.97it/s]

And this is my "settings.yaml" file:

preset: Debug-deterministic
seed: 1337
truncation_length: 32768
stream: false
character: Chiharu Yamada
default_extensions:
- openai
- send_pictures
- sd_api_pictures
- gallery
- long_replies
- whisper_stt
- alltalk_tts
- stable_diffusion
alltalk_tts-voice: '#Scarlett_voice_preview (enhanced).wav'
stable_diffusion-api_username: ''
stable_diffusion-api_password: ''
stable_diffusion-base_prompt: sscore_9, score_8_up, score_7_up, score_6_up, realistic, realism, source_anime, High-Res, High Quality, (masterpiece, best quality, highly detailed, realistic, beautiful eyes, detailed face), BREAK red eyes, red glasses, brown hair, pony tails, long hair, bangs, smiling, lab coat, black boots, ankle boots, blue nails, blue lipstick, cowboy shot, upper body shot
stable_diffusion-base_negative_prompt: score_4_up, score_5_up,low detailed, ugly face, bad hands, bad fingers, mutated hands, low res, blurry face, monochrome, words, artist signature, close up,
stable_diffusion-sampler_name: Euler a
stable_diffusion-sampling_steps: 15
stable_diffusion-width: 1024
stable_diffusion-height: 1024
stable_diffusion-cfg_scale: 4
stable_diffusion-clip_skip: 2
stable_diffusion-debug_mode_enabled: true
stable_diffusion-trigger_mode: continuous
stable_diffusion-tool_mode_force_json_output_enabled: false
stable_diffusion-tool_mode_force_json_output_schema: "{\n  \"type\": \"array\", \n  \"items\": {\n    \"type\": \"object\",\n    \"properties\": {\n      \"tool_name\": {\n        \"type\": \"string\",\n        \"required\": true\n      },\n      \"parameters\": {\n        \"type\": \"object\",\n        \"required\": true\n      }        \n    },\n    \"additionalProperties\": false\n  },\n  \"minItems\": 1\n}"
stable_diffusion-interactive_mode_input_trigger_regex: .*(draw|paint|create|send|upload|add|show|attach|generate)\b.+?\b(image|pic(ture)?|photo|snap(shot)?|selfie|meme)(s?)
stable_diffusion-interactive_mode_output_trigger_regex: .*[*([]?(draws|paints|creates|sends|uploads|adds|shows|attaches|generates|here (is|are))\b.+?\b(image|pic(ture)?|photo|snap(shot)?|selfie|meme)(s?)
stable_diffusion-interactive_mode_description_prompt: |
  You are now a text generator for the Stable Diffusion AI image generator. You will generate a text prompt for it.
  Describe [subject] using comma-separated tags only. Do not use sentences. Include many tags such as tags for the environment, gender, clothes, age, location, light, daytime, angle, pose, etc.
  Very important: only write the comma-separated tags. Do not write anything else. Do not ask any questions. Do not talk.
stable_diffusion-dont_stream_when_generating_images: false
stable_diffusion-generation_rules:
- regex: .*\b(detailed)\b
  match:
  - input
  - output
  actions:
  - name: prompt_append
    args: '(high resolution, detailed, realistic, vivid: 1.2), hdr, 8k, <lora:add_details:1>'
- regex: ^Assistant$
  match:
  - character_name
  actions:
  - name: prompt_append
    args: 
  - name: negative_prompt_append
    args: 
- regex: ^Example$
  match:
  - character_name
  actions:
  - name: faceswaplab_enable
  - name: faceswaplab_set_source_face
    args: file:///{STABLE_DIFFUSION_EXTENSION_DIRECTORY}/assets/example_face.jpg
stable_diffusion-hires_sampling_steps: 10
stable_diffusion-faceswaplab_source_face: file:///{STABLE_DIFFUSION_EXTENSION_DIRECTORY}/assets/example_face.jpg
stable_diffusion-reactor_source_face: file:///{STABLE_DIFFUSION_EXTENSION_DIRECTORY}/assets/example_face.jpg
stable_diffusion-reactor_source_gender: none
stable_diffusion-reactor_target_gender: none
stable_diffusion-faceid_source_face: file:///{STABLE_DIFFUSION_EXTENSION_DIRECTORY}/assets/example_face.jpg
stable_diffusion-ipadapter_reference_image: file:///{STABLE_DIFFUSION_EXTENSION_DIRECTORY}/assets/example_face.jpg

If someone could help me with this, I'd appreciate it a lot. Thank you.

@Trojaner Trojaner added the bug Something isn't working label May 18, 2024
@Trojaner
Owner

Hello, can you try with 50c6151? (Just update the extension.)
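
For context, the log in the report shows AllTalk's `<audio ...></audio>` tag ending up inside the image prompt. The following is only a minimal sketch of how such a tag could be stripped from text before it is reused as a prompt; the helper name `strip_audio_tags` is hypothetical, and this is not necessarily what commit 50c6151 does.

```python
import re

# Matches an <audio ...>...</audio> element, including attributes that contain spaces.
# Hypothetical helper for illustration; not taken from the extension's source.
AUDIO_TAG = re.compile(r"<audio\b[^>]*>.*?</audio>", re.IGNORECASE | re.DOTALL)


def strip_audio_tags(text: str) -> str:
    """Return `text` with any <audio> elements removed."""
    return AUDIO_TAG.sub("", text).strip()


if __name__ == "__main__":
    sample = (
        "smiles reassuringly <audio src=file/extensions/alltalk_tts/outputs/"
        "Chiharu Yamada_1716068782.wav controls autoplay></audio>Chiharu's eyes widen"
    )
    print(strip_audio_tags(sample))
```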

@guispfilho
Author

guispfilho commented May 18, 2024

Working now! Thank you very much for the fast response!

And if I could bother you with another question: if I understood correctly, the Continuous mode of generation, if set to "Generated Text", will use the regex rules set by the user at the bottom of settings.yaml:

stable_diffusion-generation_rules:
  # Add details to the prompt if the input text or output text contains the word "detailed".
  - regex: .*\b(detailed)\b
    match: ["input", "output"]
    actions:
      - name: "prompt_append"
        args: "(high resolution, detailed, realistic, vivid: 1.2), hdr, 8k, <lora:add_details:1>"

So if I set it to Continuous + Generated Text, I have to extend these rules, like:

  - regex: .*\b(drinking)\b
    match: ["input", "output"]
    actions:
      - name: "prompt_append"
        args: "drinking, holding glass"

Is that correct? If so, is the regex format below correct for creating AND statements within OR statements?

(?=.*\b"yellow car"\b)(?=.*\b"yellow car"\b).*|(?=.*\b"green house"\b)(?=.*\b"red house"\b).*

In the example above, it would match if the input/output text has both "yellow car" AND "blue car", OR both "green house" AND "red house".

Does this extension work like that?

@Trojaner
Owner

Trojaner commented May 19, 2024

Regex rules are always applied regardless of mode used.
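
As a rough mental model of the rule format quoted in this thread (a `regex`, a `match` list of `input`/`output`, and `prompt_append` actions), here is a simplified sketch. It is not the extension's actual code: the function name `collect_prompt_appends`, the case-insensitive matching, and the "any listed source matches" semantics are all assumptions made for illustration.

```python
import re

# Rules mirroring the YAML examples from this thread, written as Python dicts.
RULES = [
    {
        "regex": r".*\b(detailed)\b",
        "match": ["input", "output"],
        "actions": [{
            "name": "prompt_append",
            "args": "(high resolution, detailed, realistic, vivid: 1.2), hdr, 8k, <lora:add_details:1>",
        }],
    },
    {
        "regex": r".*\b(drinking)\b",
        "match": ["input", "output"],
        "actions": [{"name": "prompt_append", "args": "drinking, holding glass"}],
    },
]


def collect_prompt_appends(rules, input_text, output_text):
    """Return the prompt_append args of every rule whose regex matches a listed source."""
    sources = {"input": input_text, "output": output_text}
    appended = []
    for rule in rules:
        # Assumption: case-insensitive search; the real extension may use other flags.
        pattern = re.compile(rule["regex"], re.IGNORECASE)
        texts = [sources[name] for name in rule["match"] if name in sources]
        if any(pattern.search(text) for text in texts):
            for action in rule["actions"]:
                if action["name"] == "prompt_append" and action.get("args"):
                    appended.append(action["args"])
    return appended


if __name__ == "__main__":
    print(collect_prompt_appends(RULES, "hello", "She is drinking tea"))
```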

The regex you have posted has minor errors. I will assume the second yellow car was supposed to be blue car:

  1. Remove the quotes unless you explicitly want the words to be wrapped by quotes in the output
  2. You should wrap both cases with parentheses:
    ((?=.*\byellow car\b)(?=.*\bblue car\b).*)|((?=.*\bgreen house\b)(?=.*\bred house\b).*)

You can test it here, btw:
https://regex101.com/

Don't forget to select Python on the left and to use gmu as the flags.
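
The same check can also be run locally with Python's `re` module. This is a small sketch using the pattern with the second phrase taken to be blue car, as assumed above; the function name `rule_matches` is made up for the example, and which flags the extension itself uses is not stated in this thread.

```python
import re

# Each alternative uses two lookaheads, so both phrases must appear (in any order)
# for that alternative to match: (yellow car AND blue car) OR (green house AND red house).
PATTERN = re.compile(
    r"((?=.*\byellow car\b)(?=.*\bblue car\b).*)"
    r"|((?=.*\bgreen house\b)(?=.*\bred house\b).*)"
)


def rule_matches(text: str) -> bool:
    """Return True if `text` satisfies either AND-group of the pattern."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in (
        "a yellow car parked next to a blue car",
        "a blue car, then a yellow car",
        "just a yellow car",
        "a red house beside a green house",
        "a yellow car in front of a red house",
    ):
        print(f"{sample!r}: {rule_matches(sample)}")
```

Note that `.` does not match a newline unless `re.DOTALL` is set, so with this pattern both phrases of a pair need to be on the same line.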

@guispfilho
Author

Oh yes... I mistyped it. Thank you for the information, and I didn't know this site! Awesome.

And the last question I swear... =D
Would it be possible to use the found regex term as reference to output a word after it?

Something like this:

  - regex: .*\b(drinking)\b
    match: ["input", "output"]
    actions:
      - name: "prompt_append"
        args: "drinking \b(detailed)\b\s+(\w+)"

I'm pretty sure this wouldn't work as written, but could something similar to this be used?

@Trojaner
Owner

This is sadly not supported currently, but feel free to create a new issue for this so it can be added to the todo list.
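
To make the requested behavior concrete for such a feature request, here is a sketch of capture-group substitution in plain Python. It is not something the extension does today, and the function name `expand_rule` and the `\1` template syntax are assumptions chosen for illustration.

```python
import re
from typing import Optional


def expand_rule(text: str, pattern: str, template: str) -> Optional[str]:
    """Return `template` with backreferences (\\1, \\2, ...) filled from the first match.

    Returns None when the pattern does not match. Hypothetical helper, not extension code.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.expand(template)


if __name__ == "__main__":
    # Capture the word following "drinking" and reuse it in the appended prompt text.
    print(expand_rule("She is drinking coffee at her desk", r"\bdrinking\s+(\w+)", r"drinking \1"))
```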

@guispfilho
Author

guispfilho commented May 19, 2024

I don't know why, but if I restart Text-Gen-WebUI by closing the command window and rerunning it, it works fine. But if I click the "Apply flags/extensions and restart" button, it returns this error and the command window closes:

00:15:26-223627 INFO     Loading the extension "gallery"
00:15:26-223627 INFO     Loading the extension "sd_api_pictures"
00:15:26-224631 INFO     Loading the extension "stable_diffusion"
00:15:26-225635 INFO     Loading the extension "alltalk_tts"
00:15:26-225635 INFO     Loading the extension "openai"
00:15:26-641018 INFO     [SD WebUI Integration] Ready.
00:15:26-647056 INFO     [SD WebUI Integration] Connecting to Stable Diffusion WebUI...
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ D:\app\text-generation-webui\server.py:263 in <module>                                                               │
│                                                                                                                      │
│   262                 time.sleep(0.5)                                                                                │
│ ❱ 263                 create_interface()                                                                             │
│   264                                                                                                                │
│                                                                                                                      │
│ D:\app\text-generation-webui\server.py:155 in create_interface                                                       │
│                                                                                                                      │
│   154                                                                                                                │
│ ❱ 155         extensions_module.create_extensions_tabs()  # Extensions tabs                                          │
│   156         extensions_module.create_extensions_block()  # Extensions block                                        │
│                                                                                                                      │
│ D:\app\text-generation-webui\modules\extensions.py:207 in create_extensions_tabs                                     │
│                                                                                                                      │
│   206             with gr.Tab(display_name, elem_classes="extension-tab"):                                           │
│ ❱ 207                 extension.ui()                                                                                 │
│   208                                                                                                                │
│                                                                                                                      │
│ D:\app\text-generation-webui\extensions\stable_diffusion\script.py:295 in ui                                         │
│                                                                                                                      │
│   294     ui_params = StableDiffusionWebUiExtensionParams(**params)                                                  │
│ ❱ 295     render_ui(ui_params)                                                                                       │
│   296                                                                                                                │
│                                                                                                                      │
│ D:\app\text-generation-webui\extensions\stable_diffusion\ui.py:43 in render_ui                                       │
│                                                                                                                      │
│    42     _render_status()                                                                                           │
│ ❱  43     _refresh_sd_data(params)                                                                                   │
│    44                                                                                                                │
│                                                                                                                      │
│ D:\app\text-generation-webui\extensions\stable_diffusion\ui.py:661 in _refresh_sd_data                               │
│                                                                                                                      │
│   660     for listener in connect_listeners:                                                                         │
│ ❱ 661         listener.set_visibility(sd_connected)                                                                  │
│   662                                                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'Accordion' object has no attribute 'set_visibility'

@guispfilho
Author

guispfilho commented May 19, 2024

Which is weird, because when running Text-Gen-WebUI from the command window, it connects to Stable Diffusion WebUI without a problem:

00:17:01-525487 INFO     [SD WebUI Integration] Ready.
00:17:01-531852 INFO     [SD WebUI Integration] Connecting to Stable Diffusion WebUI...
00:17:01-532858 INFO     [SD WebUI Integration] Fetching Stable Diffusion WebUI options...
00:17:01-536972 INFO     [SD WebUI Integration] Fetching Stable Diffusion samplers...
00:17:01-540473 INFO     [SD WebUI Integration] Fetching Stable Diffusion upscalers...
00:17:01-542492 INFO     [SD WebUI Integration] Fetching Stable Diffusion checkpoints...
00:17:01-549114 INFO     [SD WebUI Integration] Fetching Stable Diffusion VAEs...
00:17:01-553517 INFO     [SD WebUI Integration] ✓ Connected to Stable Diffusion WebUI
[Errno 10048] error while attempting to bind on address ('0.0.0.0', 7860): only one usage of each socket address (protocol/network address/port) is normally permitted

Running on local URL:  http://0.0.0.0:7861
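
Errno 10048 is the Windows "address already in use" error, which is why the UI falls back to port 7861 in the log above. As a side check, a small sketch like the following can tell whether something is already listening on a port; the function name `port_in_use` and the default host are assumptions for the example.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to (host, port) succeeds, i.e. something is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for candidate in (7860, 7861):
        print(candidate, "in use" if port_in_use(candidate) else "free")
```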

And at the "Stable Diffusion" tab it says at the top: ✓ Connected to Stable Diffusion WebUI
But if I click on "🔄 Connect / refresh data" it outputs on the command window:

00:18:50-456650 INFO     [SD WebUI Integration] Connecting to Stable Diffusion WebUI...
00:18:50-457651 INFO     [SD WebUI Integration] Fetching Stable Diffusion WebUI options...
00:18:50-461659 INFO     [SD WebUI Integration] Fetching Stable Diffusion samplers...
00:18:50-464168 INFO     [SD WebUI Integration] Fetching Stable Diffusion upscalers...
00:18:50-466172 INFO     [SD WebUI Integration] Fetching Stable Diffusion checkpoints...
00:18:50-472195 INFO     [SD WebUI Integration] Fetching Stable Diffusion VAEs...
Traceback (most recent call last):
  File "D:\app\text-generation-webui\installer_files\env\Lib\site-packages\gradio\queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\app\text-generation-webui\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\app\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1786, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\app\text-generation-webui\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1338, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\app\text-generation-webui\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\app\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "D:\app\text-generation-webui\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\app\text-generation-webui\installer_files\env\Lib\site-packages\gradio\utils.py", line 759, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\app\text-generation-webui\extensions\stable_diffusion\ui.py", line 112, in <lambda>
    lambda: _refresh_sd_data(params, force_refetch=True),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\app\text-generation-webui\extensions\stable_diffusion\ui.py", line 661, in _refresh_sd_data
    listener.set_visibility(sd_connected)
    ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Accordion' object has no attribute 'set_visibility'

But it keeps ✓ Connected to Stable Diffusion WebUI at the top.

@guispfilho
Author

guispfilho commented May 19, 2024

I'm getting my a** handed to me... I can't figure out why this is happening, and also, when selecting "Continuous + Static" no image is generated, and no error message is shown in the command prompt. o.o It doesn't send anything to SD's command prompt either...

@guispfilho
Author

guispfilho commented May 19, 2024

It seems that the problem isn't with the interaction with AllTalk. I installed just text-gen and text-gen-ST from scratch to test, and if I select Continuous + Generated Text, it works fine:

Text-Gen-WebUI Command Prompt:

Llama.generate: prefix-match hit

llama_print_timings:        load time =     605.89 ms
llama_print_timings:      sample time =      52.45 ms /   512 runs   (    0.10 ms per token,  9762.61 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    9762.19 ms /   512 runs   (   19.07 ms per token,    52.45 tokens per second)
llama_print_timings:       total time =   10810.71 ms /   513 tokens
Output generated in 11.23 seconds (45.58 tokens/s, 512 tokens, context 59, seed 423338244)
18:11:29-031873 INFO     [SD WebUI Integration] Using stable-diffusion-webui to generate images.
                           Prompt: small cute robot, monochrome, droid, 3d render, white reflective plastic body,
                         simple, 3DMM, <lora:3DMM_V12:1>, this will help you make the most of your time in the city,
                         museums, research popular attractions, and restaurants beforehand so you can prioritize what
                         you want to see and do during your stay, 1, plan ahead, RAW photo, subject, 8k uhd, dslr, soft
                         lighting, high quality, film grain, Fujifilm XT3
                           Negative Prompt: (deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch,
                         cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg
                         artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn
                         hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions,
                         extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing
                         legs, extra arms, extra legs, fused fingers, too many fingers, long neck

Automatic1111 Command Prompt:

Total progress: 100%|██████████████████████████████████████████████████████████████████| 10/10 [00:03<00:00,  3.03it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 10/10 [00:03<00:00,  4.15it/s]

And the chat output:

Of course! Here are some useful travel tips for your trip to Paris:
1- Be mindful of pickpockets: As with any major city, it&#x27;s essential...
2- Don&#x27;t forget to have a ....
Etc....
[GENERATED IMAGE]

(I just don't know why it's replacing ' with &#x27;.)
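
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: `&#x27;` is the HTML entity for an apostrophe, and it is exactly what Python's `html.escape` produces by default, so the text was probably HTML-escaped somewhere and never unescaped before display. A minimal demonstration:

```python
import html

original = "it's essential"

# html.escape converts the apostrophe into the entity seen in the chat output.
escaped = html.escape(original)
print(escaped)

# html.unescape reverses it.
restored = html.unescape(escaped)
print(restored)
```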

But if I run Continuous + Static, Text-Gen-WebUI doesn't generate the image prompt, and nothing is sent to the Automatic1111 command window.

Llama.generate: prefix-match hit

llama_print_timings:        load time =     605.89 ms
llama_print_timings:      sample time =      57.68 ms /   512 runs   (    0.11 ms per token,  8876.10 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   15530.82 ms /   512 runs   (   30.33 ms per token,    32.97 tokens per second)
llama_print_timings:       total time =   16798.36 ms /   513 tokens
Output generated in 17.22 seconds (29.73 tokens/s, 512 tokens, context 59, seed 1130289185)

The chat text is still being output.
