Feat/dff/voice skill #362

moon-strider · 2023-03-21T14:08:41Z

First working version of the voice skill.
What currently works:

captioning of Telegram voice messages

What potentially works, but is not in the first commit:

audio file captioning (although all the prerequisites, such as mp3 -> wav conversion, are already in place)
processing of Telegram video notes ("circles"), namely their audio track (same as the previous item)

What may not work correctly:

since captioning of audio files (as opposed to voice messages) is not ready yet, sending several voice messages in one message will either bring everything down or return only one caption (or it may not even be possible to attach more than one voice message to a message, given how Telegram works)
ffmpeg sometimes crashes (especially when the message is very short, e.g. 1 s); this still needs to be investigated.
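The commit history mentions converting incoming audio (.oga, mp3) to WAV by calling ffmpeg through subprocess. A minimal sketch of that step, assuming a 16 kHz mono target (the sample rate the captioning model really expects is not stated in this PR) and hypothetical function names, is below; capturing ffmpeg's stderr instead of discarding it should make the sporadic failures on very short messages easier to diagnose.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Assumption: 16 kHz mono WAV; not taken from this PR.
TARGET_SAMPLE_RATE = 16000


def build_ffmpeg_cmd(src: str, dst: str, sample_rate: int = TARGET_SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that converts any input (.oga, .mp3, ...) to mono WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file
        "-ar", str(sample_rate),  # resample to the target rate
        "-ac", "1",               # downmix to a single channel
        dst,
    ]


def convert_to_wav(src: str) -> str:
    """Convert src to a WAV file next to it and return the new path (hypothetical helper)."""
    src_path = Path(src)
    if src_path.suffix.lower() == ".wav":
        return src  # already WAV; ffmpeg cannot safely overwrite its own input
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed")
    dst = str(src_path.with_suffix(".wav"))
    result = subprocess.run(build_ffmpeg_cmd(src, dst), capture_output=True, text=True)
    if result.returncode != 0:
        # Keep the tail of stderr so the failure reason ends up in the logs.
        raise RuntimeError(f"ffmpeg failed on {src}: {result.stderr.strip()[-500:]}")
    return dst
```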

How I ran it and what I typed:
0. insert your own Telegram bot token, or use mine; in that case the bot is available at @dprabota_bot
1. bring up EVERYTHING with the command:
docker-compose -f docker-compose.yml -f assistant_dists/dream_multimodal/docker-compose.override.yml -f assistant_dists/dream_multimodal/dev.yml -f assistant_dists/dream_multimodal/proxy.yml build --no-cache && docker-compose -f docker-compose.yml -f assistant_dists/dream_multimodal/docker-compose.override.yml -f assistant_dists/dream_multimodal/dev.yml -f assistant_dists/dream_multimodal/proxy.yml up
2. send the bot /begin
3. record a voice message to the bot
4. get a reply of the form: Is there a [caption] in this audio?

dilyararimovna · 2023-11-21T11:10:01Z

assistant_dists/dream_multimodal/docker-compose.override.yml

@@ -1,6 +1,6 @@
 services:
  agent:
-    command: sh -c 'bin/wait && python -m deeppavlov_agent.run agent.pipeline_config=assistant_dists/dream_multimodal/pipeline_conf.json'
+    command: sh -c 'bin/wait && python -m deeppavlov_agent.run agent.channel=telegram agent.telegram_token=$TG_TOKEN agent.pipeline_config=assistant_dists/dream_multimodal/pipeline_conf.json'


No, we actually have a separate file with the command for Telegram, telegram.yml, in every dist.

Revert the whole file; this is another dist.

assistant_dists/dream_voice/dev.yml

dilyararimovna · 2023-11-21T11:32:21Z

assistant_dists/dream_voice/docker-compose.override.yml

+    command: sh -c 'bin/wait && python -m deeppavlov_agent.run agent.channel=telegram agent.telegram_token=$TG_TOKEN agent.pipeline_config=assistant_dists/dream_voice/pipeline_conf.json'
+    environment:
+      WAIT_HOSTS: "dff-program-y-skill:8008, sentseg:8011, convers-evaluation-selector:8009, 
+          dff-intent-responder-skill:8012, intent-catcher:8014, badlisted-words:8018,


badlisted-words: you wanted to remove it.

Maybe you forgot to git add it; please check again.

dilyararimovna · 2023-11-21T11:32:40Z

assistant_dists/dream_voice/docker-compose.override.yml

+
+  voice-service:
+    ports:
+        - "8333:8333"


no ports mapping here

dilyararimovna · 2023-11-21T11:48:18Z

skills/dff_voice_skill/server.py

+
+
+try:
+    # test_server.run_test(handler)


turn on tests

dilyararimovna · 2023-11-21T11:48:52Z

skills/dff_voice_skill/server.py

+def respond():
+    import common.test_utils as t_utils
+
+    t_utils.save_to_test(request.json, "tests/lets_talk_in.json", indent=4)  # TEST


This line and the next one, which save the test files, should be commented out. They are only used to create test files.

dilyararimovna · 2023-11-21T11:49:42Z

state_formatters/dp_formatters.py

+   return [{"sound_path": [dialog["human_utterances"][-1]["attributes"].get("sound_path")],
+            "sound_duration": [dialog["human_utterances"][-1]["attributes"].get("sound_duration")],
+            "sound_type": [dialog["human_utterances"][-1]["attributes"].get("sound_type")],
+            "captions": [dialog["human_utterances"][-1]["attributes"].get("captions")]}]


what captions do you mean here?

the audiocaptions that the voice service returns: the captions like "wind blowing with the sirens in the background"

voice_formatter_service is an input formatter, so why does it return something that is not yet in the dialog state? (As you said, voice_service returns these captions.)

🤯 removed
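A sketch of what the input formatter could look like once captions are dropped: it forwards only attributes already present in the dialog state, since captions are produced by voice_service and read back later from the annotations. The "sound_paths" key mirrors the server code later in this PR; "sound_durations" and "sound_types" are assumed names, not confirmed by the diff.

```python
from typing import Any, Dict, List


def voice_formatter_service(dialog: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Input formatter sketch: pass only data that already exists in the dialog state.

    Captions are produced by voice_service, so they are deliberately absent here.
    """
    attributes = dialog["human_utterances"][-1].get("attributes", {})
    return [
        {
            # Each value is a one-element list for this dialog; the agent may
            # merge several dialogs into a larger batch.
            "sound_paths": [attributes.get("sound_path")],
            "sound_durations": [attributes.get("sound_duration")],  # assumed key name
            "sound_types": [attributes.get("sound_type")],  # assumed key name
        }
    ]
```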

state_formatters/dp_formatters.py

dilyararimovna · 2023-11-21T11:51:24Z

services/voice_service/server.py

+
+    path = request.json.get("sound_path")
+    duration = request.json.get("sound_duration")
+    type = request.json.get("sound_type")


Again, THESE ARE BATCHES!
DO NOT TREAT THEM AS A LIST OF ONE ELEMENT.

It was the first service I ever worked on, and I didn't know better. Fixed now.
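To make the batch point concrete: every field in the request body is a list with one entry per batched dialog, so the server should zip the fields together rather than read element 0. A minimal sketch follows; the function name is hypothetical, "sound_paths" follows the later server code, and the other two key names are assumptions.

```python
from typing import Any, Dict, List, Tuple


def parse_batch(payload: Dict[str, Any]) -> List[Tuple[Any, Any, Any]]:
    """Return one (path, duration, type) tuple per batch element.

    The agent may group several dialogs into one request, so each field is a
    list of the same length; never assume exactly one element.
    """
    paths = payload.get("sound_paths", [])
    durations = payload.get("sound_durations", [])  # assumed key name
    types = payload.get("sound_types", [])  # assumed key name
    if not len(paths) == len(durations) == len(types):
        raise ValueError("batch fields have different lengths")
    return list(zip(paths, durations, types))
```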

assistant_dists/dream_voice/dev.yml

dilyararimovna · 2023-11-23T14:42:10Z

services/voice_service/requirements.txt

+fire>=0.5.0
+kaldiio>=2.17.2
+matplotlib>=3.5.3
+PyYAML>=6.0


No, I meant exactly the opposite.
Better NOT to use >=; pin exact versions with == instead.
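One way to enforce this review rule is a small check that flags requirement lines not pinned with ==. This is a hypothetical helper, not part of the repo's tooling; it ignores comments and pip options, and it does not understand environment markers (a marker containing ">" would be flagged).

```python
from typing import Iterable, List


def find_unpinned(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that are not pinned to one exact version with ==."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and options like -r
            continue
        # Exact pin: exactly one "==" and no range operators (>=, <=, ~=, !=, >, <).
        if line.count("==") != 1 or any(op in line for op in ("<", ">", "~", "!")):
            unpinned.append(line)
    return unpinned
```

Usage, under the same assumptions: `find_unpinned(open("services/voice_service/requirements.txt"))` would list fire>=0.5.0, kaldiio>=2.17.2, and the other >= lines from the diff above.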

dilyararimovna · 2023-11-23T14:43:08Z

services/voice_service/server.py

+
+    paths = request.json.get("sound_path")
+    durations = request.json.get("sound_duration")
+    types = request.json.get("sound_type")


I'm nitpicking, of course, but since these are batches, the names should be plural (and don't forget to fix them in the formatters too).

Fixed.

dilyararimovna · 2023-11-23T14:44:36Z

services/voice_service/server.py

+            logger.info("Scanning finished successfully, files found, starting inference...")
+            captions = infer(AUDIO_DIR, MODEL_PATH)
+            logger.info("Inference finished successfully")
+            responses = [{"sound_type": atype, "sound_duration": duration, "sound_path": path, "captions": captions}]


Not = but +=; this is a step in a loop.
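The point of += here is that each loop iteration must extend the response list; a plain = keeps only the last element, so a batch of N requests would get one response. A minimal sketch, assuming a hypothetical per-file caption_fn as a stand-in for infer(...) (the PR's infer works on a directory) and the singular "caption" key adopted later in the PR:

```python
from typing import Any, Callable, Dict, List, Sequence


def build_responses(
    paths: Sequence[str],
    durations: Sequence[Any],
    types: Sequence[Any],
    caption_fn: Callable[[str], str],
) -> List[Dict[str, Any]]:
    """Return exactly one response dict per batch element, in input order."""
    responses: List[Dict[str, Any]] = []
    for path, duration, atype in zip(paths, durations, types):
        caption = caption_fn(path)  # hypothetical per-file stand-in for infer(...)
        # += extends the list; responses = [...] would discard earlier elements.
        responses += [{"sound_type": atype, "sound_duration": duration, "sound_path": path, "caption": caption}]
    return responses
```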

dilyararimovna · 2023-11-23T14:45:45Z

skill_selectors/rule_based_selector/connector.py

+                logger.info(f'VOICE NOT YET DETECTED: {user_uttr["attributes"].get("sound_path")}')
+                if user_uttr["attributes"].get("sound_path") is not None:
+                    logger.info(f'VOICE DETECTED: {user_uttr["attributes"].get("sound_path")}')
+                    if "dff_voice_skill" not in skills_for_uttr:


No need to check; just add it. Duplicates are removed at the end anyway (list(set(...))).
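A simplified sketch of the suggested selector logic: append the skill unconditionally and let the final deduplication collapse repeats. The function name is hypothetical and the real connector does much more; note that list(set(...)) does not preserve order.

```python
from typing import Any, Dict, List


def add_voice_skill(user_uttr: Dict[str, Any], skills_for_uttr: List[str]) -> List[str]:
    """Hypothetical fragment of the rule-based selector."""
    if user_uttr.get("attributes", {}).get("sound_path") is not None:
        skills_for_uttr.append("dff_voice_skill")  # no "not in" check needed
    # Duplicates collapse here; set() does not keep the original order.
    return list(set(skills_for_uttr))
```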

skills/dff_voice_skill/scenario/condition.py

dilyararimovna · 2023-11-23T14:48:16Z

skills/dff_voice_skill/scenario/response.py

+
+def caption(ctx: Context, actor: Actor, excluded_skills=None, *args, **kwargs) -> str:
+    cap = "ERROR"
+    if not ctx.validation:


When you use int_ctx.get_last_human_utterance(ctx, actor) together with chained .get() calls (as below), you will not run into validation problems.

dilyararimovna · 2023-11-23T14:49:23Z

skills/dff_voice_skill/scenario/response.py

+            int_ctx.get_last_human_utterance(ctx, actor)
+            .get("annotations", {})
+            .get("voice_service", {})
+            .get("captions", "No cap")


With this default you may end up returning "Is there No cap in that audio?", which is strange. Add a default response for the case when an audio is attached but could not be captioned, e.g. a message like "I could not read your audio, attach another one".
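Both suggestions above (chained .get() calls and a proper fallback reply) can be sketched as one small function. Here a plain dict stands in for the result of int_ctx.get_last_human_utterance(ctx, actor); the "Error" sentinel, the singular "caption" key, and both reply texts follow later revisions of this PR, while the function name is hypothetical.

```python
from typing import Any, Dict

DEFAULT_CAPTION = "Error"  # sentinel used in later revisions of this PR


def caption_response(last_human_utterance: Dict[str, Any]) -> str:
    # Chained .get() calls with {} defaults never raise KeyError,
    # including on the empty context used during validation.
    cap = (
        last_human_utterance.get("annotations", {})
        .get("voice_service", {})
        .get("caption", DEFAULT_CAPTION)
    )
    if cap == DEFAULT_CAPTION:
        return "I couldn't caption the audio in your message, please try again with another file"
    return f"Is there {cap} in that audio?"
```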

dilyararimovna · 2023-11-23T14:50:42Z

state_formatters/dp_formatters.py

+   return [{"sound_path": [dialog["human_utterances"][-1]["attributes"].get("sound_path")],
+            "sound_duration": [dialog["human_utterances"][-1]["attributes"].get("sound_duration")],
+            "sound_type": [dialog["human_utterances"][-1]["attributes"].get("sound_type")],
+            "captions": [dialog["human_utterances"][-1]["attributes"].get("captions")]}]


voice_formatter_service is an input formatter, so why does it return something that is not yet in the dialog state? (As you said, voice_service returns these captions.)

dilyararimovna · 2023-11-24T07:15:50Z

btw do not forget about codestyle at the end

dilyararimovna · 2023-11-30T15:31:14Z

skills/dff_voice_skill/scenario/condition.py

+    voice = int_ctx.get_last_human_utterance(ctx, actor).get("annotations", {}).get("voice_service", {})
+    logger.debug(f"CONDITION.PY VOICE: {voice}")
+    not_default = voice.get("captions", "Error") != "Error"
+    if voice is not {} and not_default:


Don't check that the dict is non-empty this way :) voice is not {} is a poor check: is compares identity with a freshly created dict, so it is always True. Here it is enough to check voice.get("captions", "Error") != "Error", without introducing extra variables.

And anyway, why captionS, if it is definitely a single string and not a list?

Fixed.
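The simplified condition the reviewer asks for fits in a single comparison, because a missing or empty voice_service annotation falls back to the "Error" default anyway. A sketch with a plain dict in place of the DFF context (function name hypothetical, "caption" key as renamed in this PR):

```python
from typing import Any, Dict


def voice_condition(last_human_utterance: Dict[str, Any]) -> bool:
    voice = last_human_utterance.get("annotations", {}).get("voice_service", {})
    # One comparison covers missing, empty, and failed annotations alike.
    return voice.get("caption", "Error") != "Error"
```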

dilyararimovna · 2023-11-30T15:31:53Z

skills/dff_voice_skill/scenario/processing.py

+
+
+logger = logging.getLogger(__name__)
+# ....


You can delete this file.

Deleted.

dilyararimovna · 2023-11-30T15:32:33Z

skills/dff_voice_skill/scenario/response.py

+    )
+
+    rsp = "I couldn't caption the audio in your message, please try again with another file" \
+        if cap == "Error" else f"Is there {cap} in that audio?"


I get it, but instead of a one-liner, just write a normal, more readable check.

Fixed.

dilyararimovna · 2023-11-30T15:34:32Z

And codestyle, of course. Instructions on how to fix it are in the docs.

dilyararimovna · 2023-12-05T11:40:38Z

services/voice_service/Dockerfile

+ENV SERVICE_NAME ${SERVICE_NAME}
+
+ARG FLASK_APP
+ENV FLASK_APP ${FLASK_APP}


Remove these two lines. You already provide FLASK_APP as an environment variable in docker-compose, so here you may actually overwrite that value.

dilyararimovna · 2023-12-05T11:44:23Z

services/voice_service/README.md

@@ -0,0 +1,3 @@
+GPU RAM = 1Gb
+cpu time = 0.15 sec 
+gpu time = 0.05 sec 


Is this info relevant, or was it copied?
Could you please write a description of the service here: how it works, what its input and output are.
We are working on READMEs now, so it will be necessary anyway.

dilyararimovna · 2023-12-06T10:19:36Z

services/voice_service/server.py

+    st_time = time.time()
+
+    paths = request.json.get("sound_paths")
+    paths = request.json.get("video_paths") if paths == [None] else paths


paths is a batch; it will not necessarily have length 1. Under heavy load the agent may form a batch of several elements.
paths = request.json.get("video_paths") if all([el is None for el in paths]) else paths

OK, I'll do it that way, but here is why I assumed length one. The agent logic is as follows: there is either sound or video. A message cannot contain both video and voice (at least such behavior is not supported), so if we uploaded only a video, the sound is guaranteed to be [None], and vice versa.
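Even if each single message carries either sound or video, a batch assembled from several dialogs could mix the two. The sketch below contrasts the reviewer's all(...) fallback with a per-element variant; the per-element version is an alternative suggestion, not code from this PR, and both function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def select_paths(payload: Dict[str, Any]) -> List[Optional[str]]:
    """Reviewer's version: fall back to video paths only if no element has sound."""
    paths = payload.get("sound_paths", [])
    if all(el is None for el in paths):
        paths = payload.get("video_paths", [])
    return paths


def select_paths_per_element(payload: Dict[str, Any]) -> List[Optional[str]]:
    """Alternative sketch: choose sound or video independently for each batch element."""
    sounds = payload.get("sound_paths", [])
    videos = payload.get("video_paths", [None] * len(sounds))
    return [s if s is not None else v for s, v in zip(sounds, videos)]
```

With a mixed batch such as sound_paths=["a.oga", None] and video_paths=[None, "b.mp4"], the first function keeps [ "a.oga", None ] and silently drops the video, while the per-element variant returns both paths.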

dilyararimovna · 2023-12-06T10:20:31Z

services/voice_service/Dockerfile

+RUN python -m pip install -U pip
+RUN pip install gdown
+
+RUN git clone https://github.com/moon-strider/audio-captioning-dcase /src/aux_files


This will cause the same problem as installing something from someone's personal folder (as with image-captioning).
And on top of that, not even the version/commit is pinned.

Updated the link and pinned the commit.

moon-strider added 30 commits March 6, 2023 16:18

moving files from a broken fork

b2165ae

adding tg token

c1c4475

changing the name of an attribute to be in line with dp-agent

cee1799

an attempt to fix []'s and a fix in rule based skill selector

c38b48f

fixing attributes

fcc96b5

fixing validation

07ee140

small actor.py fix

4da797f

testing

9e3ef20

removing redundant string operations

267960a

a minor fix that should enable .oga -> .wav

dd4b400

using subprocess ffmpeg to convert audio

96206b4

adding missing files

fcd21cb

a facepalm fix

b4dc99f

another facepalm fix

10994b0

keyerror fix

97cc9a2

model path fix

2121a6d

minor refactor

4a28d84

adding support for mp3 and MP3

6916619

dcase-2020 -> dcase-2022

661b8ba

adding experiments.zip to project

5336db7

apt install gdown to download model files

c40fc6a

pip install gdown to download model files

98db32e

correct download links from gdrive

bd527b8

gdown -> wget

17e74e5

gdown -> wget 2

3415087

fixing dockerfile

069ecac

another dockerfile fix

d8fa780

adding fire to reqs

ccd9540

fixing new audio-captioning

183798c

mkdir -> makedirs

c77db08

dilyararimovna requested changes Nov 21, 2023


moon-strider self-assigned this Nov 21, 2023

moon-strider added 4 commits November 21, 2023 16:16

pr fix

471b14c

added tests to voice skill

3973c8d

voice_service fixed, tests added

14d500b

another pr fix

504b6f5

dilyararimovna requested changes Nov 23, 2023


moon-strider added 3 commits November 24, 2023 13:39

pr fix

6dfd095

voice formatter fixed

da99b67

another pr fix

782552c

dilyararimovna requested changes Nov 30, 2023


moon-strider added 8 commits December 1, 2023 10:20

captions -> caption, small pr fix

5162457

codestyle fix

c846c93

flake8 fix

4820e98

typo fixed

1949265

codestyle fix

b5e51f7

tests fixed for voice skill

04507ec

another typo fixed

3accef7

reverted voice_service dockerfile

cdd2be5

dilyararimovna requested changes Dec 5, 2023


moon-strider added 3 commits December 5, 2023 15:47

minor pr fix

8418981

voice service video fix

7577a55

voice service video fix 2

efc670f

dilyararimovna requested changes Dec 6, 2023


git repo commit hardcoded

ab505df

dilyararimovna approved these changes Dec 6, 2023


dilyararimovna merged commit d75a628 into dev Dec 6, 2023
32 checks passed

dilyararimovna mentioned this pull request Dec 19, 2023

[WIP] Feat/dff/voice skill Oleg #565

Closed



