Feat/dff/voice skill #362
@@ -1,6 +1,6 @@
services:
agent:
command: sh -c 'bin/wait && python -m deeppavlov_agent.run agent.pipeline_config=assistant_dists/dream_multimodal/pipeline_conf.json'
command: sh -c 'bin/wait && python -m deeppavlov_agent.run agent.channel=telegram agent.telegram_token=$TG_TOKEN agent.pipeline_config=assistant_dists/dream_multimodal/pipeline_conf.json'
No, we actually have a separate file for the Telegram command:
telegram.yml
in every dist
Revert the whole file: this is a different dist.
Done
command: sh -c 'bin/wait && python -m deeppavlov_agent.run agent.channel=telegram agent.telegram_token=$TG_TOKEN agent.pipeline_config=assistant_dists/dream_voice/pipeline_conf.json'
environment:
WAIT_HOSTS: "dff-program-y-skill:8008, sentseg:8011, convers-evaluation-selector:8009,
dff-intent-responder-skill:8012, intent-catcher:8014, badlisted-words:8018,
badlisted -- you wanted to remove it
voice-service:
ports:
- "8333:8333"
No port mapping here.
removed
skills/dff_voice_skill/server.py
try:
# test_server.run_test(handler)
turn on tests
done
skills/dff_voice_skill/server.py
def respond():
import common.test_utils as t_utils

t_utils.save_to_test(request.json, "tests/lets_talk_in.json", indent=4)  # TEST
This line and the next one, which save the tests, should be commented out. They are only used to create test files.
commented
state_formatters/dp_formatters.py
return [{"sound_path": [dialog["human_utterances"][-1]["attributes"].get("sound_path")],
"sound_duration": [dialog["human_utterances"][-1]["attributes"].get("sound_duration")],
"sound_type": [dialog["human_utterances"][-1]["attributes"].get("sound_type")],
"captions": [dialog["human_utterances"][-1]["attributes"].get("captions")]}]
what captions do you mean here?
The audio captions that the voice service returns, e.g. "wind blowing with sirens in the background".
voice_formatter_service is an input formatter, so why does it return something that is not yet in the dialog state? (As you said, voice_service is what returns these captions.)
🤯 removed
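For illustration, a minimal sketch of what the input formatter could look like once captions are dropped: it forwards only attributes that already exist in the dialog state. It assumes the plural key names requested later in this review (sound_paths appears in the server code; sound_durations and sound_types are assumed), and the sample dialog is made up.

```python
from typing import Any, Dict, List


def voice_formatter_service(dialog: Dict[str, Any]) -> List[Dict[str, List[Any]]]:
    """Sketch: forward only attributes that already exist in the dialog state."""
    attributes = dialog["human_utterances"][-1].get("attributes", {})
    # No "captions" key: voice_service produces the captions, so they are
    # not in the dialog state yet when this input formatter runs.
    return [
        {
            "sound_paths": [attributes.get("sound_path")],
            "sound_durations": [attributes.get("sound_duration")],
            "sound_types": [attributes.get("sound_type")],
        }
    ]


# Made-up sample dialog; the attribute values are illustrative only.
dialog = {"human_utterances": [{"attributes": {"sound_path": "/data/voice.ogg",
                                               "sound_duration": 3.2,
                                               "sound_type": "voice_message"}}]}
print(voice_formatter_service(dialog))
```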
services/voice_service/server.py
path = request.json.get("sound_path")
duration = request.json.get("sound_duration")
type = request.json.get("sound_type")
again, THESE ARE BATCHES!
DO NOT CONSIDER IT AS A LIST OF ONE ELEMENT.
It was the first service I ever worked on, and I didn't know better. Fixed now.
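A minimal sketch of treating the request fields as batches: every element gets its own response, in input order. handle_batch and caption_fn are hypothetical names; caption_fn stands in for a per-file captioning call, whereas the actual infer(AUDIO_DIR, MODEL_PATH) in this PR works on a directory.

```python
from typing import Any, Callable, Dict, List, Optional


def handle_batch(payload: Dict[str, Any],
                 caption_fn: Callable[[str], str]) -> List[Dict[str, Any]]:
    """Sketch: produce exactly one response per batch element, in input order."""
    paths: List[Optional[str]] = payload.get("sound_paths") or []
    n = len(paths)
    durations = payload.get("sound_durations") or [None] * n
    types = payload.get("sound_types") or [None] * n

    responses = []
    for path, duration, sound_type in zip(paths, durations, types):
        # caption_fn is a stand-in for the real inference; elements without
        # audio get the "Error" marker that the skill already checks for.
        caption = caption_fn(path) if path is not None else "Error"
        responses.append({"sound_path": path, "sound_duration": duration,
                          "sound_type": sound_type, "captions": caption})
    return responses


batch = {"sound_paths": ["/data/a.ogg", None, "/data/b.ogg"],
         "sound_durations": [1.5, None, 4.0],
         "sound_types": ["voice", None, "voice"]}
for r in handle_batch(batch, lambda p: f"caption for {p}"):
    print(r["sound_path"], "->", r["captions"])
```

Iterating with zip over the plural lists keeps each path aligned with its own duration and type, so a batch of three requests yields three responses.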
fire>=0.5.0
kaldiio>=2.17.2
matplotlib>=3.5.3
PyYAML>=6.0
No, I meant exactly the opposite.
Better NOT to use >=
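One way to keep requirements pinned is to flag any line that is not an exact "==" pin. The helper below is hypothetical and deliberately strict (it also flags -r/-e lines and environment markers); the pinned versions in the sample are just the lower bounds from the diff, not necessarily the versions the author settled on.

```python
import re
from typing import List

# Exact pin: a package name, optional extras, "==", and a single version.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[^=<>!~,;\s]+$")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned to one exact version."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if line and not PINNED.match(line):
            flagged.append(line)
    return flagged


requirements = """\
fire>=0.5.0
kaldiio>=2.17.2
matplotlib==3.5.3
PyYAML==6.0
"""
print(unpinned_requirements(requirements))  # ['fire>=0.5.0', 'kaldiio>=2.17.2']
```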
services/voice_service/server.py
paths = request.json.get("sound_path")
durations = request.json.get("sound_duration")
types = request.json.get("sound_type")
I'm nitpicking, of course, but since these are batches, the names should be plural (and don't forget to fix them in the formatters too).
Fixed
services/voice_service/server.py
logger.info("Scanning finished successfully, files found, starting inference...")
captions = infer(AUDIO_DIR, MODEL_PATH)
logger.info("Inference finished successfully")
responses = [{"sound_type": atype, "sound_duration": duration, "sound_path": path, "captions": captions}]
Not =, but +=: this is a step in a loop.
fixed
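A small sketch of why "=" inside the loop loses results: rebinding the list on each iteration keeps only the last element, while "+=" accumulates one response per batch element. The function names are illustrative only.

```python
from typing import Dict, List


def collect_overwriting(paths: List[str]) -> List[Dict[str, str]]:
    responses: List[Dict[str, str]] = []
    for path in paths:
        # Bug: "=" rebinds the list on every iteration, discarding earlier results.
        responses = [{"sound_path": path}]
    return responses


def collect_accumulating(paths: List[str]) -> List[Dict[str, str]]:
    responses: List[Dict[str, str]] = []
    for path in paths:
        # Fix: "+=" extends the list, so every batch element keeps its response.
        responses += [{"sound_path": path}]
    return responses


batch = ["a.ogg", "b.ogg", "c.ogg"]
print(len(collect_overwriting(batch)))   # 1
print(len(collect_accumulating(batch)))  # 3
```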
logger.info(f'VOICE NOT YET DETECTED: {user_uttr["attributes"].get("sound_path")}')
if user_uttr["attributes"].get("sound_path") is not None:
logger.info(f'VOICE DETECTED: {user_uttr["attributes"].get("sound_path")}')
if "dff_voice_skill" not in skills_for_uttr:
No need to check; just add it. The duplicates are removed at the end anyway (list(set(
Removed
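A minimal sketch of "just add it": append the skill unconditionally and rely on a single deduplication step at the end. add_voice_skill and the starting list are hypothetical; only the list(set(...)) step comes from the review.

```python
from typing import List, Optional


def add_voice_skill(skills_for_uttr: List[str], sound_path: Optional[str]) -> None:
    # No "not in" check: a duplicate entry is harmless because the list is
    # deduplicated once at the end.
    if sound_path is not None:
        skills_for_uttr.append("dff_voice_skill")


skills_for_uttr = ["other_skill", "dff_voice_skill"]  # made-up starting list
add_voice_skill(skills_for_uttr, "/data/voice.ogg")
skills_for_uttr = list(set(skills_for_uttr))  # final dedup step; order is not kept
print(sorted(skills_for_uttr))  # ['dff_voice_skill', 'other_skill']
```

If order ever matters, list(dict.fromkeys(skills_for_uttr)) removes duplicates while keeping first-seen order.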
def caption(ctx: Context, actor: Actor, excluded_skills=None, *args, **kwargs) -> str:
cap = "ERROR"
if not ctx.validation:
If you use int_ctx.get_last_human_utterance(ctx, actor) with chained .get() calls (as below), you will not run into validation problems:
int_ctx.get_last_human_utterance(ctx, actor)
.get("annotations", {})
.get("voice_service", {})
.get("captions", "No cap")
This may end up returning f"Is there No cap in that audio?", which is strange. Make a default response for the case when an audio is attached but could not be captioned, e.g. a prompt like "I could not read your audio, attach another one".
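A sketch of the safe extraction on a plain dict that stands in for what int_ctx.get_last_human_utterance returns (an assumption for illustration). get_caption is a hypothetical helper; its "Error" default matches the sentinel the skill checks later in this PR.

```python
from typing import Any, Dict


def get_caption(utterance: Dict[str, Any], default: str = "Error") -> str:
    """Sketch: read the voice_service caption without risking KeyError."""
    # "or {}" also covers keys that exist but hold None, which a plain
    # .get(key, {}) chain would not.
    annotations = utterance.get("annotations") or {}
    voice = annotations.get("voice_service") or {}
    return voice.get("captions") or default


print(get_caption({}))  # Error
print(get_caption({"annotations": {"voice_service": {"captions": "wind blowing"}}}))  # wind blowing
```

The "or {}" guard is a small deviation from the reviewer's chain; otherwise the behavior is the same.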
BTW, don't forget about code style at the end.
voice = int_ctx.get_last_human_utterance(ctx, actor).get("annotations", {}).get("voice_service", {})
logger.debug(f"CONDITION.PY VOICE: {voice}")
not_default = voice.get("captions", "Error") != "Error"
if voice is not {} and not_default:
Don't check that it's not an empty dict like this :) is not {}
is questionable. Checking voice.get("captions", "Error") != "Error" is enough here,
without introducing extra variables.
And why captionS anyway, if it's definitely a single string and not a list?
Fixed
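A short sketch of why "voice is not {}" never filters anything: "is not" compares object identity, and {} builds a brand-new dict each time, so the test is True even for an empty dict. has_caption is a hypothetical helper showing the single check the reviewer suggests.

```python
from typing import Any, Dict


def has_caption(voice: Dict[str, Any]) -> bool:
    # Single check, no helper variable: a missing key and an empty dict
    # both fall back to "Error" and therefore return False.
    return voice.get("captions", "Error") != "Error"


empty: Dict[str, Any] = {}
# Identity comparison against a new dict object: always True.
print(empty is not {})                            # True
print(has_caption(empty))                         # False
print(has_caption({"captions": "wind blowing"}))  # True
```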
logger = logging.getLogger(__name__)
# ....
This file can be deleted.
Deleted
)

rsp = "I couldn't caption the audio in your message, please try again with another file" \
if cap == "Error" else f"Is there {cap} in that audio?"
I get it, but instead of a one-liner you could just write a proper, more readable check.
Fixed
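A sketch of a more readable form of the one-line conditional for rsp shown above, assuming "Error" stays the failure sentinel. build_response and ERROR_CAPTION are hypothetical names; the strings are taken from the diff.

```python
ERROR_CAPTION = "Error"  # sentinel used by the skill when captioning fails


def build_response(cap: str) -> str:
    """Sketch: explicit branches instead of a conditional expression."""
    if cap == ERROR_CAPTION:
        return ("I couldn't caption the audio in your message, "
                "please try again with another file")
    return f"Is there {cap} in that audio?"


print(build_response("wind blowing"))  # Is there wind blowing in that audio?
```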
And code style, of course. Instructions on how to fix it are in the docs.
services/voice_service/Dockerfile
ENV SERVICE_NAME ${SERVICE_NAME}

ARG FLASK_APP
ENV FLASK_APP ${FLASK_APP}
Remove these two lines. You already provide FLASK_APP as an environment variable in docker-compose, so here you could actually overwrite that value.
removed
services/voice_service/README.md
@@ -0,0 +1,3 @@
GPU RAM = 1Gb
cpu time = 0.15 sec
gpu time = 0.05 sec
Is this relevant info, or was it copied from somewhere?
Could you please write a description of the service here: how it works, and what its input and output are?
We are working on the READMEs now, so it will be necessary anyway.
Done
services/voice_service/server.py
st_time = time.time()

paths = request.json.get("sound_paths")
paths = request.json.get("video_paths") if paths == [None] else paths
paths is a batch; it won't necessarily have length 1. Under heavy load, the agent can build a batch of several elements.
paths = request.json.get("video_paths") if all([el is None for el in paths]) else paths
OK, I'll do it that way, but here is why I assumed length one. The agent's logic is as follows: a message has either sound or video. A single message cannot contain both video and voice (at least, that behavior is not supported), so if we uploaded only a video, the sound is guaranteed to be [None], and vice versa.
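A sketch of the reviewer's all(...) rule, plus a hypothetical per-element variant. Even if one message never carries both voice and video, a batch can still mix a voice message with a video message; the all(...) rule would then drop the video path, which the per-element fallback keeps. Both function names are illustrative.

```python
from itertools import zip_longest
from typing import Any, Dict, List, Optional

Paths = List[Optional[str]]


def select_paths(payload: Dict[str, Any]) -> Paths:
    """Reviewer's rule: use video_paths only if no element has a sound path."""
    paths: Paths = payload.get("sound_paths") or []
    # all(...) instead of `paths == [None]`: the batch may hold several messages.
    if all(el is None for el in paths):
        paths = payload.get("video_paths") or paths
    return paths


def select_paths_per_element(payload: Dict[str, Any]) -> Paths:
    """Variant: fall back to the video path separately for each element."""
    sounds: Paths = payload.get("sound_paths") or []
    videos: Paths = payload.get("video_paths") or []
    # zip_longest pads the shorter list with None.
    return [s if s is not None else v for s, v in zip_longest(sounds, videos)]


mixed = {"sound_paths": ["a.ogg", None], "video_paths": [None, "b.mp4"]}
print(select_paths(mixed))              # ['a.ogg', None]
print(select_paths_per_element(mixed))  # ['a.ogg', 'b.mp4']
```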
services/voice_service/Dockerfile
RUN python -m pip install -U pip
RUN pip install gdown

RUN git clone https://github.com/moon-strider/audio-captioning-dcase /src/aux_files
This causes the same problem as installing something from someone's personal repository (as with image-captioning).
On top of that, not even the version/commit is pinned.
Updated the link and pinned the commit.
First working version of the voice skill.
What currently works:
What potentially works, but not in the first commit:
What may not work as expected:
How I ran it and what I sent:
0. Insert your own Telegram bot token, or use mine; in that case the bot is available at @dprabota_bot
docker-compose -f docker-compose.yml -f assistant_dists/dream_multimodal/docker-compose.override.yml -f assistant_dists/dream_multimodal/dev.yml -f assistant_dists/dream_multimodal/proxy.yml build --no-cache && docker-compose -f docker-compose.yml -f assistant_dists/dream_multimodal/docker-compose.override.yml -f assistant_dists/dream_multimodal/dev.yml -f assistant_dists/dream_multimodal/proxy.yml up
/begin
Is there a [caption] in this audio?