-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bot kg annotator #586
Bot kg annotator #586
Conversation
This let us avoid assertion error about lists lengths in deeppavlov-kg
This fixes the problem of storing the same entity many times
This prevents creating the same entity many times in KG
* component card changed in pipeline_conf * sentence-ranker url removed from .env
* changes in bot-km server file * custom-el formatter changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all weights/models files should be downloaded from files.deeppavlov NOT stored in github.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed, prob it was merged from old branches
@@ -154,6 +154,12 @@ def generate_triplets(uttr_batch, relations_pred_batch): | |||
return triplets_corr_batch | |||
|
|||
|
|||
def is_question(uttr: str): | |||
uttr = uttr.lower() | |||
is_q = any([uttr.startswith(q_word) for q_word in ["what ", "who ", "when ", "where "]]) or "?" in uttr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have the same function to use in common (like is_any_question - i do not remember the title clearly but try to find it and use it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is only is_questions that checks for question marks or another for factoid-only questions in common.utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_any_question from common.universal_templates.py
does not fit in my case. I am checking each sentence and filtering out questions from it, bu tthe function checks the whole uttr from annotations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not repeating templates you still can do the following:
is_yes({"text": sentence, "annotations": {}})
@@ -219,6 +219,7 @@ services: | |||
command: flask run -h 0.0.0.0 -p 8020 | |||
environment: | |||
- FLASK_APP=server | |||
- CUDA_VISIBLE_DEVICES=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have this info in your component and containers cards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
@@ -238,9 +239,9 @@ services: | |||
deploy: | |||
resources: | |||
limits: | |||
memory: 128M | |||
memory: 512M |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you increase the value in component/container cards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this should stay unchanged.
Reverted
@@ -37,4 +37,7 @@ ENV ENVVARS_TO_SEND ${ENVVARS_TO_SEND} | |||
ARG USE_KG_DATA | |||
ENV USE_KG_DATA=$USE_KG_DATA | |||
|
|||
ARG USE_BOT_KG_DATA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARG USE_BOT_KG_DATA | |
ARG USE_BOT_KG_DATA=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise in the server.py getting these vars from env will get you None values (not 0 as you pointed out in deault)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for suggestion. Added default value to Dockerfile.
@@ -37,4 +37,7 @@ ENV ENVVARS_TO_SEND ${ENVVARS_TO_SEND} | |||
ARG USE_KG_DATA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARG USE_KG_DATA | |
ARG USE_KG_DATA=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
state_formatters/dp_formatters.py
Outdated
dialog = utils.get_last_n_turns(dialog, bot_last_turns=1) | ||
dialog = utils.replace_with_annotated_utterances(dialog, mode="punct_sent") | ||
if len(dialog["bot_utterances"]): | ||
context = [dialog["bot_utterances"][-1]["text"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
herr [str]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
state_formatters/dp_formatters.py
Outdated
dialog = utils.get_last_n_turns(dialog, bot_last_turns=1) | ||
dialog = utils.replace_with_annotated_utterances(dialog, mode="punct_sent") | ||
if len(dialog["utterances"]) >= 2: | ||
dialog_history = [dialog["utterances"][-2]["text"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be last humn uttr? I do not get what utterance you are trying to get to history
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be last bot utterance, fixed it
* USE_KG_DATA and USE_BOT_KG_DATA
@@ -154,6 +154,12 @@ def generate_triplets(uttr_batch, relations_pred_batch): | |||
return triplets_corr_batch | |||
|
|||
|
|||
def is_question(uttr: str): | |||
uttr = uttr.lower() | |||
is_q = any([uttr.startswith(q_word) for q_word in ["what ", "who ", "when ", "where "]]) or "?" in uttr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not repeating templates you still can do the following:
is_yes({"text": sentence, "annotations": {}})
uttrs.append(sentrewrite(utt_prev_l, utt_cur_l)) | ||
if not init_uttrs[0][0]: | ||
triplets_batch = [[""]] | ||
uttrs = [""] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a strange check for me. You jsut check the first element and assign uttrs to list of empty string. But you are not sure that there is only one element. And why do you do this here?
Probably it;'s better to iterate over the batch of input and check every sample and append to uttr empty string if the input init_uttrs[i] is empty (or whatever check is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was all used to manage empty or very long bot utterances. Now, I removed the check for empty and moved it for later when iterating the utterances. The is_question
function is also removed and I reverted the question-check to the way it was before.
state_formatters/dp_formatters.py
Outdated
entity_substr_list, entity_tags_list, context = prepare_el_input_last_bot(dialog) | ||
else: | ||
entity_substr_list, entity_tags_list, context = [""], [""], [""] | ||
return [{"entity_substr": [entity_substr_list], "entity_tags": [entity_tags_list], "context": [context]}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in another PR I have already asked for that, so I have to ask you too))
As these are batches, please name them in multiple way, so contexts
not `context1 (and in your formatters bellow)
* changes reflected in services server-files * style format
* returns the output in the usual form and prevents skill from error
* property-extraction * entity-detection * entity-linking * custom-entity-linking * ner * response_selector * bot-knowledge-memorizer * dff-bot-knowledge-prompted-skill
* changes reflected in service_configs for dream_kg_prompted and dream_bot_kg_prompted
@@ -0,0 +1,26 @@ | |||
name: response_selector | |||
display_name: Ranking-based Response Selector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do not need a new card for this response selector because it is already existing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YJzc7NwGrLmKp6gfZJh7X1.yml
@@ -0,0 +1,26 @@ | |||
name: ner | |||
display_name: NER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this card already exists iBC0L15gOFWymHhZEAybUQ.yml
* bot-knowledge-prompted-skill * response_selector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bot-knowledge-memorizer to store and retrieve data into/from bot's knowledge graph
dream_bot_kg_prompted distribution with dff-bot-knowledge-prompted-skill that demonstrates bot-km in action
IMPORTANT: not to delete the branch after merge