
Questions about the evaluation rules for the Alexa Simbot Challenge #11

Closed
594zyc opened this issue Feb 9, 2022 · 2 comments

594zyc commented Feb 9, 2022

I have three questions regarding the evaluation rules for the Alexa Simbot Challenge:

  1. Can we use "dialog_history_cleaned" rather than "dialog_history" in the edh instance?
  2. Using only the driver's action history and dialogue history omits a key piece of information: the time at which each user utterance was made during the interaction. We argue that this causal information should be allowed.
  3. Could you elaborate on the "should not use task definitions" rule? For example, are we allowed to integrate the task structures provided in the task definitions into our model, while not relying on any ground-truth task information during inference, since the model has to figure out the task and its arguments by itself from the dialog input?

Thanks!

aishwaryap (Contributor) commented:

Hi @594zyc

  1. You can use "dialog_history_cleaned"
  2. Using timestamps sounds really interesting, and I would recommend trying the approach out and publishing it using the validation set for now, but we do not have the bandwidth to enable this for the SimBot challenge. Currently, at test time you will be blocked from accessing any extra information from the games/EDH instances. We will remove this restriction after the challenge is over.
  3. While we would accept using a model to predict the task and its parameters for the offline phase, the set of tasks and the task definition format may change for the online phase, which may require you to handle unseen tasks at inference time. As a result, if you are a challenge participant, I recommend not relying too heavily on the current task definitions and instead focusing on understanding the language directly, as that is more likely to generalize to the online phase of the challenge.

Best,
Aishwarya


594zyc commented Feb 12, 2022

Thanks for your quick response! Q1 and Q3 are clear to me.

For Q2, however, since causality is one of the key components of our approach, we still want to see whether there is a chance we can use it (otherwise we would have to redo most of our design). Expanding the dialog history records in EDH instances from [role, utterance] to [role, utterance, time] can be easily achieved with the code below.

    dialog_history = edh_instance['dialog_history_cleaned']

    # we use interactions only to get the time of the dialog actions
    turn_idx = 0
    for action in edh_instance['interactions']:
        if 'utterance' in action:
            dialog_history[turn_idx].append(action['time_start'])
            turn_idx += 1
        if turn_idx == len(dialog_history):
            break

If the "interactions" field is still there in edh instances during testing, could we use it like this? Obviously, no extra information is stored or used besides the time of each dialog utterance.

If you remove "interactions" to avoid potential abuse during testing, could you instead make a hotfix in the EDH instance generation process that records this time-expanded dialog history in a new field (e.g. "dialog_history_cleaned_time_added")? I believe this would have minimal impact on other participants and on the overall evaluation process.
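For illustration, such a generation-time hotfix might look like the sketch below. This is only a hypothetical implementation of the suggestion above: the helper name `add_dialog_timestamps`, the toy instance structure, and the `dialog_history_cleaned_time_added` field name are assumptions, and the real TEACh EDH schema may differ.

```python
# Hypothetical sketch: annotate each cleaned dialog turn with the
# time_start of the corresponding utterance action, and store the
# result in a new field so "interactions" itself need not be exposed.

def add_dialog_timestamps(edh_instance):
    """Return [role, utterance, time_start] triples for each dialog turn."""
    # Copy each turn so the original dialog history is left untouched.
    annotated = [turn[:] for turn in edh_instance['dialog_history_cleaned']]
    turn_idx = 0
    for action in edh_instance['interactions']:
        if turn_idx == len(annotated):
            break
        if 'utterance' in action:
            annotated[turn_idx].append(action['time_start'])
            turn_idx += 1
    edh_instance['dialog_history_cleaned_time_added'] = annotated
    return annotated

# Toy example (field layout assumed for illustration only):
instance = {
    'dialog_history_cleaned': [['Commander', 'Make coffee.'],
                               ['Driver', 'Where is the mug?']],
    'interactions': [
        {'action': 'Text', 'utterance': 'Make coffee.', 'time_start': 1.2},
        {'action': 'Forward', 'time_start': 3.0},
        {'action': 'Text', 'utterance': 'Where is the mug?', 'time_start': 7.5},
    ],
}
print(add_dialog_timestamps(instance))
# [['Commander', 'Make coffee.', 1.2], ['Driver', 'Where is the mug?', 7.5]]
```

Keeping the annotation in a separate field, rather than mutating "dialog_history_cleaned" in place, means existing consumers of the instance are unaffected.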

Let me know what you think about it. Thanks!

@hangjieshi hangjieshi reopened this Apr 19, 2022
@594zyc 594zyc closed this as completed Jun 14, 2022