
Questions about the evaluation rules for the Alexa Simbot Challenge #11

Closed
594zyc opened this issue Feb 9, 2022 · 2 comments

594zyc commented Feb 9, 2022

I have three questions regarding the evaluation rules for the Alexa Simbot Challenge:

  1. Can we use "dialog_history_cleaned" rather than "dialog_history" in the edh instance?
  2. Using only the driver's action history and dialogue history omits a key piece of information: the time at which each user utterance was made during the interaction. We argue that this causal information should be allowed.
  3. Could you elaborate on the "should not use task definitions" rule? For example, are we allowed to integrate the task structures provided in the task definitions into our model, while not relying on any ground-truth task information during inference, since the model has to figure out the task and its arguments by itself from the dialog input?

Thanks!

aishwaryap (Contributor) commented:

Hi @594zyc

  1. You can use "dialog_history_cleaned"
  2. Using timestamps sounds really interesting, and I would recommend trying the approach out and publishing it using the validation set for now, but we do not have the bandwidth to enable this for the SimBot challenge. Currently, at test time you will be blocked from accessing any extra information from the games/EDH instances. We will remove this restriction after the challenge is over.
  3. While we would accept using a model to predict the task and its parameters for the offline phase, the set of tasks and the task definition format may change for the online phase, which may require you to handle unseen tasks at inference time. As a result, if you are a challenge participant, I recommend not relying too heavily on the current task definitions and instead focusing on understanding the language directly, as that is more likely to generalize to the online phase of the challenge.

Best,
Aishwarya


594zyc commented Feb 12, 2022

Thanks for your quick response! Q1 and Q3 are clear to me.

For Q2, however, since causality is one of the key components of our approach, we still want to see whether there is a chance we can use it (otherwise we would have to redo most of our design). Expanding the dialog history records in EDH instances from [role, utterance] to [role, utterance, time] can be easily achieved with the code below.

    dialog_history = edh_instance['dialog_history_cleaned']

    # we use interactions only to get the time of the dialog actions
    turn_idx = 0
    for action in edh_instance['interactions']:
        if 'utterance' in action:
            dialog_history[turn_idx].append(action['time_start'])
            turn_idx += 1
        if turn_idx == len(dialog_history):
            break

If the "interactions" field is still there in edh instances during testing, could we use it like this? Obviously, no extra information is stored or used besides the time of each dialog utterance.

If you remove "interactions" to avoid potential abuse during testing, could you instead make a hotfix in the EDH instance generation process that records this time-expanded dialog history in a new field (e.g. "dialog_history_cleaned_time_added")? I believe this would have minimal impact on other participants and on the overall evaluation process.
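For illustration, such a generation-time hotfix might look like the sketch below. This is only a hypothetical implementation of the suggestion above: the helper name `add_dialog_timestamps`, the toy instance structure, and the `dialog_history_cleaned_time_added` field name are assumptions, and the real TEACh EDH schema may differ.

```python
# Hypothetical sketch: annotate each cleaned dialog turn with the
# time_start of the corresponding utterance action, and store the
# result in a new field so "interactions" itself need not be exposed.

def add_dialog_timestamps(edh_instance):
    """Return [role, utterance, time_start] triples for each dialog turn."""
    # Copy each turn so the original dialog history is left untouched.
    annotated = [turn[:] for turn in edh_instance['dialog_history_cleaned']]
    turn_idx = 0
    for action in edh_instance['interactions']:
        if turn_idx == len(annotated):
            break
        if 'utterance' in action:
            annotated[turn_idx].append(action['time_start'])
            turn_idx += 1
    edh_instance['dialog_history_cleaned_time_added'] = annotated
    return annotated

# Toy example (field layout assumed for illustration only):
instance = {
    'dialog_history_cleaned': [['Commander', 'Make coffee.'],
                               ['Driver', 'Where is the mug?']],
    'interactions': [
        {'action': 'Text', 'utterance': 'Make coffee.', 'time_start': 1.2},
        {'action': 'Forward', 'time_start': 3.0},
        {'action': 'Text', 'utterance': 'Where is the mug?', 'time_start': 7.5},
    ],
}
print(add_dialog_timestamps(instance))
# [['Commander', 'Make coffee.', 1.2], ['Driver', 'Where is the mug?', 7.5]]
```

Keeping the annotation in a separate field, rather than mutating "dialog_history_cleaned" in place, means existing consumers of the instance are unaffected.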

Let me know what you think about it. Thanks!

@hangjieshi hangjieshi reopened this Apr 19, 2022
@594zyc 594zyc closed this as completed Jun 14, 2022