The format of Multiwoz dataset #2

yanzhangnlp · 2021-04-07T12:06:14Z

Hi Giovanni,

Nice work, and thanks for sharing. I am trying to reproduce the results of the DST task. However, the MultiWOZ 2.1 data produced by the preprocessing script from https://github.com/jasonwu0731/trade-dst does not match the format your code expects. Did you apply any additional preprocessing? If so, would you mind sharing the script?

Sincerely,
Yan

iambabao · 2021-06-26T15:13:06Z

I have the same problem with the ACE 05 NER dataset.

I downloaded the ACE 05 NER dataset from the link provided in datasets.py and renamed the files to {split}.ner.json, but it does not work :(

Magolor · 2021-07-05T08:11:11Z

@iambabao

> I have the same problem with the ACE 05 NER dataset.
>
> I downloaded the ACE 05 NER dataset from the link provided in datasets.py and renamed the files to {split}.ner.json, but it does not work :(

Yes, but I believe modifying it by simply adding:

    if 'label' not in x:
        x['label'] = {
            x['entity_label']: x['span_position'],
        }

could work.
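
A minimal sketch of how that fix could be applied to a whole list of records, assuming each record without a `label` key carries `entity_label` and `span_position` keys as in the snippet above. The helper name `add_missing_labels` and the sample records are hypothetical and are not taken from the TANL code.

```python
from typing import Any, Dict, List


def add_missing_labels(examples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Give every record a 'label' dict, building it from
    'entity_label' and 'span_position' when it is missing."""
    for x in examples:
        if 'label' not in x:
            x['label'] = {
                x['entity_label']: x['span_position'],
            }
    return examples


# Hypothetical records; the field values are made up for illustration.
records = [
    {'context': 'John lives in Paris .', 'entity_label': 'PER', 'span_position': ['0;0']},
    {'context': 'John lives in Paris .', 'label': {'GPE': ['3;3']}},
]
fixed = add_missing_labels(records)
print(fixed[0]['label'])
```

Records that already have a `label` key are left untouched, so the helper should be safe to run on either version of the download.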

However, @giove91, please add links to all the datasets used in TANL, if available. Most of the datasets reported in the paper and defined in datasets.py currently come with no acquisition method, preprocessing scripts, or instructions. I would really appreciate it if you could fill these in.

giove91 · 2021-07-14T22:28:03Z

Hi, thanks for your interest in this project!

@yanzhangnlp We added the instructions to process the Multiwoz dataset (thanks @jasonkrone). Hope this helps!

@iambabao Apparently the version I downloaded from that link is no longer available (it differs from the version that can currently be downloaded). Thanks @Magolor for providing a possible fix. I'll check and update the instructions.

MerrickWang1 · 2021-08-14T05:17:05Z

Hi,

The data files provided for the ACE2005 dataset are of .test, .train, and .dev file types. @iambabao how did you obtain .json files?

Here is where I am attempting to obtain the ACE2005 data:
https://github.com/ShannonAI/mrc-for-flat-nested-ner/blob/master/ner2mrc/download.md
https://drive.google.com/file/d/1iodaJ92dTAjUWnkMyYm8aLEi5hj3cseY/view

Thanks,

iambabao · 2021-08-15T07:02:13Z

> Hi,
>
> The data files provided for the ACE2005 dataset are of .test, .train, and .dev file types. @iambabao how did you obtain .json files?
>
> Here is where I am attempting to obtain the ACE2005 data:
> https://github.com/ShannonAI/mrc-for-flat-nested-ner/blob/master/ner2mrc/download.md
> https://drive.google.com/file/d/1iodaJ92dTAjUWnkMyYm8aLEi5hj3cseY/view
>
> Thanks,

The files are already in JSON format, so you can simply rename them.
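
A rough sketch of that renaming step, assuming the download directory holds one file per split ending in `.train`, `.dev`, and `.test`, and that the target name is `{split}.ner.json` as mentioned earlier in this thread. The function name `copy_splits_as_json` and the directory arguments are hypothetical. The files are copied rather than renamed so the original download stays intact, and each one is parsed first to confirm it really is JSON.

```python
import json
import shutil
from pathlib import Path


def copy_splits_as_json(src_dir: str, dst_dir: str) -> list:
    """Copy *.train / *.dev / *.test files to {split}.ner.json
    after checking that each one parses as JSON."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for split in ('train', 'dev', 'test'):
        matches = sorted(src.glob(f'*.{split}'))
        if not matches:
            continue  # this split is not present in the download
        with open(matches[0], encoding='utf-8') as f:
            json.load(f)  # raises json.JSONDecodeError if the file is not JSON
        target = dst / f'{split}.ner.json'
        shutil.copyfile(matches[0], target)
        written.append(target.name)
    return written
```

If more than one file matches a split suffix, only the first in sorted order is used, so it is worth checking the directory contents before running this.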

David-Lee-1990 · 2022-06-03T12:37:36Z

> @iambabao
>
> > I have the same problem with the ACE 05 NER dataset.
> > I downloaded the ACE 05 NER dataset from the link provided in datasets.py and renamed the files to {split}.ner.json, but it does not work :(
>
> Yes, but I believe modifying it by simply adding:
>
>     if 'label' not in x:
>         x['label'] = {
>             x['entity_label']: x['span_position'],
>         }
>
> could work.
>
> However, @giove91, please add links to all the datasets used in TANL, if available. Most of the datasets reported in the paper and defined in datasets.py currently come with no acquisition method, preprocessing scripts, or instructions. I would really appreciate it if you could fill these in.

Hey guys, after preprocessing the ACE2005 NER dataset following the guidance here and running TANL, I get F1 = 88.3 (the TANL paper reports 84.9). Is there a bug, or is something else going on?

giove91 · 2022-06-08T16:22:43Z

Interesting! Are the splits correct and have you used the same hyperparameters as in the paper? (50 epochs, initial learning rate 0.0005, ...)

jasonkrone mentioned this issue Jul 14, 2021

add multi-woz 2.1 preprocessing scripts #7

Merged

yanzhangnlp closed this as completed Mar 27, 2022
