
How to get .source and .target file at comet_atomic2020_bart #2

Closed
yongho94 opened this issue Feb 20, 2021 · 6 comments
@yongho94

Hello sir.

I tried to run your code that uses the BART model to generate knowledge triples.

In your code, "models/comet_atomic2020_bart/finetune.py" requires a "train.source" file and a "train.target" file, but I couldn't figure out how to get these files.

How can I get them?

Thanks.

@RubenBranco

RubenBranco commented Feb 22, 2021

Hi @yongho94,

I'm not one of the authors, but I might be able to help here. The code expects the .source/.target format that used to be the standard for the Hugging Face libraries before the datasets library came about. Here's the example page: https://github.com/huggingface/transformers/tree/master/examples/legacy/seq2seq

To produce this for COMET, you iterate over the CSV file and, for each row, concatenate the head with the relation as "{head} {rel}" and write it to a "train.source" file; the tail is written to a "train.target" file, such that line i of one file corresponds to line i of the other.

I might be wrong, though.
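For what it's worth, a minimal sketch of that conversion could look like the following. The column names (`head_event`, `relation`, `tail_event`) and the tab delimiter are assumptions; adjust them to whatever the actual ATOMIC 2020 file header uses.

```python
import csv

def write_seq2seq_files(csv_path, split="train", delimiter="\t"):
    """Convert a head/relation/tail file into the legacy seq2seq
    .source/.target format: line i of .source pairs with line i of .target.

    Column names below are assumptions; rename to match the real header.
    """
    with open(csv_path, newline="", encoding="utf-8") as f, \
         open(f"{split}.source", "w", encoding="utf-8") as src, \
         open(f"{split}.target", "w", encoding="utf-8") as trg:
        for row in csv.DictReader(f, delimiter=delimiter):
            # Source line: "{head} {rel}"; target line: the tail.
            src.write(f'{row["head_event"]} {row["relation"]}\n')
            trg.write(f'{row["tail_event"]}\n')
```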

@yongho94
Author


Thanks @RubenBranco !!

It seems to work. I'll give it a try.

Thanks.

@keisks
Contributor

keisks commented Feb 23, 2021

Hi @yongho94,

Thank you for your question. Regarding the data format for BART, @RubenBranco is correct.

The src and trg datasets (for BART) are available here. If you are also interested in the model we trained, you can get it from here.

I hope this helps!

@keisks keisks closed this as completed Feb 23, 2021
@Kelaxon

Kelaxon commented Mar 29, 2021

@keisks, Sorry for re-opening this, and thanks for the fantastic work.

I have another question about the data format:

I saw that there are some "none" targets in the training, validation, and test sets. Why do you include them? Are they used to prevent over-fitting? If so, how do you determine the ratio and the sampling method?

Thanks!
[screenshot: dataset rows with "none" targets]

@keisks
Contributor

keisks commented Mar 31, 2021

The "none" targets mean that annotators answered that there are no tails for a given head and relation. In the dataset, we include all the annotations (i.e., no sampling). As you can see, they are sometimes redundant because multiple annotators give the same answer.
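If it helps, here is a small sketch for checking what fraction of a split has a "none" tail. It assumes the .target file has one tail per line, as in the legacy seq2seq format.

```python
def none_ratio(target_path):
    """Fraction of lines in a .target file whose tail is the literal 'none'.

    Assumes one tail per line (legacy seq2seq .target format).
    """
    total = nones = 0
    with open(target_path, encoding="utf-8") as f:
        for line in f:
            total += 1
            if line.strip().lower() == "none":
                nones += 1
    return nones / total if total else 0.0
```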

@Kelaxon

Kelaxon commented Apr 1, 2021

Thanks for the explanation 👍🏻
