Making Dataset for Bengali Language #31

PrithwirajRizu · 2019-07-16T06:35:07Z

I have two Bengali files, article.txt and summary.txt. How can I convert them into the corresponding train.bin, val.bin, and test.bin? I don't understand how to process my Bengali corpus for this summarization process. Thanks in advance.

sagorbrur · 2019-09-10T20:51:36Z

Hi @PrithwirajRizu
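Your story should look like this:

# Read both files as UTF-8 (assuming that is how the Bengali text is encoded).
article = open('article.txt', 'r', encoding='utf-8').read()
summary = open('summary.txt', 'r', encoding='utf-8').read()

# Story layout: article body, a blank line, the '@highlight' marker, then the summary.
story = article + '\n\n' + '@highlight' + '\n' + summary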

Then follow this to generate train or test data.
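
If article.txt and summary.txt hold one example per line (an assumption; the thread does not say how the files are laid out), the same pattern could be applied line by line. A minimal sketch, where `build_stories`, `write_stories`, the `stories` output directory, and the file naming scheme are all hypothetical:

```python
from pathlib import Path


def build_stories(article_lines, summary_lines):
    """Pair each article line with its summary line in the '@highlight' layout."""
    if len(article_lines) != len(summary_lines):
        raise ValueError("article and summary line counts differ")
    return [
        a.strip() + "\n\n" + "@highlight" + "\n" + s.strip()
        for a, s in zip(article_lines, summary_lines)
    ]


def write_stories(stories, out_dir="stories"):
    """Write each story to its own UTF-8 file (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, story in enumerate(stories):
        (out / f"{i:06d}.story").write_text(story, encoding="utf-8")
```

Reading the inputs with `open('article.txt', encoding='utf-8').readlines()` and passing the two lists to `build_stories` would give one story string per example, which `write_stories` then saves as separate files.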

senjed · 2020-06-16T04:05:51Z

Hi @PrithwirajRizu
Your story should be like this.
article = open('article.txt', 'r').read()
summary = open('summary.txt', 'r').read()

story = article + '\n\n' + '@highlight'+'\n'+summary 
Then follow this to generate train or test data.

I think each sentence of the summary should be on its own line, with the sentences separated by the "@highlight" tag.
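
A rough sketch of that idea, assuming summary sentences end with the Bengali danda (।) or with ., ?, or ! and that each sentence gets its own "@highlight" block. The helper names `split_sentences` and `make_story` are hypothetical, and the exact spacing the downstream script expects is not confirmed in this thread:

```python
import re


def split_sentences(text):
    """Split after the Bengali danda (।) or ., ?, ! when followed by whitespace."""
    parts = re.split(r"(?<=[।?!.])\s+", text.strip())
    return [p for p in parts if p]


def make_story(article, summary):
    """Article body, then one '@highlight' block per summary sentence."""
    blocks = [article.strip()]
    for sentence in split_sentences(summary):
        blocks.append("@highlight\n" + sentence)
    # Blank lines separate the article and each highlight block.
    return "\n\n".join(blocks)
```

A regex split like this is only a starting point; abbreviations or numbers containing periods would be split incorrectly, so a proper Bengali sentence tokenizer may be a better fit for real data.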
