The Corpus of Diverse Styles

In this folder we present some samples from our new dataset, the Corpus of Diverse Styles. The full dataset contains 15 million sentences across 11 diverse styles.

This folder contains 1000 sentences from each of the eleven styles. WARNING: These samples have not been filtered for profanity or toxicity, and some sentences contain expletives or disturbing content. We recognize this issue with the dataset and the problems it could cause in models trained on this data.