Skip to content

dsmlr/40-Thai-Children-Stories

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

alt text

The 40 Thai Children Stories

The dataset was collected from 40 Thai children stories. We manually split the text into sentences which leads to 1,964 sentences. Each sentence was manually tagged by experts into three classes which are positive, negative, and neutral according to the reader’s expressed emotion. The majority class given by the experts is neutral in this dataset. Sentences that were described by the same class chosen by all three experts are skimmed down to 1,114 sentences which comprise 298 negative sentences, 508 neutral sentences and 308 positive sentences.

Descriptions:

  • tale_name.csv
    • id_sentence -- Sentence ID
    • text -- Sentence
    • id_tale -- Tale ID that corresponds to tale_data.csv
    • Y_1 -- Label given by expert 1
    • Y_2 -- Label given by expert 2
    • Y_3 -- Label given by expert 3
    • Y_vote -- Voted Label
    • consensus -- All experts gave the same label
  • tale_data.csv
    • id_tale -- Tale ID
    • name_tale -- Tale name

If you are interested in further investigating your own study throughout this dataset, please cite this article:

Kitsuchart Pasupa, Ponruedee Netisopakul, Ratthawut Lertsuksakda, "Sentiment Analysis on Thai Children Stories", In Artificial Life and Robotics, vol. 21, no. 3, pp. 357-364, 2016. DOI: 10.1007/s10015-016-0283-8

We further compared a set of well-known deep learning techniques on this dataset. This can be found at:

Kitsuchart Pasupa, Thititorn Seneewong Na Ayutthaya, "Thai Sentiment Analysis with Deep Learning Techniques: A Comparative Study based on Word Embedding, POS-Tag, and Sentic Features", Sustainable Cities and Society, vol. 50, pp. 101615, 2019. DOI: 10.1016/j.scs.2019.101615

Contributors

  • Kitsuchart Pasupa
  • Ponruedee Netisopakul
  • Ratthawut Lertsuksakda
  • Thititorn Seneewong Na Ayutthaya