Kocasm: Korean Automatic Sarcasm Detection

Why this name? Kocasm is a blend of "Korean" and "sarcasm".

Why is irony detection important?

Because sarcasm inverts or distorts the literal meaning of a sentence, it is closely tied to sentiment classification.

Preparing the data

The data was gathered from Twitter as HTML.
Each example carries a binary label, 1 or 0.
- label 1: sarcasm; label 0: randomly gathered tweets
For the Korean data, tweets returned by queries for hashtags such as 역설, 아무말, 운수좋은날, 笑, 뭐래 아닙니다, 그럴리없다, 어그로, irony sarcastic, sarcasm were labeled as True, so the labels are still quite noisy.
The dataset was then pre-processed by (1) anonymizing users, (2) removing hashtags, and (3) removing URLs.
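
The three pre-processing steps can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the regular expressions, the function name `preprocess_tweet`, and the `@USER` placeholder are all assumptions, and the sketch targets Python 3, where `\w` matches Korean and CJK characters.

```python
import re

# Assumed patterns; the rules actually used to build the corpus may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess_tweet(text, user_token="@USER"):
    """Remove URLs, anonymize user mentions, and remove hashtags."""
    # Remove URLs first so that '#' fragments inside links disappear with them.
    text = URL_RE.sub(" ", text)
    # Replace every @handle with one shared placeholder (hypothetical token).
    text = MENTION_RE.sub(user_token, text)
    # Drop hashtags, including the query tags that were used for labeling.
    text = HASHTAG_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())
```

Removing the hashtags matters here because the label 1 tweets were found through those very tags; leaving them in would let a model read the label directly off the text.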

If you have any other questions about the corpus, please contact me:
- jiwon.kim.096@gmail.com

To compare with other datasets, see: [English]

ghosh: This English dataset was collected by Aniruddha Ghosh and Tony Veale. See their repository and their paper, Fracking Sarcasm using Neural Network.

Language Model (still being edited)

bag_of_words.py: Basic Bayesian model
dl_models.py: Model classes for a general transformer
tf_attention_models.py: TensorFlow attentive RNN model
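
For readers unfamiliar with a Bayesian bag-of-words baseline, here is a minimal sketch of the general technique: multinomial naive Bayes over whitespace tokens with add-one smoothing. It is not the contents of `bag_of_words.py`; the class name `NaiveBayesBoW`, the tokenization, and the toy sentences in the usage note are assumptions made for illustration, and the sketch assumes Python 3.

```python
import math
from collections import Counter


class NaiveBayesBoW:
    """Multinomial naive Bayes over whitespace tokens (add-one smoothing)."""

    def fit(self, texts, labels):
        # Count documents per class and token occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        # The vocabulary is every token seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for c in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.class_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in text.split():
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label
```

With made-up training sentences such as "wow so helpful thanks" (label 1) and "the meeting is at noon" (label 0), `NaiveBayesBoW().fit(texts, labels).predict("thanks so helpful")` picks the class whose training tokens overlap most with the input.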

I was strongly inspired by MirunaPislar's code and referred to it heavily, but I tried to make my code more Pythonic and closer to PyTorch style. The code is still being modified.
Kocasm is compatible with Python 2.7-3.7.

To use your own data, clone this repository and...

export DATA_DIR=/path/to/data
export PREP_DIR=/path/to/preprocess
export SAVE_DIR=/path/to/save

python tf_attention_models.py \
    --mode train \
    --model_cfg config/attention_base.json \
    --data_file $DATA_DIR/jiwon/train.csv \
    --test_file $DATA_DIR/jiwon/test.csv \
    --pretrain_file $BERT_PRETRAIN \
    --vocab $PREP_DIR/vocab.txt \
    --save_dir $SAVE_DIR \
    --max_len 128
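
The command above reads `$BERT_PRETRAIN`, which the export lines do not set. A small pre-flight check like the following can catch an unset variable before training starts. This is only a sketch: the helper name `missing_env_vars` and the idea of checking before launch are assumptions, not part of the repository.

```python
import os

# Variables referenced by the training command; BERT_PRETRAIN is used there
# but is not exported in the snippet, so it is included in the check.
REQUIRED_VARS = ("DATA_DIR", "PREP_DIR", "SAVE_DIR", "BERT_PRETRAIN")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; an empty list means every variable the command relies on is set.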

Citation

If you find this dataset useful, please cite it as:

@misc{kim2019kocasm,
  author = {Kim, Jiwon and Cho, Won Ik},
  title = {Kocasm: Korean Automatic Sarcasm Detection},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/SpellOnYou/korean-sarcasm}}
}

See also

Linguistics and computer science resources related to sarcasm

Kaggle - Twitter Irony Detection
