david-yoon/detecting-incongruity

detecting-incongruity

This repository contains the source code and data corpus used in the following paper:

Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder, AAAI-19, paper

Requirements

  tensorflow==1.4 (tested on cuda-8.0, cudnn-6.0)
  python==2.7
  scikit-learn==0.20.0
  nltk==3.3

Download Dataset

  • download preprocessed dataset with the following script

    cd data
    sh download_processed_dataset_aaai-19.sh

  • the downloaded dataset will be placed into the following path of the project

    /data/aaai-19/para
    /data/aaai-19/whole

  • format (example)

    test_title.npy: [100000, 49] - (#samples, #token (index))
    test_body: [100000, 1200] - (#samples, #token (index))
    test_label: [100000] - (#samples)
    dic_mincutN.txt: dictionary
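
The shapes listed above can be checked after loading a split. The following is a minimal sketch, not part of the repository: the helper name `check_split` is hypothetical, the arrays are tiny synthetic stand-ins written to a temporary folder, and the `.npy` extension on the body and label files is an assumption (only `test_title.npy` is listed with an extension).

```python
import os
import tempfile

import numpy as np


def check_split(title, body, label):
    """Validate that the three arrays of one split agree on the sample count.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    if title.ndim != 2 or body.ndim != 2 or label.ndim != 1:
        raise ValueError("expected title/body to be 2-D and label to be 1-D")
    if not (title.shape[0] == body.shape[0] == label.shape[0]):
        raise ValueError("sample counts differ across title/body/label")
    return title.shape[0]


# Synthetic stand-in for a real split. The real test split is documented as
# [100000, 49], [100000, 1200] and [100000]; only 4 samples are used here.
tmp_dir = tempfile.mkdtemp()
np.save(os.path.join(tmp_dir, "test_title.npy"), np.zeros((4, 49), dtype=np.int32))
# Assumption: body and label are stored as .npy files as well.
np.save(os.path.join(tmp_dir, "test_body.npy"), np.zeros((4, 1200), dtype=np.int32))
np.save(os.path.join(tmp_dir, "test_label.npy"), np.zeros((4,), dtype=np.int32))

title = np.load(os.path.join(tmp_dir, "test_title.npy"))
body = np.load(os.path.join(tmp_dir, "test_body.npy"))
label = np.load(os.path.join(tmp_dir, "test_label.npy"))

num_samples = check_split(title, body, label)
print("samples: %d, title: %s, body: %s" % (num_samples, title.shape, body.shape))
```

To use it on the downloaded data, point the `np.load` calls at the actual files under the dataset folder instead of the temporary one.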

Source Code

  • choose the source folder according to the training method

    whole-type: use the code in ./src_whole
    para-type: use the code in ./src_para

Training Phase

  • each source code folder contains a reference script for training the model

    train_reference_scripts.sh
    << for example >>
    train the AHDE model with the "whole" method

    python AHDE_Model.py --batch_size 256 --encoder_size 80 --context_size 10 --encoderR_size 49 --num_layer 1 --hidden_dim 300  --num_layer_con 1 --hidden_dim_con 300 --embed_size 300 --lr 0.001 --num_train_steps 100000 --is_save 1 --graph_prefix 'ahde' --corpus 'aaai-19_whole' --data_path '../data/target_aaai-19_whole/'

  • Results will be displayed in the console
  • The final test result will be stored in "./TEST_run_result.txt"
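
When running several configurations, it can help to keep the flags in one place and assemble the command from them. The sketch below is one way to do that, assuming nothing beyond the flags shown in the example command; the helper `build_train_command` and the list name `AHDE_WHOLE_PARAMS` are hypothetical and not part of the repository.

```python
def build_train_command(script, params):
    """Turn (flag, value) pairs into an argv list for a training script.

    Hypothetical helper for illustration; flag names are passed through as-is.
    """
    command = ["python", script]
    for name, value in params:
        command.extend(["--" + name, str(value)])
    return command


# Flags and values copied from the AHDE "whole" example command.
# A list of pairs keeps the order stable on every Python version.
AHDE_WHOLE_PARAMS = [
    ("batch_size", 256),
    ("encoder_size", 80),
    ("context_size", 10),
    ("encoderR_size", 49),
    ("num_layer", 1),
    ("hidden_dim", 300),
    ("num_layer_con", 1),
    ("hidden_dim_con", 300),
    ("embed_size", 300),
    ("lr", 0.001),
    ("num_train_steps", 100000),
    ("is_save", 1),
    ("graph_prefix", "ahde"),
    ("corpus", "aaai-19_whole"),
    ("data_path", "../data/target_aaai-19_whole/"),
]

command = build_train_command("AHDE_Model.py", AHDE_WHOLE_PARAMS)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.call(command, cwd="src_whole")` to start training. Shell quotes around string values are not needed in that case, because no shell parses the arguments.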

※ hyper parameters

  • major parameters: edit them in the training script
  • other parameters: edit them in "./params.py"

Inference Phase

  • each source code folder contains an inference script
  • before running it, set "model_path" in "eval_AHDE.sh" to the path of your saved model

    << for example >>
    evaluate the test dataset with the AHDE model and the "whole" method

    src_whole$ sh eval_AHDE.sh

  • Results will be displayed in the console
  • scores for the test set will be stored in "./output.txt"

Dataset Statistics

  • whole case

    data     Samples      tokens (avg)    tokens (avg)
                          headline        body text
    train    1,700,000    13.71           499.81
    dev      100,000      13.69           499.03
    test     100,000      13.55           769.23

  • Note

    We crawled articles for the "dev" and "test" datasets from different media outlets.
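
An average token count like the ones in the table can be estimated from a padded index matrix by counting the non-padding entries per row. This is a rough sketch, not the authors' procedure: the padding index `PAD_ID = 0` is an assumption (check the dictionary file for the real value), and the function name is hypothetical. Rows that were truncated to the fixed width would be undercounted, so results on the preprocessed arrays may differ from the table.

```python
import numpy as np

PAD_ID = 0  # assumption: index 0 marks padding; verify against the dictionary


def average_token_count(index_matrix, pad_id=PAD_ID):
    """Mean number of non-padding token indices per row."""
    lengths = (index_matrix != pad_id).sum(axis=1)
    return float(lengths.mean())


# Toy matrix with 3 samples padded to width 5: lengths are 3, 1 and 5.
toy = np.array([
    [5, 8, 2, 0, 0],
    [7, 0, 0, 0, 0],
    [3, 4, 9, 6, 1],
])

print(average_token_count(toy))  # (3 + 1 + 5) / 3 = 3.0
```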

Newly introduced dataset (English version)

  • We create an English version of the dataset, nela-17, using NELA 2017 data. Please refer to the dataset repository [link].
  • If you want to run our model (AHDE) with the nela-17 data, you can use the preprocessed dataset that is compatible with our code.

    cd data
    sh download_processed_dataset_nela-17.sh

  • training script (refer to the "train_reference_scripts.sh")

    python AHDE_Model.py --batch_size 64 --encoder_size 200 --context_size 50 --encoderR_size 25 --num_layer 1 --hidden_dim 100  --num_layer_con 1 --hidden_dim_con 100 --embed_size 300 --use_glove 1 --lr 0.001 --num_train_steps 100000 --is_save 1 --graph_prefix 'ahde' --corpus 'nela-17_whole' --data_path '../data/target_nela-17_whole/'

Other implementation (pytorch version)

Cite

  • Please cite our paper when you use our code, dataset, or model.

@inproceedings{yoon2019detecting,
title={Detecting Incongruity between News Headline and Body Text via a Deep Hierarchical Encoder},
author={Yoon, Seunghyun and Park, Kunwoo and Shin, Joongbo and Lim, Hongjun and Won, Seungpil and Cha, Meeyoung and Jung, Kyomin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
pages={791--800},
year={2019}
}

About

TensorFlow implementation of "Detecting Incongruity Between News Headline and Body Text via a Deep Hierarchical Encoder," AAAI-19