Files for the system submitted to the LSDSem2017 Workshop Story Cloze Test Challenge

LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test

This repository contains the code needed to reproduce the results reported in Bugert et al., LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test.

Please cite the paper as:

  author    = {Bugert, Michael and Puzikov, Yevgeniy and Rücklé, Andreas and
               Eckle-Kohler, Judith and Martin, Teresa and Martinez Camara, Eugenio and
               Sorokin, Daniil and Peyrard, Maxime and Gurevych, Iryna},
  title     = {{LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test}},
  booktitle = {Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem)},
  month     = {April},
  year      = {2017},
  address   = {Valencia, Spain},
  publisher = {Association for Computational Linguistics},
  pages     = {56-61},
  url       = {}

Abstract: The Story Cloze test is a recent effort in providing a common test scenario for text understanding systems. As part of the LSDSem 2017 shared task, we present a system based on a deep learning architecture combined with a rich set of manually-crafted linguistic features. The system outperforms all known baselines for the task, suggesting that the chosen approach is promising. We additionally present two methods for generating further training data based on stories from the ROCStories corpus.

Don't hesitate to contact us if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Project structure

  • config_files/ -- this folder contains configuration files
  • data/ -- this folder contains system input files
  • data_generation/ -- the scripts for generating training data ("Shuffling" and "KDE Sampling" methods)
  • neural_network/ -- all the scripts necessary for running the experiments
  • requirements.txt -- a text file with the names of the required Python modules


Requirements

  • 64-bit Linux (not tested on other platforms)
  • Python 2.7 or Python 3.5
  • Python modules listed in the requirements.txt file
  • TensorFlow (tested on v0.11.0)
  • Suitable word embeddings in text format (e.g. GloVe)

Running the experiments

The general procedure for running any experiment is:

  • Fill out all relevant configuration options in the config file

  • Run the system in training or prediction mode:

    $python <config_file>

  • Run a significance test:

    $python <pred_1> <pred_2> <gold_answers>

Here, <pred_1> and <pred_2> are the predictions of two different models and <gold_answers> is the file with the correct answers. All three files should be in the official CSV submission format of the LSDSem 2017 shared task.

The script runs McNemar's test (as implemented in the Statsmodels Python library) and prints the result to standard output.
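To illustrate what the significance test measures, here is a minimal sketch of the exact (binomial) variant of McNemar's test, written by hand with the standard library only. It is not the repository's script and does not call Statsmodels; the function name mcnemar_exact and the toy label lists are hypothetical, and math.comb assumes Python 3.8 or newer.

```python
from math import comb


def mcnemar_exact(pred_1, pred_2, gold):
    """Two-sided exact McNemar's test on paired predictions.

    Returns (b, c, p_value), where b counts items only model 1 got right
    and c counts items only model 2 got right.
    """
    if not (len(pred_1) == len(pred_2) == len(gold)):
        raise ValueError("all three label lists must have the same length")

    b = sum(p1 == g and p2 != g for p1, p2, g in zip(pred_1, pred_2, gold))
    c = sum(p1 != g and p2 == g for p1, p2, g in zip(pred_1, pred_2, gold))

    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness

    # Under the null hypothesis the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (hypothetical): eight stories, labels are the chosen ending (1 or 2).
gold = [1, 2, 1, 2, 1, 2, 1, 2]
model_a = [1, 2, 1, 2, 1, 2, 1, 2]  # all correct
model_b = [2, 1, 2, 1, 2, 2, 1, 2]  # wrong on the first five stories
b, c, p_value = mcnemar_exact(model_a, model_b, gold)
print(b, c, p_value)  # 5 0 0.0625
```

Only the stories on which exactly one of the two models is correct enter the test, so two models with similar accuracy can still differ significantly if they err on different stories.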

Expected results

Each configuration file has a global part with a checkpoint_dir field. After finishing the training procedure, you should expect the model to be saved into the folder specified in this field.

Also, the logging part of the configuration file specifies the path where you can save the log file (be it a log of the training or prediction procedure).

When performing prediction on test data, the folder specified by checkpoint_dir will also contain an answer.txt file with the predicted values in the submission format.

The best model (BiLSTM-VF) achieved 71.7% accuracy on the official test data. The model was trained with the following parameter values:

  • Sentence length: 20
  • Embeddings: glove.6B.100d (lowercased data)
  • BiLSTM-VF model (trainable embeddings, cell size 141, use_last_hidden=true)
  • Optimizer: Adam
  • Initial learning rate: 0.0001
  • Batch size: 40
  • Dropout: 0.3
  • Number of epochs: 30

Parameter description

The parameters are documented in each of the configuration files in the config_files/ folder.

Please let us know if you have any questions regarding the parameters' meaning.

Feature description

The system can be trained with additional lexical features or without them (using only word embedding vectors).

In the config_files/ folder you can find four subfolders with sample configuration files:

  • bilstm-t
  • bilstm-tf
  • bilstm-v
  • bilstm-vf

Configuration files are customizable. You can create your own and experiment with various values for the fields. To make it easier to reproduce the results from the paper, we defined two types of configuration files, train.yaml and predict.yaml, which are used for training and evaluating models, respectively.

Note that the main difference between the configuration files in different folders is the value of the data-module field. The following subsections describe how to prepare a dataset in the suitable format.

Type 1: word embeddings only

  • Prepare three (train, dev, test) CSV files with the following header:

    story_id, sent1, sent2, sent3, sent4, ending1, ending2, label

  • Folders bilstm-t/ and bilstm-v/ contain sample configuration files which use only word embeddings. Note that the data-module field is set to csv_reader-t and csv_reader-v, respectively. You can change fields other than data-module as you see fit (e.g., for hyperparameter optimization)
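Before training, it can help to check that a prepared file really has the header listed above. The following is a minimal sketch using only the Python standard library; the function name read_stories and the sample story are hypothetical and not part of the repository.

```python
import csv
import io

EXPECTED_HEADER = ["story_id", "sent1", "sent2", "sent3", "sent4",
                   "ending1", "ending2", "label"]


def read_stories(handle):
    """Read a Type 1 CSV file and fail early if the header is unexpected."""
    # skipinitialspace tolerates "story_id, sent1" as well as "story_id,sent1".
    reader = csv.DictReader(handle, skipinitialspace=True)
    header = reader.fieldnames or []
    if header != EXPECTED_HEADER:
        raise ValueError("unexpected header: %r" % (header,))
    return list(reader)


# Hypothetical one-story file, held in memory for the example.
sample = io.StringIO(
    "story_id,sent1,sent2,sent3,sent4,ending1,ending2,label\n"
    'ex-1,"Tom was hungry.","He went to the kitchen.","He found some bread.",'
    '"He made a sandwich.","Tom ate the sandwich.","Tom threw the bread away.",1\n'
)
rows = read_stories(sample)
print(rows[0]["ending1"], rows[0]["label"])
```

For a real file, pass an open file object instead of the in-memory sample, e.g. open(path, newline="").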

Type 2: word embeddings and lexical features

We used the DKPro TC framework to extract features and integrate them into our deep learning architecture. You can use any features you want, but they should comply with the following naming conventions:

  • if the feature is defined for one ending, its name in the CSV header file should contain either "E1" (the first ending) or "E2" (the second ending): e.g., "SentimentE1", "SentimentE2".
  • if the feature is defined for both endings, then its name in the CSV header file should not contain "E1" or "E2" ("LengthDiff", "BigramOverlapDiff", etc.)
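The naming convention can be expressed as a small column-grouping routine. This is only a sketch of the rule as described above, not the logic used by the repository's CSV readers; the function name group_feature_columns is hypothetical.

```python
BASE_COLUMNS = {"story_id", "sent1", "sent2", "sent3", "sent4",
                "ending1", "ending2", "label"}


def group_feature_columns(header):
    """Split feature columns into per-ending and shared groups by name."""
    groups = {"E1": [], "E2": [], "shared": []}
    for name in header:
        if name in BASE_COLUMNS:
            continue  # story text and label columns are not features
        if "E1" in name:
            groups["E1"].append(name)
        elif "E2" in name:
            groups["E2"].append(name)
        else:
            groups["shared"].append(name)
    return groups


header = ["story_id", "sent1", "sent2", "sent3", "sent4", "ending1", "ending2",
          "label", "SentimentE1", "SentimentE2", "LengthDiff", "BigramOverlapDiff"]
print(group_feature_columns(header))
```

Because the rule is a plain substring match, a shared feature whose name happens to contain "E1" or "E2" would be treated as a per-ending feature, so such names are best avoided.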

If you decide to incorporate features into the network, the general procedure to create an input for the system is as follows:

  • Prepare three (train, dev, test) CSV files (or use ours from the data/ folder) with the following header:

    story_id, sent1, sent2, sent3, sent4, ending1, ending2, label, feature1, feature2, ..., featureN

  • Folders bilstm-tf/ and bilstm-vf/ contain sample configuration files which use both word embeddings and extracted lexical features. Note that the data-module field is set to csv_reader-tf and csv_reader-vf, respectively. You can change fields other than data-module as you see fit (e.g., for hyperparameter optimization)


The data/ folder contains:

  • Data_for_neural_network.7z: the training, development and test datasets with features, which we used for our experiments
  • ROCStories_generated_by_KDE_sampling.7z: the output of our proposed KDE sampling method for generating ROCStories with wrong endings
  • Input_data_for_KDE_sampling.7z: the necessary input data for running said method (see below)

Data generation


Shuffling

Running $python <src> <dest> with a ROCStories CSV file as <src> creates a second CSV file at <dest>. This file contains two lines for each ROCStory: the original (correct) story and a shuffled (wrong) story. Correct stories are marked with value 1 in the label column.
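The idea of pairing each story with a shuffled counterpart can be sketched as follows. This is not the repository's script: this README does not spell out which sentences are reordered or which label value marks wrong stories, so the sketch assumes that all sentences are permuted and that wrong stories get label 0. The function name make_shuffled_pair is hypothetical.

```python
import random


def make_shuffled_pair(sentences, rng):
    """Return [(original, 1), (shuffled, 0)] for one story.

    Assumptions: every sentence may move, and 0 marks the wrong story.
    """
    original = list(sentences)
    shuffled = list(sentences)
    # Reshuffle until the order actually changes; skip stories whose
    # sentences are all identical, since no distinct order exists for them.
    if len(set(original)) > 1:
        while shuffled == original:
            rng.shuffle(shuffled)
    return [(original, 1), (shuffled, 0)]


story = ["Tom was hungry.", "He went to the kitchen.", "He found some bread.",
         "He made a sandwich.", "Tom ate the sandwich."]
pair = make_shuffled_pair(story, random.Random(0))
for sentences, label in pair:
    print(label, " ".join(sentences))
```

Passing a seeded random.Random instance keeps the generated data reproducible across runs.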

KDE Sampling

  • Extract the Input_data_for_KDE_sampling.7z archive from the data/ folder.
  • Specify the paths to the individual input files and pretrained embeddings (we used GloVe) in the config.yaml file.
  • Run the data generation via: $python3 <dest_csv> <config.yaml>
  • The output file <dest_csv> will be a CSV file in the format of the Story Cloze validation/test sets, containing ROCStories with generated wrong endings.