
Challenges in the Automatic Analysis of Students' Diagnostic Reasoning

The following repository contains the code for our different evaluation metrics applicable to multi-label sequence-labelling tasks such as epistemic activity identification. It also provides the code for training single- and multi-output Bi-LSTMs. The new corpora can be obtained on request, allowing all experiments of our paper to be replicated.


If you find the implementation useful, please cite the following two papers:

@inproceedings{schulz2019challenges,
	title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
	author = {Schulz, Claudia and Meyer, Christian M. and Gurevych, Iryna},
	publisher = {AAAI Press},
	booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
	year = {2019},
	note = {(to appear)},
	address = {Honolulu, HI, USA}
}

@misc{schulz2018challenges,
	author = {Schulz, Claudia and Meyer, Christian M. and Sailer, Michael and Kiesewetter, Jan and Bauer, Elisabeth and Fischer, Frank and Fischer, Martin R. and Gurevych, Iryna},
	title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
	year = {2018},
	howpublished = {arXiv:1811.10550},
	url = {}
}

Abstract: We create the first corpora of students' diagnostic reasoning self-explanations from two domains annotated with the epistemic activities hypothesis generation, evidence generation, evidence evaluation, and drawing conclusions. We propose a separate performance metric for each challenge we identified for the automatic identification of epistemic activities, thus providing an evaluation framework for future research:

  1. the correct identification of epistemic activity spans,
  2. the reliable distinction of similar epistemic activities, and
  3. the detection of overlapping epistemic activities.
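As a simple illustration of the first challenge, span identification can be scored by exact-match comparison of gold and predicted spans. This is only a minimal sketch with hypothetical spans, not the evaluation framework from the paper:

```python
def span_f1(gold_spans, pred_spans):
    """Exact-match precision/recall/F1 over (start, end) token spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # spans matching gold exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical spans for one epistemic activity, e.g. hypothesis generation
gold = [(0, 4), (10, 15)]
pred = [(0, 4), (11, 15)]  # second predicted span is off by one token
print(span_f1(gold, pred))  # → (0.5, 0.5, 0.5)
```

Under exact matching, the near-miss second span counts as both a false positive and a false negative, which is why span-level evaluation is stricter than token-level evaluation.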

Contact person: Claudia Schulz

Alternative contact person: Jonas Pfeiffer

Please send us an e-mail if you want to get access to the corpora. Don't hesitate to contact us to report issues or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Experimental setup

All code is run using Python 3. In all scripts, we specify where the user has to adapt the code (mostly file paths) with 'USER ACTION NEEDED'.

Neural Network Experiments

The folder "neuralNetwork_experiments" contains the code required to train the neural networks. Our Bi-LSTM architectures are based on the implementation of Nils Reimers (NR):

  • neuralnets -- contains the implementation of the single-output architecture and of the multi-output architecture
  • util -- various scripts for processing data and other utilities by NR
  • data -- on request we provide train.txt, dev.txt, test.txt for all experimental setups

Setup with virtual environment (Python 3)

Set up a Python virtual environment (optional):

virtualenv --system-site-packages -p python3 env
source env/bin/activate

Install the requirements:

env/bin/pip3 install -r requirements.txt

Get the word embeddings

  • Download the German (text) fastText embeddings from GitHub and place them in the neuralNetwork_experiments folder
  • Remove the first line (the header) of the embedding file
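Stripping the header can be done with a one-liner; `wiki.de.vec` is the standard fastText file name and may differ for your download:

```shell
# fastText .vec files start with a "<num_words> <dim>" header line;
# drop it so the file contains only embedding rows
tail -n +2 wiki.de.vec > wiki.de.noheader.vec
```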

Run the Experiments

  • to train models for prefBaseline, concat, or separate, use the single-output training script
  • to train models for multiOutput, use the multi-output training script
  • to use a trained model for prediction, run the corresponding prediction script. NOTE: the loading of multiOutput models assumes a static layout; this needs to be changed if the model parameters are changed
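To illustrate how the single-output setups can differ, here is a minimal sketch (under an assumed per-token label format, not the repository's actual preprocessing): "separate" trains one tagger per epistemic activity, while a "concat"-style setup merges the four per-token activity tags into one combined label for a single model.

```python
# Hypothetical BIO tags, one per epistemic activity:
# HG = hypothesis generation, EG = evidence generation,
# EE = evidence evaluation, DC = drawing conclusions
ACTIVITIES = ["HG", "EG", "EE", "DC"]

def concat_labels(per_activity_tags):
    """Merge one BIO tag per activity into a single combined label,
    e.g. {'HG': 'B', 'EG': 'O', 'EE': 'O', 'DC': 'I'} -> 'B-HG+I-DC'."""
    parts = [f"{tag}-{act}" for act in ACTIVITIES
             if (tag := per_activity_tags[act]) != "O"]
    return "+".join(parts) if parts else "O"

token_tags = {"HG": "B", "EG": "O", "EE": "O", "DC": "I"}
print(concat_labels(token_tags))  # → B-HG+I-DC
```

A combined label space like this lets one classifier handle overlapping activities, at the cost of a larger and sparser tag set.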

Evaluation Metrics

The folder "evaluation" contains the code required to use our evaluation framework and implements our different evaluation metrics.

  • use the runModel scripts to create predictions for all (test) files
  • the evaluation code assumes the following folder structure of prediction results:
    • MeD / TeD for the two domains
      • pref, concat, separate, multiOutput - folders for each method
        • MeD_pref1, MeD_pref2, ... - 10 folders with prediction files of the 10 models trained with this method
        • note that "separate" has 4 subfolders (separate_dc, separate_hg, separate_ee, separate_eg) for the 4 epistemic activities, each with 10 subfolders for the results of the 10 models
      • goldData - gold annotations for the prediction files
      • human - different set of files used to evaluate human upper bound (all files annotated by all annotators)
        • MeD_human1, ... - annotations of each annotator
        • goldData - gold labels for the files used to evaluate human performance
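A small helper along these lines can gather all prediction files for one domain and method before running the metrics; the path layout follows the structure above, but the function and argument names are assumptions for illustration:

```python
import os

def collect_prediction_files(root, domain, method):
    """Walk e.g. root/MeD/pref/MeD_pref1, MeD_pref2, ... and return
    the prediction file paths for the given domain (MeD or TeD)
    and method folder (pref, concat, separate_*, multiOutput)."""
    method_dir = os.path.join(root, domain, method)
    files = []
    for model_folder in sorted(os.listdir(method_dir)):
        model_path = os.path.join(method_dir, model_folder)
        if not os.path.isdir(model_path):
            continue  # skip stray files next to the model folders
        for name in sorted(os.listdir(model_path)):
            files.append(os.path.join(model_path, name))
    return files
```

For "separate", the same helper would be called once per activity subfolder (separate_dc, separate_hg, separate_ee, separate_eg).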

