Challenges in the Automatic Analysis of Students' Diagnostic Reasoning
This repository contains the code for our evaluation metrics for multi-label sequence-labelling tasks such as epistemic activity identification. It also provides the code for training single- and multi-output Bi-LSTMs. The new corpora can be obtained on request, allowing all experiments in our paper to be replicated.
Citation
If you find the implementation useful, please cite the following two papers:
@inproceedings{Schulz:2019:AAAI,
title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
author = {Schulz, Claudia and Meyer, Christian M. and Gurevych, Iryna},
publisher = {AAAI Press},
booktitle = {Proceedings of the 33rd AAAI Conference on Artificial Intelligence},
year = {2019},
note = {(to appear)},
address = {Honolulu, HI, USA}
}
@misc{SchulzEtAl2018_arxiv,
author = {Schulz, Claudia and Meyer, Christian M. and Sailer, Michael and Kiesewetter, Jan and Bauer, Elisabeth and Fischer, Frank and Fischer, Martin R. and Gurevych, Iryna},
title = {Challenges in the Automatic Analysis of Students' Diagnostic Reasoning},
year = {2018},
howpublished = {arXiv:1811.10550},
url = {https://arxiv.org/abs/1811.10550}
}
Abstract: We create the first corpora of students' diagnostic reasoning self-explanations from two domains annotated with the epistemic activities hypothesis generation, evidence generation, evidence evaluation, and drawing conclusions. We propose a separate performance metric for each challenge we identified for the automatic identification of epistemic activities, thus providing an evaluation framework for future research:
- the correct identification of epistemic activity spans,
- the reliable distinction of similar epistemic activities, and
- the detection of overlapping epistemic activities.
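A single text span can belong to several epistemic activities at once, which is what makes epistemic activity identification a multi-label sequence-labelling task. A made-up toy illustration (not taken from the corpora) of such per-token label sets:

```python
# Made-up toy example (not from the corpora): per-token sets of epistemic
# activities, illustrating that annotations may overlap on the same tokens.
tokens = ["The", "fever", "points", "to", "an", "infection"]
labels = [
    set(),
    {"evidence evaluation"},
    {"evidence evaluation", "drawing conclusions"},    # two activities overlap here
    {"drawing conclusions"},
    {"drawing conclusions", "hypothesis generation"},  # and here
    {"drawing conclusions", "hypothesis generation"},
]
```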
Contact person: Claudia Schulz, clauschulz1812@gmail.com
Alternative contact person: Jonas Pfeiffer, pfeiffer@ukp.informatik.tu-darmstadt.de
https://www.ukp.tu-darmstadt.de/
Please send us an e-mail if you would like access to the corpora. Don't hesitate to contact us to report issues or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Experimental setup
All code is run using Python 3. In all scripts, we specify where the user has to adapt the code (mostly file paths) with 'USER ACTION NEEDED'.
Neural Network Experiments
The folder "neuralNetwork_experiments" contains the code required to train the neural networks. Our Bi-LSTM architectures are based on the implementation of Nils Reimers (NR): https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf
- neuralnets -- contains BiLSTM2.py for the single-output architecture and BiLSTM2_multipleOutput.py for the multi-output architecture (see the sketch after this list)
- util -- various scripts for processing data and other utilities by NR
- data -- on request we provide train.txt, dev.txt, test.txt for all experimental setups
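For orientation, the following is a minimal sketch of the difference between the two architectures; it is not the repository's BiLSTM2 code, and all layer sizes, label sets, and activity abbreviations are placeholders:

```python
# Minimal sketch, NOT the repository's BiLSTM2 / BiLSTM2_multipleOutput classes:
# a single-output tagger has one classification layer, while a multi-output
# tagger adds one classification layer per epistemic activity on top of a
# shared Bi-LSTM encoder.
from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model

VOCAB_SIZE, EMB_DIM, N_TAGS = 5000, 100, 3   # placeholder sizes, e.g. B/I/O tags

tokens = Input(shape=(None,), dtype="int32")
embedded = Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(tokens)
encoded = Bidirectional(LSTM(100, return_sequences=True))(embedded)

# Single output: one softmax layer over a single (possibly combined) tag set.
single = Model(tokens, TimeDistributed(Dense(N_TAGS, activation="softmax"))(encoded))
single.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Multi output: one softmax layer per epistemic activity, sharing the encoder.
activities = ["hg", "eg", "ee", "dc"]  # placeholder abbreviations for the 4 activities
outputs = [TimeDistributed(Dense(N_TAGS, activation="softmax"), name=act)(encoded)
           for act in activities]
multi = Model(tokens, outputs)
multi.compile(optimizer="adam",
              loss={act: "sparse_categorical_crossentropy" for act in activities})
```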
Setup with virtual environment (Python 3)
Set up a Python virtual environment (optional):
virtualenv --system-site-packages -p python3 env
source env/bin/activate
Install the requirements:
env/bin/pip3 install -r requirements.txt
Get the word embeddings
- Download the German (text) fastText embeddings from GitHub and place them in the neuralNetwork_experiments folder
- Run embeddingsFirstLine.py to remove the first line (header)
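As a hedged illustration of what this step amounts to (the real embeddingsFirstLine.py may differ, and the file names below are placeholders): standard fastText .vec files start with a "<vocab_size> <dimensions>" header line, while every other line holds a word followed by its vector.

```python
# Sketch of the header-removal step; file names are placeholders and the actual
# embeddingsFirstLine.py may work differently.
with open("wiki.de.vec", encoding="utf-8") as src, \
        open("wiki.de.noheader.vec", "w", encoding="utf-8") as dst:
    next(src)             # drop the "<vocab_size> <dimensions>" header line
    for line in src:      # copy the word vectors unchanged
        dst.write(line)
```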
Run the Experiments
- to train models for prefBaseline, concat, or separate, use train_singleOutput.py
- to train models for multiOutput, use train_multiOutput.py
- to use a trained model for prediction, run runModel_singleOutput.py or runModel_multiOutput.py. NOTE: loading multiOutput models assumes a static layout; this needs to be changed if the model parameters are changed
Evaluation Metrics
The folder "evaluation" contains the code required to use our evaluation framework. evaluate.py implements our different evaluation metrics.
- use the runModel scripts to create predictions for all (test) files
- evaluate.py assumes the following folder structure of prediction results:
- MeD / TeD for the two domains
- separate, pref, concat, multiOutput - folders for each method
- MeD_pref1, MeD_pref2, ... - 10 folders with prediction files, one for each of the 10 models trained for this method
- note that "separate" has 4 subfolders (separate_dc, separate_hg, separate_ee, separate_eg) for the 4 epistemic activities, each with 10 subfolders for the results of the 10 models
- goldData - gold annotations for the prediction files
- human - a different set of files used to evaluate the human upper bound (all files annotated by all annotators)
- MeD_human1, ... - annotations of each annotator
- goldData - gold labels for the files used to evaluate human performance
- separate, pref, concat, multiOutput - folders for each method
- MeD / TeD for the two domains
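To make the expected layout concrete, a hypothetical sketch of iterating over it (the hierarchy, the root folder, and the multiOutput folder name are assumptions based on the description above; the per-activity subfolders of "separate" are omitted for brevity):

```python
# Hypothetical walk over the prediction folders described above; evaluate.py
# defines the authoritative structure, so adapt paths and names as needed.
from pathlib import Path

base = Path("predictions")                       # placeholder root folder
for domain in ("MeD", "TeD"):                    # the two domains
    for method in ("pref", "concat", "multiOutput"):
        for run in range(1, 11):                 # 10 trained models per method
            model_dir = base / domain / method / f"{domain}_{method}{run}"
            if model_dir.is_dir():
                n_files = sum(1 for _ in model_dir.iterdir())
                print(f"{model_dir}: {n_files} prediction files")
```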