
Auto-grading of students' answers

Abstract:

In this project, we aim to assess students' answers in quizzes and exams and to give proper feedback accordingly. To this end, we annotated about 6857 students' answers from FFHS courses. Compared against the teacher's reference answer, each student's answer can be correct, wrong, or partially correct, so this is a multi-class classification problem. We tackle the problem with the BERT model: we add a fully-connected layer followed by a softmax (for classification) on top of the pre-trained BERT architecture and fine-tune it on our dataset. We obtain an accuracy of 70% on this dataset. In addition, we evaluate the performance of our model on two public datasets, CSSAG and CREG, and show that it significantly outperforms the state-of-the-art accuracy on these datasets.

Introduction:

This is the official code for the paper "", accepted and published in "".

Contents


Repository structure


├── auto-grading-project                 : Project repository
    ├── BERT-model.py                    : script for training and validation
    ├── utils.py                         : script for data preparation and preprocessing
    ├── script.sh                        : script for running jobs
    ├── run-script.sh                    : script for running a job
    ├── cm*.png                          : confusion matrices for various datasets
    ├── output_all_13                    : results for random seed 13
    ├── requirements.txt                 : requirements 

Proposed model


We use the Hugging Face implementation of the BERT language representation model for our classification task. BERT (Bidirectional Encoder Representations from Transformers) was introduced by Google researchers in 2018 and has since been one of the best-performing language models for a variety of NLP tasks such as sentiment analysis, question answering, and text classification, to name but a few. BERT's base architecture (shown in the figure below) is a multi-layer bidirectional Transformer encoder, similar to the one in the "Attention Is All You Need" paper, pre-trained to predict masked words in the input sentence (masked language modeling, MLM) and to predict whether the second sentence in a pair of input sentences follows the first (next sentence prediction, NSP). Once pre-trained, the model can be fine-tuned for various downstream tasks by adding task-specific layers on top of its output.

In this approach, we use a BERT model pre-trained on German text and fine-tune it for our classification task. We do this by adding a fully-connected layer followed by a softmax on top of BERT's base architecture and training on our dataset. Since each input consists of two sentences, namely the student's answer and the teacher's answer, we concatenate the two sentences using a [SEP] token.

BERT base model architecture
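The following is a minimal sketch of this fine-tuning setup, assuming the Hugging Face transformers library and the public bert-base-german-cased checkpoint; the checkpoint, hyper-parameters, and example sentences here are illustrative and not necessarily the ones used in BERT-model.py:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Illustrative German BERT checkpoint; BERT-model.py may use a different one.
tokenizer = BertTokenizer.from_pretrained("bert-base-german-cased")
# BertForSequenceClassification adds a linear classification head (softmax applied
# at inference time) on top of the pooled [CLS] representation;
# num_labels=3 corresponds to correct / partially correct / wrong.
model = BertForSequenceClassification.from_pretrained("bert-base-german-cased", num_labels=3)

student_answer = "Die Hauptstadt der Schweiz ist Bern."  # made-up example
teacher_answer = "Bern ist die Hauptstadt der Schweiz."  # made-up example

# Passing the two texts as a pair yields [CLS] student [SEP] teacher [SEP]
# together with the corresponding segment (token type) ids.
inputs = tokenizer(student_answer, teacher_answer, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))  # class probabilities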

Results


For each dataset (CSSAG, CREG, and SAVOA) we split the data into 60% training, 20% validation, and 20% testing. The accuracy (%) of the model on each test set is reported in the tables below:
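As a minimal sketch (assuming pandas and scikit-learn, and a hypothetical answers.csv file; the actual preparation is done by utils.py), such a 60/20/20 split can be obtained with two consecutive calls to train_test_split:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("answers.csv")  # hypothetical annotated answers file
train_df, rest_df = train_test_split(df, train_size=0.6, random_state=13)     # 60% training
val_df, test_df = train_test_split(rest_df, train_size=0.5, random_state=13)  # 20% validation / 20% testing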

Dataset CSSAG (University of Stuttgart)

This is the original dataset as mentioned in the paper.

Model | binary classification (strict) | binary classification (generous) | multi-class classification (complete, fail, partial)
BERT  | 88                             | 86                               | 76

Dataset SAVOA (IFEL)

Model | multi-class classification (0, 1, 2, 3)
BERT  | 70

Dataset CREG (University of Tübingen)

Model | binary classification
BERT  | 92

Installation


Start by cloning this repository:

git clone https://github.com/bpfrd/auto-grading-project.git
cd auto-grading-project

Linux:

Create and activate a virtual environment:

virtualenv <myenv>
source <myenv>/bin/activate

And install the dependencies:

pip install -r requirements.txt

Windows:

Create a virtual environment:

python.exe -m venv <myenv>

And install the requirements using

<myenv>/Scripts/python.exe -m pip install -r requirements.txt

Dataset


  • We use the private dataset SAVOA from FFHS and two public datasets, CSSAG and CREG.

Training/Testing


The training is done in two steps. The first step is data preparation. To prepare the training, validation, and test sets, run the command below:

Linux: python utils.py --datatset-type <dataset_type> --class-type <class_type> [--other-argument other-argument]
Windows: <myenv>/Scripts/python.exe utils.py --datatset-type <dataset_type> --class-type <class_type> [--other-argument other-argument]

Where <dataset_type> can be CSSAG3, CREG, or SAVOA; <class_type> is required when the dataset is CSSAG3 and can be strict_class, generous_class, or multi-class (an example invocation is shown after the parameter list below).

Other optional parameters are:

  • --random-state random_state: random state for shuffling the dataset, int, optional
  • --train-size train_size: ratio of trainset, float (between 0 and 1), optional
  • --val-size val_size: ratio of validation set, float (between 0 and 1), optional
  • --trainset-fname trainset_fname: filename for training set, str, optional
  • --valset-fname valset_fname: filename for val set, str, optional
  • --testset-fname testset_fname: filename for test set, str, optional
  • --labels-fname labels_fname: filename for labels, str, optional
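
For example, preparing the SAVOA splits might look like the following; the file names here are illustrative (the optional arguments have defaults in utils.py):

python utils.py --datatset-type SAVOA --random-state 13 --train-size 0.6 --val-size 0.2 --trainset-fname train.csv --valset-fname val.csv --testset-fname test.csv --labels-fname labels.csv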

In the second step (training), the training, validation, and test sets as well as the labels (trainset_fname, valset_fname, testset_fname, labels_fname) are read by BERT-model.py:

Linux: python BERT-model.py --mode mode --model-path model_path --trainset-fname trainset_fname --valset-fname valset_fname --testset-fname testset_fname --labels-fname labels_fname --output-dir output_dir --log-dir log_dir --cm cm
Windows: <myenv>/Scripts/python.exe BERT-model.py --mode mode --model-path model_path --trainset-fname trainset_fname --valset-fname valset_fname --testset-fname testset_fname --labels-fname labels_fname --output-dir output_dir --log-dir log_dir --cm cm

The parameters are:

  • --mode mode: mode can be either train or evaluate, str, required
  • --model-path model_path: path to save the model for training (if mode is train) or load the model for evaluation (if mode is evaluate), str, required
  • --trainset-fname trainset_fname: path to training set, str, required for training
  • --valset-fname valset_fname: path to val set, str, required for training
  • --testset-fname testset_fname: path to test set, str, required for evaluation
  • --labels-fname labels_fname: path to labels, str, required for evaluation
  • --output-dir output_dir: path to save the output, str, required for training
  • --log-dir log_dir: path to save logs, str, required for training
  • --cm cm: path to save confusion matrix visualization for test set, str, optional
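
For example, a training run followed by an evaluation run might look like this; the paths and file names are illustrative:

python BERT-model.py --mode train --model-path ./model --trainset-fname train.csv --valset-fname val.csv --output-dir ./output --log-dir ./logs
python BERT-model.py --mode evaluate --model-path ./model --testset-fname test.csv --labels-fname labels.csv --cm cm_savoa.png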

For simplicity, two bash scripts, run-script.sh and script.sh, are provided and can be run as:

./script.sh | tee output_all_13

Tested Environments


  • Windows 10
  • Linux

Acknowledgments


Citation

