
Auto-grading of students' answers

Abstract:

In this project, we aim to assess students' answers in quizzes and exams and to give proper feedback accordingly. To this end, we annotated about 6857 students' answers from FFHS courses. Compared against the teacher's reference answer, each student's answer can be correct, wrong, or partially correct, so this is a multi-class classification problem. We tackle the problem with the BERT model: we add a fully-connected layer followed by a softmax (for classification) on top of the pre-trained BERT architecture and fine-tune it on our dataset. We obtain an accuracy of 70% on this dataset. In addition, we evaluate the performance of our model on two public datasets, CSSAG and CREG, and show that it significantly outperforms the state-of-the-art accuracy on these datasets.

Introduction:

This is the official code for the paper "", accepted and published in "".

Contents


Repository structure


├── auto-grading-project                 : Project repository
    ├── BERT-model.py                    : script for training and validation
    ├── utils.py                         : script for data preparation and preprocessing
    ├── script.sh                        : script for running jobs
    ├── run-script.sh                    : script for running a job
    ├── cm*.png                          : confusion matrices for various datasets
    ├── output_all_13                    : results for random seed 13
    ├── requirements.txt                 : requirements 

Proposed model


We use the Hugging Face implementation of the BERT language representation model for our classification task. BERT (Bidirectional Encoder Representations from Transformers) was introduced by Google researchers in 2018 and has since been one of the best-performing language models for a variety of NLP tasks such as sentiment analysis, question answering, and text classification, to name but a few. BERT's base architecture (shown in the figure below) is a multi-layer bidirectional Transformer encoder, similar to the one in the "Attention Is All You Need" paper, pre-trained to predict masked words in the input sentence (masked language modeling, MLM) and to predict whether the second sentence in a pair of input sentences follows the first (next sentence prediction, NSP). Once pre-trained, the model can be fine-tuned for various downstream tasks by adding task-specific layers on top of its output.

In this approach, we use a BERT model pre-trained on German text and fine-tune it for our classification task. We do this by adding a fully-connected layer followed by a softmax on top of BERT's base architecture and training on our dataset. Since each input consists of two sentences, namely the student's answer and the teacher's answer, we concatenate the two sentences using a [SEP] token.

BERT base model architecture
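The following is a minimal sketch of this fine-tuning setup, assuming the Hugging Face transformers library and the public bert-base-german-cased checkpoint; the checkpoint, hyper-parameters, and example sentences here are illustrative and not necessarily the ones used in BERT-model.py:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Illustrative German BERT checkpoint; BERT-model.py may use a different one.
tokenizer = BertTokenizer.from_pretrained("bert-base-german-cased")
# BertForSequenceClassification adds a linear classification head (softmax applied
# at inference time) on top of the pooled [CLS] representation;
# num_labels=3 corresponds to correct / partially correct / wrong.
model = BertForSequenceClassification.from_pretrained("bert-base-german-cased", num_labels=3)

student_answer = "Die Hauptstadt der Schweiz ist Bern."  # made-up example
teacher_answer = "Bern ist die Hauptstadt der Schweiz."  # made-up example

# Passing the two texts as a pair yields [CLS] student [SEP] teacher [SEP]
# together with the corresponding segment (token type) ids.
inputs = tokenizer(student_answer, teacher_answer, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))  # class probabilities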

Results


For each dataset (CSSAG, CREG, and SAVOA) we split the data into 60% training, 20% validation, and 20% testing. The accuracy (%) of the model on each test set is reported in the tables below:
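As a minimal sketch (assuming pandas and scikit-learn, and a hypothetical answers.csv file; the actual preparation is done by utils.py), such a 60/20/20 split can be obtained with two consecutive calls to train_test_split:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("answers.csv")  # hypothetical annotated answers file
train_df, rest_df = train_test_split(df, train_size=0.6, random_state=13)     # 60% training
val_df, test_df = train_test_split(rest_df, train_size=0.5, random_state=13)  # 20% validation / 20% testing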

Dataset CSSAG (University of Stuttgart)

This is the original dataset as mentioned in the paper.

Model | binary classification (strict) | binary classification (generous) | multi-class classification (complete, fail, partial)
BERT  | 88                             | 86                               | 76

Dataset SAVOA (IFEL)

Model | multi-class classification (0, 1, 2, 3)
BERT  | 70

Dataset CREG (University of Tübingen)

Model | binary classification
BERT  | 92

Installation


Start by cloning this repository:

git clone https://github.com/bpfrd/auto-grading-project.git
cd auto-grading-project

Linux:

Create and activate a virtual environment:

virtualenv <myenv>
source <myenv>/bin/activate

And install the dependencies:

pip install -r requirements.txt

Windows:

Create a virtual environment:

python.exe -m venv <myenv>

And install the requirements using

<myenv>/Scripts/python.exe -m pip install -r requirements.txt

Dataset


  • We use the private dataset SAVOA from FFHS and two public datasets, CSSAG and CREG.

Training/Testing


The training is done in two steps. The first step is data preparation. To prepare the training, validation, and test sets, run the command below:

Linux: python utils.py --datatset-type <dataset_type> --class-type <class_type> [--other-argument other-argument]
Windows: <myenv>/Scripts/python.exe utils.py --datatset-type <dataset_type> --class-type <class_type> [--other-argument other-argument]

Where <dataset_type> can be CSSAG3, CREG, or SAVOA; <class_type> is required when the dataset is CSSAG3 and can be strict_class, generous_class, or multi-class (an example invocation is shown after the parameter list below).

Other optional parameters are:

  • --random-state random_state: random state for shuffling the dataset, int, optional
  • --train-size train_size: ratio of trainset, float (between 0 and 1), optional
  • --val-size val_size: ratio of validation set, float (between 0 and 1), optional
  • --trainset-fname trainset_fname: filename for training set, str, optional
  • --valset-fname valset_fname: filename for val set, str, optional
  • --testset-fname testset_fname: filename for test set, str, optional
  • --labels-fname labels_fname: filename for labels, str, optional
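
For example, preparing the SAVOA splits might look like the following; the file names here are illustrative (the optional arguments have defaults in utils.py):

python utils.py --datatset-type SAVOA --random-state 13 --train-size 0.6 --val-size 0.2 --trainset-fname train.csv --valset-fname val.csv --testset-fname test.csv --labels-fname labels.csv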

In the second step (training), the training, validation, and test sets as well as the labels (trainset_fname, valset_fname, testset_fname, labels_fname) are read by BERT-model.py:

Linux: python BERT-model.py --mode mode --model-path model_path --trainset-fname trainset_fname --valset-fname valset_fname --testset-fname testset_fname --labels-fname labels_fname --output-dir output_dir --log-dir log_dir --cm cm
Windows: <myenv>/Scripts/python.exe BERT-model.py --mode mode --model-path model_path --trainset-fname trainset_fname --valset-fname valset_fname --testset-fname testset_fname --labels-fname labels_fname --output-dir output_dir --log-dir log_dir --cm cm

The parameters are:

  • --mode mode: mode can be either train or evaluate, str, required
  • --model-path model_path: path to save the model for training (if mode is train) or load the model for evaluation (if mode is evaluate), str, required
  • --trainset-fname trainset_fname: path to training set, str, required for training
  • --valset-fname valset_fname: path to val set, str, required for training
  • --testset-fname testset_fname: path to test set, str, required for evaluation
  • --labels-fname labels_fname: path to labels, str, required for evaluation
  • --output-dir output_dir: path to save the output, str, required for training
  • --log-dir log_dir: path to save logs, str, required for training
  • --cm cm: path to save confusion matrix visualization for test set, str, optional
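
For example, a training run followed by an evaluation run might look like this; the paths and file names are illustrative:

python BERT-model.py --mode train --model-path ./model --trainset-fname train.csv --valset-fname val.csv --output-dir ./output --log-dir ./logs
python BERT-model.py --mode evaluate --model-path ./model --testset-fname test.csv --labels-fname labels.csv --cm cm_savoa.png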

For simplicity, two bash scripts, run-script.sh and script.sh, are provided and can be run as:

./script.sh | tee output_all_13

Tested Environments


  • Windows 10
  • Linux

Acknowledgments


Citation

