An Automated System for Essay Scoring of Online Exams in Arabic based on Stemming Techniques and Levenshtein Edit Operations
I implemented the system by following the research methodology described in this paper:
https://arxiv.org/pdf/1611.02815.pdf
The paper proposes an automated system for essay scoring in the Arabic language for online exams, based on stemming techniques and Levenshtein edit operations.
- Python 2.7
Some important files / directories:
- heavy_stemming.py
The complete source code for the heavy stemming approach
- light_stemming.py
The complete source code for the light stemming approach
- docs
Several text files, such as questions, correct_ans, and student_ans
- prefixes
Stores the list of prefixes
- suffixes
Stores the list of suffixes
- stopwords
Stores the list of stopwords
To run the program, execute the following command:
- Heavy stemming approach: python heavy_stemming.py
- Light stemming approach: python light_stemming.py
Both approaches (heavy and light stemming) use the following steps. They differ only in how prefixes and suffixes are removed.
- Begin Heavy Stemming on both student and correct answers
This initial step consists of two sub-steps: removing numbers from both answers and removing diacritics from both answers. For the latter, each answer is first converted to Unicode so that the diacritics can be removed.
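The two sub-steps above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the code from heavy_stemming.py or light_stemming.py: the helper name `normalize`, the diacritic range U+064B to U+0652, and the inclusion of Arabic-Indic digits are all assumptions.

```python
import re

# Assumption: the diacritics (tashkeel) to strip are U+064B..U+0652.
# The repository may use a different list.
DIACRITICS = re.compile(u"[\u064B-\u0652]")

# Assumption: "numbers" means ASCII digits plus Arabic-Indic digits
# (U+0660..U+0669).
DIGITS = re.compile(u"[0-9\u0660-\u0669]")


def normalize(answer):
    """Hypothetical helper: strip numbers and diacritics from one answer."""
    if isinstance(answer, bytes):
        # Work on Unicode text rather than raw bytes, so that each
        # diacritic is a single code point that the pattern can match.
        answer = answer.decode("utf-8")
    answer = DIGITS.sub(u"", answer)
    return DIACRITICS.sub(u"", answer)
```

The same function would be applied to both the student answer and the correct answer before any word-level processing.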
- Split each of the two answers into an array of words, processing one word at a time
This step removes stopwords, then removes the prefix if the word length is greater than 3, and removes the suffix if the word length is greater than 3.
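A rough sketch of this word-level step is shown below, written for Python 3. The tiny lists are illustrative only (the repository reads its own lists from the prefixes, suffixes, and stopwords files), and the helper names `strip_affixes` and `stem_answer` are hypothetical. Trying the longest affix first and removing at most one prefix and one suffix are also assumptions; the heavy and light approaches may differ on exactly this point.

```python
# Illustrative lists only, not the contents of the repository's files.
PREFIXES = [u"\u0627\u0644", u"\u0648"]          # "al", "wa"
SUFFIXES = [u"\u0627\u062a", u"\u0629"]          # "at", ta marbuta
STOPWORDS = {u"\u0641\u064a", u"\u0645\u0646"}   # "fi", "min"


def strip_affixes(word):
    """Remove at most one prefix and one suffix from a word."""
    # A prefix is removed only if the word is longer than 3 characters.
    if len(word) > 3:
        for prefix in sorted(PREFIXES, key=len, reverse=True):
            if word.startswith(prefix):
                word = word[len(prefix):]
                break
    # The length check is repeated on the (possibly shortened) word.
    if len(word) > 3:
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix):
                word = word[:-len(suffix)]
                break
    return word


def stem_answer(answer):
    """Split an answer on whitespace, drop stopwords, stem the rest."""
    return [strip_affixes(w) for w in answer.split() if w not in STOPWORDS]
```

The length check keeps short words intact, so a three-letter word that happens to start with a listed prefix is returned unchanged.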
- Find the similarities by giving a weight to each word in both answers
The weight formula for each word: Word(i) weight = 1 / (total words in correct answer)
- For each word in student answer, calculate the similarity with words in correct answer
This step calculates the Levenshtein distance between every word in the student answer and the words in the correct answer, and then calculates the similarity score for each of those word pairs.
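The distance and similarity calculation could look roughly like the sketch below. The Levenshtein function is the standard dynamic-programming formulation. The similarity normalization, `1 - distance / max(len(a), len(b))`, is an assumption, since this README does not state the formula; the function names are hypothetical as well.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed part of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical words, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - float(levenshtein(a, b)) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3, and under the assumed normalization `similarity("abcd", "abce")` is 0.75.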
- Calculate the final mark
These are the rules for calculating the final mark:
- If the similarity between StudentWord(i) and CorrectWord(i) = 1 then add weight to the final mark
- Elseif the similarity between StudentWord(i) and CorrectWord(i) < 1 and >= 0.96, add weight to the final mark
- Elseif the similarity between StudentWord(i) and CorrectWord(i) >= 0.8 and < 0.96, add half the weight to the final mark
- Elseif the similarity between StudentWord(i) and CorrectWord(i) < 0.8 then no weight is added to the final mark
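The rules above, combined with the per-word weight of 1 / (total words in correct answer), can be sketched as follows. The function name `final_mark` is hypothetical. Because the text does not spell out how each student word is paired with a correct word, this sketch takes the per-word similarity scores as already computed.

```python
def final_mark(similarities, total_correct_words):
    """Sum the weights earned by each student word.

    similarities: one similarity score per student word (assumed input).
    total_correct_words: number of words in the correct answer.
    """
    weight = 1.0 / total_correct_words
    mark = 0.0
    for score in similarities:
        if score >= 0.96:
            # Covers both score == 1 and 0.96 <= score < 1: full weight.
            mark += weight
        elif score >= 0.8:
            # 0.8 <= score < 0.96: half the weight.
            mark += weight / 2
        # score < 0.8: no weight is added.
    return mark
```

As a usage example, `final_mark([1.0, 0.97, 0.85, 0.5], 4)` gives 0.25 + 0.25 + 0.125 + 0 = 0.625.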
Albertus Kelvin
Bandung Institute of Technology
Code was developed on January 21st, 2018
Code was made publicly available on January 31st, 2018