Partial Grading #152
Conversation
4791a49 to c42e68d
I have a few comments.
Also, we should add test cases for testing partial grade marks returned from the evaluator.
@@ -27,7 +28,7 @@ def teardown(self):
         delete_files(self.files)
         super(BashCodeEvaluator, self).teardown()
 
-    def check_code(self, user_answer, file_paths, test_case):
+    def check_code(self, user_answer, file_paths, partial_grading, test_case, marks):
check_code now returns 3 values, so the Returns description in the method docstring should be updated accordingly.
I will add further test cases and validations in a subsequent PR. I was hoping this could be merged so that further evaluator-based PRs can be designed accordingly.
The docstring can be updated now though.
@@ -1104,9 +1115,11 @@ class TestCase(models.Model):
 
 class StandardTestCase(TestCase):
     test_case = models.TextField(blank=True)
+    marks = models.FloatField(default=0.0)
So each test case will have marks!
Instead, I think a single constant fraction value for each test case is a better option.
Otherwise, we can sum the total test cases and use the ratio of right to total.
Agree with @prathamesh920 !
        user_answer.correct = correct
        user_answer.error = result.get('error')
        if correct:
If we have a fraction (value 0 to 1) returned from the code server, then we do not need this if block; we can simply write
user_answer.marks = question.points * result['marks']
If wrong, marks is 0; if right, marks is 1; else a fraction of the marks.
So there is no need to check correct or wrong for partial grading when adding marks.
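The fraction-based idea above can be sketched as follows (a minimal illustration; `assign_marks` and the exact dict shape are assumptions for this sketch, not the project's actual code):

```python
def assign_marks(points, result):
    # result['marks'] is assumed to be a fraction in [0, 1]:
    # 0 when wrong, 1 when fully right, in between for partial credit.
    return points * result.get('marks', 0.0)

full = assign_marks(10.0, {'success': True, 'marks': 1.0})   # full credit
part = assign_marks(10.0, {'success': False, 'marks': 0.4})  # partial credit
```

With this shape, the marks assignment needs no correct/incorrect branch at all.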
The if condition will have to stay.
When a question has no partial grading, the evaluator always returns 0.0.
So if a question is answered correctly but is not partially graded, it will end up being marked 0.0 (in case success=True is not checked).
I agree with @prathamesh920 that each test case having marks is non-optimal. I think I have a nicer way to specify this. Basically, treat the "marks" you have above as a weight for each test; then the total mark is sum(obtained weight)/total(weight)*marks. By default each test case has a weight of 1. This way is nicer I think and much easier to implement. A teacher may even change the total marks without having to worry about the individual weights.
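The weighting scheme proposed above can be sketched like this (a hedged sketch; `compute_marks` and the result tuples are illustrative names, not the project's API):

```python
def compute_marks(question_points, results):
    """Grade a question from per-test-case results.

    `results` is a list of (passed, weight) pairs. Each test case
    defaults to weight 1, so a teacher can change the question's
    total points without touching individual weights.
    """
    total_weight = sum(weight for _, weight in results)
    if total_weight == 0:
        return 0.0
    obtained = sum(weight for passed, weight in results if passed)
    return question_points * obtained / total_weight

# Three test cases with the default weight 1; two pass.
marks = compute_marks(6.0, [(True, 1), (True, 1), (False, 1)])
```

Note that doubling a single test's weight only changes the ratio, never the question's total points.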
Done
Apart from the comments made, I am OK with this being a two-stage PR; however, it will be good to resolve the current issues.
…ased partial grading
c42e68d to a09df64
There are a few comments; kindly resolve and test them.
I also feel that there should be test cases for partial grading, though they can be added in stage 2 while optimizing.
@@ -27,7 +28,7 @@ def teardown(self):
         delete_files(self.files)
         super(BashCodeEvaluator, self).teardown()
 
-    def check_code(self, user_answer, file_paths, test_case):
+    def check_code(self, user_answer, file_paths, partial_grading, test_case, weightage):
The docstring is not modified. This can be done in the next stage.
Instead of weightage, perhaps use weight.
Fixed
@@ -517,11 +517,16 @@ def check(request, q_id, attempt_num=None, questionpaper_id=None):
             if question.type == 'code' else None
         correct, result = paper.validate_answer(user_answer, question, json_data)
         if correct:
+            new_answer.marks = (question.points * result['weightage'] /
In code_evaluator, line 92 is the following:
result = {'success': success, 'error': error, 'marks': marks}
So result['weightage'] here will throw an error!
I had already fixed this in a minor commit, but it did not get pushed. I apologise; this has been fixed now.
            new_answer.error = result.get('error')
        else:
            new_answer.error = result.get('error')
            new_answer.marks = (question.points * result['weightage'] /
                                question.get_maximum_test_case_weightage()) \
                if question.partial_grading and question.type == 'code' else 0
        new_answer.save()
        paper.update_marks('inprogress')
        paper.set_end_time(timezone.now())
Below we see that there is a check if not result.get('success'):
and in the else branch we add the current question to the completed questions and show the next question.
So in case all my test cases fail except the last one, the question will still be added to the completed questions!
This has been fixed now
                weightage += test_case_weightage
                error = err
            else:
                error += err + "\n"
So for all test cases we are appending err.
If the first test case passes and the next test case fails, the
error message will be "Correct Answer" plus an Assertion Error.
This can be covered in your next PR related to error message formatting.
So maybe error should be a list?
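Collecting errors into a list, as suggested, might look like this (a sketch with assumed names; `run_test_cases` stands in for the evaluator's loop over `check_code` results):

```python
def run_test_cases(results):
    """results: list of (success, err, weight) tuples, one per test case.

    Returns (obtained_weight, errors), keeping errors as a list
    instead of one concatenated string.
    """
    obtained = 0.0
    errors = []
    for success, err, weight in results:
        if success:
            obtained += weight
        else:
            errors.append(err)  # only failures are recorded
    return obtained, errors

obtained, errors = run_test_cases([
    (True, "Correct Answer", 1.0),
    (False, "AssertionError: expected 4, got 5", 1.0),
])
```

Because passing cases never contribute to `errors`, the "Correct Answer + Assertion Error" mashup described above cannot occur; the caller can join or format the list however the UI needs.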
        else:
            err = ("Error:expected"
                   " {0}, got {1}").format(inst_stdout+inst_stderr,
                                           stdnt_stdout+stdnt_stderr)
-            return False, err
+            return False, err, test_case_weightage
Why is this returning the weightage?
I have initialised test_case_weightage as 0.0 earlier, hence the variable is returned as is.
However, I agree that this may be a bit unclear, so I am fixing it to return a constant 0.0.
-        result = json.dumps({'success': False, 'error': 'Unable to connect to any code servers!'})
+        result = json.dumps({'success': False,
+                             'weightage': 0.0,
+                             'error': 'Unable to connect to any code servers!'})
Please describe what the return values are in the docstring very clearly. This looks like the sum of the weights that the user code passed, correct?
Yes, that is correct. I have fixed the docstrings accordingly.
3636257 to c4a39ba
    test/have dissimilar output, when compared to the instructor script.
 
-    Returns (False, error_msg): If mandatory arguments are not files or if
+    Returns (False, error_msg, 0.0): If mandatory arguments are not files or if
In this case, should this just raise a custom exception?
        if file_paths:
            self.files = copy_files(file_paths)
        if not isfile(clean_ref_code_path):
            msg = "No file at %s or Incorrect path" % clean_ref_code_path
-            return False, msg
+            return False, msg, 0.0
All these should perhaps raise an exception, no?
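The exception-based approach suggested here could be sketched as follows (the exception name and `load_reference` helper are hypothetical, not part of the codebase):

```python
from os.path import isfile


class CodeEvaluationError(Exception):
    """Raised for setup problems (missing files, bad paths) that are
    environment errors rather than grading outcomes."""


def load_reference(clean_ref_code_path):
    # Instead of returning (False, msg, 0.0), fail loudly so callers
    # cannot confuse a broken setup with a zero-mark answer.
    if not isfile(clean_ref_code_path):
        raise CodeEvaluationError(
            "No file at %s or Incorrect path" % clean_ref_code_path)
    return clean_ref_code_path


try:
    load_reference("/no/such/file")
    got_error = False
except CodeEvaluationError:
    got_error = True
```

The evaluator's outer loop would then catch `CodeEvaluationError` once, rather than threading a sentinel 0.0 weight through every early return.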
        if test_case_success:
            weight += test_case_weight
 
            error += err + "\n"
This is dirty, can't we collect a list of errors?
@@ -47,11 +68,13 @@ def check_code(self, user_answer, file_paths, test_case):
         info = traceback.extract_tb(tb)
         fname, lineno, func, text = info[-1]
         text = str(test_case).splitlines()[lineno-1]
-        err = "{0} {1} in: {2}".format(type.__name__, str(value), text)
+        err = ("-----\nExpected Test Case:\n{0}\n"
Looks ugly, can't it be some HTML code or can this be delegated to something else?
Merging for now.
Add partial grading to evaluation