Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement batch scheduling bridge call #1901

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Ninjaclasher
Copy link
Member

@Ninjaclasher Ninjaclasher commented Mar 21, 2022

Closes #1318.

@quantum5
Copy link
Member

Did you know that you can edit the description?

@Ninjaclasher Ninjaclasher force-pushed the batch-bridge branch 5 times, most recently from 825d4db to 41806c7 Compare March 28, 2022 00:16
@Riolku
Copy link
Contributor

Riolku commented Mar 31, 2022

I scheduled 56 submissions for rejudging and nothing broke 👍 .

Copy link
Contributor

@Riolku Riolku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar enough with the bulk of the code in this PR to say if this code is ok, but I mean, nothing looks decidedly wrong I guess...

@@ -160,7 +160,6 @@ def judge(self, request, queryset):
self.message_user(request, gettext('You do not have the permission to rejudge submissions.'),
level=messages.ERROR)
return
queryset = queryset.order_by('id')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this unrelated to the main focus of this PR? Just curious.


judge.alters_data = True

@classmethod
def batch_judge(cls, submissions, *, rejudge=False, force_judge=False, rejudge_user=None, chunk_size=100, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be less clunky to put this on the QuerySet given how you are using it?

}

with transaction.atomic():
submission_queryset = Submission.objects.filter(id__in=map(attrgetter('id'), submissions)) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the argument is a queryset, then you wouldn't need to do this hack and could just do .exclude on the queryset directly. And then you can just load this queryset to get the submissions instead of filtering in Python.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the argument is a queryset, that means it will be lazy. We would be doing the splicing at the database layer, so the slice may change between batches. For example, consider we were trying to rejudge Submission.objects.exclude(status='QU'). As we're processing each batch, [0:batch_size] would always contain a not judged queryset since the previous batch's submissions would become QU (and thus filtered out). We would only process the [0:batch_size] splice once, and so we would be missing some submissions.

From what I could tell, there's 2 solutions:

  • Use a single batch. This wouldn't handle large querysets well, so I opted not to do this.
  • Evaluate the queryset so we have a fixed set of submissions to judge in each batch, which is what is currently done via .iterator().

Other solutions are welcome.

ids = set(submission_queryset.values_list('id', flat=True))
# Do the filtering using the new set of IDs rather than submission_queryset
# because submission_queryset itself takes a list of IDs anyways.
SubmissionTestCase.objects.filter(submission_id__in=ids).delete()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then you can just pass submission_queryset as the argument here.

# It's not worth further complicating this code by separating contest submissions and
# normal submissions as that case is not even being used.
# For the moment, let's just set all submissions to have "DEFAULT_PRIORITY" if we're not rejudging.
'priority': BATCH_REJUDGE_PRIORITY if rejudge else DEFAULT_PRIORITY,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just move the priority into the list? Then you can set per-submission priorities. In fact priority is a per-submission priority and should be in there anyway...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Batch queueing functionality in bridge
3 participants