Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#10964] Paginate enroll calls for large courses (FE) #10968

Merged
merged 50 commits into from
Mar 30, 2021

Conversation

daongochieu2810
Copy link
Contributor

@daongochieu2810 daongochieu2810 commented Feb 16, 2021

Part of #10964

Outline of Solution

  • Enable auto row resizing for input table
  • Break down the student enrollment request into requests of at most 100 students
  • Use concat from rxjs to wait and combine the responses of child requests
  • Works fine with 1000 students with a waiting time of 2-3 minutes on staging server. UX can be improved with a progress indicator

Demo on localhost with split-and-retry mechanism: here

Copy link
Contributor

@moziliar moziliar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the code construct, it looks alright!

Some pointers I noted down in the comments.

Suggestion: add a post-hook in your local repo (and put in .gitignore) to lint your code whenever you commit (just put npm run lint and ./gradlew lint) should be fine? Since linter will be run now or later.

Would be good to include a gif on how this enrollment process looks in both UI and network call (devtool) perspectives. Will have to measure the memory pressure but should not be a major concern in this issue. Of course, can do some testing in staging server too!

@rrtheonlyone would appreciate some feedback on #10943 as it would definitely help make this PR's UX a lot smoother. In the interim, if no big issue, perhaps you could merge that branch into this branch and start working on integrating the error display while that PR is undergoing final review.

Edit: can also merge this first before doing the error message optimisation. Just a suggestion.

@moziliar moziliar added the s.Ongoing The PR is being worked on by the author(s) label Feb 16, 2021
@moziliar
Copy link
Contributor

Extra thought on the linearly scaled waiting time (4-5min for 1000 students): how can we accelerate this? Or is it a concern at all? @damithc

@damithc
Copy link
Contributor

damithc commented Feb 17, 2021

Extra thought on the linearly scaled waiting time (4-5min for 1000 students): how can we accelerate this? Or is it a concern at all? @damithc

Not at the moment. It's fine as long as the user sees regular progress.

Copy link
Contributor

@moziliar moziliar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem of misalignment still persists. Please look at the 300+ LoC. This will most likely lead to lint failure anyways (not sure why the CI is not running for this PR to master)

Copy link
Contributor

@moziliar moziliar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. But pending discussion on whether to increase the batch size and split on demand.

@moziliar moziliar added s.ToReview The PR is waiting for review(s) and removed s.Ongoing The PR is being worked on by the author(s) labels Feb 18, 2021
Copy link
Contributor

@moziliar moziliar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question for discussion: if you use chunk in many places, should you already define an interface for it to make the code look cleaner? Not a suggestion but a food for thought.

Could you snip a gif showcasing how it would work? I think a chunk size of 200 would be good demo for timeout and split.

Keep up the good work!

Copy link
Contributor

@moziliar moziliar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive 💯! I like the code construct in the current phase :)

Can you simulate the error occasion and see how the request is broken down into two halves? I believe this can be done by adding some lines to make some chunks larger.

Also, concerning @rrtheonlyone 's worry about quick succession and eventual consistency of the datastore we use, should we pad some interval in between enrolments to achieve a best effort consistency preservation? On the other hand, I did have the thought of using FE to perform pre-flight check on the to-enrol students. That is for another time. Given the latency in local server, I suspect a 1s backoff can allow the datastore instance to reach quorum most of the time.

Comment on lines 394 to 402
(resp: Students) => {
if (resp.students.length !== 0) {
this.loadExistingStudentsData(existingStudentsHOTInstance, resp.students);
} else {
// Shows a message if there are no existing students. Panel would not be expanded.
this.isExistingStudentsPresent = false;
this.isExistingStudentsPanelCollapsed = !this.isExistingStudentsPanelCollapsed; // Collapse the panel again
}
}, (resp: ErrorMessageOutput) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change of indentation here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Copy link
Contributor

@rrtheonlyone rrtheonlyone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work put in here 👍

To me this is not the way forward for this... just because we got away with it in downloading session results, doesn't mean it always works.

Reasons why I feel this solution is too naive:

  • the number you break up into (200?) is a magic number and just a hopeful toss at the moment (unless you have stress tested on this)
  • the correction mechanism is adding unnecessary complexity to the code base (why split in half and try API again?)
  • datastore is eventually consistent (we have no control over this and we can't wait it out and hope we get correct data) => this adds more problems since we may end up with wrong data in the db

My stance on this is that we should aim to optimize the logic on the backend - that is more firm and will stand the test of time. Of course, feel free to convince me otherwise.

There must be some way to come up with a new execution logic for validation + creation. From how we interact with the database and how we do the validation in the logic layer.

First step, I feel is to separate enrolling new students from updating old students (or be more clever about how we figure out if a student is already there). If that can be done, there are def ways we can optimize the validation and creation logic to easily handle <1000 students. We can do a projection query to retrieve the needed data / we can do a batch create for instance.

My intuition tells me that this batching with retry mechanism on the frontend like this is unnecessarily complex - we are operating in the scale of 100s (its not like there is millions of datapoints). However, I may be wrong - def warrants some discussion.

@moziliar
Copy link
Contributor

Undeniably, the current logic is too complex atm.

Below are some of my opinions:

  • A common approach to improving such workflow (making multiple user calls simpler) is usually just automate what the user does: performing the same action (enrolling students in batches) as the user would, but through the help of frontend code.

  • The doubt about eventual consistency and indeterministic processing time brings about the split and retry, which is actually safe as the enrolment is actually an upsert action which is idempotent.

Improving the workflow won't help if the bulk of the workload is in IO bound operation, likened to Amdahl's law. Paralleling the individual enrolment might help but depends on the instance resource. However, if batching student entity creation helps in DB later, it's worth investigating. To this, I'd like to check the scalability of the solution still. Although I'm sure we won't have to enrol a University worth of students into a single course.

@rrtheonlyone
Copy link
Contributor

I am not sure what you are trying to say above. I am bouncing the idea that we can try achieve this entire operation by making 1/2 API calls with no batching (with backend optimization).

The doubt about eventual consistency and indeterministic processing time brings about the split and retry, which is actually safe as the enrolment is actually an upsert action which is idempotent.

No - split and retry is only triggered upon error. What if there is no error but a succesful enrollment after validating with stale data?

I am suspecting that the eventual consistency issue is going to be worsened by the repeated API calls since it is only increases your chances of getting stale data from the database. I do not know why this is "actually safe" as you point out.

Even the current production version has this problem - just that the probability of it happening is almost zero since the instructor will take some time to break up and re-upload. (can you help me test and say the same for your solution here?)

A common approach to improving such workflow (making multiple user calls simpler) is usually just automate what the user does: performing the same action (enrolling students in batches) as the user would, but through the help of frontend code.

This is indeed what this PR is doing. I have already addressed my concerns to this idea above. We need to be clear that this is the best way forward for this. (e.g. show that no amount of backend optimisation can help).

Improving the workflow won't help if the bulk of the workload is in IO bound operation, likened to Amdahl's law.

Yes, that's right - why not optimize on the IO operations then? Something like getting only a subset of data (projections) and making a small amount of calls to enroll (batch operations). I am not talking about parallelising here haha.

@moziliar
Copy link
Contributor

We should look into BE optimization for sure. The current entity-by-entity enrolment logic can definitely be optimized (unsure about the benefit without concrete stress test result: say if we shave 70% of the latency and allow 250 students enrolment instead of 100, is this scalable enough? This might be my naive thought, will test it out)

Just bouncing some ideas against eventual consistency issue: would enforcing strong consistency at objectify level help (like ofy().consistency(ReadPolicy.STRONG))? Or for a single enrolment workflow (given the condition that no other instructors are performing enrolment with the exact same data set at the exact same time), would validating the data in the frontend minimise the problem?

Copy link
Contributor

@rrtheonlyone rrtheonlyone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Code looks much better.

Here is another odd behaviour:

Suppose we have for instance 200 students and there is an error in the 200th student that only the backend can catch. When we click enroll, we get back an error message about the 200th student. However, we don't see any enrollment panel about the successful enrollment of students 100-150. This is def tricky but we should be clear to the user about which students got enrolled - right now a user might think the whole batch failed and might get a surprise later on.

const noInvalidRowsBefore: number = this.invalidRowsIndex.size;

Array.from(studentEnrollRequests.keys()).forEach((key: number) => {
const request: StudentEnrollRequest | undefined = studentEnrollRequests.get(key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when would this be undefined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would not be undefined, but enforcing non-null is not permitted and so putting studentEnrollRequests.get(key)!! will not pass lint


if (teamSectionMap.get(request.team) !== request.section) {
this.invalidRowsIndex.add(key);
const firstIndex: number | undefined = teamIndexMap.get(request.team);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here - why undefined (also check for the others)

@daongochieu2810
Copy link
Contributor Author

@rrtheonlyone Hi I could not reproduce the scenario you mentioned. Here I try to enroll 200 students with the last one having an incorrect email which can only be caught by backend, but the panel still shows correct enrollment results
ezgif com-gif-maker

@rrtheonlyone
Copy link
Contributor

rrtheonlyone commented Mar 21, 2021

@daongochieu2810 Try put a duplicate team for the last student instead (duplicate team that can only be caught by the backend). I just tried with 51 students (where the 51st student has the error) and this is what I see:

captured

I verified that the first 50 have actually been uploaded.

@daongochieu2810
Copy link
Contributor Author

daongochieu2810 commented Mar 21, 2021

ezgif com-gif-maker (1)

@rrtheonlyone The result panel is populated if an error happens

@rrtheonlyone
Copy link
Contributor

rrtheonlyone commented Mar 21, 2021

Yup better now - however the enrollErrorMessage now becomes overridden by GENERAL_ERROR_MESSAGE. We should be seeing the error message corresponding to duplicate team that is returned by the backend.

@daongochieu2810
Copy link
Contributor Author

I solved that by setting error message after populating the result panels

Copy link
Contributor

@rrtheonlyone rrtheonlyone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@daongochieu2810 daongochieu2810 added s.FinalReview The PR is ready for final review and removed s.Ongoing The PR is being worked on by the author(s) labels Mar 22, 2021
@madanalogy madanalogy added s.ToMerge The PR is approved by all reviewers including final reviewer; ready for merging and removed s.FinalReview The PR is ready for final review labels Mar 23, 2021
@madanalogy madanalogy added the c.Feature User-facing feature; can be new feature or enhancement to existing feature label Mar 23, 2021
@madanalogy madanalogy added this to the V7.13.0 milestone Mar 23, 2021
@madanalogy
Copy link
Contributor

@daongochieu2810 can you resolve the conflicts and update me after

@daongochieu2810
Copy link
Contributor Author

@madanalogy I have resolved the conflicts

@madanalogy
Copy link
Contributor

Could you help to fix the failing component tests as well? Think you just need to update a snapshot as in #11051

@madanalogy madanalogy merged commit 72bafbf into TEAMMATES:master Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c.Feature User-facing feature; can be new feature or enhancement to existing feature s.ToMerge The PR is approved by all reviewers including final reviewer; ready for merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants