[TODO] Overhaul scheduling and thread / process lifecycle #48

Open · 6 tasks
byteduck opened this issue Feb 4, 2023 · 5 comments
Assignees: byteduck
Labels: critical (This issue affects the core functionality of duckOS), kernel (The issue is about the kernel)

Comments

@byteduck (Owner)

byteduck commented Feb 4, 2023

The current way that threads and processes are managed is unsafe and breaks easily. Thread, Process, and TaskManager contain some of the oldest code in the entire kernel, and I've learned a lot since writing most of it. In no particular order, here is a list of things that need to be done:

  • Manage the deallocation / destruction of processes using reference counting (rough sketch at the end of this comment).
  • Overhaul thread blocking - blockers should unblock a thread on-demand instead of needing to iterate over every blocked thread on every preemption to check whether they're ready to be unblocked.
  • Replace the current round-robin scheduling with something better that allows for different thread priorities.
  • Keep track of locks held by threads and the order in which they were acquired to prevent deadlocks.
  • Make exec work in-place instead of creating a new Process and setting the old one's pid to -1.
  • Reduce preemption overhead as much as possible.

I'm sure there's more I'm forgetting, and I'll add to this list as I think of things.
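
For the reference-counting item, here's a rough sketch of the shape this could take, using std::shared_ptr / std::weak_ptr as stand-ins for whatever ref-counting primitive ends up in the kernel. None of these names are the real duckOS API, and the ownership direction shown is just one possible choice:

```cpp
#include <memory>
#include <vector>

// Illustrative sketch only: a Process whose lifetime is governed by
// reference counting rather than manual deletion in the scheduler.
class Process : public std::enable_shared_from_this<Process> {
public:
    static std::shared_ptr<Process> create(int pid) {
        return std::shared_ptr<Process>(new Process(pid));
    }

    void add_child(const std::shared_ptr<Process>& child) {
        m_children.push_back(child);          // parent keeps children alive until reaped
        child->m_parent = weak_from_this();   // child only weakly references the parent
    }

    void reap() {
        // Dropping the shared_ptrs held by the parent (and by the scheduler's
        // run queues) is what actually frees the Process; nobody calls delete.
        m_children.clear();
    }

private:
    explicit Process(int pid): m_pid(pid) {}

    int m_pid;
    std::weak_ptr<Process> m_parent;                 // weak to avoid a refcount cycle
    std::vector<std::shared_ptr<Process>> m_children;
};
```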

@byteduck added the suggestion, kernel, and critical labels on Feb 4, 2023
@byteduck self-assigned this on Feb 4, 2023
@MarcoCicognani (Contributor)

To improve the performance of the scheduler and the waiting infrastructure, it's a good idea to keep the waiting-state logic inside the execution context of the blocked thread.

What I mean is: instead of removing the blocked thread from the scheduling list and putting it into a special list that is iterated on every context switch to check the waiting state, keep the thread in the scheduling list; when it's in the waiting state, it spends its time slice in the kernel context doing a while (need_waiting()) { yield() }.

This simplifies the Scheduler code and reduces unsafe code.
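
Roughly this shape, as a sketch; need_waiting() and yield() are placeholders rather than real duckOS functions, and yield() is mapped to std::this_thread::yield() here just so the sketch compiles as a user-space analogy:

```cpp
#include <atomic>
#include <thread>

// Minimal sketch: the blocked thread stays in the run queue and spends its
// time slice re-checking its wait condition in kernel context, yielding
// until the condition clears.
struct Blocker {
    std::atomic<bool> ready { false };
    bool need_waiting() const { return !ready.load(std::memory_order_acquire); }
};

inline void yield() { std::this_thread::yield(); }

void block_on(Blocker& blocker) {
    while (blocker.need_waiting())
        yield(); // hand the CPU back; re-check on the next time slice
}
```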

@byteduck (Owner, Author)

byteduck commented Feb 5, 2023

To improve the performance of the scheduler and the waiting infrastructure, it's a good idea to keep the waiting-state logic inside the execution context of the blocked thread.

What I mean is: instead of removing the blocked thread from the scheduling list and putting it into a special list that is iterated on every context switch to check the waiting state, keep the thread in the scheduling list; when it's in the waiting state, it spends its time slice in the kernel context doing a while (need_waiting()) { yield() }.

This simplifies the Scheduler code and reduces unsafe code.

That could be a possible solution; I hadn't thought of that! My only worry with doing it that way is that context switches are fairly expensive (we have to invalidate the TLB, swap out registers, etc.).

The way I was thinking of doing it was this: instead of having the blocked thread (in waitpid with a WaitBlocker, for example) constantly check whether any children have died and then unblock itself, we could just have the children unblock any pertinent waiting thread(s) when they die.
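
Something along these lines, as a very rough sketch; the class and method names here are illustrative, not the actual duckOS ones:

```cpp
#include <vector>

// Sketch of on-demand unblocking for waitpid: the waiting thread registers a
// WaitBlocker, goes to sleep, and a dying child wakes it up directly instead
// of the scheduler polling the wait condition on every preemption.
class Thread {
public:
    void block()   { m_blocked = true;  /* scheduler removes us from the run queue */ }
    void unblock() { m_blocked = false; /* scheduler puts us back on the run queue  */ }
private:
    bool m_blocked = false;
};

class WaitBlocker {
public:
    explicit WaitBlocker(Thread& waiter): m_waiter(waiter) {}
    void notify_child_died(int pid, int status) {
        m_dead_pid = pid;
        m_status = status;
        m_waiter.unblock(); // wake the waiter immediately; no polling involved
    }
private:
    Thread& m_waiter;
    int m_dead_pid = -1;
    int m_status = 0;
};

class Process {
public:
    void add_wait_blocker(WaitBlocker* blocker) { m_wait_blockers.push_back(blocker); }
    void die(int status) {
        // On exit, poke every blocker that a waiting parent registered on us.
        for (auto* blocker : m_wait_blockers)
            blocker->notify_child_died(m_pid, status);
    }
private:
    int m_pid = 0;
    std::vector<WaitBlocker*> m_wait_blockers;
};
```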

@MarcoCicognani (Contributor)

This could also be a solution.
The one I proposed tries to simplify the Scheduler code and the way waiting is handled.
Other OS kernels use this approach; I don't know about SerenityOS, but GhostOS and EscapeOS do for sure.

@byteduck (Owner, Author)

byteduck commented Feb 5, 2023

This could also be a solution.

The one I proposed tries to simplify the Scheduler code and the way waiting is handled.

Other OS kernels use this approach; I don't know about SerenityOS, but GhostOS and EscapeOS do for sure.

Ah okay, I wasn't aware they did that! Doing it the way you suggested would be easier and require rewriting a lot less code, so it's definitely worth looking into.

The main issue with the way it's done now is that we cannot acquire any locks, yield, etc. while evaluating block conditions, since that happens in the preemption logic; doing it as you suggested would fix that.

@byteduck (Owner, Author)

byteduck commented Feb 10, 2023

What I mean is: instead of removing the blocked thread from the scheduling list and putting it into a special list that is iterated on every context switch to check the waiting state, keep the thread in the scheduling list; when it's in the waiting state, it spends its time slice in the kernel context doing a while (need_waiting()) { yield() }.

This simplifies the Scheduler code and reduces unsafe code.

Just an update - I tried this method and compared it to the old one of looping through blocked threads on every preemption. Essentially, after preemption, if a thread is blocked, it checks whether it should unblock; if not, it yields to the next thread in the queue.

On my machine, it is about 30% slower to boot this way, probably due to the overhead of having those blocked threads in the queue, since the vast majority of threads are blocked at any given time. Preempting without switching out the page tables when the thread is blocked (since we only care about kernel memory anyway) helps a little, but it's still a lot slower than the old way. This could be alleviated by moving blocked threads to a lower-priority queue, but that would also hurt responsiveness, since threads would take longer on average to wake back up.

Ultimately, I think an on-demand thread blocking system would be best, where a thread is immediately added back to the queue once it's ready instead of constantly polling whether or not it's ready. This will require a lot of changes, but it should result in higher responsiveness.
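
In that design the ready queue only ever holds runnable threads, and unblocking pushes a thread straight back onto it. Very roughly, something like this (made-up names, not the actual scheduler code):

```cpp
#include <deque>
#include <memory>

// Sketch of the "on-demand" shape: blocked threads never sit in the ready
// queue, so preemption doesn't have to look at them at all; unblock()
// re-inserts a thread the moment whatever it was waiting on completes.
struct Thread {
    int tid = 0;
    bool blocked = false;
};

class Scheduler {
public:
    // Called by a blocker (e.g. when a child dies or I/O finishes).
    void unblock(std::shared_ptr<Thread> thread) {
        thread->blocked = false;
        m_ready.push_back(std::move(thread)); // runnable again, effective immediately
    }

    // The preemption path only ever sees runnable threads.
    std::shared_ptr<Thread> pick_next() {
        if (m_ready.empty())
            return nullptr;              // nothing runnable: run the idle task
        auto next = m_ready.front();     // plain round-robin for this sketch
        m_ready.pop_front();
        return next;
    }

private:
    std::deque<std::shared_ptr<Thread>> m_ready; // runnable threads only
};
```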

@byteduck removed the suggestion label on Feb 10, 2023