New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fiber/Scheduler refactorings #6897

Merged
merged 7 commits into from Oct 9, 2018

Conversation

Projects
None yet
5 participants
@ysbaddaden
Member

ysbaddaden commented Oct 1, 2018

This pull request is a collection of refactors of Fiber, Event, Scheduler and EventLoop (previously merged into Scheduler). This is meant as a first step to isolate the Scheduler from the rest (fibers, libevent) and centralize the scheduling logic to the Scheduler.

The proposed changes shouldn't break anything, and don't seem to have much impact on the overall performance —from my limited testing. They should, however, simplify experiments on Scheduler itself (e.g. multithreads).

Each commit tries to be a single change, with the commit message explaining what and why.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:multithread-preparations branch from b8a153f to 55f6096 Oct 1, 2018

@RX14

RX14 approved these changes Oct 1, 2018

Show resolved Hide resolved src/thread.cr Outdated
Show resolved Hide resolved src/fiber.cr Outdated

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:multithread-preparations branch from 55f6096 to f2879a9 Oct 1, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Oct 1, 2018

I removed Fiber.sleep and used an explicit initializer for Thread.@@current_key.

class Scheduler
@@runnables = Deque(Fiber).new
@@eb = Event::Base.new
class Crystal::EventLoop

This comment has been minimized.

@straight-shoota

straight-shoota Oct 1, 2018

Contributor

Why not a module?

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 1, 2018

Member

Probably because Scheduler used to be a class with only class methods too, and I overlooked that.

This comment has been minimized.

@bew

bew Oct 1, 2018

Contributor

would it be possible to keep it as a class, and make all current class methods → instance methods, and ultimately allow to make multiple event loops? (no idea what usecase there could possibly be for that though)

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 1, 2018

Member

I don't have any plans to experiment with many event loops. Quite the opposite.

class Thread
# :nodoc:
def scheduler
@scheduler ||= Crystal::Scheduler.new

This comment has been minimized.

@bcardiff

bcardiff Oct 1, 2018

Member

Currently the scheduler is global. In order to keep the current behavior this should be a class var.
I think there will be multiple schedulers with MT, so the accessing the scheduler via the current thread is ok.

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 1, 2018

Member

Is there a downside to attach the scheduler on the thread now?

# :nodoc:
def makecontext(stack_ptr, fiber_main) : Void*
# align the stack pointer to 16 bytes
stack_ptr = Pointer(Void*).new(stack_ptr.address & ~0x0f_u64)

This comment has been minimized.

@bcardiff

bcardiff Oct 1, 2018

Member

The alignment was done for every arch before. Wouldn't be better to keep that before the call to makecontext?

It would be great to leave somewhere a doc related to makecontext and swapcontext (and their required annotations).

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 1, 2018

Member

It's still done for every arch but that's probably not required on all architectures, so evolutions can choose to align or not. Is this required for something (like BDWGC)?

Yes, there should be some documentation for the arch-specific methods (that take their names off the deprecated POSIX makecontext / setcontext / swapcontext). I just wasn't exactly sure where to put them. Maybe in fiber.cr itself?

This comment has been minimized.

@bcardiff

bcardiff Oct 1, 2018

Member

I am not sure it is a requirement. But it itches me to repeat that "cryptic" call in every method.
Documentation in fiber.cr seems good.

{% unless flag?(:openbsd) %}
@@current = self
{% end %}
Thread.current = self

This comment has been minimized.

@bcardiff

bcardiff Oct 1, 2018

Member

For the main thread the #start is never called, so setting the current thread by this line wont happen.
In my WIP for mt-gc at bcardiff@aedde16 I changed how main fiber / thread will initialize.

On platforms with ThreadLocal there will be 2 instances of Thread objects with the current code since the @@current = new will be evaluated again.

For now, performing Thread.current = self in both constructors seem enough.

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 2, 2018

Member

I now always call self.current = new in Thread, but maybe it would be better placed in src/kernel.cr (eventually).

@@ -6,6 +6,10 @@ class Thread
# Don't use this class, it is used internally by the event scheduler.
# Use spawn and channels instead.
# all threads, so the GC can see them (GC doesn't scan thread locals)
@@threads = Set(Thread).new

This comment has been minimized.

@bcardiff

bcardiff Oct 1, 2018

Member

Is there a need to use Set here? Can't we use a linked list as in fibers? After this commit we stop iterating the set. I think we only need to linked pointers to avoid GC from collecting the instances, right?

This comment has been minimized.

@ysbaddaden

ysbaddaden Oct 1, 2018

Member

I just moved the line, wondered avout changing it and finally kept it. An Array would be simpler and enough.

This comment has been minimized.

@bcardiff

bcardiff Oct 1, 2018

Member

We can remove the Set later for sure, but I found that the linked list is enough. If we guarantee there is only one Thread object per thread then we don't even need to lookup to remove it from the list.

Bonus point the thread safe linked list pointer implementation from fibers can be extracted and reused.

I think the set was added for openbsd, hence maybe I was missing something regarding why Set was used in the first place.

@@ -4,14 +4,20 @@ require "thread"
class Thread
# :nodoc:
def scheduler
@scheduler ||= Crystal::Scheduler.new
@scheduler ||= Crystal::Scheduler.new(Fiber.new)

This comment has been minimized.

@bcardiff

bcardiff Oct 1, 2018

Member

Same as #6897 (comment)

Currently the scheduler is global. In order to keep the current behavior this should be a class var.
I think there will be multiple schedulers with MT, so the accessing the scheduler via the current thread is ok.

Also, related to #6897 (comment) I think the initialization of main Fiber/Threads can be simplified a bit, but that can come after this PR if needed.

This comment has been minimized.

@RX14

RX14 Oct 1, 2018

Member

This change is fine, surely, because the current behaviour with multiple threads is to segfault anyway.

This comment has been minimized.

@bcardiff

bcardiff Oct 1, 2018

Member

But it's already deciding something regarding the multiple schedulers and we should not introduce such change here.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:multithread-preparations branch from 8cbdf35 to 9eeda35 Oct 2, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Oct 2, 2018

I pushed some fixups, including more thread refactorings:

  • abstracts the new Fiber::LinkedList struct as a thread-safe generic Thread::LinkedList(T) and is now used to keep a list of thread objects (for the GC to find/mark them);
  • adds and always initializes Thread#fiber_main, passed to the Scheduler associated to the thread.

Oops. I broke GC. I now always create a main fiber for a thread but never removed it from the list of active fibers, and thus happily passed the reference to GC.collect even thought the thread may be long terminated.

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Oct 2, 2018

The documentation added for implementing makecontext and swapcontext shouldn't be part of the public API docs as the methods themselves are already nodoc. That's just internal documentation.

Maybe the the arch-specific implementations could be moved to the Crystal::Fiber namespace and included into Fiber, similar to other platform-specific implementations such as File.

@RX14

This comment has been minimized.

Member

RX14 commented Oct 2, 2018

This looks good, although i'd like to re-review once all fixups have been applied

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Oct 3, 2018

@straight-shoota read the patch again, the docs for the context methods is attached to the require not the Fiber class, hence not public AFAIK.

@ysbaddaden ysbaddaden requested a review from bcardiff Oct 3, 2018

Show resolved Hide resolved src/crystal/scheduler.cr
Show resolved Hide resolved src/fiber.cr
fiber_main = ->(f : Fiber) { f.run }
# ???

This comment has been minimized.

@bcardiff

bcardiff Oct 3, 2018

Member

I think this is because the @stack_bottom points to 1 position after the valid region of the whole @stack. So is allowing the stack_ptr to start from "END-1"

This comment has been minimized.

@RX14

RX14 Oct 3, 2018

Member

Yes, because the stack grows down, so issuing a store to stack_bottom would store off the end of the stack.

This comment has been minimized.

@bcardiff

bcardiff Oct 4, 2018

Member

So I think adding a comment here and apply the fixups in the commits is what is pending to re-review and merge this PR. Right?

Then we can move to add some changes to support MT GC and other required changes for parallelism in others PR.

Show resolved Hide resolved src/thread.cr
# :nodoc:
def scheduler
@scheduler ||= Crystal::Scheduler.new(main_fiber)

This comment has been minimized.

@bcardiff

bcardiff Oct 3, 2018

Member

So, besides the refactor this PR will allow each thread to have a the current ST scheduler, but the GC support is not yet guaranteed. Ok, we can go in that direction as intermediate stage.

Show resolved Hide resolved src/thread/linked_list.cr Outdated
@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Oct 5, 2018

I'm squashing the fixups to have clean comments. That will take some time.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:multithread-preparations branch 2 times, most recently from e048df1 to 1089f2a Oct 5, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Oct 5, 2018

Squashed.

@bcardiff

It's great! @RX14 want to re review?

# For the initial resume, the register holding the first parameter must be set
# (see makecontext below) and thus must also be saved/restored.
#
# - `Fiber#makecontext(stack_ptr : Void*, fiber_main : Fiber ->))`

This comment has been minimized.

@bcardiff

bcardiff Oct 5, 2018

Member

there is an extra )

@@ -12,15 +12,15 @@ def sleep(seconds : Number)
raise ArgumentError.new "Sleep seconds must be positive"
end
Fiber.sleep(seconds)
Crystal::Scheduler.sleep(seconds.seconds)

This comment has been minimized.

@straight-shoota

straight-shoota Oct 5, 2018

Contributor

Crystal::Scheduler.sleep look a bit off as it seems that the scheduler would sleep when it actually puts the current fiber to sleep. I don't have better suggestion at hand, just wanted to mention it.

This comment has been minimized.

@RX14

RX14 Oct 5, 2018

Member

Crystal::Scheduler.sleep is used only used in these two methods so i think its acceptable

@RX14

RX14 approved these changes Oct 5, 2018

fiber_main = ->(f : Fiber) { f.run }
# point to first addressable pointer on the stack (@stack_bottom points to
# the limit):

This comment has been minimized.

@RX14

RX14 Oct 5, 2018

Member

I think it's more understandable to say "stack_bottom points past the stack because the stack grows down"

ysbaddaden added some commits Sep 14, 2018

Move Scheduler and EventLoop to Crystal namespace
Also moves Channel from src/concurrent/channel.cr to src/channel.cr
Move scheduler logic from Fiber to Crystal::Scheduler
- Thread now always initialize a Fiber instance for their main
  stack, and remove it from the list of fibers when shutting down
  (otherwise the GC would continue to scan it).

- Current fiber is now kept by Crystal::Scheduler not Thread. The
  currently running fiber is the responsibility of the scheduler,
  and it should avoid some pointer/thread-local dereferences.

- Switching contexts is now the responsibility of Crystal::Scheduler
  instead of Fiber. Fiber still provides the primitives to handle
  the arch-specific contexts.

- Fiber instance methods such as `#yield` or `#sleep` are moved to
  Crystal::Scheduler. Use `sleep` and `sleep(seconds)` instead.

- Fiber now only holds the fiber state (stack, stack top pointer,
  context (saved registers), resumable event, ...), and keeps a few
  methods such as `#resume` or `.yield` to avoid breaking changes.
OpenBSD: use thread specific storage to save Thread.current
Adds Thread.current= setter to set the Thread instance for the
current kernel thread. Always use the setter to initialize the
Thread instance for the main thread.

Uses pthread specific storage to save the Thread instance on OpenBSD
(and Android), which should be faster than iterating a Set.

Still uses a ThreadLocal for supported targets which is faster than
pthread specific storage. This may eventually fail with multithreaded
fiber context switches, thought. C compilers cache ThreadLocal
pointers and  LLVM probably does the same, which is unsafe when
context switches can happen anywhere anytime.

Bonus: `@th` doesn't have to nilable.
Move Event to Crystal namespace
Renames the following structs:
- Event::Event -> Crystal::Event
- Event::Base -> Crystal::Event::Base
- Event::DnsBase -> Crystal::Event::DnsBase
Extract Thread::LinkedList to reference active fibers/threads
Use a simple linked list struct with a few methods, instead of
inlining the logic into Fiber. This cleans up Fiber, and allows
Thread to reuse the struct for the list of active threads.

@ysbaddaden ysbaddaden force-pushed the ysbaddaden:multithread-preparations branch from 1089f2a to a110e12 Oct 7, 2018

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Oct 7, 2018

Fixed typos & rebased again.

@RX14

RX14 approved these changes Oct 9, 2018

@RX14 RX14 added this to the 0.27.0 milestone Oct 9, 2018

@RX14 RX14 merged commit c679102 into crystal-lang:master Oct 9, 2018

4 checks passed

ci/circleci: test_darwin Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment