Interaction with thread local storage #2

Closed
gnzlbg opened this Issue Jun 29, 2016 · 13 comments

gnzlbg commented Jun 29, 2016

I cannot find, in either the wording or the papers, any mention of how coroutines interact with thread-local storage. Is it mentioned somewhere?

Owner

GorNishanov commented Jun 29, 2016

P0057 coroutines do not represent independent threads of execution. When a coroutine is executing, it gets the same view of thread-local storage as whoever called or resumed it. For example:

thread_local int tls;
generator<int> f() {
  for (;;) {
    printf("tls is %d\n", tls);
    co_yield 1;
  }
}

Whenever you pull from the generator, it prints the value from the thread that resumed the coroutine (the thread that pulled from the generator, in this case).

I will check with the Core Language group whether they would like to see a non-normative note with this clarification.

Author

gnzlbg commented Jun 29, 2016

What happens when a coroutine is migrated between threads by the scheduler? When the coroutine is resumed in a different thread, does it see the thread local variables of the thread it was moved from, or the ones it was resumed in?

Owner

GorNishanov commented Jun 29, 2016

A coroutine's initial call and its resumption calls are regular function calls that do not involve any thread switching; therefore, you always get the thread-local storage of the current thread. If you are thinking of fibers or boost::coroutines, that is a different story.

Author

gnzlbg commented Jun 29, 2016

> If you are thinking of fibers or boost::coroutines, that is a different story.

I was indeed thinking of these, sorry for the confusion.

> A coroutine's initial call and its resumption calls are regular function calls that do not involve any thread switching; therefore, you always get the thread-local storage of the current thread.

I guess I was missing this. So IIUC:

  • When I call a coroutine, the coroutine is executed in the current thread until some suspension point, where it returns a future<T> to the caller. Execution of the coroutine does not then continue until I call .get() on that future. Is this correct?
  • If I move that future<T> from thread A to another thread B, and call .get() in thread B, the coroutine is resumed on thread B (if the future isn't ready). That is, if the coroutine uses thread local variables, the thread-local variables of thread A are used within the coroutine before the suspension point, and after the suspension point the thread-local variables of thread B will be used within the coroutine. Is that right?
Owner

GorNishanov commented Jun 29, 2016

> When I call a coroutine, the coroutine is executed in the current thread until some suspension point, where it returns a future<T> to the caller. Execution of the coroutine does not then continue until I call .get() on that future. Is this correct?

Not exactly, at least not with std::future or the future in the Concurrency TS. future::get() is a boring blocking call that does not donate its thread to a coroutine. It just blocks the current thread, waiting for a signal that the coroutine has run to completion and produced a result or an exception.

future<void> foo() {
  cout << this_thread::get_id() << endl;  // thread that called foo()
  co_await SomeAsyncApi();
  cout << this_thread::get_id() << endl;  // thread that resumed the coroutine
}

Before the await, it executes on the thread that called foo().
After the await, it resumes in an OS completion routine, on a thread pool, or on whatever facility runs completions in that particular environment, and thus prints the thread::id of that thread.

> If I move that future<T> from thread A to another thread B, and call .get() in thread B, the coroutine is resumed on thread B (if the future isn't ready). That is, if the coroutine uses thread-local variables, the thread-local variables of thread A are used within the coroutine before the suspension point, and after the suspension point the thread-local variables of thread B are used within the coroutine. Is that right?

s/thread that calls .get()/thread that resumes the suspended coroutine/, then yes. You always get the thread-local storage of the current thread.

Author

gnzlbg commented Jun 29, 2016

Wait (no pun intended!), so if before and after await the coroutine might run on different threads, then the coroutine is getting "migrated" between threads (by the environment), or what am I misunderstanding?

I haven't seen much written about the requirements on the "environment scheduler" in the papers (but maybe I missed some). Consider the following code:

future<void> foo() { 
  thread_local auto tls = 314;
  for (int i = 0; i < 10; ++i) {
      cout << tls << std::endl;
      co_await SomeAsyncApi(); 
  }
}

On the thread where this function is first called, the thread_local variable tls is initialized to 314 (at block scope, thread_local implies static). If, after suspension, I call .get() on the same thread but the coroutine is resumed on a different thread by the system scheduler, then reading tls would be a read from uninitialized memory (and thus UB).

From:

> Before the await, it executes on the thread that called foo().
> After the await, it resumes in an OS completion routine, on a thread pool, or on whatever facility runs completions in that particular environment, and thus prints the thread::id of that thread.

it seems to me that whether UB occurs depends on the platform's scheduler. Is this so?

Owner

GorNishanov commented Jun 29, 2016

Correct.

Author

gnzlbg commented Jun 29, 2016

Unless extra guarantees are provided by the scheduler [*] I think it will be very hard to reason about what is going on in a coroutine that uses or references thread_local storage (in particular if this happens implicitly or behind tons of layers of abstraction).

Sometimes accessing a thread_local of the current thread is exactly what is wanted, but I worry that the most common case is referencing a thread_local variable by mistake, for example because the variable looks like an ordinary global and the user doesn't know that it is thread_local:

future<void> ohno() {
  1.0 / 0.0;
  co_await SomeAsyncAPI();
  if (errno == 0) { cout << "I can divide by zero!" << endl; }
}

The variable errno will be set to ERANGE on the thread that initiated the coroutine, but unless another mathematical error happened it will be set to zero on the thread that resumed the coroutine.

(And no, checking errno is not a thing, this is just an example).

Another issue could be when silently using thread_locals inside generators that get resumed multiple times. Reasoning about the state of the generator might be impossible (calling a pseudo-random number generator that uses thread_local and is initialized with the same seed could return N times the same value if it gets rescheduled to N different threads on the first N resumptions...).

Another thing I worry about is which optimizations the compiler may perform in coroutines that use thread locals. In my previous comment I had an example with the tls variable; here is a different one:

thread_local auto tls = 314;
future<void> foo() { 
  for (int i = 0; i < 10; ++i) {
      cout << tls << std::endl;  // A
      co_await SomeAsyncApi(); 
      cout << tls << std::endl;  //  B
  }
}

Can the compiler generate code for foo that only reads tls once (instead of twice in the loop)? Or are tls reads effectively volatile across co_await / co_yield statements?

[*] Something I would be opposed to. The scheduler should be free to move coroutines around as it deems fit.

Owner

GorNishanov commented Jun 29, 2016

> what optimizations the compiler can do in the presence of thread locals between coroutines? Can the compiler generate code for foo that only reads tls once (instead of twice in the loop)? Or are tls reads effectively volatile across co_await / co_yield statements?

The compiler won't cache the address of a TLS variable across a suspend point, as that would violate the "you get the thread-local of the currently running thread" behavior.

> Unless extra guarantees are provided by the scheduler... Something I would be opposed to. The scheduler should be free to move coroutines around as it deems fit.

I agree. Note that P0057 gives you a mechanical function-to-state-machine transformation; the library writer imbues it with meaning. How you want to use thread-locals should be considered in the context of the semantics of the library layer that uses the coroutines.

Author

gnzlbg commented Jun 29, 2016

@GorNishanov Thanks for the explanations, really appreciated.

Author

gnzlbg commented Jul 25, 2016

@GorNishanov Reading through the LLVM RFC, I do not find any mention of the caching (or lack thereof) of TLS variables across calls to @llvm.experimental.coro.suspend. Shouldn't it be mentioned somewhere?

Owner

GorNishanov commented Jul 25, 2016

@gnzlbg In LLVM, thread_locals are modelled as global variables (even if the thread_local is a local variable in a function). From LLVM's perspective, a call to the coro.save and coro.suspend intrinsics can read or write any memory, just like any other function call. Thus, LLVM is not free to cache any read from a global variable (including thread_local globals) across a suspend point. So no special handling of thread_local is required, and therefore it is not mentioned.

Though, I am thinking of adding a Q&A at the end of docs/Coroutines.rst. I can include the thread_local discussion there.

Author

gnzlbg commented Jul 25, 2016

I see, thanks!

> Though, I am thinking of adding a Q&A at the end of docs/Coroutines.rst. I can include the thread_local discussion there.

That would be very helpful.

It might be good to also mention any thought that has been given to dynamically-sized types in coroutines and sketch how one could extend the proposal to allow these in the future.
