-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pthread_cleanup functions must be called from user space #1328
Conversation
13b9dfa
to
eab52cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just suggested some minor documentation improvements.
9ec36f4
to
b339224
Compare
The calls to pthread_exit() from the cancellation logic within the OS is a potential show-stopper. Or certainly a reason to stop and re-consider how thread cancellation should work. Thread cancellation should also be primarily a user-space implementation as well. |
I've now merged my branch in and am ready to put some time towards this. It sounds like the next big step is to unify the thread cancellation logic - we will definitely need to do some good testing on this! All of these make sense. I could take care of those. a. Rename pthread_exit() to nx_pthread_exit(). Remove logic that calls pthread_cleanup() functions. But I feel like we should figure out the cancellation stuff before I do any of that.
I don't really understand what the analogous call would do for us there, could you expand? |
Well, you are making me think on my feet, but I will do the best that I can. I might ramble a bit. First, each of the C files referenced above should be treated as a special case. For example in pthread_create.c, pthread_exit() is only called as a error recovery measure. In that case, there can be no pthread_cleanup routines or pthread-specific destructors registered. So we can just replace the call to pthread_exit() with a call to nx_pthread_exit(). This might apply in other places too. But in other cases, I am sure that it will be necessary to call the user-space pthread_exit() from the kernel code in order to perform those actions in user space. The basic problem is that in the PROTECTED build, the pthread_exit() symbol is not available to the kernel. That is because the user blob and the kernel blob are separately built and one blob knows nothing about the symbols internal to the other. The situation is worse in the KERNEL build because there will be multiple instances of pthread_exit() in each user block (aka, "process") and each will lie at a different virtual address -- unless of course we were to implement some kind of shared library support. The simplest way that I can think of to do that is to extend the nx_pthread_create() interface again. On this branch it currently receives the address of the pthread_startup() function and the address of the pthread entry point. It could be extended to also receive the address of pthread_exit() function. This address is of no interest to the pthread_create() logic but it could be saved in pthread_tcb_s structure in the same way that the pthread_startup() function is, perhaps as 'pthread_exit'; Then, where ever pthread_exit() is called in the kernel, this could be replaced with tcb->pthread_exit(). That is not too bad. [Hmm.. I suppose TLS would be a more appropriate place to keep all pthread data rather than the TCB. Although if you have the tcb in hand, you cannot be assured that the tcb corresponds to the current stack so you would have to use that stack base from the TCB.] This approach would also require something like up_pthread_start(). That function calls the specified pthread startup function and switches to user mode. We need some mechanism to call the saved pthread_exit() also switching to user mode. System calls are expensive and a lot of work. I wonder if we can combine the pthread_startup and the pthread_exit system calls into a single function that is agnostic about what it is calling as long as the function signature is the same. This would, then just require some renaming. Like SYS_pthread_startup -> SYS_pthread_action and up_pthread_start() -> up_pthread_action(). In the longer term, we need to get all of the pthread APIs out of the kernel, but this is certainly one positive step. What do you think? In KERNEL mode, there are other things to make sure we understand. In order to call a user space function in KERNEL mode, we have to have the correct address environment in place (up_select_addrenv()). I don't think that is will be a problem, because the main thread and all of the child pthreads share the same address environment. Therefore, I can't think of any situation were the cancellation could occur with the wrong address environment in place. But it is another complexity to keep in mind. |
1. A user-space shim is needed to catch any return from a pthead manin function and to automatically call pthread_exit() from user space. Rename pthread_create() in sched/pthread/pthread_create.c to nx_pthread_create(). Add one new parameter: The address of the user-space pthread startup function. Instead of calling the pthread main entry (directly or indirectly), pthread_start() would call the pthread startup function, passing it the real address of the pthread main function. The call to pthread_exist would be removed from pthread_startup() and move into a new function in user space. 2. Add libs/libc/pthread/lib_pthread_start.c that would contain two trivial functions: static void pthread_startup(pthread_startroutine_t startroutine, pthread_addr_t arg) { pthread_exit(startroutine(arg)); } int pthread_create(FAR pthread_t *thread, FAR const pthread_attr_t *attr, pthread_startroutine_t startroutine, pthread_addr_t arg) { return nx_pthread_create(pthread_startup, thread, attr, startroutine, arg); } 3. Modify up_pthread_start() so that it takes three parameters: startup, entry, and arg. Modify the kernel pthread_start() logic so that it receives these there parameters and calls up_pthread_start() will all three. 4. Remove pthread_startup function pointer from struct userspace_s; it is no longer needed. Still to do: a. Rename pthread_exit() to nx_pthread_exit(). Remove logic that calls pthread_cleanup() functions. b. Make nx_pthread_exit() a system call. c. Create libc/pthread/pthread_exit() that contains only i) the logic that calls the pthread_cleanup functions, and ii) calls the nx_pthread_exit() system call. d. Extend TLS and pthread-specific data function so that the destructor is retained in TLS e. Extend pthread_exit() so that it also calls the pthread-specific data destructors from user-space.
Update arch/arm/src/armv6-m/arm_svcall.c Update libs/libc/pthread/pthread_create.c Update libs/libc/pthread/pthread_create.c Update sched/pthread/pthread_create.c Update sched/pthread/pthread_create.c Update libs/libc/pthread/pthread_create.c Update sched/pthread/pthread_create.c
@antmerlino How should we coordinate work on the branch? I just rebased the pthread branch on my fork. Then I release that this will really mess up what you are doing. rebasing changes the history of the changes branch and can result in spurious merges and occasionally worse. Do you want to share the fork? Should I hold off on rebasing? I get concerned if about the state of the branch if we do not rebase frequently. |
Ah okay, now I see. The piece I wasn't following is that you are going to provide a reference to the userspace exit hook inside the create/startup call, so that the kernel knows how to call. I think what you laid out makes sense. Why don't I create PRs onto your branch and you can pull them in or decline them as we go. This will give you a chance to review the changes as we go. You definitely understand the subtleties more than I. |
Sounds good to me. The level of complexity and keeping the flow all in your head is tricky. I miss subtleties all of the time. And, as with most engineering topics, communicating the technical issues is always the most difficult part by far. I am terrible at trying to read and understand GIT differences. I will try to update the discussion in this PR. I think that a good set of comments here could form a kind of design specification and could also be the basis for a document or WiKi page that documents how this works. |
I have some changes to what you have so far that address build issues. However, I can't open a PR against your branch on your repo. Can we create a feature branch in the Apache repo to work on this together? That way we can make PRs against that branch. |
@antmerlino Yes, but you would have to do that. We have a policy here that no one can merge their own PRs, but you are also a committer and can to do the merge. You have registered your Github user (See https://whimsy.apache.org/roster/ppmc/nuttx). So I assume you have created created your 2 factor login: If not see https://cwiki.apache.org/confluence/display/NUTTX/Accessing+Apache+GitHub+as+a+Committer If you have done those things, then here is what you would have to do to create the feature branch:
From:
To:
That is all there is to it! NOTE: Since this is a WIP, we do not expect the PR checks to pass at this time. |
@antmerlino The branch is ready for your use: https://github.com/apache/incubator-nuttx/tree/feature/pthread-user We need to thank @Ouss4 for doing that for us. |
@antmerlino I neither us has the interest in finishing this change. I propose that we both save the PR as a path and remove the branch for now. What do you think? |
pthread_cleanup functions must be called from user space in PROTECTED and KERNEL build configurations. There are also issues in the FLAT build. Refer to GitHub Issue #1263 for additional background information.
Summary
Rename pthread_create() in sched/pthread/pthread_create.c to nx_pthread_create(). Add one new parameter: The address of the user-space pthread startup function. Instead of calling the pthread main entry (directly or indirectly), pthread_start() would call the pthread startup function, passing it the real address of the pthread main function.
The call to pthread_exit() would be removed from pthread_startup() and move into a new function in user space.
Add libs/libc/pthread/lib_pthread_start.c that would contain two trivial functions:
static void pthread_startup(pthread_startroutine_t startroutine, pthread_addr_t arg)
{
pthread_exit(startroutine(arg));
}
int pthread_create(FAR pthread_t *thread, FAR const pthread_attr_t *attr,
pthread_startroutine_t startroutine, pthread_addr_t arg)
{
return nx_pthread_create(pthread_startup, thread, attr, startroutine, arg);
}
Modify up_pthread_start() so that it takes three parameters: startup, entry, and arg. Modify the kernel pthread_start() logic so that it receives these there parameters and calls up_pthread_start() will all three.
Remove pthread_startup function pointer from struct userspace_s; it is no longer needed.
Still to do:
a. Rename pthread_exit() to nx_pthread_exit(). Remove logic that calls pthread_cleanup() functions.
b. Make nx_pthread_exit() a system call.
c. Create libc/pthread/pthread_exit() that contains only i) the logic that calls the pthread_cleanup functions, and ii) calls the nx_pthread_exit() system call.
There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analagous to up_pthread_start().
Then implement pthread-specific data destructors:
d. Extend TLS and pthread-specific data function so that the destructor is retained in TLS
e. Extend pthread_exit() so that it also calls the pthread-specific data destructors from user-space.
Impact
This PR would effect all logic that uses pthread and, hence, must be well verified before merging.
Testing
Most testing is performed with sim:ostest which exercises pthread logic pretty well.