pthread_cleanup functions must be called from user space #1328

patacongo · 2020-06-29T15:04:37Z

pthread_cleanup functions must be called from user space in PROTECTED and KERNEL build configurations. There are also issues in the FLAT build. Refer to GitHub Issue #1263 for additional background information.

Summary

A user-space shim is needed to catch any return from a pthead main function and to automatically call pthread_exit() from user space.

Rename pthread_create() in sched/pthread/pthread_create.c to nx_pthread_create(). Add one new parameter: The address of the user-space pthread startup function. Instead of calling the pthread main entry (directly or indirectly), pthread_start() would call the pthread startup function, passing it the real address of the pthread main function.

The call to pthread_exit() would be removed from pthread_startup() and move into a new function in user space.

Add libs/libc/pthread/lib_pthread_start.c that would contain two trivial functions:

static void pthread_startup(pthread_startroutine_t startroutine, pthread_addr_t arg)
{
pthread_exit(startroutine(arg));
}

int pthread_create(FAR pthread_t *thread, FAR const pthread_attr_t *attr,
pthread_startroutine_t startroutine, pthread_addr_t arg)
{
return nx_pthread_create(pthread_startup, thread, attr, startroutine, arg);
}
Modify up_pthread_start() so that it takes three parameters: startup, entry, and arg. Modify the kernel pthread_start() logic so that it receives these there parameters and calls up_pthread_start() will all three.
Remove pthread_startup function pointer from struct userspace_s; it is no longer needed.

Still to do:

a. Rename pthread_exit() to nx_pthread_exit(). Remove logic that calls pthread_cleanup() functions.
b. Make nx_pthread_exit() a system call.
c. Create libc/pthread/pthread_exit() that contains only i) the logic that calls the pthread_cleanup functions, and ii) calls the nx_pthread_exit() system call.

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analagous to up_pthread_start().

Then implement pthread-specific data destructors:

d. Extend TLS and pthread-specific data function so that the destructor is retained in TLS
e. Extend pthread_exit() so that it also calls the pthread-specific data destructors from user-space.

Impact

This PR would effect all logic that uses pthread and, hence, must be well verified before merging.

Testing

Most testing is performed with sim:ostest which exercises pthread logic pretty well.

gustavonihei

Just suggested some minor documentation improvements.

arch/arm/src/armv6-m/arm_svcall.c

libs/libc/pthread/pthread_create.c

sched/pthread/pthread_create.c

patacongo · 2020-06-30T16:57:56Z

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analogous to up_pthread_start().

The calls to pthread_exit() from the cancellation logic within the OS is a potential show-stopper. Or certainly a reason to stop and re-consider how thread cancellation should work. Thread cancellation should also be primarily a user-space implementation as well.

antmerlino · 2020-07-01T16:07:32Z

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analogous to up_pthread_start().

The calls to pthread_exit() from the cancellation logic within the OS is a potential show-stopper. Or certainly a reason to stop and re-consider how thread cancellation should work. Thread cancellation should also be primarily a user-space implementation as well.

@gregory-nutt

I've now merged my branch in and am ready to put some time towards this. It sounds like the next big step is to unify the thread cancellation logic - we will definitely need to do some good testing on this!

All of these make sense. I could take care of those.

a. Rename pthread_exit() to nx_pthread_exit(). Remove logic that calls pthread_cleanup() functions.
b. Make nx_pthread_exit() a system call.
c. Create libc/pthread/pthread_exit() that contains only i) the logic that calls the pthread_cleanup functions, and ii) calls the nx_pthread_exit() system call.
d. Extend TLS and pthread-specific data function so that the destructor is retained in TLS
e. Extend pthread_exit() so that it also calls the pthread-specific data destructors from user-space.

But I feel like we should figure out the cancellation stuff before I do any of that.

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analagous to up_pthread_start().

I don't really understand what the analogous call would do for us there, could you expand?

patacongo · 2020-07-01T16:35:55Z

But I feel like we should figure out the cancellation stuff before I do any of that.

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analagous to up_pthread_start().

I don't really understand what the analogous call would do for us there, could you expand?

Well, you are making me think on my feet, but I will do the best that I can. I might ramble a bit.

First, each of the C files referenced above should be treated as a special case. For example in pthread_create.c, pthread_exit() is only called as a error recovery measure. In that case, there can be no pthread_cleanup routines or pthread-specific destructors registered. So we can just replace the call to pthread_exit() with a call to nx_pthread_exit(). This might apply in other places too.

But in other cases, I am sure that it will be necessary to call the user-space pthread_exit() from the kernel code in order to perform those actions in user space. The basic problem is that in the PROTECTED build, the pthread_exit() symbol is not available to the kernel. That is because the user blob and the kernel blob are separately built and one blob knows nothing about the symbols internal to the other. The situation is worse in the KERNEL build because there will be multiple instances of pthread_exit() in each user block (aka, "process") and each will lie at a different virtual address -- unless of course we were to implement some kind of shared library support.

The simplest way that I can think of to do that is to extend the nx_pthread_create() interface again. On this branch it currently receives the address of the pthread_startup() function and the address of the pthread entry point. It could be extended to also receive the address of pthread_exit() function. This address is of no interest to the pthread_create() logic but it could be saved in pthread_tcb_s structure in the same way that the pthread_startup() function is, perhaps as 'pthread_exit'; Then, where ever pthread_exit() is called in the kernel, this could be replaced with tcb->pthread_exit(). That is not too bad.

[Hmm.. I suppose TLS would be a more appropriate place to keep all pthread data rather than the TCB. Although if you have the tcb in hand, you cannot be assured that the tcb corresponds to the current stack so you would have to use that stack base from the TCB.]

This approach would also require something like up_pthread_start(). That function calls the specified pthread startup function and switches to user mode. We need some mechanism to call the saved pthread_exit() also switching to user mode.

System calls are expensive and a lot of work. I wonder if we can combine the pthread_startup and the pthread_exit system calls into a single function that is agnostic about what it is calling as long as the function signature is the same. This would, then just require some renaming. Like SYS_pthread_startup -> SYS_pthread_action and up_pthread_start() -> up_pthread_action().

In the longer term, we need to get all of the pthread APIs out of the kernel, but this is certainly one positive step.

What do you think?

In KERNEL mode, there are other things to make sure we understand. In order to call a user space function in KERNEL mode, we have to have the correct address environment in place (up_select_addrenv()). I don't think that is will be a problem, because the main thread and all of the child pthreads share the same address environment. Therefore, I can't think of any situation were the cancellation could occur with the wrong address environment in place. But it is another complexity to keep in mind.

1. A user-space shim is needed to catch any return from a pthead manin function and to automatically call pthread_exit() from user space. Rename pthread_create() in sched/pthread/pthread_create.c to nx_pthread_create(). Add one new parameter: The address of the user-space pthread startup function. Instead of calling the pthread main entry (directly or indirectly), pthread_start() would call the pthread startup function, passing it the real address of the pthread main function. The call to pthread_exist would be removed from pthread_startup() and move into a new function in user space. 2. Add libs/libc/pthread/lib_pthread_start.c that would contain two trivial functions: static void pthread_startup(pthread_startroutine_t startroutine, pthread_addr_t arg) { pthread_exit(startroutine(arg)); } int pthread_create(FAR pthread_t *thread, FAR const pthread_attr_t *attr, pthread_startroutine_t startroutine, pthread_addr_t arg) { return nx_pthread_create(pthread_startup, thread, attr, startroutine, arg); } 3. Modify up_pthread_start() so that it takes three parameters: startup, entry, and arg. Modify the kernel pthread_start() logic so that it receives these there parameters and calls up_pthread_start() will all three. 4. Remove pthread_startup function pointer from struct userspace_s; it is no longer needed. Still to do: a. Rename pthread_exit() to nx_pthread_exit(). Remove logic that calls pthread_cleanup() functions. b. Make nx_pthread_exit() a system call. c. Create libc/pthread/pthread_exit() that contains only i) the logic that calls the pthread_cleanup functions, and ii) calls the nx_pthread_exit() system call. d. Extend TLS and pthread-specific data function so that the destructor is retained in TLS e. Extend pthread_exit() so that it also calls the pthread-specific data destructors from user-space.

Update arch/arm/src/armv6-m/arm_svcall.c Update libs/libc/pthread/pthread_create.c Update libs/libc/pthread/pthread_create.c Update sched/pthread/pthread_create.c Update sched/pthread/pthread_create.c Update libs/libc/pthread/pthread_create.c Update sched/pthread/pthread_create.c

patacongo · 2020-07-01T17:13:12Z

@antmerlino How should we coordinate work on the branch? I just rebased the pthread branch on my fork. Then I release that this will really mess up what you are doing. rebasing changes the history of the changes branch and can result in spurious merges and occasionally worse.

Do you want to share the fork? Should I hold off on rebasing? I get concerned if about the state of the branch if we do not rebase frequently.

antmerlino · 2020-07-01T17:18:40Z

Ah okay, now I see. The piece I wasn't following is that you are going to provide a reference to the userspace exit hook inside the create/startup call, so that the kernel knows how to call.

I think what you laid out makes sense.

Why don't I create PRs onto your branch and you can pull them in or decline them as we go. This will give you a chance to review the changes as we go. You definitely understand the subtleties more than I.

patacongo · 2020-07-01T17:28:15Z

Why don't I create PRs onto your branch and you can pull them in or decline them as we go. This will give you a chance to review the changes as we go. You definitely understand the subtleties more than I.

Sounds good to me. The level of complexity and keeping the flow all in your head is tricky. I miss subtleties all of the time. And, as with most engineering topics, communicating the technical issues is always the most difficult part by far. I am terrible at trying to read and understand GIT differences.

I will try to update the discussion in this PR. I think that a good set of comments here could form a kind of design specification and could also be the basis for a document or WiKi page that documents how this works.

antmerlino · 2020-07-01T19:44:32Z

@gregory-nutt

I have some changes to what you have so far that address build issues. However, I can't open a PR against your branch on your repo.

Can we create a feature branch in the Apache repo to work on this together? That way we can make PRs against that branch.

patacongo · 2020-07-01T20:50:17Z

@gregory-nutt

I have some changes to what you have so far that address build issues. However, I can't open a PR against your branch on your repo.

Can we create a feature branch in the Apache repo to work on this together? That way we can make PRs against that branch.

@antmerlino Yes, but you would have to do that. We have a policy here that no one can merge their own PRs, but you are also a committer and can to do the merge. You have registered your Github user (See https://whimsy.apache.org/roster/ppmc/nuttx). So I assume you have created created your 2 factor login: If not see https://cwiki.apache.org/confluence/display/NUTTX/Accessing+Apache+GitHub+as+a+Committer

If you have done those things, then here is what you would have to do to create the feature branch:

Create the feature branch in your incubator-nuttx clone, say "pthread-user"
Push the branch to the incubator-nuttx origin
Go to this PR and select "Edit" on the right near the top.
Change the destination

From:

patacongo wants to merge 2 commits into apache:master from patacongo:pthread

To:

patacongo wants to merge 2 commits into apache:pthread-user from patacongo:pthread

Then just do the rebase merge using the buttons at the bottom

That is all there is to it!

NOTE: Since this is a WIP, we do not expect the PR checks to pass at this time.

patacongo · 2020-07-03T18:41:58Z

@antmerlino The branch is ready for your use: https://github.com/apache/incubator-nuttx/tree/feature/pthread-user We need to thank @Ouss4 for doing that for us.

patacongo · 2020-10-10T19:05:37Z

@antmerlino I neither us has the interest in finishing this change. I propose that we both save the PR as a path and remove the branch for now. What do you think?

patacongo marked this pull request as draft June 29, 2020 15:04

patacongo changed the title ~~pthread_cleanup functions must be called from user space~~ [DRAFT] pthread_cleanup functions must be called from user space Jun 29, 2020

patacongo force-pushed the pthread branch 7 times, most recently from 13b9dfa to eab52cc Compare June 29, 2020 19:59

patacongo added Type: Bug Something isn't working Area: Security Security of OS in secure modes labels Jun 29, 2020

patacongo force-pushed the pthread branch from eab52cc to 86a65be Compare June 29, 2020 23:32

gustavonihei reviewed Jun 30, 2020

View reviewed changes

patacongo force-pushed the pthread branch 2 times, most recently from 9ec36f4 to b339224 Compare June 30, 2020 15:54

patacongo closed this Jun 30, 2020

patacongo reopened this Jun 30, 2020

patacongo force-pushed the pthread branch from 1f0378d to eb942c1 Compare June 30, 2020 17:08

patacongo mentioned this pull request Jul 1, 2020

Call c++ global variables in nxtask_startup #1341

Merged

patacongo force-pushed the pthread branch from eb942c1 to 54dc5d7 Compare July 1, 2020 14:49

gregory-nutt and others added 2 commits July 1, 2020 11:06

patacongo force-pushed the pthread branch from 54dc5d7 to a6ba113 Compare July 1, 2020 17:07

Ouss4 marked this pull request as ready for review July 3, 2020 18:29

Ouss4 changed the base branch from master to feature/pthread-user July 3, 2020 18:29

Ouss4 merged commit be0ff4d into apache:feature/pthread-user Jul 3, 2020

Ouss4 changed the title ~~[DRAFT] pthread_cleanup functions must be called from user space~~ pthread_cleanup functions must be called from user space Jul 3, 2020

patacongo deleted the pthread branch July 3, 2020 18:50

patacongo mentioned this pull request Apr 29, 2021

libc: Move pthread stuff to userspace #3626

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pthread_cleanup functions must be called from user space #1328

pthread_cleanup functions must be called from user space #1328

patacongo commented Jun 29, 2020 •

edited

Loading

gustavonihei left a comment

patacongo commented Jun 30, 2020

antmerlino commented Jul 1, 2020 •

edited

Loading

patacongo commented Jul 1, 2020 •

edited

Loading

patacongo commented Jul 1, 2020

antmerlino commented Jul 1, 2020

patacongo commented Jul 1, 2020 •

edited

Loading

antmerlino commented Jul 1, 2020

patacongo commented Jul 1, 2020 •

edited

Loading

patacongo commented Jul 3, 2020

patacongo commented Oct 10, 2020

pthread_cleanup functions must be called from user space #1328

pthread_cleanup functions must be called from user space #1328

Conversation

patacongo commented Jun 29, 2020 • edited Loading

Summary

Impact

Testing

gustavonihei left a comment

Choose a reason for hiding this comment

patacongo commented Jun 30, 2020

antmerlino commented Jul 1, 2020 • edited Loading

patacongo commented Jul 1, 2020 • edited Loading

patacongo commented Jul 1, 2020

antmerlino commented Jul 1, 2020

patacongo commented Jul 1, 2020 • edited Loading

antmerlino commented Jul 1, 2020

patacongo commented Jul 1, 2020 • edited Loading

patacongo commented Jul 3, 2020

patacongo commented Oct 10, 2020

patacongo commented Jun 29, 2020 •

edited

Loading

antmerlino commented Jul 1, 2020 •

edited

Loading

patacongo commented Jul 1, 2020 •

edited

Loading

patacongo commented Jul 1, 2020 •

edited

Loading

patacongo commented Jul 1, 2020 •

edited

Loading