Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pthread_cleanup functions must be called from user space #1328

Merged
merged 2 commits into from
Jul 3, 2020

Conversation

patacongo
Copy link
Contributor

@patacongo patacongo commented Jun 29, 2020

pthread_cleanup functions must be called from user space in PROTECTED and KERNEL build configurations. There are also issues in the FLAT build. Refer to GitHub Issue #1263 for additional background information.

Summary

  1. A user-space shim is needed to catch any return from a pthead main function and to automatically call pthread_exit() from user space.

Rename pthread_create() in sched/pthread/pthread_create.c to nx_pthread_create(). Add one new parameter: The address of the user-space pthread startup function. Instead of calling the pthread main entry (directly or indirectly), pthread_start() would call the pthread startup function, passing it the real address of the pthread main function.

The call to pthread_exit() would be removed from pthread_startup() and move into a new function in user space.

  1. Add libs/libc/pthread/lib_pthread_start.c that would contain two trivial functions:

    static void pthread_startup(pthread_startroutine_t startroutine, pthread_addr_t arg)
    {
    pthread_exit(startroutine(arg));
    }

    int pthread_create(FAR pthread_t *thread, FAR const pthread_attr_t *attr,
    pthread_startroutine_t startroutine, pthread_addr_t arg)
    {
    return nx_pthread_create(pthread_startup, thread, attr, startroutine, arg);
    }

  2. Modify up_pthread_start() so that it takes three parameters: startup, entry, and arg. Modify the kernel pthread_start() logic so that it receives these there parameters and calls up_pthread_start() will all three.

  3. Remove pthread_startup function pointer from struct userspace_s; it is no longer needed.

Still to do:

a. Rename pthread_exit() to nx_pthread_exit(). Remove logic that calls pthread_cleanup() functions.
b. Make nx_pthread_exit() a system call.
c. Create libc/pthread/pthread_exit() that contains only i) the logic that calls the pthread_cleanup functions, and ii) calls the nx_pthread_exit() system call.

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analagous to up_pthread_start().

Then implement pthread-specific data destructors:

d. Extend TLS and pthread-specific data function so that the destructor is retained in TLS
e. Extend pthread_exit() so that it also calls the pthread-specific data destructors from user-space.

Impact

This PR would effect all logic that uses pthread and, hence, must be well verified before merging.

Testing

Most testing is performed with sim:ostest which exercises pthread logic pretty well.

@patacongo patacongo marked this pull request as draft June 29, 2020 15:04
@patacongo patacongo changed the title pthread_cleanup functions must be called from user space [DRAFT] pthread_cleanup functions must be called from user space Jun 29, 2020
@patacongo patacongo force-pushed the pthread branch 7 times, most recently from 13b9dfa to eab52cc Compare June 29, 2020 19:59
@patacongo patacongo added Type: Bug Something isn't working Area: Security Security of OS in secure modes labels Jun 29, 2020
Copy link
Contributor

@gustavonihei gustavonihei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just suggested some minor documentation improvements.

arch/arm/src/armv6-m/arm_svcall.c Outdated Show resolved Hide resolved
libs/libc/pthread/pthread_create.c Outdated Show resolved Hide resolved
libs/libc/pthread/pthread_create.c Outdated Show resolved Hide resolved
libs/libc/pthread/pthread_create.c Outdated Show resolved Hide resolved
sched/pthread/pthread_create.c Outdated Show resolved Hide resolved
sched/pthread/pthread_create.c Outdated Show resolved Hide resolved
sched/pthread/pthread_create.c Outdated Show resolved Hide resolved
@patacongo patacongo force-pushed the pthread branch 2 times, most recently from 9ec36f4 to b339224 Compare June 30, 2020 15:54
@patacongo
Copy link
Contributor Author

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analogous to up_pthread_start().

The calls to pthread_exit() from the cancellation logic within the OS is a potential show-stopper. Or certainly a reason to stop and re-consider how thread cancellation should work. Thread cancellation should also be primarily a user-space implementation as well.

@antmerlino
Copy link
Contributor

antmerlino commented Jul 1, 2020

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analogous to up_pthread_start().

The calls to pthread_exit() from the cancellation logic within the OS is a potential show-stopper. Or certainly a reason to stop and re-consider how thread cancellation should work. Thread cancellation should also be primarily a user-space implementation as well.

@gregory-nutt

I've now merged my branch in and am ready to put some time towards this. It sounds like the next big step is to unify the thread cancellation logic - we will definitely need to do some good testing on this!

All of these make sense. I could take care of those.

a. Rename pthread_exit() to nx_pthread_exit(). Remove logic that calls pthread_cleanup() functions.
b. Make nx_pthread_exit() a system call.
c. Create libc/pthread/pthread_exit() that contains only i) the logic that calls the pthread_cleanup functions, and ii) calls the nx_pthread_exit() system call.
d. Extend TLS and pthread-specific data function so that the destructor is retained in TLS
e. Extend pthread_exit() so that it also calls the pthread-specific data destructors from user-space.

But I feel like we should figure out the cancellation stuff before I do any of that.

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analagous to up_pthread_start().

I don't really understand what the analogous call would do for us there, could you expand?

@patacongo
Copy link
Contributor Author

patacongo commented Jul 1, 2020

But I feel like we should figure out the cancellation stuff before I do any of that.

There is a complexity... pthread_exit() is also called from within the OS from task_cancelpt.c, pthread_cancel.c, pthread_create.c, sig_default.c, task_cancelpt.c, task_setcancelstate.c, task_setcanceltype.c. This may require another system call analagous to up_pthread_start().

I don't really understand what the analogous call would do for us there, could you expand?

Well, you are making me think on my feet, but I will do the best that I can. I might ramble a bit.

First, each of the C files referenced above should be treated as a special case. For example in pthread_create.c, pthread_exit() is only called as a error recovery measure. In that case, there can be no pthread_cleanup routines or pthread-specific destructors registered. So we can just replace the call to pthread_exit() with a call to nx_pthread_exit(). This might apply in other places too.

But in other cases, I am sure that it will be necessary to call the user-space pthread_exit() from the kernel code in order to perform those actions in user space. The basic problem is that in the PROTECTED build, the pthread_exit() symbol is not available to the kernel. That is because the user blob and the kernel blob are separately built and one blob knows nothing about the symbols internal to the other. The situation is worse in the KERNEL build because there will be multiple instances of pthread_exit() in each user block (aka, "process") and each will lie at a different virtual address -- unless of course we were to implement some kind of shared library support.

The simplest way that I can think of to do that is to extend the nx_pthread_create() interface again. On this branch it currently receives the address of the pthread_startup() function and the address of the pthread entry point. It could be extended to also receive the address of pthread_exit() function. This address is of no interest to the pthread_create() logic but it could be saved in pthread_tcb_s structure in the same way that the pthread_startup() function is, perhaps as 'pthread_exit'; Then, where ever pthread_exit() is called in the kernel, this could be replaced with tcb->pthread_exit(). That is not too bad.

[Hmm.. I suppose TLS would be a more appropriate place to keep all pthread data rather than the TCB. Although if you have the tcb in hand, you cannot be assured that the tcb corresponds to the current stack so you would have to use that stack base from the TCB.]

This approach would also require something like up_pthread_start(). That function calls the specified pthread startup function and switches to user mode. We need some mechanism to call the saved pthread_exit() also switching to user mode.

System calls are expensive and a lot of work. I wonder if we can combine the pthread_startup and the pthread_exit system calls into a single function that is agnostic about what it is calling as long as the function signature is the same. This would, then just require some renaming. Like SYS_pthread_startup -> SYS_pthread_action and up_pthread_start() -> up_pthread_action().

In the longer term, we need to get all of the pthread APIs out of the kernel, but this is certainly one positive step.

What do you think?

In KERNEL mode, there are other things to make sure we understand. In order to call a user space function in KERNEL mode, we have to have the correct address environment in place (up_select_addrenv()). I don't think that is will be a problem, because the main thread and all of the child pthreads share the same address environment. Therefore, I can't think of any situation were the cancellation could occur with the wrong address environment in place. But it is another complexity to keep in mind.

gregory-nutt and others added 2 commits July 1, 2020 11:06
1. A user-space shim is needed to catch any return from a pthead manin function and to automatically call pthread_exit() from user space.

Rename pthread_create() in sched/pthread/pthread_create.c to nx_pthread_create().  Add one new parameter:  The address of the user-space pthread startup function.  Instead of calling the pthread main entry (directly or indirectly), pthread_start() would call the pthread startup function, passing it the real address of the pthread main function.

The call to pthread_exist would be removed from pthread_startup() and move into a new function in user space.

2. Add libs/libc/pthread/lib_pthread_start.c that would contain two trivial functions:

    static void pthread_startup(pthread_startroutine_t startroutine, pthread_addr_t arg)
    {
      pthread_exit(startroutine(arg));
    }

    int pthread_create(FAR pthread_t *thread, FAR const pthread_attr_t *attr,
                       pthread_startroutine_t startroutine, pthread_addr_t arg)
    {
      return nx_pthread_create(pthread_startup, thread, attr, startroutine, arg);
    }

3. Modify up_pthread_start() so that it takes three parameters:  startup, entry, and arg.
   Modify the kernel pthread_start() logic so that it receives these there parameters and
   calls up_pthread_start() will all three.

4. Remove pthread_startup function pointer from struct userspace_s; it is no longer needed.

Still to do:

a. Rename pthread_exit() to nx_pthread_exit().  Remove logic that calls pthread_cleanup() functions.
b. Make nx_pthread_exit() a system call.
c. Create libc/pthread/pthread_exit() that contains only i) the logic that calls the pthread_cleanup functions, and ii) calls the nx_pthread_exit() system call.
d. Extend TLS and pthread-specific data function so that the destructor is retained in TLS
e. Extend pthread_exit() so that it also calls the pthread-specific data destructors from user-space.
Update arch/arm/src/armv6-m/arm_svcall.c
Update libs/libc/pthread/pthread_create.c
Update libs/libc/pthread/pthread_create.c
Update sched/pthread/pthread_create.c
Update sched/pthread/pthread_create.c
Update libs/libc/pthread/pthread_create.c
Update sched/pthread/pthread_create.c
@patacongo
Copy link
Contributor Author

@antmerlino How should we coordinate work on the branch? I just rebased the pthread branch on my fork. Then I release that this will really mess up what you are doing. rebasing changes the history of the changes branch and can result in spurious merges and occasionally worse.

Do you want to share the fork? Should I hold off on rebasing? I get concerned if about the state of the branch if we do not rebase frequently.

@antmerlino
Copy link
Contributor

Ah okay, now I see. The piece I wasn't following is that you are going to provide a reference to the userspace exit hook inside the create/startup call, so that the kernel knows how to call.

I think what you laid out makes sense.

Why don't I create PRs onto your branch and you can pull them in or decline them as we go. This will give you a chance to review the changes as we go. You definitely understand the subtleties more than I.

@patacongo
Copy link
Contributor Author

patacongo commented Jul 1, 2020

Why don't I create PRs onto your branch and you can pull them in or decline them as we go. This will give you a chance to review the changes as we go. You definitely understand the subtleties more than I.

Sounds good to me. The level of complexity and keeping the flow all in your head is tricky. I miss subtleties all of the time. And, as with most engineering topics, communicating the technical issues is always the most difficult part by far. I am terrible at trying to read and understand GIT differences.

I will try to update the discussion in this PR. I think that a good set of comments here could form a kind of design specification and could also be the basis for a document or WiKi page that documents how this works.

@antmerlino
Copy link
Contributor

@gregory-nutt

I have some changes to what you have so far that address build issues. However, I can't open a PR against your branch on your repo.

Can we create a feature branch in the Apache repo to work on this together? That way we can make PRs against that branch.

@patacongo
Copy link
Contributor Author

patacongo commented Jul 1, 2020

@gregory-nutt

I have some changes to what you have so far that address build issues. However, I can't open a PR against your branch on your repo.

Can we create a feature branch in the Apache repo to work on this together? That way we can make PRs against that branch.

@antmerlino Yes, but you would have to do that. We have a policy here that no one can merge their own PRs, but you are also a committer and can to do the merge. You have registered your Github user (See https://whimsy.apache.org/roster/ppmc/nuttx). So I assume you have created created your 2 factor login: If not see https://cwiki.apache.org/confluence/display/NUTTX/Accessing+Apache+GitHub+as+a+Committer

If you have done those things, then here is what you would have to do to create the feature branch:

  1. Create the feature branch in your incubator-nuttx clone, say "pthread-user"
  2. Push the branch to the incubator-nuttx origin
  3. Go to this PR and select "Edit" on the right near the top.
  4. Change the destination

From:

patacongo wants to merge 2 commits into apache:master from patacongo:pthread

To:

patacongo wants to merge 2 commits into apache:pthread-user from patacongo:pthread
  1. Then just do the rebase merge using the buttons at the bottom

That is all there is to it!

NOTE: Since this is a WIP, we do not expect the PR checks to pass at this time.

@Ouss4 Ouss4 marked this pull request as ready for review July 3, 2020 18:29
@Ouss4 Ouss4 changed the base branch from master to feature/pthread-user July 3, 2020 18:29
@Ouss4 Ouss4 merged commit be0ff4d into apache:feature/pthread-user Jul 3, 2020
@Ouss4 Ouss4 changed the title [DRAFT] pthread_cleanup functions must be called from user space pthread_cleanup functions must be called from user space Jul 3, 2020
@patacongo
Copy link
Contributor Author

@antmerlino The branch is ready for your use: https://github.com/apache/incubator-nuttx/tree/feature/pthread-user We need to thank @Ouss4 for doing that for us.

@patacongo patacongo deleted the pthread branch July 3, 2020 18:50
@patacongo
Copy link
Contributor Author

@antmerlino I neither us has the interest in finishing this change. I propose that we both save the PR as a path and remove the branch for now. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Security Security of OS in secure modes Type: Bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants