New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement stack unwinding and exceptions for Linux #177

Closed
kangaroo opened this Issue Feb 10, 2015 · 40 comments

Comments

Projects
None yet
8 participants
@kangaroo
Contributor

kangaroo commented Feb 10, 2015

@janvorli I know you're working on this, but I'd like to request that this work be done on a feature branch in the public if possible. Its the next major feature needed for me to bring OS X up to snuff, and I'd rather not duplicate effort.

Thoughts?

@sergiy-k

This comment has been minimized.

Contributor

sergiy-k commented Feb 10, 2015

@kangaroo That’s actually our ultimate goal - to do as much work in the public as possible  especially if this work is related to the cross-platform stuff. We do want everyone to be able to monitor the progress and provide comments, suggestions and contributions. So in the next few weeks you should see more and more work happing on GitHub. In addition, in the nearest future we will be opening new issues (work items) that will describe what we are currently working on and what we plan do next. These work items will vary in size and the ones that we are ready to accept PRs for will be marked ‘up-for-grabs’.

@janvorli can provide you with more detailed information about stack unwinding work if needed but here is a brief description:

  1. Stack walker for managed code. JIT will generate regular Windows style unwinding info. We will reuse Windows unwinder code that we currently have checked in for debugger components for unwinding calls in managed code on Linux/Mac. Unfortunately, this work requires changes in the runtime that currently cannot be tested in the CoreCLR repo so it is hard to do this in the public right now. But we are working on fixing that because, as I mentioned at the beginning, our goal is do most work in the public.
  2. Stack walker for native code. Here, in addition to everything else, we need to allow GC to unwind native stack of any thread in the current process until it finds a managed frame. Currently we are considering using libunwind (http://www.nongnu.org/libunwind) for unwinding native call stacks. @janvorli did some prototyping/experiments and it seems to do what we need. If you have any experience with this library or have any comments/suggestions please let us know.

Thank you.

@sergiy-k sergiy-k closed this Feb 10, 2015

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 10, 2015

@sergiy-k If this is your goal, why is this issue closed? I actually want to work on stuff, but you're basically saying:

  1. We'll be better in the future
  2. We've looked at it
  3. Please internal people document

If prototyping of #2 is done, where is the feature branch?

Thats a lame reason to close what is a valid issue against the current codebase. The issue isn't fixed. Does stack unwinding exist on linux? No. The fact that its 'hard to do in the public repo' doesn't mean it doesn't suck. I wan't to contribute and you're throwing up barriers. :(

@sergiy-k

This comment has been minimized.

Contributor

sergiy-k commented Feb 10, 2015

@kangaroo I’m really sorry about that. I have not realized that I closed the issue. I certainly did not want to do that. It was an operator error. :) I was just about to assign the issue to @janvorli when I saw your reply. I’m reopening the issues. Once again, I’m sorry about confusion.

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 10, 2015

@sergiy-k Thank you for the clarification, but this is case and point why to be effective this needs to be developed in the open. Take a small snippet from libunwind.h in OS X:

/* 
 * traditional libuwind "remote" API 
 *   NOT IMPLEMENTED on Mac OS X
 *
 * extern int               unw_init_remote(unw_cursor_t*, unw_addr_space_t, thread_t*);
 * extern unw_accessors_t   unw_get_accessors(unw_addr_space_t);
 * extern unw_addr_space_t  unw_create_addr_space(unw_accessors_t, int);
 * extern void              unw_flush_cache(unw_addr_space_t, unw_word_t, unw_word_t);
 * extern int               unw_set_caching_policy(unw_addr_space_t, unw_caching_policy_t);
 * extern void              _U_dyn_register(unw_dyn_info_t*);
 * extern void              _U_dyn_cancel(unw_dyn_info_t*);
 */

Perhaps your plans need this. Perhaps they don't. How can I know when you develop in the shadows? If the plan is to be open, and come to OS X (as you've documented), let people like me get involved early to avoid poor decisions. This issue is not the only example of this problem.

Microsoft is very close to going down a linux centric path with their porting efforts, which will dig a hole that we all need to get out of in the future. If OS X isn't a priority for MS yet, at least be open so the community has an opportunity to help you prevent future pitfalls.

@sergiy-k

This comment has been minimized.

Contributor

sergiy-k commented Feb 10, 2015

@kangaroo :) Thank you! :) The issue you have just pointed out is exactly the reason why I asked you about your opinion about libunwind.

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 10, 2015

@sergiy-k Great! Where's the feature branch so I can port it, debug it, and support it on OS X before linux support is merged by dotnet-bot? :)

@sergiy-k

This comment has been minimized.

Contributor

sergiy-k commented Feb 10, 2015

@kangaroo ... and I almost hit the "Close and comment" button instead of "Comment" again!

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 10, 2015

@sergiy-k Here's my point, if it wasn't abundantly clear. People want to contribute. Even branches which are 'hard'. You guys need to figure them out, otherwise you're going to lose contributors. No one wants to work on something already in progress, and no one wants to chase features that get dropped from a black hole (ref; android). Let us help.

@richlander

This comment has been minimized.

Member

richlander commented Feb 10, 2015

Hi @kangaroo. I'll provide a bit of context, which might help.

We made the decision to focus almost entirely on getting the code out. We definitely wanted it to build and have some initial tests, but that was the only bit of icing on top of just getting the code open. It meant that we hadn't really developed a plan on how to collaborate with the community. You'd think that we would have learned from our experiences with corefx (we're just one team) and do better. We honestly didn't expect this degree of participation/contribution on coreclr. We're very thankful and impressed, but scrambling a bit to keep up. We'll get to steady state relatively soon, but we're not there yet.

Some of us are also new to GitHub and developing a new muscle memory for using it and also learning the culture. You'll have to excuse @sergiy-k for not being able to distinguish green and grey buttons. Joke.

There is a very clear expectation from "management" that we over-index on community collaboration. They want us to publish as many of our work items as possible. "Working in the shadows" is very much the anti-pattern. They want us working in visible feature branches and all that good stuff.

We do have a bit of a bias towards Linux (it's a server platform), but we'd love to see OSX come up at the same time. I very much hope collaboration and coordination can enable that to happen.

@sergiy-k

This comment has been minimized.

Contributor

sergiy-k commented Feb 10, 2015

@janvorli, may I please ask you to create a feature branch and check in your code in there. Thank you!

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 10, 2015

@richlander Appreciate the comment.

To start things off, I'm not new to the CLR or contributing. I get the problems, I've been there @novell. I really want to help out here. That said:

I appreciate the focus on getting the code out. It's awesome. I'm also sympathetic to new teams coming up to speed. I also don't blame @sergiy-k at all.

All that said, given your statement, why is such a fundamental problem as unwinding and exception support still 'in the shadows'?

You guys are learning, great. Please don't take this issue as anything more than frustration. Your team member told me:

https://twitter.com/geoffnorton/status/564849521314631680

So I tried. I know its new guys, but defer to open.

@richlander

This comment has been minimized.

Member

richlander commented Feb 10, 2015

Thanks @kangaroo. Yes, we will defer to open.

See @sergiy-k request for another dev, @janvorli to move to a feature branch. That's goodness. We'll get more code in feature branches.

I think I need to order a t-shirt for @sergiy-k!

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 10, 2015

@richlander Oprah moment. Shirts for everyone!

@richlander

This comment has been minimized.

Member

richlander commented Feb 10, 2015

@kangaroo - this could happen!

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 10, 2015

@richlander Let it rain!

@janvorli

This comment has been minimized.

Member

janvorli commented Feb 10, 2015

@kangaroo I am sorry for not being much responsive these days, I am fighting some kind of nasty virus or bacteria. I'll create the feature branch for this work so that you can work on the Mac stuff in parallel with me. I'll migrate the work I have done till now from TFS to GIT in this branch and let you know once it is ready.
My current work has two parts, as @sergiy-k has already mentioned. The windows style unwinder that will be used for the jitted code and Unix unwinder for native code that uses the libunwind's low level unw_xxxx functions like unw_step etc.
With these changes, the stack walking that is used by the GC and the stack walking used by the System.Diagnostics.StackTrace should work.
It is likely that we will be able to use the Apple's libunwind on OSX instead of the one from http://www.nongnu.org/libunwind. We were originally considering using the libunwind that's part of the clang's compiler runtime and that I believe is essentially the same as the one in OSX (guessing from the fact that it was donated to the LLVM project by Apple). But the issue with this one is that in LLVM, it is currently not a standalone library (even though there is a discussions currently going on to change that). So we would have to pull out its sources and build them ourselves and that would take quite some time to get approved by our legal and corporate affairs department. So we have opted for the other libunwind that we can use as a binary.
The actual exception unwinding work will be the next step. I'll prepare a description on how that's supposed to work in the world where native stack frames are interleaved with the stack frames of jitted code on the stack. That should serve as a foundation for our future cooperation.

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 10, 2015

@janvorli Sounds great. Hope you feel better soon! lib unwind from apple should be able to work for the native unwinding -- I agree.

@janvorli

This comment has been minimized.

Member

janvorli commented Feb 13, 2015

@kangaroo I have created a feature branch off the dotnet/coreclr, named unix_issue177 to continue working on the stack unwinding in public. I will soon move a set of changes there so that you can help me to make changes necessary to use the existing libunwind on Mac.
FYI, the changes that I have together with the #259 allowed me to dump a correct managed code stacktrace using the System.Environment.StackTrace

@richlander

This comment has been minimized.

Member

richlander commented Feb 13, 2015

Yeahh! @kangaroo, this should make you happy. More of these will be coming.

What do you think of our branch naming pattern?

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Feb 13, 2015

Very happy @richlander :)

As for branch naming, tying things to issues is always ok, but descriptive names help so people don't have to refer back to issues. Something like:

'linux-stack-unwinding' is pretty obvious :)

@richlander

This comment has been minimized.

Member

richlander commented Feb 13, 2015

@kangaroo - we can do that, too. We talked about that. Maybe we'll try the next one that way.

We've got the "linux" part, so we're part way there.

@janvorli

This comment has been minimized.

Member

janvorli commented Feb 13, 2015

@richlander Actually, I've named it unix_ because it will be for Mac too.

@richlander

This comment has been minimized.

Member

richlander commented Feb 14, 2015

@janvorli Good point! It's all good.

@khellang

This comment has been minimized.

Contributor

khellang commented Feb 14, 2015

Just an FYI: Wit git in general, it's pretty common to use "group tokens" in branch names and separate them with /, like feature/unix-stack-unwinding (this is clearly a feature branch), bug/177, issue/177, release/1.0 etc. By having some conventions like these, it makes it much easier to keep track of branches 😄

See this SO answer for some details.

@landonf

This comment has been minimized.

landonf commented Feb 19, 2015

@janvorli FWIW, I'm happy to take on any tasks or answer any questions I can about native Mac stack unwinding -- I've previously implemented an async-safe DWARF expression interpreter, FDE/CFI lookup, compact unwind table support, etc for PLCrashReporter's (MIT-licensed) Mac/iOS unwinder -- just about everything except LSDA handling (although that will likely be coming soon).

@janvorli

This comment has been minimized.

Member

janvorli commented Feb 19, 2015

@landonf Thank you! Actually, the missing piece in the Mac libunwind is an API to get context pointers, which are pointers to stack locations where registers are stored in the frame (provided they are stored there). The GC uses that when it moves objects pointed to by registers. But it seems we will have to live with the fact we don't have an API for that on Mac and rather implement an workaround in the coreclr.

@landonf

This comment has been minimized.

landonf commented Feb 19, 2015

@janvorli I have some MIT-licensed code I could repurpose (and clean up to match local requirements) that would be suitable for computing frame offsets based on local decoding of the compact unwind and DWARF data:

However, I couldn't find licensing requirements in the contributor guidelines, so I don't know what's kosher in terms of contribution licensing.

@landonf

This comment has been minimized.

landonf commented Feb 19, 2015

@janvorli also, if you need this from the GC, do you also need async-safety? (re: #193 (comment))

@janvorli

This comment has been minimized.

Member

janvorli commented Feb 21, 2015

@landonf Could your code be used somehow as an addition to the standard OSX libunwind used only for getting the register offsets or would it have to replace the libunwind completely?
Regarding the async safety, I'm not sure. @jkotas, how does walking stack of a running thread for a GC work? I know we have a barrier between the managed / unmanaged code that prevents entering the managed code if the GC is scanning the managed part of the stack and native code is currently running. Does it mean that the GC pointers in registers would never be stored on the stack above (if you view the stack as growing upwards) the frame representing the function with the barrier?

@Nassiel

This comment has been minimized.

Nassiel commented Feb 21, 2015

Hi, I found that while using CentOS 7 and using the latest push at this moment on master my cmake building doesn't find libunwind.h

I already install it and it's located at /usr/include/libunwind.h (with x64_86 version), it's the typical include location and I'm pretty sure that cmake search there. Cmake told me about using libunwind8, to my this have no sense over CentOS 7 there are just one libunwind-devel package version 1.1-3.el7.x86_64.

Also I tried to download libunwind git project, and make installed it with the same result.

Any idea?

@janvorli

This comment has been minimized.

Member

janvorli commented Feb 21, 2015

The cmake's check_include_files basically just tries to compile the following:

#include <libunwind.h>
int main()
{
    return 0;
}

So it looks like include paths don't contain the /usr/include folder or the header for some reason refers to a symbol that is not defined.
You can look into the CMakeError.log, there should be info on what has exactly failed.
It should say something like "Determining if files libunwind.h exist failed with the following output:"

@Nassiel

This comment has been minimized.

Nassiel commented Feb 21, 2015

I did, here is the error and does not related with libunwind.h, is related with ucontext.h:

/usr/include/sys/ucontext.h:137:5: error: unknown type name 'stack_t'
stack_t uc_stack;

http://stackoverflow.com/questions/20778735/is-the-type-stack-t-no-longer-defined-on-linux

@janvorli

This comment has been minimized.

Member

janvorli commented Feb 21, 2015

Ah, then it is the #300 issue

@Nassiel

This comment has been minimized.

Nassiel commented Feb 21, 2015

Oh my! Thank you now is solved, you are planning to include it by default? Or incompatible with other linux distros?

@janvorli

This comment has been minimized.

Member

janvorli commented Feb 21, 2015

As you can see in #300, the proper solution is still being discussed.

@jkotas

This comment has been minimized.

Member

jkotas commented Feb 21, 2015

Does it mean that the GC pointers in registers would never be stored on the stack above (if you view the stack as growing upwards) the frame representing the function with the barrier?

Correct. The managed->unmanaged transition has to ensure that there are no unprotected GC pointers in registers - e.g. for PInvokes, the JIT has to spill all registers containing GC references onto the stack.

I do not think that we need any special handling of async safety in the native unwinder. All async safety issues should be taken care of by the thread suspension that does not depend on the native unwinder.

@landonf

This comment has been minimized.

landonf commented Feb 21, 2015

@janvorli The parsing/interpretation API is factored out from the actual unwinding API, so it would be possible to use separately; the only issue would be paying twice for lookup and interpretation.

(I suppose I could also in theory expose a libunwind-compatible interface; either way, I've been wanting to refactor this code so it can be easily used for runtime implementation outside of PLCrashReporter).

@jkotas Please forgive my ignorance :-) -- does that mean the unwinder doesn't need to run while threads are suspended in userspace, including any syscall invocations or other calls into non-managed state (on OS X, even traditional pure 'syscalls' are trampolined through libSystem, and may acquire+drop locks in the process).

@jkotas

This comment has been minimized.

Member

jkotas commented Feb 21, 2015

the unwinder doesn't need to run while threads are suspended in userspace

Correct.

@kangaroo

This comment has been minimized.

Contributor

kangaroo commented Mar 19, 2015

I think we're done with this now @jkotas ?

@jkotas

This comment has been minimized.

Member

jkotas commented Mar 19, 2015

Agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment