
EXC_BAD_ACCESS in decodeMessage(decoder:) due to stack overflow #1034

Closed
n8gray opened this issue Aug 7, 2020 · 28 comments
n8gray commented Aug 7, 2020

Note: This is likely a duplicate of #1014 but I didn't want to cram all of this into a comment in a long chain. Feel free to close as duplicate if you want.

Summary/TLDR

The change to move computed properties backed by _storage to properties stored directly on the message structs has caused massive size increases for messages at the top of a deep hierarchy. The switch code generated for oneof parsing is sensitive to these size increases, leading to massive stack frames. In debug mode without optimization these stack frames can easily exhaust the 512KB stack of a background dispatch thread and cause a crash.

Details

We are experiencing a crash in debug builds after upgrading SwiftProtobuf and protoc-gen-swift from 1.7 to 1.10.2. Our crash happens in the decodeMessage(decoder:) method of a message containing a oneof with lots of cases. From what I can tell the stack frame generated for the withExtendedLifetime closure in that method is huge, about 800KB. Since the decoding is happening on a background thread with a 512KB stack, the app crashes as soon as this closure is invoked.

The message is of the form:

message RenderDataDTO {
  SomeMessage1 m1 = 1;
  SomeMessage2 m2 = 2;
  string m3 = 3;
  oneof render {
    SomeCaseMessage1 case1 = 100;
    SomeCaseMessage2 case2 = 101;
    SomeCaseMessage3 case3 = 102;
    // 20 cases in total
    SomeCaseMessage20 case20 = 120;
  }
}

The code that gets generated for decodeMessage(decoder:) has the form:

  public mutating func decodeMessage<D: SwiftProtobuf.Decoder>(decoder: inout D) throws {
    _ = _uniqueStorage()    // LINE A
    try withExtendedLifetime(_storage) { (_storage: _StorageClass) in    // LINE B
      while let fieldNumber = try decoder.nextFieldNumber() {
        switch fieldNumber {
        case 1: try decoder.decodeSingularMessageField(value: &_storage._m1)
        case 2: try decoder.decodeSingularMessageField(value: &_storage._m2)
        case 3: try decoder.decodeSingularStringField(value: &_storage._m3)
        case 100:
          var v: SomeCaseMessage1?
          if let current = _storage._render {
            try decoder.handleConflictingOneOf()
            if case .case1(let m) = current {v = m}
          }
          try decoder.decodeSingularMessageField(value: &v)
          if let v = v {_storage._render = .case1(v)}
        case 101:
          var v: SomeCaseMessage2?
          if let current = _storage._render {
            try decoder.handleConflictingOneOf()
            if case .case2(let m) = current {v = m}
          }
          try decoder.decodeSingularMessageField(value: &v)
          if let v = v {_storage._render = .case2(v)}
       // etc...

If I set a breakpoint at the line marked LINE A and check the stack pointer, then set a breakpoint at LINE B and do the same I find that the difference is about 800KB.

Here's one example:

RenderDataDTO.decodeMessage(decoder:)
       rsp = 0x00007000087cdb60

In closure:
       rsp = 0x0000700008704780   <-- 800KB jump!

I have a couple of observations to explain this:

Observation 1:

In the earlier version of swift-protobuf, almost all fields of Message structs were stored in _storage, a reference-typed container, and exposed as computed vars. In the current version they seem much more often (but not always?) to be implemented as stored properties on the message itself. Thus the size of a leaf message struct with no child messages became O(n) instead of O(1), where n is the number of fields in the message.

The story gets a lot worse when you have child messages. Since messages are structs, children get embedded directly in their parents, so the size of a non-leaf message grows like the sum of its entire message sub-tree. So the size scales more like O(n^m), where n is the average number of children per level and m is the average number of levels in the message hierarchy. Since Swift always likes to allocate structs on the stack, this can lead to massive stack frames.

This agrees with what I saw when I measured the size of the SomeCaseMessageN messages using MemoryLayout<SomeCaseMessageN>.size. In the older code, they were all 24 bytes, with a few exceptions. The largest was 80 bytes. In the new code the sizes tended to increase by 10X-20X. 224 and 448 bytes were typical sizes, with several messages growing above 1000B. One grew to 5256 bytes, while the biggest was 6080 bytes.
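To make the size growth concrete, here is a small self-contained sketch (hypothetical Leaf/Parent types, not our actual protos) showing how inline struct fields make a parent's size the sum of its subtree, while a single reference to a backing class keeps it pointer-sized:

```swift
// Hypothetical types illustrating inline vs. indirect storage costs.
struct Leaf {                 // four Doubles -> 32 bytes
    var a = 0.0, b = 0.0, c = 0.0, d = 0.0
}

struct InlineParent {         // embeds four leaves inline -> 4 * 32 = 128 bytes
    var l1 = Leaf(), l2 = Leaf(), l3 = Leaf(), l4 = Leaf()
}

final class Storage {         // heap-allocated backing store for the same fields
    var l1 = Leaf(), l2 = Leaf(), l3 = Leaf(), l4 = Leaf()
}

struct IndirectParent {       // stores only one reference -> 8 bytes on 64-bit
    var storage = Storage()
}

print(MemoryLayout<Leaf>.size)            // 32
print(MemoryLayout<InlineParent>.size)    // 128
print(MemoryLayout<IndirectParent>.size)  // 8
```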

Observation 2:

If the compiler is smart, it will know that all the cases in the switch statement inside that closure are mutually independent. Only one of them will ever execute per invocation, so their local variables can share space on the stack, and the size of the stack frame will be the size of the biggest case in the switch. Since we're in debug mode it might not be making that optimization, and instead allocating separate space for every case in the switch. Given the size impact of the changes I discussed above, this could add up to a lot of space.

In fact, when I change the optimization mode from none to size or speed, the message sizes don't change but the size of the stack frame drops to about 26KB. That's still an oversized stack frame eating 5% of the stack in release mode, but at least it's not bigger than the entire stack!

For reference, with the code generated from 1.7, the stack frame is about 16KB with optimization mode none. With optimization mode speed the stack frame drops to just 400B.

Wrapping up

There seem to be problems with putting so many stored properties on messages. Perhaps this could be dialed back to where it was before?

Environment:
OS: MacOS Catalina 10.15.6
Xcode: Version 11.6
Swift 5

n8gray commented Aug 7, 2020

I'm guessing this is the PR that led to this problem: #900

ydnar commented Aug 7, 2020

Hi, author of #900 here. This doesn’t look like a particularly fun bug—thanks for the diagnosis. The rationale for #900 was to reduce ARC / heap pressure, primarily for use cases involving lots of small messages.

In other languages, message fields are typically stored by reference, rather than by value (an Optional in Swift being just that, not syntactic sugar on a pointer), which makes passing deeply-nested messages cheap. Passing deeply-nested messages by value is certainly less cheap.

  • I’m curious if when this package was originally developed, were class-based messages (versus structs) considered?
  • The underlying Optional for message fields could theoretically be replaced with a reference-based storage class (rather than storing all fields in a storage class).
  • Could you share the proto that results in a 6KB message?
  • It sounds like you preferred the previous reference (versus value) behavior for message fields. Are you copying trees of messages?
  • Last, the stack growth for the oneof switch statement seems fixable.

n8gray commented Aug 7, 2020

Hi, thanks for the response! To answer your questions:

  • I wasn't around when this app was created so I don't know what was considered. Are class-based messages an option? I would expect them to be a better fit for our use case.
  • Using a reference for message fields would help a lot, and keep the size of messages linearly proportional to the number of fields. My thought (based on the conversation in Revisit when to use heap based storage for messages. #735) would be to use value-based storage while MemoryLayout<MessageType>.size <= 24, then switch to reference-based storage. If it's too hard to find a solution that works for everybody, maybe offer an option to choose the storage strategy.
  • I can't share the proto files publicly. The message tree represents the data and metadata of a complex interactive data visualization of the type seen here. Here's an illegible screenshot of the message subtree rooted at that particular message:
    [image: message subtree diagram]
    As you can see, the hierarchy is very deep, and this is only one subtree.
  • Yes, and yes. 😄 Value types are great for small flat chunks of data, but terrible for tree-structured data without indirection. The current SwiftProtobuf is unusable for us.
  • Does it? What do you have in mind?

Lukasa commented Aug 8, 2020

Given that the encoder now assumes the protobufs will be value types, we could add some flexibility here by modifying the codegen. There are a lot of use cases where the reduced allocator pressure and improved data locality of simple structs is really valuable, and I’d like to find a way to keep it.

So, what if we considered a codegen option that would instead generate class-backed structs: that is, structs whose only stored member is a class, and that use entirely computed properties for their fields. These fields would still be stored properties on the class, giving good load performance, but the structs would no longer be nested inline; the backing storage would be scattered over the heap as it would be for classes. This keeps the struct sizes down.

cburrows commented Aug 8, 2020

If the stack usage in the generated function is not a Swift compiler bug, then it might make sense to actually count oneof fields while computing useHeapStorage in the MessageGenerator.

If that isn't sufficient then it seems like weighting fields by their size in the useHeapStorage calculation makes sense too, as suggested.

Either way I think understanding why the function is using so much stack is a great idea. Perhaps we could do better just in how we lay out the code, like by using nested functions.

@n8gray , out of curiosity, if you add a field of type "Tree Root Message" to the declaration of the problematic message (and obviously never set it), does the generated code then work? This will trigger useHeapStorage because of the declaration cycle.

I am not in favor of adding an option for this. I mean, just pondering the question of how you would document it reveals that it wouldn't be clear or usable.

thomasvl commented Aug 8, 2020

I’m curious if when this package was originally developed, were class-based messages (versus structs) considered?

ObjC protos are the reference model, and it causes no end of problems with folks getting threading wrong, especially with a deep chain like you have. Folks have to remember that once they set a message into a field they can't touch it again, since the message is referencing it. But if the messages are used for models or to collect things and send them off for logging, it is way too easy to accidentally end up making changes, setting it as a field on a message to send to a server, and then continuing to mutate it for the next thing to send out. Even if you try to copy the message on field setting (which greatly slows things down), there are still potential issues with repeated fields and map fields, since you can't force a copy on the way into those structures without rolling your own custom ones. So value types make it much harder to have threading issues.
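A tiny sketch of the value-semantics point (hypothetical Inner/Envelope types): once a value-typed message is set into a field, later mutations of the original can't reach the stored copy:

```swift
// With value-typed messages, setting a message into a field copies it,
// so mutating the original afterwards can't change what was stored.
struct Inner { var text = "" }
struct Envelope { var payload = Inner() }

var msg = Inner(text: "first")
var envelope = Envelope()
envelope.payload = msg        // copies the value into the field
msg.text = "second"           // mutate the original afterwards...
print(envelope.payload.text)  // prints "first": the stored copy is unaffected
```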

The current calculation for when to use storage has always been a bit of a minimal guess, and it is too shallow. I've got some ideas I've toyed with in the past to start computing real costs of objects to better decide when to transition to storage, which should help the general problem here. But all of those past thoughts have been purely around the stack/heap costs; I hadn't looked too closely at the exact cost of the oneof enums, and definitely hadn't thought about the cost of the extended lifetime calls.

We should be able to adapt to a better computed cost as a trade off between the old and current approaches, which will hopefully solve this specific problem, and it can always continue to be tuned in the future. I'll try to take a look at this early next week.

@Lukasa the Storage model already is class-backed storage. As mentioned in the FAQ, options that impact how things generate are rarely a good path, because inevitably you end up with someone having a dependency on protos generated one way, and someone else depending on the same message(s) but wanting to generate them with another set of options.

allevato commented Aug 8, 2020

If the compiler is smart, it will know that all the cases in the switch statement inside that closure are mutually independent. Only one of them will ever execute per invocation, so their local variables can share space on the stack, and the size of the stack frame will be the size of the biggest case in the switch. Since we're in debug mode it might not be making that optimization, and instead allocating separate space for every case in the switch.

This is exactly what's happening—SwiftSyntax had the same issue with its visitation methods for syntax trees because they contained very large switch statements that created instances of value types in the cases. They solved it by moving each individual case into its own function.

I don't think we need to do something that drastic here. There's a much easier change that I think we can make to reduce stack usage for debug builds—we just update the generator so that we nest the body of each case in an immediately executed closure:

      switch fieldNumber {
        case 1: try { try decoder.decodeSingularMessageField(value: &_storage._m1) }()
        case 2: try { try decoder.decodeSingularMessageField(value: &_storage._m2) }()
        case 3: try { try decoder.decodeSingularStringField(value: &_storage._m3) }()
        case 100: try {
          var v: SomeCaseMessage1?
          if let current = _storage._render {
            try decoder.handleConflictingOneOf()
            if case .case1(let m) = current {v = m}
          }
          try decoder.decodeSingularMessageField(value: &v)
          if let v = v {_storage._render = .case1(v)}
        }()
        ...

(I'm probably missing some explicit selfs that would be required by doing that; SE-0269 may reduce them but we'd have to adopt a Swift 5.3 minimum to rely on that.)

Even though the oneof cases like case 100 are the ones using obvious local stack storage, I believe even the others will use some as intermediate storage (at least in an unoptimized build) to handle the retrieval and modification of the _storage fields, so the surest way to reduce the stack usage is to wrap all of them.

The double-try requirement (one for the throwing operation inside the closure, and then again around the outer closure) is unfortunate, but I'm not sure there's a way to avoid it.

This will result in an extra function call for debug builds, but optimized builds should remove the closure completely. But by moving each case into its own "function", we ensure that it only occupies space on the stack when that case is executed, instead of the sum of all the cases being on the stack for the entire execution of the outer function.

There are other improvements we should look into to remove the likelihood that the memory size of messages is ever excessively and surprisingly large, but in the meantime, the above should be a straightforward change that would still be useful independent of other improvements.
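A standalone sketch of the pattern (hypothetical decodeInt/handle names, not the generated API), showing the double-try shape and a case-local variable confined to its closure's frame:

```swift
// Each case body runs in an immediately executed closure, so in an
// unoptimized build its locals only occupy stack space while that
// case actually executes.
enum DecodeError: Error { case malformed }

func decodeInt(_ raw: Int) throws -> Int {   // stand-in for a throwing decode call
    guard raw >= 0 else { throw DecodeError.malformed }
    return raw
}

func handle(fieldNumber: Int, raw: Int) throws -> Int {
    var result = 0
    switch fieldNumber {
    // Outer `try` covers the closure call; inner `try` covers the decode.
    case 1: try { result = try decodeInt(raw) }()
    case 2: try {
        var values: [Int] = []               // local confined to this closure
        values.append(try decodeInt(raw))
        result = values.reduce(0, +)
    }()
    default: throw DecodeError.malformed
    }
    return result
}

print(try! handle(fieldNumber: 2, raw: 21))  // prints "21"
```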

n8gray commented Aug 9, 2020

@allevato Brilliant! That sounds like a great solution to the stack frame problem.

@thomasvl It’s good to hear you’re willing to take this on! I agree it’s best to avoid knobs and switches when possible, so I hope you can find heuristics that strike the right balance and avoid these problems with deep hierarchies.

n8gray commented Aug 9, 2020

@cburrows I’d try out your idea for adding a recursive reference but I’ve reverted all my pods and brews back to 1.7 at this point. I’m sure it would succeed at forcing that message to use heap allocation, but there are many other messages in our app that will have the same problem.

tbkka commented Aug 9, 2020

I can clarify the thinking behind the current design a bit.

First of all, class vs. struct is a somewhat over-simplified way to think about this. The two questions that really matter are:

  • value semantics vs. reference semantics ? and
  • inline storage vs. indirect storage?

The first question involves what happens when you copy and then mutate a value: Reference semantics cause all copies to reflect the mutations; value semantics mean that mutations do not affect other copies. (In particular, as @thomasvl pointed out, value semantics have real advantages for multi-threaded code.) The latter question involves how the storage gets actually allocated. Traditional Obj-C style classes provide reference semantics and indirect storage. Naive use of Swift structs give you value semantics and inline storage. But value semantics are not incompatible with indirect storage; indeed much of the standard library (String, Array, etc) provides value semantics with indirect storage by using a struct or enum as a facade for a separate heap-allocated data store.
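As a minimal sketch of that facade pattern (hypothetical Message type, not SwiftProtobuf's actual generated code), a struct can keep value semantics over a heap-allocated store by copying on write:

```swift
// Value semantics with indirect storage: the struct is a facade over a
// heap-allocated class, copied only when a mutation would be observed.
struct Message {
    private final class Storage {
        var field: Int
        init(field: Int) { self.field = field }
        func copy() -> Storage { Storage(field: field) }
    }

    private var storage = Storage(field: 0)

    var field: Int {
        get { storage.field }
        set {
            // Copy the backing store only if another struct shares it.
            if !isKnownUniquelyReferenced(&storage) {
                storage = storage.copy()
            }
            storage.field = newValue
        }
    }
}

var a = Message()
var b = a                 // copies only the reference
b.field = 42              // copy-on-write triggers here
print(a.field, b.field)   // prints "0 42": `a` keeps value semantics
```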

So to answer the real questions:

  • SwiftProtobuf has always provided value semantics for the generated types. This was a very deliberate decision.
  • Internally, SwiftProtobuf has support logic for both inline and indirect storage, but this is always hidden from the user. (In particular, any generated _Storage class is kept private.) Currently, we either store all fields inline or put all of them in the backing _Storage class, but it should be possible to split this, storing some fields inline (directly in the struct facade) and others indirectly (in the _Storage class) for a single generated message.

The difficult piece has always been deciding what fields should be stored inline and what should be stored indirectly. There is really only one hard requirement: reference cycles cannot be stored entirely inline. But otherwise, we have a lot of options to consider.

I'm not eager to provide options to control the layout. I think we might have to do that someday, but I'd rather focus our attention now on finding better heuristics so that most users can get good results without having to manually tweak a lot of options.

Here are some ideas I would like to explore:

  • For oneof fields, we could consider making the generated enum be indirect if any case had a message type.
  • We could store some fields inline and some in the backing _Storage class. This would require some work on the generator to allow storage and accessor differentiation on a per-field basis.
  • Once we are able to make per-field storage decisions, we have a lot of choices: Protobuf encoding is designed to be more efficient for fields numbered <= 15, so it might make sense to prefer inline storage for those; we could store submessages inline if we knew they were "simple enough"; we could estimate the in-memory size of the fields and use that to guide decisions; many other possibilities come to mind.
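The first idea can be sketched in plain Swift (hypothetical types): marking an enum `indirect` boxes its payloads on the heap, so the oneof enum stays pointer-sized regardless of how large the case payloads are:

```swift
// A large stand-in message: 64 bytes of inline storage.
struct BigMessage {
    var a = 0.0, b = 0.0, c = 0.0, d = 0.0
    var e = 0.0, f = 0.0, g = 0.0, h = 0.0
}

enum InlineOneOf {            // payload stored inline: enum is payload-sized
    case big(BigMessage)
    case number(Int)
}

indirect enum IndirectOneOf { // payload boxed on the heap: enum is one pointer
    case big(BigMessage)
    case number(Int)
}

print(MemoryLayout<InlineOneOf>.size)    // >= 64: dominated by the inline payload
print(MemoryLayout<IndirectOneOf>.size)  // 8 on 64-bit: just a tagged reference
```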

tbkka commented Aug 9, 2020

@n8gray -- It looks like your message hierarchy might be entirely tree-structured, which raises an interesting question: Would it make more sense for each sub message to be a separate heap allocation? Or would it be better for the top-level message to be allocated on the heap as a single 6k structure with the submessages inlined into that storage?

n8gray commented Aug 9, 2020

@tbkka Thanks for laying out the history and trade offs in a clear and concise way. As I mentioned earlier, the protobufs in our app represent the structure and data of complex interactive data visualizations, so they are large, deep trees. The root of the tree is usually just some FooRequest or FooResponse, with subtrees for data, metadata, filters, settings, contexts, etc. This is, I’d say, the diametric opposite of the use case in #733 where they were working with massive volumes of small messages. Instead, we have one giant message that is split apart for use in multiple subsystems. As the app is developed, new message cases, levels, and entire subtrees are constantly being added and removed by a team of dozens of developers.

With that in mind, the answer to your question is that pushing the inlining down one level is not going to help, as the next level will likely be just as bad, and if it’s not today it could easily be tomorrow. (In fact the “Root object” we are discussing is not top level at all, it just happened to be at a level that had a lot of oneof cases and triggered the stack problem.)

Aggressive inline storage is never going to work for our use case as a general strategy. It’s a great optimization to prevent extra boxing at the leaves of the tree, and to get cost-free abstraction when you want to wrap a single item or two in a message for whatever reason, but that’s probably it. We need a solution that reliably scales to lots of fields, lots of depth, and lots of oneof cases. Indirect storage is reliable that way, at the expense of performance penalties at the leaves of the tree. Fine for us, not great for #733.

The more I think about it the harder it seems to find a strategy that will work for both. Hopefully you folks have more imagination. 😉

@thomasvl

Catching back up on the comments -

The switch mentioned is in decoding, but we also generate switch statements for some of the oneof handling: required field validation, generating equality code, and traverse/visit. Will they have the same stack-related issues when the optimizer is off (and if the message has enough large message fields in the oneof)? Those all also generate case xxx(let yyy) in the switch, so would the closure trick work there, since the let would be outside the closure scope?

@allevato

All switch statements would be affected, so if a branch is likely to involve a large message, it could hit the same problem.

In general, a function's stack usage in debug builds will grow proportionally to the sum of the local storage of the cases in its switch statements, and the same applies to if/else if... branch structures. My completely uneducated guess (i.e., not going to look at IRGen before speaking) is that debug builds just give each basic block in the generated code its own space for locals, even when execution is mutually exclusive.

Binding an associated value in a case makes things a little trickier, but we can still work around it. It's ugly though. Instead of this:

switch foo {
  case .bar(let value): try doSomething(with: value)
}

we'd do this:

switch foo {
  case .bar: try {
    guard case .bar(let value) = foo else { preconditionFailure() }
    try doSomething(with: value)
  }()
}

Using preconditionFailure() here instead of fatalError() is important; the former will get removed in optimized builds (making this the same as the original code, at least in the simple tests I tried), but the latter wouldn't be.
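Here's a self-contained version of the rebind trick (hypothetical Shape/area names) that compiles and runs as written:

```swift
// The associated value is re-extracted inside the closure, so the
// (potentially large) payload only lives in that closure's frame.
enum Shape {
    case circle(Double)
    case rect(Double, Double)
}

func area(of shape: Shape) -> Double {
    switch shape {
    case .circle: return { () -> Double in
        guard case .circle(let r) = shape else { preconditionFailure() }
        return Double.pi * r * r
    }()
    case .rect: return { () -> Double in
        guard case .rect(let w, let h) = shape else { preconditionFailure() }
        return w * h
    }()
    }
}

print(area(of: .rect(3, 4)))  // prints "12.0"
```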

If we go and make any of these changes, we should consider them carefully—we're giving the compiler/optimizer more work to do, for example, so we need to make sure that by reducing stack usage we're not raising compile time (for both types of builds), especially for targets with large messages.

@thomasvl

Using preconditionFailure() here instead of fatalError() is important; the former will get removed in optimized builds (making this the same as the original code, at least in the simple tests I tried), but the latter wouldn't be.

That almost seems like it might rely on another compiler bug then. If the body of the guard...else becomes empty, then how is the return on the else satisfied? Shouldn't the compiler catch that in some optimization modes, the guard...else is invalid because it doesn't return?


tbkka commented Aug 10, 2020

@n8gray Thanks for clarifying.

... pushing the inlining down one level is not going to help, as the next level will likely be just as bad, and if it’s not today it could easily be tomorrow ...

I wasn't really proposing pushing the inlining down exactly one level -- there's no really practical mechanism for determining what that means -- but rather trying to better understand whether your concern was about a few large top-level messages or a larger number of intermediate messages. As you pointed out, inlining still looks like a good choice for small messages, and #733 is really about how we can avoid unnecessary boxing for very small messages (~32 bytes in that case).

... expense of performance penalties at the leaves of the tree ...

How big are your leaf messages? What would happen in your use case if messages smaller than about 64 bytes were stored inline? (Allocating a single object on the heap has about that much overhead by the time you add up the Swift object header, malloc data structures, and allocation rounding.)

@allevato

That almost seems like it might rely on another compiler bug then. If the body of the guard...else becomes empty, then how is the return on the else satisfied? Shouldn't the compiler catch that in some optimization modes, the guard...else is invalid because it doesn't return?

So I was being a little loose with terminology above. Here's the behavior:

fatalError unconditionally prints a message and then unavoidably traps (Builtin.int_trap). The optimizer won't remove this, no matter what.

preconditionFailure only prints a message in debug mode. Then, in both debug and release mode, it invokes a special Builtin.conditionallyUnreachable, which is treated differently than the trap above. In this case, the optimizer sees that the code inside the guard ... else block is unreachable so it can drop the condition entirely and just bind the associated value directly.


n8gray commented Aug 10, 2020

@tbkka

How big are your leaf messages? What would happen in your use case if messages smaller than about 64 bytes were stored inline? (Allocating a single object on the heap has about that much overhead by the time you add up the Swift object header, malloc data structures, and allocation rounding.)

It's hard to generalize since we have over 100 .proto files, but I wouldn't expect a 64 byte limit to present major issues. We're probably building structs in our own code that are around that size. I honestly don't think our app is all that sensitive to the absolute performance of allocating protobufs. We basically make one big request when you enter a visualization and then some smaller requests when you select different data points, but I'm sure the network latency dwarfs the (de)serialization costs. My main performance concern would be making sure whatever inline storage limit gets set is respected no matter how deep or wide the tree gets, so there aren't sneaky edge cases where you end up with 1KB structs. Hard-to-reproduce stack overflow crashes are way more frightening than an extra 20ms of parsing time or an extra MB of heap overhead.

@thomasvl

Also, re this being a duplicate of #1014: I don't believe so. #1014 is crashing in some of the Swift initialization code. Yes, the amount of stack used during decode also plays into that, but it is always going to be possible for that Swift internal runtime code to need more space than what is available in some context. Fixing the issues here won't guarantee that some other case couldn't still fail in the same way #1014 fails.

@allevato

Some more quick findings:

Having a try in the body of the case still results in a small amount of local stack usage for that case, so we can't eliminate the local growth completely if we want to propagate error state out:

      switch fieldNumber {
        case 1: try { try decoder.decodeSingularMessageField(value: &_storage._m1) }()
        case 2: try { try decoder.decodeSingularMessageField(value: &_storage._m2) }()
        case 3: try { try decoder.decodeSingularStringField(value: &_storage._m3) }()

The outer trys on each of these still use a small amount of space. I tried a few things, like wrapping them in Results that got immediately assigned to some variable outside the switch, but it looks like the codegen still assigns some temporary space inside the basic block (and more than try would have used) that doesn't get eliminated until the optimizer passes run.

Whether the message uses a storage class or not also makes a difference. If we pass a stored property of a struct to a function using inout, it looks like even in debug mode, no extra stack space will be used to read/modify/write that value (at least in the simple tests I tried). If we change it to a class instead, then stack space is used (and significantly more), likely due to all the extra retains/releases inside each branch, but also space is used to read/modify/write the value instead of doing it in-place. Fortunately we're already doing the right thing to eliminate the retains/releases by calling withExtendedLifetime.

So if we go the direction of wrapping each case in a closure, we could possibly avoid the closures for fields like cases 1, 2, 3 in messages that don't use a storage class, but we have to ask whether saving that extra call instruction and prolog/epilog are worth the added complexity in the generator.

@thomasvl

@allevato opened a forums thread on this also: https://forums.swift.org/t/uneconomical-stack-usage-in-non-optimized-builds/39268

thomasvl added a commit to thomasvl/swift-protobuf that referenced this issue Aug 11, 2020
@thomasvl

First PR up. In theory, we could attempt to compute the cost of each field and only use this closure workaround when things are over some size threshold, but given that a message or message chain can be recursive, there's no way to tell how expensive those would be, so those cases would still have to use closures for everything. It seems better to keep the generator simple: if we generate a closure anywhere, always generate a closure; i.e., less variation in the generated code, less to prove correct.

thomasvl added a commit to thomasvl/swift-protobuf that referenced this issue Aug 11, 2020
The helper is fileprivate, so the compiler should always inline it, but
this will help when dealing with stack usage for non optimized builds
(apple#1034).
thomasvl added a commit that referenced this issue Aug 11, 2020
thomasvl added a commit to thomasvl/swift-protobuf that referenced this issue Aug 11, 2020
thomasvl added a commit that referenced this issue Aug 13, 2020
thomasvl added a commit that referenced this issue Aug 13, 2020
thomasvl added a commit that referenced this issue Aug 13, 2020
@thomasvl

@n8gray if you can test with master, I'd be curious how much the just merged changes help your situation. I'll work on when Storage is used next, but the first data point on your real world case would also be useful.


n8gray commented Aug 13, 2020

Thanks, I'll try to check it out today.


n8gray commented Aug 13, 2020

@thomasvl The new codegen has solved the stack issue for decodeMessage. The stack frame is now 736 bytes in debug builds. Nice work! 👏

thomasvl added a commit to thomasvl/swift-protobuf that referenced this issue Aug 17, 2020
Look at the recursive cost of message fields also, so a nesting of messages all
just under the limit doesn't result in a top level message over the limit.

This also addresses some of the issues from apple#1034 and hopefully from apple#1014.

This is also work on apple#735
thomasvl added a commit that referenced this issue Aug 19, 2020
@thomasvl

Second round of work, which makes the calculation around use of Storage transitive (and thus consistent), is now on HEAD. That should reduce the usage a little more.

We'll let this sit on HEAD for a few days before considering cutting a new release to get these changes out to everyone.

While there are other places where we likely could make more codegen changes (the if cascades), I'm going to call this fixed; if folks find other concrete cases where things fail, we'll go after them in new issues.


n8gray commented Aug 19, 2020

I just did some testing on 6806de7 and it looks a lot better. The sizes of the message structs I mentioned before have dropped significantly:

Size: 6080 -> 24
Size: 5256 -> 160
Size: 1944 -> 24
Size: 1400 -> 24
Size: 1096 -> 24
Size: 704 -> 144
Size: 456 -> 144
Size: 448 -> 72
Size: 440 -> 24
Size: 432 -> 24
Size: 256 -> 256
Size: 224 -> 224
Size: 224 -> 224
Size: 208 -> 208
Size: 136 -> 136
Size: 128 -> 128
Size: 88 -> 88
Size: 80 -> 80
Size: 80 -> 80
Size: 48 -> 48
Size: 48 -> 48

I still have misgivings about letting structs grow to 256 bytes but the worst case seems to be under control. Thanks!


thomasvl commented Aug 19, 2020

Yea, as mentioned on the other issue and commits, we've now got a framework in place to actually use real costs going forward which should be much better.
