Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve binary size of generated code #908

Closed
27 of 31 tasks
treiher opened this issue Jan 18, 2022 · 53 comments
Closed
27 of 31 tasks

Improve binary size of generated code #908

treiher opened this issue Jan 18, 2022 · 53 comments
Assignees
Labels
generator Related to generator package (SPARK code generation)

Comments

@treiher
Copy link
Collaborator

treiher commented Jan 18, 2022

Potential improvements:

  • Enable pragma Restrictions (No_Exceptions)
  • Enable pragma Restrictions (No_Secondary_Stack)

The code size will be calculated by the size of the .text section of the generated binary minus the size of the .text section of a reference binary that is built from an empty main procedure.

Important compiler switches:

  • -ffunction-sections, -fdata-sections, -Wl,-gc-sections
  • -Os
  • -gnatp
  • -fno-inline / -gnatd.8 (for GNAT 22.1, not needed for 23.0w)
  • -fno-early-inlining

Improvements in the code generator:

  • Group case statements with identical branches into a single case

  • Group if-else statements with identical branches into a single expression

  • Replace case statements that use exceptions with assertions (at least if many/all branches are equal)

  • Replace case statements that generate boolean values with single boolean expressions

  • Rewrite case statements in expression functions that return case based enum values as regular functions with single returns

  • Remove code duplications in RFLX_Generic_Types (merge Insert functions)

(Part 1)

  • Remove exceptions and enforce pragma Restrictions (No_Exceptions)
  • Inline Set_Field_Value_* into Set_*
  • Simplify Incomplete_Message and Initialized by using quantified expressions (no effect on binary size, but maybe positive effect on proof time)
  • Simplify Valid_Length
  • Simplify Path_Condition
  • Improve Field_Size
  • Split first part of Set_* functions
  • Defer dereferencing of buffer pointers for Extract and Insert
  • Simplify Composite_Field

(Part 2)

  • Remove duplicated finalization code for session states

(Part 3)

  • Use common integer type (e.g., U64) in context for representing field values to prevent need for variant records (i.e. Field_Dependent_Value) and therefore case statements

(Part 4)

(Part 5)

  • Move Has_Buffer of Verify into precondition

(Part 6)

  • Replace RFLX_Exception variable with gotos. Currently this variable is used to stay inside a block and call Update on the buffer at the end of that block before jumping via the goto. However it seems to be more size efficient to call the Update and jump directly.
  • Replace the pattern Available_Field (Ctx, Field) >= Field_Size (Ctx, Field) with separate function.

(Part 7)

  • Some scalar fields that do not affect the path through the message (e.g. long lists of flags) should be settable with less checks.
    • For message aggregates:
      • Check Available_Space once (Message.size must consider message path)
      • Do not check Valid_Next

(Ideas for messages)

(Ideas for sessions)

(Rejected ideas)

  • Add Inline_Always to RFLX.RFLX_Generic_Types.(Insert|Insert_LE|Extract|Extract_LE) (no effect when using 23.0w)
  • Split Get_Field_Value, Set_Field_Value and Verify into separate functions (negative impact for Get_Field_Value and Verify)
  • Refactor Successor and Valid_Predecessor (negative impact)
  • Remove duplication in Valid, Invalid and Incomplete for fields and cursors (slight negative impact on binary size)
  • Remove duplications in Field_Condition, Path_Condition and (Structural_|)Valid_Message by adding Link_Condition function (no impact on binary size)
  • Add precalculation of Val.Fld to Set
@treiher treiher added the generator Related to generator package (SPARK code generation) label Jan 18, 2022
@treiher treiher added this to To do in RecordFlux 0.6 via automation Jan 18, 2022
@jklmnn jklmnn self-assigned this Jan 18, 2022
@jklmnn
Copy link
Member

jklmnn commented Jan 19, 2022

I have done some analysis. With the method of calculating the size from the .text sections we still get approx. 160-170 KB. I have looked up another approach and it's possible to get the symbol size with the following command (the binary must not be stripped for this to work):

arm-eabi-nm --print-size --size-sort --radix=d obj/main

Most of the symbols are quite small, with 4 exceptions (sizes are decimal):

134217748 00015984 t main__responder__tick.5515
134243992 00029452 T rflx__spdm__request__get_field_value
134330152 00040700 T rflx__spdm__response__set_field_value
134273444 00050848 T rflx__spdm__request__verify

All of them have in common that they contain huge case statements (multiple hundred cases). Set_Field_Value and Get_Field_Value also have 30 to 40 instances of RFLX_Types.Extract and RFLX_Types.Insert respectively.

Knowing that generics increase the code size, we have to find a better way to express the functionality, than with generics. Looking at the fact that not all of these subprograms have generics, but all of them contain huge case statements, we may have to find another solution for this.

@kanigsson
Copy link
Collaborator

I have also identified these large case statements as major sources of proof slowness. So refactoring that would be doubly beneficial.

@treiher
Copy link
Collaborator Author

treiher commented Jan 19, 2022

All of them have in common that they contain huge case statements (multiple hundred cases). Set_Field_Value and Get_Field_Value also have 30 to 40 instances of RFLX_Types.Extract and RFLX_Types.Insert respectively.

Knowing that generics increase the code size, we have to find a better way to express the functionality, than with generics.

I wonder if the generic instances have actually a big impact on the code size. These generic functions just consist of one function call. So I would expect the impact to be rather small.

I have also identified these large case statements as major sources of proof slowness. So refactoring that would be doubly beneficial.

@kanigsson I think then it would make sense that you have a look at how the mentioned functions could be improved (also as part of #767).

@kanigsson
Copy link
Collaborator

@kanigsson I think then it would make sense that you have a look at how the mentioned functions could be improved (also as part of #767).

I agree.

@senier
Copy link
Member

senier commented Jan 19, 2022

No_Secondary_Stack will be handled in #911.

@senier senier moved this from To do to Implementation in RecordFlux 0.6 Jan 25, 2022
@jklmnn
Copy link
Member

jklmnn commented Jan 26, 2022

As it seems, gcc is doing aggressive inlining here. I manually added No_Inline so some functions, including all calls in main__responder__tick which only consists of a case statement calling other procedures. The first result was that this procedure is indeed quite small itself and that the calls contribute to most of the size. One of these calls was rflx__spdm_responder__session__receive with a size of 10kb. This procedure itself isn't big, so I went one step further. Receive has two calls to rflx__spdm__request__structural_valid_message which is ~4kb in size. After adding No_Inline here it turns out that Receive got quite small. I also noticed that not inlining Structural_Valid_Message reduced the binary size by 8kb (so it probably was inlined 3 times).
My expectation is that there are many more occurrences of this scheme (and also that the size of Structural_Valid_Message may be explained due to inlining).

@senier
Copy link
Member

senier commented Jan 26, 2022

This is unexpected, especially given -Os. Can you try to disable inlining completely (either globally or by adding No_Inline in the code generator) and see where we end up?

@jklmnn
Copy link
Member

jklmnn commented Jan 26, 2022

So I just got the hint to use -fno-inline and trying that out it indeed reduces the size of the code from 170kb to 120kb.

@jklmnn
Copy link
Member

jklmnn commented Jan 26, 2022

I noticed another source of problems. E.g. in Verify there is a large case statement that looks as follows:

case Fld is
   when F_Major_Version .. F_Negotiate_Algorithms_Request_Req_Alg_Structs =>
      Ctx.Verified_Last := ((Field_Last (Ctx, Fld) + 7) / 8) * 8;
   when F_Get_Digests_Request_Param_1 =>                         
      Ctx.Verified_Last := ((Field_Last (Ctx, Fld) + 7) / 8) * 8;
   when F_Get_Measurements_Request_Reserved_1 =>                 
      Ctx.Verified_Last := ((Field_Last (Ctx, Fld) + 7) / 8) * 8;
   when F_Get_Version_Request_Param_1 =>                         
      Ctx.Verified_Last := ((Field_Last (Ctx, Fld) + 7) / 8) * 8;
   when F_Key_Exchange_Request_Measurement_Summary_Hash_Type =>  
      Ctx.Verified_Last := ((Field_Last (Ctx, Fld) + 7) / 8) * 8;
   when F_Key_Update_Request_Key_Operation =>                    
      Ctx.Verified_Last := ((Field_Last (Ctx, Fld) + 7) / 8) * 8;
   when F_Negotiate_Algorithms_Request_Req_Alg_Struct_Count =>   
      Ctx.Verified_Last := ((Field_Last (Ctx, Fld) + 7) / 8) * 8;

You get the idea. This statement can be reduced to:

case Fld is
   when F_Major_Version .. F_Negotiate_Algorithms_Request_Req_Alg_Structs =>
      Ctx.Verified_Last := ((Field_Last (Ctx, Fld) + 7) / 8) * 8;
end case;

This change for this single instance alone reduces the code size by 6kb.

@jklmnn
Copy link
Member

jklmnn commented Jan 26, 2022

Adding -gnatp suppresses all checks. This brings the binary size down to ~70kb.

@treiher
Copy link
Collaborator Author

treiher commented Jan 26, 2022

I noticed another source of problems. E.g. in Verify there is a large case statement that looks as follows:

That is related to #644. This case statement was introduced to increase the provability of the context predicate. All branches contain the exact same statement. There are more instances of this.

@senier
Copy link
Member

senier commented Jan 26, 2022

That is related to #644. This case statement was introduced to increase the provability of the context predicate. All branches contain the exact same statement. There are more instances of this.

Do you mean #664?

@treiher
Copy link
Collaborator Author

treiher commented Jan 26, 2022

Yes, you are right. That was a typo.

@jklmnn
Copy link
Member

jklmnn commented Jan 26, 2022

Is there a way to easily reverse this behaviour?

@treiher
Copy link
Collaborator Author

treiher commented Jan 26, 2022

Removing these case statements in the code generator is quite simple. There are three instances in rflx/generator/parser.py (all marked with Componolit/RecordFlux#664).

@jklmnn
Copy link
Member

jklmnn commented Jan 26, 2022

With -fno-inline and -gnatp there are still a few large functions left. One of them is Field_First containing a large case statement. As it is a expression function contains many equal cases I suppose that's to help the provers?
I've taken a look and I think the following should express the same thing without increasing the code size:

function Field_First (Ctx : Context; Fld : Field) return RFLX_Types.Bit_Index is
   if Fld = F_Major_Version then
      return Ctx.First;
   end if;
   pragma Assert (Fld = F_Minor_Version and then Ctx.Cursors (Fld).Predecessor = F_Major_Version and then Ctx.Cursors (F_Major_Version).Value.Major_Version_Value = 1);
   pragma Assert (Fld = F_Code and then Ctx.Cursors (Fld).Predecessor = F_Minor_Version and then Ctx.Cursors (F_Minor_Version).Value.Minor_Version_Value <= 1);
   ...
   return Ctx.Cursors (Ctx.Cursors (Fld).Predecessor).Last + 1;
end Field_First;

This has the advantage that the checking code (which I suppose is handled by the precondition?) is not compiled and that no exception raises are required.

This also seem to apply to Valid_Length.

I think generally we should try to avoid these kind of case statements that include lots of exception raises. Each raise causes a jump to a local exception call. While this is already the most efficient way to call the exception at each case, it still makes up roughly half of the generated code in these functions.

@jklmnn
Copy link
Member

jklmnn commented Jan 26, 2022

The currently two largest functions are Get_Field_Value and Set_Field_Value. Their size can be reduced a bit by setting Always_Inline for Extract and Insert however that only contributes to a hew hundred byte of the whole binary. The main reason are probably the large case statements. However in this case I wasn't able to come up with a good solution without breaking the type safety.

@treiher
Copy link
Collaborator Author

treiher commented Jan 26, 2022

One of them is Field_First containing a large case statement. As it is a expression function contains many equal cases I suppose that's to help the provers?
I've taken a look and I think the following should express the same thing without increasing the code size:

Field_First encodes all the First aspects of the model. Using a case statement was the most obvious solution to do that. Using a case statement, is apparently not the best solution, especially as the First aspect is rarely used. We should improve the code generation there. I think your proposed code would only work, if we would put the case statement (or something semantically equivalent) in the post condition. I guess there could be an even simpler solution, using if-expressions only for the cases where an First aspect was defined.

Could you please collect all your findings where the code generator needs to be modified in a task list? I think that would be helpful to keep track.

@treiher
Copy link
Collaborator Author

treiher commented Jan 26, 2022

I think generally we should try to avoid these kind of case statements that include lots of exception raises. Each raise causes a jump to a local exception call. While this is already the most efficient way to call the exception at each case, it still makes up roughly half of the generated code in these functions.

I guess a function call with Pre => False would not be better? That is what we used before.

@jklmnn
Copy link
Member

jklmnn commented Jan 26, 2022

I guess a function call with Pre => False would not be better? That is what we used before.

That would get rid of the exception code, however it would not get rid of a jump/call in each branch of the case statement. And this is what currently increases the code size. The problem is that the compiler can't infer that these functions will never be called.

I think your proposed code would only work, if we would put the case statement (or something semantically equivalent) in the post condition.

Putting the case statement as is into a post condition shouldn't be a problem in terms of code size since the compiler doesn't compile that. We could also put it into a ghost function and use that.

@jklmnn
Copy link
Member

jklmnn commented Jan 27, 2022

After some discussions it seems that the aggressive inlining is caused by expression functions. Expression functions are always inlined (see V127-006). With GNAT 23.0w using -gnatd.8 yields slightly better results than using -fno-inline, presumably because functions where inlining is useful are inlined now while the overly aggressive inlining is no longer existing.

@treiher treiher moved this from Implementation to To do in RecordFlux 0.6 May 16, 2022
treiher added a commit that referenced this issue May 17, 2022
treiher added a commit that referenced this issue May 17, 2022
treiher added a commit that referenced this issue May 18, 2022
treiher added a commit that referenced this issue May 18, 2022
treiher added a commit that referenced this issue May 19, 2022
@jklmnn jklmnn moved this from To do to Implementation in RecordFlux 0.6 Jun 16, 2022
treiher added a commit that referenced this issue Jul 18, 2022
treiher added a commit that referenced this issue Jul 18, 2022
treiher added a commit that referenced this issue Jul 18, 2022
treiher added a commit that referenced this issue Jul 18, 2022
treiher added a commit that referenced this issue Jul 19, 2022
treiher added a commit that referenced this issue Jul 19, 2022
treiher added a commit that referenced this issue Jul 19, 2022
treiher added a commit that referenced this issue Jul 21, 2022
@senier
Copy link
Member

senier commented Aug 30, 2022

All realistic optimization have been done.

@senier senier closed this as completed Aug 30, 2022
RecordFlux 0.6 automation moved this from Implementation to Done Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
generator Related to generator package (SPARK code generation)
Projects
No open projects
Development

No branches or pull requests

4 participants