## 7.14 Functions

Function calls may slow down a program for the following reasons:
- The function call makes the microprocessor jump to a different code address and back
again. This may take up to 4 clock cycles. In most cases the microprocessor is able to
overlap the call and return operations with other calculations to save time.
- The code cache works less efficiently if the code is fragmented and scattered around in
memory.
- Function parameters are stored on the stack in 32-bit mode. Storing the parameters on
the stack and reading them again takes extra time. The delay is significant if a
parameter is part of a critical dependency chain.
- Extra time is needed for setting up a stack frame, saving and restoring registers, and
possibly saving exception handling information.
- Each function call statement occupies a space in the branch target buffer (BTB).
Contentions in the BTB can cause branch mispredictions if the critical part of a program
has many calls and branches.

The following methods may be used for reducing the time spent on function calls in the critical part of a program.

**<u>Avoid unnecessary functions</u>**

Some programming textbooks recommend that every function that is longer than a few lines
should be split up into multiple functions. I disagree with this rule. Splitting up a function into
multiple smaller functions only makes the program less efficient. Splitting up a function just
because it is long does not make the program more clear unless the function is doing
multiple logically distinct tasks. A critical innermost loop should preferably be kept entirely
inside one function, if possible.

**<u>Use inline functions</u>**

An inline function is expanded like a macro so that each statement that calls the function is
replaced by the function body. A function is usually inlined if the inline keyword is used or
if its body is defined inside a class definition. Inlining a function is advantageous if the
function is small or if it is called only from one place in the program. Small functions are
often inlined automatically by the compiler. On the other hand, the compiler may in some
cases ignore a request for inlining a function if the inlining causes technical problems or
performance problems.
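The two ways of requesting inlining mentioned above can be sketched as follows. This is only an illustration: the names `square`, `Accumulator`, and `sumOfSquares` are hypothetical, and whether the calls are actually expanded in place depends on the compiler and its optimization settings.

```cpp
// Illustrative sketch only. All names are hypothetical.

// Requested inline with the inline keyword.
inline int square(int x) {
    return x * x;
}

class Accumulator {
public:
    // Defined inside the class definition, hence implicitly inline.
    void add(int x) { sum_ += square(x); }
    int sum() const { return sum_; }
private:
    int sum_ = 0;
};

// Critical loop. With optimization enabled, a compiler will typically
// expand add() and square() in place so that no call remains in the loop.
int sumOfSquares(const int* data, int n) {
    Accumulator acc;
    for (int i = 0; i < n; i++) {
        acc.add(data[i]);
    }
    return acc.sum();
}
```

Inspecting the generated assembly is the only reliable way to confirm that the calls were removed for a given compiler and option set.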

**<u>Avoid nested function calls in the innermost loop</u>**

A function that calls other functions is called a frame function, while a function that does not
call any other function is called a leaf function. Leaf functions are more efficient than frame
functions for reasons explained on page 63. If the critical innermost loop of a program
contains calls to frame functions then the code can probably be improved by inlining the
frame function or by turning the frame function into a leaf function by inlining all the
functions that it calls.
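A minimal sketch of the distinction, assuming hypothetical helper names (`clampUnit`, `scaleClamped`, `applyGain`): `scaleClamped` would be a frame function if the call to `clampUnit` were not inlined. Declaring both helpers `static inline` makes it likely, though not guaranteed, that the loop in `applyGain` ends up containing no calls at all.

```cpp
// Illustrative sketch only. All names are hypothetical.

// Leaf function: calls no other function.
static inline float clampUnit(float x) {
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

// Calls clampUnit. If that call is not inlined, this is a frame function.
// If it is inlined, this function becomes a leaf function.
static inline float scaleClamped(float x, float gain) {
    return clampUnit(x * gain);
}

// Critical innermost loop. If both helpers are inlined, the loop body
// contains no function calls.
void applyGain(float* samples, int n, float gain) {
    for (int i = 0; i < n; i++) {
        samples[i] = scaleClamped(samples[i], gain);
    }
}
```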

**<u>Use macros instead of functions</u>**

A macro declared with #define is certain to be inlined. But beware that macro parameters
are evaluated every time they are used. Example:
```cpp
// Example 7.34a. Use macro as inline function
#define MAX(a,b) ((a) > (b) ? (a) : (b))
y = MAX(f(x), g(x));
```
In this example, f(x) or g(x), whichever is returned, is calculated twice because the macro references it twice.
You can avoid this by using an inline function instead of a macro. If you want the function to
work with any type of parameters then make it a template:
```cpp
// Example 7.34b. Replace macro by template
template <typename T>
static inline T max(T const & a, T const & b) {
    return a > b ? a : b;
}
```
Another problem with macros is that the name cannot be overloaded or limited in scope. A
macro will interfere with any function or variable having the same name, regardless of scope
or namespaces. Therefore, it is important to use long and unique names for macros,
especially in header files.
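The double evaluation can be made visible by counting calls. The sketch below is an illustration under assumed names: `EXAMPLE_MACRO_MAX`, `maxValue`, `f`, and the counter `callCount` are all hypothetical. The macro has a long, prefixed name in line with the advice above.

```cpp
// Illustrative sketch only. All names are hypothetical.

static int callCount = 0;            // number of times f() has been called

static int f(int x) {
    ++callCount;
    return x * 2;
}

// Macro version: each parameter is evaluated every time it appears.
#define EXAMPLE_MACRO_MAX(a, b) ((a) > (b) ? (a) : (b))

// Template version: each argument is evaluated exactly once.
template <typename T>
static inline T maxValue(T const & a, T const & b) {
    return a > b ? a : b;
}

// Returns how many times f() was called when using the macro.
int callsViaMacro(int x) {
    callCount = 0;
    int y = EXAMPLE_MACRO_MAX(f(x), 1);
    (void)y;                         // result not needed here
    return callCount;
}

// Returns how many times f() was called when using the template.
int callsViaTemplate(int x) {
    callCount = 0;
    int y = maxValue(f(x), 1);
    (void)y;                         // result not needed here
    return callCount;
}
```

With `x = 3`, the macro version calls `f` twice because `f(3)` is the larger operand and is therefore evaluated again as the result, whereas the template version calls it once.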


**<u>Use fastcall functions</u>**

The keyword `__fastcall` changes the function calling method in 32-bit mode so that the
first two (three on CodeGear compiler) integer parameters are transferred in registers rather
than on the stack. This can improve the speed of functions with integer parameters.
Floating point parameters are not affected by `__fastcall`. The implicit 'this' pointer in
member functions is also treated like a parameter, so there may be only one free register
left for transferring additional parameters. Therefore, make sure that the most critical integer
parameter comes first when you are using `__fastcall`. Function parameters are
transferred in registers by default in 64-bit mode. Therefore, the `__fastcall` keyword is
not recognized in 64-bit mode.
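One possible way to apply this in portable code is to hide the keyword behind a macro that expands to nothing outside 32-bit x86. The sketch below assumes a hypothetical macro name `EXAMPLE_FASTCALL` and a hypothetical function `weightedIndex`. It uses `__fastcall` for Microsoft-compatible compilers and the `fastcall` function attribute for GNU-compatible compilers, and it places the two integer parameters first so that they are the ones passed in registers.

```cpp
// Illustrative sketch only. EXAMPLE_FASTCALL and weightedIndex are
// hypothetical names.

#if defined(_MSC_VER) && defined(_M_IX86)
  // 32-bit x86, Microsoft-compatible compiler
  #define EXAMPLE_FASTCALL __fastcall
#elif defined(__GNUC__) && defined(__i386__)
  // 32-bit x86, GNU-compatible compiler
  #define EXAMPLE_FASTCALL __attribute__((fastcall))
#else
  // 64-bit mode or another architecture: register parameter transfer is
  // already the default or the keyword does not apply.
  #define EXAMPLE_FASTCALL
#endif

// The two integer parameters come first because only the first integer
// parameters are transferred in registers. The floating point parameter
// is unaffected by the calling convention keyword.
static int EXAMPLE_FASTCALL weightedIndex(int index, int stride, double bias) {
    return index * stride + static_cast<int>(bias);
}
```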

**<u>Make functions local</u>**

A function that is used only within the same module (i.e. the current .cpp file) should be
made local. This makes it easier for the compiler to inline the function and to optimize
across function calls. There are three ways to make a function local:

1. Add the keyword static to the function declaration. This is the simplest method,
but it doesn't work with class member functions, where static has a different
meaning.

2. Put the function or class into an anonymous namespace.

3. The GNU compiler allows `__attribute__((visibility("hidden")))`.
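The three methods listed above might look as follows in a single .cpp file. This is a sketch with hypothetical names (`addOne`, `Scaler`, `scaleNext`, `EXAMPLE_HIDDEN`). The visibility attribute is wrapped in a macro so that the file still compiles with compilers that do not support it.

```cpp
// Illustrative sketch only. All names are hypothetical.

// Method 1: static gives a non-member function internal linkage.
static int addOne(int x) {
    return x + 1;
}

// Method 2: everything in an anonymous namespace is local to this file.
// This also works for classes and their member functions.
namespace {
    struct Scaler {
        int factor;
        int apply(int x) const { return x * factor; }
    };
}

// Method 3: GNU-compatible compilers accept a visibility attribute.
#if defined(__GNUC__)
  #define EXAMPLE_HIDDEN __attribute__((visibility("hidden")))
#else
  #define EXAMPLE_HIDDEN
#endif

EXAMPLE_HIDDEN int scaleNext(int x) {
    Scaler twice = {2};
    return twice.apply(addOne(x));
}
```

Note that hidden visibility differs from the first two methods: the function remains callable from other object files linked into the same binary and is only hidden from outside the shared object.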

**<u>Use whole program optimization</u>**

Some compilers have an option for whole program optimization or for combining multiple
`.cpp` files into a single object file. This enables the compiler to optimize register allocation
and parameter transfer across all .cpp modules that make up a program. Whole program
optimization cannot be used for function libraries distributed as object or library files.

**<u>Use 64-bit mode</u>**

Parameter transfer is more efficient in 64-bit mode than in 32-bit mode, and more efficient in
64-bit Linux than in 64-bit Windows. In 64-bit Linux, the first six integer parameters and the
first eight floating point parameters are transferred in registers, totaling up to fourteen
register parameters. In 64-bit Windows, the first four parameters are transferred in registers,
regardless of whether they are integers or floating point numbers. Therefore, 64-bit Linux is
more efficient than 64-bit Windows if functions have more than four parameters. There is no
difference between 32-bit Linux and 32-bit Windows in this respect.