common: make ceph_clock_now() inlineable. #20443

Merged

Conversation

rzarzynski
Contributor

The `ceph_clock_now()` function is a widely used but thin
routine. All it does is wrap `clock_gettime` or `gettimeofday`
with an accompanying conversion to `utime_t`.

Unfortunately, as it is defined outside of a header, compilers
are forced to generate a full-blown function. The overhead comes
not only from the clearly visible stack-smashing protection but
also from forcing callers to go through the PLT on each call.

Taking into account that the time getters are usually *user-space
syscalls* (leveraging e.g. the vDSO mechanism), eliminating even
this small amount of boilerplate might be beneficial.

```
0000000000000000 <ceph_clock_now()>:
   0:   48 83 ec 28             sub    $0x28,%rsp
   4:   31 ff                   xor    %edi,%edi
   6:   48 89 e6                mov    %rsp,%rsi
   9:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
  10:   00 00
  12:   48 89 44 24 18          mov    %rax,0x18(%rsp)
  17:   31 c0                   xor    %eax,%eax
  19:   e8 00 00 00 00          callq  1e <ceph_clock_now()+0x1e>
  1e:   8b 44 24 08             mov    0x8(%rsp),%eax
  22:   48 c1 e0 20             shl    $0x20,%rax
  26:   48 89 c2                mov    %rax,%rdx
  29:   8b 04 24                mov    (%rsp),%eax
  2c:   48 09 d0                or     %rdx,%rax
  2f:   48 8b 4c 24 18          mov    0x18(%rsp),%rcx
  34:   64 48 33 0c 25 28 00    xor    %fs:0x28,%rcx
  3b:   00 00
  3d:   75 05                   jne    44 <ceph_clock_now()+0x44>
  3f:   48 83 c4 28             add    $0x28,%rsp
  43:   c3                      retq
  44:   e8 00 00 00 00          callq  49 <SubProcess::spawn()::__PRETTY_FUNCTION__+0x9>
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
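
For illustration only, here is a minimal sketch of what a header-defined,
inlineable variant could look like. `utime_t_sketch` and `ceph_clock_now_sketch`
are hypothetical stand-ins; the real `utime_t` interface may differ.

```
// Hypothetical sketch, not the actual Ceph code. A stand-in type is used
// in place of the real utime_t so the example is self-contained.
#include <time.h>

struct utime_t_sketch {
  long sec;
  long nsec;
};

// Defined in a header and marked inline, so callers can expand the
// clock_gettime() wrapper directly instead of jumping through the PLT.
inline utime_t_sketch ceph_clock_now_sketch()
{
  struct timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  return utime_t_sketch{static_cast<long>(ts.tv_sec), ts.tv_nsec};
}
```
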
@yehudasa
Member

@rzarzynski did you look into modifying it to use the new time API (e.g., `ceph::real_time`) instead of `utime_t`?

@branch-predictor
Contributor

In my testing, the benefit is in the range of 3%. Considering that an entire call to `ceph_clock_now().nsec()` takes around 0.000000054 s on my test/dev Haswell Xeon machine (test runtime of 53.7 s divided by 1,000,000,000 iterations), this optimization reduces it to something around 0.000000052 s.

Tested with

```
void check_time()
{
  uint64_t l = 0;
  // accumulate the results so the compiler cannot optimize the calls away
  for (int i = 0; i < 1000000000; ++i) {
    l += ceph_clock_now().nsec();
  }
  std::cerr << l << std::endl;
}
```

I think it would be more beneficial to just reduce the number of calls to `ceph_clock_now()` where it matters.

@liewegas
Member

Sounds good to me!

Agreed, we need to reduce calls to this. I think this is also a good opportunity to evaluate each caller and decide whether it should use coarse or monotonic time instead!
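
For reference, a small standalone snippet contrasting the POSIX clock sources
such an audit would typically choose between. These are plain Linux clockids
used for illustration, not Ceph's own clock wrappers.

```
// Standalone illustration of the clock choices mentioned above.
#include <time.h>
#include <cstdio>

static double to_sec(const timespec& ts)
{
  return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main()
{
  timespec rt, mono, coarse;
  clock_gettime(CLOCK_REALTIME, &rt);            // wall clock: can jump (NTP, settimeofday)
  clock_gettime(CLOCK_MONOTONIC, &mono);         // monotonic: suited to measuring durations
  clock_gettime(CLOCK_REALTIME_COARSE, &coarse); // coarse: cheaper, tick-granularity resolution
  std::printf("realtime=%.9f monotonic=%.9f coarse=%.9f\n",
              to_sec(rt), to_sec(mono), to_sec(coarse));
  return 0;
}
```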

@rzarzynski
Contributor Author

One of the common use cases of ceph_clock_now() is to feed PerfCounters. They are enabled by default. What about changing the policy and making the early exit inlineable?
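
For illustration, a hedged sketch of the proposed early-exit split:
`PerfCountersSketch` and its members are hypothetical stand-ins, not the
actual `PerfCounters` interface.

```
// Hypothetical sketch: a cheap, header-defined check that bails out early,
// with the expensive timestamping kept behind a slow path. In real code the
// slow path would live out-of-line in a .cc file; it is defined here only
// to keep the sketch self-contained.
#include <atomic>
#include <time.h>

class PerfCountersSketch {
  std::atomic<bool> enabled{true};
  long last_ns = 0;

  void tinc_slow(int /*idx*/) {
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    last_ns = ts.tv_nsec;    // stand-in for the actual counter update
  }

public:
  // Header-defined and inlineable: when counters are disabled, the caller
  // pays only for one relaxed flag load (no PLT hop, no clock_gettime()).
  void tinc(int idx) {
    if (!enabled.load(std::memory_order_relaxed))
      return;                // early exit, inlined at the call site
    tinc_slow(idx);
  }
};
```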

@liewegas
Member

Sounds good. I don't expect them to be off much, but it doesn't hurt!

@tchaikov merged commit b06dd3b into ceph:master on Feb 21, 2018