-
Notifications
You must be signed in to change notification settings - Fork 292
412 - Make timestamp generator monotonic #335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mpenick
commented
Jan 23, 2017
- Timestamps now use microsecond precision instead of millisecond.
* Timestamps now use microsecond precision instead of millisecond.
|
It looks like I need to fix the format specifiers in that clock skew warning log message. I'll do that tomorrow. |
stamhankar999
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks very good; please address the typos, comment refinements, and mull over the warning reporting logic that I commented on.
include/cassandra.h
Outdated
| * | ||
| * This timestamp generator will generate warnings if greater than 1 second of | ||
| * clock skew is detected. I will print an error every second until the clock skew | ||
| * is resolved. These setting can be changed by using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These setting => These settings
src/timestamp_generator.cpp
Outdated
| int64_t current = get_time_since_epoch_us(); | ||
|
|
||
| if (last >= current) { // There's clock skew | ||
| // If we execeed our warning threshhold then warn periodically that clock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
execeed => exceed
include/cassandra.h
Outdated
| * microsecond timestamps with the sub-millisecond part generated using a counter. | ||
| * The implementation guarantees that no more than 1000 timestamps will be generated | ||
| * for a given clock tick even if shared by multiple session objects. If that rate is | ||
| * exceeded then a warning is logged and timestamps stop incrementing until the next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above comment-block probably needs to be updated -- the new code has proper microsecond precision and bumps the timestamp by 1 when there is clock skew, rather than using the lowest 3 digits as a counter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixing.
include/cassandra.h
Outdated
| * warnings about clock skew. | ||
| * | ||
| * @param warning_threshold_us The amount of clock skew, in microseconds, that | ||
| * must be detected before a warning is triggered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should mention that setting this < 0 will disable warnings generation.
| if (warning_threshold_us_ >= 0 && last > current + warning_threshold_us_) { | ||
| int64_t now = get_time_since_epoch_ms(); | ||
| int64_t last_warning = last_warning_.load(); | ||
| if (now > last_warning + warning_interval_ms_ && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the default settings, will this trigger a warning after the first? If your clock is skewed by 1 sec (as determined in line 72), your now will be 1 sec less than real current time. If your last warning was emitted a second ago, then now won't be greater than last_warning + warning_interval_ms_. OTOH, last_warning was recorded with a skewed time, so maybe it balances out.
Maybe there isn't an issue here, but worth mulling over some more to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This is interesting. I guess this could happen if your clock skews by another second consecutively several times in a row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just ran into this decision too. We decided that either behavior is reasonable, so we chose not to account for the 1-second offset to match the Java driver's behavior, as we understand it:
apache/cassandra-python-driver#681 (comment)
Open to hearing other opinions if people have strong ones, but I don't see a strong reason to diverge from the Java driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A real fix is to use a monotonic system clock to manage the warning interval. The problem is getting that to work properly on all supported platforms (esp. in Java and Python). It might be overkill for handling the warnings, but I'm going at least investigate how hard it would be to implement a millisecond precision monotonic clock for the C++ driver's major platforms (BSD, Linux, Windows).
| Atomic<int64_t> last_warning_; | ||
|
|
||
| const int64_t warning_threshold_us_; | ||
| const int64_t warning_interval_ms_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that warning_threshold_us_ is signed so that a user can set it negative to indicate "don't emit warnings"; why is warning_interval_ms_ a signed 64-bit int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If either of them being negative indicates disabling skew warnings, that's cool, but it should be documented as such and line 72 should check that warning_threshold_ms_ is non-negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to make warning_interval_ms_ zero if it's a negative number. I also added documentation. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean warning_interval_ms_. But why not making it a uint64_t then? A user who supplies a negative value will get a compiler warning, wouldn't he?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't work like that. Try compiling this: https://gist.github.com/mpenick/7878d5d328bbb9aa74a1f7dd125ebe15
$ clang++ -Wall -Wextra -pedantic unsigned.cpp
unsigned.cpp:5:30: warning: 'long long' is an extension when C99 mode is not enabled [-pedantic,-Wlong-long]
printf("%llu\n", (unsigned long long)value);
^
1 warning generated.
$ g++ -Wall -Wextra -pedantic unsigned.cpp
unsigned.cpp: In function ‘void print(uint64_t)’:
unsigned.cpp:5:21: warning: ISO C++ 1998 does not support ‘long long’ [-Wlong-long]
unsigned.cpp:5:45: warning: ISO C++ does not support the ‘ll’ gnu_printf length modifier [-Wformat]
A warning is not generated for the negative values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant warning_interval_ms.
I think it was a mistake to have unsigned values exposed in the API. I've been attempting to move away from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that the compiler doesn't warn when it knows you're trying to put a negative value into an unsigned type.
Also, why do you think it was a mistake to expose unsigned values? It shows an intent for valid argument values (even if the compiler doesn't enforce it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shows an intent for valid argument values.
I use to think this too, but for the pitfall shown in the above example it's not worth it. If a user passes a negative number the value turns into a really large number and it's not possible to detect an error. It's better to use signed values and return an explicit error (or at least document and correct it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Signed it is! :)
| // limit and the clock skew threshold then a log should have been printed. | ||
| double timestamp_rate = static_cast<double>(NUM_TIMESTAMPS) / elapsed; | ||
| if (timestamp_rate > 1000000.0 && elapsed > 1.0) { | ||
| BOOST_CHECK(counter > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, does the test get here? Does it take at least 1.0 seconds to get 5,000,000 timestamps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I don't expect the warning to trigger if with the default settings unless we sustain a timestamp generation rate greater than one per millisecond for at least one second. Maybe this is incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. I think a better test might use time instead of an arbitrary number of timestamps...I'm fixing this.
|
|
||
| // The 100ms timestamp generator should have logged more times because | ||
| // it had a shorter iterval. | ||
| BOOST_CHECK(counter_100ms > counter_default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused at first that you're not doing any magic to fake out the clock (to make actual clock skew) but you get a good approximation by getting the current-time multiple times per microsecond and making sure the timestamp is still monotonically increasing. Good stuff.
* Fixed typos. * Set `warning_interval_ms` to 0 if it's passed as a negative value.
* Added monotonic clock support via `get_time_monotonic_ns()`. * Leveraging the monotonic clock to properly handle timestamp generation clock skew warnings.
* Monotonic timestamp tests now use a duration instead of an arbitrary number of generated timestamps. * Fix compiler errors for the 64-bit format specifiers.
Fixed unit test.
stamhankar999
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the remaining typos and consider making the interval attribute uint64_t. Looks good and merge at will.
include/cassandra.h
Outdated
| * is corrected. | ||
| * | ||
| * By default, this timestamp generator will generate warnings if more than | ||
| * 1 second of clock skew is detected. I will print an error every second until |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I => It
include/cassandra.h
Outdated
| * be used to disable warnings. | ||
| * @param warning_interval_ms The amount of time, in milliseonds, to wait before | ||
| * warning again about clock skew. An interval value less than or equal to 0 allows | ||
| * the warning to be trigger every millisecond. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trigger => triggered
src/get_time.cpp
Outdated
| #elif defined(__APPLE__) && defined(__MACH__) | ||
|
|
||
| uint64_t get_time_since_epoch_ms() { | ||
| // Information on converting the absolute time to nanaoseconds can be found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nanaoseconds => nanoseconds
src/get_time.cpp
Outdated
| static_cast<uint64_t>(tp.tv_nsec); | ||
| } else { | ||
| return get_time_since_epoch_us() * NANOSECONDS_PER_MICROSECOND; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn on one point here -- this whole file is platform-specific, so I feel like it would be better to split it into three files (say get_time-win.cpp, get_time-mac.cpp, get_time-unix.cpp) to make it easier for a developer to look at in the future. On the other hand, it's 150+ lines of code, which isn't that big.
Consider splitting, but I'm fine either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I didn't realized this file was getting big.
| Atomic<int64_t> last_warning_; | ||
|
|
||
| const int64_t warning_threshold_us_; | ||
| const int64_t warning_interval_ms_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean warning_interval_ms_. But why not making it a uint64_t then? A user who supplies a negative value will get a compiler warning, wouldn't he?
Fixed typos.
Split platform specific get time implementations into platform specific files.
Adjusted the timestamp generation unit test parameters so that they're less fragile (esp. in the CI environment).
Fix test build issue on Windows.
|
Squashed and merged at 16577c5 |