Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using TimeStamp class can lead to incompatible timestamps in RedisTimeSeries #20

Open
shaunsales opened this issue Jun 20, 2020 · 7 comments

Comments

@shaunsales
Copy link
Contributor

shaunsales commented Jun 20, 2020

The TimeStamp class uses DateTime.Ticks which is incompatible with the native Unix Epoch milliseconds used by Redis TimeSeries.

If you have cross-platform clients accessing the time series data, or use the * auto timestamp which defaults to Unix ms, it's easy to end up with unexpected results when querying ranges and aggregating samples.

I suggest the TimeStamp class bounds check entries and default to supporting only UnixMs min and max timestamps. After testing, it appears that Redis TimeSeries only supports positive timestamp values, which means any dates before epoch won't work.

This would allow * to work as expected from the NRedisTimeSeries client and timestamps would be compatible across Unix/MacOS and Windows platforms. However using Unix ms means that DateTime.MinValue would need to throw an out of range exception. I feel if we build out a decent TimeStamp struct, this is an acceptable tradeoff.

@DvirDukhan
Copy link
Collaborator

Hi @shaunsales
Regarding the TimeStamp type: As RedisTimeSeries timestamps are 64-bit integers, you can model your data with your own sequence of timestamps values (as long as this sequence is monotonically increasing). As long as you are consistent with your values it should work. My original intention for the TimeStamp class is to have a holistic solution for time representation for .NET users. By that, I tried to give the users an opportunity to set their own timestamps with long values that they can calculate however they want. Also, I wanted the users of this client to avoid additional conversions from c# native time representation which is the DateTime. So In addition to the RedisTimeSeries API, users can have two options to use set their own timestamp sequence values, either by giving a long value, which created by them or by giving a DateTime value, from their .NET application using this client.
In my mind, users who model their data with DateTime will not call create new timestamp with *, because of the format differences.

I want first to put aside the issue that you mentioned which is the negative timestamp values. I missed that the module defined its timestamp type as an unsigned 64-bit integer, as it seems trivial for me that you might want to model your data with timestamps prior to epoch:
https://github.com/RedisTimeSeries/RedisTimeSeries/blob/612abc3defb796ecdf11cf459c4ebe8e34ae552d/src/consts.h#L16
I would like to have the module team @ashtul and @danni-m input for having a negative timestamp value before I add validation on it. Adding a validation will cause a breaking change - moving from long to unsigned long an will cause a new release with a new major version (2.x)

Regarding the mixture of DateTime and Unix epoch milliseconds, I would like to propose two options and I will happy to have your input, as well as other members of the community, reading this:

  1. Have two methods that allow converting from DateTime to TimeStamp, and vice versa, such that the TimeStamp value will hold a Unix milliseconds value. This can be done as shown here
  2. Have the TimeStamp hold only Unix milliseconds value, by the conversion above used in the implicit cast between DateTime and TimeStamp

Personally I prefer the first one as this leaves more freedom to this project users to model their data as they see fit. Although it will make them use a conversion, their data model should be uniform and they will be able to use RedisTimeSeries default API.

Let me know what you think, also, if I missed anything in my answer let me know.

@shaunsales
Copy link
Contributor Author

To expand on the original issue, there were two problems I ran up against when I first integrated the client into our project, which cost a few hours and I felt could be easily avoided;

  1. It's not intuitive that the .NET implementation of the client uses one time format and that Redis uses another. The documentation for RedisTimeSeries mentions milliseconds for timestamps and aggregation time buckets and uses it in all the examples, whereas the atomic unit for DateTime.Ticks is 100ns. Typically when you start out with NRedisTimeSeries you would mix a code based approach with CLI testing to familiarise yourself with how the underlying system works, which can lead to confusion for the developer.

  2. The * timestamp parameter will always use the Unix epoch ms. This makes an opinionated push for all timestamps to use the same format, which I don't think is a bad thing for cross-platform compatibility. There may be some limitations storing samples pre-Unix epoch, but that could be addressed separately as part of the RedisTimeSeries module.

@DvirDukhan I would agree with your first suggestion as being the preferable option. Ideally we want the following to be consistent for .NET developers;

  1. Using * or DateTime.UtcNow should not yield wildly different timestamps
  2. The timbucket aggregation units should be consistent with the underlying module and match the documentation
  3. Using the .NET client should not break cross-platform compatibility (i.e. force a Unix client to calculate .NET DateTime.Ticks)

It seems we will need a breaking change to the API, but it will greatly improve usability and reduce the amount of time needed when onboarding new developers. If we could remove the TimeStamp class and rewrite the methods to be more intuitive that would be a bonus; For example they take a DateTime by default, and if no DateTime is provided they default to *.

Hope this is useful. I'm happy to create a PR with a proposed API to scaffold out the features above.

Best,
Shaun

@DvirDukhan
Copy link
Collaborator

@shaunsales
As for now, RedisTimeSeries still allows 64-bit integers to be timestamps values, regardless of their semantic value (e.g, offset from Unix Epoch, or offset from 1.1.1900). Until the module team decides to use exclusively Unix Epoch offsets for internal calculations I want users to have the freedom to choose their data format, so I still want to support native DateTime ticks as a possible value for timestamp, where it is the users' responsibility to enforce their own semantics, e.g. not using *, so for that reason, I want to leave TimeStamp as is. If I'm missing something, please let me know, as this is a very productive discussion.
Having said that
I think that in order to be compliant with both RedisTimeSeries * operator as well as DateTime, either I will either introduce the conversions that I write to you in the previous answer as static functions
OR
I will introduce a new class UnixTimeStamp which will inherit from TimeStamp. This class's static cast from and to DateTime will invoke the above conversions, so the user will not have to explicitly call those conversions, and the API does not change as UnixTimeStamp is a TimeStamp
Be aware that the above conversions might cause data accuracy loss, as this is lower resolution value (ms vs ticks)

Waiting for your input

@shaunsales
Copy link
Contributor Author

How about the following, for the best of both worlds - the signature for methods requiring a timestamp can be one of three types;

Add(string key, DateTime timeStamp, double value ...)
Add(string key, UnixTimeStamp timeStamp, double value ...)
Add(string key, long timeStamp, double value ...) // Roll your own

If we want to use DateTime, we can use the following;

new DateTime() // As per .NET implementation
DateTime.MinValue
DateTime.MaxValue
DateTime.UtcNow

The UnixTimeStamp, for Unix Ms, provides the following options;

new UnixTimeStamp(DateTime timeStamp)
new UnixTimeStamp(long timeStamp) // Provides bounds checks
UnixTimeStamp.MinValue // Same as "-"
UnixTimeStamp.MaxValue // Same as "+"
UnixTimeStamp.Auto // Same as "*"

The reason I'm in favour of this method is that it's always easy to infer what timestamping format you're using. I think 90% of developers will be happy with the DateTime implementation, and those who need cross-platform compatibility will likely prefer the UnixTimeStamp.

Also, another quite important thing to consider is the way we work with timebuckets - currently it's very easy to make mistakes with the range methods. I propose the following;
Range(string key, DateTime from, DateTime to, ... TimeSpan timeBucket)
Range(string key, UnixTimeStamp from, UnixTimeStamp to, ... long timeBucketMs)
Range(string key, long from, long to, ... long timeBucket)

I'm not sure if we want UnixTimeStamp to implicitly cast to long, but I do like the idea of being very obvious to developers what type of timestamp they are using, and keeping it consistent.

Look forward to your thoughts on the above.

Best,
Shaun

@DvirDukhan
Copy link
Collaborator

@shaunsales
sorry for the delay in the response
what do you expect to be the return type of timestamp from RedisTimeSeries commands?

@shaunsales
Copy link
Contributor Author

@DvirDukhan The return type should match the input type.

I would expect that most users would use one type of time stamp per project.

@DvirDukhan
Copy link
Collaborator

@shaunsales
Sorry for the long delay, I have been busy with RedisGraph 2.2 release
I consulted again with my colleges:
I will have the conversion from DateTime to Unix epoch as the implicit cast to and from TimeStamp (breaking change), so I will release a new version (hopefully this or next week) as there is a breaking change in time-series module itself also
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants