Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: End time #26

Merged
merged 4 commits into from
Mar 3, 2023
Merged

feat: End time #26

merged 4 commits into from
Mar 3, 2023

Conversation

pnadolny13
Copy link
Collaborator

@pnadolny13 pnadolny13 commented Mar 2, 2023

Closes #25

  • allow end_date configuration
  • add alter end date method that takes the earlier of the given timestamp or now - 5 mins. This makes sure we're always leaving a 5 minute buffer for late logs to arrive.
  • when splitting time ranges into batches use the end timestamp as the upper limit so it never exceeds our buffer.
  • adds and updates tests to cover these changes
  • alters tests to use stop using epoch timestamps because its hard to understand what its trying to do. I find it more readable to see the string timestamp representations in the input/expected output. The util method helps convert it for us so its correct under the hood.

@aaronsteers let me know if you have any thoughts?

@pnadolny13 pnadolny13 changed the title End time feat: End time Mar 2, 2023
@aaronsteers
Copy link

aaronsteers commented Mar 2, 2023

@pnadolny13 - Wdyt of adding support for a relative offset here? I did some research and ISO 8601 standard we're using elsewhere does have support for intervals, which (when provided as negative) we could interpret as a definition relative to "now". It occurs to me since we're basically defaulting to a relative date here, it would be worthwhile to be able to express that default in terms on the input values supported.

Proposal

For this PR:

What do you think of us defining end_date to be either an ISO 8601 date or datetime value (as is expected for start_date) or an ISO 8601 time interval. We could then use the following interpretation logic:

  • If start_date is a fixed date/datetime and end_date is a positive interval: then end date would be start_date+interval.
  • If start_date is omitted and end_date is a positive interval, then the range is undefined and the tap should fail.
    • Intentionally ignore bookmark dates, since those should not be presumed to be a replacement for start_date config.
  • If end_date is a negative interval, then end date is now()-interval.
    • "Now" in this case is fixed to the start of the tap, and not recalculated for obvious reasons.
  • If both start_date and end_date are provided, and if end_date is prior to start_date after above rules are applied, then the range is invalid and the tap should fail.

Implications for start_date in the SDK

A common request for start_time is to support relative dates or relative time inputs.

If we like this implementation for end_date, a natural next step would be to support negative intervals as offsets from now() within the SDK handling of start_date. This would potentially be a valuable feature for tap users, while still being based on standards, and backwards compatible with existing legacy taps. The caveat would be that only SDK-based taps accept relative date values in start_date, but absolute date values would still be accepted in start_date and end_date.

Interpreting the interval format

From: https://tc39.es/proposal-temporal/docs/duration.html

Briefly, the ISO 8601 notation consists of a P character, followed by years, months, weeks, and days, followed by a T character, followed by hours, minutes, and seconds with a decimal part, each with a single-letter suffix that indicates the unit. Any zero components may be omitted. For more detailed information, see the ISO 8601 standard or the Wikipedia page.

image

Not specified above, but negative intervals can be provided by prefixing '-'.

So, from the above, our proposed default of "5 minutes ago" would be "end_date": "-PT5M". To include exactly 10 days of data from the provided start_date, the end date could use a interval:"end_date": "P10D".

Other notes

cc @edgarrmondragon, @kgpayne

@pnadolny13
Copy link
Collaborator Author

pnadolny13 commented Mar 3, 2023

@aaronsteers oh I like that idea! I have thoughts on that but do you mind if we merge this as v1 then have another PR to implement the improved end_date syntax? I want to get this deployed as soon as possible to test that its actually resolving the problem I think we have.

UPDATE: I opened #27 with my thoughts

@pnadolny13 pnadolny13 merged commit b2258e5 into main Mar 3, 2023
@aaronsteers
Copy link

@pnadolny13 - sounds great! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle late arriving events
2 participants