Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model: fix timezone handling for Edm.DateTimeOffset #184

Merged
merged 3 commits into from
Jan 12, 2022

Conversation

rettichschnidi
Copy link
Contributor

@rettichschnidi rettichschnidi commented Dec 22, 2021

Convert Edm.DateTimeOffset entities to Python datetime objects with a
timezone attached, instead of simply passing them around as plain
string.

@cla-assistant
Copy link

cla-assistant bot commented Dec 22, 2021

CLA assistant check
All committers have signed the CLA.

@rettichschnidi rettichschnidi force-pushed the rs/implement-Edm.DateTimeOffset branch 5 times, most recently from 67bf119 to a05ffd2 Compare December 22, 2021 14:53
@phanak-sap
Copy link
Contributor

Seems that current_timezone() method (which should probably be deleted in the PR as well, since this replaces its usages) and the comments there are wrong.

But then, this same problem should be fixed for Edm.DateTime as well, not only for Edm.DateTimeOffset. Correct?

Note, relevant documents:

  • Spec in odata.org - Edm.DateTimeOffset: Defined by the lexical representation for datetime (including timezone offset)
  • MS
  • XML schema, no anchors - Section 3.2.7.3 Timezones

CHANGELOG.md Outdated
@@ -6,6 +6,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/).

## [Unreleased]

### Added
Copy link
Contributor

@phanak-sap phanak-sap Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not new feature, but bugfix (DateTime and DateTimeOffset are already supported, however they expects that everything incoming is in UTC).

Pls change from "Added" to "Fixed" section, as we are using https://keepachangelog.com/en/1.0.0/

Copy link
Contributor Author

@rettichschnidi rettichschnidi Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that current_timezone() method (which should probably be deleted in the PR as well, since this replaces its usages) and the comments there are wrong.

Not sure I understand you correctly, but this would essentially mean that Edm.DateTime will be converted to naive datetime objects, instead of "aware" ones with a hard coded UTC timezone?

But then, this same problem should be fixed for Edm.DateTime as well, not only for Edm.DateTimeOffset. Correct?

The literal form of Edm.DateTime in Overview does not mention either, timezone nor offset. Atom Format references the literal form directly, therefore can also not represent a timezone/offset. JSON however has an offset described. 😕

As I have yet not seen a timezone/offset on a Edm.DateTime, I assumed it will never have an offset. Additionally, what would be the purpose of Edm.DateTimeOffset if both types can contain a timezone?

Possible way forward (Again, not sure if I understood you correctly):

  • Edm.DateTime: Remove fake timezone information (separate PR?)
  • Edm.DateTimeOffset: Add timezone information (original idea of this PR)

What do you think?

Copy link
Contributor Author

@rettichschnidi rettichschnidi Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not new feature, but bugfix (DateTime and DateTimeOffset are already supported, however they expects that everything incoming is in UTC).

The level of support currently differs:

  • DateTime: Converted to a Python datetime object with fake/guessed UTC timezone
  • DateTimeOffset: Converted to Python string. Works well enough to fetch from and send to services, but is not convenient when one wants to perform work on it. This PR converts to/from aware Python datetime objects.

Pls change from "Added" to "Fixed" section, as we are using https://keepachangelog.com/en/1.0.0/

Will do.

Copy link
Contributor

@phanak-sap phanak-sap Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edm.DateTime - while odata.org has is just datetime’yyyy-mm-ddThh:mm[:ss[.fffffff]]’ - and SAP Gateway returns this in UTC by design, so it is understandable that pyodata started from there - the MS ODATA specification for odata has optional timezone as well for this. That's why I thought that what you did for Edm.DateTimeOffset type should be applied to Edm.DateTime as well - and apply UTC as the default, if no information about timezone is received.

dateTimeLiteral =   year "-" 
                    month "-" 
                    day "T"
                    hour ":" 
                    minute 
                    [":" second ["." nanoSeconds]] 
                    [timeZone]

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-odata/61934eae-311a-4af4-b8f8-82c112248651

Copy link
Contributor Author

@rettichschnidi rettichschnidi Jan 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> Edm.DateTime - while odata.org has is just datetime’yyyy-mm-ddThh:mm[:ss[.fffffff]]’ - and SAP Gateway returns this in UTC by design

It will never include any timezone information, correct?

If so: Since Pythons datetime objects can be naive (explicitly not knowing anything about timezone) or aware (having some kind of timezone information), what about working with this to keep track whether a value is timezone aware or not instead of adding a fake UTC timezone?

the MS ODATA specification for odata has optional timezone as well for this.

This refers to version 3.0 of OData. I think it should not be used as reference for OData 2.0.

That's why I thought that what you did for Edm.DateTimeOffset type should be applied to Edm.DateTime as well - and apply UTC as the default, if no infomration about timezone is received.

Changing Edm.DateTime and Edm.DateTimeOffset so that the only change will be that the first one defaults to UTZ and the later one will fail if the offset/timezone information is missing should be easy to be implemented when /consuming/ data.

However, when /sending/ an aware (possibly non-UTC) Python datetime object to a server (converting it to its literal form) however, I can see two options:

  1. Just expecting the server to deal with the timezone information. If there are implementations which can not handle it, hopefully someone will create a bug report.
  2. "Normalize" the time information to UTZ. Implies that receiving entities, and then using them (unmodified) to "update" data on the server strips the previously existing timezone information. Feels wrong.

I think both attempts are not sound and Edm.DateTime should be strictly agnostic of timezones, while Edm.DateTimeOffset requires it.

In case the SAP Gateway never adds a timezone information to Edm.DateTime, this approach should work well with it.

I will come up with an implementation for this. The current state can still be found here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I understand this long crossed comment as that the request for covering timezone support for Edm.DateTime is accepted. If it is true, thanks for that. I will leave this PR open.

I would like only to address the odata v3 part in the comment - yes, the MS documentation is quite a chaotic one. They version the document itself, not by the protocol version. The OASIS is clearer for understanding. However, IMHO the document is essentialy "everything is applied to both odata v2 and odata v3" - and things that are relevant to one protocol version only are listed in the section 1.7.1 and 1.7.2 - https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-odata/31a103dd-acaa-4e3d-a637-2d5de28a20a1

tests/test_model_v2.py Outdated Show resolved Hide resolved
assert testdate.microsecond == 0
assert testdate.tzinfo == timezone.utc

# UTC lowercase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where us the lowercase 'z' test case coming from? XML schema defines only 'Z' - see section 3.2.7.3 Timezones, cite:

The lexical representation of a timezone is a string of the form: (('+' | '-') hh ':' mm) | 'Z',

On the other hand, for the UTC timezone, two other offset test cases are also valid input that should end to same result: +00:00 and -00:00

Copy link
Contributor Author

@rettichschnidi rettichschnidi Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error on my side - while datetime is case-insensitive as per spec, this is not the case for the timezone part.

Will add suggested tests.

Copy link
Contributor

@phanak-sap phanak-sap Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other funny test cases... add something on the dateline or even across - +/- 12h timezone exactly or +14 (Kiribati)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@phanak-sap phanak-sap changed the title model: implement handling for Edm.DateTimeOffset model: fix timezone handling for Edm.DateTimeOffset Dec 22, 2021
@phanak-sap phanak-sap added the bug Something isn't working label Dec 22, 2021
@rettichschnidi
Copy link
Contributor Author

rettichschnidi commented Jan 2, 2022

Just pushed with the requested changes I am clear about.

CHANGELOG.md Show resolved Hide resolved
# https://stackoverflow.com/questions/36179914/timestamp-out-of-range-for-platform-localtime-gmtime-function
return datetime.datetime(1970, 1, 1, tzinfo=tzinfo) + datetime.timedelta(milliseconds=int(value))
except (ValueError, OverflowError):
if FIX_SCREWED_UP_MINIMAL_DATETIME_VALUE and int(value) < -62135596800000:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -62135596800000 and 253402300799999 are obviously the "magic numbers", that repeats in if statement and the log error message as well. Local constant in the method would be nicer.

you see the statement yourself:
if FIX_SCREWED_UP_MAXIMUM_DATETIME_VALUE and int(value) > 253402300799999:

f' or `FIX_SCREWED_UP_MAXIMUM_DATETIME_VALUE` as a workaround.')


def parse_datetime_literal(value):
Copy link
Contributor

@phanak-sap phanak-sap Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For future generations, this should have short docstring why these three formats were chosen. I did not check to be fair, but I am guessing all statements and possible code path are covered by new tests, right?

  2. This looks to me like ideal example for usage pytest parametrized tests, with throwing bunch of valid inputs and invalid inputs to two tests, with difference in expected result. The collections for valid a invalid input string can be in conftest if too big to have the test readable.

This is idea how to possibly enhance the PR, not a blocker to merging (will convert to issue for myself if not done)

Copy link
Contributor Author

@rettichschnidi rettichschnidi Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For future generations, this should have short docstring why these three formats were chosen.

Will do

I did not check to be fair, but I am guessing all statements and possible code path are covered by new tests, right?

It gets tested implicitly (but fully) by test_traits_datetime(). test_traits_datetimeoffset_from_literal() only uses the first two forms.

2. This looks to me like ideal example for usage [pytest parametrized test](https://docs.pytest.org/en/6.2.x/parametrize.html)s, with throwing bunch of valid inputs and invalid inputs to two tests, with difference in expected result. The collections for valid a invalid input string can be in conftest if too big to have the test readable.

Will add some.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you for that.

- /Date(1000+0030)/ (As DateTime, but with a 30 minutes timezone offset)
Example 1: datetimeoffset'1970-01-01T00:00:01-00:60'
- /Date(1000-0030)/ (As DateTime, but with a negative 60 minutes timezone offset)
https://blogs.sap.com/2017/01/05/date-and-time-in-sap-gateway-foundation/
Copy link
Contributor

@phanak-sap phanak-sap Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blog SAP is sadly not the greatest reference. I used links to odata spec in this issue and in #186 - in fact, we found that what the SAP Gateway do, is kind of "odata flavor" and should not be considered same in behavior as plain, vanilla odata specification.

Copy link
Contributor Author

@rettichschnidi rettichschnidi Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is a SAP project, changes should remain compatible to the SAP flavor nevertheless?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to clear the possibility of misunderstanding:

Depends on what exactly you mean by "compatible to the SAP flavor ". Compatible with SAP Gateway, if it produces something different that exact odata specification? Yes. (I know about added annotations for UI). Use the SAP Gateway as the base for everything - since so far changes are in the model.py - no.

Even that it started from within the SAP (so started de-facto with premise of equivalence that SAP Gateway == OData) and is obviously used in SAP, the pyodata as generic python package should be ideally covering odata specification in its vanilla form - and add different vendor flavors support where they differs. Our "sister JS library" (which was created from scratch basically as port of this to JS) has this handled better, in more generic way, but so far there was no other need than vanilla oasis and sap flavor, so we are good :).

There is is something in SAP vendor module, so far handling specific gateway errors. Some SAP specific things from model.py should end there, e.g.

# TODO: create a class SAP attributes

Did you find any difference in behavior of timezones that would be SAP specific - and not applicaple to "odata model in general" in model.py? AFAIK, all your changes so far should be odata-generic, without any need for SAP special handling (i.e. SAP adhers to DateTime and DateTimeOffset odata specification).

actual line comment relevant:

Everything previous is little tangential, I only commented that I would use the link to odata specification (as scattered in comments in this PR) instead of SAP blog post, that's all.

pyodata/v2/model.py Show resolved Hide resolved
offset_in_minutes = int(matches.group('offset_in_minutes'))
except (ValueError, AttributeError):
raise PyODataModelError(
f"Malformed value {value} for primitive Edm type. Expected format is /Date(datetime±offset_in_min)/")
Copy link
Contributor

@phanak-sap phanak-sap Jan 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"for primitive Edm type" -> "for Edm.DateTimeOffset". We know where we are for the log message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix


try:
# Note: parse_datetime_literal raises a PyODataModelError exception on invalid formats
if re.match(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z', value, flags=re.ASCII | re.IGNORECASE):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will copypaste this... I am just asking.. regexp also the only option?

Copy link
Contributor Author

@rettichschnidi rettichschnidi Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed alternatives:

  • fromisoformat - but this needs Python 3.7+. Ubuntu 18.04 will be EOL soon - any chance you will drop support 3.6?
  • python-dateutil - would be a new dependency

Copy link
Contributor

@phanak-sap phanak-sap Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation of the thoughts behind this line.

Actually, since you asked.. yes, I have intention in near future drop support for Python 3..6, since it is EOL, will not receive any security fixes - and pyodata is a type of library that most probably will be used in more fresh environments, simply since you want the latest requests library as well, I guess.

So far there was no actual need for code change that would be Py3.6 incompatible - so I wanted to mark 3.6 python build as possible to fail, but still monitor them. But 3.6 backward-compatibility really does not need to be enforced.

I would go in a way that we create an spinoff issue for the fromisoformat refactoring in the future (the more standard library the better IMHO, we need only LXML and request-like lib so far, so no new dependancy), that would depend on actual drop of Py3.6 support in pyodata - I created issue #190 for that. In this PR, let's stay on what you got already.

@filak-sap
Copy link
Contributor

Good job! I like this PR. What's the blocker? Why can't we merge it?

@phanak-sap
Copy link
Contributor

phanak-sap commented Jan 4, 2022

Good job! I like this PR. What's the blocker? Why can't we merge it?

  • No real, big blocker.
  • There are still small fixes yet to be done from this code review, but they could be in another, small PR.
  • We agreed that timezone supports needs to be done (for consistency) for Edm.DateTime type as well. So far it seems that @rettichschnidi want to do it in same branch/PR (has created two other already for different issues), but it can be of course merged as is, and add support for timezones in Edm.DateTime in additional PR.
  • I also think that this can be merged to master, I simply leaved it still open for day or two and wait for answer from @rettichschnidi, how it is the easiest for him to continue; since this PR converged to this state with still one commit and git commit --amend + push --force.

@rettichschnidi
Copy link
Contributor Author

I will update my PR with all the improvements we have discussed. Please do not yet merge.

Convert Edm.DateTimeOffset entities to Python datetime objects with a
timezone attached, instead of simply passing them around as plain
string.
Edm.DateTime is underspecified and its timezone handling, if existing at
all, up to interpretation.

The current implementation allows creating Edm.DateTime properties,
using a datetime object with an arbitrary timezone. Before sending the
containing entity to the server, pyodata converts the Python datetime
object to a UTC based representation. Therefore, when fetching the
entity later on, its Edm.DateTime property will be UTC based, with the
original timezone information being missing.

This kind of information loss might surprise users of this library!

Disallowing the creation of non-UTC datetime properties forces the user
to convert manually to UTC, therefore raising awareness of the
limitations of Edm.DateTime and encourage the usage of
Edm.DateTimeOffset.

For compatibility with (maybe existing) implementations that interpret
the specs differently and add offsets to Edm.DateTime, pyodata now
supports such values when being received from an OData server, converts
them to UTC.
@rettichschnidi
Copy link
Contributor Author

Just updated with (hopefully) all changes incorporated.

Warnings/Questions:

  • EdmDateTimeTypTraits:
    • I am not sure if and when the checks for None are actually needed in from_json() and from_literal(). As far as I can see, None is already checked for in the respective functions in VariableDeclaration.
    • to_json() seems to accept strings for compatibility reasons (which I do not understand) - would this be needed for this for Emd.DateTimeOffset too?
  • I added a 2nd commit which implements timezone handling for Emd.DateTimeOffset. This is what I tried to describe in the crossed out text above. My reasons for doing it the way I did are stated in the commit message. I do realize that this is opinionated and am OK with creating a separate PR for it.

@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2022

Codecov Report

Attention: Patch coverage is 92.10526% with 6 lines in your changes missing coverage. Please review.

Project coverage is 92.66%. Comparing base (674fab0) to head (6cc26cb).
Report is 68 commits behind head on master.

Files with missing lines Patch % Lines
pyodata/v2/model.py 92.10% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #184      +/-   ##
==========================================
- Coverage   92.74%   92.66%   -0.09%     
==========================================
  Files           6        6              
  Lines        2717     2768      +51     
==========================================
+ Hits         2520     2565      +45     
- Misses        197      203       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@phanak-sap
Copy link
Contributor

Looks very good. Thanks for all the work.

@phanak-sap phanak-sap merged commit ec56995 into SAP:master Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants