Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix time encoding to use int64 and "nanoseconds since 1970-01-01 00:00:00Z" #1299

Conversation

ctuguinay
Copy link
Collaborator

@ctuguinay ctuguinay commented Apr 11, 2024

This PR addresses the not faithful time serialization warning from issue #1290.

@ctuguinay ctuguinay changed the title default to float64 as encode dtype Default to float64 as encoding dtype in _encode_dataarray Apr 11, 2024
@ctuguinay ctuguinay self-assigned this Apr 11, 2024
@ctuguinay ctuguinay added the bug Something isn't working label Apr 11, 2024
@codecov-commenter
Copy link

codecov-commenter commented Apr 11, 2024

Codecov Report

Attention: Patch coverage is 90.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 74.57%. Comparing base (9f56124) to head (900ae7d).
Report is 44 commits behind head on main.

Files Patch % Lines
echopype/utils/coding.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1299      +/-   ##
==========================================
- Coverage   83.52%   74.57%   -8.95%     
==========================================
  Files          64       23      -41     
  Lines        5686     3182    -2504     
==========================================
- Hits         4749     2373    -2376     
+ Misses        937      809     -128     
Flag Coverage Δ
unittests 74.57% <90.00%> (-8.95%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ctuguinay ctuguinay marked this pull request as ready for review April 11, 2024 03:32
@ctuguinay
Copy link
Collaborator Author

Explanation of the current warning (from slack conversation)

@leewujung

Here's the current order of events in the code that is causing this warning: (https://github.com/pydata/xarray/blob/1d43672574332615f225089d69f95a9f8d81d912/xarray/coding/times.py#L723C1-L803C32):

  1. The computation converts nanosecond resolution time to int64.

  2. Computes needed time time_delta (represents the time unit granularity based on the specified units), and needed_time_delta (represents the minimum unit granularity required for lossless integer encoding of the dates).

  3. (Optional) If time_delta > needed_time_delta, and we specify None dtype encoding we get the not faithful serialization message, since with integer bit 64 encoding, we will lose precision.

  4. The operation to finalize the encoding is then done in int64.

  5. (Optional) Then, if we actually specify dtype, we cast to dtype (if numpy deems safe).

Now, I am not quite sure at this point why float64 should silence this warning: The casting to float64 is a bit of a 'moot point', since the casting operation is done post-encoding computation, which we know will result in lossful integer encoding.
This is probably why None and float64 encoding result in the same values. The float64 dtype does practically nothing to change the encoded array, since it is done at the very last step. Additionally, the time_delta > needed_time_delta also is why we lose precision even when there is no warning (in the case where we set dtype set to float64).

Perhaps this is an issue I bring up in xarray? This still doesn't make sense to me, but I do think it'd be good to silence this error as soon as possible since it can be very disheartening seeing so many time serialization warnings on your first use of Echopype.

On our own end, do you think adding a warning in the logger for this would be a good idea? That way those users that have logger verbose turned on can still get the message.

leewujung
leewujung previously approved these changes Apr 17, 2024
Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctuguinay : Looks great! I am glad we got this resolved after much digging and trial and error!

I only have one comment: What do you think of changing the function name _encode_dataarray to _encode_time_dataarray? Since it is only used for set_time_encodings and the content does not make sense for other types of dataarray?

Otherwise I think this is good to go.

@leewujung leewujung changed the title Default to float64 as encoding dtype in _encode_dataarray Fix time encoding to use int64 and "nanoseconds since 1970-01-01 00:00:00Z" Apr 17, 2024
@ctuguinay
Copy link
Collaborator Author

@leewujung Made that change. This should be ready to merge now

@leewujung leewujung merged commit a6cc36e into OSOceanAcoustics:main Apr 17, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants