Skip to content

[AIRFLOW-5664] Store timestamps with microseconds precision in GCSToPSQL#6354

Merged
turbaszek merged 1 commit intoapache:masterfrom
osule:5664-full-timestamp-resolution
Mar 16, 2020
Merged

[AIRFLOW-5664] Store timestamps with microseconds precision in GCSToPSQL#6354
turbaszek merged 1 commit intoapache:masterfrom
osule:5664-full-timestamp-resolution

Conversation

@osule
Copy link
Contributor

@osule osule commented Oct 16, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-5664
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Microseconds value is lost in the conversion to timestamp using time.mktime.
Timestamp is now computed to be precise up to microseconds.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason: Synonymous with testing pendulum fluent helpers.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@mik-laj mik-laj changed the title [AIRFLOW-5664] Store timestamps with microseconds precision [AIRFLOW-5664] Store timestamps with microseconds precision in GCSToPSQL Nov 21, 2019
@mik-laj mik-laj added the provider:google Google (including GCP) related issues label Nov 21, 2019
@stale
Copy link

stale bot commented Jan 5, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 5, 2020
Copy link
Contributor

@RosterIn RosterIn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The title of PR and Jira says GCSToPSQL but you suggest edit to postgres_to_gcs ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why UTC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RosterIn At the time I wrote this, I understood that the ISO-8601 string representation for value could have a timezone offset and thought extra handling necessary to convert the parsed datetime in the local time.

For a fact, this extra conversion isn't necessary to perform. I'll remove it.

@stale stale bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 6, 2020
@osule osule force-pushed the 5664-full-timestamp-resolution branch from 00b2a1d to 0173d41 Compare January 6, 2020 16:52
@osule
Copy link
Contributor Author

osule commented Jan 6, 2020

The title of PR and Jira says GCSToPSQL but you suggest edit to postgres_to_gcs ?

@Rosterln I could be wrong. I think it's mistitled. The description for the Jira ticket refers to the postgres_to_gcs operator.

@codecov-io
Copy link

codecov-io commented Jan 6, 2020

Codecov Report

Merging #6354 into master will decrease coverage by 5.35%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6354      +/-   ##
==========================================
- Coverage   86.66%   81.31%   -5.36%     
==========================================
  Files         893      897       +4     
  Lines       42242    46595    +4353     
==========================================
+ Hits        36610    37887    +1277     
- Misses       5632     8708    +3076
Impacted Files Coverage Δ
...roviders/google/cloud/operators/postgres_to_gcs.py 85.71% <50%> (+0.42%) ⬆️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 47.18% <0%> (-45.08%) ⬇️
airflow/providers/google/cloud/hooks/dlp.py 64.06% <0%> (-34.63%) ⬇️
airflow/providers/amazon/aws/hooks/s3.py 66.27% <0%> (-29.58%) ⬇️
airflow/providers/google/cloud/operators/dlp.py 70.97% <0%> (-29.03%) ⬇️
airflow/jobs/scheduler_job.py 63.09% <0%> (-26.58%) ⬇️
...viders/cncf/kubernetes/operators/kubernetes_pod.py 69.69% <0%> (-25.26%) ⬇️
airflow/jobs/backfill_job.py 67.51% <0%> (-24.6%) ⬇️
... and 36 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4ce8f2...7e7f7e7. Read the comment docs.

@RosterIn
Copy link
Contributor

Looks good

@turbaszek
Copy link
Member

@osule can you please, rebase onto new master?

@osule osule force-pushed the 5664-full-timestamp-resolution branch from 0173d41 to 2caa205 Compare January 27, 2020 16:36
@osule
Copy link
Contributor Author

osule commented Jan 27, 2020

@osule can you please, rebase onto new master?

Sure, I'll resolve the conflicts and push a changed commit.

@osule osule force-pushed the 5664-full-timestamp-resolution branch 3 times, most recently from b940ac3 to 7514095 Compare January 27, 2020 16:58
@osule osule force-pushed the 5664-full-timestamp-resolution branch from 7514095 to 2cc0d05 Compare February 24, 2020 18:49
Microseconds value is lost in the conversion to timestamp
using time.mktime.

Timestamp is now computed to be precise up to microseconds.
@osule osule force-pushed the 5664-full-timestamp-resolution branch from 2cc0d05 to 7e7f7e7 Compare February 24, 2020 18:50
@RosterIn
Copy link
Contributor

@nuclearpinguin can this PR get merged?

@turbaszek turbaszek merged commit 51161db into apache:master Mar 16, 2020
@ashb
Copy link
Member

ashb commented Mar 16, 2020

@nuclearpinguin Commit messages are what we use to build our changelog; can you please make sure that they are descriptive for end users to usefully identify the change. For example in this case this commit subject does not identify where this change is -- it should have included "PostgresToGCSOperator" or similar in the subject.

"""
if isinstance(value, (datetime.datetime, datetime.date)):
return time.mktime(value.timetuple())
return pendulum.parse(value.isoformat()).float_timestamp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holy in-efficiency batman!

We're doing from a datetime, to a string (a bit slow) parsing it back to a datetime-like object (VERY slow) to then call an attribute on it.

This should be

             return value.timestamp()

Also the comment should be clarified. "Times are converted to fractional-seconds"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct that this is inefficient.
However, value.timestamp doesn't take timezone information into account.

Now that I think of it, it should have been written as or some other way

return pendulum.timezone('UTC').convert(value).timestamp()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashb WDYT? Should the commit be reverted and new one added?

@turbaszek
Copy link
Member

@nuclearpinguin Commit messages are what we use to build our changelog; can you please make sure that they are descriptive for end users to usefully identify the change. For example in this case this commit subject does not identify where this change is -- it should have included "PostgresToGCSOperator" or similar in the subject.

Sorry, next time I will check if the commit message is the same as the PR title.

turbaszek added a commit that referenced this pull request Mar 16, 2020
@osule
Copy link
Contributor Author

osule commented Mar 17, 2020

@nuclearpinguin Should I open a new PR that addresses concern from the discussion?

Or will you handle it from here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants