[AIRFLOW-5664] Store timestamps with microseconds precision in GCSToPSQL#6354
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
RosterIn left a comment
The title of the PR and the Jira issue says GCSToPSQL, but you suggest an edit to postgres_to_gcs?
airflow/operators/postgres_to_gcs.py
@RosterIn At the time I wrote this, I understood that the ISO-8601 string representation of value could have a timezone offset, and I thought extra handling was necessary to convert the parsed datetime to local time.
In fact, this extra conversion isn't necessary. I'll remove it.
Force-pushed from 00b2a1d to 0173d41.
@RosterIn I could be wrong, but I think it's mistitled. The description for the Jira ticket refers to the
Codecov Report
@@             Coverage Diff             @@
##           master    #6354      +/-   ##
==========================================
- Coverage   86.66%   81.31%   -5.36%
==========================================
  Files         893      897       +4
  Lines       42242    46595    +4353
==========================================
+ Hits        36610    37887    +1277
- Misses       5632     8708    +3076
Looks good.
@osule, can you please rebase onto the new master?
Force-pushed from 0173d41 to 2caa205.
Sure, I'll resolve the conflicts and push an updated commit.
Force-pushed from b940ac3 to 7514095.
Force-pushed from 7514095 to 2cc0d05.
The microseconds value is lost when converting to a timestamp with time.mktime. The timestamp is now computed with microsecond precision.
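The precision loss described in the commit message can be reproduced with the standard library alone. This is a minimal sketch, assuming a naive datetime (which both time.mktime and datetime.timestamp() interpret as local time); the sample value is arbitrary and not taken from the PR:

```python
import datetime
import time

# Arbitrary naive datetime with a non-zero microsecond component.
value = datetime.datetime(2019, 10, 17, 12, 30, 45, 123456)

# timetuple() returns a time.struct_time, which has no sub-second field,
# so time.mktime() never sees the microseconds.
lossy = time.mktime(value.timetuple())

# datetime.timestamp() keeps the fractional seconds.
precise = value.timestamp()

print(lossy, precise, precise - lossy)
```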
Force-pushed from 2cc0d05 to 7e7f7e7.
@nuclearpinguin, can this PR be merged?
@nuclearpinguin Commit messages are what we use to build our changelog; can you please make sure that they are descriptive enough for end users to identify the change? For example, this commit subject does not identify where the change is; it should have included "PostgresToGCSOperator" or similar in the subject.
"""
if isinstance(value, (datetime.datetime, datetime.date)):
-    return time.mktime(value.timetuple())
+    return pendulum.parse(value.isoformat()).float_timestamp
Holy inefficiency, Batman!
We're going from a datetime to a string (a bit slow), parsing it back into a datetime-like object (VERY slow), and then reading an attribute on it.
This should be
return value.timestamp()
Also, the comment should be clarified: "Times are converted to fractional seconds."
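The round trip being criticised and the direct call suggested in its place can be compared with a short sketch. It uses datetime.fromisoformat (Python 3.7+) as a stdlib stand-in for pendulum.parse, so it only shows that both paths yield the same number, not how slow pendulum's parser is. It also checks one caveat: a plain datetime.date has no timestamp() method, so value.timestamp() alone would not cover the datetime.date branch of the isinstance check.

```python
import datetime

# Arbitrary aware datetime with microseconds and a +02:00 offset.
value = datetime.datetime(
    2019, 10, 17, 12, 30, 45, 123456,
    tzinfo=datetime.timezone(datetime.timedelta(hours=2)),
)

# Round trip: datetime -> ISO-8601 string -> parsed datetime -> timestamp.
round_trip = datetime.datetime.fromisoformat(value.isoformat()).timestamp()

# Direct call: no string formatting or parsing involved.
direct = value.timestamp()

# Plain dates lack timestamp(), unlike datetimes.
date_has_timestamp = hasattr(datetime.date(2019, 10, 17), "timestamp")

print(round_trip == direct, date_has_timestamp)
```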
You're correct that this is inefficient.
However, value.timestamp() doesn't take timezone information into account when value is naive: it assumes local time.
Now that I think of it, it should have been written as follows (or in some other equivalent way):
return pendulum.timezone('UTC').convert(value).timestamp()
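For reference, datetime.timestamp() does honour the offset of an aware datetime; the ambiguity is only with naive values, which it interprets as local time. Below is a minimal stdlib sketch of the UTC-based conversion discussed above, assuming (as the pendulum.timezone('UTC').convert(value) suggestion appears to intend) that naive values coming out of PostgreSQL should be read as UTC. to_utc_timestamp is a hypothetical helper name, not an Airflow API; it also covers plain datetime.date values by using midnight UTC.

```python
import datetime

UTC = datetime.timezone.utc


def to_utc_timestamp(value):
    """Hypothetical helper: return fractional seconds since the epoch.

    Assumes naive datetimes are already in UTC; aware datetimes are
    converted using their own offset. Microseconds are preserved.
    """
    if not isinstance(value, datetime.datetime):
        # datetime.date has no timestamp(); treat it as midnight UTC.
        value = datetime.datetime(value.year, value.month, value.day)
    if value.tzinfo is None:
        # Attach UTC instead of letting timestamp() assume local time.
        value = value.replace(tzinfo=UTC)
    return value.timestamp()


naive = datetime.datetime(2019, 10, 17, 12, 0, 0, 500000)
aware = datetime.datetime(
    2019, 10, 17, 14, 0, 0, 500000,
    tzinfo=datetime.timezone(datetime.timedelta(hours=2)),
)
print(to_utc_timestamp(naive) == to_utc_timestamp(aware))  # True
print(to_utc_timestamp(datetime.date(1970, 1, 2)))  # 86400.0
```

Dropping the replace(tzinfo=UTC) step would bring back the local-time interpretation that the thread is trying to avoid.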
@ashb WDYT? Should the commit be reverted and a new one added?
Sorry, next time I will check that the commit message matches the PR title.
@nuclearpinguin Should I open a new PR that addresses the concerns from this discussion, or will you handle it from here?
Make sure you have checked all steps below.
Jira
- For example, "[AIRFLOW-XXX] My Airflow PR"
- In case you are fixing a typo in the documentation, you can prepend your commit with [AIRFLOW-XXX]; code changes always need a Jira issue.
- In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
Description
- including screenshots of any UI changes:
Tests
Commits
Documentation
- In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and the classes in the PR contain docstrings that explain what they do.
- If you implement backwards-incompatible changes, please leave a note in Updating.md so we can assign it to an appropriate release.