Remove pandas-gbq from testing #31
Conversation
@@ -19,6 +20,7 @@ def df():
         {
             "name": random.choice(["fred", "wilma", "barney", "betty"]),
             "number": random.randint(0, 100),
+            "timestamp": datetime.now(timezone.utc) - timedelta(days=i % 2),
Before adding timezone.utc, I was getting this assertion error:
E AssertionError: Attributes of DataFrame.iloc[:, 2] (column name="timestamp") are different
E
E Attribute "dtype" are different
E [left]: datetime64[ns, UTC]
E [right]: datetime64[ns]
It seems that when reading back from BigQuery, the timestamps are automatically converted to UTC if not otherwise specified, causing the error.
@tswast, can you confirm this is the case? Any comments?
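For context, here is a minimal sketch (not code from this PR) of the mismatch: a naive datetime column has dtype datetime64[ns], while the tz-aware data BigQuery returns is datetime64[ns, UTC], which is exactly the attribute assert_frame_equal complains about.

from datetime import datetime, timezone

import pandas as pd

# Naive timestamps infer as datetime64[ns]; tz-aware ones as datetime64[ns, UTC].
naive = pd.DataFrame({"timestamp": pd.to_datetime([datetime.now()])})
aware = pd.DataFrame({"timestamp": pd.to_datetime([datetime.now(timezone.utc)], utc=True)})

print(naive["timestamp"].dtype)  # datetime64[ns]
print(aware["timestamp"].dtype)  # datetime64[ns, UTC]

# Comparing the two reproduces the dtype mismatch shown above:
# pd.testing.assert_frame_equal(naive, aware)  # AssertionError on the "dtype" attribute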
TIMESTAMP columns are intended to come back as datetime64[ns, UTC], yes. DATETIME should come back as datetime64[ns].
See my answer here on the difference between the two: https://stackoverflow.com/a/47724366/101923
Also note: both will come back as object dtype if there's a date outside of the pandas representable range, e.g. 0001-01-01 or 9999-12-31.
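A rough sketch of checking this (assumes google-cloud-bigquery is installed and default credentials/project are configured; not code from this PR):

from google.cloud import bigquery

client = bigquery.Client()
result = client.query(
    "SELECT CURRENT_TIMESTAMP() AS ts, CURRENT_DATETIME() AS dt"
).to_dataframe()

print(result["ts"].dtype)  # datetime64[ns, UTC] -- TIMESTAMP is timezone-aware
print(result["dt"].dtype)  # datetime64[ns]      -- DATETIME is timezone-naive

# Values such as TIMESTAMP '0001-01-01' or '9999-12-31' fall outside the
# pandas-representable range, so such a column comes back as object dtype instead.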
I'm actually working on making the pandas-gbq dtypes consistent with google-cloud-bigquery as we speak in googleapis/python-bigquery-pandas#444
It seems that if I don't provide a schema, BigQuery will infer that the dataframe column named "timestamp" is a TIMESTAMP column, and therefore it comes back as datetime64[ns, UTC]. That being said, to keep the test simple I think we can make the local dataframe timezone-aware and test that it comes back as it should.
cc: @jrbourbeau Does this convince you? If so, this PR is ready for review.
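A minimal sketch of that approach (the fixture mirrors the diff above; the write/read-back step is elided because it depends on the project's test helpers):

import random
from datetime import datetime, timedelta, timezone

import pandas as pd

records = [
    {
        "name": random.choice(["fred", "wilma", "barney", "betty"]),
        "number": random.randint(0, 100),
        "timestamp": datetime.now(timezone.utc) - timedelta(days=i % 2),
    }
    for i in range(10)
]
local = pd.DataFrame(records)
print(local["timestamp"].dtype)  # expected: datetime64[ns, UTC] -- already timezone-aware

# After writing `local` to BigQuery and reading it back, both frames should carry
# datetime64[ns, UTC] for "timestamp", so pandas.testing.assert_frame_equal no
# longer fails on the dtype attribute.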
Sounds good 👍
It looks like on macOS it can't find
Thanks @ncclementi for fixing and @tswast for reviewing! This is in
Forgot to mention the macOS CI failure. This looks like a totally unrelated packaging issue (we're seeing similar things over in Dask's CI). I'm not currently able to reproduce locally -- let me rerun CI to see if the issue has already been resolved.
Hmm, unfortunately the macOS environment issue is still around. I'm highly confident this is unrelated to the changes in this PR (see similar things being reported in
Replace use of pandas-gbq for pure Bigquery