Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datetimes on plot are always treated as local time and shifted to UTC #5499

Closed
mwerezak opened this Issue Dec 1, 2016 · 20 comments

Comments

Projects
None yet
5 participants
@mwerezak
Copy link

mwerezak commented Dec 1, 2016

Description

When displaying datetime.datetime values along the x-axis using x_axis_type = "datetime", the values are shifted if the system's timezone is not UTC.

For example, if you live in UTC-05:00, then all the dates are shifted ahead 5 hours (7am appears at noon).

This happens regardless of the tzinfo on the datetimes, including for naive datetimes where tzinfo is None.

Sample Code
from datetime import datetime
from bokeh.plotting import figure, show
from bokeh.models.sources import ColumnDataSource
from bokeh.models.tools import CrosshairTool, HoverTool

hours = range(24)
datetimes = [datetime.now().replace(hour=h, minute=0, second=0, tzinfo=None) for h in hours]

example = figure(
	x_axis_label = 'Date/Time',
	x_axis_type = 'datetime',
	y_axis_label = 'Expected Hour',
)
example.circle(
	x = 'x',
	y = 'y',
	fill_color = 'blue',
	size = 6,
	source = ColumnDataSource({
		'x': datetimes,
		'y': hours,
		'time_fmt': [dt.strftime("%H:%M") for dt in datetimes],
	}),
)
example.add_tools(CrosshairTool())
example.add_tools(HoverTool(
	tooltips = [
		("index", "$index"),
		("expected time", "@time_fmt"),
	],
))

show(example)
Expected Behaviour

I would have expected that naive datetimes are displayed as is.

Actual Behaviour

screenshot at 2016-12-01 12 53 01

System, Library, Browser, Version information

I am using Python 3.5.2 (64-bit) with bokeh 0.12.3 and Firefox 50.0 as well as Chrome 54.0.2840.100 (64-bit) on Ubuntu MATE 16.04.1 LTS and on CentOS release 6.8.

bokeh info (Ubuntu):

Python version      :  3.5.2 (default, Sep 10 2016, 08:21:44) 
IPython version     :  5.1.0
Bokeh version       :  0.12.3
BokehJS static path :  /usr/local/lib/python3.5/dist-packages/bokeh/server/static

bokeh info (CentOS):

Python version      :  3.5.2 (default, Nov 24 2016, 13:03:33) 
IPython version     :  Not installed
Bokeh version       :  0.12.3
BokehJS static path :  /usr/local/lib/python3.5/site-packages/bokeh/server/static
Other notes

You can work around this issue by setting your system timezone to UTC. This is not ideal though.

@mwerezak

This comment has been minimized.

Copy link
Author

mwerezak commented Dec 2, 2016

Does anyone with more insight into Bokeh/BokehJS's handling of datetimes than myself have any idea what might be responsible? From what I gather we should expect datetimes to be displayed as-is so am I right in guessing that Bokeh does not have any timezone handling implemented? In which case we could expect the issue to be on the client side or during serialization?

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented Dec 10, 2016

For reference, the BokehJS DatetimeTickFormatter code is here:

https://github.com/bokeh/bokeh/blob/master/bokehjs/src/coffee/models/formatters/datetime_tick_formatter.coffee

This makes use of the JS timezone library: https://www.npmjs.com/package/timezone

Maybe there is a misuse of tz there, or some upstream bug.

As a first experiment, can you print out the actual timestamp values from the column data source in the hover tool? Then round trip them to human readable. If they match what you passed in to start this will confirm the issues is purely on the JS side.

@mwerezak

This comment has been minimized.

Copy link
Author

mwerezak commented Dec 13, 2016

@bryevdv I can print out the actual timestamp in the hover tooltip however it appears as a UNIX timestamp and I am not sure what you mean by "round trip them to human readable."

untitled

@mwerezak

This comment has been minimized.

Copy link
Author

mwerezak commented Dec 21, 2016

@bryevdv could you please provide a code example of what you meant by, "round trip them to human readable"? Thanks.

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented Dec 21, 2016

@mwerezak sorry, What I meant was to compare the timestamps in the in the hover tool (from the column in the BokehJS data source) to the original python timestamps, and by "human readable" I just meant it might be easier to compare the values if you use one of the many datetime functions for turning timestamps into something more friendly for humans to understand. What I am trying to get at: are the timestamp in the data source correct (i.e. they match the python original data), and just the presentation/view of them is wrong? Or are the actual timestamps in the data source changed/incorrect?

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented Dec 21, 2016

Tho as a final note, timestamps in bokeh are stored as "ms since epoch" so there maybe be some factor of 10^3 or 10^6 to apply depending on what the original python data type was (e.g. pandas stores as ns since epoch iirc)

@mwerezak

This comment has been minimized.

Copy link
Author

mwerezak commented Dec 21, 2016

@bryevdv does this resemble what you were looking for at all?

source = ColumnDataSource({
	'x': datetimes,
	'y': hours,
	'time_fmt': [dt.strftime("%H:%M") for dt in datetimes],
	'original_timestamp': [dt.timestamp()*(10**3) for dt in datetimes],
}),

tooltips = [
	("index", "$index"),
	("H:M", "@time_fmt"),
	("original ts", "@original_timestamp{1.111111}"),
	("datetime ts", "@x{1.111111}"),
],

Result:
screenshot from 2016-12-21 11-28-55

I did notice a factor of 10^3 discrepancy at first, which is likely due to what you mentioned in your last comment. After applying the corrective factor, there does still seem to be a small difference, however it is very small, and certainly would not account for a 5 hour shift.

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented Dec 21, 2016

@mwerezak OK thanks, that is useful information. I believe then perhaps the issue is purely with our usage of timezone for formatting.

Just to complete things, can you confirm the axis tick value "16hr" does correspond to the correct hour in the original python data for that point?

@bryevdv bryevdv added this to the 0.12.5 milestone Dec 21, 2016

@mwerezak

This comment has been minimized.

Copy link
Author

mwerezak commented Dec 22, 2016

@bryevdv does this help? The string value of the H:M item is generated from the original python data.

screenshot from 2016-12-22 15-12-18

@timothywalsh

This comment has been minimized.

Copy link

timothywalsh commented Feb 14, 2017

@bryevdv, @mwerezak did you get a chance to confirm whether this bug is due to bokeh's usage of the timezone js library?

I just came across this issue myself, and I'm wondering whether there is any hack to work around it, or if I could possibly contribute to fixing it properly.

@bryevdv bryevdv modified the milestones: 0.12.6, 0.12.5 Mar 21, 2017

@gismo2004

This comment has been minimized.

Copy link

gismo2004 commented May 1, 2017

Any news on this? or a workaround?

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented May 11, 2017

Based on some observations in #6278 I no longer think this is a BokehJS or purely display issue. Rather, we convert datetimes like this:

time.mktime(d.timetuple()) * 1000. + d.microsecond / 1000.

But the help for mktime reads:

Convert a time tuple in local time to seconds since the Epoch.
Note that mktime(gmtime(0)) will not generally return zero for most
time zones; instead the returned value will either be equal to that
of the timezone or altzone attributes on the time module.

I believe this might be where unintended introduction of local time interpretation is inserted.

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented May 11, 2017

Well, maybe not, but there is something weird going on, and I don't understand it.

NumPy datetime64 are converted like this:

NP_EPOCH = np.datetime64(0, 'ms')
NP_MS_DELTA = np.timedelta64(1, 'ms')
d = np.datetime64("2016-05-11")
(d - NP_EPOCH) / NP_MS_DELTA

which yields 1462924800000.

Datetimes were converted like this:

d = dt.datetime(2016, 5, 11)
mktime(d.timetuple()) * 1000. + d.microsecond / 1000.

which yields 1462942800000.0 Different from NumPy version by 5 hrs

So I tried this instead, to avoid the use of mktime

d = dt.datetime(2016, 5, 11)
dtd.timestamp() * 1000. + dtd.microsecond / 1000.

But that ALSO gives 1462942800000.0 so STILL different from NumPy.

Which one of these answers should be considered correct? AFAIK the numpy value is UTC and I think that is what we intend.

FYI my local timezone is CST

Anyone have ideas here? This is not my area of expertise.

ping @canavandl @mattpap @bokeh/dev

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented May 11, 2017

OK, this matches the NumPy computation:

epoch = dt.datetime.utcfromtimestamp(0)
(d - epoch).total_seconds() * 1000 + d.microsecond / 1000.

That yields 1462924800000.0 which is the SAME as the datetime64 value above.

Question then: Is this the correct thing to do??

@canavandl

This comment has been minimized.

Copy link
Contributor

canavandl commented May 11, 2017

I believe so. I get the same, tz-independent value as you if I do (i'm in ART):

>> d = dt.datetime(2016, 5, 11)
>> epoch = dt.datetime.utcfromtimestamp(0)
>> (d - epoch).total_seconds() * 1000 + d.microsecond / 1000.
1462924800000.0

I get a different value using mktime.

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented May 11, 2017

OK so I am proposing this snippet for the convert_datetime_types function in the linked PR:

    # Datetime (datetime is a subclass of date)
    elif isinstance(obj, dt.datetime):
        return (obj - DT_EPOCH).total_seconds() * 1000. + obj.microsecond / 1000.

    # Date
    elif isinstance(obj, dt.date):
        return (dt.datetime(*obj.timetuple()[:6]) - DT_EPOCH).total_seconds() * 1000

    # NumPy datetime64
    elif isinstance(obj, np.datetime64):
        epoch_delta = obj - NP_EPOCH
        return (epoch_delta / NP_MS_DELTA)

With that implementation, all the following agree on my machine:

assert bus.convert_datetime_type(datetime.datetime(2016, 5, 11)) == 1462924800000.0
assert bus.convert_datetime_type(datetime.date(2016, 5, 11))     == 1462924800000.0
assert bus.convert_datetime_type(np.datetime64("2016-05-11"))    == 1462924800000.0
@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented May 11, 2017

I just pushed this change to the linked PR. With this change, the original code now yields:
screen shot 2017-05-11 at 14 40 03

@mwerezak @gismo2004 any chance you can test the PR out by cloning and building bokeh locally?

@gismo2004

This comment has been minimized.

Copy link

gismo2004 commented May 13, 2017

Sorry, was too late, but works now :-)

@bryevdv

This comment has been minimized.

Copy link
Member

bryevdv commented May 13, 2017

Still good to hear, thanks!

@mwerezak

This comment has been minimized.

Copy link
Author

mwerezak commented May 17, 2017

Looks like gismo2004 beat me to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.