isoformat for datetime objects in hdf5 #641
This is still in WIP state but read/write to/from hdf5 is working. There is also a WIP:commit which adds some comments and will be dropped before merge. [ ] unit tests @t-b Can you take a look if this is going in the right direction? |
@ukos-git Looks good from the general direction. I've added some minor comments.
From reading the code I would have expected to see timezones for both before writing and both after reading, and of course no error. |
@ukos-git The NWB file is still missing the timezone info.
file_create_date has it. In addition the separator is See
(Output created with h5dump). |
ok there is something fundamentally wrong here. This should return datetime objects. Thank you for finding this.
From what I learnt about the pynwb structure: the builder object gets "constructed" on read and "written" on write, so there is indeed a split into the two interfaces. The container is then built from the builder by passing the read objects as docvals to the container class. For converting the str datetime objects from the builder to the docvals there are the two override functions in the NWBFileMap. Therefore, in the end the container class is the same interface for read and write. The timezone is then added within the container class. It is the only class that gets real datetime objects. I think a graph which shows the dependencies for write/read interactions between the main classes and subclasses, as well as their interfaces, is badly needed. |
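To make the split described above concrete, here is a minimal toy sketch. The classes ToyBuilder, ToyContainer and ToyMap are hypothetical stand-ins and are not the pynwb API; the sketch also assumes Python 3.7+ so that `datetime.fromisoformat` is available. It only illustrates the idea that the builder holds strings, the container holds datetime objects, and the mapper converts between them.

```python
from datetime import datetime, timezone


class ToyBuilder(dict):
    """Hypothetical stand-in for the builder: values as stored in the file."""


class ToyContainer:
    """Hypothetical stand-in for the container: only holds real datetimes."""

    def __init__(self, session_start_time):
        if not isinstance(session_start_time, datetime):
            raise ValueError("require datetime object")
        self.session_start_time = session_start_time


class ToyMap:
    """Hypothetical stand-in for the mapper with its two conversion hooks."""

    def build(self, container):
        # Write path: datetime -> ISO 8601 string.
        return ToyBuilder(
            session_start_time=container.session_start_time.isoformat()
        )

    def construct(self, builder):
        # Read path: ISO 8601 string -> datetime (Python 3.7+).
        return ToyContainer(
            datetime.fromisoformat(builder["session_start_time"])
        )


start = datetime(2018, 10, 5, 15, 18, 21, tzinfo=timezone.utc)
mapper = ToyMap()
builder = mapper.build(ToyContainer(start))   # what would be written
restored = mapper.construct(builder)          # what a reader would get back
```

In this sketch the container is the single interface a user touches for both read and write, which matches the description above; everything string-related stays in the mapper.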
If you can do that in a finite and small amount of time. Go for it! |
I can not reproduce the error. output on python 2.7, 3.6 and 3.7 is the following: $ python docs/gallery/domain/icephys.py
/home/matthias/pynwb/local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
before write
tzlocal()
2018-10-05T15:18:21.445020+02:00
tzlocal()
2018-10-05T15:18:21.446525+02:00
after read
tzoffset(None, 7200)
2018-10-05T15:18:21.445020+02:00
tzoffset(None, 7200)
2018-10-05T15:18:21.446525+02:00
Maybe the problem is related to Anaconda2. Maybe Anaconda uses an old pynwb lib? Debugging the traceback with
$ python -m pudb docs/gallery/domain/icephys.py
@t-b Should I test again on Windows or can you try? |
I have added a function for adding timezone information. This way it is easier to add the warning in a uniform way. I'm not sure why everything is acting differently on Windows. This is probably connected with the other error you get with your modified icephys file. I will install Python for Windows for testing. I wish there was a way of running Windows 10 in a Docker container :-) My h5dump, by the way, looks like this: $ h5dump --dataset=/file_create_date --dataset=/session_start_time icephys_example.nwb
HDF5 "icephys_example.nwb" {
DATASET "/file_create_date" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): "2018-10-08T15:05:00.155631+02:00"
}
}
DATASET "/session_start_time" {
DATATYPE H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "2018-10-08T15:05:00.152845+02:00"
}
}
} |
How should we deal with this test case error? ======================================================================
FAIL: test_build (ui_write.test_nwbfile.TestNWBFileIO)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/circleci/project/tests/integration/ui_write/base.py", line 49, in test_build
self.assertDictEqual(result, self.builder)
AssertionError:
- 'file_create_date': {'attributes': {}, 'data': ['2017-04-15T12:00:00+00:00']},
? ^ ------
+ 'file_create_date': {'attributes': {}, 'data': ['2017-04-15 12:00:00']},
? |
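The two strings in the assertion diff correspond to two different ways of turning a datetime into text. The following small sketch, using only the standard library, reproduces both forms for the date in the test; it is an illustration of the formats and not the test code itself.

```python
from datetime import datetime, timezone

naive = datetime(2017, 4, 15, 12)
aware = datetime(2017, 4, 15, 12, tzinfo=timezone.utc)

old_style = str(naive)          # space separator, no offset
new_style = aware.isoformat()   # "T" separator plus UTC offset

print(old_style)  # 2017-04-15 12:00:00
print(new_style)  # 2017-04-15T12:00:00+00:00
```

If that reading is right, the expected builder in the test would need to carry the ISO 8601 string with offset rather than the plain `str()` form.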
I can do that. Or even better, let CI handle that. |
Codecov Report
@@ Coverage Diff @@
## dev #641 +/- ##
==========================================
- Coverage 75.21% 75.14% -0.07%
==========================================
Files 58 58
Lines 6702 6720 +18
Branches 1365 1367 +2
==========================================
+ Hits 5041 5050 +9
- Misses 1270 1275 +5
- Partials 391 395 +4
|
@t-b circleci is not passing and I don't know why. |
@ukos-git I've no clue. Do the tests pass locally with at least one Python version? If yes, let's pretend it has nothing to do with this PR and finish up the PR. |
@dorukozturk could you have a look at the CircleCI error in this PR? |
The specifications for `file_create_date` and `session_start_time` are updated to only accept the ISO 8601 date format. Since hdf5 does not accept np.datetime64 and datetime64 is timezone-naive, a new dtype `isodatetime` is introduced that transforms datetime objects to their string representation on write. On read, the string has to be transformed back.
The builder representation has the bare hdf5 object. The main file Container class now only accepts datetime objects. Therefore, the transformation between string and datetime is done in the NWBFileMap definition as an override function. The file class adds the local timezone object if no timezone was given as input.
Date conversion from datetime to ISO string is done using datetime.isoformat(). Conversion back from ISO string to datetime is done using the parse function of the dateutil library. `datetime.fromisoformat` would be better here but is only available in Python 3.7 and is missing in Python 2.7 (2018-10-03).
The description of session_start_time and file_create_date was updated to allow the UTC timezone ("Z") and an accuracy up to milliseconds (note that np.datetime64 uses nanoseconds).
Unit tests with string inputs were updated to get datetime objects.
A function is added for converting datetime objects with missing timezones. A warning is raised when a datetime object is used with missing timezone information.
Currently, a missing timezone throws a warning to stderr. All unit tests therefore have to be initiated with valid timezone objects. This is also good practice for the documentation.
simple unit test for the container creation with lists in datetime object.
now waiting for #661 |
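The write/read round trip described in the isodatetime commit message can be sketched with the standard library alone. Note the swap: the commit parses with the dateutil library, while this sketch uses `datetime.fromisoformat`, which the message itself names as the Python 3.7+ alternative. The timestamp is taken from the example output earlier in this conversation.

```python
from datetime import datetime, timedelta, timezone

# A timezone-aware datetime with a +02:00 offset.
tz = timezone(timedelta(hours=2))
original = datetime(2018, 10, 5, 15, 18, 21, 445020, tzinfo=tz)

# On write: datetime -> ISO 8601 string.
stored = original.isoformat()

# On read: ISO 8601 string -> datetime (Python 3.7+ only).
restored = datetime.fromisoformat(stored)

print(stored)  # 2018-10-05T15:18:21.445020+02:00
```

The string only carries a numeric offset, not a timezone name, so the restored object has a fixed-offset tzinfo. That would be consistent with the output above showing tzlocal() before the write and tzoffset(None, 7200) after the read, while both denote the same instant.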
On a separate note, when implementing changes that affect other parts of our ecosystem, it will be useful if you could create corresponding issue tickets so that folks are aware of the changes. In this case, the addition of the isodatetime dtype in the schema language affects both MatNWB and the nwb-schema docs. I just created the following two issues for this: NeurodataWithoutBorders/nwb-schema#195 |
@oruebel Thanks for the explanation and creating of the issues. |
Looks in general good to me.
- Date accuracy is up to milliseconds.
- The file can be created after the experiment was run, so this may
  differ from the experiment start time.
- Each modification to the nwb file adds a new entry to the array.'
dims:
- '*unlimited*'
I think this should be something like:
dims:
- timestamps
if not isinstance(date, datetime):
    raise ValueError("require datetime object")
if date.tzinfo is None:
    warn("Date is missing timezone information. Updating to local timezone.")
I think it would be good to say in the warning text what the local timezone is.
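One possible shape for that suggestion, as a minimal sketch: the function name `add_missing_timezone` is hypothetical, and the sketch uses the standard library's naive `astimezone()` (Python 3.6+) to attach the system local timezone rather than whatever the PR itself uses.

```python
import warnings
from datetime import datetime, timezone


def add_missing_timezone(date):
    """Return date unchanged if tz-aware, else attach the local timezone."""
    if not isinstance(date, datetime):
        raise ValueError("require datetime object")
    if date.tzinfo is None:
        # A naive datetime is assumed to be in system local time; the wall
        # clock value is kept and only the tzinfo is attached.
        aware = date.astimezone()
        warnings.warn(
            "Date is missing timezone information. "
            "Updating to local timezone (%s)." % aware.tzname()
        )
        return aware
    return date
```

Naming the timezone in the message lets a user notice immediately when the machine's local zone is not the one in which the experiment took place.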
Issue
NeurodataWithoutBorders/nwb-schema#50 (comment)
Motivation
Previously, all date formats were allowed in NWB for storing datetime objects. This results in incompatibilities, makes programs more complicated, and makes magic functions like dateutil.parse necessary.
ISO 8601 date strings are a well-defined way of storing datetime information as readable strings with a UTC offset.
They are independent of the dtype specifications of underlying modules like hdf5 or numpy, which only accept timezone-naive UTC formats. Timezone-naive datetimes are not wanted in neuroscience because specific information about the local time (was it morning or evening?) is missing.
How to test the behavior?
Input datetime.now() (tz naive) to the NWBFile Container class and write to hdf5. The time datasets get written as ISO strings. Loading an hdf5 file will result in correct datetime objects in the Container class.
Simple testing can be performed by executing docs/gallery/domain/ecephys.py.
Checklist
flake8 from the source directory.
#XXX notation where XXX is the issue number?