Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isoformat for datetime objects in hdf5 #641

Merged
merged 6 commits into from
Oct 12, 2018
Merged

isoformat for datetime objects in hdf5 #641

merged 6 commits into from
Oct 12, 2018

Conversation

ukos-git
Copy link
Contributor

@ukos-git ukos-git commented Oct 3, 2018

Issue

NeurodataWithoutBorders/nwb-schema#50 (comment)

Motivation

Previously, all date formats were allowed in NWB for storing datetime objects. This results in incompatibility, making the program more complicated and making magic functions like dateutil.parse necessary.

ISO 8601 date strings are a defined way for storing datetime object information as readable strings as UTC with timezone offset.

They are independent from dtype specifications of underlying modules like hdf5 or numpy which only accepts timezone naive utc formats. DateTime naive is not wanted in neuro sciences as specific information about the local time (is it morning or evening) is missing.

How to test the behavior?

input datetime.now() (tz naive) to NWBFile Container class and write to hdf5. The time datasets get written as iso string. Loading a hdf5 file will result in correct datetime object in the Container class.

Simple testing can be performed by executing docs/gallery/domain/ecephys.py.

Checklist

  • Have you checked our Contributing document?
  • Have you ensured the PR description clearly describes problem and the solution?
  • Is your contribution compliant with our coding style ? This can be checked running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using #XXX notation where XXX is the issue number ?

@ukos-git
Copy link
Contributor Author

ukos-git commented Oct 3, 2018

This is still in WIP state but read/write to/from hdf5 is working. There is also a WIP:commit which adds some comments and will be dropped before merge.

[ ] unit tests
[ ] stress test
[ ] maybe introduce a generic location for dtype conversions after reading from hdf5 (directly in the ObjectMapper or define a subclass for datetime representations).

@t-b Can you take a look if this is going in the right direction?

@ukos-git ukos-git mentioned this pull request Oct 3, 2018
5 tasks
@t-b t-b self-assigned this Oct 3, 2018
@t-b
Copy link
Collaborator

t-b commented Oct 4, 2018

@ukos-git Looks good from the general direction.

I've added some minor comments.
The testing commit I added gives for the icephys.py example the following output

$ python icephys.py
C:\Anaconda2\envs\py36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to
 `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
before write
None
2018-10-04T19:08:05.285137
tzutc()
2018-10-04T17:08:05.288137+00:00
after read
None
2018-10-04T19:08:05.285137
Traceback (most recent call last):
  File "icephys.py", line 132, in <module>
    print(nwbfile.file_create_date[0].tzinfo)
AttributeError: 'str' object has no attribute 'tzinfo'
(py36)

From reading the code I would have expected to see timezones for both before writing and both after reading, and of course no error.

src/pynwb/data/nwb.file.yaml Outdated Show resolved Hide resolved
src/pynwb/data/nwb.file.yaml Outdated Show resolved Hide resolved
src/pynwb/file.py Show resolved Hide resolved
@t-b t-b assigned ukos-git and unassigned t-b Oct 4, 2018
@t-b
Copy link
Collaborator

t-b commented Oct 4, 2018

@ukos-git The NWB file is still missing the timezone info.

   DATASET "session_start_time" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2018-10-04 19:08:05.285137"
      }
   }

file_create_date has it. In addition the separator is instead of T for both.

See

  DATASET "file_create_date" {
     DATATYPE  H5T_STRING {
        STRSIZE H5T_VARIABLE;
        STRPAD H5T_STR_NULLTERM;
        CSET H5T_CSET_UTF8;
        CTYPE H5T_C_S1;
     }
     DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
     DATA {
     (0): "2018-10-04 17:08:05.288137+00:00"
     }
  }

(Output created with h5dump).

@ukos-git
Copy link
Contributor Author

ukos-git commented Oct 4, 2018

File "icephys.py", line 132, in
print(nwbfile.file_create_date[0].tzinfo)
AttributeError: 'str' object has no attribute 'tzinfo'

ok there is something fundamentally wrong here. This should return datetime objects. Thank you for finding this.

From reading the code I would have expected to see timezones for both before writing and both after reading, and of course no error.

From what I learnt about the pynwb structure:

The builder object gets "constructed" on "read" and "written" on write. So there is indeed a split into the two interfaces.

The container is then built from the builder by writing the read objects as docvals to the container class. For converting the str datetime objects from the builder to the docvals there are the two override functions in the NWBFileMap.

Therefore, In the end the container class is the same interface for read and write.

The timezone is then added within the container class. It is the only class that gets real datetime objects.

I think a graph which shows the dependencies for write/read interactions with the main classes and subclasses as well as thier interfaces would be highly needed.

@t-b
Copy link
Collaborator

t-b commented Oct 4, 2018

@ukos-git

I think a graph which shows the dependencies for write/read interactions with the main classes and subclasses as well as thier interfaces would be highly needed.

If you can do that in a finite and small amount of time. Go for it!

@ukos-git
Copy link
Contributor Author

ukos-git commented Oct 5, 2018

I can not reproduce the error. output on python 2.7, 3.6 and 3.7 is the following:

$ python docs/gallery/domain/icephys.py
/home/matthias/pynwb/local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
before write
tzlocal()
2018-10-05T15:18:21.445020+02:00
tzlocal()
2018-10-05T15:18:21.446525+02:00
after read
tzoffset(None, 7200)
2018-10-05T15:18:21.445020+02:00
tzoffset(None, 7200)
2018-10-05T15:18:21.446525+02:00

Maybe the problem is related to Anaconda2. Maybe anaconda uses an old pynwb lib? Debugging the traceback with pudb would be good:

$ python -m pudb docs/gallery/domain/icephys.py

@t-b Should I test again on Windows or can you try?

@ukos-git ukos-git mentioned this pull request Oct 5, 2018
@ukos-git
Copy link
Contributor Author

ukos-git commented Oct 8, 2018

I have added a function for adding timezone information. This way it is easier to add the warning in a uniforma way.

I'm not sure why everything is acting differently on Windows. This is probably connected with the other error you get with your modified icephys file. I will install python for Windows for testing. I wish there was a way for holding windows10 in a docker file :-)

my h5dump by the way looks like this:

$ h5dump --dataset=/file_create_date --dataset=/session_start_time icephys_example.nwb
HDF5 "icephys_example.nwb" {
DATASET "/file_create_date" {
   DATATYPE  H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   DATA {
   (0): "2018-10-08T15:05:00.155631+02:00"
   }
}
DATASET "/session_start_time" {
   DATATYPE  H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SCALAR
   DATA {
   (0): "2018-10-08T15:05:00.152845+02:00"
   }
}
}

@ukos-git
Copy link
Contributor Author

ukos-git commented Oct 8, 2018

How should we deal with this test case error?

======================================================================
FAIL: test_build (ui_write.test_nwbfile.TestNWBFileIO)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/circleci/project/tests/integration/ui_write/base.py", line 49, in test_build
    self.assertDictEqual(result, self.builder)
AssertionError: 
-  'file_create_date': {'attributes': {}, 'data': ['2017-04-15T12:00:00+00:00']},
?                                                             ^   ------

+  'file_create_date': {'attributes': {}, 'data': ['2017-04-15 12:00:00']},
? 

@t-b
Copy link
Collaborator

t-b commented Oct 8, 2018

Should I test again on Windows or can you try?

I can don that. Or even better, let CI handle that.

@codecov
Copy link

codecov bot commented Oct 8, 2018

Codecov Report

Merging #641 into dev will decrease coverage by 0.06%.
The diff coverage is 70.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #641      +/-   ##
==========================================
- Coverage   75.21%   75.14%   -0.07%     
==========================================
  Files          58       58              
  Lines        6702     6720      +18     
  Branches     1365     1367       +2     
==========================================
+ Hits         5041     5050       +9     
- Misses       1270     1275       +5     
- Partials      391      395       +4
Impacted Files Coverage Δ
src/pynwb/form/spec/spec.py 80.46% <ø> (ø) ⬆️
src/pynwb/form/backends/hdf5/h5tools.py 69.29% <ø> (ø) ⬆️
src/pynwb/form/build/map.py 76.08% <100%> (+0.08%) ⬆️
src/pynwb/io/file.py 97.36% <100%> (+0.81%) ⬆️
src/pynwb/file.py 82.53% <41.66%> (-3.26%) ⬇️
src/pynwb/form/utils.py 87.66% <0%> (-0.67%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a5fa0b...4303a88. Read the comment docs.

@ukos-git
Copy link
Contributor Author

ukos-git commented Oct 8, 2018

@t-b circleci is not passing and I don't know why.
I removed the arrow package, as I suspected that this was the source of the problem (no dpkg euqivivalent) but the error remains.

@ukos-git ukos-git changed the title WIP: Mk/datetime isoformat for datetime objects in hdf5 Oct 8, 2018
@t-b
Copy link
Collaborator

t-b commented Oct 8, 2018

@ukos-git I've no clue. Does the tests pass locally with at least one python version? If yes, let's pretend it has nothing to do with this PR and let's finish up with PR.

@oruebel
Copy link
Contributor

oruebel commented Oct 8, 2018

@dorukozturk could you have a look at the CircleCI error in this PR?

@t-b
Copy link
Collaborator

t-b commented Oct 9, 2018

@ukos-git Could you rebase onto newest dev branch now that #659 is merged?

Matthias Kastner added 3 commits October 9, 2018 14:04
Specification for `file_create_date` and `session_start_time` are
updated to only accept iso8601 date format.

since hdf5 does not accept np.datetime64 and datetime64 is
time-zone-naive, a new dtype `isodatetime` is introduced that
transformes datetime objects to string representation on write.

on read, the string has to be transformed back. The builder
representation has the bare hdf5object. The Main File Container Class
now only accepts datetime objects. Therefore, the transformation between
string and datetime is done in the NWBFileMap definition as an override
function.

The File Class adds the local timezone object if no object was given as
input.

date conversion from datetime to iso string using datetime.isoformat()
conversion back from iso string to datetime is done using the parse
function fo the dateutils library.  `datetime.fromisoformat` would be
better here but is only available in python3.7 and is missing in
python2.7 (2018-10-03)

description of session_start_time and file_create_date was updated to
allow UTC timezone ("Z") and an accuracy up to milliseconds (note that
np.datetime64 uses nanoseconds)

unit tests with string inputs were updated to get datetime objects.

a function is added for converting datetime objects with missing
timezones.  A warning is raised when a datetime object is used with
missing timezone information.
currently, a missing timezone throws a warning to stderr. All unit tests
therefore have to be initiated with a valid timezone objects. This is
also good practice for the documentation.
simple unit test for the container creation with lists in datetime
object.
@ukos-git
Copy link
Contributor Author

ukos-git commented Oct 9, 2018

now waiting for #661

@oruebel
Copy link
Contributor

oruebel commented Oct 10, 2018

On a separate note, when implementing changes that affect other parts of our ecosystem, it will be useful if you could create corresponding issue tickets so that folks are aware of the changes. In this case, the addition of the isodatetime dtype in the schema language affects both MatNWB and the nwb-schema docs. I just created the following two issues for this:

NeurodataWithoutBorders/nwb-schema#195
NeurodataWithoutBorders/matnwb#74

@t-b
Copy link
Collaborator

t-b commented Oct 10, 2018

@oruebel Thanks for the explanation and creating of the issues.

Copy link
Contributor

@oruebel oruebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks in general good to me.

- Date accuracy is up to milliseconds.
- The file can be created after the experiment was run, so this may
differ from the experiment start time.
- Each modification to the nwb file adds a new entry to the array.'
dims:
- '*unlimited*'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be something like:

dims:
- timestamps

if not isinstance(date, datetime):
raise ValueError("require datetime object")
if date.tzinfo is None:
warn("Date is missing timezone information. Updating to local timezone.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to say in the warning text what the local timezone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants