Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Reader that does not need open file handles (#239) #926

Closed
wants to merge 2 commits into from

Conversation

richardjgowers
Copy link
Member

@richardjgowers richardjgowers commented Aug 3, 2016

Adds a new base Reader class which doesn't need to maintain an open file handle. Should fix problems related to too many open file handles.

Explanation

Readers have 4 "front doors", (__getitem__, next, __iter__, rewind) which are publicly accessible methods to (indirectly) manipulate the file handle.

These front doors filter through to various backend methods in the base Reader (_sliced_iter, _full_iter and _goto_frame)

These backend methods then create the open file handle and call things in the individual implementations (eg XYZReader._read_frame)

Individual implementations are oblivious to the file handle manipulation above them, they just assume that a handle exists in a known location (ie this needs to be unified in API).

EDIT kain88-de: issue #239 (comment)

@richardjgowers
Copy link
Member Author

Also fixes #875

@richardjgowers
Copy link
Member Author

So XYZ currently fails a single test to do with the StreamIO source of data. This probably requires a little thought because a stream by definition will always remain "open", so can't work with this open/close style reader.

@richardjgowers
Copy link
Member Author

Pushed some changes for DCD. This isn't working yet because the file handle isn't used at a Python level, so again more thought in how to make manipulations to the file handle work with C extensions (see also XDR).

@kain88-de
Copy link
Member

Can you do the XDR fix first. I set it up to act as close as possible to a filehandle object.

@kain88-de kain88-de mentioned this pull request Aug 8, 2016
12 tasks
@kain88-de
Copy link
Member

@richardjgowers why do you create the extra super filehandle class here? Wouldn't it be enough to create a decorator with_open_file and use this on all function that should open and close a filehandle. Then you can even use the open/close functions as defined by the readers, without caring how they call their filehandle object.

def with_open_file(self, f):
    with self.open():
        f()

@with_open_file
def __iter__(self):
    ...

I might be missing some edgecases with this idea but I don't see any reason why it shouldn't be possible to do it like that.

@kain88-de
Copy link
Member

Why does this change need a function like _bytes_seek in the XDRFile class? You mentions this in #929.

@richardjgowers
Copy link
Member Author

@kain88-de I need to save the file handle position somewhere (ie the last tell), so the class was mostly being used for that, and having the file logic all in one place.

I had a read, the current seek and tell should still work for XDR using frame numbers not byte offsets

I'll have a look if I can simplify things, the main reason there's more functions is so that the files are only opened once, for example __getitem__ can call __iter__ which would trigger the decorator twice potentially.

@kain88-de
Copy link
Member

I had a read, the current seek and tell should still work for XDR using frame numbers not byte offsets

Good news.

I'll have a look if I can simplify things, the main reason there's more functions is so that the files are only opened once, for example getitem can call iter which would trigger the decorator twice potentially.

Good point I haven't thought about that issue.

@dotsdl
Copy link
Member

dotsdl commented Aug 14, 2016

@richardjgowers, @kain88-de I'm so happy to see this. :D

@@ -1330,6 +1330,95 @@ def __del__(self):
self.close()


class NewReader(ProtoReader):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to have this class completely replace Reader? If so, is it suppose to take over the Reader name? If not, having "new" in the name is not a smart move, at some point it will become the old one...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a temporary thing while I try and see what's possible... hopefully it will be the new Reader class across everything

@orbeckst orbeckst changed the title Start of Issue #239 WIP: Reader that does not need open file handles (#239) Nov 16, 2016
Added NewReader which doesn't maintain an open filehandle

XYZReader changed to use NewReader
@richardjgowers
Copy link
Member Author

So I'm taking another swing at this. I've got the XYZReader working so it only opens a filehandle when you use it, and only opens the file a single time when iterating.

I'll try and extend this to the XDR stuff next for a more complicated case.

@richardjgowers
Copy link
Member Author

@kain88-de is there a reason that the XTCFile object has to always remain open? I think I'm going to have to change this so that it doesn't keep the file handle open

Can I take out the ability to open different files with the same XTCFile object? It seems to complicate things and isn't how python's regular file object work either.

Copy link
Member

@kain88-de kain88-de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave the is_open int private and add a property is open that returns a true python bool.

@kain88-de
Copy link
Member

@kain88-de is there a reason that the XTCFile object has to always remain open? I think I'm going to have to change this so that it doesn't keep the file handle open

I copied most file system behavior. Once you open a file it stays open until closed explicitly. That allows filesystems to work with buffers on files. A close/open always causes a flush to the filesystem. Which is slow and can even stop filesystem operations all together on large parallel cluster file systems. But your current change for an explicit open is OK.

Can I take out the ability to open different files with the same XTCFile object? It seems to complicate things and isn't how python's regular file object work either.

Sure thing that can be removed.

@@ -318,7 +322,10 @@ cdef class _XDRFile:
set_offsets
"""
if not self._has_offsets:
# wrap this in try/except to ensure it gets closed?
self.open(self.fname, self._mode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not

with self:
    ...

that might work and should close the file when an IOError is thrown

@@ -308,7 +308,7 @@ def __init__(self, filename, **kwargs):
# etc.
# (Also cannot just use seek() or reset() because that would break
# with urllib2.urlopen() streams)
self._read_next_timestep()
self.next()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For python 3 compatibility this should be next(self).

@kain88-de kain88-de deleted the issue-239-filehandles branch May 21, 2018 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants