Conversation
Whilst I don't see any reason not to implement these, I am surprised that you find
pandas check for
    with hdfs.open(path_to_csv) as f:
        df = pd.read_csv(f)
I will look into readline and tests!
hdfs3/core.py
Outdated
        """ Enables reading a file as a buffer in pandas """
        return next(self._genline())

    def next(self):
`next = __next__` is simpler.
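To illustrate the alias suggestion, here is a minimal sketch. `Countdown` is a hypothetical class invented for this example and is not part of hdfs3; the point is only that binding `next` to `__next__` in the class body gives Python 2 and Python 3 callers the same method without a second `def`.

```python
class Countdown(object):
    """Hypothetical iterator used only to show the alias pattern."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 iterator protocol
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Python 2 looks up `next`; reuse the same function object
    next = __next__


c = Countdown(3)
first = next(c)    # goes through __next__
second = c.next()  # goes through the alias
rest = list(c)     # drains what is left
```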
hdfs3/core.py
Outdated
@@ -672,6 +672,14 @@ def __iter__(self):
        """ Enables `for line in file:` usage """
        return self._genline()

    def __next__(self):
        """ Enables reading a file as a buffer in pandas """
        return next(self._genline())
If you care about performance, you could call readline() directly:

    def __next__(self):
        out = self.readline()
        if out:
            return out
        else:
            raise StopIteration
Note that `readline()` is known to perform poorly (and seems somewhat odd for binary files), so I hope pandas does not really call this repeatedly. If it does, wrapping the file in `io.TextIOWrapper` is probably far better.
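A rough sketch of what that wrapping might look like. An `io.BytesIO` stands in for the binary HDFS file handle here, which is an assumption: the real file object would need to expose the buffered-IO methods that `io.TextIOWrapper` relies on (`readable()`, `read()`, `closed`, and so on).

```python
import io

# Stand-in for a binary handle such as the one hdfs.open() returns
# (assumption: the real object is compatible with TextIOWrapper).
raw = io.BytesIO(b"a,b\r\n1,2\r\n3,4\r\n")

# newline=None enables universal-newline handling, so "\r\n" is
# translated to "\n" as the lines are decoded.
text = io.TextIOWrapper(raw, encoding="utf-8", newline=None)

lines = list(text)
```

The consumer then sees a text-mode file that does its own buffering and decoding, so something like `pd.read_csv(text)` would not need to call `readline()` on the underlying binary file line by line.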
@@ -672,6 +672,14 @@ def __iter__(self):
        """ Enables `for line in file:` usage """
        return self._genline()
Perhaps this can now be replaced with `return self`?
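For concreteness, a sketch of how the two methods could fit together once `__next__` exists. `LineFile` is a hypothetical stand-in for the hdfs3 file class, backed by an `io.BytesIO` rather than a real HDFS handle; it is not the code in this PR.

```python
import io


class LineFile(object):
    """Hypothetical stand-in for the hdfs3 file class."""

    def __init__(self, raw):
        self._raw = raw

    def readline(self):
        return self._raw.readline()

    def __iter__(self):
        # The object is its own iterator, so no generator is needed
        return self

    def __next__(self):
        out = self.readline()
        if out:
            return out
        raise StopIteration


f = LineFile(io.BytesIO(b"a,b\n1,2\n3,4\n"))
header = next(f)  # consume one line by hand
body = list(f)    # iteration continues from the current position
```

With `__iter__` returning `self`, `for line in f:` and `next(f)` share one read position, which is the behaviour a consumer such as pandas would expect from a file-like buffer.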
Thanks for the comment. I've updated @martindurant As for wrapping in On the side: can this be merged in the meantime? It would be very helpful to have this feature. |
The wrapper should work like
where Pandas would now see a text-mode file with buffering and correct line-end handling. I would merge, but there ought to be some test of the new method(s). I notice, also, that |
This allows libraries such as pandas to read a file as a buffer.
@martindurant I've written an additional test. BTW, pandas is using |
Cool, thank you. |